-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[METRICS] New thread for waiting list checking #1836
[METRICS] New thread for waiting list checking #1836
Conversation
不好意思,可否分享這段程式碼的連結給我? 另外我試著釐清是說, MetricStore 的 receiveJob (head) 會呼叫 receive 拿到 metrics 後會餵食給 sensor,由於 receive and sensor 是允許外部自己定義,所以外部使用者可能在過程中埋入了 wait 的邏輯,然後就導致 receiveJob (thread) 卡住?這樣正確嗎 |
不大一樣,主要是 |
不好意思,我想回頭討論一下 waiting 機制,目前看起來這個機制是讓外部可以傳入ㄧ個 predicator 來針對目前已經收集的 ClusterBean 做判斷 (timeout or pass),所以我不太確定為何需要在 |
是的,外部可以傳入一個 predicator 然後等待 latch 歸零,或是 timeout。 所以這邊外部並沒有在檢查(除了剛進來會做唯一一次檢查),而是掛著等待 "別人檢查"。而這個"別人"就是 MetricStore 的 receiverJob。 當初會讓 MetricStore 來檢查,而不是外部自己檢查,其實是為了不讓檢查不斷空轉,而是有 clusterBean 更新的時候才做檢查。當初的討論可參考 #1660 。 |
我在想是否就讓外部 user thread 去檢查就好?檢查如果沒有的話,就等一個內部的 lock,至於 receiveJob 就是在更新 bean 後去喚醒該 lock 就好,這樣實作應該不難,同時也避免讓 receiveJob (thread) 跟 user thread 綁再一起 |
贊同,這樣也就不用再建立新的 thread 了。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chinghongfang 感謝更新,有一個跟設計有關的建議請看一下
common/src/main/java/org/astraea/common/metrics/collector/MetricStore.java
Outdated
Show resolved
Hide resolved
common/src/main/java/org/astraea/common/metrics/collector/MetricStore.java
Outdated
Show resolved
Hide resolved
@chinghongfang 請問這個還有更新嗎?把等待的邏輯再簡化一點 |
我已經照上面的建議更新了,還是請問還有什麼不夠精簡的嗎?再麻煩指教了。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chinghongfang 不好意思,我忽略了更新,幾個建議請看一下
common/src/main/java/org/astraea/common/metrics/collector/MetricStore.java
Outdated
Show resolved
Hide resolved
common/src/main/java/org/astraea/common/metrics/collector/MetricStore.java
Outdated
Show resolved
Hide resolved
common/src/main/java/org/astraea/common/metrics/collector/MetricStore.java
Outdated
Show resolved
Hide resolved
common/src/main/java/org/astraea/common/metrics/collector/MetricStore.java
Show resolved
Hide resolved
common/src/main/java/org/astraea/common/metrics/collector/MetricStore.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
實驗時發現,MetricStore 檢查 waiting list 會影響指標拉取速率,這隻 PR 改成獨立使用一條執行緒來檢查。
在使用 Cost Aware Assignor 時,該程式會等待 MetricStore 有"完整"效能指標時才會做計算,其判斷"完整"的方式是實際代入 cost-function 計算,會在 MetricStore 有更新指標的時候檢查。但這些計算不應該由佔用 "MetricStore 的拉取執行緒" ,MetricStore 應該繼續拉取最新指標。
所以這裡的作法是建立一個新的執行緒檢查,且同時只會有一個執行緒在執行檢查(透過
isChecking
確保一次只會有一個執行緒)。另外,先前的作法是看本次迴圈指標有無更新,再來執行檢查,但使用多執行緒執行的話,有可能會在前一個迴圈(執行檢查後)更新,為避免漏檢查,所以使用另一個參數
needChecking
確保每次更新都有檢查到。