feat: node scores computation and storage #222
Excellent work here @jorgeantonio21, this is critical work 🚀 🥳. We will likely need to adjust it once it's integrated, based on the scores we observe.
I've left some comments and suggestions.
I also think we would benefit from some additional property-based tests for the scoring functions, e.g.:
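use proptest::collection::vec;
use proptest::prelude::*;

proptest! {
    // The VRAM score should stay within [-1.0, 0.3] for any generated input.
    // Note: `free` and `total` are generated with independent lengths here.
    #[test]
    fn test_gpu_vram_score_properties(
        free in vec(0i64..32_000_000_000i64, 1..4),
        total in vec(32_000_000_000i64..64_000_000_000i64, 1..4)
    ) {
        let score = compute_gpu_vram_utilization_score(&free, &total, 0.3);
        prop_assert!(score >= -1.0 && score <= 0.3);
    }

    // The temperature score should stay within [-1.0, 0.2] for 20-100 degree inputs.
    #[test]
    fn test_gpu_temp_score_properties(temps in vec(20.0f64..100.0f64, 1..4)) {
        let score = compute_gpu_temp_score(&temps, 80.0, 100.0, 0.2);
        prop_assert!(score >= -1.0 && score <= 0.2);
    }
}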
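To make the bounds property concrete, here is a minimal, dependency-free sketch of the same check. Everything in it is an assumption for illustration: `vram_score_sketch`, `Lcg`, and `bounds_hold` are hypothetical names, and the linear mapping is a stand-in, not the formula used by `compute_gpu_vram_utilization_score` in this PR.

```rust
/// Hypothetical stand-in scorer (NOT the PR's implementation): maps the mean
/// free-VRAM fraction linearly from [0, 1] onto [-1.0, weight].
fn vram_score_sketch(free: &[i64], total: &[i64], weight: f64) -> f64 {
    // Pair each GPU's free and total memory, skipping GPUs that report no memory.
    let pairs: Vec<(i64, i64)> = free
        .iter()
        .copied()
        .zip(total.iter().copied())
        .filter(|&(_, t)| t > 0)
        .collect();
    if pairs.is_empty() {
        return -1.0; // no usable GPUs: worst possible score
    }
    let mean_free = pairs
        .iter()
        .map(|&(f, t)| (f as f64 / t as f64).clamp(0.0, 1.0))
        .sum::<f64>()
        / pairs.len() as f64;
    (mean_free * (weight + 1.0) - 1.0).clamp(-1.0, weight)
}

/// Tiny linear congruential generator so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    /// Returns a pseudo-random value in the half-open range [lo, hi).
    fn next_in(&mut self, lo: i64, hi: i64) -> i64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        lo + ((self.0 >> 11) % (hi - lo) as u64) as i64
    }
}

/// Checks that the score stays in [-1.0, weight] over `cases` sampled inputs.
fn bounds_hold(cases: usize, weight: f64) -> bool {
    let mut rng = Lcg(42);
    (0..cases).all(|_| {
        let gpus = rng.next_in(1, 4) as usize;
        let free: Vec<i64> = (0..gpus)
            .map(|_| rng.next_in(0, 32_000_000_000))
            .collect();
        let total: Vec<i64> = (0..gpus)
            .map(|_| rng.next_in(32_000_000_000, 64_000_000_000))
            .collect();
        let score = vram_score_sketch(&free, &total, weight);
        (-1.0..=weight).contains(&score)
    })
}
```

Unlike the proptest strategies above, this sketch generates `free` and `total` with the same length, one entry per GPU. The real proptest version would additionally shrink failing inputs, which a hand-rolled sweep like this does not.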
It would also help to have some integration tests for the complete scoring pipeline, e.g.:
#[tokio::test]
async fn test_performance_scoring_pipeline() -> Result<()> {
    let state = setup_test_db().await;
    let metrics = NodeMetrics {
        node_small_id: 1,
        cpu_usage: 0.5,
        ram_used: 8_000_000_000,
        ram_total: 16_000_000_000,
        gpu_memory_free: vec![8_000_000_000],
        gpu_memory_total: vec![16_000_000_000],
        gpu_percentage_time_execution: vec![0.3],
        gpu_temperatures: vec![70.0],
        gpu_power_usages: vec![180.0],
    };
    // Clone so `metrics` can be reused below (assumes `NodeMetrics: Clone`).
    let performance = get_node_performance(&state, metrics.clone()).await?;

    // Test score components and correlations
    assert!(performance.total_performance_score >= 0.0 && performance.total_performance_score <= 1.0);

    // Test score changes with metrics changes
    let worse_metrics = NodeMetrics {
        cpu_usage: 0.9,
        ..metrics
    };
    let worse_performance = get_node_performance(&state, worse_metrics).await?;
    assert!(worse_performance.total_performance_score < performance.total_performance_score);
    Ok(())
}
Would love to hear your thoughts.
Amazing work @jorgeantonio21