🧪 [DRAFT] Agent Scoring v4.0 #96
Draft: teslashibe wants to merge 45 commits into main from feat-scoring-v4
- Add SemanticScorer class for analyzing post originality and uniqueness
- Integrate semantic scoring into PostsScorer calculation
- Adjust engagement and length weights to balance with semantic scores
- Add detailed logging for score components and ranges

Technical changes:
- Introduce sentence-transformers for semantic analysis
- Add cosine similarity calculations for post comparisons
- Reduce engagement metric weights to prevent overshadowing
- Set semantic weight to 3.0 to prioritize original content

Breaking changes: None
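The cosine-similarity comparison mentioned in this commit can be illustrated with a small sketch. It assumes each post has already been turned into an embedding vector (for example by a sentence-transformers model) and uses toy vectors instead. The function name `uniqueness_scores` and the "one minus mean similarity" definition are hypothetical and not taken from the PR.

```python
import numpy as np


def uniqueness_scores(embeddings: np.ndarray) -> np.ndarray:
    """Return, per post, 1 minus its mean cosine similarity to all other posts."""
    n = embeddings.shape[0]
    if n < 2:
        # With zero or one post there is nothing to compare against.
        return np.ones(n)
    # Normalise rows to unit length; the clip avoids dividing by zero.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    similarity = unit @ unit.T
    # Ignore each post's similarity with itself.
    np.fill_diagonal(similarity, 0.0)
    mean_similarity = similarity.sum(axis=1) / (n - 1)
    return 1.0 - mean_similarity


# Toy embeddings: posts 0 and 1 are identical, post 2 is orthogonal to both.
vectors = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
scores = uniqueness_scores(vectors)
```

In this toy case the two identical posts receive a lower uniqueness score than the orthogonal one, which is the behaviour a semantic originality score would be expected to show.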
- Add start_date and end_date parameters to PostsGetter initialization
- Implement timestamp conversion for date range filtering
- Update API request to include both 'since' and 'until' parameters
- Improve logging to show date range information
- Add explicit return for exception handling
- Clean up .gitignore to exclude .pyc files

This change allows more granular control over the time range when fetching posts, defaulting to the previous 7-day period if no dates are specified.
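A minimal sketch of the date-range handling described above, assuming timezone-aware UTC datetimes and Unix-second timestamps for the 'since' and 'until' parameters. The helper name `resolve_date_range` is hypothetical, and the real PostsGetter may convert dates differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def resolve_date_range(
    start_date: Optional[datetime] = None,
    end_date: Optional[datetime] = None,
    lookback_days: int = 7,
) -> Tuple[int, int]:
    """Return (since, until) as Unix timestamps, defaulting to the last 7 days."""
    end = end_date or datetime.now(timezone.utc)
    start = start_date or end - timedelta(days=lookback_days)
    if start > end:
        raise ValueError("start_date must not be later than end_date")
    return int(start.timestamp()), int(end.timestamp())


# Only the end date is given, so the start falls back to seven days earlier.
since, until = resolve_date_range(end_date=datetime(2025, 1, 18, tzinfo=timezone.utc))
```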
…onfig
- Convert PostsGetter to dataclass for cleaner initialization
- Add constants for API configuration (URL, version, subnet path)
- Introduce custom PostsAPIError exception
- Improve error handling with specific HTTP error cases
- Add default 7-day lookback period
- Add debug logging for API authentication status
- Reorganize methods for better readability
- Add comprehensive test logging output

Test: pytest tests/test_posts_getter.py -v -s
…nt-arena-subnet into feat-scoring-v4
- Add detailed module-level documentation with usage examples
- Improve class and method docstrings with attributes and return types
- Add error handling documentation for PostsAPIError
- Include environment variable requirements
- Document UTC timezone handling
- Rename posts_getter.py to get_agent_posts.py
- Rename PostsGetter class to GetAgentPosts
- Rename PostsAPIError to GetAgentPostsAPIError
- Update imports in validator.py, semantic_scorer.py, and test files
- Fix logger formatting in posts_scorer.py (debugf -> debug)

This change improves naming consistency and better reflects the class's purpose of fetching agent posts specifically.
…odule
- Align registration.py with get_agent_posts.py API client pattern
- Add comprehensive module and method documentation
- Introduce RegistrationAPIError for consistent error handling
- Extract API constants and endpoints into class properties
- Improve HTTP client setup and configuration
- Standardize error handling and logging patterns
- No breaking changes to existing interfaces

This refactor improves code consistency and maintainability while keeping all existing functionality intact.
- Add IsActive field to RegisteredAgentResponse dataclass
- Add _filter_agent_fields method to ValidatorRegistration and MockValidator
- Improve error handling in fetch_registered_agents
- Fix field filtering to match API response structure

The changes ensure proper handling of API responses and prevent missing field errors when processing registered agents. This resolves the initial field mismatch errors and improves the robustness of agent data processing.
- Add proper handling of empty posts and texts in PostsScorer
- Fix division by zero in SemanticScorer uniqueness calculation
- Initialize scores with zeros for all registered agents
- Add robust error handling for infinity values and normalization
- Improve logging for better debugging and monitoring
- Group posts by UID for more efficient processing

This fixes issues with agent scoring when no posts are present and ensures all scores remain finite and properly normalized.
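The safeguards listed in this commit (zero-initialised scores, finite values, no division by zero) could look roughly like the sketch below. The function name `normalize_scores`, the sum-to-one normalisation, and the clamping of negative values are assumptions made for illustration.

```python
import math


def normalize_scores(raw: dict[int, float], all_uids: list[int]) -> dict[int, float]:
    """Give every registered UID a finite score; normalise so scores sum to 1."""
    # Start from zero so agents without posts still appear in the result.
    scores = {uid: 0.0 for uid in all_uids}
    for uid, value in raw.items():
        # Drop unknown UIDs and non-finite values (inf, -inf, NaN).
        if uid in scores and math.isfinite(value):
            scores[uid] = max(float(value), 0.0)
    total = sum(scores.values())
    if total <= 0.0:
        # No posts or all-zero scores: skip the division entirely.
        return scores
    return {uid: value / total for uid, value in scores.items()}


example = normalize_scores({1: 2.0, 2: float("inf")}, all_uids=[1, 2, 3])
```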
- Update test time window from 24h to 7d to match production
- Add 7-day filter window in posts scorer
- Add detailed logging for post filtering and UID mapping
- Fix variable naming for consistency (date -> time)

This ensures consistent scoring behavior between tests and production, and provides better visibility into the scoring process through logs.
- Add single progress bar for overall agent scoring process
- Add detailed score statistics (min/max/avg)
- Include Twitter usernames in score output
- Disable redundant progress bars in semantic scorer
- Clean up logging format for better readability

The scoring process now shows clear progress and provides more detailed insights into the scoring results, making it easier to monitor long-running scoring operations.
- Add sampling to reduce computation load (max 1000 posts)
- Reduce background samples for SHAP explainer from full set to 100
- Reduce nsamples parameter in SHAP value calculation to 100
- Add feature importance visualization with bar charts
- Update return type to include feature importance scores

Performance: Reduces computation time while maintaining statistical significance of feature importance calculations.
- Add HardwareConfig and PerformanceConfig classes for hardware-specific tuning
- Implement auto-detection for CUDA GPUs and Apple Silicon
- Optimize batch sizes and sample counts based on available hardware
- Add hardware acceleration support in SemanticScorer
- Update tests to use and log hardware configurations
- Add torchvision and torchaudio dependencies

Performance improvements:
- Optimized batch processing for M3 Max (64GB) and high-end GPUs
- Hardware-accelerated tensor operations for similarity scoring
- Memory-efficient batching for large datasets
- Auto-scaling configurations based on available RAM/GPU memory

Testing:
- Add hardware configuration logging in tests
- Improve test output readability with structured logging
- Add sample post logging with metrics
- Move PostsScorer to agent_scorer.py and rename to AgentScorer
- Update imports across test files to use new AgentScorer location
- Standardize progress stage names (SEMANTIC) across scoring modules
- Fix semantic scoring progress bar status updates
- Move hardware config to dedicated config module

BREAKING CHANGES:
- PostsScorer import path changed from validator.posts_scorer to validator.agent_scorer
- PerformanceConfig and HardwareConfig moved to validator.config.hardware_config

Migration: Update imports from:

    from validator.posts_scorer import PostsScorer, PerformanceConfig, HardwareConfig

to:

    from validator.agent_scorer import PostsScorer
    from validator.config.hardware_config import PerformanceConfig, HardwareConfig
- Add dedicated progress configs for scoring and SHAP analysis
- Remove step/s and stage indicators from progress display
- Add agents/s rate tracking to scoring progress
- Simplify progress bar updates and postfix information
- Consolidate progress tracking in ScoringProgressConfig and ShapProgressConfig
- Add GracefulKiller class for handling SIGINT/SIGTERM signals
- Implement separate hardware configs for scoring and SHAP calculations
- Reduce SHAP sample sizes for better performance
- Add progress tracking and interruption handling for SHAP calculations
- Add cleanup handling in test suite
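The signal-handling pattern named in this commit is commonly written as below. This is a sketch of the general idiom using Python's standard `signal` module, not the PR's actual class: the attribute `kill_now`, the handler name, and the `process` loop are assumptions. `signal.signal` must be called from the main thread.

```python
import signal


class GracefulKiller:
    """Record that SIGINT or SIGTERM arrived so long loops can stop cleanly."""

    def __init__(self) -> None:
        self.kill_now = False
        signal.signal(signal.SIGINT, self._request_exit)
        signal.signal(signal.SIGTERM, self._request_exit)

    def _request_exit(self, signum, frame) -> None:
        # Only set a flag; the work loop decides when it is safe to stop.
        self.kill_now = True


def process(items, killer: GracefulKiller) -> list:
    """Process items until done or until a shutdown was requested."""
    done = []
    for item in items:
        if killer.kill_now:
            break
        done.append(item)
    return done


killer = GracefulKiller()
completed = process([1, 2, 3], killer)
```

Checking the flag between units of work, rather than exiting inside the handler, lets a long SHAP calculation finish its current step and write partial results before stopping.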
- Refactor FeatureImportanceCalculator to use consistent semantic scoring
- Consolidate engagement metrics into single weighted score
- Add explicit feature count in SHAP explainer
- Improve progress bar information with feature names
- Remove graceful shutdown handling for simpler error management
- Add CSV exports for feature importance and agent metrics
- Add detailed scoring results to test output files

The changes ensure semantic scores are calculated consistently between AgentScorer and FeatureImportanceCalculator by sharing the same SemanticScorer instance. Engagement metrics are now pre-weighted and combined into a single score for clearer SHAP analysis.

Test output is now saved to three files:
- scoring_results_[timestamp].txt: Detailed analysis and statistics
- feature_importance_[timestamp].csv: SHAP values and percentages
- agent_metrics_[timestamp].csv: Per-agent scoring metrics

Breaking changes: None
- Add hierarchical directory structure (YYYYMMDD/HHMMSS) for test results
- Create metadata.json to track test configuration and file relationships
- Simplify output filenames (remove timestamps from filenames)
- Update file paths to use dated directory structure:
  * scoring_results.txt
  * feature_importance.csv
  * agent_metrics.csv
  * metadata.json

Directory structure example:

    test_results/
      20240117/                  # Date folder
        151318/                  # Timestamp folder
          scoring_results.txt
          feature_importance.csv
          agent_metrics.csv
          metadata.json

The metadata.json file includes:
- Timestamp information
- Time range of analysis
- File manifest
- Test configuration details

This change improves test result organization and makes historical comparisons easier while maintaining backward compatibility.

Breaking changes: None
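The dated layout described above can be produced with a few lines of `pathlib`. This is a sketch under assumptions: the helper name `create_run_dir` and the metadata keys (`timestamp`, `files`) are hypothetical, and the real metadata.json also records the analysis time range and test configuration.

```python
import json
from datetime import datetime
from pathlib import Path

OUTPUT_FILES = ["scoring_results.txt", "feature_importance.csv", "agent_metrics.csv"]


def create_run_dir(base: Path, now: datetime) -> Path:
    """Create base/YYYYMMDD/HHMMSS and write a minimal metadata.json into it."""
    run_dir = base / now.strftime("%Y%m%d") / now.strftime("%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    metadata = {
        "timestamp": now.isoformat(),
        # File manifest: every file expected in this run directory.
        "files": OUTPUT_FILES + ["metadata.json"],
    }
    (run_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return run_dir
```

Calling `create_run_dir(Path("test_results"), datetime(2024, 1, 17, 15, 13, 18))` would create `test_results/20240117/151318/`, matching the example in the commit message.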
…calculations
- Rebalance scoring weights to prioritize semantic analysis (80/15/5 split)
- Add exponential penalty and non-linear scaling in semantic scoring
- Normalize text length and reduce its weight in overall scoring
- Align feature importance calculations with semantic scoring system
- Add quality multiplier based on semantic scores
- Cap engagement impact and add bonus for balanced engagement

The changes improve semantic analysis importance while maintaining scoring consistency across all components. Feature importance calculations now better reflect the intended weight distribution between semantic, engagement, and length factors.

Test results show improved balance between semantic (31.11%) and engagement (68.86%) scores, with reduced text length impact (0.04%) as intended.
- Add keyword stuffing detection to better identify low-quality content
- Introduce penalties for repetitive phrases and templated content
- Add minimum post length requirement (20 chars) with penalties
- Add async processing of agent SHAP values for performance
- Add per-agent SHAP analysis to test results output

The changes improve semantic scoring by detecting and penalizing:
- Repetitive keyword usage and phrases
- Template-like content patterns
- Short or incomplete posts
- Low-effort, repetitive messaging

Test results show improved detection of low-quality content while maintaining scoring consistency. SHAP analysis now provides per-agent feature importance breakdown for better insight into scoring factors.

Breaking
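One simple way to approximate keyword-stuffing detection is to look at how much of a post is taken up by its single most frequent word. The sketch below is illustrative only: the 20-character minimum comes from the commit message, but the function name `repetition_penalty`, the 0.3 share threshold, and the 0.5 short-post multiplier are assumptions, and the PR's detector may work quite differently.

```python
import re
from collections import Counter

MIN_POST_LENGTH = 20  # minimum length stated in the commit message


def repetition_penalty(text: str, max_share: float = 0.3) -> float:
    """Return a multiplier in [0, 1]; 1.0 means the post is not penalised."""
    if len(text.strip()) < MIN_POST_LENGTH:
        return 0.5  # hypothetical penalty for too-short posts
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    # Share of the post taken up by its most frequent word.
    top_share = Counter(words).most_common(1)[0][1] / len(words)
    if top_share <= max_share:
        return 1.0
    # Fall linearly from 1.0 at the threshold to 0.0 when one word is everything.
    return max(0.0, 1.0 - (top_share - max_share) / (1.0 - max_share))


stuffed = repetition_penalty("buy buy buy buy buy buy buy now")
varied = repetition_penalty("The validator now scores posts by originality and engagement")
```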
- Add follower score component (25% weight) to post scoring calculation
- Adjust weights: semantic (55%), engagement (15%), follower (25%), length (5%)
- Add debug logging for agent and UserID matching
- Improve error handling and logging for unmatched UserIDs
- Update feature importance calculator to include follower metrics

Part of #[ticket-number]
…system" This reverts commit 770221c.
- Add ProfileScorer class for evaluating X/Twitter profiles
- Implement follower count normalization and verification status scoring
- Add configurable weights for score components
- Add comprehensive test suite with real API data integration
- Support subnet 59 profile data fetching and analysis

The profile scorer provides:
- Normalized scoring (0-1) for follower counts
- Verification status weighting
- Detailed score component breakdown
- Integration with existing registration system

Test coverage includes:
- API data fetching
- Profile statistics
- Follower count analysis
- Score calculation verification
- Add ProfileScorer integration to AgentScorer with 10% weight
- Adjust scoring weights: semantic 70%, engagement 15%, profile 10%, length 5%
- Streamline test output by saving metrics CSV before SHAP calculation
- Remove async SHAP processing in favor of sequential streaming to file
- Add FollowersCount and IsVerified to agent metrics output
- Improve readability of test results with immediate file writes

The changes prioritize immediate output of scoring results while maintaining the detailed SHAP analysis in a more readable sequential format. Profile scoring is now properly integrated into the overall scoring system.
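The weight split in this commit amounts to a weighted sum of component scores. The sketch below uses the stated 70/15/10/5 weights and assumes each component has already been normalised to the range 0 to 1. The names `WEIGHTS` and `combine` are hypothetical.

```python
# Weights as stated in the commit message; they sum to 1.0.
WEIGHTS = {"semantic": 0.70, "engagement": 0.15, "profile": 0.10, "length": 0.05}


def combine(components: dict[str, float]) -> float:
    """Weighted sum of component scores, each clamped to [0, 1]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        # Missing components count as zero.
        value = min(max(components.get(name, 0.0), 0.0), 1.0)
        total += weight * value
    return total


post_score = combine({"semantic": 0.8, "engagement": 0.4, "profile": 1.0, "length": 0.5})
```

With these inputs the semantic term contributes 0.56 of the result, which shows how strongly the 70% weight dominates the other components.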
- Add 95% final score penalty for unverified agents in AgentScorer
- Update ProfileScorer to use BaseScorer interface and Tweet type
- Adjust profile scoring weights (60% followers, 40% verification)
- Improve follower score normalization with log scale and 100k cap
- Clean up code and improve documentation

This change ensures unverified agents consistently score lower in rankings while maintaining the existing scoring components for verified agents.
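The numbers in this commit (log scale, 100k cap, 60/40 weights, 95% penalty) can be combined as in the sketch below. The use of `log1p`, the function names, and the reading of "95% penalty" as multiplying the final score by 0.05 are assumptions, so the PR's exact formula may differ.

```python
import math

FOLLOWER_CAP = 100_000  # cap stated in the commit message


def follower_score(followers: int) -> float:
    """Log-scaled follower count in [0, 1]; counts above the cap score 1.0."""
    capped = min(max(followers, 0), FOLLOWER_CAP)
    return math.log1p(capped) / math.log1p(FOLLOWER_CAP)


def profile_score(followers: int, is_verified: bool) -> float:
    """60% follower score plus 40% verification status."""
    return 0.6 * follower_score(followers) + 0.4 * (1.0 if is_verified else 0.0)


def apply_verification_penalty(final_score: float, is_verified: bool) -> float:
    """Unverified agents keep only 5% of their final score."""
    return final_score if is_verified else final_score * 0.05
```

The log scale means that going from 100 to 1,000 followers moves the score about as much as going from 10,000 to 100,000, so very large accounts do not dominate the profile component.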
teslashibe changed the title from "🧪 [DRAFT] Add Semantic Scoring for Post Evaluation" to "🧪 [DRAFT] Agent Scoring v4.0" on Jan 18, 2025
- Separate scoring logic for verified and unverified accounts
- Ensure verified accounts always score higher than unverified
- Scale unverified scores relative to minimum verified score
- Add _get_agent_uid helper method for consistent UID lookup
- Improve score normalization with separate group handling
- Maintain backward compatibility with PostsScorer class

This change ensures verified accounts receive appropriate scoring priority while maintaining relative quality rankings within each verification group.
- Modify _calculate_post_score to use validator's registered agents for verification status
- Scale verified account scores to 0.1-1.0 range
- Scale unverified account scores to 0-0.1 range
- Remove ambiguous verification checks from post data
- Ensure consistent scoring separation between verified/unverified agents

This change guarantees that verified accounts will always score higher than unverified ones while maintaining relative ranking within each group.
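The range separation described above can be sketched with a min-max rescale applied to each group separately. The function names and the handling of a group whose scores are all equal (every member gets the top of its range) are assumptions. Note that with plain min-max scaling the best unverified agent and the worst verified agent both land on 0.1, so a strict ordering would need an additional tie-break.

```python
def scale_to_range(values: list[float], low: float, high: float) -> list[float]:
    """Min-max rescale values into [low, high], preserving their order."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All scores equal: assumed here to map to the top of the range.
        return [high] * len(values)
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]


def separate_by_verification(raw: dict[int, float], verified: set[int]) -> dict[int, float]:
    """Map verified agents into [0.1, 1.0] and unverified agents into [0.0, 0.1]."""
    ver = [uid for uid in raw if uid in verified]
    unv = [uid for uid in raw if uid not in verified]
    scaled: dict[int, float] = {}
    scaled.update(zip(ver, scale_to_range([raw[u] for u in ver], 0.1, 1.0)))
    scaled.update(zip(unv, scale_to_range([raw[u] for u in unv], 0.0, 0.1)))
    return scaled


# UIDs 1 and 2 are verified; 3 and 4 are not, despite higher raw scores.
result = separate_by_verification({1: 5.0, 2: 1.0, 3: 100.0, 4: 50.0}, verified={1, 2})
```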
- Add normalization and sigmoid-like transformation with kurtosis factor
- Scale normalized scores back to original range
- Preserve min/max score boundaries
- Add safeguard for cases where max_score equals min_score
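The commit does not spell out the transformation, so the sketch below shows only one plausible reading: normalise scores to the range 0 to 1, pass them through a logistic curve whose steepness is set by a `kurtosis_factor`, re-stretch so the endpoints are preserved, then map back to the original range. The function name `reshape_scores`, the default factor of 3.0, and the choice of a logistic curve are all assumptions.

```python
import math


def reshape_scores(scores: list[float], kurtosis_factor: float = 3.0) -> list[float]:
    """Apply an S-shaped curve to scores while keeping the original min and max."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Safeguard: nothing to spread out, and normalising would divide by zero.
        return list(scores)

    def squash(x: float) -> float:
        # Logistic curve centred on the middle of the normalised range.
        return 1.0 / (1.0 + math.exp(-kurtosis_factor * (x - 0.5)))

    s0, s1 = squash(0.0), squash(1.0)
    reshaped = []
    for score in scores:
        x = (score - lo) / (hi - lo)          # normalise to [0, 1]
        y = (squash(x) - s0) / (s1 - s0)      # re-stretch so 0 -> 0 and 1 -> 1
        reshaped.append(lo + y * (hi - lo))   # back to the original range
    return reshaped


curved = reshape_scores([0.1, 0.3, 0.55, 1.0])
```

A larger `kurtosis_factor` (which must be positive in this sketch) pushes mid-range scores further toward the extremes, while the lowest and highest scores stay where they were.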
Pull Request: Scoring System v4.0
Overview
This PR updates the scoring system to enforce a strict separation between verified and unverified agents while maintaining relative rankings within each group. The changes ensure verified accounts always score higher than unverified ones through a guaranteed scoring range separation.
Key Changes
Verification Requirements
Scoring Architecture
- Update _calculate_post_score to enforce verification ranges
Score Components (Unchanged)
Technical Details
Verification Scoring
Component Integration
Hardware Requirements
Testing
Migration
No breaking changes. Agents will see score adjustments based on verified status, but internal ranking logic remains consistent.
Dependencies
Configuration
Documentation
Performance Impact
Security
Monitoring