Skip to content

Commit

Permalink
Merge pull request #16 from FOI-Bioinformatics/v0.3.0
Browse files Browse the repository at this point in the history
Prepare for v0.3.0
  • Loading branch information
druvus authored Mar 9, 2025
2 parents 9960e79 + 4ab2f41 commit c47cc02
Show file tree
Hide file tree
Showing 41 changed files with 6,362 additions and 1,584 deletions.
9 changes: 9 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -158,3 +158,12 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# IDE files
.idea/
.vscode/
*.swp
*.swo

# OS specific files
.DS_Store
158 changes: 158 additions & 0 deletions ARCHITECTURE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,158 @@
# MetaQuest Architecture

This document provides an overview of the MetaQuest architecture and design principles.

## Design Principles

MetaQuest is designed around the following principles:

1. **Separation of Concerns**: Clear separation between data access, business logic, and user interfaces
2. **Extensibility**: Plugin-based design allowing for easy addition of new features
3. **Maintainability**: Well-defined interfaces and abstractions to simplify maintenance
4. **Robustness**: Comprehensive error handling and validation
5. **Usability**: Intuitive CLI with consistent command patterns

## Architecture Overview

The application is structured using a layered architecture with the following components:

```
+------------------+
| CLI |
+------------------+
|
+------------------+
| Core Logic |
+------------------+
|
+------------------+ +------------------+
| Data Layer |<--->| Plugin System |
+------------------+ +------------------+
|
+------------------+
| External Systems |
+------------------+
```

### Layers

#### CLI Layer
The CLI layer provides the command-line interface for the application. It's responsible for:
- Parsing command-line arguments
- Routing commands to the appropriate handlers
- Formatting output for the user

#### Core Logic Layer
The core logic layer contains the business logic of the application. It includes:
- Data models and domain objects
- Validation logic
- Processing algorithms
- Error handling

#### Data Layer
The data layer manages access to data sources and external systems:
- File I/O operations
- Data format conversions
- Communication with external APIs
- Caching mechanisms

#### Plugin System
The plugin system enables extensibility:
- Format plugins for different file formats
- Visualization plugins for different visualization types
- Processing plugins for different analysis methods

### Key Components

#### Core Components

- **Models**: Data classes representing domain objects like `Containment` and `SRAMetadata`
- **Validation**: Functions for validating input data and configurations
- **Exceptions**: Custom exception hierarchy for clear error reporting

#### Data Access Components

- **file_io**: Abstract file operations for reading/writing various file formats
- **branchwater**: Functionality for working with Branchwater data
- **metadata**: Functions for downloading and processing metadata
- **sra**: Functions for downloading and working with SRA data

#### Processing Components

- **containment**: Algorithms for analyzing containment data
- **counts**: Functions for counting and summarizing metadata
- **statistics**: Statistical analysis utilities

#### Visualization Components

- **plots**: Functions for generating various types of plots
- **reporting**: Tools for generating reports

#### Plugin System Components

- **base**: Base classes and registries for plugins
- **formats**: File format plugins
- **visualizers**: Visualization plugins

#### Utility Components

- **logging**: Logging configuration and utilities
- **config**: Configuration management

## Data Flow

1. **Input**: User provides commands via CLI
2. **Command Handling**: CLI routes to appropriate command handler
3. **Data Access**: Commands interact with data sources via the data layer
4. **Processing**: Data is processed using core logic components
5. **Visualization**: Results are visualized or formatted for output
6. **Output**: Results are presented to the user via CLI

## Plugin System

The plugin system uses a registry pattern:

1. Plugins inherit from a base `Plugin` class
2. Plugins are registered in a `PluginRegistry`
3. The application can discover plugins at runtime
4. Plugins provide specific implementations for abstract operations

### Example: Format Plugins

Format plugins handle different file formats:

- `BranchWaterFormatPlugin`: Handles Branchwater CSV files
- `MastiffFormatPlugin`: Handles Mastiff CSV files

Each plugin provides standard methods like:
- `validate_header`: Validates file headers
- `parse_file`: Parses file content
- `extract_metadata`: Extracts metadata from parsed content

## Error Handling

The application uses a consistent error handling approach:

1. Custom exception hierarchy starting with `MetaQuestError`
2. Specific exception types for different error categories
3. Each layer handles errors appropriate to its level
4. CLI layer catches and formats errors for user display

## Configuration Management

The application manages configuration through:

1. Default configuration values
2. Configuration file in user's home directory
3. Environment variables
4. Command-line overrides

## Future Extensions

The architecture is designed to accommodate future extensions:

1. New file format plugins
2. Additional visualization types
3. More analysis algorithms
4. Web interface or API
5. Database integration
17 changes: 17 additions & 0 deletions metaquest/__init__.py
Original file line number Diff line number Diff line change
@@ -1 +1,18 @@
"""
MetaQuest - A toolkit for analyzing metagenomic datasets based on genome containment.
MetaQuest helps users search through SRA datasets to find containment of specified
genomes and analyze associated metadata.
"""

__version__ = "0.3.0"
__author__ = "Andreas Sjödin"
__email__ = "[email protected]"

# Import commonly used modules for easier access
from metaquest.core.exceptions import MetaQuestError, ValidationError
from metaquest.utils.config import get_config, set_config
from metaquest.utils.logging import setup_logging

# Initialize logging by default
setup_logging()
22 changes: 22 additions & 0 deletions metaquest/cli/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
"""
Command Line Interface for MetaQuest.
This package provides the command-line interface for the MetaQuest toolkit.
"""

# Import command handlers and CLI entry point
from metaquest.cli.commands import (
download_test_genome_command,
process_branchwater_command,
extract_branchwater_metadata_command,
parse_containment_command,
download_metadata_command,
parse_metadata_command,
count_metadata_command,
single_sample_command,
plot_containment_command,
plot_metadata_counts_command,
download_sra_command,
assemble_datasets_command
)
from metaquest.cli.main import main, create_parser
Loading

0 comments on commit c47cc02

Please sign in to comment.