Commit

logging
soldni committed Oct 24, 2024
1 parent fd9ea5d commit 346a325
Showing 2 changed files with 11 additions and 4 deletions.
6 changes: 5 additions & 1 deletion classifiers/README.md
@@ -11,10 +11,14 @@ pip install -e classifiers

 ## Examples

-Run [NVIDIA's Deberta quality classifier](https://huggingface.co/nvidia/quality-classifier-deberta) on S3 data:
+Run [Huggingface FineWeb classifier](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier) on S3 data:

 ```bash
 python -m dolma_classifiers.inference \
     -s 's3://ai2-llm/pretraining-data/sources/dclm/v0/documents/40b-split/*/*zstd' \
     -m HuggingFaceFW/fineweb-edu-classifier
 ```
+
+
+<!-- Run [NVIDIA's Deberta quality classifier](https://huggingface.co/nvidia/quality-classifier-deberta) on S3 data:
+-->
9 changes: 6 additions & 3 deletions classifiers/src/dolma_classifiers/loggers.py
@@ -78,7 +78,7 @@ class ProgressLogger:
     def __init__(self, log_every: int = 10_000, wandb_logger: WandbLogger | None = None):
         self.log_every = log_every
         self.logger = get_logger(self.__class__.__name__)
-        self.prev_time = time.time()
+        self.start_time = self.prev_time = time.time()
         self.total_docs = 0
         self.current_docs = 0
         self.current_files = 0
@@ -103,8 +103,11 @@ def increment(self, docs: int = 0, files: int = 0):
             if self.wandb_logger is not None:
                 self.wandb_logger.log(
                     step=self.total_docs,
-                    docs_throughput=docs_throughput,
-                    files_throughput=files_throughput,
+                    instant_doc_throughput=docs_throughput,
+                    total_doc_throughput=self.total_docs / (current_time - self.start_time),
+                    instant_file_throughput=files_throughput,
+                    total_file_throughput=self.total_files / (current_time - self.start_time),
+                    total_files=self.total_files,
                 )

             self.prev_time = current_time
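
The hunk above separates two rates: an "instant" throughput measured since the previous log, and a "total" throughput measured since `start_time`. A minimal sketch of that arithmetic follows. `ThroughputTracker`, its `report` method, and the injectable `clock` parameter are hypothetical names introduced for illustration; this is not the repository's `ProgressLogger`, and the guard against a zero-length interval is an addition that the commit does not contain.

```python
import time
from typing import Callable


class ThroughputTracker:
    """Hypothetical helper (not the repository's ProgressLogger)."""

    def __init__(self, clock: Callable[[], float] = time.time):
        # The clock is injectable (an assumption for testability).
        self.clock = clock
        # Same chained assignment as the commit: both timestamps start equal.
        self.start_time = self.prev_time = clock()
        self.total_docs = 0    # documents seen since construction
        self.current_docs = 0  # documents seen since the last report

    def increment(self, docs: int = 0) -> None:
        self.total_docs += docs
        self.current_docs += docs

    def report(self) -> dict[str, float]:
        now = self.clock()
        # Guard against a zero-length interval (not present in the commit).
        window = max(now - self.prev_time, 1e-9)
        elapsed = max(now - self.start_time, 1e-9)
        stats = {
            # Rate over the window since the previous report.
            "instant_doc_throughput": self.current_docs / window,
            # Rate over the whole run.
            "total_doc_throughput": self.total_docs / elapsed,
        }
        # Start a new window; the cumulative counters are left untouched.
        self.prev_time = now
        self.current_docs = 0
        return stats
```

With a fake clock that returns 0, 10, and 12 seconds, and 100 documents added before each report, the first report gives 10 docs/s for both rates, while the second gives 50 docs/s instant (100 docs in 2 s) against roughly 16.7 docs/s total (200 docs in 12 s). The instant rate reacts to short bursts or stalls; the total rate smooths them out, which is likely why logging both is useful.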