Pull requests: huggingface/datatrove

Pull requests list

correct metadata parsing for finemath
#355 opened Mar 24, 2025 by VivienCabannes
fix bos token missing
#346 opened Feb 13, 2025 by jquesnelle
[draft] Add chunking option to DocumentTokenizer
#344 opened Feb 12, 2025 by craffel
Add RayPipelineExecutor
#331 opened Jan 27, 2025 by nelson-liu
Resolve issue 308
#309 opened Nov 29, 2024 by habanoz
Add open-source text extraction libraries
#293 opened Sep 27, 2024 by garrethlee
Mersenne prime hashing fix.
#200 opened May 28, 2024 by Apsod
Linewise filters
#125 opened Mar 14, 2024 by guipenedo Draft