mbanon

Organizations

@paracrawl @bitextor @macocu


Hammerfest web game sources

Mathematica 45 10 Updated Sep 26, 2023

What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets

Python 212 20 Updated Nov 16, 2024
HTML 4 1 Updated Jan 21, 2025

Targeted language identifier, based on FastText and Hunspell.

Python 34 4 Updated Feb 13, 2025

HPLT Analytics

HTML 13 1 Updated Mar 4, 2025

OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.

Python 50 15 Updated Jan 16, 2025

The Database Toolkit for Python

Python 10,143 1,484 Updated Mar 11, 2025

Corset is a web-based data selection portal that helps you get relevant data from massive amounts of parallel data.

SCSS 17 3 Updated Nov 6, 2023
Python 6 1 Updated May 31, 2023

Pre-filtering step for bicleaner

Python 4 2 Updated Dec 2, 2024

Monocleaner models repository

1 Updated Jan 8, 2025
Python 7 1 Updated Jan 8, 2025

Hunspell dictionaries in UTF-8

JavaScript 1,250 403 Updated Sep 9, 2024

Repository for data models, dictionaries and more resources for Bicleaner

6 Updated Dec 15, 2022

Repository of Bicleaner AI models

5 Updated Mar 28, 2023

Transform TMX to text

Python 1 Updated Jan 17, 2025

Bicleaner fork that uses neural networks

Python 39 4 Updated Jul 26, 2024

Utility that will help you to ROAM (Random Omit Anonymize and Mix) your parallel corpus.

Python 10 2 Updated Mar 3, 2025

A Corpus of Quotes

68 16 Updated May 4, 2019

Code for Neural Inverse Knitting: From Images to Manufacturing Instructions

Python 51 14 Updated Nov 14, 2023

A React site that simulates knitting different stitch widths using a skein of variegated yarn.

JavaScript 2 Updated Nov 27, 2018

Tool for manual evaluation of parallel sentences.

PHP 14 4 Updated Nov 6, 2024
Python 71 19 Updated Feb 27, 2025

Program used to split text into segments

Java 2 1 Updated Sep 19, 2022

Program used to split text into segments

Java 25 9 Updated Oct 27, 2024

Print resource usage of processes to stderr with LD_PRELOAD

C++ 3 Updated Jul 25, 2016

Tool to fix bitexts and tag near-duplicates for removal

Python 30 3 Updated Feb 5, 2025

Transform TMX to text

Python 28 10 Updated Nov 23, 2022

Results of the human evaluation

Rich Text Format 5 3 Updated Dec 9, 2020

Iterative JSON parser with Pythonic interface

Python 620 135 Updated Jan 15, 2020