Add Galician support to the Segmenter #40

Merged: 11 commits, Jan 9, 2023
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
/build
/commonvoice_utils.egg-info
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -27,6 +27,8 @@ include cvutils/data/ckt/phon.tsv
include cvutils/data/gl
include cvutils/data/gl/alphabet.txt
include cvutils/data/gl/validate.tsv
include cvutils/data/gl/punct.tsv
include cvutils/data/gl/abbr.tsv
include cvutils/data/gl/phon.tsv
include cvutils/data/gl/vocab.tsv
include cvutils/data/rm-vallader
2 changes: 1 addition & 1 deletion README.md
@@ -247,7 +247,7 @@ A-hend-all e vez gounezet arc'hant dre chaseal ha pesketa.
| Frisian | Frysk |`fry` | `fy-NL` |`fy`| | ✔ | ✔ | ✔ |
| Igbo | Ásụ̀sụ́ Ìgbò |`ibo` | `ig` |`ig`| ✔ | ✔ | ✔ | |
| Irish | Gaeilge |`gle` | `ga-IE` |`ga`| | ✔ | ✔ | |
| Galician | Galego |`glg` | `gl` |`gl`| ✔ | ✔ | ✔ | |
| Galician | Galego |`glg` | `gl` |`gl`| ✔ | ✔ | ✔ | |
| Guaraní | Avañeʼẽ |`gug` | `gn` |`gn`| ✔ | ✔ | ✔ | |
| Hindi | हिन्दी |`hin` | `hi` | `hi` | ✔ | ✔ | ✔ |
| Hausa | Harshen Hausa |`hau` | `ha` |`ha` | ✔ | ✔ | ✔ | |
371 changes: 371 additions & 0 deletions cvutils/data/gl/abbr.tsv
@@ -0,0 +1,371 @@
1 a.
1 AA.
1 ab.
1 a.C.
1 acad.
1 acadca.
1 acadco.
1 acep.
1 adm.
1 admdor.
1 admdora.
1 admtva.
1 admtvo.
1 adv.
1 adx.
1 ag.
1 agr.
1 agrón.
1 alc.
1 alm.
1 alt.
1 a.m.
1 ampl.
1 and.
1 ant.
1 ap.
1 apdo.
1 aprox.
1 apto.
1 arq.
1 arquit.
1 art.
1 asdo.
1 asoc.
1 át.
1 aum.
1 aus.
1 aut.
1 aux.
1 avda.
1 axud.
1 bibl.
1 bibliog.
1 bl.
1 b.o.
1 bol.
1 c.
1 ca.
1 cant.
1 cap.
1 carr.
1 cast.
1 cat.
1 cát.
1 catedr.
1 célt.
1 cént.
1 cert.
1 ch.
1 cit.
1 cl.
1 clás.
1 cód.
1 coed.
1 col.
1 colab.
1 com.
1 comp.
1 conc.
1 constr.
1 cont.
1 convoc.
1 coord.
1 corp.
1 corrix.
1 cp.
1 cta.
1 cto.
1 d.
1 d.C.
1 dec.
1 del.
1 dem.
1 dep.
1 desp.
1 det.
1 dic.
1 dipl.
1 dir.
1 dir.ª
1 disp.
1 distr.
1 d.l.
1 doc.
1 dpto.
1 Dr.
1 Dra.
1 dta.
1 dto.
1 dupl.
1 d/v.
1 d.v.
1 d.x.
1 econ.
1 ed.
1 edit.
1 ef.
1 Em.
1 entr.
1 enx.
1 e.p.d.
1 epíl.
1 escr.
1 esp.
1 esq.
1 esqda.
1 esqdo.
1 est.
1 estat.
1 estr.
1 etc.
1 e.t.s.
1 e.u.
1 eusc.
1 éusc.
1 ex.
1 exc.
1 exped.
1 ext.
1 f.
1 fábr.
1 fac.
1 facs.
1 fact.
1 fasc.
1 feb.
1 fem.
1 fest.
1 fig.
1 fotogr.
1 fr.
1 fund.
1 fut.
1 gal.
1 gar.
1 gl.
1 gob.
1 gr.
1 gram.
1 h.
1 hab.
1 habit.
1 íb.
1 íd.
1 igr.
1 il.
1 ilustr.
1 imp.
1 imper.
1 imperf.
1 impers.
1 impr.
1 inc.
1 incl.
1 incompl.
1 ind.
1 índ.
1 indet.
1 inf.
1 infin.
1 info.
1 inform.
1 ing.
1 ins.
1 insep.
1 inst.
1 int.
1 inter.
1 interr.
1 interx.
1 intr.
1 introd.
1 invent.
1 irr.
1 it.
1 l.
1 lab.
1 lám.
1 lat.
1 lca.
1 lco.
1 ldo.lda.
1 lic.
1 licda.
1 licdo.
1 lit.
1 loc.
1 lonx.
1 ltda.
1 ltdo.
1 m.
1 maiúsc.
1 masc.
1 mat.
1 máx.
1 mc.
1 mecan.
1 med.
1 merc.
1 mercad.
1 min.
1 mín.
1 minist.
1 mod.
1 ms.
1 mt.
1 mun.
1 mús.
1 mz.
1 n.
1 nac.
1 n.do
1 n.doed.
1 neg.
1 nom.
1 not.
1 nov.
1 n.p.
1 ntva.
1 ntvo.
1 núm.
1 o.
1 obs.
1 of.
1 o.p.
1 op.
1 op.cit.
1 opús.
1 orix.
1 out.
1 p.
1 pal.
1 par.
1 parr.
1 part.
1 pat.
1 pav.
1 páx.
1 p.b.
1 P.D.
1 pdo.
1 pen.
1 per.
1 pers.
1 pl.
1 plu.
1 p.m.
1 p.m.a.
1 p.n.
1 pob.
1 pol.
1 port.
1 pos.
1 pr.
1 pral.
1 pref.
1 prelim.
1 prep.
1 pres.
1 prínc.
1 priv.
1 prnl.
1 proc.
1 prof.
1 pról.
1 pron.
1 prov.
1 próx.
1 P.S.
1 pta.
1 pte.
1 publ.
1 públ.
1 pza.
1 r.
1 rec.
1 red.
1 reed.
1 ref.
1 reg.
1 rel.
1 rev.
1 rex.
1 R.I.P.
1 r.p.m.
1 rte.
1 s.
1 S.A.
1 sáb.
1 s.d.
1 sec.
1 séc.
1 secr.
1 seg.
1 sent.
1 s.e.o.o.
1 serv.
1 set.
1 símb
1 símb.
1 sing.
1 s.l.
1 S.L.
1 s.l.s.a.
1 s.n.
1 sobr.
1 soc.
1 Sr.
1 Sra.
1 st.
1 Sta.
1 Sto.
1 subs.
1 subx.
1 sum.
1 sup.
1 supl.
1 suplem.
1 sus.
1 t.
1 téc.
1 tel.
1 teléf.
1 telegr.
1 test.
1 tfno.
1 tip.
1 tít.
1 tón.
1 trad.
1 trans.
1 trat.
1 trav.
1 trib.
1 tripl.
1 tv.
1 u.
1 ú.
1 últ.
1 univ.
1 urb.
1 v.
1 v.
1 Vde.
1 Vde/s.
1 ven.
1 venc.
1 vers.
1 v.gr.
1 vid.
1 vol.
1 VV.
1 x.
1 xan.
1 xer.
1 xll.
1 x.p.
1 xud.
1 xur.
1 xust.
1 xv.
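
Each row of `abbr.tsv` above carries a leading `1` column followed by the abbreviation itself. The sketch below shows one way such rows could be read into a set for lookup. It assumes the two columns are separated by whitespace (a tab in the real file) and does not interpret the first column, whose meaning is not stated in this diff. The function name `parse_abbreviations` is hypothetical and is not part of `cvutils`.

```python
def parse_abbreviations(lines):
    """Collect the second column of each row into a set.

    Assumption: rows look like "<flag><whitespace><abbreviation>".
    The flag column is read but ignored, since its meaning is not
    documented in the diff.
    """
    abbreviations = set()
    for line in lines:
        parts = line.strip().split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            abbreviations.add(parts[1].strip())
    return abbreviations


# A few rows copied from the diff above, written with tab separators.
sample_rows = ["1\ta.", "1\tAA.", "", "1\ta.C.", "1\tv.", "1\tv."]
abbrs = parse_abbreviations(sample_rows)
```

Using a set means repeated rows, such as the two `v.` entries in the file, collapse into a single entry.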
2 changes: 1 addition & 1 deletion cvutils/data/gl/alphabet.txt
@@ -1 +1 @@
aábcdeéfghiílmnñoópqurstuúvxz
aábcdeéfghiílmnñoópqrstuúüvxz
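
The two alphabet lines above are the removed and added versions of `alphabet.txt`. A short check, sketched here with hypothetical helper names, makes the difference explicit: it lists characters that occur more than once in each string and the characters gained or lost between them. Both strings are normalised to NFC first, on the assumption that accented letters should be compared as single precomposed characters.

```python
import unicodedata
from collections import Counter

# The removed and added lines from the alphabet.txt hunk.
old_alphabet = unicodedata.normalize("NFC", "aábcdeéfghiílmnñoópqurstuúvxz")
new_alphabet = unicodedata.normalize("NFC", "aábcdeéfghiílmnñoópqrstuúüvxz")


def duplicated_chars(alphabet):
    """Return the characters that appear more than once, sorted."""
    return sorted(ch for ch, count in Counter(alphabet).items() if count > 1)


old_duplicates = duplicated_chars(old_alphabet)
new_duplicates = duplicated_chars(new_alphabet)
added = sorted(set(new_alphabet) - set(old_alphabet))
removed = sorted(set(old_alphabet) - set(new_alphabet))
```

Read this way, the change drops a stray second `u` after `q` and adds `ü`, while the set of other letters stays the same.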
3 changes: 3 additions & 0 deletions cvutils/data/gl/punct.tsv
@@ -0,0 +1,3 @@
EOS !
EOS ?
EOS .
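
The `EOS` rows in `punct.tsv` mark `!`, `?` and `.` as end-of-sentence punctuation, and `abbr.tsv` lists tokens whose trailing period should not end a sentence. The following is a minimal sketch of how the two lists could work together. It is not the actual `cvutils` Segmenter implementation: the function `split_sentences`, its whitespace tokenisation, and the example sentence are all assumptions made for illustration.

```python
def split_sentences(text, eos_marks, abbreviations):
    """Split text after tokens ending in an EOS mark, unless the
    token is a known abbreviation.

    Assumption: tokens are whitespace-separated; this is a toy rule,
    not the library's algorithm.
    """
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        if token[-1] in eos_marks and token not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep any trailing text that lacks final punctuation
        sentences.append(" ".join(current))
    return sentences


eos_marks = {"!", "?", "."}        # the three EOS rows from punct.tsv
abbreviations = {"Sr.", "p.m."}    # two entries taken from abbr.tsv
text = "O Sr. Pérez chegou ás 5 p.m. onte. Que hora é? Non sei!"
result = split_sentences(text, eos_marks, abbreviations)
```

One limitation of this simple rule is that an abbreviation that happens to close a sentence (for example a final `etc.`) would not trigger a split, so a real segmenter needs additional context.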