Merge pull request #40 from zuazo-forks/gl-segmenter
Add Galician support to the Segmenter
ftyers authored Jan 9, 2023
2 parents: 4880834 + 8f190f9; commit 40af628
Showing 7 changed files with 400 additions and 6 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
/build
/commonvoice_utils.egg-info
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -27,6 +27,8 @@ include cvutils/data/ckt/phon.tsv
include cvutils/data/gl
include cvutils/data/gl/alphabet.txt
include cvutils/data/gl/validate.tsv
include cvutils/data/gl/punct.tsv
include cvutils/data/gl/abbr.tsv
include cvutils/data/gl/phon.tsv
include cvutils/data/gl/vocab.tsv
include cvutils/data/rm-vallader
2 changes: 1 addition & 1 deletion README.md
@@ -247,7 +247,7 @@ A-hend-all e vez gounezet arc'hant dre chaseal ha pesketa.
| Frisian | Frysk |`fry` | `fy-NL` |`fy`| ||||
| Igbo | Ásụ̀sụ́ Ìgbò |`ibo` | `ig` |`ig`|||| |
| Irish | Gaeilge |`gle` | `ga-IE` |`ga`| ||| |
-| Galician | Galego |`glg` | `gl` |`gl`|||| |
+| Galician | Galego |`glg` | `gl` |`gl`|||| |
| Guaraní | Avañeʼẽ |`gug` | `gn` |`gn`|||| |
| Hindi | हिन्दी |`hin` | `hi` | `hi` ||||
| Hausa | Harshen Hausa |`hau` | `ha` |`ha` |||| |
371 changes: 371 additions & 0 deletions cvutils/data/gl/abbr.tsv
@@ -0,0 +1,371 @@
1 a.
1 AA.
1 ab.
1 a.C.
1 acad.
1 acadca.
1 acadco.
1 acep.
1 adm.
1 admdor.
1 admdora.
1 admtva.
1 admtvo.
1 adv.
1 adx.
1 ag.
1 agr.
1 agrón.
1 alc.
1 alm.
1 alt.
1 a.m.
1 ampl.
1 and.
1 ant.
1 ap.
1 apdo.
1 aprox.
1 apto.
1 arq.
1 arquit.
1 art.
1 asdo.
1 asoc.
1 át.
1 aum.
1 aus.
1 aut.
1 aux.
1 avda.
1 axud.
1 bibl.
1 bibliog.
1 bl.
1 b.o.
1 bol.
1 c.
1 ca.
1 cant.
1 cap.
1 carr.
1 cast.
1 cat.
1 cát.
1 catedr.
1 célt.
1 cént.
1 cert.
1 ch.
1 cit.
1 cl.
1 clás.
1 cód.
1 coed.
1 col.
1 colab.
1 com.
1 comp.
1 conc.
1 constr.
1 cont.
1 convoc.
1 coord.
1 corp.
1 corrix.
1 cp.
1 cta.
1 cto.
1 d.
1 d.C.
1 dec.
1 del.
1 dem.
1 dep.
1 desp.
1 det.
1 dic.
1 dipl.
1 dir.
1 dir.ª
1 disp.
1 distr.
1 d.l.
1 doc.
1 dpto.
1 Dr.
1 Dra.
1 dta.
1 dto.
1 dupl.
1 d/v.
1 d.v.
1 d.x.
1 econ.
1 ed.
1 edit.
1 ef.
1 Em.
1 entr.
1 enx.
1 e.p.d.
1 epíl.
1 escr.
1 esp.
1 esq.
1 esqda.
1 esqdo.
1 est.
1 estat.
1 estr.
1 etc.
1 e.t.s.
1 e.u.
1 eusc.
1 éusc.
1 ex.
1 exc.
1 exped.
1 ext.
1 f.
1 fábr.
1 fac.
1 facs.
1 fact.
1 fasc.
1 feb.
1 fem.
1 fest.
1 fig.
1 fotogr.
1 fr.
1 fund.
1 fut.
1 gal.
1 gar.
1 gl.
1 gob.
1 gr.
1 gram.
1 h.
1 hab.
1 habit.
1 íb.
1 íd.
1 igr.
1 il.
1 ilustr.
1 imp.
1 imper.
1 imperf.
1 impers.
1 impr.
1 inc.
1 incl.
1 incompl.
1 ind.
1 índ.
1 indet.
1 inf.
1 infin.
1 info.
1 inform.
1 ing.
1 ins.
1 insep.
1 inst.
1 int.
1 inter.
1 interr.
1 interx.
1 intr.
1 introd.
1 invent.
1 irr.
1 it.
1 l.
1 lab.
1 lám.
1 lat.
1 lca.
1 lco.
1 ldo.lda.
1 lic.
1 licda.
1 licdo.
1 lit.
1 loc.
1 lonx.
1 ltda.
1 ltdo.
1 m.
1 maiúsc.
1 masc.
1 mat.
1 máx.
1 mc.
1 mecan.
1 med.
1 merc.
1 mercad.
1 min.
1 mín.
1 minist.
1 mod.
1 ms.
1 mt.
1 mun.
1 mús.
1 mz.
1 n.
1 nac.
1 n.do
1 n.doed.
1 neg.
1 nom.
1 not.
1 nov.
1 n.p.
1 ntva.
1 ntvo.
1 núm.
1 o.
1 obs.
1 of.
1 o.p.
1 op.
1 op.cit.
1 opús.
1 orix.
1 out.
1 p.
1 pal.
1 par.
1 parr.
1 part.
1 pat.
1 pav.
1 páx.
1 p.b.
1 P.D.
1 pdo.
1 pen.
1 per.
1 pers.
1 pl.
1 plu.
1 p.m.
1 p.m.a.
1 p.n.
1 pob.
1 pol.
1 port.
1 pos.
1 pr.
1 pral.
1 pref.
1 prelim.
1 prep.
1 pres.
1 prínc.
1 priv.
1 prnl.
1 proc.
1 prof.
1 pról.
1 pron.
1 prov.
1 próx.
1 P.S.
1 pta.
1 pte.
1 publ.
1 públ.
1 pza.
1 r.
1 rec.
1 red.
1 reed.
1 ref.
1 reg.
1 rel.
1 rev.
1 rex.
1 R.I.P.
1 r.p.m.
1 rte.
1 s.
1 S.A.
1 sáb.
1 s.d.
1 sec.
1 séc.
1 secr.
1 seg.
1 sent.
1 s.e.o.o.
1 serv.
1 set.
1 símb
1 símb.
1 sing.
1 s.l.
1 S.L.
1 s.l.s.a.
1 s.n.
1 sobr.
1 soc.
1 Sr.
1 Sra.
1 st.
1 Sta.
1 Sto.
1 subs.
1 subx.
1 sum.
1 sup.
1 supl.
1 suplem.
1 sus.
1 t.
1 téc.
1 tel.
1 teléf.
1 telegr.
1 test.
1 tfno.
1 tip.
1 tít.
1 tón.
1 trad.
1 trans.
1 trat.
1 trav.
1 trib.
1 tripl.
1 tv.
1 u.
1 ú.
1 últ.
1 univ.
1 urb.
1 v.
1 v.
1 Vde.
1 Vde/s.
1 ven.
1 venc.
1 vers.
1 v.gr.
1 vid.
1 vol.
1 VV.
1 x.
1 xan.
1 xer.
1 xll.
1 x.p.
1 xud.
1 xur.
1 xust.
1 xv.
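The new abbr.tsv lists one Galician abbreviation per line, preceded by a `1` column. A minimal sketch of loading such a file into a lookup set follows. It assumes the two columns are separated by a tab (the rendered page does not show the separator, so the code splits on any whitespace) and ignores the first column, whose meaning is not documented in this commit. The helper name `load_abbreviations` is hypothetical and is not the cvutils API.

```python
from typing import Iterable, Set


def load_abbreviations(lines: Iterable[str]) -> Set[str]:
    """Collect the abbreviation column from abbr.tsv-style lines.

    Hypothetical helper. Assumes each line looks like '<flag>\t<abbreviation>';
    the flag column is ignored because its meaning is not documented here.
    """
    abbreviations: Set[str] = set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Split on the first run of whitespace, so '1\ta.' and '1 a.' both work.
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            abbreviations.add(parts[1].strip())
    return abbreviations


sample = ["1\tSr.\n", "1\tetc.\n", "1\tv.\n", "1\tv.\n"]
abbrs = load_abbreviations(sample)
print(sorted(abbrs))  # ['Sr.', 'etc.', 'v.']
```

Using a set makes repeated entries, such as the two `v.` lines in the file, harmless. To read the real file, pass an open handle, e.g. `open("cvutils/data/gl/abbr.tsv", encoding="utf-8")`, since file objects iterate line by line.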
2 changes: 1 addition & 1 deletion cvutils/data/gl/alphabet.txt
@@ -1 +1 @@
-aábcdeéfghiílmnñoópqurstuúvxz
+aábcdeéfghiílmnñoópqrstuúüvxz
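The alphabet change removes a stray `u` between `q` and `r` and adds `ü`, which Galician spelling uses in words such as "pingüín". The sketch below shows the practical effect, assuming the alphabet is used to flag sentences that contain letters outside it. The function `invalid_characters` is hypothetical. It also assumes that non-letters (spaces, punctuation, digits) are checked separately, so it ignores them.

```python
import unicodedata
from typing import Set

OLD_ALPHABET = "aábcdeéfghiílmnñoópqurstuúvxz"
NEW_ALPHABET = "aábcdeéfghiílmnñoópqrstuúüvxz"


def invalid_characters(sentence: str, alphabet: str) -> Set[str]:
    """Return the letters in `sentence` that are not in `alphabet`.

    Hypothetical helper. Both strings are NFC-normalized so precomposed and
    decomposed accents compare equal; the sentence is lowercased, and
    non-letters are skipped on the assumption they are validated elsewhere.
    """
    allowed = set(unicodedata.normalize("NFC", alphabet))
    text = unicodedata.normalize("NFC", sentence).lower()
    return {ch for ch in text if ch.isalpha() and ch not in allowed}


print(invalid_characters("O pingüín nada.", OLD_ALPHABET))  # {'ü'}
print(invalid_characters("O pingüín nada.", NEW_ALPHABET))  # set()
```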
3 changes: 3 additions & 0 deletions cvutils/data/gl/punct.tsv
@@ -0,0 +1,3 @@
EOS !
EOS ?
EOS .
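punct.tsv tags `!`, `?` and `.` as end-of-sentence (EOS) marks. Together with the abbreviation list, this is enough for a simple rule-based splitter. The following is a minimal sketch under stated assumptions, not the cvutils Segmenter itself: text is tokenized on whitespace, a sentence ends after a token whose last character is an EOS mark, and a token that exactly matches a known abbreviation (case-sensitive) does not end a sentence. The names `load_eos_marks` and `segment` are hypothetical.

```python
from typing import Iterable, List, Set


def load_eos_marks(lines: Iterable[str]) -> Set[str]:
    """Collect the characters tagged EOS in punct.tsv-style lines ('EOS\t!')."""
    marks: Set[str] = set()
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0] == "EOS":
            marks.add(parts[1])
    return marks


def segment(text: str, eos_marks: Set[str], abbreviations: Set[str]) -> List[str]:
    """Split `text` into sentences using whitespace-delimited tokens.

    A sentence ends after a token whose last character is an EOS mark,
    unless the whole token is a known abbreviation such as 'Sr.'.
    """
    sentences: List[str] = []
    current: List[str] = []
    for token in text.split():
        current.append(token)
        if token[-1] in eos_marks and token not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep trailing text that has no final punctuation
        sentences.append(" ".join(current))
    return sentences


eos = load_eos_marks(["EOS\t!", "EOS\t?", "EOS\t."])
abbrs = {"Sr.", "etc."}
print(segment("O Sr. Pérez chegou. Xa está aquí!", eos, abbrs))
# ['O Sr. Pérez chegou.', 'Xa está aquí!']
```

One limitation of this design is that an abbreviation that actually closes a sentence (for example a final `etc.`) will not trigger a split. A more careful segmenter might also check whether the next token starts with a capital letter.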