This is logically part of the functionality suggested in #11 and I needed something like this to proceed with freedict/fd-dictionaries#62 .
This ticket is partially to document my interim solution but mostly to serve as an anchor for the hopefully not-too-distant commit, and a place to consider extending the functionality, again in the spirit of #11.
First, a non-commit that does the job, in a maddeningly slow manner:
```makefile
validate_all:
	for dict in $(DICTS); do \
		cd ./$$dict ; \
		xmllint --noout --xinclude --relaxng freedict-P5.rng $$dict.tei ; \
		cd .. ; \
	done
```
The slowness is sadly due to the overall architecture of this kind of call: the schema is recompiled for each iteration of the loop before the parsing of the XML even begins. The `--stream` flag doesn't help either, because our huge databases apparently are still too small for streaming to be an effective enhancement.
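Since xmllint accepts several input files in one invocation, one possible workaround (a sketch only, not tested against the actual dictionary tree) would be to collect all TEI files and validate them in a single run, so the schema should only need to be compiled once. Note that this drops the per-directory `cd`, so it assumes any relative XIncludes resolve from the top level; `tei_files` is a hypothetical helper, not part of our build system:

```shell
# Hypothetical helper: print one <dict>/<dict>.tei path per line for the
# given dictionary names, so a single xmllint call can take them all.
tei_files() {
    for dict in "$@"; do
        echo "$dict/$dict.tei"
    done
}

# Intended use (commented out, since it needs the real dictionary tree):
# xmllint --noout --xinclude --relaxng freedict-P5.rng $(tei_files $DICTS)
```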
I tried to make the above more kosher with respect to our build system architecture, using the variables set in dicts.mk and the single-database target defined there, namely `validation`.
```makefile
validate_all:
	for dict in $(DICTS); do \
		$(MAKE) -C $$dict validation; \
	done
```
And with the above, the troubles I described in #27 began. So until #27 is handled, I can only use my makeshift solution at the top. Not sure if it's worth committing temporarily, because we all know how permanent temporary solutions can be.
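If the recursive-make route becomes workable once #27 is handled, a pattern-rule variant might also be worth considering, since it would let make parallelize the per-dictionary validation with `-j` (a rough, untested sketch; the `validate` suffix is made up here):

```makefile
# Sketch only: one target per dictionary instead of a shell loop,
# so e.g. `make -j8 validate_all` can validate several dictionaries at once.
validate_all: $(addsuffix /validate,$(DICTS))

# No file named <dict>/validate ever exists, so the recipe always runs.
%/validate:
	$(MAKE) -C $* validation
```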
## Potential extensions
- apart from validation against the RNG schema, we might want to simply check well-formedness (in the case of xmllint, it's just a matter of dropping the `--relaxng freedict-P5.rng` fragment, so the whole thing could be a single parametrized statement)
- depending on how the databases grow, maybe `--stream` can be used for huge databases (otherwise it doesn't buy any time, but rather costs extra)
- we might also see whether a sensible parser other than xmllint could precompile the schema to save some time, or whether we can keep the parsing library preloaded; of course, that would have to be an optional extra for the user
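The first extension above could indeed be sketched as a single parametrized statement, roughly like this (`lint_cmd` is a hypothetical helper that merely echoes the command, with the schema as an optional argument):

```shell
# Hypothetical helper: print the xmllint command for one dictionary.
# With a schema argument it validates against that schema; without one it
# only checks well-formedness, so both cases share one parametrized call.
lint_cmd() {
    dict="$1"
    schema="${2:-}"
    if [ -n "$schema" ]; then
        echo "xmllint --noout --xinclude --relaxng $schema $dict/$dict.tei"
    else
        echo "xmllint --noout --xinclude $dict/$dict.tei"
    fi
}
```

For example, `eval "$(lint_cmd eng-pol freedict-P5.rng)"` would run a full validation, while `eval "$(lint_cmd eng-pol)"` would only check well-formedness.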
Just noting that commit e5cdfe0 is a minor addition to dicts.mk: the `--xinclude` parameter to xmllint, to handle eng-pol and perhaps, in the future, other large databases, if we decide to split them up into more manageable chunks. It is harmless if the document contains no XIncludes.
Please execute `make help` in the dictroot.
Hint: using `make validation -j8` will run validation in parallel; use roughly twice as many jobs as you have CPU cores.