This is logically part of the functionality suggested in #11 and I needed something like this to proceed with freedict/fd-dictionaries#62 .
This ticket is partially to document my interim solution but mostly to serve as an anchor for the hopefully not-too-distant commit, and a place to consider extending the functionality, again in the spirit of #11.
First, a non-commit that does the job, in a maddeningly slow manner:
```makefile
validate_all:
	for dict in $(DICTS); do \
		cd ./$$dict ; \
		xmllint --noout --xinclude --relaxng freedict-P5.rng $$dict.tei ; \
		cd .. ; \
	done
```
The slowness is sadly due to the overall architecture of this kind of call: the schema is recompiled for each iteration of the loop before the parsing of the XML even begins. The `--stream` flag doesn't help either, because our huge databases apparently are still too small for streaming to be an effective enhancement.
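Since xmllint accepts several input files in one invocation, one possible workaround (a sketch only, not tested against the actual dictionary tree) would be to collect all TEI files and validate them in a single run, so the schema should only need to be compiled once. Note that this drops the per-directory `cd`, so it assumes any relative XIncludes resolve from the top level; `tei_files` is a hypothetical helper, not part of our build system:

```shell
# Hypothetical helper: print one <dict>/<dict>.tei path per line for the
# given dictionary names, so a single xmllint call can take them all.
tei_files() {
    for dict in "$@"; do
        echo "$dict/$dict.tei"
    done
}

# Intended use (commented out, since it needs the real dictionary tree):
# xmllint --noout --xinclude --relaxng freedict-P5.rng $(tei_files $DICTS)
```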
I tried to make the above more kosher with respect to our build system architecture, using the variables set in dicts.mk and the single-database target defined there, namely `validation`.
```makefile
validate_all:
	for dict in $(DICTS); do \
		$(MAKE) -C $$dict validation; \
	done
```
And with the above, the troubles I described in #27 began. So until #27 is handled, I can only use my makeshift solution at the top. Not sure if it's worth committing temporarily, because we all know how permanent temporary solutions can be.
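If the recursive-make route becomes workable once #27 is handled, a pattern-rule variant might also be worth considering, since it would let make parallelize the per-dictionary validation with `-j` (a rough, untested sketch; the `validate` suffix is made up here):

```makefile
# Sketch only: one target per dictionary instead of a shell loop,
# so e.g. `make -j8 validate_all` can validate several dictionaries at once.
validate_all: $(addsuffix /validate,$(DICTS))

# No file named <dict>/validate ever exists, so the recipe always runs.
%/validate:
	$(MAKE) -C $* validation
```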
## Potential extensions
- apart from validation against the RNG schema, we might want to simply check well-formedness (in the case of xmllint, it's just a matter of dropping the `--relaxng freedict-P5.rng` fragment, so the whole thing could be a single parametrized statement)
- depending on how the databases grow, maybe `--stream` can be used for huge databases (otherwise it doesn't buy any time, but rather costs extra)
- we might also see whether a sensible parser other than xmllint could precompile the schema to save some time, or whether we can keep the parsing library preloaded; of course, that would have to be an optional extra for the user
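The first extension above could indeed be sketched as a single parametrized statement, roughly like this (`lint_cmd` is a hypothetical helper that merely echoes the command, with the schema as an optional argument):

```shell
# Hypothetical helper: print the xmllint command for one dictionary.
# With a schema argument it validates against that schema; without one it
# only checks well-formedness, so both cases share one parametrized call.
lint_cmd() {
    dict="$1"
    schema="${2:-}"
    if [ -n "$schema" ]; then
        echo "xmllint --noout --xinclude --relaxng $schema $dict/$dict.tei"
    else
        echo "xmllint --noout --xinclude $dict/$dict.tei"
    fi
}
```

For example, `eval "$(lint_cmd eng-pol freedict-P5.rng)"` would run a full validation, while `eval "$(lint_cmd eng-pol)"` would only check well-formedness.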
Just noting that commit e5cdfe0 is a minor addition to dicts.mk: the `--xinclude` parameter to xmllint, to handle eng-pol and perhaps, in the future, other large databases, if we decide to split them up into more manageable chunks. It is harmless if the document contains no XIncludes.
Please execute `make help` in the dictroot.
Hint: using `make validation -j8` will run validation in parallel; use roughly twice as many jobs as you have CPU cores.