qlever index for Wikidata doing nothing #139
@michaelbrunnbauer Can you please send the output of
Command: system-info
Show system information and Qleverfile
System Information
Version: 0.5.18 (qlever --version)
Contents of Qleverfile
No Qleverfile found
I sometimes get this instead of the hanging behaviour. Could something very timing-sensitive be going on?
Command: index
echo '{ "languages-internal": [], "prefixes-external": [""], "locale": { "language": "en", "country": "US", "ignore-punctuation": true }, "ascii-prefixes-only": true, "num-triples-per-batch": 5000000 }' > wikidata.settings.json
2025-02-20 16:29:17.368 - INFO: QLever IndexBuilder, compiled on Wed Feb 19 16:11:23 UTC 2025 using git hash caaf76
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
Thank you very much for that information. It seems that there is either a bug in the parser or an error in the input files,
@michaelbrunnbauer I just tried this myself and had the same problem.
@RobinTF I investigated and found that ad-freiburg/qlever#1807 broke the index build. Can you please have a look?
@michaelbrunnbauer While we are fixing this, you can just use one of the Docker images from two days ago or earlier. For example, adfreiburg/qlever:pr-1816 should work (edit the Qleverfile or call
This reverts commit 8678731, which breaks the index build; see ad-freiburg/qlever-control#139.
@joka921 You adjusted the code in ad-freiburg/qlever#1807 to ensure that the first error is always the one that gets reported. Is there a chance this is related? A deadlock, maybe? I just had a second look and didn't see anything out of the ordinary, but the added
I can confirm that with the option --image adfreiburg/qlever:pr-1816, the indexing actually starts. Should I even try to let it finish with 512 GB of disk space?
@michaelbrunnbauer The total size of the index files for Wikidata will be around 430 GB in the end, which is very compact. During the index build, you will need more than that, so I doubt that 512 GB of disk space will be sufficient. How about buying a larger disk? For example, a 2 TB NVMe SSD is really cheap these days, and even 4 TB or 8 TB drives are pretty affordable. 512 GB is really not much.
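A rough pre-flight check before starting the build, as a sketch: it compares the free space on the filesystem holding the current directory with an estimate. The 430 GB final index size is from the comment above; the 50% headroom for intermediate files is an assumption, not a figure from QLever. It relies on GNU `df` (`--output`, `--block-size=GB`), and `free_gb` and `has_enough_space` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if AVAIL_GB >= NEEDED_GB, fails otherwise.
has_enough_space() {
  local avail_gb=$1 needed_gb=$2
  (( avail_gb >= needed_gb ))
}

# Free space (in GB, powers of 1000) of the filesystem holding directory $1.
# Uses GNU coreutils df; --output is GNU-specific.
free_gb() {
  df --output=avail --block-size=GB "$1" | tail -n 1 | tr -dc '0-9'
}

# Assumption: final index ~430 GB (from the thread) plus 50% headroom
# for intermediate files during the build; the factor is a guess.
final_index_gb=430
needed_gb=$(( final_index_gb * 3 / 2 ))

avail=$(free_gb .)
if has_enough_space "$avail" "$needed_gb"; then
  echo "OK: ${avail} GB free, ~${needed_gb} GB estimated"
else
  echo "Probably not enough disk: ${avail} GB free, ~${needed_gb} GB estimated" >&2
fi
```

Since the compressed dump is already on disk when this runs, its size is implicitly accounted for in the free-space figure.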
As indexing the Olympics dataset works, and as the Wikidata dump is much bigger than I thought (it already takes up 25% of the available disk space in compressed form), I suspect that my hardware does not meet the requirements.
However, indexing gets stuck right from the start, and I see no error messages.
-Output (top shows no CPU or I/O activity whatsoever):
Command: index
echo '{ "languages-internal": [], "prefixes-external": [""], "locale": { "language": "en", "country": "US", "ignore-punctuation": true }, "ascii-prefixes-only": true, "num-triples-per-batch": 5000000 }' > wikidata.settings.json
docker run --rm -u $(id -u):$(id -g) -v /etc/localtime:/etc/localtime:ro -v $(pwd):/index -w /index --init --entrypoint bash --name qlever.index.wikidata adfreiburg/qlever -c 'ulimit -Sn 1048576; IndexBuilderMain -i wikidata -s wikidata.settings.json -f <(lbzcat -n 4 latest-all.ttl.bz2) -g - -F ttl -p true -f <(lbzcat -n 1 latest-lexemes.ttl.bz2) -g - -F ttl -f <(cat dcatap.nt) -g - -F nt --stxxl-memory 10G | tee wikidata.index-log.txt'
2025-02-20 13:23:49.889 - INFO: QLever IndexBuilder, compiled on Wed Feb 19 16:11:23 UTC 2025 using git hash caaf76
2025-02-20 13:23:49.890 - INFO: You specified "locale = en_US" and "ignore-punctuation = 1"
2025-02-20 13:23:49.890 - INFO: You specified "ascii-prefixes-only = true", which enables faster parsing for well-behaved TTL files
2025-02-20 13:23:49.890 - INFO: You specified "num-triples-per-batch = 5,000,000", choose a lower value if the index builder runs out of memory
2025-02-20 13:23:49.890 - INFO: By default, integers that cannot be represented by QLever will throw an exception
2025-02-20 13:23:49.890 - INFO: Processing triples from 3 input streams ...
2025-02-20 13:23:49.891 - INFO: Parsing input triples and creating partial vocabularies, one per batch ...
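The output stops after "Parsing input triples and creating partial vocabularies" with no CPU or I/O activity. One way to tell such a hang apart from a slow but progressing build is to check whether the log file (written via `tee wikidata.index-log.txt` in the command) keeps growing. A minimal sketch; `log_stalled` is a hypothetical helper, and a quiet log over a short interval does not by itself prove a deadlock.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if FILE did not grow during the next
# SECONDS seconds (the log looks stalled), 1 if it grew, and 2 if the
# file cannot be read.
log_stalled() {
  local file=$1 secs=$2 before after
  before=$(wc -c < "$file") || return 2
  sleep "$secs"
  after=$(wc -c < "$file") || return 2
  (( after <= before ))
}
```

For example, `log_stalled wikidata.index-log.txt 600 && docker top qlever.index.wikidata` waits ten minutes and, if no new log output appeared, lists the processes inside the index container (the name comes from the `--name` flag in the command above). The ten-minute interval is an arbitrary choice.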
-Software:
Ubuntu 24.04.2 LTS
Python 3.12.3
qlever 0.5.18
-Hardware:
13th Gen Intel(R) Core(TM) i5-13500 with 64GB RAM and 512GB NVMe
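The log line about `num-triples-per-batch` suggests choosing a lower value if the index builder runs out of memory, which may matter with 64 GB of RAM. Because the index command rewrites `wikidata.settings.json` on every run (the `echo` at its start), editing that file directly would be overwritten; the change presumably belongs in the settings JSON of the Qleverfile (an assumption about the Qleverfile layout). The sketch only shows the string rewrite; `set_batch_size` is a hypothetical helper and 1000000 is an arbitrary example value, not a recommendation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace the "num-triples-per-batch" value in a
# one-line settings JSON string and print the result.
set_batch_size() {
  local json=$1 n=$2
  # Reject anything that is not a plain decimal integer.
  [[ $n =~ ^[0-9]+$ ]] || { echo "not an integer: $n" >&2; return 1; }
  printf '%s\n' "$json" |
    sed -E "s/\"num-triples-per-batch\": [0-9]+/\"num-triples-per-batch\": ${n}/"
}

# The settings string used by the index command in this issue.
settings='{ "languages-internal": [], "prefixes-external": [""], "locale": { "language": "en", "country": "US", "ignore-punctuation": true }, "ascii-prefixes-only": true, "num-triples-per-batch": 5000000 }'
set_batch_size "$settings" 1000000
```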
-Steps to reproduce:
apt-get install build-essential python3.12-venv docker.io lbzip2 unzip
adduser qlever
usermod -aG docker qlever
su - qlever
python3 -m venv qlever
cd qlever
./bin/pip install qlever
mkdir wikidata
cd wikidata
../bin/qlever setup-config wikidata
../bin/qlever get-data
../bin/qlever index