Silent failure when Loading Quads (Out of memory killed?) #131

C-Loftus · 2025-02-11T15:35:45Z

Background

I am trying to load an 8 GB N-Quads file with QLever. The index container exits without any error in the logs, but when I then run qlever start it does not work, apparently because the vocabulary or meta-data.json files aren't present. Something appears to have failed silently while generating the index.

However, the index operation succeeds if I trim my N-Quads file down to the first 100k lines or so. The cutoff was picked at random, and I don't think there are any encoding or character issues in the rest of the file (it is a direct export from a valid graphdb). It seems to work simply because the file is smaller.

I am assuming there is an OOM error when it reads/processes the file before the data is batched but that is just a guess.

What I have tried

I have tried changing "num-triples-per-batch" to values anywhere from 500 to 10000 but didn't see much difference.
I have tried giving it lots of memory via the CLI with --stxxl-memory 20G.

Machine / Versions

Machine: M3 Macbook Pro; Sonoma 14.3; 36 GB RAM
Installed qlever controller using pipx 1.7.1
Installed qlever 0.5.17

Other Issues

This seems to be similar to #113, #111, and #73. I wanted to file it separately since I am using quads, and I don't think the others were.

Logs

 qlever index --format nq --overwrite-existing

To enable autocompletion, run the following command, and consider adding it to your `.bashrc` or `.zshrc`:

eval "$(register-python-argcomplete qlever)" && export QLEVER_ARGCOMPLETE_ENABLED=1


Command: index

echo '{  "ascii-prefixes-only": false, "num-triples-per-batch": 500 }' > geoconnex.settings.json
docker run --rm -u $(id -u):$(id -g) -v /etc/localtime:/etc/localtime:ro -v $(pwd):/index -w /index --init --entrypoint bash --name qlever.index.geoconnex docker.io/adfreiburg/qlever:latest -c 'cat iow-dump.nq | IndexBuilderMain -i geoconnex -s geoconnex.settings.json -F nq -f - --stxxl-memory 5G | tee geoconnex.index-log.txt'

2025-02-11 10:14:18.009 - INFO: QLever IndexBuilder, compiled on Mon Feb 10 17:29:26 UTC 2025 using git hash 949e7d
2025-02-11 10:14:18.011 - INFO: Locale was not specified in settings file, default is en_US
2025-02-11 10:14:18.011 - INFO: You specified "locale = en_US" and "ignore-punctuation = 0"
2025-02-11 10:14:18.011 - INFO: You specified "num-triples-per-batch = 500", choose a lower value if the index builder runs out of memory
2025-02-11 10:14:18.011 - INFO: By default, integers that cannot be represented by QLever will throw an exception
2025-02-11 10:14:18.011 - WARN: Implicitly using the parallel parser for a single input file for reasons of backward compatibility; this is deprecated, please use the command-line option --parse-parallel or -p
2025-02-11 10:14:18.011 - INFO: Processing triples from single input stream /dev/stdin (parallel = true) ...
2025-02-11 10:14:18.012 - INFO: Parsing input triples and creating partial vocabularies, one per batch ...
2025-02-11 10:14:35.659 - INFO: Triples parsed: 10,000,000 [average speed 0.6 M/s, last batch 0.6 M/s, fastest 0.6 M/s, slowest 0.6 M
2025-02-11 10:14:45.698 - INFO: Triples parsed: 20,000,000 [average speed 0.7 M/s, last batch 1.0 M/s, fastest 1.0 M/s, slowest 0.6 M
2025-02-11 10:14:53.220 - INFO: Triples parsed: 22,461,623 [average speed 0.6 M/s, last batch 1.0 M/s, fastest 1.0 M/s, slowest 0.6 M/s]
2025-02-11 10:14:53.228 - INFO: Number of triples created (including QLever-internal ones): 22,466,023 [may contain duplicates]
2025-02-11 10:14:53.228 - INFO: Merging partial vocabularies ...

qlever start

To enable autocompletion, run the following command, and consider adding it to your `.bashrc` or `.zshrc`:

eval "$(register-python-argcomplete qlever)" && export QLEVER_ARGCOMPLETE_ENABLED=1


Command: start

docker run -d --restart=unless-stopped -u $(id -u):$(id -g) -v /etc/localtime:/etc/localtime:ro -v $(pwd):/index -p 8888:8888 -w /index --init --entrypoint bash --name qlever.server.geoconnex docker.io/adfreiburg/qlever:latest -c 'ServerMain -i geoconnex -j 8 -p 8888 -m 5G -c 2G -e 1G -k 200 -s 30s -a _IbQrZYQE4TEX > geoconnex.server-log.txt 2>&1'

Follow geoconnex.server-log.txt until the server is ready (Ctrl-C stops following the log, but NOT the server)

2025-02-11 10:15:40.347 - INFO: QLever Server, compiled on Mon Feb 10 17:29:26 UTC 2025 using git hash 949e7d
2025-02-11 10:15:40.348 - INFO: Initializing server ...
2025-02-11 10:15:40.350 - INFO: The git hash used to build this index was "949e7d"
2025-02-11 10:15:40.351 - INFO: Reading vocabulary from file geoconnex.vocabulary ...
2025-02-11 10:15:40.353 - ERROR: Tried to read from a File but too few bytes were returned
2025-02-11 10:15:40.565 - INFO: QLever Server, compiled on Mon Feb 10 17:29:26 UTC 2025 using git hash 949e7d
2025-02-11 10:15:40.566 - INFO: Initializing server ...
2025-02-11 10:15:40.567 - INFO: The git hash used to build this index was "949e7d"
2025-02-11 10:15:40.567 - INFO: Reading vocabulary from file geoconnex.vocabulary ...
2025-02-11 10:15:40.567 - ERROR: Tried to read from a File but too few bytes were returned
2025-02-11 10:15:40.875 - INFO: QLever Server, compiled on Mon Feb 10 17:29:26 UTC 2025 using git hash 949e7d
2025-02-11 10:15:40.877 - INFO: Initializing server ...
2025-02-11 10:15:40.877 - INFO: The git hash used to build this index was "949e7d"
2025-02-11 10:15:40.878 - INFO: Reading vocabulary from file geoconnex.vocabulary ...
2025-02-11 10:15:40.878 - ERROR: Tried to read from a File but too few bytes were returned
2025-02-11 10:15:41.425 - INFO: QLever Server, compiled on Mon Feb 10 17:29:26 UTC 2025 using git hash 949e7d
2025-02-11 10:15:41.428 - INFO: Initializing server ...
2025-02-11 10:15:41.428 - INFO: The git hash used to build this index was "949e7d"
2025-02-11 10:15:41.428 - INFO: Reading vocabulary from file geoconnex.vocabulary ...
2025-02-11 10:15:41.429 - ERROR: Tried to read from a File but too few bytes were returned
2025-02-11 10:15:42.384 - INFO: QLever Server, compiled on Mon Feb 10 17:29:26 UTC 2025 using git hash 949e7d
2025-02-11 10:15:42.386 - INFO: Initializing server ...
2025-02-11 10:15:42.387 - INFO: The git hash used to build this index was "949e7d"
2025-02-11 10:15:42.387 - INFO: Reading vocabulary from file geoconnex.vocabulary ...
2025-02-11 10:15:42.389 - ERROR: Tried to read from a File but too few bytes were returned
2025-02-11 10:15:44.139 - INFO: QLever Server, compiled on Mon Feb 10 17:29:26 UTC 2025 using git hash 949e7d
2025-02-11 10:15:44.140 - INFO: Initializing server ...
2025-02-11 10:15:44.141 - INFO: The git hash used to build this index was "949e7d"
2025-02-11 10:15:44.143 - INFO: Reading vocabulary from file geoconnex.vocabulary ...
2025-02-11 10:15:44.144 - ERROR: Tried to read from a File but too few bytes were returned
2025-02-11 10:15:47.492 - INFO: QLever Server, compiled on Mon Feb 10 17:29:26 UTC 2025 using git hash 949e7d
2025-02-11 10:15:47.494 - INFO: Initializing server ...
2025-02-11 10:15:47.494 - INFO: The git hash used to build this index was "949e7d"
2025-02-11 10:15:47.494 - INFO: Reading vocabulary from file geoconnex.vocabulary ...
2025-02-11 10:15:47.495 - ERROR: Tried to read from a File but too few bytes were returned

Docker Stats Graphs right before container exits

hannahbast · 2025-02-11T18:05:09Z

@C-Loftus The batch size is much too small. Can you please try again with 1000000 (one million)?

QLever creates one partial vocabulary per batch. In the merging stage (where your index build crashes), these partial vocabularies are merged. On some systems this crashes if there are too many partial vocabularies.

There is no need to make a batch very small. A single batch has to fit into RAM, that's all. So one million triples per batch should never be a problem.
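
To put the batch sizes in perspective, here is a rough back-of-the-envelope sketch (not QLever code). It assumes exactly one partial vocabulary per batch, as the index log states, and uses the 22,461,623 parsed triples reported in the log above. The function name num_batches is made up for illustration, and the real count may differ depending on how the parallel parser splits its input.

```shell
#!/usr/bin/env bash
# Rough estimate only: one partial vocabulary per batch is an assumption
# taken from the log message "creating partial vocabularies, one per batch".
# Usage: num_batches <number of triples> <triples per batch>
num_batches() {
  # Integer ceiling division: a final, partly filled batch still counts.
  echo $(( ($1 + $2 - 1) / $2 ))
}

for batch_size in 500 10000 1000000; do
  echo "batch size ${batch_size}: $(num_batches 22461623 "${batch_size}") partial vocabularies"
done
```

Under these assumptions, a batch size of 500 leaves roughly 45,000 partial vocabularies to merge, while a batch size of one million leaves 23.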

C-Loftus · 2025-02-11T18:26:49Z

@hannahbast Thank you for your response. Unfortunately that doesn't seem to work for me. I changed the setting to "num-triples-per-batch": 1000000, then ran qlever index --format nq --overwrite-existing, and I have the same issue.

[data]
NAME         = geoconnex
GET_DATA_CMD = less iow-dump.nq
DESCRIPTION  = geoconnex

[index]
INPUT_FILES     = iow-dump.nq
# INPUT_FILES     = small_iow.nq
CAT_INPUT_FILES = cat ${INPUT_FILES}
SETTINGS_JSON   = {  "ascii-prefixes-only": false, "num-triples-per-batch": 1000000 }

[server]
PORT         = 8888
ACCESS_TOKEN = _IbQrZYQE4TEX

[runtime]
SYSTEM = docker
IMAGE  = docker.io/adfreiburg/qlever:latest

[ui]
UI_PORT   = 8176
UI_CONFIG = default

qlever start

Follow geoconnex.server-log.txt until the server is ready (Ctrl-C stops following the log, but NOT the server)

2025-02-11 13:24:27.033 - INFO: QLever Server, compiled on Mon Feb 10 17:29:26 UTC 2025 using git hash 949e7d
2025-02-11 13:24:27.042 - INFO: Initializing server ...
2025-02-11 13:24:27.044 - INFO: The git hash used to build this index was "949e7d"
2025-02-11 13:24:27.044 - INFO: Reading vocabulary from file geoconnex.vocabulary ...
2025-02-11 13:24:27.047 - ERROR: Tried to read from a File but too few bytes were returned
2025-02-11 13:24:27.252 - INFO: QLever Server, compiled on Mon Feb 10 17:29:26 UTC 2025 using git hash 949e7d
2025-02-11 13:24:27.254 - INFO: Initializing server ...
2025-02-11 13:24:27.254 - INFO: The git hash used to build this index was "949e7d"
2025-02-11 13:24:27.254 - INFO: Reading vocabulary from file geoconnex.vocabulary ...
2025-02-11 13:24:27.255 - ERROR: Tried to read from a File but too few bytes were returned
2025-02-11 13:24:27.566 - INFO: QLever Server, compiled on Mon Feb 10 17:29:26 UTC 2025 using git hash 949e7d
2025-02-11 13:24:27.568 - INFO: Initializing server ...
2025-02-11 13:24:27.568 - INFO: The git hash used to build this index was "949e7d"
2025-02-11 13:24:27.568 - INFO: Reading vocabulary from file geoconnex.vocabulary ...
2025-02-11 13:24:27.569 - ERROR: Tried to read from a File but too few bytes were returned
2025-02-11 13:24:28.115 - INFO: QLever Server, compiled on Mon Feb 10 17:29:26 UTC 2025 using git hash 949e7d

hannahbast · 2025-02-11T20:46:30Z

@C-Loftus Can you paste the index log? And can you provide a link to your input file? The server log you posted indicates that the index build did not complete.

C-Loftus · 2025-02-11T22:04:01Z

@hannahbast Thank you for your reply. Here is the full index log. For what it's worth, I briefly saw the memory of the indexing container go up to about 7.4 GB (the max) before it crashed. The container is killed and removed, and I can no longer see it in Docker Desktop after this occurs.

The data can be downloaded here: https://zenodo.org/records/14853116 If Zenodo does not work for any reason, please let me know and I will find an alternative way to send the data.

Index log: (note that indexing exits 0 and the container is not running, so it doesn't appear that there are any other indexing processes in the background)

2025-02-11 13:24:03.831 - INFO: QLever IndexBuilder, compiled on Mon Feb 10 17:29:26 UTC 2025 using git hash 949e7d
2025-02-11 13:24:03.834 - INFO: Locale was not specified in settings file, default is en_US
2025-02-11 13:24:03.834 - INFO: You specified "locale = en_US" and "ignore-punctuation = 0"
2025-02-11 13:24:03.834 - INFO: You specified "num-triples-per-batch = 1,000,000", choose a lower value if the index builder runs out of memory
2025-02-11 13:24:03.834 - INFO: By default, integers that cannot be represented by QLever will throw an exception
2025-02-11 13:24:03.834 - WARN: Implicitly using the parallel parser for a single input file for reasons of backward compatibility; this is deprecated, please use the command-line option --parse-parallel or -p
2025-02-11 13:24:03.834 - INFO: Processing triples from single input stream /dev/stdin (parallel = true) ...
2025-02-11 13:24:03.836 - INFO: Parsing input triples and creating partial vocabularies, one per batch ...

server log

2025-02-11 14:03:17.324 - INFO: QLever Server, compiled on Mon Feb 10 17:29:26 UTC 2025 using git hash 949e7d
2025-02-11 14:03:17.325 - INFO: Initializing server ...
2025-02-11 14:03:17.325 - INFO: The git hash used to build this index was "949e7d"
2025-02-11 14:03:17.326 - INFO: Reading vocabulary from file geoconnex.vocabulary ...
2025-02-11 14:03:17.326 - ERROR: Tried to read from a File but too few bytes were returned

hannahbast · 2025-02-11T22:57:56Z

@C-Loftus Thanks for the link, I could build an index with the data without problems.

Your machine or your Docker container seems to have little memory, and the dataset contains some very long lines (the longest line has almost a million characters). Can you try again with "num-triples-per-batch": 100000?
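
For readers who want to check line lengths in their own input, the following is a minimal sketch using awk. It is not the method used in this thread; the function name longest_line_length is made up for illustration, and awk's length counts characters or bytes depending on the implementation and locale.

```shell
#!/usr/bin/env bash
# Print the length of the longest line read from the given file(s),
# or from standard input if no file is given.
longest_line_length() {
  awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$@"
}

# Small self-contained demonstration on three lines of sample input.
printf 'ab\nabcde\nabc\n' | longest_line_length   # prints 5

# Hypothetical usage with the input file named in this thread:
# longest_line_length iow-dump.nq
```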

C-Loftus · 2025-02-12T02:28:28Z

Hi Hannah, thank you for your help. I increased the RAM from 8 GB to 20 GB in Docker Desktop and that seems to have worked. It appears 8 GB was not enough to ingest the data.

I appreciate your help.

hannahbast · 2025-02-12T11:29:49Z

@C-Loftus Thanks for the feedback and happy to hear that it worked. The problem with limited memory is that the operating system can decide to kill the process and then the process just terminates and there is no opportunity to output a proper error message. It's not a QLever-specific problem.
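
A possible explanation for the exit status 0 reported earlier in the thread, offered as an assumption rather than a confirmed diagnosis: the index command pipes IndexBuilderMain into tee, and by default bash reports the exit status of the last command in a pipeline. The sketch below does not run the real index builder. It uses a stand-in process that sends SIGKILL to itself (the signal the Linux OOM killer uses) and compares the reported status with and without set -o pipefail. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Stand-in for a killed process: sh -c "kill -9 $$" sends SIGKILL to itself.
# This is NOT IndexBuilderMain; it only mimics a process that gets killed.

# Default behavior: the pipeline status is that of the last command (tee).
status_without_pipefail() {
  bash -c 'sh -c "kill -9 \$\$" | tee /dev/null; echo $?'
}

# With pipefail: the status of the rightmost failing command is reported.
status_with_pipefail() {
  bash -c 'set -o pipefail; sh -c "kill -9 \$\$" | tee /dev/null; echo $?'
}

echo "without pipefail: $(status_without_pipefail)"   # prints 0
echo "with pipefail:    $(status_with_pipefail)"      # prints 137
```

Status 137 is 128 + 9, which is how the shell reports termination by signal 9 (SIGKILL). Whether the qlever script should enable pipefail is not something this thread addresses.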

hannahbast mentioned this issue Feb 11, 2025

Qlever cannot finish indexing and thus load BSBM data #113

Open

C-Loftus closed this as completed Feb 12, 2025

hannahbast mentioned this issue Feb 12, 2025

Qlever index not creating metadata file for custom dataset #111

Open
