RRes ETL, fixing Neo dump problems.
marco-brandizi committed Feb 9, 2024
1 parent 2ed349b commit d862e79
Showing 2 changed files with 29 additions and 20 deletions.
6 changes: 6 additions & 0 deletions rres-endpoints/build-endpoint.snakefile
@@ -71,6 +71,12 @@ rule tdb_load:
shell:
'./endpoint-steps/tdb-load.sh'

# ==> NOTE: review the transaction log retention options in $NEO4J_HOME,
# so that past transactions are not retained; otherwise, the dump can be much
# bigger than necessary.
#
# See: https://neo4j.com/docs/operations-manual/current/database-internals/transaction-logs/
#

rule neo_export:
input:
43 changes: 23 additions & 20 deletions rres-endpoints/endpoint-steps/neo-dump.sh
@@ -2,39 +2,42 @@
#
set -e

neo_dump="$2" # The output of this step is the Neo dump, ready for deployment
neo_dump="$1" # The output of this step is the Neo dump, ready for deployment

neo_url=`ketl_get_neo_url`

export JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Dneo4j.boltUrl='$neo_url'"
export JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Dneo4j.user='$KETL_NEO_USR'"
export JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Dneo4j.password='$KETL_NEO_PWD'"

echo -e "\n\n Waiting before stopping Neo4j\n"
sleep 60
sleep 30
"$KETL_NEO_STOP"

echo -e "Another pause, just in case.\n"
sleep 20
if [[ "$KETL_ENVIRONMENT" == "rres" ]]; then

sleep_time=10m

cat <<EOT
*** IMPORTANT ****
We're using Neo4j through SLURM and it seems Neo4j needs
one more restart with a long pause before the final shutdown,
to complete transactions.
So, now we will wait $sleep_time before stopping Neo again.
If you see problems with the dump command, restart Neo manually,
check "$NEO4J_HOME/log/debug.log" to ensure the server actually
restarted, then run the ETL workflow again, so that this
script runs again.
# TODO: remove, seemed to be needed in the past, now it should be fixed.
# WARNING: it is broken anyway, we don't use is_slurm_neo anymore, move it
# to neo-stop-slurm.sh
# 
if false && `$is_slurm_neo`; then
echo -e "\nOne more restart, needed under SLURM"
EOT

sleep 60
"$KETL_NEO_START"

sleep 60
sleep $sleep_time
"$KETL_NEO_STOP"
fi

# TODO: review the options in $NEO4J_HOME about the transaction log retentions, we need to get rid
# of any past transactions, else, the dump is much bigger than necessary.
#
echo -e "\n Neo4j Dump to '$neo_dump'\n"
"$NEO4J_HOME/bin/neo4j-admin" database dump --to-stdout neo4j >"$neo_dump"
"$NEO4J_HOME/bin/neo4j-admin" database dump --to-stdout neo4j --verbose >"$neo_dump"

echo -e "\n Neo4j Dump done\n"
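
The note added to build-endpoint.snakefile asks to review the transaction log retention options before dumping. A minimal sketch of such a check follows. It assumes Neo4j 5, where the setting is named db.tx_log.rotation.retention_policy and lives in conf/neo4j.conf under $NEO4J_HOME. Both the helper name tx_retention_policy and the file location are assumptions and are not part of this commit, so verify them against the manual page linked in the note.

```shell
# Hypothetical helper, not part of the commit: print the value of
# db.tx_log.rotation.retention_policy found in a neo4j.conf file
# (setting name assumed from Neo4j 5), or "unset" when the file has
# no uncommented entry, in which case Neo4j applies its default.
tx_retention_policy() {
  local conf="$1" value
  # Keep only the text after '=' on uncommented lines; the last entry wins here.
  value=$(sed -n 's/^[[:space:]]*db\.tx_log\.rotation\.retention_policy[[:space:]]*=[[:space:]]*//p' "$conf" | tail -n 1)
  echo "${value:-unset}"
}

# Usage (the conf path is an assumption about the install layout):
# tx_retention_policy "$NEO4J_HOME/conf/neo4j.conf"
```

Running this before neo-dump.sh would show whether old transaction logs are likely being kept. If it prints "unset" or a long retention period, that could explain a dump that is bigger than expected.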
