sudhirtumati implementation #598

sudhirtumati · 2024-01-27T07:10:30Z

Check List:

Tests pass (./test.sh <username> shows no differences between expected and actual outputs)
All formatting changes by the build are committed
Your launch script is named calculate_average_<username>.sh (make sure to match casing of your GH user name) and is executable
Output matches that of calculate_average_baseline.sh
For new entries, or after substantial changes: When implementing custom hash structures, please point to where you deal with hash collisions (line number)

Execution time: ~45 sec
Execution time of reference implementation: ~4m 15s

My PC configuration:

WSL2
Intel Core i7-9750H
32GB RAM (24 GB allocated to WSL)
6 cores with 2 threads per code (8 / 12 threads allocated to WSL)

gunnarmorling · 2024-01-27T14:57:48Z

Produces incorrect output for the 1B file:

Validating calculate_average_sudhirtumati.sh -- measurements_1B.txt
6c6
< Adelaide;-34.2;17.2;67.1
---
> Adelaide;-34.2;17.3;67.1
31c31
< Baghdad;-30.2;22.7;70.2
---
> Baghdad;-30.2;22.8;70.2
142c142
< Hamilton;-34.7;18.0;73.1
---
> Hamilton;-34.7;18.1;73.1
191c191
< La Ceiba;-22.5;26.1;75.2
---
> La Ceiba;-22.5;26.2;75.2

FAILURE: ./test.sh sudhirtumati measurements_1B.txt failed

sudhirtumati · 2024-01-28T04:56:35Z

./test.sh sudhirtumati

Using java version 21.0.2-open in this shell.
Validating calculate_average_sudhirtumati.sh -- src/test/resources/samples/measurements-1.txt
Validating calculate_average_sudhirtumati.sh -- src/test/resources/samples/measurements-10.txt
Validating calculate_average_sudhirtumati.sh -- src/test/resources/samples/measurements-10000-unique-keys.txt
Validating calculate_average_sudhirtumati.sh -- src/test/resources/samples/measurements-2.txt
Validating calculate_average_sudhirtumati.sh -- src/test/resources/samples/measurements-20.txt
Validating calculate_average_sudhirtumati.sh -- src/test/resources/samples/measurements-3.txt
Validating calculate_average_sudhirtumati.sh -- src/test/resources/samples/measurements-boundaries.txt
Validating calculate_average_sudhirtumati.sh -- src/test/resources/samples/measurements-complex-utf8.txt
Validating calculate_average_sudhirtumati.sh -- src/test/resources/samples/measurements-dot.txt
Validating calculate_average_sudhirtumati.sh -- src/test/resources/samples/measurements-rounding.txt
Validating calculate_average_sudhirtumati.sh -- src/test/resources/samples/measurements-short.txt
Validating calculate_average_sudhirtumati.sh -- src/test/resources/samples/measurements-shortest.txt

This is the output of test execution on my environment. I do not see measurements_1B.txt file under `src/main/resources/samples' directory.

Could you please make measurements_1B.txt available for me to reproduce the issue?

gunnarmorling · 2024-01-28T09:14:06Z

That file is 13 G, so it's a bit hard to share. You can create it yourself using create_measurements.sh (for the standard eval file) and create_measurements3.sh (for the 10K key set file). Your implementation must show the same output for those as the base line (or compare to output of the top of the leaderboard who are known to be correct and will complete much faster).

sudhirtumati · 2024-01-29T04:29:43Z

Noted @gunnarmorling

Unfortunately, I couldn't reproduce the error in my environment. The output generated by my implementation is identical with the baseline. I am sure there must be an issue with the implementation as the same code is not working in your environment. I suspect thread contention might be resulting in incorrect map updates.

Modified my implementation to reduce thread contention to the extent possible. Total processing time (with 1B rows) also came down from ~40 seconds to ~30 seconds on my personal laptop.

Test suite execution is successful. In addition to the test suite, I ran tests with multiple files with different row counts (.5m, 1m, 10m, 50, 100, 500m, 1B) and found that the results are identical to the baseline output.

gunnarmorling · 2024-01-29T20:08:22Z

Hey, can you please rebase this to latest main and squash everything into one commit? There should be no unrelated commits in this PR.

sudhirtumati · 2024-01-30T11:30:52Z

Done. Please check

gunnarmorling · 2024-01-31T20:57:30Z

00:25.064.

gunnarmorling added the output differs label Jan 27, 2024

sudhirtumati implementation

1a14699

sudhirtumati force-pushed the sudhirtumati_impl branch from ab513a7 to 1a14699 Compare January 30, 2024 11:29

gunnarmorling merged commit b91c95a into gunnarmorling:main Jan 31, 2024
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sudhirtumati implementation #598

sudhirtumati implementation #598

sudhirtumati commented Jan 27, 2024

gunnarmorling commented Jan 27, 2024 •

edited

Loading

sudhirtumati commented Jan 28, 2024

gunnarmorling commented Jan 28, 2024

sudhirtumati commented Jan 29, 2024

gunnarmorling commented Jan 29, 2024

sudhirtumati commented Jan 30, 2024

gunnarmorling commented Jan 31, 2024

sudhirtumati implementation #598

sudhirtumati implementation #598

Conversation

sudhirtumati commented Jan 27, 2024

Check List:

gunnarmorling commented Jan 27, 2024 • edited Loading

sudhirtumati commented Jan 28, 2024

gunnarmorling commented Jan 28, 2024

sudhirtumati commented Jan 29, 2024

gunnarmorling commented Jan 29, 2024

sudhirtumati commented Jan 30, 2024

gunnarmorling commented Jan 31, 2024

gunnarmorling commented Jan 27, 2024 •

edited

Loading