
Multiple File Bulk Upload yields Inconsistent Results #636

Open
joannekoch opened this issue Jan 11, 2023 · 23 comments

@joannekoch
Collaborator

When the Bulk Upload AssetRecord files are uploaded individually, one at a time, a total of 5,087 Inventory items are created in the Retired location. Note that there are 6 Bulk AssetRecord files. This is the correct number of Inventory items as calculated from the AssetRecord CSV files.

When the 6 Bulk AssetRecord files are uploaded all at once, the number of Inventory items created in the Retired location is different each time. Note that I waited about 20 minutes for all the Inventory items to be created or for the number of Inventory items to stop increasing. I cleared all the Inventory items in the Retired location after each Bulk upload (by deleting the associated Part Template).

Below are each of the run results from the bulk upload of 6 AssetRecord files at the same time:

  1. 4258 Inventory items created in Retired.
  2. only 650 Inventory items created in Retired.
  3. 4377 Inventory items created in Retired.
  4. 2808 Inventory items created in Retired.
  5. 3220 Inventory items created in Retired.

Note: the celeryworker reports the bulk file parse succeeded with each run.
[screenshot]

@joannekoch joannekoch added the bug Something isn't working label Jan 11, 2023
@joannekoch joannekoch added this to the v1.7.0 milestone Jan 11, 2023
@MarioCarloni
Collaborator

MarioCarloni commented Jan 19, 2023

Hi @joannekoch, based on the optimized bulk import logic, the smallest files should finish first, followed by larger files. The largest file, sensor_bulk_load_AssetRecord, takes roughly 3 minutes locally, also confirmed by your screenshot. Smaller files should also generate Inventory first, so the number of Inventory items should increase with every consecutive AssetRecord file parsing task completion.
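For illustration, a minimal sketch of this dispatch model (names are hypothetical, not the actual RDB code): each uploaded CSV becomes its own Celery task, so smaller files tend to complete, and create their Inventory, first.

```python
import csv
from celery import shared_task

def create_inventory_for_row(row):
    ...  # stand-in for the real per-row Inventory creation

@shared_task
def parse_asset_record_csv(file_path):
    # Each file is parsed in its own worker process.
    with open(file_path, newline="") as f:
        for row in csv.DictReader(f):
            create_inventory_for_row(row)

def queue_bulk_upload(file_paths):
    # One task per file; workers pick them up concurrently.
    for path in file_paths:
        parse_asset_record_csv.delay(path)
```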

Are you saying the Inventory item total count was increasing while AssetRecord files were being parsed, or that it was both increasing and decreasing?

@joannekoch
Collaborator Author

Hi @MarioCarloni, I'm not talking about the total count increasing or decreasing. I'm referring to the Final Count of Inventory Items after waiting 20 minutes for the import to complete.

When the Bulk Upload AssetRecord files are uploaded individually, one at a time, a total of 5,087 Inventory items are created each and every time. This result is expected and the final count is correct.

When the 6 Bulk AssetRecord files are uploaded all at once, after waiting 20 minutes, the final count of Inventory items NEVER reaches 5,087 and the final count varies as I mentioned above in run results 1 - 5.

After waiting 20 minutes, the final count of Inventory items does not change, so I know the upload has completed.

@MarioCarloni
Collaborator

MarioCarloni commented Jan 19, 2023

Great find, @joannekoch; this was a complicated issue to chase down. I think I see what's happening. The Part Template values in the Bulk AssetRecord files from the Test Asset Management repo are duplicated:
https://github.com/oceanzus/asset-management/tree/addcolumntest

In a multithreaded task, queried and generated data must be unique. If more than one task (file) requests the same resource (a Part, for example), the task MAY inconsistently fail. As a test, I appended numbers to the Part Template values in the two largest Bulk AssetRecord CSV files (the sensor and unclassified bulk load CSVs), such that a unique Part was generated for every AssetRecord row. The correct number of Inventory items was then generated, based on the number of rows in each CSV. The correct number of items was also generated when Part values were duplicated within a single file but unique across files.
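To illustrate the failure mode, here is a minimal sketch (illustrative names, not the actual RDB code) of why concurrent `get_or_create` calls on the same Part can race:

```python
from django.db import IntegrityError

def get_or_create_part(part_number):
    # Part stands in for the real RDB model. get_or_create is not
    # atomic across workers: two tasks can both miss the SELECT and
    # both attempt the INSERT. With a unique constraint, the loser
    # raises IntegrityError; without one, duplicate rows appear.
    try:
        part, _ = Part.objects.get_or_create(part_number=part_number)
    except IntegrityError:
        # Lost the race: another task inserted first, so re-fetch.
        part = Part.objects.get(part_number=part_number)
    return part
```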

This may be a question for @cahearn: how unique will Part Templates be once incorporated in the Bulk AssetRecord CSVs?

@MarioCarloni
Collaborator

MarioCarloni commented Jan 20, 2023

Please see below for two modified AssetRecord CSVs containing unique Part values per row:

unclassified_bulk_load-AssetRecord - unclassified_bulk_load-AssetRecord.csv
sensor_bulk_load-AssetRecord - sensor_bulk_load-AssetRecord.csv

@joannekoch
Collaborator Author

Testing with Part number duplicates only within each AssetRecord file is not working either.

20 minutes after the upload was executed, only 2,070 Inventory items were created.

None of the inventory in the platform or sensor files were created.

I used the following rdb_part_template assignments:
array_bulk_load-AssetRecord.csv - Acomms-DUBO_343_Science Bay1
node_bulk_load-AssetRecord.csv - Acomms-DUBO_343_Science Bay2
unclassified_bulk_load-AssetRecord.csv - Acomms-DUBO_343_Science Bay3
platform_bulk_load-AssetRecord.csv - Acomms-DUBO_343_Science Bay4
eng_bulk_load-AssetRecord.csv - Acomms-DUBO_343_Science Bay5
sensor_bulk_load-AssetRecord.csv - Acomms-DUBO_343_Science Bay6

[screenshot]

@MarioCarloni
Collaborator

Hi @joannekoch, another great find! I modified the Part value in each AssetRecord CSV to match yours and obtained the same result. I was able to find the culprit: Custom Fields.

In the Bulk Import logic, Custom Fields are fetched, or created if they don't exist. This worked great before the optimization, but now we're running into a similar duplication issue to the one we hit with Part values. To get around this, the following Custom Fields would need to exist prior to Bulk Import, by adding them to the "Custom Fields" template (see the sketch after the list):

CI TYPE
CI Mobile
Manufacturer Serial Number
Firmware Version
Date Received
CI comments
Owner
CI Array Geometry
CI Commission Date
CI Decommission Date

Using Part values that were unique across files but duplicated within each file, I was able to generate the correct number of rows.
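As a sketch of the workaround (the `Field` model and `ensure_custom_fields_exist` helper are hypothetical stand-ins, not the RDB code):

```python
# Create the Custom Fields once, up front, so concurrent parse tasks
# only ever read them and never race on get_or_create.
REQUIRED_CUSTOM_FIELDS = [
    "CI TYPE", "CI Mobile", "Manufacturer Serial Number",
    "Firmware Version", "Date Received", "CI comments", "Owner",
    "CI Array Geometry", "CI Commission Date", "CI Decommission Date",
]

def ensure_custom_fields_exist():
    # Field stands in for the real RDB custom-field model.
    for name in REQUIRED_CUSTOM_FIELDS:
        Field.objects.get_or_create(field_name=name)
```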

Good find!

@joannekoch
Collaborator Author

Hi @MarioCarloni, that's great you were able to find the problem in the Custom Fields! I'll give it a try.

@joannekoch
Collaborator Author

joannekoch commented Feb 2, 2023

Hey @MarioCarloni, hmmm... this doesn't work for me. I started with an empty database, created the Custom Fields listed above, and Bulk Imported the 6 AssetRecord files. Only 1,152 Inventory items were created, not the expected 5,087.

In addition, this weird thing happened: "copies" of the uploaded AssetRecord files were created in my local dir by the bulk upload.
[screenshot]

After an hour, I also noticed this in my terminal window:
[screenshot]

@joannekoch
Collaborator Author

This still is not uploading all 5,087 Inventory items. Only 2,451 were created after waiting 20 minutes.
In addition, a deadlock-detected error occurred in the running console.

[screenshot]

[screenshot]

@MarioCarloni
Collaborator

@joannekoch I deleted and rebuilt all containers, including the volume, remade the Custom Fields referenced above, then imported the files below. Each file contains an rdb_part_template value that is unique to the file and duplicated across its rows; approximately 4,500 records in total. Since the Part values were unique across files, the correct number of Inventory items should be found under each Part template's "Inventory Count" field.

Please see below for files used:

sensor_bulk_load-AssetRecord - sensor_bulk_load-AssetRecord.csv
unclassified_bulk_load-AssetRecord - unclassified_bulk_load-AssetRecord.csv
eng_bulk_load-AssetRecord - eng_bulk_load-AssetRecord.csv

@joannekoch
Collaborator Author

I deleted and rebuilt all containers (including the volume), remade the Custom Fields referenced above, then imported your files above. I only get 283 Inventory items imported for Bay2 when there should be 1,354.
[screenshot]

Do you get this error in the console, where there's an exception in the celeryworker? I get this error every time I try to upload multiple bulk files.
[screenshot]

@MarioCarloni
Collaborator

@joannekoch No related Sentry errors found. Thinking about it more, I did have an atomic transaction flag on the bulk upload process (celery-task related), but I have since removed it. The changes should be in release-1.7.0.
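For reference, a sketch of the kind of wrapper that was removed (illustrative, not the exact RDB code); holding one transaction across a whole file keeps every row lock until commit, which invites deadlocks when several files import shared Parts and Custom Fields concurrently:

```python
from celery import shared_task
from django.db import transaction

@shared_task
def parse_asset_record_csv(file_path):
    # One transaction for the entire file: all row locks are held
    # until the task commits, so concurrent file imports touching
    # the same Parts/Custom Fields can deadlock.
    with transaction.atomic():
        import_rows(file_path)  # stand-in for the real per-row logic
```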

@MarioCarloni
Collaborator

@joannekoch On RDB-Test, I deleted the ACOMMS SCIENCE BAY Parts and all related Inventory items. I then uploaded the 3 files mentioned above using the Bulk Upload CSV tool on RDB-Test:

eng_bulk csv: ~544 rows
unclassified_bulk csv: 1354 rows
sensor_bulk csv: 2584 rows

On RDB-Test, the respective ACOMMS SCIENCE BAY Parts generated contain Inventory item counts equal to the number of rows in each respective bulk upload csv. No errors are generated from the Sentry app. I'm still not sure why we're getting different results locally. I will try updating to the latest version of Docker to see if that causes additional issues.

If you'd like, try deleting the related Parts from RDB-Test, and reupload with the csv's from above.

Thanks again for your help chasing this down!

@joannekoch
Collaborator Author

@MarioCarloni On RDB-Test, I deleted the ACOMMS SCIENCE BAY Parts, and all related Inventory items. I then uploaded your 3 files mentioned above using the Bulk Upload CSV tool.

I noticed that the behavior on RDB-Test is very different from my local environment.

On RDB-Test, the 3 upload files are processed SEQUENTIALLY: first the Bay1 Part is created and then its 544 Inv items, then the Bay3 Part is created and its 2582 Inv items, and lastly the Bay2 Part is created and its 1354 Inv items.

In my local environment, the 3 upload files are processed SIMULTANEOUSLY: the Bay1, Bay2, and Bay3 Parts are created right away, at the same time, and the Inv items for Bay1, Bay2, and Bay3 are created all at once. Bay1 finishes first with 544 Inv items created, Bay2 finishes next with 1354 items created, and Bay3 finishes last with only 494 Inv items.

I get the Deadlock error I reported above every time I do this bulk upload. I never get all 2582 Inventory Items created for Bay3. The celeryworker ForkPoolWorker task never posts a succeeded message in my terminal window for Bay3.

What would account for the difference in the processing method?

Note: I upgraded my docker engine, deleted all my images, containers, and volumes.

@MarioCarloni
Collaborator

The difference in processing method between your local environment and RDB-Test could be due to differences in the hardware resources available: the number of CPUs and the amount of RAM.

In your local environment, it appears that the processing is happening simultaneously because there are enough resources to handle the creation of all three parts and their inventory items at the same time. On RDB-Test, however, the processing is happening sequentially, possibly because there are not enough resources to handle all three concurrently.

We could investigate the hardware resources available on RDB-Test and consider upgrading or scaling up as necessary to prevent these kinds of issues.

My other thought is that the Celery multithreaded service on RDB-Test has not been restarted, causing the service to cache old code.

And I could explore other ways to optimize the bulk upload process, such as breaking up the upload into smaller batches or optimizing the database schema to improve performance.
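As a concrete illustration of the sequential-vs-simultaneous difference (a sketch, assuming a standard Celery configuration): the worker's concurrency setting controls how many queued parse tasks run at once.

```python
from celery import Celery

app = Celery("rdb")
# worker_concurrency sets how many tasks a worker runs in parallel.
# A value of 1 would reproduce the sequential, file-after-file
# behavior seen on RDB-Test; the default (one process per CPU)
# matches the simultaneous behavior seen locally.
app.conf.worker_concurrency = 1
```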

@joannekoch
Collaborator Author

Thanks for the explanation @MarioCarloni and thanks for your patience investigating this issue!

@MarioCarloni
Collaborator

MarioCarloni commented Mar 24, 2023

@joannekoch You were right! On RDB-Testing, I deleted the 3 ACOMMS SCIENCE BAY Parts and all related Inventory. I then reuploaded the 3 CSVs mentioned above and am checking Sentry for errors while the files process.

I'm getting the Deadlock error from Sentry:
[screenshot]

It looks to be Inventory-related. I'll find out what's going on and create another PR:
[screenshot]
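One common mitigation for this class of deadlock (a sketch with stand-in model names, not the eventual PR) is to acquire row locks in a consistent order, so concurrent tasks queue up instead of deadlocking:

```python
from django.db import transaction

def update_inventory_batch(inventory_ids):
    # Inventory stands in for the real RDB model. Locking rows in a
    # consistent (sorted) order means two tasks touching overlapping
    # rows wait on each other instead of deadlocking.
    with transaction.atomic():
        locked = (Inventory.objects
                  .filter(id__in=inventory_ids)
                  .order_by("id")
                  .select_for_update())
        for item in locked:
            item.save()
```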

@joannekoch
Collaborator Author

@MarioCarloni, thanks for testing on rdb-testing! Glad it was found.

@joannekoch
Collaborator Author

@MarioCarloni, I got the latest hotfix and deleted the 3 ACOMMS SCIENCE BAY Parts and all related Inventory. I then reuploaded the 3 CSVs mentioned above. A deadlock occurred, and there were only 268 Inventory items created for Science_Bay_3 instead of the expected 2,584.

[screenshot]

@MarioCarloni
Collaborator

Thanks @joannekoch I'll take a look

@MarioCarloni
Collaborator

MarioCarloni commented Apr 5, 2023

@joannekoch Reuploaded the two files below. For ACOMMS Part 2 (https://rdb-testing.whoi.edu/parts/1195) there are 1,354 items, matching the number of CSV rows. For ACOMMS Part 3 (https://rdb-testing.whoi.edu/parts/1194) there are 2,584 Inventory items, also matching the number of CSV rows. Not getting any new Deadlock errors on Sentry. Thinking about it, I'm wondering if there is potentially a duplicate Inventory item mentioned in more than one Bulk Upload file.

Can you try deleting the ACOMMS Parts and uploading again?

sensor_bulk_load-AssetRecord.-.sensor_bulk_load-AssetRecord.csv
unclassified_bulk_load-AssetRecord.-.unclassified_bulk_load-AssetRecord.csv

@joannekoch
Collaborator Author

@MarioCarloni, I tried what you did above and got the same result as you; I didn't get any deadlock errors.

I then uploaded the eng csv file after and it uploaded without any errors.

I wonder if it's as simple as three or more large upload files running at the same time?

@joannekoch
Collaborator Author

@MarioCarloni, hmm.. interesting... yeah maybe?
