
Fix Large Dictionaries Failing to Delete #375

Closed

Conversation

MarvNC
Member

@MarvNC MarvNC commented Dec 18, 2023

This changes the bulk delete function to make delete requests in sequence to resolve #374.

I'm not sure if this is the best way to fix the issue; I'm open to feedback.
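
For reference, a minimal sketch of the sequential approach (illustrative only; the function name and callback shape are not the actual diff): each delete request is issued from the previous request's onsuccess handler so the transaction stays active.

    // Illustrative sketch only (not the PR's actual code): delete keys one at a
    // time, issuing the next IDBRequest from the previous one's onsuccess handler.
    function bulkDeleteSequential(objectStore, keys, onProgress, onComplete) {
        let index = 0;
        const next = () => {
            if (index >= keys.length) {
                onComplete(null);
                return;
            }
            const request = objectStore.delete(keys[index]);
            request.onsuccess = () => {
                ++index;
                if (onProgress !== null) { onProgress(index, keys.length); }
                next();
            };
            request.onerror = () => onComplete(request.error);
        };
        next();
    }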

@MarvNC MarvNC requested a review from a team as a code owner December 18, 2023 06:21

✔️ No visual differences introduced by this PR.

View Playwright Report (note: open the "playwright-report" artifact)

@djahandarie
Collaborator

Not sure about the approach; maybe @toasted-nutbread will have a thought.

Unrelated to that, do we have a basic test for adding and then removing a dictionary and checking that nothing has changed in the DB (or that only the expected things have changed; I'm not sure if anything does)? That might be a good way to ensure that a change like this isn't somehow breaking the basic deletion functionality.
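
Something along these lines might work; a rough sketch (the importTestDictionary/deleteDictionary helpers and the store names are placeholders, not actual project code):

    // Hypothetical test sketch: the helpers and store names below are placeholders.
    function countRecords(db, storeNames) {
        const transaction = db.transaction(storeNames, 'readonly');
        return Promise.all(storeNames.map((storeName) => new Promise((resolve, reject) => {
            const request = transaction.objectStore(storeName).count();
            request.onsuccess = () => resolve([storeName, request.result]);
            request.onerror = () => reject(request.error);
        }))).then((entries) => Object.fromEntries(entries));
    }

    test('adding then deleting a dictionary leaves the database unchanged', async () => {
        const storeNames = ['dictionaries', 'terms', 'kanji', 'media']; // placeholder store names
        const before = await countRecords(db, storeNames);
        await importTestDictionary(db); // hypothetical helper
        await deleteDictionary(db, 'Test Dictionary'); // hypothetical helper
        const after = await countRecords(db, storeNames);
        expect(after).toEqual(before);
    });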

@toasted-nutbread toasted-nutbread self-requested a review December 18, 2023 16:57
@toasted-nutbread

toasted-nutbread commented Dec 18, 2023

I made a generalized version of the function which can be configured to either run everything in parallel or everything in serial, so I could test a few things:

    /**
     * @param {IDBObjectStore} objectStore The object store from which items are being deleted.
     * @param {IDBValidKey[]} keys An array of keys to delete from the object store.
     * @param {number} maxActiveRequests The maximum number of concurrent requests.
     * @param {number} maxActiveRequestsForContinue The maximum number of requests that can be active before the next set of requests is started.
     *   For example:
     *   - If this value is `0`, all of the `maxActiveRequests` requests must complete before another group of `maxActiveRequests` is started off.
     *   - If the value is greater than or equal to `maxActiveRequests-1`, every time a single request completes, a new single request will be started.
     * @param {?(completedCount: number, totalCount: number) => void} onProgress An optional progress callback function.
     * @param {(error: ?Error) => void} onComplete A function which is called after all operations have finished.
     *   If an error occurred, the `error` parameter will be non-`null`. Otherwise, it will be `null`.
     * @throws {Error} An error is thrown if the input parameters are invalid.
     */
    _bulkDeleteInternal(objectStore, keys, maxActiveRequests, maxActiveRequestsForContinue, onProgress, onComplete) {
        if (maxActiveRequests <= 0) { throw new Error(`maxActiveRequests has an invalid value: ${maxActiveRequests}`); }
        if (maxActiveRequestsForContinue < 0) { throw new Error(`maxActiveRequestsForContinue has an invalid value: ${maxActiveRequestsForContinue}`); }

        const count = keys.length;
        if (count === 0) {
            onComplete(null);
            return;
        }

        let completedCount = 0;
        let completed = false;
        let index = 0;
        let active = 0;

        const onSuccess = () => {
            if (completed) { return; }
            --active;
            ++completedCount;
            if (onProgress !== null) {
                try {
                    onProgress(completedCount, count);
                } catch (e) {
                    // NOP
                }
            }
            if (completedCount >= count) {
                completed = true;
                onComplete(null);
            } else if (active <= maxActiveRequestsForContinue) {
                next();
            }
        };

        /**
         * @param {Event} event
         */
        const onError = (event) => {
            if (completed) { return; }
            completed = true;
            const request = /** @type {IDBRequest<undefined>} */ (event.target);
            const {error} = request;
            onComplete(error);
        };

        const next = () => {
            for (; index < count && active < maxActiveRequests; ++index) {
                const key = keys[index];
                const request = objectStore.delete(key);
                request.onsuccess = onSuccess;
                request.onerror = onError;
                ++active;
            }
        };

        next();
    }

The following is parallel, which is roughly the same as what already exists, except commit is called at the end:

this._bulkDeleteInternal(objectStore, keys, Number.POSITIVE_INFINITY, Number.POSITIVE_INFINITY, onProgress, (error) => {
    if (error === null) {
        transaction.commit();
    }
});

The following is serial, which is roughly the same as yours:

this._bulkDeleteInternal(objectStore, keys, 1, 0, onProgress, (error) => {
    if (error === null) {
        transaction.commit();
    }
});

I first installed the JA Wikipedia dictionary you shared and did a few tests on Chrome; each one that completed took about 200s regardless of parameters:

  • Original function failed as expected.
  • The parallel version of the function above worked a few times and failed a few others.
  • The serial version worked.
  • Trying with parameters maxActiveRequests = 1000, maxActiveRequestsForContinue = 1000 worked. This effectively keeps 1000 requests active at all times.
  • Trying with parameters maxActiveRequests = 1000, maxActiveRequestsForContinue = 0 worked. This effectively runs requests in batches of 1000 and waits for each to complete before starting a new batch.

Testing on Firefox:

  1. this._bulkDeleteInternal(objectStore, keys, 1, 0, onProgress, onComplete); - 163s
  2. this._bulkDeleteInternal(objectStore, keys, 1000, 0, onProgress, onComplete); - 76s
  3. this._bulkDeleteInternal(objectStore, keys, 1000, 1000, onProgress, onComplete); - 78s

So on Firefox, there seems to be a modest performance gain from batching. I don't know what the optimal batch size is, and it would take a long time to test, but test 2 would probably be the best option overall for both browsers.

Reference branch: https://github.com/toasted-nutbread/yomichan/commits/refactor-bulk-delete/

@MarvNC
Copy link
Member Author

MarvNC commented Dec 18, 2023

Thanks for looking into this and writing a better version! I wonder why there aren't performance benefits on Chrome.

Also, what's the purpose of surrounding onProgress with a try/catch block? I don't see why it's needed when it's already verified to exist.

@MarvNC MarvNC closed this Dec 18, 2023
@toasted-nutbread
Copy link

Also, what's the purpose of surrounding onProgress with a try/catch block? I don't see why it's needed when it's already verified to exist.

In case it throws, it won't break the internal implementation. Perhaps it should log, but it's not a major issue if it's entirely silent.
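
For example, the catch could log instead of being a silent NOP (sketch only):

    try {
        onProgress(completedCount, count);
    } catch (e) {
        // Possible alternative to the silent NOP: log and continue.
        console.error(e);
    }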

@toasted-nutbread
Copy link

Thanks for looking into this and writing a better version! I wonder why there aren't performance benefits on Chrome.

Both browsers have curious idiosyncrasies with regards to IndexedDB, such as the storage quota issue you pointed out in #374 (comment). What I've found is that the IndexedDB implementations work (and sometimes don't) in mysterious ways.

@djahandarie
Copy link
Collaborator

Firefox's IndexedDB is backed by SQLite and generally performs much better. Chromium's is backed by LevelDB and has issues. There's an issue on the Chromium side to switch to SQLite, but it doesn't seem to have been moving recently: https://bugs.chromium.org/p/chromium/issues/detail?id=1409618 I think it'd be quite beneficial for our project if they did, tbh...
