Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HIGH] Add SQLite support to web browser and fallback to IndexedDB #34289

Closed
muttmuure opened this issue Jan 10, 2024 · 61 comments
Closed

[HIGH] Add SQLite support to web browser and fallback to IndexedDB #34289

muttmuure opened this issue Jan 10, 2024 · 61 comments
Assignees
Labels
NewFeature Something to build that is a new item. Planning Changes still in the thought process Weekly KSv2

Comments

@muttmuure
Copy link
Contributor

muttmuure commented Jan 10, 2024

Placeholder tracking issue for adding SQLite support to Chrome, Safari, Edge and FireFox (Not IE)

Also for adding a fallback mechanism to retrieve data from IndexedDB when SQLite fails or has a problem

cc @roryabraham

@muttmuure muttmuure changed the title [CRITICAL] Add SQLite support to web browser and fallback to IndexDB [HIGH] Add SQLite support to web browser and fallback to IndexDB Jan 10, 2024
@MIAhmed
Copy link

MIAhmed commented Jan 10, 2024

Below is the way to check if the browser supports SQL Lite or IndexedDB then we can perform the appropriate operations like storing and retrieving data from either SQL Lite or IndexedDB.

// Check for SQLite support
const isSQLiteSupported = 'openDatabase' in window;

// Check for IndexedDB support
const isIndexedDBSupported = 'indexedDB' in window;

if (isSQLiteSupported) {
  // Initialize SQLite database
  // ...
} else if (isIndexedDBSupported) {
  // Initialize IndexedDB
  // ...
} else {
  // Fallback to other storage mechanisms or do an alternative
  console.error("SQLite and IndexedDB are not supported in this browser.");
}

@muttmuure
Copy link
Contributor Author

Hey @MIAhmed! What's your Slack handle? Would you be interested in creating a proposal for this?

@roryabraham
Copy link
Contributor

Thanks for running with this @muttmuure, I think generally there was consensus that this is a good idea, but let me post a formal proposal first just to get everyone on board. Going to slap a planning label on this and put it on HOLD

@roryabraham roryabraham changed the title [HIGH] Add SQLite support to web browser and fallback to IndexDB [HOLD][HIGH] Add SQLite support to web browser and fallback to IndexDB Jan 10, 2024
@roryabraham roryabraham self-assigned this Jan 10, 2024
@roryabraham roryabraham added Planning Changes still in the thought process NewFeature Something to build that is a new item. labels Jan 10, 2024
Copy link

melvin-bot bot commented Jan 10, 2024

@melvin-bot melvin-bot bot added the Weekly KSv2 label Jan 10, 2024
@roryabraham
Copy link
Contributor

Chatting in this thread

@adhorodyski
Copy link
Contributor

Does this still rely on the deprecated WebSQL standard? As far as I'm aware only WASM is now a viable option to use SQL on this platform.

@muttmuure muttmuure changed the title [HOLD][HIGH] Add SQLite support to web browser and fallback to IndexDB [HOLD][HIGH] Add SQLite support to web browser and fallback to IndexedDB Jan 11, 2024
@MIAhmed
Copy link

MIAhmed commented Jan 11, 2024

Hey @MIAhmed! What's your Slack handle? Would you be interested in creating a proposal for this?
@muttmuure Yes, I am interested in implementing fallback to the indexedDB as I have already implemented that.
I do not have Slack access

@roryabraham roryabraham changed the title [HOLD][HIGH] Add SQLite support to web browser and fallback to IndexedDB [HIGH] Add SQLite support to web browser and fallback to IndexedDB Jan 11, 2024
@roryabraham
Copy link
Contributor

Does this still rely on the deprecated WebSQL standard? As far as I'm aware only WASM is now a viable option to use SQL on this platform.

No, I'm referring to the official SQLite wasm build

@roryabraham
Copy link
Contributor

Sounds like we got consensus to do some exploratory research here and investigate the potential performance benefits.

@roryabraham
Copy link
Contributor

For transparency to our open-source community, we've asked Margelo to begin researching this since they're experienced with working in Onyx and maintain https://github.com/Margelo/react-native-quick-sqlite

@chrispader
Copy link
Contributor

commenting here for assignment! @roryabraham

@roryabraham
Copy link
Contributor

@chrispader having set up wasm support for Encryptify, I wonder if this is something we can implement at the react-native-quick-sqlite layer?

That way, we would be able to leave Onyx mostly unchanged – just check if OPFS is supported, and if it is use the existing quick-sqlite storage provider, if not fall back on IndexedDB.

We could also create a log of grafana graph to see how much the IndexedDB provider is actually used in the wild. That way a year from now there's no guesswork – we can look at how much it's actually used by users and consider dropping it from Onyx to simplify the codebase.

@chrispader
Copy link
Contributor

chrispader commented Jan 25, 2024

I did some research today and here's are my initial thoughts: cc @roryabraham @muttmuure @adhorodyski

(Lmk if you have any comments/critic)

Which SQLite + WASM approach to take

As suggested in the documentation of the official library (SQLite WASM) i would go for the approach where we use the extra wrapped worker thread. This way, additionally to (potentially) improved speed we have the benefit of offloading much of the load in Onyx to a different thread.

Reference: Worker + Promise approach

@chrispader having set up wasm support for Encryptify, I wonder if this is something we can implement at the react-native-quick-sqlite layer?

I would definitely add this functionality to react-native-quick-sqlite directly. The library is not yet web-compatible anyway, so this would simply extend the library by adding the whole web part and wouldn't interfere with the native implementation at all.

Onyx integration

That way, we would be able to leave Onyx mostly unchanged – just check if OPFS is supported, and if it is use the existing quick-sqlite storage provider, if not fall back on IndexedDB.

Exactly! As you've suggested, we would then check for availability of OPFS in Onyx and use IndexedDB (idb-keyval provider) as fallback.

Onyx operates async anyways, so this approach wouldn't require any changes in the library.

Relational data approach / Onyx re-design?

From https://expensify.slack.com/archives/C05LX9D6E07/p1704972138886149?thread_ts=1704921473.038539&cid=C05LX9D6E07 by @adhorodyski

one of the benefits is the unification of the query layer - this can potentially really simplify the architecture which is now limited to the least capable platform (key-value on web). Think indexes for example, running migrations, or bringing in an ORM for a type-safe database access.

I'm not sure if i understood your idea 100%, but if it refers to re-designing Onyx to support relational data, i don't think that would be a good idea atm. I wouldn't suggest to rely on any specific storage provider implementation too much, like SQLite on both web and native. Or was the idea to drop Onyx and create a completely re-designed (relational data) storage solution?

I like the idea of improving the overall performance by not trying to convert relational data into key-value data and therefore losing potential of indexing and efficient querying. I just think it wouldn't be a good idea to try achieve this in Onyx...

Imo Onyx is exclusively designed as a key-value store and its big advantage is that it's not dependent on any specific storage layer, instead it can use any underlying storage solution... and we will potentially improve this aspect even more.

I also agree with @tgolen, that as long as we don't completely re-structure our approach to querying and persisting data, i don't think we need react-query right now.

(Correct me if i got any aspect of your suggestion wrong though)

Performance

As @roryabraham already mentioned in this Slack thread the main benefit would definitely be faster speeds and to offload a lot of work to a separate worker thread and therefore unblock the main thread and improve app's performance.

I haven't created any real benchmarks and performance comparisons with IndexedDB yet, but from what i've read it's definitely faster than IndexedDB.

(Asking about what to do next in "Going forward...")

Analytics

We could also create a log of grafana graph to see how much the IndexedDB provider is actually used in the wild. That way a year from now there's no guesswork – we can look at how much it's actually used by users and consider dropping it from Onyx to simplify the codebase.

That definitely makes sense! Do we already use Grafana and is there an account/token i can use?

I could also imagine adding analytics for web and native performance of SQLite. This way we could analyze which kind of devices profit the most from this change and which might have problems.

Going forward...

@roryabraham @tgolen just to make sure i'm not putting too much work into this straight ahead..

Was the idea to create a P/S for this with all the actual problems solved in the app first?

Or do we want to dive straight into the actual implementation in react-native-quick-sqlite? After my initial research, i don't see any functionality of the library that couldn't be realized with the WASM implementation of SQLite.

Also, do we want to create performance benchmarks before or after the actual implementation in the library?

@sgbeal
Copy link

sgbeal commented Feb 12, 2024

based on the speed test in https://wasm-testing.sqlite.org/speedtest1.html, if instead of storing JSONs we had a few structured tables for data that will be abundant and we have a format that is mostly defined (reportActions, transactions) would those operations be faster?

That's really anyone's guess and mine would be mere speculation. That will, at least in part, depend on whether those objects span db pages, because...

@sgbeal so you are saying, that usage SharedArrayBuffer should make communication faster?

Faster compared to (say) postMessage() because it eliminates an extra copy which might otherwise be necessary1. For example, when the library reads from storage, it passes a destination byte array to the VFS. In this particular case, the library and the VFS-related I/O live in different threads but they communicate via the same SharedArrayBuffer, so the I/O itself is performed directly on that shared buffer. The coordination/synchronization of the communication between those two threads, irrespective of the actual I/O, is where the biggest single performance hit comes from. The process goes something like:

  • The library tells the VFS "go fetch this byte range while I wait on you," at which point the library uses Atomics.wait() to wait on completion.
  • The I/O thread, which has been using Atomics.wait() to wait for a task request, wakes up, reads those bytes into the SharedArrayBuffer, reports "work is done - you may continue now" using Atomics.notify(). It then calls Atomics.wait() to wait on the next request.
  • The library thread receives that Atomics.notify() notification via its own blocking Atomics.wait() call and can then consume the bytes from the SharedArrayBuffer.

The same happens for writing: the library thread populates the SAB, tells the I/O thread that data is available, and then waits on the I/O thread's response to that write. Thanks to the SAB, the I/O thread does not need to make a separate copy of that data before passing it on to the OPFS layer.

Does that answer your question?

Footnotes

  1. postMessage() can move, rather than copy, buffers between threads, but in our case doing so is more expensive/slower than an Atomics.wait/notify()-based construct because we have to either create a new buffer for each operation or coordinate moving of a single buffer back and forth between threads. Our very first prototype used postMessage() for the cross-thread communication but it didn't perform as well.

@kirillzyusko
Copy link
Contributor

Does that answer your question?

Yes, thank you 🙏

We've investigated SharedArrayBuffer usage, but since it operates on bytes level, we anyway need to serialize/deserialize a data and here we will loose our performance. I. e. if we do JSON.stringify to pass a data to worker, then here we are wasting a lot of CPU resources (but if there is another optimal way of converting JS object into SharedArrayBuffer - let me know).

We also looked at the performance when we have no serialisation/deserialisation on the worker thread API, but assuming that all the data layer code would operate in the worker thread directly (ie. API calls happening there and onyx code lives there as well). For that case we have these results:

Operation SQLite (worker) - sql queries execution time
Single set 21.9
Get collection 9.21
Merge collection 229.6
Single merge 124.1
Clear 506.2

I. e. if there is no communication between threads, then SQLiteWASM is an absolute winner 💪 (I also added these numbers to main comparison table).

Let me know what do you about the results and further direction 😊

@sgbeal
Copy link

sgbeal commented Feb 12, 2024

We've investigated SharedArrayBuffer usage, but since it operates on bytes level, we anyway need to serialize/deserialize a data and here we will loose our performance. I. e. if we do JSON.stringify to pass a data to worker, then here we are wasting a lot of CPU resources (but if there is another optimal way of converting JS object into SharedArrayBuffer - let me know).

AFAIK there is no more optimal approach for general-purpose or high-level software. We internally use a fit-to-purpose serialization approach which has proven to be faster, but it's in no way flexible. It depends, for example, on having a fixed number of fixed-size values, which isn't practical for anything but the most specialized of use cases.

Let me know what do you about the results and further direction

Unfortunately, my lack of background with your apps, infrastructure, and use cases leaves me unable to say terribly much which is likely to be of use.

If you like, we can do a video call and, with some hand-holding from you, perhaps i'll be able to offer some concrete suggestions and/or provide guidance on getting the most out of sqlite's JS/WASM APIs. Just send a meeting link and time to support at sqlite org and i can make the time slot work.

@kirillzyusko
Copy link
Contributor

@sgbeal I sent an invitation to your email (took it from github profile, sorry, didn't find your contact on support page)

@sgbeal
Copy link

sgbeal commented Feb 13, 2024

I sent an invitation to your email (took it from github profile,

Got it. See you on Friday!

@kirillzyusko
Copy link
Contributor

We had a meeting with @sgbeal, key notes are:

  • we can not use main-thread approach because our data can be more than 3MBs so we will not be able to persist it to local-storage;
  • one VFS doesn't support multi tab access (VFS-2) so we can not use in web app;
  • VFS-2 can be used with concurrency, but it's supported only in latest Chrome browser versions (potentially desktop app can use it, since there we can specify chrome version that should be used);

After a discussion some options with @sgbeal we came to next conclusions:

  • we can try to pass transfer array to avoid copying of large string/JSON objects (I did it and it didn't give a performance boost)
  • to run all react-native-onyx code in a worker thread - it'll be very fast, but we'll need to change a lot of the code (We'd also considerably need to rearchitecture the app to move the code that interacts with Onyx (such as loading data from API) to that worker thread. In this new architecture all the business logic would probably happen in the worker thread and we'd only post to the main thread updates for UI Models. This could be interesting, but its hard to guess the effort. If you want to we can spend some time investigating that approach to see if its possible without a major rewrite)
  • to support structured data - in this case we'll need to rework an entire architecture in app to have a relationship data instead of key-value approach (but in this case we'll be able to use all the power of SQLite)

@sgbeal
Copy link

sgbeal commented Feb 16, 2024

VFS-2 can be used with concurrency, but it's supported only in latest Chrome browser versions

To clarify: that capability is currently hypothetical. The new Chrome feature might allow us to add concurrency to that VFS, but we won't know for sure until we're able to experiment with it. Even if we do that, however, that OPFS feature is experimental and currently only in Chrome, so it may still change in incompatible ways. We have no information about whether the other browsers plan to adopt that feature and we are not keen on the idea of releasing a Chrome-only VFS.

@melvin-bot melvin-bot bot added the Overdue label Feb 26, 2024
@roryabraham
Copy link
Contributor

most recent discussion here

@melvin-bot melvin-bot bot removed the Overdue label Feb 26, 2024
@melvin-bot melvin-bot bot added the Overdue label Mar 5, 2024
@roryabraham
Copy link
Contributor

conversation is ongoing, with some recent highlights being:

@quinthar:

So there are kind of three different optimizations being discussed, and they are entirely independent (ie, we can do any of them by themselves), but we are talking about them as fi they are somehow the same:

  1. Query speed: Use on-disk imaging to improve disk read times
  2. Worker offload: Add APIs to postprocess (ie, sort/filter) results on the worker thread to improve main thread responsiveness
  3. Worker overhead: Reduce the amount of data going over postMessage to improve overall command performance

Of all those, I think the most important goal should be offload both querying and postprocessing to the worker thread to keep the main thread continuously responsive. Improving the query speed is nice, but kind of marginal.

@adhorodyski:

On the 3 extracted topics, worker offload, worker overhead, query speed (thanks for this clarification!):
I gave it another spin and and as crazy as it gets, I think I've managed to solve all of them at once with the current key-value data structure for SQLite (amazes my every day how powerful it is 😛). You can think of it as an API for heavy processing on the other thread with very little adjustments to what we have right now 🙂
I'll do my best to share a POC shortly with you but need to run some more tests.
Without moving anything around I'm able to query the KV structure using SQL and get the UI reactivity based on the database updates. So far it looks like it can be both performant and require little to changes, just adding the new queries on top and syncing with database hooks

excited to hear more about @adhorodyski's proposal.

@melvin-bot melvin-bot bot removed the Overdue label Mar 6, 2024
@kidroca
Copy link
Contributor

kidroca commented Mar 7, 2024

I’ve been deeply invested in ongoing discussions about performance and have observed that our primary challenge doesn't seem to stem from the storage database itself but rather from extensive processing times on the main thread. This challenge manifests in two main areas:

  • Substantial computations within client code, notably in components like LHN.
  • Onyx’s architecture, which processes everything on the main thread, contributing to bottlenecks.

It’s clear these issues are significant and not directly related to the choice of storage provider. I’m curious about the native side - do we face similar challenges there, or is the conversation about moving storage handling off the main thread mainly focused on web implementations?

Given my history with Onyx, I have immense respect for what we've built and its evolution from a simple pub-sub model to a sophisticated key/value storage solution. As we navigate these performance challenges, I believe it's beneficial to revisit the foundational goals of Onyx:

  • What needs did Onyx initially aim to fulfill?
  • Could exploring a facade or middleware approach with another state management solution, like Redux, benefit us?

Reflections on Onyx’s Purpose:

  • Onyx’s simplicity and the ease of its pub-sub interface for components have been its strengths, enabling quick onboarding and straightforward usage.
  • It was designed to function outside of React, which, while initially seen as an advantage, has led to challenges with static connections and the inability to lazy load resources as one might with React components.
  • It aimed to reduce boilerplate code.

Considering Alternatives:

  • Before Onyx, had we explored using existing solutions like Redux? Redux’s adaptability with various storage backends could offer valuable insights into alternative architectures.
  • The interface Onyx provides for component subscriptions isn’t vastly different from Redux’s. However, the approach to state updates and management diverges significantly, without the structured flow of reducers and actions seen in Redux.

I once hesitated to use Redux, favoring more familiar patterns like MVC with MobX. Yet, the potential for integrating existing solutions through middleware, possibly enhancing Onyx or even adopting new approaches, is worth considering. This doesn’t diminish Onyx’s value but opens a dialogue on leveraging collective wisdom and advancements from the wider community.

As we ponder the future of Onyx, I see an opportunity for evolution rather than abandonment. Could we envision a redesign or integration that honors Onyx's original principles while addressing our current challenges? I’m eager to contribute to a path forward that builds on our collective achievements and addresses the needs of our evolving technology landscape.

@tgolen
Copy link
Contributor

tgolen commented Mar 7, 2024

I have deeply appreciated all your work on Onyx over the years! Your expertise and opinions are super gracious and encouraging. I'll respond to a few points here and give some general opinions of mine.

What needs did Onyx initially aim to fulfill?

  • A single interface Onyx.connect()
  • Lightweight
  • Data is persisted to disk
  • UI binds to the data
  • No boilerplate
  • Purely async (should not access promises returned from methods)
  • cross platform

I think we did a pretty good job of delivering those things.

Before Onyx, had we explored using existing solutions like Redux?

Yes, I explored that heavily. I have used Redux in other projects, so I was pretty familiar with it. I found that the excessive boilerplate was just too much to deal with in a huge project. I also explored alt in a desire to see if a flux pattern with less boilerplate would help. I built the K2 extension with alt originally. I still didn't like it there and the patterns and boilerplates were still more than we desired.

It’s clear these issues are significant and not directly related to the choice of storage provider. I’m curious about the native side - do we face similar challenges there, or is the conversation about moving storage handling off the main thread mainly focused on web implementations?

The off-thread conversation has been focused on web implementations. That's not to say that the native side is perfect, but it seems like the web side is where most people have focused on performance. I don't know if there is a correlation there or if it's just natural tendencies.

Could we envision a redesign or integration that honors Onyx's original principles while addressing our current challenges?

Yeah, I am totally down for this. I think of the useOnyx hook as almost like version 1.5 of Onyx, and I think it's entirely possible to build a 2.0 version that is a complete rewrite and rethinking of the library, building off of everything good that was going on in 1.0. I'd much rather explore this than to try to grab an off-shelf solution that we have little control over and try to get it to work (only to find out that we are in a much worse spot than we were with Onyx 1.0).

@roryabraham
Copy link
Contributor

roryabraham commented Mar 11, 2024

An interesting library we could consider (or pull from for inspiration) is @legendapp/state. Reviewing the design goals for Onyx written by @tgolen, and adding this one:

  • it should be as performant as possible

legend-state seems to check all the boxes. I believe it's quite similar to Onyx in its design, with even less boilerplate. I also think it would be feasible to leverage it to simplify Onyx internals while keeping the existing interface for Onyx largely unchanged, avoiding the need for a major app-wide refactoring effort.

I've also met the maintainer in person a few times at React Native conferences, and I believe he'd be open to working with us. A direct line to support could alleviate concerns with lack of control over the library.

@adhorodyski
Copy link
Contributor

adhorodyski commented Mar 13, 2024

@roryabraham I'm going to need some more time to prepare a draft of how I think this could look like, but to give you all more context I genuinely think nearly all we need is the right design and composition of some primitives, not much of the new tools - this might be the 10% but mostly due to the ergonomics of not rewriting community libraries.

I basically work with SQL queries & JS Promises, binding them to the UI.

I'll do my best to share a WIP proposal of this design next week.

@melvin-bot melvin-bot bot added the Overdue label Mar 19, 2024
@roryabraham
Copy link
Contributor

It seems like conversation has stalled for the last few weeks, and the initial investigations we did have prettymuch concluded. It sounds like @adhorodyski has an idea he'd like to propose for which SQLite on the web would be a prerequisite, so I'm going to assign this issue to him. I'm also going to unassign @hannojg and @kirillzyusko because their explorations are complete.

@melvin-bot melvin-bot bot removed the Overdue label Mar 21, 2024
@adhorodyski
Copy link
Contributor

@roryabraham still working on it - correct with the web support being a prerequisite, but after a few discussions with @jbroma together we decided to slightly pivot as IndexedDB is for now the only option which fully supports this platform. I'm aiming for a design where once web catches up, we can easily migrate just this very piece with a very little effort. Will update within days!

@roryabraham
Copy link
Contributor

Hey @adhorodyski, we discussed this internally, and since we're not working from a clear problem statement here and the ROI of any changes is still very much up-in-the-air, we've agreed that we're going to close this issue for now.

We advise that you shift your focus into other areas of the performance audit for the time being. We love SQLite and may want to pick this discussion back up someday, but for now we're going to cap it here for #focus. Thanks everyone for all your participation so far!

@adhorodyski
Copy link
Contributor

adhorodyski commented Apr 2, 2024

@roryabraham I agree this was fine to be closed - I'll post my research on Slack just so we don't loose it later on (I learned a lot on the integrations for SQLite/IndexedDB), I think this is still valuable and holds/maybe we'll continue the discussion from there/kill it with fire for now :D

@adhorodyski
Copy link
Contributor

cc @roryabraham posted internally under this thread, please let me know if there's a better place for this :)

@roryabraham
Copy link
Contributor

thanks, I think that's an appropriate place for that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NewFeature Something to build that is a new item. Planning Changes still in the thought process Weekly KSv2
Projects
Development

No branches or pull requests