-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Offline support fixes #546 #1000
base: master
Are you sure you want to change the base?
Conversation
@jasonwilliams Sorry for the ultra late reply! I promised last Saturday but it took me longer sadly.. I checked https://jason-williams.co.uk/book/ but when I plug off my Ethernet cable I get a "Not found page" on Firefox 68.0.2 (64-bit). Is it possible that something changed? I will go check the other instructions as well. |
@jasonwilliams Could you give me a few more pointers on the instructions?
I cloned the book repo and added the offline-support line. I guess I also need to clone your version of the mdbook repository? What do you mean exactly with disable the check for localhost? What should I verify once I did these steps? |
@DevQps could you refresh your cache and try https://jason-williams.co.uk/book/ again? (whilst online). If you have the same problem, open your dev tools and go to |
I tried it again! I figured out that it caches the pages I have been to before, but not the pages I have been to. For example I could browse to Functions again, but not to other pages. I did a Crtl + Shift + R refresh and deleted all storage / cookies manually as well, but no change. |
@DevQps what browser are you using? |
Hey Jason!, I tried it again on Firefox and it worked. Then I tested it on Chrome again, and the first time it failed. But when I reconnected + browsed a page it worked. Could it be that either A) It takes some times (as in around 10 seconds) to fully load everything, or B) That it only works after a reload (i.e. the new service worker is run?) So I think everything works now! |
@DevQps the dev tools tab (applications) should show you which pages have downloaded You may have clicked away before everything downloaded |
That's probably the case! I think this MR is fine! |
@DevQps what’s the next steps? |
I am not sure! @steveklabnik Can you review this maybe? |
I am not really involved in mdbook maintenance these days; @Dylan-DPC should know. |
src/theme/sw.js
Outdated
|
||
// Local resources | ||
workbox.routing.registerRoute( | ||
new RegExp(".woff2?|.ttf|.css|.js|.json|.png|.svg"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a more robust regex?
And all registered routes above may need some modifications and escaping such as ^
.
new RegExp(".woff2?|.ttf|.css|.js|.json|.png|.svg"), | |
/\.(woff2?|ttf|css|js|json|png|svg)$/ |
src/theme/book.js
Outdated
previousScrollTop = document.scrollingElement.scrollTop; | ||
}, { passive: true }); | ||
if ("serviceWorker" in navigator && !isLocalhost) { | ||
navigator.serviceWorker.register("sw.js").catch(function(error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Encountered this error when navigate to pages other than base URL.
Service worker registration failed: TypeError: "ServiceWorker script
at http://localhost:3000/format/sw.js
for scope http://localhost:3000/format/ encountered an error during installation."
The problem here is that the register path is relative to current file URL. We can resolve it by adding <base href="https://our.basedomain.com">
though this would be a huge change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that we could merge this PR and open another if the problem is too thorny to deal with at this time.
e7fbd09
to
655961d
Compare
@weihanglo rebased & fixed regex |
Besides this |
@jasonwilliams |
What's currently blocking this from being merged? Having it use the cache first then fetch and give a refresh pop up? |
It looks like fontawesome's acting weird. With "Lie-Fi", it takes 2 failed requests for the SW to return and since it for some reason isn't being cache properly it shows invalid icons every time you go to a new page. Not sure if this has been fixed since the version you have on your hosted version, but seems odd. It loads fine completely offline. Also, not sure if it's worth adding a config table for custom regex, definitely has a use but you can just make your own |
Not much just testing. I can fix issues but if there’s no response there’s not much I can do. |
Only issues I've had while testing is FA afaik. Knowing when it's done caching/refreshing the cache might be helpful on mobile (my favourite is https://stripe.com/docs/api which has a little icon at the top, although I don't know how it behaves if it's stale). Since it would be helpful with my project I'd even be willing to maintain a fork and keep rebasing/merging if the devs don't find this useful (judging from the reactions in the issue it didn't seem like a priority but since it's already implemented I'm not sure). Are you expecting issues to be found? I feel like it would mostly be feature requests that can be added after an initial release (e.g. the popup when updated) |
@nihaals the first issue we need to sort out is StaleWhilerevalidate vs “fetch first”. I don’t think we made a decision on this. Personally I don’t mind SWR but @ehuss had issues with it. We need to make a decision on this to move forward, otherwise we’re at an impasse. I think outside of that it was fine (for me) but @ehuss hit some bugs. I couldn’t reproduce any of them so we we’re stuck.
This is good to know, nice to see there is precedent for doing this on documentation pages |
Just off the top of my head SWR Pros/FF Cons
SWR Cons/FF Pros
I think double refresh is a hybrid approach but hurts UX and takes away the 2nd and 3rd SWR Pros I haven't tested on iOS yet but I don't see why it wouldn't work, although workbox is fairly out of date now, there's a new major version, in fact v6 is currently in its 2nd alpha release Edit: iOS works great |
I'm sorry that it seems that I am ignoring this. I have very little time to spend on mdbook. I don't really know anything about webworkers, so it is difficult for me to understand or review this. I asked a few questions above which never got answered, which makes me more weary about this. In particular, I have no idea how to inspect, debug, or otherwise work with webworkers. I don't know how to delete them, or stop them, or control them as a user. I see your points about offline-first, but I still feel like for the common case of people being online will result in a worse experience (that is, a flash and reload, or a manual intervention). Perhaps if offline support required an opt-in by the user, that would reduce the risk, and make it easier to roll out and test? I'm also not proposing that as a permanent solution, but rather as a way to roll it out and get more testing and experience and exposure with it. It would also allay my concern that if it has any problems, there would be some way for the user to disable and remove the cached version. |
Is there anyone else maintaining this who can help?
Oh what did i miss? I'm sure i answered all the Qs.
All of this is achieved in dev tools.
You can unregister them from that view and that will stop them.
This hasn't been my experience when using:
It's already opt-in by the user of mdBook so no one will be using this feature unless its switched on. Having the end users also opt-in is a bit overkill and would involve some UX explaining to them what they're opting into. I think there's risk of it being confusing, plus I've not seen this prior art with other offline sites or documentation pages. I think a way forward would be to have the feature off by default (which it already is) and mark it as an experimental feature.
Users are able to do this in their developer tools but it should be rare anyone needs to do this unless there's a bug. |
Unfortunately, not really.
Here:
Other questions:
Thanks. How do I do this for Safari and Mobile Safari?
How do they handle when content changes? Do they just show stale content? How do the user know when to reload? I've been doing a little testing again, and it is quite concerning when I load up a page, and it shows content from months ago, and it is not really clear how to get it to update. Sometimes it seems navigating to pages is enough, sometimes reloading is enough, but that experience just seems wrong to me. For example, let's say there's a blog post, "hey, check out this new Rust feature" with a link to the docs, and a bunch of users visit that, and the docs aren't there, because they have an old cache. Does that not seem bad? It seems really bad to me. I'm also getting a lot of font flashing on your book page (chrome and firefox). Is it having problems with the FA fonts? Are those being reloaded from scratch on each page?
Who would experiment with it? How would we know it is safe to use, or turn it on for rust-lang books? |
@ehuss could you go into more detail off how you’re testing, what endpoint you’re hitting etc. |
Sure, I have a variety of ways I am trying to test:
When testing content changes, I just make a change to the book, and rebuild it, then try to see what scenarios it will reload (visiting in a new tab, refreshing, navigating between pages, etc.). So, from a basic test (no offline testing):
|
Pretty sure the standard is to show a little popup at the bottom saying there's a new version and they can update by refreshing (sometimes with a refresh button)
I agree, this seems to be the most popular way and in my opinion the best (using the approach above). I don't think I've seen anyone do a double reload
I agree, offline websites is somewhat new, and for most users it definitely is. In one of the talks I linked they said that they hope having something like "this website can now be used offline" will fall out of fashion as eventually most websites will be like this, but at the moment, it's useful to simply tell the user as they probably won't expect offline
I also had this issue (#1000 (comment))
I think this would be the best approach as creators may need to be aware of what it is doing due to them possibly having little exposure to SWs. I'm sure when people see the option added in the change log they'll be intrigued and want to test it which will help get more testers (although there shouldn't be many issues as the library used is popular and well tested)
I'm not sure as I don't use Safari, but they're dev tools seem to be lacking (which doesn't seem rare). This support documentation suggests that there are not Safari dev tools for SWs (although it does warn that it is possibly out of date and could be added in Safari 14). As I said above, browser unique bugs should be extremely rare and would most likely be a library issue or even a browser bug. The current library version is fairly out of date (#1000 (comment)) so updating could help if you do run into any issues
The only errors I have are relating to FA (but I believe this issue may be elsewhere, I'm not sure if it's due to a bad service worker. I'm really unsure what's causing the FA issue, I haven't used it much myself but it the setup seems fine. Possibly updating could help but I haven't tested yet
localhost definitely works, not sure if local files is part of the spec but probably safe to use a server
I didn't really think about that and it's a great point. If the page is new, the 404 shouldn't get cached so it would be fine, and if it was something like a changelog page that gets updated, I do agree that it could be confusing, however, with an update pop up I think this could be solved. The Google pop ups aren't obnoxious but are still easily visible which is probably an important balance. If someone just navigates to another page though it would update, so stale content would only be displayed on the first page they see. It seems most of your issues are UX related. Would fixing the FA problem and having the "update is available" pop up solve those? I think the pop up could make a huge difference and could be useful to actually test on users unaware of the feature to see how well it performs (maybe once this is merged, PRing it into the Rust Book and seeing if that community notices it, finds it annoying and think it's worth the benefit of decreasing load times and having offline access. Even if a feature wasn't used in The Book I still think it could have value for some mdBook users (for example I want to create a book that would primarily be used in places where there wouldn't be great internet access or needs to be accessed no matter what, so this trade off would be trivial to me, especially when stale content would not cause major problems, another type of user who could find this useful) |
If you're referring to my site ignore the fonts errors, i think that's my server not handling that file type, I don't think its related to offline at all. |
I was, but there was also strange FA behaviour, not just console errors, but I haven't built the PR and tried hosting it myself yet |
I still feel like this would not be a good user experience. For docs that update rapidly, this would show up essentially every day they visit. For users who read the docs sporadically (say once a month), they will essentially run into this every time they visit, which I feel is a bad experience. Would it be possible to have it be more network-centric so that the current behavior is retained, and the "offline" behavior on engages when needed? For example, can it:
Another question, I'm not sure if I asked before, but is there a way to "unload" the service worker? Let's say someone enables this, and then decides to turn it off. Will it be permanently be enabled for any user who visited the site before it was disabled? |
I’m ok to try “network first”. It’s not “great” when you have no network because some browsers hang for a bit before falling back to the cache but it’s probably the closest we’ll get to pleasing both scenarios.
I think we can have a switch for offline on and off, if you turn it off it can unload the service worker. However I don’t think the switch is needed if we’re doing “network first” as it will make no difference to your experience as you’re fetching from the network anyway. So we need to make some decisions here:
|
If the goal is to have both online first and offline first implemented, I think having offline first as the default would be better as it's definitely where most projects seem to be going. I'm not sure how easily you could describe this switch though, you may need a short 2 sentences describing each option. After reading a web.dev post about pretty much this problem, I thought of another idea which would also have a more user friendly configuration screen. What if we use offline first, but have 3 states:
Having both of these options user configurable and book author configurable means:
|
If it can do something like I described above, I think the delay should only be visible once over some time period (say 24 hours). I'm uncertain how often that would happen in practice. What kind of scenarios would cause it to go offline?
Yes, my intent was that if it can automatically switch to offline mode, there would not be a need for a user-visible switch. My question was more for the publisher side. If I publish a book with this turned on, and it causes problems for my readers, I want to turn it off for everyone. AIUI, if you just disable the config option, it doesn't disable the service worker, and there isn't a practical way for users to unload the service worker, leaving it permanently running on their system. Would it be possible, if the option is turned off, if there was some javascript that removed the service worker if it exists? |
@ehuss What do you think about #1000 (comment)? |
I guess I don't understand the difference between option1 and option2. Does workbox.js always issue a request in the background? I thought it would always check. What would be the purpose of completely disabling that background request? And for option 3, what would be the purpose of never showing the cached entry? Isn't that really important, like the whole point of offline viewing? I would expect it to show the cached content for as long as possible if it is unable to update. Also, I may be a bit unclear by what you mean by "user", as there are two different users, the reader and the publisher. Perhaps it would be best to use that terminology when talking about the interaction. I would personally prefer to avoid complicating things for the reader as much as possible. |
Some pseudo-code that should hopefully explain it better: def fetch_and_cache(path: str) -> Content:
# Probably wouldn't actually use a path when refreshing as the entire book should be cached (config option to only cache visited pages in the future?)
content = get(path)
set_cache(path, content) # store content with path and timestamp
if content_already_displayed():
# Specifically for the background refresh, true if the user can already see content but this is still running
# Would also check if the fresh version is actually different to the previous version
show_popup('Update available, refresh to see')
return content
def get_content(path: str, config_option_1: int, config_option_2: int) -> Content:
"""Gets content at path respecting cache options"""
if config_option_1 > config_option_2:
raise ValueError()
if config_option_1 <= -1:
raise ValueError()
if config_option_2 < -1:
raise ValueError()
if config_option_2 == -1:
# Simplify this snippet's logic
# Would actually behave like infinity
config_option_2 = 2^128
cached_content, timestamp = get_cached_content(path) # timestamp is a UNIX timestamp for simplicity
if cached_content is None:
return fetch_and_cache(path)
age = now() - timestamp
if age < config_option_1:
# Only show cached content, no cache refresh
return cached_content
if config_option_1 <= age < config_option_2:
# Concurrently refresh cache and show cached content (background fetch)
run_in_background(fetch_and_cache(path))
return cached_content
if age >= config_option_2:
# Fresh content *only*
set_cache(path, None)
return fetch_and_cache(path) Publishers would be able to set default values for
State 1 will only show cached content, state 2 will show cached content while also refreshing the cache (useful if they are quickly clicking through pages to remove some origin server load, expected value (
I believe there is a lower level API where you have more control, otherwise you can simply not use workbox and write the serviceworker directly
To remove unneccesary server load and egress (anyone could benefit from having this at a few seconds as it's unlikely the server updating within those 5 seconds will be important since they are already browsing). It would also help the client if they have a bandwidth cap like on cellular data (refreshing the cache every 5 seconds is already probably more often than they would want so if the publisher sets a low value the reader can increase it)
State 3's user experience doesn't feel that much different from online first, many users on mobile stop loading a page after a few seconds and the timeout for online first would need to be high enough to be useful. A possible extra boolean config option could be added to change state 3 (or just replace the state 3 in the above snippet with this one if it doesn't seem useful): if age >= config_option_2:
# Force fetch new content with timeout, fall back to cache
with timeout(seconds=5):
return fetch_and_cache(path)
return cached_content I think this is probably the best way to implement the advantage of online only since the only benefit is showing fresher content after a period of inactivity (which works well with the final state) |
Does this PR not use the same caching logic as the browser normally does? I would expect it to have the same behavior as normal online behavior (doing whatever a browser normally does to determine if it needs to access the network and checking if a page is out-of-date). |
You can't cache the initial request without SWs AFAIK, so normal online behaviour doesn't exist (for the actual pages) In this implementation, any normal browser caching stuff should be bypassed as that would cause conflicts and you wouldn't have control over it. By having it under a config option, the publisher and reader can actually control the no fetch caching of pages (which I think is objectively better?) |
@ehuss Please note there is a significant use case for this. |
@nihaals looking at #1000 (comment) id rather not get too complex as a first attempt at this feature. I think for now we can say publishers can select network-first or cache-first, then maybe a cache time for how long they want their pages to stay in the cache for (default to some time)
@ehuss the choices are here: https://developers.google.com/web/tools/workbox/guides/route-requests#handling_a_route_with_a_workbox_strategy |
@jasonwilliams I don't have write access.. |
@sanmai-NL you should be able to clone https://github.com/jasonwilliams/mdBook/tree/offline_support and rebase it against upstream then do a pull request back up |
Just saw this It reminded me that client side apps already have something like the update pop up we were talking about and I don't think that causes many UX problems (although this is only if you leave the tab open during an update, if you go to it for the first time in a while you still get the new assets unless they also have a service worker) |
@nihaals if the publisher opts in to offline first we can implement that option. |
I still believe there is a significant use case for mdbooks to support offline usability. But won't have time to catch up on this now, im happy if someone else wanted to pick up on this from where I left off |
This PR is adding a service worker (via the use of workbox.js) to the mdBook pages which allows for offline navigation and usage. We use the
StaleWhileRevalidate
mode from workboxIn the gif below the red you see in the network tab is the StaleWhileRevalidate kicking in and trying to load a fresh version, this obviously fails offline so is expected.
Gif, click for animation:

I have updated the workbox library as it was very out of date. Not sure on the failing test though
@sorin-davidoi @Michael-F-Bryan
How does this work?
When you visit the page the service worker will start to pre-cache all of the chapters within the book, it basically visits those urls and caches the response (but not the assets on those pages). So it doesn't matter which page you hit first, the service work will cache all pages in the book.
When you visit a page in the book, the contents are served from the cache, if there is no matching content in the cache it is served from the network.
Below is an image of what the cache table looks like in Chrome.

I made a slight change to the appendix page, you can see workbox replaced the cache with the updated version, but kept the other pages as they are. (Time Cached is different to the other entries)
When does the cache invalidate?
The mode set is staleWhileRevalidate so it will always try to pull a fresh copy in the background (whilst serving you content from the cache). If the content is the same it is basically thrown away. If the content is new workbox replaces the cache with the up to date page and uses that instead.
There is no explicit expiry time set.
Testing Instructions
Testing Instructions (build)
target/debug/mdBook
)book
repo