API Documentation caching #104

Open
tpluscode opened this issue Sep 20, 2022 · 1 comment

@tpluscode
Contributor

I'm looking into ways to improve Hydra APIs by using cache headers. A first-order recommendation is to use versioned assets with a long-lived, immutable cache. I think this fits the most common case, where the API Documentation remains static at least until the server app is restarted.

Implementing this behaviour would require three changes:

First, add a query string to the documentation link header, possibly something like a UNIX timestamp:

-link: </api>; rel="http://www.w3.org/ns/hydra/core#apiDocumentation"
+link: </api?v=123456789>; rel="http://www.w3.org/ns/hydra/core#apiDocumentation"

Second, add a Cache-Control header to the API Documentation response itself:

Cache-Control: max-age=31536000, immutable

Lastly, serve the triples with every occurrence of the URI /api rewritten to /api?v=123456789, so that the client can correctly find the documentation resource in the representation.

This should allow proxies to cache the API documentation.
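
The three changes above can be sketched without any RDF library. This is only an illustration under stated assumptions, not hydra-box's implementation: quads are modelled as plain objects holding IRI strings, the original IRI is assumed to have no query string of its own, and `versionedId`, `linkHeader`, `rewriteQuads` and `cacheHeaders` are hypothetical names.

```javascript
// Hypothetical helpers; quads are plain { subject, predicate, object } objects
// holding IRI strings, purely for illustration.
const HYDRA_API_DOC = 'http://www.w3.org/ns/hydra/core#apiDocumentation'

// Step 1: append a version to the documentation IRI
// (assumes apiId has no query string already).
function versionedId (apiId, version) {
  return `${apiId}?v=${version}`
}

// Step 1 (continued): build the Link header value that advertises the versioned IRI.
function linkHeader (apiId, version) {
  return `<${versionedId(apiId, version)}>; rel="${HYDRA_API_DOC}"`
}

// Step 2: headers to send with the documentation response itself.
const cacheHeaders = { 'Cache-Control': 'max-age=31536000, immutable' }

// Step 3: rewrite every subject or object that equals the original IRI.
function rewriteQuads (quads, apiId, version) {
  const id = versionedId(apiId, version)
  const swap = term => (term === apiId ? id : term)
  return quads.map(q => ({ ...q, subject: swap(q.subject), object: swap(q.object) }))
}
```

Calling `linkHeader('/api', 123456789)` yields the same header value as the `+link:` line above.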

Here is the diff that solved my problem:

diff --git a/node_modules/hydra-box/lib/middleware/apiHeader.js b/node_modules/hydra-box/lib/middleware/apiHeader.js
index 8751fbc..e9b799c 100644
--- a/node_modules/hydra-box/lib/middleware/apiHeader.js
+++ b/node_modules/hydra-box/lib/middleware/apiHeader.js
@@ -1,16 +1,29 @@
 const { Router } = require('express')
+const $rdf = require('rdf-ext')
+
+const timestamp = Date.now()
 
 function factory (api) {
   const router = new Router()
 
+  const timeDependentApiId = $rdf.namedNode(`${api.term.value}?v=${timestamp}`)
+  const dataset = api.dataset.map(({ subject, predicate, object, graph }) => {
+    return $rdf.quad(
+      subject.equals(api.term) ? timeDependentApiId : subject,
+      predicate,
+      object.equals(api.term) ? timeDependentApiId : object,
+      graph)
+  })
+
   router.use((req, res, next) => {
-    res.setLink(api.term.value, 'http://www.w3.org/ns/hydra/core#apiDocumentation')
+    res.setLink(timeDependentApiId.value, 'http://www.w3.org/ns/hydra/core#apiDocumentation')
 
     next()
   })
 
   router.get(api.path, (req, res, next) => {
-    res.dataset(api.dataset).catch(next)
+    res.setHeader('cache-control', 'max-age=31536000, immutable')
+    res.dataset(dataset).catch(next)
   })
 
   return router

This issue body was partially generated by patch-package.

@tpluscode
Contributor Author

Having experimented with this approach a little, I had limited success. The problem with a query string is that it makes the URI a different identifier, which caused me trouble on the client when trying to find the documentation resource.
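
The exact client failure is not detailed here, so the following is only a sketch of the general point: IRIs in an RDF graph are compared as plain strings, and a query string is enough to make two of them differ. The example URLs and variable names are made up for illustration.

```javascript
// Subjects a client might see in a documentation graph whose identifiers
// were NOT rewritten to the versioned form (illustrative data).
const subjectsInResponse = new Set(['http://example.com/api'])

// The IRI advertised in the Link header, including the cache-busting query string.
const advertised = 'http://example.com/api?v=123456789'

// Looking up the advertised IRI finds nothing, because IRI equality is string equality.
const foundVersioned = subjectsInResponse.has(advertised)

// Both IRIs still point at the same path on the same server.
const samePath =
  new URL(advertised).pathname === new URL('http://example.com/api').pathname

console.log({ foundVersioned, samePath })
```

Any component on the client that stores or compares the unversioned IRI would hit the same mismatch.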

A different approach I tried was a shorter cache age combined with an ETag. This appears to work nicely:

diff --git a/node_modules/hydra-box/lib/middleware/apiHeader.js b/node_modules/hydra-box/lib/middleware/apiHeader.js
index 8751fbc..33546b7 100644
--- a/node_modules/hydra-box/lib/middleware/apiHeader.js
+++ b/node_modules/hydra-box/lib/middleware/apiHeader.js
@@ -1,15 +1,32 @@
 const { Router } = require('express')
+const $rdf = require('rdf-ext')
+const etag = require('etag')
+const toCanonical = require('rdf-dataset-ext/toCanonical.js')
+const preconditions = require('express-preconditions')
 
 function factory (api) {
   const router = new Router()
 
+  const apiEtag = etag(toCanonical(api.dataset))
+
   router.use((req, res, next) => {
     res.setLink(api.term.value, 'http://www.w3.org/ns/hydra/core#apiDocumentation')
 
     next()
   })
 
-  router.get(api.path, (req, res, next) => {
+  router.get(api.path,
+    preconditions({
+      async stateAsync() {
+        return {
+          etag: apiEtag
+        }
+      }
+    }),
+    (req, res, next) => {
+
+    res.setHeader('cache-control', 'max-age=30, stale-while-revalidate=30')
+    res.setHeader('etag', apiEtag)
     res.dataset(api.dataset).catch(next)
   })
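
For readers unfamiliar with conditional requests, here is a rough, dependency-free sketch of what the ETag approach in the diff above achieves. It does not use `etag`, `express-preconditions` or `rdf-dataset-ext`; the functions `computeEtag`, `isFresh` and `respond` are hypothetical, the hash format differs from what the `etag` package produces, and the `If-None-Match` parsing is simplified.

```javascript
const { createHash } = require('node:crypto')

// Strong ETag derived from a canonical serialization of the documentation.
function computeEtag (canonicalBody) {
  const hash = createHash('sha1').update(canonicalBody).digest('base64')
  return `"${hash}"`
}

// True when the If-None-Match header matches the current ETag.
// Uses weak comparison: a leading W/ is ignored on both sides.
function isFresh (ifNoneMatch, etag) {
  if (!ifNoneMatch) return false
  if (ifNoneMatch.trim() === '*') return true
  const strip = tag => tag.trim().replace(/^W\//, '')
  return ifNoneMatch.split(',').map(strip).includes(strip(etag))
}

// Decide between a full 200 response and an empty 304 revalidation response.
function respond (requestHeaders, canonicalBody) {
  const etag = computeEtag(canonicalBody)
  const headers = {
    'cache-control': 'max-age=30, stale-while-revalidate=30',
    etag
  }
  if (isFresh(requestHeaders['if-none-match'], etag)) {
    return { status: 304, headers, body: null }
  }
  return { status: 200, headers, body: canonicalBody }
}
```

With these headers a cache may reuse the response for 30 seconds, then serve it stale for up to another 30 seconds while it revalidates. Revalidation costs only a 304 as long as the dataset is unchanged.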

There is no single way to configure caching, and some APIs may choose not to cache at all. I was thinking that maybe hydra-box could introduce an extension point for plugging in middleware before the get(api.path) handler. Something like:

-function factory (api) {
+function factory (api, ...beforeApi) {

-  router.get(api.path, (req, res, next) => {
+  router.get(api.path, ...beforeApi, (req, res, next) => {
    res.dataset(api.dataset).catch(next)
  })
}

For the configuration above, I would provide the preconditions middleware and a second middleware that sets the cache-control and etag headers to my liking.
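
To make the proposed signature concrete, here is a minimal sketch that runs without Express. The `chain` function is a stand-in for how `router.get()` runs its handlers in order, `setCacheHeaders` is a hypothetical user-supplied middleware, and the `api` and `res` objects are simplified placeholders rather than hydra-box's real ones.

```javascript
// Runs Express-style handlers (req, res, next) in order; a stand-in for router.get().
function chain (...handlers) {
  return (req, res) => {
    let i = 0
    const next = err => {
      if (err) throw err
      const handler = handlers[i++]
      if (handler) handler(req, res, next)
    }
    next()
  }
}

// Proposed signature: extra middleware runs before the documentation handler.
function factory (api, ...beforeApi) {
  return chain(...beforeApi, (req, res) => {
    res.body = api.dataset
  })
}

// Hypothetical middleware supplied by the API author to set caching headers.
const setCacheHeaders = (req, res, next) => {
  res.headers['cache-control'] = 'max-age=30, stale-while-revalidate=30'
  next()
}

const handler = factory({ dataset: 'triples' }, setCacheHeaders)
const res = { headers: {} }
handler({}, res)
console.log(res)
```

The design keeps hydra-box free of any caching policy: an API that wants no caching passes no extra middleware and gets the current behaviour.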
