Http and https cache proxy implementation with a goal to make work with online refreshed tools reproducible.
Can be used to proxy installers, maven and p2 repositories. Amd maybe many others.
To create a local repository cache of various repositories that are used when installing development environment or doing auto-builds. The local cache created is used for these purposes:
-
Speed up download time and reduce bandwith - using the cache which is on the same computer or in the same office.
-
Make it possible to work offline with tools that are designed online only.
-
Make past builds reproducible even if the Internet has forgot some of the old packages. For projects where binary reproducibility is a requirement.
All programs have to be set up to use the proxy server when accessing http and https. The repocache server returns the resources from the cache (which are updated if configured so).
In order to use the https proxy feature the fake root CA certificate of the repo cache has to be installed onto the client because the proxy itself is a MITM.
All files accessed through the cache are first stored in a git repository then served to the client. This has the following features:
-
It is never possible to return anything by the server not in the cache.
-
Going back to a past state is possible.
The resulted git repo is just the accessed content from the web in folders representing the servers and paths.
When a listing is downloaded then all the internal links are rewritten to relative. Rewriting is done before storing the page.
Downloading http://targettocache/a/b/: http://targettocache/a/b/c → "c" http://targettocache/a/ → ".." /a/b/c → "c" /a/ → "../"
Automatic crawling of subtree (and thus caching all in the subtree) is implemented. Crawling uses HTML listing of files in the folders. (If the cached server implements such listing then it will work.)
Crawling can be started using the crawl link on the bottom part of the listing.
Possible to set up the server to refresh some resources while keeping others read-only. It is useful when only a set of repositories are to be refreshed.
path/to/config/ certs/ - cert creation scripts certs/public/ - certificates certs/keys/ - private keys and additional data for certificates access.config - access mode configuration client-alias.config - client side aliasing plugins.config - plugins configuration
-
Use the configuration page of the repo cache management web page to generate root certificate
-
download the certificate from the configuration page
-
install the certificate into the https clients
-
set up the client to use the https proxy