Skip to content

Latest commit

 

History

History
81 lines (50 loc) · 3.2 KB

README.asciidoc

File metadata and controls

81 lines (50 loc) · 3.2 KB

repocache

Http and https cache proxy implementation with a goal to make work with online refreshed tools reproducible.

Can be used to proxy installers, maven and p2 repositories. Amd maybe many others.

Goal

To create a local repository cache of various repositories that are used when installing development environment or doing auto-builds. The local cache created is used for these purposes:

  • Speed up download time and reduce bandwith - using the cache which is on the same computer or in the same office.

  • Make it possible to work offline with tools that are designed online only.

  • Make past builds reproducible even if the Internet has forgot some of the old packages. For projects where binary reproducibility is a requirement.

Implementation details

Semi transparent http and https proxy

All programs have to be set up to use the proxy server when accessing http and https. The repocache server returns the resources from the cache (which are updated if configured so).

In order to use the https proxy feature the fake root CA certificate of the repo cache has to be installed onto the client because the proxy itself is a MITM.

GIT backend

All files accessed through the cache are first stored in a git repository then served to the client. This has the following features:

  • It is never possible to return anything by the server not in the cache.

  • Going back to a past state is possible.

Files are just plain files within the git repo

The resulted git repo is just the accessed content from the web in folders representing the servers and paths.

Merge feature

Automatic merge feature for p2 repos is planned.

Folder listing rewrite

When a listing is downloaded then all the internal links are rewritten to relative. Rewriting is done before storing the page.

Downloading http://targettocache/a/b/: http://targettocache/a/b/c → "c" http://targettocache/a/ → ".." /a/b/c → "c" /a/ → "../"

Crawl subtree

Automatic crawling of subtree (and thus caching all in the subtree) is implemented. Crawling uses HTML listing of files in the folders. (If the cached server implements such listing then it will work.)

Crawling can be started using the crawl link on the bottom part of the listing.

Modes of operation by server and resource path

Possible to set up the server to refresh some resources while keeping others read-only. It is useful when only a set of repositories are to be refreshed.

Configuration folder layout

path/to/config/
               certs/ - cert creation scripts
               certs/public/ - certificates
               certs/keys/ - private keys and additional data for certificates
               access.config - access mode configuration
               client-alias.config - client side aliasing
               plugins.config - plugins configuration

HTTPS proxy

  • Use the configuration page of the repo cache management web page to generate root certificate

  • download the certificate from the configuration page

  • install the certificate into the https clients

  • set up the client to use the https proxy