Is your feature request related to a problem? Please describe.
It would be very helpful to document the incremental indexing process in detail.
Since the source is version-controlled outside OpenGrok, users need to understand what precautions to take when managing and updating the source code.
Describe the solution you'd like
A clear and concise document describing the incremental re-indexing process, covering at least:
source file update detection mechanism
Describe alternatives you've considered
None
Additional context
None
Reindexing consists of two steps:
history cache update: happens via Indexer#prepareIndexer()
index update: happens via Indexer#doIndexerExecution()
history cache update
Assuming the directory /var/opengrok/data/ is the data root and foo is the project being indexed, with its source consisting of a single file called file.txt, the history cache directory will have these contents:
$ ls /var/opengrok/data/historycache/foo/
file.txt.gz OpenGrokDirHist.gz OpenGroklatestRev
file.txt.gz is a compressed XML representation of the History object that contains the history of file.txt
OpenGrokDirHist.gz contains a History object with the history of the whole top-level directory of project foo
OpenGroklatestRev is a plain text file containing the revision ID of the latest indexed revision of the repository
HistoryGuru#createCacheReal() is the main workhorse. For VCS implementations based on changesets, it takes the revision stored in OpenGroklatestRev and calls Repository#createCache(), which in turn calls getHistory() with the latest indexed changeset ID. This method is overridden for certain repositories (such as Git, Mercurial and others) to make this efficient. FileHistoryCache#store() then takes the changesets and constructs an inverse map that maps each file to the changesets in which it was changed. This way it is not necessary to retrieve history for each file individually, just for the project's top-level directory. doFileHistory() deals with merging the already existing history with the newly added history for a given file.
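The inverse map idea can be illustrated with a minimal sketch. This is not the code in FileHistoryCache#store(); the class name InverseHistoryMap, the method invert() and the use of plain strings for changesets and file paths are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a hypothetical helper, not OpenGrok's FileHistoryCache.
class InverseHistoryMap {

    /**
     * Inverts "changeset -> files changed" into "file -> changesets".
     * The iteration order of the input is preserved in each per-file list.
     */
    static Map<String, List<String>> invert(Map<String, List<String>> filesByChangeset) {
        Map<String, List<String>> changesetsByFile = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : filesByChangeset.entrySet()) {
            for (String file : entry.getValue()) {
                // Create the per-file list on first sight, then append the changeset ID.
                changesetsByFile.computeIfAbsent(file, k -> new ArrayList<>())
                        .add(entry.getKey());
            }
        }
        return changesetsByFile;
    }

    public static void main(String[] args) {
        // Made-up history of a directory, newest changeset first.
        Map<String, List<String>> history = new LinkedHashMap<>();
        history.put("rev3", List.of("file.txt", "README"));
        history.put("rev2", List.of("file.txt"));
        history.put("rev1", List.of("README"));
        System.out.println(invert(history));
    }
}
```

A single pass over the directory-level history is enough to produce the per-file lists, which is why only one history query per project is needed.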
index update
Assuming the indexer is not doing a per-project index, it scans the whole source root (otherwise it would scan just the project directory under the source root). The indexer updates each project in parallel. The index update is done in IndexDatabase#update().
In indexDown() the source directory is traversed recursively and, for each file, its last-modified time stamp is compared with the UID of the corresponding Lucene term stored in the index. If the file is to be reindexed, this is done via removeFile() and later addFile() in indexParallel(). AnalyzerGuru#populateDocument() then puts all the data together (including history) and stores it in a Lucene document.
So, this is not really an incremental reindex since it needs to traverse the whole directory tree. #3077 tracks the enhancement to use VCS to avoid that.
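To make the cost of the time stamp based detection concrete, here is a rough sketch of the idea. It is a simplification, not the indexDown() code: a plain Map of previously recorded modification times stands in for the UIDs stored in the Lucene index, and the names ModifiedTimeScan and findStale() are made up for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: a hypothetical helper, not OpenGrok's IndexDatabase.
class ModifiedTimeScan {

    /**
     * Walks the whole tree under sourceRoot and returns the relative paths of
     * files that are new or whose last-modified time differs from the value
     * recorded at the previous indexing run (the assumed stand-in for the index).
     */
    static List<Path> findStale(Path sourceRoot, Map<Path, Long> indexedMtimes) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(sourceRoot)) {
            files = paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<Path> stale = new ArrayList<>();
        for (Path file : files) {
            Path relative = sourceRoot.relativize(file);
            long mtime = Files.getLastModifiedTime(file).toMillis();
            Long indexed = indexedMtimes.get(relative);
            // Unknown file or changed time stamp: the file has to be (re)indexed.
            if (indexed == null || indexed != mtime) {
                stale.add(relative);
            }
        }
        return stale;
    }

    public static void main(String[] args) throws IOException {
        // With an empty "index", every file under the given root is reported.
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println(findStale(root, new HashMap<>()));
    }
}
```

Every file is visited and stat-ed on each run, even when nothing changed, which is the overhead that a VCS-driven approach would avoid.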