This is a REST endpoint that permits the in-built Lucene++ index on Synology NAS instances to be queried remotely.
The output can then be consumed by a client such as synology-lucene-client-ui, which lets you query and retrieve the files from a browser.
This is an alpha release. There are probably issues with threading and deadlocks. Caveat emptor.
The Synology file index service creates a bunch of files under /<volume>/<share>/@eaDir/[email protected]. Poking around the files tells you these are generated by Lucene++ (v3.0.7 on my system). A summary of these file types can be found below, taken from https://lucene.apache.org/core/2_9_4/fileformats.html:
Name | Extension | Brief Description |
---|---|---|
Segments File | segments.gen, segments_N | Stores information about segments |
Lock File | write.lock | The Write lock prevents multiple IndexWriters from writing to the same file. |
Compound File | .cfs | An optional "virtual" file consisting of all the other index files for systems that frequently run out of file handles. |
Fields | .fnm | Stores information about the fields |
Field Index | .fdx | Contains pointers to field data |
Field Data | .fdt | The stored fields for documents |
Term Infos | .tis | Part of the term dictionary, stores term info |
Term Info Index | .tii | The index into the Term Infos file |
Frequencies | .frq | Contains the list of docs which contain each term along with frequency |
Positions | .prx | Stores position information about where a term occurs in the index |
Norms | .nrm | Encodes length and boost factors for docs and fields |
Term Vector Index | .tvx | Stores offset into the document data file |
Term Vector Documents | .tvd | Contains information about each document that has term vectors |
Term Vector Fields | .tvf | The field level info about term vectors |
Deleted Documents | .del | Info about what files are deleted |
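When poking around an index directory, it can help to label each file using the table above. The following is a minimal sketch, not part of this project: the function name `classify_index_file` and the sample file names in the usage note are hypothetical, and the descriptions are simply copied from the table.

```python
from pathlib import Path

# Names copied from the table above, keyed by file extension.
LUCENE_FILE_TYPES = {
    ".cfs": "Compound File",
    ".fnm": "Fields",
    ".fdx": "Field Index",
    ".fdt": "Field Data",
    ".tis": "Term Infos",
    ".tii": "Term Info Index",
    ".frq": "Frequencies",
    ".prx": "Positions",
    ".nrm": "Norms",
    ".tvx": "Term Vector Index",
    ".tvd": "Term Vector Documents",
    ".tvf": "Term Vector Fields",
    ".del": "Deleted Documents",
}


def classify_index_file(name: str) -> str:
    """Return the table's name for an index file, or "Unknown"."""
    # The lock and segments files are matched by full name, not by extension.
    if name == "write.lock":
        return "Lock File"
    if name == "segments.gen" or name.startswith("segments_"):
        return "Segments File"
    return LUCENE_FILE_TYPES.get(Path(name).suffix, "Unknown")
```

For example, `classify_index_file("_0.cfs")` returns `"Compound File"` and `classify_index_file("segments_2")` returns `"Segments File"`; both file names are made up for illustration.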
Luke 4.3.0 can be used to read the Lucene++ indices, which lets us extract the fields stored in them. I've mapped them across as follows (the last column is my attempt to describe what each field stores):
Field name within Synology Lucene | Name within this library | Description / format |
---|---|---|
SYNODriveFileID | driveFileID | empty |
SYNODriveFileLabel | driveFileLabel | empty |
SYNODriveFileStar | driveFileStar | empty |
SYNOMDAcquisitionMake | acquisitionMake | empty |
SYNOMDAcquisitionModel | acquisitionModel | empty |
SYNOMDAttributeChangeDate | attributeChangeDate | Unix epoch sec |
SYNOMDAuthors | authors | eg Jane Doe |
SYNOMDCity | city | empty |
SYNOMDContentModificationDate | contentModificationDate | Unix epoch sec |
SYNOMDContributors | contributors | empty |
SYNOMDCopyright | copyright | empty |
SYNOMDCountry | country | empty |
SYNOMDCoverage | coverage | empty |
SYNOMDCreator | creator | eg Microsoft Office 2010 |
SYNOMDDateAdded | dateAdded | Unix epoch sec |
SYNOMDDescription | description | empty |
SYNOMDDisplayName | displayName | File name without path |
SYNOMDDocInfo.SYNOMDPageLengthVector | docInfo | Character count per page eg 1280 1820 ... |
SYNOMDExtension | extension | eg docx |
SYNOMDFSContentChangeDate | fsContentChangeDate | Unix epoch sec |
SYNOMDFSCreationDate | fsCreationDate | Unix epoch sec |
SYNOMDFSName | fsName | File name without path |
SYNOMDFSSize | fsSize | Size in bytes |
SYNOMDFinderOpenDate | finderOpenDate | Unix epoch sec |
SYNOMDHeadline | headline | empty |
SYNOMDIdentifier | identifier | empty |
SYNOMDIsDir | isDir | String y / n |
SYNOMDKeywords | keywords | empty |
SYNOMDKind | kind | eg docx |
SYNOMDLanguages | languages | empty |
SYNOMDLastUsedDate | lastUsedDate | Unix epoch sec |
SYNOMDLogicalSize | logicalSize | Size in bytes |
SYNOMDOwnerGroupID | ownerGroupID | Unix GID |
SYNOMDOwnerUserID | ownerUserID | Unix UID |
SYNOMDParent | parent | eg /volume1/sharename |
SYNOMDPath | path | Full path to file eg /volume1/sharename/file.docx |
SYNOMDPrivilege | privilege | Unix privs string eg rwxrwx--- |
SYNOMDPublishers | publishers | empty |
SYNOMDRights | rights | empty |
SYNOMDSearchAncestor | searchAncestor | empty |
SYNOMDSearchFileName | searchFileName | empty |
SYNOMDTextContent | textContent | Full text of document |
SYNOMDTitle | title | empty |
SYNOMDWildcard | wildcard | empty |
SYNOStateOrProvince | stateOrProvince | empty |
_SYNOMDFinderLabel | sysFinderLabel | eg 0 |
_SYNOMDGroupId | sysGroupId | Unix GID |
_SYNOMDUserTags | sysUserTags | empty |
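The formats in the last column suggest how raw values could be turned into typed ones. The sketch below is an illustration only, not this library's code: it assumes each value arrives as a string keyed by the Synology field name, and the helper `decode_field` is a hypothetical name.

```python
from datetime import datetime, timezone

# Field names taken from the table above.
EPOCH_FIELDS = {
    "SYNOMDAttributeChangeDate",
    "SYNOMDContentModificationDate",
    "SYNOMDDateAdded",
    "SYNOMDFSContentChangeDate",
    "SYNOMDFSCreationDate",
    "SYNOMDFinderOpenDate",
    "SYNOMDLastUsedDate",
}
SIZE_FIELDS = {"SYNOMDFSSize", "SYNOMDLogicalSize"}


def decode_field(name: str, raw: str):
    """Convert a raw string value to a typed one (assumes string input)."""
    if raw == "":
        return None  # many fields are listed as empty in the table
    if name in EPOCH_FIELDS:
        # Unix epoch seconds -> timezone-aware UTC datetime
        return datetime.fromtimestamp(int(raw), tz=timezone.utc)
    if name in SIZE_FIELDS:
        return int(raw)  # size in bytes
    if name == "SYNOMDIsDir":
        return raw == "y"  # stored as the string y / n
    if name == "SYNOMDDocInfo.SYNOMDPageLengthVector":
        # space-separated character counts, one per page
        return [int(count) for count in raw.split()]
    return raw
```

With this, `decode_field("SYNOMDIsDir", "n")` gives `False` and `decode_field("SYNOMDDocInfo.SYNOMDPageLengthVector", "1280 1820")` gives `[1280, 1820]`.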
The REST API provides two endpoints: /search (for performing the search), and /get (for retrieving the doc).
- Runs in Docker
- Exposes search interface on port 18080
docker build . -t synosearch
To find relevant paths within volumeX:
find /volumeX -maxdepth 1 -mindepth 1 -type d ! -name "@*"
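If you would rather do the same lookup from a script, the find command above can be approximated in Python. This is a sketch under the assumption that the volume root is readable by the script; `list_share_dirs` is a hypothetical helper name.

```python
from pathlib import Path


def list_share_dirs(volume: str) -> list[str]:
    """Approximate: find <volume> -maxdepth 1 -mindepth 1 -type d ! -name "@*"."""
    return sorted(
        str(entry)
        for entry in Path(volume).iterdir()
        # Real directories only (find's -type d does not match symlinks),
        # skipping names that start with "@" such as @eaDir.
        if entry.is_dir()
        and not entry.is_symlink()
        and not entry.name.startswith("@")
    )
```

Calling `list_share_dirs("/volumeX")` returns the candidate share directories as a sorted list of paths.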
Important: do not deploy this version on any production system. The /get endpoint returns any document with no permission checks, and no filtering is currently applied to the search.
To run and expose a share /volume1/dropbox, execute as follows:
nas$> docker run -p 18080:18080 \
-v /volume1/dropbox:/volume1/dropbox:ro \
-v /volume1/dropbox/@eaDir/[email protected]:/indices/dropbox:ro \
synosearch -index /indices/dropbox
Then you should be able to hit the /search endpoint:
curl "http://hostname:18080/search?q=example" | jq '.'
This will then return search results as JSON:
{
  "hits": [
    {
      "path": "/volume1/dropbox/ExampleNotificationPlugin.java",
      "fs_size": 995,
      "score": 1.72329,
      "extension": "java"
    }
  ],
  "total_hits": 1
}
Then you can run the synology-lucene-client-ui on top of this to provide a friendly UI.
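For scripted access rather than a UI, the /search response can be consumed directly. The sketch below assumes only the `q` parameter and the JSON keys shown in the example response above (`hits`, `path`, `fs_size`, `score`); the helper names `build_search_url` and `summarize_hits` are hypothetical, and other response fields may exist.

```python
import json
from urllib.parse import quote, urlencode


def build_search_url(host: str, query: str, port: int = 18080) -> str:
    """Build a /search URL, percent-encoding the query string."""
    params = urlencode({"q": query}, quote_via=quote)
    return f"http://{host}:{port}/search?{params}"


def summarize_hits(body: str) -> list[str]:
    """Turn a /search JSON response body into one summary line per hit."""
    data = json.loads(body)
    return [
        f"{hit['score']:.2f}  {hit['path']} ({hit['fs_size']} bytes)"
        for hit in data["hits"]
    ]
```

To fetch live results, pass the URL to `urllib.request.urlopen` and feed the decoded body to `summarize_hits`. Given the example response above, `summarize_hits` would produce a single line for ExampleNotificationPlugin.java.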