Search suffix tree implementation #51954
Open · hannojg wants to merge 21 commits into `Expensify:main` from `margelo:perf/search-tree` (+750 −7)
Commits (all by hannojg):

- `e90556d` Revert "Revert "Revert "Revert "Search suffix tree implementation""""
- `caa7dc5` exclude comma from search values
- `a2d8012` wip: refactoring test to be reusable
- `9ed2253` Revert "wip: refactoring test to be reusable"
- `77d200c` Merge branch 'main' of github.com:Expensify/App into perf/search-tree
- `a01a375` fix: sort search results correctly
- `80d8065` Merge branch 'main' of github.com:Expensify/App into perf/search-tree
- `1e02c82` cleanup option list
- `c73aad5` fix duplicate search results
- `dd52d6a` Merge branch 'main' of github.com:Expensify/App into perf/search-tree
- `6a7b7e8` eslint
- `253d17b` wip
- `0c2fb05` Merge branch 'main' of github.com:Expensify/App into perf/search-tree
- `2c856b4` fixes after merge
- `84873d8` wip: use fast search in SearchRouterList
- `1f64b57` cleanup tests
- `928d330` add `useFastSearchFromOptions` hook
- `eebf638` remove comment
- `31ae881` remove unnecessary test case
- `9cba20a` add docs
- `555e884` remove obsolete test
`@@ -0,0 +1,85 @@` (new file)

```ts
import {useMemo} from 'react';
import FastSearch from '@libs/FastSearch';
import * as OptionsListUtils from '@libs/OptionsListUtils';

type AllOrSelectiveOptions = OptionsListUtils.ReportAndPersonalDetailOptions | OptionsListUtils.Options;

type Options = {
    includeUserToInvite: boolean;
};

// You can either use this to search within report and personal details options
function useFastSearchFromOptions(
    options: OptionsListUtils.ReportAndPersonalDetailOptions,
    config: {includeUserToInvite: false},
): (searchInput: string) => OptionsListUtils.ReportAndPersonalDetailOptions;
// Or you can use this to include the user invite option. This requires passing all options
function useFastSearchFromOptions(options: OptionsListUtils.Options, config: {includeUserToInvite: true}): (searchInput: string) => OptionsListUtils.Options;

/**
 * Hook for making options from OptionsListUtils searchable with FastSearch.
 * Builds a suffix tree and returns a function to search in it.
 *
 * @example
 * ```
 * const options = OptionsListUtils.getSearchOptions(...);
 * const filterOptions = useFastSearchFromOptions(options, {includeUserToInvite: false});
 * const filteredOptions = filterOptions(searchInput);
 * ```
 */
function useFastSearchFromOptions(
    options: OptionsListUtils.ReportAndPersonalDetailOptions | OptionsListUtils.Options,
    {includeUserToInvite}: Options = {includeUserToInvite: false},
): (searchInput: string) => AllOrSelectiveOptions {
    // Building the suffix tree is expensive, so it is only rebuilt when the options or the config change
    const findInSearchTree = useMemo(() => {
        const fastSearch = FastSearch.createFastSearch([
            {
                data: options.personalDetails,
                toSearchableString: (option) => {
                    const displayName = option.participantsList?.[0]?.displayName ?? '';
                    // Only index the display name if it differs from the login. join() separates the values
                    // with ',', which FastSearch skips when indexing.
                    return [option.login ?? '', option.login !== displayName ? displayName : ''].join();
                },
            },
            {
                data: options.recentReports,
                toSearchableString: (option) => {
                    const searchStringForTree = [option.text ?? '', option.login ?? ''];

                    // Threads are also searchable by their alternate text; chat rooms and policy expense chats by their subtitle
                    if (option.isThread) {
                        if (option.alternateText) {
                            searchStringForTree.push(option.alternateText);
                        }
                    } else if (!!option.isChatRoom || !!option.isPolicyExpenseChat) {
                        if (option.subtitle) {
                            searchStringForTree.push(option.subtitle);
                        }
                    }

                    return searchStringForTree.join();
                },
            },
        ]);

        function search(searchInput: string): AllOrSelectiveOptions {
            // FastSearch returns one result list per dataset, in the order the datasets were passed in
            const [personalDetails, recentReports] = fastSearch.search(searchInput);

            // The `in` check narrows `options` to the full OptionsListUtils.Options shape
            if (includeUserToInvite && 'currentUserOption' in options) {
                const userToInvite = OptionsListUtils.filterUserToInvite(options, searchInput);
                return {
                    personalDetails,
                    recentReports,
                    userToInvite,
                    currentUserOption: options.currentUserOption,
                };
            }

            return {
                personalDetails,
                recentReports,
            };
        }

        return search;
    }, [includeUserToInvite, options]);

    return findInSearchTree;
}

export default useFastSearchFromOptions;
```
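The two overload signatures in the hook let the literal value of `includeUserToInvite` pick the return type. The following is a minimal, self-contained sketch of that pattern under stated assumptions: `BasicOptions`, `FullOptions` and `makeFilter` are hypothetical stand-ins (not the real `OptionsListUtils` types), a plain case-insensitive `includes` scan replaces FastSearch, and the `userToInvite` rule is invented purely for illustration.

```typescript
// Hypothetical stand-ins for OptionsListUtils.ReportAndPersonalDetailOptions and OptionsListUtils.Options
type BasicOptions = {
    personalDetails: string[];
    recentReports: string[];
};

type FullOptions = BasicOptions & {
    currentUserOption: string;
    userToInvite?: string;
};

// Overload 1: without the invite option, the basic shape is accepted and returned
function makeFilter(options: BasicOptions, config: {includeUserToInvite: false}): (searchInput: string) => BasicOptions;
// Overload 2: with the invite option, the full shape is required and returned
function makeFilter(options: FullOptions, config: {includeUserToInvite: true}): (searchInput: string) => FullOptions;
// Implementation signature: not visible to callers, must be compatible with both overloads
function makeFilter(
    options: BasicOptions | FullOptions,
    {includeUserToInvite}: {includeUserToInvite: boolean} = {includeUserToInvite: false},
): (searchInput: string) => BasicOptions | FullOptions {
    return (searchInput: string) => {
        const needle = searchInput.toLowerCase();
        // Plain substring scan, standing in for the suffix tree lookup
        const matches = (value: string) => value.toLowerCase().includes(needle);
        const personalDetails = options.personalDetails.filter(matches);
        const recentReports = options.recentReports.filter(matches);

        // The `in` check narrows `options` to FullOptions
        if (includeUserToInvite && 'currentUserOption' in options) {
            return {
                personalDetails,
                recentReports,
                currentUserOption: options.currentUserOption,
                // Hypothetical rule: offer an invite only when no personal detail matched
                userToInvite: personalDetails.length === 0 ? searchInput : undefined,
            };
        }
        return {personalDetails, recentReports};
    };
}
```

Because callers only see the overload signatures, passing options without `currentUserOption` together with `{includeUserToInvite: true}` fails to compile, and so does omitting `config` entirely, even though the implementation signature declares a default. The same applies to the two overloads of `useFastSearchFromOptions`.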
`@@ -0,0 +1,140 @@` (new file)

```ts
/* eslint-disable rulesdir/prefer-at */
import CONST from '@src/CONST';
import Timing from './actions/Timing';
import SuffixUkkonenTree from './SuffixUkkonenTree';

type SearchableData<T> = {
    /**
     * The data that should be searchable
     */
    data: T[];
    /**
     * A function that generates a string from a data entry. The string's value is used for searching.
     * If multiple fields should be searchable, concatenate them into one string and return it.
     */
    toSearchableString: (data: T) => string;
};

// Some characters appear very often in our search data (e.g. email addresses) but don't need to be searchable, so they are skipped:
const charSetToSkip = new Set(['@', '.', '#', '$', '%', '&', '*', '+', '-', '/', ':', ';', '<', '=', '>', '?', '_', '~', '!', ' ', ',']);

/**
 * Creates a new "FastSearch" instance. "FastSearch" uses a suffix tree to search for substrings in a list of strings.
 * You can provide multiple datasets; the search returns results for each dataset separately.
 *
 * Note: Creating a FastSearch instance with a lot of data is computationally expensive, so create an instance once and reuse it.
 * Searches are very fast though, even with a lot of data.
 */
function createFastSearch<T>(dataSets: Array<SearchableData<T>>) {
    Timing.start(CONST.TIMING.SEARCH_CONVERT_SEARCH_VALUES);
    const maxNumericListSize = 400_000;
    // The user might provide multiple data sets, but internally, the search values are stored in this one list.
    // Typed arrays silently ignore writes past their length, so data beyond maxNumericListSize entries is dropped:
    let concatenatedNumericList = new Uint8Array(maxNumericListSize);
    // Here we store the index of the data item in the original data list, so we can map the found occurrences back to the original data:
    const occurrenceToIndex = new Uint32Array(maxNumericListSize * 4);
    // As we are working with ArrayBuffers, we need to keep track of the current offset:
    const offset = {value: 1};
    // We store the end offset of each dataSet, so we can map the found occurrences to the correct dataSet:
    const listOffsets: number[] = [];

    for (const {data, toSearchableString} of dataSets) {
        // Performance critical: the array parameters are passed by reference, so we don't have to create new arrays every time:
        dataToNumericRepresentation(concatenatedNumericList, occurrenceToIndex, offset, {data, toSearchableString});
        listOffsets.push(offset.value);
    }
    // Terminate the list with the end character, which counts towards the last dataset:
    concatenatedNumericList[offset.value++] = SuffixUkkonenTree.END_CHAR_CODE;
    listOffsets[listOffsets.length - 1] = offset.value;
    Timing.end(CONST.TIMING.SEARCH_CONVERT_SEARCH_VALUES);

    // The list might be larger than necessary, so we clamp it to the actual size:
    concatenatedNumericList = concatenatedNumericList.slice(0, offset.value);

    // Create & build the suffix tree:
    Timing.start(CONST.TIMING.SEARCH_MAKE_TREE);
    const tree = SuffixUkkonenTree.makeTree(concatenatedNumericList);
    Timing.end(CONST.TIMING.SEARCH_MAKE_TREE);

    Timing.start(CONST.TIMING.SEARCH_BUILD_TREE);
    tree.build();
    Timing.end(CONST.TIMING.SEARCH_BUILD_TREE);

    /**
     * Searches for the given input and returns results for each dataset.
     */
    function search(searchInput: string): T[][] {
        const cleanedSearchString = cleanString(searchInput);
        const {numeric} = SuffixUkkonenTree.stringToNumeric(cleanedSearchString, {
            charSetToSkip,
            // stringToNumeric might return a list that is larger than necessary, so we clamp it to the actual size
            // (otherwise the unused trailing entries would become part of the search term and the search could fail):
            clamp: true,
        });
        const result = tree.findSubstring(Array.from(numeric));

        // A Set per dataset de-duplicates items that match at more than one position:
        const resultsByDataSet = Array.from({length: dataSets.length}, () => new Set<T>());
        // eslint-disable-next-line @typescript-eslint/prefer-for-of
        for (let i = 0; i < result.length; i++) {
            const occurrenceIndex = result[i];
            const itemIndexInDataSet = occurrenceToIndex[occurrenceIndex];
            // listOffsets holds each dataset's end offset, so the first one greater than the occurrence identifies its dataset:
            const dataSetIndex = listOffsets.findIndex((listOffset) => occurrenceIndex < listOffset);

            if (dataSetIndex === -1) {
                throw new Error(`[FastSearch] The occurrence index ${occurrenceIndex} is not in any dataset`);
            }
            const item = dataSets[dataSetIndex].data[itemIndexInDataSet];
            if (!item) {
                throw new Error(`[FastSearch] The item with index ${itemIndexInDataSet} in dataset ${dataSetIndex} is not defined`);
            }
            resultsByDataSet[dataSetIndex].add(item);
        }

        return resultsByDataSet.map((set) => Array.from(set));
    }

    return {
        search,
    };
}

/**
 * The suffix tree can only store string-like values, which it internally represents as numbers.
 * This function converts the user data (most likely objects) to that numeric representation.
 * Additionally, it records, for each position in the numeric list, the index of the data item it came from (`occurrenceToIndex`),
 * which is used to map found occurrences back to the original data.
 */
function dataToNumericRepresentation<T>(concatenatedNumericList: Uint8Array, occurrenceToIndex: Uint32Array, offset: {value: number}, {data, toSearchableString}: SearchableData<T>): void {
    data.forEach((option, index) => {
        const searchStringForTree = toSearchableString(option);
        const cleanedSearchStringForTree = cleanString(searchStringForTree);

        // Items without any searchable text are not added to the tree:
        if (cleanedSearchStringForTree.length === 0) {
            return;
        }

        SuffixUkkonenTree.stringToNumeric(cleanedSearchStringForTree, {
            charSetToSkip,
            out: {
                outArray: concatenatedNumericList,
                offset,
                outOccurrenceToIndex: occurrenceToIndex,
                index,
            },
        });
        // Separate this item from the next one with the delimiter character:
        // eslint-disable-next-line no-param-reassign
        occurrenceToIndex[offset.value] = index;
        // eslint-disable-next-line no-param-reassign
        concatenatedNumericList[offset.value++] = SuffixUkkonenTree.DELIMITER_CHAR_CODE;
    });
}

/**
 * Everything in the tree is treated as lowercase.
 */
function cleanString(input: string) {
    return input.toLowerCase();
}

const FastSearch = {
    createFastSearch,
};

export default FastSearch;
```
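FastSearch concatenates every searchable string from every dataset into one list, separates items with a delimiter, records which item each position belongs to (`occurrenceToIndex`) and where each dataset ends (`listOffsets`), and then maps every match position back through those tables, de-duplicating with a `Set`. The sketch below reproduces that bookkeeping under simplifying assumptions: it works on plain strings instead of `Uint8Array`s, it uses a naively built suffix array with binary search instead of the Ukkonen suffix tree (so construction is far slower), it assumes skipped characters are simply dropped from both the indexed text and the query, and `createNaiveFastSearch`, `normalize` and the shorter skip set are hypothetical, not part of the PR.

```typescript
type SearchableData<T> = {
    data: T[];
    toSearchableString: (data: T) => string;
};

// Assumption: skipped characters are removed from both the indexed strings and the query
const charSetToSkip = new Set(['@', '.', '-', '_', ' ', ',']);
// Separates items so that a match can never span two entries
const DELIMITER = '\u0001';

// Lowercases the input and drops every character listed in charSetToSkip
function normalize(input: string): string {
    return input
        .toLowerCase()
        .split('')
        .filter((char) => !charSetToSkip.has(char))
        .join('');
}

function createNaiveFastSearch<T>(dataSets: Array<SearchableData<T>>) {
    let text = '';
    // Position in `text` -> index of the item (within its dataset) that produced it
    const positionToItem: number[] = [];
    // Exclusive end position of each dataset inside `text`
    const dataSetEnds: number[] = [];

    for (const {data, toSearchableString} of dataSets) {
        data.forEach((item, index) => {
            const chunk = normalize(toSearchableString(item)) + DELIMITER;
            for (let i = 0; i < chunk.length; i++) {
                positionToItem.push(index);
            }
            text += chunk;
        });
        dataSetEnds.push(text.length);
    }

    // Naive suffix array: every suffix start position, sorted by comparing the suffixes' UTF-16 code units.
    // This costs roughly O(n^2 log n), unlike Ukkonen's linear-time tree construction.
    const suffixArray = Array.from({length: text.length}, (_, i) => i).sort((a, b) => (text.slice(a) < text.slice(b) ? -1 : 1));

    function search(searchInput: string): T[][] {
        const query = normalize(searchInput);
        // One Set per dataset de-duplicates items that match at several positions
        const results = dataSets.map(() => new Set<T>());
        // This sketch treats an empty query as matching nothing
        if (query.length === 0) {
            return results.map((set) => Array.from(set));
        }
        // Binary search for the first suffix that is >= query
        let low = 0;
        let high = suffixArray.length;
        while (low < high) {
            const mid = (low + high) >> 1;
            if (text.slice(suffixArray[mid]) < query) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        // Every suffix that starts with the query sits in one contiguous run beginning at `low`
        for (let i = low; i < suffixArray.length && text.startsWith(query, suffixArray[i]); i++) {
            const position = suffixArray[i];
            // The first dataset whose end lies beyond the match position owns the match
            const dataSetIndex = dataSetEnds.findIndex((end) => position < end);
            results[dataSetIndex].add(dataSets[dataSetIndex].data[positionToItem[position]]);
        }
        return results.map((set) => Array.from(set));
    }

    return {search};
}
```

Both a suffix tree and a suffix array locate every occurrence of a query without scanning all items: the tree in time proportional to the query length plus the number of matches, the array with an extra logarithmic factor for its binary search. That is why the up-front build cost is worth paying once and reusing, as the `createFastSearch` doc comment advises.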
should this be a CONST somewhere/do we foresee this list growing?
hm, not really sure if this list would grow or be used anywhere else 🤔
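One way to address the question above, assuming it refers to `charSetToSkip` (the line the comment was anchored to isn't preserved here), is to move the characters into a shared read-only constant and build the `Set` from it once. This is a hedged sketch, not what the PR does: the local `CONST` object stands in for the app's `@src/CONST` module, and `SEARCH_SKIPPED_CHARS`, `SkippedChar` and `isSkippedChar` are hypothetical names.

```typescript
// Hypothetical stand-in for a key in the app's CONST module; the real object may be organized differently
const CONST = {
    SEARCH_SKIPPED_CHARS: ['@', '.', '#', '$', '%', '&', '*', '+', '-', '/', ':', ';', '<', '=', '>', '?', '_', '~', '!', ' ', ','],
} as const;

// Union of the skipped characters, derived from the tuple so type and data stay in sync
type SkippedChar = (typeof CONST.SEARCH_SKIPPED_CHARS)[number];

// Built once at module load and typed as read-only so callers cannot mutate it
const charSetToSkip: ReadonlySet<string> = new Set<string>(CONST.SEARCH_SKIPPED_CHARS);

// Type guard that narrows a single character to SkippedChar
function isSkippedChar(char: string): char is SkippedChar {
    return charSetToSkip.has(char);
}
```

Deriving `SkippedChar` from the `as const` tuple means that if the list does grow, the type and the runtime set change together.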