-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Optimise site search #1069
ENH Optimise site search #1069
Conversation
90172d1
to
4122799
Compare
As long as this had an extension point to be able to change what it indexes then it should be ok, I guess. We rely on shorcodes being processed as those could bring in content that is not in any of the db fields directly and should be indexed for the given element, so for me it should support that, too. The list of excluded fields could be a map to iterate over, or two maps where one are fields and other are trailing strings like ID, Hash etc.. so that we don't create lists of comparisons that would be harder to manage. Could be config, too. |
6d455a6
to
611cff2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously since this is draft I'm mostly just doing a "is this approach okay" review - but I've highlighted a few things that will need to change eventually in any case.
I think this has to be opt in, since we're changing the way developers even define what is searchable for their blocks in the cms, let alone how. The general approach seems sensible - though I wonder if there will be projects that could benefit from this new config that won't just go straight to setting search_for_term_in_content
to false... but in lieu of a way to validate that ahead of time, I see no reason to not go ahead with this.
// Check whether the search term exists in the nested page content | ||
$pageContent = $siteTree->getElementsForSearch(); | ||
} else { | ||
$pageContent = $siteTree->getContentFromElementsForCmsSearch(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given we're just looking at DB fields directly here... is there a way we can craft this as a db query instead of using stripos
? I suspect that would be faster. Though if that's too complex to do right now we can always merge this PR and look at that as a potential enhancement down the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah ...... technically. Though it would involve working out what all of the fields to query are ahead of time, and then putting all of those into a big $filter
and putting :PartialMatch
on all the things. PartialMatch translates to SQL WHERE <field> LIKE '%<searchterm>%'
Feels much more like a future ENH to me rather than a do now thing. It may be faster and should def use less PHP + memory, though more database
eb993cd
to
af60a0d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll give this a more thorough review once we've resolved the current conversations about the approach - but it sounds like we're getting to a consensus now.
8549742
to
0c03596
Compare
0c03596
to
fc76a86
Compare
39de753
to
9a55042
Compare
9a55042
to
46fd44a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, works well locally.
There's a subtle side-effect where when rendering templates, if the title of the block isn't displayed (which is the default for most blocks), it won't be included in search - but with the new feature, it will, which is actually a slight improvement to search fidelity 😅
Issue #1067
There's a couple of bits to this PR:
A) Utilise eager loading - this gives a small performance gain of ~9%
B) Don't render elemental templates, instead only look at database values on the element (won't include any child DataObjects) - this gave a large performance gain, over 100% improvement
There's likely going to be a "loss in accuracy" (though may also be a gain) in exchange for the massive performance boost - the change in accuracy is a result of:
We'd also need to agree whether to make this an opt-in or an opt-out. In this PR I've got it as the more conservative opt-in, though I'd prefer opt-out as most people will likely not even realise it's an option available, and I given it's "only" cms page search I don't think any possible loss in accuracy really warrants the fact that page search really is horrible slow without this optimization (I've had people tell me about 10 second response times, and even sites timing out).
Options:
Benchmarks
Running a CMS search for "lorem" against 1,000 pages with 5 blocks each (total of 5,000 blocks) - data generated with https://github.com/emteknetnz/silverstripe-test-data - total of 3 runs - times are in seconds
render elements - eager load off - (current) - avg 5.81
5.92
5.63
5.87
render elements - eager load on - avg 5.35
5.35
5.21
5.50
unrendered - eager load off - avg 2.36
2.37
2.35
2.36
unrendered - eager load on - avg 2.15
2.09
2.24
2.13