-
Notifications
You must be signed in to change notification settings - Fork 823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NEW Search across multiple searchable fields by default #10382
NEW Search across multiple searchable fields by default #10382
Conversation
9b2340b
to
064ac46
Compare
be05fbc
to
468725f
Compare
I can also see an argument for having a configuration for the search filter used by the primary field (instead of using the filters defined in searchable_fields) - i am leaning towards it being worth implementing as part of this PR and if whoever reviews this agrees I'll be happy to do it. |
Any idea what the performance difference is between this method vs Max's original approach on large datasets? |
Not sure. I wouldn't expect there to be much difference though... in the PHP code the complexity is basically identical (loop through a list of fields, add each field to either a whereAny clause (in Max's case) or a subQuery's where clause (in my case). Max's approach also loops through each term in the search phrase which I guess adds extra complexity but unless your search phrase has thousands of terms I can't see that making a notable difference. The performance difference if any would come from the database handling the query - where Max's looks something like this (according to your comment in his PR):
This PR's SQL would look something like this (assuming a very simple scenario where there are no joins - note it has
The main difference here is that the search phrase hasn't been split out into its component terms in this PR, which could potentially have some performance benefit for large datasets - especially as the search phrase gains more and more terms. But I have done no actual performance testing. |
Ah right, thanks for clarifying, I'd expect the non-splitting words approach to have better performance, however I'd expect that splitting the words would be closer to how most people would expect search to work and give a better user experience? Seems like splitting words should be the default and possibly with a config option to not split words (at a per dataobject level?) for particularly large datasets with performance issues? |
Yeah that seems reasonable, and would be pretty easy to implement. Definitely the config should be at the /**
* The search filter to use when searching with the primary search field.
* If this is an empty string, the search filters configured for each field are used instead.
*/
private static string $primary_search_field_filter = 'PartialMatchFilter';
/**
* If true, the search phrase is split into individual terms, and checks all searchable fields for each search term.
* If false, all fields are checked for the entire search phrase as a whole.
*/
private static bool $primary_search_split_terms = true; |
Yeah that looks good |
Added search filter and split terms config, functionality, and tests. |
Added docs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main objective is to remove as much new API surface as possible. Seems like the primary way to configure this is via yml config, not subclassing, so the protected
methods should go private
.
The public
methods should also go private
where possible too, though with the use of anonymous functions it may make it hard as $this
may not work in some scenarios, though try your best to refactor out as much new API as you can.
Also some minor syntax stuff.
I disagree - there is explicit documentation about One thing I have wanted to do in projects was replace the db search with a more comprehensive solr-powered search. With the new methods introduced by this PR that becomes very easy to do. |
From a semver perspective, adding protected methods is the same as adding public methods and makes it harder to change things later. For instance if we decide we want to add another parameter later, it has to be an optional parameter and we need to support it being present or not. So as much as possible we want to minimize API additions to reduce the future maintenance overhead. Looking through the existing protected $modelClass - assigned in constructor protected $fields - assigned in constructor protected $filters - assigned in constructor protected $searchParams - assigned in protected function applyBaseTableFields() - has comments above So the current state of SearchContext doesn't really say "subclass me". Nor do the docs.
What do you think of the possibility of splitting off the "database search" part of this PR into a new class with an interface e.g. (And yes I know I've said in the past that we sometimes box ourselves in with interfaces that are too-narrow ). |
I'll just set those methods to Also it turns out that |
cb05f5e
to
6129530
Compare
Note: The documentation portion should not be merged here. I have started the process of migrating docs to a new repository so the docs will need to be removed from this PR and a new PR raised in the new repo instead. |
1cf2fcb
to
3545524
Compare
Changes to docs are moved to silverstripe/developer-docs#5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done an initial test, mostly seems good
1)
I think we'll want to run this past a UX designer about what the correct way to handle multiple words is
Testing in the member section where I've added the following member:
First name: Bob
Surname: Smith
Email: [email protected]
First name: Barry
Surname: Jones
Email: [email protected]
a) Searching for "bob" matches
b) Searching for "bo" matches
c) Searching for "smith" matches
d) Searching for "smith bob" matches
e) Searching for "bob charlie" does not match
f) Searching for "bob barry" does not match
I feel that d) and f) should match both records, as a google search would still so a partial match, rather than returning all results. However a UX designer should be able to give a more definitive answer than I can.
2)
General question, is the term "Primary search" a new term, or an existing term? What would a "Secondary search" be?
3)
Maybe an edge case though doesn't seem like we can disable the primary search for existing core fields e.g.
app/src/MemberExtension.php
<?php
use SilverStripe\ORM\DataExtension;
class MemberExtension extends DataExtension
{
private static $searchable_fields = [
'Email' => [
'primary' => false
],
];
}
app/__config/mysite.yml
---
Name: myconfig
After: '*'
---
SilverStripe\ORM\DataObject:
primary_search_field_name: 'my_primary_field_name'
primary_search_split_terms: true
SilverStripe\Security\Member:
extensions:
- MemberExtension
Did not disable search on the Member.Email field which I would expect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also another question:
4)
How do we disable this behaviour and revert to the legacy single field behaviour? There's probably a fairly good argument that the multifield search is simply fair better and we shouldn't provide a way to revert it.
If we're not going to allow reverting to legacy behaviour, then we should close the following 2 PRs:
- NEW: Allow search field customisation #9341
- DOC Document GridFieldFilterHeader general field customisation developer-docs#6
That said, there may be some giant databases that requires limiting the fields that are searched across for performance reasons. If that's a concern, then we should perhaps add an allowlist style config that can be used to define the fields that are searched across which, if defined, is used instead of $searchable_fields. We'd probably still want to retain the primary => false
style config which acts as a denylist
@maxime-rainville another one we need a UX designer to take a look at please.
It's a new term, I think. I stole it from #9341 when it had docs (docs now at silverstripe/developer-docs#6) 'cause it was better than the "global search" terminology I had been using.
I would consider using the individual filter fields in the dropdown as "secondary". It could also be considered "advanced" or "more specific"
This is a quirk of the config system, but should probably be (separately) fixed. It's because the $searchable_fields = [
'Email' => [
'primary' => false
],
'FirstName',
'Surname',
'Email',
];
This is documented both in code and in the corresponding docs PR. You can simple set
There is already configuration for this - set |
f3f0667
to
efbc4ae
Compare
Requested changes made. Rebased on |
I suppose so, though I think it's still worth doing in this context. My thinking is that the denylist list method isn't resiliant against adding more searchable_fields. For instance 6 months after implementation when everything is fine someone may add another 10 searchable fields. I guess the counter argument is that you can then just set
Normally I'd say I'd prefer an explicit config bool here However what I think what's better here is to add the allowlist config which would be a better implementation of #9341. If we do that we've maximized flexibility.
Something about this naming does not quite sit right with me, probably because there's no such thing as "secondary". If we refer to the drop down search as "specific", then this should be "general". If we refer to it as "advanced" then this would be "basic", however it's not really advanced as you can only search individual fields, you can't do things like search date ranges or specify and/or conjucitons. So "specific" makes more sense there, so we should be using the word "general" instead of "primary"? Worth spending the time getting this right as it'll be around for years |
It's a catch-22 situation I think. If we have an allowlist and then someone adds new It's especially confusing if we keep the Which takes precedence if a field is included in both? Or do we throw an exception in that case? And how do we effectively communicate which to use in which situation to new developers who are already trying to get their heads around all the other searchable and summary fields config?
If the allow list is for performance reasons on large datasets, surely this would be almost exclusively set within projects, not in modules? In which case developers will be adding new searchable_fields directly to the classes rather than extensions I would expect. And modules rarely if ever add searchable_fields via extensions. It seems like we'd be adding an extra configuration layer to catch weird edge cases.
I'd be open to doing this. I didn't do it this way initially just to avoid adding too many new config options, but this does seem more intuitive, you're right. I also don't think that the ability to disable this should be at all related to whether there's an allow-list - especially since this change could break customisations people have made around the current way things work. We should absolutely provide a way to disable this new behaviour at least for the lifetime of CMS4, whether it's a new boolean or a blank string on the field name.
I don't think the allowlist config is related to that PR at all. That PR is about how the gridfield component works, which could theoretically be used separately from this PR's changes. I'm also not sold on the allowlist config... seems like unnecessary additional complexity which is likely to be confusing as well as being more maintenance surface. It also seems unlikely that many projects will need it - and it would be an easy enough customisation to make in those few projects directly. If you can get another core committer or CMS squad member to back you up on it I'll probably be more okay with the change though - at the moment it's just one opinion vs another lol
I still think of the dropdown stuff as secondary. I did suggest some alternative interpretations but primary and secondary seems intuitive to me. I don't really think the semantics matter too much though so long as it's clear through documentation what we mean. |
OK how about the following:
? Re primary/general - the 'secondary' search is actually a 'filter' :-) So even "general" does not make sense here. Is it viable to just drop the word "primary" altogether? |
Sounds okay to me 👍
We need some name for it just to be able to talk about it in docs and to give it a name in code. I guess we could just say |
We could make the prefix "cms" e.g. It is for "CMS search" as opposed to "Front end search" Though at the same time we already have
Yeah I think just drop it |
I'd rather "admin" than "CMS" - but even then, the PHPDoc for
So we can't rule out people using it in the frontend.
This seems okay...
If I saw this my instinct would be that it's the default filter for
Doesn't seem clear enough that it's not for splitting terms on the I honestly don't think referring to it as a primary field is a big enough problem to be worth solving - especially as the alternatives are (subjectively) just as confusing, if not potentially more so. But since we need a UX person to look at this functionality for the splitting conjunctive vs disjunctive conundrum, we could get them to weigh in on this discussion as well. It doesn't have to be decided right away since we aren't merging until we have had some UX feedback (I think?) |
Happy to wait for UX if you think that helps make a decision If these new config option are being used strictly inside the CMS, then as you say 'admin' is a better prefix than 'cms'. If they might also be used on the front-end (it's not clear to me they would be, though I could be wrong) then something other than 'primary' would be less confusing I think. also happy with just using 'searchable_fields_` as the prefix instead of 'primary_search_', that may be best - would you be OK with that? |
I'd be less okay with prefixing it with |
I don't think we should do partial search term matching. If we had a way to sort results to have the more relevant one (AKA the ones containing all the search terms) first, maybe. But yes, it's probably worth bring some UX help to resolve these kind of questions. |
UX advice:
|
efbc4ae
to
4477200
Compare
UX advice has been applied. |
Just did a quick test, something here still feels wrong due to the inability to matches multiple fields which should yield a better match If I create a member with the firstname "andy" and surname "bob", when I search in the Security section for "andy" it matches, for "bob" it matches, though "andy bob" it does not match. Likewise if I change the surname to "big mac", then searching for "big", "mac" or "big mac" they all match, though "andy big mac" does not Based on the UX advise above |
This refers specifically to when searching for "big mac" - a field that only contains the string "big" or the string "mac" will not match.
But if you search "big", a field that contains "big" will match, so "big mac" will match, because it contains "big". We're still using the
This is the same as if you create someone with the first name "big" and the surname "mac". If you search "big" or "mac" it will match, which is expected. If you search "big mac" it will not match, because no single field contains the search phrase "big mac". This is because we are not splitting the search phrase, so we're explicitly searching for |
Yeah this just feels wrong IMO. Searching for a member with "[firstname] [surname]" in the Security section feels like a pretty normal thing to do, so it's just seems wrong to get no results when doing this
This UX advise makes sense in isolation i.e. must match the whole term "big mac", though I'm presuming the "can also partial match across fields" scenario wasn't considered at the time? Seems like we need the following rules when searching for "big mac" Can match if one of the following conditions is met This won't match if firstname is "small" and surname is "mac" i.e. still needs to match everything that was searched Of course at that point you don't need logic for scenario a) since logic used in b) will cover all the scenarios I get this is technically harder and annoying feedback at this stage, though it seems a much more correct user experience IMO |
We did discuss that briefly and the advice was that users can just use the specific field search in the dropdown if they want to search across fields in that way. We can always make that change in the future as another enhancement if that's something that people start clammering for. If you like though we can discuss this in |
New UX advice is that we should split terms, for the reasons Steve has expressed. I will re-implement the term-splitting logic, including having it configurable (which, as per documentation in the related PR, is necessary to avoid unexpected behaviour if the filter is changed to There is also advice about how to more clearly indicate the functionality to the user (i.e. that it will search each term across all searchable fields) and some mockups are likely to come - IMO if the mockups are not complete or not agreed on by the time the rest of this PR is ready to merge, it would not be the end of the world for that to be a follow-up enhancement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy with code, have tested manually and it feels pretty good
I've left some other comments on the docs PR which are also applicable to this PR - silverstripe/developer-docs#5 (review)
Changes the default field for gridfield filters from being the first
$searchable_field
to searching instead across all$searchable_fields
by default.Includes a way to fall back to the old way (use the first field), which may be useful for projects that already have their own custom implementation of this feature or which implement some other specific behaviour instead.
An alternative approach, similar to what Max did in #9836, would be to split the search query up into whitespace-delimited terms and pass those as the value for the filters - this would then result in using
applyMany
instead ofapplyOne
in the filter. I'd be open to implementing that either as a thing to always do or a configuration option if someone thinks it would be a significant enough improvement over this approach - but bear in mind it comes at the cost of a more complex query. It would be easy enough to implement that after the fact however so I've opted for the simpler approach for now.Also refactored
SearchContext
somewhat to be easier to override specific parts of the process - e.g. overrideprepareQuery()
to change the query before the searching starts, or overridesearch()
to override the search behaviour (such as to use solr/algolia/etc instead of DB queries), or overrideindividualFieldSearch()
to change the way individual fields get searched against.Parent issue