Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter and search should be merged #269

Open
delgadom opened this issue Apr 26, 2017 · 1 comment
Open

filter and search should be merged #269

delgadom opened this issue Apr 26, 2017 · 1 comment

Comments

@delgadom
Copy link
Member

delgadom commented Apr 26, 2017

The Issue

The difference between filter and search is not intuitive and is unnecessary. Furthermore, the api should be refined to be more clear and provide more error handling

The motivation

I just ran this query and had no idea what happened:

In [1]: archives = api.batch_get_archive(api.search(
  ... :     pattern='ACP/climate/smme/HDD-CDD/county/001/annual/rcp85/*/198101-*.nc')
  ... :

In [2]: len(archives) # The query took forever, then returned ALL archives
Out[2]: 13375

Turns out the problem was that api.search doesn't have a pattern argument, but since it accepts **kwargs it failed to catch this. The correct query is:

In [3]: archives = api.batch_get_archive(api.filter(
  ... :     pattern='ACP/climate/smme/HDD-CDD/county/001/annual/rcp85/*/198101-*.nc')
  ... :

In [4]: len(archives)
Out[4]: 44

Proposed solution

We should merge filter and search into one function:

    def search(self, *query, prefix=None, pattern=None, engine='path'):
        '''
        Search for archives using tags, patterns, and prefixes

        Parameters
        ---------
        query: str
            tags to search for

        prefix: str
            start of archive name. Providing a start string improves search
            speed.

        pattern: str
            string matching the characters within the archive or set of
            archives you are filtering on. Note that authority prefixes, e.g.
            ``local://my/archive.txt`` are not supported in pattern searches.

        engine: str
            string of value 'str', 'path', or 'regex'. That indicates the
            type of pattern you are filtering on

        yields
        ------
        archive_name : str
            names of archives matching the specified criteria
        '''

        if prefix is not None:
            prefix = fs.path.relpath(prefix)

        if pattern is not None:
            pattern = fs.path.relpath(pattern)

        archives = self.manager.search(query, begins_with=prefix, )

        if not pattern:
            for archive in archives:
                yield archive

        elif engine == 'str':
            for arch in archives:
                if pattern in arch:
                    yield arch

        elif engine == 'path':
            # Change to generator version of fnmatch.filter

            for arch in archives:
                if fnmatch.fnmatch(arch, pattern):
                    yield arch

        elif engine == 'regex':
            for arch in archives:
                if re.search(pattern, arch):
                    yield arch

        else:
            raise ValueError(
                'search engine "{}" not recognized. '.format(engine) +
                'choose "str", "fn", or "regex"')
@delgadom
Copy link
Member Author

that said, it didn't actually take forever - only a minute or two to return not just names but full archive objects for 13,000 archives! woot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant