Problem with large pages #1

Open
yhoiseth opened this issue Oct 24, 2019 · 4 comments

@yhoiseth

Hi,

Thanks a lot for sharing. I encountered an issue that I suspect others will encounter, too.

Algolia has—at least to me—surprisingly small limits on record sizes:

  • 10 KB for Pro, Starter, or Free accounts
  • 20 KB for legacy (Essential and Plus)

It appears that for Enterprise accounts, you can have larger limits.

In practice, this means that if a page's body, title, search_description, etc. add up to more than roughly 10 000 characters (which is quite common), indexing fails with an error.
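To make the limit concrete, here is a minimal sketch that estimates a record's size as UTF-8 encoded JSON. The 10 000 byte threshold, the helper names, and the sample pages are assumptions for illustration; Algolia's own size calculation may differ, so treat this as an approximation rather than an exact check.

```python
import json

# Assumption: the limit is treated as 10,000 bytes. Check your plan's actual limit.
RECORD_LIMIT_BYTES = 10_000


def record_size_bytes(record: dict) -> int:
    """Return the size of the record when serialized as UTF-8 encoded JSON."""
    return len(json.dumps(record, ensure_ascii=False).encode("utf-8"))


def fits_in_one_record(record: dict, limit: int = RECORD_LIMIT_BYTES) -> bool:
    """Return True if the serialized record is within the size limit."""
    return record_size_bytes(record) <= limit


# Hypothetical page data, for illustration only.
small_page = {"title": "Hello", "body": "x" * 500}
large_page = {"title": "Hello", "body": "x" * 12_000}
# Non-ASCII text takes more than one byte per character in UTF-8,
# so this page exceeds the limit with only 6,000 characters.
non_ascii_page = {"title": "Hei", "body": "å" * 6_000}
```

Note that the limit applies to bytes, not characters, so pages with a lot of non-ASCII text hit it sooner.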

Algolia's official solution to this problem is splitting records, e.g. by paragraph.

I inquired with Algolia's support team how to do this with the Django integration. They answered that they don't think it's possible.

My assessment is that, in order to index large pages in Wagtail/Django, we would need to split records and build the index using Algolia's generic Python client. That, however, seems like a hack and more trouble than it's worth to us. (I'd be very interested in knowing if anyone has done this or has found a different solution.)
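For anyone who wants to try the splitting approach, this is a rough sketch of the record-splitting half only. It groups paragraphs into chunks that each stay under an assumed 10 000 byte limit. The function name, the field names (page_id, title, url, body), and the objectID scheme are hypothetical, and sending the records to Algolia with the generic Python client is not shown.

```python
import json

# Assumption: the limit is treated as 10,000 bytes. Check your plan's actual limit.
RECORD_LIMIT_BYTES = 10_000


def split_page_into_records(page_id, title, url, body, limit=RECORD_LIMIT_BYTES):
    """Split body on blank lines and pack paragraphs into records under `limit` bytes."""

    def size(record):
        # Approximate the record size as UTF-8 encoded JSON.
        return len(json.dumps(record, ensure_ascii=False).encode("utf-8"))

    def make_record(index, text):
        return {
            "objectID": f"{page_id}-{index}",  # hypothetical ID scheme
            "page_id": page_id,  # lets chunks be grouped back into one page
            "title": title,
            "url": url,
            "body": text,
        }

    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    records = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if size(make_record(len(records), candidate)) <= limit:
            current = candidate
            continue
        # The candidate is too large: flush what we have and start a new chunk.
        if current:
            records.append(make_record(len(records), current))
        if size(make_record(len(records), paragraph)) > limit:
            raise ValueError("A single paragraph exceeds the record size limit")
        current = paragraph
    if current:
        records.append(make_record(len(records), current))
    return records
```

Because several records then point at the same page, search results would need de-duplicating per page (here via the shared page_id field), which is part of why this feels like more work than the Django integration suggests.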

Also, regarding the blog post Using Algolia Search with Wagtail, I think it would be useful to add a warning about Algolia's record size limits. With such a warning, I probably wouldn't have spent any time trying to implement Algolia. It could, for example, say something like:

Warning: Be aware that Algolia at the time of writing has a 10 KB limit on record sizes for all new accounts except Enterprise. This means that indexing any page with more than 10 000 characters of text will fail. There is an official solution to this issue, but it doesn't work with the Django integration.

@TomKlotzPro

Hello @yhoiseth! I'm Tom and I work at Algolia. I used to work on the Laravel integration, where we have a Splitter that helps us split large records. I don't know whether it's possible to do the same with Django, but I'll dig into it and try to implement it.

This is the documentation for our splitter: https://www.algolia.com/doc/framework-integration/laravel/advanced-use-cases/split-large-records/?language=php

And this is the repo of our Laravel integration: https://github.com/algolia/scout-extended

Cheers

@yhoiseth
Author

That's great, I really appreciate it ❤

Let me know if you need to talk things through or something. (Just be aware that I wouldn't consider myself a Django expert 😉)

@yhoiseth
Author

For now, we have worked around this issue by building a similar solution with Elasticsearch and Bootstrap Autocomplete. I'm sharing how we did it here in case anyone else runs into a similar problem.

Caveats and prerequisites

  • You need Elasticsearch, which can be expensive. Our hosting provider, platform.sh, provides Elasticsearch free of charge.
  • This solution is not as snappy as Algolia.
  • My example is for an old-school website with Bootstrap 4 and jQuery.

Demo

See the search box in the navbar on https://www.entrepedia.com/.

How

Set up Elasticsearch

See Backends — Wagtail Documentation. (The other backends don't work as well because they don't return results until words are almost written out. Elasticsearch can return results when the query is as little as one character.)

Set up search endpoint

When you start a Wagtail project, it sets up a default search endpoint.

To make it work with Bootstrap Autocomplete, you need to rename the search parameter from query to q. Do this in both the view and the template if you plan on having a graceful fallback in case the JavaScript breaks.

Next, you need to return the search results as JSON if the autocomplete is doing the searching. There are many ways to do this. A slightly ugly but functional way is to add the following guard above the existing return statement:

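    from django.http import JsonResponse
    # …
    # request.is_ajax() was deprecated in Django 3.1 and removed in 4.0,
    # so check the header that jQuery sets on its AJAX requests instead.
    if request.headers.get("x-requested-with") == "XMLHttpRequest":
        # One {"url", "text"} object per hit for the autocomplete dropdown.
        data = [{"url": hit.url, "text": hit.title} for hit in search_results]
        # safe=False allows a top-level list instead of a dict.
        return JsonResponse(data, safe=False)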
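To see what the guard above sends back without running Django, here is a small standalone sketch of the payload. The helper name build_autocomplete_payload and the fake hits are made up for illustration; in the real view the hits come from Wagtail's search results.

```python
import json
from types import SimpleNamespace


def build_autocomplete_payload(hits):
    """Map search hits to the list of {"url", "text"} objects sent to the frontend."""
    return [{"url": hit.url, "text": hit.title} for hit in hits]


# Stand-ins for Wagtail pages; only the .url and .title attributes are used.
fake_hits = [
    SimpleNamespace(url="/about/", title="About us"),
    SimpleNamespace(url="/blog/large-pages/", title="Large pages"),
]

payload = json.dumps(build_autocomplete_payload(fake_hits))
```

The frontend reads the text key for the label shown in the dropdown and the url key when a result is selected.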

Set up frontend

For the frontend, we need a search field and some JavaScript. These are the relevant parts:

    <input
      aria-label="Search"
      autocomplete="off"
      class="form-control"
      id="search-input"
      name="q"
      type="search"
    >
    <script
      src="https://cdn.jsdelivr.net/gh/xcash/[email protected]/dist/latest/bootstrap-autocomplete.min.js"
    ></script>
    <script>
      $(document).ready(function () {
        var $searchInput = $("#search-input");
        $searchInput.autoComplete({
          minLength: 1, // Start searching after a single character
          resolverSettings: {
            url: "{% url 'search' %}"
          }
        });
        // Go to the selected result; href accepts both paths and full URLs
        $searchInput.on("autocomplete.select", function (event, item) {
          window.location.href = item.url;
        });
        $searchInput.on("keydown", function (event) {
          if (event.keyCode === 13) {
            return false; // Do not submit form on ENTER
          }
        });
      });
    </script>

CC algolia/algoliasearch-django#285

@yhoiseth
Author

Hi @KalobTaulien,

Just a heads-up in case you didn't notice this 🙂

TLDR: You might want to add some info about the size limit to your blog post as a courtesy to readers.
