
Repeatedly running a query built with the Query API slows down after the Nth iteration #213

Closed
mikunn opened this issue Mar 20, 2020 · 2 comments

mikunn commented Mar 20, 2020

Environment

  • Elixir version: 1.10.2
  • Database and version: PostgreSQL 11.6
  • Ecto version: 3.3.4
  • Database adapter and version: postgrex 0.15.3
  • Operating system: macOS Mojave 10.14.6

Current behavior

I have a function that runs a query built with the Query API against a table with around 14.6 million rows and various indexes. When I run the function multiple times in a row, the query starts slowing down after a certain number of runs. The strange thing is that the same SQL query (copied from the logger output) runs just fine with Repo.query/3. It doesn't matter whether I run them in a loop or by hand in iex.

How many times I can run the function before the slowdown is drastically affected by pool_size, even though only one DB connection is used and the others are idling. With one connection, I can run it around 15 times on my machine before it slows down. With Repo.query/3, on the other hand, I can run it with one DB connection to my heart's content without any issues.

Another strange thing is that, after the slowdown, the only way to get that particular query back to its normal execution time with the Query API is to restart the application. Once it has slowed down, it stays slow until the application is restarted. Other queries run just fine.

Here's an example. The first log entry is the 15th run and the second is the 16th. I use Benchee with the default settings. Running the function repeatedly by hand causes the same behaviour, so Benchee is not the issue.

12:02:03.378 [debug] QUERY OK source="table_b" db=117.4ms decode=5.0ms idle=426.5ms
SELECT t0."id" FROM "table_b" AS t0 WHERE (((t0."int_field" = 1) AND (t0."date_field" >= $1)) AND (t0."date_field" <= $2)) [~D[2019-01-01], ~D[2020-12-31]]

12:02:13.542 [debug] QUERY OK source="table_b" db=10115.1ms decode=6.6ms idle=386.1ms
SELECT t0."id" FROM "table_b" AS t0 WHERE (((t0."int_field" = 1) AND (t0."date_field" >= $1)) AND (t0."date_field" <= $2)) [~D[2019-01-01], ~D[2020-12-31]]

Note the db execution time jumping from around 120 ms to over 10 seconds.

The Benchee output always shows something akin to

Name           ips        average  deviation         median         99th %
sql           7.67        0.130 s     ±6.37%        0.132 s        0.148 s
dsl           0.62         1.61 s   ±234.77%        0.176 s        10.16 s

The execution time is fairly consistent with raw SQL via Repo.query/3, while there is a huge deviation with the Query API.

The code

I wrote as small a sample project as I could to demonstrate the issue. Hopefully you can reproduce it with the project. The most important parts of the code follow.

Migration
defmodule EctoTester.Repo.Migrations.AddTables do
  use Ecto.Migration

  def change do
    create table(:table_a) do
    end

    create table(:table_b) do
      add :table_a_id, references(:table_a, on_delete: :delete_all)
      add :int_field, :integer, null: false
      add :date_field, :date, null: false
    end

    create index(:table_b, [:table_a_id])
    create index(:table_b, [:int_field])
    create index(:table_b, [:date_field])
  end
end

If I don't create table_a, there is no problem, even though the table isn't used in the query.

Query and benchmarking code
defmodule EctoTester do
  alias EctoTester.Repo

  import Ecto.Query

  # Compare the Query API and raw SQL versions with Benchee.
  def bench() do
    Benchee.run(
      %{
        "dsl" => fn -> dsl(~D[2019-01-01], ~D[2020-12-31]) end,
        "sql" => fn -> sql(~D[2019-01-01], ~D[2020-12-31]) end
      }, time: 10
    )
  end

  # The query built with the Query API.
  def dsl(from, to) do
    q =
      from(b in "table_b",
        where: b.int_field == 1 and b.date_field >= ^from and b.date_field <= ^to,
        select: b.id
      )

    Repo.all(q)
  end

  # The equivalent raw SQL, run with Repo.query!/2.
  def sql(from, to) do
    Repo.query!(
      """
      SELECT t0."id"
      FROM "table_b" AS t0
      WHERE (((t0."int_field" = 1) AND (t0."date_field" >= $1)) AND (t0."date_field" <= $2))
      """,
      [from, to]
    )
  end
end
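
For reference, a minimal sketch of how this can be exercised from iex, assuming the sample project is compiled and table_b is populated (the explicit loop is only an illustration of running the function repeatedly by hand):

# Inside `iex -S mix` in the sample project.
# Time each call of the Query API version individually.
for n <- 1..20 do
  {micros, _rows} = :timer.tc(fn -> EctoTester.dsl(~D[2019-01-01], ~D[2020-12-31]) end)
  IO.puts("run #{n}: #{div(micros, 1000)} ms")
end

# Or run the full Benchee comparison:
EctoTester.bench()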

Expected behavior

The query performance with the Query API should be fairly consistent and similar to that of Repo.query/3.

josevalim (Member) commented

Please see discussion here for more info: elixir-ecto/postgrex#497

mikunn (Author) commented Mar 20, 2020

Thank you José for the link!

However, I'm not sure it is entirely related: if I understood correctly, the other discussion is about prepared statements with varying parameters, while in my case the parameters are always the same.

That said, after upgrading to PostgreSQL 12.2 I now see consistent results without needing to touch plan_cache_mode or any other settings.
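
For anyone who cannot upgrade yet, a possible client-side workaround (a sketch only, not something verified in this report) is to configure the repo with Postgrex's prepare: :unnamed option, so queries are sent as unnamed statements and re-planned on each execution; on PostgreSQL 12+ the server-side plan_cache_mode setting is the other knob. The application name and credentials below are placeholders for the sample project:

import Config

# Hypothetical config/config.exs for the sample project.
# `prepare: :unnamed` tells Postgrex to use unnamed prepared statements,
# avoiding the cached plan at the cost of re-planning every query.
config :ecto_tester, EctoTester.Repo,
  database: "ecto_tester_dev",
  username: "postgres",
  password: "postgres",
  hostname: "localhost",
  prepare: :unnamed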
