Migrate Postgres to Neon.tech #492
I have decided to hook up the 1Password Service Account part of this so that we only need to configure 1 secret in the new app. The biggest benefit is that secrets will be version controlled. As I'm doing this, I would like to double-check that we still use these. This is what I think we need:
Did I cross out all secrets that we no longer use @jerodsanto?
Looks correct.
09186d5 to e812ebc
Thanks for confirming @jerodsanto! I now have all of this wired up & working as https://changelog-2023-12-17.fly.dev/. See the PR description for initial observations & next steps.

Leaving something here for Neon Support: I am unable to make https://neon.tech/docs/guides/elixir-ecto#configure-ecto work for our Elixir application. If I use the following config with our credentials:

    config :friends, Friends.Repo,
      database: "friends",
      username: "alex",
      password: "AbC123dEf",
      hostname: "ep-cool-darkness-123456.us-west-2.aws.neon.tech",
      ssl: true,
      ssl_opts: [
        server_name_indication: 'ep-cool-darkness-123456.us-west-2.aws.neon.tech',
        verify: :verify_none
      ]

I get the following errors:

    ** (Postgrex.Error) ERROR 26000 (invalid_sql_statement_name) prepared statement "ecto_1922" does not exist

If I follow the https://neon.tech/docs/guides/elixir-ecto#configure-ecto documentation further and configure

    ssl_opts: [
      cacerts: :public_key.cacerts_get(), # available since OTP26
      verify: :verify_peer,
      server_name_indication: String.to_charlist(System.get_env("DB_HOST", "db")),
      customize_hostname_check: [
        match_fun: :public_key.pkix_verify_hostname_match_fun(:https)
      ]
    ],

    ssl_opts: [
      verify: :verify_peer,
      cacerts: :public_key.cacerts_get(), # available since OTP26
      versions: [:"tlsv1.3"],
      depth: 3,
      server_name_indication: String.to_charlist(System.get_env("DB_HOST", "db")),
      customize_hostname_check: [
        match_fun: :public_key.pkix_verify_hostname_match_fun(:https)
      ]
    ],

Ecto is not even able to connect to Neon:

    17:41:53.361 [notice] Application changelog exited: Changelog.Application.start(:normal, []) returned an error: shutdown: failed to start child: Changelog.EpisodeTracker
    ** (EXIT) an exception was raised:
    ** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2955ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:
    1. Ensuring your database is available and that you can connect to it
    2. Tracking down slow queries and making sure they are running fast enough
    3. Increasing the pool_size (although this increases resource consumption)
    4. Allowing requests to wait longer by increasing :queue_target and :queue_interval
    See DBConnection.start_link/2 for more information
    (ecto_sql 3.10.2) lib/ecto/adapters/sql.ex:1047: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql 3.10.2) lib/ecto/adapters/sql.ex:945: Ecto.Adapters.SQL.execute/6
    (ecto 3.10.3) lib/ecto/repo/queryable.ex:229: Ecto.Repo.Queryable.execute/4
    (ecto 3.10.3) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
    (changelog 0.0.1) lib/changelog/schema/episode/episode.ex:421: Changelog.Episode.flatten_for_filtering/1
    (changelog 0.0.1) lib/changelog/episode_tracker.ex:130: Changelog.EpisodeTracker.refresh_episodes/0
    (changelog 0.0.1) lib/changelog/episode_tracker.ex:57: Changelog.EpisodeTracker.init/1
    (stdlib 5.2) gen_server.erl:980: :gen_server.init_it/2

This is the only config option that works - in combination with specifying the endpoint id in the password field

While the above workaround works, I am not comfortable skipping remote peer verification in production. What am I doing wrong?
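
For anyone reading along, here is a minimal sketch of a `:verify_peer` setup in which the host that Ecto connects to and the host used for SNI and certificate verification come from one variable. In the `:verify_peer` variants above, `System.get_env("DB_HOST", "db")` falls back to `"db"` when `DB_HOST` is unset, and that name would not match a Neon certificate. The thread does not confirm this as the cause, and this is not necessarily the fix that was eventually found. The `Changelog.Repo` module name, the `:changelog` app name and the contents of `DB_HOST` are assumptions made for illustration.

```elixir
# config/runtime.exs (sketch, not the project's actual config)
import Config

# Assumption: DB_HOST holds the full Neon hostname, e.g.
# "ep-cool-darkness-123456.us-west-2.aws.neon.tech".
# System.fetch_env!/1 raises at boot when the variable is unset,
# instead of silently falling back to "db".
db_host = System.fetch_env!("DB_HOST")

# Assumption: the repo is Changelog.Repo under the :changelog app.
config :changelog, Changelog.Repo,
  hostname: db_host,
  ssl: true,
  ssl_opts: [
    verify: :verify_peer,
    # Trust store provided by the operating system
    cacerts: :public_key.cacerts_get(),
    # SNI uses the same host that we connect to
    server_name_indication: String.to_charlist(db_host),
    # HTTPS-style matching, so that wildcard certificates are accepted
    customize_hostname_check: [
      match_fun: :public_key.pkix_verify_hostname_match_fun(:https)
    ]
  ]
```

Reading the host once keeps `hostname` and `server_name_indication` from drifting apart between environments.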
Leaving this fun gotcha here: Screen.Recording.2023-12-30.at.17.13.44.mp4
Rebasing master should reduce the number of
e812ebc to 5198e6f
That was helpful @jerodsanto! I just rebased on top of
https://changelog-2022-03-13.fly.dev/ uses
https://changelog-2023-12-17.fly.dev/ uses
That is a sweet improvement @jerodsanto 💪 I am going to compare the
https://changelog-2022-03-13.fly.dev/feed typically resolves in
https://changelog-2023-12-17.fly.dev/feed resolves in
Worth noting that
Good point! Also, in the last 7 days,
d5f6ad4 to ceaaf14
Signed-off-by: Gerhard Lazu <[email protected]>
Same version that we are running in Neon.tech. Signed-off-by: Gerhard Lazu <[email protected]>
Signed-off-by: Gerhard Lazu <[email protected]>
I found this helpful when testing a new production setup. Signed-off-by: Gerhard Lazu <[email protected]>
We have not used this in years, pretty sure that we will not need it anytime soon. Making it easy to git revert if I'm wrong about it. Signed-off-by: Gerhard Lazu <[email protected]>
This adds the op CLI & versions the dagger CLI so that I can run the following locally & publish a prod image:

    op inject -i envrc.op -o .envrc
    direnv allow
    dagger run mage image:production

Also check that a few more required env variables are present, otherwise we will continue being surprised why things don't work...

Signed-off-by: Gerhard Lazu <[email protected]>
This looks better:

    Targets:
      cd                  Run the CD pipeline
      ci                  Run the CI pipeline
      fly:daggerStart     Start Dagger Engine on Fly.io
      fly:daggerStop      Stop Dagger Engine on Fly.io
      fly:deploy          Push app container image to Fly.io
      image:production    Build & publish the production image
      image:runtime       Build & publish the runtime image
      test                Run tests

Signed-off-by: Gerhard Lazu <[email protected]>
This will deploy https://changelog-2023-12-17.fly.dev

We are setting up a new production app so that we can test the Neon.tech Postgres integration before promoting this to the new production. There is one more commit missing to get this integration going...

Meanwhile, the 1Password Service Account integration allows us to set a single secret in the app - OP_SERVICE_ACCOUNT_TOKEN - and then `op` takes care of templating all other secrets just-in-time, when specific commands are run, i.e. `db.migrate` or `app.start`.

This simplifies the app configuration considerably, and also makes rotating secrets super simple - just modify them in 1Password, the `changelog` vault, and restart the app 😉

Signed-off-by: Gerhard Lazu <[email protected]>
This didn't work as documented, but I will add more context to the PR so that we can go over it with Neon.tech Support... Signed-off-by: Gerhard Lazu <[email protected]>
So that it is close to Neon AWS us-east-1 (lower db latency). Signed-off-by: Gerhard Lazu <[email protected]>
Same as the current production config. Signed-off-by: Gerhard Lazu <[email protected]>
Just did this again & captured what worked today. Also made a few more changes to INFRASTRUCTURE so that it reflects the upcoming changes more accurately. Signed-off-by: Gerhard Lazu <[email protected]>
Still fails as documented initially. Signed-off-by: Gerhard Lazu <[email protected]>
Signed-off-by: Gerhard Lazu <[email protected]>
Signed-off-by: Gerhard Lazu <[email protected]>
Especially useful when iterating locally, and the git sha doesn't change. Signed-off-by: Gerhard Lazu <[email protected]>
Signed-off-by: Gerhard Lazu <[email protected]>
We are in 2024 baby! While at it, capture the step-by-step instructions. Signed-off-by: Gerhard Lazu <[email protected]>
While this app is still the current production, we will no longer be deploying to it after we merge this. I also updated all references to this app instance in our internal docs. I also removed the other app which we were using to debug various Fastly & Fly.io proxying issues. No longer needed, cleaning all of 2022.fly. Signed-off-by: Gerhard Lazu <[email protected]>
ceaaf14 to e298066
Signed-off-by: Gerhard Lazu <[email protected]>
https://changelog.com Postgres is now on Neon.tech. There were a few issues, but nothing major: https://ui.honeycomb.io/changelog/datasets/fastly/result/Eb7gHnHDygg
P99 latency is 6% higher.
This looks good to me: last 30 minutes compared to a day before
This is the next logical step after migrating to Neon.tech
part of #492

    cd changelog
    dagger call db-branch --neon-api-key=env:NEON_API_KEY

To learn more, see `changelog/README.md`.

Part of this, we also deployed a Dagger Engine v0.10.3 on Fly.io so that we don't need any sort of container runtime running locally. I know that Jerod will appreciate this. The beginning of a new generation of tooling, I'm sure of it.

Signed-off-by: Gerhard Lazu <[email protected]>
…nd (#508)

* Downgrade Erlang to 26.2.2

  Elixir 1.14.5 installed was segfaulting on macOS 12.7.3 ARM. Installing the Elixir `otp26` variant didn't fix it. Maybe an asdf issue... Will try again another time.

  Signed-off-by: Gerhard Lazu <[email protected]>

* Enable changelog.com devs to create prod db forks with a single command

  This is the next logical step after migrating to Neon.tech
  part of #492

      cd changelog
      dagger call db-branch --neon-api-key=env:NEON_API_KEY

  To learn more, see `changelog/README.md`.

  Part of this, we also deployed a Dagger Engine v0.10.3 on Fly.io so that we don't need any sort of container runtime running locally. I know that Jerod will appreciate this. The beginning of a new generation of tooling, I'm sure of it.

  Signed-off-by: Gerhard Lazu <[email protected]>

---------

Signed-off-by: Gerhard Lazu <[email protected]>
We want to reference all secrets from 1Password so that when we need to rotate anything, we simply update the values in 1Password and restart the app so that it reads the new values just-in-time, at boot time. FTR: - thechangelog#492 (comment) Signed-off-by: Gerhard Lazu <[email protected]>
This is a follow-up to:
https://changelog-2024-01-12.fly.dev/ is configured to use https://console.neon.tech/app/projects/orange-sound-86604986
Initial observations

1/3. P99 latency for `/feed` increased by 3x - 3s vs 1s

2/3. Ecto SSL config doesn't seem to be working with `:verify_peer`
See this commit for more details. FTR: chore: Enable TLS connection to postgres on newer OTP versions kitsteam/wordcharts#4
We figured it out with @brendan-stephens via #492 (comment)

3/3. We are doing 70+ `SELECT`s when serving `/`
This seems a lot, but maybe it's necessary. Since each `SELECT` adds an extra 2-10ms due to the network latency, it is reasonable to expect an extra 500ms latency across 70+ `SELECT` statements. See changelog-2024-01-12.fly.dev/ vs changelog-2022-03-13.fly.dev/. Now addressed in #492 (comment)
I expect other pages such as `/feed` which use 473 `SELECT` statements to result in even slower responses. Initial observations suggest 4.7s vs the typical 1.7s.

Next steps
`:verify_peer`
`changelog-2022-03-13` app instance
`changelog.com` which makes comparing the two origins side-by-side difficult:
`:verify_peer` working in `ssl_opts`
`changelog-2024-01-12` app instance
`v176`!)
After a few days, when we confirm that everything looks right, delete `changelog-2022-03-13` app instance + `changelog-postgres-2023-07-31` & party 🎉
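
The latency estimate in observation 3/3 can be sanity-checked with a few lines of Elixir. This is a rough sketch that assumes the `SELECT` statements run one after another, so each one pays the extra 2-10ms of network latency. The `extra_latency` helper is a name made up for this example.

```elixir
# Extra latency range in ms for `count` sequential SELECTs, each paying
# between `min_ms` and `max_ms` of additional network round-trip time.
extra_latency = fn count, min_ms, max_ms ->
  {count * min_ms, count * max_ms}
end

IO.inspect(extra_latency.(70, 2, 10), label: "/ (70 SELECTs)")
# prints: / (70 SELECTs): {140, 700}

IO.inspect(extra_latency.(473, 2, 10), label: "/feed (473 SELECTs)")
# prints: /feed (473 SELECTs): {946, 4730}
```

The expected extra 500ms for `/` falls inside the 140-700ms range, and the roughly 3s of extra time observed for `/feed` (4.7s vs 1.7s) falls inside 946-4730ms, so both numbers are consistent with per-query network latency being the main cost.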