-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use SQL for generation #11
Comments
I was thinking about the same thing. It would be interesting to compare if everything was offloaded to a single transaction script like this and see if offloading all the math to SQLite makes sense. While there are range restrictions on the values, there aren't distribution restrictions (normal, poisson, what have you) and the post already says that any regular PRNG is fine and crypto-strength not necessary, so SQLite Having a quick scratch at the math, it would look something like: Insert into user
with recursive usr(id, area, age, active) as (
values (1, abs(random() % 99999), (abs(random() % 3) + 1) * 5, abs(random() % 1))
union all select id + 1, abs(random() % 99999), (abs(random() % 3) + 1) * 5, abs(random() % 1) from usr
where id < 100000
)
select * from usr |
@WorldMaker hey, could you send this code as a PR? |
I ran this locally, this is quite slow. Do let me know if i am missing something.
#!/bin/sh
sqlite3 pure_sql.db <<'END_SQL'
insert into user
with recursive usr(id, area, age, active) as (
values (1, abs(random() % 99999), (abs(random() % 3) + 1) * 5, abs(random() % 1))
union all select id + 1, abs(random() % 99999), (abs(random() % 3) + 1) * 5, abs(random() % 1) from usr
where id < 100000000
)
select * from usr;
END_SQL saved this in a script and ran it with time: time ./sq.sh
./sq.sh 111.59s user 2.35s system 96% cpu 1:58.00 total |
@avinassh It turns out - is it at least faster than PyPy? |
PRAGMA journal_mode = OFF;
PRAGMA synchronous = 0;
PRAGMA cache_size = 1000000;
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA temp_store = MEMORY; Do they influence here? |
I had tried with them too, but it did not make much difference: #!/bin/sh
sqlite3 pure_sql.db <<'END_SQL'
PRAGMA journal_mode = OFF;
PRAGMA synchronous = 0;
PRAGMA cache_size = 1000000;
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA temp_store = MEMORY;
insert into user
with recursive usr(id, area, age, active) as (
values (1, abs(random() % 99999), (abs(random() % 3) + 1) * 5, abs(random() % 1))
union all select id + 1, abs(random() % 99999), (abs(random() % 3) + 1) * 5, abs(random() % 1) from usr
where id < 100000000
)
select * from usr;
END_SQL output:
|
Interesting results. We're about at the limit of my personal experience with SQLite directly and a lot of the things you might try to speed things up in other SQL environments obviously don't apply to SQLite. Reviewing the Stack Overflow posted above, the #!/bin/sh
sqlite3 pure_sql.db <<'END_SQL'
insert into user (area, age, active)
select abs(random() % 99999), (abs(random() % 3) + 1) * 5, abs(random() % 1)
from generate_series(0, 100000000);
END_SQL |
Using -- load.sql
insert into user (area, age, active)
select abs(random() % 99999), (abs(random() % 3) + 1) * 5, abs(random() % 1)
from generate_series(0, 100000000);
Using recursive: -- load-rec.sql
insert into user
with recursive usr(id, area, age, active) as (
values (1, abs(random() % 99999), (abs(random() % 3) + 1) * 5, abs(random() % 1))
union all select id + 1, abs(random() % 99999), (abs(random() % 3) + 1) * 5, abs(random() % 1) from usr
where id < 100000000
)
select * from usr;
Adding in the pragma statements didnt make a difference in my test, in fact is slowed it down a few seconds. Interestingly, I tried without INSERT INTO user (area, age, active)
SELECT
1000000,
5,
1
FROM generate_series(1, 100000000, 1);
So Unsure if you're still accepting PRs, but lmk if you want to see one using |
Someone else did a flamegraph, I think some good time is being spent in random (in rust solution)
I don't have a pure SQL solution, so I will be happy to accept a PR on this since is close to python one :) |
Created a PR for this in #23! Based on of my machine, it appears to be faster than even the fastest rust solution, but would love to see a more official benchmark run |
What about a using SQLite for test data generation?
Example:
https://paulbradley.org/sqlite-test-data/
https://stackoverflow.com/questions/17931320/how-to-insert-random-data-into-a-sqlite-table-using-only-queries
The text was updated successfully, but these errors were encountered: