-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enables Schemas in DuckDB #470
Conversation
Thanks for the fix @rlancer! I'd like to work out how to test this for the single and multi schema case (and I'm happy to work on adding tests). Do you know of an easy way to construct a DuckDB connection with multiple schemas? |
Looks like the tests that use a single schema are failing: https://github.com/hex-inc/vegafusion/actions/runs/8179198596/job/22368156069?pr=470 |
@jonmmease thank you! Will look at getting the tests worked out also noticing the same issue the tests are highlighting |
Hi @rlancer, anything I can do to help get this over the finish line? Do you know of a self-contained way to construct a DuckDB connection with multiple schemas that triggers this error? |
Any connection to MotherDuck should trigger the error As far as fixing it I think we would need to change how table paths are resolved ie |
I don't have a MotherDuck account, and I wouldn't want to introduce this as a dependency for testing. I was hoping there would be a way to create this situation with a local DuckDB connection. |
Sure I'll do a repo with just DuckDB |
@jonmmease this will repo the issue using only DuckDB: import vegafusion as vf
import pandas as pd
import altair as alt
import duckdb
conn = duckdb.connect()
conn.sql("create or replace schema schema1; create or replace table schema1.table1 (a int, b int);")
vf.runtime.set_connection(conn) |
perfect! |
Looking at this a bit, and I think the approach of referencing tables by URL with the The newer alternative, that I haven't documented yet, is to wrap an individual It looks like this approach works fine with the schema case: For example: import vegafusion as vf
from vegafusion.dataset.duckdb import DuckDbDataset
import pandas as pd
import altair as alt
import duckdb
conn = duckdb.connect()
conn.sql("""
create or replace schema schema1;
create or replace table schema1.table1 (a int, b int);
INSERT INTO schema1.table1 VALUES (1, 5);
INSERT INTO schema1.table1 VALUES (2, 8);
""")
rel = conn.sql("select * from schema1.table1")
# Create VegaFusion duckdb dataset with verbose=True so the query gets printed out
ds = DuckDbDataset(rel, verbose=True)
alt.data_transformers.enable("vegafusion")
alt.Chart(DuckDbDataset(rel)).mark_bar().encode(x="a", y="b")
See if this approach works for you |
Thank you, that does solve it! |
Awesome. I'm thinking long term we may want to deprecate the special "table://" syntax for referencing a table in a connection since this doesn't scale to multiple schema. |
Fixes issue where using a DuckDB database with multiple schemas breaks
Current result when running with a DB that has multiple schemas: