MaxRowsError for DuckDB table #557

Open
asterix314 opened this issue Feb 6, 2025 · 1 comment

Comments

@asterix314

I'm using

  • vegafusion 2.0.1
  • altair 5.5.0
  • duckdb 1.1.3

After loading a DuckDB table from a CSV file (some 20K lines),

import duckdb

housing = duckdb.read_csv("housing.csv")

I get a MaxRowsError when drawing a histogram with Altair, even after enabling the "vegafusion" data transformer. The same code works if I first convert the DuckDBPyRelation to a Polars DataFrame (alt.Chart(housing.pl())).

import altair as alt
alt.data_transformers.enable("vegafusion")
alt.renderers.enable("jupyter")

(
    alt.Chart(housing).mark_bar()
    .encode(
        alt.X('population').bin(maxbins=50).title(None),
        alt.Y('count()').title(None))
    .properties(width=200, height=100)
)
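
For anyone hitting the same error, the workaround mentioned above (calling .pl() on the relation before handing it to alt.Chart) can be wrapped in a small helper. This is only a sketch: the name materialize is made up for illustration, and it assumes the object is a DuckDBPyRelation whose .pl() method returns a Polars DataFrame, which requires polars to be installed.

```python
def materialize(data):
    """Return an eager DataFrame for DuckDB relations, pass everything else through.

    Hypothetical helper, not part of altair, vegafusion, or duckdb.
    """
    # Match on the class name so this module does not need to import duckdb.
    if type(data).__name__ == "DuckDBPyRelation":
        # DuckDBPyRelation.pl() converts to a Polars DataFrame;
        # .df() would give a pandas DataFrame instead.
        return data.pl()
    return data


# Usage with the objects from this report (needs duckdb, polars, and altair):
#   chart = alt.Chart(materialize(housing)).mark_bar()
```

The trade-off is that the whole table is pulled into memory as a DataFrame before charting, which is fine for about 20K rows but gives up lazy evaluation in DuckDB.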

The error message was:

---------------------------------------------------------------------------
MaxRowsError                              Traceback (most recent call last)
File ~/.cache/pypoetry/virtualenvs/lab-home-wtptIbf4-py3.12/lib/python3.12/site-packages/altair/vegalite/v5/api.py:1998, in TopLevelMixin.to_dict(self, validate, format, ignore, context)
   1995     except TypeError:
   1996         # Non-narwhalifiable type supported by Altair, such as dict
   1997         data = original_data
-> 1998     copy.data = _prepare_data(data, context)
   1999     context["data"] = data
   2001 # remaining to_dict calls are not at top level

File ~/.cache/pypoetry/virtualenvs/lab-home-wtptIbf4-py3.12/lib/python3.12/site-packages/altair/vegalite/v5/api.py:283, in _prepare_data(data, context)
    281 elif not isinstance(data, dict) and _is_data_type(data):
    282     if func := data_transformers.get():
--> 283         data = func(nw.to_native(data, pass_through=True))
    285 # convert string input to a URLData
    286 elif isinstance(data, str):

File ~/.cache/pypoetry/virtualenvs/lab-home-wtptIbf4-py3.12/lib/python3.12/site-packages/altair/utils/_vegafusion_data.py:105, in vegafusion_data_transformer(data, max_rows)
    100     return {"url": VEGAFUSION_PREFIX + table_name}
    101 else:
    102     # Use default transformer for geo interface objects
    103     # # (e.g. a geopandas GeoDataFrame)
    104     # Or if we don't recognize data type
--> 105     return default_data_transformer(data)

File ~/.cache/pypoetry/virtualenvs/lab-home-wtptIbf4-py3.12/lib/python3.12/site-packages/altair/vegalite/data.py:42, in default_data_transformer(data, max_rows)
     39     return pipe
     41 else:
---> 42     return to_values(limit_rows(data, max_rows=max_rows))

File ~/.cache/pypoetry/virtualenvs/lab-home-wtptIbf4-py3.12/lib/python3.12/site-packages/altair/utils/data.py:165, in limit_rows(data, max_rows)
    162     values = data
    164 if max_rows is not None and len(values) > max_rows:
--> 165     raise_max_rows_error()
    167 return data

File ~/.cache/pypoetry/virtualenvs/lab-home-wtptIbf4-py3.12/lib/python3.12/site-packages/altair/utils/data.py:148, in limit_rows.<locals>.raise_max_rows_error()
    135 def raise_max_rows_error():
    136     msg = (
    137         "The number of rows in your dataset is greater "
    138         f"than the maximum allowed ({max_rows}).\n\n"
   (...)
    146         "on how to plot large datasets."
    147     )
--> 148     raise MaxRowsError(msg)

MaxRowsError: The number of rows in your dataset is greater than the maximum allowed (5000).

Try enabling the VegaFusion data transformer which raises this limit by pre-evaluating data
transformations in Python.
    >> import altair as alt
    >> alt.data_transformers.enable("vegafusion")

Or, see https://altair-viz.github.io/user_guide/large_datasets.html for additional information
on how to plot large datasets.
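
Reading the traceback, the relation appears to reach the "don't recognize data type" branch of vegafusion_data_transformer and is then handed to default_data_transformer, where limit_rows applies the 5000-row cap. The toy model below only illustrates that fall-through pattern. It is not Altair's code: toy_transformer, the placeholder URL, and the Recognized and Unrecognized classes are all invented for the example.

```python
class MaxRowsError(Exception):
    """Stand-in for altair's MaxRowsError."""


def limit_rows(values, max_rows=5000):
    # Same shape as the check visible in the traceback: len(values) > max_rows.
    if max_rows is not None and len(values) > max_rows:
        raise MaxRowsError(f"more than {max_rows} rows")
    return values


def toy_transformer(data, recognized_types):
    # Hypothetical dispatch: recognized inputs are passed on by reference,
    # so no row limit is applied to them.
    if isinstance(data, recognized_types):
        return {"url": "toy-dataset://table_0"}  # made-up placeholder URL
    # Anything else falls through to the row-limited default path.
    return {"values": limit_rows(data)}


class Recognized(list):
    """Plays the role of a DataFrame type the transformer knows about."""


class Unrecognized(list):
    """Plays the role of the DuckDB relation in this report."""


rows = range(20_000)
print(toy_transformer(Recognized(rows), (Recognized,)))
try:
    toy_transformer(Unrecognized(rows), (Recognized,))
except MaxRowsError as exc:
    print("fell through to the default path:", exc)
```

If this reading is right, enabling the "vegafusion" transformer has no visible effect for an input type that the transformer does not recognize, which would match the behaviour reported here.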
@jonmmease
Collaborator

Hi @asterix314, sorry for the slow response here. What you have should work; I'll need to dig in to untangle what's going on.
