Commit

Doc/cross reference (apache#791)
* Update docstrings so that cross references work in online docs. Also switch from autosummary to autoapi in sphinx for building API reference documents

* Update documentation to cross reference

* Correct class names and internal attr

* Revert changes that will end up coming in via PR apache#782

* Add autoapi to requirements file

* Add git ignore for files retrieved during local site building

* Remove unused portions of doc config

* Reset substrait capitalization that was reverted during rebase

* Small example changes
timsaucer authored Aug 6, 2024
1 parent 1d61548 commit bd0e820
Showing 34 changed files with 370 additions and 517 deletions.
2 changes: 2 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,2 @@
pokemon.csv
yellow_trip_data.parquet
3 changes: 2 additions & 1 deletion docs/requirements.txt
@@ -22,4 +22,5 @@ maturin
jinja2
ipython
pandas
pickleshare
pickleshare
sphinx-autoapi
31 changes: 0 additions & 31 deletions docs/source/api.rst

This file was deleted.

27 changes: 0 additions & 27 deletions docs/source/api/dataframe.rst

This file was deleted.

29 changes: 0 additions & 29 deletions docs/source/api/execution_context.rst

This file was deleted.

27 changes: 0 additions & 27 deletions docs/source/api/expression.rst

This file was deleted.

27 changes: 0 additions & 27 deletions docs/source/api/functions.rst

This file was deleted.

27 changes: 0 additions & 27 deletions docs/source/api/object_store.rst

This file was deleted.

56 changes: 26 additions & 30 deletions docs/source/conf.py
@@ -46,15 +46,11 @@
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "sphinx.ext.doctest",
    "sphinx.ext.ifconfig",
    "sphinx.ext.mathjax",
    "sphinx.ext.viewcode",
    "sphinx.ext.napoleon",
    "myst_parser",
    "IPython.sphinxext.ipython_directive",
    "autoapi.extension",
]

source_suffix = {
@@ -70,33 +66,35 @@
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []

# Show members for classes in .. autosummary
autodoc_default_options = {
    "members": None,
    "undoc-members": None,
    "show-inheritance": None,
    "inherited-members": None,
}

autosummary_generate = True

autoapi_dirs = ["../../python"]
autoapi_ignore = ["*tests*"]
autoapi_member_order = "groupwise"
suppress_warnings = ["autoapi.python_import_resolution"]
autoapi_python_class_content = "both"

def autodoc_skip_member(app, what, name, obj, skip, options):
    exclude_functions = "__init__"
    exclude_classes = ("Expr", "DataFrame")

    class_name = ""
    if hasattr(obj, "__qualname__"):
        if obj.__qualname__ is not None:
            class_name = obj.__qualname__.split(".")[0]
def autoapi_skip_member_fn(app, what, name, obj, skip, options):
    skip_contents = [
        # Re-exports
        ("class", "datafusion.DataFrame"),
        ("class", "datafusion.SessionContext"),
        ("module", "datafusion.common"),
        # Deprecated
        ("class", "datafusion.substrait.serde"),
        ("class", "datafusion.substrait.plan"),
        ("class", "datafusion.substrait.producer"),
        ("class", "datafusion.substrait.consumer"),
        ("method", "datafusion.context.SessionContext.tables"),
        ("method", "datafusion.dataframe.DataFrame.unnest_column"),
    ]
    if (what, name) in skip_contents:
        skip = True

    should_exclude = name in exclude_functions and class_name in exclude_classes
    return skip

    return True if should_exclude else None


def setup(app):
    app.connect("autodoc-skip-member", autodoc_skip_member)
def setup(sphinx):
    sphinx.connect("autoapi-skip-member", autoapi_skip_member_fn)


# -- Options for HTML output -------------------------------------------------
@@ -106,9 +104,7 @@ def setup(app):
#
html_theme = "pydata_sphinx_theme"

html_theme_options = {
    "use_edit_page_button": True,
}
html_theme_options = {"use_edit_page_button": False, "show_toc_level": 2}

html_context = {
"github_user": "apache",
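Note on the new `autoapi_skip_member_fn` handler in `docs/source/conf.py`: the sketch below reproduces its lookup logic outside of Sphinx, using a trimmed skip list so it can be run on its own. Passing `None` for `app` and `obj`, and the particular names queried, are illustrative assumptions; in a real build sphinx-autoapi supplies these arguments when it emits the `autoapi-skip-member` event.

```python
# Standalone sketch of the skip handler from conf.py (trimmed skip list).
SKIP_CONTENTS = [
    ("class", "datafusion.DataFrame"),  # re-export of datafusion.dataframe.DataFrame
    ("module", "datafusion.common"),
    ("method", "datafusion.context.SessionContext.tables"),
]


def autoapi_skip_member_fn(app, what, name, obj, skip, options):
    """Force-skip listed (kind, qualified name) pairs; otherwise keep the incoming decision."""
    if (what, name) in SKIP_CONTENTS:
        skip = True
    return skip


# app and obj are not used by the handler, so None stands in for them here.
reexport = autoapi_skip_member_fn(None, "class", "datafusion.DataFrame", None, False, {})
canonical = autoapi_skip_member_fn(
    None, "class", "datafusion.dataframe.DataFrame", None, False, {}
)
print(reexport, canonical)
```

Matching on the `(what, name)` pair rather than on the name alone means the re-exported `datafusion.DataFrame` is hidden while the canonical `datafusion.dataframe.DataFrame` page, which the updated `:py:class:` and `:py:func:` roles point at, is still generated.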
2 changes: 0 additions & 2 deletions docs/source/index.rst
@@ -104,5 +104,3 @@ Example
:hidden:
:maxdepth: 1
:caption: API

api
14 changes: 8 additions & 6 deletions docs/source/user-guide/basics.rst
@@ -15,6 +15,8 @@
.. specific language governing permissions and limitations
.. under the License.
.. _user_guide_concepts:

Concepts
========

@@ -52,7 +54,7 @@ The first statement group:
# create a context
ctx = datafusion.SessionContext()
creates a :code:`SessionContext`, that is, the main interface for executing queries with DataFusion. It maintains the state
creates a :py:class:`~datafusion.context.SessionContext`, that is, the main interface for executing queries with DataFusion. It maintains the state
of the connection between a user and an instance of the DataFusion engine. Additionally it provides the following functionality:

- Create a DataFrame from a CSV or Parquet data source.
@@ -72,9 +74,9 @@ The second statement group creates a :code:`DataFrame`,
df = ctx.create_dataframe([[batch]])
A DataFrame refers to a (logical) set of rows that share the same column names, similar to a `Pandas DataFrame <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html>`_.
DataFrames are typically created by calling a method on :code:`SessionContext`, such as :code:`read_csv`, and can then be modified by
calling the transformation methods, such as :meth:`.DataFrame.filter`, :meth:`.DataFrame.select`, :meth:`.DataFrame.aggregate`,
and :meth:`.DataFrame.limit` to build up a query definition.
DataFrames are typically created by calling a method on :py:class:`~datafusion.context.SessionContext`, such as :code:`read_csv`, and can then be modified by
calling the transformation methods, such as :py:func:`~datafusion.dataframe.DataFrame.filter`, :py:func:`~datafusion.dataframe.DataFrame.select`, :py:func:`~datafusion.dataframe.DataFrame.aggregate`,
and :py:func:`~datafusion.dataframe.DataFrame.limit` to build up a query definition.

The third statement uses :code:`Expressions` to build up a query definition.

@@ -85,5 +87,5 @@ The third statement uses :code:`Expressions` to build up a query definition.
col("a") - col("b"),
)
Finally the :code:`collect` method converts the logical plan represented by the DataFrame into a physical plan and execute it,
collecting all results into a list of `RecordBatch <https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html>`_.
Finally the :py:func:`~datafusion.dataframe.DataFrame.collect` method converts the logical plan represented by the DataFrame into a physical plan and execute it,
collecting all results into a list of `RecordBatch <https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html>`_.
2 changes: 1 addition & 1 deletion docs/source/user-guide/common-operations/aggregations.rst
@@ -19,7 +19,7 @@ Aggregation
============

An aggregate or aggregation is a function where the values of multiple rows are processed together to form a single summary value.
For performing an aggregation, DataFusion provides the :meth:`.DataFrame.aggregate`
For performing an aggregation, DataFusion provides the :py:func:`~datafusion.dataframe.DataFrame.aggregate`

.. ipython:: python
8 changes: 4 additions & 4 deletions docs/source/user-guide/common-operations/basic-info.rst
@@ -34,26 +34,26 @@ In this section, you will learn how to display essential details of DataFrames u
})
df
Use :meth:`.DataFrame.limit` to view the top rows of the frame:
Use :py:func:`~datafusion.dataframe.DataFrame.limit` to view the top rows of the frame:

.. ipython:: python
df.limit(2)
Display the columns of the DataFrame using :meth:`.DataFrame.schema`:
Display the columns of the DataFrame using :py:func:`~datafusion.dataframe.DataFrame.schema`:

.. ipython:: python
df.schema()
The method :meth:`.DataFrame.to_pandas` uses pyarrow to convert to pandas DataFrame, by collecting the batches,
The method :py:func:`~datafusion.dataframe.DataFrame.to_pandas` uses pyarrow to convert to pandas DataFrame, by collecting the batches,
passing them to an Arrow table, and then converting them to a pandas DataFrame.

.. ipython:: python
df.to_pandas()
:meth:`.DataFrame.describe` shows a quick statistic summary of your data:
:py:func:`~datafusion.dataframe.DataFrame.describe` shows a quick statistic summary of your data:

.. ipython:: python
12 changes: 7 additions & 5 deletions docs/source/user-guide/common-operations/expressions.rst
@@ -15,6 +15,8 @@
.. specific language governing permissions and limitations
.. under the License.
.. _expressions:

Expressions
===========

@@ -26,16 +28,16 @@ concept shared across most compilers and databases.
Column
------

The first expression most new users will interact with is the Column, which is created by calling :func:`col`.
This expression represents a column within a DataFrame. The function :func:`col` takes as in input a string
The first expression most new users will interact with is the Column, which is created by calling :py:func:`~datafusion.col`.
This expression represents a column within a DataFrame. The function :py:func:`~datafusion.col` takes as in input a string
and returns an expression as it's output.

Literal
-------

Literal expressions represent a single value. These are helpful in a wide range of operations where
a specific, known value is of interest. You can create a literal expression using the function :func:`lit`.
The type of the object passed to the :func:`lit` function will be used to convert it to a known data type.
a specific, known value is of interest. You can create a literal expression using the function :py:func:`~datafusion.lit`.
The type of the object passed to the :py:func:`~datafusion.lit` function will be used to convert it to a known data type.

In the following example we create expressions for the column named `color` and the literal scalar string `red`.
The resultant variable `red_units` is itself also an expression.
@@ -62,7 +64,7 @@ Functions
---------

As mentioned before, most functions in DataFusion return an expression at their output. This allows us to create
a wide variety of expressions built up from other expressions. For example, :func:`.alias` is a function that takes
a wide variety of expressions built up from other expressions. For example, :py:func:`~datafusion.expr.Expr.alias` is a function that takes
as it input a single expression and returns an expression in which the name of the expression has changed.

The following example shows a series of expressions that are built up from functions operating on expressions.