From 826d8b07906b460f7e74795ad8dfd00a0a3ab364 Mon Sep 17 00:00:00 2001 From: tanruixiang Date: Mon, 18 Sep 2023 10:47:16 +0800 Subject: [PATCH] chore: fix catalog's usage docs error and add missing trait content --- docs/source/library-user-guide/catalogs.md | 49 ++++++++++++++++++---- 1 file changed, 42 insertions(+), 7 deletions(-) diff --git a/docs/source/library-user-guide/catalogs.md b/docs/source/library-user-guide/catalogs.md index 226e28332453..dfded7c86254 100644 --- a/docs/source/library-user-guide/catalogs.md +++ b/docs/source/library-user-guide/catalogs.md @@ -23,7 +23,7 @@ This section describes how to create and manage catalogs, schemas, and tables in ## General Concepts -Catalogs, schemas, and tables are organized in a hierarchy. A catalog contains schemas, and a schema contains tables. +Cataloglist, Catalogs, schemas, and tables are organized in a hierarchy. A cataloglist contains catalogs, a catalog contains schemas and a schema contains tables. DataFusion comes with a basic in memory catalog functionality in the [`catalog` module]. You can use these in memory implementations as is, or extend DataFusion with your own catalog implementations, for example based on local files or files on remote object storage. @@ -31,9 +31,9 @@ DataFusion comes with a basic in memory catalog functionality in the [`catalog` Similarly to other concepts in DataFusion, you'll implement various traits to create your own catalogs, schemas, and tables. The following sections describe the traits you'll need to implement. -The `CatalogProvider` trait has methods to set a schema to a name, get a schema by name, and list all schemas. The `SchemaProvider`, which can be registered with a `CatalogProvider`, has methods to set a table to a name, get a table by name, list all tables, deregister a table, and check for a table's existence. The `TableProvider` trait has methods to scan underlying data and use it in DataFusion. The `TableProvider` trait is covered in more detail [here](./custom-table-providers.md). +The `CatalogList` trait has methods to register new catalogs, get a catalog by name and list all catalogs .The `CatalogProvider` trait has methods to set a schema to a name, get a schema by name, and list all schemas. The `SchemaProvider`, which can be registered with a `CatalogProvider`, has methods to set a table to a name, get a table by name, list all tables, deregister a table, and check for a table's existence. The `TableProvider` trait has methods to scan underlying data and use it in DataFusion. The `TableProvider` trait is covered in more detail [here](./custom-table-providers.md). -In the following example, we'll implement an in memory catalog, starting with the `SchemaProvider` trait as we need one to register with the `CatalogProvider`. +In the following example, we'll implement an in memory catalog, starting with the `SchemaProvider` trait as we need one to register with the `CatalogProvider`. Finally we will implement `CatalogList` to register the `CatalogProvider`. ## Implementing `MemorySchemaProvider` @@ -116,14 +116,13 @@ impl SchemaProvider for Schema { } ``` -### Implementing `MemoryCatalogProvider` +## Implementing `MemoryCatalogProvider` As mentioned, the `CatalogProvider` can manage the schemas in a catalog, and the `MemoryCatalogProvider` is a simple implementation of the `CatalogProvider` trait. It stores schemas in a `DashMap`. ```rust -pub struct MemoryCatalogList { - /// Collection of catalogs containing schemas and ultimately TableProviders - pub catalogs: DashMap>, +pub struct MemoryCatalogProvider { + schemas: DashMap>, } ``` @@ -170,6 +169,41 @@ impl CatalogProvider for MemoryCatalogProvider { Again, this is fairly straightforward, as there's an underlying data structure to store the state, via key-value pairs. +## Implementing `MemoryCatalogList` + +```rust +pub struct MemoryCatalogList { + /// Collection of catalogs containing schemas and ultimately TableProviders + pub catalogs: DashMap>, +} +``` + +With that the `CataLogList` trait can be implemented. +```rust +impl CatalogList for MemoryCatalogList { + fn as_any(&self) -> &dyn Any { + self + } + + fn register_catalog( + &self, + name: String, + catalog: Arc, + ) -> Option> { + self.catalogs.insert(name, catalog) + } + + fn catalog_names(&self) -> Vec { + self.catalogs.iter().map(|c| c.key().clone()).collect() + } + + fn catalog(&self, name: &str) -> Option> { + self.catalogs.get(name).map(|c| c.value().clone()) + } +} +``` +Like other traits, it also maintains the mapping of the Catalog's name to the CatalogProvider. + ## Recap To recap, you need to: @@ -177,3 +211,4 @@ To recap, you need to: 1. Implement the `TableProvider` trait to create a table provider, or use an existing one. 2. Implement the `SchemaProvider` trait to create a schema provider, or use an existing one. 3. Implement the `CatalogProvider` trait to create a catalog provider, or use an existing one. +4. Implement the `CataLogList` trait to create a cataloglist, or use an existing one.