Commit 13ce44d

Merge pull request #121 from artefactory/enh/refactoring_architecture

Enh: refactoring repo organization

gabrielleberanger authored Mar 9, 2021
2 parents c256f09 + b59d687 commit 13ce44d
Showing 190 changed files with 4,127 additions and 2,482 deletions.
95 changes: 35 additions & 60 deletions README.md
@@ -1,90 +1,65 @@
# Nautilus Connectors Kit

**NCK is an E(T)L tool specialized in API data ingestion. It is accessible through a Command-Line Interface. The application allows you to easily extract, stream and load data (with minimal transformations) from the API source to the destination of your choice.**

As of now, the most common output format of data loaded by the application is .njson (i.e. a file of n lines, where each line is a JSON-like dictionary).

Official documentation is available [here](https://artefactory.github.io/nautilus-connectors-kit/).
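For illustration, two lines of a hypothetical .njson output file (field names are made up) could look like this:

```
{"date": "2021-03-01", "device": "desktop", "sessions": 1024}
{"date": "2021-03-02", "device": "mobile", "sessions": 997}
```

Each line is itself a valid JSON document, so the file can be parsed record by record without loading it entirely into memory.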

---

## Philosophy

The application is composed of **3 main components** (*implemented as Python classes*). When combined, these components act as an E(T)L pipeline, allowing you to stream data from a source to the destination of your choice:

- [Readers](nck/readers) read data from an API source and transform it into a stream object.
- [Streams](nck/streams) (*transparent to the end-user*) are local objects used by writers to process the individual records collected from the source.
- [Writers](nck/writers) write the output stream object to the destination of your choice.
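As a rough sketch of how these three components fit together, consider the following illustrative Python (these are *not* NCK's actual classes or signatures, just the pattern they implement):

```python
# Illustrative sketch of the reader/stream/writer pattern.
# These are NOT NCK's real classes: names and signatures are made up.

class FixtureReader:
    """A 'reader' yielding a stream built from an in-memory source."""

    def __init__(self, records):
        self.records = records

    def read(self):
        # A stream pairs a name with a generator of individual records.
        def result_generator():
            for record in self.records:
                yield record

        yield ("fixture_results", result_generator())


class ConsoleWriter:
    """A 'writer' sending every record of a stream to stdout."""

    def write(self, stream):
        name, records = stream
        for record in records:
            print(f"{name}: {record}")


# Combining a reader and a writer yields a minimal data pipeline.
reader = FixtureReader([{"id": 1}, {"id": 2}])
writer = ConsoleWriter()
for stream in reader.read():
    writer.write(stream)
```

Records flow lazily from the reader's generator to the writer, which is what lets the real application stream large API reports without holding them in memory.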

## Available connectors

As of now, the application offers the following Readers & Writers:

### Readers

- **Analytics**
  - Adobe Analytics 1.4
  - Adobe Analytics 2.0
  - Google Analytics
- **Advertising - Adserver**
  - Google Campaign Manager
- **Advertising - DSP**
  - Google Display & Video 360
  - The Trade Desk
- **Advertising - Search**
  - Google Ads
  - Google Search Ads 360
  - Google Search Console
  - Yandex Campaign
  - Yandex Statistics
- **Advertising - Social**
  - Facebook Marketing
  - MyTarget
  - Radarly
  - Twitter Ads
- **CRM**
  - SalesForce
- **Databases**
  - MySQL
- **DevTools**
  - Confluence
- **Files (.csv, .njson)**
  - Amazon S3
  - Google Cloud Storage
  - Google Sheets

### Writers

- **Data Warehouses**
  - Google BigQuery
- **Debugging**
  - Console
- **Files (.njson)**
  - Amazon S3
  - Google Cloud Storage
  - Local file

*A data connector could be, for instance, the combination of a Google Analytics reader + a Google Cloud Storage writer, collecting data from the Google Analytics API, and storing output stream records into a Google Cloud Storage bucket.*
76 changes: 58 additions & 18 deletions docs/source/getting_started.rst
@@ -181,21 +181,39 @@ How to develop a new reader

To create a new reader, you should:

1. Create a ``nck/readers/<SOURCE_NAME>/`` directory with the following structure:

.. code-block:: shell

    - nck/
    -- readers/
    --- <SOURCE_NAME>/
    ---- cli.py
    ---- reader.py
    ---- helper.py    # Optional
    ---- config.py    # Optional

``cli.py``

This module should implement a click-decorated reader function:

- The reader function should be decorated with: a ``@click.command()`` decorator, several ``@click.option()`` decorators (*one for each input provided by end-users*) and a ``@processor()`` decorator (*preventing secrets from appearing in logs*). For further information on how to implement these decorators, please refer to the `click documentation <https://click.palletsprojects.com/en/7.x/>`__.
- The reader function should return a reader class (*more details below*). The source prefix of each option will be removed when passed to the reader class, using the ``extract_args()`` function.
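As an illustration, a minimal ``cli.py`` could look as follows. This is a sketch, not NCK's actual code: the reader name is invented, and ``@processor()`` and ``extract_args()`` are replaced by simplified stand-ins (the real helpers live in the NCK code base).

.. code-block:: python

    import click

    def processor(*secret_names):
        # Simplified stand-in for NCK's @processor():
        # the real decorator prevents secrets from appearing in logs.
        def decorator(f):
            return f
        return decorator

    def extract_args(prefix, kwargs):
        # Simplified stand-in for NCK's extract_args():
        # strips the source prefix, e.g. "example_api_key" -> "api_key".
        return {key[len(prefix):]: value for key, value in kwargs.items()}

    class ExampleReader:
        def __init__(self, api_key, start_date):
            self.api_key = api_key
            self.start_date = start_date

    @click.command(name="read_example")
    @click.option("--example-api-key", required=True)
    @click.option("--example-start-date", default=None)
    @processor("example_api_key")
    def example(**kwargs):
        # Options arrive as example_api_key / example_start_date, and are
        # passed to the reader class without their source prefix.
        return ExampleReader(**extract_args("example_", kwargs))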

``reader.py``

This module should implement a reader class:

- Class attributes should be the previously defined click options.
- The class should have a ``read()`` method, yielding a stream object. This stream object can be chosen from the `available stream classes <https://github.com/artefactory/nautilus-connectors-kit/tree/dev/nck/streams>`__, and has 2 attributes: a stream name, and a source generator function named ``result_generator()``, yielding individual source records.
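A minimal ``reader.py`` following this contract could look as follows. Again, this is a sketch: the source records are hard-coded, and ``SimpleStream`` is a simplified stand-in for NCK's actual stream classes.

.. code-block:: python

    class SimpleStream:
        # Simplified stand-in for an NCK stream class:
        # a stream name plus a generator of individual records.
        def __init__(self, name, source_generator):
            self.name = name
            self.source_generator = source_generator

    class ExampleReader:
        def __init__(self, api_key):
            self.api_key = api_key

        def _fetch_records(self):
            # A real reader would request the source API here.
            yield {"campaign": "A", "clicks": 10}
            yield {"campaign": "B", "clicks": 7}

        def read(self):
            def result_generator():
                for record in self._fetch_records():
                    yield record

            yield SimpleStream("results_example", result_generator())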

``helper.py`` (Optional)

This module gathers all helper functions used in the ``reader.py`` module.

``config.py`` (Optional)

This module gathers all configuration variables.

2. In parallel, create unit tests for your methods under the ``tests/`` directory

@@ -204,8 +222,8 @@
4. Complete the documentation:

- Add your reader to the list of existing readers in the :ref:`overview:Available Connectors` section.
- Create dedicated documentation for your reader CLI command on the :ref:`readers:Readers` page. It should include the following sections: *Source API - How to obtain credentials - Quickstart - Command name - Command options*
- Add your reader to the reader list in the README, at the root of the GitHub project.

---------------------------
How to develop a new stream
@@ -228,24 +246,46 @@ How to develop a new writer

To develop a new writer, you should:

1. Create a ``nck/writers/<DESTINATION_NAME>/`` directory with the following structure:

.. code-block:: shell

    - nck/
    -- writers/
    --- <DESTINATION_NAME>/
    ---- cli.py
    ---- writer.py
    ---- helper.py    # Optional
    ---- config.py    # Optional

``cli.py``

This module should implement a click-decorated writer function:

- The writer function should be decorated with: a ``@click.command()`` decorator, several ``@click.option()`` decorators (*one for each input provided by end-users*) and a ``@processor()`` decorator (*preventing secrets from appearing in logs*). For further information on how to implement these decorators, please refer to the `click documentation <https://click.palletsprojects.com/en/7.x/>`__.
- The writer function should return a writer class (*more details below*). The destination prefix of each option will be removed when passed to the writer class, using the ``extract_args()`` function.

``writer.py``

This module should implement a writer class:

- Class attributes should be the previously defined click options.
- The class should have a ``write()`` method, writing the stream object to the destination.
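For illustration, a minimal ``writer.py`` could look as follows. This is a sketch with invented names, assuming the incoming stream object exposes a ``name`` and a record generator named ``source_generator``:

.. code-block:: python

    import json

    class LocalFileWriter:
        """Sketch of a writer dumping a stream to a local .njson file."""

        def __init__(self, file_path):
            self.file_path = file_path

        def write(self, stream):
            # Write one JSON record per line (.njson format).
            with open(self.file_path, "w") as f:
                for record in stream.source_generator:
                    f.write(json.dumps(record) + "\n")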

``helper.py`` (Optional)

This module gathers all helper functions used in the ``writer.py`` module.

``config.py`` (Optional)

This module gathers all configuration variables.

2. In parallel, create unit tests for your methods under the ``tests/`` directory

3. Add your click-decorated writer function to the ``nck/writers/__init__.py`` file

4. Complete the documentation:

- Add your writer to the list of existing writers in the :ref:`overview:Available Connectors` section.
- Create dedicated documentation for your writer CLI command on the :ref:`writers:Writers` page. It should include the following sections: *Quickstart - Command name - Command options*
- Add your writer to the writer list in the README, at the root of the GitHub project.
92 changes: 33 additions & 59 deletions docs/source/overview.rst
@@ -2,13 +2,15 @@
Overview
########

**NCK is an E(T)L tool specialized in API data ingestion. It is accessible through a Command-Line Interface. The application allows you to easily extract, stream and load data (with minimal transformations) from the API source to the destination of your choice.**

As of now, the most common output format of data loaded by the application is .njson (i.e. a file of n lines, where each line is a JSON-like dictionary).

==========
Philosophy
==========

The application is composed of **3 main components** (*implemented as Python classes*). When combined, these components act as an E(T)L pipeline, allowing you to stream data from a source to the destination of your choice:

- :ref:`readers:Readers` read data from an API source and transform it into a stream object.
- :ref:`streams:Streams` (*transparent to the end-user*) are local objects used by writers to process the individual records collected from the source.
@@ -18,80 +20,52 @@
Available connectors
====================

As of now, the application offers the following Readers & Writers:

*******
Readers
*******

- **Analytics**

  - Adobe Analytics 1.4
  - Adobe Analytics 2.0
  - Google Analytics

- **Advertising - Adserver**

  - Google Campaign Manager

- **Advertising - DSP**

  - Google Display & Video 360
  - The Trade Desk

- **Advertising - Search**

  - Google Ads
  - Google Search Ads 360
  - Google Search Console
  - Yandex Campaign
  - Yandex Statistics

- **Advertising - Social**

  - Facebook Marketing
  - MyTarget
  - Radarly
  - Twitter Ads

- **CRM**

  - SalesForce

- **Databases**

  - MySQL

- **DevTools**

  - Confluence

- **Files (.csv, .njson)**

  - Amazon S3
  - Google Cloud Storage
  - Google Sheets

*******
Writers
*******

- **Data Warehouses**

  - Google BigQuery

- **Debugging**

  - Console

- **Files (.njson)**

  - Amazon S3
  - Google Cloud Storage
  - Local file

*A data connector could be, for instance, the combination of a Google Analytics reader + a Google Cloud Storage writer, collecting data from the Google Analytics API, and storing output stream records into a Google Cloud Storage bucket.*