PHP ETL is an ETL (Extract, Transform, Load) library that lets you transform data in a simple and efficient way.
It comes in 2 PHP components: the rule engine and the ETL chain.
The rule engine provides configuration-based transformations for a given piece of data.
It is integrated into the ETL through the RuleTransformOperation.
The rule engine can also be used standalone; see the docs.
An ETL chain is described by chain operations and items. A chain operation holds the logic, which means it can:
- Extract data (possibly duplicating the item)
- Transform data
- Load the data somewhere.
Data propagates through the ETL operations as items. There are different types of items (we will detail this later).
Each chain operation consumes one item and produces either a new item or an iterator. The goal is to always process data one record at a time. For example, when importing customers, we try never to hold the data of more than one customer in memory.
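The one-record-at-a-time idea can be sketched with plain PHP generators. This is only an illustration of the principle, not the library's own classes: `readCustomers` and `upperCaseName` are hypothetical names, and the in-memory stream stands in for a real file.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical extractor: reads one line at a time and yields it as an item,
 * so only the current customer is held in memory.
 */
function readCustomers($stream): Generator
{
    while (($line = fgets($stream)) !== false) {
        $line = trim($line);
        if ($line !== '') {
            yield ['name' => $line];
        }
    }
}

/** Hypothetical transform: consumes one item and yields one new item. */
function upperCaseName(iterable $items): Generator
{
    foreach ($items as $item) {
        $item['name'] = strtoupper($item['name']);
        yield $item;
    }
}

// An in-memory stream stands in for a customer file.
$stream = fopen('php://memory', 'r+');
fwrite($stream, "alice\nbob\n");
rewind($stream);

$result = iterator_to_array(upperCaseName(readCustomers($stream)), false);
fclose($stream);
```

Because both functions are generators, each customer is read, transformed and handed on before the next line is read.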
More detailed real-world use cases with sample data appear further in the document.
In the simplest case, the chain receives an iterator containing 2 items as input, and both items are processed by each chain operation. This could be, for example, a list of customers. Each operation changes the items.
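As a minimal sketch of this simplest case, assuming operations are plain callables (the library uses its own operation classes; `runChain` is a hypothetical helper):

```php
<?php
declare(strict_types=1);

/** Hypothetical runner: passes each item through every operation, in order. */
function runChain(iterable $items, array $operations): Generator
{
    foreach ($items as $item) {
        foreach ($operations as $operation) {
            $item = $operation($item);
        }
        yield $item;
    }
}

// The input iterator contains 2 items (two customers).
$customers = new ArrayIterator([
    ['name' => 'alice'],
    ['name' => 'bob'],
]);

// Two operations; each one changes the item it receives.
$operations = [
    fn (array $c): array => ['name' => ucfirst($c['name'])],
    fn (array $c): array => $c + ['imported' => true],
];

$result = iterator_to_array(runChain($customers, $operations), false);
```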
In the next case, the iterator sends a single item. The first operation then returns GroupedItems containing 2 items. For instance, the first item could be a customer, and operation 1 fetches each of that customer's orders.
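The one-item-to-many behaviour can be approximated with a generator that yields several items for each input item. The sketch below does not use the library's GroupedItems class; `fetchOrders` and `expand` are hypothetical, and the order ids are made-up sample data.

```php
<?php
declare(strict_types=1);

/** Hypothetical operation: turns one customer item into one item per order. */
function fetchOrders(array $customer): Generator
{
    foreach ($customer['order_ids'] as $orderId) {
        yield ['customer' => $customer['name'], 'order_id' => $orderId];
    }
}

/** Hypothetical helper: flattens the items produced by the operation. */
function expand(iterable $items, callable $operation): Generator
{
    foreach ($items as $item) {
        yield from $operation($item);
    }
}

// A single customer goes in, two order items come out.
$input = [['name' => 'alice', 'order_ids' => [101, 102]]];
$result = iterator_to_array(expand($input, 'fetchOrders'), false);
```

`yield from` keeps the inner generator's keys, which is why `iterator_to_array` is called with `false` to reindex the result.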
We can also group items to make aggregations. The chain receives an iterator containing 2 items, and the first operation processes both. It breaks the chain for the first item, then returns an aggregation of item 1 and item 2. This can be used, for example, to count the number of customers. This kind of grouping can use more memory and should therefore be used with care.
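A counting aggregation can be sketched as a generator that yields nothing while it consumes items and emits a single result at the end. `countCustomers` is a hypothetical name, not an operation shipped by the library.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical aggregation: nothing is yielded while items are consumed
 * (the chain stops for them); one aggregated item is emitted at the end.
 */
function countCustomers(iterable $items): Generator
{
    $count = 0;
    foreach ($items as $item) {
        $count++;
    }
    yield ['customer_count' => $count];
}

$customers = [['name' => 'alice'], ['name' => 'bob']];
$result = iterator_to_array(countCustomers($customers), false);
```

A counter only keeps an integer in memory. An aggregation that stores the items themselves grows with the input, which is where the memory warning applies.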
Chains can also be split, which allows 2 different operations to be executed on the same item.
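A split can be sketched as handing the same item to a side branch before the main chain continues. Again, this is an assumption-laden illustration: `splitChain` and both callables are hypothetical and do not reflect the library's split operation.

```php
<?php
declare(strict_types=1);

/** Hypothetical split: the branch and the main chain both receive each item. */
function splitChain(iterable $items, callable $main, callable $branch): Generator
{
    foreach ($items as $item) {
        $branch($item);      // side branch works on its own copy of the item
        yield $main($item);  // main chain continues with the same item
    }
}

$log = [];
$branch = function (array $c) use (&$log): void {
    $log[] = $c['name'];     // e.g. write the name to a report
};
$main = fn (array $c): array => $c + ['exported' => true];

$customers = [['name' => 'alice'], ['name' => 'bob']];
$result = iterator_to_array(splitChain($customers, $main, $branch), false);
```

PHP passes arrays by value, so in this sketch the branch cannot alter the item seen by the main chain.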
There are 2 ways of writing a chain: either you code it, or you describe it in a YAML file.
- Using PHP code to instantiate each operation yourself; this is not recommended.
- Using YAML files to describe the chain.
Please see the doc on describing chains using YAML configurations. You can also check the doc on sub-chains and more complex cases.
Please refer to the Custom Operations doc
Please refer to the FAQ