diff --git a/.gitignore b/.gitignore
index 4d6e9263..dfedcde4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,3 +2,4 @@
docs/sample/docker/data/custom/csv
docs/sample/docker/data/custom/json
docs/sample/docker/data/custom/parquet
+docs/sample/docker/data/custom/generated
diff --git a/docs/connections/connections.md b/docs/connections/connections.md
index 146a00f8..677ed63e 100644
--- a/docs/connections/connections.md
+++ b/docs/connections/connections.md
@@ -4,13 +4,13 @@ Details of all the connection configuration supported can be found in the below
## Supported Data Connections
-| Data Source Type | Data Source |
-|------------------|----------------------------|
-| Database | Postgres, MySQL, Cassandra |
-| File | CSV, JSON, ORC, Parquet |
-| Kafka | Kafka |
-| JMS | Solace |
-| HTTP | GET, PUT, POST |
+| Data Source Type | Data Source |
+|------------------|-----------------------------------------------------|
+| Database | Postgres, MySQL, Cassandra |
+| File | CSV, JSON, ORC, Parquet |
+| Kafka | Kafka |
+| JMS | Solace |
+| HTTP | GET, PUT, POST, DELETE, PATCH, HEAD, TRACE, OPTIONS |
All connection details follow the same pattern.
@@ -198,8 +198,10 @@ GRANT INSERT ON
Define your Kafka bootstrap server to connect and send generated data to corresponding topics. Topic gets set at a step
+level.
+Define a URL to connect to when sending HTTP requests. Define any username and/or password needed for the HTTP requests.
When defining a total count within the
In the example below, we have
@@ -960,8 +960,10 @@
expression:"#{Address.city}/#{Demographic.maritalStatus}" | Will generate a string based on the faker expression provided. All possible faker expressions can be found [here](../sample/datafaker/expressions.txt)
Expression has to be in format `#{<faker expression name>}`
Supported Data Connections
HTTP
-GET, PUT, POST
+GET, PUT, POST, DELETE, PATCH, HEAD, TRACE, OPTIONS
Cassandra
Kafka
-
-Further details can be found here
+Further details can be
+found here
kafka {
kafka {
kafka.bootstrap.servers = "localhost:9092"
@@ -992,15 +994,13 @@
JMS
}
HTTP
-
-Later, can have the ability to define generated data as part of the URL.
+The url is defined in the tasks to allow for generated data to be populated in the url.
http {
customer_api {
- url = "http://localhost:80/get"
- url = ${?HTTP_URL}
- user = "admin" #optional
+ user = "admin"
user = ${?HTTP_USER}
- password = "admin" #optional
+ password = "admin"
password = ${?HTTP_PASSWORD}
}
}
diff --git a/site/generators/count/index.html b/site/generators/count/index.html
index a5fe4d94..56b2ccb1 100644
--- a/site/generators/count/index.html
+++ b/site/generators/count/index.html
@@ -605,7 +605,7 @@
Per Column Count
Total
perColumn
configuration, it translates to only creating (count.total * count.perColumn.total)
records.
This is a fixed number of records that will be generated each time, with no variation between runs.
-count.total=1000
-and count.perColumn.total=2
-. Which means that 1000 * 2=2000
-records will be generated
+count.total = 1000
+and count.perColumn.total = 2
+. Which means that 1000 * 2 = 2000
+records will be generated
for this CSV file every time data gets generated.
name: "csv_file"
steps:
diff --git a/site/get-started/docker/index.html b/site/get-started/docker/index.html
index 87f3da24..c04cc52d 100644
--- a/site/get-started/docker/index.html
+++ b/site/get-started/docker/index.html
@@ -657,7 +657,7 @@
Generate plan and tasks
cat data/custom/generated/plan/plan_*
Generate data with record tracking
-APPLICATION_CONFIG_PATH=/opt/app/custom/application-dvd.conf ENABLE_GENERATE_DATA=true ENABLE_GENERATE_PLAN_AND_TASKS=false ENABLE_RECORD_TRACKING=true DATA_SOURCE=postgresdvd PLAN=generated/plan/plan_20230803T040203Z docker-compose up -d datacaterer
+APPLICATION_CONFIG_PATH=/opt/app/custom/application-dvd.conf ENABLE_GENERATE_DATA=true ENABLE_GENERATE_PLAN_AND_TASKS=false ENABLE_RECORD_TRACKING=true DATA_SOURCE=postgresdvd PLAN=generated/plan/$(ls data/custom/generated/plan/ | grep plan | head -1 | awk -F " " '{print $NF}' | sed 's/\.yaml//g') docker-compose up -d datacaterer
Delete the generated data
APPLICATION_CONFIG_PATH=/opt/app/custom/application-dvd.conf ENABLE_GENERATE_DATA=false ENABLE_GENERATE_PLAN_AND_TASKS=false ENABLE_DELETE_GENERATED_RECORDS=true DATA_SOURCE=postgresdvd docker-compose up -d datacaterer
diff --git a/site/index.html b/site/index.html
index f4a1c0a8..59da04b4 100644
--- a/site/index.html
+++ b/site/index.html
@@ -454,6 +454,12 @@
Home
your local laptop.
Just define your data source connections and data will be generated.
It can also be manually altered to produce data or scenarios the way you want.
+Main features of the data generator include:
+- Ability to gather metadata about data sources
+- Generate data in either batch or real-time
+- Maintain referential integrity across generated data
+- Create custom data generation scenarios
+- Delete generated data
diff --git a/site/sample/docker/data/custom/application-dvd.conf b/site/sample/docker/data/custom/application-dvd.conf
index 980cdffc..2694baab 100644
--- a/site/sample/docker/data/custom/application-dvd.conf
+++ b/site/sample/docker/data/custom/application-dvd.conf
@@ -9,6 +9,8 @@ flags {
  enableRecordTracking = ${?ENABLE_RECORD_TRACKING}
  enableDeleteGeneratedRecords = false
  enableDeleteGeneratedRecords = ${?ENABLE_DELETE_GENERATED_RECORDS}
+  enableFailOnError = true
+  enableFailOnError = ${?ENABLE_FAIL_ON_ERROR}
}
folders {
@@ -46,7 +48,7 @@ spark {
jdbc {
  postgresDvd {
-    url = "jdbc:postgresql://postgresdvd:5432/dvdrental"
+    url = "jdbc:postgresql://localhost:5432/dvdrental"
    url = ${?POSTGRES_URL}
    user = "postgres"
    user = ${?POSTGRES_USER}
diff --git a/site/sample/docker/data/custom/application.conf b/site/sample/docker/data/custom/application.conf
index 1254ffbf..7ff229da 100644
--- a/site/sample/docker/data/custom/application.conf
+++ b/site/sample/docker/data/custom/application.conf
@@ -61,7 +61,7 @@ parquet {
jdbc {
  postgresCustomer {
-    url = "jdbc:postgresql://postgres:5432/customer"
+    url = "jdbc:postgresql://postgresserver:5432/customer"
    url = ${?POSTGRES_URL}
    user = "postgres"
    user = ${?POSTGRES_USER}
@@ -70,7 +70,7 @@ jdbc {
    driver = "org.postgresql.Driver"
  }
  mysql {
-    url = "jdbc:mysql://mysql:3306/customer"
+    url = "jdbc:mysql://mysqlserver:3306/customer"
    url = ${?MYSQL_URL}
    user = "root"
    user = ${?MYSQL_USERNAME}
@@ -83,7 +83,7 @@ jdbc {
org.apache.spark.sql.cassandra {
  cassandra {
-    spark.cassandra.connection.host = "cassandra"
+    spark.cassandra.connection.host = "cassandraserver"
    spark.cassandra.connection.host = ${?CASSANDRA_HOST}
    spark.cassandra.connection.port = "9042"
    spark.cassandra.connection.port = ${?CASSANDRA_PORT}
@@ -96,8 +96,6 @@ org.apache.spark.sql.cassandra {
http {
  httpbin {
-    url = "http://httpbin:80/put"
-    url = ${?HTTP_URL}
  }
}
diff --git a/site/sample/docker/data/custom/clean_dvd.sql b/site/sample/docker/data/custom/clean_dvd.sql
index ba8e6d49..81e3af99 100644
--- a/site/sample/docker/data/custom/clean_dvd.sql
+++ b/site/sample/docker/data/custom/clean_dvd.sql
@@ -1,3 +1,9 @@
+delete from payment where payment_id > 27502;
+delete from rental where rental_id > 10005;
+delete from customer where customer_id > 599;
+delete from store where store_id > 2;
+delete from staff where staff_id > 2;
+delete from address where address_id > 605;
delete from film_category where category_id > 16;
delete from inventory where film_id > 1000;
delete from film where language_id > 16;
diff --git a/site/sample/docker/data/custom/generated/plan/plan_20230803T040203Z.yaml b/site/sample/docker/data/custom/generated/plan/plan_20230803T040203Z.yaml
deleted file mode 100644
index 614a9bb5..00000000
--- a/site/sample/docker/data/custom/generated/plan/plan_20230803T040203Z.yaml
+++ /dev/null
@@ -1,38 +0,0 @@
----
-name: "plan_20230803T040203Z"
-description: "Generated plan"
-tasks:
-- name: "postgresDvd"
-  dataSourceName: "postgresDvd"
-  enabled: true
-sinkOptions:
-  foreignKeys:
-    postgresDvd.postgresDvd_public_category.category_id:
-    - "postgresDvd.postgresDvd_public_film_category.category_id"
-    postgresDvd.postgresDvd_public_city.city_id:
-    - "postgresDvd.postgresDvd_public_address.city_id"
-    postgresDvd.postgresDvd_public_language.language_id:
-    - "postgresDvd.postgresDvd_public_film.language_id"
-    postgresDvd.postgresDvd_public_address.address_id:
-    - "postgresDvd.postgresDvd_public_customer.address_id"
-    - "postgresDvd.postgresDvd_public_staff.address_id"
-    - "postgresDvd.postgresDvd_public_store.address_id"
-    postgresDvd.postgresDvd_public_actor.actor_id:
-    - "postgresDvd.postgresDvd_public_film_actor.actor_id"
-    postgresDvd.postgresDvd_public_inventory.inventory_id:
-    - "postgresDvd.postgresDvd_public_rental.inventory_id"
-    postgresDvd.postgresDvd_public_country.country_id:
-    - "postgresDvd.postgresDvd_public_city.country_id"
-    postgresDvd.postgresDvd_public_customer.customer_id:
-    - "postgresDvd.postgresDvd_public_payment.customer_id"
-    - "postgresDvd.postgresDvd_public_rental.customer_id"
-    postgresDvd.postgresDvd_public_rental.rental_id:
-    - "postgresDvd.postgresDvd_public_payment.rental_id"
-    postgresDvd.postgresDvd_public_staff.staff_id:
-    - "postgresDvd.postgresDvd_public_payment.staff_id"
-    - "postgresDvd.postgresDvd_public_rental.staff_id"
-    - "postgresDvd.postgresDvd_public_store.manager_staff_id"
-    postgresDvd.postgresDvd_public_film.film_id:
-    - "postgresDvd.postgresDvd_public_film_actor.film_id"
-    - "postgresDvd.postgresDvd_public_film_category.film_id"
-    - "postgresDvd.postgresDvd_public_inventory.film_id"
diff --git a/docs/sample/docker/data/custom/generated/plan/plan_20230803T040203Z.yaml b/site/sample/docker/data/custom/generated/plan/plan_20230809T033317Z.yaml
similarity index 98%
rename from docs/sample/docker/data/custom/generated/plan/plan_20230803T040203Z.yaml
rename to site/sample/docker/data/custom/generated/plan/plan_20230809T033317Z.yaml
index 614a9bb5..f5083626 100644
--- a/docs/sample/docker/data/custom/generated/plan/plan_20230803T040203Z.yaml
+++ b/site/sample/docker/data/custom/generated/plan/plan_20230809T033317Z.yaml
@@ -1,5 +1,5 @@
---
-name: "plan_20230803T040203Z"
+name: "plan_20230809T033317Z"
description: "Generated plan"
tasks:
- name: "postgresDvd"
diff --git a/site/sample/docker/data/custom/generated/task/postgresDvd_task.yaml b/site/sample/docker/data/custom/generated/task/postgresDvd_task.yaml
index ef8abaad..e8483e0a 100644
--- a/site/sample/docker/data/custom/generated/task/postgresDvd_task.yaml
+++ b/site/sample/docker/data/custom/generated/task/postgresDvd_task.yaml
@@ -11,7 +11,7 @@ steps:
    - name: "actor_id"
-      type: "integer"
+      type: "int"
@@ -88,606 +88,6 @@ steps:
-#- name: "postgresDvd_public_actor_info"
-#- name: "postgresDvd_public_customer_list"
-#- name: "postgresDvd_public_film_list"
-#- name: "postgresDvd_public_nicer_but_slower_film_list"
-#- name: "postgresDvd_public_sales_by_film_category"
@@ -698,7 +98,7 @@ steps:
    - name: "store_id"
-      type: "integer"
+      type: "int"
@@ -721,7 +121,7 @@ steps:
    - name: "manager_staff_id"
-      type: "short"
+      type: "smallint"
@@ -742,7 +142,7 @@ steps:
    - name: "address_id"
-      type: "short"
+      type: "smallint"
@@ -783,219 +183,6 @@ steps:
-#- name: "postgresDvd_public_sales_by_store"
-#- name: "postgresDvd_public_staff_list"
@@ -1006,7 +193,7 @@ steps:
    - name: "address_id"
-      type: "integer"
+      type: "int"
@@ -1079,7 +266,7 @@ steps:
    - name: "city_id"
-      type: "short"
+      type: "smallint"
@@ -1162,7 +349,7 @@ steps:
    - name: "category_id"
-      type: "integer"
+      type: "int"
@@ -1232,7 +419,7 @@ steps:
    - name: "city_id"
-      type: "integer"
+      type: "int"
@@ -1272,7 +459,7 @@ steps:
    - name: "country_id"
-      type: "short"
+      type: "smallint"
@@ -1323,7 +510,7 @@ steps:
    - name: "country_id"
-      type: "integer"
+      type: "int"
@@ -1393,7 +580,7 @@ steps:
    - name: "customer_id"
-      type: "integer"
+      type: "int"
@@ -1416,7 +603,7 @@ steps:
    - name: "store_id"
-      type: "short"
+      type: "smallint"
@@ -1488,7 +675,7 @@ steps:
    - name: "address_id"
-      type: "short"
+      type: "smallint"
@@ -1568,7 +755,7 @@ steps:
    - name: "active"
-      type: "integer"
+      type: "int"
@@ -1599,7 +786,7 @@ steps:
    - name: "actor_id"
-      type: "short"
+      type: "smallint"
@@ -1822,7 +1009,7 @@ steps:
    - name: "film_id"
-      type: "short"
+      type: "smallint"
@@ -1876,7 +1063,7 @@ steps:
    - name: "film_id"
-      type: "short"
+      type: "smallint"
@@ -1898,7 +1085,7 @@ steps:
    - name: "category_id"
-      type: "short"
+      type: "smallint"
@@ -1969,7 +1156,7 @@ steps:
    - name: "inventory_id"
-      type: "integer"
+      type: "int"
@@ -1992,7 +1179,7 @@ steps:
    - name: "film_id"
-      type: "short"
+      type: "smallint"
@@ -2013,7 +1200,7 @@ steps:
    - name: "store_id"
-      type: "short"
+      type: "smallint"
@@ -2069,7 +1256,7 @@ steps:
    - name: "language_id"
-      type: "integer"
+      type: "int"
@@ -2139,7 +1326,7 @@ steps:
    - name: "rental_id"
-      type: "integer"
+      type: "int"
@@ -2181,7 +1368,7 @@ steps:
    - name: "inventory_id"
-      type: "integer"
+      type: "int"
@@ -2202,7 +1389,7 @@ steps:
    - name: "customer_id"
-      type: "short"
+      type: "smallint"
@@ -2842,7 +2029,7 @@ steps:
    - name: "staff_id"
-      type: "short"
+      type: "smallint"
@@ -2898,7 +2085,7 @@ steps:
    - name: "staff_id"
-      type: "integer"
+      type: "int"
@@ -2955,7 +2142,7 @@ steps:
    - name: "address_id"
-      type: "short"
+      type: "smallint"
@@ -2993,7 +2180,7 @@ steps:
    - name: "store_id"
-      type: "short"
+      type: "smallint"
@@ -3112,7 +2299,7 @@ steps:
    - name: "payment_id"
-      type: "integer"
+      type: "int"
@@ -3135,7 +2322,7 @@ steps:
    - name: "customer_id"
-      type: "short"
+      type: "smallint"
@@ -3756,7 +2943,7 @@ steps:
    - name: "staff_id"
-      type: "short"
+      type: "smallint"
@@ -3780,7 +2967,7 @@ steps:
    - name: "rental_id"
-      type: "integer"
+      type: "int"
@@ -3864,7 +3051,7 @@ steps:
    - name: "film_id"
-      type: "integer"
+      type: "int"
@@ -3919,7 +3106,7 @@ steps:
    - name: "release_year"
-      type: "integer"
+      type: "int"
@@ -3942,7 +3129,7 @@ steps:
    - name: "language_id"
-      type: "short"
+      type: "smallint"
@@ -3965,7 +3152,7 @@ steps:
    - name: "rental_duration"
-      type: "short"
+      type: "smallint"
@@ -4019,7 +3206,7 @@ steps:
    - name: "length"
-      type: "short"
+      type: "smallint"
@@ -4129,7 +3316,7 @@ steps:
    - name: "special_features"
-      type: "array
All you need to do is define which data source you want to run with via a command like below:
DATA_SOURCE=postgres docker-compose up -d datacaterer
-You can change DATA_SOURCE
to one of the following:
+You can change DATA_SOURCE
to one of the following:
- postgres
- mysql
- cassandra
- solace
-- kafka
Using Data Caterer, you have the ability to generate production-like data based on any source/target system, whether it be a CSV file, database table, etc., anywhere you want the data to be, whether in a test environment or even on your local laptop. Just define your data source connections and data will be generated. It can also be manually altered to produce data or scenarios the way you want.
"},{"location":"advanced/advanced/","title":"Advanced use cases","text":""},{"location":"advanced/advanced/#special-data-formats","title":"Special data formats","text":"There are many options available for you to use when you have a scenario when data has to be a certain format.
If you have a use case where you require a column's value to match in another data set, this can be achieved in the plan definition. For example, if I have the column account_number
in a data source named customer-postgres
and column account_id
in transaction-cassandra
,
sinkOptions:\n foreignKeys:\n #The foreign key name with naming convention [dataSourceName].[taskName].[columnName]\n \"customer-postgres.accounts.account_number\":\n #List of columns to match with same naming convention\n - \"transaction-cassandra.transactions.account_id\"\n
Sample can be found here. You can define any number of foreign key relationships as you want.
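For instance, a sketch of a plan with two relationships; the second relationship (balance-postgres.balances.customer_id) uses a hypothetical data source and task name purely for illustration, following the same [dataSourceName].[taskName].[columnName] convention:
sinkOptions:
  foreignKeys:
    #original relationship from the sample above
    "customer-postgres.accounts.account_number":
      - "transaction-cassandra.transactions.account_id"
    #hypothetical second relationship, same naming convention
    "customer-postgres.accounts.customer_id":
      - "balance-postgres.balances.customer_id"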
"},{"location":"advanced/advanced/#edge-cases","title":"Edge cases","text":"For each given data type, there are edge cases which can cause issues when your application processes the data. This can be controlled at a column level by including the following flag in the generator options:
fields:\n - name: \"amount\"\n type: \"double\"\n generator:\n type: \"random\"\n options:\n enableEdgeCases: \"true\" \n
If you want to know all the possible edge cases for each data type, you can check the documentation here.
"},{"location":"advanced/advanced/#scenario-testing","title":"Scenario testing","text":"You can create specific scenarios by adjusting the metadata found in the plan and tasks to your liking. For example, if you had two data sources, a Postgres database and a parquet file, and you wanted to save account data into Postgres and transactions related to those accounts into a parquet file. You can alter the status
column in the account data to only generate open
accounts and define a foreign key between Postgres and parquet to ensure the same account_id
is being used. Then in the parquet task, define 1 to 10 transactions per account_id
to be generated.
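As a sketch of the parquet side, the 1 to 10 transactions per account_id can be expressed with the perColumn count generator covered in the Record Count section below (the step name and path here are hypothetical):
steps:
  - name: "transactions"
    type: "parquet"
    options:
      path: "/data/parquet/transactions"
    count:
      total: 1000
      perColumn:
        columnNames:
          - "account_id"
        generator:
          type: "random"
          options:
            minValue: 1
            maxValue: 10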
Postgres account generation example task Parquet transaction generation example task Plan
"},{"location":"advanced/advanced/#storing-plantasks-in-cloud-storage","title":"Storing plan/task(s) in cloud storage","text":"You can generate and store the plan/task files inside either AWS S3, Azure Blob Storage or Google GCS. This can be controlled via configuration set in the application.conf
file where you can set something like the below:
folders {\n generatedPlanAndTaskFolderPath = \"s3a://my-bucket/data-caterer/generated\"\n planFilePath = \"s3a://my-bucket/data-caterer/generated/plan/customer-create-plan.yaml\"\n taskFolderPath = \"s3a://my-bucket/data-caterer/generated/task\"\n}\n\nspark {\n config {\n ...\n #S3\n \"spark.hadoop.fs.s3a.directory.marker.retention\" = \"keep\"\n \"spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled\" = \"true\"\n \"spark.hadoop.fs.defaultFS\" = \"s3a://my-bucket\"\n #can change to other credential providers as shown here\n #https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Changing_Authentication_Providers\n \"spark.hadoop.fs.s3a.aws.credentials.provider\" = \"org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider\"\n \"spark.hadoop.fs.s3a.access.key\" = \"access_key\"\n \"spark.hadoop.fs.s3a.secret.key\" = \"secret_key\"\n }\n}\n
"},{"location":"connections/connections/","title":"Data Source Connections","text":"Details of all the connection configuration supported can be found in the below subsections for each type of connection.
"},{"location":"connections/connections/#supported-data-connections","title":"Supported Data Connections","text":"Data Source Type Data Source Database Postgres, MySQL, Cassandra File CSV, JSON, ORC, Parquet Kafka Kafka JMS Solace HTTP GET, PUT, POSTAll connection details follow the same pattern.
<connection format> {\n <connection name> {\n <key> = <value>\n }\n}\n
When defining a configuration value that can be defined by a system property or environment variable at runtime, you can define that via the following:
url = \"localhost\"\nurl = ${?POSTGRES_URL}\n
The above defines that if there is a system property or environment variable named POSTGRES_URL
, then that value will be used for the url
, otherwise, it will default to localhost
.
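For example, assuming the Postgres connection above, the default can be overridden at container startup in the same style as the docker-compose commands used elsewhere in these docs:
POSTGRES_URL="jdbc:postgresql://localhost:5432/customer" DATA_SOURCE=postgres docker-compose up -d datacaterer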
To find examples of a task for each type of data source, please check out this page.
"},{"location":"connections/connections/#file","title":"File","text":"Linked here is a list of generic options that can be included as part of your file data source configuration if required. Links to specific file type configurations can be found below.
"},{"location":"connections/connections/#csv","title":"CSV","text":"csv {\n customer_transactions {\n path = \"/data/customer/transaction\"\n path = ${?CSV_PATH}\n }\n}\n
Other available configuration for CSV can be found here
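The connection name (customer_transactions above) is what a plan's task refers to through dataSourceName; a minimal sketch, with hypothetical plan and task names:
name: "csv_transactions_plan"
description: "Generate data for customer transactions CSV"
tasks:
- name: "csv_transactions_task"
  dataSourceName: "customer_transactions"
  enabled: true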
"},{"location":"connections/connections/#json","title":"JSON","text":"json {\n customer_transactions {\n path = \"/data/customer/transaction\"\n path = ${?JSON_PATH}\n }\n}\n
Other available configuration for JSON can be found here
"},{"location":"connections/connections/#orc","title":"ORC","text":"orc {\n customer_transactions {\n path = \"/data/customer/transaction\"\n path = ${?ORC_PATH}\n }\n}\n
Other available configuration for ORC can be found here
"},{"location":"connections/connections/#parquet","title":"Parquet","text":"parquet {\n customer_transactions {\n path = \"/data/customer/transaction\"\n path = ${?PARQUET_PATH}\n }\n}\n
Other available configuration for Parquet can be found here
"},{"location":"connections/connections/#delta-not-supported-yet","title":"Delta (not supported yet)","text":"delta {\n customer_transactions {\n path = \"/data/customer/transaction\"\n path = ${?DELTA_PATH}\n }\n}\n
"},{"location":"connections/connections/#jdbc","title":"JDBC","text":"Follows the same configuration used by Spark as found here. Sample can be found below
jdbc {\n postgres {\n url = \"jdbc:postgresql://localhost:5432/customer\"\n url = ${?POSTGRES_URL}\n user = \"postgres\"\n user = ${?POSTGRES_USERNAME}\n password = \"postgres\"\n password = ${?POSTGRES_PASSWORD}\n driver = \"org.postgresql.Driver\"\n }\n}\n
Ensure that the user has write permission so that it is able to write to the target tables.
GRANT INSERT ON <schema>.<table> TO <user>;\n
"},{"location":"connections/connections/#postgres","title":"Postgres","text":""},{"location":"connections/connections/#permissions","title":"Permissions","text":"Following permissions are required when generating plan and tasks:
GRANT SELECT ON information_schema.tables TO < user >;\nGRANT SELECT ON information_schema.columns TO < user >;\nGRANT SELECT ON information_schema.key_column_usage TO < user >;\nGRANT SELECT ON information_schema.table_constraints TO < user >;\nGRANT SELECT ON information_schema.constraint_column_usage TO < user >;\n
"},{"location":"connections/connections/#mysql","title":"MySQL","text":""},{"location":"connections/connections/#permissions_1","title":"Permissions","text":"Following permissions are required when generating plan and tasks:
GRANT SELECT ON information_schema.columns TO < user >;\nGRANT SELECT ON information_schema.statistics TO < user >;\nGRANT SELECT ON information_schema.key_column_usage TO < user >;\n
"},{"location":"connections/connections/#cassandra","title":"Cassandra","text":"Follows same configuration as defined by the Spark Cassandra Connector as found here
org.apache.spark.sql.cassandra {\n cassandra {\n spark.cassandra.connection.host = \"localhost\"\n spark.cassandra.connection.host = ${?CASSANDRA_HOST}\n spark.cassandra.connection.port = \"9042\"\n spark.cassandra.connection.port = ${?CASSANDRA_PORT}\n spark.cassandra.auth.username = \"cassandra\"\n spark.cassandra.auth.username = ${?CASSANDRA_USERNAME}\n spark.cassandra.auth.password = \"cassandra\"\n spark.cassandra.auth.password = ${?CASSANDRA_PASSWORD}\n }\n}\n
Ensure that the user has write permission so that it is able to write to the target tables.
GRANT INSERT ON <schema>.<table> TO <user>;\n
"},{"location":"connections/connections/#kafka","title":"Kafka","text":"Define your Kafka bootstrap server to connect and send generated data to corresponding topics. Topic gets set at a step level. Further details can be found here
kafka {\n kafka {\n kafka.bootstrap.servers = \"localhost:9092\"\n kafka.bootstrap.servers = ${?KAFKA_BOOTSTRAP_SERVERS}\n }\n}\n
When defining your schema for pushing data to Kafka, it follows a specific top level schema. An example can be found here. You can define the key, value, headers, partition or topic by following the linked schema.
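As a rough sketch only (the linked schema is the source of truth; the step type, topic option, and field layout here are assumptions), a Kafka step could look like:
steps:
  - name: "account_events"
    type: "kafka"
    options:
      #topic is set at the step level, as noted above
      topic: "account-topic"
    schema:
      fields:
        - name: "key"
          type: "string"
        - name: "value"
          type: "string"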
"},{"location":"connections/connections/#jms","title":"JMS","text":"Uses JNDI lookup to send messages to JMS queue. Ensure that the messaging system you are using has your queue/topic registered via JNDI otherwise a connection cannot be created.
jms {\n solace {\n initialContextFactory = \"com.solacesystems.jndi.SolJNDIInitialContextFactory\"\n connectionFactory = \"/jms/cf/default\"\n url = \"smf://localhost:55555\"\n url = ${?SOLACE_URL}\n user = \"admin\"\n user = ${?SOLACE_USER}\n password = \"admin\"\n password = ${?SOLACE_PASSWORD}\n vpnName = \"default\"\n vpnName = ${?SOLACE_VPN}\n }\n}\n
"},{"location":"connections/connections/#http","title":"HTTP","text":"Define a URL to connect to when sending HTTP requests. Later, can have the ability to define generated data as part of the URL.
http {\n customer_api {\n url = \"http://localhost:80/get\"\n url = ${?HTTP_URL}\n user = \"admin\" #optional\n user = ${?HTTP_USER}\n password = \"admin\" #optional\n password = ${?HTTP_PASSWORD}\n }\n}\n
"},{"location":"generators/count/","title":"Record Count","text":"There are options related to controlling the number of records generated that can help in generating the scenarios or data required.
"},{"location":"generators/count/#total-count","title":"Total Count","text":"Total count is the simplest as you define the total number of records you require for that particular step. For example, in the below step, it will generate 1000 records for the CSV file
name: \"csv_file\"\nsteps:\n - name: \"transactions\"\n type: \"csv\"\n options:\n path: \"app/src/test/resources/sample/csv/transactions\"\n count:\n total: 1000\n
"},{"location":"generators/count/#generated-count","title":"Generated Count","text":"As like most things in data-caterer, the count can be generated based on some metadata. For example, if I wanted to generate between 1000 and 2000 records, I could define that by the below configuration:
name: \"csv_file\"\nsteps:\n - name: \"transactions\"\n type: \"csv\"\n options:\n path: \"app/src/test/resources/sample/csv/transactions\"\n count:\n generator:\n type: \"random\"\n options:\n min: 1000\n max: 2000\n
"},{"location":"generators/count/#per-column-count","title":"Per Column Count","text":"When defining a per column count, this allows you to generate records \"per set of columns\". This means that for a given set of columns, it will generate a particular amount of records per combination of values for those columns.
One example of this would be when generating transactions relating to a customer. A customer may be defined by columns account_id, name
. A number of transactions would be generated per account_id,name
.
You can also use a combination of the above two methods to generate the number of records per column.
"},{"location":"generators/count/#total","title":"Total","text":"When defining a total count within the perColumn
configuration, it translates to only creating (count.total * count.perColumn.total)
records. This is a fixed number of records that will be generated each time, with no variation between runs.
In the example below, we have count.total=1000
and count.perColumn.total=2
. Which means that 1000 * 2=2000
records will be generated for this CSV file every time data gets generated.
name: \"csv_file\"\nsteps:\n - name: \"transactions\"\n type: \"csv\"\n options:\n path: \"app/src/test/resources/sample/csv/transactions\"\n count:\n total: 1000\n perColumn:\n total: 2\n columnNames:\n - \"account_id\"\n - \"name\"\n
"},{"location":"generators/count/#generated","title":"Generated","text":"You can also define a generator for the count per column. This can be used in scenarios where you want a variable number of records per set of columns.
In the example below, it will generate between (count.total * count.perColumn.generator.options.minValue) = (1000 * 1) = 1000
and (count.total * count.perColumn.generator.options.maxValue) = (1000 * 2) = 2000
records.
name: \"csv_file\"\nsteps:\n - name: \"transactions\"\n type: \"csv\"\n options:\n path: \"app/src/test/resources/sample/csv/transactions\"\n count:\n total: 1000\n perColumn:\n columnNames:\n - \"account_id\"\n - \"name\"\n generator:\n type: \"random\"\n options:\n maxValue: 2\n minValue: 1\n
"},{"location":"generators/generators/","title":"Data Generators","text":""},{"location":"generators/generators/#data-types","title":"Data Types","text":"Below is a list of all supported data types for generating data:
Data Type Spark Data Type Options Description string StringType minLen, maxLen, expression, enableNull integer IntegerType min, minValue, max, maxValue long LongType min, minValue, max, maxValue short ShortType min, minValue, max, maxValue decimal(precision, scale) DecimalType(precision, scale) min, minValue, max, maxValue double DoubleType min, minValue, max, maxValue float FloatType min, minValue, max, maxValue date DateType min, max, enableNull timestamp TimestampType min, max, enableNull boolean BooleanType binary BinaryType minLen, maxLen, enableNull byte ByteType array ArrayType listMinLen, listMaxLen _ StructType Implicitly supported when a schema is defined for a field"},{"location":"generators/generators/#options","title":"Options","text":""},{"location":"generators/generators/#all-data-types","title":"All data types","text":"Some options are available to use for all types of data generators. Below is the list along with examples and descriptions:
Option Default Example Description enableEdgeCases false enableEdgeCases: \"true\" Enable/disable generated data to contain edge cases based on the data type. For example, integer data type has edge cases of (Int.MaxValue, Int.MinValue and 0) isUnique false isUnique: \"true\" Enable/disable generated data to be unique for that column. Errors will be thrown when it is unable to generate unique data seed seed: \"1\" Defines the random seed for generating data for that particular column. It will override any seed defined at a global level sql sql: \"CASE WHEN amount < 10 THEN true ELSE false END\" Define any SQL statement for generating that column's value. Computation occurs after all non-SQL fields are generated. This means any columns used in the SQL cannot be based on other SQL generated columns. Data type of generated value from SQL needs to match data type defined for the field"},{"location":"generators/generators/#string","title":"String","text":"Option Default Example Description minLen 1 minLen: \"2\" Ensures that all generated strings have at least length minLen
maxLen 10 maxLen: \"15\" Ensures that all generated strings have at most length maxLen
expression expression: \"#{Name.name}\" expression: \"#{Address.city}/#{Demographic.maritalStatus}\" Will generate a string based on the faker expression provided. All possible faker expressions can be found here. Expression has to be in format #{<faker expression name>}
enableNull false enableNull: \"true\" Enable/disable null values being generated Edge cases: (\"\", \"\\n\", \"\\r\", \"\\t\", \" \", \"\\u0000\", \"\\ufff\")
"},{"location":"generators/generators/#numeric","title":"Numeric","text":"For all the numeric data types, there are 4 options to choose from: min, minValue, max and maxValue. Generally speaking, you only need to define one of min or minValue, similarly with max or maxValue. The reason why there are 2 options for each is because of when metadata is automatically gathered, we gather the statistics of the observed min and max values. Also, it will attempt to gather any restriction on the min or max value as defined by the data source (i.e. max value as per database type).
"},{"location":"generators/generators/#integerlongshortdecimal","title":"Integer/Long/Short/Decimal","text":"Option Default Example Description minValue 0 minValue: \"2\" Ensures that all generated values are greater than or equal tominValue
min 0 min: \"2\" Ensures that all generated values are greater than or equal to min
. If minValue
is defined, minValue
will define the lowest possible generated value maxValue 1000 maxValue: \"25\" Ensures that all generated values are less than or equal to maxValue
max 1000 max: \"25\" Ensures that all generated values are less than or equal to max
. If maxValue
is defined, maxValue
will define the largest possible generated value Edge cases Integer: (2147483647, -2147483648, 0) Edge cases Long/Decimal: (9223372036854775807, -9223372036854775808, 0) Edge cases Short: (32767, -32768, 0)
"},{"location":"generators/generators/#doublefloat","title":"Double/Float","text":"Option Default Example Description minValue 0.0 minValue: \"2.1\" Ensures that all generated values are greater than or equal tominValue
min 0.0 min: \"2.1\" Ensures that all generated values are greater than or equal to min
. If minValue
is defined, minValue
will define the lowest possible generated value maxValue 1000.0 maxValue: \"25.9\" Ensures that all generated values are less than or equal to maxValue
max 1000.0 max: \"25.9\" Ensures that all generated values are less than or equal to max
. If maxValue
is defined, maxValue
will define the largest possible generated value Edge cases Double: (+infinity, 1.7976931348623157e+308, 4.9e-324, 0.0, -0.0, -1.7976931348623157e+308, -infinity, NaN) Edge cases Float: (+infinity, 3.4028235e+38, 1.4e-45, 0.0, -0.0, -3.4028235e+38, -infinity, NaN)
"},{"location":"generators/generators/#date","title":"Date","text":"Option Default Example Description min now() - 365 days min: \"2023-01-31\" Ensures that all generated values are greater than or equal tomin
max now() max: \"2023-12-31\" Ensures that all generated values are less than or equal to max
enableNull false enableNull: \"true\" Enable/disable null values being generated Edge cases: (0001-01-01, 1582-10-15, 1970-01-01, 9999-12-31) (reference)
"},{"location":"generators/generators/#timestamp","title":"Timestamp","text":"Option Default Example Description min now() - 365 days min: \"2023-01-31 23:10:10\" Ensures that all generated values are greater than or equal tomin
max now() max: \"2023-12-31 23:10:10\" Ensures that all generated values are less than or equal to max
enableNull false enableNull: \"true\" Enable/disable null values being generated Edge cases: (0001-01-01 00:00:00, 1582-10-15 23:59:59, 1970-01-01 00:00:00, 9999-12-31 23:59:59)
"},{"location":"generators/generators/#binary","title":"Binary","text":"Option Default Example Description minLen 1 minLen: \"2\" Ensures that all generated array of bytes have at least lengthminLen
maxLen 20 maxLen: \"15\" Ensures that all generated array of bytes have at most length maxLen
enableNull false enableNull: \"true\" Enable/disable null values being generated Edge cases: (\"\", \"\\n\", \"\\r\", \"\\t\", \" \", \"\\u0000\", \"\\ufff\", -128, 127)
"},{"location":"generators/generators/#list","title":"List","text":"Option Default Example Description listMinLen 0 listMinLen: \"2\" Ensures that all generated lists have at least lengthlistMinLen
listMaxLen 5 listMaxLen: \"15\" Ensures that all generated lists have at most length listMaxLen
enableNull false enableNull: \"true\" Enable/disable null values being generated"},{"location":"get-started/docker/","title":"Run Data Caterer","text":""},{"location":"get-started/docker/#docker","title":"Docker","text":""},{"location":"get-started/docker/#quick-start","title":"Quick start","text":"git clone git@github.com:pflooky/data-caterer-docs.git\ncd docs/sample/docker\nDATA_SOURCE=postgres docker-compose up -d datacaterer\n
You can change DATA_SOURCE
to one of the following:
If you want to test it out with your own setup, you can alter the corresponding files under docs/sample/docker/data
"},{"location":"get-started/docker/#run-with-multiple-data-sources-postgres-and-csv-file","title":"Run with multiple data sources (Postgres and CSV File)","text":"PLAN=plan/scenario-based DATA_SOURCE=postgres docker-compose up -d datacaterer\nhead data/custom/csv/transactions/part-00000*\nsample_account=$(head -1 data/custom/csv/transactions/part-00000* | awk -F \",\" '{print $1}')\ndocker exec docker-postgres-1 psql -Upostgres -d customer -c \"SELECT * FROM account.accounts WHERE account_number='$sample_account'\"\n
You should be able to see the linked data created between Postgres and the CSV file, along with 1 to 10 records per account_id, name combination in the CSV file.
"},{"location":"get-started/docker/#run-with-custom-data-sources","title":"Run with custom data sources","text":"data/custom/plan
data/custom/task
data/custom/application.conf
DATA_SOURCE=<data source name> docker-compose up -d datacaterer\n
"},{"location":"get-started/docker/#generate-plan-and-tasks","title":"Generate plan and tasks","text":"APPLICATION_CONFIG_PATH=/opt/app/custom/application-dvd.conf ENABLE_GENERATE_DATA=false ENABLE_GENERATE_PLAN_AND_TASKS=true DATA_SOURCE=postgresdvd docker-compose up -d datacaterer\ncat data/custom/generated/plan/plan_*\n
"},{"location":"get-started/docker/#generate-data-with-record-tracking","title":"Generate data with record tracking","text":"APPLICATION_CONFIG_PATH=/opt/app/custom/application-dvd.conf ENABLE_GENERATE_DATA=true ENABLE_GENERATE_PLAN_AND_TASKS=false ENABLE_RECORD_TRACKING=true DATA_SOURCE=postgresdvd PLAN=generated/plan/plan_20230803T040203Z docker-compose up -d datacaterer\n
"},{"location":"get-started/docker/#delete-the-generated-data","title":"Delete the generated data","text":"APPLICATION_CONFIG_PATH=/opt/app/custom/application-dvd.conf ENABLE_GENERATE_DATA=false ENABLE_GENERATE_PLAN_AND_TASKS=false ENABLE_DELETE_GENERATED_RECORDS=true DATA_SOURCE=postgresdvd docker-compose up -d datacaterer\n
"},{"location":"get-started/docker/#helm","title":"Helm","text":"Link to sample helm on GitHub here
Update the configuration to your own data connections and configuration.
git clone git@github.com:pflooky/data-caterer-docs.git\nhelm install data-caterer ./data-caterer-docs/helm/data-caterer\n
"},{"location":"sample/","title":"Samples","text":"Below are examples of different types of plans and tasks that can be helpful when trying to create your own. You can use these as a template or to search for something related to your particular use case.
"},{"location":"sample/#base-concept","title":"Base Concept","text":"The execution of the data generator is based on the concept of plans and tasks. A plan represent the set of tasks that need to be executed, along with other information that spans across tasks, such as foreign keys between data sources. A task represent the component(s) of a data source and its associated metadata so that it understands what the data should look like and how many steps (sub data sources) there are (i.e. tables in a database, topics in Kafka). Tasks can define one or more steps.
"},{"location":"sample/#plan","title":"Plan","text":""},{"location":"sample/#foreign-keys","title":"Foreign Keys","text":"Define foreign keys across data sources in your plan to ensure generated data can match Link to associated task 1 Link to associated task 2
"},{"location":"sample/#task","title":"Task","text":"Data Source Type Data Source Sample Task Notes Database Postgres Sample Database MySQL Sample Database Cassandra Sample File CSV Sample File JSON Sample Contains nested schemas and use of SQL for generated values File Parquet Sample Partition by year column Kafka Kafka Sample Specific base schema to be used, define headers, key, value, etc. JMS Solace Sample JSON formatted message HTTP PUT Sample JSON formatted PUT body"},{"location":"sample/#configuration","title":"Configuration","text":"Basic configuration
"},{"location":"sample/docker/","title":"Data Caterer - Docker Compose","text":"If you want to try out data caterer generating data for various data sources, you do use the following docker-compose file.
All you need to do is define which data source you want to run with via a command like below:
DATA_SOURCE=postgres docker-compose up -d datacaterer\n
You can change DATA_SOURCE
to one of the following: - postgres - mysql - cassandra - solace - kafka
Using Data Caterer, you have the ability to generate production-like data based on any source/target system, whether it be a CSV file, database table, etc., anywhere you want the data to be, whether in a test environment or on your local laptop. Just define your data source connections and data will be generated. It can also be manually altered to produce data or scenarios the way you want.
Main features of the data generator include: - Ability to gather metadata about data sources - Generate data in either batch or real-time - Maintain referential integrity across generated data - Create custom data generation scenarios - Delete generated data
"},{"location":"advanced/advanced/","title":"Advanced use cases","text":""},{"location":"advanced/advanced/#special-data-formats","title":"Special data formats","text":"There are many options available for you to use when you have a scenario when data has to be a certain format.
If you have a use case where you require a column's value to match in another data set, this can be achieved in the plan definition. For example, if I have the column account_number
in a data source named customer-postgres
and column account_id
in transaction-cassandra
,
sinkOptions:\n foreignKeys:\n #The foreign key name with naming convention [dataSourceName].[taskName].[columnName]\n \"customer-postgres.accounts.account_number\":\n #List of columns to match with same naming convention\n - \"transaction-cassandra.transactions.account_id\"\n
Sample can be found here. You can define as many foreign key relationships as you want.
"},{"location":"advanced/advanced/#edge-cases","title":"Edge cases","text":"For each given data type, there are edge cases which can cause issues when your application processes the data. This can be controlled at a column level by including the following flag in the generator options:
fields:\n - name: \"amount\"\n type: \"double\"\n generator:\n type: \"random\"\n options:\n enableEdgeCases: \"true\" \n
If you want to know all the possible edge cases for each data type, you can check the documentation here.
"},{"location":"advanced/advanced/#scenario-testing","title":"Scenario testing","text":"You can create specific scenarios by adjusting the metadata found in the plan and tasks to your liking. For example, if you had two data sources, a Postgres database and a parquet file, and you wanted to save account data into Postgres and transactions related to those accounts into a parquet file. You can alter the status
column in the account data to only generate open
accounts and define a foreign key between Postgres and parquet to ensure the same account_id
is being used. Then in the parquet task, define 1 to 10 transactions per account_id
to be generated.
Postgres account generation example task Parquet transaction generation example task Plan
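To make the scenario above concrete, below is a rough sketch that stitches together features already covered in these docs: the sql option to pin the status column, a foreign key in the plan and a per column count in the Parquet task. All task, data source and column names here are illustrative only, so defer to the linked example tasks and plan for the authoritative definitions.
#Postgres account task: pin status to open via the sql option\nfields:\n  - name: \"status\"\n    type: \"string\"\n    generator:\n      type: \"random\"\n      options:\n        sql: \"'open'\"\n#Plan: link account_id across the two data sources\nsinkOptions:\n  foreignKeys:\n    \"customer-postgres.accounts.account_id\":\n      - \"transaction-parquet.transactions.account_id\"\n#Parquet transaction task: 1 to 10 records per account_id\ncount:\n  perColumn:\n    columnNames:\n      - \"account_id\"\n    generator:\n      type: \"random\"\n      options:\n        minValue: 1\n        maxValue: 10\n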
"},{"location":"advanced/advanced/#storing-plantasks-in-cloud-storage","title":"Storing plan/task(s) in cloud storage","text":"You can generate and store the plan/task files inside either AWS S3, Azure Blob Storage or Google GCS. This can be controlled via configuration set in the application.conf
file where you can set something like the below:
folders {\n generatedPlanAndTaskFolderPath = \"s3a://my-bucket/data-caterer/generated\"\n planFilePath = \"s3a://my-bucket/data-caterer/generated/plan/customer-create-plan.yaml\"\n taskFolderPath = \"s3a://my-bucket/data-caterer/generated/task\"\n}\n\nspark {\n config {\n ...\n #S3\n \"spark.hadoop.fs.s3a.directory.marker.retention\" = \"keep\"\n \"spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled\" = \"true\"\n \"spark.hadoop.fs.defaultFS\" = \"s3a://my-bucket\"\n #can change to other credential providers as shown here\n #https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Changing_Authentication_Providers\n \"spark.hadoop.fs.s3a.aws.credentials.provider\" = \"org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider\"\n \"spark.hadoop.fs.s3a.access.key\" = \"access_key\"\n \"spark.hadoop.fs.s3a.secret.key\" = \"secret_key\"\n }\n}\n
"},{"location":"connections/connections/","title":"Data Source Connections","text":"Details of all the connection configuration supported can be found in the below subsections for each type of connection.
"},{"location":"connections/connections/#supported-data-connections","title":"Supported Data Connections","text":"Data Source Type Data Source Database Postgres, MySQL, Cassandra File CSV, JSON, ORC, Parquet Kafka Kafka JMS Solace HTTP GET, PUT, POST, DELETE, PATCH, HEAD, TRACE, OPTIONSAll connection details follow the same pattern.
<connection format> {\n <connection name> {\n <key> = <value>\n }\n}\n
When defining a configuration value that can be defined by a system property or environment variable at runtime, you can define that via the following:
url = \"localhost\"\nurl = ${?POSTGRES_URL}\n
The above defines that if there is a system property or environment variable named POSTGRES_URL
, then that value will be used for the url
, otherwise, it will default to localhost
.
To find examples of a task for each type of data source, please check out this page.
"},{"location":"connections/connections/#file","title":"File","text":"Linked here is a list of generic options that can be included as part of your file data source configuration if required. Links to specific file type configurations can be found below.
"},{"location":"connections/connections/#csv","title":"CSV","text":"csv {\n customer_transactions {\n path = \"/data/customer/transaction\"\n path = ${?CSV_PATH}\n }\n}\n
Other available configuration for CSV can be found here
"},{"location":"connections/connections/#json","title":"JSON","text":"json {\n customer_transactions {\n path = \"/data/customer/transaction\"\n path = ${?JSON_PATH}\n }\n}\n
Other available configuration for JSON can be found here
"},{"location":"connections/connections/#orc","title":"ORC","text":"orc {\n customer_transactions {\n path = \"/data/customer/transaction\"\n path = ${?ORC_PATH}\n }\n}\n
Other available configuration for ORC can be found here
"},{"location":"connections/connections/#parquet","title":"Parquet","text":"parquet {\n customer_transactions {\n path = \"/data/customer/transaction\"\n path = ${?PARQUET_PATH}\n }\n}\n
Other available configuration for Parquet can be found here
"},{"location":"connections/connections/#delta-not-supported-yet","title":"Delta (not supported yet)","text":"delta {\n customer_transactions {\n path = \"/data/customer/transaction\"\n path = ${?DELTA_PATH}\n }\n}\n
"},{"location":"connections/connections/#jdbc","title":"JDBC","text":"Follows the same configuration used by Spark as found here. Sample can be found below
jdbc {\n postgres {\n url = \"jdbc:postgresql://localhost:5432/customer\"\n url = ${?POSTGRES_URL}\n user = \"postgres\"\n user = ${?POSTGRES_USERNAME}\n password = \"postgres\"\n password = ${?POSTGRES_PASSWORD}\n driver = \"org.postgresql.Driver\"\n }\n}\n
Ensure that the user has write permission so it is able to write to the target tables.
GRANT INSERT ON <schema>.<table> TO <user>;\n
"},{"location":"connections/connections/#postgres","title":"Postgres","text":""},{"location":"connections/connections/#permissions","title":"Permissions","text":"Following permissions are required when generating plan and tasks:
GRANT SELECT ON information_schema.tables TO < user >;\nGRANT SELECT ON information_schema.columns TO < user >;\nGRANT SELECT ON information_schema.key_column_usage TO < user >;\nGRANT SELECT ON information_schema.table_constraints TO < user >;\nGRANT SELECT ON information_schema.constraint_column_usage TO < user >;\n
"},{"location":"connections/connections/#mysql","title":"MySQL","text":""},{"location":"connections/connections/#permissions_1","title":"Permissions","text":"Following permissions are required when generating plan and tasks:
GRANT SELECT ON information_schema.columns TO < user >;\nGRANT SELECT ON information_schema.statistics TO < user >;\nGRANT SELECT ON information_schema.key_column_usage TO < user >;\n
"},{"location":"connections/connections/#cassandra","title":"Cassandra","text":"Follows same configuration as defined by the Spark Cassandra Connector as found here
org.apache.spark.sql.cassandra {\n cassandra {\n spark.cassandra.connection.host = \"localhost\"\n spark.cassandra.connection.host = ${?CASSANDRA_HOST}\n spark.cassandra.connection.port = \"9042\"\n spark.cassandra.connection.port = ${?CASSANDRA_PORT}\n spark.cassandra.auth.username = \"cassandra\"\n spark.cassandra.auth.username = ${?CASSANDRA_USERNAME}\n spark.cassandra.auth.password = \"cassandra\"\n spark.cassandra.auth.password = ${?CASSANDRA_PASSWORD}\n }\n}\n
Ensure that the user has write permission so it is able to write to the target tables.
GRANT INSERT ON <schema>.<table> TO <user>;\n
"},{"location":"connections/connections/#kafka","title":"Kafka","text":"Define your Kafka bootstrap server to connect and send generated data to corresponding topics. Topic gets set at a step level. Further details can be found here
kafka {\n kafka {\n kafka.bootstrap.servers = \"localhost:9092\"\n kafka.bootstrap.servers = ${?KAFKA_BOOTSTRAP_SERVERS}\n }\n}\n
When defining your schema for pushing data to Kafka, it follows a specific top level schema. An example can be found here. You can define the key, value, headers, partition or topic by following the linked schema.
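As a purely hypothetical sketch (the field names below are only taken from the prose above, so defer to the linked example for the real top level schema), a Kafka step schema could look something like:
schema:\n  fields:\n    #hypothetical field names based on the prose above: key, value, partition\n    - name: \"key\"\n      type: \"string\"\n    - name: \"value\"\n      type: \"string\"\n      generator:\n        type: \"random\"\n        options:\n          expression: \"#{Name.name}\"\n    - name: \"partition\"\n      type: \"integer\"\n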
"},{"location":"connections/connections/#jms","title":"JMS","text":"Uses JNDI lookup to send messages to JMS queue. Ensure that the messaging system you are using has your queue/topic registered via JNDI otherwise a connection cannot be created.
jms {\n solace {\n initialContextFactory = \"com.solacesystems.jndi.SolJNDIInitialContextFactory\"\n connectionFactory = \"/jms/cf/default\"\n url = \"smf://localhost:55555\"\n url = ${?SOLACE_URL}\n user = \"admin\"\n user = ${?SOLACE_USER}\n password = \"admin\"\n password = ${?SOLACE_PASSWORD}\n vpnName = \"default\"\n vpnName = ${?SOLACE_VPN}\n }\n}\n
"},{"location":"connections/connections/#http","title":"HTTP","text":"Define any username and/or password needed for the HTTP requests. The url is defined in the tasks to allow for generated data to be populated in the url.
http {\n customer_api {\n user = \"admin\"\n user = ${?HTTP_USER}\n password = \"admin\"\n password = ${?HTTP_PASSWORD}\n }\n}\n
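Since the url lives in the task rather than the connection, an HTTP task step might look something like the sketch below. The option names here are hypothetical placeholders to show the idea of generated data being populated in the url, not a confirmed schema:
name: \"http_task\"\nsteps:\n  - name: \"update_customer\"\n    options:\n      #hypothetical options; generated values could be substituted into the url\n      url: \"http://localhost:80/customer/update\"\n      method: \"PUT\"\n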
"},{"location":"generators/count/","title":"Record Count","text":"There are options related to controlling the number of records generated that can help in generating the scenarios or data required.
"},{"location":"generators/count/#total-count","title":"Total Count","text":"Total count is the simplest as you define the total number of records you require for that particular step. For example, in the below step, it will generate 1000 records for the CSV file
name: \"csv_file\"\nsteps:\n - name: \"transactions\"\n type: \"csv\"\n options:\n path: \"app/src/test/resources/sample/csv/transactions\"\n count:\n total: 1000\n
"},{"location":"generators/count/#generated-count","title":"Generated Count","text":"As like most things in data-caterer, the count can be generated based on some metadata. For example, if I wanted to generate between 1000 and 2000 records, I could define that by the below configuration:
name: \"csv_file\"\nsteps:\n - name: \"transactions\"\n type: \"csv\"\n options:\n path: \"app/src/test/resources/sample/csv/transactions\"\n count:\n generator:\n type: \"random\"\n options:\n min: 1000\n max: 2000\n
"},{"location":"generators/count/#per-column-count","title":"Per Column Count","text":"When defining a per column count, this allows you to generate records \"per set of columns\". This means that for a given set of columns, it will generate a particular amount of records per combination of values for those columns.
One example of this would be when generating transactions relating to a customer. A customer may be defined by columns account_id, name
. A number of transactions would be generated per account_id, name
.
You can also use a combination of the above two methods to generate the number of records per column.
"},{"location":"generators/count/#total","title":"Total","text":"When defining a total count within the perColumn
configuration, it translates to only creating (count.total * count.perColumn.total)
records. This is a fixed number of records that will be generated each time, with no variation between runs.
In the example below, we have count.total = 1000
and count.perColumn.total = 2
. Which means that 1000 * 2 = 2000
records will be generated for this CSV file every time data gets generated.
name: \"csv_file\"\nsteps:\n - name: \"transactions\"\n type: \"csv\"\n options:\n path: \"app/src/test/resources/sample/csv/transactions\"\n count:\n total: 1000\n perColumn:\n total: 2\n columnNames:\n - \"account_id\"\n - \"name\"\n
"},{"location":"generators/count/#generated","title":"Generated","text":"You can also define a generator for the count per column. This can be used in scenarios where you want a variable number of records per set of columns.
In the example below, it will generate between (count.total * count.perColumn.generator.options.minValue) = (1000 * 1) = 1000
and (count.total * count.perColumn.generator.options.maxValue) = (1000 * 2) = 2000
records.
name: \"csv_file\"\nsteps:\n - name: \"transactions\"\n type: \"csv\"\n options:\n path: \"app/src/test/resources/sample/csv/transactions\"\n count:\n total: 1000\n perColumn:\n columnNames:\n - \"account_id\"\n - \"name\"\n generator:\n type: \"random\"\n options:\n maxValue: 2\n minValue: 1\n
"},{"location":"generators/generators/","title":"Data Generators","text":""},{"location":"generators/generators/#data-types","title":"Data Types","text":"Below is a list of all supported data types for generating data:
Data Type Spark Data Type Options Description string StringType minLen, maxLen, expression, enableNull integer IntegerType min, minValue, max, maxValue long LongType min, minValue, max, maxValue short ShortType min, minValue, max, maxValue decimal(precision, scale) DecimalType(precision, scale) min, minValue, max, maxValue double DoubleType min, minValue, max, maxValue float FloatType min, minValue, max, maxValue date DateType min, max, enableNull timestamp TimestampType min, max, enableNull boolean BooleanType binary BinaryType minLen, maxLen, enableNull byte ByteType array ArrayType listMinLen, listMaxLen _ StructType Implicitly supported when a schema is defined for a field"},{"location":"generators/generators/#options","title":"Options","text":""},{"location":"generators/generators/#all-data-types","title":"All data types","text":"Some options are available to use for all types of data generators. Below is the list along with examples and descriptions:
Option Default Example Description enableEdgeCases false enableEdgeCases: \"true\" Enable/disable generated data to contain edge cases based on the data type. For example, integer data type has edge cases of (Int.MaxValue, Int.MinValue and 0) isUnique false isUnique: \"true\" Enable/disable generated data to be unique for that column. Errors will be thrown when it is unable to generate unique data seed seed: \"1\" Defines the random seed for generating data for that particular column. It will override any seed defined at a global level sql sql: \"CASE WHEN amount < 10 THEN true ELSE false END\" Define any SQL statement for generating that column's value. Computation occurs after all non-SQL fields are generated. This means any columns used in the SQL cannot be based on other SQL generated columns. Data type of generated value from SQL needs to match data type defined for the field"},{"location":"generators/generators/#string","title":"String","text":"Option Default Example Description minLen 1 minLen: \"2\" Ensures that all generated strings have at least length minLen
maxLen 10 maxLen: \"15\" Ensures that all generated strings have at most length maxLen
expression expression: \"#{Name.name}\" expression: \"#{Address.city}/#{Demographic.maritalStatus}\" Will generate a string based on the faker expression provided. All possible faker expressions can be found here. Expression has to be in format #{<faker expression name>}
enableNull false enableNull: \"true\" Enable/disable null values being generated Edge cases: (\"\", \"\\n\", \"\\r\", \"\\t\", \" \", \"\\u0000\", \"\\ufff\")
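As a minimal sketch tying together the options above, the field definitions below use the expression option for a string column and the sql option (available to all data types) for a column derived from it. The field names and SQL condition are illustrative:
fields:\n  - name: \"city\"\n    type: \"string\"\n    generator:\n      type: \"random\"\n      options:\n        expression: \"#{Address.city}\"\n  - name: \"is_big_city\"\n    type: \"boolean\"\n    generator:\n      type: \"random\"\n      options:\n        #computed after non-SQL fields, so it can reference city\n        sql: \"CASE WHEN city = 'Sydney' THEN true ELSE false END\"\n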
"},{"location":"generators/generators/#numeric","title":"Numeric","text":"For all the numeric data types, there are 4 options to choose from: min, minValue, max and maxValue. Generally speaking, you only need to define one of min or minValue, similarly with max or maxValue. The reason why there are 2 options for each is because of when metadata is automatically gathered, we gather the statistics of the observed min and max values. Also, it will attempt to gather any restriction on the min or max value as defined by the data source (i.e. max value as per database type).
"},{"location":"generators/generators/#integerlongshortdecimal","title":"Integer/Long/Short/Decimal","text":"Option Default Example Description minValue 0 minValue: \"2\" Ensures that all generated values are greater than or equal tominValue
min 0 min: \"2\" Ensures that all generated values are greater than or equal to min
. If minValue
is defined, minValue
will define the lowest possible generated value maxValue 1000 maxValue: \"25\" Ensures that all generated values are less than or equal to maxValue
max 1000 max: \"25\" Ensures that all generated values are less than or equal to max
. If maxValue
is defined, maxValue
will define the largest possible generated value Edge cases Integer: (2147483647, -2147483648, 0) Edge cases Long/Decimal: (9223372036854775807, -9223372036854775808, 0) Edge cases Short: (32767, -32768, 0)
"},{"location":"generators/generators/#doublefloat","title":"Double/Float","text":"Option Default Example Description minValue 0.0 minValue: \"2.1\" Ensures that all generated values are greater than or equal tominValue
min 0.0 min: \"2.1\" Ensures that all generated values are greater than or equal to min
. If minValue
is defined, minValue
will define the lowest possible generated value maxValue 1000.0 maxValue: \"25.9\" Ensures that all generated values are less than or equal to maxValue
max 1000.0 max: \"25.9\" Ensures that all generated values are less than or equal to max
. If maxValue
is defined, maxValue
will define the largest possible generated value Edge cases Double: (+infinity, 1.7976931348623157e+308, 4.9e-324, 0.0, -0.0, -1.7976931348623157e+308, -infinity, NaN) Edge cases Float: (+infinity, 3.4028235e+38, 1.4e-45, 0.0, -0.0, -3.4028235e+38, -infinity, NaN)
"},{"location":"generators/generators/#date","title":"Date","text":"Option Default Example Description min now() - 365 days min: \"2023-01-31\" Ensures that all generated values are greater than or equal tomin
max now() max: \"2023-12-31\" Ensures that all generated values are less than or equal to max
enableNull false enableNull: \"true\" Enable/disable null values being generated Edge cases: (0001-01-01, 1582-10-15, 1970-01-01, 9999-12-31) (reference)
"},{"location":"generators/generators/#timestamp","title":"Timestamp","text":"Option Default Example Description min now() - 365 days min: \"2023-01-31 23:10:10\" Ensures that all generated values are greater than or equal tomin
max now() max: \"2023-12-31 23:10:10\" Ensures that all generated values are less than or equal to max
enableNull false enableNull: \"true\" Enable/disable null values being generated Edge cases: (0001-01-01 00:00:00, 1582-10-15 23:59:59, 1970-01-01 00:00:00, 9999-12-31 23:59:59)
"},{"location":"generators/generators/#binary","title":"Binary","text":"Option Default Example Description minLen 1 minLen: \"2\" Ensures that all generated array of bytes have at least lengthminLen
maxLen 20 maxLen: \"15\" Ensures that all generated array of bytes have at most length maxLen
enableNull false enableNull: \"true\" Enable/disable null values being generated Edge cases: (\"\", \"\\n\", \"\\r\", \"\\t\", \" \", \"\\u0000\", \"\\ufff\", -128, 127)
"},{"location":"generators/generators/#list","title":"List","text":"Option Default Example Description listMinLen 0 listMinLen: \"2\" Ensures that all generated lists have at least lengthlistMinLen
listMaxLen 5 listMaxLen: \"15\" Ensures that all generated lists have at most length listMaxLen
enableNull false enableNull: \"true\" Enable/disable null values being generated"},{"location":"get-started/docker/","title":"Run Data Caterer","text":""},{"location":"get-started/docker/#docker","title":"Docker","text":""},{"location":"get-started/docker/#quick-start","title":"Quick start","text":"git clone git@github.com:pflooky/data-caterer-docs.git\ncd docs/sample/docker\nDATA_SOURCE=postgres docker-compose up -d datacaterer\n
You can change DATA_SOURCE
to one of the following:
If you want to test it out with your own setup, you can alter the corresponding files under docs/sample/docker/data
"},{"location":"get-started/docker/#run-with-multiple-data-sources-postgres-and-csv-file","title":"Run with multiple data sources (Postgres and CSV File)","text":"PLAN=plan/scenario-based DATA_SOURCE=postgres docker-compose up -d datacaterer\nhead data/custom/csv/transactions/part-00000*\nsample_account=$(head -1 data/custom/csv/transactions/part-00000* | awk -F \",\" '{print $1}')\ndocker exec docker-postgres-1 psql -Upostgres -d customer -c \"SELECT * FROM account.accounts WHERE account_number='$sample_account'\"\n
You should be able to see the linked data created between Postgres and the CSV file, along with 1 to 10 records per account_id, name combination in the CSV file.
"},{"location":"get-started/docker/#run-with-custom-data-sources","title":"Run with custom data sources","text":"data/custom/plan
data/custom/task
data/custom/application.conf
DATA_SOURCE=<data source name> docker-compose up -d datacaterer\n
"},{"location":"get-started/docker/#generate-plan-and-tasks","title":"Generate plan and tasks","text":"APPLICATION_CONFIG_PATH=/opt/app/custom/application-dvd.conf ENABLE_GENERATE_DATA=false ENABLE_GENERATE_PLAN_AND_TASKS=true DATA_SOURCE=postgresdvd docker-compose up -d datacaterer\ncat data/custom/generated/plan/plan_*\n
"},{"location":"get-started/docker/#generate-data-with-record-tracking","title":"Generate data with record tracking","text":"APPLICATION_CONFIG_PATH=/opt/app/custom/application-dvd.conf ENABLE_GENERATE_DATA=true ENABLE_GENERATE_PLAN_AND_TASKS=false ENABLE_RECORD_TRACKING=true DATA_SOURCE=postgresdvd PLAN=generated/plan/$(ls data/custom/generated/plan/ | grep plan | head -1 | awk -F \" \" '{print $NF}' | sed 's/\\.yaml//g') docker-compose up -d datacaterer\n
"},{"location":"get-started/docker/#delete-the-generated-data","title":"Delete the generated data","text":"APPLICATION_CONFIG_PATH=/opt/app/custom/application-dvd.conf ENABLE_GENERATE_DATA=false ENABLE_GENERATE_PLAN_AND_TASKS=false ENABLE_DELETE_GENERATED_RECORDS=true DATA_SOURCE=postgresdvd docker-compose up -d datacaterer\n
"},{"location":"get-started/docker/#helm","title":"Helm","text":"Link to sample helm on GitHub here
Update the configuration to your own data connections and configuration.
git clone git@github.com:pflooky/data-caterer-docs.git\nhelm install data-caterer ./data-caterer-docs/helm/data-caterer\n
"},{"location":"sample/","title":"Samples","text":"Below are examples of different types of plans and tasks that can be helpful when trying to create your own. You can use these as a template or to search for something related to your particular use case.
"},{"location":"sample/#base-concept","title":"Base Concept","text":"The execution of the data generator is based on the concept of plans and tasks. A plan represent the set of tasks that need to be executed, along with other information that spans across tasks, such as foreign keys between data sources. A task represent the component(s) of a data source and its associated metadata so that it understands what the data should look like and how many steps (sub data sources) there are (i.e. tables in a database, topics in Kafka). Tasks can define one or more steps.
"},{"location":"sample/#plan","title":"Plan","text":""},{"location":"sample/#foreign-keys","title":"Foreign Keys","text":"Define foreign keys across data sources in your plan to ensure generated data can match Link to associated task 1 Link to associated task 2
"},{"location":"sample/#task","title":"Task","text":"Data Source Type Data Source Sample Task Notes Database Postgres Sample Database MySQL Sample Database Cassandra Sample File CSV Sample File JSON Sample Contains nested schemas and use of SQL for generated values File Parquet Sample Partition by year column Kafka Kafka Sample Specific base schema to be used, define headers, key, value, etc. JMS Solace Sample JSON formatted message HTTP PUT Sample JSON formatted PUT body"},{"location":"sample/#configuration","title":"Configuration","text":"Basic configuration
"},{"location":"sample/docker/","title":"Data Caterer - Docker Compose","text":"If you want to try out data caterer generating data for various data sources, you do use the following docker-compose file.
All you need to do is define which data source you want to run with via a command like below:
DATA_SOURCE=postgres docker-compose up -d datacaterer\n
You can change DATA_SOURCE
to one of the following: - postgres - mysql - cassandra - solace - kafka - http