IDEMSInternational · istride · Sep 17, 2024 · Sep 17, 2024 · Sep 23, 2024 · Sep 24, 2024
diff --git a/README.md b/README.md
@@ -16,7 +16,6 @@ The CLI supports the following subcommands:
 - `create_flows`: create RapidPro flows (in JSON format) from spreadsheets using content index
 - `flows_to_sheets`: convert RapidPro flows (in JSON format) into spreadsheets
 - `convert`: save input spreadsheets as JSON
-- `save_data_sheets`: save input spreadsheets as nested JSON using content index - an experimental feature that is likely to change.
 
 Full details of the available options for each can be found via the help feature:
 

diff --git a/docs/notation.md b/docs/notation.md
@@ -0,0 +1,205 @@
+# Spreadsheet notation
+
+Summary of spreadsheet notation used to convert sheets into a nested data structure (JSON). A series of data tables will be shown alongside the resultant JSON structure.
+
+# Books
+
+A container for multiple tables. Also known as a spreadsheet or workbook. A book is converted to an object containing a property for each table. The property key is the name of the sheet; the value is the converted contents of the sheet.
+
+For example, given an Excel workbook with two sheets ("table1" and "table2"), the resulting JSON will be:
+
+```json
+{
+  "table1": [],
+  "table2": []
+}
+```
+
+# Tables
+
+Also known as a sheet in a spreadsheet (or workbook).
+
+The contents of a table are converted to a sequence of objects - corresponding to rows in the sheet. Each object will have keys corresponding to the column headers of the sheet, and values corresponding to a particular row in the sheet.
+
+| a  | b  |
+|----|----|
+| v1 | v2 |
+
+`data`
+
+```json
+{
+  "data": [
+    {"a": "v1", "b": "v2"}
+  ]
+}
+```
+
+This means that the first row of every table should be a header row that specifies the name of each column.
+
+# Basic types
+
+Refers to the following value types in JSON: `string`, `number`, `true` and `false`.
+
+| string | number | true | false |
+|--------|--------|------|-------|
+| hello  | 123    | true | false |
+
+`basic_types`
+
+```json
+{
+  "basic_types": [
+    {
+      "string": "hello",
+      "number": 123,
+      "true": true,
+      "false": false
+    }
+  ]
+}
+```
+
+The JSON type `null` is not represented because an empty cell is assumed to be equivalent to the empty string ("").
+
+# Sequences
+
+An ordered sequence of items. Also known as lists or arrays.
+
+| seq1 | seq1 | seq2.1 | seq2.2 | seq3     | seq4               |
+|------|------|--------|--------|----------|--------------------|
+| v1   | v2   | v1     | v2     | v1 \| v2 | v1 ; v2 \| v3 ; v4 |
+
+`sequences`
+
+```json
+{
+  "sequences": [
+    {
+      "seq1": ["v1", "v2"],
+      "seq2": ["v1", "v2"],
+      "seq3": ["v1", "v2"],
+      "seq4": [["v1", "v2"], ["v3", "v4"]]
+    }
+  ]
+}
+```
+
+`seq1`, `seq2` and `seq3` are equivalent. In all cases, the order of items is specified from left to right.
+
+`seq1` uses a 'wide' layout, where the column header is repeated and each column holds one item in the sequence. Values from columns with the same name are collected into a sequence in the resulting JSON object.
+
+`seq2` is similar to `seq1`, but the index of each item is specified explicitly as a suffix to the column header (`seq2.1`, `seq2.2`).
+
+`seq3` uses an 'inline' layout, where the sequence is defined as a delimited string within a single cell of the table. The default delimiter is a vertical bar or pipe character ('|').
+
+Two levels of nesting are possible within a cell, as shown by `seq4` - a list of lists. This could be used to model a list of key-value pairs, which could easily be converted to an object (map / dictionary). The default delimiter for second-level sequences is a semi-colon (';').
+
+The interpretation of delimiter characters can be skipped by escaping the delimiter characters. An escape sequence begins with a backslash ('\\') and ends with the character to be escaped. For example, to escape a vertical bar, use: '\\|'.
+
+# Objects
+
+An unordered collection of key-value pairs (properties). Also known as maps, dictionaries or associative arrays.
+
+| obj1.key1 | obj1.key2 | obj2                   |
+|-----------|-----------|------------------------|
+| v1        | v2        | key1 ; v1 \| key2 ; v2 |
+
+`objects`
+
+```json
+{
+  "objects": [
+    {
+      "obj1": {
+        "key1": "v1",
+        "key2": "v2"
+      },
+      "obj2": [
+        ["key1", "v1"],
+        ["key2", "v2"]
+      ]
+    }
+  ]
+}
+```
+
+`obj1` and `obj2` produce slightly different JSON: `obj1` becomes an object, whereas `obj2` becomes a list of key-value pairs. Both can be interpreted in the same way, as a list of key-value pairs.
+
+A wide layout is used for `obj1`, where one or more column headers use a dotted 'keypath' notation to identify a particular property key belonging to a particular object, and the corresponding cells in subsequent rows contain the values for that property. The dotted keypath notation can be used to access properties at deeper levels of nesting e.g. `obj.key.subkey.etc`.
+
+An inline layout is used for `obj2`, where properties are defined as a sequence of key-value pairs. The delimiter of properties is a vertical bar or pipe character - same as top-level sequences. The delimiter of keys and values is a semi-colon character - same as second-level sequences.
+
+All the previous notation can be combined to create fairly complicated structures.
+
+| obj1.key1              | obj1.key1                      |
+|------------------------|--------------------------------|
+| 1 ; 2 ; 3 \| one ; two | active ; true \| debug ; false |
+
+`nesting`
+
+```json
+{
+  "nesting": [
+    {
+      "obj1": {
+        "key1": [
+          [
+            [1, 2, 3],
+            ["one", "two"]
+          ],
+          [
+            ["active", true],
+            ["debug", false]
+          ]
+        ]
+      }
+    }
+  ]
+}
+```
+
+# Templates
+
+Table cells may contain Jinja templates. A cell is considered a template if it contains template placeholders anywhere within it. There are three types of template placeholders:
+
+- `{{ ... }}`
+- `{% ... %}`
+- `{@ ... @}`
+
+When converting between spreadsheets and JSON, templates are not interpreted in any way; they are copied verbatim. This means that sequence delimiters do not need to be escaped if they appear within a template. Templates are intended to be interpreted at a later stage, during further processing.
+
+# Metadata
+
+Information that would otherwise be lost during the conversion from spreadsheets to JSON is stored as metadata, in a top-level property with the key `_idems`. The metadata property is intended to be 'hidden', and its key is unlikely to clash with any sheet name.
+
+The original header names for each sheet are held as metadata to direct the conversion process from JSON back to spreadsheet. The original headers preserve the order of columns and whether a wide or inline layout was used.
+
+
+| seq1 | seq1 | seq2     |
+|------|------|----------|
+| v1   | v2   | v1 \| v2 |
+
+`sequences`
+
+```json
+{
+  "_idems": {
+    "tabulate": {
+      "sequences": {
+        "headers": [
+          "seq1",
+          "seq1",
+          "seq2"
+        ]
+      }
+    }
+  },
+  "sequences": [
+    {
+      "seq1": ["v1", "v2"],
+      "seq2": ["v1", "v2"]
+    }
+  ]
+}
+```
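The inline sequence notation described in `docs/notation.md` (a vertical bar for top-level items, a semi-colon for second-level items, and a backslash to escape a delimiter) could be sketched roughly as below. This is an illustration only, not the converter's actual implementation: `split_unescaped` and `parse_cell` are hypothetical names, values are left as strings (no conversion to numbers or booleans), templates are not detected, and a cell containing only second-level delimiters is assumed to yield a flat list.

```python
import re

# Assumed defaults, taken from the notation document: '|' then ';'.
DELIMITERS = ("|", ";")


def split_unescaped(text, delimiter):
    """Split text on delimiter, ignoring delimiters preceded by a backslash."""
    return re.split(r"(?<!\\)" + re.escape(delimiter), text)


def parse_cell(cell, delimiters=DELIMITERS):
    """Convert a cell into a string, a list, or a list of lists."""

    def parse(text, level):
        if level == len(delimiters):
            # Leaf value: trim whitespace, then drop the escaping backslashes.
            value = text.strip()
            for d in delimiters:
                value = value.replace("\\" + d, d)
            return value
        parts = split_unescaped(text, delimiters[level])
        if len(parts) == 1:
            # No delimiter at this level; try the next level down.
            return parse(text, level + 1)
        return [parse(part, level + 1) for part in parts]

    return parse(cell, 0)


print(parse_cell("v1 | v2"))
print(parse_cell("v1 ; v2 | v3 ; v4"))
print(parse_cell(r"a \| b"))
```

The three calls correspond to `seq3`, `seq4` and an escaped vertical bar from the notation document. Splitting with a negative lookbehind keeps the sketch short, but it does not handle an escaped backslash directly before a delimiter.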
diff --git a/pyproject.toml b/pyproject.toml
@@ -34,9 +34,11 @@ dependencies = [
     "google-api-python-client~=2.6.0",
     "google-auth-oauthlib~=0.4.4",
     "networkx~=2.5.1",
+    "odfpy",
     "openpyxl",
     "pydantic >= 2",
-    "tablib[ods]>=3.1.0",
+    "python-benedict",
+    "tablib @ git+https://github.com/istride/[email protected]",
 ]
 
 [project.urls]

diff --git a/src/rpft/cli.py b/src/rpft/cli.py
@@ -39,16 +39,16 @@ def flows_to_sheets(args):
     )
 
 
-def save_data_sheets(args):
-    output = converters.save_data_sheets(
-        args.input,
-        None,
-        args.format,
-        data_models=args.datamodels,
-        tags=args.tags,
-    )
-    with open(args.output, "w", encoding="utf-8") as export:
-        json.dump(output, export, indent=4)
+def uni_to_sheets(args):
+    with open(args.output, "wb") as handle:
+        handle.write(converters.uni_to_sheets(args.input))
+
+
+def sheets_to_uni(args):
+    data = converters.sheets_to_uni(args.input)
+
+    with open(args.output, "w", encoding="utf-8") as f:
+        json.dump(data, f, indent=2)
 
 
 def create_parser():
@@ -64,7 +64,8 @@ def create_parser():
     _add_create_command(sub)
     _add_convert_command(sub)
     _add_flows_to_sheets_command(sub)
-    _add_save_data_sheets_command(sub)
+    _add_uni_to_sheets_command(sub)
+    _add_sheets_to_uni_command(sub)
 
     return parser
 
@@ -77,25 +78,13 @@ def _add_create_command(sub):
     )
 
     parser.set_defaults(func=create_flows)
-    _add_content_index_arguments(parser)
-
-
-def _add_content_index_arguments(parser):
     parser.add_argument(
-        "--datamodels",
+        "input",
         help=(
-            "name of the module defining user data models underlying the data sheets,"
-            " e.g. if the model definitions reside in"
-            " ./myfolder/mysubfolder/mymodelsfile.py, then this argument should be"
-            " myfolder.mysubfolder.mymodelsfile"
+            "paths to XLSX or JSON files, or directories containing CSV files, or"
+            " Google Sheets IDs i.e. from the URL; inputs should be of the same format"
         ),
-    )
-    parser.add_argument(
-        "-f",
-        "--format",
-        choices=["csv", "google_sheets", "json", "xlsx"],
-        help="input sheet format",
-        required=True,
+        nargs="+",
     )
     parser.add_argument(
         "-o",
@@ -114,12 +103,20 @@ def _add_content_index_arguments(parser):
         nargs="*",
     )
     parser.add_argument(
-        "input",
+        "--datamodels",
         help=(
-            "paths to XLSX or JSON files, or directories containing CSV files, or"
-            " Google Sheets IDs i.e. from the URL; inputs should be of the same format"
+            "name of the module defining user data models underlying the data sheets,"
+            " e.g. if the model definitions reside in"
+            " ./myfolder/mysubfolder/mymodelsfile.py, then this argument should be"
+            " myfolder.mysubfolder.mymodelsfile"
         ),
-        nargs="+",
+    )
+    parser.add_argument(
+        "-f",
+        "--format",
+        choices=["csv", "google_sheets", "json", "xlsx"],
+        help="input sheet format",
+        required=True,
     )
 
 
@@ -180,14 +177,37 @@ def _add_flows_to_sheets_command(sub):
     )
 
 
-def _add_save_data_sheets_command(sub):
+def _add_uni_to_sheets_command(sub):
+    parser = sub.add_parser(
+        "uni-to-sheets",
+        help="convert JSON to sheets",
+    )
+    parser.set_defaults(func=uni_to_sheets)
+    parser.add_argument(
+        "input",
+        help=("location of input JSON file"),
+    )
+    parser.add_argument(
+        "output",
+        help=("location where sheets will be saved"),
+    )
+
+
+def _add_sheets_to_uni_command(sub):
     parser = sub.add_parser(
-        "save_data_sheets",
-        help="save data sheets referenced in context index as nested json",
+        "sheets-to-uni",
+        help="convert sheets to nested JSON",
     )
 
-    parser.set_defaults(func=save_data_sheets)
-    _add_content_index_arguments(parser)
+    parser.set_defaults(func=sheets_to_uni)
+    parser.add_argument(
+        "input",
+        help=("location of workbook"),
+    )
+    parser.add_argument(
+        "output",
+        help=("location where JSON will be saved"),
+    )
 
 
 if __name__ == "__main__":
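For reference, the argument wiring for the two new subcommands can be exercised in isolation. The sketch below reproduces only the `argparse` setup shown in this diff; the program name `rpft`, the `dest="command"` attribute and the file names `content.xlsx` and `content.json` are assumptions made for illustration, and the real handlers (`uni_to_sheets`, `sheets_to_uni`) are not called.

```python
import argparse


def build_parser():
    # Program name is an assumption; the real parser lives in rpft.cli.
    parser = argparse.ArgumentParser(prog="rpft")
    sub = parser.add_subparsers(dest="command", required=True)

    to_sheets = sub.add_parser("uni-to-sheets", help="convert JSON to sheets")
    to_sheets.add_argument("input", help="location of input JSON file")
    to_sheets.add_argument("output", help="location where sheets will be saved")

    to_uni = sub.add_parser("sheets-to-uni", help="convert sheets to nested JSON")
    to_uni.add_argument("input", help="location of workbook")
    to_uni.add_argument("output", help="location where JSON will be saved")

    return parser


# Hypothetical invocation: convert a workbook to nested JSON.
args = build_parser().parse_args(["sheets-to-uni", "content.xlsx", "content.json"])
print(args.command, args.input, args.output)
```

Both subcommands take exactly two positional arguments, so unlike `create_flows` they accept neither `--format` nor `--datamodels`.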