Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"SyntaxError: --params is not a correctly formatted JSON string or a JSON serializable dictionary" When running LTV lookalike notebook #71

Open
rain-ml opened this issue Jan 18, 2022 · 2 comments

Comments

@rain-ml
Copy link

rain-ml commented Jan 18, 2022

Hi all,

I'm running the LTV lookalike notebook: https://github.com/GoogleCloudPlatform/analytics-componentized-patterns/blob/master/retail/ltv/bqml/notebooks/bqml_automl_ltv_activate_lookalike.ipynb on a Vertex AI user-managed notebook, but I got the error as written in the title.

I got the error when running the first cell of the Aggregate per day per customer chapter, i.e. the cell starting with this code:

%%bigquery --params $LTV_PARAMS --project $PROJECT_ID

DECLARE MAX_STDV_MONETARY INT64 DEFAULT @MAX_STDV_MONETARY;
DECLARE MAX_STDV_QTY INT64 DEFAULT @MAX_STDV_QTY;

CREATE OR REPLACE TABLE `ltv_ecommerce.20_aggred` AS
SELECT
  customer_id,
  order_day,
  ROUND(day_value_after_returns, 2) AS value,
  day_qty_after_returns as qty_articles,
  day_num_returns AS num_returns,
  CEIL(avg_time_to_return) AS time_to_return
FROM (

.....

Does anyone know how to fix this?

Here is the full error message I got:


Traceback (most recent call last):

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/magics.py", line 515, in _cell_magic
    params_option_value, rest_of_args = _split_args_line(line)

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/magics.py", line 720, in _split_args_line
    tree = scanner.input_line()

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 203, in input_line
    options = self.option_list()

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 264, in option_list
    option = self.params_option()

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 319, in params_option
    opt_value = self.py_dict()

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 333, in py_dict
    dict_items = self.dict_items()

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 347, in dict_items
    item = self.dict_item()

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 371, in dict_item
    value = self.py_value()

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 423, in py_value
    self.error(msg, exc_type=QueryParamsParseError)

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 193, in error
    raise exc_type(message)

QueryParamsParseError: Unexpected token type UNKNOWN at position 27.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3457, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "/tmp/ipykernel_17444/3535280929.py", line 1, in <module>
    get_ipython().run_cell_magic('bigquery', '--params $LTV_PARAMS --project $PROJECT_ID', '\nDECLARE MAX_STDV_MONETARY INT64 DEFAULT @MAX_STDV_MONETARY;\nDECLARE MAX_STDV_QTY INT64 DEFAULT @MAX_STDV_QTY;\n\nCREATE OR REPLACE TABLE `ltv_ecommerce.20_aggred` AS\nSELECT\n  customer_id,\n  order_day,\n  ROUND(day_value_after_returns, 2) AS value,\n  day_qty_after_returns as qty_articles,\n  day_num_returns AS num_returns,\n  CEIL(avg_time_to_return) AS time_to_return\nFROM (\n  SELECT\n    customer_id,\n    order_day,\n    SUM(order_value_after_returns) AS day_value_after_returns,\n    STDDEV(SUM(order_value_after_returns)) OVER(PARTITION BY customer_id ORDER BY SUM(order_value_after_returns)) AS stdv_value,\n    SUM(order_qty_after_returns) AS day_qty_after_returns,\n    STDDEV(SUM(order_qty_after_returns)) OVER(PARTITION BY customer_id ORDER BY SUM(order_qty_after_returns)) AS stdv_qty,\n    CASE\n      WHEN MIN(order_min_qty) < 0 THEN count(1)\n      ELSE 0\n    END AS day_num_returns,\n    CASE\n      WHEN MIN(order_min_qty) < 0 THEN AVG(time_to_return)\n      ELSE NULL\n    END AS avg_time_to_return\n  FROM (\n    SELECT \n      customer_id,\n      order_id,\n      -- Gives the order date vs return(s) dates.\n      MIN(transaction_date) AS order_day,\n      MAX(transaction_date) AS return_final_day,\n      DATE_DIFF(MAX(transaction_date), MIN(transaction_date), DAY) AS time_to_return,\n      -- Aggregates all products in the order \n      -- and all products returned later.\n      SUM(qty * unit_price) AS order_value_after_returns,\n      SUM(qty) AS order_qty_after_returns,\n      -- If negative, order has qty return(s).\n      MIN(qty) order_min_qty\n    FROM \n      `ltv_ecommerce.10_orders`\n    GROUP BY\n      customer_id,\n      order_id)\n  GROUP BY\n    customer_id,\n    order_day)\nWHERE\n  -- [Optional] Remove dates with outliers per a customer.\n  (stdv_value < MAX_STDV_MONETARY\n    OR stdv_value IS NULL) AND\n  (stdv_qty < MAX_STDV_QTY\n    OR stdv_qty IS NULL);\n\n\nSELECT * FROM `ltv_ecommerce.20_aggred` LIMIT 5;\n')

  File "/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2419, in run_cell_magic
    result = fn(*args, **kwargs)

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/magics.py", line 521, in _cell_magic
    raise rebranded_error from exc

  File "<string>", line unknown
SyntaxError: --params is not a correctly formatted JSON string or a JSON serializable dictionary
@polong-lin
Copy link
Member

Hi rain-ml,
It looks like you may not have run one of the prior cells that creates the LTV_PARAMS dictionary:

LTV_PARAMS = {
    'WINDOW_LENGTH': 0,
    'WINDOW_STEP': 30,
    'WINDOW_STEP_INITIAL': 90,
    'LENGTH_FUTURE': 30,
    'MAX_STDV_MONETARY': 500,
    'MAX_STDV_QTY': 100, 
    'TOP_LTV_RATIO': 0.2
}
LTV_PARAMS

In other words, in the first line of your code, %%bigquery is expecting to import a dictionary which passes in variables that can then be used in the SQL code below it. This is why your output says:

--params is not a correctly formatted JSON string or a JSON serializable dictionary

Once you create the LTV_PARAMS dictionary and use it as part of %%bigquery, it should then allow you to reference the two variables @MAX_STDV_MONETARY @MAX_STDV_QTY which were set in LTV_PARAMS above:

DECLARE MAX_STDV_MONETARY INT64 DEFAULT @MAX_STDV_MONETARY;
DECLARE MAX_STDV_QTY INT64 DEFAULT @MAX_STDV_QTY; 

If you don't like passing in LTV_PARAMS, you can also remove it and replace it with values directly, such as:

%%bigquery --project $PROJECT_ID

DECLARE MAX_STDV_MONETARY INT64 DEFAULT 500;
DECLARE MAX_STDV_QTY INT64 DEFAULT 100;

CREATE OR REPLACE TABLE `ltv_ecommerce.20_aggred` AS
SELECT
  customer_id,
  order_day,
  ROUND(day_value_after_returns, 2) AS value,
  day_qty_after_returns as qty_articles,
  day_num_returns AS num_returns,
  CEIL(avg_time_to_return) AS time_to_return
FROM (

.....

Hope this helps

@rain-ml
Copy link
Author

rain-ml commented Jan 19, 2022

Hi polong-lin,

I did run the LTV_PARAMS cell before referencing it and that's how I got the original error.

In fact, If I didn't run the LTV_PARAMS cell, here's the error message:
NameError: Parameter expansion failed, undefined variable "LTV_PARAMS".

Anyways, I just hard code it for now like you suggested without using params.
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants