optimize dynamic masking #5

Closed

wants to merge 1,467 commits into from

Conversation

Wzhipeng

Before submitting the PR, please make sure you do the following:
  • If you're attempting to fix a translation issue, please submit your changes to our POEditor project instead of opening a PR.

Tests

  • Run the frontend and Cypress end-to-end tests with `yarn lint && yarn test`
  • If there are changes to the backend codebase, run the backend tests with `clojure -X:dev:test`
  • Sign the Contributor License Agreement (unless it's a tiny documentation change).

metabase-bot bot and others added 30 commits March 27, 2023 15:38
…base#29566) (metabase#29574)

* Configure foreign key support for Mongo to allow implicit joins

Co-authored-by: metamben <[email protected]>
… when moving between dashboards (metabase#29497) (metabase#29596)

* Add a repro for metabase#26230

* Move the prep phase to the `beforeEach`

* Use `visitDashboard` helper

* Wait for the second dashboard to load

* You don't have to use `within` here

* Add a comment

* Add main repro assertion

* Greatly simplify creation of the markdown card

* Update the test title

---------

Co-authored-by: Mahatthana (Kelvin) Nomsawadi <[email protected]>
Co-authored-by: Nemanja <[email protected]>
* Updating to cypress 12

* updating deps

* Updating Tests. Be Green Plz

* removing custom test id from select button

* Select Button Tuning

Co-authored-by: Nick Fitzpatrick <[email protected]>
* fix typo

* fix links

---------

Co-authored-by: Noah Moss <[email protected]>
* Don't log in bigquery results hotpath

Right now we log in the BigQuery results parser for unrecognized
types. The problem is that we log for each _value_ and not each
column, which causes an _enormous_ volume of logs and a performance
penalty.

See
- metabase#29118 (performance)
- metabase#28868 (filling disk space)

This log was added between 45.1 and 45.3

```diff
❯ git diff v0.45.1..v0.45.3 modules/drivers/bigquery-cloud-sdk/src/metabase/driver/bigquery_cloud_sdk/**
diff --git a/modules/drivers/bigquery-cloud-sdk/src/metabase/driver/bigquery_cloud_sdk/query_processor.clj b/modules/drivers/bigquery-cloud-sdk/src/metabase/driver/bigquery_cloud_sdk/query_processor.clj
index a0d8081c30..f367199b55 100644
--- a/modules/drivers/bigquery-cloud-sdk/src/metabase/driver/bigquery_cloud_sdk/query_processor.clj
+++ b/modules/drivers/bigquery-cloud-sdk/src/metabase/driver/bigquery_cloud_sdk/query_processor.clj
@@ -22,7 +22,7 @@
             [metabase.util :as u]
             [metabase.util.date-2 :as u.date]
             [metabase.util.honeysql-extensions :as hx]
-            [metabase.util.i18n :refer [tru]]
+            [metabase.util.i18n :refer [trs tru]]
             [pretty.core :refer [PrettyPrintable]]
             [schema.core :as s])
   (:import [com.google.cloud.bigquery Field$Mode FieldValue]
@@ -88,7 +88,8 @@
     (parse-fn v)))

 (defmethod parse-result-of-type :default
-  [_ column-mode _ v]
+  [column-type column-mode _ v]
+  (log/warn (trs "Warning: missing type mapping for parsing BigQuery results of type {0}." column-type))
   (parse-value column-mode v identity))
```

The result is that selecting 50,000 rows for download in Excel:

| version                      | time       |
|------------------------------|------------|
| 0.45.1                       | 28 seconds |
| 0.45.3                       | 52 seconds |
| 0.45.3 with logging disabled | 30 seconds |

(disable logging by adding `<Logger
name="metabase.driver.bigquery-cloud-sdk.query-processor"
level="ERROR"/>` and `-Dlog4j2.configurationFile=log4j2.xml` to jar
startup)
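The workaround above can be written out as a minimal `log4j2.xml` sketch. Only the `Logger` element comes from the text; the surrounding `Configuration`/`Loggers` structure and the `Root` element are the standard scaffolding a complete config would need (appender definitions omitted):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
  <Loggers>
    <!-- Suppress the per-value WARN logs from the BigQuery result parser -->
    <Logger name="metabase.driver.bigquery-cloud-sdk.query-processor"
            level="ERROR"/>
    <!-- Root logger shown for completeness; appender config omitted -->
    <Root level="INFO"/>
  </Loggers>
</Configuration>
```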

For the query (3 string columns, 5 rows):

```sql
SELECT game_id, first_name, last_name
FROM `bigquery-public-data.ncaa_basketball.mbb_players_games_sr`
LIMIT 5
```

BEFORE:

```
2023-03-31 17:17:52,146 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
2023-03-31 17:17:52,147 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
2023-03-31 17:17:52,147 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
2023-03-31 17:17:52,149 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
2023-03-31 17:17:52,149 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
2023-03-31 17:17:52,149 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
2023-03-31 17:17:52,149 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
2023-03-31 17:17:52,149 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
2023-03-31 17:17:52,149 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
2023-03-31 17:17:52,150 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
2023-03-31 17:17:52,150 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
2023-03-31 17:17:52,150 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
2023-03-31 17:17:52,150 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
2023-03-31 17:17:52,150 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
2023-03-31 17:17:52,150 WARN bigquery-cloud-sdk.query-processor :: Warning: missing type mapping for parsing BigQuery results of type STRING.
2023-03-31 17:17:52,155 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 795.2 ms (6 DB calls) App DB connections: 0/10 Jetty threads: 4/50 (2 idle, 0 queued) (192 total active threads) Queries in flight: 0 (0 queued)
```

Note that this is 15 log lines (3 columns × 5 rows).
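One generic way to avoid this kind of per-value flood is to deduplicate the warning so it fires at most once per column type. The sketch below (hypothetical `WarnOnce` class and `warnMissingMapping` helper, not the actual fix in this PR, which reworked the warning to fire per column) shows the idea:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class WarnOnce {
    // Remembers which column types have already been warned about.
    private static final Set<String> warned = ConcurrentHashMap.newKeySet();
    static int warnings = 0; // counts actual log emissions, for illustration

    // Emit the warning only the first time a given column type is seen,
    // no matter how many cell values of that type are parsed.
    static void warnMissingMapping(String columnType) {
        if (warned.add(columnType)) {
            warnings++;
            System.out.println(
                "Warning: missing type mapping for parsing results of type "
                + columnType + ".");
        }
    }

    public static void main(String[] args) {
        // 3 columns x 5 rows, all STRING: only one warning is emitted
        // instead of 15.
        for (int row = 0; row < 5; row++)
            for (String col : new String[] {"game_id", "first_name", "last_name"})
                warnMissingMapping("STRING");
        System.out.println(warnings);
    }
}
```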

AFTER:

```
2023-03-31 17:19:15,694 WARN driver.bigquery-cloud-sdk :: Warning: missing type mapping for parsing BigQuery results column game_id of type STRING.
2023-03-31 17:19:15,694 WARN driver.bigquery-cloud-sdk :: Warning: missing type mapping for parsing BigQuery results column first_name of type STRING.
2023-03-31 17:19:15,694 WARN driver.bigquery-cloud-sdk :: Warning: missing type mapping for parsing BigQuery results column last_name of type STRING.
2023-03-31 17:19:15,757 DEBUG middleware.log :: POST /api/dataset 202 [ASYNC: completed] 973.5 ms (6 DB calls) App DB connections: 0/10 Jetty threads: 4/50 (3 idle, 0 queued) (193 total active threads) Queries in flight: 0 (0 queued)
```

* unused require to appease our overlords

Co-authored-by: dpsutton <[email protected]>
* Create `cache_info` table in our persisted schemas

```sql
-- postgres/redshift
test-data=# select * from metabase_cache_424a9_379.cache_info ;
       key        |                value
------------------+--------------------------------------
 settings-version | 1
 created-at       | 2023-03-28
 instance-uuid    | 407e4ba8-2bab-470f-aeb5-9fc63fd18c4e
 instance-name    | Metabase Test
(4 rows)
```

```sql
--mysql
mysql> select * from metabase_cache_424a9_435.cache_info;
+------------------+--------------------------------------+
| key              | value                                |
+------------------+--------------------------------------+
| settings-version | 1                                    |
| created-at       | 2023-03-28                           |
| instance-uuid    | 407e4ba8-2bab-470f-aeb5-9fc63fd18c4e |
| instance-name    | Metabase Test                        |
+------------------+--------------------------------------+
```

our key values in v1:
```clojure
(defn kv-table-values
  "Version 1 of the values to go in the key/value table `cache_info` table."
  []
  [{:key   "settings-version"
    :value "1"}
   {:key   "created-at"
    ;; "2023-03-28"
    :value (.format (LocalDate/now) DateTimeFormatter/ISO_LOCAL_DATE)}
   {:key   "instance-uuid"
    :value (public-settings/site-uuid)}
   {:key   "instance-name"
    :value (public-settings/site-name)}])
```

This will enable us to delete cached schemas in shared databases (cloud
versions like redshift) without risking wiping out a cached schema
during an active test run.

The code to delete the schemas will look something like:

```clojure
(def spec (sql-jdbc.conn/connection-details->spec
           :redshift
           {:host "xxx.us-east-1.redshift.amazonaws.com"
            :db "db"
            :port 5439
            :user "user"
            :password "password"}))

(let [days-prior 1 ;; probably 2 to handle crossing a day boundary; can also adjust to hours, etc.
      threshold (.minus (java.time.LocalDate/now)
                        days-prior
                        java.time.temporal.ChronoUnit/DAYS)]
  (doseq [schema (->> ["select nspname from pg_namespace where nspname like 'metabase_cache%'"]
                      (jdbc/query spec)
                      (map :nspname))]
    (let [[{created :value}] (jdbc/query spec (sql/format {:select [:value]
                                                           :from [(keyword schema "cache_info")]
                                                           :where [:= :key "created-at"]}
                                                          {:dialect :ansi}))]
      (when created
        (let [creation-day (java.time.LocalDate/parse created)]
          (when (.isBefore creation-day threshold)
            (jdbc/execute! spec [(format "drop schema %s cascade" schema)])))))))
```

or if we want to do it in a single query:

```clojure
schemas=> (let [days-prior 1
                threshold  (.minus (java.time.LocalDate/now)
                                   days-prior
                                   java.time.temporal.ChronoUnit/DAYS)]
            (let [schemas (->> ["select nspname
                         from pg_namespace
                         where nspname like 'metabase_cache%'"]
                               (jdbc/query spec)
                               (map :nspname))]
              (jdbc/with-db-connection [conn spec]
                (jdbc/execute! conn [(format "set search_path= %s;" (str/join ", " schemas))])
                (let [sql               (sql/format {:select [:schema :created-at]
                                                     :from   {:union-all
                                                              (for [schema schemas]
                                                                {:select [[[:inline schema] :schema]
                                                                          [{:select [:value]
                                                                            :from   [(keyword schema "cache_info")]
                                                                            :where  [:= :key [:inline "created-at"]]}
                                                                           :created-at]]})}}
                                                    {:dialect :ansi})
                      old?              (fn [{create-str :created-at}]
                                          (let [created (java.time.LocalDate/parse create-str)]
                                            (.isBefore created threshold)))
                      schemas-to-delete (->> (jdbc/query spec sql)
                                             (filter old?)
                                             (map :schema))]
                  schemas-to-delete))))
("metabase_cache_424a9_503") ;; when there are any, string join them and delete them as above
```

* Use Instants so we can compare by hours instead of just days
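Switching to `Instant`s lets the staleness threshold be expressed in hours rather than whole days. A sketch of that comparison in `java.time` terms (the `isStale` helper and its parameters are hypothetical, not code from this PR):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class CacheAge {
    // True when the cache entry was created before (now - maxAgeHours).
    // Using Instants allows hour-granularity thresholds, unlike LocalDate,
    // which can only compare whole days.
    static boolean isStale(Instant createdAt, Instant now, long maxAgeHours) {
        return createdAt.isBefore(now.minus(maxAgeHours, ChronoUnit.HOURS));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2023-03-28T12:00:00Z");
        // 7 hours old with a 6-hour threshold: stale.
        System.out.println(isStale(Instant.parse("2023-03-28T05:00:00Z"), now, 6));
        // 4 hours old with a 6-hour threshold: still fresh.
        System.out.println(isStale(Instant.parse("2023-03-28T08:00:00Z"), now, 6));
    }
}
```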

* require toucan as db
* Clean cache after branch closes

* Add whitespace

* Add whitespace

* Finalize caches cleanup workflow

---------

Co-authored-by: Luis Paolini <[email protected]>
Co-authored-by: Nemanja <[email protected]>
…e#29793)

* Increase the BE timeout

* Increase the drivers timeout [ci skip]
Wzhipeng closed this May 13, 2024