DSBulk DELETE can not accept any ranges on the clustering column when used within -query #490

biswadipseth1980 · 2024-03-12T20:16:43Z

DSBulk DELETE can not accept any ranges on the clustering column where as SELECT does
Also cqlsh can work with the same

Consider this is the table from where customer needs to delete the data

CREATE TABLE live_mom_transaction (
str_num text,
day_no int,
transaction_uid text,
trans_typ text,
sequence_no text,
created_date timestamp,
raw_data text,
updated_date timestamp,
PRIMARY KEY ((str_num, day_no), transaction_uid, trans_typ, sequence_no)
) WITH CLUSTERING ORDER BY (transaction_uid ASC, trans_typ ASC, sequence_no ASC)

Executing a SELECT query with conditions on these columns works in cqlsh, but attempts to perform a similar operation with DSBulk result in failure due to the inability to specify a range on the clustering key sequence_no. For instance, in cqlsh, the following query is successful:```

cqlsh> Select *  FROM biswa_ks.live_mom_transaction WHERE str_num='006' and day_no=7 and transaction_uid='14d6f78c-c468-4b71-84ce-0a8009bf3dc1' and trans_typ= 'SALE' and  sequence_no > '0';

 str_num | day_no | transaction_uid                      | trans_typ | sequence_no | created_date                    | raw_data                       | updated_date
---------+--------+--------------------------------------+-----------+-------------+---------------------------------+--------------------------------+---------------------------------
     006 |      7 | 14d6f78c-c468-4b71-84ce-0a8009bf3dc1 |      SALE |     seq-188 | 2024-06-05 00:55:41.789000+0000 | {"item":"sample","price":9.99} | 2024-06-05 01:55:41.789000+0000

However, trying to replicate this deletion with DSBulk, using the command below, results in an error because DSBulk does not accept a range condition on the clustering column sequence_no, even though cqlsh can handle it:`

cat /home/automaton/data/output-000001.csv|./dsbulk load -query "DELETE FROM biswa_ks.live_mom_transaction WHERE str_num=:str_num and day_no=:day_no and transaction_uid=:transaction_uid and trans_typ=:trans_typ and sequence_no > '0' "
Operation directory: /home/automaton/dsbulk-1.9.0/bin/logs/LOAD_20240312-155253-850242

Operation LOAD_20240312-155253-850242 failed: Missing required primary key column sequence_no from schema.mapping or schema.query.

This also means DSBulk generated deletes (using full PK & not cluster range) may create a large amount of tombstones e.g. if there are 10 sequence_no on a avg per str_num, day_no, transaction_uid, trans_typ combo, then it will create 10x tombstones

same select works in DSBULK

[dse_test_kmip11:0] automaton@ip-10-166-76-196:~/dsbulk-1.9.0/bin$ ./dsbulk unload -url /home/automaton/data -query "Select *  FROM biswa_ks.live_mom_transaction WHERE str_num='006' and day_no=7 and transaction_uid='068cfdf3-ae5b-48f1-a8de-27b70e2c3daf' and trans_typ= 'SALE' and  sequence_no > '0' "
Operation directory: /home/automaton/dsbulk-1.9.0/bin/logs/UNLOAD_20240312-202339-691820
total | failed | rows/s | p50ms | p99ms | p999ms
    1 |      0 |      3 | 20.51 | 20.58 |  20.58
Operation UNLOAD_20240312-202339-691820 completed successfully in less than one second.

slack conversation is there in
https://datastax.slack.com/archives/C6B5L9GQN/p1710264412970939

The text was updated successfully, but these errors were encountered:

biswadipseth1980 changed the title ~~DSBulk can not accept any ranges on the clustering column where as cqlsh can work with the same~~ DSBulk DELETE can not accept any ranges on the clustering column Mar 12, 2024

msmygit changed the title ~~DSBulk DELETE can not accept any ranges on the clustering column~~ DSBulk DELETE can not accept any ranges on the clustering column when used within -query Mar 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DSBulk DELETE can not accept any ranges on the clustering column when used within -query #490

DSBulk DELETE can not accept any ranges on the clustering column when used within -query #490

biswadipseth1980 commented Mar 12, 2024 •

edited

Loading

DSBulk DELETE can not accept any ranges on the clustering column when used within -query #490

DSBulk DELETE can not accept any ranges on the clustering column when used within -query #490

Comments

biswadipseth1980 commented Mar 12, 2024 • edited Loading

biswadipseth1980 commented Mar 12, 2024 •

edited

Loading