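Generate CSV Data
import calendar
import time

current_GMT = time.gmtime()
time_stamp = calendar.timegm(current_GMT)


def generate():
    # Write 20,000,000 synthetic post rows to lens_post.csv.
    with open("lens_post.csv", "w+") as fd:
        for i in range(0, 20000000):
            rows = [i, str(i), i, i, i, i, str(i), str(time_stamp)]
            # NOTE: the format string is cut off after the sixth field in the
            # source text. The ".2X,%s,%s\n" tail and the "% tuple(rows)"
            # argument are an assumed completion matching the eight values in rows.
            fd.write('0x%0.2X,https://data.lens.phaver.com/api/lens/posts/%s,0xa31FF85E840ED117E172BC9Ad89E55128A999205%0.2X,0xa31FF85E840ED117E172BC9Ad89E55128A999205%0.2X,0xa31FF85E840ED117E172BC9Ad89E55128A999205%0.2X,0xa31FF85E840ED117E172BC9Ad89E55128A999205%0.2X,%s,%s\n' % tuple(rows))


if __name__ == "__main__":
    generate()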
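Before writing all 20,000,000 rows, it can help to estimate how large the CSV will be. The sketch below is a minimal illustration, not part of the original experiment: `format_row` and `estimate_file_size` are hypothetical helpers, and the last two fields of the row layout are an assumption, since the format string in the generator is truncated in the source.

```python
import calendar
import time

ADDRESS = "0xa31FF85E840ED117E172BC9Ad89E55128A999205"
# Assumed layout: eight comma-separated fields ending in a newline. The last
# two fields are a guess, because the original format string is truncated.
ROW_FORMAT = (
    "0x%0.2X,https://data.lens.phaver.com/api/lens/posts/%s,"
    + ",".join([ADDRESS + "%0.2X"] * 4)
    + ",%s,%s\n"
)


def format_row(i, time_stamp):
    """Return one CSV row for index i (hypothetical helper)."""
    return ROW_FORMAT % (i, str(i), i, i, i, i, str(i), str(time_stamp))


def estimate_file_size(total_rows, sample_rows=1000, time_stamp=None):
    """Estimate the CSV size in bytes from evenly spaced sample rows."""
    if total_rows <= 0:
        return 0.0
    if time_stamp is None:
        time_stamp = calendar.timegm(time.gmtime())
    # Sample across the whole index range, because rows get longer as i grows.
    step = max(1, total_rows // sample_rows)
    sizes = [len(format_row(i, time_stamp).encode("utf-8"))
             for i in range(0, total_rows, step)]
    return sum(sizes) / len(sizes) * total_rows


if __name__ == "__main__":
    estimate = estimate_file_size(20_000_000)
    print(f"estimated size: {estimate / 1e9:.2f} GB")
```

Sampling evenly spaced indices rather than the first few rows avoids underestimating the size, since both the hex and decimal renderings of `i` widen as the index increases.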
Compress the Data
from datetime import datetime

from pyarrow import csv, parquet


def file_to_data_frame_to_parquet(local_file: str, parquet_file: str) -> None:
    # Read the whole CSV into an Arrow table, then write it as gzip-compressed Parquet.
    table = csv.read_csv(local_file)
    parquet.write_table(table, parquet_file, compression="gzip")


if __name__ == "__main__":
    local_csv_file = "lens_post.csv"
    t1 = datetime.now()
    file_to_data_frame_to_parquet(local_csv_file, "lens_post.gz.parquet")
    t2 = datetime.now()
    took = t2 - t1
    print(f"it took {took} seconds to write csv to parquet.")
Report
| file | rows | file size |
| --- | --- | --- |
| lens_post.csv | 2000k | 5.2G |
| lens_post.gz.parquet | 2000k | 320M |
Compressing lens_post.csv to lens_post.gz.parquet on a 4C8G instance (4 vCPUs, 8 GB of memory) printed:
it took 0:00:46.190250 seconds to write csv to parquet.
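The reported sizes and elapsed time are enough to derive a compression ratio and a rough throughput. The following is a small sketch of that arithmetic; `parse_elapsed` is a hypothetical helper, and treating 1 GB as 1000 MB is an assumption about the units in the report.

```python
from datetime import timedelta


def parse_elapsed(text):
    """Parse an 'H:MM:SS.ffffff' string, as printed for a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))


# Figures taken from the report.
CSV_GB = 5.2
PARQUET_GB = 0.32
elapsed = parse_elapsed("0:00:46.190250")

# How many times smaller the Parquet file is than the CSV.
compression_ratio = CSV_GB / PARQUET_GB
# CSV megabytes processed per second, assuming 1 GB = 1000 MB.
throughput_mb_s = CSV_GB * 1000 / elapsed.total_seconds()

if __name__ == "__main__":
    print(f"compression ratio: {compression_ratio:.1f}x")
    print(f"throughput: {throughput_mb_s:.0f} MB/s")
```

Note that the elapsed value is an hours:minutes:seconds string, so it has to be parsed before it can be used as a number of seconds.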
| compression | storage | storage cost | computing cost | total |
| --- | --- | --- | --- | --- |
| N | 5.2GB | 5.2 * $5.1 ≈ $27 | 0 | ≈ $27 |
| Y | 320M | 0.32 * $5.1 ≈ $1.63 | $0.98 / 60 ≈ $0.016 | ≈ $1.65 |
The price of a 4C8G instance on AWS is $0.98/h, so a compression run of about one minute costs roughly $0.016.
The storage cost on Arweave is $5.1/GB.
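The cost comparison can be reproduced with a few lines of arithmetic. This is a minimal sketch using the prices quoted above; the function name `total_cost` and the choice to bill the compression run as one full minute are assumptions made for illustration.

```python
ARWEAVE_USD_PER_GB = 5.1      # storage price quoted in this report
INSTANCE_USD_PER_HOUR = 0.98  # 4C8G instance price quoted in this report


def total_cost(size_gb, compute_minutes=0.0):
    """Return (storage, compute, total) cost in USD."""
    storage = size_gb * ARWEAVE_USD_PER_GB
    compute = INSTANCE_USD_PER_HOUR / 60 * compute_minutes
    return storage, compute, storage + compute


# Uncompressed CSV: storage only, no compute.
raw = total_cost(5.2)
# Compressed Parquet: smaller storage plus about one minute of compute (assumed).
compressed = total_cost(0.32, compute_minutes=1)

if __name__ == "__main__":
    print(f"uncompressed: ${raw[2]:.2f}")
    print(f"compressed:   ${compressed[2]:.2f}")
    print(f"saving:       {raw[2] / compressed[2]:.1f}x")
```

With these inputs the compute cost is a small fraction of the storage cost, so the overall saving is driven almost entirely by the compression ratio.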
Background
DB3 Network uses data-rollup technology to reduce Arweave storage costs by compressing structured data. We expect roughly a 10x reduction in storage cost.
Experiment
Best Case
We use the following schema to store the data. The schema comes from https://arweave.app/tx/rtstthXo8T8wG1odJPAto9vfMCYUqr6Grp6j8KfVtuM
Reference