We found that using PyHive and HiveThriftServer2 to load data into OpenOA is unacceptably slow (~1 minute to load the La Haute Borne data into a PlantData object). This ticket aims to fix that by switching the ENTR constructor to use PySpark directly instead of PyHive, and by making sure sensible optimizations are in place (e.g., Apache Arrow, and a number and size of Spark executors that make sense for the demo).
Impacted Repositories:
- entr/entr_runtime
  - remove automatic PyHive at startup
  - tune Spark standalone cluster
- entr/openoa
  - switch ENTR constructor to use PySpark DataFrames instead of PyHive
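
For discussion, here is a rough sketch of what the PySpark path could look like. It is not the actual OpenOA ENTR constructor: the function names `build_spark_conf` and `load_table_as_pandas`, the app name, and the executor sizes are all placeholders, and the right values for the demo cluster still need to be measured. The Spark configuration keys themselves are standard ones.

```python
from typing import Dict


def build_spark_conf(executor_cores: int = 2,
                     executor_memory: str = "2g",
                     max_cores: int = 4) -> Dict[str, str]:
    """Return Spark settings for a small standalone cluster.

    The default sizes are placeholders, not tuned values.
    """
    return {
        # Use Apache Arrow for Spark -> pandas conversion in toPandas().
        "spark.sql.execution.arrow.pyspark.enabled": "true",
        # Executor sizing; in standalone mode the executor count follows
        # from spark.cores.max divided by spark.executor.cores.
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": executor_memory,
        "spark.cores.max": str(max_cores),
    }


def load_table_as_pandas(table_name: str, conf: Dict[str, str]):
    """Read a warehouse table through PySpark and return a pandas DataFrame.

    Hypothetical helper: `table_name` must exist in the Hive metastore.
    """
    # Imported lazily so build_spark_conf works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("openoa-entr-sketch").enableHiveSupport()
    for key, value in conf.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    # Keep the read in Spark and convert once at the end, via Arrow.
    return spark.table(table_name).toPandas()
```

A call would look like `load_table_as_pandas("some_schema.some_table", build_spark_conf())`, where the table name is made up. Returning the Spark DataFrame instead of calling `toPandas()` would be the alternative if PlantData ends up keeping the data in Spark.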
jordanperr changed the title from "Efficiency improvements in OpenOA - ENTR warehouse constructor, preserve Spark Dataframe in PlantData" to "Efficiency improvements in OpenOA - ENTR warehouse constructor, use PySpark and Arrow in OpenOA ENTR Constructor" on Dec 2, 2022