Bug Report: ydata-profiling won't work in Azure Synapse #1578
Comments
Hi @ramonsuarez, thank you for your request. You have to make sure that you install a version of ydata-profiling that is compatible with Synapse's pre-installed packages. We are not responsible for Synapse's environment or the packages it currently uses. As an example, Synapse's environment is using an older pandas version, as per the message you are getting. You can check our release history here: https://github.com/ydataai/ydata-profiling/releases Let me know if this was helpful.
Thanks a lot @fabclmnt. These notes are for your information; I hope they are useful. I've tested in both Synapse and Fabric using a different (huge) Yellow Taxi database, because I couldn't load the one in your Databricks blog post:
Hi @ramonsuarez, thank you for your input. Regarding the error, we have indeed fixed it in later versions, and unfortunately we can't ensure or control the versions of pandas, numpy and other core libraries that other platforms are using. Based on the Fabric error message, they are using older versions of packages such as typeguard and typing-extensions because they depend on an older version of TensorFlow. I would suggest reporting this to Azure so that the packages can be upgraded on their side.
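For anyone hitting the same conflict, a quick way to see which versions the platform ships is to list them from a notebook cell before installing anything. The snippet below is only a sketch using the standard library's importlib.metadata; the helper name installed_versions and the package tuple are illustrative assumptions taken from the packages named in this thread, not part of ydata-profiling.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages named in this thread (assumed list; extend as needed).
PACKAGES = ("ydata-profiling", "pandas", "numpy", "typeguard", "typing-extensions")


def installed_versions(names=PACKAGES):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so the output can be pasted into a bug report.
for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

The printed versions can then be compared against the requirements of a given ydata-profiling release from the release history linked above.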
Current Behaviour
Using your Databricks notebook example with a different table and an added cell for installing ydata-profiling, I get this error towards the end of the install (after messages about not being able to uninstall pandas and seaborn):
The kernel restarts automatically at the end of the install process, and the code then fails because the df that was loaded at the beginning is no longer in memory, so I have to run the first cell again.
After this I run the code that did not work (pasted into a new cell so as not to trigger the install again), and the next error comes when running report_html = report.to_html(). But when it gets to profile_json, I do get data from the table in it.
I've installed following the instructions on your website with and without [pyspark], [notebook] and [pyspark, notebook]. The errors are the same.
Pastebin with all the output.
Expected Behaviour
Install without errors and display html report in Azure Synapse notebook (pyspark)
Data Description
Dataset is one I'm practicing with that is already in my workspace. It contains metadata about the publicly available tables I've imported from a public transport company.
Code that reproduces the bug
pandas-profiling version
4.7.0
Dependencies
OS
spark on Azure Synapse
Checklist