implemented dataframe.cov #2142

Open. Wants to merge 6 commits into base: master


@LSturtew (Contributor) commented Apr 8, 2021

ref #1929

Implement DataFrame.cov

>>> kdf = ks.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
...                   columns=['dogs', 'cats'])
>>> kdf.cov()
          dogs      cats
dogs  0.666667 -1.000000
cats -1.000000  1.666667

@codecov-io

Codecov Report

Merging #2142 (7987192) into master (d7f6e88) will decrease coverage by 2.23%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master    #2142      +/-   ##
==========================================
- Coverage   95.37%   93.14%   -2.24%     
==========================================
  Files          60       60              
  Lines       13694    13601      -93     
==========================================
- Hits        13060    12668     -392     
- Misses        634      933     +299     

Impacted Files                                   Coverage Δ
databricks/koalas/missing/frame.py               74.57% <ø> (-25.43%) ⬇️
databricks/koalas/frame.py                       95.65% <85.71%> (-0.84%) ⬇️
databricks/koalas/usage_logging/__init__.py      28.20% <0.00%> (-64.36%) ⬇️
databricks/koalas/usage_logging/usage_logger.py  47.82% <0.00%> (-52.18%) ⬇️
databricks/koalas/missing/series.py              60.56% <0.00%> (-39.44%) ⬇️
databricks/koalas/__init__.py                    80.26% <0.00%> (-11.85%) ⬇️
databricks/koalas/missing/indexes.py             88.63% <0.00%> (-11.37%) ⬇️
databricks/conftest.py                           89.09% <0.00%> (-10.91%) ⬇️
databricks/koalas/typedef/typehints.py           86.22% <0.00%> (-9.19%) ⬇️
... and 30 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

]
kdf = self[num_cols]
names = [name for t in num_cols for name in t]
mat = kdf.to_pandas().to_numpy(dtype=float, copy=False)
Collaborator

I'm afraid using to_pandas() without any restriction is not a good idea. It will cause an OOM error if the data doesn't fit in the driver's memory.
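
To illustrate the concern, here is a minimal pure-Python sketch of how a covariance matrix can be built from small per-partition aggregates instead of collecting every row in one place. It is not the code in this PR and not a Spark or Koalas API: the helper names (partial_stats, merge, covariance) and the two-partition split of the example data are assumptions made for illustration only.

```python
from itertools import combinations_with_replacement


def partial_stats(rows, ncols):
    """Reduce one partition to (row count, column sums, pairwise product sums)."""
    n = 0
    sums = [0.0] * ncols
    prods = {pair: 0.0 for pair in combinations_with_replacement(range(ncols), 2)}
    for row in rows:
        n += 1
        for i in range(ncols):
            sums[i] += row[i]
        for i, j in prods:
            prods[(i, j)] += row[i] * row[j]
    return n, sums, prods


def merge(a, b):
    """Combine the aggregates of two partitions; only O(ncols^2) numbers move."""
    n = a[0] + b[0]
    sums = [x + y for x, y in zip(a[1], b[1])]
    prods = {key: a[2][key] + b[2][key] for key in a[2]}
    return n, sums, prods


def covariance(stats, ddof=1):
    """Sample covariance matrix from the merged aggregates."""
    n, sums, prods = stats
    ncols = len(sums)
    cov = [[0.0] * ncols for _ in range(ncols)]
    for (i, j), sxy in prods.items():
        value = (sxy - sums[i] * sums[j] / n) / (n - ddof)
        cov[i][j] = cov[j][i] = value
    return cov


# The (dogs, cats) rows from the PR description, split into two
# hypothetical partitions.
partitions = [[(1, 2), (0, 3)], [(2, 0), (1, 1)]]

stats = None
for part in partitions:
    current = partial_stats(part, ncols=2)
    stats = current if stats is None else merge(stats, current)

cov = covariance(stats)
print(cov)
```

Only the count, the column sums, and the pairwise product sums leave each partition, so the memory needed on the collecting side depends on the number of columns rather than the number of rows. The sum-of-products formula used here can lose precision when values are large relative to their spread, so a real implementation would likely prefer a more numerically stable update.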

@xinrong-meng (Contributor) commented Aug 3, 2021

Hi @LSturtew, since Koalas has been ported to Spark as the pandas API on Spark, would you like to migrate this PR to the Spark repository? Here is the ticket: https://issues.apache.org/jira/browse/SPARK-36396. Otherwise, I may do that for you next week.

4 participants