This repository has been archived by the owner on Oct 28, 2019. It is now read-only.

Add ability to read .dataset formats (deserialization) #6

Open
andrie opened this issue Aug 3, 2015 · 3 comments

Comments

@andrie
Contributor

andrie commented Aug 3, 2015

The .dataset format is used as the output of most modules in ML Studio (intermediate datasets). For example, the Split module results are in that format.

Studio currently disables the Generate Data Access Code and Open in Notebook features on those output nodes due to lack of deserialization support for that format in Python.

To access those intermediate datasets from Python code, the user needs to insert a Convert to CSV module. Note that this conversion loses some metadata, such as column type information. Pandas can infer the types most of the time, but sometimes it requires user post-processing.
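A minimal sketch of that post-processing step, assuming the converted CSV has already been downloaded. The column names and sample rows below are hypothetical and stand in for real Convert to CSV output; only standard pandas calls are used.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the output of a Convert to CSV module.
csv_text = (
    "id,zip_code,signup_date,score\n"
    "1,02139,2015-08-03,0.5\n"
    "2,10001,2015-08-04,\n"
)

# Default inference: zip_code is parsed as an integer (the leading zero is
# lost) and signup_date is left as text rather than a datetime.
inferred = pd.read_csv(io.StringIO(csv_text))

# Post-processing: state the intended column types explicitly, since the
# CSV no longer carries the type metadata of the original dataset.
restored = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"zip_code": str},       # keep codes as text
    parse_dates=["signup_date"],   # parse ISO dates into datetime64
)

print(inferred.dtypes)
print(restored.dtypes)
```

Passing `dtype` and `parse_dates` at read time avoids a second conversion pass, but the user still has to know the intended types, which is exactly the metadata the CSV conversion drops.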

@piccolbo
Contributor

piccolbo commented Aug 3, 2015

OK, but this is not in the Python version yet, so I think we could focus on Python parity first: we are behind on many things, and having a reference implementation in place is a big help. So I am saying absolutely, but at slightly lower priority.

@piccolbo
Contributor

piccolbo commented Aug 4, 2015

I didn't see a spec of the format in the material you attached to your email message. If you run into something more detailed, can you post it here? Thanks.

@bwlewis
Contributor

bwlewis commented Oct 22, 2015

The .dataset format requires .NET. A description of Dataset can be found here:
https://msdn.microsoft.com/en-us/library/azure/dn905850.aspx

The underlying object is referred to as a "Data Table", which is an instance of the .NET class "Array":

https://msdn.microsoft.com/library/system.array.aspx

On non-Windows platforms, Mono would be required to read this. It's a pain and probably not worth dealing with right now.

3 participants