What is a SentenceSet? #205

Open
tastyminerals opened this issue Aug 6, 2016 · 0 comments

tastyminerals commented Aug 6, 2016

I do not understand the following SentenceSet paragraph:

A DataSet used for language modeling. 
Takes a sequence of words stored as a tensor of word IDs and a Tensor holding the start index of the sentence of its commensurate word id (the one at the same index). 
Unlike DataSets, for memory efficiency reasons, this class does not store its data in Views. 
However, the outputs of factory methods batch, sub, and index are Batches containing input and target ClassViews.
The returned batch:inputs() are filled according to Google 1-Billion Words guidelines.

So the words are stored as two tensors: one is a "tensor of word IDs" and the other is a "tensor holding the start index of the sentence of its commensurate word id"? What does that second one mean?
I am not even asking why the words are stored as two tensors in the first place, because that is even more confusing. Can somebody explain how the words are actually stored?
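
One possible reading of the quoted paragraph, sketched in Python rather than in the library's own language: the corpus is flattened into one long sequence of word IDs, and a second sequence of the same length records, for each word, the position where its sentence begins. This is only an illustration of that reading, not the library's actual code. The names `word_ids`, `sentence_start` and `context`, the 0-based indexing, and the padding value `0` are all assumptions made for the example; the quoted paragraph does not say how contexts are actually filled.

```python
# Hypothetical illustration of the "two parallel tensors" layout.
# Nothing here is taken from the library itself.
sentences = [["the", "cat", "sat"], ["dogs", "bark"]]

vocab = {}           # word -> integer ID, assigned in order of first appearance
word_ids = []        # one entry per word in the whole corpus
sentence_start = []  # same length: index where that word's sentence begins

for sentence in sentences:
    start = len(word_ids)  # 0-based position of the sentence's first word
    for word in sentence:
        word_ids.append(vocab.setdefault(word, len(vocab) + 1))
        sentence_start.append(start)


def context(i, size):
    """Return the `size` word IDs before position i, never crossing the
    start of the sentence; missing slots are left-padded with 0
    (the padding value is an assumption of this sketch)."""
    lo = max(sentence_start[i], i - size)
    ctx = word_ids[lo:i]
    return [0] * (size - len(ctx)) + ctx


print(word_ids)        # [1, 2, 3, 4, 5]
print(sentence_start)  # [0, 0, 0, 3, 3]
print(context(4, 2))   # [0, 4]
```

Under this reading, the second sequence is what lets a fixed-size context window be cut from the flat word sequence without running into the previous sentence, and no per-sentence objects have to be stored.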

tastyminerals changed the title from "What is the DataSet?" to "What is a DataSet?" on Aug 6, 2016
tastyminerals changed the title from "What is a DataSet?" to "What is a SentenceSet?" on Aug 6, 2016