
Can we generate .vector files #34

Open
pmirla opened this issue Apr 10, 2017 · 3 comments

Comments


pmirla commented Apr 10, 2017

Hello Ben,

Since the "*.bin" output vector files are not human-readable, is there any way to generate a ".vector" file, as was possible in earlier releases of word2vec?

I am asking because ".vector" files are easier to convert to TensorFlow's ".bytes" format for visualisation in the TensorFlow Projector. The ".vector" file format matched Mikolov's original text format for storing vectors.

I want to be able to generate output vector files that are human-readable.

Thanks for the great work in developing this package.

Regards

bmschmidt (Owner) commented

Yeah, this would be generally useful, because converting between word2vec formats is desirable in a lot of cases. I tend to use gensim for this, because there isn't an R solution.

I'll try to add it in a subsequent release. In the meantime, the function below should work, although it may be slow.

```r
#' Write in word2vec text format
#'
#' @param model The wordVectors model you wish to save. (This can actually be any
#'   numeric matrix with rownames.)
#' @param filename The file to save the vectors to. I recommend ".vectors" as a suffix.
#'
#' @return Nothing; called for its side effect of writing `filename`.
#' @export
write.txt.word2vec = function(model, filename) {
  filehandle = file(filename, "wb")
  # Close the connection even if writing fails partway through.
  on.exit(close(filehandle))
  dim = dim(model)
  # Header line: "<number of words> <number of dimensions>"
  writeChar(paste0(dim[1], " ", dim[2], "\n"), filehandle, eos = NULL)
  # I just store the rownames outside the loop, here.
  names = rownames(model)
  i = 1
  silent = apply(model, 1, function(row) {
    # eos must be NULL for this to work properly, because, ridiculously,
    # 'eos=NULL' is the setting that tells R *not* to append a nul
    # terminator after each string.
    text = paste(as.character(row), collapse = " ")
    # One line per word: the word, then its space-separated values
    # (paste0 avoids a stray space before the newline).
    writeChar(paste0(names[i], " ", text, "\n"), filehandle, eos = NULL)
    # Advance the row counter in the enclosing function's environment.
    i <<- i + 1
  })
  invisible(NULL)
}
```
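Because the function above issues one `writeChar()` call per word, it can be slow on a large vocabulary. The following is a minimal sketch of a vectorised alternative that builds every line with a single `paste()` call and writes the result with one `writeLines()` call. `write_txt_word2vec_fast` is a hypothetical name, not part of wordVectors, and the sketch assumes `as.matrix()` turns the model into a plain numeric matrix whose rownames are the words.

```r
# Hypothetical helper (not part of wordVectors): writes a matrix with
# rownames in the word2vec text format using vectorised string operations.
write_txt_word2vec_fast = function(model, filename) {
  # Assumption: as.matrix() yields a plain numeric matrix with words as rownames.
  m = as.matrix(model)
  stopifnot(!is.null(rownames(m)))
  # Split the matrix into its columns; paste() then works across all rows
  # at once, joining each word with its values using single spaces.
  columns = lapply(seq_len(ncol(m)), function(j) m[, j])
  body = do.call(paste, c(list(rownames(m)), columns, list(sep = " ")))
  # Header line: "<number of words> <number of dimensions>"
  header = paste(nrow(m), ncol(m))
  writeLines(c(header, body), filename)
  invisible(NULL)
}
```

For example, `write_txt_word2vec_fast(model, "model.vectors")` writes the same layout as `write.txt.word2vec`. The trade-off is memory: all lines are held in memory as one character vector before being written, which is usually acceptable for typical vocabulary sizes but worth keeping in mind for very large models.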


vikrammirla commented Apr 10, 2017 via email


vikrammirla commented Apr 10, 2017 via email
