Can we generate .vector files #34
Comments
Yeah, this would be generally useful because "converting between word2vec formats" can be desirable in a lot of cases. I tend to use gensim for this because there isn't an R solution. I'll try to add it in a subsequent release. The block below should work for now, although it may be slow.
Thanks Ben. How would you read in a binary file for the model argument?
I tried readBin(my_bin_file.bin, character(), n = 3) and
readBin(my_bin_file.bin, integer(), n = 3).
The model that readBin returns doesn't look right.
On Mon, Apr 10, 2017 at 11:28 AM, Benjamin Schmidt wrote:
#' Write in word2vec text format
#'
#' @param model The wordVectors model you wish to save. (This can actually be any matrix with rownames.)
#' @param filename The file to save the vectors to. I recommend ".vectors" as a suffix.
#'
#' @return Nothing
#' @export
write.txt.word2vec = function(model,filename) {
  filehandle = file(filename,"wb")
  dim = dim(model)
  # Header line: number of words, a space, then the vector length.
  writeChar(as.character(dim[1]),filehandle,eos=NULL)
  writeChar(" ",filehandle,eos=NULL)
  writeChar(as.character(dim[2]),filehandle,eos=NULL)
  writeChar("\n",filehandle,eos=NULL)
  # I just store the rownames outside the loop, here.
  names = rownames(model)
  # Counter for the current row; the function passed to apply() advances it.
  i = 1
  silent = apply(model,1,function(row) {
    # EOS must be null for this to work properly, because, ridiculously,
    # 'eos=NULL' is the command that tells R *not* to insert a null string
    # after a character.
    writeChar(paste0(names[i]," "),filehandle,eos=NULL)
    text = paste(as.character(row),collapse=" ")
    writeChar(paste(text,"\n"),filehandle,eos=NULL)
    i <<- i+1
  })
  close(filehandle)
}
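To check a file written in this layout, or to load one back into R, something like the sketch below may help. `read_txt_word2vec` is a hypothetical helper name, not part of wordVectors, and it assumes that no word contains a space.

```r
# Hypothetical helper (not part of wordVectors): read a word2vec text-format
# file back into a numeric matrix with the words as rownames.
read_txt_word2vec <- function(filename) {
  lines <- readLines(filename)
  # First line is the header: "<number of words> <vector length>"
  header <- as.integer(strsplit(trimws(lines[1]), " +")[[1]])
  body <- strsplit(trimws(lines[-1]), " +")
  words <- vapply(body, function(fields) fields[1], character(1))
  values <- lapply(body, function(fields) as.numeric(fields[-1]))
  mat <- do.call(rbind, values)
  rownames(mat) <- words
  # Fail loudly if the body does not match the header.
  stopifnot(nrow(mat) == header[1], ncol(mat) == header[2])
  mat
}

# A tiny hand-written file in the same layout, including the trailing space
# that the writer above leaves before each newline.
example_file <- tempfile(fileext = ".vectors")
writeLines(c("2 3", "cat 0.1 0.2 0.3 ", "dog 0.4 0.5 0.6 "), example_file)
vectors <- read_txt_word2vec(example_file)
print(dim(vectors))
```

Reading everything with `readLines` keeps the sketch short; for a very large model it would probably be worth reading in chunks instead.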
I figured out the solution. Copying it here for others' benefit.
bin_model <- wordVectors::read.binary.vectors("./DATA/googleNext.bin")
write.txt.word2vec(bin_model, "text_model.txt")
Thanks Ben
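For anyone wondering why a bare `readBin(..., character())` gives odd results: as far as I understand the word2vec binary layout, the file starts with a text header (word count, space, vector length, newline), and each row is then the word, a space, and that many 4-byte floats. `readBin` with `character()` looks for null-terminated strings, which this layout does not contain. The sketch below writes a toy file in that assumed layout and reads it back with base R only. `read_until` is a hypothetical helper, little-endian floats are an assumption, and for real models `read.binary.vectors` remains the simpler route.

```r
# Write a toy file in the assumed word2vec binary layout.
toy_file <- tempfile(fileext = ".bin")
con <- file(toy_file, "wb")
writeChar("2 3\n", con, eos = NULL)
for (word in c("cat", "dog")) {
  writeChar(paste0(word, " "), con, eos = NULL)
  writeBin(c(1, 2, 3), con, size = 4, endian = "little")  # single-precision
  writeChar("\n", con, eos = NULL)
}
close(con)

# Hypothetical helper: read single bytes until the stop byte (or end of file).
read_until <- function(con, stop_byte) {
  bytes <- raw(0)
  repeat {
    b <- readBin(con, "raw", n = 1)
    if (length(b) == 0 || b == stop_byte) break
    bytes <- c(bytes, b)
  }
  rawToChar(bytes)
}

con <- file(toy_file, "rb")
space <- charToRaw(" ")
newline <- charToRaw("\n")
n_words <- as.integer(read_until(con, space))
n_dims <- as.integer(read_until(con, newline))
toy_model <- matrix(0, nrow = n_words, ncol = n_dims)
toy_words <- character(n_words)
for (i in seq_len(n_words)) {
  # Drop the newline that ends the previous row, if there is one.
  toy_words[i] <- sub("^\n", "", read_until(con, space))
  toy_model[i, ] <- readBin(con, "numeric", n = n_dims, size = 4,
                            endian = "little")
}
close(con)
rownames(toy_model) <- toy_words
print(toy_model)
```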
On Mon, Apr 10, 2017 at 2:03 PM, Pavan Vikram Mirla wrote:
> Thanks Ben. How would you read in a binary file for the model argument?
> I tried readBin(my_bin_file.bin, character(), n = 3) and
> readBin(my_bin_file.bin, integer(), n = 3).
> The model that readBin returns doesn't look right.
Hello Ben,
The "*.bin" output vector files are not human-readable. Is there any way to generate a ".vectors" file, as was possible in earlier releases of word2vec?
I am asking because ".vector" files are easier to convert to TensorFlow's ".bytes" format to visualise them in the TensorFlow projector. The ".vector" file format matches Mikolov's original format for capturing vectors.
I want to be able to generate output vector files that are readable.
Thanks for great work in developing this package.
Regards
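If the end goal is the TensorFlow projector, a plain tab-separated export may already be enough. This is a minimal sketch, assuming the projector accepts a TSV file of vectors plus a one-column TSV file of labels; the toy matrix and the file names are made up for illustration.

```r
# Toy stand-in for a model: any numeric matrix with words as rownames.
toy_model <- matrix(c(0.1, 0.2, 0.3,
                      0.4, 0.5, 0.6),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("cat", "dog"), NULL))

vectors_file <- file.path(tempdir(), "vectors.tsv")
metadata_file <- file.path(tempdir(), "metadata.tsv")

# One row per word, tab-separated values, no header and no row labels.
write.table(toy_model, vectors_file, sep = "\t",
            row.names = FALSE, col.names = FALSE)
# One label per line, in the same row order as the vectors.
writeLines(rownames(toy_model), metadata_file)

print(readLines(vectors_file))
```

Keeping the labels in a separate file means the vectors file stays purely numeric, and the two files line up by row order.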