Even lazier idea for a MS Office and LibreOffice sniffing function. #237

Closed
1 of 3 tasks
bokov opened this issue Oct 4, 2019 · 3 comments

Comments

@bokov
Contributor

bokov commented Oct 4, 2019

Please specify whether your issue is about:

  • a possible bug
  • a question about package functionality
  • a suggested code or documentation change, improvement to the code, or feature request

I am often given MS Office files with missing or incorrect extensions (e.g. a novice user tried to 'convert' them to csv by renaming them). These files are really zip archives, so they share a file signature with LibreOffice files and who knows what else.

Here is a function that looks inside the archive for paths that are specific to MS Office and LibreOffice/OpenOffice documents.

As with #236, my question is: would this function be useful to contribute to rio, for example to augment the current extension-only dispatch of the readxl and readODS methods?

Put your code here:

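# Detect zip-based office documents by looking for marker paths in the archive.
# Returns TRUE/FALSE when tf=TRUE, otherwise the names of the matched suites.
isfilezipdoc <- function(filename,
                         docpaths = c(MSO = '[Content_Types].xml',
                                      OO = 'META-INF/manifest.xml'),
                         tf = TRUE) {
    # List the archive members without extracting anything
    matchedpaths <- docpaths %in% unzip(filename, list = TRUE)[, 1]
    if (tf) return(any(matchedpaths))
    return(names(docpaths[matchedpaths]))
}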
@bokov bokov changed the title from "Even lazier idea for a MS Office and LibreOffice sniffing file." to "Even lazier idea for a MS Office and LibreOffice sniffing function." Oct 4, 2019
@bokov
Contributor Author

bokov commented Oct 4, 2019

Note: as written, it cannot distinguish .xlsx from .docx or .pptx, nor among the various LibreOffice equivalents, though it can tell MS Office files from LibreOffice ones. This should be okay, though, because rio only supports the spreadsheet file types from those respective office suites, right?
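
For what it's worth, here is a rough sketch of how that finer distinction might be made if it were ever needed. It is not part of the proposed function. The helper name `zipdoctype` and the marker paths are assumptions for illustration: `xl/workbook.xml`, `word/document.xml`, and `ppt/presentation.xml` are the conventional locations of the main part in MS Office files rather than a guarantee, and the sketch relies on OpenDocument archives carrying a top-level `mimetype` member that names the document type.

```r
# Hypothetical helper: guess the document type of a zip-based office file.
# The marker paths are conventional main-part locations, not a guarantee.
zipdoctype <- function(filename,
                       markers = c(xlsx = 'xl/workbook.xml',
                                   docx = 'word/document.xml',
                                   pptx = 'ppt/presentation.xml')) {
    # List archive members; anything unzip() cannot read counts as "not a zip"
    members <- tryCatch(unzip(filename, list = TRUE)[, 1],
                        error = function(e) character(0),
                        warning = function(w) character(0))
    hit <- names(markers)[markers %in% members]
    if (length(hit) > 0) return(hit[1])
    # OpenDocument files keep their media type in a top-level 'mimetype' member
    if ('mimetype' %in% members) {
        con <- unz(filename, 'mimetype')
        on.exit(close(con))
        return(readLines(con, n = 1, warn = FALSE))
    }
    # Not a recognised office archive
    NA_character_
}
```

Under those assumptions, a spreadsheet renamed to `.csv` would still come back as `xlsx` (or as the OpenDocument spreadsheet media type), while a genuine text file would come back as `NA`.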

@leeper
Contributor

leeper commented Oct 19, 2019

I'm not sure it's that useful because these are pretty unambiguous file extensions?

@leeper
Contributor

leeper commented Dec 20, 2019

Gonna close this.

@leeper leeper closed this as completed Dec 20, 2019