Mathematica Notebook format #1
Comments
I'm going to try this one, if that's OK.
Sam - absolutely! Have at it. I have a large body of files to test your signature on.
I have created a signature file for a notebook file. The problem I'm facing now (also described above) is that there are multiple versions of Wolfram Notebook but only one fmt. Should I create a signature file for every version I can find, or do I need to handle this differently? Thanks for any advice. Sam
Sam, I think for Mathematica a single signature for the variations would be acceptable. The plain-text nature of the format mitigates future archiving issues.
Thanks for your response. I think I'm going to try to make different signatures then, because there are no tools (that I'm aware of) that say which version a file is, so it may be useful to know this. But if you (or anybody else) have objections to this, I'll create one for all versions. Sam
I've been thinking further on this. I think I will make a signature file for every version of Notebook I have, plus a fallback for Wolfram Notebooks in general, so that all Notebooks will be identified. Most will get a versioned ID and some will get the general ID. That way we don't need to deprecate the current Wolfram Notebook PUID, and more specific PUIDs are assigned where they are available. How does this sound?
I like the idea of a catch-all signature (the existing one) plus individual ones. There must be some format changes from the earlier versions to the later versions (after Wolfram bought Mathematica). I can test your signatures when they are ready.
Hi @gleporeNARA, I have a test signature for all the Mathematica notebook files I have and that you included in this issue. The all.zip file contains the signature files for 6.0, 7.0, 8.0, 9.0, 10.0, 10.1, 10.2, 10.3, 10.4, 11.0, 11.1, 11.2, 11.3 and 12.0, and a catch-all for unknown versions. cc @thorsted
The version-specific signatures look good; they all match up with my test files (except for some of the 4.2 versions). The generic signature is probably too brief at 2 bytes to be specific enough. It matched hundreds of non-Mathematica files in my test collections. See attached for a small sample. It mostly looks like Pascal code, but there are other formats that come up positive as well. I would suggest the generic signature should also include the word Mathematica somewhere in the first 200 or so characters, and perhaps a few more asterisks. The others that aren't matching a specific signature all have the string "Mathematica-Compatible Notebook" in addition to the '(*' string. Perhaps a separate signature for that would be useful. There's obviously some program out there that outputs its Mathematica files with that string. Thanks for working on this! False Positives
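To illustrate the suggestion above, here is a minimal Python sketch of a stricter generic check: the file must start with `(*` and also contain the word `Mathematica` within roughly the first 200 bytes. The function name, the `window` parameter and the sample inputs are made up for illustration; this is not the PRONOM signature itself, only a way to try the idea on test files.

```python
def looks_like_notebook(data: bytes, window: int = 200) -> bool:
    """Hypothetical stricter catch-all check for Mathematica notebooks.

    Assumption: the 2-byte '(*' test alone is too weak (it also matches
    Pascal-style comments), so we additionally require the word
    'Mathematica' somewhere in the first `window` bytes.
    """
    head = data[:window]
    return head.startswith(b"(*") and b"Mathematica" in head


if __name__ == "__main__":
    # Illustrative inputs only, not taken from the test collections.
    notebook = b"(* CreatedBy='Mathematica 4.2' *)\n"
    pascal = b"(* Pascal unit *)\nprogram Demo;\nbegin\nend.\n"
    print(looks_like_notebook(notebook))
    print(looks_like_notebook(pascal))
```

The match is case-sensitive on purpose, so a header that only contains the lowercase MIME type `application/mathematica` within the window would not pass; whether that is desirable would need checking against real files.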
Hi @gleporeNARA, can you also provide the examples of Mathematica 4.2 files that fail the match? Thanks!
The four files with names beginning with Math42 in the original zip file I uploaded. It's weird, because it looks like they should match the hex values 4372656174656442793d274d617468656d617469636120342e3227. Can you verify?
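For anyone wanting to check what that hex sequence stands for, a short Python snippet decodes it to ASCII. Only the hex string comes from the comment above; the variable names are arbitrary.

```python
# Hex sequence quoted in the comment above.
hex_sig = "4372656174656442793d274d617468656d617469636120342e3227"

# Convert the hex digits to bytes, then interpret them as ASCII text.
decoded = bytes.fromhex(hex_sig).decode("ascii")
print(decoded)  # CreatedBy='Mathematica 4.2'
```

So the sequence is the literal text `CreatedBy='Mathematica 4.2'`, which is the string the Math42 files would be expected to contain.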
Fixed it, it was just a copy-paste error.
I'll look into the false positives.
Regarding the false positives, I think I found a solution. If the end of the file also matches: Now I'm trying to create a signature for that, but it is more challenging than I thought :-)
Looking for additional real-world examples from earlier versions of Mathematica (say, versions 1 and 2, if they existed). Also wondering about the resource cost of performing an identification with a signature that has to search for several long text strings.
Format name: Mathematica Notebook files
Version number(s): all?
PRONOM fmt/201 - No current signatures on file - http://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=926&strPageToDisplay=summary
Extensions: nb
mime-type: text/plain; charset=us-ascii
Description: "Wolfram Mathematica (usually termed Mathematica) is a modern technical computing system spanning most areas of technical computing — including neural networks, machine learning, image processing, geometry, data science, visualizations, and others. The system is used in many technical, scientific, engineering, mathematical, and computing fields. It was conceived by Stephen Wolfram and is developed by Wolfram Research of Champaign, Illinois. The Wolfram Language is the programming language used in Mathematica."
Format type: Text (Structured)
Vendor: Wolfram Research
The signature from the 'file' command is:
Below is a list of common strings that appear in these files.
Content-type: application/vnd.wolfram.mathematica
Content-type: application/mathematica
Wolfram Notebook File
Mathematica-Compatible Notebook
CreatedBy='Mathematica x.x'
http://www.wolfram.com/nb
mathematica.zip
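As a rough way to sort test files by version, the `CreatedBy='Mathematica x.x'` string listed above can be pulled out with a regular expression. This is a sketch only: the function name, the 4096-byte search window and the sample header are assumptions for illustration, and files without a `CreatedBy` line simply return `None`.

```python
import re
from typing import Optional

# Matches the CreatedBy string listed above, capturing the x.x version part.
CREATED_BY = re.compile(rb"CreatedBy='Mathematica (\d+\.\d+)'")


def notebook_version(data: bytes, window: int = 4096) -> Optional[str]:
    """Return the version from a CreatedBy header, or None if absent.

    Assumption: the header sits near the start of the file, so only the
    first `window` bytes are searched.
    """
    match = CREATED_BY.search(data[:window])
    return match.group(1).decode("ascii") if match else None


if __name__ == "__main__":
    # Illustrative header, not copied from a real notebook.
    sample = b"(* http://www.wolfram.com/nb *)\n(* CreatedBy='Mathematica 10.4' *)\n"
    print(notebook_version(sample))
    print(notebook_version(b"(* Mathematica-Compatible Notebook *)\n"))
```

Run over a directory of `.nb` files, something like this could group samples by version before testing the version-specific signatures against them.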