WordPerfect 4.2 (fmt/949) signature + sample file #10
Comments
Johan, I can confirm the pattern in samples saved from later versions of WordPerfect as well, but I worry it does not represent all files from that time period. In the many samples I have found on install disks and in donations, there is no discernible pattern to the WP 4 format. It is simply ASCII with format codes. Here is a link to the WP 4 File Format Specification:
Hi Tyler,
Your response got me curious, so I located and installed a copy of WordPerfect 4.2 for DOS, and did some tests.
First I created a document with some text, without applying any formatting, and saved that to file. Here's what it looks like in WP 4.2 (with the reveal codes window at the bottom):
I then opened it in a hex editor:
Which is indeed pure ASCII.
Then I went back to WordPerfect, and added a font definition (using the font dialog that opens after pressing Ctrl F8). I then saved the result as a separate file. Here again what this looks like in WordPerfect (note the Font Change code in the reveal window):
And here's that file in a hex editor:
The file is identical to the earlier one, except for the addition of these 6 bytes at the start:
In the WP 4 spec you linked to you can see this corresponds to a "set pitch and/or font" instruction (the number 313 is the octal representation of
I imagine this may be a pretty common pattern, but I agree this isn't suitable as a signature for identifying WordPerfect 4 files. So yes, you're completely right! I've uploaded both WordPerfect 4.2 test files here:
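The comparison described above (two files that differ only by a few leading bytes) can be reproduced with a short script. This is a minimal sketch, not the procedure actually used in the thread: the helper name `added_prefix` is made up for illustration, and the demo bytes are placeholders rather than the contents of the real test files. It also checks that octal 313 is the byte value 0xCB.

```python
from typing import Optional


def added_prefix(plain: bytes, formatted: bytes) -> Optional[bytes]:
    """Return the bytes prepended to `plain` to obtain `formatted`.

    Returns None if `formatted` is not simply `plain` with extra leading bytes.
    """
    if len(formatted) >= len(plain) and formatted.endswith(plain):
        return formatted[: len(formatted) - len(plain)]
    return None


# The spec writes the instruction code in octal; 313 octal is 0xCB (decimal 203).
FONT_CODE = int("313", 8)
assert FONT_CODE == 0xCB == 203

# Placeholder data only: the five zero bytes stand in for the rest of the
# 6-byte instruction, whose real values are not reproduced here.
plain = b"Some unformatted text"
formatted = bytes([FONT_CODE]) + bytes(5) + plain

prefix = added_prefix(plain, formatted)
print(len(prefix), prefix.hex(" "))
```

With the two uploaded test files, `plain` and `formatted` would instead be read from disk in binary mode (`open(path, "rb").read()`).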
I suppose the font instruction is a common code, especially since later GUI versions of WordPerfect probably set it automatically. But we probably won't see too many files saved down from later versions in the wild. A pure ASCII file is no different from regular plain text, so a txt identification is probably appropriate. If we know all the possible formatting codes, we should be able to identify a WP4 file with the right tool. Thanks for digging into the format; running WP 4 for DOS is no easy feat these days. I still need a keyboard cheat sheet!
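As a rough illustration of that idea, the sketch below classifies a byte string by looking only at bytes outside the ASCII range. It is a deliberately simplified heuristic under stated assumptions: the function name `classify` and the `known_codes` set are hypothetical, the set here contains only the one code mentioned in this thread (0xCB), and a real tool would need the full code list from the spec and would have to parse multi-byte instructions, whose parameter bytes this sketch does not handle.

```python
def classify(data: bytes, known_codes: frozenset) -> str:
    """Very rough guess at what `data` is, based on its non-ASCII bytes.

    Assumption (not from the spec): every byte >= 0x80 in a WP4 file is a
    formatting code listed in `known_codes`. Parameter bytes of multi-byte
    instructions are ignored, so this will misjudge some real files.
    """
    high_bytes = {b for b in data if b >= 0x80}
    if not high_bytes:
        # Nothing distinguishes this from ordinary plain text.
        return "plain text"
    if high_bytes <= known_codes:
        return "possible WP4"
    return "unknown"


# Hypothetical, incomplete code list: only the font code seen in this thread.
KNOWN_CODES = frozenset({0xCB})

print(classify(b"Just some text\r\n", KNOWN_CODES))
print(classify(b"\xcb\x00\x00\x00\x00\xcbJust some text", KNOWN_CODES))
print(classify(b"\xff\xfeJust some text", KNOWN_CODES))
```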
@thorsted Yes, adding the extension might be useful, although I'm not sure how commonly any of these extensions were used at the time. For example, when you save a document in WP 4.2 it doesn't enforce or even hint at a particular extension.
@thorsted I don't think any WP user bothered to use consistent extensions until WP for Windows came along, and users were more or less obliged to use them. I started using WPDOS in 1985 and have never used a ".wp" or ".wpd" extension for any file that it creates, unless I intend to open it easily in WP for Windows.
The format database of the TrID tool has a signature for WordPerfect 4.2 files:
https://file-extension.net/seeker/file_extension_wp
Here's the signature:
Signature author is Philip Storry. License appears to be AGPL, based on what I found here.
A few years ago I submitted a derived version of the sig to Apache Tika, see commit here.
This adds a 0xCB byte at offset 5. I don't remember why I did that (possibly to avoid a collision with another format?) or what it was based on precisely, so proceed with caution!
Test file here (I created this with WordPerfect 6.1 for Windows, running on a VM with Windows 3.11):
https://github.com/bitsgalore/tika/raw/8c7c760319b85cfa87c1a8dc3f7cf64278df8710/tika-parsers/src/test/resources/test-documents/testWordPerfect_42.doc
Note that I'm not 100% sure that WordPerfect 4.0 and 4.1 (which are both under the same PUID) have the same signature!
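For anyone who wants to try the pattern against their own samples, here is a minimal sketch of this kind of offset-based check. It is not the TrID or Tika implementation: the function name `matches_signature` is made up, and the leading bytes of the signature are passed in as a parameter because they are not reproduced here. The only value taken from this issue is the 0xCB byte at offset 5. The demo uses a single 0xCB as the leading byte, which is an assumption.

```python
def matches_signature(
    data: bytes,
    head: bytes,
    extra_offset: int = 5,
    extra_byte: int = 0xCB,
) -> bool:
    """Check that `data` starts with `head` and has `extra_byte` at `extra_offset`.

    `head` is whatever the base signature requires at offset 0; it is a
    parameter because the exact bytes are an assumption in this sketch.
    """
    if len(data) <= extra_offset:
        return False  # too short to contain the byte at the extra offset
    return data.startswith(head) and data[extra_offset] == extra_byte


# Placeholder sample, not a real file: 0xCB at offsets 0 and 5, zeros between.
sample = b"\xcb\x00\x00\x00\x00\xcbSome text"
print(matches_signature(sample, head=b"\xcb"))

# A document without any such leading code is pure ASCII and does not match.
print(matches_signature(b"Some text without formatting", head=b"\xcb"))
```

To check a real file, read the first few bytes in binary mode, for example `open(path, "rb").read(16)`, and pass them as `data`.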