WordPerfect 4.2 (fmt/949) signature + sample file #10

Open
bitsgalore opened this issue Oct 5, 2020 · 7 comments

Comments

@bitsgalore

bitsgalore commented Oct 5, 2020

The format database of the TrID tool has a signature for WordPerfect 4.2 files:

https://file-extension.net/seeker/file_extension_wp

Here's the signature:

0xCB 0A 01

Signature author is Philip Storry. License appears to be AGPL, based on what I found here.

A few years ago I submitted a derived version of the sig to Apache Tika, see commit here.

The derived version adds a 0xCB byte at offset 5. I don't remember why I did that (possibly to avoid a collision with another format?) or what precisely it was based on, so proceed with caution!

Test file here (I created this with WordPerfect 6.1 for Windows, running on a VM with Windows 3.11):
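
As a rough illustration, the two signature variants described above could be checked like this in Python. This is only a sketch: the function name and the `strict` flag are made up for the example and are not part of TrID or Tika, and it assumes the three TrID bytes sit at offset 0.

```python
# Bytes quoted in the TrID signature (assumed here to start at offset 0).
TRID_MAGIC = bytes([0xCB, 0x0A, 0x01])


def matches_wp42_signature(data: bytes, strict: bool = False) -> bool:
    """Return True if `data` starts with CB 0A 01.

    With strict=True, additionally require a 0xCB byte at offset 5,
    mirroring the derived variant mentioned above.
    """
    if not data.startswith(TRID_MAGIC):
        return False
    if strict:
        return len(data) > 5 and data[5] == 0xCB
    return True
```

Neither variant says anything about files that do not begin with these bytes, so a negative result should not be read as "not WordPerfect".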

https://github.com/bitsgalore/tika/raw/8c7c760319b85cfa87c1a8dc3f7cf64278df8710/tika-parsers/src/test/resources/test-documents/testWordPerfect_42.doc

Note that I'm not 100% sure that WordPerfect 4.0 and 4.1 (which are both under the same PUID) have the same signature!

@thorsted
Contributor

thorsted commented Oct 5, 2020

Johan,

I can confirm the pattern from samples saved from later versions of WordPerfect as well, but I worry it does not represent all files from that time period.

From the many samples I have found from install disks and donations, there is no discernible pattern to the WP 4 format. It is simply ASCII with format codes.

Here is a link to the WP 4 File Format Specification:
https://archive.org/download/wordperfectsdkperfectfit1994/WordPerfect_SDK_PerfectFit1994.iso/51PCSDK%2FWP42FF.TXT

@bitsgalore
Author

bitsgalore commented Oct 6, 2020

Hi Tyler,

Your response got me curious, so I located and installed a copy of WordPerfect 4.2 for DOS, and did some tests. First I created a document with some text, without applying any formatting, and saved that to file. Here's what it looks like in WP 4.2 (with the reveal codes window at the bottom):

[Image: wp42]

I then opened it in a hex editor:

[Image: wp42-hex]

The file is indeed pure ASCII. Then I went back to WordPerfect and added a font definition (using the font dialog that opens after pressing Ctrl F8), and saved the result as a separate file. Here again is what this looks like in WordPerfect (note the Font Change code in the reveal window):

[Image: wp42-fc]

And here's that file in a hex editor:

[Image: wp42-fc-hex]

The file is identical to the earlier one, except for the addition of these 6 bytes at the start:

CB 0A 01 0A 01 CB

In the WP 4 spec you linked to, you can see that this corresponds to a "set pitch and/or font" instruction (the number 313 is the octal representation of 0xCB):

6     313 cb   Set pitch and/or font
          <313><old pitch><old font><new pitch><new font><313>
          If pitch is negative, then it is proportional.

I imagine this may be a pretty common pattern, but I agree this isn't suitable as a signature for identifying WordPerfect 4 files. So yes, you're completely right!
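
To make the layout from the spec excerpt concrete, here is a minimal Python sketch that decodes the six bytes. The names `FontChange` and `parse_font_change` are hypothetical, and reading the pitch as a signed byte is an assumption based only on the "if pitch is negative" remark in the spec.

```python
import struct
from typing import NamedTuple

FONT_CHANGE = 0xCB  # octal 313 in the spec


class FontChange(NamedTuple):
    old_pitch: int
    old_font: int
    new_pitch: int
    new_font: int

    @property
    def proportional(self) -> bool:
        # Per the spec excerpt: a negative pitch means proportional spacing.
        return self.new_pitch < 0


def parse_font_change(code: bytes) -> FontChange:
    """Decode <313><old pitch><old font><new pitch><new font><313>."""
    if len(code) != 6 or code[0] != FONT_CHANGE or code[5] != FONT_CHANGE:
        raise ValueError("not a 6-byte code framed by 0xCB")
    # Assumption: pitch is a signed byte, font an unsigned byte.
    return FontChange(*struct.unpack("<bBbB", code[1:5]))


print(oct(FONT_CHANGE))                                  # 0o313
print(parse_font_change(bytes.fromhex("CB0A010A01CB")))
# FontChange(old_pitch=10, old_font=1, new_pitch=10, new_font=1)
```

For the bytes found in the test file, both the old and the new setting decode to pitch 10 and font 1.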

I've uploaded both WordPerfect 4.2 test files here:

wp4-test.zip

@thorsted
Contributor

thorsted commented Oct 6, 2020

I suppose the font instruction is a common code, especially since later GUI versions of WordPerfect probably set it automatically. But we probably won't see too many files saved down from later versions in the wild.

A pure ASCII file is no different from regular plain text, so a txt identification is probably appropriate.

If we know all the possible formatting codes, we should be able to identify a WP4 file with the right tool.
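
A very rough Python sketch of what such a tool might start from is below. It only knows the single "set pitch and/or font" code discussed in this thread; a real identifier would need the full code table from the spec, and all function names here are invented for the example.

```python
from typing import List

FONT_CHANGE = 0xCB  # the only formatting code this sketch knows about


def is_pure_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


def find_font_changes(data: bytes) -> List[int]:
    """Offsets of 6-byte sequences framed by 0xCB at both ends."""
    offsets = []
    i = 0
    while i + 5 < len(data):
        if data[i] == FONT_CHANGE and data[i + 5] == FONT_CHANGE:
            offsets.append(i)
            i += 6  # skip past the whole code
        else:
            i += 1
    return offsets


def guess_format(data: bytes) -> str:
    """Crude three-way guess; not a reliable identification."""
    if is_pure_ascii(data):
        return "plain text (cannot be told apart from WP 4)"
    if find_font_changes(data):
        return "possibly WordPerfect 4.x"
    return "unknown"
```

A well-formed font code is weak evidence at best, since other binary formats can contain the same byte pattern by chance.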

Thanks for digging into the format. Running WP 4 DOS is no easy feat these days; I still need a keyboard cheat sheet!

@thorsted
Contributor

thorsted commented Oct 6, 2020

Would it be good to add the extension .WP to fmt/949?
[Image: WordPerfect6 1-SaveAs]

@emendelson

WP 4.2 keyboard cheat sheet (F3, F3):

[Image: WP42]

@bitsgalore
Author

@thorsted Yes, adding the extension might be useful, although I'm not sure how commonly any of these extensions were used at the time. For example, when you save a document in WP 4.2 it doesn't enforce or even hint at a particular extension.

@emendelson

emendelson commented Oct 9, 2020

@thorsted I don't think any WP user bothered to use consistent extensions until WP for Windows came along, and users were more or less obliged to use them. I started using WPDOS in 1985 and have never used a ".wp" or ".wpd" extension for any file that it creates, unless I intend to open it easily in WP for Windows.
