-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
202 lines (140 loc) · 7.94 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
= read_filepro for Python =
Python source for extracting map, key, and data records from an ancient filePro
database. Reads the raw files so a copy of filePro is not required.
Copyright Jim Storch, 2010.
Released under the Apache 2.0 LICENSE
Contact info: jimstorch then that amper thingy symbol then gmail and then the
TLD for commercial entities.
== Background ==
FIRST: Copy your data and only work with a disposable copy!
Recently, I was asked to extract the data from over four hundred filePro
databases created by an office over a nine year period. To hopefully spare
someone else this hassle, I'm posting my findings and code. I copied 544mb in
filePro stuff and ended up with 109mb in .CSV files. They were using filePro
version 5.0.09, however, I did not use filePro itself for any step of this
conversion.
I'm capitalizing for discussion only. The code expects all filenames to be in
lowercase. My coding/conversion platform was Linux using Python 2.6. I did not
have access to the filePro application itself.
Filepro is not the same thing as F,I,L,E,M,A,K,E,R, ,P,R,O. I'll try not to
waylay googlers.
== filePro ==
Filepro stores data in three files; MAP, DATA, and KEY.
There are other files for indexing, screens, and whatnot. We don't care, our
goal is get the data out.
=== Map Files ===
You could call this the schema. The first line holds six segments separated by
colons. Here they are;
1. The string literal 'map'. If it starts with 'Alien:' then it's some
kind of external data reference (??). I had two and ignored them.
2. The size in bytes of one record in the KEY file. If the record
represents five fields that are 10 bytes wide, this would be 50.
Note that filepro tacks on a 20 byte header to each record in the
KEY file, so the real KEY record size would be 70.
3. The size in bytes of one record in the DATA file. No header. May be
zero if all columns are kept in the KEY file.
4. Number of fields that are in the KEY file. All fields beyond this
count are kept in the DATA file.
5. Eight byte checksum for the encoded password. We don't care.
6. The encoded password. Again, we don't care. It's only used by filePro
to permit changes. All the actual data is in cleartext.
The second through last lines contain details of each column in the database.
Again, columns are usually broken across the KEY and DATA files.
Each line contains three segments separated by colons:
1. The name of the fiels. Sometimes empty, sometimes has weird
characters.
2. The width in bytes of the field. Sometimes empty for dummy or
placeholder fields. Not really sure why. I ignored all fields with
a blank width.
3. The edit type. I think this is used by filePro when validating a
field value. Seemed ok to ignore since everything was stored as
non-terminated ASCII sequences, even numbers.
Ref: https://www.fptech.com/Products/Docs/FPFormat/ftmap.shtml
=== Key Files ===
The KEY file is a flat file. It should always be record_count * record_size
long. The record size is the width of all the key fields plus a 20 byte
'deleted' and 'used' headers.
The only byte in the header I care about is at offset 0:
0x00 = this is a deleted record
0x01 = this is an active record
The database column in the KEY file are apparently those that filePro has built
indexes for. All other columns are stored in the DATA file. It is possible
for all columns to be in KEY and have a DATA file 0 bytes long.
Filepro does not shrink the file as you delete records. It maintains a
chain of deleted records and uses those as slots for saving new ones.
Ref: https://www.fptech.com/Products/Docs/FPFormat/ftkey.shtml
=== Data Files ===
As stated above, the DATA files seems to be "other half" of the fields outlined
in the MAP file -- that that are not indexed(?). It's a flatfile that seems
to match the KEY file record for record (in matching order) but does not use a
record header so it's length is equal to ((sum of field widths) * number of
records)).
If all columns are indexed, then the map file is empty because all columns are
in KEY.
== DOS/UNIX Line Endings ==
WARNING: This is not a step. If you don't understand what I'm talking about,
don't do this.
I was lucky enough to find the FTP daemon running on the SCO box I needed to
grab data from. I copied the entire filePro directory to a Windows PC and then
burned it to CD. Sadly, I forgot to set the mode to binary and most of my KEY
files got messed up because Filezilla was trying to convert line endings from
UNIX to DOS/Windows style.
It took me a while to realize because I was also trying to figure out how
filePro stored its data. Some of the files were evenly divisible by record
size and some weren't.
I was able to fix some using the 'flip' utility:
$ sudo apt-get install flip
$ flip -u -b -v appl/filepro/A_FOLDER/key
If you have a bunch, you can batch convert them with:
$ cd appl
$ find . -name key -exec flip -u -b -v {} \;
My success rate was about around 75%. We ended up re-FTP'ing the files in
binary mode and all worked fine.
== The Code ==
First, a couple disclaimers:
* Copy your data and only work with a disposable copy!
* Do not run filePro on the database while trying to read it from Python.
Which wont happen because you're working on a disposable copy, right?
* It only reads from filePro databases, it does not write to them.
* It slurps the entire database into memory. It may choke on giant sets.
OTOH, I don't expect there are a lot of giant filePro sets.
* The folder you pass must contain a 'map', 'key', and 'data' file -- with
lowercase names. It will bomb on an empty folder.
* The MAP file cannot be an Alien pointer (for reading external ASCII files?).
* This code is not highly tested. I worked on it until I got all my files
converted.
=== The Source Files ===
read_filepro/fpmap.py - the FPMap class used to open MAP files.
read_filepro/fpkey.py - the FPKey class used to open KEY files.
read_filepro/fpdata.py - the FPData class used to open DATA files.
read_filepro/fpdatabase - the FPDatabase class that wraps the three previous
classes to open an entire filePro database.
Using FPDatabase is pretty simple. Just pass it one argument, the path of the
directory containing a filePro database:
from read_filepro import FPDatabase
db = FPDatabase('/tmp/appl/filepro/A_DATABASE')
=== FPDatabase Methods ===
* db.get_field_count() - returns number of fields in the table
* db.get_field_names() - returns the names of the fields, in column order
* db.get_field_types() - guesses the value type, either 'string' or 'numeric'
* db.get_total_record_count() - returns number of records in table, including
deleted
* db.get_active_record_count() - returns number of active/non-deleted records
* db.get_deleted_record_count() - returns number of deleted records
The following methods return a Python list object containing records. Each
record is a Python list object containing strings for every column value.
* db.get_all_records() - returns a list of all records; active + deleted.
* db.get_active_records() - returns a list of non-deleted records
* db.get_deleted_records() - returns a list of deleted records.
* db.is_deleted(index) - returns true if index value is a deleted record.
* db.get_record(index) - returns the record at index as a Python list.
* db.get_record_dict(index) - returns the record at index as a Python
dictionary where key = field name and value = column value.
=== Batch Convert ===
I used the script 'batch_convert.py' to mass convert the filePro databases
into Comma Separated Value (CSV) files. You'll need to edit it and set
values for APPL_PATH and CSV_PATH . I had to delete a couple source folders
with 'Alien' map files and one empty directory (working from disposable
copies of course). It adds a timestamp to the filename, so running it over
and over produces additional files.
Good luck!