Commit

version bump 0.8.4: formula parsing
- BIFF 2-12 formula parsing
- more content type coverage
- unified `.f` form: A1-style string
- `.F` field for array formulae
- formula output groups array formulae
- bin script -A --arrays output JS row objects
- whitespace robustness in inline string xml
- UTF-8 parsing in rich text runs (fixes SheetJS#505 h/t @fuchsc)
- bold/italic/underline accept null val attr (h/t @qqilihq)
- sst trimming (fixes SheetJS#176 h/t @shakhal @oising)
SheetJSDev committed Feb 19, 2017
1 parent ab2eceb commit d7ecca0
Showing 47 changed files with 3,410 additions and 1,200 deletions.
18 changes: 8 additions & 10 deletions .gitignore
@@ -4,17 +4,15 @@ misc/prof.js
v8.log
tmp
*.txt
*.csv
*.dif
*.prn
*.slk
*.[cC][sS][vV]
*.[dD][iI][fF]
*.[pP][rR][nN]
*.[sS][lL][kK]
*.socialcalc
*.xls
*.xlsb
*.xlsm
*.xlsx
*.ods
*.xml
*.[xX][lL][sSwW]
*.[xX][lL][sS][xXmMbB]
*.[oO][dD][sS]
*.[xX][mM][lL]
*.htm
*.html
*.sheetjs
19 changes: 8 additions & 11 deletions .npmignore
@@ -5,18 +5,15 @@ misc/
node_modules
tmp
*.txt
*.csv
*.dif
*.prn
*.slk
*.[cC][sS][vV]
*.[dD][iI][fF]
*.[pP][rR][nN]
*.[sS][lL][kK]
*.socialcalc
*.XLS
*.xls
*.xlsb
*.xlsm
*.xlsx
*.ods
*.xml
*.[xX][lL][sSwW]
*.[xX][lL][sS][xXmMbB]
*.[oO][dD][sS]
*.[xX][mM][lL]
*.htm
*.html
*.sheetjs
2 changes: 1 addition & 1 deletion .travis.yml
@@ -3,7 +3,7 @@ node_js:
- "7"
- "6"
- "5"
- "4.2"
- "4"
- "0.12"
- "0.10"
- "0.9"
44 changes: 32 additions & 12 deletions README.md
@@ -366,24 +366,28 @@ for(var R = range.s.r; R <= range.e.r; ++R) {

### Cell Object

| Key | Description |
| --- | ----------- |
| `v` | raw value (see Data Types section for more info) |
| `w` | formatted text (if applicable) |
| `t` | cell type: `b` Boolean, `n` Number, `e` error, `s` String, `d` Date |
| `f` | cell formula (if applicable) |
| `r` | rich text encoding (if applicable) |
| `h` | HTML rendering of the rich text (if applicable) |
| `c` | comments associated with the cell ** |
| `z` | number format string associated with the cell (if requested) |
| `l` | cell hyperlink object (.Target holds link, .tooltip is tooltip) |
| `s` | the style/theme of the cell (if applicable) |
| Key | Description |
| --- | ---------------------------------------------------------------------- |
| `v` | raw value (see Data Types section for more info) |
| `w` | formatted text (if applicable) |
| `t` | cell type: `b` Boolean, `n` Number, `e` error, `s` String, `d` Date |
| `f` | cell formula encoded as an A1-style string (if applicable) |
| `F` | range of enclosing array if formula is array formula (if applicable) |
| `r` | rich text encoding (if applicable) |
| `h` | HTML rendering of the rich text (if applicable) |
| `c` | comments associated with the cell |
| `z` | number format string associated with the cell (if requested) |
| `l` | cell hyperlink object (.Target holds link, .tooltip is tooltip) |
| `s` | the style/theme of the cell (if applicable) |

Built-in export utilities (such as the CSV exporter) will use the `w` text if it
is available. To change a value, be sure to delete `cell.w` (or set it to
`undefined`) before attempting to export. The utilities will regenerate the `w`
text from the number format (`cell.z`) and the raw value if possible.
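
A minimal sketch of the update pattern described above, using a hand-built cell object rather than a parsed workbook. The helper name `set_cell_value` is hypothetical and is not part of js-xlsx.

```javascript
/* Hypothetical helper (not part of js-xlsx): update the raw value and drop
   the cached formatted text so exporters can regenerate it from `z` and `v`. */
function set_cell_value(cell, value) {
  cell.v = value;
  delete cell.w;
  return cell;
}

/* Hand-built cell for illustration: a number with cached formatted text */
var cell = { t: "n", v: 1.5, z: "0.00", w: "1.50" };
set_cell_value(cell, 2.25);
console.log(cell); // `w` is gone; `v` and `z` remain
```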

The actual array formula is stored in the `f` field of the first cell in the
array range. Other cells in the range will omit the `f` field.

### Data Types

The raw value is stored in the `v` field, interpreted based on the `t` field.
@@ -418,6 +422,20 @@ dates in the local timezone. js-xlsx does not correct for this error.
Type `s` is the String type. `v` should be explicitly stored as a string to
avoid possible confusion.

### Formulae

The A1-style formula string is stored in the `f` field. Different file formats
store formulae in different ways, so each is converted to the A1-style form.

Shared formulae are decompressed and each cell has the correct formula.

Array formulae are stored in the top-left cell of the array block. All cells
of an array formula have an `F` field corresponding to the range. A single-cell
array formula can be distinguished from a plain formula by the presence of the
`F` field.

The `sheet_to_formulae` method generates one line per formula or array formula.
Array formulae are rendered in the form `range=formula` while plain cells are
rendered in the form `cell=formula or value`.
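
The `f`/`F` layout and the two output forms can be sketched on a hand-built worksheet. This is an illustration only: `list_formulae` is a hypothetical, simplified stand-in for `sheet_to_formulae`, it assumes `F` holds an A1-style range string, and it handles only numeric values.

```javascript
/* Hand-built worksheet (assumption: `F` holds an A1-style range string).
   C1 is a plain formula; D1:D2 is one array formula stored in D1. */
var ws = {
  "!ref": "A1:D2",
  A1: { t: "n", v: 1 },
  A2: { t: "n", v: 2 },
  C1: { t: "n", v: 3, f: "A1+A2" },
  D1: { t: "n", v: 2, f: "A1:A2*2", F: "D1:D2" },
  D2: { t: "n", v: 4, F: "D1:D2" }
};

/* Hypothetical, simplified stand-in for sheet_to_formulae */
function list_formulae(ws) {
  var out = [];
  Object.keys(ws).forEach(function(addr) {
    if(addr.charAt(0) === "!") return; // skip metadata keys such as !ref
    var cell = ws[addr];
    if(cell.F) {
      // array formula: only the first cell carries `f`, so emit range=formula once
      if(cell.f) out.push(cell.F + "=" + cell.f);
    }
    else if(cell.f) out.push(addr + "=" + cell.f); // plain formula
    else out.push(addr + "=" + cell.v);            // plain value
  });
  return out;
}

var lines = list_formulae(ws);
console.log(lines.join("\n"));
```

D2 produces no line of its own because it has an `F` field but no `f` field; the whole block is reported once through D1.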

### Worksheet Object

@@ -619,6 +637,8 @@ OSP-covered specifications:
- [MS-OLEDS]: Object Linking and Embedding (OLE) Data Structures
- [MS-OLEPS]: Object Linking and Embedding (OLE) Property Set Data Structures
- [MS-OSHARED]: Office Common Data Types and Objects Structures
- [MS-ODRAW]: Office Drawing Binary File Format
- [MS-ODRAWXML]: Office Drawing Extensions to Office Open XML Structure
- [MS-OVBA]: Office VBA File Format Structure
- [MS-CTXLS]: Excel Custom Toolbar Binary File Format
- [MS-XLDM]: Spreadsheet Data Model File Format
4 changes: 3 additions & 1 deletion bin/xlsx.njs
@@ -21,6 +21,7 @@ program
.option('-S, --formulae', 'print formulae')
.option('-j, --json', 'emit formatted JSON (all fields text)')
.option('-J, --raw-js', 'emit raw JS object (raw numbers)')
.option('-A, --arrays', 'emit rows as JS objects (raw numbers)')
.option('-F, --field-sep <sep>', 'CSV field separator', ",")
.option('-R, --row-sep <sep>', 'CSV row separator', "\n")
.option('-n, --sheet-rows <num>', 'Number of rows to process (0=all rows)')
@@ -77,7 +78,7 @@ if(program.xlsx || program.xlsm || program.xlsb) {
opts.cellNF = true;
if(program.output) sheetname = program.output;
}
else if(program.formulae);
else if(program.formulae) opts.cellFormula = true;
else opts.cellFormula = false;

if(program.all) {
@@ -142,6 +143,7 @@ if(!program.quiet) console.error(target_sheet);
if(program.formulae) oo = X.utils.get_formulae(ws).join("\n");
else if(program.json) oo = JSON.stringify(X.utils.sheet_to_row_object_array(ws));
else if(program.rawJs) oo = JSON.stringify(X.utils.sheet_to_row_object_array(ws,{raw:true}));
else if(program.arrays) oo = JSON.stringify(X.utils.sheet_to_row_object_array(ws,{raw:true, header:1}));
else oo = X.utils.make_csv(ws, {FS:program.fieldSep, RS:program.rowSep});

if(program.output) fs.writeFileSync(program.output, oo);
2 changes: 1 addition & 1 deletion bits/01_version.js
@@ -1 +1 @@
XLSX.version = '0.8.3';
XLSX.version = '0.8.4';
9 changes: 9 additions & 0 deletions bits/20_jsutils.js
@@ -42,3 +42,12 @@ function cc2str(arr/*:Array<number>*/)/*:string*/ {
return o;
}

function dup(o/*:any*/)/*:any*/ {
if(typeof JSON != 'undefined') return JSON.parse(JSON.stringify(o));
if(typeof o != 'object' || !o) return o;
var out = {};
for(var k in o) if(o.hasOwnProperty(k)) out[k] = dup(o[k]);
return out;
}

function fill(c/*:string*/,l/*:number*/)/*:string*/ { var o = ""; while(o.length < l) o+=c; return o; }
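
A short sketch of how the new `dup` helper behaves on a range-like object. The function body follows the lines added above with the type annotations dropped; the sample `range` object is made up for illustration.

```javascript
/* dup as added in bits/20_jsutils.js (type annotations dropped) */
function dup(o) {
  if(typeof JSON != 'undefined') return JSON.parse(JSON.stringify(o));
  if(typeof o != 'object' || !o) return o;
  var out = {};
  for(var k in o) if(o.hasOwnProperty(k)) out[k] = dup(o[k]);
  return out;
}

/* Illustrative range object: nested objects are copied, not shared */
var range = { s: { c: 0, r: 0 }, e: { c: 2, r: 4 } };
var copy = dup(range);
copy.s.c = 1; // mutate the copy only
console.log(range.s.c, copy.s.c); // 0 1
```

Because the JSON path is taken whenever `JSON` exists, values that do not survive JSON serialization (`undefined`, functions, `Date` objects) are dropped or converted, so the copy is only faithful for plain data such as cell addresses and ranges.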
4 changes: 2 additions & 2 deletions bits/22_xmlutils.js
@@ -115,9 +115,9 @@ var matchtag = (function() {
var vtregex = (function(){ var vt_cache = {};
return function vt_regex(bt) {
if(vt_cache[bt] !== undefined) return vt_cache[bt];
return (vt_cache[bt] = new RegExp("<vt:" + bt + ">(.*?)</vt:" + bt + ">", 'g') );
return (vt_cache[bt] = new RegExp("<(?:vt:)?" + bt + ">(.*?)</(?:vt:)?" + bt + ">", 'g') );
};})();
var vtvregex = /<\/?vt:variant>/g, vtmregex = /<vt:([^>]*)>(.*)</;
var vtvregex = /<\/?(:?vt:)?variant>/g, vtmregex = /<(:?vt:)?([^>]*)>(.*)</;
function parseVector(data) {
var h = parsexmltag(data);

2 changes: 2 additions & 0 deletions bits/23_binutils.js
@@ -82,6 +82,8 @@ function ReadShift(size, t) {
case 'utf8': o = __utf8(this, this.l, this.l + size); break;
case 'utf16le': size *= 2; o = __utf16le(this, this.l, this.l + size); break;

case 'wstr': o = cptable.utils.decode(current_codepage, this.slice(this.l, this.l+2*size)); size = 2 * size; break;

/* [MS-OLEDS] 2.1.4 LengthPrefixedAnsiString */
case 'lpstr': o = __lpstr(this, this.l); size = 5 + o.length; break;
/* [MS-OLEDS] 2.1.5 LengthPrefixedUnicodeString */
30 changes: 21 additions & 9 deletions bits/25_cellutils.js
@@ -1,16 +1,18 @@
/* XLS ranges enforced */
function shift_cell_xls(cell, tgt) {
function shift_cell_xls(cell, tgt, opts) {
var out = dup(cell);
if(tgt.s) {
if(cell.cRel) cell.c += tgt.s.c;
if(cell.rRel) cell.r += tgt.s.r;
if(out.cRel) out.c += tgt.s.c;
if(out.rRel) out.r += tgt.s.r;
} else {
cell.c += tgt.c;
cell.r += tgt.r;
out.c += tgt.c;
out.r += tgt.r;
}
cell.cRel = cell.rRel = 0;
while(cell.c >= 0x100) cell.c -= 0x100;
while(cell.r >= 0x10000) cell.r -= 0x10000;
return cell;
if(!opts || opts.biff < 12) {
while(out.c >= 0x100) out.c -= 0x100;
while(out.r >= 0x10000) out.r -= 0x10000;
}
return out;
}

function shift_range_xls(cell, range) {
@@ -19,3 +21,13 @@ function shift_range_xls(cell, range) {
return cell;
}

function encode_cell_xls(c)/*:string*/ {
var s = encode_cell(c);
if(c.cRel === 0) s = fix_col(s);
if(c.rRel === 0) s = fix_row(s);
return s;
}

function encode_range_xls(r)/*:string*/ {
return encode_cell_xls(r.s) + ":" + encode_cell_xls(r.e);
}
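
The effect of the changes in this file can be sketched in isolation. In the block below, `shift_cell_xls`, `encode_cell_xls` and `encode_range_xls` follow the added lines of this diff, while `dup`, `encode_col`, `encode_cell`, `fix_col` and `fix_row` are simplified stand-ins written for this example and are assumptions, not the library source.

```javascript
/* Stand-ins for library helpers (assumptions, not the js-xlsx source) */
function dup(o) { return JSON.parse(JSON.stringify(o)); }
function encode_col(c) {
  var s = "";
  for(++c; c; c = Math.floor((c - 1) / 26)) s = String.fromCharCode(((c - 1) % 26) + 65) + s;
  return s;
}
function encode_cell(c) { return encode_col(c.c) + (c.r + 1); }
function fix_col(s) { return s.replace(/^([A-Z]+)/, "$$$1"); } // B3 -> $B3
function fix_row(s) { return s.replace(/(\d+)$/, "$$$1"); }    // B3 -> B$3

/* New version from this diff: works on a copy and skips wrapping for BIFF12 */
function shift_cell_xls(cell, tgt, opts) {
  var out = dup(cell);
  if(tgt.s) {
    if(out.cRel) out.c += tgt.s.c;
    if(out.rRel) out.r += tgt.s.r;
  } else {
    out.c += tgt.c;
    out.r += tgt.r;
  }
  if(!opts || opts.biff < 12) {
    while(out.c >= 0x100) out.c -= 0x100;
    while(out.r >= 0x10000) out.r -= 0x10000;
  }
  return out;
}

function encode_cell_xls(c) {
  var s = encode_cell(c);
  if(c.cRel === 0) s = fix_col(s);
  if(c.rRel === 0) s = fix_row(s);
  return s;
}
function encode_range_xls(r) { return encode_cell_xls(r.s) + ":" + encode_cell_xls(r.e); }

/* Relative column, absolute row, shifted against a base range */
var rel = { c: 1, r: 2, cRel: 1, rRel: 0 };
var moved = shift_cell_xls(rel, { s: { c: 3, r: 4 } });
console.log(rel.c, moved.c, encode_cell_xls(moved)); // input untouched; column moved

/* Column wrap-around applies only below BIFF12 */
var wrapped = shift_cell_xls({ c: 255, r: 0 }, { c: 2, r: 0 });
var wide = shift_cell_xls({ c: 255, r: 0 }, { c: 2, r: 0 }, { biff: 12 });
console.log(wrapped.c, wide.c);
```

Returning a copy means a parsed reference can be shifted repeatedly against different base cells without accumulating offsets in the original object.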
14 changes: 11 additions & 3 deletions bits/28_binstructs.js
@@ -76,6 +76,10 @@ function write_XLWideString(data/*:string*/, o) {
return o;
}

/* [MS-XLSB] 2.5.165 */
var parse_XLNameWideString = parse_XLWideString;
var write_XLNameWideString = write_XLWideString;

/* [MS-XLSB] 2.5.114 */
var parse_RelID = parse_XLNullableWideString;
var write_RelID = write_XLNullableWideString;
@@ -101,8 +105,8 @@ function write_RkNumber(data/*:number*/, o) {
}


/* [MS-XLSB] 2.5.153 */
function parse_UncheckedRfX(data)/*:Range*/ {
/* [MS-XLSB] 2.5.117 RfX */
function parse_RfX(data)/*:Range*/ {
var cell/*:Range*/ = ({s: {}, e: {}}/*:any*/);
cell.s.r = data.read_shift(4);
cell.e.r = data.read_shift(4);
@@ -111,7 +115,7 @@ function parse_UncheckedRfX(data)/*:Range*/ {
return cell;
}

function write_UncheckedRfX(r/*:Range*/, o) {
function write_RfX(r/*:Range*/, o) {
if(!o) o = new_buf(16);
o.write_shift(4, r.s.r);
o.write_shift(4, r.e.r);
@@ -120,6 +124,10 @@ function write_UncheckedRfX(r/*:Range*/, o) {
return o;
}

/* [MS-XLSB] 2.5.153 UncheckedRfX */
var parse_UncheckedRfX = parse_RfX;
var write_UncheckedRfX = write_RfX;

/* [MS-XLSB] 2.5.171 */
/* [MS-XLS] 2.5.342 */
/* TODO: error checking, NaN and Infinity values are not valid Xnum */
6 changes: 6 additions & 0 deletions bits/30_ctype.js
@@ -40,6 +40,12 @@ var ct2type/*{[string]:string}*/ = ({
"application/vnd.ms-excel.pivotTable": "TODO",
"application/vnd.openxmlformats-officedocument.spreadsheetml.pivotTable+xml": "TODO",

/* Chart Colors */
"application/vnd.ms-office.chartcolorstyle+xml": "TODO",

/* Chart Style */
"application/vnd.ms-office.chartstyle+xml": "TODO",

/* Calculation Chain */
"application/vnd.ms-excel.calcChain": "calcchains",
"application/vnd.openxmlformats-officedocument.spreadsheetml.calcChain+xml": "calcchains",
5 changes: 1 addition & 4 deletions bits/35_custprops.js
@@ -10,10 +10,7 @@ function parse_cust_props(data/*:string*/, opts) {
var x = m[i], y = parsexmltag(x);
switch(y[0]) {
case '<?xml': break;
case '<Properties':
if(y.xmlns !== XMLNS.CUST_PROPS) throw "unrecognized xmlns " + y.xmlns;
if(y.xmlnsvt && y.xmlnsvt !== XMLNS.vt) throw "unrecognized vt " + y.xmlnsvt;
break;
case '<Properties': break;
case '<property': name = y.name; break;
case '</property>': name = null; break;
default: if (x.indexOf('<vt:') === 0) {
12 changes: 9 additions & 3 deletions bits/38_xlstypes.js
@@ -303,13 +303,15 @@ function parse_Bes(blob) {

/* [MS-XLS] 2.5.240 ShortXLUnicodeString */
function parse_ShortXLUnicodeString(blob, length, opts) {
var cch = blob.read_shift(1);
var cch = blob.read_shift(opts && opts.biff >= 12 ? 2 : 1);
var width = 1, encoding = 'sbcs-cont';
var cp = current_codepage;
if(opts && opts.biff >= 8) current_codepage = 1200;
if(opts === undefined || opts.biff !== 5) {
if(!opts || opts.biff == 8 ) {
var fHighByte = blob.read_shift(1);
if(fHighByte) { width = 2; encoding = 'dbcs-cont'; }
} else if(opts.biff == 12) {
width = 2; encoding = 'wstr';
}
var o = cch ? blob.read_shift(cch, encoding) : "";
current_codepage = cp;
@@ -340,6 +342,10 @@ function parse_XLUnicodeRichExtendedString(blob) {
/* 2.5.296 XLUnicodeStringNoCch */
function parse_XLUnicodeStringNoCch(blob, cch, opts) {
var retval;
if(opts) {
if(opts.biff >= 2 && opts.biff <= 5) return blob.read_shift(cch, 'sbcs-cont');
if(opts.biff >= 12) return blob.read_shift(cch, 'dbcs-cont');
}
var fHighByte = blob.read_shift(1);
if(fHighByte===0) { retval = blob.read_shift(cch, 'sbcs-cont'); }
else { retval = blob.read_shift(cch, 'dbcs-cont'); }
@@ -348,7 +354,7 @@ function parse_XLUnicodeString(blob, length, opts) {

/* 2.5.294 XLUnicodeString */
function parse_XLUnicodeString(blob, length, opts) {
var cch = blob.read_shift(opts !== undefined && opts.biff > 0 && opts.biff < 8 ? 1 : 2);
var cch = blob.read_shift(opts && opts.biff == 2 ? 1 : 2);
if(cch === 0) { blob.l++; return ""; }
return parse_XLUnicodeStringNoCch(blob, cch, opts);
}