Skip to content

Architecture of NanoXLSX

Raphael Stoeckli edited this page Aug 25, 2018 · 2 revisions

Introduction

The architecture of NanoXLSX is based on the OOXML specifications for spreadsheets (spreadsheetML).
You can find the full documentation at: http://www.ecma-international.org/publications/standards/Ecma-376.htm An overview is available at: http://www.ecma-international.org/news/TC45_current_work/OpenXML%20White%20Paper.pdf
Also a good resource is the unofficial site: http://officeopenxml.com/

What is an XLSX file?

XLSX is the successor of the XLS format from Microsoft. XLS is a proprietary and binary format, whereas XLSX is based on XML and officially standardized by ECMA.
However, an XLSX file is not just one renamed XML file. It is a collection of several XML files, compiled by a packer. Simply said, an XLSX file is a renamed zip file that contains several XML files (and other formats in case of images or macros) and is organized in a specific folder structure. To explore an XLSX file, you simply can rename the XLSX file to the file extension .zip (test.xlsx → test.zip). Now you can unzip the file and browse the previously compiled (by the packer) XML files. Don't forget to make a backup of your file if it is important. Alternatively, you can use a tool like Media-Extractor to explore the content without tampering of the file.

What is the content of an XLSX file?

NanoXLSX adds only the mandatory minimum of XML files to the XLSX file. If you check a file generated by Microsoft Excel, you will find several additional files. Not mentioned here are the .rels files (maintaining relations between XML files) and the [Content_Types] file (summary of the content and its types). Furthermore, other files like .emf images or .bin files in case of macros may be embedded by Excel.

The mandatory files for a valid XLSX file are:

  • workbook.xml (in folder \xl): This file defines some global parameters for the whole workbook.
  • styles.xml (in folder \xl): This file defines all styles within the worksheets.
  • sheet[n].xml (in folder \xl\worksheets): Per worksheet is one single XML file defined. In each file are all cells and the contents defined.
  • sharedStrings.xml (in folder \xl): This file contains all strings of all worksheets in a list. The values in the worksheets are replaced with the corresponding ID of the string value in this document.
  • app.xml (in folder \docProps): This file defines application specific meta-data. This file is only mandatory if meta-data is used.
  • core.xml (in folder \docProps): This file defines document specific meta-data like keywords, creator or creation date/time. This file defines application specific meta-data. This file is only mandatory if meta-data is used.

Link between NanoXLSX and the OOXML standard

NanoXLSX uses the nomenclature of OOXML (most times) and defines classes based on the entities of OOXML. Since NanoXLSX is derived from PicoXLSX, most classes handle writing functionalities and follow the OOXML specs. The classes for the handling of file generation do not follow the nomenclature since they are not specified by OOXML and can be seen as helper classes to retrieve XLSX data.
Defined classes are:

  • Cell: Class representing a cell of a worksheet
  • Cell.Address: Class representing a cell address (e.g. "A20")
  • Cell.Range: Class representing a cell range (e.g. "C2:R150")
  • Cell.BasicFormulas (not in OOXML specs): Factory class for the most important Excel formulas (min, max, round, ceil, floor, median, average, sum and vlookup)
  • Column: Class representing a column of a worksheet
  • Metadata (not in OOXML specs): Class representing the meta data of a workbook
  • Style: Class representing a Style with sub classes within a style sheet. An instance of this class is only a container for the different sub-classes. These sub-classes contain the actual styling information.
  • Style.BasicStyles (not in OOXML specs): Factory class with the most important predefined styles
  • Style.Border: Class representing a Border entry. The Border entry is used to define frames and cell borders
  • Style.CellXf: Class representing an XF entry. The XF entry is used to make reference to other style instances like Border or Fill and for the positioning of the cell content
  • Style.Fill: Class representing a Fill (background) entry. The Fill entry is used to define background colors and fill patterns
  • Style.Font: Class representing a Font entry. The Font entry is used to define text formatting
  • StyleManager (not in OOXML specs): Class to handle all aspects of styles. Since the styling of Excel, the several aspects (e.g. fonts or fills) respectively, can become quite complicated, the class manages the styles and their subsets automatically.
  • Workbook: Class representing a workbook
  • Workbook.Shortener (not in OOXML specs): Class to ease the use of NanoXLSX4
  • Worksheet: Class representing the style sheet of a workbook
  • LowLevel (not in OOXML specs): Class for the low level handling when it comes to writing Excel files, like assembling and packing
  • Reader (not in OOXML specs): Class for to load Excel files
  • Reader (not in OOXML specs): Main class to load Excel files
  • WorkbookReader (not in OOXML specs): Class to retrieve workbook data when loading an XLSX file
  • SharedStringReader (not in OOXML specs): Class to retrieve shared strings when loading an XLSX file
  • WorksheetReader (not in OOXML specs): Class to retrieve worksheet data of the workbook when loading an XLSX file

Exceptions are (not defined in the specifications but used to handle errors):

  • FormatException: Class for exceptions regarding format error incidents
  • IOException: Class for exceptions regarding stream or save error incidents
  • RangeException: Class for exceptions regarding out-of-range incidents
  • StyleException: Class for exceptions regarding styles
  • WorksheetException: Class for exceptions regarding worksheets