Skip to content

Releases: dataspread/dataspread-web

Release 0.5.1 - Docker Deployment

08 Oct 05:42
4ca8bfc
Compare
Choose a tag to compare

This release enables DataSpread to be deployed locally through Docker, reducing the required software and number of setup steps needed to get DataSpread up and running.

Known Issues

  • Maven dependencies are not cached between image builds leading to a slower rebuild

Release 0.5 - New Interface (React Virtualized)

06 May 15:55
d2f7ba6
Compare
Choose a tag to compare

This release adds the following functionality:

  • Spreadsheet created using React Virtualized
    A new spreadsheet front-end has been developed based on React Virtualized to efficiently load arbitrarily large spreadsheets. This will allow users to seamlessly scroll through spreadsheets with billion rows without any delay. Moreover, the new front-end fixes the layout thrashing issue that was caused by the JavaScript-based ZK front-end where any scroll would display a white screen at the bottom for a few seconds before displaying the entire spreadsheet.

  • Navigation panel for exploring tabular spreadsheets (Demo). We support the following interactions:
    *Exploring the data by a specific attribute: displays an interactive histogram on which users can perform operations like panning and semantic zooming to navigate the data without having to scroll.
    *Aggregating spreadsheet data: users can issue spreadsheet formulae on the navigation panel to view aggregate statistics on the data. Users can view different representations of the aggregate data by switching between text and chart representation.
    *Histogram customization: users can customize the auto-generated histogram to fit their own navigation purpose using operations like splitting and merging.
    *Navigational history: user's navigational history and context are maintained by the interface similar to directory browsers. Users can interact with the history to view and access previously explored spreadsheet location.

  • Column width adjustment: now you can manually resize the column width of each cell.

  • Copy and paste: you can now copy and paste element using a copy/paste button located near the navigation bar.

  • File Upload and Reload: You can now upload any .csv files and reload using the new Interface!

  • Menu Bar: Introduced the new menu bar template! All functions now live in the drop-down menu.

Known Issues

  • The navigation feature is in memory and works with data that fits in memory.
  • The navigation attribute can be either numeric or text or Date.
  • Copy and paste buttons in the navigation does not work yet. Copy and paste using the keyboard shortcut is also not implemented.
  • The menu bar icons will not work and that is functional. Please use the drop-down menu
  • There are reports regarding formula's in the uploaded csvs to become string.
  • New file not yet supported

Snapshot 0.4.1 - New Interface Pre-Alpha

18 Jun 07:44
8d1f8f4
Compare
Choose a tag to compare

This "pre-alpha" snapshot includes the new interface and the resulting changes for the backend. In this "snapshot release," we migrated the previous framework to work with the new interface. The features are shown below (for details of the features, please refer to the RESTful API wiki page):

For this release, our objective includes the following functions-- that's the plan, but not all were completed. Due to the incomplete work, we decided to make this release only a "snapshot" instead of a full release.

I. New Interface: Ths design is similar to the previous interface, but is now implemented based on Handontable, a Javascript spreadsheet library.

  1. Menu bars including styling, selection, and formula bars.
  2. Worksheet including a grid of cells, sheet tabs, and a scroll bar.

II. Basic Functions

  1. Book Functions: Get a list of books, open a book, rename a book, delete a book, add a new book, share an existing book.
  2. Sheet Functions: Get a list of sheets, open a sheet, rename a sheet, reorder sheets, delete a sheet, copy an existing sheet, clear a sheet.
  3. Cell Functions: Get a collection of cells including formula-evaluated values and tables, put a range of cells, insert rows, insert columns, delete rows, delete columns.
  4. Cell Font, Format & Style Functions: Change the format of a collection of cells, get the format of a collection of cells.
  5. Formula Evaluation & Dependency Handlers: Put and evaluate a formula, notify dirty cells, notify dirty cells reevaluated.
  6. Table Functions: Create a table with schema, link a table, unlink a table, drop a table, reorder a table using its attributes, filter a table using its attributes, get a list of accessible tables, delete rows from a table, delete attributes from a table, insert rows into a table, insert attributes into a table, change attributes name and type of a table, update table tuples.

III. Authorization

  1. Google-based Login: Integrate Google third-party login to grant access to DataSpread private session.
  2. Books and Tables Sharing: Allow sharing books and tables to other users in DataSpread.

IV. Multi-Session Access

  1. Two-way Communication Mechanism: Communication between the front-end and back-end to support synchronization across related sessions for all supported functions.

Completed Functions

  1. I. New Interface: Completed.
  2. II. Basic Functions: Completed II.1 Book Functions, II.2 Sheet Functions (except reorder sheets), and II.3 Cell Functions; completed the back-end APIs of II.4 Format Functions (only update), II.5 Formula Functions (only put formula and evaluate formula) and II.6 Table Functions.
  3. III Authorization: Completed the back-end.
  4. IV Multi-Session Access: Completed the back-end.

Todo

  1. II Basic Functions: Integration of II.2 Sheet Functions (reorder sheets); back-end API and integration between the front-end and back-end of II.4 Format Functions and II.5 Formula Functions.
  2. III Authorization: Integration between front-end and back-end.
  3. IV Multi-Session Access: Integration between front-end and back-end.
  4. Overall error handling.

Known Issues

  1. Unable to use Apache Maven to package the compiled package war file.
  2. Interface and back-end are separated into two projects.
  3. The getCells function is not working properly (the latest version was merged with master).

Release 0.3.3 - New Features

30 Dec 04:12
Compare
Choose a tag to compare

This release adds the following functionality:

  • Navigation
    • scroll the navigation tree to see high level summary of spreadsheet
    • click through the tree nodes to jump to different spreadsheet positions
    • show histograms for each node for understanding of the spreadsheet data distribution
    • organize/sort the spreadsheet based on different columns and then view and interact with new
      navigation tree
  • Improved Import Sheet
    The performance of import sheet has been improved.
  • Formatting
    The row/column formatting is now preserved.
  • Asynchronous Formulae Evaluation
    Formula evaluation in a separate thread. The result will be reflected in the interface once the calculation is done.
  • Multiple User Accessing the Same Sheet
    Users can work on the same sheet at the same time.

Known Issues

  • Users on the same sheet cannot update the same tuple at a time.
  • Navigation only applies to tabular data with a header.
  • Navigation is slow for a large spreadsheet.
  • Navigation only support string sort.
  • Row/Column formatting has a limit on size.

Release 0.3.2 - More Operators

16 Oct 07:37
Compare
Choose a tag to compare

Add Select and Join operators
Allow drag&drop of the index function

Release 0.3.1 - Bug fixes for v0.3

19 Sep 20:22
Compare
Choose a tag to compare

This release fixes major bugs from v0.3.

Release 0.3 - Tables

19 Sep 20:21
Compare
Choose a tag to compare

This release adds the following functionality:

  • Relational Tables on a Spreadsheet
    We now support relational tables on a spreadsheet. We do not maintain a copy of the tables on the spreadsheet; thus the underlying database has the only copy of the table. Any updates to the tabular regions on the sheet are pushed in real-time to the database. We support the following operations:
    • Link Table: Declare an area on the spreadsheet linked with a relational table in the database.
    • Create Table: Create a relational table based on a region on the sheet and link the region to the newly created table.
    • Add/Delete Columns: Append columns to a linked table or delete existing columns.
    • Add/Delete Rows: Append rows to a linked table or delete existing columns.
  • R-tree for dependency
    The dependency table in internally handled by using an R-tree to improve scalability.
  • Relational Operators
    We now support the following set/relational-operator-like operations within formulae:
    UNION,
    DIFFERENCE,
    INTERSECTION,
    CROSSPRODUCT,
    PROJECT,
    RENAME
  • SQL Function
    We allow users to execute arbitrary SQL statements using a new function SQL.
  • Multi-cell functions
    The relational operators are multi-cell functions, i.e., the output can consist of a collection of cells.
  • Asynchronous formula computation
    The formulae on the spreadsheet will be computed asynchronously thereby not blocking the users.
  • Private sheets
    We now support user login and have individual books for the users.

Known Issues

  • New records can be only added to the bottom of the table.
  • The only way to reorder tuples is to delete and add new at the bottom.
  • The updates from the backend database are not pushed to the front end in real time. Refreshing a spreadsheet will reload the updates from the database.
  • Database errors cascade across other operations.

Release 0.2

07 Jun 21:34
Compare
Choose a tag to compare
Release 0.2 Pre-release
Pre-release

Internal release featuring

  1. Table functionality
    With a table of data in the sheet, the user can:
    • Create Table: by selecting the full range of the table, clicking on Table Menu in the Main-Menu Bar, selecting Create Table by Range, providing a table name (alphanumeric only without spaces and start with a letter).
    • Delete Table: by selecting the full range of a created table, clicking on Table Menu in the Main-Menu Bar, selecting Delete Table by Range.
    • Use Tables Sidebar: by toggling it from the Table Menu, a user can see the created tables within that book.
  2. R-tree for dependency
    The dependency tree now will be handled by using a compressed R-tree.
  3. Relational Operators
    We have the basic support of set/relational-operator-like operations. The following functions are now available for use in formulae:
    UNION,
    DIFFERENCE,
    INTERSECTION,
    CROSSPRODUCT,
    SELECT,
    PROJECT,
    RENAME
    The output of these functions has the array value type.
    • The function UNION has the syntax UNION(S1, S2), with each of S1 and S2 as an array value. Instead of an array value, one can use a (contiguous) range of non-array values instead; it would be treated as an array. Each row in an array is treated as a tuple. S1 and S2 must have the same number of columns, otherwise it returns a #VALUE! error.
    • DIFFERENCE and INTERSECTION, and CROSSPRODUCT have the same syntax.
    • CROSSPRODUCT also has the same syntax as above, but it does not have the same-column-size requirement.
    • SELECT has the syntax SELECT(S, C). Here C is an array of boolean values with the same number of rows as S, with one column. It selects the subset of rows whose corresponding value in C is true.
  4. INDEX Function
    To view the results of the previous formulae, we now support a new cell value type, "array". In addition, the function INDEX has a new behavior operating on array values.
    • A cell containing an array value will display the size of the (two-dimensional) array of the value, like [5x3].
    • In order to display an element in an array value, one needs to use INDEX(array, row, [column]), specifying the row/column index. For example, if cell A5 contains a 5 by 3 array value, INDEX(A5,2,3) is the array's element in row 2, column 3. INDEX returns a #REF! error if the row/column is out of range. If the array is one-dimensional (has one column), then the column index is optional; it defaults to the first column.

Known Issues

  • The output is undefined if an operand is not a set (has repetitions).
  • The relational operators are not well-integrated with tables: PROJECT, RENAME
    and JOIN would be more meaningful once attributes are actually named.
  • String equality is not checked properly.

Release 0.1.1

24 Apr 16:04
Compare
Choose a tag to compare

DataSpread: A Spreadsheet-Database Hybrid System

dataspread-fiverr2-cropped

DataSpread is a spreadsheet-database hybrid system, with a spreadsheet frontend, and a database backend. Thus, DataSpread inherits the flexibility and ease-of-use of spreadsheets, as well as the scalability and power of databases. A paper describing DataSpread's architecture, design decisions, and optimization can be found here. DataSpread is a multi-year project, supported by the National Science Foundation via award number 1633755.

Version

The current version is 0.1.

Features

DataSpread is built using PostgreSQL and ZKSpreadsheet, an open-source web-based spreadsheet tool.

DataSpread's version 0.1 enables users to scale to billions of cells and return results for common spreadsheet operations within seconds. It does so via on-demand loading of spreadsheet data.

Like traditional spreadsheet software, DataSpread supports standard spreadsheet book and sheet operations like Load, Rename, Delete, and Import (via XLS and XLSX, and CSV). Any updates to the spreadsheets are automatically saved.

Like traditional spreadsheet software, DataSpread supports the use of 225+ spreadsheet functions, along with formatting and styling operations. It also supports row and column operations like insert, delete, cut, copy, and paste; during insertion and deletion, formulae are updated as is the case in traditional spreadsheet software.

It supports all these operations while scaling to arbitrarily large spreadsheets.

In future releases, DataSpread will support SQL on the spreadsheet frontend, along with other relational algebra-based interactions. It will also support joint formula evaluation and optimization.

Key Design Innovations

  • DataSpread employs a flexible hybrid data model to represent spreadsheet data within a database.
  • DataSpread uses positional indexing techniques to both locate data by position, and keep it up-to-date as the data is updated.
  • DataSpread also employs a LRU caching mechanism to retrieve and keep in memory data from the database on demand.
  • DataSpread also employs speculative fetching to fetch additional data beyond the user's current spreadsheet window.

Setup Instructions:

You can directly use DataSpread via our cloud-hosted site.

To host DataSpread locally you can either use one of the pre-build war files, available here, or build the war file yourself from the source.

Required Software

Building Instructions (To generate a war file)

  1. Clone the DataSpread repository. Alternatively, you can download the source as a zip or tar.gz.

  2. Use maven to build the war file using the following command. After the build completes the war is available at webapp/target/DataSpread.war.

    mvn clean install
    

Deploying DataSpread locally.

  1. Install PostgreSQL database. Postgres.app is a quick way to get PostgreSQL working on Mac. For other operating systems check out the guides here.

  2. Create a database and an user who has access to the database. Note the database name, username and password. Typically when you have PostgreSQL installed locally the password is blank.

  3. Install Apache Tomcat. You can use the guide here. Make a note of the directory where tomcat is installed. This is known as TOMCAT_HOME in all documentation.

  4. Update the Tomcat configuration. You need to update the following two files, which are present in conf folder under TOMCAT_HOME folder.

    1. web.xml by adding the following text at the end of the file before the closing XML tag.
    <listener>
        <listener-class>org.model.DBHandler</listener-class>
    </listener>
    
    1. context.xml by adding the following text at the end of the file before the closing XML tag.
    <Resource name="jdbc/ibd" auth="Container"
              type="javax.sql.DataSource" driverClassName="org.postgresql.Driver"
              url="jdbc:postgresql://127.0.0.1:5432/<database_name>"
              username="<username>" password="<password>"
                  maxTotal="20" maxIdle="10" maxWaitMillis="-1" defaultAutoCommit="false" accessToUnderlyingConnectionAllowed="true"/>
    

    Replace <database_name>, <username> and <password> with your PostgreSQL's database name, user name and password respectively.

  5. Copy postgresql-9.4.1208 (Download from here) to lib folder under TOMCAT_HOME. It is crucial to have the exact version of this file.

  6. Deploy the war file within Tomcat. This can be done via Tomcat's web interface or by manually copying the war file in the webapps folder under TOMCAT_HOME.

  7. Now you are ready to run the program. Visit the url where Tomcat is installed. It will be typically http://localhost:8080/DataSpread_war/ for a local install.

License

MIT