Releases: dataspread/dataspread-web
Release 0.5.1 - Docker Deployment
This release enables DataSpread to be deployed locally through Docker, reducing the required software and number of setup steps needed to get DataSpread up and running.
Known Issues
- Maven dependencies are not cached between image builds leading to a slower rebuild
Release 0.5 - New Interface (React Virtualized)
This release adds the following functionality:
-
Spreadsheet created using React Virtualized
A new spreadsheet front-end has been developed based on React Virtualized to efficiently load arbitrarily large spreadsheets. This will allow users to seamlessly scroll through spreadsheets with billion rows without any delay. Moreover, the new front-end fixes the layout thrashing issue that was caused by the JavaScript-based ZK front-end where any scroll would display a white screen at the bottom for a few seconds before displaying the entire spreadsheet. -
Navigation panel for exploring tabular spreadsheets (Demo). We support the following interactions:
*Exploring the data by a specific attribute: displays an interactive histogram on which users can perform operations like panning and semantic zooming to navigate the data without having to scroll.
*Aggregating spreadsheet data: users can issue spreadsheet formulae on the navigation panel to view aggregate statistics on the data. Users can view different representations of the aggregate data by switching between text and chart representation.
*Histogram customization: users can customize the auto-generated histogram to fit their own navigation purpose using operations like splitting and merging.
*Navigational history: user's navigational history and context are maintained by the interface similar to directory browsers. Users can interact with the history to view and access previously explored spreadsheet location. -
Column width adjustment: now you can manually resize the column width of each cell.
-
Copy and paste: you can now copy and paste element using a copy/paste button located near the navigation bar.
-
File Upload and Reload: You can now upload any .csv files and reload using the new Interface!
-
Menu Bar: Introduced the new menu bar template! All functions now live in the drop-down menu.
Known Issues
- The navigation feature is in memory and works with data that fits in memory.
- The navigation attribute can be either numeric or text or Date.
- Copy and paste buttons in the navigation does not work yet. Copy and paste using the keyboard shortcut is also not implemented.
- The menu bar icons will not work and that is functional. Please use the drop-down menu
- There are reports regarding formula's in the uploaded csvs to become string.
- New file not yet supported
Snapshot 0.4.1 - New Interface Pre-Alpha
This "pre-alpha" snapshot includes the new interface and the resulting changes for the backend. In this "snapshot release," we migrated the previous framework to work with the new interface. The features are shown below (for details of the features, please refer to the RESTful API wiki page):
For this release, our objective includes the following functions-- that's the plan, but not all were completed. Due to the incomplete work, we decided to make this release only a "snapshot" instead of a full release.
I. New Interface: Ths design is similar to the previous interface, but is now implemented based on Handontable, a Javascript spreadsheet library.
- Menu bars including styling, selection, and formula bars.
- Worksheet including a grid of cells, sheet tabs, and a scroll bar.
II. Basic Functions
- Book Functions: Get a list of books, open a book, rename a book, delete a book, add a new book, share an existing book.
- Sheet Functions: Get a list of sheets, open a sheet, rename a sheet, reorder sheets, delete a sheet, copy an existing sheet, clear a sheet.
- Cell Functions: Get a collection of cells including formula-evaluated values and tables, put a range of cells, insert rows, insert columns, delete rows, delete columns.
- Cell Font, Format & Style Functions: Change the format of a collection of cells, get the format of a collection of cells.
- Formula Evaluation & Dependency Handlers: Put and evaluate a formula, notify dirty cells, notify dirty cells reevaluated.
- Table Functions: Create a table with schema, link a table, unlink a table, drop a table, reorder a table using its attributes, filter a table using its attributes, get a list of accessible tables, delete rows from a table, delete attributes from a table, insert rows into a table, insert attributes into a table, change attributes name and type of a table, update table tuples.
III. Authorization
- Google-based Login: Integrate Google third-party login to grant access to DataSpread private session.
- Books and Tables Sharing: Allow sharing books and tables to other users in DataSpread.
IV. Multi-Session Access
- Two-way Communication Mechanism: Communication between the front-end and back-end to support synchronization across related sessions for all supported functions.
Completed Functions
- I. New Interface: Completed.
- II. Basic Functions: Completed II.1 Book Functions, II.2 Sheet Functions (except reorder sheets), and II.3 Cell Functions; completed the back-end APIs of II.4 Format Functions (only update), II.5 Formula Functions (only put formula and evaluate formula) and II.6 Table Functions.
- III Authorization: Completed the back-end.
- IV Multi-Session Access: Completed the back-end.
Todo
- II Basic Functions: Integration of II.2 Sheet Functions (reorder sheets); back-end API and integration between the front-end and back-end of II.4 Format Functions and II.5 Formula Functions.
- III Authorization: Integration between front-end and back-end.
- IV Multi-Session Access: Integration between front-end and back-end.
- Overall error handling.
Known Issues
- Unable to use Apache Maven to package the compiled package war file.
- Interface and back-end are separated into two projects.
- The getCells function is not working properly (the latest version was merged with master).
Release 0.3.3 - New Features
This release adds the following functionality:
- Navigation
- scroll the navigation tree to see high level summary of spreadsheet
- click through the tree nodes to jump to different spreadsheet positions
- show histograms for each node for understanding of the spreadsheet data distribution
- organize/sort the spreadsheet based on different columns and then view and interact with new
navigation tree
- Improved Import Sheet
The performance of import sheet has been improved. - Formatting
The row/column formatting is now preserved. - Asynchronous Formulae Evaluation
Formula evaluation in a separate thread. The result will be reflected in the interface once the calculation is done. - Multiple User Accessing the Same Sheet
Users can work on the same sheet at the same time.
Known Issues
- Users on the same sheet cannot update the same tuple at a time.
- Navigation only applies to tabular data with a header.
- Navigation is slow for a large spreadsheet.
- Navigation only support string sort.
- Row/Column formatting has a limit on size.
Release 0.3.2 - More Operators
Add Select and Join operators
Allow drag&drop of the index function
Release 0.3.1 - Bug fixes for v0.3
This release fixes major bugs from v0.3.
Release 0.3 - Tables
This release adds the following functionality:
- Relational Tables on a Spreadsheet
We now support relational tables on a spreadsheet. We do not maintain a copy of the tables on the spreadsheet; thus the underlying database has the only copy of the table. Any updates to the tabular regions on the sheet are pushed in real-time to the database. We support the following operations:- Link Table: Declare an area on the spreadsheet linked with a relational table in the database.
- Create Table: Create a relational table based on a region on the sheet and link the region to the newly created table.
- Add/Delete Columns: Append columns to a linked table or delete existing columns.
- Add/Delete Rows: Append rows to a linked table or delete existing columns.
- R-tree for dependency
The dependency table in internally handled by using an R-tree to improve scalability. - Relational Operators
We now support the following set/relational-operator-like operations within formulae:
UNION
,
DIFFERENCE
,
INTERSECTION
,
CROSSPRODUCT
,
PROJECT
,
RENAME
- SQL Function
We allow users to execute arbitrary SQL statements using a new functionSQL
. - Multi-cell functions
The relational operators are multi-cell functions, i.e., the output can consist of a collection of cells. - Asynchronous formula computation
The formulae on the spreadsheet will be computed asynchronously thereby not blocking the users. - Private sheets
We now support user login and have individual books for the users.
Known Issues
- New records can be only added to the bottom of the table.
- The only way to reorder tuples is to delete and add new at the bottom.
- The updates from the backend database are not pushed to the front end in real time. Refreshing a spreadsheet will reload the updates from the database.
- Database errors cascade across other operations.
Release 0.2
Internal release featuring
- Table functionality
With a table of data in the sheet, the user can:- Create Table: by selecting the full range of the table, clicking on Table Menu in the Main-Menu Bar, selecting Create Table by Range, providing a table name (alphanumeric only without spaces and start with a letter).
- Delete Table: by selecting the full range of a created table, clicking on Table Menu in the Main-Menu Bar, selecting Delete Table by Range.
- Use Tables Sidebar: by toggling it from the Table Menu, a user can see the created tables within that book.
- R-tree for dependency
The dependency tree now will be handled by using a compressed R-tree. - Relational Operators
We have the basic support of set/relational-operator-like operations. The following functions are now available for use in formulae:
UNION
,
DIFFERENCE
,
INTERSECTION
,
CROSSPRODUCT
,
SELECT
,
PROJECT
,
RENAME
The output of these functions has the array value type.- The function
UNION
has the syntaxUNION(S1, S2)
, with each of S1 and S2 as an array value. Instead of an array value, one can use a (contiguous) range of non-array values instead; it would be treated as an array. Each row in an array is treated as a tuple. S1 and S2 must have the same number of columns, otherwise it returns a#VALUE!
error. DIFFERENCE
andINTERSECTION
, andCROSSPRODUCT
have the same syntax.CROSSPRODUCT
also has the same syntax as above, but it does not have the same-column-size requirement.SELECT
has the syntaxSELECT(S, C)
. Here C is an array of boolean values with the same number of rows as S, with one column. It selects the subset of rows whose corresponding value in C is true.
- The function
- INDEX Function
To view the results of the previous formulae, we now support a new cell value type, "array". In addition, the functionINDEX
has a new behavior operating on array values.- A cell containing an array value will display the size of the (two-dimensional) array of the value, like
[5x3]
. - In order to display an element in an array value, one needs to use
INDEX(array, row, [column])
, specifying the row/column index. For example, if cell A5 contains a 5 by 3 array value,INDEX(A5,2,3)
is the array's element in row 2, column 3.INDEX
returns a#REF!
error if the row/column is out of range. If the array is one-dimensional (has one column), then the column index is optional; it defaults to the first column.
- A cell containing an array value will display the size of the (two-dimensional) array of the value, like
Known Issues
- The output is undefined if an operand is not a set (has repetitions).
- The relational operators are not well-integrated with tables:
PROJECT
,RENAME
andJOIN
would be more meaningful once attributes are actually named. - String equality is not checked properly.
Release 0.1.1
DataSpread: A Spreadsheet-Database Hybrid System
DataSpread is a spreadsheet-database hybrid system, with a spreadsheet frontend, and a database backend. Thus, DataSpread inherits the flexibility and ease-of-use of spreadsheets, as well as the scalability and power of databases. A paper describing DataSpread's architecture, design decisions, and optimization can be found here. DataSpread is a multi-year project, supported by the National Science Foundation via award number 1633755.
Version
The current version is 0.1.
Features
DataSpread is built using PostgreSQL and ZKSpreadsheet, an open-source web-based spreadsheet tool.
DataSpread's version 0.1 enables users to scale to billions of cells and return results for common spreadsheet operations within seconds. It does so via on-demand loading of spreadsheet data.
Like traditional spreadsheet software, DataSpread supports standard spreadsheet book and sheet operations like Load, Rename, Delete, and Import (via XLS and XLSX, and CSV). Any updates to the spreadsheets are automatically saved.
Like traditional spreadsheet software, DataSpread supports the use of 225+ spreadsheet functions, along with formatting and styling operations. It also supports row and column operations like insert, delete, cut, copy, and paste; during insertion and deletion, formulae are updated as is the case in traditional spreadsheet software.
It supports all these operations while scaling to arbitrarily large spreadsheets.
In future releases, DataSpread will support SQL on the spreadsheet frontend, along with other relational algebra-based interactions. It will also support joint formula evaluation and optimization.
Key Design Innovations
- DataSpread employs a flexible hybrid data model to represent spreadsheet data within a database.
- DataSpread uses positional indexing techniques to both locate data by position, and keep it up-to-date as the data is updated.
- DataSpread also employs a LRU caching mechanism to retrieve and keep in memory data from the database on demand.
- DataSpread also employs speculative fetching to fetch additional data beyond the user's current spreadsheet window.
Setup Instructions:
You can directly use DataSpread via our cloud-hosted site.
To host DataSpread locally you can either use one of the pre-build war files, available here, or build the war file yourself from the source.
Required Software
- Apache Ant >= 1.6
- Java Platform (JDK) >= 8
- PostgreSQL >= 9.5
- PostgreSQL JDBC driver - 9.4.1208
- Apache Tomcat >= 8.5.4
- Apache Maven >= 3.5.0
Building Instructions (To generate a war file)
-
Clone the DataSpread repository. Alternatively, you can download the source as a zip or tar.gz.
-
Use maven to build the
war
file using the following command. After the build completes the war is available atwebapp/target/DataSpread.war
.mvn clean install
Deploying DataSpread locally.
-
Install PostgreSQL database. Postgres.app is a quick way to get PostgreSQL working on Mac. For other operating systems check out the guides here.
-
Create a database and an user who has access to the database. Note the database name, username and password. Typically when you have PostgreSQL installed locally the password is blank.
-
Install Apache Tomcat. You can use the guide here. Make a note of the directory where tomcat is installed. This is known as
TOMCAT_HOME
in all documentation. -
Update the Tomcat configuration. You need to update the following two files, which are present in
conf
folder underTOMCAT_HOME
folder.web.xml
by adding the following text at the end of the file before the closing XML tag.
<listener> <listener-class>org.model.DBHandler</listener-class> </listener>
context.xml
by adding the following text at the end of the file before the closing XML tag.
<Resource name="jdbc/ibd" auth="Container" type="javax.sql.DataSource" driverClassName="org.postgresql.Driver" url="jdbc:postgresql://127.0.0.1:5432/<database_name>" username="<username>" password="<password>" maxTotal="20" maxIdle="10" maxWaitMillis="-1" defaultAutoCommit="false" accessToUnderlyingConnectionAllowed="true"/>
Replace
<database_name>
,<username>
and<password>
with your PostgreSQL's database name, user name and password respectively. -
Copy
postgresql-9.4.1208
(Download from here) tolib
folder underTOMCAT_HOME
. It is crucial to have the exact version of this file. -
Deploy the war file within Tomcat. This can be done via Tomcat's web interface or by manually copying the war file in the
webapps
folder underTOMCAT_HOME
. -
Now you are ready to run the program. Visit the url where Tomcat is installed. It will be typically http://localhost:8080/DataSpread_war/ for a local install.
License
MIT