Andy Jefferson edited this page Apr 11, 2014 · 6 revisions

This page provides a discussion area for implementation of the Cassandra store plugin.

Field-Column mapping

All current code assumes that any field (apart from embedded PCs) will have a single column representing it in the CQL "table". It could be argued that we should make this more flexible (as the RDBMS mapping code does) so that, for example, a Point type could be mapped to 2 Cassandra columns (x, y). Comments?
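To make the Point idea concrete, here is a minimal sketch of a field mapped to two columns. Everything in it is hypothetical: the Point class, the PointColumnMapping helper and the "field_x"/"field_y" column naming are made up for illustration and are not part of the plugin.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in the plugin.
class PointColumnMapping {
    /** Simple value type that we would like to spread over two columns. */
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Field -> columns: one map entry per column (assumed naming: field_x, field_y). */
    static Map<String, Object> toColumns(String fieldName, Point p) {
        Map<String, Object> cols = new LinkedHashMap<>();
        cols.put(fieldName + "_x", p.x);
        cols.put(fieldName + "_y", p.y);
        return cols;
    }

    /** Columns -> field: rebuild the Point from the two column values. */
    static Point fromColumns(String fieldName, Map<String, Object> row) {
        int x = (Integer) row.get(fieldName + "_x");
        int y = (Integer) row.get(fieldName + "_y");
        return new Point(x, y);
    }

    public static void main(String[] args) {
        Map<String, Object> cols = toColumns("location", new Point(3, 4));
        Point back = fromColumns("location", cols);
        System.out.println(cols + " -> " + back.x + "," + back.y);
    }
}
```

The cost of this flexibility is that every place that currently assumes "one field, one column" (schema generation, INSERT/UPDATE, query compilation) would need to cope with a list of columns per field.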

Null field values

Cassandra won't store null field values; see https://issues.apache.org/jira/browse/CASSANDRA-5648 . So what do we do with fields that are null? If they are put in as null in an INSERT/UPDATE then the cell is deleted. How does this impact any querying of that column? Should we put in a dummy value to represent null (and allow the user to define the default value)?
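If we went with the dummy-value option, the conversion could look roughly like the sketch below. The NullPlaceholder class and its method names are hypothetical, and it only handles String fields; it is meant to show the idea, not a proposed API.

```java
import java.util.Objects;

// Hypothetical sketch: converts between a nullable field value and a stored placeholder.
class NullPlaceholder {
    private final String placeholder;

    /** The placeholder would be the user-defined "default value" for null. */
    NullPlaceholder(String placeholder) {
        this.placeholder = Objects.requireNonNull(placeholder);
    }

    /** Value to write to the column: never null, so the cell is not deleted. */
    String toDatastore(String fieldValue) {
        return fieldValue == null ? placeholder : fieldValue;
    }

    /** Value to set in the field: the placeholder (or a missing cell) reads back as null. */
    String fromDatastore(String columnValue) {
        return placeholder.equals(columnValue) ? null : columnValue;
    }

    public static void main(String[] args) {
        NullPlaceholder np = new NullPlaceholder("__NULL__");
        System.out.println(np.toDatastore(null));
        System.out.println(np.fromDatastore("__NULL__"));
    }
}
```

The obvious drawback is that a genuine field value equal to the placeholder can no longer be distinguished from null, which is one reason to let the user choose it.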

Inheritance

The current idea is for each class to have its own table. Since Cassandra can't "join", having the data for an object stored in multiple places would imply multiple SELECTs to retrieve the data, hence inefficiency. So the table for a class will have columns for the class and any superclasses. Obviously a user could define other classes up the inheritance tree to have the same table name, so schema generation needs to be sufficiently robust to add any extra columns (where not already in the table that was created for the root class, for example).
For the same reason Secondary Tables aren't likely to be supported any time soon.
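As a rough illustration of "columns for the class and any superclasses" plus the "add any extra columns" step, the sketch below walks the class hierarchy with reflection. The SingleTableColumns helper, the example classes and the lower-cased column naming are all assumptions for illustration; the real plugin would work from DataNucleus metadata rather than raw reflection.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: derive the columns of a class's table from the class and its superclasses.
class SingleTableColumns {
    static class Person { String name; }
    static class Employee extends Person { long salary; }

    /** Column names for the class and every superclass below Object. */
    static List<String> columnsFor(Class<?> cls) {
        List<String> columns = new ArrayList<>();
        for (Class<?> c = cls; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                // Skip static and compiler-generated fields; they are not persistent state.
                if (f.isSynthetic() || Modifier.isStatic(f.getModifiers())) {
                    continue;
                }
                columns.add(f.getName().toLowerCase());
            }
        }
        return columns;
    }

    /** Columns that schema generation would still have to add to an existing table. */
    static List<String> missingColumns(Class<?> cls, Set<String> existingColumns) {
        List<String> missing = new ArrayList<>(columnsFor(cls));
        missing.removeAll(existingColumns);
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(columnsFor(Employee.class));
    }
}
```

So if the table was first created for Person (one column, "name") and Employee shares that table name, schema generation would only need to add "salary".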

Connection

The default is a Session per PMF/EMF, since Session is thread-safe. The user can set the persistence property "datanucleus.cassandra.sessionPerManager" to true to get a Session per PM/EM instead. Cassandra logs warnings about generating a PreparedStatement that has been created before (the same CQL). Since we create the PreparedStatement from a Session object, I would assume that it cannot be used across Sessions, but even after adding caching of PreparedStatements per Session we still seem to get this WARN message.
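For reference, the per-Session caching described here amounts to something like the sketch below. It is deliberately generic so that it runs without the driver: StatementCache is a hypothetical class, and the "preparer" function stands in for whatever call prepares a statement on the Session. It is not the plugin's actual cache and it does not explain the remaining WARN message.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch: one cache instance per Session, keyed by the CQL text.
class StatementCache<S> {
    private final ConcurrentMap<String, S> byCql = new ConcurrentHashMap<>();
    private final Function<String, S> preparer;

    /** The preparer is assumed to wrap the Session's "prepare this CQL" call. */
    StatementCache(Function<String, S> preparer) {
        this.preparer = preparer;
    }

    /** Prepares the CQL on first use and returns the same statement afterwards. */
    S get(String cql) {
        return byCql.computeIfAbsent(cql, preparer);
    }

    public static void main(String[] args) {
        StatementCache<String> cache = new StatementCache<>(cql -> "prepared[" + cql + "]");
        String first = cache.get("SELECT * FROM person WHERE id = ?");
        String second = cache.get("SELECT * FROM person WHERE id = ?");
        System.out.println(first == second);
    }
}
```

With Session per PM/EM there would be one such cache per manager, so the same CQL would still be prepared once per Session, which may be relevant to the warning.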

Transactions

Cassandra doesn't support transactions as such. Some people have ideas of how to mimic them, but the initial plan is for any JDO/JPA Transaction operation to effectively be a no-op (i.e. it is ignored by the runtime). In terms of "optimistic transactions" we can still do a version check before applying any change (when changes are flushed in commit/flush).
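The version check could be sketched as below, with an in-memory map standing in for the Cassandra table. The OptimisticVersionCheck class is hypothetical, and returning false is a simplification; the runtime would presumably raise an optimistic verification failure instead. Note that against a real datastore the read and the write are two separate operations, so this check narrows the window for lost updates but does not close it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the map plays the role of the table, keyed by object id.
class OptimisticVersionCheck {
    static final class Row {
        String value;
        long version;

        Row(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Row> table = new HashMap<>();

    void insert(String key, String value) {
        table.put(key, new Row(value, 1));
    }

    String valueOf(String key) {
        return table.get(key).value;
    }

    /**
     * Applies the change only if the stored version is the one the caller loaded.
     * Returns false when another writer has bumped the version in the meantime.
     */
    boolean update(String key, String newValue, long expectedVersion) {
        Row row = table.get(key);
        if (row == null || row.version != expectedVersion) {
            return false;
        }
        row.value = newValue;
        row.version = expectedVersion + 1;
        return true;
    }

    public static void main(String[] args) {
        OptimisticVersionCheck store = new OptimisticVersionCheck();
        store.insert("p1", "Fred");
        System.out.println(store.update("p1", "Freddy", 1)); // version matches
        System.out.println(store.update("p1", "Frederick", 1)); // stale version
    }
}
```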

Queries

The user can only query using indexed fields/properties. Are we going to add indexes automatically, or simply not evaluate the use of a non-indexed field when processing the query in-datastore, and instead do that part in-memory?
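The second option (evaluate what we can in-datastore, the rest in-memory) could be sketched like this. It only handles equality conditions that are ANDed together, and QuerySplit and its methods are hypothetical names rather than anything in the query compilation code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: split "column = value" conditions by whether the column is indexed.
class QuerySplit {
    /** Conditions on indexed columns: these could go into the CQL WHERE clause. */
    static Map<String, Object> datastorePart(Map<String, Object> conditions, Set<String> indexed) {
        Map<String, Object> part = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : conditions.entrySet()) {
            if (indexed.contains(e.getKey())) {
                part.put(e.getKey(), e.getValue());
            }
        }
        return part;
    }

    /** Conditions on non-indexed columns: these are left for in-memory evaluation. */
    static Map<String, Object> inMemoryPart(Map<String, Object> conditions, Set<String> indexed) {
        Map<String, Object> part = new LinkedHashMap<>(conditions);
        part.keySet().removeAll(indexed);
        return part;
    }

    /** Applies the leftover conditions to the rows returned by the datastore. */
    static List<Map<String, Object>> filterInMemory(List<Map<String, Object>> rows,
                                                    Map<String, Object> conditions) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            boolean matches = true;
            for (Map.Entry<String, Object> e : conditions.entrySet()) {
                if (!Objects.equals(row.get(e.getKey()), e.getValue())) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

The trade-off is that when none of the conditions are on indexed columns, the datastore part is empty and the whole table has to be pulled back and filtered in-memory.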

Queries : CQL

What form should the result take? Perhaps it should be List<Map<String, Object>> with each row being a Map of column value keyed by the column name.
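To show what the proposed shape would look like to a caller, here is a small sketch that builds a List<Map<String, Object>> from column names and raw row values. CqlResultShape is a hypothetical helper, and the Object[] rows stand in for whatever the driver actually returns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one Map per row, column value keyed by column name.
class CqlResultShape {
    static List<Map<String, Object>> toResult(List<String> columnNames, List<Object[]> rows) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Object[] values : rows) {
            // LinkedHashMap keeps the columns in SELECT order when iterating.
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 0; i < columnNames.size(); i++) {
                row.put(columnNames.get(i), values[i]);
            }
            result.add(row);
        }
        return result;
    }
}
```

A caller would then read a value with something like result.get(0).get("name"), at the cost of casting from Object.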

Value Generation

GitHub master provides the "increment" strategy. A user also has the in-memory options of "uuid-string", "uuid-hex" and "uuid". Not sure whether Cassandra has any concept of the datastore assigning a value on insert (like MySQL "auto-increment")?