Insert Strategy

DuyHai DOAN edited this page Sep 6, 2016 · 8 revisions

Rationale

The Manager.crud().insert() operation generates INSERT statements with all fields of the entity. If some fields are not set (i.e. their value is null), Achilles simply sets the corresponding column to null in Cassandra.

This sounds reasonable but has a significant impact on performance. In CQL semantics, setting a column to null means deleting it, which creates a tombstone for that column.

Let's say you have a User entity with around 10 fields representing user details. On account creation, not all of them are provided; probably only the login/name/password fields are filled.

When inserting this user in Cassandra you'll create 3 columns for the login/name/password fields and around 7 tombstones. Later on during compaction Cassandra will need to clean up those 7 tombstones.

Why not simply skip the null fields in the first place, and so avoid creating useless tombstones?

The insert strategies below address this issue.
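To make the tombstone count concrete, here is a minimal, self-contained sketch of how a mapper could build the INSERT text under each strategy. It is not Achilles code: the class InsertStatementSketch, its local InsertStrategy enum, the helper methods and the extra email and country fields are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustrative only: this is NOT how Achilles is implemented.
public class InsertStatementSketch {

    // Local stand-in for the two strategies described on this page.
    enum InsertStrategy { ALL_FIELDS, NOT_NULL_FIELDS }

    // Builds the CQL text for one entity, skipping nulls when asked to.
    static String buildInsert(String table, Map<String, Object> fields, InsertStrategy strategy) {
        StringJoiner columns = new StringJoiner(",");
        StringJoiner values = new StringJoiner(",");
        for (Map.Entry<String, Object> field : fields.entrySet()) {
            if (strategy == InsertStrategy.NOT_NULL_FIELDS && field.getValue() == null) {
                continue; // column left out of the statement, so no tombstone
            }
            columns.add(field.getKey());
            values.add(literal(field.getValue()));
        }
        return "INSERT INTO " + table + "(" + columns + ") VALUES(" + values + ")";
    }

    // Counts the columns that would be written as null, one tombstone each.
    static long tombstones(Map<String, Object> fields, InsertStrategy strategy) {
        if (strategy == InsertStrategy.NOT_NULL_FIELDS) {
            return 0;
        }
        return fields.values().stream().filter(value -> value == null).count();
    }

    // Renders a value as a CQL literal (strings quoted, quotes doubled).
    static String literal(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof String) {
            return "'" + ((String) value).replace("'", "''") + "'";
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("id", 10);
        user.put("login", "johndoe");
        user.put("name", "John DOE");
        user.put("password", "iamjohndoe!");
        user.put("age", null);
        user.put("email", null);   // hypothetical field
        user.put("country", null); // hypothetical field

        for (InsertStrategy strategy : InsertStrategy.values()) {
            System.out.println(strategy + ": " + buildInsert("user", user, strategy));
            System.out.println("  tombstones: " + tombstones(user, strategy));
        }
    }
}
```

Running main prints both statements side by side, which shows that the only difference is whether the null columns appear in the column list.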

Insert all fields

This is the default behavior for Achilles. Although it creates a lot of tombstones, this strategy has a major advantage with regard to data consistency.

Let's suppose you have the following sequence of code:

	
	manager
		.crud()
		.insert(new User(10,"johndoe","John DOE","iamjohndoe!",32))
		.execute();
		
	...
	manager
		.crud()
		.insert(new User(10,"johndoe","John DOE","iamjohndoe!"))
		.execute();

What we expect is that the second insert() operation wipes all the data from the first insert(), and that is the case with the default insert all fields strategy.

Indeed, the second insert() generates an INSERT INTO user(id,login,name,password,age,...) VALUES(10,'johndoe','John DOE','iamjohndoe!',null,null....) statement, which erases the previous value of the age field and keeps the data consistent.

Insert not null fields

This strategy only inserts the non-null fields of an entity. Taking the previous example:

	manager
		.crud()
		.insert(new User(10,"johndoe","John DOE","iamjohndoe!",32))
		.execute();
		
	...
	manager
		.crud()
		.insert(new User(10,"johndoe","John DOE","iamjohndoe!"))
		.execute();

The second insert() now generates an INSERT INTO user(id,login,name,password) VALUES(10,'johndoe','John DOE','iamjohndoe!') statement, overwriting the existing values for id/login/name/password but leaving the old age column intact.

If we fetch all data from Cassandra for this user, we end up with inconsistent data: the values from the second insert alongside the age from the first one. To avoid that, you need to issue a delete() operation before inserting again.
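The stale column and the delete-then-insert workaround can be reproduced with a toy in-memory model. This is only a sketch of the upsert semantics, under the assumption that a row behaves like a map of columns; StaleColumnSketch and all of its methods are hypothetical names, and no Achilles or Cassandra API is involved.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a toy model of upserts, not Achilles or Cassandra code.
public class StaleColumnSketch {

    // Toy stand-in for one table: primary key -> (column name -> value).
    private final Map<Integer, Map<String, Object>> rows = new HashMap<>();

    // Upsert that skips null values, like the insert not null fields strategy.
    void insertNotNullFields(int id, Map<String, Object> fields) {
        Map<String, Object> row = rows.computeIfAbsent(id, key -> new LinkedHashMap<>());
        fields.forEach((column, value) -> {
            if (value != null) {
                row.put(column, value); // columns holding null are never touched
            }
        });
    }

    // Removes the whole row, as a delete issued before re-inserting would.
    void delete(int id) {
        rows.remove(id);
    }

    Map<String, Object> read(int id) {
        return rows.getOrDefault(id, Collections.emptyMap());
    }

    // Helper that builds the field map of a user; age may be null.
    static Map<String, Object> user(String login, String name, String password, Integer age) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("login", login);
        fields.put("name", name);
        fields.put("password", password);
        fields.put("age", age);
        return fields;
    }

    public static void main(String[] args) {
        StaleColumnSketch store = new StaleColumnSketch();
        store.insertNotNullFields(10, user("johndoe", "John DOE", "iamjohndoe!", 32));
        store.insertNotNullFields(10, user("johndoe", "John DOE", "iamjohndoe!", null));
        System.out.println("without delete: " + store.read(10)); // age from the first insert survives

        store.delete(10);
        store.insertNotNullFields(10, user("johndoe", "John DOE", "iamjohndoe!", null));
        System.out.println("with delete:    " + store.read(10)); // no age column any more
    }
}
```

Note that the workaround trades one problem for another: deleting the row first also writes a tombstone, so it only pays off when overwrites of existing rows are rare.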

Configuration

The insert strategy can be customized for each entity, or globally using the @CompileTimeConfig annotation. To choose a strategy for a given entity, annotate it with @Strategy(insert = ...). There are 2 possible values:

  • info.archinnov.achilles.type.InsertStrategy.ALL_FIELDS
    • Pros: data consistency
    • Cons: may generate a lot of tombstones
  • info.archinnov.achilles.type.InsertStrategy.NOT_NULL_FIELDS
    • Pros: does not create useless tombstones
    • Cons: may introduce data inconsistency when overwriting an existing partition with new values

Insert Strategy priority

Priority (ascending order) | Description
1 (lowest priority)        | Global insert strategy defined at compile time on @CompileTimeConfig
2                          | Locally on each entity, using the @Strategy annotation
3 (highest priority)       | At runtime, using the .withInsertStrategy() method on the manager.crud().insert() DSL
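The priority order above amounts to "the most specific setting wins". A minimal sketch of that resolution rule follows; it assumes each level is either set or absent (null) and that ALL_FIELDS applies when nothing is configured, as stated earlier on this page. The class InsertStrategyResolution, its local enum and the resolve() method are hypothetical and do not reflect how Achilles resolves the strategy internally.

```java
// Illustrative only: the resolution rule from the priority table, not Achilles code.
public class InsertStrategyResolution {

    // Local stand-in for the two strategies described on this page.
    enum InsertStrategy { ALL_FIELDS, NOT_NULL_FIELDS }

    // Each argument is null when that level does not configure a strategy.
    static InsertStrategy resolve(InsertStrategy global, InsertStrategy entity, InsertStrategy runtime) {
        if (runtime != null) {
            return runtime;                // 3: set on the insert DSL at runtime
        }
        if (entity != null) {
            return entity;                 // 2: set on the entity
        }
        if (global != null) {
            return global;                 // 1: set globally at compile time
        }
        return InsertStrategy.ALL_FIELDS;  // default when nothing is configured
    }

    public static void main(String[] args) {
        // Entity-level setting overrides the global one.
        System.out.println(resolve(InsertStrategy.ALL_FIELDS, InsertStrategy.NOT_NULL_FIELDS, null));
        // Runtime setting overrides everything else.
        System.out.println(resolve(InsertStrategy.NOT_NULL_FIELDS, InsertStrategy.NOT_NULL_FIELDS, InsertStrategy.ALL_FIELDS));
    }
}
```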
