Extract into multiple, more generic, libraries? #72

Open
eprothro opened this issue Nov 20, 2015 · 10 comments

@eprothro
Contributor

@bsbodden and @hsgubert: I'm going to switch gears to some other things for a bit, but I wanted to get your thoughts on the following for when I get some time to come back to the topic of migrations in a few weeks.

It seems to me that this gem has three component parts:

  • configuration and session management
  • rails migrations
  • query helpers

It also seems to me that every application (not just rails applications) needs the first problem solved.

What would y'all think about creating a separate cassandra-configuration gem that could be responsible for management of cassandra.yml and client/cluster/session management best practices?

Short term, cassandra_migrations could depend on and utilize the cassandra-configuration library. I think this simplifies maintenance of this library and benefits the community who are mostly baking their own solutions to the cassandra-configuration problems (that library would allow them to solve the problem in a lightweight and generic way).

Longer term, I think it would make sense to have something like

  • cassandra-configuration
    • Standalone configuration and session management
    • Depends on cassandra-driver
  • cassandra-migrations
    • Configurable versioned migration management and DSL
    • Depends on cassandra-configuration
  • cassandra-migrations-rails
    • Rails generators, tasks, and helpers for cassandra migrations the Rails Way
    • Depends on cassandra-migrations and rails
  • cassandra-queries
    • Query abstraction layer for Cassandra Ruby driver
    • Depends on cassandra-configuration
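For illustration, the proposed dependency chain could be written down with RubyGems' `Gem::Specification`. This is only a sketch: the gem names come from the list above, while the `sketch_spec` helper, the version, and the summaries are placeholders, and none of these specs describe a published gem.

```ruby
require "rubygems"

# Hypothetical helper: builds an in-memory spec for one of the proposed gems.
# The version and author are placeholders, not real release data.
def sketch_spec(name, summary, dependencies)
  Gem::Specification.new do |spec|
    spec.name    = name
    spec.version = "0.0.1"
    spec.summary = summary
    spec.authors = ["placeholder"]
    dependencies.each { |dep| spec.add_dependency(dep) }
  end
end

# The four proposed gems and what each would depend on.
PROPOSED_SPECS = {
  "cassandra-configuration" => sketch_spec(
    "cassandra-configuration",
    "Standalone configuration and session management",
    ["cassandra-driver"]
  ),
  "cassandra-migrations" => sketch_spec(
    "cassandra-migrations",
    "Configurable versioned migration management and DSL",
    ["cassandra-configuration"]
  ),
  "cassandra-migrations-rails" => sketch_spec(
    "cassandra-migrations-rails",
    "Rails generators, tasks, and helpers",
    ["cassandra-migrations", "rails"]
  ),
  "cassandra-queries" => sketch_spec(
    "cassandra-queries",
    "Query abstraction layer",
    ["cassandra-configuration"]
  )
}.freeze

# Names of the runtime dependencies declared by one proposed gem.
def dependency_names(gem_name)
  PROPOSED_SPECS.fetch(gem_name).dependencies.map(&:name)
end
```

Under this layout only cassandra-migrations-rails would pull in rails, so a non-Rails application could depend on cassandra-configuration alone.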

Thoughts?

cc: #52

@hsgubert
Owner

This is a really tricky question. There is a trade-off between coherence and simplicity here. When this gem was created, the first users and I just wanted a good way to integrate Cassandra into a Rails app. But later many people said they wanted to use a lot of the functionality in non-Rails apps, so this is a real need people have.

On the other hand, we also want something easy and appealing for people to try out. When you're looking for a gem, the last thing you want is a complex mosaic that you don't fully understand. I mean, it's one thing for rspec (which everyone knows) to be split into several gems, and another for a more obscure/niche gem to do that.

I really don't know the answer and would love to hear other opinions. Currently my opinion is that we should break the gem if this brings benefits to the end user. For example, a version for rails and for non-rails apps is a benefit, as it allows people to use the gem in non-rails apps.

On the other hand, separating configuration/session management from migrations would be good for our internal organization, but can't we simply enforce this organization in our code? Thinking as the end user, I think I would rather have a single gem and simply ignore migrations if I don't want them than have to understand the boundaries of 2 sub-gems.

That being said, it would be great to hear other arguments.

@eprothro
Contributor Author

I agree with your hesitation. I'm trying to make sure my issue isn't just naming (e.g. if this were named less specifically than cassandra_migrations, would I have an issue with the multiple responsibilities?).

If the "client" class was designed in a way that the session could be used well outside the context of a migration, I could probably get behind combining configure/session and migrations.

Long term, I feel most strongly about the queries being in another library.

Short term I feel most strongly about allowing ruby users to have a configure/session solution that is designed to be clear and easy to use in their application code.

Thanks for the convo, will keep thinking over the weekend.

@bsbodden
Collaborator

@eprothro @hsgubert I agree with the separation. I was thinking about that. I even have some partial work on a querying + ORMish type of library. I think that we should explore the capabilities of the datastax driver deeper and then see where the holes are and whether the different libraries would have enough to do to be worth writing :-)

@hsgubert
Owner

Right. I also feel very strongly about separating the querying and ORM. I guess the only reason why this was not done yet is because the current querying/ORM capability is so little that it doesn't justify a gem.

@eprothro I also agree with your short term goal, perhaps we could start with that? Extracting a session/configuration manager that works without rails.

@eprothro
Contributor Author

Sounds like the right first step to me.

Next are decisions around naming and interface for a user (unique namespace vs. sharing Cassandra namespace vs. adding functionality to Cassandra module via monkey patch). Will think about these over the Thanksgiving holiday.

Here are a few scenarios where I assume we're all wondering what the interface should look like.

Adding a session attribute, transparently

class SomeQueryClass
  # include Cassandra config/session management methods    < ------ here

  def fetch
    session.execute(some_cql)
    ...
  end
end

Adding a session attribute, with configuration

class SomeDataMapperResourceClass
  # include Cassandra config/session management methods    < ------ here

  # specify which keyspace the session for this class should be connected to < ------ here

  def self.find_by_username(username, opts={})
    session.execute(some_cql)
    ...
  end
end

Getting the cluster config

# some_task.rb
class SomeTask

  def cluster_config
    # get the current cluster connection options     < ------ here
  end
end

Managing the cluster/sessions

# some_initializer_file.rb
SomeForkedProcess.after_fork do
  # Establish unique connection for this process fork      < ------ here
end
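One way the first two scenarios could look is a mixin that exposes `session` on the class and its instances. This is a minimal sketch, not an agreed interface: the `CassandraSessionSketch` module, its `connector` hook, the `keyspace` macro, and the `FakeSession` stand-in are all hypothetical names invented for illustration. Sessions are memoized per process ID, which is one possible answer to the forked-process scenario, since a forked child would build its own session on first use.

```ruby
# Hypothetical module; nothing here is an existing cassandra_migrations API.
module CassandraSessionSketch
  class << self
    # A callable that builds a session for a keyspace. In a real application
    # it might wrap Cassandra.cluster(...).connect(keyspace) from
    # cassandra-driver; here it is injected so the sketch runs without a cluster.
    attr_accessor :connector

    # Returns one session per (process, keyspace) pair, building it on first use.
    def session_for(keyspace)
      sessions[[Process.pid, keyspace]] ||= connector.call(keyspace)
    end

    private

    def sessions
      @sessions ||= {}
    end
  end

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Sets the keyspace when given a name, and returns the current one.
    def keyspace(name = nil)
      @keyspace = name if name
      @keyspace
    end

    def session
      CassandraSessionSketch.session_for(keyspace)
    end
  end

  # Instance-level access delegates to the class-level session.
  def session
    self.class.session
  end
end

# Stand-in for a driver session so the example is self-contained.
FakeSession = Struct.new(:keyspace) do
  def execute(cql)
    "#{keyspace}: #{cql}"
  end
end

CassandraSessionSketch.connector = ->(keyspace) { FakeSession.new(keyspace) }

class SomeQueryClass
  include CassandraSessionSketch
  keyspace "my_app_development"

  def fetch
    session.execute("SELECT * FROM users")
  end
end
```

Injecting the connector keeps the mixin independent of Rails and of any particular configuration source, which is the property the short-term goal asks for.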

@sstgithub
Contributor

@eprothro I think the separations make a lot of sense, but can you clear up what the scope of each gem would be? Would creating the keyspace, updating keyspace settings like RF, updating table settings all be part of cassandra-configuration (or would you have to do another migration using cassandra-migrations to update table settings)? Also, I assume since consistency level can be set on a cluster or on individual reads/writes that would be part of both cassandra-queries and cassandra-configuration?

@eprothro
Contributor Author

@sstgithub Great questions.

I think the configuration gem's responsibility would be managing cluster and session configuration for multiple environments. These cluster and session objects would be instantiated cassandra-driver classes.
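As a rough sketch of what "configuration for multiple environments" might mean in practice, the snippet below reads per-environment settings from YAML text. The key names, host names, and the `cluster_options` helper are assumptions for illustration, not the current cassandra.yml schema.

```ruby
require "yaml"

# Hypothetical cassandra.yml contents; keys and hosts are illustrative only.
EXAMPLE_CONFIG = <<~CONFIG
  development:
    hosts:
      - 127.0.0.1
    port: 9042
    keyspace: my_app_development
  production:
    hosts:
      - cass1.example.com
      - cass2.example.com
    port: 9042
    keyspace: my_app_production
CONFIG

# Returns the settings for one environment with symbolized top-level keys,
# raising KeyError when the environment is not defined.
def cluster_options(yaml_text, environment)
  config = YAML.safe_load(yaml_text)
  env_config = config.fetch(environment) do
    raise KeyError, "no Cassandra configuration for #{environment.inspect}"
  end
  env_config.transform_keys(&:to_sym)
end
```

A caller would pick the environment name however its framework exposes it, then hand the resulting hash to whatever builds the cluster and session objects.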

In that regard, I would expect to be able to tune default consistency of requests to the cluster/dc connection with the configuration gem by itself.

I don't think a configuration gem knows anything about a query, directly. So, tuning consistency for a query would be the client's responsibility, or a queries gem's responsibility if the client chose to use that.
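To make that split concrete, here is a small sketch in which a configured default consistency is applied to every request unless the caller overrides it for one query. The `DefaultedSession` wrapper and `RecordingSession` stand-in are hypothetical; with the real driver the cluster-level consistency option may already cover the default, so the wrapper exists only to make the layering visible.

```ruby
# Hypothetical wrapper: the configuration layer owns the default, the caller
# (or a queries library) owns any per-query override.
class DefaultedSession
  def initialize(session, default_consistency:)
    @session = session
    @default_consistency = default_consistency
  end

  # Per-call options win over the configured default.
  def execute(statement, options = {})
    @session.execute(statement, { consistency: @default_consistency }.merge(options))
  end
end

# Stand-in session that records what it was asked to run.
class RecordingSession
  attr_reader :calls

  def initialize
    @calls = []
  end

  def execute(statement, options = {})
    @calls << [statement, options]
    :ok
  end
end

recorder = RecordingSession.new
session = DefaultedSession.new(recorder, default_consistency: :local_quorum)

session.execute("SELECT * FROM users")                     # uses the default
session.execute("SELECT * FROM audit", consistency: :one)  # per-query override
```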

The table settings one is an interesting question. I would expect cassandra.yml to describe connection settings. In that regard, something like replication_factor doesn't exactly belong. However, changing a replication factor probably doesn't make sense as a traditional database (read: schema) migration, since it is not shared across environments the way a table name or column type is.

Currently, I think replication_factor and class are only being used for the cassandra:create task. For now, I would, personally, be ok with removing keyspace options from the cassandra.yml and having this create task simply use the defaults (e.g. simple, 1) and expect that users in production are managing non-schema properties competently.
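A create task that falls back to those defaults might build its CQL roughly like this. The `create_keyspace_cql` helper and its name check are assumptions for illustration, not the gem's current implementation; the check is deliberately conservative and is not meant to restate Cassandra's exact naming rules.

```ruby
# Hypothetical helper: builds a CREATE KEYSPACE statement, defaulting to
# SimpleStrategy with a replication factor of 1.
def create_keyspace_cql(keyspace, replication_class: "SimpleStrategy", replication_factor: 1)
  # Conservative guard so the name can be interpolated safely.
  unless keyspace.match?(/\A[a-zA-Z][a-zA-Z0-9_]*\z/)
    raise ArgumentError, "invalid keyspace name: #{keyspace.inspect}"
  end

  "CREATE KEYSPACE IF NOT EXISTS #{keyspace} " \
    "WITH replication = {'class': '#{replication_class}', " \
    "'replication_factor': #{Integer(replication_factor)}}"
end
```

Production keyspaces would then be created or altered outside the task, with whatever replication settings the operators choose.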

My question would be "how are people currently managing non-schema database configuration mutations?" (e.g. keyspace properties like RF and Durable Writes, table options like compaction and caching).

I assume the answer is "manually" (e.g. tweaking settings and, prayerfully, documenting changes somewhere). If that's the case, I'd love to discuss and come up with a better answer eventually, but I don't know if that is the scope of this initial change.

Thoughts?

@sstgithub
Contributor

@eprothro Apologies for the late reply.

I would say that replication_factor should remain in cassandra.yml as I see cassandra.yml as handling both connection and all other initial settings for the keyspace. Also, I think it might make more sense to use well-separated modules in one gem instead of multiple gems because there will be quite a bit of overlap with the DSL (for instance, defining default consistency level initially in the configuration module and then being able to use a different consistency level with each query in the query module)

As for the non-schema db config mutations, I don't know how others using this gem manage things, but we do currently manage most of those manually. It would be great if they were handled by this gem, but I agree that's outside the scope of this change.

@eprothro
Contributor Author

eprothro commented Jan 4, 2016

@sstgithub I've been playing with this, and I agree that a single repo with well-organized and loosely coupled modules is best. My current opinion is that this repo could still contain multiple gems, for those who want to use them in isolation (similar to rails with activesupport, activemodel, etc.).

Regarding db config mutations, we handle these in a separate repo for our infrastructure configuration. I think this is appropriate for now, and I think just mentioning that this is a need that exists and is separate from schema mutations supported by the gem is probably responsible enough.

@eprothro
Contributor Author

@sstgithub @hsgubert @bsbodden Wanted to update y'all on this.

Over the last year we've bootstrapped our way into production with Cassandra (went live in December). Along the way, it turns out I've created the libraries that we've discussed in this thread.

https://github.com/eprothro/cassie
https://github.com/eprothro/cassie-rails

In no way am I trying to be "that guy" and say "hey, let's all just use these!".

For now, I just wanted to let y'all know about them, mention that they are in line with a lot of what we've discussed here, and working well for us. I'm 100% open to any comments, questions, nits, or discussions.


4 participants