A command-line tool to build a queue of specs and run them in parallel on multiple machines/processes with RSpec.
A really large spec suite (hours) takes significantly less time to execute when spread across multiple processes.
The naive approach is to split spec files into batches and run one process per batch, but uneven distribution causes spikes in total wall-clock time. Distrib's queue feature dynamically serves spec files one by one over the network, rather than pre-splitting them into batches.
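The difference between the two approaches can be illustrated with a small simulation. This is a sketch with made-up durations, not a measurement of rspec-distrib itself; `batched_wall_clock` and `queued_wall_clock` are hypothetical helper names.

```ruby
# Simulates wall-clock time for two ways of distributing spec files.
# Durations are made-up numbers in seconds, not measurements.

# Static approach: deal files out round-robin up front, one batch per worker.
def batched_wall_clock(durations, workers)
  batches = Array.new(workers) { [] }
  durations.each_with_index { |d, i| batches[i % workers] << d }
  batches.map(&:sum).max
end

# Dynamic approach: each worker takes the next file as soon as it is free.
def queued_wall_clock(durations, workers)
  busy_until = Array.new(workers, 0)
  durations.each do |d|
    idx = busy_until.index(busy_until.min) # the worker that frees up first
    busy_until[idx] += d
  end
  busy_until.max
end

durations = [60, 1, 50, 1, 40, 1, 30, 1]
puts batched_wall_clock(durations, 2) # 180: one worker gets every slow file
puts queued_wall_clock(durations, 2)  # 92: the load evens out
```

With this input, round-robin batching leaves one worker idle for most of the build, while the queue keeps both workers busy until the end.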
In the cloud, machines are either ephemeral, i.e. short-lived and liable to be preempted at any moment, or significantly more expensive, making it either impossible or expensive to run specs on a single machine. The goal is to reduce total build time to minutes, and even running on the beefiest machine doesn't help much to achieve that.
rspec-distrib is a relatively simple client-server (Leader-Worker) wrapper on top of RSpec, which dynamically serves spec file names to clients (workers) that load and execute them one by one, and aggregates the results in the same format that a regular rspec run would produce.
It is possible to run rspec-distrib on a local machine, or run the Leader locally and Workers remotely, or run both Leader and Workers on CI, depending on your needs. There are no tangible limitations on the number of Workers. Workers can run side-by-side using parallel_tests, sharing external services such as a database, Redis, Memcached, or Elasticsearch that your integration tests might need.
The queue is fault-tolerant: when a worker machine goes down or experiences a network partition, the spec file being executed is returned to the queue on timeout, and later passed to another worker.
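The timeout behaviour described above could be modelled roughly as follows. This is a minimal sketch, not the gem's implementation: `LeasedQueue` and its methods are hypothetical names, and time is passed in explicitly to keep the example deterministic.

```ruby
# A queue that hands out files as leases; a lease that outlives its timeout
# makes the file available again.
class LeasedQueue
  def initialize(files, timeout:)
    @pending = files.dup
    @leases = {} # file => time the lease was handed out
    @done = {}   # file => first reported result
    @timeout = timeout
  end

  # Returns the next file to run, or nil when nothing is available right now.
  def lease(now)
    requeue_expired(now)
    file = @pending.shift
    @leases[file] = now if file
    file
  end

  # Records a result; the first report for a file wins, later ones are ignored.
  def report(file, result)
    return false if @done.key?(file)

    @leases.delete(file)
    @pending.delete(file)
    @done[file] = result
    true
  end

  def finished?
    @pending.empty? && @leases.empty?
  end

  private

  # Moves files whose lease has timed out back to the pending list.
  def requeue_expired(now)
    expired = @leases.select { |_file, leased_at| now - leased_at >= @timeout }.keys
    expired.each do |file|
      @leases.delete(file)
      @pending << file
    end
  end
end

queue = LeasedQueue.new(%w[a_spec.rb], timeout: 30)
queue.lease(0)  # "a_spec.rb": worker 1 takes it and then disappears
queue.lease(10) # nil: the file is still leased
queue.lease(30) # "a_spec.rb": timed out and handed to worker 2
```

A real implementation would also need synchronization, since several workers talk to the queue concurrently.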
Add to the Gemfile:
git 'git@github.com:toptal/test-distrib.git', branch: 'main' do
  gem 'distrib-core', require: false, group: [:test]
  gem 'rspec-distrib', require: false, group: [:test]
end
There is not much difference between running on a local machine and running across the network.
$ bundle exec rspec-distrib start | $ bundle exec rspec-distrib join 127.0.0.1
4386 files have been enqueued | .................*......F..._
Using seed 27792 |
The Leader's purpose is to serve spec file names one by one to workers, and to aggregate the results of running those specs with RSpec.
The following command:
- builds a queue of spec files
- starts a watchdog thread (see more about watchdog in the 3rd stage)
- exposes a Leader DRb server on all the network interfaces
rspec-distrib start
You can specify a seed, which will be used to randomise the order of examples on workers. If you omit it, a seed is generated automatically.
rspec-distrib start 12345
Once there are no more spec files left, the Leader drops all connections, and reports.
A Worker connects to the Leader, receives spec file names from it, and reports back the results.
The following command runs a worker:
rspec-distrib join leader_address
where leader_address is either an IP address or a domain name.
A Worker requests spec file names from the Leader (next_test_to_run) and reports the execution results (report_file) back to the Leader.
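The exchange between a Worker and the Leader can be sketched in-process as below. Only the method names next_test_to_run and report_file come from the description above; `SketchLeader`, `run_worker`, the result format, and the use of nil to signal an empty queue are assumptions made for illustration.

```ruby
# In-process sketch of the Leader/Worker exchange.
class SketchLeader
  attr_reader :reports

  def initialize(files)
    @queue = Queue.new # thread-safe FIFO from Ruby's core library
    files.each { |f| @queue << f }
    @reports = {}
    @lock = Mutex.new
  end

  # Returns the next spec file name, or nil when the queue is empty.
  def next_test_to_run
    @queue.pop(true) # non-blocking pop raises ThreadError when empty
  rescue ThreadError
    nil
  end

  # Stores the result for a file; the Mutex guards the shared Hash.
  def report_file(file, result)
    @lock.synchronize { @reports[file] = result }
  end
end

# Pulls files until the leader has none left; the block stands in for
# actually running the file with RSpec.
def run_worker(leader)
  while (file = leader.next_test_to_run)
    leader.report_file(file, yield(file))
  end
end

leader = SketchLeader.new(%w[spec/a_spec.rb spec/b_spec.rb spec/c_spec.rb])
workers = Array.new(2) do
  Thread.new { run_worker(leader) { |file| "ran #{file}" } }
end
workers.each(&:join)
puts leader.reports.size # 3: every file was reported exactly once
```

Over the network, an object like this would typically be exposed with `DRb.start_service(uri, leader)` and reached from a worker through `DRbObject.new_with_uri(uri)`; how rspec-distrib wires this up internally is not shown here.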
To start developing the gem, first make sure you can run the following commands without problems:
bundle install
bundle exec rspec spec
bundle exec rspec features
bundle exec rubocop
bundle exec yardoc --fail-on-warning
Tests under spec are unit tests, while features are integration tests.
Features can also be used for manual testing while developing. To run such a manual test, open the following directory in two separate console tabs/windows:
cd features/fixtures/specs/
Pick a feature set (a directory name) from features/fixtures/specs.
Assuming we want to run the passing feature set:
export RSPEC_DISTRIB_FOLDER=passing
Tune settings to have more time for manual tests:
export RSPEC_DISTRIB_FEATURES_TEST_TIMEOUT=5
export RSPEC_DISTRIB_FEATURES_FIRST_TEST_PICKED_TIMEOUT=30
Start the Leader:
bundle exec rspec-distrib start
Start a Worker in the second tab:
bundle exec rspec-distrib join 127.0.0.1
Workers send the example reports for a spec file to the Leader immediately after running the specs from that file. This means that if a worker dies, the Leader still has all of its previous reports.
rspec-distrib expects to find its configuration in the .rspec-distrib file, which is loaded if it exists. The configuration is expected to be a Ruby file.
ℹ️ See rspec/distrib/configuration.rb for the full list of options.
Override the default list of spec files:
RSpec::Distrib.configure do |config|
  config.tests_provider = -> {
    Dir.glob(['spec/**/*_spec.rb', 'engines/**/*_spec.rb'])
  }
end
Several types of issues may occur while executing specs on workers:
- Expectation errors (legitimate spec failures)
- Failures between spec executions (before/after blocks)
- A worker failing to start at all
rspec-distrib provides ways to handle such errors.
rspec-distrib can re-run specs that fail with certain exceptions:
RSpec::Distrib.configure do |config|
  config.error_handler.retryable_exceptions = ['Elasticsearch::Transport::Transport::Errors::ServiceUnavailable']
  config.error_handler.retry_attempts = 2
end
This means that any spec that fails because of Elasticsearch::Transport::Transport::Errors::ServiceUnavailable will be retried up to two times.
Leaving retryable_exceptions empty means that ANY exception triggers a retry.
rspec-distrib can be configured to fail the Leader when failures occur in workers on startup or in before/after blocks.
RSpec::Distrib.configure do |config|
  config.error_handler.fatal_worker_failures = ['NameError']
  config.error_handler.failed_workers_threshold = 2
end
This means that if a worker fails with, for example, a database error outside of a spec, the Leader does not stop and the run continues on the other workers.
Leaving fatal_worker_failures empty means that no error will fail the Leader.
You can specify your own object to handle failures. Here is the interface for it:
class MyErrorHandlingStrategy
  def retry_test?(file, example_groups, exception)
    # return true to retry the file
  end

  def ignore_worker_failure?(exception, context_description)
    # return true to ignore the exception
  end
end
And use it like this:
RSpec::Distrib.configure do |config|
  config.error_handler = MyErrorHandlingStrategy.new
end
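As an illustration of the interface above, here is one possible strategy. It is a sketch under assumptions: only the two method signatures come from this document, while the class name, the retry budget, the choice of "transient" error classes, and the assumption that `exception` is an exception object are made up for the example.

```ruby
require 'timeout'

# Retries a file a limited number of times for transient errors and ignores
# transient worker failures that happen outside of examples.
class TransientErrorStrategy
  TRANSIENT = [Timeout::Error, Errno::ECONNRESET].freeze

  def initialize(max_retries: 2)
    @max_retries = max_retries
    @attempts = Hash.new(0) # file => retries granted so far
    @lock = Mutex.new       # the handler may be called from several threads
  end

  # Retry only transient errors, and only until the per-file budget is spent.
  def retry_test?(file, _example_groups, exception)
    return false unless transient?(exception)

    @lock.synchronize do
      granted = @attempts[file] < @max_retries
      @attempts[file] += 1 if granted
      granted
    end
  end

  # Ignore transient failures; any other failure is not ignored.
  def ignore_worker_failure?(exception, _context_description)
    transient?(exception)
  end

  private

  def transient?(exception)
    TRANSIENT.any? { |klass| exception.is_a?(klass) }
  end
end

strategy = TransientErrorStrategy.new(max_retries: 1)
puts strategy.retry_test?('spec/a_spec.rb', [], Timeout::Error.new('slow')) # true
puts strategy.retry_test?('spec/a_spec.rb', [], Timeout::Error.new('slow')) # false, budget spent
```

Keeping the retry budget per file prevents a permanently broken dependency from causing endless retries.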
Set the same timeout of 30 seconds for all spec files:
RSpec::Distrib.configure do |config|
  config.test_timeout = 30 # seconds
end
To specify a timeout per spec file, use an object that responds to call and receives the spec file path as an argument. The proc returns the timeout in seconds.
This is also useful for cases where some specs have a timeout strategy and some don't.
RSpec::Distrib.configure do |config|
  config.test_timeout = ->(spec_file) do
    10 + 2 * average_execution_in_seconds(spec_file)
  end
end
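The helper average_execution_in_seconds in the snippet above is not provided by the gem as far as this document shows; it has to come from your own timing data. A minimal sketch, assuming a hypothetical in-memory `TIMINGS` history and a hypothetical `DEFAULT_TIMEOUT` fallback for files without history:

```ruby
# Hypothetical timing history: spec file => durations (seconds) of recent builds.
TIMINGS = {
  'spec/models/user_spec.rb' => [4.0, 6.0],
  'spec/features/checkout_spec.rb' => [40.0, 50.0, 60.0]
}.freeze

DEFAULT_TIMEOUT = 30 # seconds, used for files without history

# Returns the average duration of recent runs, or nil when there is no history.
def average_execution_in_seconds(spec_file)
  runs = TIMINGS[spec_file]
  return nil if runs.nil? || runs.empty?

  runs.sum / runs.size
end

# Double the average plus ten seconds, falling back to a fixed timeout.
TIMEOUT_FOR = lambda do |spec_file|
  average = average_execution_in_seconds(spec_file)
  average ? 10 + 2 * average : DEFAULT_TIMEOUT
end

puts TIMEOUT_FOR.call('spec/models/user_spec.rb') # 20.0
puts TIMEOUT_FOR.call('spec/new_spec.rb')         # 30
```

A lambda like `TIMEOUT_FOR` could then be assigned to config.test_timeout. In practice the history would be loaded from wherever your CI stores previous build timings.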
All RSpec configuration should be in spec_helper.rb or rails_helper.rb.
You should require spec_helper.rb/rails_helper.rb in the .rspec file.
If you require spec_helper.rb/rails_helper.rb in a spec file, the configuration may not apply properly!
Is it simple to use it in my project?
Yes, it's simple if you are using RSpec already. rspec-distrib is almost a drop-in replacement. If you plan to run several Workers side-by-side on a single machine, check the parallel_tests documentation for how to set your project up.
How is timeout defined?
The timeout is set in the configuration file. It can be configured for all spec files at once, or, if you store previous execution times per spec file, we encourage you to use the average over the last couple of builds to calculate the timeout. Using double the average execution time plus ten seconds is a good strategy that prevents spec files from returning to the queue while still being executed. It mitigates two cases:
- the spec execution time doubled
- the spec was fast (milliseconds), then the spec file or its dependencies changed, and it now takes seconds.
What happens if there's a really slow spec?
The spec file is picked up by Worker #1, times out, is returned to the queue, and is picked up by the next worker, Worker #2. The first to submit the results wins.
What if there's nothing left in the queue, but some spec files are being processed?
The idle workers wait in the queue just in case there's a timed out spec file.
Is it secure?
DRb, the transport used, is not secure. Execution of arbitrary code on the Leader machine is possible. Make sure no one outside of the test environment can communicate with it.
Who can access the Leader machine?
Anyone who has access to it over the network. Make sure it's not exposed to the Internet.
What is the default port?
It's port 8787, the port conventionally used in DRb examples.
Is it thread-safe?
Yes, thread-safe data structures are used in the implementation, and access to non-thread-safe data structures is synchronized.
Are the workers using the same seed to execute the specs?
Yes. Seed is set on the Leader side and passed to the workers.
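Why a shared seed matters can be shown with plain Ruby. This sketch only demonstrates the principle that the same seed yields the same order; RSpec's own ordering algorithm is different, and `seeded_order` is a hypothetical helper.

```ruby
# Shuffles a list deterministically: the same seed always gives the same order.
def seeded_order(examples, seed)
  examples.shuffle(random: Random.new(seed))
end

examples = %w[creates updates deletes lists searches]
seed = 27_792

worker_a = seeded_order(examples, seed)
worker_b = seeded_order(examples, seed)
puts worker_a == worker_b # true: both workers see the same order
```

This is what makes an order-dependent failure reproducible: rerunning with the reported seed recreates the same order.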
Is it fault-tolerant?
Yes. Any number of workers can crash; the build results are not affected, but the total build time will be. Make sure to restart or spawn additional workers when they crash.
How does a worker know when there are no more specs to run?
The Leader drops the connection when there's nothing left, and Workers shut down gracefully.
What's the order of serving the spec files?
The order is controlled by tests_provider. The default strategy is at rspec/distrib/leader/tests_provider.rb, and it is a simplified one.
In a more advanced scenario, spec files could be served from the slowest (based on the results of previous builds) to the fastest, to reduce worker idle time and the risk of waiting for a long spec file to execute at the end of the build. This could be achieved by implementing a custom provider.
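A slowest-first provider might look like the sketch below. The timing data, the constant names, and the `slowest_first` helper are hypothetical; putting files without history first is a design choice of this example, not something the gem prescribes.

```ruby
# Hypothetical durations (seconds) from a previous build.
PREVIOUS_DURATIONS = {
  'spec/a_spec.rb' => 3.2,
  'spec/b_spec.rb' => 41.0,
  'spec/c_spec.rb' => 12.5
}.freeze

# Orders files slowest-first; files without history go to the front so that
# an unknown, possibly slow file is not left for the end of the build.
def slowest_first(files, durations)
  files.sort_by { |file| -durations.fetch(file, Float::INFINITY) }
end

# A callable with the same shape as the tests_provider lambda shown in the
# configuration section.
SLOWEST_FIRST_PROVIDER = lambda do
  slowest_first(Dir.glob('spec/**/*_spec.rb'), PREVIOUS_DURATIONS)
end

files = %w[spec/a_spec.rb spec/b_spec.rb spec/c_spec.rb spec/new_spec.rb]
puts slowest_first(files, PREVIOUS_DURATIONS)
# spec/new_spec.rb, spec/b_spec.rb, spec/c_spec.rb, spec/a_spec.rb
```

Assigning `SLOWEST_FIRST_PROVIDER` to config.tests_provider would then serve the long-running files early, while short files fill the gaps at the end.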
Bug reports and pull requests are welcome on GitHub.
The gem is available as open source under the terms of the MIT License.