Performance and Fairness
Artio follows a number of principles in its code in order to achieve good performance.
Zero Critical Path Allocation - to avoid GC pauses, Artio doesn't allocate objects on critical paths in its codebase, such as the handling of messages. All its codecs are designed to be reused to minimize allocation of objects. Implementations of specific datatypes (for example the time value representations and the DecimalFloat class) are designed to be allocation free as well. This doesn't mean that Artio never allocates objects, but it only does so on events that are considered one-off or infrequent, for example receiving an inbound FIX connection.
Minimal Copying - Artio uses Aeron's tryClaim strategy in order to minimise unnecessary copying of data. To read values out of byte buffers, the flyweight strategy is used extensively.
Avoid contended state update and locks - Most applications are written with the idea that state shared between different threads is stored in some shared area of memory, and that locks are used to update it. Artio avoids this, trying to keep as much state as possible read and written by a single thread. Where coordination needs to occur we achieve it through lock-free algorithms, mostly by writing messages into an Aeron stream.
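The allocation-free and flyweight principles above can be illustrated with a minimal sketch. `PriceFlyweight` and its field layout are hypothetical and are not Artio classes; the point is only that one reusable view object is re-pointed at each encoded value, so reading does not allocate or copy.

```java
import java.nio.ByteBuffer;

// Hypothetical flyweight (not an Artio class): a reusable view over an
// encoded decimal value laid out as an 8-byte mantissa plus a 4-byte scale.
final class PriceFlyweight {
    static final int LENGTH = Long.BYTES + Integer.BYTES;

    private ByteBuffer buffer;
    private int offset;

    // Re-point the view at another encoded value; nothing is allocated or copied.
    PriceFlyweight wrap(ByteBuffer buffer, int offset) {
        this.buffer = buffer;
        this.offset = offset;
        return this;
    }

    // Fields are read straight out of the underlying buffer on demand.
    long mantissa() {
        return buffer.getLong(offset);
    }

    int scale() {
        return buffer.getInt(offset + Long.BYTES);
    }
}

final class FlyweightExample {
    // Sums the mantissas of `count` consecutive values using one reused flyweight.
    static long sumMantissas(ByteBuffer buffer, int count, PriceFlyweight flyweight) {
        long total = 0;
        for (int i = 0; i < count; i++) {
            total += flyweight.wrap(buffer, i * PriceFlyweight.LENGTH).mantissa();
        }
        return total;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(3 * PriceFlyweight.LENGTH);
        for (int i = 0; i < 3; i++) {
            buffer.putLong(i * PriceFlyweight.LENGTH, 100L + i);
            buffer.putInt(i * PriceFlyweight.LENGTH + Long.BYTES, 2);
        }

        // Allocated once, outside the loop that stands in for the critical path.
        PriceFlyweight flyweight = new PriceFlyweight();
        System.out.println(sumMantissas(buffer, 3, flyweight)); // prints 303
    }
}
```

The flyweight is created once during setup, which matches the idea that allocation is acceptable on one-off events but not per message.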
Back pressure is an approach to handling situations where consumers of messages take longer to process them than producers do to produce them. Artio applies Aeron's back-pressure model extensively throughout its implementation and API. In a number of places API users are expected to return io.aeron.logbuffer.ControlledFragmentHandler.Action values in order to control how Artio deals with back pressure. Most of the time you should return CONTINUE in order to continue processing. However, if you are processing a message and are being back pressured in your attempt to send another message to the next system, then returning ABORT will cause the same message to be delivered to you again so that you can retry.
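The CONTINUE/ABORT contract described above can be sketched without any dependencies. The `Action` enum below is a local stand-in for `io.aeron.logbuffer.ControlledFragmentHandler.Action` so that the example compiles on its own, and `BoundedDownstream` is a hypothetical next system with limited capacity; neither is Artio code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class BackPressureExample {
    // Local stand-in for io.aeron.logbuffer.ControlledFragmentHandler.Action,
    // reduced to the two values discussed in the text.
    enum Action { CONTINUE, ABORT }

    // Hypothetical next system that refuses messages once it is full.
    static final class BoundedDownstream {
        private final Queue<String> queue = new ArrayDeque<>();
        private final int capacity;

        BoundedDownstream(int capacity) {
            this.capacity = capacity;
        }

        // Returns false instead of blocking when back pressured.
        boolean tryOffer(String message) {
            if (queue.size() >= capacity) {
                return false;
            }
            queue.add(message);
            return true;
        }

        String drainOne() {
            return queue.poll();
        }
    }

    // Handler in the style described above: forward the message if possible,
    // otherwise ask for the same message to be delivered again later.
    static Action onMessage(String message, BoundedDownstream downstream) {
        return downstream.tryOffer(message) ? Action.CONTINUE : Action.ABORT;
    }

    public static void main(String[] args) {
        BoundedDownstream downstream = new BoundedDownstream(1);

        System.out.println(onMessage("A", downstream)); // CONTINUE: there was room
        System.out.println(onMessage("B", downstream)); // ABORT: downstream is full

        downstream.drainOne();                          // the next system catches up
        System.out.println(onMessage("B", downstream)); // CONTINUE: the retry succeeds
    }
}
```

The handler never blocks or spins while the downstream is full; it reports the condition and relies on redelivery, which keeps the polling thread free to do other work.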
Artio has a concept of a slow FIX consumer: a FIX connection that has started to lag behind in its processing of messages. We detect slowness at the point of writing. When Artio attempts to write a FIX message, if it cannot completely write the FIX message to your operating system's TCP channel then it marks the FIX session as being a slow consumer. This may sound like a low bar for something to be marked as slow, but it's worth unpacking what this means for a moment.
If you can't write the bytes of a FIX message via TCP then it means that back pressure at the TCP layer (via either Flow Control or Congestion Control algorithms) has occurred in order to limit the number of bytes in flight. Additionally, the machines at either end of the TCP connection both have buffers. So your counter party at this point is sizeof(their TCP receive buffer) + TCP bytes in flight + sizeof(your TCP send buffer) behind.
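To give a feel for the size of that lag, here is a small worked example of the sum above. All figures, including the buffer sizes and the average message length, are made up for illustration and are not Artio or operating system defaults.

```java
final class SlowConsumerLag {
    // Sum of the three components named in the text above.
    static long bytesBehind(long theirReceiveBufferBytes, long bytesInFlight, long yourSendBufferBytes) {
        return theirReceiveBufferBytes + bytesInFlight + yourSendBufferBytes;
    }

    public static void main(String[] args) {
        // Illustrative numbers only.
        long receiveBuffer = 64 * 1024;
        long inFlight = 32 * 1024;
        long sendBuffer = 64 * 1024;
        long averageFixMessageBytes = 200;

        long behind = bytesBehind(receiveBuffer, inFlight, sendBuffer);
        System.out.println("bytes behind: " + behind);
        System.out.println("approx. messages behind: " + behind / averageFixMessageBytes);
    }
}
```

With these assumed figures the counter party is 163,840 bytes behind, or roughly 819 messages of 200 bytes each, before the first partial write is ever observed.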
Once a FIX connection has gone into a slow consumer state, there are two possible outcomes:
- We continue to send messages to that FIX connection as fast as possible and they eventually catch up with Artio's Aeron stream and become a normal FIX connection.
- The counter party's performance limitations stop us from writing messages fast enough to the FIX connection and they fall further behind.
In the second case they get disconnected once more than EngineConfiguration.senderMaxBytesInBuffer() bytes in the internal Aeron stream are still waiting to be written to the TCP stream. Alternatively, if no bytes are able to be written for longer than EngineConfiguration.slowConsumerTimeoutInMs() then the connection will be disconnected.
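The two disconnect conditions can be summarised as a small decision function. This is a sketch of the rule as described above, not Artio's implementation: `SlowConsumerPolicy`, its fields, and its method are hypothetical, and the two limits simply stand in for the values configured through `senderMaxBytesInBuffer` and `slowConsumerTimeoutInMs`.

```java
// Hypothetical policy object (not an Artio class) modelling the rule in the text.
final class SlowConsumerPolicy {
    private final long maxBytesInBuffer; // stands in for senderMaxBytesInBuffer
    private final long timeoutInMs;      // stands in for slowConsumerTimeoutInMs

    SlowConsumerPolicy(long maxBytesInBuffer, long timeoutInMs) {
        this.maxBytesInBuffer = maxBytesInBuffer;
        this.timeoutInMs = timeoutInMs;
    }

    // bytesOutstanding: bytes in the internal stream not yet written to TCP.
    // lastWriteTimeMs:  last time any bytes were successfully written.
    boolean shouldDisconnect(long bytesOutstanding, long lastWriteTimeMs, long nowMs) {
        boolean tooFarBehind = bytesOutstanding > maxBytesInBuffer;
        boolean stalledTooLong = (nowMs - lastWriteTimeMs) > timeoutInMs;
        return tooFarBehind || stalledTooLong;
    }

    public static void main(String[] args) {
        // Limits chosen for illustration only.
        SlowConsumerPolicy policy = new SlowConsumerPolicy(1_000, 500);

        System.out.println(policy.shouldDisconnect(900, 0, 100));   // false: within both limits
        System.out.println(policy.shouldDisconnect(1_001, 0, 100)); // true: too many bytes outstanding
        System.out.println(policy.shouldDisconnect(10, 0, 501));    // true: no write for too long
    }
}
```

Either condition alone is enough to disconnect, so a counter party that is only slightly behind but has stopped reading entirely is still removed once the timeout elapses.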