Table of Contents generated with DocToc
- Getting Started with Rust
- Some links on Rust
- Cool Rust Projects
- Rust Error Handling
- Rust Concurrency
- Data Processing and Data Structures
- Rust and Scala/Java
- CLI and Misc
- IDE/Editor/Tooling
- Testing and CI/CD
- Performance and Low-Level Stuff
Why the developers who use Rust love it so much - from StackOverflow survey, really good quotes
- The Rust Book - probably the best starting point
- Rustlings - small exercises to learn
- Rust By Example - also the guide on their site is pretty good.
- explaine.rs - paste Rust code into the window and hover over keywords to get explanations. Great for learning.
- Rustlang in a Nutshell - great introduction
- Rust Borrowing and Ownership - easy-to-read, short summary of basic ownership, borrowing, and lifetime references
- A Java Programmer Understanding Rust Ownership
- Rust Error Handling for Pythonistas
cheats.rs - Awesome quick ref.
-
Rust: A Unique Perspective - comprehensive summary about Rust ownership from angle of unique access, covers RC/Arc etc.
-
Learn Rust with Too Many Linked Lists - hilarious.
-
Jon Gjengset on Rust Lifetime Annotations - actually check out his Youtube channel, lots of great tutorials
-
The Evolution of Rust Programmers - hilarious look at different coding styles
-
Fireflowers: Rust in the words of its Practitioners - just brilliant commentary on what Rust is.
-
Oxidizing the Interview - hilarious read on a Rust technical interview
-
Rust and the Three Laws of Informatics - great detailed guide to how Rust allows developers to uncompromisingly achieve correctness, maintainability, AND efficiency
-
Why Scientists are turning to Rust - from Nature mag
-
Rust After the Honeymoon - by Bryan Cantrill, a list of top favorite Rust tricks/properties. Did you know that `{:#x?}` will pretty-print structs in HEX??
-
Prefer Rust over C/C++ - when to and when not to prefer Rust
-
- C2Rust and Quake - a tool to auto translate C to Rust!
-
Clear Explanation of Rust's Module System - easy intro guide
-
On Rusts Module System - good explanation of paths, naming, modules -- see this when compiler complains about cannot find symbols
Speed without wizardry - how using Rust is safer and better than using hacks in Javascript
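As a small illustration of the `{:#x?}` trick mentioned in the list above, here is a minimal sketch. The `Header` struct and the helper names are hypothetical and exist only for this example.

```rust
/// Hypothetical struct used only for this illustration.
#[derive(Debug)]
pub struct Header {
    pub magic: u32,
    pub flags: u8,
}

/// Formats the value with `{:#x?}`: `#` turns on pretty-printing and the
/// `0x` prefix, `x` switches integers to lowercase hex, `?` selects `Debug`.
pub fn hex_debug(h: &Header) -> String {
    format!("{:#x?}", h)
}

/// Same struct with the default `{:?}` formatting, for comparison.
pub fn plain_debug(h: &Header) -> String {
    format!("{:?}", h)
}
```

Calling `hex_debug(&Header { magic: 0xCAFE, flags: 255 })` yields a multi-line dump with fields such as `magic: 0xcafe`, while `plain_debug` prints the same fields in decimal on one line.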
Dealing with strings is confusing in Rust because there are two types: a heap-allocated `String`, and `&str`, a borrowed pointer to a slice of string bytes. Knowing which to use, and defining structures on them, immediately exposes the steep learning curve of ownership.
See the Guide to Strings for some help.
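To make the `String` vs `&str` distinction concrete, here is a minimal sketch. All names (`first_word`, `OwnedName`, `BorrowedName`, `demo`) are made up for illustration; the point is only that a struct holding a `&str` needs a lifetime parameter, while one holding a `String` does not.

```rust
/// Borrows a string slice: accepts `&String` (via deref coercion) and `&str`
/// alike, and never allocates.
pub fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

/// Owns its data: a struct holding a `String` has no lifetime parameter.
pub struct OwnedName {
    pub name: String,
}

/// Borrows its data: the lifetime `'a` ties the struct to whatever owns the
/// bytes, which is where the ownership learning curve shows up.
pub struct BorrowedName<'a> {
    pub name: &'a str,
}

pub fn demo() -> (String, usize) {
    let owned: String = String::from("hello world"); // heap-allocated, growable
    let word: &str = first_word(&owned); // borrowed view into `owned`
    let borrowed = BorrowedName { name: word };
    // Going from a borrowed slice to an owned String is an explicit allocation.
    let copy = OwnedName { name: borrowed.name.to_string() };
    (copy.name, owned.len())
}
```

A reasonable default, under these assumptions: take `&str` in function parameters, store `String` in structs, and only reach for `&'a str` fields when the extra allocation actually matters.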
Online resources and help:
- The Rust Discord #beginners channel has been pretty helpful for me
- Rust IRC channel
- Rust for Rubyists
- Rust Playpen - closest thing to a REPL :(
- makepad - Web-based Rust + WebASM multimedia playground
Specific topics:
- Rust conversion reference
- Async Rust - A really concise and great intro to async/await
- Elegant library APIs in Rust - lots of good tips here
- Effectively using Iterators in Rust - on differences between `iter()`, `into_iter()`, types, etc.
- Generic Return Types in Rust - deep dive into `Iterator.collect()`, traits, and Rust's type system
- Rust-san - sanitizers for Rust code, if the basic compiler checks are not enough :)
- Colorized Rust backtraces. :)
- Rust Macros case studies
- Overview of Macros in Rust - from Steve Klabnik
- Rust TypeState Pattern
- Pretty State Machines in Rust - great article on diff state machine patterns, use of enums and structs
- Init Struct Pattern - on patterns for initializing structs
- COW, Rust vs C++ - great dive into details of copy-on-write. Might be a great pattern for working with things like strings, where cloning might be expensive.
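For the copy-on-write item above, a minimal sketch with the standard library's `std::borrow::Cow` may help. The function `expand_tabs` and the helper `is_borrowed` are hypothetical; they are not taken from the linked article.

```rust
use std::borrow::Cow;

/// Replaces tabs with four spaces, allocating only when the input
/// actually contains a tab.
pub fn expand_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        // Slow path: a new String has to be built.
        Cow::Owned(input.replace('\t', "    "))
    } else {
        // Fast path: hand back the caller's slice, no clone.
        Cow::Borrowed(input)
    }
}

/// Reports whether a `Cow` is still borrowing its input.
pub fn is_borrowed(c: &Cow<'_, str>) -> bool {
    matches!(c, Cow::Borrowed(_))
}
```

Callers can treat the result as a `&str` through deref, and call `into_owned()` only if they need to keep it.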
CLI tools:
- XSV - a fast CSV parsing and analysis tool
- Ripgrep - insanely fast grep utility, great for code searches. Shows off power of Rust regex library
- Bat - A super `cat` with syntax highlighting, git integration, other features
- Bottom - Cross-platform fancy `top` in Rust - process/sys mon with graphs, very useful!
- ht - HTTPie clone / much better `curl` alternative
- Dust - Rust graphical-text faster and friendlier version of du
- fd - Rust CLI, friendlier and faster replacement for `find`
- Nushell - Rust shell that turns all output into tabular data. Pretty cool!
- imagecli - CLI for image batch processing
- Hyperfine - Rust performance benchmarking CLI
- Alacritty - GPU accelerated terminal emulator
- jql - Rust version of popular `jq` JSON CLI processor, though not as powerful
- Starship - "The minimal, blazing-fast, and infinitely customizable prompt for any shell!"
Wasm:
- Wasmer - general purpose WASM runtime
- Krustlet - WebAssembly (instead of containers) runtime on Kubernetes!! Use Rust + wasm + WASI for a truly portable k8s-based deploy!
Data/Others:
- Sled - an embedded database engine using latch-free Bw-tree on latch-free page cache techniques for speed
- IOx - New in-memory columnar InfluxDB engine using Arrow, DataFusion, rust! Persists using parquet. Super awesome stuff.
- IndraDB - Graph database/library written in Rust! and inspired by Facebook's TAO.
- TabNine - an ML-based autocompleter, written in Rust
- async-std - the standard library with async APIs
- MinSQL - interesting POC on lightweight SQL based log search, w automatic field parsing etc.
- Timely Dataflow - distributed data-parallel compute engine in Rust!!!
- Toshi - ElasticSearch written in Rust using Tantivy as the engine
- Convey - Layer 4 load balancer
Error handling survey - really good summary of the Rust error library landscape as of late 2019.
-
Rust Concurrency: Five Easy Pieces - a great intro to threads, using message queues, determinism, and more
-
Async stacktraces - this is SUPER COOL!!!
-
Rust Parallelism for non C/C++ Devs - great resource on the low-level primitives like `Mutex` and `RwLock`
-
Fearless Concurrency with Hazard Pointers - using the `conc` crate and `Atomic` which implements hazard pointers for fine-grained and safe protection of readers and garbage
-
Bastion - Erlang/Akka-style, remote supervised actor framework
Sometimes one needs to share a large data structure across threads and several of them must access it.
The most general way to share a data structure is to use `Arc<RwLock<...>>` or `Arc<Mutex<...>>`. The `Arc` keeps the data alive for as long as any thread still holds a handle, which lets different threads exist for different lengths of time, and it is inexpensive since it is usually only cloned once at thread spawn. The `Mutex` or `RwLock` lets different threads mutate the data safely, assuming the data structure is not itself thread-safe.
A thread-safe data structure could be used in place of the `RwLock` or `Mutex`.
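The `Arc<RwLock<...>>` pattern described above can be sketched as follows, using only the standard library. The function name `fill_shared_map` and the choice of a `HashMap` payload are assumptions made for the example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Spawns `writers` threads that each insert one entry into a shared map,
/// then returns the number of entries once all threads are joined.
pub fn fill_shared_map(writers: usize) -> usize {
    // HashMap is not thread-safe on its own, so wrap it in a RwLock;
    // Arc lets every thread hold its own handle to the same allocation.
    let shared: Arc<RwLock<HashMap<usize, String>>> = Arc::new(RwLock::new(HashMap::new()));

    let handles: Vec<_> = (0..writers)
        .map(|id| {
            let map = Arc::clone(&shared); // one refcount bump per thread, at spawn
            thread::spawn(move || {
                let mut guard = map.write().expect("lock poisoned");
                guard.insert(id, format!("worker-{id}"));
            })
        })
        .collect();

    for h in handles {
        h.join().expect("worker panicked");
    }

    // Many readers may hold the read lock at the same time.
    let len = shared.read().expect("lock poisoned").len();
    len
}
```

Swapping `RwLock` for `Mutex` only changes `write()`/`read()` to `lock()`; `RwLock` tends to pay off when reads heavily outnumber writes.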
Scoped threads could be used if only one owner will mutate the data structure and one wants to share immutable refs with other threads for reading. Ordinary spawned threads do not work for this: rustc has no way of proving how long a spawned thread lives or when it will be joined, so immutable refs borrowed from the owner thread fail rustc's lifetime checks and cannot be shared. Scoped threads, such as the ones in the Crossbeam crate, get around that by guaranteeing to rustc that the threads will be joined before the owner goes away.
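A minimal sketch of the scoped-thread idea follows. Since Rust 1.63 the standard library also ships scoped threads as `std::thread::scope`, and the sketch uses that rather than Crossbeam so it has no dependencies. The function name `parallel_sum` is hypothetical.

```rust
use std::thread;

/// Sums a slice by splitting it in two and reading each half from a separate
/// scoped thread. The threads borrow `data` directly: no Arc, no cloning.
pub fn parallel_sum(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let a = s.spawn(|| left.iter().sum::<u64>());
        let b = s.spawn(|| right.iter().sum::<u64>());
        // Both threads are guaranteed to be joined before `scope` returns,
        // which is what lets rustc accept the borrowed slices.
        a.join().expect("left worker panicked") + b.join().expect("right worker panicked")
    })
}
```

The same code with `thread::spawn` would be rejected, because `thread::spawn` requires its closure to be `'static`.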
Arc-swap could potentially help too.
Also see beef - a leaner version of Cow.
-
Are we learning yet? - list of ML Rust crates
-
Timely Dataflow - distributed data-parallel compute engine in Rust!!!
-
DataFusion - a Rust query engine which is part of Apache Arrow!
-
Weld - Stanford's high-performance runtime for data analytics
-
Toshi - ElasticSearch written in Rust using Tantivy as the engine
-
MeiliDB - fast full-text search engine
-
Vector - unified client side collection agent for logs, metrics, events
-
Tremor - a simple event processing / log and metric processing and forwarding system, with scripting and streaming query support. Much more capable than Telegraf.
-
Clepsydra - Graydon Hoare working on distributed database protocol - in Rust!
For JSON DOM (IR) processing, using the mimalloc allocator provided me a 2x speedup with serde-json. Then, switching to json-rust provided another 1.8x speedup. The speedup is completely unreal, much faster than JVM. The main reason I guess is that json-rust has a `Short` DOM class for short strings, which requires no heap allocation.
- simdjson-rs - SIMD-enabled JSON parser. NOTE: no writing of JSON.
-
dashmap - "Blazing fast concurrent HashMap for Rust"
-
Patricia Tree - Radix-tree based map for more compact storage
-
Using Finite State Automata and Rust to quickly index and find data amongst HUGE amount of strings
-
ahash - this seems to be the fastest hash algo for hash keys
-
Metrohash - a really fast hash algorithm
-
IndexMap - O(1) obtain by index, iteration by index order
-
FM-Index, a neat structure that allows for fast exact string indexing and counting while compressing original string data at the same time. There is a Rust crate
-
Rstar - n-dimensional R*-Tree for geospatial indexing and nearest-neighbor
-
Heapless - static data structures with fixed size; Vec, heap, map, set, queues
-
Petgraph - Graph data structure for Rust, considered perhaps most mature right now
-
Easy Persistent Data Structures in Rust - replacing `Box` with `Rc`
-
VecMap - map for small integer keys, may use less space
Rust has native UTF8 string processing, which is AWESOME for performance. However, there are two concerns usually:
- Small string memory efficiency. The native `String` type uses three words just for pointer, length, and capacity, which might be longer than the string itself;
- Minimizing number of heap allocations
Here are some solutions:
- String - string type with configurable byte storage, including stack byte arrays!
- Inlinable String - stores strings up to 30 chars inline, automatic promotion to heap string if needed.
- Also see smallstr
- kstring - intended for map keys: immutable, inlined for small keys, and have Ref/Cow types to allow efficient sharing. :)
- nested - reduce Vec type structures to just two allocations, probably more memory efficient too.
- tinyset - space efficient sets and maps, can be combined with nested perhaps
- bumpalo can do really cheap group allocations in a `Bump` and has custom `String` and `Vec` versions. At least lowers allocation overhead.
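The small-string overhead mentioned above is easy to measure. This is a minimal sketch with hypothetical helper names; the three-word size is what current std produces, not a documented layout guarantee.

```rust
use std::mem::size_of;

/// Bytes of bookkeeping a `String` carries before storing any character:
/// pointer + length + capacity (an observation about current std, not a
/// documented layout guarantee).
pub fn string_header_bytes() -> usize {
    size_of::<String>()
}

/// Total footprint of owning `s` as a `String`: the header plus the heap buffer.
pub fn string_footprint(s: &String) -> usize {
    size_of::<String>() + s.capacity()
}
```

On a 64-bit target the header alone is 24 bytes, so a two-character key costs far more in bookkeeping than in payload, plus one heap allocation. That gap is what the inline-string crates listed above try to close.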
-
The presence of true unsigned types is really nice for low-level work. I hit a bug in Scala where I used >> instead of >>>. In Rust you declare a type as unsigned and don't have to worry about this.
-
Immutable byte slices and reference types again are awesome for low-level work.
-
Trait monomorphisation is awesome for ensuring trait methods can be inlined. JVM cannot do this when there is more than one implementation of a trait.
-
Being able to examine assembly directly from compiler output is super nice for low level perf work (compared to examining bytecode and not knowing the final output until runtime)
-
OTOH, rustc is definitely much much stricter (IMO) compared to scalac. Much of this is for good reason though, for example lack of integer/primitive coercion, ownership, etc. gives safety guarantees.
-
Calling Rust from Java - especially see the hint for using jnr-ffi
-
There is also j4rs for calling Java from Rust
-
SaferFFI - a neat library to make exposing C-like APIs much safer esp dealing with pointers, nulls, borrowing etc.
-
Exposing a Rust library to C - has some great tips on creating .so's and working with strings
-
It seems to me Circle CI's support for multiple docker images and explicit manifest style makes it very easy to set up multiple language and dependency support
-
Running LLVM on GraalVM - using GraalVM to embed and run LLVM bitcode! Too bad GraalVM is commercial/Oracle only
-
Oh no, my data science is getting Rusty! - neat post from CrowdStrike on integrating Rust with Python for improved performance AND safety
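On the unsigned-types point earlier in this list: in Rust the meaning of `>>` follows from the operand type, so there is no separate `>>>` operator to forget. A minimal sketch, with hypothetical function names:

```rust
/// `>>` on a signed integer is an arithmetic shift: the sign bit is copied in.
pub fn shift_signed(x: i32, n: u32) -> i32 {
    x >> n
}

/// `>>` on an unsigned integer is a logical shift: zeros are shifted in,
/// which is what Scala/Java spell `>>>`.
pub fn shift_unsigned(x: u32, n: u32) -> u32 {
    x >> n
}
```

With these, `shift_signed(-8, 1)` is `-4`, while the same bit pattern as a `u32` shifts to `0x7FFF_FFFC`.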
- Structopt - define CLI options using a struct!
-
EVCXR - a Rust REPL!!! With deps, and tab-completion for methods!!
-
comby-rust - rewrite Rust code using comby
-
no-panics-whatsoever - crate to detect and ensure at compile time there aren't panics in your code
-
RustAnalyzer - LSP-based plugin/server for IDE functionality in Sublime/VSCode/EMacs/etc
-
Cargo-play - run Rust scripts without needing to set up a project
- Also see cargo-eval and runner for diff ways of easily running scripts without projects
The two standard property testing crates are Quickcheck and proptest. Personally I prefer proptest due to much better control over input generation (without having to define your own type class).
- Rust Continuous Delivery - hints on using Docker, caching deps, and automated cloud-based CI/CD workflows for Rust
- Faster Build Times on MacOS
A common concern - how do I build different versions of my Rust lib/app for say OSX and also Linux?
- Easiest way now seems to be to use cross - I tried it and it is literally as easy as `cargo install cross` and `cross build --target ...` as long as you have Docker.
- NOTE: crates with non-Rust code (eg jemalloc, mimalloc) often have trouble
- Also see rust-musl-builder, another Docker-based solution
- musl is the best target for Linux as it removes need for G/LIBC dependencies and versioning. Musl creates a single static binary for super easy deploys.
- For automation, maybe better to create a single Docker image which combines crossbuild (which has a recipe for OSXCross + other targets) with a rustup container like abronan/rust-circleci which allows building both nightly and stable. Use Docker multi-stage builds to make combining multiple images easier
Finally, the Taking Rust everywhere with Rustup blog has a good guide on how to use rustup to install cross toolchains, but the above steps to install OS-specific linkers are still important.
A big part of the appeal of Rust for me is super fast, SAFE, built in UTF8 string processing, access to detailed memory layout, things like SIMD. Basically, to be able to idiomatically, safely, and beautifully (functionally?) do super fast and efficient data processing.
-
Cheap Tricks - Rust Performance - set of quick Cargo settings to try
-
How to Write Fast Rust Code - really good guide
-
High Performance Rust - a book
-
Optimizing String Processing in Rust - really useful stuff
-
Achieving warp speed with Rust - great tips on performance optimization
-
Modern storage is plenty fast - using a new Rust crate called glommio one can achieve multi-GB per sec read throughputs from modern SSDs. So maybe we don't need memory after all.
-
Representations - super important to understand low-level memory layouts for structs. C vs packed vs .... including alignment issues.
-
Precise memory layouts and how to dump out Rust struct memory layouts
- Or just use the memoffset crate
-
Rust uses system malloc by default. How to switch the default allocator.
- Use jemallocator and jemalloc-ctl crates for stats, deep dives, etc. Jemalloc from Facebook supposed to be fast.
- Also see MiMalloc - a high perf allocator from Microsoft. I got 2x improvement for JSON workloads!
- There are even epoch GCs available
- Also look into the arena and typed_arena crates... very cheap allocations within a region, then free entire region at once.
- Also see bumpalo - bump allocator which includes custom versions of standard collections
-
Watch out for dynamic dispatch (when you need to use `Box<dyn MyTrait>` etc). One solution is to use enum_dispatch.
- Related: auto_enum - a way to return enums when you might need to return `impl A` for some trait A when you might be returning diff implementations
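To illustrate the dynamic-dispatch point, here is a hand-written sketch of the enum-based alternative. It is an assumption about the general shape of code that enum_dispatch automates, not its actual output, and the `Shape` types are hypothetical.

```rust
pub trait Shape {
    fn area(&self) -> f64;
}

pub struct Square(pub f64);
pub struct Rect(pub f64, pub f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

/// Dynamic dispatch: each call goes through a vtable and each element is boxed.
pub fn total_area_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Closed set of implementations, one enum variant per type.
pub enum AnyShape {
    Square(Square),
    Rect(Rect),
}

impl Shape for AnyShape {
    fn area(&self) -> f64 {
        // A plain `match` with static calls that the compiler can inline.
        match self {
            AnyShape::Square(s) => s.area(),
            AnyShape::Rect(r) => r.area(),
        }
    }
}

/// Static dispatch over the enum: no boxing, no vtable lookup.
pub fn total_area_enum(shapes: &[AnyShape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```

The trade-off is that the enum only works when the set of implementations is known up front; `Box<dyn Trait>` remains the option for open-ended plugin-style code.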
Rust nightly now has a super slick asm! inline assembly feature. The way that it integrates Rust variables/expressions with auto register assignment is super awesome.
NOTE: simplest way to increase perf may be to enable certain CPU instructions: set -x RUSTFLAGS "-C target-feature=+sse3,+sse4.2,+lzcnt,+avx,+avx2"
NOTE2: `lazy_static` accesses are not cheap. Don't use it in hot code paths.
NEW: I've created a Docker image for Linux perf profiling, super easy to use. The best combo is cargo flamegraph followed by perf and asm analysis.
-
cargo-flamegraph -- this is now the easiest way to get a FlameGraph on OSX and profile your Rust binaries. To make it work with bench and Criterion:
- First run `cargo bench` to build your bench executable
- If you haven't already, `cargo install flamegraph` (recommend at least v0.1.13)
- `sudo flamegraph target/release/bench-aba573ea464f3f67 --profile-time 180 <filter> --bench` (replace bench-aba* with the name of your bench executable)
- The `--profile-time` is needed for flamegraph to collect enough stats
- `open -a Safari flamegraph.svg`
- NOTE: you need to turn on `debug = true` in release profile for symbols
- This method works better for apps than small benchmarks btw, as inlined methods won't show up in the graph.
-
Rust Performance: Perf and Flamegraph - including finding hot assembly instructions
-
Top-down Microarchitecture Analysis Method - TMAM is a formal microprocessor perf analysis method from Intel, works with perf to find out what CPU-level bottlenecks are (mem IO? branch predictions? etc.)
-
Rust Profiling with DTrace and FlameGraphs on OSX - probably the best bet (besides Instruments), can handle any native executable too
- From `@blaagh`: though the predicate should be `"/pid == $target/"` rather than using execname.
- DTrace Guide is probably pretty useful here
-
Hyperfine - Rust performance benchmarking CLI
-
Tools for Profiling Rust - cpuprofiler might possibly work on OSX. It does compile. The cpuprofiler crate requires surrounding blocks of your code though.
-
Rust Profiling talk - discusses both OSX and Linux, as well as Instruments and Intel VTune
-
2017 RustConf - Improving Rust Performance through Profiling
-
Flamer - an alternative to generating FlameGraphs if one is willing to instrument code. Warning: might require nightly Rust features.
-
Rust Profiling with Instruments on OSX - but apparently cannot export CSV to FlameGraph :(
-
cargo-profiler - only works in Linux :(
-
coz and its Cargo plugin, coz-rs -- "a new kind of profiler that unlocks optimization opportunities missed by traditional profilers. Coz employs a novel technique we call causal profiling that measures optimization potential"
For heap profiling try memory-profiler - written in Rust by the Nokia team!
- stats_alloc can dump out incremental stats about allocation. Or just use `jemalloc-ctl`.
- deepsize - macro to recursively find size of an object
- Measuring Memory Usage in Rust - thoughts on working around the fact we don't have a GC to track deep memory usage
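The idea behind allocation-stats crates can be sketched with the standard `GlobalAlloc` trait: wrap the system allocator and count what passes through. This is a minimal illustration, not how stats_alloc or jemalloc-ctl are implemented; `CountingAlloc`, the counters, and `measure` are hypothetical names.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts calls and bytes requested.
pub struct CountingAlloc;

pub static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);
pub static ALLOC_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        ALLOC_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        // Delegate the real work to the default system allocator.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Installs the wrapper for the whole program; only one is allowed per binary.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Returns (allocation calls, bytes requested) observed while running `f`.
/// Other threads allocating at the same time are counted too.
pub fn measure<F: FnOnce()>(f: F) -> (usize, usize) {
    let calls = ALLOC_CALLS.load(Ordering::Relaxed);
    let bytes = ALLOC_BYTES.load(Ordering::Relaxed);
    f();
    (
        ALLOC_CALLS.load(Ordering::Relaxed) - calls,
        ALLOC_BYTES.load(Ordering::Relaxed) - bytes,
    )
}
```

Wrapping a hot function in `measure` gives a quick allocation count without an external profiler. The numbers are process-wide, so they are only a rough guide in multi-threaded code.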
cargo-asm can dump out assembly or LLVM/IR output from a particular method. I have found this useful for really low level perf analysis. NOTE: if the method is generic, you need to give a "monomorphised" or filled out method. Also, methods declared inline won't show up.
- What I like to do with asm output: check if rustc has inlined certain methods. Also you can clearly see where dynamic dispatch happens and how complicated generated code seems. More complicated code usually == slower.
- llvm-mca - really detailed static analysis and runtime prediction at the machine instruction level
What I've found that works (but see cargo flamegraph above for easier way):
sudo dtrace -c './target/release/bench-2022f41cf9c87baf --profile-time 120' -o out.stacks -n 'profile-997 /pid == $target/ { @[ustack(100)] = count(); }'
~/src/github/FlameGraph/stackcollapse.pl out.stacks | ~/src/github/FlameGraph/flamegraph.pl >rust-bench.svg
open -a Safari rust-bench.svg
where -c bench.... is the executable output of cargo bench.
I was hoping cargo-with would allow us to run above dtrace command with the name of the bench output, but alas it doesn't seem to work with bench. (NOTE: they are working on a PR to fix this! :)
NOTE: The built-in `#[bench]` harness that `cargo bench` uses by default requires nightly Rust; it doesn't work on stable! For benchmarking I highly recommend criterion, which works on stable and has extra features such as gnuplot, parameterized benchmarking and run-to-run comparisons, as well as being able to run for a longer time to work with profilers such as dtrace.
- nom - a direct parser using macros, commonly accepted as fastest generic parser
- pest is a PEG parser using an external, easy to understand syntax file. Not quite as fast but might be easier to understand and debug. There is also a book.
- combine is a parser combinator library, supposedly just as fast as nom, syntax seems slightly easier
- bitpacking - insanely fast integer bitpacking library
- packed_struct - bitfield packing/unpacking; can also pack arrays of bitfields; mixed endianness, etc.
The ideal performance-wise is to not need serialization at all; ie be able to read directly from portions of a binary byte slice. There are some libraries for doing this, such as flatbuffers, or flatdata for which there is a Rust crate; or Cap'n Proto. However, there may be times when you want more control or things like Cap'n Proto are not good enough.
How do we perform low-level byte/bit twiddling and precise memory access? Unfortunately, all structs in Rust basically need to have known sizes. There's something called dynamically sized types basically like slices where you can have the last element of a struct be an array of unknown size; however, they are virtually impossible to create and work with, and this only covers some cases anyhow. So we will unfortunately need a combination of techniques. In order of preference:
- Overall scroll is the best general-purpose struct serialization crate; it helps with reading integers and other fields too, and takes care of endianness. It generates pretty efficient code. It is a bit of a pain working with numeric enums however.
- num_enum - a way to derive TryFrom for numeric enums helps a little bit.
- I have found plain works really well. Mark your structs with `#[repr(C)]`. It only helps with size and alignment, not endianness - so maybe more for in-memory structures or when you are sure you don't need code to work across endianness platforms. If your structures are not aligned then use `#[repr(C, packed)]` or `#[repr(packed(1))]`.
- Use a crate such as bytes or scroll to help extract and write structs and primitives to/from buffers. Might need extra copying though. Also see iobuf
- rel-ptr - small library for relative pointers/offsets, should be super useful for custom file formats and binary/persistent data structures
- arrayref might help extract fixed size arrays from longer ones.
- bytemuck for casts
- bitmatch could be great for bitfield parsing
- Or use the pod crate to help with some of the above conversions. However pod seems to no longer be maintained. nue and its macros can also help with struct alignment.
- Also see zero
- Allocate a `Vec::<u8>` and transmute specific portions to/from structs of known size, or convert pointers within regions back to references:
let foobar: *mut Foobar = mybytes[..].as_mut_ptr() as *mut Foobar;
// as_mut() only rejects null; the bytes must also be large enough and aligned for Foobar
let foobar_ref: &mut Foobar = unsafe { foobar.as_mut() }.expect("Cannot convert foobar to ref");
- Or structview which offers types for unaligned integers etc.
- There are some DST crates worth checking out: slice-dst, thin-dst
- As a last resort, work with raw pointer math using the add/sub/offset methods, but this is REALLY UNSAFE.
let foobar: *mut Foobar = mybytes[..].as_mut_ptr() as *mut Foobar;
unsafe {
    // Writes go straight through the raw pointer; nothing checks bounds or alignment
    (*foobar).foo = 17;
    (*foobar).bar = -1;
}
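For comparison with the raw-pointer approaches above, here is a minimal safe sketch that decodes and encodes the same kind of record field by field with `from_le_bytes`/`to_le_bytes`. The `Foobar` layout (a little-endian `u32` followed by a little-endian `i16`) is an assumption made for the example, as are the function names.

```rust
use std::convert::TryInto;

/// Hypothetical record: a little-endian u32 followed by a little-endian i16.
#[derive(Debug, PartialEq)]
pub struct Foobar {
    pub foo: u32,
    pub bar: i16,
}

/// Encoded size in bytes (4 + 2, no padding in the byte format).
pub const FOOBAR_LEN: usize = 6;

/// Decodes a `Foobar` from the start of `bytes` without any `unsafe`.
/// Returns `None` when the slice is too short.
pub fn read_foobar(bytes: &[u8]) -> Option<Foobar> {
    let foo = u32::from_le_bytes(bytes.get(0..4)?.try_into().ok()?);
    let bar = i16::from_le_bytes(bytes.get(4..6)?.try_into().ok()?);
    Some(Foobar { foo, bar })
}

/// Encodes `value` into the first `FOOBAR_LEN` bytes of `out`.
/// Returns `None`, leaving `out` untouched, when the buffer is too short.
pub fn write_foobar(value: &Foobar, out: &mut [u8]) -> Option<()> {
    if out.len() < FOOBAR_LEN {
        return None;
    }
    out[0..4].copy_from_slice(&value.foo.to_le_bytes());
    out[4..6].copy_from_slice(&value.bar.to_le_bytes());
    Some(())
}
```

This copies a few bytes per field instead of reinterpreting memory in place, but it handles alignment and endianness for free and is often fast enough before reaching for `unsafe`.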
Want to zero memory quickly? Use slice_fill for memset optimization, since there is no memory filling for slices in Rust yet.
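Since this note was written, the standard library has gained `slice::fill` (stable since Rust 1.50), so a dependency-free sketch is possible. The wrapper name `zero` is hypothetical.

```rust
/// Zeroes a byte buffer in place with the standard-library `fill` method.
pub fn zero(buf: &mut [u8]) {
    buf.fill(0);
}
```

Whether this compiles down to a `memset` depends on the optimizer, so it is worth checking the assembly for hot paths.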
Also check out the crazy number of crates available under compression - including various interesting radix and trie data structures, and more compression algorithms that one has never heard of.
There is this great article on Towards fearless SIMD, about why SIMD is hard, and how to make it easier, along with pointers to many interesting crates doing SIMD. (There is a built-in crate, `std::simd`, but it is really lacking) (However, packed_simd will soon be merged into it)
Another great article: Learning SIMD with Rust by finding planets. SIMD is really about parallelism. It is better to do multiple operations in a parallel (vertical) fashion, vector on vector, than to do horizontal operations where the different components of a wide register depend on one another.
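The vertical vs horizontal distinction can be shown in plain scalar Rust, without any SIMD crate. This is a sketch of the data-flow shape only: whether the compiler actually vectorizes these loops is not guaranteed, and `LANES` and the function names are hypothetical.

```rust
pub const LANES: usize = 8;

/// Vertical operation: lane `i` of the result depends only on lane `i` of the
/// inputs, so all lanes can be computed independently.
pub fn add_vertical(a: &[f32; LANES], b: &[f32; LANES]) -> [f32; LANES] {
    let mut out = [0.0f32; LANES];
    for i in 0..LANES {
        out[i] = a[i] + b[i];
    }
    out
}

/// Horizontal operation: every lane feeds one result, so the additions form a
/// dependency chain across the register.
pub fn sum_horizontal(v: &[f32; LANES]) -> f32 {
    v.iter().sum()
}

/// Sums a long slice by keeping `LANES` independent partial sums (vertical
/// adds) and doing a single horizontal reduction at the end.
pub fn sum_chunked(data: &[f32]) -> f32 {
    let mut acc = [0.0f32; LANES];
    let chunks = data.chunks_exact(LANES);
    let tail = chunks.remainder();
    for chunk in chunks {
        for i in 0..LANES {
            acc[i] += chunk[i];
        }
    }
    sum_horizontal(&acc) + tail.iter().sum::<f32>()
}
```

`sum_chunked` pays for the horizontal step once per call rather than once per element, which is the pattern the SIMD crates listed here make explicit. Note that it adds floats in a different order than a simple left-to-right sum, so results can differ in the last bits.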
-
ssimd - an effort to bring std::simd/packed_simd to Rust stable, with auto vectorization (meaning auto detect and implement code paths and fallbacks for when SIMD not available!)
-
faster - "SIMD for Humans" -- probably my favorite one, very high level translation of numeric map loops into SIMD
-
fearless_simd, the blog post author's crate. Runtime CPU detection and use of the most optimal code, no need for unsafe, but only focused on f32.
-
SIMDeez - abstracts intrinsic SIMD instructions over different instruction sets & vector widths, runtime detection
-
simd_aligned and simd_aligned_rust - work with SIMD and packed_simd using vectors which have guaranteed alignment
-
aligned - newtype with byte alignment, for stack or heap!
-
https://www.rustsim.org/blog/2020/03/23/simd-aosoa-in-nalgebra/
NOTE: `shuffle` in `packed_simd` is not very fast. Replace with native instructions if possible.