Library that allows to encode any data with rateless codes. Rateless codes is a class of codes that does not have fixed rate value, but it can (in theory) produce limitless number of encoded symbols. One example of such codes is a group of fountain codes. There is even a nice picture with fountain that produce drops. To get a full glass of water there is enough to collect some amount of them (in theory only a glass). In practice (due to imperfection of algorithms), we have to collect a little bit more (overhead)
In the past I have been working with rateless codes - to be specific with fountain codes. I want to have an open source implementation to use it one day for a specific task. I am aware that there may be many other solutions, that is why I want to keep that one as lean as it can be with so little external dependencies as it can be.
Currently any data that can be passed as char* is able to be encoded. There are no padding options, so before choosing symbol size make sure that it will divide whole message size without rest - to equal pieces. I am planning to fix that in the future releases.
It can be seen as en extra feature - each encoding/decoding operation requires to specify seed value used to mix input data. If it is not correct, a message will be decoded but in a way that will produce invalid output. There is no way of telling if this is correct or not (at least there is nothing that can be used for that in general) except using some CRC beforehand and append that to the original message.
If this will be used as ECC for live communication, there is no need for retransmission. Each packet of data will provide some information (with probability near to 100%). There is enough to send one bit of information - "stop", at the point where decoding is successful.
Probability of successful decoding (in case of RLF) is
In case of LT codes, this is more complicated. There is Ideal Soliton Distribution and Robust Soliton Distribution.
Because it is ideal, there are some assumptions that are fulfilled only for ideal generator, but it is very hard to construct, so overhead could be quite large, i.e. ~50%. However this is good start to understand what is going on and how distribution should be designed. In theory there is enough to have one symbol with degree one to start decoding. It will reduce degree of other symbols, and there is high enough probability that another symbol of degree one will be available, and so on. Because this is ideal situation, sometimes there is not enough to have a one symbol with degree one, or even a bunch of them will not allow to perform further decoding.
Large overhead of practical use of ISD is why RSD have been designed to introduce more symbols with lower degree and additionally there is a spike in a probability distribution that will allow for all symbols to be "represented".
There are two parameters that can be used to customize code generation. They can be describe in literature as delta (fail probability) and C (some constant). There are many information and literature that can be used to customize them. Usually for higher C value, required overhead (on average) is also higher but variance is lower. This means that usually overhead is around 3-5%, but there might be messages with overhead as high as 20-30%. When it comes to delta, there is a claim that for certain C,
There are some distributions fine tuned for short messages (small number of packets), as in general, lower the number of packets, bigger the overhead. There might be some distributions to lower bandwidth or encoder/decoder complexity. At the end everybody can create distribution that suits specific needs.
This class of codes assumes that there is a high chance that most of packets can be decoded with small overhead and rest of them, let's say 5% requires a lot of extra data to be transmitted. So to deal with that, another coding with known rate can be used for inner coding and fountain codes are used for outer coding. If we decide to use low complexity inner coding and combine that with low degree LT code (our assumption), we can achieve near linear decoding complexity.
For fixed size channels RLF is quite good solutions as it can give really small overhead (20 extra symbols will give
There are other class of codes like LT codes, or Rapid Tornado codes that lower complexity of encoder and decoder to
- Implement LT with Ideal Soliton Distribution
- Implement Robust Soliton distribution
- Replace degree distribution PRNG by more portable solution
- Add Rapid Tornado Codes
- Implement distribution for Rapid Tornado Codes
- Add external coding to restore missing symbols (might require external library)
- Improve API
- Improve CPU performance (better data structures) - Done for LT
- Dedicated PRNG to avoid encoder/decoder mismatch
- Try to run as much encoding/decoding in parallel
- Investigate possibility to run encoding/decoding on the GPU (CUDA)
- Add option to use padding
- Add option to use simple checksums
This will probably change in the future to be simpler and more intuitive
int seed = 13;
int symbol_length = 8;
// create data container, vector of char, just for convenience
std::vector<char> data;
// fill with own data and pass total size to decoder
total_data_size = 64; // 8 symbols with 8 bytes each
// choose one of available encoders
Codes::Fountain::RLF encoder;
Codes::Fountain::LT encoder(new Codes::Fountain::RobustSolitonDistribution(0.05, 0.03));
encoder.set_seed(seed);
encoder.set_input_data(data.data(), data.size()); // no deep copy
encoder.set_symbol_length(symbol_length);
// generate as many symbols as you need
bool enough_symbols = false;
while(!enough_symbols)
{
auto* symbol = encoder.generate_symbol(); // caller is owner
// check if client already received enough
enough_symbols = client.has_enough();
}
// Use one of the available decoders, make sure LT params are the same
Codes::Fountain::RLF decoder;
Codes::Fountain::LT decoder(new Codes::Fountain::RobustSolitonDistribution(0.05, 0.03));
decoder.set_seed(seed);
decoder.set_input_data_size(total_data_size);
decoder.set_symbol_length(symbol_length);
auto already_decoded = false;
auto symbol_num = 0;
while(already_decoded)
{
auto* ptr = fetch_symbol();
// For RLF you can wait until at least some number of packets is received
// But it might be better to start decoding while data is still transmitted
decoder.feed_symbol(ptr, symbol_num, Memory::Owner, Decoding::Start);
already_decoded = decoder.decode(true);
// inform generator that there is enough symbols to decode
client.set_enough(true);
}
auto* payload = decoder.decoded_buffer();