[Project] Adds readme and license file.

P3KI · Jul 24, 2018 · 508aeaf · 508aeaf
1 parent d6a5d08
commit 508aeaf
Show file tree

Hide file tree

Showing 2 changed files with 292 additions and 0 deletions.
diff --git a/LICENSE-BSD3 b/LICENSE-BSD3
@@ -0,0 +1,29 @@
+Copyright (c) 2018 P3KI GmbH, All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+1. Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission. 
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
+TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
diff --git a/README.md b/README.md
@@ -0,0 +1,263 @@
+# Bendy
+A Rust library for encoding and decoding bencode with enforced canonicalization rules.
+[Bencode](https://en.wikipedia.org/wiki/Bencode) is a simple but very effective encoding
+scheme, originating with the BitTorrent peer-to-peer system.
+
+## Known alternatives:
+This is not the first library to implement Bencode. In fact there's several implementations
+already:
+
+    - Toby Padilla [serde-bencode](https://github.com/toby/serde-bencode)
+    - Arjan Topolovec's [rust-bencode](https://github.com/arjantop/rust-bencode),
+    - Murarth's [bencode](https://github.com/murarth/bencode),
+    - and Jonas Hermsmeier's [rust-bencode](https://github.com/jhermsmeier/rust-bencode)
+
+## Why should I use it?
+So why the extra work adding yet-another-version of a thing that already exists, you
+might ask?
+
+### Enforced correctness
+Implementing a canonical encoding form is straight forward. It comes down to defining
+*a proper way of handling unordered data*. The next step is that bendy's sorting data
+before encoding it using the regular Bencode rules. If your data is already sorted bendy
+will of course skip the extra sorting step to gain efficiency.
+But bendy goes a step further to *ensure correctness*: If you hand the library data that
+you say is already sorted, bendy still does an in-place verification to *ensure that your
+data actually is sorted* and complains if it isn't. In the end, once bendy serialized your
+data, it's Bencode through and through. So it's perfectly compatible with every other
+Bencode library.
+
+Just remember: At this point *only bendy* enforces the correctness of the canonical
+format if you read it back in.
+
+### Canonical representation
+Bendy ensures that any de-serialize / serialize roundtrip produces the exact *same*
+and *correct* binary representation. This is relevant if you're dealing with unordered
+sets or map-structured data where theoretically the order is not relevant, but in practice
+it is. Especially if you want to ensure that cryptographic signatures related to the data
+structure do not get invalidated accidentially.
+
+| Datastructure | Default Impl | Comment                                                                                    |
+|---------------|--------------|--------------------------------------------------------------------------------------------|
+| Vec           | ✔            | Defines own ordering                                                                       |
+| VecDeque      | ✔            | Defines own ordering                                                                       |
+| LinkedList    | ✔            | Defines own ordering                                                                       |
+| HashMap       | ✔            | Ordering missing but content is ordered by key byte representation.                        |
+| BTreeMap      | ✔            | Defines own ordering                                                                       |
+| HashSet       | ✘            | (Unordered) Set handling not yet defined                                                   |
+| BTreeSet      | ✘            | (Unordered) Set handling not yet defined                                                   |
+| BinaryHeap    | ✘            | Ordering missing                                                                           |
+| Iterator      | ~            | `emit_unchecked_list()` allows to emit any iterable but user needs to ensure the ordering. |
+
+**Attention:**
+
+- Since most list types already define their inner ordering, datastructures 
+  like `Vec`, `VecDeque`, and `LinkedList` will not get sorted during encoding!
+
+- There is no default implementation for handling generic iterators.
+  This is by design. `Bendy` cannot tell from an iterator whether the underlying
+  structure requires sorting or not and would have to take data as-is.
+
+## Usage
+
+### Optional: Limitiation of recursive parsing
+
+**What?**
+
+The library allows to set an expected recursion depth limit for de- and encoding.
+If set, the parser will use this value as an upper limit for the validation of any nested
+data structure and abort with an error if an additional level of nesting is detected.
+
+While the encoding limit itself is primarily there to increase the confidence of bendy
+users in their own validation code, the decoding limit should be used to avoid
+parsing of malformed or malicious external data.
+
+ - The encoding limit can be set through the `MAX_DEPTH`
+    field inside any implementation of the `Encodable` trait.
+ - The decoding limit can be set through a call of `with_max_depth`
+    on the `Decoder` object.
+
+**How?**
+
+The nesting level calculation always starts on level zero, is incremented by one when
+the parser enters a nested bencode element (i.e. list, dictionary) and decrement as
+soon as the related element ends. Therefore any values decoded as bencode strings
+or integers do not affect the nesting limit.
+
+### Encoding Bencode
+In most cases it should be enough to pass the object to encode into the `emit`
+function of the encoder as this will serialize any type implementing the
+`Encodable` trait.
+
+Next to `emit` the encoder also provides a list of functions to encode specific
+bencode primitives (i.e. `emit_int` and `emit_str`) and nested bencode elements
+(i.e. `emit_dict` and `emit_list`). These methods should be used during the
+implementation of the `Encodable` trait or if its necessary to output a specific
+non default data type.
+
+**Hint:** As its a very common pattern to serialize a `Vec<u8>` as a byte string
+Bendy exposes the `AsString` wrapper. This can be used to encapsulate any element
+implementing `AsRef<[u8]>` to output itself as a bencode string instead of a list.
+For a usage example see the categorie `Encode a byte string`.
+
+#### Encoding an integer
+
+```rust
+use bendy::encoder::Encoder;
+
+let mut encoder = Encoder::new();
+encoder.emit(1010011010).unwrap();
+
+let output = encoder.get_output().unwrap();
+assert_eq!("i1010011010e", std::str::from_utf8(&output).unwrap());
+```
+
+#### Encode a byte string
+
+```rust
+use bendy::encoder::Encoder;
+
+let mut encoder = Encoder::new();
+encoder.emit("foo").unwrap();
+
+let output = encoder.get_output().unwrap();
+assert_eq!("3:foo", std::str::from_utf8(&output).unwrap());
+```
+
+```rust
+use bendy::encoder::{Encoder, AsString};
+
+let byte_vector = vec![0u8, 1, 2];
+
+let mut encoder = Encoder::new();
+encoder.emit(AsString(byte_vector)).unwrap();
+
+let output = encoder.get_output().unwrap();
+assert_eq!("3:\x00\x01\x02", std::str::from_utf8(&output).unwrap());
+```
+
+#### Encode a dictionary
+
+```rust
+use bendy::{
+    encoder::{Encodable, SingleItemEncoder, Encoder},
+    Error as BencodeError,
+};
+
+struct Dict{
+    bar: String,
+}
+
+impl Encodable for Dict{
+    const MAX_DEPTH: usize = 1;
+
+    fn encode(&self, encoder: SingleItemEncoder) -> Result<(), BencodeError> {
+        encoder.emit_dict(|mut e| {
+            e.emit_pair(b"bar", &self.bar)?;
+            Ok(())
+        })
+    }
+}
+
+fn main() {
+    let dict = Dict { bar: "baz".to_owned() };
+
+    let mut encoder = Encoder::new();
+    encoder.emit(dict).unwrap();
+
+    let output = encoder.get_output().unwrap();
+    assert_eq!(
+        "d3:bar3:baze",
+        std::str::from_utf8(&output).unwrap()
+    );
+}
+```
+
+#### Encode a list
+
+```rust
+use bendy::encoder::{Encoder, List};
+
+let list = vec!["foo", "bar", "baz"];
+
+let mut encoder = Encoder::new();
+encoder.emit(List(&list)).unwrap();
+
+let output = encoder.get_output().unwrap();
+assert_eq!(
+    "l3:foo3:bar3:baze",
+    std::str::from_utf8(&output).unwrap()
+);
+```
+
+### Decoding Bencode
+
+#### Decode an integer
+
+```rust
+use bendy::decoder::Decoder;
+
+let mut decoder = Decoder::new(b"i1010011010e");
+let object = decoder.next_object().unwrap().unwrap();
+
+let number = object.integer_str_or_err(-1).unwrap();
+assert_eq!("1010011010", number);
+```
+
+#### Decode a byte string
+
+```rust
+use bendy::decoder::Decoder;
+
+let mut decoder = Decoder::new(b"11:foo bar baz");
+let object = decoder.next_object().unwrap().unwrap();
+
+let bytes =  object.bytes_or_err(-1).unwrap();
+assert_eq!("foo bar baz", std::str::from_utf8(&bytes).unwrap());
+```
+
+#### Decode a dictionary
+
+```rust
+use bendy::decoder::{Decoder, Object};
+
+let mut decoder = Decoder::new(b"d3:foo3:bare");
+let object = decoder.next_object().unwrap();
+
+if let Some(Object::Dict(mut dict_decoder)) = object {
+
+    if let (b"foo",value) = dict_decoder.next_pair().unwrap().unwrap() {
+        let bytes = value.bytes_or_err(-1).unwrap();
+        assert_eq!("bar", std::str::from_utf8(&bytes).unwrap());
+    }
+}
+```
+
+#### Decode a list
+
+```rust
+use bendy::decoder::{Decoder, Object};
+
+let mut decoder = Decoder::new(b"l3:foo3:bar3:baze");
+let object = decoder.next_object().unwrap();
+let mut result : Vec<&str> = vec![];
+
+if let Some(Object::List(mut list_decoder)) = object {
+
+    while let Some(list_element) = list_decoder.next_object().unwrap(){
+        let bytes =  list_element.bytes_or_err(-1).unwrap();
+        result.push(std::str::from_utf8(&bytes).unwrap());
+    }
+}
+
+assert_eq!(["foo", "bar", "baz"][..], result[..]);
+```
+
+## Usage of unsafe code
+The parser wouldn't require any unsafe code to work but it still contains a single unsafe call
+to `str::from_utf8_unchecked`. This call is used to avoid a duplicated UTF-8 check when the
+parser converts the bytes representing an incoming integer into a `&str` after its successful
+validation.
+
+*Disclaimer: Further unsafe code may be introduced through the dependency on `failure` and
+`failure-derive`.*