Merge pull request #307 from brendanzab/formats/link-formats
Link formats
brendanzab authored Mar 7, 2022
2 parents 37c0ea5 + ff540b6 commit 8e67030
Showing 11 changed files with 397 additions and 132 deletions.
45 changes: 42 additions & 3 deletions doc/reference.md
Expand Up @@ -24,6 +24,7 @@ elaboration, and core language is forthcoming.
- [Overlap formats](#overlap-formats)
- [Number formats](#number-formats)
- [Array formats](#array-formats)
- [Link formats](#link-formats)
- [Stream position formats](#stream-position-formats)
- [Succeed format](#succeed-format)
- [Fail format](#fail-format)
Expand All @@ -42,6 +43,7 @@ elaboration, and core language is forthcoming.
- [Array types](#array-types)
- [Array literals](#array-literals)
- [Positions](#positions)
- [References](#references)
- [Void](#void)

## Structure
Expand Down Expand Up @@ -69,17 +71,17 @@ definition during evaluation.

If no binding is found, names can refer to one of the built-in primitives:

- `Format`
- `Format`, `Repr`
- `u8`, `u16be`, `u16le`, `u32be`, `u32le`, `u64be`, `u64le`
- `s8`, `s16be`, `s16le`, `s32be`, `s32le`, `s64be`, `s64le`
- `f32be`, `f32le`, `f64be`, `f64le`
- `array8`, `array16`, `array32`, `array64`
- `link8`, `link16`, `link32`, `link64`
- `stream_pos`
- `succeed`, `fail`
- `Repr`
- `U8`, `U16`, `U32`, `U64`, `S8`, `S16`, `S32`, `S64`, `F32`, `F64`
- `Array8`, `Array16`, `Array32`, `Array64`
- `Pos`
- `Pos`, `Ref`
- `Void`

### Let expressions
Expand Down Expand Up @@ -342,6 +344,31 @@ of the host array types.
| `array32 len format` | `Array32 len (Repr format)` |
| `array64 len format` | `Array64 len (Repr format)` |

### Link formats

Link formats allow for references to other parts of a binary stream to be
registered during parsing. They take a base [position](#positions), an offset
from that position, and the format to expect at the linked position (the base
position plus the offset).

There is a different link format for each unsigned integer offset type:

- `link8 : Pos -> U8 -> Format -> Format`
- `link16 : Pos -> U16 -> Format -> Format`
- `link32 : Pos -> U32 -> Format -> Format`
- `link64 : Pos -> U64 -> Format -> Format`

#### Representation of link formats

Link formats are [represented](#format-representations) as typed
[references](#references) to other parts of the binary stream.

| format | `Repr` format |
| ---------------------------- | --------------------------- |
| `link8 pos offset format` | `Ref (Repr format)` |
| `link16 pos offset format` | `Ref (Repr format)` |
| `link32 pos offset format` | `Ref (Repr format)` |
| `link64 pos offset format` | `Ref (Repr format)` |
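
As a rough illustration of how a link target might be computed, the sketch below adds the offset to the base position with overflow checking, mirroring the `u64::checked_add` call in `read_link` from this commit. The `resolve_link` helper is hypothetical and not part of Fathom, and positions are modelled as plain `u64` values.

```rust
/// Hypothetical helper (not part of Fathom): compute the absolute stream
/// position that a link refers to, or `None` if the addition overflows.
///
/// `offset` may be any of the unsigned offset widths used by `link8`
/// through `link64`, since each of them converts losslessly to `u64`.
fn resolve_link(pos: u64, offset: impl Into<u64>) -> Option<u64> {
    pos.checked_add(offset.into())
}
```

Returning `None` on overflow corresponds to the "overflowing link" error reported by the parser, rather than silently wrapping around to an unrelated position.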

### Stream position formats

The stream position format is interpreted as the current stream position during
Expand Down Expand Up @@ -376,6 +403,8 @@ parsing.

#### Representation of fail format

The fail format should never produce a term, so it is represented with [void](#void).

| format | `Repr` format |
| ------ | ------------- |
| `fail` | `Void` |
Expand Down Expand Up @@ -581,6 +610,16 @@ Stream positions are represented as an abstract datatype:
Positions are usually encountered as a result of parsing a [stream position
format](#stream-position-formats).

## References

References are like [stream positions](#positions), but they also carry the
type of the data expected at that position:

- `Ref : Type -> Type`

References are usually encountered as a result of parsing a [link
format](#link-formats).
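
One way to picture `Ref` in host-language terms is as a stream position tagged with the type expected there. The sketch below is only an analogy under that assumption: the `Ref<T>` struct, its `new` and `lookup` methods, and the table of parsed values keyed by position are all hypothetical and are not how Fathom implements references.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;

/// Hypothetical model of `Ref A`: a stream position together with the
/// type `T` of the data expected at that position.
struct Ref<T> {
    pos: u64,
    _expected: PhantomData<T>,
}

impl<T> Ref<T> {
    /// Create a reference to the given stream position.
    fn new(pos: u64) -> Self {
        Ref { pos, _expected: PhantomData }
    }

    /// Look up the referenced value in a table of already-parsed data,
    /// keyed by stream position. Returns `None` if nothing was parsed there.
    fn lookup<'a>(&self, parsed: &'a HashMap<u64, T>) -> Option<&'a T> {
        parsed.get(&self.pos)
    }
}
```

The type parameter is what distinguishes a reference from a bare position: the position alone says where to look, while the parameter says what to expect there.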

## Void

The void type is used to mark terms that must never be constructed:
Expand Down
28 changes: 24 additions & 4 deletions fathom/src/core.rs
Expand Up @@ -193,6 +193,8 @@ pub enum Prim {
Array64Type,
/// Type of stream positions.
PosType,
/// Type of stream references.
RefType,

/// Type of format descriptions.
FormatType,
Expand Down Expand Up @@ -236,16 +238,28 @@ pub enum Prim {
FormatF64Be,
/// 64-bit, IEEE-754 floating point formats (little-endian).
FormatF64Le,
/// Array formats, with 8-bit indices.
/// Array formats, with unsigned 8-bit indices.
FormatArray8,
/// Array formats, with 16-bit indices.
/// Array formats, with unsigned 16-bit indices.
FormatArray16,
/// Array formats, with 32-bit indices.
/// Array formats, with unsigned 32-bit indices.
FormatArray32,
/// Array formats, with 64-bit indices.
/// Array formats, with unsigned 64-bit indices.
FormatArray64,
/// A format which returns the current position in the input stream.
FormatStreamPos,
/// A format that links to another location in the binary data stream,
/// relative to a base position and an unsigned 8-bit offset.
FormatLink8,
/// A format that links to another location in the binary data stream,
/// relative to a base position and an unsigned 16-bit offset.
FormatLink16,
/// A format that links to another location in the binary data stream,
/// relative to a base position and an unsigned 32-bit offset.
FormatLink32,
/// A format that links to another location in the binary data stream,
/// relative to a base position and an unsigned 64-bit offset.
FormatLink64,
/// Format representations.
FormatRepr,

Expand Down Expand Up @@ -273,6 +287,7 @@ impl Prim {
Prim::Array32Type => "Array32",
Prim::Array64Type => "Array64",
Prim::PosType => "Pos",
Prim::RefType => "Ref",

Prim::FormatType => "Format",
Prim::FormatSucceed => "succeed",
Expand All @@ -299,6 +314,10 @@ impl Prim {
Prim::FormatArray16 => "array16",
Prim::FormatArray32 => "array32",
Prim::FormatArray64 => "array64",
Prim::FormatLink8 => "link8",
Prim::FormatLink16 => "link16",
Prim::FormatLink32 => "link32",
Prim::FormatLink64 => "link64",
Prim::FormatStreamPos => "stream_pos",
Prim::FormatRepr => "Repr",

Expand All @@ -322,6 +341,7 @@ pub enum Const {
F32(f32),
F64(f64),
Pos(u64),
Ref(u64),
}

#[cfg(test)]
Expand Down
127 changes: 115 additions & 12 deletions fathom/src/core/binary.rs
@@ -1,27 +1,91 @@
//! Binary semantics of the data description language
use std::collections::HashMap;
use std::io::{self, Read, Seek, SeekFrom};
use std::sync::Arc;

use crate::core::semantics::{self, ArcValue, Head, Value};
use crate::core::{Const, Prim};
use crate::env::SliceEnv;
use crate::env::{EnvLen, SliceEnv};

pub struct Context<'arena, 'env> {
flexible_exprs: &'env SliceEnv<Option<ArcValue<'arena>>>,
pending_formats: Vec<(u64, ArcValue<'arena>)>,
}

pub struct RefData<'arena> {
pub r#type: ArcValue<'arena>,
pub expr: ArcValue<'arena>,
}

impl<'arena, 'env> Context<'arena, 'env> {
pub fn new(flexible_exprs: &'env SliceEnv<Option<ArcValue<'arena>>>) -> Context<'arena, 'env> {
Context { flexible_exprs }
Context {
flexible_exprs,
pending_formats: Vec::new(),
}
}

fn elim_context(&self) -> semantics::ElimContext<'arena, '_> {
semantics::ElimContext::new(self.flexible_exprs)
}

pub fn read(
&self,
fn conversion_context(&self) -> semantics::ConversionContext<'arena, '_> {
semantics::ConversionContext::new(EnvLen::new(), self.flexible_exprs)
}

// TODO: allow refs to be streamed
pub fn read_entrypoint(
&mut self,
reader: &mut dyn SeekRead,
format: ArcValue<'arena>,
) -> io::Result<HashMap<u64, RefData<'arena>>> {
let initial_pos = reader.stream_position()?;
let mut refs = HashMap::<_, RefData<'_>>::new();

// Parse the entrypoint from the start of the binary data
self.pending_formats.push((0, format));

// NOTE: This could lead to non-termination if we aren't careful!
while let Some((pos, format)) = self.pending_formats.pop() {
use std::collections::hash_map::Entry;

let format_repr = self.elim_context().apply_repr(&format);

match refs.entry(pos) {
Entry::Occupied(entry) => {
let RefData { r#type, .. } = entry.get();
// Ensure that the format's representation type matches the
// type of the stored reference.
if !self.conversion_context().is_equal(r#type, &format_repr) {
return Err(io::Error::new(
io::ErrorKind::Other,
"ref is occupied by an incompatible type",
));
}
}
Entry::Vacant(entry) => {
// Seek to the current ref location
reader.seek(SeekFrom::Start(pos))?;
// Parse the data at that location
let expr = self.read_format(reader, &format)?;
// Record the data in the `refs` hashmap
entry.insert(RefData {
r#type: format_repr,
expr,
});
}
}
}

// Reset reader back to the start
reader.seek(SeekFrom::Start(initial_pos))?;

Ok(refs)
}

fn read_format(
&mut self,
reader: &mut dyn SeekRead,
format: &ArcValue<'arena>,
) -> io::Result<ArcValue<'arena>> {
Expand Down Expand Up @@ -52,8 +116,23 @@ impl<'arena, 'env> Context<'arena, 'env> {
(Prim::FormatArray16, [Fun(len), Fun(elem)]) => self.read_array(reader, len, elem),
(Prim::FormatArray32, [Fun(len), Fun(elem)]) => self.read_array(reader, len, elem),
(Prim::FormatArray64, [Fun(len), Fun(elem)]) => self.read_array(reader, len, elem),
(Prim::FormatLink8, [Fun(pos), Fun(offset), Fun(elem)]) => {
self.read_link(pos, offset, elem)
}
(Prim::FormatLink16, [Fun(pos), Fun(offset), Fun(elem)]) => {
self.read_link(pos, offset, elem)
}
(Prim::FormatLink32, [Fun(pos), Fun(offset), Fun(elem)]) => {
self.read_link(pos, offset, elem)
}
(Prim::FormatLink64, [Fun(pos), Fun(offset), Fun(elem)]) => {
self.read_link(pos, offset, elem)
}
(Prim::FormatStreamPos, []) => read_stream_pos(reader),
_ => return Err(io::Error::new(io::ErrorKind::Other, "invalid format")),
(Prim::FormatFail, []) => {
Err(io::Error::new(io::ErrorKind::Other, "parse failure"))
}
_ => Err(io::Error::new(io::ErrorKind::Other, "invalid format")),
},
Value::FormatRecord(labels, formats) => {
let mut formats = formats.clone();
Expand All @@ -62,7 +141,7 @@ impl<'arena, 'env> Context<'arena, 'env> {
while let Some((format, next_formats)) =
self.elim_context().split_telescope(formats)
{
let expr = self.read(reader, &format)?;
let expr = self.read_format(reader, &format)?;
exprs.push(expr.clone());
formats = next_formats(expr);
}
Expand All @@ -82,7 +161,7 @@ impl<'arena, 'env> Context<'arena, 'env> {
// Reset the stream to the start
reader.seek(SeekFrom::Start(initial_pos))?;

let expr = self.read(reader, &format)?;
let expr = self.read_format(reader, &format)?;
exprs.push(expr.clone());
formats = next_formats(expr);

Expand All @@ -104,14 +183,12 @@ impl<'arena, 'env> Context<'arena, 'env> {
| Value::RecordType(_, _)
| Value::RecordIntro(_, _)
| Value::ArrayIntro(_)
| Value::Const(_) => {
return Err(io::Error::new(io::ErrorKind::Other, "invalid format"))
}
| Value::Const(_) => Err(io::Error::new(io::ErrorKind::Other, "invalid format")),
}
}

fn read_array(
&self,
&mut self,
reader: &mut dyn SeekRead,
len: &ArcValue<'arena>,
elem_format: &ArcValue<'arena>,
Expand All @@ -125,12 +202,38 @@ impl<'arena, 'env> Context<'arena, 'env> {
};

for _ in 0..len {
let expr = self.read(reader, elem_format)?;
let expr = self.read_format(reader, elem_format)?;
elem_exprs.push(expr);
}

Ok(Arc::new(Value::ArrayIntro(elem_exprs)))
}

pub fn read_link(
&mut self,
pos: &ArcValue<'arena>,
offset: &ArcValue<'arena>,
elem_format: &ArcValue<'arena>,
) -> io::Result<ArcValue<'arena>> {
let pos = match self.elim_context().force(pos).as_ref() {
Value::Const(Const::Pos(pos)) => *pos,
_ => return Err(io::Error::new(io::ErrorKind::Other, "invalid link pos")),
};
let offset = match self.elim_context().force(offset).as_ref() {
Value::Const(Const::U8(len)) => *len as u64,
Value::Const(Const::U16(len)) => *len as u64,
Value::Const(Const::U32(len)) => *len as u64,
Value::Const(Const::U64(len)) => *len as u64,
_ => return Err(io::Error::new(io::ErrorKind::Other, "invalid link offset")),
};

let r#ref = u64::checked_add(pos, offset)
.ok_or_else(|| io::Error::new(io::ErrorKind::Other, "overflowing link"))?;

self.pending_formats.push((r#ref, elem_format.clone()));

Ok(Arc::new(Value::Const(Const::Ref(r#ref))))
}
}

pub trait SeekRead: Seek + Read {}
Expand Down
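
The work-list strategy used by `read_entrypoint` can be sketched in a self-contained toy form. Everything below is a simplified assumption rather than Fathom's actual code: the two-variant `Format` enum, the `Value` enum, and the `read_byte` helper are hypothetical, link offsets are read directly from the data as a single byte relative to position 0, and plain format equality stands in for the comparison of representation types.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::convert::TryFrom;

/// Toy formats (hypothetical, far simpler than Fathom's).
#[derive(Clone, Debug, PartialEq)]
enum Format {
    /// A single unsigned byte.
    U8,
    /// One byte holding an offset from position 0, linking to the inner format.
    Link(Box<Format>),
}

/// Toy parsed values.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    U8(u8),
    Ref(u64),
}

/// Read the byte at `pos`, failing if it lies outside the data.
fn read_byte(data: &[u8], pos: u64) -> Result<u8, String> {
    usize::try_from(pos)
        .ok()
        .and_then(|index| data.get(index).copied())
        .ok_or_else(|| format!("unexpected end of data at {}", pos))
}

/// Parse `format` at position 0, then keep parsing every linked position
/// until no pending formats remain.
fn read_entrypoint(
    data: &[u8],
    format: Format,
) -> Result<HashMap<u64, (Format, Value)>, String> {
    let mut refs: HashMap<u64, (Format, Value)> = HashMap::new();
    let mut pending = vec![(0u64, format)];

    while let Some((pos, format)) = pending.pop() {
        match refs.entry(pos) {
            // Already parsed: only check that the expectations agree.
            Entry::Occupied(entry) => {
                let (seen, _) = entry.get();
                if *seen != format {
                    return Err(format!("ref at {} has an incompatible format", pos));
                }
            }
            // Not yet parsed: read the data and record it.
            Entry::Vacant(entry) => {
                let byte = read_byte(data, pos)?;
                let value = match &format {
                    Format::U8 => Value::U8(byte),
                    Format::Link(target) => {
                        // Defer parsing of the target by queueing it.
                        let target_pos = u64::from(byte);
                        pending.push((target_pos, (**target).clone()));
                        Value::Ref(target_pos)
                    }
                };
                entry.insert((format, value));
            }
        }
    }

    Ok(refs)
}
```

Because each position is parsed at most once and revisits only compare formats, two links to the same position share a single entry in the result, and a second link that expects something different is reported as an error.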