markdownlint
CGMossa committed Apr 6, 2024
1 parent f370013 commit 3380292
Showing 2 changed files with 41 additions and 29 deletions.
57 changes: 31 additions & 26 deletions design_specifications/marshaling_rules.md
@@ -3,8 +3,7 @@
This document outlines conversion and validation rules between `R` and `Rust` types.
The aim is to maintain type safety on the `Rust` side without sacrificing usability on the `R` side.


# Conversion and validation

`extendr` takes the following problems into account when converting an `R` object to a `Rust` object:

@@ -17,41 +16,44 @@ The aim is to maintain type safety on the `Rust` side without sacrificing usabil
- Type conversion (applicable to `Vec<T>`)
- `logical()`, `raw()`, `character()` are treated as-is. If there is a `Rust`-`R` type mismatch, the `extendr` wrapper `panic!`s
- `integer()` can be passed to functions that expect `double()` or `complex()`. `extendr` performs a type cast, allocating memory for the new vector
- `double()` can be safely passed as `complex()`
- `double()` can sometimes be passed as `integer()`, if its values are representable by `i32`
- `complex()` can sometimes be passed as `double()` or `integer()` (see reasoning above)
- Whenever a numeric type mismatch happens, an allocation is guaranteed to occur
- An opaque iterator that accepts one of `integer()`, `double()` (and maybe `complex()`) handles both plain vectors and ALTREPs, does not allocate, and offloads all validation and type checks onto the user
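
The numeric coercion rules above can be sketched in plain Rust. This is only an illustration under stated assumptions, not `extendr`'s implementation: `int_to_double` and `double_to_int` are hypothetical helper names, `complex()` is left out, and `NA` propagation is ignored apart from rejecting the reserved `i32::MIN` value.

```rust
/// Widening `integer()` -> `double()`: always succeeds, but allocates a new vector.
/// (Hypothetical helper, not part of `extendr`.)
fn int_to_double(xs: &[i32]) -> Vec<f64> {
    xs.iter().map(|&x| f64::from(x)).collect()
}

/// Narrowing `double()` -> `integer()`: succeeds only if every value is a whole
/// number that fits in `i32`. `i32::MIN` is excluded because R reserves it for `NA`.
/// (Hypothetical helper, not part of `extendr`.)
fn double_to_int(xs: &[f64]) -> Option<Vec<i32>> {
    xs.iter()
        .map(|&x| {
            let in_range = x > f64::from(i32::MIN) && x <= f64::from(i32::MAX);
            if in_range && x.fract() == 0.0 {
                Some(x as i32)
            } else {
                None // fractional, out of range, NaN or infinite
            }
        })
        .collect()
}

fn main() {
    assert_eq!(int_to_double(&[1, 2]), vec![1.0, 2.0]);
    assert_eq!(double_to_int(&[1.0, -3.0]), Some(vec![1, -3]));
    assert_eq!(double_to_int(&[1.5]), None);
    assert_eq!(double_to_int(&[3e9]), None);
}
```

Both directions build a fresh `Vec`, which is the allocation cost the list above refers to.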


Here is a list of examples:

- `Vec<i32>` triggers `NA` validation, ALTREP expansion, and type coercion if compatible (so `1.0` or `1.0 + 0i` convert to `1L`). Heavy on overhead and memory allocation, good for prototypes and testing things out.

- `Integers` is an opaque wrapper of either `&[Rint]` or some `AltIntegers`. It can be used to obtain an iterator whose items are of type `Rint`, providing correct `NA` handling. This is the preferred way to handle vectors, similar to that of `{cpp11}`.

- `Numerics` is a discriminated union of `Integers | Doubles`. It accepts all numeric inputs, but leaves it up to the user to decipher what exactly was received from `R`. No runtime validation, no extra allocation, and ALTREPs remain unexpanded.
- `ComplexNumerics` represents either `Complexes` or `Numerics`


----------------------------------------------------------------------------

# Underlying vector types

## Terminology

A 'vector' is a primitive type used in `R`. Vectors are designed to behave as strongly typed 1D arrays of objects. There are two implementations of vector types: one is essentially a pointer to a contiguous block of memory of known length (plus some additional metadata); the other is an iterator designed to store rules for generating sequences of elements (instead of storing potentially very large vectors in memory). Array-based vectors shall be referred to as 'plain old data' (POD), and iterator-based ones as ALTREP.

`R` recognizes the following vector types that are directly exposed to the user:

- `logical (i32)`
- `integer (i32)`
- `real (f64)`
- `complex (f64, f64)`
- `raw (u8)`
- `character (usize)` (collection of pointers to character arrays)

Each vector can contain special `NA` values. None of the primitive types has built-in support for `NA` (including `f64`, which has a notion of `NaN`, a different thing), so `R` reserves one value from the range of allowed values as `NA`. For instance, `NA_integer_` is `i32::MIN`, which is `1i32 << 31 = -2147483648`. As a result, `-2147483648` is not a valid `integer` value in `R`, and `x <- -2147483648L` does not produce an integer.
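
The sentinel encoding can be checked directly in Rust. The sketch below is illustrative only: `NA_INTEGER` and `decode` are hypothetical names introduced here, not `extendr` items.

```rust
/// R stores `NA_integer_` as the smallest `i32` (hypothetical constant name).
const NA_INTEGER: i32 = i32::MIN;

/// Interpret a raw `i32` coming from R: `None` stands for `NA`.
/// (Hypothetical helper, not part of `extendr`.)
fn decode(raw: i32) -> Option<i32> {
    if raw == NA_INTEGER {
        None
    } else {
        Some(raw)
    }
}

fn main() {
    // The sentinel is the bit pattern with only the sign bit set.
    assert_eq!(NA_INTEGER, 1i32 << 31);
    assert_eq!(NA_INTEGER, -2147483648);
    assert_eq!(decode(42), Some(42));
    assert_eq!(decode(NA_INTEGER), None);
    // `NaN` is a different concept: it compares unequal even to itself.
    assert!(f64::NAN != f64::NAN);
}
```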

## `Rust` counterparts

`R` objects passed to `Rust` require additional validation and transformation. Let us define the following types:

- `struct Rint(i32)`
- `struct Rbool(i32)`
- `struct Rfloat(f64)`
@@ -61,31 +63,35 @@ Each vector can contain special `NA` values. None of the primitive types have bu

Note: `complex` is an `(f64, f64)` struct. Support for implementations such as `num::complex::Complex` can be enabled using feature gates (see the example of arrays and `ndarray`).

Each of these types is binary compatible with its underlying type. An array of, say, `i32`, represented by a `*const i32` pointer and a length, can be viewed as a `*const Rint` of the same length.
This can be the preferred solution when dealing with `R` plain vectors.
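
A minimal sketch of this zero-copy view, assuming `Rint` is declared `#[repr(transparent)]` (the text only says "binary compatible", so the attribute is an assumption). The helper name `as_rints` is hypothetical.

```rust
/// Minimal wrapper with the same memory layout as `i32`.
/// `#[repr(transparent)]` is an assumption made for this sketch.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rint(i32);

/// View a slice of `i32` as a slice of `Rint` without copying.
/// (Hypothetical helper, not part of `extendr`.)
fn as_rints(xs: &[i32]) -> &[Rint] {
    // SAFETY: `Rint` is `#[repr(transparent)]` over `i32`, so both types have
    // identical size and alignment, and every `i32` bit pattern is a valid `Rint`.
    unsafe { std::slice::from_raw_parts(xs.as_ptr().cast::<Rint>(), xs.len()) }
}

fn main() {
    let raw = [1, i32::MIN, 3];
    let view = as_rints(&raw);
    assert_eq!(view.len(), 3);
    // The `NA` slot is still in place; it is simply typed as `Rint` now.
    assert_eq!(view[1], Rint(i32::MIN));
    assert_eq!(std::mem::size_of::<Rint>(), std::mem::size_of::<i32>());
}
```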

For each supported primitive type `T`, `Rt` denotes its minimal wrapper. E.g., for `T = i32`, `Rt = Rint`.
`extendr` prefers `Rt` over `T` types. Parameters that use `T`-derived types will require runtime `NA` validation, introducing implicit overhead.

Type conversion traits for `Rt` are:

- `Into<Option<T>>` (this is always a valid conversion),
- `TryInto<T>`, errors on `NA`,
- `TryFrom<T>`, errors if the provided argument equals the value reserved for `NA`,
- `TryFrom<Option<T>>`, errors if the provided argument is `Some(NA)`, i.e. wraps the value reserved for `NA`.

These conversions can be grouped in a trait `Rtype<T>`, which exposes conversions `Rt` <--> `T` mentioned above, as well as some `is_na() -> bool` method (and perhaps some other useful ones).
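
The four conversions listed above could look as follows for `Rint`. This is a sketch under assumptions: the error type (`&'static str`) and the inherent `is_na` method are placeholders chosen here, and the proposed `Rtype<T>` trait itself is not modelled.

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Rint(i32);

/// Value reserved for `NA` in R integer vectors.
const NA: i32 = i32::MIN;

impl Rint {
    fn is_na(self) -> bool {
        self.0 == NA
    }
}

// `Rint` -> `Option<i32>`: always valid, `NA` becomes `None`.
impl From<Rint> for Option<i32> {
    fn from(x: Rint) -> Self {
        if x.is_na() { None } else { Some(x.0) }
    }
}

// `Rint` -> `i32`: errors on `NA`.
impl TryFrom<Rint> for i32 {
    type Error = &'static str; // placeholder error type
    fn try_from(x: Rint) -> Result<Self, Self::Error> {
        Option::<i32>::from(x).ok_or("value is NA")
    }
}

// `i32` -> `Rint`: errors if the argument equals the value reserved for `NA`.
impl TryFrom<i32> for Rint {
    type Error = &'static str;
    fn try_from(x: i32) -> Result<Self, Self::Error> {
        if x == NA { Err("value collides with NA") } else { Ok(Rint(x)) }
    }
}

// `Option<i32>` -> `Rint`: `None` becomes `NA`, `Some(NA)` is rejected.
impl TryFrom<Option<i32>> for Rint {
    type Error = &'static str;
    fn try_from(x: Option<i32>) -> Result<Self, Self::Error> {
        match x {
            None => Ok(Rint(NA)),
            Some(v) => Rint::try_from(v),
        }
    }
}

fn main() {
    assert_eq!(Option::<i32>::from(Rint(7)), Some(7));
    assert_eq!(Option::<i32>::from(Rint(NA)), None);
    assert_eq!(i32::try_from(Rint(7)), Ok(7));
    assert!(i32::try_from(Rint(NA)).is_err());
    assert!(Rint::try_from(NA).is_err());
    assert_eq!(Rint::try_from(None::<i32>), Ok(Rint(NA)));
    assert!(Rint::try_from(Some(NA)).is_err());
}
```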

A limited number of binary-incompatible type conversions are also allowed. These rules are required to support common usage scenarios on the `R` side.

For `Rint` the following is allowed:

- `Into<Rfloat>`, this is always correct (every `i32` is exactly representable as `f64`)
- `Into<Rcomplex>`, for the same reason

For `Rfloat`:

- `Into<Rcomplex>` (`Real(f64)` values are within `(Real(f64), Imaginary(f64))`)
- `TryInto<Rint>`; this conversion succeeds only when the `f64` can be represented exactly as `i32` (lossless), e.g. `1.0f64` converts to `1i32`

For `Rcomplex` (see reasoning above):

- `TryInto<Rfloat>`
- `TryInto<Rint>`
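
The widening and narrowing rules for the scalar wrappers can be sketched as trait implementations. Several details are assumptions made for illustration: the tuple layout of `Rcomplex`, the `&'static str` error type, and the omission of `NA` propagation (a real implementation would have to map `NA` to `NA` rather than convert the sentinel numerically).

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Rint(i32);
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rfloat(f64);
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rcomplex(f64, f64); // (real, imaginary); layout assumed for this sketch

// Widening conversions never fail.
impl From<Rint> for Rfloat {
    fn from(x: Rint) -> Self {
        Rfloat(f64::from(x.0))
    }
}
impl From<Rfloat> for Rcomplex {
    fn from(x: Rfloat) -> Self {
        Rcomplex(x.0, 0.0)
    }
}

// Narrowing succeeds only when no information is lost.
impl TryFrom<Rfloat> for Rint {
    type Error = &'static str;
    fn try_from(x: Rfloat) -> Result<Self, Self::Error> {
        let v = x.0;
        // `i32::MIN` is excluded because it is reserved for `NA`.
        if v.fract() == 0.0 && v > f64::from(i32::MIN) && v <= f64::from(i32::MAX) {
            Ok(Rint(v as i32))
        } else {
            Err("not exactly representable as i32")
        }
    }
}

// A complex number is a valid double only if its imaginary part is zero.
impl TryFrom<Rcomplex> for Rfloat {
    type Error = &'static str;
    fn try_from(z: Rcomplex) -> Result<Self, Self::Error> {
        if z.1 == 0.0 { Ok(Rfloat(z.0)) } else { Err("non-zero imaginary part") }
    }
}

// Complex -> integer chains the two narrowing steps.
impl TryFrom<Rcomplex> for Rint {
    type Error = &'static str;
    fn try_from(z: Rcomplex) -> Result<Self, Self::Error> {
        Rint::try_from(Rfloat::try_from(z)?)
    }
}

fn main() {
    assert_eq!(Rfloat::from(Rint(3)), Rfloat(3.0));
    assert_eq!(Rcomplex::from(Rfloat(2.5)), Rcomplex(2.5, 0.0));
    assert_eq!(Rint::try_from(Rfloat(1.0)), Ok(Rint(1)));
    assert!(Rint::try_from(Rfloat(1.5)).is_err());
    assert_eq!(Rint::try_from(Rcomplex(2.0, 0.0)), Ok(Rint(2)));
    assert!(Rfloat::try_from(Rcomplex(2.0, 1.0)).is_err());
}
```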

@@ -94,59 +100,60 @@ Other primitive types are treated as-is and any type conversion should be perfor
### ALTREP

A separate public API for ALTREPs is not needed, since there are no real use cases for a method that accepts only ALTREPs. Instead, expose the following wrapper types:

- `Integers`
- `Logicals`
- `Doubles`
- `Raws`
- `Complexes`
- `Strings`


These opaque types wrap either plain data vectors (e.g., storing pointer & length) or ALTREPs.
They should implement `std::iter::Iterator<Item = Rt>` to support `NA` validation, as well as `std::ops::Index<usize, Output = Rt>`.

Other suggested methods:

- `len() -> usize` as both plain data and ALTREP know their size,
- `is_altrep() -> bool` to avoid unnecessary random access in case of ALTREP

The iterators are enriched with the following discriminated unions:

- `Numerics = Integers | Doubles`
- `ComplexNumerics = Integers | Doubles | Complexes`
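
One possible shape for such a wrapper and union is sketched below. It is a simplification under assumptions: `AltIntegers` is modelled as a compact arithmetic sequence rather than a real ALTREP object, `Doubles` is reduced to a plain slice, iteration goes through a hypothetical `iter` method instead of implementing `Iterator` on the wrapper itself, and `kind` is a hypothetical helper.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rint(i32);

/// Stand-in for an ALTREP integer vector: `from, from + 1, ...` (assumption).
struct AltIntegers {
    from: i32,
    len: usize,
}

/// Opaque wrapper over either plain data or an ALTREP.
enum Integers<'a> {
    Plain(&'a [Rint]),
    Alt(AltIntegers),
}

impl<'a> Integers<'a> {
    /// Both representations know their length.
    fn len(&self) -> usize {
        match self {
            Integers::Plain(xs) => xs.len(),
            Integers::Alt(alt) => alt.len,
        }
    }

    fn is_altrep(&self) -> bool {
        matches!(self, Integers::Alt(_))
    }

    /// Iterate without materializing the ALTREP into a vector.
    fn iter<'s>(&'s self) -> Box<dyn Iterator<Item = Rint> + 's> {
        match self {
            Integers::Plain(xs) => Box::new(xs.iter().copied()),
            Integers::Alt(alt) => {
                let from = alt.from;
                Box::new((0..alt.len).map(move |i| Rint(from + i as i32)))
            }
        }
    }
}

/// `Numerics = Integers | Doubles`; `Doubles` is reduced to a slice here.
enum Numerics<'a> {
    Integers(Integers<'a>),
    Doubles(&'a [f64]),
}

/// The callee inspects the variant; nothing is coerced or copied.
fn kind(x: &Numerics) -> &'static str {
    match x {
        Numerics::Integers(_) => "integer",
        Numerics::Doubles(_) => "double",
    }
}

fn main() {
    let plain = [Rint(1), Rint(2)];
    let a = Integers::Plain(&plain);
    let b = Integers::Alt(AltIntegers { from: 1, len: 1_000_000 });
    assert!(!a.is_altrep() && b.is_altrep());
    assert_eq!(b.len(), 1_000_000);
    assert_eq!(b.iter().nth(2), Some(Rint(3)));
    assert_eq!(a.iter().collect::<Vec<_>>(), vec![Rint(1), Rint(2)]);
    assert_eq!(kind(&Numerics::Integers(a)), "integer");
    assert_eq!(kind(&Numerics::Doubles(&[1.0])), "double");
}
```

The million-element ALTREP stand-in is never expanded: `len` reads a stored field and `iter` generates items on demand.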


## Naming convention

- Public (exported) wrapper types should have simple and concise names, possibly derived from the names of their counterparts on the `R` side, or from the `{cpp11}` naming convention. Types that wrap vectors should have pluralized names, e.g. `Integers` (not `Integer`), to emphasize that they wrap a collection, not a single value. The notable exception is `Strings`, which is the preferred name for `R`'s `character()` (which is not a *character* collection, but rather a collection of pointers to strings, thus *strings*). Wrappers of non-vector types should (in most cases) have the same name as the type they wrap, e.g. `List`, `PairList`, `Environment`, `DataFrame` (period is removed).

- Rust iterator types should be suffixed with `Iter`, e.g. `IntegersIter` or `ListIter`. This establishes a 1-to-1 relationship between a wrapper and its iterator.


- Rust types wrapping ALTREPs should have an `Alt` prefix, e.g. `AltIntegers`. This is somewhat similar to the naming convention adopted in `R`'s headers (see `include/R_ext/Altrep.h`). If an iterator type is provided for `AltT`, then both the prefix and the suffix should be used, e.g. `AltIntegersIter` or `AltNumericsIter`.




-----------------------------------------------------------------------------------------------
<details>
<summary> TL;DR </summary>
Here is a set of functions with different parameter types and allowed arguments.

1. Default (aka comfortable on both ends)

```Rust
#[extendr]
fn fn_1(x : Vec<i32>)
```

| `R` type | Allocation | Coercion | Error | Validation |
| ---------------------- | ----------- | -------- | ---------------- | ------------------ |
| `integer()` | Yes | No | If `NA` found | Runtime |
| `altrep_integer()` | Yes | No | If `NA` found | Runtime |
| `real()` / `complex()` | Yes | Yes | If `NA` found | Runtime |

2. Close to metal (aka performance)

```Rust
#[extendr]
fn fn_2(x : ComplexNumerics)
```

| `R` type | Allocation | Coercion | Error | Validation |
| ---------------------- | ----------- | -------- | ---------------- | ----------- |
| `integer()` | No | No | No | User |
@@ -156,10 +163,8 @@ fn fn_2(x : ComplexNumerics)
| `complex()` | No | No | No | User |
| `altrep_complex()` | No | No | No | User |




</details>

# Return type conversions

The procedure is reversed. The preferred way is to return a `Vec<Rt>`, which correctly encodes `NA`s. If a `Vec<T>` is returned, validation is performed by the wrapper, and a `panic!` occurs if an invalid value is found (e.g., if a `Vec<i32>` contains `i32::MIN`, which is not a valid integer value in `R`).
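
The return-side check could be as simple as a scan for the reserved value. In this sketch `validate_return` is a hypothetical helper, and it returns a `Result` carrying the offending index instead of panicking, purely so the behaviour is easy to test. The wrapper described above would `panic!` at that point.

```rust
/// Check a `Vec<i32>` before handing it to R: `i32::MIN` would be read as `NA`.
/// Returns the index of the first offending element on failure.
/// (Hypothetical helper, not part of `extendr`.)
fn validate_return(xs: Vec<i32>) -> Result<Vec<i32>, usize> {
    let first_bad = xs.iter().position(|&x| x == i32::MIN);
    match first_bad {
        Some(i) => Err(i),
        None => Ok(xs), // no copy: the vector is passed through unchanged
    }
}

fn main() {
    assert_eq!(validate_return(vec![1, 2, 3]), Ok(vec![1, 2, 3]));
    assert_eq!(validate_return(vec![1, i32::MIN]), Err(1));
}
```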
13 changes: 10 additions & 3 deletions extendr-api/README.md
@@ -1,6 +1,5 @@
# extendr-api


A safe and user-friendly R extension interface.

* Build rust extensions to R.
@@ -13,6 +12,7 @@ See [Robj] for much of the content of this crate.
[Robj] provides a safe wrapper for the R object type.

Use attributes and macros to export to R.

```rust
use extendr_api::prelude::*;
// Export a function or impl to R.
@@ -38,6 +38,7 @@ result <- fred(1)
[Robj] is a wrapper for R objects.
The r!() and R!() macros let you build R objects
using Rust and R syntax respectively.

```rust
use extendr_api::prelude::*;
test! {
@@ -100,6 +101,7 @@ test! {
To index a vector, first convert it to a slice and then
remember to use 0-based indexing. In Rust, going out of bounds
will cause an error (a panic), unlike C++, which may crash.

```rust
use extendr_api::prelude::*;
test! {
@@ -114,6 +116,7 @@ test! {
```

Much slower, but more general are these methods:

```rust
use extendr_api::prelude::*;
test! {
@@ -135,6 +138,7 @@ The [R!] macro lets you embed R code in Rust
and takes Rust expressions in {{ }} pairs.

The [Rraw!] macro will not expand the {{ }} pairs.

```rust
use extendr_api::prelude::*;
test! {
@@ -165,6 +169,7 @@ test! {

The [r!] macro converts a Rust object to an R object
and takes parameters.

```rust
use extendr_api::prelude::*;
test! {
@@ -175,6 +180,7 @@ test! {
```

You can call R functions and primitives using the [call!] macro.

```rust
use extendr_api::prelude::*;
test! {
@@ -220,9 +226,10 @@ test! {
}
```

## Feature gates

extendr-api has some optional features behind these feature gates:

* `ndarray`: provides the conversion between R's matrices and [ndarray](https://docs.rs/ndarray/latest/ndarray/).
* `num-complex`: provides the conversion between R's complex numbers and [num-complex](https://docs.rs/num-complex/latest/num_complex/).
* `serde`: provides the [Serde](https://serde.rs/) support.
@@ -233,12 +240,12 @@ extendr-api has different encodings (conversions) of a `Result<T,E>` into an `Ro
Below, `x_ok` represents an R variable on the R side that was returned from Rust via `T::into_robj()` or similar.
Likewise, `x_err` was returned to the R side from Rust via `E::into_robj()` or similar.
extendr-api

* `result_list`: `Ok(T)` is encoded as `list(ok = x_ok, err = NULL)` and `Err(E)` as `list(ok = NULL, err = x_err)`
* `result_condition`: `Ok(T)` is encoded as `x_ok` and `Err(E)` as `condition(msg="extendr_error", value = x_err, class=c("extendr_error", "error", "condition"))`
* More than one of the above result feature gates: only one result feature gate takes effect; the precedence is currently [`result_list`, `result_condition`, ... ].
* Neither of the above (default): `Ok(T)` is encoded as `x_ok` and `Err(E)` triggers `throw_r_error()`, which is discouraged.


## License

MIT
