
Commit 57775f2

updated documentation and readme

1 parent be1ccd3

7 files changed (+77 -24 lines)

README.md

+5-13
@@ -2,7 +2,7 @@
 [![docs](https://docs.rs/antlr-rust/badge.svg)](https://docs.rs/antlr-rust)
 [![Crate](https://img.shields.io/crates/v/antlr_rust.svg)](https://crates.io/crates/antlr_rust)
 
-ANTLR4 runtime for Rust programming language
+[ANTLR4](https://github.com/antlr/antlr4) runtime for the Rust programming language.
 
 Tool(generator) part is currently located in rust-target branch of my antlr4 fork [rrevenantt/antlr4/tree/rust-target](https://github.com/rrevenantt/antlr4/tree/rust-target)
 Latest version is automatically built to [releases](https://github.com/rrevenantt/antlr4rust/releases) on this repository.
@@ -13,9 +13,6 @@ and [tests/my_tests.rs](tests/my_test.rs) for actual usage examples
 
 ### Implementation status
 
-Everything is implemented, "business" logic is quite stable and well tested, but user facing
-API is not very robust yet and very likely will have some changes.
-
 For now development is going on in this repository
 but eventually it will be merged to main ANTLR4 repo
 
@@ -40,7 +37,7 @@ Can be done after merge:
 - run rustfmt on generated parser
 ###### Long term improvements
 - generate enum for labeled alternatives without redundant `Error` option
-- option to generate fields instead of getters by default
+- option to generate fields instead of getters by default and make visiting based on fields
 - make tree generic over pointer type and allow tree nodes to arena.
 (requires GAT, otherwise it would be a problem for users that want ownership for parse tree)
 - support stable rust
@@ -84,12 +81,6 @@ I.e. for `MultContext` struct will contain `a` and `b` fields containing child s
 `op` field with `TerminalNode` type which corresponds to individual `Token`.
 It also is possible to disable generic parse tree creation to keep only selected children via
 `parser.build_parse_trees = false`.
-
-### Key properties
-- Supports full zero-copy parsing including byte parsers
-(you should be able to write zero-copy serde deserializers).
-- Supports downcasting in places where type is not known statically(trait objects and embedded action)
-- Listener and
 
 ### Differences with Java
 Although Rust runtime API has been made as close as possible to Java,
@@ -106,11 +97,12 @@ there are quite some differences because Rust is not an OOP language and is much
 If you need exactly the same behavior, use `[u32]` based `InputStream`, or implement custom `CharStream`.
 - In actions you have to escape `'` in rust lifetimes with `\ ` because ANTLR considers them as strings, e.g. `Struct<\'lifetime>`
 - To make custom tokens you should use `@tokenfactory` custom action, instead of usual `TokenLabelType` parser option.
-In Rust target TokenFactory is main customisation interface that allows to specify input type of token type.
+ANTLR parser options can accept only single identifiers, while the Rust target also needs to know about the lifetime.
+Also, in the Rust target `TokenFactory` is the way to specify the token type. For an example, see the [CSV](grammars/CSV.g4) test grammar.
 - All rule context variables (rule argument or rule return) should implement `Default + Clone`.
 
 ### Unsafe
-Currently, unsafe is used only to cast from trait object back to original type
+Currently, unsafe is used only for downcasting (through another crate)
 and to update data inside Rc via `get_mut_unchecked`(returned mutable reference is used immediately and not stored anywhere)
 
 ### Versioning

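The README context above states that all rule context variables (rule arguments or rule return values) should implement `Default + Clone`. The following minimal sketch shows a value type that satisfies that bound. `ExprValue` and `accepts_rule_value` are hypothetical names and are not part of antlr-rust. The generic function only demonstrates that the bound is checked at compile time.

```rust
// Hypothetical rule return value; not part of antlr-rust.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ExprValue {
    pub value: i64,
    pub is_constant: bool,
}

/// Accepts any type satisfying the `Default + Clone` bound the README asks for
/// and returns a default value together with a clone of it.
pub fn accepts_rule_value<T: Default + Clone>() -> (T, T) {
    let fresh = T::default();
    let copy = fresh.clone();
    (fresh, copy)
}

fn main() {
    // Compiles only because `ExprValue` derives both `Default` and `Clone`.
    let (a, b): (ExprValue, ExprValue) = accepts_rule_value();
    assert_eq!(a, b);
    println!("{:?}", a);
}
```

Deriving both traits is usually enough for plain data. A type holding something without a sensible default value would need a hand-written `Default` impl instead.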
src/input_stream.rs

+4-2
@@ -51,11 +51,13 @@ pub type CodePoint8BitCharStream<'a> = InputStream<&'a [u8]>;
 pub type CodePoint16BitCharStream<'a> = InputStream<&'a [u16]>;
 pub type CodePoint32BitCharStream<'a> = InputStream<&'a [u32]>;
 
-impl<'a, T> CharStream<&'a [T]> for InputStream<&'a [T]>
+impl<'a, T> CharStream<Cow<'a, [T]>> for InputStream<&'a [T]>
 where
     [T]: InputData,
 {
-    fn get_text(&self, a: isize, b: isize) -> &'a [T] { self.get_text_inner(a, b).into() }
+    fn get_text(&self, a: isize, b: isize) -> Cow<'a, [T]> {
+        Cow::Borrowed(self.get_text_inner(a, b))
+    }
 }
 
 impl<'a, T> CharStream<String> for InputStream<&'a [T]>

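The `input_stream.rs` change makes `get_text` return `Cow<'a, [T]>` instead of `&'a [T]`. The minimal sketch below applies the same pattern to a hypothetical `SliceStream` type that stands in for the crate's `InputStream`. It treats `b` as an inclusive stop index, which is an assumption modeled on ANTLR's usual start/stop intervals. Returning `Cow::Borrowed` keeps the zero-copy path, and callers that need owned data can call `into_owned()`.

```rust
use std::borrow::Cow;

/// Hypothetical slice-backed stream standing in for `InputStream<&'a [T]>`.
struct SliceStream<'a, T> {
    data: &'a [T],
}

impl<'a, T: Clone> SliceStream<'a, T> {
    /// Returns elements `a..=b` (stop index assumed inclusive) without copying.
    fn get_text(&self, a: isize, b: isize) -> Cow<'a, [T]> {
        // Copy the `&'a [T]` out of `self` so the result borrows the input for
        // `'a`, independently of how long `&self` is borrowed.
        let data: &'a [T] = self.data;
        // Clamp both ends to the slice so out-of-range requests do not panic.
        let stop = (b.saturating_add(1).max(0) as usize).min(data.len());
        let start = (a.max(0) as usize).min(stop);
        Cow::Borrowed(&data[start..stop])
    }
}

fn main() {
    let input = "hello world";
    let stream = SliceStream { data: input.as_bytes() };

    let word = stream.get_text(0, 4);
    // The zero-copy path: nothing was allocated.
    assert!(matches!(word, Cow::Borrowed(_)));
    assert_eq!(&*word, &b"hello"[..]);

    // A caller that needs to keep the text independently can still own it.
    let owned: Vec<u8> = word.into_owned();
    assert_eq!(owned, b"hello".to_vec());
}
```

One plausible benefit of `Cow` is that a single return type can describe both borrowed and allocated text. This lets other `CharStream` implementations produce owned data without changing the signature.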
src/lib.rs

+53-7
@@ -16,26 +16,73 @@
 #![warn(trivial_numeric_casts)]
 //! # Antlr4 runtime
 //!
+//! **This is a pre-release version.**
+//! **Some small breaking changes are still possible, although none are currently planned.**
+//!
 //! This is a Rust runtime for [ANTLR4] parser generator.
 //! It is required to use parsers and lexers generated by [ANTLR4] parser generator
 //!
 //! This documentation refers to particular api used by generated parsers,lexers and syntax trees.
 //!
-//! For info on how to generate parser please refer to:
+//! For info on what [ANTLR4] is and how to generate a parser, please refer to:
 //! - [ANTLR4] main repository
-//! - [README](https://github.com/rrevenantt/antlr4rust/blob/master/README.md) for Rust target
+//! - [README] for the Rust target
 //!
 //! [ANTLR4]: https://github.com/antlr/antlr4
+//! [README]: https://github.com/rrevenantt/antlr4rust/blob/master/README.md
+//!
+//! ### Customization
+//!
+//! All input and output can be customized and optimized for a particular use case by implementing
+//! the related trait. Each of them already has several implementations that should be enough for most cases.
+//! For more details, see the docs for the corresponding trait and its containing module.
+//!
+//! Currently available are:
+//! - [`CharStream`] - Lexer input: a stream of char values with slicing support.
+//! - [`TokenFactory`] - How the lexer creates tokens.
+//! - [`Token`] - Element of a [`TokenStream`].
+//! - [`TokenStream`] - Parser input, created from a lexer or another token source.
+//! - [`ParserRuleContext`] - Node of the created syntax tree.
 //!
 //! ### Zero-copy and lifetimes
 //!
 //! This library supports full zero-copy parsing. To allow this
-//! `'input` lifetime is used everywhere inside.
+//! the `'input` lifetime is used everywhere inside to refer to data borrowed by the parser.
+//! Besides the input, `'input` can also refer to the [`TokenFactory`] if it returns references to tokens.
+//! See [`ArenaFactory`] as an example of such behavior: it allocates tokens in an [`Arena`](typed_arena::Arena) and returns references to them.
+//!
+//! When using the generated parse tree, be careful not to require a longer lifetime after parsing.
+//! If you do, you will likely get a "does not live long enough" error on the input string,
+//! even though the actual lifetime conflict happens much later.
 //!
-//! Rust infers lifetimes from the end. It means that if something requires longer lifetime
-//! when you are using generated tree, then you will get error
+//! If you need to generate an owned version of the parse tree, or you want simpler usage,
+//! you can opt out of zero-copy by requiring `'input` to be `'static`. In this case it is also easier to use
+//! types or constructor functions that contain "owned" in their name, like `OwningTokenFactory`
+//! or `InputStream::new_owned()`.
 //!
+//! ### Visitors and Listeners
 //!
+//! Currently, visitors and listeners must outlive `'input`.
+//! In general, this means that a visitor has either the `'static` or the `'input` lifetime.
+//! Thus you can retrieve references to parsed data from the syntax tree into a listener/visitor (see the visitor test for an example).
+//!
+//! You can try to give a visitor outside references, but in this case,
+//! if those references do not outlive `'input`, you will get very confusing error messages,
+//! so this is not recommended.
+//!
+//! ### Downcasting
+//!
+//! Rule context trait objects support downcasting even in the zero-copy case.
+//! Also, the generic types (currently `H:ErrorStrategy` and `I:`[`TokenStream`]) that you can
+//! access from embedded actions in the generated parser can be downcast to concrete types.
+//! To do so, use the `TidExt::downcast_*` extension methods.
+//!
+//! [`CharStream`]: crate::char_stream::CharStream
+//! [`TokenFactory`]: crate::token_factory::TokenFactory
+//! [`ArenaFactory`]: crate::token_factory::ArenaFactory
+//! [`Token`]: crate::token::Token
+//! [`TokenStream`]: crate::token_stream::TokenStream
+//! [`ParserRuleContext`]: crate::parser_rule_context::ParserRuleContext
 
 #[macro_use]
 extern crate lazy_static;
@@ -112,8 +159,7 @@ pub mod token;
 pub mod trees;
 mod utils;
 //pub mod tokenstream_rewriter_test;
-#[doc(hidden)]
-pub mod atn_type;
+mod atn_type;
 pub mod rule_context;
 pub mod vocabulary;
 

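The new "Zero-copy and lifetimes" and "Visitors and Listeners" docs say that a visitor living for `'input` can keep references to parsed data. The sketch below illustrates that idea with hypothetical `Token`, `Visitor`, and `IdentCollector` types. These are not the crate's generated visitor API. A toy whitespace tokenizer stands in for a real lexer.

```rust
/// Token category for the toy tokenizer below (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Ident,
    Punct,
}

/// Hypothetical zero-copy token: `text` borrows from the input string.
#[derive(Debug, Clone, Copy)]
struct Token<'input> {
    kind: Kind,
    text: &'input str,
}

/// Hypothetical visitor trait tied to the same `'input` lifetime as the tokens.
trait Visitor<'input> {
    fn visit_token(&mut self, tok: &Token<'input>);
}

/// Collects identifier text as `&'input str`, i.e. without copying it.
struct IdentCollector<'input> {
    names: Vec<&'input str>,
}

impl<'input> Visitor<'input> for IdentCollector<'input> {
    fn visit_token(&mut self, tok: &Token<'input>) {
        if tok.kind == Kind::Ident {
            self.names.push(tok.text); // copies the reference, not the characters
        }
    }
}

/// Toy tokenizer standing in for a generated lexer: splits on whitespace.
fn tokenize(input: &str) -> Vec<Token<'_>> {
    input
        .split_whitespace()
        .map(|w| Token {
            kind: if w.chars().all(|c| c.is_alphanumeric() || c == '_') {
                Kind::Ident
            } else {
                Kind::Punct
            },
            text: w,
        })
        .collect()
}

fn collect_idents(input: &str) -> Vec<&str> {
    let tokens = tokenize(input); // the "parse result" is local to this function
    let mut visitor = IdentCollector { names: Vec::new() };
    for tok in &tokens {
        visitor.visit_token(tok);
    }
    // Still valid after `tokens` is dropped: the names borrow `input`, not `tokens`.
    visitor.names
}

fn main() {
    let input = String::from("a = b + c_1 ;");
    let names = collect_idents(&input);
    assert_eq!(names, vec!["a", "b", "c_1"]);
}
```

The collected names borrow `input` rather than the token vector, so they stay valid after `tokens` is dropped. If `input` were dropped while `names` was still in use, the compiler would report that `input` does not live long enough. This is the symptom the doc comment describes.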
src/parser_rule_context.rs

+4-2
@@ -1,3 +1,7 @@
+//!
+//!
+//!
+//!
 use std::any::{type_name, Any};
 use std::borrow::{Borrow, BorrowMut};
 use std::cell::{Ref, RefCell, RefMut};
@@ -20,8 +24,6 @@ use crate::tree::{
 };
 use better_any::{Tid, TidAble, TidExt};
 
-// use crate::utils::IndexIter;
-
 pub trait ParserRuleContext<'input>:
     ParseTree<'input> + RuleContext<'input> + Debug + Tid<'input>
 {

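`ParserRuleContext` is bounded by `Tid<'input>` from the better_any crate. The new docs describe this bound as what lets rule-context trait objects be downcast even in the zero-copy case. The sketch below uses the standard library's `std::any::Any` instead. `Any` only works for `'static` types, so treat the sketch as an analogy rather than the crate's mechanism. `RuleCtx`, `MultContext`, `AddContext`, and `OtherContext` are hypothetical names.

```rust
use std::any::Any;
use std::fmt::Debug;

/// Hypothetical rule-context trait; `as_any` exposes the concrete type.
trait RuleCtx: Debug {
    fn as_any(&self) -> &dyn Any;
}

#[derive(Debug)]
struct MultContext {
    a: i64,
    b: i64,
}

#[derive(Debug)]
struct AddContext {
    lhs: i64,
    rhs: i64,
}

#[derive(Debug)]
struct OtherContext;

impl RuleCtx for MultContext {
    fn as_any(&self) -> &dyn Any { self }
}
impl RuleCtx for AddContext {
    fn as_any(&self) -> &dyn Any { self }
}
impl RuleCtx for OtherContext {
    fn as_any(&self) -> &dyn Any { self }
}

/// Recovers the concrete node type from a trait object and evaluates it.
fn eval(ctx: &dyn RuleCtx) -> Option<i64> {
    let any = ctx.as_any();
    if let Some(m) = any.downcast_ref::<MultContext>() {
        Some(m.a * m.b)
    } else if let Some(s) = any.downcast_ref::<AddContext>() {
        Some(s.lhs + s.rhs)
    } else {
        None // unknown node type
    }
}

fn main() {
    let nodes: Vec<Box<dyn RuleCtx>> = vec![
        Box::new(MultContext { a: 6, b: 7 }),
        Box::new(AddContext { lhs: 1, rhs: 2 }),
        Box::new(OtherContext),
    ];
    // `&**n` turns `&Box<dyn RuleCtx>` into `&dyn RuleCtx` so `as_any` is dispatched dynamically.
    let results: Vec<Option<i64>> = nodes.iter().map(|n| eval(&**n)).collect();
    assert_eq!(results, vec![Some(42), Some(3), None]);
}
```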
src/token_factory.rs

+9
@@ -192,6 +192,15 @@ pub type ArenaCommonFactory<'a> = ArenaFactory<'a, CommonTokenFactory, CommonTok
 
 /// This is a wrapper for Token factory that allows to allocate tokens in separate arena.
 /// It can allow to significantly improve performance by passing Tokens by references everywhere.
+///
+/// Requires a `&'a Tok: Default` bound to produce invalid tokens, which can easily be implemented
+/// like this:
+/// ```text
+/// lazy_static!{ static ref INVALID_TOKEN:CustomToken = ... }
+/// impl Default for &'_ CustomToken {
+///     fn default() -> Self { &**INVALID_TOKEN }
+/// }
+/// ```
 // Box is used here because it is almost always should be used for token factory
 #[derive(Tid)]
 pub struct ArenaFactory<'input, TF, T>

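The new `ArenaFactory` docs ask for a `&'a Tok: Default` impl that returns a shared invalid token. The sketch below is a runnable version of that pattern using a hypothetical `CustomToken`. It replaces the doc comment's `lazy_static!` with a plain `static`, which works only because every field can be built in a const context. The generic `invalid_token` helper is also hypothetical.

```rust
/// Hypothetical custom token type produced by a custom token factory.
#[derive(Debug, PartialEq)]
pub struct CustomToken {
    pub token_type: isize,
    pub text: &'static str,
}

/// Shared "invalid" token. A plain `static` suffices because every field is
/// const-constructible; a token holding heap data would need lazy
/// initialization (e.g. `lazy_static!`, as in the doc comment).
static INVALID_TOKEN: CustomToken = CustomToken {
    token_type: -1,
    text: "<invalid>",
};

// Permitted by the orphan rules: `&T` is a fundamental type and
// `CustomToken` is defined in this crate.
impl Default for &'_ CustomToken {
    fn default() -> Self {
        &INVALID_TOKEN
    }
}

/// Generic code only needs the `&'a Tok: Default` bound to obtain an invalid token.
fn invalid_token<'a, Tok: 'a>() -> &'a Tok
where
    &'a Tok: Default,
{
    <&'a Tok as Default>::default()
}

fn main() {
    let tok: &CustomToken = invalid_token();
    assert_eq!(tok.token_type, -1);
    // Every call hands out the same shared instance; nothing is allocated.
    assert!(std::ptr::eq(tok, &INVALID_TOKEN));
}
```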
tests/gen/labelsparser.rs

+1
@@ -1031,6 +1031,7 @@ where
 &mut recog.base,
 )))?,
 }
+
 let tmp = recog.input.lt(-1).cloned();
 recog.ctx.as_ref().unwrap().set_stop(tmp);
 recog.base.set_state(36);

tests/gen/simplelrparser.rs

+1
@@ -404,6 +404,7 @@ where
 recog.base.set_state(7);
 recog.base.match_token(ID, &mut recog.err_handler)?;
 }
+
 let tmp = recog.input.lt(-1).cloned();
 recog.ctx.as_ref().unwrap().set_stop(tmp);
 recog.base.set_state(13);
