-
Notifications
You must be signed in to change notification settings - Fork 16
/
Copy pathparsing.theory.txt
35 lines (29 loc) · 1.23 KB
/
parsing.theory.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
┏━━━━━━━━━━━━━┓
┃ PARSING ┃
┗━━━━━━━━━━━━━┛
Goal is STR/serialized -> OBJ/deserialized
OBJ:
- CST (Concrete Syntax Tree): contains even meaningless tokens
- AST (Abtract Syntax Tree): does not contain them
Types:
- lexer/tokenizer/lexical analysis:
- group of characters -> tokens
- parser/syntax analysis:
- tokens -> nodes of a syntax tree
- add semantics to tokens
Formal language:
- letters: possible token
- words/well-formed-formulas: several tokens
- alphabet: set of possible letters
- language: set of possible words
- expressed using grammar, i.e. set of rules:
- rule: turn a possible combination of words into higher-level ones
- set of words on the left side (higher-level) are "non-terminal words", all other are terminal
Type of grammar:
- regular grammar: cannot be recursive
- context-free grammar: rules have:
- left-side: 1 non-terminal word
- right-side: 0+ [non-]terminal words
- context-sensitive grammars: rules have:
- left|right-side: 0+ [non-]terminal words
- unrestricted grammar