The Definitive Crack Language Reference File
============================================
Copyright 2009 Google Inc.
This is a rough description of where the language is _going._ Lots of these
features haven't been implemented yet.
---------------------
#!/usr/bin/crack
stdout `hello world`;
---------------------
#!/usr/bin/crack
# Shell-style comments work
// so do ansi-C++ style
/* so do old-skool C style */
stdout.write("double-quoted strings - these are strings of _bytes_, "
"they just happen to be ascii characters. Constant string "
"concatenation works like in C."
);
stdout.write('Single-quoted strings are just like double-quoted strings '
'except for "which quote\'s" need to be escaped.'
);
# a verbose, java style variable initialization ("new" is implicit)
String var1 = String();
# Same as var1 - default constructor is implicit
String var2;
# yet another way to define a variable, ":=" defines and initializes. ''
# is equivalent to "String()" (roughly - it's actually a StaticString).
var3 := '';
# var4 will be null - like in C or java.
String var4 = null;
# reassign it to reference var1. All object variables are essentially
# pointers.
var4 = var1;
# the expression evaluates to true
if (var1 == var2 && var2 == var3 && var3 == var4)
stdout.write('"==" compares by value.');
# the identity check also works (constant strings are cached, and String()
# returns an instance from the cache). You can negate it with "a !is b".
if (var1 is var2 && var2 is var3 && var3 is var4)
stdout.write('"is" compares identity.');
# the back-tick is a string formatting operator. When used on a stream
# (OutStream) object, this:
stdout `this is $var2`;
# will do the same thing as this:
stdout.format('this is ');
stdout.format(var2);
# when used in an expression context, this produces a string.
String formatted = `this is $var1`;
# which is equivalent to this: (assuming that we end up with return value
# chaining)
String formatted = StringOutStream().format('this is ').format(var1);
# primitive types - these are like the corresponding primitive types in
# java and are copied by value.
int x = 100;
bool a = true;
float f = 1.5;
# Their high-level counterparts with all of the advantages and
# disadvantages of full objects. (autoboxing is supported)
Int y = x;
Bool b = a;
Float g = f + x; # integer gets promoted to float in this expression
/**
* A function definition. Doxygen style docstrings are supported as
* first-class language elements.
* @param text some text that will be printed to standard output.
* @return the text that was printed.
*/
String myFunc(String text = 'default value') {
stdout.write(text);
return text;
}
# keyword arguments are supported. We use a colon to indicate keyword
# args instead of an equal sign to avoid ambiguity with the assignment
# operator ("text: str = 'some non default text'" would be valid)
myFunc(text: 'some non default text');
## Another type of docstring. ('///' is also supported)
void myOtherFunc() {
stdout `less interesting\n`;
}
# functions are objects (not sure about the typedef)
typedef String Callback(String val);
void useCallback(Callback cb) {
cb('something worth writing');
}
useCallback(myFunc);
# so are "bound methods" - which are a nice substitute for currying
# Not sure about "struct", see notes below
struct Masala {
int count;
void run(String text) {
for (i :in range(count))
stdout.write(text);
}
}
useCallback(Masala(100).run);
---------------------
#!/usr/bin/crack
import storage;
// "struct" is a class with an implicit constructor with optional
// arguments for each instance variable. The 'struct' keyword may end up
// being omitted in favor of the more general annotations mechanism
// (e.g. "@attr_constructor class Note : storage.StorableObject { ...").
struct Note : storage.StorableObject {
String title;
String desc;
};
// Create an instance of "Note" with title and desc arguments and invoke
// the store() method to store it.
Note('todo: build crack',
"Here's an example note that we're putting into our database. "
"I haven't figured out how best to do multi-line strings as of yet."
).store();
// Define a "note2" variable, initialize with keyword arguments.
// note that we use "arg:" instead of "arg =" for keyword args so as not
// to conflict with assignment as an expression.
Note note2 = { title: 'todo: Add special note classes for todos',
desc: 'so we can differentiate them programmatically.'
};
note2.store();
struct ToDo : Note {
bool done = false;
};
// Create a new todo, expand note2 into the argument list and augment with
// "done". Define and initialize with the ":=" operator.
note3 := ToDo(*note2, done: true);
note2.delete();
note3.store();
Basic Dogma:
1) Use the existing wiring.
2) Common patterns should be syntactically simple and terse.
3) Everything should be fast. Development cycles should be fast, the
runtime code should be fast.
Resulting rules:
1) Use C/C++/Java syntax except where it's broken.
2) Compiles should be single-pass. Code generation should happen at compile
time, with code generation steps occurring after every statement.
3) Documentation is a first-class citizen.
4) Rich augmentation of language structures through annotations and macros.
Other philosophical stances:
1) We do not try to restrict the coder to a set of acceptable patterns: the
coder is assumed to be responsible and we consider the role of a
language to be to facilitate the desires of the coder, not to restrict
them. That said, we chose to promote certain patterns
(e.g. reference counting) and we will not invest in providing special
support for features that we regard as anti-patterns. For example, we
regard cyclic dependencies as ultimately a bad thing, so we will not
increase the complexity of the compiler in order to facilitate them.
Curly brackets are magical - their semantics depend on their context.
So in initialization context:
class Foo {
int a;
int b;
Foo(int a, int b) : a(a), b(b) {}
} // the semicolon is unnecessary here.
Foo f = { a: 100, b: 200}; // equivalent to f := Foo(a: 100, b: 200);
In other expression contexts, they are lambdas:
tree.traverse({ _.apply() });
After keywords, they have meaning defined by the keyword:
if (expr) { statements ... }
do { statements ... }
struct { x = 100, y = 10 } // anonymous structure (like a named tuple)
Lambdas:
Lambdas should be expressible with minimal syntax, but flexible enough to
specialize. So we can do:
list.collect(lambda (Int sum, int item) { sum += item },
0 // initial value for first arg
);
or:
class List[T : Object] {
    ...
    List[T] filter(bool filterFunc(T item)) {
        List[T] result;
        for (item :in this) {
            if (filterFunc(item))
                result.append(item);
        }
        return result;
    }
    ...
}
List[Foo] list;
list.filter({ item.a > 100 }); // implicit lambda, implicit use of
// "item" parameter, implicit return.
list.filter(lambda(Foo x) { return x.a > 100; }); // equivalent
// longer form
Alternately, for single argument functions:
list.filter({ _.a > 100 }); // can use _ or item, not both.
Formatting:
The back-tick character works like a string with built-in formatting.
When used in an expression context, it evaluates to a string:
x := `my name is $name`;
When used immediately after an OutStream, it formats its contents to the
stream:
stdout `my name is $name`;
The above is equivalent to the following operations:
stdout.format('my name is ');
stdout.format(name);
OutStream.format() is defined for all of the primitive types. For Object,
it is implemented as:
void format(Object obj) { obj.writeTo(this); }
So you can override the writeTo() method for objects, and you can override
and overload the format() method for OutStream derivatives, giving you a
very flexible way to do special formatting.
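The dispatch described above can be modeled roughly as follows. This is a Python sketch, not Crack: OutStream, format() and writeTo() mirror the names in the text, while Point, UpperStream, render() and the list-backed buffer are hypothetical and exist only for illustration.

```python
class OutStream:
    """Toy stand-in for Crack's OutStream; collects output in a list."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

    def format(self, value):
        # Primitive-like values are formatted directly by the stream.
        if isinstance(value, (str, int, float, bool)):
            self.write(str(value))
        else:
            # Everything else is delegated to the object, mirroring
            # "void format(Object obj) { obj.writeTo(this); }".
            value.writeTo(self)
        return self

    def getvalue(self):
        return ''.join(self.chunks)


class Point:
    """Hypothetical user type that customizes its own formatting."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def writeTo(self, out):
        out.format('(').format(self.x).format(', ').format(self.y).format(')')


class UpperStream(OutStream):
    """Hypothetical stream derivative that overrides format() for strings."""

    def format(self, value):
        if isinstance(value, str):
            self.write(value.upper())
            return self
        return super().format(value)


def render(stream, name):
    # Rough equivalent of: stream `my name is $name`;
    stream.format('my name is ')
    stream.format(name)
    return stream.getvalue()
```

Overriding writeTo() changes how one kind of object is rendered on every stream; overriding format() changes how one kind of stream renders every value.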
Access control:
Access control is specified via naming conventions. Symbols beginning
with a double underscore ('__') are private to the context where they are
defined. Symbols beginning with a single underscore ('_') are "protected"
in a sense similar to java's "package protected" - they can be accessed
from derived classes and from within the same module.
How does all of this work for nested modules and module scoped stuff?
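As a rough illustration of the naming convention (it does not answer the nested module question), here is a Python sketch. The function names and the three boolean flags are hypothetical; the real compiler may decide this quite differently.

```python
def access_level(symbol):
    """Classify a symbol name by its leading underscores."""
    if symbol.startswith('__'):
        return 'private'
    if symbol.startswith('_'):
        return 'protected'
    return 'public'


def can_access(symbol, same_context=False, same_module=False, derived=False):
    """Decide whether a use site may reference the symbol.

    same_context: the use is in the context where the symbol is defined.
    same_module:  the use is in the same module as the definition.
    derived:      the use is in a class derived from the defining class.
    """
    level = access_level(symbol)
    if level == 'private':
        return same_context
    if level == 'protected':
        return same_context or same_module or derived
    return True
```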
Modules:
By default, every source file is a "module". Every directory full of
source files is a "package" which is really just a module. Packages can
have other things in them besides nested modules if there is also a source
file of the same name: e.g. if "foo.crk" is a file and "foo/" is a
directory.
A source file module can include nested modules by use of the "module"
keyword:
module nested {
void _privateFunction() { ... }
void foo() { ... }
} // the "nested" module
nested.foo();
Nested modules have the useful property of being compiled and executed
while the source code module is still being compiled. This lets you
define annotations in a nested module that can be used to annotate code in
the enclosing module.
Just like any other symbols, nested modules can be accessed internally
subject to the access control rules - so nested modules whose names begin
with an underscore are private.
It is possible to define anonymous modules. Like other nested modules,
these are compiled and executed prior to the completion of the compile of
the outer module, but all of their symbols are implicitly imported into the
outer namespace:
module {
void annotation(Function func) { ... }
}
// we can use the annotation defined in the anonymous module with no
// qualification.
@annotation void annotatedFunction() { ... }
Note that anonymous modules are not like anonymous namespaces in C++ -
symbols in anonymous modules are externally accessible as if they were
part of the outer module.
Function Resolution Order:
The rules for function resolution and argument matching are as follows:
1) Check the current context for a function matching the arguments.
Functions are checked in the order in which they are defined.
2) If none is found, check the parent contexts from left to right
depth-first recursively.
3) If no match is found, repeat from step 1, attempting to convert arguments.
This doesn't apply to constructors - constructor resolution is not
delegated. Constructor inheritance works by generating constructors given
the set of constructors and instance variables of the base classes.
Before step 3, we can still do conversions of "adaptive" expressions
(currently only constants). If all of the arguments of a function are
adaptive, the first will not go through conversion. This is a work-around
for some extremely unintuitive behavior introduced by the normal
resolution order and the fact that constant integer types are very
accommodating (1 - 2 == 255 without it).
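The three resolution steps can be sketched in Python as below. This is a simplified model and not the compiler's code: Context, CONVERSIONS, matches(), search() and resolve() are hypothetical names, the conversion table is made up, and adaptive expressions and constructors are left out.

```python
class Context:
    """A scope holding overloads in definition order, plus parent scopes."""

    def __init__(self, parents=()):
        self.parents = list(parents)
        self.functions = []   # list of (name, parameter types, label)

    def define(self, name, param_types, label):
        self.functions.append((name, tuple(param_types), label))


# Hypothetical implicit conversions: source type -> types it converts to.
CONVERSIONS = {'byte': {'int32', 'float'}, 'int32': {'float'}}


def matches(params, args, convert):
    """True if the argument types fit the parameter types."""
    if len(params) != len(args):
        return False
    for param, arg in zip(params, args):
        if param == arg:
            continue
        if convert and param in CONVERSIONS.get(arg, ()):
            continue
        return False
    return True


def search(ctx, name, args, convert):
    # Step 1: the current context, in definition order.
    for fname, params, label in ctx.functions:
        if fname == name and matches(params, args, convert):
            return label
    # Step 2: parents, left to right, depth first.
    for parent in ctx.parents:
        found = search(parent, name, args, convert)
        if found is not None:
            return found
    return None


def resolve(ctx, name, args):
    # Step 3: retry with conversions only if the exact pass found nothing.
    for convert in (False, True):
        found = search(ctx, name, args, convert)
        if found is not None:
            return found
    return None
```

One consequence of this ordering is that an exact match in a parent context wins over a converting match in the current context, because the whole hierarchy is searched without conversions first.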
Generics:
Crack deviates from the C++ and Java generic syntax so as not to clash
with the use of the <> operators. It uses the square brackets instead of
the angle brackets:
Array[Foo] getAllFoos() { ... }
For typecasts, we can use the '*' operator:
foos := Array[Foo] * getObjectArray();
Although something like "cast" might be a better idea:
foos := Array[Foo].cast(getObjectArray());
Cleanups:
When we leave a context, we need to run "oper release" on all variables
within that context that define this method:
while (cond) {
Complex obj;
...
// obj.oper release() needs to happen here
}
We can't just always emit this code at the end because of terminal
statements:
void func() {
    Complex obj1;
    while (cond) {
        Complex obj2;
        if (cond2)
            // cleanup obj2, then obj1
            return;
        // obj2.oper release()
    }
    // obj1.oper release()
}
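One way to picture this is a stack of cleanup lists, one per open context. The Python sketch below is only a model: the CleanupFrames class and its method names are hypothetical, and releasing variables in reverse order of declaration within a frame is an assumption that the notes don't state.

```python
class CleanupFrames:
    """Tracks which variables need "oper release" in each open context."""

    def __init__(self):
        self.frames = []    # innermost context is last
        self.emitted = []   # pseudo-instructions, in emission order

    def enter(self):
        self.frames.append([])

    def declare(self, name):
        self.frames[-1].append(name)

    def _release(self, frame):
        # Assumption: release in reverse order of declaration.
        for name in reversed(frame):
            self.emitted.append(name + '.oper release()')

    def leave(self):
        # Normal exit: clean up only the context being closed.
        self._release(self.frames.pop())

    def emit_return(self):
        # Terminal statement: clean up every open context, innermost
        # first, without popping (the code after the return is still
        # compiled inside those contexts).
        for frame in reversed(self.frames):
            self._release(frame)
        self.emitted.append('return')
```

Replaying func() from the example (enter, declare obj1, enter, declare obj2, emit_return, leave, leave) emits the obj2 and obj1 releases before the return, and then again at the normal end of each block.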
Primitive Arrays:
Primitive arrays use the Generic syntax but emit very efficient, C-like
array and pointer semantics. There is no bounds checking, no reference
counting and no cleanup associated with primitive arrays. They are used
like this:
// allocate a 10 element array
arr := array[int](10);
arr[0] = 100;
cout `$(arr[0])\n`;
// free the array using a normal "free" function.
free(arr);
Since there is no reference count management, for derived types you need
to do your own bind and release:
arr := array[Object](10);
arr[0].oper release(); # release the existing value
String s = 'string value';
arr[0] = s;
arr[0].oper bind(); # bind the new value
# do cleanups
for (i := 0; i < 10; ++i)
arr[i].oper release();
free(arr);
Design principles:
In cases where we can provide useful information from the Parser or keep
track of it in the Builder, it's better to provide it in the Parser.
An example is the "terminal" argument of emitElse/emitEndIf.
In retrospect, the expression types tend to be terminal (we don't derive
from them) so it would have been better to let the Builder create derived
classes that implement emit(). Definitions (VarDef, FuncDef...), OTOH,
are derived from in ways orthogonal to the Builder's implementation
details, so it makes sense to use a Bridge paradigm.
Reference counting rules:
In general, expressions returning an Object provide a new reference,
expressions using Objects borrow the references of the caller.
However, for certain kinds of expressions (field references, for example)
it doesn't make sense to increment the reference count and then decrement
in the cleanup. So we introduce the notion of "productive" expressions:
an expression is productive if it returns a new reference. It is
non-productive if it borrows an existing reference (and as such, requires
no cleanup).
This is obviously the case for all sorts of variable references. It
should also be something that we can apply via an annotation to function
definitions, indicating that the function does not return a new reference.
If this is done, the compiler must verify that the function only returns
non-productive expressions. If the function is a method, this must also
be verified for all derived functions.
Cleanup frames are a way of tracking which cleanups need to be run when
we exit a particular context. Objects that have a "oper release()"
method may have that method called at the end of the cleanup frame.
Every time the compiler emits an expression it produces a result. We
have to either call handleTransient() or handleAssignment() on that result
depending on what we're doing with it.
handleTransient() means that the object is temporary - it lives for the
life of the expression it is contained in. If the result is
non-productive, handleTransient() doesn't have to do anything. But if
the result is productive, handleTransient() adds it to the current
cleanup frame and "oper release()" will get called on it when we're done
with the statement.
handleAssignment() has the complementary semantics: if the expression is
productive, ignore it (the "produced" value will be consumed by an
assignment). If it is non-productive, call "oper bind()" on it.
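The two handlers can be modeled in a few lines of Python. This is a sketch under assumptions and not the compiler's code: Result, Emitter and end_statement() are hypothetical names, and emitted code is represented as plain strings.

```python
class Result:
    """The result of emitting an expression."""

    def __init__(self, name, productive):
        self.name = name
        self.productive = productive   # True if it returns a new reference


class Emitter:
    def __init__(self):
        self.code = []            # emitted pseudo-instructions
        self.cleanup_frame = []   # results to release at end of statement

    def handle_transient(self, result):
        # A productive temporary owns a reference that must be released
        # when the statement is done; a borrowed one needs nothing.
        if result.productive:
            self.cleanup_frame.append(result)

    def handle_assignment(self, result):
        # A productive result hands its reference to the variable; a
        # borrowed one must be bound so the variable holds its own.
        if not result.productive:
            self.code.append(result.name + '.oper bind()')

    def end_statement(self):
        for result in reversed(self.cleanup_frame):
            self.code.append(result.name + '.oper release()')
        self.cleanup_frame.clear()
```

In this model a call returning a new object costs one release when used as a temporary and nothing when assigned, while a field reference costs nothing as a temporary and one bind when assigned.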
Constructors:
- A class will inherit the constructors of its first base class if:
- it has no constructors of its own
- it either has no instance variables or all instance variables have
default constructors.
- all other base classes have a default constructor.
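The conditions above amount to a simple predicate. The Python sketch below uses a hypothetical ClassInfo record whose fields summarize each class; the real compiler derives this information from its own definitions.

```python
from dataclasses import dataclass, field


@dataclass
class ClassInfo:
    """Hypothetical summary of a class, just enough for the rule."""
    name: str
    bases: list = field(default_factory=list)
    own_constructors: int = 0
    has_default_constructor: bool = True
    # True if there are no instance variables, or all of them have
    # default constructors.
    instance_vars_default_constructible: bool = True


def inherits_constructors(cls):
    """True if cls inherits the constructors of its first base class."""
    if not cls.bases:
        return False                      # nothing to inherit from
    if cls.own_constructors:
        return False                      # it has constructors of its own
    if not cls.instance_vars_default_constructible:
        return False
    # Every base after the first must be default-constructible.
    return all(base.has_default_constructor for base in cls.bases[1:])
```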
Adaptive Expressions:
Constant integers are special in that they adapt to the type required by
the context. So for example, in "byte b = 100;" the constant 100 is of
type byte, and we want to verify that the value of the constant is small
enough to fit in a byte.
But this is problematic given the rules about function resolution. For
example, because we can do implicit conversions from "bigger" integer
types to smaller ones, we want to define our operators so that the
smallest ones are resolved first: "byte + byte" should match before
"int32 + int32". But say, for example, that we add 1 to a byte variable b:
b + 1
This won't match "byte + byte" because 1 is stored as an int32, so the
result will be int32, even though you would expect it to be a byte.
We could match "byte + byte" on the second (converting) pass, but we won't
get there because without conversion the expression will match "int32 +
int32".
We remedy this by making the constant "adaptive." Instead of limiting
conversions to the second, converting resolution pass, the compiler
converts adaptive expressions during the first pass. So since 1 can
safely convert to a byte, the expression will match "byte + byte".
Unfortunately, this introduces another problem when dealing with
expressions consisting entirely of constants. Consider the case of "1 -
2". 1 and 2 can both convert to a byte value, so the expression ends up
matching "byte - byte", producing a byte value of 255!
There is a hack in place to work around this: if all of the arguments of a
function are adaptive, we inhibit conversion of the first argument in the
first resolution pass - so the 1 in "1 - 2" will be interpreted as an
int32 and force the selection of "int32 - int32".
This is a rather lame solution to the problem. Another possibility would
be to make constants default to the smallest accommodating type and
introduce constant folding into the function model - so the compiler would
recognize 1 and 2 as byte, but then fold "1 - 2" to a new constant
(presumably an int16 with a value of -1).
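The first resolution pass with adaptive constants, and the current hack, can be modeled like this. It is a Python sketch with hypothetical names (RANGES, OVERLOADS, Arg, first_pass) and only two overloads, ordered smallest first as described above; constants are stored as int32, as in the text.

```python
RANGES = {'byte': (0, 255), 'int32': (-2**31, 2**31 - 1)}
OVERLOADS = [('byte', 'byte'), ('int32', 'int32')]   # smallest first


class Arg:
    def __init__(self, type_name, value=None):
        self.type = type_name
        self.value = value    # set only for constants

    @property
    def adaptive(self):
        return self.value is not None

    def fits(self, target):
        """True if this is a constant whose value fits the target type."""
        low, high = RANGES[target]
        return self.adaptive and low <= self.value <= high


def first_pass(args, hack=True):
    """Return the overload chosen by the non-converting pass, or None."""
    all_adaptive = all(arg.adaptive for arg in args)
    for params in OVERLOADS:
        ok = True
        for index, (param, arg) in enumerate(zip(params, args)):
            if arg.type == param:
                continue
            # The hack: with only adaptive arguments, the first one is
            # not allowed to adapt during this pass.
            inhibited = hack and all_adaptive and index == 0
            if arg.fits(param) and not inhibited:
                continue
            ok = False
            break
        if ok:
            return params
    return None
```

In this model "b + 1" with a byte b selects the byte overload, "1 - 2" selects the int32 overload, and with hack=False "1 - 2" selects the byte overload, where the result wraps around: (1 - 2) % 256 is 255.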
Ugliness:
- There should be a Def base class instead of deriving everything from
VarDef. Def should have a name and a "parent" Namespace but no type.
Then we can get rid of VarDefImpl and just have the builder
specialize VarDef. (type needs to move into Def because of meta-types)
- Context is bad:
- there's a dichotomy between "compile context" and "the symbol
table". Make these "Context" and "Namespace".
- the symbol table wants to be part of ModuleDef/ClassDef/FuncDef so
there's probably a NamespaceDef base class
- Namespace should make use of polymorphism to deal better with
funny resolution stuff.
- we should make the Parser::error() function part of Context so we
can use it from all levels.
- AncestorPaths should be reference counted objects cached by type defs.
- The way I'm doing explicit class specification causes some serious
problems. Methods are defined in both the class context and the
meta-class context. This introduces ambiguity when looking up methods
on a class object. The ambiguity becomes extremely problematic when
doing lookups on builtins like "oper bind" and "oper release", where
we now have to verify that the function is not an alias after lookup.
It's difficult to generalize the "no-alias" lookups because of
overloads - overloads can probably end up aggregating both owned
functions and aliased functions, so what do you do if a function has
both owned and aliased varieties?
Possible solutions:
- dump the Class.method() syntax, reserve it for methods of the
meta-class. Revert to C++'s Class::method() syntax instead.
- Allowing the java-like syntax for base method specification (e.g.
"BaseClass.method()") was a bad idea given that this can also refer to
a method applied to the class object. It creates an ambiguity in the
language and complicates the compiler, requiring the "no alias"
lookups to prevent class objects from picking up instance methods.
My current feeling is that we should revert to "BaseClass::method()"
for this sort of thing, especially since it works with a non-this
instance: "other.BaseClass::method()".
- Parser could use a re-write. We need to come up with a normalized
grammar that deals with the class-as-class/class-as-object ambiguity.
- When a class derives directly from VTableBase, we currently require
that VTableBase be the first base class in the list. Likewise, when a
class derives (directly or indirectly) from Object, the Object lineage must
be the first base. I can't think of a good reason for this, and we
should probably relax these restrictions unless we encounter a good
reason.
Parsing Oddities:
in a class, you can have:
variable defs
func defs & class defs
in a code block:
variable defs
func defs & class defs
statements
in a condition:
variable defs
expressions
So clause ::= var_def | expression
but this doesn't help much because
class:
if tok is class:
parseClassDef()
else
expr = parsePrimary()
make sure expr is a typespec
#generics_and_lazy_imports
Inconsistencies in LazyImports:
Lazy imports can only be defined at the module level. This is to keep the
caching format lean: if lazy imports could be defined in any context we'd
have to scatter them throughout the objects in the metadata files.
The set of lazy imports in a module is attached to the generic types in
the module when they are defined. This single collection associated with
the module itself is shared across all generics in the module.
This means that, left unchecked, lazy imports could be defined after the
generics that use them - if the module were in the persistent cache,
another module could instantiate such a generic without issue. But if the
module were freshly compiled, the import would not be available. (XXX I
don't quite get why this is right now, though I understood it enough to
foresee it before: doesn't the module need to be compiled (and all lazy
imports collected) before the other module can use it?)
Desired Grammar:
This is going to be the 1.0 grammar description of the language.
XXX Finish this. XXX
module ::= block ;
block ::= statement_or_def* ;
statement_or_def ::= statement | definition ;
statement ::= expression ';' |  # block can be terminated by something
                                # other than a semicolon?
              if_stmt |
              'while' '(' expr ')' clause |
              'for' '(' for_expr ')' clause |
              try_stmt ;
if_stmt ::= if_root | if_root else ;
try_stmt ::= 'try' '{' block '}' 'catch' '(' typespec identifier ')' '{'
             block '}' ;
if_root ::= 'if' '(' expr ')' clause ;
else ::= 'else' clause ;
clause ::= statement ';' | block ;
condition ::= expr | var_def ;
typespec_list ::= typespec |
                  typespec_list ',' typespec ;
typespec ::= identifier |
             typespec '.' identifier |
             typespec '[' typespec_list ']' ;
atom ::= constant | identifier ;
cluster ::= atom |
            cluster specializer ;
specializer ::= '.' identifier | '::' identifier ;
primary ::= cluster arg_list |
            cluster '[' expr ']' ;
expr ::= primary |
         primary op expr |
         '(' expr ')' ;
Annotations:
Annotations should allow you to do manipulations on language objects
during compile. They are introduced by an "at" sign ("@").
Examples:
Forward Declarations:
Forward declarations of functions are currently legal with the following
caveats:
1) Forward declarations must be in the same block context as the
definition.
2) Forward declarations must be resolved by the close of their block
context.
3) In the interests of allowing keyword arguments, forward declarations
must have the same argument names as their definitions.
It's possible that rule 2 may be relaxed somewhat in the long term. It's
often useful to have a class method definition defined after the class
definition ends (like when it makes use of another class that depends on
the initial class).
Ugly Stuff:
I can't see how the ModuleDef namespace can ever get populated with the
contents of a module, but I can't see how imports could work without it.
It seems like when parsing a module, the module toplevel context should
reference the ModuleDef as its namespace.
Parser Refactor:
Function processing should all be in the general expression location - not
in parsePostIdent(). For this to happen, method lookups without args
need to return a special kind of field reference that accepts "oper call"
and translates it to the underlying method invocation.
Glossary of terms in the code:
master/slave: When modules have a cyclic relationship (as in the case of
ephemeral modules) they must be cached together for complex reasons
involving the original LLVM jit. We call the outermost module (under
whose name the entire cyclic set is persisted) the "master" and the
modules persisted with it the "slaves". Slave modules still have
their own file in the cache, but it is only a tiny stub that
references the master module.
owner: The namespace that "owns" a definition (corresponding to the scope
where the definition was defined). There can only be one owner. Other
namespaces may contain the definition, but only as an alias.
origin module/class: A generic instantiation is created as a
specialization of a generic class. The class it is instantiated from
is known as its "origin" class and the module of the origin class is
called the "origin module".