Self hosting tokenizer #9
Open
nickd4 wants to merge 20 commits into uho:master from nickd4:self_hosting_tokenizer
…to bss rather than text section which avoids the need to call mprotect(), rename things
… be wrapped with PROGRAM / END, also removes automatic bye token that was generated by END
…time.seedsource, so that we can run textual forth code without the tests or the banner
… writes to stderr, fix self-hosted tokenizer termination issue (was debugged with eemit)
e3b1a9b to 7115f49
More hacking... what I set out to do was to make the seedForth tokenizer self-hosting, so that after bootstrap you would not need gForth to develop applications. So my idea was to make the tokenizer work in gForth like now (for bootstrapping) and also work in the seedForth interactive version (for application development). It turned out to be quite difficult, but ultimately it works.
So the actual changes to `seedForth-tokenizer.fs` to make it run under seedForth were not that huge, mainly a matter of accounting for seedForth's case sensitivity and restricted syntax for hex and character literals and various things like that, as well as minor differences in the words available (`parse-name` instead of `<name>` etc). But the larger difficulty was in making a `seedForth` or `seedForthInteractive` program run cleanly as a filter. I had to modify the runtime library and I/O system a lot.
There was also another issue to deal with which concerns the wrapping of the `*.seed` and `*.seedsource` files. Originally the input was wrapped in `PROGRAM` / `END` and the output was wrapped with an automatic `bye` token added at the end. I have removed the need for all of this wrapping, at the cost of its being slightly more awkward to invoke the `gForth` version of the tokenizer. Since this is only done from the `Makefile` during bootstrap, that's not a big deal. It's only just occurred to me now that the unusual extension `*.seedsource` was probably due to the wrapping, so maybe we can rename them to `*.forth` now?
Here is a detailed summary of all the changes I have made to support the self-hosting tokenizer:
- Add a `./seedForth-tokenizer` script, which operates as a filter and takes a `*.seedsource` file on `stdin` and outputs the corresponding `*.seed` file on `stdout`. It works similarly to `./seed` by concatenating the various input files into `./seedForth`.
- Build `cat` into every compiled preForth/seedForth application, so it will either process `stdin` if there are no command line arguments, or else open and read each file specified on the command line in sequence, where `-` is `stdin`. The effect of this change is that when running `./seed` you no longer need to press Enter after typing `bye` to make `seedForth` quit. The extra keystroke was needed to force the front-end `cat` invocation to try to send something to `seedForth`, and then it would realize the pipe was broken and quit. With `seedForth` managing its own input, you can quit cleanly.
- Make the `key?` word use a `poll()` rather than `ioctl()` system call, since we now expect that `stdin` might come from a file.
- Add an `eemit` word throughout the system which is the same as `emit` but writes to `stderr`. I use this for debugging.
- Move `key` and `emit` to higher token numbers, to make it easier to detect the `EOT` character which used to correspond to the `key` token. Implement a new `eot` token at 4 which is similar to the `bye` token. The reason for this change is that the `bye` token was overloaded to serve as `[`, i.e. it would restart the interpreter after compiling the `;` token and during certain control flow constructs. This meant you couldn't compile a `bye` token into a program. By moving the original usage of `bye` onto the new `eot` token, `bye` is no longer special and can be compiled normally, while the changes to the existing system stay minimal. As a bonus, if input runs out during a `:`-definition, the resulting EOT will be interpreted as `[` and send us back to interpretive state, where a further EOT is considered invalid and quits the interpreter too.
- Move the essential words from `seedForthInteractive.seedsource` into a new `seedForthRuntime.seedsource` and from `hi.forth` into a new `runtime.forth`. The original files `seedForthInteractive.seedsource` and `hi.forth` still exist and contain all the tests as well as less essential words like `sqr` which you can grab if you actually need them. To use the system as it was previously, you have to tokenize `seedForthRuntime.seedsource` + `seedForthInteractive.seedsource` into `seedForthInteractive.seed` and then pass it `runtime.forth` + `hi.forth`; the `Makefile` and `./seed` script have been updated appropriately. But when running the self-hosted tokenizer, it uses a different `*.seed` file which is basically generated by tokenizing `seedForthRuntime.seedsource` + a call to `boot`, and it uses the `runtime.forth` without the `hi.forth` part.
- Increase `tib` from 80 to 255 characters, and also fix a bug in `accept` which allowed it to write one character beyond the `tib`. Note that some source lines in the original system were longer than 80 characters, and I think they may have been silently truncated and the incomplete code not noticed. The extra character seemed to cause a crash on my Z80 port, which alerted me to the issue. There isn't really a good way to flag too-long lines to the user, but I have at least made it not echo any extra characters.
- Modify the `accept`, `refill` and `restart` words to detect EOT and return something or quit. This is needed to prevent the self-hosted tokenizer from hanging after it tokenizes all the input. I'm not entirely happy with the solutions I came up with here, and I think possibly the entire concept of using EOT as a marker for the end of input might be flawed. Could we make `key` throw an exception instead? I'm not a very experienced Forth programmer so I don't really know how this would be done conventionally. But at any rate, you can now exit from `./seed` by typing Ctrl-D (no Enter) or `bye`, and I find the first more comfortable. Just be aware that a partial last line is not supported, either in a `*.seedsource` file or a `./seed` session; it will say "not found".
- Make the default for `echo` and `input-echo` be off. That's because after loading a seedForth runtime from a `*.seed` file, you will always want to load further runtime as textual Forth source. So it's cleaner to let the second runtime enable echo. This makes the output of `./seed` cleaner as well. But primarily it's needed to avoid junk getting into the tokenized `*.seed` files.
- `DO` / `?DO` / `LOOP`, as the experimental `?DO` that was commented out didn't have a correct companion `LOOP`.
Some of the more detailed changes might not be well explained in the above summary, or might be objectionable for whatever reason, so please feel free to check with me. Also, keep in mind that this changeset is "on top of" the previous changeset that I PR'ed the other day, so GitHub will show both changesets. It's annoying the way GitHub does this, and it does not recalculate the changeset after you merge the first PR. But you can force it to, by changing the base branch name and then changing it back.
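To illustrate the `cat`-style input handling from the summary (read `stdin` when there are no arguments, otherwise read each named file in order, with `-` meaning `stdin`), here is a rough sketch. It is written in Python purely for readability and is not the code in this PR; the names `input_streams` and `read_all_input` are made up for the example.

```python
import io
import sys
from typing import BinaryIO, Iterator, Sequence


def input_streams(args: Sequence[str], stdin: BinaryIO) -> Iterator[BinaryIO]:
    """Yield the input streams in order, the way cat does.

    No arguments means "read stdin"; an argument of "-" also means stdin.
    (Hypothetical helper, not part of seedForth.)
    """
    if not args:
        yield stdin
        return
    for name in args:
        if name == "-":
            yield stdin
        else:
            # Each file is opened only when its turn comes and closed
            # before the next one is opened.
            with open(name, "rb") as f:
                yield f


def read_all_input(args: Sequence[str], stdin: BinaryIO) -> bytes:
    """Concatenate every input stream into a single byte string."""
    return b"".join(stream.read() for stream in input_streams(args, stdin))
```

A filter built this way would call something like `read_all_input(sys.argv[1:], sys.stdin.buffer)`. Because the program owns its input rather than sitting behind a separate `cat` process, it sees end of input directly and does not depend on a broken pipe being noticed.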
I had a really good time doing this, even though it involved a lot of head-scratching and dealing with strange crashes and errors and unexpected behaviour. As I mentioned I'm not an experienced Forth programmer, but I've become more conversant with it.
Note: There is a minor bug in this PR, that I had directly invoked `gforth` in `Makefile` instead of `$(HOSTFORTH)`. It is fixed in #12 so I have not fixed it here. If you do want the fixed version of this PR see the branch `self_hosting_tokenizer1` in my github account. I wouldn't recommend using that branch though, because it will cause conflicts later when merging #12 and others.
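For anyone curious about the `key?` change in the summary, the idea of a non-blocking "is input ready?" check based on `poll()` can be sketched as below. This is Python's `select.poll` wrapper around the system call, not the actual runtime code, and `key_available` is a hypothetical name. One relevant fact: POSIX specifies that regular files always poll as readable, so a check like this behaves sensibly when `stdin` is redirected from a file.

```python
import os
import select


def key_available(fd: int) -> bool:
    """Return True if a read on fd would not block.

    That covers both "data is waiting" and "the other end has closed",
    since in either case a read returns immediately.
    (Hypothetical helper, not part of seedForth.)
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    # A timeout of 0 asks for the current state without waiting.
    events = poller.poll(0)
    return any(ev & (select.POLLIN | select.POLLHUP) for _, ev in events)
```

With a pipe, `key_available` is false while the pipe is empty and becomes true once a byte has been written to it. `select.poll` is not available on every platform (notably Windows), so this sketch assumes a Unix-like system.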