Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some thoughts on edge cases #1

Open
Deewiant opened this issue Aug 26, 2021 · 12 comments
Open

Some thoughts on edge cases #1

Deewiant opened this issue Aug 26, 2021 · 12 comments

Comments

@Deewiant
Copy link

Hey, after you pinged me at cwesson/funge-plus-plus#1 I took a look at https://github.com/tjol/rfunge/wiki/Undefined-behaviour-and-known-bugs out of curiosity and had a few thoughts. I haven't given much thought to Funge business in years, so you managed to nerd-snipe me. 😄

(An e-mail might've been better suited for this but I figured a GitHub issue is fine too, and maybe this is interesting to someone else out there.)

I also noticed your blog at https://tjol.eu/blog.html — good job getting CCBI compiled, I actually feared it would be more work these days. FWIW, I also offer the binaries at https://deewiant.iki.fi/projects/ccbi/ which I think should work on most modern systems still.

Anyway, to the actual topics at hand.

What happens when you cross the edge?

RFunge takes the view that the program is in theory surrounded by an infinite amount of spaces, and should act that way whether these spaces were ever physically present in the file or not.

To be pedantic, the Funge-98 spec points out that the space ranges across [i32::MIN, i32::MAX] (or equivalently for non-32-bit implementations). So I'd argue that if one writes a # at the cell at i32::MAX and executes it eastwards, it should jump over the cell at i32::MIN.

FBBI and jsFunge-98 make the curious choice of having # and ' behave like in CCBI, but having " behave like in rcFunge.

"choice" is a bit generous... I don't think these kinds of edge cases were given much thought anywhere before I tried to formalize things a bit with Mycology, so these types of weirdnesses tend to be the result of implementation quirks. I think you get this behaviour if you implement # and ' as moving the IP without checking for wraparound (i.e. simply position += delta) and " as just setting a flag and relying on the normal IP movement (which practically must handle wraparound) for it.

Edge cases of k

Figuring out k was a seemingly never-ending source of problems. I remember having several discussions with @VorpalBlade (and other #esoteric regulars at the time) about it, and both CCBI and cfunge saw a number of tweaks.

Looking at it now, I think CCBI gets 0k wrong and rfunge and cfunge get it right. (And all three get positive k right, and cfunge's choice to reflect on negative k is probably the sanest option of the bunch.) Although apparently I explicitly changed CCBI away from that behaviour in Deewiant/CCBI@3c77413... glancing over my IRC logs from 2008–2009 suggests that even at the time I thought that was wrong, so I wonder what prompted it.

k and t

Sorry for having CCBI be broken here. I recall discussing this in the cfunge context and I think at the time I was already starting to be a bit burnt out on CCBI so I never really investigated it properly.


One more source of k "fun" that you haven't mentioned is the topic of nested k, especially combined with instructions that move the IP like #, or ones that also affect the stack while doing so like '. I doubt Mycology tests it (too much of a headache to think about, too many potential behaviours to try to recover from, etc.), but it's one of those things that will definitely cause different kinds of weirdness in different interpreters. In general the topic of "where is the IP during the execution of k" is not very well defined. (I guess you've already run into some of that in VorpalBlade/cfunge#4 .) Consider for example 12kk'abcde@:

  1. On hitting the first k, one should find k to be the instruction to execute twice.
  2. When the inner k is executed for the first time, you can already argue about what instruction it should run: Either, since the IP hasn't moved from the outer k yet, it should find itself, which effectively reduces kk to n (I think?), or, to be more interesting, it should find the next instruction from the inner k, i.e. '.
  3. So assuming the inner k executes ', it will do so once (since 1 is at the top of the stack at this point). Now it's anyone's guess as to where exactly the ' loads from, but I think that if we're not doing the kk == n interpretation then the execution has to happen from the inner k, so the ' finds ' and hence pushes 39. And then it moves the IP to be on top of the a.
  4. When the inner k executes for the second time, 39 is at the top of the stack, so it'll run 39 times. Now with the IP on top of the a, should it run b? Or the ' again? If the latter, the IP should move right 39 times, pushing a few copies of the whole program on the stack, and the outer k is exited at somewhere around the c or d if my mental arithmetic is correct.

I'm pretty sure CCBI and cfunge differ in the details and I wouldn't be surprised if neither one stands up to scrutiny, but cfunge is probably at least better tested in this respect.

(As a side note, I've been meaning to finish my improved Funge-Space implementation at https://github.com/Deewiant/mushspace and then use https://github.com/Deewiant/hali as a base to build a Rust-based interpreter on top of that, with another pass at correctness and scrutinizing these types of edge cases, for ages, but realistically and sadly it'll probably never happen... Let me know if you need help with something in rfunge instead, though!)

@tjol
Copy link
Owner

tjol commented Aug 26, 2021

Hi! How nice of you to have a look around :-)

I did see that you provide binaries, but they don't work on my Linux system for some reason. The Windows binaries work on Windows though, so that's nice.

Yes. k. Fun times. Actually, after having written that wiki page I came to the conclusion that rfunge has the worst implementation of 0k, and should be fixed! I'd say cfunge gets it right, though, and CBBI may not be making the best choice, but it's fairly coherent at least. So that'll definitely have to be fixed, probably to agree entirely with cfunge. And to support kt, though that might require major surgery...

As for kk shenanigans... I think you’re right about the possibly-sanest but least-interesting option reducing to n, but it’s too late to think about that or the other one in much detail right now. I may or may not come back to you on that.


If you have ideas for a faster funge-space implementation I'm sure that could be a great addition to rfunge, actually, should you ever feel up to it! I've used a stupid amount of generics in my code with the idea that it might be nice to try different funge-space implementations in future so all most of the interpreter knows about funge-space is that it implements a fairly simple trait

pub trait FungeSpace<Idx>: Index<Idx> + IndexMut<Idx>

(looking at that trait definition now it looks like those doc comments aren't quite right)

Or just have a look around, find bugs, whatever you like. Some other things vaguely on the agenda are Trefunge support, a few more fingerprints, and possibly a better web-based debugger/GUI.

@Deewiant
Copy link
Author

I did see that you provide binaries, but they don't work on my Linux system for some reason.

Oh, darn. I guess being compiled 10+ years ago means some library dependencies are out of date...

I came to the conclusion that rfunge has the worst implementation of 0k, and should be fixed! I'd say cfunge gets it right, though

Oops, I didn't realize they were different. Yeah, I think cfunge's interpretation makes more sense.

If you have ideas for a faster funge-space implementation I'm sure that could be a great addition to rfunge, actually, should you ever feel up to it!

mushspace basically uses an R-tree where each rectangle corresponds to a Vec<FungeValue>. I think something like that (not necessarily an R-tree specifically, but some spatial data structure in that vein) is ideal at least as part of a complete solution: The R-tree allows for wraparound to "jump" to the next allocated area quickly, so it prevents cases like writing at (i32::MAX, 0) from making wraparound brokenly slow. (Assuming the typical alternative implementation i.e. a hash table + tracking min/max bounds.) It involves a lot of heuristics around how to allocate those Vecs though, since typically you're given just the p operations and have to deduce things like "oh, this is actually a 1-D array so let's reserve 1024 cells in the direction it's being written in".

The bit where I got stuck was figuring out a decent tessellation algorithm for when many of those rectangles end up being co-located, and you want to merge them into something optimal. All in all an optimal Funge-Space implementation for arbitrary programs is a surprisingly complex problem.

Sadly there's no beating cfunge's simple trick of having a fixed-size array used for the main source code, "big enough for all known programs". (I.e. for Mycology, which I think is the biggest hand-written program to date.) Much as I tried, in the end the source code gets accessed so much that the if pt >= FIXED_MIN && pt < FIXED_MAX { return array[pt - FIXED_MIN]; } is a well-predictable branch and compiled to such a small amount of code that it always ekes out a few percentage points over anything else. The wins then come only for data outside that initial area.

(FWIW, hali beat cfunge in CPU-seconds on all the benchmarks I threw at it, and beat it on memory usage rather significantly except in the pathological case of a program that writes a long diagonal line. But with hali's incompleteness this should be taken with a grain of salt, e.g. the lack of concurrency support might make up a big chunk of the difference.)

@tjol
Copy link
Owner

tjol commented Aug 27, 2021

mushspace basically uses an R-tree where each rectangle corresponds to a Vec<FungeValue>. I think something like that (not necessarily an R-tree specifically, but some spatial data structure in that vein) is ideal at least as part of a complete solution: The R-tree allows for wraparound to "jump" to the next allocated area quickly, so it prevents cases like writing at (i32::MAX, 0) from making wraparound brokenly slow. (Assuming the typical alternative implementation i.e. a hash table + tracking min/max bounds.) It involves a lot of heuristics around how to allocate those Vecs though, since typically you're given just the p operations and have to deduce things like "oh, this is actually a 1-D array so let's reserve 1024 cells in the direction it's being written in".

That's interesting. It's structurally not too dissimilar from my attempt at a PagedFungeSpace: I'm using a HashMap of fixed-size rectangular Vec<T:FungeValue>. The rectangles being fixed-size makes things a lot simpler (especially lookup), but what size of to choose is a classic speed/memory use trade-off. When skipping across unallocated ‘gaps’ or wrapping, I check all allocated areas to see if they intersect the IP's path.

The bit where I got stuck was figuring out a decent tessellation algorithm for when many of those rectangles end up being co-located, and you want to merge them into something optimal. All in all an optimal Funge-Space implementation for arbitrary programs is a surprisingly complex problem.

That does sound like quite a difficult problem! I suppose how important this is for performance would depend on the size of the rectangles...

Sadly there's no beating cfunge's simple trick of having a fixed-size array used for the main source code, "big enough for all known programs". (I.e. for Mycology, which I think is the biggest hand-written program to date.) Much as I tried, in the end the source code gets accessed so much that the if pt >= FIXED_MIN && pt < FIXED_MAX { return array[pt - FIXED_MIN]; } is a well-predictable branch and compiled to such a small amount of code that it always ekes out a few percentage points over anything else. The wins then come only for data outside that initial area.

Well quite. My funge-space implementation is certainly not ideal, but I think I've been able to make it pretty quick by now – and it takes up about half of the interpreter’s time. For context, on my favourite benchmark rfunge is about 2/3 the speed of cfunge right now. But so much of the time appears to be taken up by lookups in the hashmap that I'm sure a more suitable data structure, even if it's more complex than a simple array, could make quite a difference.

I was wondering about an nD array of nullable fixed-size nD arrays (with side lengths a power of 2?). Vec<Option<Vec<T:FungeValue>>> sort of thing. lookups should be pretty fast (nowhere near cfunge-fast, but still than a hashmap or any kind of tree), and for 32x32 cell blocks (e.g.) the pathological case still takes 1024 times less memory than a plain array.

@VorpalBlade
Copy link

VorpalBlade commented Aug 27, 2021

Some quick thoughts on this (what with doing a PhD currently I don't have time to really dig into this):

  • k is indeed a real pain
  • However, there are some included tests in the cfunge repo for things that Mycology doesn't test. They may or may not be right, but they do test some of the expected behaviour for cfunge (and that I presumably thought was right at the time). I believe I even set it up to work with cmake/ctest at some point to run as automatic tests with some CI service or other. Worth taking a look at perhaps? For tests of k I believe you should look at tests named something with "iterate", though the other tests might also be of interest.

@VorpalBlade
Copy link

Another thought that came to my mind after looking at your blog is that you never looked at efunge (https://github.com/VorpalBlade/efunge). I wrote that one in Erlang and it is somewhat interesting for the fact that it is bignum. This bends the standard a bit since that assumes a finite cell size. I also have a vague memory of it having an experimental branch where I added a fingerprint for async multithreading (unlike the sync MT provided by t),

As far as I know it should still work on modern Erlang versions, though I haven't tested it for several years.

@tjol
Copy link
Owner

tjol commented Aug 27, 2021

Another thought that came to my mind after looking at your blog is that you never looked at efunge (https://github.com/VorpalBlade/efunge). I wrote that one in Erlang and it is somewhat interesting for the fact that it is bignum. This bends the standard a bit since that assumes a finite cell size. I also have a vague memory of it having an experimental branch where I added a fingerprint for async multithreading (unlike the sync MT provided by t),

As far as I know it should still work on modern Erlang versions, though I haven't tested it for several years.

I'll have to have a look at efunge at some point, as well as fungi (haskell) and clostridium (clojure). In short, all the implementations in functional languages looked a bit hard to get running.

Actually I think PyFunge also uses bigint (by accident, I imagine), and I imagine jsFunge secretly uses floats...

@Deewiant
Copy link
Author

That's interesting. It's structurally not too dissimilar from my attempt at a PagedFungeSpace: I'm using a HashMap of fixed-size rectangular Vec<T:FungeValue>. The rectangles being fixed-size makes things a lot simpler (especially lookup), but what size of to choose is a classic speed/memory use trade-off.

I guess with fixed-size boxes you can also end up with a lot of overhead in cases like the "1-D array" I mentioned. It's surely a lot less of a headache to think about, though!

When skipping across unallocated ‘gaps’ or wrapping, I check all allocated areas to see if they intersect the IP's path.

This can of course degrade to O(n) for sufficiently pathological cases, since the HashMap structure doesn't help at all.

I was wondering about an nD array of nullable fixed-size nD arrays (with side lengths a power of 2?). Vec<Option<Vec<T:FungeValue>>> sort of thing. lookups should be pretty fast (nowhere near cfunge-fast, but still than a hashmap or any kind of tree), and for 32x32 cell blocks (e.g.) the pathological case still takes 1024 times less memory than a plain array.

You mean covering the whole space with boxes that may or may not be allocated? Wouldn't that force you to allocate a ridiculous number of Nones if a program writes to (MAX, MAX)?

I'll have to have a look at efunge at some point, as well as fungi (haskell) and clostridium (clojure). In short, all the implementations in functional languages looked a bit hard to get running.

If you want to be exhaustive, there are plenty of other interpreters out there too. Other than the ones mentioned so far, I can account for at least: !Befunge, BeQunge, Fungus, GLfunge98, Language::Befunge, libBefunge, RubyFunge, stinkhorn, Webfunge, ZedFunge, and zfunge. (At some point I had a Mycology results table that covered most of these, but it got too painful to maintain and didn't seem worth the trouble.) I can dig out more info on them if you'd like (and if they've vanished from the Web by now).

@tjol
Copy link
Owner

tjol commented Aug 27, 2021

This can of course degrade to O(n) for sufficiently pathological cases, since the HashMap structure doesn't help at all.

Yes, it is generally O(n), but at least n is the occupied area, not the total area. It involves relatively few operations per unit area as well, but there are problems.

You mean covering the whole space with boxes that may or may not be allocated? Wouldn't that force you to allocate a ridiculous number of Nones if a program writes to (MAX, MAX)?

Yep. But it'd take gigabytes of RAM rather than terabytes! (Probably not going to do this)

If you want to be exhaustive, there are plenty of other interpreters out there too. Other than the ones mentioned so far, I can account for at least: !Befunge, BeQunge, Fungus, GLfunge98, Language::Befunge, libBefunge, RubyFunge, stinkhorn, Webfunge, ZedFunge, and zfunge. (At some point I had a Mycology results table that covered most of these, but it got too painful to maintain and didn't seem worth the trouble.) I can dig out more info on them if you'd like (and if they've vanished from the Web by now).

I'm not too fussed about being exhaustive myself, but in theory it might be nice if the list of interpreters on the esolang wiki were more complete and up-to-date...

@tjol
Copy link
Owner

tjol commented Aug 27, 2021

As you suspected, @Deewiant, kk behaves differently on different interpreters depending on what view they take on where the IP is

I tried out this:

58   kk.vvvvvvvvvvvvvv
        23456789abcdef
   v.,!'<<<<<<<<<<<<<<
   >na,24kk'>>>>>>>>>>>>:#,_@

rfunge/ccbi/rcfunge/pyfunge all go for

!12 

while cfunge picks an alternative strategy:

0 0 0 0 0 !2 
>'kk42,an>    @_,#:>>>>>>>>>>>>'kk42,an>    @_,#:>>>>>>>>>>>>''kk42,an>    @_,#:>>>>>>>>>>>>'kk42,an>    @_,#:>>>>>>>>>>>>''kk42,an>    @_,#:>>>>>>>>>>>>'kk42,an>    @_,#:>>>>>>>>>>>>''

Those two pretty much fit with your predictions.

(FBBI, however does something rather unexpected:

0 0 0 0 0 0 0 0 0 0 !2 
>>>

)

@tjol
Copy link
Owner

tjol commented Aug 27, 2021

One could make the argument that, in a strict reading of the spec, k shouldn't move the IP at all, ever. If it's given a 0 argument, it just arranges for the next instruction ‘in the path of the IP’ to not be executed when its time comes.

In this interpretation, everybody is wrong, as all implementations appear to implement 0k as moving the IP, skipping additional instructions that weren't really the next instruction in the path of the IP at the time.

With that way of looking at things, the program should print

0 !2
>

... I think

@cwesson
Copy link

cwesson commented Aug 28, 2021

Another edge case for k, what should 2k" do? Enter and immediately exit stringmode, then enter string mode again? Or does it enter stringmode twice then exit stringmode?

I tested Funge++ with:

2k"v"2.@"3.@
   >1.@

It entered stringmode twice, then exited stringmode, printing 1, but I'm starting to think enter/exit stringmode then enter stringmode might be the correct behavior.

@tjol
Copy link
Owner

tjol commented Aug 28, 2021

Well that’s fun. rfunge actually enters string mode once and then ends the k instruction, but that definitely makes no sense

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants