I first started using this library to have fun with some Advent of Code problems, and I noticed something weird. As I worked on my solutions, I'd first write a test case so I could debug my solution, and I'd provide the test data as a multiline string in the test case. For example:
Then, once my code worked properly, I'd download the real data to a local file, and invoke my solver. The solver would open the local file, read it into memory, and then fail. I found this quite frustrating, and ultimately solved the problem by reading the local file from STDIN, a line at a time, because consistently, processing a single line would work no matter what the source, but processing multiple lines would not.
Now, I'm working on a project where I'm trying to import a database dump and once again, I'm encountering this problem. I've managed to parse fields out of a line, and multiple lines out of a string literal, but the parsing fails spectacularly when reading from a file. It's as if the line endings are being consumed by the operating system, except it's even weirder than that.
In my test case, I've got code that loads the file from disk:
Subsequently, I iterate over all the lines and run my "line parser" on each one. This works.
Then, if I pass the same data to a parser defined to process a bunch of lines, it fails. I can make it fail in a variety of ways, too. In the most straightforward case, instead of 47k lines I get only one line with...well, a lot of values. Beyond the naive approach, I get either infinite loops or a failure to consume input, depending on whether I use literals, just a OneOf, a Many, or an Optionally for the line ending separator.
The most frustrating thing is trying to convert from a linguistic expression to a code expression. Clearly and provably, my line parser is working. So what I want is, "Zero or more lines, where each line is terminated by either an end of line or end of file."
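To pin down what that sentence means independently of any particular library, here is a minimal sketch in plain Python (not the library's combinators). The names `parse_line` and `parse_lines` are hypothetical, and `parse_line` is only a stand-in for the working line parser, assumed here to split comma-separated fields. The loop treats `\r\n`, `\n`, and end of input as equally valid terminators.

```python
import re

# Hypothetical stand-in for the working "line parser": comma-separated fields.
def parse_line(line: str) -> list[str]:
    return line.split(",")

# One line: a run of non-newline characters, terminated by \r\n, \n,
# or the end of the input (\Z).
_LINE = re.compile(r"([^\r\n]*)(\r\n|\n|\Z)")

def parse_lines(text: str) -> list[list[str]]:
    """Zero or more lines, each terminated by end of line or end of file."""
    rows = []
    pos = 0
    while pos < len(text):
        match = _LINE.match(text, pos)
        # Defensive guard: a "zero or more" loop must consume input on
        # every iteration, otherwise it can spin forever.
        if match is None or match.end() == pos:
            raise ValueError(f"no progress at offset {pos}")
        rows.append(parse_line(match.group(1)))
        pos = match.end()
    return rows
```

Calling `parse_lines("a,b\nc,d\r\ne,f")` returns three rows even though the terminators are mixed and the last line has none. The progress check is the part that relates to the infinite loops: repeating an element that can succeed without consuming anything is the usual way a "zero or more" construct fails to terminate.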
Okay, that's the second most frustrating thing, because the most frustrating thing is that when I provide all the data as a multiline string literal, my parsers work as written. Which suggests that the problem is something to do with the invisible line ending sequence -- the difference between \r\n and \n.
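One way to test that hypothesis is to look at the raw bytes of the file before any parser sees them. The sketch below is plain Python with hypothetical helper names (`count_line_endings`, `normalize_newlines`); it counts each kind of terminator and, as a possible workaround, rewrites them all to `\n`.

```python
def count_line_endings(data: bytes) -> dict[str, int]:
    """Count CRLF, bare LF, and bare CR terminators in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,  # \n not preceded by \r
        "cr": data.count(b"\r") - crlf,  # \r not followed by \n
    }

def normalize_newlines(text: str) -> str:
    """Rewrite \\r\\n and bare \\r as \\n so only one terminator remains."""
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

The file should be read in binary mode (`open(path, "rb")`) for the count to be meaningful, because Python's default text mode already translates `\r\n` and `\r` to `\n` on read. If the count shows `crlf` entries in the file while the string literal in the test contains only `\n`, that would be consistent with the difference in behavior described above.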