UTF-8 support #3
Comments
Hi Marcos.
You are right. This is a limitation I can remove. It is in my todo-list.
Originally, when I started with nuBASIC the purpose was to provide an
example for my programming courses.
I had in mind a kind of 80s interpreter, so supporting just ASCII was
enough for that purpose.
I will need to improve the tokenizer, which is responsible for reading the
input and transforming it into tokens.
That limitation simplified the implementation, but maybe now I can
improve it.
Thank you for your suggestion.
Kind regards,
Antonino
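To illustrate the kind of tokenizer change described above, here is a minimal sketch in Python. It is not nuBASIC's actual tokenizer (which is not shown in this thread); the function names `strip_non_ascii` and `tokenize` are hypothetical. The first function mimics the reported behaviour of dropping non-ASCII characters, and the second keeps string literals intact, whatever characters they contain.

```python
def strip_non_ascii(line: str) -> str:
    """Mimic the reported limitation: every non-ASCII character is dropped."""
    return "".join(ch for ch in line if ord(ch) < 128)


def tokenize(line: str) -> list:
    """Split a BASIC-like line into tokens (hypothetical, simplified rules).

    String literals are copied verbatim, so non-ASCII text inside the
    quotes survives tokenization.
    """
    tokens = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch.isspace():
            i += 1
        elif ch == '"':
            end = line.find('"', i + 1)
            if end == -1:
                # Unterminated literal: take the rest of the line.
                end = len(line) - 1
            tokens.append(line[i:end + 1])
            i = end + 1
        else:
            start = i
            while i < len(line) and not line[i].isspace() and line[i] != '"':
                i += 1
            tokens.append(line[start:i])
    return tokens


source = 'PRINT "café ñ"'
print(strip_non_ascii(source))
print(tokenize(source))
```

The point of the sketch is only that the tokenizer does not need to understand the characters inside a string literal; it just has to pass them through unchanged.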
On Sun, 10 Nov 2019 at 13:19, Marcos Cruz wrote:
Why is only ASCII supported? It's a surprising limitation. At first I thought
it was a mistake in the manual: I thought it meant only the identifiers.
But in fact, UTF-8 or even Latin-1 strings are not accepted in BASIC
sources (all non-ASCII characters are removed). And only ASCII characters
are accepted by the command line interpreter.
Is nuBASIC ASCII-only by design? Or is UTF-8 going to be supported (at
least just to print strings, not to manipulate them) in a future version?
I would prefer Latin-1 (= ISO 8859-1), or a switch between ISO 8859-1 and UTF-8.
I'm not sure what you mean by "switch".
That is a problem of your editors ;) Unicode is the way to go, and UTF-8 is its most practical encoding at the moment. Of course, it raises the question of the BASIC string functions, but they could work with bytes as usual. The point is to accept and print UTF-8 strings. But anyway, ISO 8859-1 is better than nothing: it would make nuBASIC useful for writing programs in a few European languages other than English.
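A small Python sketch of what "string functions working with bytes" would mean in practice. The helper names `byte_len` and `byte_left` are hypothetical stand-ins for BASIC-style `LEN` and `LEFT$`; nothing here is taken from nuBASIC itself.

```python
def byte_len(raw: bytes) -> int:
    """LEN-like function that counts bytes, not characters."""
    return len(raw)


def byte_left(raw: bytes, n: int) -> bytes:
    """LEFT$-like function that slices by bytes."""
    return raw[:n]


raw = "Año".encode("utf-8")  # 'ñ' takes two bytes in UTF-8

# Accepting and printing the whole string round-trips without loss.
print(raw.decode("utf-8"))

# The length is reported in bytes, one more than the character count.
print(byte_len(raw), len("Año"))

# Slicing by bytes can cut a multi-byte character in half.
print(byte_left(raw, 2).decode("utf-8", errors="replace"))
```

So printing works as long as the bytes are passed through untouched, while byte-based slicing can split a character. That is the trade-off accepted by the "work with bytes as usual" approach.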
Thanks. I understand ASCII was enough for your initial scope, but it makes the language pretty useless for more general use.
Hello, here is my answer regarding .bas source files.
In Python, the source code encoding is specified in a comment on line 2. It could also be a new BASIC command; why put important information into comments? Another possibility for a switch is the byte order mark: present = UTF-8, not present = Latin-1. Or see my following post.
In bigger projects, the language-specific string constants are kept in "resource" or external files. Currently, nuBASIC strings are 8-bit sequences! Only the source code is treated as 7-bit.
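The byte-order-mark switch suggested above could look roughly like this in Python. This is a sketch of the proposal only, not anything nuBASIC implements; `decode_source` is a hypothetical name.

```python
import codecs


def decode_source(data: bytes) -> str:
    """Decode a .bas source file using the proposed BOM rule.

    BOM present -> UTF-8 (the BOM itself is stripped).
    BOM absent  -> Latin-1 (ISO 8859-1).
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode("latin-1")


line = 'PRINT "ñ"'
with_bom = codecs.BOM_UTF8 + line.encode("utf-8")
without_bom = line.encode("latin-1")

print(decode_source(with_bom) == line)
print(decode_source(without_bom) == line)
```

One caveat with this rule: a UTF-8 file saved without a BOM would be read as Latin-1, and because Latin-1 maps every byte to some character, the mistake would show up as garbled text rather than as an error.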
I assume that in and around Russia there is a huge number of CP1251-encoded files, etc.
So a switch could also be between those two possibilities.
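To show why such a switch matters for legacy 8-bit files, here is a short Python sketch in which the same bytes are decoded under two different code pages. The function `decode_with` and the idea of passing the encoding name explicitly are assumptions made for illustration.

```python
import codecs


def decode_with(data: bytes, encoding: str) -> str:
    """Decode source bytes with an explicitly selected encoding.

    codecs.lookup raises LookupError for an unknown encoding name,
    so a misspelled switch value fails early.
    """
    codecs.lookup(encoding)
    return data.decode(encoding)


data = "Привет".encode("cp1251")  # a CP1251-encoded string constant

print(decode_with(data, "cp1251"))   # decoded with the right code page
print(decode_with(data, "latin-1"))  # same bytes, wrong code page
```

Both calls succeed, because every byte is valid in both code pages. Only the selected encoding determines which characters come out, which is why the choice cannot be detected reliably from the bytes alone.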
Why is only ASCII supported? It's a surprising limitation. At first I thought it was a mistake in the manual: I thought it meant only the identifiers. But in fact, UTF-8 or even Latin-1 strings are not accepted in BASIC sources (all non-ASCII characters are removed). And only ASCII characters are accepted by the command line interpreter.
Is nuBASIC ASCII-only by design? Or is UTF-8 going to be supported (at least just to print strings, not to manipulate them) in a future version?