Unsigned integers #2
Comments
Yes, for certain algorithms (hash functions) the wrap-around nature of unsigned ints is convenient and it would lead to less confusion with C interop. I do not believe the standard requires integers to be implemented as two's complement, but I do not know of a compiler that uses a different convention. This means that the underlying machinery for unsigned integers is already in place. (And that programmers can "roll their own", but this can be confusing or less clear/explicit in the source code.) |
After talking about this with a few members on the committee, it seems most are in agreement that having this in the C interop might be a good idea, but allowing this in the Fortran language itself would do more harm than good (vendors don't like it; it's easy to have all kinds of subtle bugs with unsigned integers, such as when comparing, subtracting, etc.). Python does not have unsigned integers, although NumPy does, and so does Julia. I can personally see very good arguments both for and against having this in the language itself. We can start with the C interop, where it should be easier to get agreement, to see if there is anything that would make sense to propose. |
I wanted to chime in and say that one important use case of unsigned integers is handling images. To store a monochrome 8-bit image, one either has to use a twice-as-large 16-bit integer or store the 8-bit unsigned values in a signed 8-bit integer and deal with wrapping at 128, which makes any arithmetic operation impossible. This is true for any binary data, not only images. So I think the issue is not strictly C-interop related. |
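To make the image case concrete, here is a minimal sketch of the workaround available today, assuming the pixels were read into `integer(int8)` storage on a two's complement processor. The module and procedure names (`pixel_u8`, `u8_value`, `mean_brightness`) are made up for illustration and are not part of any library or proposal.

```fortran
module pixel_u8
  use, intrinsic :: iso_fortran_env, only: int8, int16
  implicit none
  private
  public :: u8_value, mean_brightness
contains

  ! Read the 8 bits stored in a signed int8 as an unsigned value 0..255.
  ! Widening first and masking afterwards drops the sign extension.
  elemental function u8_value(b) result(v)
    integer(int8), intent(in) :: b
    integer(int16) :: v
    v = iand(int(b, int16), 255_int16)
  end function u8_value

  ! Average pixel value of a non-empty monochrome image stored as int8.
  pure function mean_brightness(img) result(m)
    integer(int8), intent(in) :: img(:)
    real :: m
    m = real(sum(int(u8_value(img)))) / real(size(img))
  end function mean_brightness

end module pixel_u8
```

A pixel with the bit pattern of 200 is stored as -56 in an `int8`, and `u8_value` recovers 200 from it. The cost is the one described above: every arithmetic step has to go through a wider type.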
I agree with this: unsigned integers can be of great help in any systems programming context, handling of binary data of any form (images or otherwise) can be a use case within this space. Though some will argue unsigned integers are not an absolute must for systems programming, the fact is this facility can really make coders' lives easier. If Fortran intends to be taken truly seriously as a general-purpose language, it should consider including unsigned integers; its type system is general and it does not in any way appear to interfere with its introduction. Interestingly, unsigned integers feature was 4th on the top 6 list of desired features by users in the WG5 survey for Fortran 202X. Ignoring this any longer feels like suppression of the voice of the customers! |
I just want to react to this particular point:
Fortran is not a general-purpose language. Rather, it is a domain specific language for array oriented scientific computing. As a larger point, it touches what we want Fortran to be, see #59. |
Fortran is not a general-purpose language. Rather, it is a domain
specific language for array oriented scientific computing.
I agree with this and with that direction, however reading/writing binary
data is often a part of it, and this is the place where lack of byte/uint
really bit me as very often data is stored as uint16 FITS/TIFF files.
Dominik
On Mon, 28 Oct 2019 at 22:31, Ondřej Čertík wrote:
… I just want to react to this particular point:
If Fortran intends to be taken truly seriously as a general-purpose
language
Fortran is not a general-purpose language. Rather, it is a domain specific
language for array oriented scientific computing.
As a larger point, it touches what we want Fortran to be, see #59
<#59>.
|
@gronki I think you are right about the use case of reading binary data files. Let's collect such use cases. I think we want to be able to write readers and writers for binary files in Fortran. |
That may be the current reality with Fortran, but almost everyone who has worked on its language design, and continues to do so, would greatly dislike seeing it reduced to that and would very much want Fortran to be seen as a general-purpose language. It's a different matter whether the words are backed up by actions! |
Let's continue the discussion of what Fortran should become here: #59. I started in this comment: #59 (comment).
|
I'm not sure that Fortran needs an unsigned type so much as it actually needs some unsigned operations and relations. Be wary of simply copying-and-pasting C's |
I too would like unsigned integers. Mainly because of bitmap, image data, and other binary coding or data. |
I would like to have unsigned integers as well, but there need to be clear definitions of how they interact with signed integers as well. Assume
then the question of what type Comparisons between signed and unsigned should take the sign into account, so that -2 is smaller than (unsigned) 1. So, lots of decisions, and lots of traps and pitfalls. Avoiding everything that C got wrong does not mean that a proposal would get it right... |
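A value-based comparison of this kind can already be written with the Fortran 2008 intrinsic `BLT`, which compares bit patterns as unsigned numbers. The sketch below assumes the unsigned operand is kept as a bit pattern in an `integer(int32)` on a two's complement processor; `mixed_compare` and `lt_signed_unsigned` are hypothetical names, not taken from any proposal.

```fortran
module mixed_compare
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  private
  public :: lt_signed_unsigned
contains

  ! True when the signed value i is smaller than the unsigned value
  ! whose bit pattern is stored in u.
  elemental function lt_signed_unsigned(i, u) result(lt)
    integer(int32), intent(in) :: i   ! ordinary signed value
    integer(int32), intent(in) :: u   ! bit pattern read as unsigned
    logical :: lt
    ! A negative i is below every unsigned value. Otherwise the bit
    ! pattern of i equals its value, so an unsigned comparison is correct.
    lt = (i < 0) .or. blt(i, u)
  end function lt_signed_unsigned

end module mixed_compare
```

With this rule -2 compares as smaller than unsigned 1, and 3 compares as smaller than the pattern of -1, which reads as 4294967295.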
Imagine Fortran 202Y introduces a distinct new intrinsic type unsigned integers, say it is termed Now assume
Then what are the pitfalls from other experiments (C-like languages) that can be envisioned with such a design in Fortran? Note 1. above will mean
integer :: i, foo
uinteger :: u, bar
u = i !<-- Not allowed
i = u !<-- Not allowed
foo = i + u !<-- Not allowed
bar = i + u !<-- Not allowed
if ( i > u ) !<-- Not allowed
..
u = UINT( i ) !<-- Ok
foo = i + INT(u) !<-- Ok
.. |
Here is a small subset of possible pitfalls that I recommend to address:
References: |
I'd prefer no wrap-around and an error if an unsigned integer overflowed /underflowed, but which could be checked and corrected with the same considerations as a carry bit so that increases/decreases could be handled. It would actually allow wrap-around with a correction function following by ignoring carry/borrow. |
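The carry/borrow idea can be sketched in user code. This is only an illustration under the assumption that unsigned 8-bit values (0..255) are held in default integers; `borrow_sub` and `usub8` are invented names.

```fortran
module borrow_sub
  implicit none
  private
  public :: usub8
contains

  ! Subtract two unsigned 8-bit values held in default integers.
  ! diff is the wrapped (modulo 256) result; borrow reports that the
  ! true result was negative, like a hardware borrow flag.
  pure subroutine usub8(a, b, diff, borrow)
    integer, intent(in)  :: a, b
    integer, intent(out) :: diff
    logical, intent(out) :: borrow
    borrow = a < b
    diff = modulo(a - b, 256)
  end subroutine usub8

end module borrow_sub
```

For 3 - 5 this gives `diff = 254` with `borrow` set, so the caller can either accept the wrapped value or treat the borrow as an error.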
Yes, this is the nature of modulo arithmetic.
Again, this is the nature of modulo arithmetic. It would be the expectation that people who use it know what they are doing. Maybe this can be alleviated by choosing some more descriptive name which has the modulo in the name.
It makes little sense to think of unsigned integers as "-1". Again, this is implied in modulo 2^n arithmetic.
Same thing.
Jep. I would actually not permit unsigned types for DO loops.
This, I would disallow - explicit type conversions only.
Make the conversion defined, and explicit only.
That, I would agree with.
Fortran has had
Again, agreed. If type conversion has to be explicit, then people will hopefully not use it just for the (non)-fun of it. |
subtracting two unsigned numbers can also overflow, though. |
I agree with tkoenig in every point. Coming from the field of computer
vision and image processing, where Fortran could fight for its large share,
not having an unsigned type causes a lot of pain. I would not be concerned
about overflows as unsigned arithmetic is modulo arithmetic by design. I
agree that no implicit conversion of any kind should occur, nor should size
or other intrinsic return types be changed to unsigned. Unsigned int should
be only restricted to be useful where it is needed (signal processing and
data storage). Currently, using oversized data type causes half of the
memory to be effectively wasted, and leads to computational overhead.
Dominik
On Fri, 13 Jan 2023, 16:48, 8bitmachine wrote:
… subtracting two unsigned numbers can also overflow, though.
To ensure correct programming some means of flagging up the carry/borrow
is needed, in case of (perhaps) unexpected conditions.
That seems to me to be the problem. 3-5 <3 would be interpreted correctly
if a carry/borrow flag is used in the evaluation, if both are unsigned.
That would also work in a do loop.
Checking the size of files, etc could benefit from unsigned values, though
the usual solution of using a larger max integer is adequate as files are
not likely to need 31bit sizes.
I would still request unsigned, but with the restrictions of not mixing
them and explicit conversion if needed.
|
@certik : I share your concerns about people making stupid mistakes because it is too easy to confuse signed and unsigned ints. To alleviate this, maybe this: Unsigned constants should also be distinguished from normal integers. Anybody who wants -1 as an unsigned number should either write something like So, Comparisons between signed and unsigned could actually be permitted, but the values would be compared, so |
Unsigned integers don't have to be a full-fledged type in Fortran; it just needs a few more unsigned integer operations. Just as IAND can be well-defined on integers, so could IUADD and IULT, &c. |
I think the idea by Peter Klausler is also worthwhile and much simpler to
introduce, since adding intrinsics is much less work. I would add
that conversion from uint to float is a very common operation. So simple
arithmetic and conversion would cover most of the needs as far as the
computer vision goes.
Dominik
On Fri, 13 Jan 2023, 18:12, Peter Klausler wrote:
… Unsigned integers don't have to be a full-fledged *type* in Fortran; it
just needs a few more unsigned integer *operations*. Just as IAND can be
well-defined on integers, so could IUADD and IULT, &c.
|
Possible, of course, but it would not be very much like "Formula Translation" any more, it would look more like LISP :-) Another possibility would be an intrinsic module which exports an otherwise opaque type which just happens to have certain operations, and others not. But this would not work seamlessly with I/O, so better not. |
Formatted I/O already has BOZ editing, so add U editing for unsigned decimal. (List-directed and NAMELIST I/O would require a new type, yes.) I think that people are underestimating the compiler engineering effort needed to extend Fortran's type system with a new intrinsic type, and overestimating the benefit to be gained from the effort. But adding just some unsigned integer operations would be cheap. |
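As a rough picture of what operation-only support could look like, the sketch below writes `IUADD` and `IULT` (names used earlier in this thread, not standard intrinsics) as ordinary user functions on `integer(int32)` bit patterns. It assumes a two's complement processor; `u32_value` is an extra helper invented here. `BLT` is a standard Fortran 2008 intrinsic.

```fortran
module unsigned_ops
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none
  private
  public :: iuadd, iult, u32_value

  integer(int64), parameter :: two32 = 4294967296_int64

contains

  ! Value of a 32-bit pattern read as unsigned, returned in 64 bits.
  elemental function u32_value(a) result(v)
    integer(int32), intent(in) :: a
    integer(int64) :: v
    v = int(a, int64)
    if (v < 0) v = v + two32
  end function u32_value

  ! Unsigned addition modulo 2**32. The sum is formed in 64 bits, so
  ! no signed overflow occurs along the way.
  elemental function iuadd(a, b) result(r)
    integer(int32), intent(in) :: a, b
    integer(int32) :: r
    integer(int64) :: s
    s = modulo(u32_value(a) + u32_value(b), two32)
    ! Map 2**31 .. 2**32-1 back onto the negative int32 patterns.
    if (s > int(huge(a), int64)) s = s - two32
    r = int(s, int32)
  end function iuadd

  ! Unsigned less-than on the bit patterns.
  elemental function iult(a, b) result(lt)
    integer(int32), intent(in) :: a, b
    logical :: lt
    lt = blt(a, b)
  end function iult

end module unsigned_ops
```

A processor could implement such operations with a single instruction each; the 64-bit detour here exists only to stay within what standard signed arithmetic permits.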
The "can all agree upon" seems a bit presumptive. Here's a portion of Marsaglia's 64-bit KISS prng implemented and tested with gfortran
Note,
It would be even more difficult to parse if, e.g., Hopefully, there is some agreement that one of these |
@kargl awesome, thanks for the code. Can I include that in the proposal? Yes, indeed the |
Anything I post in a public forum without an explicit copyright notice is fair game for others to use as they see fit. Even the KISS example isn't too complicated, but shows the verbose use of functions. As a side note, I agree with @klausler and @tkoenig1 that introducing new processor-dependent behavior into the Fortran standard is undesirable. |
That's right, and we have to explore how to make it simpler, which is possible IMO.
It seems to me that the standard constantly introduces some undefined behaviors, which is not necessarily a bad thing in itself, as UB's leave more room for some optimizations by the compilers. |
Fortran has prohibitions in the code, as in "shall not", and requirements, as in "shall". If somebody does a calculation involving integer overflow, then that is illegal. The processor can do optimizations based on the assumption that this does not happen, and if the user makes that mistake then it can ignore the error, trap, or start World War III. Intentionally introducing undefined behavior into the language would a) be a bad idea, as C has learned, and b) be equivalent to a prohibition. Framing this as anything else is just wrong, and shows either a basic flaw in understanding of language design, or something else. |
Naively, I thought that having no default arithmetic operations and requesting the user to import a desired set of operators (+, *, etc, (*1)) from some intrinsic module (e.g., at the top of a user module) seems useful to keep the readability of the code. Is such an operator-based approach already out of option for some reason (e.g., because operators can take only up to 2 arguments so less flexible (*2)), and so intrinsic function-based approaches currently considered instead? (*1) Here, I imagine a set of operators are provided in an intrinsic module for each mode (e.g. modular) and the user can select which mode to use via the Personally, I expect the "modular" behavior (255 + 2 -> 1) will be okay as the default because (i) Fortran users use signed integers up to now, and only "power users" will use unsigned integer for their special purposes (so I expect they know what they are doing), and (ii) other languages seem to use the modular mode, so porting the code seems easier. |
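The operator-based idea can be tried today with a derived type in an ordinary user module. The sketch below is a hypothetical stand-in for such an intrinsic module: the names `modular_u8`, `u8`, `to_u8` and `to_int` are invented, and the value is stored in an `int16` for simplicity, so it illustrates the modular `+` only and saves no memory.

```fortran
module modular_u8
  use, intrinsic :: iso_fortran_env, only: int16
  implicit none
  private
  public :: u8, to_u8, to_int, operator(+)

  ! Opaque 8-bit unsigned value; the component always stays in 0..255.
  type :: u8
    private
    integer(int16) :: v = 0_int16
  end type u8

  interface operator(+)
    module procedure add_u8
  end interface

contains

  ! Explicit conversion from a default integer, reduced modulo 256.
  elemental function to_u8(i) result(x)
    integer, intent(in) :: i
    type(u8) :: x
    x%v = int(modulo(i, 256), int16)
  end function to_u8

  ! Modular addition: the result wraps around at 256.
  elemental function add_u8(a, b) result(c)
    type(u8), intent(in) :: a, b
    type(u8) :: c
    c%v = modulo(a%v + b%v, 256_int16)
  end function add_u8

  ! Explicit conversion back to a default integer.
  elemental function to_int(x) result(i)
    type(u8), intent(in) :: x
    integer :: i
    i = int(x%v)
  end function to_int

end module modular_u8
```

With this module, `to_int(to_u8(255) + to_u8(2))` evaluates to 1, the modular behavior described above. A saturating or checked variant would export a different `operator(+)` from a different module, and the `use` statement would select the mode.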
Agree, but I don't think this is the intent of @certik, as he wrote in his proposal: "Just like the current (signed) integers, arithmetic overflow is undefined. This allows processors to optionally check for overflow." So he is explicitly saying that the unsigned overflow should be handled the same way as the signed overflow. Nonetheless some words should be changed in the proposal (undefined -> prohibited).
Maybe I'm dumb, maybe something else... |
Indeed I previously suggested something similar in this discussion:
But yes, it could also be done with intrinsic modules ( |
Hi @PierUgit, |
@tkoenig1 wrote:
This was already addressed by @PierUgit at #2 (comment). A simple video call would clarify such basic points easily. @tkoenig1 all you need to do is reply to the email I sent you. :) The proposal in #341 does not prohibit to later do your full proposal. You can relax a prohibition and no user code will break (unless they were using something that is prohibited). |
I don't understand this reply. @certik is proposing that the result of unsigned arithmetic is not defined in some cases. How is that not introducing new undefined behavior? |
Saying that the behavior is "undefined" is actually consistent with the current text of the standard for the existing numeric types, which says: Actually, I can't find in the standard anything that says that the |
For people who want saturating or checked or ... arithmetic, here is an implementation - changing a parameter in user code changes the behavior. A user who desires different behavior could easily change that parameter; this This contains a subroutine chk_add which is modeled on the C23 feature, which in turn is modeled on gcc's integer overflow builtins
Based on this, it makes more sense to extend J3/24-116 with user-accessible overflow-checking intrinsics, especially since the overflow for multiplication is easier if one has access to widening multiplication that is widely implemented in hardware, like the compiler does. (Footnote: Only needed for the largest To anybody who thinks this is too verbose: If this finds acceptance, I volunteer to write a module which encompasses all variants from @certik : Could equivalent functionality also be implemented using templates? |
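For readers who want to see the shape of an overflow-checking addition, here is a minimal sketch. It is not the module from the comment above, only an illustration in the spirit of C23's `ckd_add`, written for signed `integer(int32)`. The module name `checked_arith` and the argument list are assumptions, and the lower bound `-huge - 1` assumes a two's complement processor.

```fortran
module checked_arith
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  private
  public :: chk_add
contains

  ! Add a and b. If the mathematical sum does not fit in int32,
  ! overflow is set and res is 0; the overflowing addition itself is
  ! never executed, so the program stays conforming.
  pure subroutine chk_add(a, b, res, overflow)
    integer(int32), intent(in)  :: a, b
    integer(int32), intent(out) :: res
    logical,        intent(out) :: overflow
    integer(int32), parameter :: hi = huge(0_int32)
    integer(int32), parameter :: lo = -hi - 1_int32
    if (b > 0) then
      overflow = a > hi - b      ! a + b would exceed hi
    else
      overflow = a < lo - b      ! a + b would fall below lo
    end if
    if (overflow) then
      res = 0_int32
    else
      res = a + b
    end if
  end subroutine chk_add

end module checked_arith
```

A compiler intrinsic could do the same with one add and a flag test, which is part of the argument for providing such operations in the language rather than in user code.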
So there would be something like
Right? If yes, tell me how this is fundamentally different from what I proposed earlier with a kind of directive Moreover: any arithmetic operation on such a derived type would go through several function calls, some tests, etc... Doesn't sound good for performance... And what if one wants unchecked behavior for maximum performance ( |
If the user wants, and implements, that, yes, but I would not expect this to be in frequent use. Otherwise no; I would envision having a parameter in a module being the most common case, and people would then change that parameter
Very much different - user code vs. some sort of pragma, which Fortran currently does not have. A user could also have finer control by using different types.
A user-defined derived type can be tailored exactly towards what the user wants.
Modern compilers are quite good at inlining, and LTO exists for a reason.
This is based on the assumption of checking being changed at runtime, which I assume will not be a frequent occurrence. It will be interesting if Fortran's templates are powerful enough to offer a different solution, though; this might be preferred. |
I will answer the rest of your post later on, but just about this:
Not only does this put all the burden on the compilers, but I'm not sure they are as good at that as you say. At least this can hardly be a general statement. I took your module and just changed
This is an old CPU (2011), but I would be surprised if the ratio was completely different with a more recent one.
|
What you have demonstrated is that benchmarking is difficult and one often finds what one is looking for. A simple modification of your program (see below) removes the loop overhead and gives
|
Let's see what a modern compiler can actually do:
Note that this is without checking (see the parameter). If you want checking, or if you want to be able to change it at runtime, you will pay a price in performance. Duh. I think the performance argument has been laid to rest. |
Agreed that benchmarking is difficult. That said, making the same changes as in your code (i.e. moving the calls to omp_get_wtime() inside the loops) doesn't reduce the total times in my case:
On the contrary both timings have an additional ~+0.35 sec.. I tend to give more credit to the timings that are made outside of the loop, and I am (naively ?) expecting the loop overhead with 100 iterations to be quite low compared to the total 10**9 additions (and moreover I am expecting the overhead to be the same for the two cases). That said, what I am also timing here are the RAM<->CPU transfers (but again, the extra cost is the same for the two cases). |
@PierUgit, what overhead does -fopenmp add? You seem to miss that both @tkoenig1 and I used the
BTW, I modified my version of your benchmark to RDTSC. Without the More importantly, the module does not even need to physically exist. A compiler can generate the module on the fly. One of |
I guess that with Anyway, as I wrote earlier I wanted to comment the rest of your post:
So it would be a module that everyone would copy, possibly modify, adapt to their own need, etc... IMO this would soon be a mess, with similar but incompatible versions. Assuming that an approach with a module was eventually retained, it should definitely be a standard module ( The consequence is that the mode check/nocheck/saturate should be selectable at runtime.
The mechanism is different, the effect is the same. I don't think anyway that "Fortran hasn't the feature xxx" is a strong argument: if the feature is useful, then it can be considered. And isn't
My point of view is that if we have to consider a user-defined derived type from the start of the design of a new intrinsic type, it's maybe a sign that the design is not the best one and that a workaround is needed. Derived types are not as convenient as first-class intrinsic types; they have various limitations... It doesn't seem to me that the |
No. That's not how the hardware works, so you're asking for compiler-generated code to branch based on an environment variable value or some other dynamic selection. That would perform badly.
No. Any program that is correct with Correction: I mean |
That is not a good argument to make in language design - if I can use A as a building block for B, it does not mean that B should also be included of necessity. This way lies the feature stampede of C++. |
What hardware? I was talking about the module proposed by @tkoenig1 : it contains a parameter to select the desired behavior at compile time. Just making it a variable enables selecting the behavior at runtime.
Mmmhhh...
Doesn't behave the same if the |
It's not "of necessity", and that's why I also used the word "maybe". But at the very least there should be discussions about that point, up to the committee level. And I am completely fine if at the end the conclusions are "better not including B in the language"... As long as it has been discussed. |
I agree with @PierUgit, things must be discussed. The committee sometimes has good internal discussions but not always, so it's not a guarantee either. My plan is to prepare the proposal in the coming weeks, and summarize the discussion so far, and we can discuss more, I also want to do a prototype. @klausler, @tkoenig1 what is the best way for a Fortran compiler to do bounds checking for signed integers? For example using LLVM, do you recommend using the llvm.sadd.with.overflow.* style intrinsics? It looks like on x86 it uses the |
Which, again, I think was and will always be an awful idea. The scoping of this would have to be very narrow, and there would be issues if you need to use both types of operation (overflowing and trapping) in one function or even worse -- one expression. As shown by your example with I still do not think there is any way other than using |
I've merged |
All integers in Fortran are signed. It is a common request to include unsigned integers. At the very least to help with the interoperation with the C API that uses unsigned integers.
The best approach currently is to use signed integers of the same size to hold the bits, and then convert them to signed Fortran integers of a bigger size when the unsigned value is needed.