f16 addition difference between x86-64 and ARM64 #139882

Closed
stevecheckoway opened this issue Apr 15, 2025 · 2 comments

@stevecheckoway

I tried this code:

#![feature(f16)]

use std::f16;

fn main() {
    let pos_inf = f16::from_bits(0b0_11111_0000000000);
    let neg_inf = f16::from_bits(0b1_11111_0000000000);
    let sum = pos_inf + neg_inf;

    assert!(sum.is_nan());
    println!("{pos_inf:?} + {neg_inf:?} = {sum:?}");
}

on an ARM64 machine running macOS and on an x86-64 machine running Linux.

This code adds +infinity and -infinity, which produces a NaN.

I expected to see the same output on both systems.

Instead, the sign of the returned NaN differs between the two systems.

On ARM64, I get:

$ cargo +nightly run --quiet
0x7c00 + 0xfc00 = 0x7e00

On x86-64, I get:

$ cargo +nightly run --quiet
0x7c00 + 0xfc00 = 0xfe00

As you can see, the only difference is in the NaN's sign bit.
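To make the comparison concrete, the two results can be split into their binary16 fields (1 sign bit, 5 exponent bits, 10 mantissa bits). This is only a sketch on plain `u16` values so it runs on stable Rust; the helper name `split_f16_bits` is made up for illustration and is not part of the standard library.

```rust
/// Hypothetical helper: split an IEEE 754 binary16 bit pattern into
/// (sign, exponent, mantissa) fields.
fn split_f16_bits(bits: u16) -> (u16, u16, u16) {
    let sign = bits >> 15; // top bit
    let exponent = (bits >> 10) & 0x1f; // next 5 bits
    let mantissa = bits & 0x3ff; // low 10 bits
    (sign, exponent, mantissa)
}

fn main() {
    // The two NaN results reported above (ARM64 first, then x86-64).
    for bits in [0x7e00u16, 0xfe00] {
        let (sign, exponent, mantissa) = split_f16_bits(bits);
        println!("{bits:#06x}: sign={sign} exponent={exponent:#07b} mantissa={mantissa:#012b}");
    }

    // XOR of the two patterns leaves only the sign bit set.
    assert_eq!(0x7e00u16 ^ 0xfe00, 0x8000);
}
```

Both patterns have an all-ones exponent and the same non-zero mantissa, so both are quiet NaNs that differ only in bit 15.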

As I read IEEE 754-2008, §6.3, the sign bit of a NaN is not meaningful, and for addition (and most operations) the standard "does not specify the sign bit of a NaN result."

So in the sense of following the standard, this isn't a bug. But it might be nice to produce the same values on supported platforms. Of course, if hardware implementations differ, there's nothing to do.

If I change the code to this

    let pos_inf = f32::INFINITY;
    let neg_inf = -f32::INFINITY;
    let sum = pos_inf + neg_inf;
    println!("{:08x} + {:08x} = {:08x}", pos_inf.to_bits(), neg_inf.to_bits(), sum.to_bits());

then I get the same output for both systems:

7f800000 + ff800000 = 7fc00000

Meta

$ cargo +nightly --quiet rustc -- --version --verbose
rustc 1.88.0-nightly (2da29dbe8 2025-04-14)
binary: rustc
commit-hash: 2da29dbe8fe23df1c7c4ab1d8740ca3c32b15526
commit-date: 2025-04-14
host: aarch64-apple-darwin
release: 1.88.0-nightly
LLVM version: 20.1.2

$ cargo +nightly --quiet rustc -- --version --verbose
rustc 1.88.0-nightly (2da29dbe8 2025-04-14)
binary: rustc
commit-hash: 2da29dbe8fe23df1c7c4ab1d8740ca3c32b15526
commit-date: 2025-04-14
host: x86_64-unknown-linux-gnu
release: 1.88.0-nightly
LLVM version: 20.1.2
@stevecheckoway stevecheckoway added the C-bug Category: This is a bug. label Apr 15, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Apr 15, 2025
@hanna-kruppe
Contributor

As I read IEEE 754-2008, §6.3, the sign bit of a NaN is not meaningful, and for addition (and most operations) the standard "does not specify the sign bit of a NaN result."

More immediately relevant for Rust, https://rust-lang.github.io/rfcs/3514-float-semantics.html decided that the sign bit of any NaN result is non-deterministic.
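For code that wants identical output across platforms anyway, one option is to stop depending on the NaN bits and map every NaN to a single fixed pattern before printing or comparing. The following is a rough sketch of that idea, not something prescribed by the RFC; it uses `f32` so it runs on stable Rust, and the helper name `canonical_bits` and the chosen pattern are assumptions made for illustration.

```rust
/// Hypothetical helper: return the bit pattern of `x`, but collapse every
/// NaN (any sign, any payload) to one fixed quiet-NaN pattern.
fn canonical_bits(x: f32) -> u32 {
    if x.is_nan() {
        0x7fc0_0000 // arbitrary choice: positive quiet NaN, empty payload
    } else {
        x.to_bits()
    }
}

fn main() {
    let sum = f32::INFINITY + f32::NEG_INFINITY;

    // Checking `is_nan()` is portable; comparing raw NaN bits is not.
    assert!(sum.is_nan());

    println!("{:08x}", canonical_bits(sum));
}
```

The same approach should carry over to `f16` on nightly with a 16-bit pattern such as `0x7e00`, though that variant is untested here.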

@stevecheckoway
Author

Fair enough! I'll close this issue.

@jieyouxu jieyouxu removed C-bug Category: This is a bug. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Apr 17, 2025