---
layout: post
title: "(P0) Numeric Representation"
date: 2024-11-14 13:00:04 +0800
labels: [ieee754]
---
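
## Motivation, References, and Scope

Numeric representation (integers, floating-point numbers, big/little endianness, byte alignment), plus some common techniques for working with numeric types in Python/C++/Numpy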

## IEEE 754

For a full description of the IEEE 754 representation, see CSAPP. In brief:

A floating-point number is represented as $V=(-1)^s\times M \times 2^E$, where $M$ is a number in $[0, 2-\epsilon]$, $E$ is an integer, and $s$ is 0 (positive) or 1 (negative).

Taking float32 as an example, the bits are laid out as follows:

- The first bit is the sign $s$
- The next $k=8$ bits $exp = e_7e_6\ldots e_0$ encode the exponent $E$
- The last $n=23$ bits $frac = f_{22}f_{21}\ldots f_0$ encode the significand $M$
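
How $exp$ and $frac$ map to $E$ and $M$ depends on which of 3 cases applies:

**1. Normalized values**

This case applies when the exponent bits are neither all 0 nor all 1. Here $E=exp-Bias$ and $M=1+0.f_{22}f_{21}\ldots f_{0}$ (a binary fraction), where $Bias=2^{k-1}-1=2^7-1=127$.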
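
**2. Denormalized values**

This case applies when the exponent bits are all 0. Here $E=1-Bias=-126$ and $M=0.f_{22}f_{21}\ldots f_{0}$.

This encoding serves two purposes. First, it can represent zero: +0.0 is encoded as all zero bits, and -0.0 has the first bit set to 1 with all other bits 0. Second, it can represent numbers very close to 0.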

**3. Special values**

This case applies when the exponent bits are all 1. If $frac$ is all 0, the value is $-\infty$ (sign bit 1) or $+\infty$ (sign bit 0); if $frac$ is not all 0, the value is $NaN$.
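
The three cases above can be sketched as a small hand-written decoder. This is only an illustration under stated assumptions: the function name `decode_fp32` is made up for this post, the input is a 32-bit pattern given as a Python integer, and the result is checked against the standard library's `struct` module rather than being a replacement for it.

```python
import math
import struct


def decode_fp32(bits: int) -> float:
    """Decode a 32-bit pattern by hand (hypothetical helper, float32 only)."""
    k, n = 8, 23
    bias = 2 ** (k - 1) - 1              # 127
    s = (bits >> 31) & 0x1               # sign bit
    exp = (bits >> n) & ((1 << k) - 1)   # 8 exponent bits
    frac = bits & ((1 << n) - 1)         # 23 fraction bits
    sign = -1.0 if s else 1.0

    if exp == (1 << k) - 1:              # case 3: special values
        return sign * math.inf if frac == 0 else math.nan
    if exp == 0:                         # case 2: denormalized
        E = 1 - bias
        M = frac / 2 ** n                # 0.f22 f21 ... f0
    else:                                # case 1: normalized
        E = exp - bias
        M = 1 + frac / 2 ** n            # 1.f22 f21 ... f0
    return sign * M * 2.0 ** E


def reference_fp32(bits: int) -> float:
    """Let the standard library decode the same bit pattern."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]


if __name__ == "__main__":
    for bits in (0x3F800000, 0x40200000, 0x00000001, 0x7F800000):
        print(f"{bits:#010x}", decode_fp32(bits), reference_fp32(bits))
```

Every intermediate value here fits exactly in a Python float (a double), so the hand-written result should match `struct` bit for bit on finite inputs.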

**Summary**

In terms of range, the three classes are laid out as follows (a number and its negation differ only in the sign bit):

Positive numbers:
- +0.0 (denormalized): $s=0$, $exp=00000000$, $frac=00...00$
- Smallest positive number (denormalized): $\epsilon=2^{-2^{k-1}+2-n}=2^{-126-23}$, $s=0$, $exp=00000000$, $frac=00...01$
- ... (denormalized)
- Largest denormalized positive number: $2^{-126} - \epsilon$, $s=0$, $exp=00000000$, $frac=11...11$
- Smallest normalized positive number: $2^{-2^{k-1}+2}=2^{-126}$, $s=0$, $exp=00000001$, $frac=00...00$
- ... (normalized)
- Largest positive number (normalized): $(2-2^{-n})\times2^{(2^{k-1}-1)}=(2-2^{-23}) \times 2^{127}$, $s=0$, $exp=11111110$, $frac=11...11$
- Positive infinity (special): $s=0$, $exp=11111111$, $frac=00...00$
- NaN: $s=0$, $exp=11111111$, $frac\neq 00...00$

From this we get:

- fp64: $k=11$, $n=52$, largest positive number $2^{1024}-2^{971}$, smallest positive number $2^{-1074}$
- fp32: $k=8$, $n=23$, largest positive number $2^{128}-2^{104}$, smallest positive number $2^{-149}$
- fp16: $k=5$, $n=10$, largest positive number $2^{16}-2^5=65536-32=65504$, smallest positive number $2^{-24}$
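
These limits can be cross-checked by building the extreme bit patterns directly and letting Python's `struct` module decode them. The sketch below is an illustration, not part of any library: the `FORMATS` table and the `extremes` helper are names invented here, and it relies on the `struct` format codes `e`, `f`, and `d` for fp16, fp32, and fp64.

```python
import struct

# name -> (struct format code, exponent bits k, fraction bits n)
FORMATS = {"fp16": ("e", 5, 10), "fp32": ("f", 8, 23), "fp64": ("d", 11, 52)}


def extremes(name: str) -> tuple[float, float]:
    """Return (largest finite value, smallest positive value) for a format."""
    code, k, n = FORMATS[name]
    nbytes = (1 + k + n) // 8

    def decode(bits: int) -> float:
        return struct.unpack(">" + code, bits.to_bytes(nbytes, "big"))[0]

    # Largest finite value: s=0, exp=11...10, frac=11...11
    max_bits = (((1 << k) - 2) << n) | ((1 << n) - 1)
    # Smallest positive value: s=0, exp=00...00, frac=00...01
    min_bits = 1
    return decode(max_bits), decode(min_bits)


if __name__ == "__main__":
    for name in FORMATS:
        largest, smallest = extremes(name)
        print(name, largest, smallest)
```

Both extremes of fp16 and fp32 are exactly representable as Python floats (doubles), so the decoded values can be compared directly with the closed-form expressions in the list.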

Examples:
```
1:   (1+0.0)*2^0: 0 01111111 000...000
2:   (1+0.0)*2^1: 0 10000000 000...000
5/2: (1+1/4)*2^1: 0 10000000 010...000
```
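
To reproduce bit patterns like these for arbitrary values, one option is to pack the number as a big-endian float32 and print the resulting 32 bits split into sign, exponent, and fraction. The helper name `fp32_bits` is hypothetical; only the standard-library `struct` module is used.

```python
import struct


def fp32_bits(x: float) -> str:
    """Format x as 'sign exponent fraction' using its float32 encoding."""
    # Pack as big-endian float32, then reinterpret the 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = f"{bits:032b}"
    return f"{s[0]} {s[1:9]} {s[9:]}"


if __name__ == "__main__":
    for value in (1.0, 2.0, 2.5):
        print(value, fp32_bits(value))
```

Note that values that are not exactly representable in float32 are rounded by `struct.pack` before the bits are printed.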

## np.frombuffer

```python
import numpy as np

# Interpret 8 raw bytes as two float32 values (on a little-endian machine)
np.frombuffer(b"\x00\x00\x80\x3f\x00\x00\x20\x40", np.float32)  # [1.0, 2.5]

# Each group of 4 bytes is one value:
# \x00: 00000000, \x00: 00000000, \x80: 10000000, \x3f: 00111111
# Concatenate the bytes in reverse order (little-endian):
# 00111111 10000000 00000000 00000000
# Decode as fp32: 1.0

# \x00: 00000000, \x00: 00000000, \x20: 00100000, \x40: 01000000
# Concatenate the bytes in reverse order (little-endian):
# 01000000 00100000 00000000 00000000
# Decode as fp32: 2.5
```
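
The byte reversal in the comments above is the little-endian convention. As a rough illustration of why the byte order matters, the sketch below decodes the same 8 bytes with the standard-library `struct` module, once as little-endian and once as big-endian. Using `struct` instead of NumPy is a choice made here to keep the example dependency-free; in NumPy the byte order can be stated explicitly with dtype strings such as `"<f4"` and `">f4"`.

```python
import struct

data = b"\x00\x00\x80\x3f\x00\x00\x20\x40"

# "<" = little-endian: the last byte of each group is the most significant.
little = struct.unpack("<2f", data)

# ">" = big-endian: the same bytes read in the opposite order give
# completely different (here: tiny denormalized) values.
big = struct.unpack(">2f", data)

# Reversing each 4-byte group by hand and reading it as big-endian
# recovers the little-endian interpretation.
manual = tuple(
    struct.unpack(">f", data[i:i + 4][::-1])[0] for i in range(0, len(data), 4)
)

if __name__ == "__main__":
    print("little-endian:", little)
    print("big-endian:   ", big)
    print("manual:       ", manual)
```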