Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory: Use triomphe::Arc for SharedReference #8622

Merged
merged 2 commits into from
Jun 28, 2024
Merged

Conversation

bgw
Copy link
Member

@bgw bgw commented Jun 28, 2024

Description

This eliminates the unused weakref count in std::sync::Arc, saving 64 bits per unique SharedReference.

I noticed there's additional potential optimizations we could do here too (not included in this PR), of increasing levels of difficulty:

  • We can deduplicate ValueTypeId by moving it inside the Arc.

  • We can build a mapping of std::any::TypeId to ValueTypeId, and avoid storing the ValueTypeId entirely.

  • We can deduplicate the fat pointer's layout metadata by also storing it inside the Arc using the nightly ptr_metadata feature, similar to triomphe::ThinArc (but that only works for slices of non-dst elements, presumably because they don't want to depend on nightly).

Testing Instructions

Using dhat for measuring the max heap size (vercel/next.js#67166)...

Do a release build

PACK_NEXT_COMPRESS=objcopy-zstd pnpm pack-next --release --features __internal_dhat-heap

Start the dev server in shadcn-ui, and try to load the homepage:

pnpm i
pnpm --filter=www dev --turbo
curl http://localhost:3003/

After curl exits, kill the dev server.

pkill -INT next-server

Before (2 Runs):

[dhat]: Teardown profiler
dhat: Total:     3,106,545,900 bytes in 20,113,580 blocks
dhat: At t-gmax: 731,860,693 bytes in 4,924,028 blocks
dhat: At t-end:  731,860,693 bytes in 4,924,028 blocks
[dhat]: Teardown profiler
dhat: Total:     3,108,036,858 bytes in 20,111,694 blocks
dhat: At t-gmax: 730,059,664 bytes in 4,923,002 blocks
dhat: At t-end:  730,059,536 bytes in 4,923,001 blocks

After (2 Runs):

[dhat]: Teardown profiler
dhat: Total:     3,093,298,170 bytes in 20,127,818 blocks
dhat: At t-gmax: 727,258,939 bytes in 4,923,863 blocks
dhat: At t-end:  727,155,146 bytes in 4,923,901 blocks
[dhat]: Teardown profiler
dhat: Total:     3,102,661,690 bytes in 20,124,408 blocks
dhat: At t-gmax: 728,190,876 bytes in 4,924,856 blocks
dhat: At t-end:  728,124,976 bytes in 4,924,236 blocks

This is a 0.4426% reduction.

Copy link

vercel bot commented Jun 28, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
rust-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jun 28, 2024 5:24am
8 Ignored Deployments
Name Status Preview Comments Updated (UTC)
examples-basic-web ⬜️ Ignored (Inspect) Visit Preview Jun 28, 2024 5:24am
examples-designsystem-docs ⬜️ Ignored (Inspect) Visit Preview Jun 28, 2024 5:24am
examples-gatsby-web ⬜️ Ignored (Inspect) Visit Preview Jun 28, 2024 5:24am
examples-kitchensink-blog ⬜️ Ignored (Inspect) Visit Preview Jun 28, 2024 5:24am
examples-native-web ⬜️ Ignored (Inspect) Visit Preview Jun 28, 2024 5:24am
examples-svelte-web ⬜️ Ignored (Inspect) Visit Preview Jun 28, 2024 5:24am
examples-tailwind-web ⬜️ Ignored (Inspect) Visit Preview Jun 28, 2024 5:24am
examples-vite-web ⬜️ Ignored (Inspect) Visit Preview Jun 28, 2024 5:24am

Copy link
Member Author

bgw commented Jun 28, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @bgw and the rest of your teammates on Graphite Graphite

Copy link
Contributor

github-actions bot commented Jun 28, 2024

🟢 Turbopack Benchmark CI successful 🟢

Thanks

Copy link
Contributor

github-actions bot commented Jun 28, 2024

⚠️ CI failed ⚠️

The following steps have failed in CI:

  • Turbopack Rust tests (mac/win, non-blocking)

See workflow summary for details

Copy link
Contributor

✅ This change can build next-swc

bgw added 2 commits June 27, 2024 22:20
This eliminates the unused weakref count in `std::sync::Arc`, saving 64
bits per unique `SharedReference`.

I noticed there's additional potential optimizations we could do here
too (not included in this PR), of increasing levels of difficulty:

- We can deduplicate `ValueTypeId` by moving it inside the `Arc`.

- We can build a mapping of `std::any::TypeId` to `ValueTypeId`, and
  avoid storing the `ValueTypeId` entirely.

- We can deduplicate the fat pointer's layout metadata by also storing
  it inside the `Arc` using the nightly `ptr_metadata` feature, similar
  to `triomphe::ThinArc` (but that only works for slices of non-dst
  elements, presumably because they don't want to depend on nightly).
@bgw bgw force-pushed the bgw/shared-reference-triomphe branch from 540d318 to d8a54b2 Compare June 28, 2024 05:22
@bgw bgw marked this pull request as ready for review June 28, 2024 05:24
@bgw bgw requested review from a team as code owners June 28, 2024 05:24
@bgw bgw requested review from mehulkar and Zertsov June 28, 2024 05:24
@bgw bgw assigned sokra, kdy1 and arlyon Jun 28, 2024
@bgw bgw merged commit 66c2f55 into main Jun 28, 2024
58 of 60 checks passed
@bgw bgw deleted the bgw/shared-reference-triomphe branch June 28, 2024 15:54
bgw added a commit to vercel/next.js that referenced this pull request Jul 3, 2024
Tobias Koppers - fix typo (vercel/turborepo#8619)
Benjamin Woodruff - Store aggregate read/execute count statistics
(vercel/turborepo#8286)
Tobias Koppers - box InProgress task state (vercel/turborepo#8644)
Tobias Koppers - Task Edges Set/List (vercel/turborepo#8624)
Benjamin Woodruff - Memory: Use `triomphe::Arc` for `SharedReference`
(vercel/turborepo#8622)
Will Binns-Smith - chore: release npm packages (vercel/turborepo#8614)
Will Binns-Smith - devlow-bench: add git branch and sha to datapoints
(vercel/turborepo#8602)

---

Fixes a `triomphe` package version conflict between turbopack and swc by
bumping it from 0.1.11 to 0.1.13.
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Jul 25, 2024
)

### Description

This eliminates the unused weakref count in `std::sync::Arc`, saving 64 bits per unique `SharedReference`.

I noticed there's additional potential optimizations we could do here too (not included in this PR), of increasing levels of difficulty:

- We can deduplicate `ValueTypeId` by moving it inside the `Arc`.

- We can build a mapping of `std::any::TypeId` to `ValueTypeId`, and avoid storing the `ValueTypeId` entirely.

- We can deduplicate the fat pointer's layout metadata by also storing it inside the `Arc` using the nightly `ptr_metadata` feature, similar to `triomphe::ThinArc` (but that only works for slices of non-dst elements, presumably because they don't want to depend on nightly).

### Testing Instructions

Using dhat for measuring the max heap size (vercel/next.js/67166)...

Do a release build

```
PACK_NEXT_COMPRESS=objcopy-zstd pnpm pack-next --release --features __internal_dhat-heap
```

Start the dev server in shadcn-ui, and try to load the homepage:

```
pnpm i
pnpm --filter=www dev --turbo
curl http://localhost:3003/
```

After curl exits, kill the dev server.

```
pkill -INT next-server
```

**Before (2 Runs):**

```
[dhat]: Teardown profiler
dhat: Total:     3,106,545,900 bytes in 20,113,580 blocks
dhat: At t-gmax: 731,860,693 bytes in 4,924,028 blocks
dhat: At t-end:  731,860,693 bytes in 4,924,028 blocks
```

```
[dhat]: Teardown profiler
dhat: Total:     3,108,036,858 bytes in 20,111,694 blocks
dhat: At t-gmax: 730,059,664 bytes in 4,923,002 blocks
dhat: At t-end:  730,059,536 bytes in 4,923,001 blocks
```


**After (2 Runs):**

```
[dhat]: Teardown profiler
dhat: Total:     3,093,298,170 bytes in 20,127,818 blocks
dhat: At t-gmax: 727,258,939 bytes in 4,923,863 blocks
dhat: At t-end:  727,155,146 bytes in 4,923,901 blocks
```

```
[dhat]: Teardown profiler
dhat: Total:     3,102,661,690 bytes in 20,124,408 blocks
dhat: At t-gmax: 728,190,876 bytes in 4,924,856 blocks
dhat: At t-end:  728,124,976 bytes in 4,924,236 blocks
```

This is [a 0.4426% reduction](https://www.wolframalpha.com/input?i=Percent+change+from+%28731%2C860%2C693%2B730%2C059%2C664%29%2F2+to+%28727%2C258%2C939%2B728%2C190%2C876%29%2F2).
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Jul 29, 2024
)

### Description

This eliminates the unused weakref count in `std::sync::Arc`, saving 64 bits per unique `SharedReference`.

I noticed there's additional potential optimizations we could do here too (not included in this PR), of increasing levels of difficulty:

- We can deduplicate `ValueTypeId` by moving it inside the `Arc`.

- We can build a mapping of `std::any::TypeId` to `ValueTypeId`, and avoid storing the `ValueTypeId` entirely.

- We can deduplicate the fat pointer's layout metadata by also storing it inside the `Arc` using the nightly `ptr_metadata` feature, similar to `triomphe::ThinArc` (but that only works for slices of non-dst elements, presumably because they don't want to depend on nightly).

### Testing Instructions

Using dhat for measuring the max heap size (vercel/next.js/67166)...

Do a release build

```
PACK_NEXT_COMPRESS=objcopy-zstd pnpm pack-next --release --features __internal_dhat-heap
```

Start the dev server in shadcn-ui, and try to load the homepage:

```
pnpm i
pnpm --filter=www dev --turbo
curl http://localhost:3003/
```

After curl exits, kill the dev server.

```
pkill -INT next-server
```

**Before (2 Runs):**

```
[dhat]: Teardown profiler
dhat: Total:     3,106,545,900 bytes in 20,113,580 blocks
dhat: At t-gmax: 731,860,693 bytes in 4,924,028 blocks
dhat: At t-end:  731,860,693 bytes in 4,924,028 blocks
```

```
[dhat]: Teardown profiler
dhat: Total:     3,108,036,858 bytes in 20,111,694 blocks
dhat: At t-gmax: 730,059,664 bytes in 4,923,002 blocks
dhat: At t-end:  730,059,536 bytes in 4,923,001 blocks
```


**After (2 Runs):**

```
[dhat]: Teardown profiler
dhat: Total:     3,093,298,170 bytes in 20,127,818 blocks
dhat: At t-gmax: 727,258,939 bytes in 4,923,863 blocks
dhat: At t-end:  727,155,146 bytes in 4,923,901 blocks
```

```
[dhat]: Teardown profiler
dhat: Total:     3,102,661,690 bytes in 20,124,408 blocks
dhat: At t-gmax: 728,190,876 bytes in 4,924,856 blocks
dhat: At t-end:  728,124,976 bytes in 4,924,236 blocks
```

This is [a 0.4426% reduction](https://www.wolframalpha.com/input?i=Percent+change+from+%28731%2C860%2C693%2B730%2C059%2C664%29%2F2+to+%28727%2C258%2C939%2B728%2C190%2C876%29%2F2).
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Jul 29, 2024
)

### Description

This eliminates the unused weakref count in `std::sync::Arc`, saving 64 bits per unique `SharedReference`.

I noticed there's additional potential optimizations we could do here too (not included in this PR), of increasing levels of difficulty:

- We can deduplicate `ValueTypeId` by moving it inside the `Arc`.

- We can build a mapping of `std::any::TypeId` to `ValueTypeId`, and avoid storing the `ValueTypeId` entirely.

- We can deduplicate the fat pointer's layout metadata by also storing it inside the `Arc` using the nightly `ptr_metadata` feature, similar to `triomphe::ThinArc` (but that only works for slices of non-dst elements, presumably because they don't want to depend on nightly).

### Testing Instructions

Using dhat for measuring the max heap size (vercel/next.js/67166)...

Do a release build

```
PACK_NEXT_COMPRESS=objcopy-zstd pnpm pack-next --release --features __internal_dhat-heap
```

Start the dev server in shadcn-ui, and try to load the homepage:

```
pnpm i
pnpm --filter=www dev --turbo
curl http://localhost:3003/
```

After curl exits, kill the dev server.

```
pkill -INT next-server
```

**Before (2 Runs):**

```
[dhat]: Teardown profiler
dhat: Total:     3,106,545,900 bytes in 20,113,580 blocks
dhat: At t-gmax: 731,860,693 bytes in 4,924,028 blocks
dhat: At t-end:  731,860,693 bytes in 4,924,028 blocks
```

```
[dhat]: Teardown profiler
dhat: Total:     3,108,036,858 bytes in 20,111,694 blocks
dhat: At t-gmax: 730,059,664 bytes in 4,923,002 blocks
dhat: At t-end:  730,059,536 bytes in 4,923,001 blocks
```


**After (2 Runs):**

```
[dhat]: Teardown profiler
dhat: Total:     3,093,298,170 bytes in 20,127,818 blocks
dhat: At t-gmax: 727,258,939 bytes in 4,923,863 blocks
dhat: At t-end:  727,155,146 bytes in 4,923,901 blocks
```

```
[dhat]: Teardown profiler
dhat: Total:     3,102,661,690 bytes in 20,124,408 blocks
dhat: At t-gmax: 728,190,876 bytes in 4,924,856 blocks
dhat: At t-end:  728,124,976 bytes in 4,924,236 blocks
```

This is [a 0.4426% reduction](https://www.wolframalpha.com/input?i=Percent+change+from+%28731%2C860%2C693%2B730%2C059%2C664%29%2F2+to+%28727%2C258%2C939%2B728%2C190%2C876%29%2F2).
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Aug 1, 2024
)

### Description

This eliminates the unused weakref count in `std::sync::Arc`, saving 64 bits per unique `SharedReference`.

I noticed there's additional potential optimizations we could do here too (not included in this PR), of increasing levels of difficulty:

- We can deduplicate `ValueTypeId` by moving it inside the `Arc`.

- We can build a mapping of `std::any::TypeId` to `ValueTypeId`, and avoid storing the `ValueTypeId` entirely.

- We can deduplicate the fat pointer's layout metadata by also storing it inside the `Arc` using the nightly `ptr_metadata` feature, similar to `triomphe::ThinArc` (but that only works for slices of non-dst elements, presumably because they don't want to depend on nightly).

### Testing Instructions

Using dhat for measuring the max heap size (vercel/next.js/67166)...

Do a release build

```
PACK_NEXT_COMPRESS=objcopy-zstd pnpm pack-next --release --features __internal_dhat-heap
```

Start the dev server in shadcn-ui, and try to load the homepage:

```
pnpm i
pnpm --filter=www dev --turbo
curl http://localhost:3003/
```

After curl exits, kill the dev server.

```
pkill -INT next-server
```

**Before (2 Runs):**

```
[dhat]: Teardown profiler
dhat: Total:     3,106,545,900 bytes in 20,113,580 blocks
dhat: At t-gmax: 731,860,693 bytes in 4,924,028 blocks
dhat: At t-end:  731,860,693 bytes in 4,924,028 blocks
```

```
[dhat]: Teardown profiler
dhat: Total:     3,108,036,858 bytes in 20,111,694 blocks
dhat: At t-gmax: 730,059,664 bytes in 4,923,002 blocks
dhat: At t-end:  730,059,536 bytes in 4,923,001 blocks
```


**After (2 Runs):**

```
[dhat]: Teardown profiler
dhat: Total:     3,093,298,170 bytes in 20,127,818 blocks
dhat: At t-gmax: 727,258,939 bytes in 4,923,863 blocks
dhat: At t-end:  727,155,146 bytes in 4,923,901 blocks
```

```
[dhat]: Teardown profiler
dhat: Total:     3,102,661,690 bytes in 20,124,408 blocks
dhat: At t-gmax: 728,190,876 bytes in 4,924,856 blocks
dhat: At t-end:  728,124,976 bytes in 4,924,236 blocks
```

This is [a 0.4426% reduction](https://www.wolframalpha.com/input?i=Percent+change+from+%28731%2C860%2C693%2B730%2C059%2C664%29%2F2+to+%28727%2C258%2C939%2B728%2C190%2C876%29%2F2).
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Aug 16, 2024
Tobias Koppers - fix typo (vercel/turborepo#8619)
Benjamin Woodruff - Store aggregate read/execute count statistics
(vercel/turborepo#8286)
Tobias Koppers - box InProgress task state (vercel/turborepo#8644)
Tobias Koppers - Task Edges Set/List (vercel/turborepo#8624)
Benjamin Woodruff - Memory: Use `triomphe::Arc` for `SharedReference`
(vercel/turborepo#8622)
Will Binns-Smith - chore: release npm packages (vercel/turborepo#8614)
Will Binns-Smith - devlow-bench: add git branch and sha to datapoints
(vercel/turborepo#8602)

---

Fixes a `triomphe` package version conflict between turbopack and swc by
bumping it from 0.1.11 to 0.1.13.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants