Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove UB on overflow in Allocate::next() #250

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 17 additions & 14 deletions src/internals/entity.rs
Original file line number Diff line number Diff line change
Expand Up @@ -33,10 +33,6 @@ impl Clone for Entity {
const BLOCK_SIZE: u64 = 16;
const BLOCK_SIZE_USIZE: usize = BLOCK_SIZE as usize;

// Always divisible by BLOCK_SIZE.
// Safety: This must never be 0, so skip the first block
static NEXT_ENTITY: AtomicU64 = AtomicU64::new(BLOCK_SIZE);

/// An iterator which yields new entity IDs.
#[derive(Debug)]
pub struct Allocate {
Expand All @@ -62,20 +58,27 @@ impl<'a> Iterator for Allocate {

#[inline(always)]
fn next(&mut self) -> Option<Self::Item> {
// Get next block whenever the block ends.
if self.next % BLOCK_SIZE == 0 {
// This is either the first block, or we overflowed to the next block.
self.next = NEXT_ENTITY.fetch_add(BLOCK_SIZE, Ordering::Relaxed);
debug_assert_eq!(self.next % BLOCK_SIZE, 0);
static NEXT_ENTITY_BLOCK_START: AtomicU64 = AtomicU64::new(1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initializing this to BLOCK_SIZE has the same effect of preventing the first entity from being allocated as 0, but without causing 1 entity in every block to be skipped. It skips the first block instead (which isn't a problem).


// Note: since BLOCK_SIZE is divisible by 2
// and NEXT_ENTITY_BLOCK_START starts at 1,
// this skips one entity index per block.
// It is not the best possible performance,
// but zero will be skipped even on overflow and not just on start.
self.next = NEXT_ENTITY_BLOCK_START.fetch_add(BLOCK_SIZE, Ordering::Relaxed);
debug_assert_eq!(self.next % BLOCK_SIZE, 1);
}

// Safety: self.next can't be 0 as long as the first block is skipped,
// and no overflow occurs in NEXT_ENTITY
let entity = unsafe {
debug_assert_ne!(self.next, 0);
Entity(NonZeroU64::new_unchecked(self.next))
};
Comment on lines -71 to -76
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomGillen
Here's the problem: you have no UB as long as NEXT_ENTITY does not overflow. But how do you know that it won't overflow? Welp, you don't know. If the game runs long enough, it will overflow at some point in time.
Alternatives:

  1. Skip the first block, use NonZero::new, return None after overflow. There is no UB, but this breaks Allocate promise to never return None, and the game will crash with panic on overflow, because we expect Allocate to never return None and simply get next entity by Allocate.next().unwrap().
  2. Skip the first ID in every block. No UB, never returns None. The game is probably doomed to have bugs after overflow anyway, due to rewriting old entities, but it won't crash with a panic.
  3. Do not use NonZero. No UB, never returns None, more performant than previous alternative, has a downside of Option<Entity> not being the same size as Entity.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option 1 is what it is designed to do.

If the internal counter has overflowed, the system will panic (or it should, with checked NonZeroU64 construction). You have exhausted the available 64 bit address space. Every ECS has this issue (generational IDs or not) unless they don't provide unique IDs at all.

If you allocated 1000 entities every frame at 60fps, it would take 10 million years before your program panicked. Even single entity allocations that waste most of a block aren't much worse in reality.

The solution for someone experiencing this issue would be to switch the internal ID from a u64 to a u128. That would give you ~2x10^26 years until panic.

Options 2 and 3 will cause the application to behave incorrectly in bizarre and difficult to diagnose ways instead of panicking, which I am not convinced is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bizarre and difficult to diagnose ways instead of panicking

Yeah, you are right, after ten million years panic is better.

Though, there are ways to avoid those bizarre bugs with something like

fn leak(e: Entity) -> EntityLocation;

which would remove entity ID but won't remove entity data from archetype. But the thing is that users should call it on forever-living entities, and if the user forgets to do so, we are back to bizarre bugs.
Okay, ten million years is a thing.

Copy link

@LikeLakers2 LikeLakers2 Jun 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for bumping this issue, but I have to ask: If the intention is to have it panic when NonZeroU64 overflows, then perhaps that should be included in a comment in this function, if not in the documentation for Allocate as a whole? Maybe a line like this:

Allocate will eventually overflow if enough IDs are generated. However, this is very unlikely to be a concern in reality, unless you plan on running your program for a few hundred millennia, and are generating hundreds of thousands of entity IDs every second during that timespan.

It just seems odd that you'd intend for the program to crash when it overflows, yet not document that fact.

// `NonZeroU64::new(self.next).map(Entity)` would give the same generated code
// as `unsafe { Some(unchecked) }` for *this* function.
// However, there may a difference when you match the resulting inlined `Option`:
// if its discriminant is always `Some`,
// then `None` case could be simply removed from generated code.
let entity = NonZeroU64::new(self.next).map(Entity);
self.next += 1;
Some(entity)
entity
}
}

Expand Down