fix I_GetTime() bug
FrenkelS committed Aug 31, 2024
1 parent ce44092 commit 31c0d65
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion i_elks.c
@@ -294,7 +294,7 @@ static int32_t clock()
int32_t I_GetTime(void)
{

@toncho11

toncho11 Aug 31, 2024

@ghaerr I am sure that there is a better implementation of I_GetTime() for ELKS. I think I recently saw some PRs for improved timers in ELKS?

@ghaerr

ghaerr Sep 1, 2024

Ok, I’ll look into this. Do you know what the required precision needs to be? Is this function typically called to synchronize a frame display or is there another purpose?

We probably want a very fast function, since Doom needs all the speed it can get. As for precision, I’m not sure.

@FrenkelS

FrenkelS Sep 1, 2024

Author Owner

1 / 35th of a second.

In the DOS version there's an interrupt that increases the variable ticcount every 1 / 35th of a second. So after a second ticcount is increased by 35. I_GetTime() just returns ticcount.

35 is half of 70, the refresh rate of VGA.

@toncho11

toncho11 Sep 1, 2024

@FrenkelS

FrenkelS Sep 1, 2024

Author Owner

Doom8088 replaces interrupt 8 with its own implementation, just as that DOS article describes.

And here's the code that sets the TICRATE, which is 35.

@toncho11

toncho11 Sep 4, 2024

Related: ghaerr/elks#1988

@ghaerr

ghaerr Sep 4, 2024

If high precision is in fact needed, it would definitely be better to use the new ELKS user mode precision timer routine get_ptime (which returns time in 0.838us increments by reading the 8254 programmable timer chip) than to replace INT 8 or reprogram the PIT. Doing either of the latter will cause the ELKS kernel to fail for any of its internal duties, or the ability to run other programs at the same time.

The unsigned long result of get_ptime could be multiplied by an appropriate value to rebase it to 1/35 second, or whatever is desired.
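As a rough sketch of the rebasing idea: the PIT input clock runs at about 1,193,182 Hz (one count is roughly 0.838us), so one 1/35 second tic is about 34,091 counts. The helper name ptime_to_tics and both constants are illustrative assumptions. The sketch takes an elapsed count as a plain number and does not call get_ptime itself, whose exact interface isn't shown in this thread.

```c
#include <stdint.h>

/* Assumption: 1,193,182 Hz PIT input clock, i.e. ~0.838 us per count. */
#define PIT_HZ          1193182UL
#define TICRATE         35UL
/* Counts per 1/35 s tic, rounded to nearest: 1193182 / 35 = 34090.9 */
#define COUNTS_PER_TIC  ((PIT_HZ + TICRATE / 2) / TICRATE)

/* Hypothetical helper: convert elapsed PIT counts to whole Doom tics. */
uint32_t ptime_to_tics(uint32_t pit_counts)
{
    return pit_counts / COUNTS_PER_TIC;
}
```

Note that this version uses a 32-bit divide, which is exactly the kind of cost a slow 8088 would like to avoid on a hot path.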

> And here's the code that sets the TICRATE, which is 35.

That helps - it appears high precision isn't required, just 1/35ths of a second (=28.6ms). The ELKS timer, unlike DOS, is set to 1/100 second (=10ms). There is now a very fast way to get direct access to the 32-bit ELKS 10ms timer value, which could then be divided by roughly 3 (more exactly, scaled by 35/100) to approximate DOS/Doom's 1/35 second. The code to do this is in libc/debug/prectimer.c:

static unsigned long __far *pjiffies;
void init_ptime(void)
{
    int fd, offset, kds;

    fd = open("/dev/kmem", O_RDONLY);
    if (fd < 0) {
        errmsg("No kmem\n");
        return;
    }
    if (ioctl(fd, MEM_GETDS, &kds) < 0 ||
        ioctl(fd, MEM_GETJIFFADDR, &offset) < 0) {
        errmsg("No mem ioctl\n");
    } else {
        pjiffies = _MK_FP(kds, offset);
    }
    close(fd);
}

Then just access the 32-bit "jiffies" timer with a far pointer:

unsigned long tick10ms = *pjiffies;  /* read kernel 10ms timer directly */

tick10ms could then be divided by roughly 3, or other code changed slightly so that no multiply or divide is needed.

@FrenkelS

FrenkelS Sep 6, 2024

Author Owner

I_GetTime() now looks like this:

int32_t I_GetTime(void)
{
	uint32_t tick10ms = *pjiffies; /* read kernel 10ms timer directly */
	//return tick10ms * TICRATE / 100;
	return (tick10ms * TICRATE * 10) / 1024;
}
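To make the trade-off concrete, here is a small sketch comparing the commented-out exact conversion with the divide-by-1024 approximation. The function names are illustrative and not part of elksdoom. Because 1000/1024 is about 0.977, the approximate clock runs roughly 2.3% slow, and the 32-bit intermediate product tick10ms * 350 wraps after about 12.27 million jiffies (around 34 hours of uptime).

```c
#include <stdint.h>

#define TICRATE 35

/* Exact conversion from 10 ms jiffies to 1/35 s tics (needs a divide). */
uint32_t tics_exact(uint32_t tick10ms)
{
    return tick10ms * TICRATE / 100;
}

/* Approximation from the thread: multiply by 10, divide by 1024
 * instead of 1000, so the divide can become a 10-bit right shift. */
uint32_t tics_approx(uint32_t tick10ms)
{
    return (tick10ms * TICRATE * 10) / 1024;
}
```

After one second (100 jiffies) the exact version reports 35 tics and the approximation 34. After one minute it is 2100 versus 2050.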

@ghaerr

ghaerr Sep 6, 2024

Very cool, and a heck of a lot faster! I like the trick of dividing by a power of two rather than 1000, saving a slow DIV with a decent approximation.

Have you used the Watcom wdis program to check generated code? It might be worth it to ensure the compiler is generating a 10-bit right shift and not a DIV: wdis i_system.obj.

I'm not sure how dependent Doom is on exactly 35 Hz, but the above code effectively divides the kernel jiffies by about 2.9 (1024/350). If I_GetTime is called a lot, things could be made even faster without using a MUL by just taking jiffies and shifting right 2, thus dividing by 4 (and reporting 40ms intervals, i.e. 25 Hz).
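A minimal sketch of the shift-only variant, assuming the rest of the game were adjusted to a 25 Hz tic rate. The function name is hypothetical.

```c
#include <stdint.h>

/* Shift-only variant: 100 Hz jiffies / 4 = 25 tics per second (40 ms each).
 * No MUL and no DIV, but TICRATE would have to be 25 instead of 35. */
uint32_t tics_shift_only(uint32_t tick10ms)
{
    return tick10ms >> 2;
}
```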

Given the speed issues on slow 8088/8086's, any other MUL/DIV combos, especially in the screen driver, might want to be looked at closely with wdis or eyeballs to ensure no MULs happen unless absolutely necessary.

Did you have to write a brand new screen driver for ELKS, or are you using the same one from your other Doom ports?

@ghaerr

ghaerr Sep 6, 2024

Another thought would be to redefine TICRATE to 10, and see what other parts of Doom need to be adjusted, given the change in actual timer period from 1/35 second (about 28.6ms) on DOS to 10ms on ELKS.

@ghaerr

ghaerr Sep 6, 2024

> are you using the same one from your other Doom ports?

Wow, looking at i_vmodey.c now - that's some pretty fast-looking code, nice! It seems Doom is able to take advantage of page-flipping within the EGA memory, since the 320x200 mode doesn't use up all the memory as a single page like the 640x480 graphics mode does. I suppose ELKS's Nano-X screen driver could possibly be reworked the same way, except the graphics would have to run at 320x200.

There don't appear to be any, or at least very few, MULs or DIVs in the screen driver, nice!

@toncho11

toncho11 Sep 7, 2024

> I_GetTime() now looks like this:
>
> int32_t I_GetTime(void)
> {
> 	uint32_t tick10ms = *pjiffies; /* read kernel 10ms timer directly */
> 	//return tick10ms * TICRATE / 100;
> 	return (tick10ms * TICRATE * 10) / 1024;
> }

Is it a problem that the calculation is unsigned, but the return type of the function is signed?
Also, we could explicitly write return (tick10ms * TICRATE * 10) >> 10;. Or maybe it is better to leave it to the compiler in order not to mess something up.
https://stackoverflow.com/questions/6357038/is-multiplication-and-division-using-shift-operators-in-c-actually-faster
For this optimisation to be valid the dividend (the number that is being divided) needs to be either unsigned or must be known to be positive which is the case here.

@ghaerr

ghaerr Sep 7, 2024

@toncho11: yes, checking whether the compiler actually emits a 10-bit right shift is exactly what I was after. ia16-elf-gcc will optimize a divide by 1024 and I'm pretty sure Watcom will also, but I don't know for sure.

> needs to be either unsigned or must be known to be positive which is the case here.

Yes, the C rules for signed vs unsigned in this case are: if an unsigned quantity is used with an otherwise signed value or expression, the entire expression becomes unsigned, so it should be performed properly. Another thing to watch for in C is that integer constants are by default signed. So to perform an unsigned division, which is almost always faster, one can write 1024U to designate the number as unsigned 1024. I would guess however that Watcom is producing a right shift here, but wdis is the only way to tell. (And in reality probably doesn't matter in any case, since this routine isn't called that much?)
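A short sketch of why signedness matters for this optimization (function names are illustrative). For unsigned operands, dividing by 1024 and shifting right by 10 always agree. For signed operands, C division truncates toward zero, so a compiler cannot substitute a plain shift without extra fix-up code.

```c
#include <stdint.h>

/* Unsigned: x / 1024U and x >> 10 give the same result for every x. */
uint32_t div1024_unsigned(uint32_t x)
{
    return x / 1024U;
}

uint32_t shr10_unsigned(uint32_t x)
{
    return x >> 10;
}

/* Signed: -1 / 1024 is 0 because division truncates toward zero.
 * A bare arithmetic shift would typically yield -1 instead, so the
 * compiler has to emit a correction for negative values. */
int32_t div1024_signed(int32_t x)
{
    return x / 1024;
}
```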

> Or maybe it is better to leave it to the compiler in order not to mess something up.

Interesting article - shows the complexities of varying CPUs and compilers.

@FrenkelS

FrenkelS Sep 8, 2024

Author Owner

I_GetTime() is called a couple of times per frame.
i_elks.c is now optimized for size. The output of wdis i_elks.c looks like this:

Routine Size: 215 bytes,    Routine Base: i_elks_TEXT + 0173

024A				I_GetTime_:
024A  53				push		bx
024B  51				push		cx
024C  C4 1E 00 00			les		bx,_pjiffies
0250  26 8B 07				mov		ax,es:[bx]
0253  26 8B 57 02			mov		dx,es:0x2[bx]
0257  BB 5E 01				mov		bx,0x015e
025A  31 C9				xor		cx,cx
025C  9A 00 00 00 00			call		__U4M
0261  89 C3				mov		bx,ax
0263  B9 0A 00				mov		cx,0x000a
0266				L$38:
0266  D1 EA				shr		dx,0x01
0268  D1 DB				rcr		bx,0x01
026A  E2 FA				loop		L$38
026C  89 D8				mov		ax,bx
026E  59				pop		cx
026F  5B				pop		bx
0270  CB				retf

When i_elks.c is optimized for time, the output of wdis looks like this:

Routine Size: 298 bytes,    Routine Base: i_elks_TEXT + 0154

027E				I_GetTime_:
027E  53				push		bx
027F  51				push		cx
0280  C4 1E 00 00			les		bx,_pjiffies
0284  26 8B 07				mov		ax,es:[bx]
0287  26 8B 57 02			mov		dx,es:0x2[bx]
028B  BB 5E 01				mov		bx,0x015e
028E  31 C9				xor		cx,cx
0290  9A 00 00 00 00			call		__U4M
0295  89 C3				mov		bx,ax
0297  B1 0A				mov		cl,0x0a
0299  D3 EB				shr		bx,cl
029B  D3 CA				ror		dx,cl
029D  31 D3				xor		bx,dx
029F  81 E2 3F 00			and		dx,0x003f
02A3  31 D3				xor		bx,dx
02A5  89 D8				mov		ax,bx
02A7  59				pop		cx
02A8  5B				pop		bx
02A9  CB				retf

In both cases there's a call to __U4M to multiply tick10ms by 350. I don't like that call.

For comparison, here's the assembly that gcc-ia16 produces:
ia16-elf-gcc -S i_gettime.c -masm=intel -march=i8088 -Os

	.arch i8086,jumps
	.code16
	.intel_syntax noprefix
#NO_APP
	.text
	.global	I_GetTime
	.type	I_GetTime, @function
I_GetTime:
	push	si
	push	bp
	mov	bp,	sp
	push	ds
	push	ds
	xor	bx,	bx
	mov	ds,	bx
	mov	ax,	word ptr [bx+2]
	mov	word ptr [bp-4],	ax
	mov	ax,	350
	mul	word ptr [bp-4]
	xchg	cx,	ax
	mov	bx,	word ptr [bx]
	mov	si,	350
	xchg	bx,	ax
	mul	si
	mov	word ptr [bp-2],	dx
	add	word ptr [bp-2],	cx
	mov	si,	word ptr [bp-2]
	mov	cl,	6
	shl	si,	cl
	mov	cl,	10
	shr	ax,	cl
	or	ax,	si
	mov	dx,	word ptr [bp-2]
	shr	dx,	cl
	mov	sp,	bp
	pop	bp
	pop	si
	push	ss
	pop	ds
	ret
	.size	I_GetTime, .-I_GetTime
	.ident	"GCC: (GNU) 6.3.0"

ia16-elf-gcc -S i_gettime.c -masm=intel -march=i8088 -Ofast

	.arch i8086,jumps
	.code16
	.intel_syntax noprefix
#NO_APP
	.text
	.global	I_GetTime
	.type	I_GetTime, @function
I_GetTime:
	push	si
	push	di
	push	bp
	mov	bp,	sp
	sub	sp,	2
	xor	di,	di
	mov	ds,	di
	mov	bx,	word ptr [di]
	mov	si,	word ptr [di+2]
	mov	ax,	bx
	mov	dx,	si
	shl	ax,	1
	rcl	dx,	1
	add	ax,	bx
	adc	dx,	si
	shl	ax,	1
	rcl	dx,	1
	shl	ax,	1
	rcl	dx,	1
	sub	ax,	bx
	sbb	dx,	si
	mov	di,	ax
	mov	cl,	12
	shr	di,	cl
	mov	cl,	4
	shl	dx,	cl
	mov	word ptr [bp-2],	dx
	shl	ax,	cl
	mov	dx,	word ptr [bp-2]
	or	dx,	di
	sub	ax,	bx
	sbb	dx,	si
	shl	ax,	1
	rcl	dx,	1
	mov	bx,	dx
	mov	cl,	6
	shl	bx,	cl
	mov	cl,	10
	shr	ax,	cl
	or	ax,	bx
	shr	dx,	cl
	mov	sp,	bp
	pop	bp
	pop	di
	pop	si
	push	ss
	pop	ds
	ret
	.size	I_GetTime, .-I_GetTime
	.ident	"GCC: (GNU) 6.3.0"
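Reading the -Ofast listing above, the multiply by 350 appears to be decomposed into shifts, adds and subtracts on the DX:AX pair: x*2 + x = 3x, then 3x*4 - x = 11x, then 11x*16 - x = 175x, and finally 175x*2 = 350x, followed by the 10-bit right shift. The C below is a sketch of that reading, with hypothetical function names. It is not compiler output.

```c
#include <stdint.h>

/* Multiply by 350 using only shifts, adds and subtracts,
 * following the sequence visible in the -Ofast listing. */
uint32_t mul350_shift_add(uint32_t x)
{
    uint32_t t = x;
    t = (t << 1) + x;   /*   3x */
    t = (t << 2) - x;   /*  11x */
    t = (t << 4) - x;   /* 175x */
    return t << 1;      /* 350x */
}

/* Same result as (tick10ms * 350) / 1024, without a MUL or DIV. */
uint32_t tics_from_jiffies(uint32_t tick10ms)
{
    return mul350_shift_add(tick10ms) >> 10;
}
```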

I could set the TICRATE from 35 to 100 and just return tick10ms, but that would make the game real fast on fast machines. I guess I could set TICRATE to 25 and return tick10ms >> 2. That would limit the maximum frame rate to 25 and the game would feel slower.

@toncho11

toncho11 Sep 8, 2024

Is it better if you compile with: return (tick10ms * TICRATE * 10) >> 10;?

@ghaerr

ghaerr Sep 8, 2024

> I don't like that call.

Wow @FrenkelS, thanks for taking the time to see what both compilers output with each optimization setting, very interesting to say the least!! I need to take some time to understand how the code generated using ia16-elf-gcc's -Ofast option actually works, but I'm thinking that this particular routine probably isn't where any Doom bottleneck is, given we're only talking about two multiplies normally.

I will continue looking into ways we might be able to get rid of the multiply completely though, just for the fun of it. I have some ideas that I might be able to use from my optimization of jiffies from 32- to 16-bit from my precision timer routine in ELKS at libc/debug/prectimer.c.

For the record, here's the wdis output for the watcom __U4M routine (libc/watcom/asm/i4m.asm):

Module: /Users/greg/net/elks-gh/libc/watcom/asm/i4m.asm
GROUP: 'DGROUP' _DATA

Segment: i4m_TEXT WORD USE16 00000000 bytes

Routine Size: 0 bytes,    Routine Base: i4m_TEXT + 0000

No disassembly errors

Segment: _DATA WORD USE16 00000000 bytes

Segment: _TEXT WORD USE16 00000018 bytes
0000				__U4M:
0000				__I4M:
0000  93				xchg		ax,bx
0001  50				push		ax
0002  92				xchg		ax,dx
0003  0B C0				or		ax,ax
0005  74 02				je		L$1
0007  F7 E2				mul		dx
0009				L$1:
0009  91				xchg		ax,cx
000A  0B C0				or		ax,ax
000C  74 04				je		L$2
000E  F7 E3				mul		bx
0010  03 C8				add		cx,ax
0012				L$2:
0012  58				pop		ax
0013  F7 E3				mul		bx
0015  03 D1				add		dx,cx
0017  CB				retf

Routine Size: 24 bytes,    Routine Base: _TEXT + 0000

So overall, it's almost as fast as GCC's -Os output using two multiplies (but adding code to check whether each is needed), except for the far call/ret overhead, of course.

The gold nugget from what you're showing is how bad the Watcom generated code is for the divide by 1024 (implemented in all cases via a right shift 10) using the -Os optimization: it generates a SHR DX,1 in a loop which executes 10 TIMES!! Did you see that?!! Watcom is trading off a few bytes smaller output with a huge increase in time. Given this new finding, I'm thinking of changing the compilation of the entire ELKS C library for Watcom to use -Ofast, NOT -Os.
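For readers who want to see the difference without reading the listings, here is a C model of the two Watcom sequences for shifting a 32-bit value (held as two 16-bit halves) right by 10. The function names are made up, and the code is a reading of the disassembly above rather than anything taken from Watcom itself.

```c
#include <stdint.h>

/* Size-optimized form: ten single-bit shifts through the carry flag
 * (shr dx,1 / rcr bx,1 / loop). */
uint32_t shr10_loop(uint32_t x)
{
    uint16_t hi = (uint16_t)(x >> 16);
    uint16_t lo = (uint16_t)x;
    int i;

    for (i = 0; i < 10; i++) {
        uint16_t carry = hi & 1u;                    /* bit leaving DX */
        hi >>= 1;                                    /* shr dx,1 */
        lo = (uint16_t)((lo >> 1) | (carry << 15));  /* rcr bx,1 */
    }
    return ((uint32_t)hi << 16) | lo;
}

/* 16-bit rotate right, standing in for the ROR instruction. */
static uint16_t ror16(uint16_t v, unsigned n)
{
    return (uint16_t)((v >> n) | (v << (16 - n)));
}

/* Time-optimized form: one shift, one rotate and three logic ops. */
uint32_t shr10_rotate(uint32_t x)
{
    uint16_t hi = (uint16_t)(x >> 16);
    uint16_t lo = (uint16_t)x;

    lo >>= 10;              /* shr bx,cl: low word keeps bits 15..10 */
    hi = ror16(hi, 10);     /* ror dx,cl: low 10 bits of hi move to the top */
    lo ^= hi;               /* xor bx,dx */
    hi &= 0x003f;           /* and dx,0x003f: hi now holds hi >> 10 */
    lo ^= hi;               /* xor bx,dx: cancel the stray low 6 bits */
    return ((uint32_t)hi << 16) | lo;
}
```

Both functions return the same value as x >> 10. The second does so without a loop, which is what makes it attractive on an 8088 despite the larger code.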

What optimization is Doom being compiled with for Watcom? This finding means that any other right or left shifts could be very slow unless used with -Ofast.

@toncho11

toncho11 Sep 8, 2024

Now that we have a little bit more memory, we could try elksdoom compiled entirely with time/speed optimizations for all routines (and not only the graphics).

@FrenkelS

FrenkelS Sep 8, 2024

Author Owner

Everything is compiled using 8086 instructions and optimized for the smallest size, except the drawing routines in i_vmodey.c and r_draw.c. Those are optimized for speed.

https://github.com/FrenkelS/elksdoom/blob/main/compelks.sh:

CCOPTSS="-os             -bt=none -0 -zq -s -mm -wx -zastd=c99 -zls"
CCOPTST="-oaxet -oh -ol+ -bt=none -0 -zq -s -mm -wx -zastd=c99 -zls"
        Optimization
-o{a,b,c,d,e[=<num>],f,f+,h,i,k,l,l+,m,n,o,p,r,s,t,u,x,z}
  a             - relax aliasing constraints
  b             - enable branch prediction
  c             - disable <call followed by return> to <jump> optimization
  d             - disable all optimizations
  e[=<num>]     - expand user functions inline (<num> controls max size)
  f             - generate traceable stack frames as needed
  f+            - always generate traceable stack frames
  h             - enable expensive optimizations (longer compiles)
  i             - expand intrinsic functions inline
  k             - include prologue/epilogue in flow graph
  l             - enable loop optimizations
  l+            - enable loop unrolling optimizations
  m             - generate inline code for math functions
  n             - allow numerically unstable optimizations
  o             - continue compilation if low on memory
  p             - generate consistent floating-point results
  r             - reorder instructions for best pipeline usage
  s             - favor code size over execution time in optimizations
  t             - favor execution time over code size in optimizations
  u             - all functions must have unique addresses
  x             - equivalent to -obmiler -s
  z             - NULL points to valid memory in the target environment

Running the game as ./elksdoom.os2 -timedemo demo3 runs a benchmark. I haven't been able to get that to run due to out of memory errors. So I don't know when a code change or a different optimization setting leads to a faster game. Although the cheat IDRATE, which shows the current frame rate, can be used as a rough indication.

BTW, are there any restrictions on using assembly in ELKS? For example, may I use registers ds and bp? In Doom8088 I use assembly for R_DrawColumn2 and dividing 0xFFFFFFFF by a number.

@toncho11

toncho11 Sep 8, 2024

I tried to use UMB memory and ./elksdoom.os2 -timedemo demo3, but I got:
Z_CheckHeap: block size does not touch the next block
I used: umb=0xD000:0x1000, so I am reserving 64 KB of unused memory using the https://copy.sh/ emulator.
Maybe if you compile me a version with the checks disabled I will be able to run it. With these checks, using the UMB is effectively disabled, I think. Or maybe add a parameter that disables the checks.

@ghaerr

ghaerr Sep 8, 2024

> are there any restrictions on using assembly in ELKS?

Not really, except switching stacks by replacing SS could cause trouble. SI, DI, DS, ES and BP need to be saved and restored. Looking at R_DrawColumn2 - yes that ought to run just fine. Can wasm assemble that file unmodified? That would be great. Or can you just include nasm-assembled files with the Watcom build and it all works?

I happen to have been involved in some other ELKS issues where I was looking at FDOS's long divide routines, so this kind of looks familiar - but what kinds of things is dividing -1UL by a number useful for?

> I haven't been able to get that to run due to out of memory errors.

Are you running on an emulator or real hardware? Setting umb= to add 64K extra EMS (on an emulator) and using the new heap= minimal settings? Would it be helpful for me to try to get another 64K HMA High Memory Area just above 1M? I need to look into exactly when segment wraparound works and doesn't but that would realize another ~47k available if I got the ELKS kernel data segment and the 8 EXT buffers moved up there...

I am continuing to look into the GCC vs Watcom compiler issues with various optimizations, to determine the good, bad and ugly between all the options. I'm also thinking of writing an ELKS application profiler, which I think would be pretty cool to determine where programs are spending their time. I'm pretty sure Doom is spending lots of time in the display routine, so it'd probably be a good idea to move to i_vmodya.asm.

Ultimately we should also be able to get ia16-elf-gcc to compile and link Doom since it'll fit in medium model, but that's another story for another day.

@toncho11

toncho11 Sep 9, 2024

And why is the demo3 using more memory?

@FrenkelS

FrenkelS Sep 9, 2024

Author Owner

> I tried to use UMB memory and ./elksdoom.os2 -timedemo demo3, but I got:
> Z_CheckHeap: block size does not touch the next block
> I used: umb=0xD000:0x1000, so I am reserving 64 KB of unused memory using the https://copy.sh/ emulator.
> Maybe if you compile me a version with the checks disabled I will be able to run it. With these checks, using the UMB is effectively disabled, I think. Or maybe add a parameter that disables the checks.

I think there was a bug in the memory allocation code when a segment was higher than 0x7FFF, like in this case 0xD000. Here's a version with that bug fixed, I hope:
elksdoom.zip

> can you just include nasm-assembled files with the Watcom build and it all works?

It looks like nasm i_vmodya.asm -f obj -DCPU=i8088 -o WC16\i_vmodya.obj and adding alias source=_source, nearcolormap=_nearcolormap, dest=_dest, R_DrawColumn2_=R_DrawColumn2 to the linker file should work, together with changing mov bx, cx to mov cx, bx, because gcc-ia16 puts the input of a function into registers ax, dx and cx, while Watcom uses registers ax, dx and bx.

> what kinds of things is dividing -1UL by a number useful for?

Doom uses 16:16 fixed-point numbers. Division is slow, and for the renderer exact computations aren't really necessary, so a / b can be replaced by a * (1 / b), where a and b are fixed-point numbers.
In 1 / b, the 1 is also a fixed-point number, so it should actually be 0x00010000 / b. Division of fixed-point numbers looks like this: ((int64_t)a << 16) / b. Setting a to 0x00010000 results in 0x00010000 << 16, which is 0x100000000, and that doesn't fit in 32 bits. So you would need to do a division on a 64-bit number, and that's really slow on a 16-bit CPU. But 0x100000000 minus 1 is equal to 0xFFFFFFFF and that does fit in 32 bits.
And because exact computations aren't really necessary, dividing -1UL by a number is good enough for the graphics part of the engine.
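The reciprocal trick can be sketched in portable C as follows. fixed_t and FRACUNIT follow Doom's usual naming, but the three helper functions are hypothetical names for illustration, and the sketch assumes b is positive and larger than 1 in raw units. The approximate reciprocal differs from the exact one by at most one unit in the last place, and only when b is a power of two.

```c
#include <stdint.h>

typedef int32_t fixed_t;        /* 16.16 fixed point */
#define FRACUNIT 0x10000L       /* 1.0 in 16.16 */

/* Exact reciprocal: the dividend 0x100000000 needs 64-bit arithmetic. */
fixed_t reciprocal_exact(fixed_t b)
{
    return (fixed_t)(((int64_t)FRACUNIT << 16) / b);
}

/* Approximation: 0xFFFFFFFF still fits in 32 bits, so a 32-bit
 * unsigned divide is enough. Assumes b > 1. */
fixed_t reciprocal_approx(fixed_t b)
{
    return (fixed_t)(0xFFFFFFFFUL / (uint32_t)b);
}

/* 16.16 multiply, used to turn a / b into a * (1 / b). */
fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * b) >> 16);
}
```

For example, 6.0 / 3.0 computed as fixed_mul(6 * FRACUNIT, reciprocal_approx(3 * FRACUNIT)) gives 0x1FFFE rather than the exact 0x20000, an error of about 0.00003, which is invisible in rendering.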

> Are you running on an emulator or real hardware?

I'm using 86Box and I must confess I haven't really looked at all the memory settings in 86Box and ELKS yet. I just took the latest master build (fd1440.img) and didn't change any settings.

> And why is the demo3 using more memory?

demo3 is a demo of level 7. That level is much bigger than the first level, so it requires more memory.

@ghaerr

ghaerr Sep 9, 2024

> because gcc-ia16 puts the input of a function into registers ax, dx and cx,

Only when you are compiling using ia16-elf-gcc -mregparmcall, where they are passed in AX, DX, CX, DI and SI for the first five parms. Were you using that rather than the standard calling sequence, which is __cdecl, when ia16-elf-gcc was used? The register parameter calling sequence is somewhat poorly tested with ELKS, I think it works but not sure. When __cdecl calling sequence is used, ia16-elf-gcc pushes all the parameters on the stack in reverse order, and doesn't use registers for any argument passing. Of course, Watcom's register calling sequence is completely different, and IMO lots faster.

> But 0x100000000 minus 1 is equal to 0xFFFFFFFF and that does fit in 32 bits.

Thanks for the full explanation, very interesting!! I see now why having such an ASM routine goes a long way for speed. Speaking of which, I have a big need for a very quick ASM version of divmod10 (divide by 10 and modulo by 10 and return both from the same operation) - have you happened to see a routine like that anywhere? Currently, ia16-elf-gcc generates separate calls to __udivsi3 and __umodsi3 for things like the following, which are used by many routines doing numeric-to-ascii string conversion:

   unsigned long val;
   unsigned int rem;
   rem = val % 10;  // generates call to __umodsi3
   val = val / 10;  // generates another call to __udivsi3

What I'd like to see would be something like:

   unsigned long val;
   unsigned int rem;
   val = __divmod10(val, &rem);  // generate divide result and remainder in same call

which would be lots faster. I'll probably write my own but it's sure nice to start with something already working.

I haven't tested Watcom yet for the above, but I notice that their __U4D (in libc/watcom/asm/i4d.asm) does calculate and return both quotient and remainder in the same call, which is promising if the compiler will use the separate results from the same operation in two adjacent lines of C.

@FrenkelS

FrenkelS Sep 10, 2024

Author Owner

I'm pretty sure I've tried with and without -mregparmcall.

I know of ldiv in stdlib.h, but that's about it.
ldiv_t ldiv(long __numer, long __denom) where ldiv_t is

typedef struct 
{
  long quot; /* quotient */
  long rem; /* remainder */
} ldiv_t;
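A minimal usage sketch of the standard ldiv call for the digit-extraction case discussed above. The wrapper name last_digit is hypothetical, and whether a particular C library implements ldiv with a single divide is not guaranteed.

```c
#include <stdlib.h>

/* Split a value into its last decimal digit (returned) and the
 * remaining quotient (stored through rest) with one library call. */
long last_digit(long val, long *rest)
{
    ldiv_t r = ldiv(val, 10L);
    *rest = r.quot;
    return r.rem;
}
```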

@ghaerr

ghaerr Sep 10, 2024

Thanks @FrenkelS. The OpenWatcom codebase has got lots of great stuff in it. I ran across a similar __uldiv routine in their ltoa() function implementation and have been comparing that with __U4D, their full unsigned long divide. I have now written a custom __divmod routine based on OW's technique, which divides a 32-bit number by a 16-bit number as fast as possible, using one or two DIV instructions.
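The general technique of dividing a 32-bit value by a 16-bit divisor with one or two 16-bit divides can be sketched in C like this. This is the schoolbook method under an assumed, made-up function name, not the actual ELKS or OpenWatcom routine. The divisor must be nonzero.

```c
#include <stdint.h>

/* Divide a 32-bit value by a 16-bit divisor, returning the quotient and
 * storing the remainder. Each step maps onto one 8086 DIV (32-bit by
 * 16-bit with a 16-bit quotient) because the running remainder is
 * always smaller than the divisor. */
uint32_t divmod_32by16(uint32_t val, uint16_t divisor, uint16_t *rem)
{
    uint16_t hi = (uint16_t)(val >> 16);
    uint16_t lo = (uint16_t)val;
    uint16_t q_hi = 0;
    uint16_t r = hi;
    uint16_t q_lo;
    uint32_t t;

    if (hi >= divisor) {            /* first divide only for large values */
        q_hi = (uint16_t)(hi / divisor);
        r = (uint16_t)(hi % divisor);
    }
    t = ((uint32_t)r << 16) | lo;   /* remainder:low word, r < divisor */
    q_lo = (uint16_t)(t / divisor); /* quotient is known to fit in 16 bits */
    *rem = (uint16_t)(t % divisor);
    return ((uint32_t)q_hi << 16) | q_lo;
}
```

A number-to-string loop would call it as val = divmod_32by16(val, 10, &rem); once per digit, getting both results from the same pass.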

A custom quick multiply routine could also be written to multiply a 32-bit number by a 16-bit constant, if you think we need that, but looking harder at I4M.asm shows they're not executing any extra MUL instructions, at the cost of a few CMP/OR instructions. Since you'd still need a far procedure call, I'm not sure it's worth it. We could also do some tricks with jiffies when it wraps 16-bit and have C code do a 16x16 single multiply, but I'm not sure that means much either, considering the time spent in the display update routine(s).

@toncho11

toncho11 Sep 10, 2024

> I think there was a bug in the memory allocation code when a segment was higher than 0x7FFF, like in this case 0xD000. Here's a version with that bug fixed, I hope:
> elksdoom.zip

No. It gives the same error Z_CheckHeap: block size does not touch the next block as before when using UMB.

int32_t now = clock();
-return ((now - basetime) * TICRATE) / CLOCKS_PER_SEC;
+return ((now - basetime) * TICRATE) / 1000L;
}

