Index > 2GB unsupported #36

Open
flohoff opened this issue Mar 31, 2022 · 2 comments

flohoff commented Mar 31, 2022

Hi,
I am running into a problem where mairix fails during indexing because the index grows larger than 2 GB. When that happens (on initial creation), the index is left at 0 bytes.

It reports an error on the "lseek" call, which is 32-bit only:

writer.c (lines 106-111):
  if (sb.st_size < len) {
    /* Extend */
    if (lseek(fd, len - 1, SEEK_SET) < 0) {
      report_error("lseek", filename);
      unlock_and_exit(2);
    }

Flo
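For readers following along, here is a minimal sketch of the "seek to len - 1 and write one byte" way of extending a file, built with a 64-bit off_t so that the lseek call itself can go past 2 GB. It is not mairix's actual code: the helper name extend_file is hypothetical, and the _FILE_OFFSET_BITS define is the glibc mechanism for large-file support on 32-bit builds (it has no effect where off_t is already 64 bits). This only addresses the system call, not the index format.

```c
/* Ask glibc for a 64-bit off_t on 32-bit builds; must come before any include. */
#define _FILE_OFFSET_BITS 64

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from mairix): grow the file behind fd to at
 * least len bytes. Returns 0 on success, -1 on failure with errno set
 * by the call that failed. */
static int extend_file(int fd, off_t len)
{
    struct stat sb;

    if (fstat(fd, &sb) < 0)
        return -1;
    if (len <= 0 || sb.st_size >= len)
        return 0;                      /* nothing to extend */

    /* Seek to the last byte of the requested size ... */
    if (lseek(fd, len - 1, SEEK_SET) < 0)
        return -1;
    /* ... and write a single zero byte so the file really has that size. */
    if (write(fd, "", 1) != 1)
        return -1;
    return 0;
}
```

With a 32-bit off_t, a len above 2^31 - 1 cannot even be represented in the lseek argument, which would be consistent with the failure reported above.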

Collaborator

vandry commented Apr 24, 2022

Unfortunately, a whole bunch of offset values stored inside the index database, which refer to data at other offsets inside the index file, are fundamentally 32 bits. This is not just a matter of the lseek syscall; it requires an overhaul of the index format.
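To illustrate why 32-bit offset fields cap the file size regardless of the system call, here is a small sketch. The helper store_offset32 is hypothetical and does not reflect mairix's real on-disk layout; in particular, the thread does not say whether the fields are signed or unsigned, so an unsigned 32-bit field is assumed here.

```c
#include <stdint.h>

/* Hypothetical helper (not from mairix): store a position within the
 * index file into a 32-bit field. Returns 0 on success, -1 if the
 * offset cannot be represented in 32 bits. */
static int store_offset32(uint32_t *field, uint64_t offset)
{
    if (offset > UINT32_MAX)
        return -1;                 /* would silently wrap if stored anyway */
    *field = (uint32_t)offset;
    return 0;
}

/* What an unchecked store would do: keep only the low 32 bits. */
static uint32_t truncate_offset32(uint64_t offset)
{
    return (uint32_t)offset;
}
```

An offset of 4 GiB + 5 passed to truncate_offset32 comes back as 5, so a reader following that field would land on unrelated data near the start of the file. If the fields are treated as signed, the usable range halves to 2 GiB.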

If we were to overhaul the whole database format, I think I'd want to take the opportunity to do it using something less error-prone, less custom, and more modern and high-level than pointer arithmetic, like SQLite, maybe with some protobuf. It would hardly be the same piece of software.

You must have a lot of email though! My personal mairix database which indexes all of my sent and received email for decades is only 81MiB. This excludes all mailing lists and (most) spam, to be sure; maybe that's the difference with yours.

Author

flohoff commented Apr 24, 2022

That is excluding mailing lists, but including compressed archives of old mail (gzip -9 on monthly archive folders since 1996):

flo@pax:~$ du -sh Mail
39G Mail

I have now excluded old work email and thus reduced the index size:

flo@pax:~$ ls -la .mairix*
-rw------- 1 flo flo 876498632 Apr 24 05:05 .mairix_database
-rw-r--r-- 1 flo flo 185 Mar 31 15:21 .mairixrc

Flo
