Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use with 10 um atlas? #43

Open
tmchartrand opened this issue Sep 18, 2023 · 7 comments
Open

Use with 10 um atlas? #43

tmchartrand opened this issue Sep 18, 2023 · 7 comments
Assignees

Comments

@tmchartrand
Copy link

Hi, I'm curious if you've considered using this with a 10um version of the ABA 2017 atlas - would the performance issues likely be prohibitive? If it seems plausible but just hasn't been tried, do you have any documentation of the atlas file format (or even better, conversion scripts) in case I wanted to test it out?

@Tevemadar
Copy link
Member

Tevemadar commented Sep 20, 2023

This is somewhat embarassing, I thought I shared the format somewhere, but that isn't the case.
So far I have per-atlas converters only, this is the one for ABA 2017 25um:

import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map.Entry;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;

public class ABAv3_2017 {
    static final String nii="C:\\Data\\NII\\ABA_Mouse_CCFv3_2017.nii.gz";
    static final String label="C:\\Data\\NII\\ABA_Mouse_CCFv3_2017.txt";
    static final String pack="ABA_Mouse_v3_2017.pack";
    static final String json="ABA_Mouse_v3_2017.json";
    static final int head=352;
    static final int xdim=456;
    static final int ydim=528;
    static final int zdim=320;
    public static void main(String[] args) throws Exception {
        byte raw[];
        try(var gis=new GZIPInputStream(new FileInputStream(nii))){
            gis.skipNBytes(head);
            raw=gis.readAllBytes();
        }
        ByteBuffer bb=ByteBuffer.wrap(raw);
        bb.order(ByteOrder.LITTLE_ENDIAN);
        IntBuffer ib=bb.asIntBuffer();
        var stats=new HashMap<Integer, Integer>();
        while(ib.hasRemaining()) {
            int i=ib.get();
            if(i!=0)
                stats.put(i, stats.containsKey(i)?stats.get(i)+1:1);
        }
        stats.put(0,Integer.MAX_VALUE);
        List<Entry<Integer,Integer>> entries=new ArrayList<Entry<Integer,Integer>>(stats.entrySet());
        entries.sort((a,b)->b.getValue()-a.getValue());
        var map=new HashMap<Integer, Integer>();
        var remap=new ArrayList<Integer>();
        for(var e:entries) {
            map.put(e.getKey(),remap.size());
            remap.add(e.getKey());
        }
        var labels=new String[remap.size()];
        try(var br=new BufferedReader(new FileReader(label))){
            Pattern p=Pattern.compile("\\s*(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)[^\"]+\"([^\\\"]+)\"");
            while(br.ready()) {
                String line=br.readLine();
                Matcher m=p.matcher(line);
                if(m.matches()) {
                    Integer r=map.get(Integer.valueOf(m.group(1)));
                    if(r!=null)
                        labels[r]=String.format("{\"rgb\":\"%02x%02x%02x\",\"name\":\"%s\"}",Integer.parseInt(m.group(2)),Integer.parseInt(m.group(3)),Integer.parseInt(m.group(4)),m.group(5));
                }
            }
        }
        try(var pw=new PrintWriter(json)){
            pw.print("{\"name\":\"ABA Mouse CCFv3 2017\",\"encoding\":2,\"xdim\":456,\"ydim\":528,\"zdim\":320,\"labels\":[{}");
            for(var i=1;i<labels.length;i++) {
                pw.println(",");
                pw.print(labels[i]);
            }
            pw.println("],");
            pw.println("\"transformations\":[");
            pw.println("{\"name\":\"Allen CCF v3 [mm] PIR\",\"matrix\":[[0,0,-0.025,0],[-0.025,0,0,0],[0,-0.025,0,0],[13.175,7.975,11.375,1]]}");
            pw.println("],");
            pw.print("\"remap\":");
            pw.print(remap.toString().replaceAll(" ", ""));
            pw.print("}");
            System.out.println(remap.size());
        }
        ib.rewind();
        try(var dos=new DeflaterOutputStream(new BufferedOutputStream(new FileOutputStream(pack)), new Deflater(9, true))){
            var current=Integer.MAX_VALUE;
            var count=0;
            for(var z=0;z<zdim;z++) {
                for(var xy=0;xy<xdim*ydim;xy++) {
                    var w=ib.get();
                    if(!map.containsKey(w))
                        throw new Exception("?"+w);
                    w=map.get(w);
                    if(w!=current) {
                        if(current!=Integer.MAX_VALUE) {
                            encode15(dos,current);
                            encode15(dos,count);
                        }
                        current=w;
                        count=0;
                    }else {
                        count++;
                        if(count==32768) {
                            encode15(dos,current);
                            encode15(dos,count-1);
                            count=0;
                        }
                    }
                }
                System.out.println(z);
            }
            encode15(dos,current);
            encode15(dos,count);
        }
    }
    
//    static void encodeByte(OutputStream os,int data) throws Exception {
//        if(data>255)
//            throw new Exception("!"+data);
//        os.write(data);
//    }

    static void encode15(OutputStream os,int data) throws Exception {
        if(data>=32768)
            throw new Exception("!"+data);
        if(data<128)
            os.write(data);
        else {
            os.write((data>>8)+128);
            os.write(data);
        }
    }
    
//    static void encodeN(OutputStream os,int data) throws Exception { // 1k
//        while(data>127) {
//            os.write((data & 127)+128);
//            data>>=7;
//        }
//        os.write(data);
//    }
}

The code reads a NIfTI volume (which is gzip-d) coupled with an ITK label file. Parsing the NIfTI is skipped though, it assumes a 352-byte header, 456x528x320 voxels resolution (and it's RAS), and 32-bit little endian elements.
The format itself is an RLE encoding (element-repeats pairs), which is then deflated.
The sparse, 32-bit element identifiers are compacted into a variable-length 16-bit representation, that's why their statistics is collected in the the first loop filling up stats. Then identifier 0 (so the empty space) is forced to be the most frequent one, just in case. Then identifiers are getting their new representation in map, the most frequent one will get 0, the second most frequent one gets 1, etc. The original identifiers are also preserved in a simple list, remap (this was added later, now I see a single LinkedHashMap could do both).
Knowing the new, compacted order, the labels array can be prepared, this is where the ITK label file is parsed, and its elements go to their new location, using map. The array elements are JSON objects with rgb and name properties, stored as a simple string.
Then the JSON descriptor is generated, using hardcoded name, resolution, and stuff, but the dynamically generated labels array and remap.
Then comes the deflated RLE thing. As ABA uses more than 256, but less than 32768 labels, "encode15" was a natural choice to represent the identifiers. Anything less than 128 is represented by itself, so a single byte (and those are the most frequent labels, that was the statistics-gathering all about). Anything above will be 2 bytes, first the upper byte, with it highest bit toggled to 1, then the lower byte. The repeats are also "encode15", it was picked by simply testing. With the 10um version it may be better to use the now-commented "encodeN" (modifying occurrences of encode15(dos,count) - there is one after the loop), that variant can handle numbers above 32767 (so the if(count==32768) is not needed any more). When using this combination (encode15 for identifiers and encodeN for repeats), the JSON has to be modified too, to "encoding":3.
And of course alignment data has to be modified, as it uses voxels in the alignment vectors, so when re-using an existing alignment all components need to be multiplied by 2.5.
I hope this helps to some extent, however my guess is that the packed atlas is going to be rather large. I'll switch to a tiled (well, here "cubed"), pyramidal representation for the atlases at some point, and that will solve this kind of problems, but right now I can't even guess when I'll have time to actually do that.

@PolarBean
Copy link
Member

I wonder how small we can get the 10um annotations. On the Allen website they are only 26MB. What in your opinion @Tevemadar is an acceptable size?

@Tevemadar
Copy link
Member

@PolarBean This RLE hack gained 20-25% for the 25um segmentations (4-ish MB to 3-ish MB). WHS is similar (3-4 MB-s to 2-2.5 MB-s). Which is okay for a cheap, 10-line decompressor, but not astonishing.
A tiled/cubed approach can benefit from locality, I have such test code, and that gains 45% on WHSv4, simply by compressing 64x64x64 voxel cubes individually. And then comes the part that not all cubes have to be downloaded, only the ones the user is looking at, especially since LocaliZoom doesn't alter the position/orientation.

@tmchartrand ehm, when writing this comment, I looked into the code and noticed that the decompressor for the variant with "encodeN" doesn't actually exist. Please let me know if you need it.

@tmchartrand
Copy link
Author

This is all very helpful - I'm not sure yet whether I'll put time into this, will probably spend a bit more time exploring the tool as it is first, but I'll let you know if/when I try anything new.
@Tevemadar unless I'm misunderstanding, you were suggesting the encodeN in case more distinct labels needed to be stored than in the existing atlas (not just more labeled voxels), right? That shouldn't be the case, the labeling scheme is unchanged at the different resolutions.

@Tevemadar
Copy link
Member

"encodeN" was meant for the run-lengths only. It may gain something in the empty spaces above and below the curved parts, as at 10um resolution there certainly are a lot of consecutive 0-voxels in those parts.
We are in the middle of multiple deadlines now, so LocaliZoom can't get much nicer until November. However if you need example datasets aligned to Allen CCF, we can provide some links.

@Tevemadar
Copy link
Member

Tevemadar commented Nov 22, 2023

@tmchartrand well, it works to some extent. But it's definitely not in a hurry when loading.
Say, someone

(*) Tadaam after a significant wait.

And doing the nonlinear adjustments thing, not just viewing, will need optimizations for sure. Responsiveness of 25um is okay-ish on most machines, at least that's what I think, while 10um is not something for daily work.

The code itself resides in a branch right now, https://github.com/Tevemadar/LocaliZoom/tree/Allen10 (except for configuration.js, which comes from this repo here).

@PolarBean
Copy link
Member

This is so smooth!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants