Add RANS Nx16 codec (Update CRAM Codecs to CRAM 3.1) #1618

Open: wants to merge 66 commits into base: master

Commits (66):
e493a5a (Jan 14, 2022): adding comments to Frequencies.java
9d9e681 (Mar 1, 2022): separate encode and decode classes
a84de97 (Mar 1, 2022): Add Frequency methods to encode and decode classes
17073e1 (Mar 7, 2022): clean up rans tests and add separate packages for rans 4x8 and nx16
8d88da0 (Mar 8, 2022): filter out extra column from q40+dir file
3ec829a (Mar 18, 2022): rans nx16 order 1 freq tables + refactor
835bdf6 (Mar 18, 2022): clean up
44548c3 (cmnbroad, Apr 20, 2022): Update RAN test method names.
6f3e9d5 (cmnbroad, Apr 20, 2022): Remove unncessary params arg from uncompress methods (params are embe…
9548daf (cmnbroad, Apr 20, 2022): Remove unnecessary RANSNx16Params state.
3542cde (cmnbroad, Apr 20, 2022): Fix bug in the case where the cat bit is set.
6f71686 (cmnbroad, Apr 20, 2022): Reduce unncessary buffer allocation.
b39e87d (cmnbroad, Apr 20, 2022): Thread RANSNx16 params through RANSNx16 implementation.
c0b961c (cmnbroad, Apr 20, 2022): Dont initialize RANSNx16 decoding structures unless we're going to us…
7aa9da9 (cmnbroad, Apr 22, 2022): Move/inline RANS Nx16 D0N uncompress method into RANSNx16Decode.
c4588b4 (cmnbroad, Apr 25, 2022): Move/inline RANS Nx16 D1N uncompress method into RANSNx16Decode.
390289f (cmnbroad, Apr 25, 2022): Move/inline RANS Nx16 E0N compress method into RANSNx16Encode.
a009786 (cmnbroad, Apr 25, 2022): Move/inline RANS Nx16 E1N compress method into RANSNx16Encode.
0dc38d4 (cmnbroad, Apr 25, 2022): Suppress spotbugs warnings.
393e6a6 (Apr 27, 2022): Don't initialize RANS4x8 decoding structure unless we're going to use…
26618cf (Apr 28, 2022): Move/inline RANS 4x8 E04 compress method into RANS4x8Encode.
58f76a8 (Apr 28, 2022): Move/inline RANS 4x8 E14 compress method into RANS4x8Encode.
58c14cd (Apr 28, 2022): Move/inline RANS 4x8 D04 uncompress method into RANS4x8Decode.
84fd017 (Apr 28, 2022): Move/inline RANS 4x8 D14 uncompress method into RANS4x8Decode.
50b8050 (May 17, 2022): Fix normalized Frequency (4096), add normalize Frequency using bit sh…
a34ed60 (Jun 3, 2022): Add ransNx16 for format flags = 1,4,5 (N=32) and replace division wit…
eb07a9c (Jun 6, 2022): When CAT is true, add limit and rewind the outBuffer before returning…
943d454 (Jun 6, 2022): Add RANSTest with formatflags = 32, 33, 36, 37
7411a41 (Jun 6, 2022): Remove initialization of alphabet array.
63664f7 (Jul 25, 2022): Add RLE Encode and Decode. Works as expected for RANSNx16 Order 0
6d00810 (Jul 25, 2022): Move declaration of variables used within the for loop to inside the …
6913c84 (Jul 25, 2022): Convert symbols from int to byte
3eaef41 (Jul 25, 2022): rename getInterleaveSize to getNumInterleavedRANSStates in RANSNx16Pa…
a4b2bb1 (Jul 26, 2022): RLE encode and decode works as expected for RANSNx16 Order 1
0c80250 (Aug 11, 2022): add encode and decode Pack. Add test cases for pack
5b88e2f (Aug 16, 2022): rename variable for better readability
09af7e8 (Aug 19, 2022): add exception when num of distinct symbols = 0 or > 16
5bfa10c (Aug 23, 2022): Add Decode Stripe to RANS Nx16. Add getFormatFlags() to RANSParams
c734fbb (Aug 26, 2022): Add test for Encoding when Stripe Flag is set
1c84f8c (Aug 26, 2022): Fix Spot Bugs warn - Use && for logical and
e641709 (Sep 6, 2022): Addressing the feedback from Aug 30, 2022
c514bee (Sep 8, 2022): Use the Interop Test files from samtools-1.14/htslib-1.14/htscodecs/t…
3ee96c5 (Sep 8, 2022): Replace hex literals with bit flag masks in RANSInteropTest Data Prov…
4129cb3 (Sep 9, 2022): Addressing the feedback so far
0ffc6ba (Oct 1, 2022): rename methods that return boolean to start with 'is' instead of 'get'
947e8e4 (Oct 17, 2022): debug
a04ede6 (Oct 31, 2022): Addressing the feedback from 10/25/22
266ec6d (Nov 22, 2022): undo inadvertent deletion of RANSInterop roundtrip test logic
1ecb5c0 (Mar 21, 2023): debug - add decodePack and decodeRLE on top of CAT flag
06a89a8 (Mar 21, 2023): rewind outBuffer before it is returned
04c813f (Mar 21, 2023): remove duplicate outBuffer creation
dff9d51 (Oct 19, 2023): Addressing the feedback from oct 11, 2023 except implementing the Str…
3b25380 (Oct 26, 2023): Move common methods to CRAMInteropTestUtils class
015491b (Dec 1, 2023): Addressing the feedback from Nov 7 and Nov 20 - part 1
4341721 (Dec 5, 2023): Addressing the feedback from Nov 7 and Nov 20 - part 2
c7a06a9 (Dec 6, 2023): Addressing the feedback from Nov 7 and Nov 20 - part 3
bc7cced (Dec 13, 2023): Addressing the feedback from Nov 7 and Nov 20 - part 4
c0ff577 (Dec 18, 2023): Addressing the feedback from Nov 7 and Nov 20 - part 5
7d72393 (Dec 20, 2023): Addressing the feedback from Nov 7 and Nov 20 - part 6
e898080 (Dec 21, 2023): Addressing the feedback from Nov 7 and Nov 20 - part 6
250f901 (Jan 10, 2024): Move common code to CompressionUtils
6449de8 (cmnbroad, Feb 27, 2024): Minor final review comments.
c411ada (cmnbroad, Feb 27, 2024): Use copies of the CRAM 3.1 interop test streams that originate inthe …
fe0e908 (cmnbroad, Feb 27, 2024): Update to use the locally checked in CRAM 3.1 interop test data.
8ed004f (cmnbroad, Mar 5, 2024): A little naming cleanup.
f1f9c20 (cmnbroad, Mar 5, 2024): Fix issue introduced by conflict resolution of a rebase.
179 changes: 179 additions & 0 deletions src/main/java/htsjdk/samtools/cram/compression/CompressionUtils.java
@@ -0,0 +1,179 @@
package htsjdk.samtools.cram.compression;

import htsjdk.samtools.cram.CRAMException;
import htsjdk.samtools.cram.compression.rans.Constants;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class CompressionUtils {
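    // Writes a non-negative int as a big-endian sequence of 7-bit groups;
    // every byte except the last has the 0x80 continuation bit set.
    public static void writeUint7(final int i, final ByteBuffer cp) {
        int s = 0;
        int X = i;
        // find the shift of the most significant 7-bit group (s ends up a multiple of 7, at least 7)
        do {
            s += 7;
            X >>= 7;
        } while (X > 0);
        do {
            s -= 7;
            // write the next 7-bit group, setting the high bit if more groups follow
            final int s_ = (s > 0) ? 1 : 0;
            cp.put((byte) (((i >> s) & 0x7f) + (s_ << 7)));
        } while (s > 0);
    }

    // Reads a value written by writeUint7: accumulates 7 bits per byte
    // until it reaches a byte without the 0x80 continuation bit.
    public static int readUint7(final ByteBuffer cp) {
        int i = 0;
        int c;
        do {
            // read byte
            c = cp.get();
            i = (i << 7) | (c & 0x7f);
        } while ((c & 0x80) != 0);
        return i;
    }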

    public static ByteBuffer encodePack(
            final ByteBuffer inBuffer,
            final ByteBuffer outBuffer,
            final int[] frequencyTable,
            final int[] packMappingTable,
            final int numSymbols){
        final int inSize = inBuffer.remaining();
        final ByteBuffer encodedBuffer;
        if (numSymbols <= 1) {
            encodedBuffer = CompressionUtils.allocateByteBuffer(0);
        } else if (numSymbols <= 2) {

            // 1 bit per value
            final int encodedBufferSize = (int) Math.ceil((double) inSize/8);
            encodedBuffer = CompressionUtils.allocateByteBuffer(encodedBufferSize);
            int j = -1;
            for (int i = 0; i < inSize; i ++) {
                if (i % 8 == 0) {
                    encodedBuffer.put(++j, (byte) 0);
                }
                encodedBuffer.put(j, (byte) (encodedBuffer.get(j) + (packMappingTable[inBuffer.get(i) & 0xFF] << (i % 8))));
            }
        } else if (numSymbols <= 4) {

            // 2 bits per value
            final int encodedBufferSize = (int) Math.ceil((double) inSize/4);
            encodedBuffer = CompressionUtils.allocateByteBuffer(encodedBufferSize);
            int j = -1;
            for (int i = 0; i < inSize; i ++) {
                if (i % 4 == 0) {
                    encodedBuffer.put(++j, (byte) 0);
                }
                encodedBuffer.put(j, (byte) (encodedBuffer.get(j) + (packMappingTable[inBuffer.get(i) & 0xFF] << ((i % 4) * 2))));
            }
        } else {

            // 4 bits per value
            final int encodedBufferSize = (int) Math.ceil((double)inSize/2);
            encodedBuffer = CompressionUtils.allocateByteBuffer(encodedBufferSize);
            int j = -1;
            for (int i = 0; i < inSize; i ++) {
                if (i % 2 == 0) {
                    encodedBuffer.put(++j, (byte) 0);
                }
                encodedBuffer.put(j, (byte) (encodedBuffer.get(j) + (packMappingTable[inBuffer.get(i) & 0xFF] << ((i % 2) * 4))));
            }
        }

        // write numSymbols
        outBuffer.put((byte) numSymbols);

        // write the symbol map: every symbol with a non-zero frequency, in increasing order,
        // which the decoder uses to translate packed values back to the original symbols
        for(int i = 0; i < Constants.NUMBER_OF_SYMBOLS; i ++) {
            if (frequencyTable[i] > 0) {
                outBuffer.put((byte) i);
            }
        }

        // write the length of data
        CompressionUtils.writeUint7(encodedBuffer.limit(), outBuffer);
        return encodedBuffer; // Here position = 0 since we have always accessed the data buffer using index
    }

    public static ByteBuffer decodePack(
            final ByteBuffer inBuffer,
            final byte[] packMappingTable,
            final int numSymbols,
            final int uncompressedPackOutputLength) {
        final ByteBuffer outBufferPack = CompressionUtils.allocateByteBuffer(uncompressedPackOutputLength);
        int j = 0;
        if (numSymbols <= 1) {
            for (int i=0; i < uncompressedPackOutputLength; i++){
                outBufferPack.put(i, packMappingTable[0]);
            }
        }

        // 1 bit per value
        else if (numSymbols <= 2) {
            int v = 0;
            for (int i=0; i < uncompressedPackOutputLength; i++){
                if (i % 8 == 0){
                    v = inBuffer.get(j++);
                }
                outBufferPack.put(i, packMappingTable[v & 1]);
                v >>=1;
            }
        }

        // 2 bits per value
        else if (numSymbols <= 4){
            int v = 0;
            for(int i=0; i < uncompressedPackOutputLength; i++){
                if (i % 4 == 0){
                    v = inBuffer.get(j++);
                }
                outBufferPack.put(i, packMappingTable[v & 3]);
                v >>=2;
            }
        }

        // 4 bits per value
        else if (numSymbols <= 16){
            int v = 0;
            for(int i=0; i < uncompressedPackOutputLength; i++){
                if (i % 2 == 0){
                    v = inBuffer.get(j++);
                }
                outBufferPack.put(i, packMappingTable[v & 15]);
                v >>=4;
            }
        }
        return outBufferPack;
    }

    public static ByteBuffer allocateOutputBuffer(final int inSize) {
        // This calculation is identical to the one in samtools rANS_static.c
        // Presumably the frequency table (always big enough for order 1) = 257*257,
        // then * 3 for each entry (byte->symbol, 2 bytes -> scaled frequency),
        // + 9 for the header (order byte, and 2 int lengths for compressed/uncompressed lengths).
        final int compressedSize = (int) (inSize + 257 * 257 * 3 + 9);
        final ByteBuffer outputBuffer = ByteBuffer.allocate(compressedSize).order(ByteOrder.LITTLE_ENDIAN);
        if (outputBuffer.remaining() < compressedSize) {
            throw new CRAMException("Failed to allocate sufficient buffer size for RANS coder.");
        }
        return outputBuffer;
    }

    // returns a new LITTLE_ENDIAN ByteBuffer of size = bufferSize
    public static ByteBuffer allocateByteBuffer(final int bufferSize){
        return ByteBuffer.allocate(bufferSize).order(ByteOrder.LITTLE_ENDIAN);
    }

    // returns a LITTLE_ENDIAN ByteBuffer that is created by wrapping a byte[]
    public static ByteBuffer wrap(final byte[] inputBytes){
        return ByteBuffer.wrap(inputBytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    // returns a LITTLE_ENDIAN ByteBuffer that is created by inputBuffer.slice()
    public static ByteBuffer slice(final ByteBuffer inputBuffer){
        return inputBuffer.slice().order(ByteOrder.LITTLE_ENDIAN);
    }
}
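The two encodings in CompressionUtils can be exercised in isolation. The following is a minimal standalone sketch, not the PR's code: the class `Uint7PackSketch` and its methods are hypothetical. The varint mirrors the byte layout of `writeUint7`/`readUint7` (big-endian 7-bit groups, 0x80 set on every byte except the last) and assumes non-negative inputs. The packer shows only the 2-bits-per-value case that `encodePack`/`decodePack` use for at most four distinct symbols, with the first value in the low bits of each byte and symbols ranked in increasing byte order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class Uint7PackSketch {

    // Big-endian 7-bit varint; every byte except the last carries the 0x80 continuation bit.
    static void writeUint7(int value, ByteBuffer out) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch only handles non-negative values");
        }
        int shift = 0;
        int x = value;
        do { // count the 7-bit groups needed; shift ends as a multiple of 7
            shift += 7;
            x >>>= 7;
        } while (x != 0);
        for (shift -= 7; shift > 0; shift -= 7) {
            out.put((byte) (((value >>> shift) & 0x7f) | 0x80));
        }
        out.put((byte) (value & 0x7f)); // last group, continuation bit clear
    }

    static int readUint7(ByteBuffer in) {
        int result = 0;
        int b;
        do {
            b = in.get();
            result = (result << 7) | (b & 0x7f);
        } while ((b & 0x80) != 0);
        return result;
    }

    // Ranks each distinct symbol by increasing byte value (stored in symbolsOut, length >= 4),
    // then stores four 2-bit ranks per byte, first value in the low bits.
    static byte[] pack2(byte[] data, byte[] symbolsOut) {
        boolean[] present = new boolean[256];
        for (byte b : data) {
            present[b & 0xff] = true;
        }
        int[] rank = new int[256];
        int n = 0;
        for (int s = 0; s < 256; s++) {
            if (present[s]) {
                if (n == 4) {
                    throw new IllegalArgumentException("2-bit packing needs at most 4 distinct symbols");
                }
                rank[s] = n;
                symbolsOut[n++] = (byte) s;
            }
        }
        byte[] packed = new byte[(data.length + 3) / 4];
        for (int i = 0; i < data.length; i++) {
            packed[i / 4] |= (byte) (rank[data[i] & 0xff] << ((i % 4) * 2));
        }
        return packed;
    }

    // Reverses pack2 using the symbol list.
    static byte[] unpack2(byte[] packed, byte[] symbols, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = symbols[(packed[i / 4] >> ((i % 4) * 2)) & 3];
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        writeUint7(300, buf);
        buf.flip();
        System.out.println(buf.remaining() + " bytes -> " + readUint7(buf)); // 2 bytes -> 300

        byte[] data = "ACGTTGCA".getBytes(StandardCharsets.US_ASCII);
        byte[] symbols = new byte[4];
        byte[] packed = pack2(data, symbols);
        System.out.println(packed.length); // 2
        System.out.println(new String(unpack2(packed, symbols, data.length), StandardCharsets.US_ASCII)); // ACGTTGCA
    }
}
```

Packing eight 2-bit values into two bytes is why the PR writes the distinct-symbol list ahead of the packed data: the decoder needs that list to turn the ranks back into the original bytes.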
@@ -1,6 +1,7 @@
 package htsjdk.samtools.cram.compression;
 
-import htsjdk.samtools.cram.compression.rans.RANS;
+import htsjdk.samtools.cram.compression.rans.rans4x8.RANS4x8Decode;
+import htsjdk.samtools.cram.compression.rans.rans4x8.RANS4x8Encode;
 import htsjdk.samtools.cram.structure.block.BlockCompressionMethod;
 import htsjdk.utils.ValidationUtils;
 
@@ -71,8 +72,8 @@ public static ExternalCompressor getCompressorForMethod(
 
             case RANS:
                 return compressorSpecificArg == NO_COMPRESSION_ARG ?
-                        new RANSExternalCompressor(new RANS()) :
-                        new RANSExternalCompressor(compressorSpecificArg, new RANS());
+                        new RANSExternalCompressor(new RANS4x8Encode(), new RANS4x8Decode()) :
+                        new RANSExternalCompressor(compressorSpecificArg, new RANS4x8Encode(), new RANS4x8Decode());
 
             case BZIP2:
                 ValidationUtils.validateArg(
@@ -85,5 +86,4 @@ public static ExternalCompressor getCompressorForMethod(
         }
     }
 
-}
-
+}
@@ -24,48 +24,60 @@
  */
 package htsjdk.samtools.cram.compression;
 
-import htsjdk.samtools.cram.compression.rans.RANS;
+import htsjdk.samtools.cram.compression.rans.RANSParams;
+import htsjdk.samtools.cram.compression.rans.rans4x8.RANS4x8Decode;
+import htsjdk.samtools.cram.compression.rans.rans4x8.RANS4x8Encode;
+import htsjdk.samtools.cram.compression.rans.rans4x8.RANS4x8Params;
 import htsjdk.samtools.cram.structure.block.BlockCompressionMethod;
 
 import java.nio.ByteBuffer;
 import java.util.Objects;
 
 public final class RANSExternalCompressor extends ExternalCompressor {
-    private final RANS.ORDER order;
-    private final RANS rans;
+    private final RANSParams.ORDER order;
+    private final RANS4x8Encode ransEncode;
+    private final RANS4x8Decode ransDecode;
 
     /**
      * We use a shared RANS instance for all compressors.
      * @param rans
      */
-    public RANSExternalCompressor(final RANS rans) {
-        this(RANS.ORDER.ZERO, rans);
+    public RANSExternalCompressor(
+            final RANS4x8Encode ransEncode,
+            final RANS4x8Decode ransDecode) {
+        this(RANSParams.ORDER.ZERO, ransEncode, ransDecode);
     }
 
-    public RANSExternalCompressor(final int order, final RANS rans) {
-        this(RANS.ORDER.fromInt(order), rans);
+    public RANSExternalCompressor(
+            final int order,
+            final RANS4x8Encode ransEncode,
+            final RANS4x8Decode ransDecode) {
+        this(RANSParams.ORDER.fromInt(order), ransEncode, ransDecode);
     }
 
-    public RANSExternalCompressor(final RANS.ORDER order, final RANS rans) {
+    public RANSExternalCompressor(
+            final RANSParams.ORDER order,
+            final RANS4x8Encode ransEncode,
+            final RANS4x8Decode ransDecode) {
         super(BlockCompressionMethod.RANS);
-        this.rans = rans;
+        this.ransEncode = ransEncode;
+        this.ransDecode = ransDecode;
         this.order = order;
     }
 
     @Override
     public byte[] compress(final byte[] data) {
-        final ByteBuffer buffer = rans.compress(ByteBuffer.wrap(data), order);
+        final RANS4x8Params params = new RANS4x8Params(order);
+        final ByteBuffer buffer = ransEncode.compress(ByteBuffer.wrap(data), params);
         return toByteArray(buffer);
     }
 
     @Override
     public byte[] uncompress(byte[] data) {
-        final ByteBuffer buf = rans.uncompress(ByteBuffer.wrap(data));
+        final ByteBuffer buf = ransDecode.uncompress(ByteBuffer.wrap(data));
         return toByteArray(buf);
     }
 
-    public RANS.ORDER getOrder() { return order; }
-
     @Override
     public String toString() {
         return String.format("%s(%s)", this.getMethod(), order);
@@ -96,4 +108,4 @@ private byte[] toByteArray(final ByteBuffer buffer) {
         return bytes;
     }
 
-}
+}
@@ -24,25 +24,25 @@
  */
 package htsjdk.samtools.cram.compression.rans;
 
-final class ArithmeticDecoder {
-    final FC[] fc = new FC[256];
+final public class ArithmeticDecoder {
+    public final int[] frequencies = new int[Constants.NUMBER_OF_SYMBOLS];
 
-    // reverse lookup table ?
-    byte[] R = new byte[Constants.TOTFREQ];
+    // reverse lookup table
+    public final byte[] reverseLookup = new byte[Constants.TOTAL_FREQ];
 
     public ArithmeticDecoder() {
-        for (int i = 0; i < 256; i++) {
-            fc[i] = new FC();
+        for (int i = 0; i < Constants.NUMBER_OF_SYMBOLS; i++) {
+            frequencies[i] = 0;
         }
     }
 
     public void reset() {
-        for (int i = 0; i < 256; i++) {
-            fc[i].reset();
+        for (int i = 0; i < Constants.NUMBER_OF_SYMBOLS; i++) {
+            frequencies[i] = 0;
         }
-        for (int i = 0; i < Constants.TOTFREQ; i++) {
-            R[i] = 0;
+        for (int i = 0; i < Constants.TOTAL_FREQ; i++) {
+            reverseLookup[i] = 0;
         }
     }
 
-}
+}
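The renamed `reverseLookup` array has `TOTAL_FREQ` (4096) slots. In a conventional rANS decoder such a table maps every slot `state & (TOTAL_FREQ - 1)` to the symbol whose cumulative-frequency range contains it. The sketch below is a hypothetical illustration, not how the PR populates `ArithmeticDecoder`: the class and method names are invented, it assumes the frequencies are already normalized to sum to 4096, and it omits renormalization (reading more input once the state falls below the lower bound).

```java
import java.util.Arrays;

public class ReverseLookupSketch {
    static final int TOTAL_FREQ_SHIFT = 12;
    static final int TOTAL_FREQ = 1 << TOTAL_FREQ_SHIFT; // 4096, as in Constants

    // Slot f holds the symbol s with cumulative[s] <= f < cumulative[s] + frequencies[s].
    static byte[] buildReverseLookup(int[] frequencies) {
        byte[] reverseLookup = new byte[TOTAL_FREQ];
        int cumulative = 0;
        for (int symbol = 0; symbol < frequencies.length; symbol++) {
            int f = frequencies[symbol];
            if (f < 0 || cumulative + f > TOTAL_FREQ) {
                throw new IllegalArgumentException("frequencies must be non-negative and sum to " + TOTAL_FREQ);
            }
            Arrays.fill(reverseLookup, cumulative, cumulative + f, (byte) symbol);
            cumulative += f;
        }
        if (cumulative != TOTAL_FREQ) {
            throw new IllegalArgumentException("frequencies sum to " + cumulative + ", expected " + TOTAL_FREQ);
        }
        return reverseLookup;
    }

    // Start of each symbol's range within [0, TOTAL_FREQ).
    static int[] cumulativeFrequencies(int[] frequencies) {
        int[] cumulative = new int[frequencies.length];
        int sum = 0;
        for (int s = 0; s < frequencies.length; s++) {
            cumulative[s] = sum;
            sum += frequencies[s];
        }
        return cumulative;
    }

    // One rANS decode step without renormalization; returns {symbol, nextState}.
    static int[] decodeStep(int state, int[] frequencies, int[] cumulative, byte[] reverseLookup) {
        int slot = state & (TOTAL_FREQ - 1);
        int symbol = reverseLookup[slot] & 0xff;
        int nextState = frequencies[symbol] * (state >>> TOTAL_FREQ_SHIFT) + slot - cumulative[symbol];
        return new int[] {symbol, nextState};
    }

    public static void main(String[] args) {
        int[] freq = new int[256];
        freq['A'] = 2048;
        freq['C'] = 1024;
        freq['G'] = 1024;
        byte[] lookup = buildReverseLookup(freq);
        int[] step = decodeStep(5000, freq, cumulativeFrequencies(freq), lookup);
        System.out.println((char) step[0] + " " + step[1]); // A 2952
    }
}
```

The table trades 4 KB of memory for a constant-time symbol lookup per decoded byte, which is presumably why it is sized by `TOTAL_FREQ` rather than by the number of symbols.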
16 changes: 11 additions & 5 deletions src/main/java/htsjdk/samtools/cram/compression/rans/Constants.java
@@ -1,7 +1,13 @@
 package htsjdk.samtools.cram.compression.rans;
 
-final class Constants {
-    static final int TF_SHIFT = 12;
-    static final int TOTFREQ = (1 << TF_SHIFT); // 4096
-    static final int RANS_BYTE_L = 1 << 23;
-}
+final public class Constants {
+    public static final int TOTAL_FREQ_SHIFT = 12;
+    public static final int TOTAL_FREQ = (1 << TOTAL_FREQ_SHIFT); // 4096
+    public static final int NUMBER_OF_SYMBOLS = 256;
+    public static final int RANS_4x8_LOWER_BOUND = 1 << 23;
+    public static final int RANS_4x8_ORDER_BYTE_LENGTH = 1;
+    public static final int RANS_4x8_COMPRESSED_BYTE_LENGTH = 4;
+    public static final int RANS_4x8_RAW_BYTE_LENGTH = 4;
+    public static final int RANS_4x8_PREFIX_BYTE_LENGTH = RANS_4x8_ORDER_BYTE_LENGTH + RANS_4x8_COMPRESSED_BYTE_LENGTH + RANS_4x8_RAW_BYTE_LENGTH;
+    public static final int RANS_Nx16_LOWER_BOUND = 1 << 15;
+}
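The new `RANS_4x8_*_BYTE_LENGTH` constants add up to a 9-byte prefix, matching the "+ 9 for the header (order byte, and 2 int lengths for compressed/uncompressed lengths)" comment in `allocateOutputBuffer`. Below is a minimal sketch of writing and reading such a prefix with little-endian `ByteBuffer`s. The class, method names, and example values are hypothetical, and the field order (compressed length before raw length) is assumed from that comment and the order in which the constants are summed, not verified against the CRAM codec specification. It also restates the `allocateOutputBuffer` worst-case bound, using `Math.addExact` so an oversized input fails loudly instead of overflowing.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Rans4x8PrefixSketch {
    // Mirrors the byte lengths declared in Constants.
    static final int ORDER_BYTE_LENGTH = 1;
    static final int COMPRESSED_BYTE_LENGTH = 4;
    static final int RAW_BYTE_LENGTH = 4;
    static final int PREFIX_BYTE_LENGTH =
            ORDER_BYTE_LENGTH + COMPRESSED_BYTE_LENGTH + RAW_BYTE_LENGTH; // 9

    // Note: switches the caller's buffer to little-endian.
    static void writePrefix(ByteBuffer out, int order, int compressedLength, int rawLength) {
        out.order(ByteOrder.LITTLE_ENDIAN);
        out.put((byte) order);
        out.putInt(compressedLength);
        out.putInt(rawLength);
    }

    // Returns {order, compressedLength, rawLength}.
    static int[] readPrefix(ByteBuffer in) {
        in.order(ByteOrder.LITTLE_ENDIAN);
        int order = in.get() & 0xff;
        int compressedLength = in.getInt();
        int rawLength = in.getInt();
        return new int[] {order, compressedLength, rawLength};
    }

    // Same bound as allocateOutputBuffer: input + worst-case order-1 frequency table + prefix.
    static int worstCaseOutputSize(int inSize) {
        return Math.addExact(inSize, 257 * 257 * 3 + PREFIX_BYTE_LENGTH);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(PREFIX_BYTE_LENGTH);
        writePrefix(buf, 1, 0x1234, 100);
        System.out.println(buf.position()); // 9
        buf.flip();
        int[] prefix = readPrefix(buf);
        System.out.println(prefix[0] + " " + prefix[1] + " " + prefix[2]); // 1 4660 100
        System.out.println(worstCaseOutputSize(1000)); // 199156
    }
}
```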