Added functionality to write tuples as custom dtypes #63

1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
.ccls-cache
6 changes: 5 additions & 1 deletion CMakeLists.txt
@@ -5,7 +5,7 @@ endif(COMMAND cmake_policy)

project(CNPY)

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++17")

option(ENABLE_STATIC "Build static (.a) library" ON)

@@ -23,6 +23,10 @@ if(ENABLE_STATIC)
install(TARGETS "cnpy-static" ARCHIVE DESTINATION lib)
endif(ENABLE_STATIC)

configure_file("${PROJECT_SOURCE_DIR}/cmake/cnpy-config.cmake.in" "${CMAKE_CURRENT_BINARY_DIR}/cnpy-config.cmake" @ONLY IMMEDIATE)
install(FILES "${PROJECT_BINARY_DIR}/cnpy-config.cmake" DESTINATION "lib/cmake/cnpy-${PROJECT_VERSION}")


install(FILES "cnpy.h" DESTINATION include)
install(FILES "mat2npz" "npy2mat" "npz2mat" DESTINATION bin PERMISSIONS OWNER_READ OWNER_WRITE OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE)

18 changes: 9 additions & 9 deletions README.md
@@ -1,19 +1,19 @@
# Purpose:

NumPy offers the `save` method for easy saving of arrays into .npy and `savez` for zipping multiple .npy arrays together into a .npz file.
NumPy offers the `save` method for easy saving of arrays into .npy and `savez` for zipping multiple .npy arrays together into a .npz file.

`cnpy` lets you read and write to these formats in C++.
`cnpy` lets you read and write to these formats in C++.

The motivation comes from scientific programming where large amounts of data are generated in C++ and analyzed in Python.

Writing to .npy has the advantage of using low-level C++ I/O (fread and fwrite) for speed and binary format for size.
Writing to .npy has the advantage of using low-level C++ I/O (fread and fwrite) for speed and binary format for size.
The .npy file header takes care of specifying the size, shape, and data type of the array, so specifying the format of the data is unnecessary.
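
To make the role of the header concrete, here is a minimal sketch of how a version 1.0 `.npy` header could be assembled. `make_npy_header` is a hypothetical helper written for illustration and is not part of `cnpy`; it assumes a simple (non-structured) dtype descriptor such as `<f8`, C ordering, and padding to a multiple of 64 bytes as described in the NumPy format documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of cnpy: builds a version 1.0 .npy header
// for a C-ordered array from a simple dtype descriptor and a shape.
std::string make_npy_header(const std::string& descr,
                            const std::vector<std::size_t>& shape) {
    // The header body is a Python dict literal.
    std::string dict = "{'descr': '" + descr + "', 'fortran_order': False, 'shape': (";
    for (std::size_t i = 0; i < shape.size(); ++i) {
        if (i > 0) dict += ", ";
        dict += std::to_string(shape[i]);
    }
    if (shape.size() == 1) dict += ",";  // one-element tuples need a trailing comma
    dict += "), }";

    // Pad with spaces so that preamble (10 bytes) + dict + '\n' is a multiple of 64.
    const std::size_t preamble_len = 10;
    while ((preamble_len + dict.size() + 1) % 64 != 0) dict += ' ';
    dict += '\n';
    assert(dict.size() <= 0xFFFF);  // version 1.0 stores the length in 16 bits

    std::string header;
    header += static_cast<char>(0x93);  // magic byte
    header += "NUMPY";
    header += static_cast<char>(1);     // major version
    header += static_cast<char>(0);     // minor version
    header += static_cast<char>(dict.size() & 0xFF);         // length, low byte
    header += static_cast<char>((dict.size() >> 8) & 0xFF);  // length, high byte
    header += dict;
    return header;
}
```

Calling `make_npy_header("<f8", {3, 4})` yields a header whose dict contains `'shape': (3, 4)`; the raw array bytes would follow immediately after it in the file.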

Loading data written in numpy formats into C++ is equally simple, but requires you to type-cast the loaded data to the type of your choice.

# Installation:

Default installation directory is /usr/local.
Default installation directory is /usr/local.
To specify a different directory, add `-DCMAKE_INSTALL_PREFIX=/path/to/install/dir` to the cmake invocation in step 4.

1. get [cmake](www.cmake.org)
@@ -28,20 +28,20 @@ To specify a different directory, add `-DCMAKE_INSTALL_PREFIX=/path/to/install/d
To use, `#include"cnpy.h"` in your source code. Compile the source code mycode.cpp as

```bash
g++ -o mycode mycode.cpp -L/path/to/install/dir -lcnpy -lz --std=c++11
g++ -o mycode mycode.cpp -L/path/to/install/dir -lcnpy -lz --std=c++17
```

# Description:

There are two functions for writing data: `npy_save` and `npz_save`.

There are 3 functions for reading:
- `npy_load` will load a .npy file.
- `npz_load(fname)` will load a .npz and return a dictionary of NpyArray structures.
- `npy_load` will load a .npy file.
- `npz_load(fname)` will load a .npz and return a dictionary of NpyArray structures.
- `npz_load(fname,varname)` will load and return the NpyArray for data varname from the specified .npz file.

The data structure for loaded data is below.
Data is accessed via the `data<T>()`-method, which returns a pointer of the specified type (which must match the underlying datatype of the data).
The data structure for loaded data is below.
Data is accessed via the `data<T>()`-method, which returns a pointer of the specified type (which must match the underlying datatype of the data).
The array shape and word size are read from the npy header.

```c++
Expand Down
2 changes: 2 additions & 0 deletions cmake/cnpy-config.cmake.in
@@ -0,0 +1,2 @@
set(CNPY_INCLUDE_DIRS @CMAKE_INSTALL_PREFIX@/include)
set(CNPY_LIBRARIES -L@CMAKE_INSTALL_PREFIX@/lib cnpy)
81 changes: 69 additions & 12 deletions cnpy.cpp
@@ -87,7 +87,7 @@ void cnpy::parse_npy_header(unsigned char* buffer,size_t& word_size, std::vector
}

//endian, word size, data type
//byte order code | stands for not applicable.
//byte order code | stands for not applicable.
//not sure when this applies except for byte array
loc1 = header.find("descr")+9;
bool littleEndian = (header[loc1] == '<' || header[loc1] == '|' ? true : false);
@@ -101,9 +101,9 @@ void cnpy::parse_npy_header(unsigned char* buffer,size_t& word_size, std::vector
word_size = atoi(str_ws.substr(0,loc2).c_str());
}

void cnpy::parse_npy_header(FILE* fp, size_t& word_size, std::vector<size_t>& shape, bool& fortran_order) {
void cnpy::parse_npy_header(FILE* fp, size_t& word_size, std::vector<size_t>& shape, bool& fortran_order) {
char buffer[256];
size_t res = fread(buffer,sizeof(char),11,fp);
size_t res = fread(buffer,sizeof(char),11,fp);
if(res != 11)
throw std::runtime_error("parse_npy_header: failed fread");
std::string header = fgets(buffer,256,fp);
@@ -135,7 +135,7 @@ void cnpy::parse_npy_header(FILE* fp, size_t& word_size, std::vector<size_t>& sh
}

//endian, word size, data type
//byte order code | stands for not applicable.
//byte order code | stands for not applicable.
//not sure when this applies except for byte array
loc1 = header.find("descr");
if (loc1 == std::string::npos)
@@ -152,6 +152,66 @@ void cnpy::parse_npy_header(FILE* fp, size_t& word_size, std::vector<size_t>& sh
word_size = atoi(str_ws.substr(0,loc2).c_str());
}
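
The existing overloads above reduce a simple `descr` value such as `<f8` to a byte order and a word size. A minimal standalone sketch of that step follows; `DescrInfo` and `parse_simple_descr` are hypothetical names introduced for illustration, and the sketch mirrors the diff's convention of treating both `<` and `|` as little-endian.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result type for illustration; cnpy itself only reports word_size.
struct DescrInfo {
    bool little_endian;     // true for '<' and for '|' (byte order not applicable)
    char type_code;         // e.g. 'f', 'i', 'u'
    std::size_t word_size;  // element size in bytes
};

// Parses a simple descriptor of the form <order><type><size>, e.g. "<f8".
DescrInfo parse_simple_descr(const std::string& descr) {
    if (descr.size() < 3)
        throw std::invalid_argument("descr too short: " + descr);

    const char order = descr[0];
    if (order != '<' && order != '>' && order != '|')
        throw std::invalid_argument("unknown byte order in descr: " + descr);

    // Accumulate the decimal word size, rejecting any non-digit character.
    std::size_t word_size = 0;
    for (std::size_t i = 2; i < descr.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(descr[i])))
            throw std::invalid_argument("non-numeric word size in descr: " + descr);
        word_size = word_size * 10 + static_cast<std::size_t>(descr[i] - '0');
    }

    return DescrInfo{order == '<' || order == '|', descr[1], word_size};
}
```

Unlike `atoi`, which silently returns 0 on malformed input, this sketch throws, so a corrupt header is reported rather than producing a zero word size.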

void cnpy::parse_npy_header(FILE* fp, std::vector<char> dtype_descr, std::vector<size_t>& shape, bool& fortran_order) {
char buffer[256];
size_t res = fread(buffer,sizeof(char),11,fp);
if(res != 11)
throw std::runtime_error("parse_npy_header: failed fread");
std::string header = fgets(buffer,256,fp);
assert(header[header.size()-1] == '\n');

size_t loc1, loc2;

//fortran order
loc1 = header.find("fortran_order");
if (loc1 == std::string::npos)
throw std::runtime_error("parse_npy_header: failed to find header keyword: 'fortran_order'");
loc1 += 16;
fortran_order = (header.substr(loc1,4) == "True" ? true : false);

//shape
size_t loc = header.find("]");
if(loc == std::string::npos)
throw std::runtime_error("parse_npy_header: failed to find header keyword: ']' signalling end of dtype descriptor");
loc1 = header.find("(",loc);
loc2 = header.find(")",loc);
if (loc1 == std::string::npos || loc2 == std::string::npos)
throw std::runtime_error("parse_npy_header: failed to find header keyword: '(' or ')'");

std::regex num_regex("[0-9][0-9]*");
std::smatch sm;
shape.clear();

std::string str_shape = header.substr(loc1+1,loc2-loc1-1);
while(std::regex_search(str_shape, sm, num_regex)) {
shape.push_back(std::stoi(sm[0].str()));
str_shape = sm.suffix().str();
}

// Only enforces matching dtypes
loc1 = header.find("[");
loc2 = loc;
if (loc1 == std::string::npos || loc2 == std::string::npos)
throw std::runtime_error("parse_npy_header: failed to find header keyword: '[' or ']'"); // Find bounds of dtype

std::string descr = header.substr(loc1,loc2-loc1+1);
int offset_in = 0;
for(int i = 0;i != static_cast<int>(descr.size());i++){ // Check that the found and provided dtypes match, ignoring spaces
if(i+offset_in >= static_cast<int>(dtype_descr.size())) // Provided dtype ended before the one found in the file
throw std::runtime_error("parse_npy_header: provided dtype descriptor is shorter than the one in the .npy file");
if(dtype_descr[i+offset_in] == ' '){ // Skip a space in the provided dtype
offset_in++;
i--;
continue;
}
if(descr[i] == ' '){ // Skip a space in the found dtype
offset_in--;
continue;
}
if(descr[i] != dtype_descr[i+offset_in])
throw std::runtime_error("Wrong dtype of .npy file");
// No error is raised for compatible datatypes (e.g. char and int are both <i4 in this framework)
}
}
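
The comparison loop in the new overload checks the two descriptors character by character while skipping spaces. One possible alternative, sketched here under the assumption that both descriptors are available as strings, is to strip spaces from each and compare the results. `strip_spaces` and `dtype_descr_matches` are hypothetical helpers, not part of this PR. Note one deliberate difference: this sketch requires the two descriptors to be equal in full, so a provided descriptor that is longer or shorter than the one in the file counts as a mismatch.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical helper: returns a copy of s with every space character removed.
std::string strip_spaces(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    std::copy_if(s.begin(), s.end(), std::back_inserter(out),
                 [](char c) { return c != ' '; });
    return out;
}

// Hypothetical helper: true when the two dtype descriptors are identical
// once spaces are ignored. No index arithmetic, so no out-of-bounds reads.
bool dtype_descr_matches(const std::string& expected, const std::string& found) {
    return strip_spaces(expected) == strip_spaces(found);
}
```

With this approach, `[('a', '<i4'), ('b', '<f8')]` and `[('a','<i4'),('b','<f8')]` compare equal, while descriptors that differ in any field name or type code do not. The cost is two temporary strings per call, which is negligible next to the file I/O.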

void cnpy::parse_zip_footer(FILE* fp, uint16_t& nrecs, size_t& global_header_size, size_t& global_header_offset)
{
std::vector<char> footer(22);
@@ -234,7 +294,7 @@ cnpy::npz_t cnpy::npz_load(std::string fname) {
throw std::runtime_error("npz_load: Error! Unable to open file "+fname+"!");
}

cnpy::npz_t arrays;
cnpy::npz_t arrays;

while(1) {
std::vector<char> local_header(30);
@@ -252,7 +312,7 @@ cnpy::npz_t cnpy::npz_load(std::string fname) {
if(vname_res != name_len)
throw std::runtime_error("npz_load: failed fread");

//erase the lagging .npy
//erase the lagging .npy
varname.erase(varname.end()-4,varname.end());

//read in the extra field
@@ -273,7 +333,7 @@ cnpy::npz_t cnpy::npz_load(std::string fname) {
}

fclose(fp);
return arrays;
return arrays;
}

cnpy::NpyArray cnpy::npz_load(std::string fname, std::string varname) {
@@ -293,15 +353,15 @@ cnpy::NpyArray cnpy::npz_load(std::string fname, std::string varname) {
//read in the variable name
uint16_t name_len = *(uint16_t*) &local_header[26];
std::string vname(name_len,' ');
size_t vname_res = fread(&vname[0],sizeof(char),name_len,fp);
size_t vname_res = fread(&vname[0],sizeof(char),name_len,fp);
if(vname_res != name_len)
throw std::runtime_error("npz_load: failed fread");
vname.erase(vname.end()-4,vname.end()); //erase the lagging .npy

//read in the extra field
uint16_t extra_field_len = *(uint16_t*) &local_header[28];
fseek(fp,extra_field_len,SEEK_CUR); //skip past the extra field

uint16_t compr_method = *reinterpret_cast<uint16_t*>(&local_header[0]+8);
uint32_t compr_bytes = *reinterpret_cast<uint32_t*>(&local_header[0]+18);
uint32_t uncompr_bytes = *reinterpret_cast<uint32_t*>(&local_header[0]+22);
@@ -335,6 +395,3 @@ cnpy::NpyArray cnpy::npy_load(std::string fname) {
fclose(fp);
return arr;
}
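
The `npz_load` hunks above read fields of the 30-byte zip local header by casting into a `char` buffer at offsets 8, 18, 22, 26 and 28. As a sketch of a more portable way to do the same reads, the hypothetical helpers below (`read_le16` and `read_le32`, not part of cnpy) assemble the little-endian values byte by byte. This avoids unaligned access, does not depend on the host byte order, and checks bounds.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reads a little-endian 16-bit field at byte offset off.
std::uint16_t read_le16(const std::vector<char>& buf, std::size_t off) {
    if (off + 2 > buf.size())
        throw std::out_of_range("read_le16: offset past end of buffer");
    const auto b0 = static_cast<unsigned char>(buf[off]);
    const auto b1 = static_cast<unsigned char>(buf[off + 1]);
    return static_cast<std::uint16_t>(b0 | (b1 << 8));
}

// Hypothetical helper: reads a little-endian 32-bit field at byte offset off.
std::uint32_t read_le32(const std::vector<char>& buf, std::size_t off) {
    if (off + 4 > buf.size())
        throw std::out_of_range("read_le32: offset past end of buffer");
    std::uint32_t value = 0;
    // Walk from the most significant byte down so each shift moves earlier bytes up.
    for (int i = 3; i >= 0; --i)
        value = (value << 8) | static_cast<unsigned char>(buf[off + static_cast<std::size_t>(i)]);
    return value;
}
```

With these helpers, the compression method would be read as `read_le16(local_header, 8)` and the compressed size as `read_le32(local_header, 18)`, using the same offsets as the code in the diff.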


