The DTensor class is for manipulating data on a GPU. It manages the device memory and facilitates various algebraic operations. A tensor has three axes: [rows (m) x columns (n) x matrices (k)]. An (m,n,1)-tensor stores a matrix, and an (m,1,1)-tensor stores a vector.
We first need to choose a data type: float or double. We will use float in the following examples. The simplest way to create an empty DTensor object is by constructing a vector:
size_t n = 100;
DTensor<float> myTensor(n);
Important
This creates an n-dimensional vector as an (n,1,1)-tensor on the device.
A DTensor can be instantiated from host memory:
std::vector<float> h_a{4., -5., 6., 9., 8., 5., 9., -10.2, 9., 11.};
DTensor<float> myTensor(h_a, h_a.size());
std::cout << myTensor << "\n";
Caution
Printing a DTensor to std::cout will slow down your program (it requires the data to be downloaded from the device). Printing was designed for quick debugging.
We will often need to create slices (or shallow copies) of a DTensor given a range of indices along an axis. We can then do:
size_t axis = 0; // rows=0, cols=1, mats=2
size_t from = 3;
size_t to = 5;
DTensor<float> mySlice(myTensor, axis, from, to);
std::cout << mySlice << "\n";
Sometimes we need to reuse an already allocated DTensor by uploading new data from the host, using the method upload. Here is a short example:
std::vector<float> h_a{1., 2., 3.}; // host data a
DTensor<float> myVec(h_a, 3); // create vector in tensor on device
std::vector<float> h_b{4., -5., 6.}; // host data b
myVec.upload(h_b);
std::cout << myVec << "\n";
We can upload some host data to a particular position of a DTensor as follows:
std::vector<float> hostData{1., 2., 3.};
// here, `true` tells the constructor to set all allocated elements to zero
DTensor<float> x(7, 1, 1, true); // x = [0, 0, 0, 0, 0, 0, 0]'
DTensor<float> mySlice(x, 0, 3, 5);
mySlice.upload(hostData);
std::cout << x << "\n"; // x = [0, 0, 0, 1, 2, 3, 0]'
If necessary, the data can be downloaded from the device to the host using download.
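Here is a minimal sketch, assuming download fills a preallocated host std::vector (mirroring upload):
std::vector<float> h_src{10., 20., 30.};
DTensor<float> d_vec(h_src, 3); // device copy of h_src
std::vector<float> h_dst(3); // preallocated host destination
d_vec.download(h_dst); // h_dst = {10, 20, 30}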
Very often we will also need to copy data from an existing DTensor to another DTensor (without passing through the host). To do this we can use deviceCopyTo. Here is an example:
DTensor<float> x(10);
DTensor<float> y(10);
x.deviceCopyTo(y); // x ---> y (device memory to device memory)
The copy constructor has also been implemented; to hard-copy a DTensor just do DTensor<float> myCopy(existingTensor).
Lastly, a not-so-efficient method that should only be used for debugging, if at all, is the () operator (e.g., x(i, j, k)), which fetches one element of the DTensor to the host. This cannot be used to set a value, so don't do anything like x(0, 0, 0) = 4.5!
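For instance, reusing the (7,1,1)-tensor x from the slicing example above (debugging only):
float val = x(3, 0, 0); // fetches a single element from the device (slow)
std::cout << val << "\n"; // prints 1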
Caution
For the love of god, do not put this () operator in a loop.
The following scalar quantities can be computed (internally, we use cublas functions):
- .normF(): the Frobenius norm of a tensor $x$, using nrm2 (i.e., the 2-norm, or Euclidean norm, if $x$ is a vector)
- .sumAbs(): the sum of the absolute values of all the elements, using asum (i.e., the 1-norm if $x$ is a vector)
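For example, here is a minimal sketch (assuming both methods return a host-side scalar of the tensor's data type):
std::vector<float> h_v{3., -4.};
DTensor<float> v(h_v, h_v.size());
float frob = v.normF(); // = 5 (Euclidean norm of this vector)
float sabs = v.sumAbs(); // = 7 (1-norm of this vector)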
We can element-wise add DTensors on the device as follows:
std::vector<float> host_x{1., 2., 3., 4., 5., 6., 7.};
std::vector<float> host_y{1., 3., 5., 7., 9., 11., 13.};
DTensor<float> x(host_x, host_x.size());
DTensor<float> y(host_y, host_y.size());
x += y; // x = [2, 5, 8, 11, 14, 17, 20]'
std::cout << x << "\n";
To element-wise subtract y from x we can use x -= y. We can also scale a DTensor by a scalar with *= (e.g., x *= 5.0f). To negate the values of a DTensor we can do x *= -1.0f.
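For instance, reusing x and y from above:
x -= y; // element-wise subtraction
x *= 5.0f; // scale all elements by 5
x *= -1.0f; // negate all elements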
We can also compute the inner product (as a (1,1,1)-tensor) of two vectors as follows:
std::vector<float> host_x{1., 2., 3., 4., 5., 6., 7.};
std::vector<float> host_y{1., 3., 5., 7., 9., 11., 13.};
DTensor<float> xtr(host_x, 1, host_x.size()); // row vector, a (1,7,1)-tensor
DTensor<float> y(host_y, host_y.size()); // column vector, a (7,1,1)-tensor
DTensor<float> innerProduct = xtr * y;
If necessary, we can also use the following element-wise operations:
DTensor<float> x(host_x, host_x.size()); // column vector, like y
auto sum = x + y;
auto diff = x - y;
auto scaledX = 3.0f * x;
To store a matrix in a DTensor we need to provide the data in an array; we can use either column-major (default) or row-major format.
Suppose we need to store the matrix
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \\ 13 & 14 & 15 \end{bmatrix},$$
where this data is stored in row-major format. Then, we do
size_t rows = 5;
size_t cols = 3;
std::vector<float> h_data{1.0f, 2.0f, 3.0f,
4.0f, 5.0f, 6.0f,
7.0f, 8.0f, 9.0f,
10.0f, 11.0f, 12.0f,
13.0f, 14.0f, 15.0f};
DTensor<float> myTensor(h_data, rows, cols, 1, rowMajor);
Choose rowMajor or columnMajor as appropriate.
We can also preallocate memory for a DTensor as follows:
DTensor<float> a(rows, cols, 1);
Then, we can upload the data as follows:
a.upload(h_data, rowMajor);
The number of rows and columns of a DTensor can be retrieved using the methods .numRows() and .numCols(), respectively.
The operators += and -= are supported for device matrices.
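For example, a minimal sketch with two 2-by-2 matrices (the data are illustrative):
std::vector<float> cData{1.0f, 2.0f, 3.0f, 4.0f};
std::vector<float> dData{5.0f, 6.0f, 7.0f, 8.0f};
DTensor<float> C(cData, 2, 2, 1);
DTensor<float> D(dData, 2, 2, 1);
C += D; // element-wise matrix addition
C -= D; // element-wise matrix subtraction (back to the original C)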
Matrix-matrix multiplication is as simple as:
size_t m = 2, k = 3, n = 5;
std::vector<float> aData{1.0f, 2.0f, 3.0f,
4.0f, 5.0f, 6.0f};
std::vector<float> bData{1.0f, 2.0f, 3.0f, 4.0f, 5.0f,
6.0f, 7.0f, 8.0f, 9.0f, 10.0f,
11.0f, 12.0f, 13.0f, 14.0f, 15.0f};
DTensor<float> A(aData, m, k, 1, rowMajor);
DTensor<float> B(bData, k, n, 1, rowMajor);
auto X = A * B;
std::cout << A << B << X << "\n";
As you would expect, all operations mentioned so far are supported by actual tensors as batched operations (that is, (m,n)-matrix-wise).
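For example, here is a sketch of a batched multiplication, assuming * multiplies the matrices at matching k-indices:
std::vector<float> eData{1.0f, 2.0f, 3.0f, 4.0f,
                         5.0f, 6.0f, 7.0f, 8.0f}; // two 2-by-2 matrices
DTensor<float> E(eData, 2, 2, 2);
DTensor<float> F(eData, 2, 2, 2);
auto G = E * F; // G stores E1*F1 and E2*F2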
Also, we can create the transpose of a DTensor using .tr(). This transposes each (m,n)-matrix and stores it in a new DTensor at the same k-index. In-place transposition is not possible.
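For instance, reusing the (2,3,1)-tensor A from the multiplication example above:
auto At = A.tr(); // a new (3,2,1)-tensor holding the transpose of A
std::cout << At << "\n";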
The solution of least-squares problems has been implemented as a tensor method. Say we want to solve A\b using least squares. We first create
size_t m = 4;
size_t n = 3;
std::vector<float> aData{1.0f, 2.0f, 4.0f,
2.0f, 13.0f, 23.0f,
4.0f, 23.0f, 77.0f,
6.0f, 7.0f, 8.0f};
std::vector<float> bData{1.0f, 2.0f, 3.0f, 4.0f};
DTensor<float> A(aData, m, n, 1, rowMajor);
DTensor<float> B(bData, m);
Then, we can solve the system by
A.leastSquaresBatched(B);
The DTensor B will be overwritten with the solution.
Important
This particular example demonstrates how the solution may overwrite only part of the given B, as B is a (4,1,1)-tensor and the solution is a (3,1,1)-tensor.
Tensor data can be stored in simple text files or binary files. The text-based format has the following structure:
number_of_rows
number_of_columns
number_of_matrices
data (one entry per line)
To save a tensor in a file, simply call DTensor::saveToFile(filename). If the file extension is .bt (binary tensor), the data will be stored in binary format.
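For instance (the file names are illustrative):
DTensor<float> t(2, 3, 1, true); // 2-by-3 zero matrix
t.saveToFile("myTensor.txt"); // text format
t.saveToFile("myTensor.bt"); // binary format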
The structure of the binary encoding is similar to that of the text encoding: the first three uint64_t-sized positions correspond to the number of rows, columns and matrices, followed by the elements of the tensor.
To load a tensor from a file, the static function DTensor<T>::parseFromFile(filename) can be used. For example:
auto z = DTensor<double>::parseFromFile("path/to/my.dtensor");
If necessary, you can provide a second argument to parseFromFile to specify the order in which the data are stored (the StorageMode).
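For instance, a sketch assuming the same rowMajor/columnMajor values used earlier:
auto z = DTensor<double>::parseFromFile("path/to/my.dtensor", rowMajor);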
Soon we will release a Python API for reading and serialising (numpy) arrays to .bt files.
Warning
This factorisation only works with positive-definite matrices.
Here is an example with the positive-definite matrix
$$A = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 13 & 23 \\ 4 & 23 & 77 \end{bmatrix}.$$
This is how to perform a Cholesky factorisation:
size_t n = 3;
std::vector<float> aData{1.0f, 2.0f, 4.0f,
2.0f, 13.0f, 23.0f,
4.0f, 23.0f, 77.0f};
DTensor<float> A(aData, n, n, 1, rowMajor);
CholeskyFactoriser<float> cfEngine(A);
auto status = cfEngine.factorise();
Then, you can solve the system A\b:
std::vector<float> bData{1.0f, 2.0f, 3.0f};
DTensor<float> B(bData, n);
cfEngine.solve(B);
The DTensor B will be overwritten with the solution.
Warning
This implementation only works with square or tall matrices.
Here is an example with the 4-by-3 matrix
$$B = \begin{bmatrix} 1 & 2 & 3 \\ 6 & 7 & 8 \\ 6 & 7 & 8 \\ 6 & 7 & 8 \end{bmatrix}.$$
Evidently, the rank of B is 2.
This is how to perform an SVD decomposition:
size_t m = 4;
size_t n = 3;
std::vector<float> bData{1.0f, 2.0f, 3.0f,
6.0f, 7.0f, 8.0f,
6.0f, 7.0f, 8.0f,
6.0f, 7.0f, 8.0f};
DTensor<float> B(bData, m, n, 1, rowMajor);
SvdFactoriser<float> svdEngine(B);
auto status = svdEngine.factorise();
By default, SvdFactoriser will not compute matrix U. If you need U, construct your SvdFactoriser as follows:
SvdFactoriser<float> svdEngine(B, true); // computes U
Note that the default behaviour of .factorise() is to destroy the given matrix; to prevent this, set the third argument of the constructor to false.
After you have factorised the matrix, you can access the singular values and the right singular vectors:
std::cout << "S = " << svdEngine.singularValues() << "\n";
std::cout << "V' = " << svdEngine.rightSingularVectors() << "\n";
Note that .leftSingularVectors() returns an object of type std::optional<DeviceMatrix<TElement>>.
Here is an example:
auto U = svdEngine.leftSingularVectors();
if (U) std::cout << "U = " << U.value();
The nullspace of a matrix is computed by SVD. The user provides a DTensor made of (padded) matrices. Then, Nullspace computes, possibly pads, and returns the nullspace matrices N = (N1, ..., Nk) in another DTensor.
DTensor<float> paddedMatrices(m, n, k);
Nullspace N(paddedMatrices); // computes N and NN'
DTensor<float> ns = N.nullspace(); // returns N
Each padded nullspace matrix Ni is orthogonal, and Nullspace further computes and stores the nullspace projection operators NN' = (N1N1', ..., NkNk'). This allows the user to project-in-place onto the nullspace:
DTensor<float> vectors(m, 1, k);
N.project(vectors);
std::cout << vectors << "\n";