Element-wise BLAS APIs & new Tensor for Python: ⬆️ 450 kernels #220

Open · wants to merge 68 commits into main

Conversation

@ashvardanian (Owner) commented Oct 31, 2024

It started as a straightforward optimization request from the @albumentations-team: improve the special case of the wsum (Weighted Sum) operation for the "non-weighted" scenario, and add APIs for scalar multiplication and addition. This update introduces two new public APIs in both C and Python (a usage sketch follows the list below):

  1. scale: Implements $\alpha * A_i + \beta$
  2. sum: Computes $A_i + B_i$
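
A minimal usage sketch, assuming the operations are exposed in Python as `simd.scale` and `simd.sum` with `alpha`/`beta` keyword arguments; the exact signatures and return types are assumptions, not confirmed API:

```python
import numpy as np
import simsimd as simd

a = np.random.rand(1_000).astype(np.float32)
b = np.random.rand(1_000).astype(np.float32)

scaled = simd.scale(a, alpha=2.0, beta=0.5)  # alpha * a[i] + beta (assumed keywords)
summed = simd.sum(a, b)                      # a[i] + b[i]

# The results should match the NumPy equivalents up to rounding error
assert np.allclose(scaled, 2.0 * a + 0.5, atol=1e-6)
assert np.allclose(summed, a + b, atol=1e-6)
```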

Recognizing the value of consistency with widely used libraries, we’ve also added "aliases" aligned with the names familiar to developers using NumPy and OpenCV for element-wise addition and multiplication across vectors and scalars (see the sketch after the table):

| NumPy | OpenCV | SimSIMD |
| --- | --- | --- |
| `np.add` | `cv.add` | `simd.add` |
| `np.multiply` | `cv.multiply` | `simd.multiply` |
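
A brief sketch of the intended drop-in usage, assuming the aliases accept NumPy arrays of identical types and return array-like results:

```python
import numpy as np
import simsimd as simd

a = np.arange(8, dtype=np.float32)
b = np.full(8, 2.0, dtype=np.float32)

# For same-typed floating-point inputs, the aliases should mirror their NumPy counterparts
assert np.allclose(simd.add(a, b), np.add(a, b))
assert np.allclose(simd.multiply(a, b), np.multiply(a, b))
```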

Note: SimSIMD and NumPy differ in how they handle certain corner cases. SimSIMD offers broader support: up to 64 tensor dimensions (compared to NumPy's 32), wider compatibility across Python versions, operating systems, hardware, and numeric types, and of course greater speed. However, SimSIMD requires input vectors to be of identical types. For integers, it also supports saturation to prevent overflow and underflow, which can simplify debugging but may be unexpected for some developers.
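
A sketch of the integer saturation difference described above; the saturating behavior is taken from this description, and `simd.add` is assumed to accept `uint8` NumPy arrays:

```python
import numpy as np
import simsimd as simd

a = np.array([250, 250], dtype=np.uint8)
b = np.array([10, 10], dtype=np.uint8)

print(np.add(a, b))    # NumPy wraps around modulo 256: [4 4]
print(simd.add(a, b))  # SimSIMD saturates at the type's maximum: [255 255] (per the note above)
```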

The real excitement came when we realized that larger projects would take time to adopt emerging numeric types like bfloat16 and float8, which are well-known in AI circles. To bridge this gap, SimSIMD now introduces an AnyTensor type designed for maximum interoperability via CPython's Buffer Protocol and beyond, setting it apart from similar types in NumPy, PyTorch, TensorFlow, and JAX.
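
A hedged sketch of that Buffer-Protocol interoperability, using only the standard library's `array.array` as input; whether the result also exports a buffer is an assumption:

```python
from array import array
import simsimd as simd

# Plain CPython buffers, no NumPy required
a = array("f", [1.0, 2.0, 3.0])
b = array("f", [4.0, 5.0, 6.0])

result = simd.add(a, b)             # any Buffer-Protocol object should be accepted
print(memoryview(result).tolist())  # assumes the returned object exposes a buffer too
```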

- Tensor Class for C, Python, and Rust 🦀
- Element-wise Operations 🧮
- Geospatial Operations 🛰️

If you have any feedback regarding the limitations of current array-processing software in single- or multi-node AI training settings, I am all ears 👂

SimSIMD becomes more similar to BLAS with every commit! New operations are:

- Element-wise Sum: `a[i] + b[i]`
- Scale & Shift: `alpha * a[i] + beta`

Those are similar to `axpy` and `scal` in BLAS.
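
For comparison, a rough NumPy rendering of the new kernels next to their BLAS relatives (`axpy` computes `y ← alpha * x + y`, `scal` computes `x ← alpha * x`); this is purely illustrative, not SimSIMD code:

```python
import numpy as np

alpha, beta = 2.0, 0.5
a = np.random.rand(16).astype(np.float32)
b = np.random.rand(16).astype(np.float32)

sum_result = a + b               # element-wise sum: a[i] + b[i]
scale_result = alpha * a + beta  # scale & shift:    alpha * a[i] + beta

axpy_like = alpha * a + b        # BLAS axpy: b <- alpha * a + b
scal_like = alpha * a            # BLAS scal: a <- alpha * a
```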
@ashvardanian changed the title from "Element-wise BLAS-like APIs" to "Element-wise BLASAPIs & new Tensor for Python" on Nov 1, 2024
@ashvardanian changed the title from "Element-wise BLASAPIs & new Tensor for Python" to "Element-wise BLAS APIs & new Tensor for Python" on Nov 1, 2024
ashvardanian and others added 8 commits November 1, 2024 23:11
Without defining the executables as tests, automatic tools like "ctest" will not find the tests.
New classes are added to the Python SDK: public NDArray and NDIndex, and internal BufferOrScalarArgument. Those can be used for high-dimensional tensor processing with up to 64 dimensions, as opposed to 32 in NumPy.

The new interface handles mixed precision much better, allowing the type spec of every tensor to be overridden individually. That feature is in preview until the actual implementation lands in subsequent commits.

No LibC is needed anymore for rounding floats to 8-bit integers in element-wise ops. New 16-, 32-, and 64-bit integer element-wise kernels are added for compatibility with NumPy. Serial for now.

The SimSIMD implementation is expected to be more efficient when at least a few contiguous dimensions are present.