docs: add vector type documentation
Add missing vector type documentation including: definition of vector,
adjustment of term definition, JSON encoding, Lua and cql3 type
mapping, vector dimension limit, and keyword specification.
QuerthDP authored and Jadw1 committed Jan 24, 2025
1 parent 2111b36 commit ec7ae2e
Showing 9 changed files with 54 additions and 8 deletions.
2 changes: 2 additions & 0 deletions docs/architecture/sstable/sstable3/sstables-3-statistics.rst
@@ -293,6 +293,8 @@ User Type Yes
UTF8 Type No
----------------------- ------------
UUID Type No
----------------------- ------------
Vector Type Yes
======================= ============


2 changes: 2 additions & 0 deletions docs/cql/appendices.rst
@@ -279,6 +279,8 @@ or not.
+--------------------+-------------+
| ``VARINT`` | no |
+--------------------+-------------+
| ``VECTOR`` | no |
+--------------------+-------------+
| ``WHERE`` | yes |
+--------------------+-------------+
| ``WITH`` | yes |
4 changes: 2 additions & 2 deletions docs/cql/definitions.rst
@@ -145,8 +145,8 @@ CQL has the notion of a *term*, which denotes the kind of values that CQL suppor
A term is thus one of:

- A :ref:`constant <constants>`.
- A literal for either :ref:`a collection <collections>`, a user-defined type or a tuple
(see the linked sections for details).
- A literal for :ref:`a collection <collections>` (including a ``list_or_vector_literal`` for
  :ref:`a vector <vectors>`), a user-defined type, or a tuple (see the linked sections for details).
- An arithmetic operation between terms.
- A *type hint*
- A bind marker, which denotes a variable to be bound at execution time. See the section on :ref:`prepared-statements`
7 changes: 4 additions & 3 deletions docs/cql/json.rst
@@ -62,9 +62,9 @@ JSON Encoding of ScyllaDB Data Types

Where possible, ScyllaDB will represent and accept data types in their native ``JSON`` representation. ScyllaDB will
also accept string representations matching the CQL literal format for all single-field types. For example, floats,
ints, UUIDs, and dates can be represented by CQL literal strings. However, compound types, such as collections, tuples,
and user-defined types, must be represented by native ``JSON`` collections (maps and lists) or a JSON-encoded string
representation of the collection.
ints, UUIDs, and dates can be represented by CQL literal strings. However, compound types, such as collections, tuples,
vectors, and user-defined types, must be represented by native ``JSON`` collections (maps and lists) or a JSON-encoded
string representation of the collection.

The following table describes the encodings that ScyllaDB will accept in ``INSERT JSON`` values (and ``fromJson()``
arguments) as well as the format ScyllaDB will use when returning data for ``SELECT JSON`` statements (and
@@ -101,6 +101,7 @@ arguments) as well as the format ScyllaDB will use when returning data for ``SEL
``varchar`` string string Uses JSON's ``\u`` character escape
``varint`` integer, string integer Variable length; may overflow 32 or 64 bit integers in
client-side decoder
``vector`` list, string list Uses JSON's native list representation
=============== ======================== =============== ==============================================================

The fromJson() Function
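The JSON encodings in the table above can be illustrated with a short client-side sketch. It uses only Python's standard ``json`` module; the helpers ``insert_json_statement`` and ``vector_as_json_string`` are hypothetical, and the ``vectors`` table is assumed to be the ``vector<float, 3>`` example from ``types.rst``. It builds ``INSERT JSON`` statements with both input forms the table lists for ``vector``: a native JSON list and a JSON-encoded string.

```python
import json


def vector_as_json_string(values: list) -> str:
    """JSON-encoded string form of a vector, e.g. "[0.5, 1.0]"."""
    return json.dumps(values)


def insert_json_statement(table: str, row: dict) -> str:
    """Build an INSERT JSON statement for a row (hypothetical helper).

    Python lists are serialized as native JSON lists, which is the
    representation listed for vector columns.
    """
    payload = json.dumps(row)
    # Inside a CQL string literal, a single quote is escaped by doubling it.
    escaped = payload.replace("'", "''")
    return f"INSERT INTO {table} JSON '{escaped}';"


# Native JSON list form.
print(insert_json_statement("vectors", {"id": 180503, "gene_expr": [0.2228, 0.2112, 0.2024]}))
# JSON-encoded string form, also accepted on input.
print(insert_json_statement(
    "vectors", {"id": 180503, "gene_expr": vector_as_json_string([0.2228, 0.2112, 0.2024])}))
```

Per the table, ``SELECT JSON`` returns vectors only in the native list form; the string form is an input-side convenience.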
1 change: 1 addition & 0 deletions docs/cql/non-reserved-keywords.rst
@@ -74,6 +74,7 @@ Non-reserved keywords only have meaning in their particular area of context and
* VALUES
* VARCHAR
* VARINT
* VECTOR
* WRITETIME

.. include:: /rst_include/apache-copyrights.rst
39 changes: 36 additions & 3 deletions docs/cql/types.rst
@@ -13,7 +13,7 @@ CQL is a typed language and supports a rich set of data types, including :ref:`n

.. code-block::

   cql_type: `native_type` | `collection_type` | `user_defined_type` | `tuple_type`
   cql_type: `native_type` | `collection_type` | `user_defined_type` | `tuple_type` | `vector_type`
@@ -252,16 +252,18 @@ and their values can be input using collection literals:

.. code-block:: cql

   collection_literal: `map_literal` | `set_literal` | `list_literal`
   collection_literal: `map_literal` | `set_literal` | `list_or_vector_literal`
   map_literal: '{' [ `term` ':' `term` (',' `term` ':' `term`)* ] '}'
   set_literal: '{' [ `term` (',' `term`)* ] '}'
   list_literal: '[' [ `term` (',' `term`)* ] ']'
   list_or_vector_literal: '[' [ `term` (',' `term`)* ] ']'

Note that neither :token:`bind_marker` nor ``NULL`` is supported inside collection literals.

The ``list_or_vector_literal`` rule is used for both :ref:`lists` and :ref:`vectors`, as their syntax is the same.
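Because lists and vectors share ``list_or_vector_literal``, one rendering routine can produce literals for either column type. The following is a minimal Python sketch under that assumption; ``to_cql_literal`` is a hypothetical helper that handles only finite numbers, booleans, and text, and it rejects ``None`` because ``NULL`` is not supported inside collection literals.

```python
import math


def to_cql_literal(values) -> str:
    """Render a Python sequence as a CQL list_or_vector_literal.

    The resulting text can be used for a list column or a vector column,
    because both share the '[' term (',' term)* ']' syntax.
    """
    rendered = []
    for v in values:
        if v is None:
            # NULL is not supported inside collection literals.
            raise ValueError("NULL is not allowed inside a collection literal")
        if isinstance(v, bool):  # check bool before int: bool subclasses int
            rendered.append("true" if v else "false")
        elif isinstance(v, (int, float)):
            if isinstance(v, float) and not math.isfinite(v):
                raise ValueError("this sketch only renders finite numbers")
            rendered.append(repr(v))
        elif isinstance(v, str):
            # CQL string constants are single-quoted; embedded quotes are doubled.
            rendered.append("'" + v.replace("'", "''") + "'")
        else:
            raise TypeError(f"unsupported element type: {type(v).__name__}")
    return "[" + ", ".join(rendered) + "]"


literal = to_cql_literal([0.2228, 0.2112, 0.2024])
print(f"INSERT INTO vectors (id, gene_expr) VALUES (180503, {literal});")
```

The renderer does not know the target column's declared size; for a vector column the element count must still match the declared dimension.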

Noteworthy characteristics
~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -641,6 +643,37 @@ Unlike other "composed" types (collections and UDT), a tuple is always ``frozen<
Also, a tuple literal should always provide values for all the components of the tuple type (some of
those values can be null, but they need to be explicitly declared as such).

.. _vectors:

Vectors
^^^^^^^

A ``vector`` is a fixed-size array of elements that all have the same type.
No element stored in a vector can be null.
The vector type and its literal are defined by:

.. code-block::

   vector_type: VECTOR '<' `cql_type` ',' `integer` '>'
   list_or_vector_literal: '[' [ `term` (',' `term`)* ] ']'

Note that ``list_or_vector_literal`` is used for both :ref:`lists` and :ref:`vectors`, as their syntax is the same.

Vectors can be used as in the following example::

    CREATE TABLE vectors (
        id int PRIMARY KEY,
        gene_expr vector<float, 3>
    );

    INSERT INTO vectors (id, gene_expr) VALUES (180503, [0.2228, 0.2112, 0.2024]);

Like a tuple, a vector is always ``frozen<vector>`` (without the need for the ``frozen`` keyword), so it is not
possible to update only some elements of a vector; the whole vector must be written at once.

However, the types stored in a vector are not implicitly frozen, so to store a frozen collection or a frozen UDT in
a vector, you must wrap it explicitly with the ``frozen`` keyword.

.. .. _custom-types:
.. Custom Types
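The ``vector_type`` grammar and the value rules in the Vectors section (fixed size, no null elements) can be checked on the client side. Below is a minimal Python sketch, assuming type names arrive as strings such as ``vector<float, 3>``; ``split_vector_type`` and ``check_vector_value`` are hypothetical helpers, not ScyllaDB or driver APIs, and they do not verify that every element actually has the declared CQL type.

```python
from __future__ import annotations


def split_vector_type(type_str: str) -> tuple[str, int]:
    """Split 'vector<elem_type, dim>' into (elem_type, dim).

    Mirrors: vector_type: VECTOR '<' cql_type ',' integer '>'.
    The element type may contain commas of its own (for example
    frozen<map<int, text>>), so only a comma outside any nested
    '<...>' separates the element type from the dimension.
    """
    s = type_str.strip()
    if not s.lower().startswith("vector<") or not s.endswith(">"):
        raise ValueError(f"not a vector type: {type_str!r}")
    inner = s[len("vector<"):-1]
    depth = 0
    split_at = -1
    for i, ch in enumerate(inner):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            split_at = i
    if split_at < 0:
        raise ValueError(f"missing dimension in {type_str!r}")
    elem_type = inner[:split_at].strip()
    dimension = int(inner[split_at + 1:])  # int() tolerates surrounding spaces
    return elem_type, dimension


def check_vector_value(values: list, dimension: int) -> None:
    """Raise ValueError unless values fits a vector of the given dimension."""
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} elements, got {len(values)}")
    if any(v is None for v in values):
        raise ValueError("vector elements cannot be null")


elem_type, dim = split_vector_type("vector<float, 3>")
check_vector_value([0.2228, 0.2112, 0.2024], dim)  # passes silently
```

Splitting on the last top-level comma rather than the first keeps nested collection types intact.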
1 change: 1 addition & 0 deletions docs/dev/cql3-type-mapping.md
@@ -26,6 +26,7 @@
| map | map_type_impl | n/a | org.apache.cassandra.db.marshal.MapType |
| set | set_type_impl | n/a | org.apache.cassandra.db.marshal.SetType |
| tuple | tuple_type_impl | n/a | org.apache.cassandra.db.marshal.TupleType |
| vector | vector_type_impl | n/a | org.apache.cassandra.db.marshal.VectorType |
| UDT | user_type_impl | n/a | org.apache.cassandra.db.marshal.UserType |
| frozen | n/a | n/a | org.apache.cassandra.db.marshal.FrozenType |
| n/a | empty_type_impl | empty_type | org.apache.cassandra.db.marshal.EmptyType |
4 changes: 4 additions & 0 deletions docs/dev/lua-type-mapping.md
@@ -127,3 +127,7 @@ return the values are checked to be true.

Note that, like every other Lua table, the set is unordered. It is
sorted when converting back to CQL.

## VECTOR

A vector is represented the same way as a list.
2 changes: 2 additions & 0 deletions docs/reference/limits.rst
@@ -72,4 +72,6 @@ CQL Limits
- Number of keys: 65535 (2^16-1)
* - Blob size
- 2 GB (less than 1 MB is recommended)
* - Dimension of a vector
- 16000 (matching the maximum vector dimension in OpenSearch)
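A rough size estimate can help when choosing a dimension below the 16000 limit. The Python sketch below covers only fixed-width element types (``int`` and ``float`` are 4 bytes, ``bigint`` and ``double`` are 8 bytes) and counts raw element data without any serialization framing. The lower bound of 1 on the dimension is an assumption, not a documented limit, and ``check_dimension`` and ``raw_element_bytes`` are hypothetical names.

```python
MAX_VECTOR_DIMENSION = 16000  # from the CQL limits table above

# Assumed subset: byte widths of some fixed-size CQL native types.
FIXED_WIDTH_BYTES = {"int": 4, "float": 4, "bigint": 8, "double": 8}


def check_dimension(dimension: int) -> None:
    """Reject dimensions outside 1..MAX_VECTOR_DIMENSION (lower bound assumed)."""
    if not 1 <= dimension <= MAX_VECTOR_DIMENSION:
        raise ValueError(
            f"vector dimension must be between 1 and {MAX_VECTOR_DIMENSION}, got {dimension}"
        )


def raw_element_bytes(elem_type: str, dimension: int) -> int:
    """Raw element data in one vector value, excluding any serialization framing."""
    check_dimension(dimension)
    if elem_type not in FIXED_WIDTH_BYTES:
        raise ValueError(f"no fixed width known for {elem_type!r}")
    return FIXED_WIDTH_BYTES[elem_type] * dimension


print(raw_element_bytes("float", 16000))
```

For example, a ``vector<float, 16000>`` value carries 16000 × 4 = 64000 bytes of raw float data, well under the recommended 1 MB per blob.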
