From be7a052647ae989b402e28879fcf33b275a955e5 Mon Sep 17 00:00:00 2001 From: Amanda Lindsay Date: Thu, 5 Dec 2024 14:57:05 +0000 Subject: [PATCH] Doc 264 glossary (#1393) Updated and expanded the main HZC Platform glossary with the Flow glossary, and also relevant terms, abbreviations and acronyms from the glossary on the main HZC web site. Note - I haven't included the backport label because Flow is recent so please advise. --- docs/modules/ROOT/pages/glossary.adoc | 255 +++++++++++++++++++++++--- 1 file changed, 234 insertions(+), 21 deletions(-) diff --git a/docs/modules/ROOT/pages/glossary.adoc b/docs/modules/ROOT/pages/glossary.adoc index ffa5c3262..304ddef73 100644 --- a/docs/modules/ROOT/pages/glossary.adoc +++ b/docs/modules/ROOT/pages/glossary.adoc @@ -1,35 +1,248 @@ = Glossary [glossary] -2-phase commit:: An atomic commitment protocol for distributed systems. It consists of two phases: commit-request and commit. In commit-request phase, transaction manager coordinates all the transaction resources to commit or abort. In commit-phase, transaction manager decides to finalize operation by committing or aborting according to the votes of each transaction resource. -ACID:: A set of properties (Atomicity, Consistency, Isolation, Durability) guaranteeing that transactions are processed reliably. Atomicity requires that each transaction be all or nothing, i.e., if one part of the transaction fails, the entire transaction fails). Consistency ensures that only valid data following all rules and constraints is written. Isolation ensures that transactions are securely and independently processed at the same time without interference (and without transaction ordering). Durability means that once a transaction has been committed, it will remain so, no matter if there is a power loss, crash, or error. +2-phase commit:: An atomic commitment protocol for distributed systems. It consists of two phases: commit-request and commit. +In commit-request phase, transaction manager coordinates all the transaction resources to commit or abort. +In commit-phase, transaction manager decides to finalize operation by committing or aborting according to the votes of each transaction resource. + +ACID:: A set of properties (Atomicity, Consistency, Isolation, Durability) guaranteeing that transactions are processed reliably. +Atomicity requires that each transaction be all or nothing, i.e., if one part of the transaction fails, the entire transaction fails. +Consistency ensures that only valid data following all rules and constraints is written. +Isolation ensures that transactions are securely and independently processed at the same time without interference (and without transaction ordering). +Durability means that once a transaction has been committed, it will remain so, no matter if there is a power loss, crash, or error. + +BFF:: Backend-for-frontend. A web development architectural pattern that creates a dedicated backend for each frontend application. + Cache:: A high-speed access area that can be either a reserved section of main memory or a storage device. -Change data capture (CDC):: A <> pattern for observing changes made to a database and extracting them in a form usable by other systems, for the purposes of replication, analysis and more. -Client/server topology:: Hazelcast topology where members run outside the user application and are connected to clients using client libraries. The client library is installed in the user application. -[[data-pipeline]] -Data pipeline:: A series of actions that ingest data from one or more sources and move it to a destination for storage and analysis. + +Cache access patterns:: Strategies for optimizing memory access to improve performance and reduce latency. +The most common access patterns are read-through, write-through, and write-behind. + +Caching:: Often referred to as memory caching, this is a technique that allows computer applications to temporarily store data in a computer's memory for quick retrieval. + +Cache miss:: An event in which a system or application makes a request to retrieve data from a cache, but that specific data is not currently in cache memory. + +CDC:: Change data capture is a <> pattern for observing changes made to a database and extracting them in a form +usable by other systems, for the purposes of replication, analysis and more. + +CEP:: Complex event processing is a set of techniques for capturing and analyzing streams of data as they arrive to identify opportunities + or threats in real time. + +CLC:: Hazelcast Command Line Client, used to connect to and interact with clusters on Hazelcast Cloud and Hazelcast Platform direct from the command line or through scripts. + +CLI:: Command-line interface. + +Client/server topology:: Hazelcast topology where members run outside the user application and are connected to clients using client libraries. +The client library is installed in the user application. + +Cloud-native architecture:: An architecture in which an application is designed specifically for cloud deployment (in contrast to “lift-and-shift”, where applications are moved to the cloud unchanged). + +DaaS:: Data as a Service is a data management strategy and/or deployment model that focuses on the cloud (public or private) +to deliver a variety of data-related services such as storage, processing, and analytics. + +Data-at-rest:: Data that is stored in a digital form on a physical device, which is not being actively accessed or used. + +Data grid:: A set of computers that directly interact with each other to coordinate the processing of large jobs. + +Data-in-motion:: A term that refers to digital data sets that are currently being moved over a network from one location to another. + +DAG:: A directed acyclic graph is a conceptual representation of a series of activities depicted by a graph. + +[[data-pipeline]]Data pipeline:: A series of actions that ingest data from one or more sources and move it to a destination for storage and analysis. + +Data source:: Where data originates from — for example, in the case of Hazelcast <>, the databases, APIs, and messages that fuel it. + +Deserialization:: The process of reconstructing a data structure or object from a series of bytes or a string in order to instantiate the object for consumption. + +DIH:: A digital integration hub is a software architecture for a data layer that provides centralized access to a collection of data from +disparate sources. + +Distributed cache:: A system that pools together the random-access memory (RAM) of multiple networked computers into a single in-memory data +store used as a data cache to provide fast access to data. + +Distributed computing:: Sometimes called distributed processing, this is the technique of linking together multiple computer servers over a network into a cluster, to share data and to coordinate processing power. + +Distributed hash table:: A decentralized data store that looks up data based on key-value pairs. + +Distributed transaction:: A set of operations on data that is performed across two or more data repositories (especially databases). + +EDA:: Event-driven architecture is a modern software design approach centered around data that describes events—selection of a button +on a user interface (UI), the addition of an item to an online shopping cart, notification of payment on a point of sale (POS) system, +etc.—in real-time and enables applications to act on them as they occur. + +Edge computing:: Sometimes called IoT edge processing, this refers to taking action on data as near to the source as possible rather +than in a central, remote data center, to reduce latency and bandwidth use. + Embedded topology:: Hazelcast topology where the members are in-process with the user application and act as both client and server. -Extract transform load (ETL):: A <> pattern for collecting data from various sources, transforming (changing) it to conform to some rules, and loading it into a sink. -Garbage collection:: The recovery of storage that is being used by an application when that application no longer needs the storage. This frees the storage for use by other applications (or processes within an application). It also ensures that an application using increasing amounts of storage does not reach its quota. Programming languages that use garbage collection are often interpreted within virtual machines like the JVM. The environment that runs the code is also responsible for garbage collection. + +ESP:: Event stream processing is the practice of taking action on a series of data points that originate from a system that + continuously creates data. The term “event” refers to each data point in the system, and “stream” refers to the ongoing delivery of those events. + +ETL:: Extract transform load is a <> pattern for collecting data from various sources, transforming (changing) it to conform to some rules, and loading it into a sink. + +[[flow]]Flow:: Hazelcast Flow is a data gateway that automates the integration of microservices across an enterprise. +Flow accelerates application development by connecting multiple data sources and APIs together, without having to write integration code. + +Garbage collection:: The recovery of storage that is being used by an application when that application no longer needs the storage. +This frees the storage for use by other applications (or processes within an application). +It also ensures that an application using increasing amounts of storage does not reach its quota. +Programming languages that use garbage collection are often interpreted within virtual machines like the JVM. +The environment that runs the code is also responsible for garbage collection. + +Grid computing:: The practice of leveraging multiple computers, often geographically distributed but connected by networks, +to work together to accomplish joint tasks. + Hazelcast cluster:: A virtual environment formed by Hazelcast members communicating with each other in the cluster. -Hazelcast partition:: Memory segments containing the data. Hazelcast is built-on the partition concept, it uses partitions to store and process data. Each partition can have hundreds or thousands of data entries depending on your memory capacity. You can think of a partition as a block of data. In general and optimally, a partition should have a maximum size of 50-100 Megabytes. -IMDG:: An in-memory data grid (IMDG) is a data structure that resides entirely in memory and is distributed among many members in a single location or across multiple locations. IMDGs can support thousands of in-memory data updates per second and they can be clustered and scaled in ways that support large quantities of data. -Java heap:: Java heap is the space that Java can reserve and use in memory for dynamic memory allocation. All runtime objects created by a Java application are stored in heap. By default, the heap size is 128 MB, but this limit is reached easily for business applications. Once the heap is full, new objects cannot be created and the Java application shows errors. -[[job]] -Job:: A <> that's packaged and submitted to a cluster member to run. + +Hazelcast partition:: Memory segments containing the data. Hazelcast is built on the partition concept, and uses partitions to store and process data. +Each partition can have hundreds or thousands of data entries depending on your memory capacity. +You can think of a partition as a block of data. In general and optimally, a partition should have a maximum size of 50-100 megabytes. + +Hibernate second-level cache:: One of the data caching components available in the Hibernate object-relational mapping (ORM) library. +Hibernate is a popular ORM library for the Java language, and it lets you store your Java object data in a relational database management system (RDBMS). + +IaaS:: Infrastructure as a Service is a cloud-based service offering in which the vendor provides compute, +network and storage resources and the customer provides the operating system and application software. + +IaC:: Infrastructure as Code. A method of managing and provisioning IT infrastructure using code, rather than manual processes. + +IMDB:: In-memory database. A computer system that stores and retrieves data records that reside in a computer’s main memory, +e.g., random-access memory (RAM). + +IMDG:: An in-memory data grid (IMDG) is a data structure that resides entirely in memory and is distributed among many members in a single location + or across multiple locations. IMDGs can support thousands of in-memory data updates per second and they can be clustered and scaled in ways that support large quantities of data. + +Inference runner:: A component in large-scale software systems that lets you plug in machine learning (ML) algorithms (or “models”) to deliver data into those algorithms and calculate outputs. + +In-memory computation:: Also called in-memory computing, this is the technique of running computer calculations entirely in computer memory (e.g., in RAM). + +In-memory processing:: The practice of taking action on data entirely in computer memory (e.g., in RAM). + +Java heap:: The space that Java can reserve and use in memory for dynamic memory allocation. +All runtime objects created by a Java application are stored in heap. By default, the heap size is 128 MB, but this limit is reached easily for business applications. +Once the heap is full, new objects cannot be created and the Java application shows errors. + +Java microservices:: A set of software applications written in the Java programming language (and which typically leverage the vast ecosystem of Java tools and frameworks), +designed for a limited scope, that work with each other to form a bigger solution. + +JCache/Java cache:: A de facto standard Java cache API for caching data. + +[[job]]Job:: A <> that's packaged and submitted to a cluster member to run. + +JWT:: JSON Web Token, an open standard for transmitting information securely between parties as a JSON object. + +K8s:: Kubernetes. An open source system that manages and deploys containerized applications. + +Kappa:: The Kappa Architecture is a software architecture used for processing <>. + +Key-value store:: A type of data storage software program that stores data as a set of unique identifiers, each of which have an associated value. + +Lambda architecture:: A deployment model for data processing that organizations use to combine a traditional batch pipeline with a fast real-time stream pipeline for data access. + Least recently used (LRU):: A cache eviction algorithm where entries are eligible for eviction due to lack of interest by applications. + Least frequently used (LFU):: A cache eviction algorithm where entries are eligible for eviction due to having the lowest usage frequency. -[[lite-member]] -Lite member:: A member that does not store data and has no partitions. These members are often used to execute tasks and register listeners. -Member:: A Hazelcast instance. Depending on your Hazelcast topology, it can refer to a server or a Java virtual machine (JVM). Members belong to a Hazelcast cluster. Members may also be referred as member nodes, cluster members, Hazelcast members, or data members. -Multicast:: A type of communication where data is addressed to a group of destination members simultaneously. + +[[lite-member]]Lite member:: A member that does not store data and has no partitions. These members are often used to execute tasks and register listeners. + +Machine learning (ML) inference:: The process of running live data points into a machine learning algorithm (or “ML model”) to calculate an output such as a single numerical score. + +Management Center:: A tool for managing and monitoring Hazelcast Platform clusters. + +Member:: A Hazelcast instance. Depending on your Hazelcast topology, it can refer to a server or a Java virtual machine (JVM). +Members belong to a Hazelcast cluster. Members may also be referred as member nodes, cluster members, Hazelcast members, or data members. + +Micro-batch processing:: The practice of collecting data in small groups (“batches”) for the purposes of taking action on (processing) that data. + +[[microservices]]Microservices:: A set of software applications designed for a limited scope that work with each other to form a bigger solution. + +Microservices architecture:: A software architecture approach in which a set of software applications designed for a limited scope, +known as <>, work together to form a bigger solution. + +mTLS:: Mutual authentication. A method that ensures the authenticity of the parties at each end of a network connection. + +Multicast:: A type of communication in which data is sent to a defined group of destination members simultaneously (one to many). +Distinct from unicast (one to one) and broadcast (one to all). + +Mutation:: In Hazelcast <>, mutation queries make changes; for example, in performing a task or updating a record. + Near cache:: A caching model where an object retrieved from a remote member is put into the local cache and the future requests made to this object will be handled by this local member. -NoSQL:: "Not Only SQL". A database model that provides a mechanism for storage and retrieval of data that is tailored in means other than the tabular relations used in relational databases. It is a type of database which does not adhering to the traditional relational database management system (RDMS) structure. It is not built on tables and does not employ SQL to manipulate data. It also may not provide full ACID guarantees, but still has a distributed and fault-tolerant architecture. -OSGI:: Formerly known as the Open Services Gateway initiative, it describes a modular system and a service platform for the Java programming language that implements a complete and dynamic component model. + +NoSQL:: "Not Only SQL". A database model that provides a mechanism for storage and retrieval of data that is tailored in means other than the tabular relations used in relational databases. +It is a type of database which does not adhere to the traditional relational database management system (RDMS) structure. +It is not built on tables and does not employ SQL to manipulate data. +It also may not provide full ACID guarantees, but still has a distributed and fault-tolerant architecture. + +OIDC:: OpenID Connect provider. + +OOME:: Out of Memory Error. + +Operator:: Hazelcast Platform Operator simplifies working with Hazelcast clusters on Kubernetes and Red Hat OpenShift by eliminating the need for manual deployment and life-cycle management. + +OSGI:: Formerly known as the Open Services Gateway initiative, it describes a modular system and a service platform for the Java programming language +that implements a complete and dynamic component model. + +PaaS:: Platform as a Service is a cloud-based service offering in which the vendor provides hardware resources (as in IaaS), operating systems and management tools, +and the customer provides the application software. + Partition table:: Table containing all members in the cluster, mappings of partitions to members and further metadata. + +PKCE:: Proof Key for Code Exchange. An extension used in OAuth 2.0 to improve security for public clients. + +Projection:: In Hazelcast <>, projections are a way of taking data from one place, and then transforming and combining it with other data sources. + +Publish/subscribe:: A software architecture model by which applications create and share data. Pub/sub is particularly popular in serverless and microservices architectures. + +Query:: A request built using Hazelcast <>'s ability to retrieve and analyze data from different sources across an ecosystem. +For example, a query might combine three services together: a database, API and Kafka streaming data. + Race condition:: This condition occurs when two or more threads can access shared data and they try to change it at the same time. -RSA:: An algorithm developed by Rivest, Shamir and Adleman to generate, encrypt and decrypt keys for secure data transmissions. -Serialization:: Process of converting an object into a stream of bytes in order to store the object or transmit it to memory, a database, or a file. Its main purpose is to save the state of an object in order to be able to recreate it when needed. The reverse process is called deserialization. + +Real-time database:: A data store designed to collect, process, and/or enrich an incoming series of data points (i.e., a data stream) in real time, typically immediately after the data is created. + +Real-time machine learning:: The process of training a machine learning model by running live data through it, to continuously improve the model. + +Real-time stream processing:: The process of taking action on data at the time the data is generated or published. + +RSA:: An algorithm to generate, encrypt and decrypt keys for secure data transmissions. + +SAML:: Security Assertion Markup Language identity provider (IdP) authenticates users and passes authentication data to a service provider. + +Semantic data type:: A method of encoding data that allows software to discover and map data based upon its meaning rather than its structure. + +Serialization:: A process of converting an object into a stream of bytes in order to store the object or transmit it to memory, a database, or a file. +Its main purpose is to save the state of an object in order to be able to recreate it when needed. + +Sharding:: The practice of optimizing database management systems by separating the rows or columns of a larger database table into multiple smaller tables. + Snapshot:: A distributed map that contains the saved state of a <> computations. + Split-brain:: A state in which a cluster of members gets divided (or partitioned) into smaller clusters of members, each of which believes it is the only active cluster. + +SSE:: Server-sent events. + +[[streaming-data]]Streaming data:: Also known as real-time data, event data, stream data processing, or data-in-motion, this refers to a continuous flow of information +generated by various sources, such as sensors, applications, social media, or other digital platforms. + +Streaming database:: A data store designed to collect, process, and/or enrich an incoming series of data points (i.e., a data stream) in real time, typically immediately after the data is created. + +Streaming ETL (Extract, Transform, Load):: The processing and movement of real-time data from one place to another. + +[[taxi]] +TaxiQL:: In Hazelcast <>,Taxi is a simple query language for describing how data and APIs across an ecosystem relate to one another. + +Taxonomy:: The practice of classifying and categorizing data. + +Time to live (TTL):: A value that determines how long data is retained, before it is discarded from internal cache. + Transaction:: A sequence of information exchange and related work (such as data store updating) that is treated as a unit for the purposes of satisfying a request and for ensuring data store integrity. + +TSDB:: A time-series database is a computer system that is designed to store and retrieve data records that are part of a “time series,” which is a set of data points that are associated with timestamps. + +Vector search:: An advanced information retrieval method that allows systems to go beyond highly organized, quantitative structured data, +and capture the context and semantic meaning of qualitative unstructured data that doesn't follow conventional models, including multimedia, +textual, geospatial, and Internet of Things (IoT) data. + +Workspace:: A Hazelcast <> Workspace is a collection of schemas, API specs and <> projects that describe data sources and provide a description of the data and capabilities they provide. + +WSDL:: Web Services Description Language.