Neo4j Storage Structure #5

GarinZ · 2021-02-07T09:42:54Z

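Data Types

Conceptually, Neo4j has two kinds of entities: Node and Relationship.

Node

Similar to an entity, a node contains:

Property: key-value pairs that hold the data
Label: tags that mark the node's type

Relationship

First Node: the start node
Second Node: the end node
Type: the relationship type (each relationship has exactly one type; relationships do not carry labels)
Property: key-value pairs, as on nodes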
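To make the two entity kinds concrete, here is a minimal sketch of the conceptual property-graph model as Python dataclasses. The class and field names are illustrative assumptions, not Neo4j's API, and this is the logical model only, not the on-disk layout.

```python
from dataclasses import dataclass, field

# Conceptual property-graph model (hypothetical names, not Neo4j's API).
@dataclass
class Node:
    id: int
    labels: set[str] = field(default_factory=set)                # type tags, e.g. {"Person"}
    properties: dict[str, object] = field(default_factory=dict)  # key-value pairs

@dataclass
class Relationship:
    id: int
    start: int   # ID of the first (start) node
    end: int     # ID of the second (end) node
    type: str    # exactly one relationship type
    properties: dict[str, object] = field(default_factory=dict)

alice = Node(0, {"Person"}, {"name": "Alice"})
bob = Node(1, {"Person"}, {"name": "Bob"})
knows = Relationship(0, alice.id, bob.id, "KNOWS", {"since": 2019})
print(knows.type, knows.start, knows.end)  # KNOWS 0 1
```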

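Logical Storage Structure

Index-free adjacency: each node refers directly to its own relationships, so moving from a node to its neighbors needs no index lookup. Logically, this can be viewed as an adjacency list.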
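A minimal sketch of the adjacency-list view, assuming an in-memory dict keyed by node ID (Neo4j realizes the same idea with record pointers on disk, not Python dicts). The point it shows: finding a node's neighbors costs time proportional to that node's degree, not to the size of the whole graph.

```python
from collections import defaultdict

# Adjacency list: node ID -> list of (relationship type, neighbor ID).
adjacency: dict[int, list[tuple[str, int]]] = defaultdict(list)

def add_relationship(start: int, end: int, rel_type: str) -> None:
    # Record the relationship on both endpoints so it can be walked from either side.
    adjacency[start].append((rel_type, end))
    adjacency[end].append((rel_type, start))

def neighbors(node_id: int) -> list[int]:
    # Touches only this node's own entries: no global index is consulted.
    return [other for _, other in adjacency[node_id]]

add_relationship(0, 1, "KNOWS")
add_relationship(0, 2, "KNOWS")
add_relationship(2, 3, "WORKS_WITH")
print(neighbors(0))  # [1, 2]
```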

Physical Storage Structure

The files Neo4j uses for persistence are listed below:

Store File                        | Record size   | Contents
----------------------------------------------------------------------------------------------------------------------------
neostore.nodestore.db             | 15 B          | Nodes
neostore.relationshipstore.db     | 34 B          | Relationships
neostore.propertystore.db         | 41 B          | Properties for nodes and relationships
neostore.propertystore.db.strings | 128 B         | Values of string properties
neostore.propertystore.db.arrays  | 128 B         | Values of array properties
Indexed Property                  | 1/3 * AVG(X)  | Each index entry is approximately 1/3 of the average property value size
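Because the main stores use fixed-size records, their size can be roughly estimated as record count × record size. The sketch below uses the sizes from the table; the function name and inputs are hypothetical, and the result is only a lower bound that ignores the dynamic string/array stores, indexes and any file overhead.

```python
# Record sizes in bytes, taken from the table above.
RECORD_SIZE = {
    "neostore.nodestore.db": 15,
    "neostore.relationshipstore.db": 34,
    "neostore.propertystore.db": 41,
}

def estimate_store_bytes(nodes: int, relationships: int, property_records: int) -> dict[str, int]:
    """Rough lower bound per store file: record count x fixed record size."""
    counts = {
        "neostore.nodestore.db": nodes,
        "neostore.relationshipstore.db": relationships,
        "neostore.propertystore.db": property_records,
    }
    return {name: counts[name] * RECORD_SIZE[name] for name in RECORD_SIZE}

est = estimate_store_bytes(nodes=1_000_000, relationships=5_000_000, property_records=2_000_000)
print(sum(est.values()))  # 267000000
```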

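Next, let's look at the format stored in each file. Understanding these formats matters because they determine Neo4j's data structures and query efficiency: if you want to answer "How does Neo4j execute a query, and how is its data stored underneath?", you need to understand its storage format. Note that the record sizes in the table (15/34/41 B) come from newer store formats; the layouts below are a simplified version that omits some fields (node labels, for example), which is why they add up to 9/33/37 bytes.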

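Node - 9 Bytes

Stores node data. A node record is the head of linked lists: it points to the start of the node's relationship chain and property chain.
Store file: neostore.nodestore.db
Format: Node: inUse + nextRelId + nextPropId
Field meanings:

inUse (1 byte): whether the record is in use (i.e., not deleted)
nextRelId (4 bytes): ID of the node's first relationship
nextPropId (4 bytes): ID of the node's first property record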
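A minimal sketch of encoding and decoding this 9-byte layout with Python's `struct` module. The big-endian byte order and the all-ones "no ID" sentinel are assumptions made for illustration; the real store format differs in detail.

```python
import struct

# Simplified 9-byte node record: inUse (1) + nextRelId (4) + nextPropId (4).
NODE_FORMAT = ">BII"  # big-endian, no padding (assumed byte order)
NODE_RECORD_SIZE = struct.calcsize(NODE_FORMAT)  # 9
NO_ID = 0xFFFFFFFF    # assumed sentinel for "no relationship / no property"

def pack_node(in_use: bool, next_rel_id: int, next_prop_id: int) -> bytes:
    return struct.pack(NODE_FORMAT, 1 if in_use else 0, next_rel_id, next_prop_id)

def unpack_node(record: bytes) -> tuple[bool, int, int]:
    in_use, next_rel_id, next_prop_id = struct.unpack(NODE_FORMAT, record)
    return bool(in_use), next_rel_id, next_prop_id

rec = pack_node(True, 7, NO_ID)
print(len(rec), unpack_node(rec))  # 9 (True, 7, 4294967295)
```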

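Relationship - 33 Bytes

Relationships are stored as doubly linked lists: each relationship belongs to the relationship chain of its start node and to that of its end node.
Store file: neostore.relationshipstore.db
Format: Relationship: inUse + firstNode + secondNode + relType + firstPrevRelId + firstNextRelId + secondPrevRelId + secondNextRelId + nextPropId
Field meanings:

inUse (1 byte): whether the record is in use (i.e., not deleted)
firstNode (4 bytes): the start node
secondNode (4 bytes): the end node
relType (4 bytes): the relationship type
firstPrevRelId (4 bytes) & firstNextRelId (4 bytes): IDs of the previous and next relationships in the start node's chain
secondPrevRelId (4 bytes) & secondNextRelId (4 bytes): IDs of the previous and next relationships in the end node's chain
nextPropId (4 bytes): ID of the relationship's first property record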
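Because every relationship sits in two chains, walking one node's relationships means choosing, at each step, the "next" pointer that belongs to that node. A minimal sketch, assuming records are already decoded into a dict keyed by relationship ID and that -1 marks the end of a chain (both hypothetical simplifications):

```python
import struct
from typing import NamedTuple

NO_ID = -1  # assumed end-of-chain marker
REL_RECORD_SIZE = struct.calcsize(">B8i")  # inUse (1) + 8 four-byte fields = 33

class Rel(NamedTuple):
    first_node: int
    second_node: int
    rel_type: int
    first_prev: int
    first_next: int
    second_prev: int
    second_next: int
    next_prop: int

def relationships_of(node_id: int, first_rel: int, rels: dict[int, Rel]) -> list[int]:
    """Follow one node's relationship chain, starting from the node's nextRelId."""
    out = []
    rel_id = first_rel
    while rel_id != NO_ID:
        out.append(rel_id)
        rel = rels[rel_id]
        # Use the pointer pair that belongs to node_id's side of the relationship.
        rel_id = rel.first_next if rel.first_node == node_id else rel.second_next
    return out

rels = {
    10: Rel(0, 1, 0, NO_ID, 11, NO_ID, NO_ID, NO_ID),  # (0)->(1)
    11: Rel(2, 0, 0, NO_ID, NO_ID, 10, 12, NO_ID),     # (2)->(0)
    12: Rel(0, 3, 0, 11, NO_ID, NO_ID, NO_ID, NO_ID),  # (0)->(3)
}
print(relationships_of(0, 10, rels))  # [10, 11, 12]
```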

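Property - 37 Bytes

Source: Understanding Neo4j’s data on disk

How are properties actually stored?

Each property record can hold four 8-byte propBlocks, and each propBlock can contain a key, a value, or both. A node's properties may span several property records, which are linked together as a singly linked list.
The key and type together take 3.5 bytes (28 bits): 24 bits for the key ID and 4 bits for the type. The block stores only the key's numeric ID; the key name itself is kept in the property-key token store (neostore.propertystore.db.index), not in the "Indexed Property" entry of the table above.
How much space the value takes depends on its type:

boolean, byte, short, int, char, float: key and value share one block
small long: key and value share one block
large long or double: the key takes one block and the value takes the next block
reference to a String/Array: key and reference share one block
short string or short array: the key goes in the block; if the value fits in the remaining space it is stored inline, otherwise a pointer to the dynamic store (128-byte blocks) is stored instead
long string or long array: if it does not fit in 8 bytes, a pointer to the dynamic store (128-byte blocks) is stored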
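A minimal sketch of the 28-bit header (24-bit key ID + 4-bit type) packed into one 64-bit block, with the remaining 36 bits available for a small inline value. The bit positions, the function names, and the rule that only non-negative longs below 2^36 count as "small" are all assumptions for illustration; Neo4j's real encoding (including sign handling) differs.

```python
KEY_BITS, TYPE_BITS = 24, 4
HEADER_BITS = KEY_BITS + TYPE_BITS    # 28 bits = 3.5 bytes
INLINE_VALUE_BITS = 64 - HEADER_BITS  # 36 bits left in the 8-byte block

def pack_header(key_id: int, type_code: int, inline_value: int = 0) -> int:
    """Pack key ID, type and a small inline value into one 64-bit block (hypothetical layout)."""
    if not 0 <= key_id < (1 << KEY_BITS):
        raise ValueError("key ID needs more than 24 bits")
    if not 0 <= type_code < (1 << TYPE_BITS):
        raise ValueError("type code needs more than 4 bits")
    if not 0 <= inline_value < (1 << INLINE_VALUE_BITS):
        raise ValueError("value does not fit inline")
    return key_id | (type_code << KEY_BITS) | (inline_value << HEADER_BITS)

def unpack_header(block: int) -> tuple[int, int, int]:
    key_id = block & ((1 << KEY_BITS) - 1)
    type_code = (block >> KEY_BITS) & ((1 << TYPE_BITS) - 1)
    return key_id, type_code, block >> HEADER_BITS

def blocks_for_long(value: int) -> int:
    """1 block if the long fits beside the header, else key block + value block."""
    return 1 if 0 <= value < (1 << INLINE_VALUE_BITS) else 2

print(unpack_header(pack_header(5, 3, 42)))  # (5, 3, 42)
```

In this sketch negative longs conservatively take two blocks; a real encoder would store small negative values inline too.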

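Store file: neostore.propertystore.db
Format: Property: inUse + propBlock + propBlock + propBlock + propBlock + nextPropId
Field meanings:

inUse (1 byte): whether the record is in use (i.e., not deleted)
propBlock (8 bytes) × 4: property blocks, each holding a key, a value, or both
nextPropId (4 bytes): ID of the next property record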
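A minimal sketch of reading a property chain from a byte buffer laid out as consecutive 37-byte records. The byte order, the all-ones end-of-chain sentinel, and treating a zero block as "unused" are assumptions made for illustration.

```python
import struct

# Simplified property record: inUse (1) + 4 x propBlock (8) + nextPropId (4).
PROP_FORMAT = ">B4QI"
PROP_RECORD_SIZE = struct.calcsize(PROP_FORMAT)  # 37
NO_ID = 0xFFFFFFFF  # assumed end-of-chain marker
EMPTY_BLOCK = 0     # assumed marker for an unused block slot

def pack_prop(blocks: list[int], next_prop_id: int) -> bytes:
    padded = (blocks + [EMPTY_BLOCK] * 4)[:4]
    return struct.pack(PROP_FORMAT, 1, *padded, next_prop_id)

def property_blocks(store: bytes, first_prop_id: int) -> list[int]:
    """Follow nextPropId from record to record, collecting the used blocks."""
    blocks = []
    prop_id = first_prop_id
    while prop_id != NO_ID:
        offset = prop_id * PROP_RECORD_SIZE  # fixed size, so a direct seek
        _in_use, *record_blocks, prop_id = struct.unpack_from(PROP_FORMAT, store, offset)
        blocks.extend(b for b in record_blocks if b != EMPTY_BLOCK)
    return blocks

# Two chained records: record 0 -> record 1 -> end.
store = pack_prop([11, 12, 13, 14], 1) + pack_prop([15], NO_ID)
print(property_blocks(store, 0))  # [11, 12, 13, 14, 15]
```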

Dynamic Store Files - 128 Bytes

Used for long values such as Strings and Arrays; a single value can span several blocks.
Store files:

neostore.propertystore.db.strings
neostore.propertystore.db.arrays
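A minimal sketch of how a long value can be split across fixed 128-byte blocks that are chained by next-block IDs. The 8-byte header (next block ID + payload length), the resulting 120-byte payload, and laying the blocks out consecutively from ID 0 are assumptions for illustration.

```python
import struct

BLOCK_SIZE = 128
HEADER = struct.Struct(">iI")            # hypothetical header: next block ID, payload length
PAYLOAD_SIZE = BLOCK_SIZE - HEADER.size  # 120 bytes of data per block in this sketch
NO_ID = -1

def write_dynamic(data: bytes) -> bytes:
    """Split data into chained 128-byte blocks, laid out from block ID 0."""
    chunks = [data[i:i + PAYLOAD_SIZE] for i in range(0, len(data), PAYLOAD_SIZE)] or [b""]
    out = bytearray()
    for n, chunk in enumerate(chunks):
        next_id = n + 1 if n + 1 < len(chunks) else NO_ID
        out += HEADER.pack(next_id, len(chunk)) + chunk.ljust(PAYLOAD_SIZE, b"\x00")
    return bytes(out)

def read_dynamic(store: bytes, block_id: int) -> bytes:
    """Reassemble a value by following the chain from block_id."""
    data = bytearray()
    while block_id != NO_ID:
        offset = block_id * BLOCK_SIZE
        block_id, length = HEADER.unpack_from(store, offset)
        data += store[offset + HEADER.size: offset + HEADER.size + length]
    return bytes(data)

value = ("Neo4j " * 50).encode()  # 300 bytes -> 3 blocks
store = write_dynamic(value)
print(len(store), read_dynamic(store, 0) == value)  # 384 True
```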

FAQ

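Q: How do you find the next element of a linked list from its ID?
A: Because records have a fixed length, the position of any record, including the next one, can be computed in O(1) as ID × record size. For example, with 9-byte node records, the node with ID=100 starts at byte 900.

Q: Why are records stored in a fixed-length format?
A: So that a record can be located directly by computing ID × record length.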
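A minimal sketch of that O(1) lookup: compute the offset and seek straight to it. The in-memory fake node store and the helper names are hypothetical; a real store file would be opened in binary mode instead of using `io.BytesIO`.

```python
import io
from typing import BinaryIO

NODE_RECORD_SIZE = 9  # simplified node record size from the section above

def record_offset(record_id: int, record_size: int) -> int:
    """Byte offset of a fixed-size record: ID x record size."""
    return record_id * record_size

def read_record(f: BinaryIO, record_id: int, record_size: int) -> bytes:
    f.seek(record_offset(record_id, record_size))  # jump directly, no scan or index
    return f.read(record_size)

# Fake store: record i is filled with the byte value i (for i < 256).
store = b"".join(bytes([i % 256]) * NODE_RECORD_SIZE for i in range(101))
print(record_offset(100, NODE_RECORD_SIZE))  # 900
```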
