diff --git a/cn/docs/_print/index.html b/cn/docs/_print/index.html index 71b06e8b9..db0102541 100644 --- a/cn/docs/_print/index.html +++ b/cn/docs/_print/index.html @@ -1,7 +1,7 @@ Documentation | HugeGraph

This is the multi-page printable view of this section. -Click here to print.

Return to the regular view of this page.

Documentation

欢迎阅读HugeGraph文档

1 - Introduction with HugeGraph

Summary

HugeGraph是一款易用、高效、通用的开源图数据库系统(Graph Database,GitHub项目地址), +Click here to print.

Return to the regular view of this page.

Documentation

欢迎阅读HugeGraph文档

1 - Introduction with HugeGraph

Summary

HugeGraph是一款易用、高效、通用的开源图数据库系统(Graph Database,GitHub项目地址), 实现了Apache TinkerPop3框架及完全兼容Gremlin查询语言, 具备完善的工具链组件,助力用户轻松构建基于图数据库之上的应用和产品。HugeGraph支持百亿以上的顶点和边快速导入,并提供毫秒级的关联关系查询能力(OLTP), 并支持大规模分布式图分析(OLAP)。

HugeGraph典型应用场景包括深度关系探索、关联分析、路径搜索、特征抽取、数据聚类、社区检测、 @@ -1460,11 +1460,11 @@ # 注意: 诊断日志仅在作业失败时存在,并且只会保存一小时。 kubectl get event --field-selector reason=ComputerJobFailed --field-selector involvedObject.name=pagerank-sample -n hugegraph-computer-system

2.2.8 显示作业的成功事件

NOTE: it will only be saved for one hour

kubectl get event --field-selector reason=ComputerJobSucceed --field-selector involvedObject.name=pagerank-sample -n hugegraph-computer-system
-

2.2.9 查询算法结果

如果输出到 Hugegraph-Server 则与 Locally 模式一致,如果输出到 HDFS,请检查 hugegraph-computer/results/{jobId} 目录下的结果文件。

3 内置算法文档

3.1 支持的算法列表:

中心性算法:
社区算法:
路径算法:

更多算法请看: Built-In algorithms

3.2 算法描述

TODO

4 算法开发指南

TODO

5 注意事项

4 - Config

4.1 - HugeGraph 配置

1 概述

配置文件的目录为 hugegraph-release/conf,所有关于服务和图本身的配置都在此目录下。

主要的配置文件包括:gremlin-server.yaml、rest-server.properties 和 hugegraph.properties

HugeGraphServer 内部集成了 GremlinServer 和 RestServer,而 gremlin-server.yaml 和 rest-server.properties 就是用来配置这两个Server的。

下面对这三个配置文件逐一介绍。

2 gremlin-server.yaml

gremlin-server.yaml 文件默认的内容如下:

# host and port of gremlin server, need to be consistent with host and port in rest-server.properties
+

2.2.9 查询算法结果

如果输出到 Hugegraph-Server 则与 Locally 模式一致,如果输出到 HDFS,请检查 hugegraph-computer/results/{jobId} 目录下的结果文件。
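下面给出一个查看 HDFS 结果文件的示例命令,其中 {jobId} 为占位符,实际路径与结果文件命名以作业输出为准,仅作示意:

# 列出指定作业的结果目录
hdfs dfs -ls hugegraph-computer/results/{jobId}
# 查看部分结果内容
hdfs dfs -cat hugegraph-computer/results/{jobId}/* | head -n 20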

3 内置算法文档

3.1 支持的算法列表:

中心性算法:
社区算法:
路径算法:

更多算法请看: Built-In algorithms

3.2 算法描述

TODO

4 算法开发指南

TODO

5 注意事项

4 - Config

4.1 - HugeGraph 配置

1 概述

配置文件的目录为 hugegraph-release/conf,所有关于服务和图本身的配置都在此目录下。

主要的配置文件包括:gremlin-server.yaml、rest-server.properties 和 hugegraph.properties

HugeGraphServer 内部集成了 GremlinServer 和 RestServer,而 gremlin-server.yaml 和 rest-server.properties 就是用来配置这两个 Server 的。

下面对这三个配置文件逐一介绍。
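在逐一介绍之前,可以先确认这几个文件在 conf 目录下确实存在,下面的命令与输出仅为示意:

ls hugegraph-release/conf
# 预期可以看到 gremlin-server.yaml、rest-server.properties、hugegraph.properties 等文件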

2 gremlin-server.yaml

gremlin-server.yaml 文件默认的内容如下:

# host and port of gremlin server, need to be consistent with host and port in rest-server.properties
 #host: 127.0.0.1
 #port: 8182
 
-# Gremlin查询中的超时时间(以毫秒为单位)
+# Gremlin 查询中的超时时间(以毫秒为单位)
 evaluationTimeout: 30000
 
 channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
@@ -1572,7 +1572,7 @@
 }
 

上面的配置项很多,但目前只需要关注如下几个配置项:channelizer 和 graphs。

默认GremlinServer是服务在 localhost:8182,如果需要修改,配置 host、port 即可

同时需要在 rest-server.properties 中增加对应的配置项 gremlinserver.url=http://host:port

3 rest-server.properties

rest-server.properties 文件的默认内容如下:

# bind url
+推荐使用 HTTP 的通信方式,HugeGraph 的外围组件都是基于 HTTP 实现的;

默认 GremlinServer 是服务在 localhost:8182,如果需要修改,配置 host、port 即可

  • host:部署 GremlinServer 机器的机器名或 IP,目前 HugeGraphServer 不支持分布式部署,且 GremlinServer 不直接暴露给用户;
  • port:部署 GremlinServer 机器的端口;

同时需要在 rest-server.properties 中增加对应的配置项 gremlinserver.url=http://host:port
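例如,假设把 GremlinServer 改为监听 192.168.1.10:8183(地址仅为示意),两处配置需要保持一致,可参考下面的示例操作:

# gremlin-server.yaml 中修改(示例):host: 192.168.1.10,port: 8183
# rest-server.properties 中追加对应的配置项
echo "gremlinserver.url=http://192.168.1.10:8183" >> conf/rest-server.properties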

3 rest-server.properties

rest-server.properties 文件的默认内容如下:

# bind url
 restserver.url=http://127.0.0.1:8080
 # gremlin server url, need to be consistent with host and port in gremlin-server.yaml
 #gremlinserver.url=http://127.0.0.1:8182
@@ -1588,7 +1588,7 @@
 server.id=server-1
 server.role=master
 
  • restserver.url:RestServer 提供服务的 url,根据实际环境修改;
  • graphs:RestServer 启动时也需要打开图,该项为 map 结构,key 是图的名字,value 是该图的配置文件路径;

注意:gremlin-server.yaml 和 rest-server.properties 都包含 graphs 配置项,而 init-store 命令是根据 gremlin-server.yaml 的 graphs 下的图进行初始化的。

配置项 gremlinserver.url 是 GremlinServer 为 RestServer 提供服务的 url,该配置项默认为 http://localhost:8182,如需修改,需要和 gremlin-server.yaml 中的 host 和 port 相匹配;
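服务启动后,可以用类似下面的请求做一次简单的连通性检查,确认 RestServer 已按 restserver.url 正常提供服务(地址与返回内容以实际环境和版本为准):

curl http://127.0.0.1:8080/graphs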

4 hugegraph.properties

hugegraph.properties 是一类文件,因为如果系统存在多个图,则会有多个相似的文件。该文件用来配置与图存储和查询相关的参数,文件的默认内容如下:

# gremlin entrence to create graph
-gremlin.graph=com.baidu.hugegraph.HugeFactory
+gremlin.graph=org.apache.hugegraph.HugeFactory
 
 # cache config
 #schema.cache_capacity=100000
@@ -1671,7 +1671,7 @@
 #palo.poll_interval=10
 #palo.temp_dir=./palo-data
 #palo.file_limit_size=32
-

重点关注未注释的几项:

  • gremlin.graph:GremlinServer 的启动入口,用户不要修改此项;
  • backend:使用的后端存储,可选值有 memory、cassandra、scylladb、mysql、hbase、postgresql 和 rocksdb;
  • serializer:主要为内部使用,用于将 schema、vertex 和 edge 序列化到后端,对应的可选值为 text、cassandra、scylladb 和 binary;(注:rocksdb后端值需是binary,其他后端backend与serializer值需保持一致,如hbase后端该值为hbase)
  • store:图存储到后端使用的数据库名,在 cassandra 和 scylladb 中就是 keyspace 名,此项的值与 GremlinServer 和 RestServer 中的图名并无关系,但是出于直观考虑,建议仍然使用相同的名字;
  • cassandra.host:backend 为 cassandra 或 scylladb 时此项才有意义,cassandra/scylladb 集群的 seeds;
  • cassandra.port:backend 为 cassandra 或 scylladb 时此项才有意义,cassandra/scylladb 集群的 native port;
  • rocksdb.data_path:backend 为 rocksdb 时此项才有意义,rocksdb 的数据目录
  • rocksdb.wal_path:backend 为 rocksdb 时此项才有意义,rocksdb 的日志目录
  • admin.token: 通过一个token来获取服务器的配置信息,例如:http://localhost:8080/graphs/hugegraph/conf?token=162f7848-0b6d-4faf-b557-3a0797869c55

5 多图配置

我们的系统是可以存在多个图的,并且各个图的后端可以不一样,比如图 hugegraph_rocksdb 和 hugegraph_mysql,其中 hugegraph_rocksdb 以 RocksDB 作为后端,hugegraph_mysql 以 MySQL 作为后端。

配置方法也很简单:

[可选]:修改 rest-server.properties

通过修改 rest-server.properties 中的 graphs 配置项来设置图的配置文件目录。默认配置为 graphs=./conf/graphs,如果想要修改为其它目录则调整 graphs 配置项,比如调整为 graphs=/etc/hugegraph/graphs,示例如下:

graphs=./conf/graphs
+

重点关注未注释的几项:

  • gremlin.graph:GremlinServer 的启动入口,用户不要修改此项;
  • backend:使用的后端存储,可选值有 memory、cassandra、scylladb、mysql、hbase、postgresql 和 rocksdb;
  • serializer:主要为内部使用,用于将 schema、vertex 和 edge 序列化到后端,对应的可选值为 text、cassandra、scylladb 和 binary;(注:rocksdb 后端值需是 binary,其他后端 backend 与 serializer 值需保持一致,如 hbase 后端该值为 hbase)
  • store:图存储到后端使用的数据库名,在 cassandra 和 scylladb 中就是 keyspace 名,此项的值与 GremlinServer 和 RestServer 中的图名并无关系,但是出于直观考虑,建议仍然使用相同的名字;
  • cassandra.host:backend 为 cassandra 或 scylladb 时此项才有意义,cassandra/scylladb 集群的 seeds;
  • cassandra.port:backend 为 cassandra 或 scylladb 时此项才有意义,cassandra/scylladb 集群的 native port;
  • rocksdb.data_path:backend 为 rocksdb 时此项才有意义,rocksdb 的数据目录
  • rocksdb.wal_path:backend 为 rocksdb 时此项才有意义,rocksdb 的日志目录
  • admin.token: 通过一个 token 来获取服务器的配置信息,例如:http://localhost:8080/graphs/hugegraph/conf?token=162f7848-0b6d-4faf-b557-3a0797869c55
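结合上面的几个配置项,下面给出一个使用 RocksDB 后端的最小配置示例(文件名、图名与数据目录均为示意,可按需调整):

cat > conf/hugegraph-rocksdb-demo.properties <<'EOF'
gremlin.graph=org.apache.hugegraph.HugeFactory
backend=rocksdb
serializer=binary
store=hugegraph
rocksdb.data_path=./rocksdb-data
rocksdb.wal_path=./rocksdb-data
EOF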

5 多图配置

我们的系统是可以存在多个图的,并且各个图的后端可以不一样,比如图 hugegraph_rocksdb 和 hugegraph_mysql,其中 hugegraph_rocksdb 以 RocksDB 作为后端,hugegraph_mysql 以 MySQL 作为后端。

配置方法也很简单:

[可选]:修改 rest-server.properties

通过修改 rest-server.properties 中的 graphs 配置项来设置图的配置文件目录。默认配置为 graphs=./conf/graphs,如果想要修改为其它目录则调整 graphs 配置项,比如调整为 graphs=/etc/hugegraph/graphs,示例如下:

graphs=./conf/graphs
 

conf/graphs 路径下基于 hugegraph.properties 修改得到 hugegraph_mysql_backend.properties 和 hugegraph_rocksdb_backend.properties

hugegraph_mysql_backend.properties 修改的部分如下:

backend=mysql
 serializer=mysql
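hugegraph_rocksdb_backend.properties 的做法类似(backend=rocksdb、serializer=binary)。下面是基于模板生成这两份配置文件的示例操作(路径仅为示意):

cp conf/hugegraph.properties conf/graphs/hugegraph_mysql_backend.properties
cp conf/hugegraph.properties conf/graphs/hugegraph_rocksdb_backend.properties
# 再分别修改各自的 backend、serializer、store 等配置项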
 
@@ -1719,32 +1719,32 @@
 
curl http://127.0.0.1:8080/graphs/hugegraph_rocksdb_backend
 
 {"name":"hugegraph_rocksdb","backend":"rocksdb"}
-

4.2 - HugeGraph 配置项

Gremlin Server 配置项

对应配置文件gremlin-server.yaml

config option | default value | description
host | 127.0.0.1 | The host or ip of Gremlin Server.
port | 8182 | The listening port of Gremlin Server.
graphs | hugegraph: conf/hugegraph.properties | The map of graphs with name and config file path.
scriptEvaluationTimeout | 30000 | The timeout for gremlin script execution(millisecond).
channelizer | org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer | Indicates the protocol which the Gremlin Server provides service.
authentication | authenticator: com.baidu.hugegraph.auth.StandardAuthenticator, config: {tokens: conf/rest-server.properties} | The authenticator and config(contains tokens path) of authentication mechanism.

Rest Server & API 配置项

对应配置文件rest-server.properties

config optiondefault valuedescription
graphs[hugegraph:conf/hugegraph.properties]The map of graphs’ name and config file.
server.idserver-1The id of rest server, used for license verification.
server.rolemasterThe role of nodes in the cluster, available types are [master, worker, computer]
restserver.urlhttp://127.0.0.1:8080The url for listening of rest server.
ssl.keystore_fileserver.keystoreThe path of server keystore file used when https protocol is enabled.
ssl.keystore_passwordThe password of the path of the server keystore file used when the https protocol is enabled.
restserver.max_worker_threads2 * CPUsThe maximum worker threads of rest server.
restserver.min_free_memory64The minimum free memory(MB) of rest server, requests will be rejected when the available memory of system is lower than this value.
restserver.request_timeout30The time in seconds within which a request must complete, -1 means no timeout.
restserver.connection_idle_timeout30The time in seconds to keep an inactive connection alive, -1 means no timeout.
restserver.connection_max_requests256The max number of HTTP requests allowed to be processed on one keep-alive connection, -1 means unlimited.
gremlinserver.urlhttp://127.0.0.1:8182The url of gremlin server.
gremlinserver.max_route8The max route number for gremlin server.
gremlinserver.timeout30The timeout in seconds of waiting for gremlin server.
batch.max_edges_per_batch500The maximum number of edges submitted per batch.
batch.max_vertices_per_batch500The maximum number of vertices submitted per batch.
batch.max_write_ratio50The maximum thread ratio for batch writing, only take effect if the batch.max_write_threads is 0.
batch.max_write_threads0The maximum threads for batch writing, if the value is 0, the actual value will be set to batch.max_write_ratio * restserver.max_worker_threads.
auth.authenticatorThe class path of authenticator implementation. e.g., com.baidu.hugegraph.auth.StandardAuthenticator, or com.baidu.hugegraph.auth.ConfigAuthenticator.
auth.admin_token162f7848-0b6d-4faf-b557-3a0797869c55Token for administrator operations, only for com.baidu.hugegraph.auth.ConfigAuthenticator.
auth.graph_storehugegraphThe name of graph used to store authentication information, like users, only for com.baidu.hugegraph.auth.StandardAuthenticator.
auth.user_tokens[hugegraph:9fd95c9c-711b-415b-b85f-d4df46ba5c31]The map of user tokens with name and password, only for com.baidu.hugegraph.auth.ConfigAuthenticator.
auth.audit_log_rate1000.0The max rate of audit log output per user, default value is 1000 records per second.
auth.cache_capacity10240The max cache capacity of each auth cache item.
auth.cache_expire600The expiration time in seconds of vertex cache.
auth.remote_urlIf the address is empty, it provides auth service, otherwise it is an auth client and also provides auth service through rpc forwarding. The remote url can be set to multiple addresses, which are concatenated by ‘,’.
auth.token_expire86400The expiration time in seconds after token created
auth.token_secretFXQXbJtbCLxODc6tGci732pkH1cyf8QgSecret key of HS256 algorithm.
exception.allow_tracefalseWhether to allow exception trace stack.

基本配置项

基本配置项及后端配置项对应配置文件:{graph-name}.properties,如hugegraph.properties

config optiondefault valuedescription
gremlin.graphcom.baidu.hugegraph.HugeFactoryGremlin entrance to create graph.
backendrocksdbThe data store type, available values are [memory, rocksdb, cassandra, scylladb, hbase, mysql].
serializerbinaryThe serializer for backend store, available values are [text, binary, cassandra, hbase, mysql].
storehugegraphThe database name like Cassandra Keyspace.
store.connection_detect_interval600The interval in seconds for detecting connections, if the idle time of a connection exceeds this value, detect it and reconnect if needed before using, value 0 means detecting every time.
store.graphgThe graph table name, which store vertex, edge and property.
store.schemamThe schema table name, which store meta data.
store.systemsThe system table name, which store system data.
schema.illegal_name_regex.\s+$|~.The regex specified the illegal format for schema name.
schema.cache_capacity10000The max cache size(items) of schema cache.
vertex.cache_typel2The type of vertex cache, allowed values are [l1, l2].
vertex.cache_capacity10000000The max cache size(items) of vertex cache.
vertex.cache_expire600The expire time in seconds of vertex cache.
vertex.check_customized_id_existfalseWhether to check the vertices exist for those using customized id strategy.
vertex.default_labelvertexThe default vertex label.
vertex.tx_capacity10000The max size(items) of vertices(uncommitted) in transaction.
vertex.check_adjacent_vertex_existfalseWhether to check the adjacent vertices of edges exist.
vertex.lazy_load_adjacent_vertextrueWhether to lazy load adjacent vertices of edges.
vertex.part_edge_commit_size5000Whether to enable the mode to commit part of edges of vertex, enabled if commit size > 0, 0 means disabled.
vertex.encode_primary_key_numbertrueWhether to encode number value of primary key in vertex id.
vertex.remove_left_index_at_overwritefalseWhether remove left index at overwrite.
edge.cache_typel2The type of edge cache, allowed values are [l1, l2].
edge.cache_capacity1000000The max cache size(items) of edge cache.
edge.cache_expire600The expiration time in seconds of edge cache.
edge.tx_capacity10000The max size(items) of edges(uncommitted) in transaction.
query.page_size500The size of each page when querying by paging.
query.batch_size1000The size of each batch when querying by batch.
query.ignore_invalid_datatrueWhether to ignore invalid data of vertex or edge.
query.index_intersect_threshold1000The maximum number of intermediate results to intersect indexes when querying by multiple single index properties.
query.ramtable_edges_capacity20000000The maximum number of edges in ramtable, include OUT and IN edges.
query.ramtable_enablefalseWhether to enable ramtable for query of adjacent edges.
query.ramtable_vertices_capacity10000000The maximum number of vertices in ramtable, generally the largest vertex id is used as capacity.
query.optimize_aggregate_by_indexfalseWhether to optimize aggregate query(like count) by index.
oltp.concurrent_depth10The min depth to enable concurrent oltp algorithm.
oltp.concurrent_threads10Thread number to concurrently execute oltp algorithm.
oltp.collection_typeECThe implementation type of collections used in oltp algorithm.
rate_limit.read0The max rate(times/s) to execute query of vertices/edges.
rate_limit.write0The max rate(items/s) to add/update/delete vertices/edges.
task.wait_timeout10Timeout in seconds for waiting for the task to complete,such as when truncating or clearing the backend.
task.input_size_limit16777216The job input size limit in bytes.
task.result_size_limit16777216The job result size limit in bytes.
task.sync_deletionfalseWhether to delete schema or expired data synchronously.
task.ttl_delete_batch1The batch size used to delete expired data.
computer.config/conf/computer.yamlThe config file path of computer job.
search.text_analyzerikanalyzerChoose a text analyzer for searching the vertex/edge properties, available type are [word, ansj, hanlp, smartcn, jieba, jcseg, mmseg4j, ikanalyzer]. # if use ‘ikanalyzer’, need download jar from ‘https://github.com/apache/hugegraph-doc/raw/ik_binary/dist/server/ikanalyzer-2012_u6.jar' to lib directory
search.text_analyzer_modesmartSpecify the mode for the text analyzer, the available mode of analyzer are {word: [MaximumMatching, ReverseMaximumMatching, MinimumMatching, ReverseMinimumMatching, BidirectionalMaximumMatching, BidirectionalMinimumMatching, BidirectionalMaximumMinimumMatching, FullSegmentation, MinimalWordCount, MaxNgramScore, PureEnglish], ansj: [BaseAnalysis, IndexAnalysis, ToAnalysis, NlpAnalysis], hanlp: [standard, nlp, index, nShort, shortest, speed], smartcn: [], jieba: [SEARCH, INDEX], jcseg: [Simple, Complex], mmseg4j: [Simple, Complex, MaxWord], ikanalyzer: [smart, max_word]}.
snowflake.datecenter_id0The datacenter id of snowflake id generator.
snowflake.force_stringfalseWhether to force the snowflake long id to be a string.
snowflake.worker_id0The worker id of snowflake id generator.
raft.modefalseWhether the backend storage works in raft mode.
raft.safe_readfalseWhether to use linearly consistent read.
raft.use_snapshotfalseWhether to use snapshot.
raft.endpoint127.0.0.1:8281The peerid of current raft node.
raft.group_peers127.0.0.1:8281,127.0.0.1:8282,127.0.0.1:8283The peers of current raft group.
raft.path./raft-logThe log path of current raft node.
raft.use_replicator_pipelinetrueWhether to use replicator line, when turned on it multiple logs can be sent in parallel, and the next log doesn’t have to wait for the ack message of the current log to be sent.
raft.election_timeout10000Timeout in milliseconds to launch a round of election.
raft.snapshot_interval3600The interval in seconds to trigger snapshot save.
raft.backend_threadscurrent CPU v-coresThe thread number used to apply task to backend.
raft.read_index_threads8The thread number used to execute reading index.
raft.apply_batch1The apply batch size to trigger disruptor event handler.
raft.queue_size16384The disruptor buffers size for jraft RaftNode, StateMachine and LogManager.
raft.queue_publish_timeout60The timeout in second when publish event into disruptor.
raft.rpc_threads80The rpc threads for jraft RPC layer.
raft.rpc_connect_timeout5000The rpc connect timeout for jraft rpc.
raft.rpc_timeout60000The rpc timeout for jraft rpc.
raft.rpc_buf_low_water_mark10485760The ChannelOutboundBuffer’s low water mark of netty, when buffer size less than this size, the method ChannelOutboundBuffer.isWritable() will return true, it means that low downstream pressure or good network.
raft.rpc_buf_high_water_mark20971520The ChannelOutboundBuffer’s high water mark of netty, only when buffer size exceed this size, the method ChannelOutboundBuffer.isWritable() will return false, it means that the downstream pressure is too great to process the request or network is very congestion, upstream needs to limit rate at this time.
raft.read_strategyReadOnlyLeaseBasedThe linearizability of read strategy.

RPC server 配置

config optiondefault valuedescription
rpc.client_connect_timeout20The timeout(in seconds) of rpc client connect to rpc server.
rpc.client_load_balancerconsistentHashThe rpc client uses a load-balancing algorithm to access multiple rpc servers in one cluster. Default value is ‘consistentHash’, means forwarding by request parameters.
rpc.client_read_timeout40The timeout(in seconds) of rpc client read from rpc server.
rpc.client_reconnect_period10The period(in seconds) of rpc client reconnect to rpc server.
rpc.client_retries3Failed retry number of rpc client calls to rpc server.
rpc.config_order999Sofa rpc configuration file loading order, the larger the more later loading.
rpc.logger_implcom.alipay.sofa.rpc.log.SLF4JLoggerImplSofa rpc log implementation class.
rpc.protocolboltRpc communication protocol, client and server need to be specified the same value.
rpc.remote_urlThe remote urls of rpc peers, it can be set to multiple addresses, which are concat by ‘,’, empty value means not enabled.
rpc.server_adaptive_portfalseWhether the bound port is adaptive, if it’s enabled, when the port is in use, automatically +1 to detect the next available port. Note that this process is not atomic, so there may still be port conflicts.
rpc.server_hostThe hosts/ips bound by rpc server to provide services, empty value means not enabled.
rpc.server_port8090The port bound by rpc server to provide services.
rpc.server_timeout30The timeout(in seconds) of rpc server execution.

Cassandra 后端配置项

config optiondefault valuedescription
backendMust be set to cassandra.
serializerMust be set to cassandra.
cassandra.hostlocalhostThe seeds hostname or ip address of cassandra cluster.
cassandra.port9042The seeds port address of cassandra cluster.
cassandra.connect_timeout5The cassandra driver connect server timeout(seconds).
cassandra.read_timeout20The cassandra driver read from server timeout(seconds).
cassandra.keyspace.strategySimpleStrategyThe replication strategy of keyspace, valid value is SimpleStrategy or NetworkTopologyStrategy.
cassandra.keyspace.replication[3]The keyspace replication factor of SimpleStrategy, like ‘[3]’.Or replicas in each datacenter of NetworkTopologyStrategy, like ‘[dc1:2,dc2:1]’.
cassandra.usernameThe username to use to login to cassandra cluster.
cassandra.passwordThe password corresponding to cassandra.username.
cassandra.compression_typenoneThe compression algorithm of cassandra transport: none/snappy/lz4.
cassandra.jmx_port=71997199The port of JMX API service for cassandra.
cassandra.aggregation_timeout43200The timeout in seconds of waiting for aggregation.

ScyllaDB 后端配置项

config optiondefault valuedescription
backendMust be set to scylladb.
serializerMust be set to scylladb.

其它与 Cassandra 后端一致。

RocksDB 后端配置项

config optiondefault valuedescription
backendMust be set to rocksdb.
serializerMust be set to binary.
rocksdb.data_disks[]The optimized disks for storing data of RocksDB. The format of each element: STORE/TABLE: /path/disk.Allowed keys are [g/vertex, g/edge_out, g/edge_in, g/vertex_label_index, g/edge_label_index, g/range_int_index, g/range_float_index, g/range_long_index, g/range_double_index, g/secondary_index, g/search_index, g/shard_index, g/unique_index, g/olap]
rocksdb.data_pathrocksdb-dataThe path for storing data of RocksDB.
rocksdb.wal_pathrocksdb-dataThe path for storing WAL of RocksDB.
rocksdb.allow_mmap_readsfalseAllow the OS to mmap file for reading sst tables.
rocksdb.allow_mmap_writesfalseAllow the OS to mmap file for writing.
rocksdb.block_cache_capacity8388608The amount of block cache in bytes that will be used by RocksDB, 0 means no block cache.
rocksdb.bloom_filter_bits_per_key-1The bits per key in bloom filter, a good value is 10, which yields a filter with ~ 1% false positive rate, -1 means no bloom filter.
rocksdb.bloom_filter_block_based_modefalseUse block based filter rather than full filter.
rocksdb.bloom_filter_whole_key_filteringtrueTrue if place whole keys in the bloom filter, else place the prefix of keys.
rocksdb.bottommost_compressionNO_COMPRESSIONThe compression algorithm for the bottommost level of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd.
rocksdb.bulkload_modefalseSwitch to the mode to bulk load data into RocksDB.
rocksdb.cache_index_and_filter_blocksfalseIndicating if we’d put index/filter blocks to the block cache.
rocksdb.compaction_styleLEVELSet compaction style for RocksDB: LEVEL/UNIVERSAL/FIFO.
rocksdb.compressionSNAPPY_COMPRESSIONThe compression algorithm for compressing blocks of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd.
rocksdb.compression_per_level[NO_COMPRESSION, NO_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION]The compression algorithms for different levels of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd.
rocksdb.delayed_write_rate16777216The rate limit in bytes/s of user write requests when need to slow down if the compaction gets behind.
rocksdb.log_levelINFOThe info log level of RocksDB.
rocksdb.max_background_jobs8Maximum number of concurrent background jobs, including flushes and compactions.
rocksdb.level_compaction_dynamic_level_bytesfalseWhether to enable level_compaction_dynamic_level_bytes, if it’s enabled we give max_bytes_for_level_multiplier a priority against max_bytes_for_level_base, the bytes of base level is dynamic for a more predictable LSM tree, it is useful to limit worse case space amplification. Turning this feature on/off for an existing DB can cause unexpected LSM tree structure so it’s not recommended.
rocksdb.max_bytes_for_level_base536870912The upper-bound of the total size of level-1 files in bytes.
rocksdb.max_bytes_for_level_multiplier10.0The ratio between the total size of level (L+1) files and the total size of level L files for all L.
rocksdb.max_open_files-1The maximum number of open files that can be cached by RocksDB, -1 means no limit.
rocksdb.max_subcompactions4The value represents the maximum number of threads per compaction job.
rocksdb.max_write_buffer_number6The maximum number of write buffers that are built up in memory.
rocksdb.max_write_buffer_number_to_maintain0The total maximum number of write buffers to maintain in memory.
rocksdb.min_write_buffer_number_to_merge2The minimum number of write buffers that will be merged together.
rocksdb.num_levels7Set the number of levels for this database.
rocksdb.optimize_filters_for_hitsfalseThis flag allows us to not store filters for the last level.
rocksdb.optimize_modetrueOptimize for heavy workloads and big datasets.
rocksdb.pin_l0_filter_and_index_blocks_in_cachefalseIndicating if we’d put index/filter blocks to the block cache.
rocksdb.sst_pathThe path for ingesting SST file into RocksDB.
rocksdb.target_file_size_base67108864The target file size for compaction in bytes.
rocksdb.target_file_size_multiplier1The size ratio between a level L file and a level (L+1) file.
rocksdb.use_direct_io_for_flush_and_compactionfalseEnable the OS to use direct read/writes in flush and compaction.
rocksdb.use_direct_readsfalseEnable the OS to use direct I/O for reading sst tables.
rocksdb.write_buffer_size134217728Amount of data in bytes to build up in memory.
rocksdb.max_manifest_file_size104857600The max size of manifest file in bytes.
rocksdb.skip_stats_update_on_db_openfalseWhether to skip statistics update when opening the database, setting this flag true allows us to not update statistics.
rocksdb.max_file_opening_threads16The max number of threads used to open files.
rocksdb.max_total_wal_size0Total size of WAL files in bytes. Once WALs exceed this size, we will start forcing the flush of column families related, 0 means no limit.
rocksdb.db_write_buffer_size0Total size of write buffers in bytes across all column families, 0 means no limit.
rocksdb.delete_obsolete_files_period21600The periodicity in seconds when obsolete files get deleted, 0 means always do full purge.
rocksdb.hard_pending_compaction_bytes_limit274877906944The hard limit to impose on pending compaction in bytes.
rocksdb.level0_file_num_compaction_trigger2Number of files to trigger level-0 compaction.
rocksdb.level0_slowdown_writes_trigger20Soft limit on number of level-0 files for slowing down writes.
rocksdb.level0_stop_writes_trigger36Hard limit on number of level-0 files for stopping writes.
rocksdb.soft_pending_compaction_bytes_limit68719476736The soft limit to impose on pending compaction in bytes.

HBase 后端配置项

config optiondefault valuedescription
backendMust be set to hbase.
serializerMust be set to hbase.
hbase.hostslocalhostThe hostnames or ip addresses of HBase zookeeper, separated with commas.
hbase.port2181The port address of HBase zookeeper.
hbase.threads_max64The max threads num of hbase connections.
hbase.znode_parent/hbaseThe znode parent path of HBase zookeeper.
hbase.zk_retry3The recovery retry times of HBase zookeeper.
hbase.aggregation_timeout43200The timeout in seconds of waiting for aggregation.
hbase.kerberos_enablefalseIs Kerberos authentication enabled for HBase.
hbase.kerberos_keytabThe HBase’s key tab file for kerberos authentication.
hbase.kerberos_principalThe HBase’s principal for kerberos authentication.
hbase.krb5_confetc/krb5.confKerberos configuration file, including KDC IP, default realm, etc.
hbase.hbase_site/etc/hbase/conf/hbase-site.xmlThe HBase’s configuration file
hbase.enable_partitiontrueIs pre-split partitions enabled for HBase.
hbase.vertex_partitions10The number of partitions of the HBase vertex table.
hbase.edge_partitions30The number of partitions of the HBase edge table.

MySQL & PostgreSQL 后端配置项

config optiondefault valuedescription
backendMust be set to mysql.
serializerMust be set to mysql.
jdbc.drivercom.mysql.jdbc.DriverThe JDBC driver class to connect database.
jdbc.urljdbc:mysql://127.0.0.1:3306The url of database in JDBC format.
jdbc.usernamerootThe username to login database.
jdbc.password******The password corresponding to jdbc.username.
jdbc.ssl_modefalseThe SSL mode of connections with database.
jdbc.reconnect_interval3The interval(seconds) between reconnections when the database connection fails.
jdbc.reconnect_max_times3The reconnect times when the database connection fails.
jdbc.storage_engineInnoDBThe storage engine of backend store database, like InnoDB/MyISAM/RocksDB for MySQL.
jdbc.postgresql.connect_databasetemplate1The database used to connect when init store, drop store or check store exist.

PostgreSQL 后端配置项

config optiondefault valuedescription
backendMust be set to postgresql.
serializerMust be set to postgresql.

其它与 MySQL 后端一致。

PostgreSQL 后端的 driver 和 url 应该设置为:

  • jdbc.driver=org.postgresql.Driver
  • jdbc.url=jdbc:postgresql://localhost:5432/

4.3 - HugeGraph 内置用户权限与扩展权限配置及使用

概述

HugeGraph 为了方便不同用户场景下的鉴权使用,目前内置了两套权限模式:

  1. 简单的ConfigAuthenticator模式,通过本地配置文件存储用户名和密码 (仅支持单 GraphServer)
  2. 完备的StandardAuthenticator模式,支持多用户认证、以及细粒度的权限访问控制,采用基于 “用户-用户组-操作-资源” 的 4 层设计,灵活控制用户角色与权限 (支持多 GraphServer)

其中 StandardAuthenticator 模式的几个核心设计:

  • 初始化时创建超级管理员 (admin) 用户,后续通过超级管理员创建其它用户,新创建的用户被分配足够权限后,可以创建或管理更多的用户
  • 支持动态创建用户、用户组、资源,支持动态分配或取消权限
  • 用户可以属于一个或多个用户组,每个用户组可以拥有对任意个资源的操作权限,操作类型包括:读、写、删除、执行等种类
  • “资源” 描述了图数据库中的数据,比如符合某一类条件的顶点,每一个资源包括 type、label、properties 三个要素,共有 18 种类型、任意 label、任意 properties 可组合形成的资源,一个资源的内部条件是且关系,多个资源之间的条件是或关系

举例说明:

// 场景:某用户只有北京地区的数据读取权限
+

4.2 - HugeGraph 配置项

Gremlin Server 配置项

对应配置文件gremlin-server.yaml

config option | default value | description
host | 127.0.0.1 | The host or ip of Gremlin Server.
port | 8182 | The listening port of Gremlin Server.
graphs | hugegraph: conf/hugegraph.properties | The map of graphs with name and config file path.
scriptEvaluationTimeout | 30000 | The timeout for gremlin script execution(millisecond).
channelizer | org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer | Indicates the protocol which the Gremlin Server provides service.
authentication | authenticator: org.apache.hugegraph.auth.StandardAuthenticator, config: {tokens: conf/rest-server.properties} | The authenticator and config(contains tokens path) of authentication mechanism.

Rest Server & API 配置项

对应配置文件rest-server.properties

config optiondefault valuedescription
graphs[hugegraph:conf/hugegraph.properties]The map of graphs’ name and config file.
server.idserver-1The id of rest server, used for license verification.
server.rolemasterThe role of nodes in the cluster, available types are [master, worker, computer]
restserver.urlhttp://127.0.0.1:8080The url for listening of rest server.
ssl.keystore_fileserver.keystoreThe path of server keystore file used when https protocol is enabled.
ssl.keystore_passwordThe password of the path of the server keystore file used when the https protocol is enabled.
restserver.max_worker_threads2 * CPUsThe maximum worker threads of rest server.
restserver.min_free_memory64The minimum free memory(MB) of rest server, requests will be rejected when the available memory of system is lower than this value.
restserver.request_timeout30The time in seconds within which a request must complete, -1 means no timeout.
restserver.connection_idle_timeout30The time in seconds to keep an inactive connection alive, -1 means no timeout.
restserver.connection_max_requests256The max number of HTTP requests allowed to be processed on one keep-alive connection, -1 means unlimited.
gremlinserver.urlhttp://127.0.0.1:8182The url of gremlin server.
gremlinserver.max_route8The max route number for gremlin server.
gremlinserver.timeout30The timeout in seconds of waiting for gremlin server.
batch.max_edges_per_batch500The maximum number of edges submitted per batch.
batch.max_vertices_per_batch500The maximum number of vertices submitted per batch.
batch.max_write_ratio50The maximum thread ratio for batch writing, only take effect if the batch.max_write_threads is 0.
batch.max_write_threads0The maximum threads for batch writing, if the value is 0, the actual value will be set to batch.max_write_ratio * restserver.max_worker_threads.
auth.authenticatorThe class path of authenticator implementation. e.g., org.apache.hugegraph.auth.StandardAuthenticator, or org.apache.hugegraph.auth.ConfigAuthenticator.
auth.admin_token162f7848-0b6d-4faf-b557-3a0797869c55Token for administrator operations, only for org.apache.hugegraph.auth.ConfigAuthenticator.
auth.graph_storehugegraphThe name of graph used to store authentication information, like users, only for org.apache.hugegraph.auth.StandardAuthenticator.
auth.user_tokens[hugegraph:9fd95c9c-711b-415b-b85f-d4df46ba5c31]The map of user tokens with name and password, only for org.apache.hugegraph.auth.ConfigAuthenticator.
auth.audit_log_rate1000.0The max rate of audit log output per user, default value is 1000 records per second.
auth.cache_capacity10240The max cache capacity of each auth cache item.
auth.cache_expire600The expiration time in seconds of vertex cache.
auth.remote_urlIf the address is empty, it provides auth service, otherwise it is an auth client and also provides auth service through rpc forwarding. The remote url can be set to multiple addresses, which are concatenated by ‘,’.
auth.token_expire86400The expiration time in seconds after token created
auth.token_secretFXQXbJtbCLxODc6tGci732pkH1cyf8QgSecret key of HS256 algorithm.
exception.allow_tracefalseWhether to allow exception trace stack.

基本配置项

基本配置项及后端配置项对应配置文件:{graph-name}.properties,如hugegraph.properties

config optiondefault valuedescription
gremlin.graphorg.apache.hugegraph.HugeFactoryGremlin entrance to create graph.
backendrocksdbThe data store type, available values are [memory, rocksdb, cassandra, scylladb, hbase, mysql].
serializerbinaryThe serializer for backend store, available values are [text, binary, cassandra, hbase, mysql].
storehugegraphThe database name like Cassandra Keyspace.
store.connection_detect_interval600The interval in seconds for detecting connections, if the idle time of a connection exceeds this value, detect it and reconnect if needed before using, value 0 means detecting every time.
store.graphgThe graph table name, which store vertex, edge and property.
store.schemamThe schema table name, which store meta data.
store.systemsThe system table name, which store system data.
schema.illegal_name_regex.\s+$|~.The regex specified the illegal format for schema name.
schema.cache_capacity10000The max cache size(items) of schema cache.
vertex.cache_typel2The type of vertex cache, allowed values are [l1, l2].
vertex.cache_capacity10000000The max cache size(items) of vertex cache.
vertex.cache_expire600The expire time in seconds of vertex cache.
vertex.check_customized_id_existfalseWhether to check the vertices exist for those using customized id strategy.
vertex.default_labelvertexThe default vertex label.
vertex.tx_capacity10000The max size(items) of vertices(uncommitted) in transaction.
vertex.check_adjacent_vertex_existfalseWhether to check the adjacent vertices of edges exist.
vertex.lazy_load_adjacent_vertextrueWhether to lazy load adjacent vertices of edges.
vertex.part_edge_commit_size5000Whether to enable the mode to commit part of edges of vertex, enabled if commit size > 0, 0 means disabled.
vertex.encode_primary_key_numbertrueWhether to encode number value of primary key in vertex id.
vertex.remove_left_index_at_overwritefalseWhether remove left index at overwrite.
edge.cache_typel2The type of edge cache, allowed values are [l1, l2].
edge.cache_capacity1000000The max cache size(items) of edge cache.
edge.cache_expire600The expiration time in seconds of edge cache.
edge.tx_capacity10000The max size(items) of edges(uncommitted) in transaction.
query.page_size500The size of each page when querying by paging.
query.batch_size1000The size of each batch when querying by batch.
query.ignore_invalid_datatrueWhether to ignore invalid data of vertex or edge.
query.index_intersect_threshold1000The maximum number of intermediate results to intersect indexes when querying by multiple single index properties.
query.ramtable_edges_capacity20000000The maximum number of edges in ramtable, include OUT and IN edges.
query.ramtable_enablefalseWhether to enable ramtable for query of adjacent edges.
query.ramtable_vertices_capacity10000000The maximum number of vertices in ramtable, generally the largest vertex id is used as capacity.
query.optimize_aggregate_by_indexfalseWhether to optimize aggregate query(like count) by index.
oltp.concurrent_depth10The min depth to enable concurrent oltp algorithm.
oltp.concurrent_threads10Thread number to concurrently execute oltp algorithm.
oltp.collection_typeECThe implementation type of collections used in oltp algorithm.
rate_limit.read0The max rate(times/s) to execute query of vertices/edges.
rate_limit.write0The max rate(items/s) to add/update/delete vertices/edges.
task.wait_timeout10Timeout in seconds for waiting for the task to complete,such as when truncating or clearing the backend.
task.input_size_limit16777216The job input size limit in bytes.
task.result_size_limit16777216The job result size limit in bytes.
task.sync_deletionfalseWhether to delete schema or expired data synchronously.
task.ttl_delete_batch1The batch size used to delete expired data.
computer.config/conf/computer.yamlThe config file path of computer job.
search.text_analyzerikanalyzerChoose a text analyzer for searching the vertex/edge properties, available type are [word, ansj, hanlp, smartcn, jieba, jcseg, mmseg4j, ikanalyzer]. # if use ‘ikanalyzer’, need download jar from ‘https://github.com/apache/hugegraph-doc/raw/ik_binary/dist/server/ikanalyzer-2012_u6.jar' to lib directory
search.text_analyzer_modesmartSpecify the mode for the text analyzer, the available mode of analyzer are {word: [MaximumMatching, ReverseMaximumMatching, MinimumMatching, ReverseMinimumMatching, BidirectionalMaximumMatching, BidirectionalMinimumMatching, BidirectionalMaximumMinimumMatching, FullSegmentation, MinimalWordCount, MaxNgramScore, PureEnglish], ansj: [BaseAnalysis, IndexAnalysis, ToAnalysis, NlpAnalysis], hanlp: [standard, nlp, index, nShort, shortest, speed], smartcn: [], jieba: [SEARCH, INDEX], jcseg: [Simple, Complex], mmseg4j: [Simple, Complex, MaxWord], ikanalyzer: [smart, max_word]}.
snowflake.datecenter_id0The datacenter id of snowflake id generator.
snowflake.force_stringfalseWhether to force the snowflake long id to be a string.
snowflake.worker_id0The worker id of snowflake id generator.
raft.modefalseWhether the backend storage works in raft mode.
raft.safe_readfalseWhether to use linearly consistent read.
raft.use_snapshotfalseWhether to use snapshot.
raft.endpoint127.0.0.1:8281The peerid of current raft node.
raft.group_peers127.0.0.1:8281,127.0.0.1:8282,127.0.0.1:8283The peers of current raft group.
raft.path./raft-logThe log path of current raft node.
raft.use_replicator_pipelinetrueWhether to use replicator line, when turned on it multiple logs can be sent in parallel, and the next log doesn’t have to wait for the ack message of the current log to be sent.
raft.election_timeout10000Timeout in milliseconds to launch a round of election.
raft.snapshot_interval3600The interval in seconds to trigger snapshot save.
raft.backend_threadscurrent CPU v-coresThe thread number used to apply task to backend.
raft.read_index_threads8The thread number used to execute reading index.
raft.apply_batch1The apply batch size to trigger disruptor event handler.
raft.queue_size16384The disruptor buffers size for jraft RaftNode, StateMachine and LogManager.
raft.queue_publish_timeout60The timeout in second when publish event into disruptor.
raft.rpc_threads80The rpc threads for jraft RPC layer.
raft.rpc_connect_timeout5000The rpc connect timeout for jraft rpc.
raft.rpc_timeout60000The rpc timeout for jraft rpc.
raft.rpc_buf_low_water_mark10485760The ChannelOutboundBuffer’s low water mark of netty, when buffer size less than this size, the method ChannelOutboundBuffer.isWritable() will return true, it means that low downstream pressure or good network.
raft.rpc_buf_high_water_mark20971520The ChannelOutboundBuffer’s high water mark of netty, only when buffer size exceed this size, the method ChannelOutboundBuffer.isWritable() will return false, it means that the downstream pressure is too great to process the request or network is very congestion, upstream needs to limit rate at this time.
raft.read_strategyReadOnlyLeaseBasedThe linearizability of read strategy.

RPC server 配置

config optiondefault valuedescription
rpc.client_connect_timeout20The timeout(in seconds) of rpc client connect to rpc server.
rpc.client_load_balancerconsistentHashThe rpc client uses a load-balancing algorithm to access multiple rpc servers in one cluster. Default value is ‘consistentHash’, means forwarding by request parameters.
rpc.client_read_timeout40The timeout(in seconds) of rpc client read from rpc server.
rpc.client_reconnect_period10The period(in seconds) of rpc client reconnect to rpc server.
rpc.client_retries3Failed retry number of rpc client calls to rpc server.
rpc.config_order999Sofa rpc configuration file loading order, the larger the more later loading.
rpc.logger_implcom.alipay.sofa.rpc.log.SLF4JLoggerImplSofa rpc log implementation class.
rpc.protocolboltRpc communication protocol, client and server need to be specified the same value.
rpc.remote_urlThe remote urls of rpc peers, it can be set to multiple addresses, which are concat by ‘,’, empty value means not enabled.
rpc.server_adaptive_portfalseWhether the bound port is adaptive, if it’s enabled, when the port is in use, automatically +1 to detect the next available port. Note that this process is not atomic, so there may still be port conflicts.
rpc.server_hostThe hosts/ips bound by rpc server to provide services, empty value means not enabled.
rpc.server_port8090The port bound by rpc server to provide services.
rpc.server_timeout30The timeout(in seconds) of rpc server execution.

Cassandra 后端配置项

config optiondefault valuedescription
backendMust be set to cassandra.
serializerMust be set to cassandra.
cassandra.hostlocalhostThe seeds hostname or ip address of cassandra cluster.
cassandra.port9042The seeds port address of cassandra cluster.
cassandra.connect_timeout5The cassandra driver connect server timeout(seconds).
cassandra.read_timeout20The cassandra driver read from server timeout(seconds).
cassandra.keyspace.strategySimpleStrategyThe replication strategy of keyspace, valid value is SimpleStrategy or NetworkTopologyStrategy.
cassandra.keyspace.replication[3]The keyspace replication factor of SimpleStrategy, like ‘[3]’.Or replicas in each datacenter of NetworkTopologyStrategy, like ‘[dc1:2,dc2:1]’.
cassandra.usernameThe username to use to login to cassandra cluster.
cassandra.passwordThe password corresponding to cassandra.username.
cassandra.compression_typenoneThe compression algorithm of cassandra transport: none/snappy/lz4.
cassandra.jmx_port=71997199The port of JMX API service for cassandra.
cassandra.aggregation_timeout43200The timeout in seconds of waiting for aggregation.

ScyllaDB 后端配置项

config optiondefault valuedescription
backendMust be set to scylladb.
serializerMust be set to scylladb.

其它与 Cassandra 后端一致。

RocksDB 后端配置项

config optiondefault valuedescription
backendMust be set to rocksdb.
serializerMust be set to binary.
rocksdb.data_disks[]The optimized disks for storing data of RocksDB. The format of each element: STORE/TABLE: /path/disk.Allowed keys are [g/vertex, g/edge_out, g/edge_in, g/vertex_label_index, g/edge_label_index, g/range_int_index, g/range_float_index, g/range_long_index, g/range_double_index, g/secondary_index, g/search_index, g/shard_index, g/unique_index, g/olap]
rocksdb.data_pathrocksdb-dataThe path for storing data of RocksDB.
rocksdb.wal_pathrocksdb-dataThe path for storing WAL of RocksDB.
rocksdb.allow_mmap_readsfalseAllow the OS to mmap file for reading sst tables.
rocksdb.allow_mmap_writesfalseAllow the OS to mmap file for writing.
rocksdb.block_cache_capacity8388608The amount of block cache in bytes that will be used by RocksDB, 0 means no block cache.
rocksdb.bloom_filter_bits_per_key-1The bits per key in bloom filter, a good value is 10, which yields a filter with ~ 1% false positive rate, -1 means no bloom filter.
rocksdb.bloom_filter_block_based_modefalseUse block based filter rather than full filter.
rocksdb.bloom_filter_whole_key_filteringtrueTrue if place whole keys in the bloom filter, else place the prefix of keys.
rocksdb.bottommost_compressionNO_COMPRESSIONThe compression algorithm for the bottommost level of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd.
rocksdb.bulkload_modefalseSwitch to the mode to bulk load data into RocksDB.
rocksdb.cache_index_and_filter_blocksfalseIndicating if we’d put index/filter blocks to the block cache.
rocksdb.compaction_styleLEVELSet compaction style for RocksDB: LEVEL/UNIVERSAL/FIFO.
rocksdb.compressionSNAPPY_COMPRESSIONThe compression algorithm for compressing blocks of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd.
rocksdb.compression_per_level[NO_COMPRESSION, NO_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION]The compression algorithms for different levels of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd.
rocksdb.delayed_write_rate16777216The rate limit in bytes/s of user write requests when need to slow down if the compaction gets behind.
rocksdb.log_levelINFOThe info log level of RocksDB.
rocksdb.max_background_jobs8Maximum number of concurrent background jobs, including flushes and compactions.
rocksdb.level_compaction_dynamic_level_bytesfalseWhether to enable level_compaction_dynamic_level_bytes, if it’s enabled we give max_bytes_for_level_multiplier a priority against max_bytes_for_level_base, the bytes of base level is dynamic for a more predictable LSM tree, it is useful to limit worse case space amplification. Turning this feature on/off for an existing DB can cause unexpected LSM tree structure so it’s not recommended.
rocksdb.max_bytes_for_level_base536870912The upper-bound of the total size of level-1 files in bytes.
rocksdb.max_bytes_for_level_multiplier10.0The ratio between the total size of level (L+1) files and the total size of level L files for all L.
rocksdb.max_open_files-1The maximum number of open files that can be cached by RocksDB, -1 means no limit.
rocksdb.max_subcompactions4The value represents the maximum number of threads per compaction job.
rocksdb.max_write_buffer_number6The maximum number of write buffers that are built up in memory.
rocksdb.max_write_buffer_number_to_maintain0The total maximum number of write buffers to maintain in memory.
rocksdb.min_write_buffer_number_to_merge2The minimum number of write buffers that will be merged together.
rocksdb.num_levels7Set the number of levels for this database.
rocksdb.optimize_filters_for_hitsfalseThis flag allows us to not store filters for the last level.
rocksdb.optimize_modetrueOptimize for heavy workloads and big datasets.
rocksdb.pin_l0_filter_and_index_blocks_in_cachefalseIndicating if we’d put index/filter blocks to the block cache.
rocksdb.sst_pathThe path for ingesting SST file into RocksDB.
rocksdb.target_file_size_base67108864The target file size for compaction in bytes.
rocksdb.target_file_size_multiplier1The size ratio between a level L file and a level (L+1) file.
rocksdb.use_direct_io_for_flush_and_compactionfalseEnable the OS to use direct read/writes in flush and compaction.
rocksdb.use_direct_readsfalseEnable the OS to use direct I/O for reading sst tables.
rocksdb.write_buffer_size134217728Amount of data in bytes to build up in memory.
rocksdb.max_manifest_file_size104857600The max size of manifest file in bytes.
rocksdb.skip_stats_update_on_db_openfalseWhether to skip statistics update when opening the database, setting this flag true allows us to not update statistics.
rocksdb.max_file_opening_threads16The max number of threads used to open files.
rocksdb.max_total_wal_size0Total size of WAL files in bytes. Once WALs exceed this size, we will start forcing the flush of column families related, 0 means no limit.
rocksdb.db_write_buffer_size0Total size of write buffers in bytes across all column families, 0 means no limit.
rocksdb.delete_obsolete_files_period21600The periodicity in seconds when obsolete files get deleted, 0 means always do full purge.
rocksdb.hard_pending_compaction_bytes_limit274877906944The hard limit to impose on pending compaction in bytes.
rocksdb.level0_file_num_compaction_trigger2Number of files to trigger level-0 compaction.
rocksdb.level0_slowdown_writes_trigger20Soft limit on number of level-0 files for slowing down writes.
rocksdb.level0_stop_writes_trigger36Hard limit on number of level-0 files for stopping writes.
rocksdb.soft_pending_compaction_bytes_limit68719476736The soft limit to impose on pending compaction in bytes.

HBase 后端配置项

config optiondefault valuedescription
backendMust be set to hbase.
serializerMust be set to hbase.
hbase.hostslocalhostThe hostnames or ip addresses of HBase zookeeper, separated with commas.
hbase.port2181The port address of HBase zookeeper.
hbase.threads_max64The max threads num of hbase connections.
hbase.znode_parent/hbaseThe znode parent path of HBase zookeeper.
hbase.zk_retry3The recovery retry times of HBase zookeeper.
hbase.aggregation_timeout43200The timeout in seconds of waiting for aggregation.
hbase.kerberos_enablefalseIs Kerberos authentication enabled for HBase.
hbase.kerberos_keytabThe HBase’s key tab file for kerberos authentication.
hbase.kerberos_principalThe HBase’s principal for kerberos authentication.
hbase.krb5_confetc/krb5.confKerberos configuration file, including KDC IP, default realm, etc.
hbase.hbase_site/etc/hbase/conf/hbase-site.xmlThe HBase’s configuration file
hbase.enable_partitiontrueIs pre-split partitions enabled for HBase.
hbase.vertex_partitions10The number of partitions of the HBase vertex table.
hbase.edge_partitions30The number of partitions of the HBase edge table.

MySQL & PostgreSQL 后端配置项

config optiondefault valuedescription
backendMust be set to mysql.
serializerMust be set to mysql.
jdbc.drivercom.mysql.jdbc.DriverThe JDBC driver class to connect database.
jdbc.urljdbc:mysql://127.0.0.1:3306The url of database in JDBC format.
jdbc.usernamerootThe username to login database.
jdbc.password******The password corresponding to jdbc.username.
jdbc.ssl_modefalseThe SSL mode of connections with database.
jdbc.reconnect_interval3The interval(seconds) between reconnections when the database connection fails.
jdbc.reconnect_max_times3The reconnect times when the database connection fails.
jdbc.storage_engineInnoDBThe storage engine of backend store database, like InnoDB/MyISAM/RocksDB for MySQL.
jdbc.postgresql.connect_databasetemplate1The database used to connect when init store, drop store or check store exist.

PostgreSQL 后端配置项

config optiondefault valuedescription
backendMust be set to postgresql.
serializerMust be set to postgresql.

其它与 MySQL 后端一致。

PostgreSQL 后端的 driver 和 url 应该设置为:

  • jdbc.driver=org.postgresql.Driver
  • jdbc.url=jdbc:postgresql://localhost:5432/

4.3 - HugeGraph 内置用户权限与扩展权限配置及使用

概述

HugeGraph 为了方便不同用户场景下的鉴权使用,目前内置了两套权限模式:

  1. 简单的ConfigAuthenticator模式,通过本地配置文件存储用户名和密码 (仅支持单 GraphServer)
  2. 完备的StandardAuthenticator模式,支持多用户认证、以及细粒度的权限访问控制,采用基于“用户 - 用户组 - 操作 - 资源”的 4 层设计,灵活控制用户角色与权限 (支持多 GraphServer)

其中 StandardAuthenticator 模式的几个核心设计:

  • 初始化时创建超级管理员 (admin) 用户,后续通过超级管理员创建其它用户,新创建的用户被分配足够权限后,可以创建或管理更多的用户
  • 支持动态创建用户、用户组、资源,支持动态分配或取消权限
  • 用户可以属于一个或多个用户组,每个用户组可以拥有对任意个资源的操作权限,操作类型包括:读、写、删除、执行等种类
  • “资源” 描述了图数据库中的数据,比如符合某一类条件的顶点,每一个资源包括 type、label、properties 三个要素,共有 18 种类型、任意 label、任意 properties 可组合形成的资源,一个资源的内部条件是且关系,多个资源之间的条件是或关系

举例说明:

// 场景:某用户只有北京地区的数据读取权限
 user(name=xx) -belong-> group(name=xx) -access(read)-> target(graph=graph1, resource={label: person, city: Beijing})
 

配置用户认证

HugeGraph 默认不启用用户认证功能,需通过修改配置文件来启用该功能。内置实现了 StandardAuthenticator 和 ConfigAuthenticator 两种模式,StandardAuthenticator 模式支持多用户认证与细粒度权限控制,ConfigAuthenticator 模式支持简单的用户权限认证。此外,开发者可以自定义实现 HugeAuthenticator 接口来对接自身的权限系统。

用户认证方式均采用 HTTP Basic Authentication ,简单说就是在发送 HTTP 请求时在 Authentication 设置选择 Basic 然后输入对应的用户名和密码,对应 HTTP 明文如下所示 :

GET http://localhost:8080/graphs/hugegraph/schema/vertexlabels
 Authorization: Basic admin xxxx
-
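用 curl 发送上述带 Basic Auth 的请求示例如下(用户名与密码仅为占位,-u 参数会自动生成 Authorization 头):

curl -u admin:xxxx "http://localhost:8080/graphs/hugegraph/schema/vertexlabels"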

StandardAuthenticator模式

StandardAuthenticator模式是通过在数据库后端存储用户信息来支持用户认证和权限控制,该实现基于数据库存储的用户的名称与密码进行认证(密码已被加密),基于用户的角色来细粒度控制用户权限。下面是具体的配置流程(重启服务生效):

在配置文件gremlin-server.yaml中配置authenticator及其rest-server文件路径:

authentication: {
-  authenticator: com.baidu.hugegraph.auth.StandardAuthenticator,
-  authenticationHandler: com.baidu.hugegraph.auth.WsAndHttpBasicAuthHandler,
+

StandardAuthenticator 模式

StandardAuthenticator模式是通过在数据库后端存储用户信息来支持用户认证和权限控制,该实现基于数据库存储的用户的名称与密码进行认证(密码已被加密),基于用户的角色来细粒度控制用户权限。下面是具体的配置流程(重启服务生效):

在配置文件gremlin-server.yaml中配置authenticator及其rest-server文件路径:

authentication: {
+  authenticator: org.apache.hugegraph.auth.StandardAuthenticator,
+  authenticationHandler: org.apache.hugegraph.auth.WsAndHttpBasicAuthHandler,
   config: {tokens: conf/rest-server.properties}
 }
-

在配置文件rest-server.properties中配置authenticator及其graph_store信息:

auth.authenticator=com.baidu.hugegraph.auth.StandardAuthenticator
+

在配置文件rest-server.properties中配置authenticator及其graph_store信息:

auth.authenticator=org.apache.hugegraph.auth.StandardAuthenticator
 auth.graph_store=hugegraph
 
 # auth client config
-# 如果是分开部署 GraphServer 和 AuthServer, 还需要指定下面的配置, 地址填写 AuthServer 的 IP:RPC 端口
+# 如果是分开部署 GraphServer 和 AuthServer, 还需要指定下面的配置,地址填写 AuthServer 的 IP:RPC 端口
 #auth.remote_url=127.0.0.1:8899,127.0.0.1:8898,127.0.0.1:8897
-

其中,graph_store配置项是指使用哪一个图来存储用户信息,如果存在多个图的话,选取任意一个均可。

在配置文件hugegraph{n}.properties中配置gremlin.graph信息:

gremlin.graph=com.baidu.hugegraph.auth.HugeFactoryAuthProxy
-

然后详细的权限 API 调用和说明请参考 Authentication-API 文档

ConfigAuthenticator模式

ConfigAuthenticator模式是通过预先在配置文件中设置用户信息来支持用户认证,该实现是基于配置好的静态tokens来验证用户是否合法。下面是具体的配置流程(重启服务生效):

在配置文件gremlin-server.yaml中配置authenticator及其rest-server文件路径:

authentication: {
-  authenticator: com.baidu.hugegraph.auth.ConfigAuthenticator,
-  authenticationHandler: com.baidu.hugegraph.auth.WsAndHttpBasicAuthHandler,
+

其中,graph_store配置项是指使用哪一个图来存储用户信息,如果存在多个图的话,选取任意一个均可。

在配置文件hugegraph{n}.properties中配置gremlin.graph信息:

gremlin.graph=org.apache.hugegraph.auth.HugeFactoryAuthProxy
+

然后详细的权限 API 调用和说明请参考 Authentication-API 文档

ConfigAuthenticator 模式

ConfigAuthenticator模式是通过预先在配置文件中设置用户信息来支持用户认证,该实现是基于配置好的静态tokens来验证用户是否合法。下面是具体的配置流程(重启服务生效):

在配置文件gremlin-server.yaml中配置authenticator及其rest-server文件路径:

authentication: {
+  authenticator: org.apache.hugegraph.auth.ConfigAuthenticator,
+  authenticationHandler: org.apache.hugegraph.auth.WsAndHttpBasicAuthHandler,
   config: {tokens: conf/rest-server.properties}
 }
-

在配置文件rest-server.properties中配置authenticator及其tokens信息:

auth.authenticator=com.baidu.hugegraph.auth.ConfigAuthenticator
+

在配置文件rest-server.properties中配置authenticator及其tokens信息:

auth.authenticator=org.apache.hugegraph.auth.ConfigAuthenticator
 auth.admin_token=token-value-a
 auth.user_tokens=[hugegraph1:token-value-1, hugegraph2:token-value-2]
-
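按上面的示例配置生效后,普通用户可尝试以“用户名 + 对应 token”作为 Basic Auth 凭证访问,管理员则使用 admin 与 auth.admin_token(下面的图名与 token 仅为示意,具体行为以 Authentication-API 文档为准):

curl -u hugegraph1:token-value-1 "http://localhost:8080/graphs/hugegraph/schema/vertexlabels"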

在配置文件hugegraph{n}.properties中配置gremlin.graph信息:

gremlin.graph=com.baidu.hugegraph.auth.HugeFactoryAuthProxy
-

自定义用户认证系统

如果需要支持更加灵活的用户系统,可自定义authenticator进行扩展,自定义authenticator实现接口com.baidu.hugegraph.auth.HugeAuthenticator即可,然后修改配置文件中authenticator配置项指向该实现。

4.4 - 配置 HugeGraphServer 使用 https 协议

概述

HugeGraphServer 默认使用的是 http 协议,如果用户对请求的安全性有要求,可以配置成 https。

服务端配置

修改 conf/rest-server.properties 配置文件,将 restserver.url 的 schema 部分改为 https。

# 将协议设置为 https
+

在配置文件hugegraph{n}.properties中配置gremlin.graph信息:

gremlin.graph=org.apache.hugegraph.auth.HugeFactoryAuthProxy
+

自定义用户认证系统

如果需要支持更加灵活的用户系统,可自定义 authenticator 进行扩展,自定义 authenticator 实现接口org.apache.hugegraph.auth.HugeAuthenticator即可,然后修改配置文件中authenticator配置项指向该实现。

4.4 - 配置 HugeGraphServer 使用 https 协议

概述

HugeGraphServer 默认使用的是 http 协议,如果用户对请求的安全性有要求,可以配置成 https。

服务端配置

修改 conf/rest-server.properties 配置文件,将 restserver.url 的 schema 部分改为 https。

# 将协议设置为 https
 restserver.url=https://127.0.0.1:8080
 # 服务端 keystore 文件路径,当协议为 https 时该默认值自动生效,可按需修改此项
 ssl.keystore_file=conf/hugegraph-server.keystore
@@ -5418,7 +5418,7 @@
 		"task_retries": 0,
 		"id": 2,
 		"task_type": "gremlin",
-		"task_callable": "com.baidu.hugegraph.api.job.GremlinAPI$GremlinJob",
+		"task_callable": "org.apache.hugegraph.api.job.GremlinAPI$GremlinJob",
 		"task_input": "{\"gremlin\":\"hugegraph.traversal().V()\",\"bindings\":{},\"language\":\"gremlin-groovy\",\"aliases\":{\"hugegraph\":\"graph\"}}"
 	}]
 }
@@ -5434,7 +5434,7 @@
 	"task_retries": 0,
 	"id": 2,
 	"task_type": "gremlin",
-	"task_callable": "com.baidu.hugegraph.api.job.GremlinAPI$GremlinJob",
+	"task_callable": "org.apache.hugegraph.api.job.GremlinAPI$GremlinJob",
 	"task_input": "{\"gremlin\":\"hugegraph.traversal().V()\",\"bindings\":{},\"language\":\"gremlin-groovy\",\"aliases\":{\"hugegraph\":\"graph\"}}"
 }
 

7.1.3 删除某个异步任务信息,不删除异步任务本身

Method & Url
DELETE http://localhost:8080/graphs/hugegraph/tasks/2
@@ -5449,7 +5449,7 @@
     "}" +
 "}"
 
Method & Url
PUT http://localhost:8080/graphs/hugegraph/tasks/2?action=cancel
-

请保证在10秒内发送该请求,如果超过10秒发送,任务可能已经执行完成,无法取消。

Response Status
202
+

请保证在 10 秒内发送该请求,如果超过 10 秒发送,任务可能已经执行完成,无法取消。

Response Status
202
 
Response Body
{
     "cancelled": true
 }
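
下面给出一个示意性的 Java 片段,按照上文的 PUT 请求取消 id 为 2 的任务;地址与任务 id 沿用上文示例,若服务端开启了认证,还需按前文方式附加 Authorization 头:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class CancelTaskExample {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://localhost:8080/graphs/hugegraph/tasks/2?action=cancel");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");                              // 对应上文的 PUT 取消请求
        System.out.println("status: " + conn.getResponseCode());  // 取消成功时预期返回 202
        conn.disconnect();
    }
}
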
@@ -6301,14 +6301,14 @@
         assert !graph.vertices().hasNext();
         assert !graph.edges().hasNext();
     }
-
事务实现原理
  • 服务端内部通过将事务与线程绑定实现隔离(ThreadLocal)
  • 本事务未提交的内容按照时间顺序覆盖老数据以供本事务查询最新版本数据
  • 底层依赖后端数据库保证事务原子性操作(如Cassandra/RocksDB的batch接口均保证原子性)
注意

RESTful API暂时未暴露事务接口

TinkerPop API允许打开事务,请求完成时会自动关闭(Gremlin Server强制关闭)

6.3 - HugeGraph Plugin机制及插件扩展流程

背景

  1. HugeGraph不仅开源开放,而且要做到简单易用,一般用户无需更改源码也能轻松增加插件扩展功能。
  2. HugeGraph支持多种内置存储后端,也允许用户无需更改现有源码的情况下扩展自定义后端。
  3. HugeGraph支持全文检索,全文检索功能涉及到各语言分词,目前已内置8种中文分词器,也允许用户无需更改现有源码的情况下扩展自定义分词器。

可扩展维度

目前插件方式提供如下几个维度的扩展项:

  • 后端存储
  • 序列化器
  • 自定义配置项
  • 分词器

插件实现机制

  1. HugeGraph提供插件接口HugeGraphPlugin,通过Java SPI机制支持插件化
  2. HugeGraph提供了4个扩展项注册函数:registerOptions()、registerBackend()、registerSerializer()、registerAnalyzer()
  3. 插件实现者实现相应的Options、Backend、Serializer或Analyzer的接口
  4. 插件实现者实现HugeGraphPlugin接口的register()方法,在该方法中注册上述第3点所列的具体实现类,并打成jar包
  5. 插件使用者将jar包放在HugeGraph Server安装目录的plugins目录下,修改相关配置项为插件自定义值,重启即可生效

插件实现流程实例

1 新建一个maven项目

1.1 项目名称取名:hugegraph-plugin-demo
1.2 添加hugegraph-core Jar包依赖

maven pom.xml详细内容如下:

<?xml version="1.0" encoding="UTF-8"?>
+
事务实现原理
  • 服务端内部通过将事务与线程绑定实现隔离(ThreadLocal)
  • 本事务未提交的内容按照时间顺序覆盖老数据以供本事务查询最新版本数据
  • 底层依赖后端数据库保证事务原子性操作(如Cassandra/RocksDB的batch接口均保证原子性)
注意

RESTful API暂时未暴露事务接口

TinkerPop API允许打开事务,请求完成时会自动关闭(Gremlin Server强制关闭)
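
下面是一个通过 TinkerPop API 显式控制事务的示意片段,其中 graph 假设为已经打开的图实例(例如 HugeFactory.open(conf) 的返回值),顶点的 label 与属性沿用文档中的 person 示例:

import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.T;

public class TxExample {

    // graph 为调用方传入的、已经打开的图实例
    public static void addPersonInTx(Graph graph) {
        try {
            graph.addVertex(T.label, "person", "name", "tom",
                            "age", 18, "city", "Beijing");
            graph.tx().commit();      // 提交后数据才对其他事务可见
        } catch (RuntimeException e) {
            graph.tx().rollback();    // 出错时回滚,丢弃本事务未提交的修改
            throw e;
        }
    }
}

需要注意:RESTful API 暂未暴露事务接口,上述方式只适用于以嵌入方式或 Gremlin 脚本直接操作图实例的场景。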

6.3 - HugeGraph Plugin 机制及插件扩展流程

背景

  1. HugeGraph 不仅开源开放,而且要做到简单易用,一般用户无需更改源码也能轻松增加插件扩展功能。
  2. HugeGraph 支持多种内置存储后端,也允许用户无需更改现有源码的情况下扩展自定义后端。
  3. HugeGraph 支持全文检索,全文检索功能涉及到各语言分词,目前已内置 8 种中文分词器,也允许用户无需更改现有源码的情况下扩展自定义分词器。

可扩展维度

目前插件方式提供如下几个维度的扩展项:

  • 后端存储
  • 序列化器
  • 自定义配置项
  • 分词器

插件实现机制

  1. HugeGraph 提供插件接口 HugeGraphPlugin,通过 Java SPI 机制支持插件化
  2. HugeGraph 提供了 4 个扩展项注册函数:registerOptions()、registerBackend()、registerSerializer()、registerAnalyzer()
  3. 插件实现者实现相应的 Options、Backend、Serializer 或 Analyzer 的接口
  4. 插件实现者实现 HugeGraphPlugin 接口的register()方法,在该方法中注册上述第 3 点所列的具体实现类,并打成 jar 包
  5. 插件使用者将 jar 包放在 HugeGraph Server 安装目录的plugins目录下,修改相关配置项为插件自定义值,重启即可生效

插件实现流程实例

1 新建一个 maven 项目

1.1 项目名称取名:hugegraph-plugin-demo
1.2 添加hugegraph-core Jar 包依赖

maven pom.xml 详细内容如下:

<?xml version="1.0" encoding="UTF-8"?>
 
 <project xmlns="http://maven.apache.org/POM/4.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
 
     <modelVersion>4.0.0</modelVersion>
-    <groupId>com.baidu.hugegraph</groupId>
+    <groupId>org.apache.hugegraph</groupId>
     <artifactId>hugegraph-plugin-demo</artifactId>
     <version>1.0.0</version>
     <packaging>jar</packaging>
@@ -6317,13 +6317,13 @@
 
     <dependencies>
         <dependency>
-            <groupId>com.baidu.hugegraph</groupId>
+            <groupId>org.apache.hugegraph</groupId>
             <artifactId>hugegraph-core</artifactId>
             <version>${project.version}</version>
         </dependency>
     </dependencies>
 </project>
-

2 实现扩展功能

2.1 扩展自定义后端
2.1.1 实现接口BackendStoreProvider
  • 可实现接口:com.baidu.hugegraph.backend.store.BackendStoreProvider
  • 或者继承抽象类:com.baidu.hugegraph.backend.store.AbstractBackendStoreProvider

以RocksDB后端RocksDBStoreProvider为例:

public class RocksDBStoreProvider extends AbstractBackendStoreProvider {
+

2 实现扩展功能

2.1 扩展自定义后端
2.1.1 实现接口 BackendStoreProvider
  • 可实现接口:org.apache.hugegraph.backend.store.BackendStoreProvider
  • 或者继承抽象类:org.apache.hugegraph.backend.store.AbstractBackendStoreProvider

以 RocksDB 后端 RocksDBStoreProvider 为例:

public class RocksDBStoreProvider extends AbstractBackendStoreProvider {
 
     protected String database() {
         return this.graph().toLowerCase();
@@ -6349,7 +6349,7 @@
         return "1.0";
     }
 }
-
2.1.2 实现接口BackendStore

BackendStore接口定义如下:

public interface BackendStore {
+
2.1.2 实现接口 BackendStore

BackendStore 接口定义如下:

public interface BackendStore {
     // Store name
     public String store();
 
@@ -6387,7 +6387,7 @@
     // Generate an id for a specific type
     public Id nextId(HugeType type);
 }
-
2.1.3 扩展自定义序列化器

序列化器必须继承抽象类:com.baidu.hugegraph.backend.serializer.AbstractSerializer(implements GraphSerializer, SchemaSerializer) +

2.1.3 扩展自定义序列化器

序列化器必须继承抽象类:org.apache.hugegraph.backend.serializer.AbstractSerializer(implements GraphSerializer, SchemaSerializer) 主要接口的定义如下:

public interface GraphSerializer {
     public BackendEntry writeVertex(HugeVertex vertex);
     public BackendEntry writeVertexProperty(HugeVertexProperty<?> prop);
@@ -6411,7 +6411,7 @@
     public BackendEntry writeIndexLabel(IndexLabel indexLabel);
     public IndexLabel readIndexLabel(HugeGraph graph, BackendEntry entry);
 }
-
2.1.4 扩展自定义配置项

增加自定义后端时,可能需要增加新的配置项,实现流程主要包括:

以RocksDB配置项定义为例:

public class RocksDBOptions extends OptionHolder {
+
2.1.4 扩展自定义配置项

增加自定义后端时,可能需要增加新的配置项,实现流程主要包括:

以 RocksDB 配置项定义为例:

public class RocksDBOptions extends OptionHolder {
 
     private RocksDBOptions() {
         super();
@@ -6456,13 +6456,13 @@
                     ImmutableList.of()
             );
 }
-
2.2 扩展自定义分词器

分词器需要实现接口com.baidu.hugegraph.analyzer.Analyzer,以实现一个SpaceAnalyzer空格分词器为例。

package com.baidu.hugegraph.plugin;
+
2.2 扩展自定义分词器

分词器需要实现接口org.apache.hugegraph.analyzer.Analyzer,以实现一个 SpaceAnalyzer 空格分词器为例。

package org.apache.hugegraph.plugin;
 
 import java.util.Arrays;
 import java.util.HashSet;
 import java.util.Set;
 
-import com.baidu.hugegraph.analyzer.Analyzer;
+import org.apache.hugegraph.analyzer.Analyzer;
 
 public class SpaceAnalyzer implements Analyzer {
 
@@ -6472,7 +6472,7 @@
     }
 }
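
上面的 diff 省略了分词方法的具体实现,下面给出一个按空格切分的示意实现(假设 Analyzer 接口的方法为 Set<String> segment(String text),细节请以源码为准):

package org.apache.hugegraph.plugin;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.hugegraph.analyzer.Analyzer;

public class SpaceAnalyzer implements Analyzer {

    @Override
    public Set<String> segment(String text) {
        // 按空格切分文本并去重,作为最简单的分词示例
        return new HashSet<>(Arrays.asList(text.split(" ")));
    }
}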
 

3. 实现插件接口,并进行注册

插件注册入口为HugeGraphPlugin.register(),自定义插件必须实现该接口方法,在其内部注册上述定义好的扩展项。 -接口com.baidu.hugegraph.plugin.HugeGraphPlugin定义如下:

public interface HugeGraphPlugin {
+接口org.apache.hugegraph.plugin.HugeGraphPlugin定义如下:

public interface HugeGraphPlugin {
 
     public String name();
 
@@ -6482,7 +6482,7 @@
 
     public String supportsMaxVersion();
 }
-

并且HugeGraphPlugin提供了4个静态方法用于注册扩展项:

  • registerOptions(String name, String classPath):注册配置项
  • registerBackend(String name, String classPath):注册后端(BackendStoreProvider)
  • registerSerializer(String name, String classPath):注册序列化器
  • registerAnalyzer(String name, String classPath):注册分词器

下面以注册SpaceAnalyzer分词器为例:

package com.baidu.hugegraph.plugin;
+

并且 HugeGraphPlugin 提供了 4 个静态方法用于注册扩展项:

  • registerOptions(String name, String classPath):注册配置项
  • registerBackend(String name, String classPath):注册后端(BackendStoreProvider)
  • registerSerializer(String name, String classPath):注册序列化器
  • registerAnalyzer(String name, String classPath):注册分词器

下面以注册 SpaceAnalyzer 分词器为例:

package org.apache.hugegraph.plugin;
 
 public class DemoPlugin implements HugeGraphPlugin {
 
@@ -6496,8 +6496,8 @@
         HugeGraphPlugin.registerAnalyzer("demo", SpaceAnalyzer.class.getName());
     }
 }
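
结合上文的 HugeGraphPlugin 接口定义,一个完整的 DemoPlugin 大致如下(register() 的返回类型与版本号均为假设值,请以实际接口与 hugegraph-core 版本为准):

package org.apache.hugegraph.plugin;

public class DemoPlugin implements HugeGraphPlugin {

    @Override
    public String name() {
        return "demo";
    }

    @Override
    public void register() {
        // 在 register() 中注册扩展项,这里注册前面实现的 SpaceAnalyzer 分词器
        HugeGraphPlugin.registerAnalyzer("demo", SpaceAnalyzer.class.getName());
    }

    @Override
    public String supportsMinVersion() {
        return "0.8";   // 假设的最低兼容版本
    }

    @Override
    public String supportsMaxVersion() {
        return "1.0";   // 假设的最高兼容版本
    }
}

其中 supportsMinVersion()/supportsMaxVersion() 一般用于声明插件兼容的 hugegraph-core 版本区间。
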
-

4. 配置SPI入口

  1. 确保services目录存在:hugegraph-plugin-demo/resources/META-INF/services
  2. 在services目录下建立文本文件:com.baidu.hugegraph.plugin.HugeGraphPlugin
  3. 文件内容如下:com.baidu.hugegraph.plugin.DemoPlugin

5. 打Jar包

通过maven打包,在项目目录下执行命令mvn package,在target目录下会生成Jar包文件。 -使用时将该Jar包拷到plugins目录,重启服务即可生效。

6.4 - Backup Restore

描述

Backup 和 Restore 是备份图和恢复图的功能。备份和恢复的数据包括元数据(schema)和图数据(vertex 和 edge)。

Backup

将 HugeGraph 系统中的一张图的元数据和图数据以 JSON 格式导出。

Restore

将 Backup 导出的JSON格式的数据,重新导入到 HugeGraph 系统中的一个图中。

Restore 有两种模式:

  • Restoring 模式,将 Backup 导出的元数据和图数据原封不动的恢复到 HugeGraph 系统中。可用于图的备份和恢复,一般目标图是新图(没有元数据和图数据)。比如:
    • 系统升级,先备份图,然后升级系统,最后将图恢复到新的系统中
    • 图迁移,从一个 HugeGraph 系统中,使用 Backup 功能将图导出,然后使用 Restore 功能将图导入另一个 HugeGraph 系统中
  • Merging 模式,将 Backup 导出的元数据和图数据导入到另一个已经存在元数据或者图数据的图中,过程中元数据的 ID 可能发生改变,顶点和边的 ID 也会发生相应变化。
    • 可用于合并图

使用方法

可以使用hugegraph-tools进行图的备份和恢复。

Backup

bin/hugegraph backup -t all -d data
+

4. 配置 SPI 入口

  1. 确保 services 目录存在:hugegraph-plugin-demo/resources/META-INF/services
  2. 在 services 目录下建立文本文件:org.apache.hugegraph.plugin.HugeGraphPlugin
  3. 文件内容如下:org.apache.hugegraph.plugin.DemoPlugin

5. 打 Jar 包

通过 maven 打包,在项目目录下执行命令mvn package,在 target 目录下会生成 Jar 包文件。 +使用时将该 Jar 包拷到plugins目录,重启服务即可生效。

6.4 - Backup Restore

描述

Backup 和 Restore 是备份图和恢复图的功能。备份和恢复的数据包括元数据(schema)和图数据(vertex 和 edge)。

Backup

将 HugeGraph 系统中的一张图的元数据和图数据以 JSON 格式导出。

Restore

将 Backup 导出的JSON格式的数据,重新导入到 HugeGraph 系统中的一个图中。

Restore 有两种模式:

  • Restoring 模式,将 Backup 导出的元数据和图数据原封不动的恢复到 HugeGraph 系统中。可用于图的备份和恢复,一般目标图是新图(没有元数据和图数据)。比如:
    • 系统升级,先备份图,然后升级系统,最后将图恢复到新的系统中
    • 图迁移,从一个 HugeGraph 系统中,使用 Backup 功能将图导出,然后使用 Restore 功能将图导入另一个 HugeGraph 系统中
  • Merging 模式,将 Backup 导出的元数据和图数据导入到另一个已经存在元数据或者图数据的图中,过程中元数据的 ID 可能发生改变,顶点和边的 ID 也会发生相应变化。
    • 可用于合并图

使用方法

可以使用hugegraph-tools进行图的备份和恢复。

Backup

bin/hugegraph backup -t all -d data
 

该命令将 http://127.0.0.1 的 hugegraph 图的全部元数据和图数据备份到data目录下。

Backup 在三种图模式下都可以正常工作

Restore

Restore 有两种模式: RESTORING 和 MERGING,恢复之前首先要根据需要设置图模式。

步骤1:查看并设置图模式
bin/hugegraph graph-mode-get
 

该命令用于查看当前图模式,包括:NONE、RESTORING、MERGING。

bin/hugegraph graph-mode-set -m RESTORING
 

该命令用于设置图模式,Restore 之前可以设置成 RESTORING 或者 MERGING 模式,例子中设置成 RESTORING。

步骤2:Restore 数据
bin/hugegraph restore -t all -d data
diff --git a/cn/docs/changelog/hugegraph-0.10.4-release-notes/index.html b/cn/docs/changelog/hugegraph-0.10.4-release-notes/index.html
index 6f6e948bf..e8bd9b28e 100644
--- a/cn/docs/changelog/hugegraph-0.10.4-release-notes/index.html
+++ b/cn/docs/changelog/hugegraph-0.10.4-release-notes/index.html
@@ -4,7 +4,7 @@
 支持 HugeGraphServer 服务端内存紧张时返回错误拒绝请求 (hugegraph #476)
 支持 API 白名单和 HugeGraphServer GC 频率控制功能 (hugegraph #522)
 支持 Rings API …">
-

7.1.3 删除某个异步任务信息,不删除异步任务本身

Method & Url
DELETE http://localhost:8080/graphs/hugegraph/tasks/2
@@ -3666,7 +3666,7 @@
     "}" +
 "}"
 
Method & Url
PUT http://localhost:8080/graphs/hugegraph/tasks/2?action=cancel
-

请保证在10秒内发送该请求,如果超过10秒发送,任务可能已经执行完成,无法取消。

Response Status
202
+

请保证在 10 秒内发送该请求,如果超过 10 秒发送,任务可能已经执行完成,无法取消。

Response Status
202
 
Response Body
{
     "cancelled": true
 }
diff --git a/cn/docs/clients/gremlin-console/index.html b/cn/docs/clients/gremlin-console/index.html
index f02f2f938..c91dab933 100644
--- a/cn/docs/clients/gremlin-console/index.html
+++ b/cn/docs/clients/gremlin-console/index.html
@@ -17,7 +17,7 @@
 这种模式便于用户快速上手体验,但是不适合大量数据插入和查询的场景。下面给一个示例:
 在 script 目录下有一个示例脚本 example.groovy:
 import org.apache.hugegraph.HugeFactory import org.apache.hugegraph.backend.id.IdGenerator import org.apache.hugegraph.dist.RegisterUtil import org.apache.hugegraph.type.define.NodeRole import org.apache.tinkerpop.gremlin.structure.T RegisterUtil.registerRocksDB() conf = "conf/graphs/hugegraph.properties" graph = HugeFactory.open(conf) graph.serverStarted(IdGenerator.of("server-tinkerpop"), NodeRole.MASTER) schema = graph.schema() schema.propertyKey("name").asText().ifNotExist().create() schema.propertyKey("age").asInt().ifNotExist().create() schema.propertyKey("city").asText().ifNotExist().create() schema.propertyKey("weight").asDouble().ifNotExist().create() schema.propertyKey("lang").asText().ifNotExist().create() schema.propertyKey("date").asText().ifNotExist().create() schema.propertyKey("price").asInt().ifNotExist().create() schema.vertexLabel("person").properties("name", "age", "city").primaryKeys("name").ifNotExist().create() schema.vertexLabel("software").properties("name", "lang", "price").primaryKeys("name").ifNotExist().create() schema.">
-

7.1.3 删除某个异步任务信息,不删除异步任务本身

Method & Url
DELETE http://localhost:8080/graphs/hugegraph/tasks/2
@@ -3668,7 +3668,7 @@
     "}" +
 "}"
 
Method & Url
PUT http://localhost:8080/graphs/hugegraph/tasks/2?action=cancel
-

请保证在10秒内发送该请求,如果超过10秒发送,任务可能已经执行完成,无法取消。

Response Status
202
+

请保证在 10 秒内发送该请求,如果超过 10 秒发送,任务可能已经执行完成,无法取消。

Response Status
202
 
Response Body
{
     "cancelled": true
 }
diff --git a/cn/docs/clients/restful-api/auth/index.html b/cn/docs/clients/restful-api/auth/index.html
index e9a71ca9e..c07e49eef 100644
--- a/cn/docs/clients/restful-api/auth/index.html
+++ b/cn/docs/clients/restful-api/auth/index.html
@@ -29,7 +29,7 @@
 10.2.1 创建用户 Params user_name: 用户名称 user_password: 用户密码 user_phone: 用户手机号 user_email: 用户邮箱 其中 user_name 和 user_password 为必填。
 Request Body { "user_name": "boss", "user_password": "******", "user_phone": "182****9088", "user_email": "123@xx.com" } Method & Url POST http://localhost:8080/graphs/hugegraph/auth/users Response Status 201 Response Body 返回报文中,密码为加密后的密文
 { "user_password": "******", "user_email": "123@xx.">
-

7.1.3 删除某个异步任务信息,不删除异步任务本身

Method & Url
DELETE http://localhost:8080/graphs/hugegraph/tasks/2
@@ -56,11 +56,11 @@
     "}" +
 "}"
 
Method & Url
PUT http://localhost:8080/graphs/hugegraph/tasks/2?action=cancel
-

请保证在10秒内发送该请求,如果超过10秒发送,任务可能已经执行完成,无法取消。

Response Status
202
+

请保证在 10 秒内发送该请求,如果超过 10 秒发送,任务可能已经执行完成,无法取消。

Response Status
202
 
Response Body
{
     "cancelled": true
 }
-

此时查询 label 为 man 的顶点数目,一定是小于 10 的。


Last modified April 17, 2022: rebuild doc (ef365444)
+

此时查询 label 为 man 的顶点数目,一定是小于 10 的。


diff --git a/cn/docs/clients/restful-api/traverser/index.html b/cn/docs/clients/restful-api/traverser/index.html index 30e7b31a5..90fbc9ee1 100644 --- a/cn/docs/clients/restful-api/traverser/index.html +++ b/cn/docs/clients/restful-api/traverser/index.html @@ -6,7 +6,7 @@ K-out API,根据起始顶点,查找恰好N步可达的邻居,分为基础版和高级版: 基础版使用GET方法,根据起始顶点,查找恰好N步可达的邻居 高级版使用POST方法,根据起始顶点,查找恰好N步可达的邻居,与基础版的不同在于: 支持只统计邻居数量 支持顶点和边属性过滤 支持返回到达邻居的最短路径 K-neighbor API,根据起始顶点,查找N步以内可达的所有邻居,分为基础版和高级版: 基础版使用GET方法,根据起始顶点,查找N步以内可达的所有邻居 高级版使用POST方法,根据起始顶点,查找N步以内可达的所有邻居,与基础版的不同在于: 支持只统计邻居数量 支持顶点和边属性过滤 支持返回到达邻居的最短路径 Same Neighbors, 查询两个顶点的共同邻居 Jaccard Similarity API,计算jaccard相似度,包括两种: 一种是使用GET方法,计算两个顶点的邻居的相似度(交并比) 一种是使用POST方法,在全图中查找与起点的jaccard similarity最高的N个点 Shortest Path API,查找两个顶点之间的最短路径 All Shortest Paths,查找两个顶点间的全部最短路径 Weighted Shortest Path,查找起点到目标点的带权最短路径 Single Source Shortest Path,查找一个点到其他各个点的加权最短路径 Multi Node Shortest Path,查找指定顶点集之间两两最短路径 Paths API,查找两个顶点间的全部路径,分为基础版和高级版: 基础版使用GET方法,根据起点和终点,查找两个顶点间的全部路径 高级版使用POST方法,根据一组起点和一组终点,查找两个集合间符合条件的全部路径 Customized Paths API,从一批顶点出发,按(一种)模式遍历经过的全部路径 Template Path API,指定起点和终点以及起点和终点间路径信息,查找符合的路径 Crosspoints API,查找两个顶点的交点(共同祖先或者共同子孙) Customized Crosspoints API,从一批顶点出发,按多种模式遍历,最后一步到达的顶点的交点 Rings API,从起始顶点出发,可到达的环路路径 Rays API,从起始顶点出发,可到达边界的路径(即无环路径) Fusiform Similarity API,查找一个顶点的梭形相似点 Vertices API 按ID批量查询顶点; 获取顶点的分区; 按分区查询顶点; Edges API 按ID批量查询边; 获取边的分区; 按分区查询边; 3."> -