feat: phase 1 of networkx/nebula engine, writer design #30
wey-gu authored Mar 26, 2023
2 parents e47972d + 2691dc4 commit 5708fd7
Showing 9 changed files with 383 additions and 75 deletions.
24 changes: 21 additions & 3 deletions docs/API.md
@@ -81,13 +81,31 @@ df = reader.read() # this will take some time
df.show(10)
```

#### NebulaGraph Engine (NetworkX)

```python
from ng_ai import NebulaReader
from ng_ai.config import NebulaGraphConfig
# read data with the nebula engine (NetworkX-based), query mode
config_dict = {
    "graphd_hosts": "127.0.0.1:9669",
    "user": "root",
    "password": "nebula",
    "space": "basketballplayer",
}
config = NebulaGraphConfig(**config_dict)
reader = NebulaReader(engine="nebula", config=config)
# one props list per edge type: `degree` for `follow`, none for `serve`
reader.query(edges=["follow", "serve"], props=[["degree"], []])
g = reader.read()
g.show(10)
g.draw()
```
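
The `query()` call above passes two parallel lists. A minimal sketch of how they appear to line up, assuming `props` is matched to `edges` by position (the helper `pair_edges_with_props` is hypothetical and not part of `ng_ai`):

```python
def pair_edges_with_props(edges, props):
    """Pair each edge type with its property list by position.

    Hypothetical helper for illustration only; it mirrors the shape of the
    `edges` and `props` arguments, not ng_ai's internal logic.
    """
    if len(edges) != len(props):
        raise ValueError(
            f"expected one props list per edge type, got "
            f"{len(edges)} edge types and {len(props)} props lists"
        )
    # Copy each props list so the caller's lists are not shared.
    return {edge: list(edge_props) for edge, edge_props in zip(edges, props)}


selection = pair_edges_with_props(["follow", "serve"], [["degree"], []])
for edge_type, edge_props in selection.items():
    print(edge_type, edge_props)
```

Under this reading, an edge type with no properties still needs an empty list so that the two arguments stay the same length.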

## engines

- `ng_ai.engines.SparkEngine` is the Spark Engine for `ng_ai.NebulaReader`, `ng_ai.NebulaWriter` and `ng_ai.NebulaAlgorithm`.

- `ng_ai.engines.NebulaEngine` is the NebulaGraph Engine for `ng_ai.NebulaReader`, `ng_ai.NebulaWriter`.

- `ng_ai.engines.NetworkXEngine` is the NetworkX Engine for `ng_ai.NebulaAlgorithm`.
- `ng_ai.engines.NebulaEngine` is the NebulaGraph Engine for `ng_ai.NebulaReader`, `ng_ai.NebulaWriter` and `ng_ai.NebulaAlgorithm`, which is based on NetworkX and Nebula-Python.
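
Since the NebulaGraph engine is described as NetworkX-based, its algorithms presumably run in-process on the graph that the reader returns. To give a feel for the kind of computation involved, here is a plain-Python PageRank power iteration. It is a sketch for intuition only, not the code that `ng_ai` or NetworkX runs; the function name, its parameters, and the sample edges are assumptions.

```python
def pagerank(edges, damping=0.85, max_iter=100, tol=1e-9):
    """Compute PageRank scores for a directed edge list by power iteration."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Every other node splits its damped rank evenly among its targets.
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical sample edges, shaped like `follow` relationships.
sample_edges = [
    ("player100", "player101"),
    ("player101", "player100"),
    ("player102", "player100"),
]
scores = pagerank(sample_edges)
for vid, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(vid, round(score, 4))
```

The scores sum to 1, and a vertex that nobody follows keeps only the baseline share `(1 - damping) / n`.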

## `NebulaDataFrameObject`

304 changes: 304 additions & 0 deletions examples/networkx_engine.ipynb
@@ -0,0 +1,304 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a54fe998",
"metadata": {},
"source": [
"![image](https://user-images.githubusercontent.com/1651790/221876073-61ef4edb-adcd-4f10-b3fc-8ddc24918ea1.png)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f46fdd40",
"metadata": {},
"outputs": [],
"source": [
"# install ng_ai in the first run\n",
"!pip install ng_ai[networkx]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5b4e4143",
"metadata": {},
"source": [
"## AI Suite NetworkX Engine Examples\n",
"### Read data with the NetworkX engine, query mode"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f17abcf8",
"metadata": {},
"source": [
"In this example, we use the NetworkX Engine of NebulaGraph AI Suite in GraphD query mode.\n",
"\n",
"#### Step 1, get the graph by querying NebulaGraph\n",
"\n",
"We first read all edges of types `follow` and `serve`, taking the `degree` property from `follow` and no properties from `serve`, and load them as the graph `g`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e158440f",
"metadata": {},
"outputs": [],
"source": [
"from ng_ai import NebulaReader\n",
"from ng_ai.config import NebulaGraphConfig\n",
"\n",
"# read data with the nebula engine (NetworkX-based), query mode\n",
"config_dict = {\n",
"    \"graphd_hosts\": \"graphd:9669\",\n",
"    \"user\": \"root\",\n",
"    \"password\": \"nebula\",\n",
"    \"space\": \"basketballplayer\",\n",
"}\n",
"config = NebulaGraphConfig(**config_dict)\n",
"reader = NebulaReader(engine=\"nebula\", config=config)\n",
"reader.query(edges=[\"follow\", \"serve\"], props=[[\"degree\"], []])\n",
"g = reader.read()\n",
"g.show(10)\n",
"g.draw()"
]
},
{
"cell_type": "markdown",
"id": "3617de5f",
"metadata": {},
"source": [
"#### Step 2, run PageRank Algorithm"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "90069aaf",
"metadata": {},
"outputs": [],
"source": [
"pr_result = g.algo.pagerank(reset_prob=0.15, max_iter=10)"
]
},
{
"cell_type": "markdown",
"id": "66e70ca0",
"metadata": {},
"source": [
"#### Step 3, check results of the algorithm\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "abbce2fa",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"+---------+-------------------+\n",
"| _id| pagerank|\n",
"+---------+-------------------+\n",
"|player133|0.18601069183310504|\n",
"|player126|0.18601069183310504|\n",
"|player130| 1.240071278887367|\n",
"|player108|0.18601069183310504|\n",
"|player102| 1.6602373739502536|\n",
"+---------+-------------------+\n",
"only showing top 5 rows\n",
"\n"
]
}
],
"source": [
"pr_result"
]
},
{
"cell_type": "markdown",
"id": "49becbdb",
"metadata": {},
"source": [
"#### Step 2, run Connected Components Algorithm"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cfbcda82",
"metadata": {},
"outputs": [],
"source": [
"cc_result = g.algo.connected_components(max_iter=10)"
]
},
{
"cell_type": "markdown",
"id": "38181d45",
"metadata": {},
"source": [
"#### Step 3, check results of the algorithm\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bed14375",
"metadata": {},
"outputs": [],
"source": [
"cc_result"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3d088006",
"metadata": {},
"source": [
"### Write the algorithm result back to NebulaGraph\n",
"\n",
"Assume that we have a result `graph_result` computed with `g.algo.pagerank()`:\n",
"\n",
"```python\n",
"{'player102': 0.014770646980811417,\n",
" 'player100': 0.02878478843123552,\n",
" 'player101': 0.020163880830622937,\n",
" 'player129': 0.012381302535422786,\n",
" 'player116': 0.015041184157101154,\n",
" 'player121': 0.012178909379871223,\n",
" 'player128': 0.010197889677928056,\n",
"...\n",
"}\n",
"```\n",
"\n",
"Let's write these scores back to the tag `pagerank`, property `pagerank`. First, create a TAG `pagerank` in the same NebulaGraph space with the following schema:\n",
"\n",
"```ngql\n",
"CREATE TAG IF NOT EXISTS pagerank (\n",
"    pagerank double NOT NULL\n",
");\n",
"```\n",
"\n",
"Then we can write the PageRank result to NebulaGraph, into tag `pagerank` with property `pagerank`:\n",
"\n",
"```python\n",
"properties = [\"pagerank\"]\n",
"```\n",
"\n",
"Pass `properties` to `NebulaWriter`, using the `nebula` engine and the `nebulagraph_vertex` sink."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6b43261f",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Run pagerank Algorithm\n",
"graph_result = g.algo.pagerank()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "c5bbf9e0",
"metadata": {},
"outputs": [],
"source": [
"from ng_ai import NebulaWriter\n",
"from ng_ai.config import NebulaGraphConfig\n",
"\n",
"config = NebulaGraphConfig()\n",
"writer = NebulaWriter(\n",
"    data=graph_result, sink=\"nebulagraph_vertex\", config=config, engine=\"nebula\"\n",
")\n",
"\n",
"# properties to write\n",
"properties = [\"pagerank\"]\n",
"\n",
"writer.set_options(\n",
"    tag=\"pagerank\",\n",
"    vid_field=\"_id\",\n",
"    properties=properties,\n",
"    batch_size=256,\n",
"    write_mode=\"insert\",\n",
")\n",
"# write back to NebulaGraph\n",
"writer.write()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9da30271",
"metadata": {},
"source": [
"Then we can query the result in NebulaGraph:\n",
"\n",
"```cypher\n",
"MATCH (v:pagerank)\n",
"RETURN id(v), v.pagerank.pagerank LIMIT 10;\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "5bcb02e2",
"metadata": {},
"source": [
"## How to run other algorithm examples"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff5a866d",
"metadata": {},
"outputs": [],
"source": [
"# lpa_result = g.algo.label_propagation()\n",
"# louvain_result = g.algo.louvain()\n",
"# k_core_result = g.algo.k_core()\n",
"# degree_statics_result = g.algo.degree_statics()\n",
"# betweenness_centrality_result = g.algo.betweenness_centrality()\n",
"# coefficient_centrality_result = g.algo.coefficient_centrality()\n",
"# bfs_result = g.algo.bfs()\n",
"# hanp_result = g.algo.hanp()\n",
"# jaccard_result = g.algo.jaccard()\n",
"# strong_connected_components_result = g.algo.strong_connected_components()\n",
"# triangle_count_result = g.algo.triangle_count()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
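
The write-back cell in the notebook above hands a `{vertex_id: score}` dict to the writer together with `vid_field="_id"`, `properties=["pagerank"]` and `batch_size=256`. A rough sketch of what shaping such a result into batched rows could look like; `to_rows` and `chunked` are hypothetical helpers, and nothing here is taken from the `ng_ai` or `ng_nx` implementation:

```python
from itertools import islice


def to_rows(result, vid_field="_id", prop="pagerank"):
    """Turn a {vertex_id: score} dict into one row dict per vertex."""
    return [{vid_field: vid, prop: score} for vid, score in result.items()]


def chunked(rows, batch_size):
    """Yield consecutive lists of at most `batch_size` rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# A few scores copied from the sample result shown in the notebook.
graph_result = {
    "player102": 0.014770646980811417,
    "player100": 0.02878478843123552,
    "player101": 0.020163880830622937,
}

rows = to_rows(graph_result)
# A small batch size here only to make the split visible; the notebook uses 256.
batches = list(chunked(rows, batch_size=2))
for number, batch in enumerate(batches, start=1):
    print(number, [row["_id"] for row in batch])
```

Batching keeps each insert statement bounded in size, which is presumably the reason the writer exposes a `batch_size` option.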
7 changes: 5 additions & 2 deletions ng_ai/engines.py
@@ -121,15 +121,18 @@ def __init__(self, config=None):
import networkx as nx
import ng_nx
from ng_nx import NebulaReader as NxReader
from ng_nx import NxScanReader, NxWriter
from ng_nx.utils import NxConfig, result_to_df
from ng_nx import NebulaScanReader as NxScanReader
from ng_nx import NebulaWriter as NxWriter
from ng_nx.utils import NebulaGraphConfig as NxConfig
from ng_nx.utils import result_to_df

self.nx = nx
self.ng_nx = ng_nx
self.nx_reader = NxReader
self.nx_writer = NxWriter
self.nx_scan_reader = NxScanReader
self._nx_config = NxConfig
self.nx_config = None

self.result_to_df = result_to_df
