RFC: connecting the dots, wiring ngdi algo with NebulaGraph UDF #18

Open · wey-gu opened this issue Mar 2, 2023 · 13 comments

wey-gu commented Mar 2, 2023

Simplifying things in surprising ways.

A native query experience to leverage ngdi.

API

execution engine/mode

  • networkx
  • spark

Call Syntax

  • Scan-based (read mode):
RETURN ngdi("pagerank", ["follow"], ["degree"]) // calls the parallel execution mode: spark
RETURN ngdi("pagerank", ["follow"], ["degree"], "compact") // default execution mode: calls the single-process version in NetworkX
  • Query-based:

Option 0:

// nGQL pipe form:
MATCH ()-[:follow]->() RETURN e LIMIT 10000 | YIELD collect($.e) AS graph |
RETURN ngdi.query("pagerank", $-.graph)

// Cypher form:
MATCH ()-[:follow]->() RETURN e LIMIT 10000
WITH collect(e) AS graph
RETURN ngdi.query("pagerank", graph)

Option 1:

YIELD "MATCH ()-[:follow]->() RETURN e LIMIT 10000" AS query |
YIELD ngdi("pagerank", $-.query)

Write Mode

  • return mode: the function returns the records (ideally in a streaming way)
  • update mode: the result is written to the calculated vertices as prop(s), in an update way
  • insert mode: the result is written to the calculated vertices as prop(s), in an insert way (see the sketch below for how update vs. insert differ)
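
To make the write modes concrete, here is a minimal, illustrative Python sketch of how an update vs. insert write-back could translate into nGQL statements. The tag and property names ("pagerank_result", "pagerank") and the result shape are assumptions for illustration, not part of the RFC.

```python
# Illustrative only: how the "update" vs. "insert" write modes could translate
# into nGQL statements run against NebulaGraph. Tag/prop names and the result
# shape are hypothetical, not part of the RFC.

def build_write_statements(results, mode, tag="pagerank_result", prop="pagerank"):
    """results maps vertex id -> algorithm score."""
    stmts = []
    for vid, score in results.items():
        if mode == "update":
            # update mode assumes the vertex already carries the tag
            stmts.append(f'UPDATE VERTEX ON {tag} "{vid}" SET {prop} = {score};')
        elif mode == "insert":
            # insert mode attaches the tag/prop regardless of whether it existed
            stmts.append(f'INSERT VERTEX {tag}({prop}) VALUES "{vid}":({score});')
        else:
            raise ValueError(f"unsupported write mode: {mode}")
    return stmts

# build_write_statements({"player100": 0.85}, mode="insert")
# -> ['INSERT VERTEX pagerank_result(pagerank) VALUES "player100":(0.85);']
```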

Design

  • Set up ngdi-api-server listening on 9999 (thrift) or 19999 (http)
  • Call ngdi-api-server from the UDF
  • Support calling the compact (networkx) or parallel (spark) mode on demand via a hint (a rough sketch of the HTTP side follows)
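
As a rough sketch (not the actual ngdi-api-server code), the HTTP flavor on 19999 could look roughly like the following; the route, payload fields, and response shape here are assumptions for illustration only.

```python
# Hypothetical sketch of the HTTP flavor of ngdi-api-server on 19999.
# Route, payload fields, and response shape are assumptions, not the real API.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/v0/algo", methods=["POST"])
def run_algo():
    body = request.get_json(force=True)
    algo = body.get("algo")              # e.g. "pagerank"
    mode = body.get("mode", "spark")     # execution engine: "spark" or "networkx"
    read = body.get("read", {})          # scan spec or an nGQL query string
    write = body.get("write", "insert")  # "return" / "update" / "insert"
    # Here the server would dispatch to ngdi with the chosen engine, run the
    # algorithm, and either return the records or write them back to NebulaGraph.
    return jsonify({"ok": True, "algo": algo, "mode": mode, "write": write, "read": read})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=19999)
```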

ref: vesoft-inc/nebula#4804

wey-gu changed the title from "RFC: wiring ngdi algo with NebulaGraph UDF" to "RFC: connecting the dots, wiring ngdi algo with NebulaGraph UDF" on Mar 2, 2023

wey-gu commented Mar 6, 2023

The minimal PoC implementation will be:

  • execution mode: parallel (spark)
  • read mode: scan && option-1 query-based (a sketch of the scan flow follows this list)
  • write mode: insert
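
For reference, the scan-based read plus pagerank flow in ngdi looks roughly like the following, adapted from the reader example in ngdi's README; exact parameters may differ.

```python
# Rough sketch of the scan-based read + pagerank flow the PoC targets,
# adapted from ngdi's README; exact parameters may differ.
from ngdi import NebulaReader

reader = NebulaReader(engine="spark")       # parallel (spark) execution mode
reader.scan(edge="follow", props="degree")  # scan-based read of the follow edges
df = reader.read()

pr_result = df.algo.pagerank(reset_prob=0.15, max_iter=10)
```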

WIP on:

wey-gu added a commit that referenced this issue Mar 7, 2023

whitewum commented Mar 7, 2023

let's choose the cypher way, not the ngql way.

MATCH ()-[:follow]->() RETURN e LIMIT 10000
WITH collect(e) AS graph
RETURN ngdi.query("pagerank", graph)


whitewum commented Mar 7, 2023

I think it's not necessary to call networkx "compact" and spark "parallel". The names "networkx" and "spark" are fine. We can probably introduce more graph engines in the future.


wey-gu commented Mar 7, 2023

> let's choose the cypher way, not the ngql way.
>
> MATCH ()-[:follow]->() RETURN e LIMIT 10000
> WITH collect(e) AS graph
> RETURN ngdi.query("pagerank", graph)

Sure, this is much better. It's a bit harder to implement, but we will eventually implement it this way.


wey-gu commented Mar 7, 2023

> I think it's not necessary to call networkx "compact" and spark "parallel". The names "networkx" and "spark" are fine. We can probably introduce more graph engines in the future.

Makes sense; then we don't have to introduce a separate "compact" option, just another mode.


whitewum commented Mar 7, 2023

> let's choose the cypher way, not the ngql way.
>
> MATCH ()-[:follow]->() RETURN e LIMIT 10000
> WITH collect(e) AS graph
> RETURN ngdi.query("pagerank", graph)
>
> Sure, this is much better. It's a bit harder to implement, but we will eventually implement it this way.

OK, for now let's choose the easiest way. We can change the DSL later; it is not set in stone yet.


whitewum commented Mar 7, 2023

I don't get it. Is this UDF implemented in C++ or Python in Nebula?

> Call ngdi-api-server from the UDF

The UDF seems like a C++ client of ngdi-api-server?


wey-gu commented Mar 7, 2023

> I don't get it. Is this UDF implemented in C++ or Python in Nebula?
>
> Call ngdi-api-server from the UDF
>
> The UDF seems like a C++ client of ngdi-api-server?

Exactly. Thus, for the query-based reader in spark mode, passing the query string through rather than evaluating it is much easier to implement for the initial, fast PoC version.

The UDF (C++) makes calls from graphd to the ngdi api server (which runs either in a spark cluster or as a single Python process).
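
For illustration (in Python rather than the actual C++ UDF code), the call from graphd's UDF to the ngdi api server could be little more than one HTTP request; the URL path and payload fields below are assumptions matching the design sketch above, not a finalized protocol.

```python
# Hypothetical view of what the C++ UDF would send to ngdi-api-server,
# shown in Python for brevity. Path and payload fields are assumptions.
import json
from urllib import request

payload = {
    "algo": "pagerank",
    "mode": "spark",  # or "networkx" for the single-process version
    "read": {"query": "MATCH ()-[e:follow]->() RETURN e LIMIT 10000"},
    "write": "insert",
}

req = request.Request(
    "http://127.0.0.1:19999/api/v0/algo",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```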


whitewum commented Mar 7, 2023

Why not add a Python UDF in Nebula instead?


wey-gu commented Mar 7, 2023

> Why not add a Python UDF in Nebula instead?

Because it's the merged implementation (easier) for now 😭


wey-gu commented Mar 7, 2023

> Why not add a Python UDF in Nebula instead?

We could try adding FFI to the UDF to call Python code directly from graphd, but that would benefit the non-spark version only.

Also, there would be some dirty work binding this to the current UDF infra (basically just the existing function manager, pure C++ by nature).


whitewum commented Mar 7, 2023

OK, the C++ client of ngdi-api-server is the easiest way so far.

But if the UDF call is introduced in the DSL, the syntax check is not easy.

For example, is graph a correct graph structure in this page_udf(graph) call? Who will check the correctness, graphd or spark?


wey-gu commented Mar 7, 2023

> the syntax check is not easy.
>
> For example, is graph a correct graph structure in this page_udf(graph) call? Who will check the correctness, graphd or spark?

Indeed. For now, I put all the validation that the UDF can do into the UDF itself, because it should fail early and hint explicitly at where things went wrong as much as possible (you can see there are a lot of checks in the current PoC UDF in that branch).

On the ngdi_gateway side, there should also be as much early checking (when needed) and exception handling as possible, so as not to confuse users.

For instance, for the MATCH-query read mode, in the fast-track implementation (option 2) graphd only treats the query as a string, which has to be evaluated by the spark connector's nGQL reader, but I will do my best to make it smooth/clear/lovely to use.

In the production/future delivery version of the UDF for ngdi calling, it will be quite heavy to do enough checks before calling the remote ngdi.
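
To illustrate that pass-through: a rough Python sketch of the ngdi-side handling of the query string, loosely following the reader/algorithm interfaces in ngdi's README (the exact method names and parameters are assumptions and may differ).

```python
# Hypothetical gateway-side handling of a query-based read: graphd passes the
# MATCH query through as an opaque string, and the spark-engine reader
# evaluates it. The ngdi calls loosely follow the project's README; exact
# signatures may differ.
from ngdi import NebulaReader

def run_pagerank_from_query(ngql_query: str):
    reader = NebulaReader(engine="spark")
    reader.query(query=ngql_query, edge="follow", props="degree")
    df = reader.read()
    # run the algorithm on the resulting graph dataframe
    return df.algo.pagerank(reset_prob=0.15, max_iter=10)

# result = run_pagerank_from_query("MATCH ()-[e:follow]->() RETURN e LIMIT 10000")
```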
