Node2Vec

Supported Graph Characteristics

Directed edges

Undirected edges

Homogeneous vertex types

Heterogeneous vertex types

Algorithm link: Node2Vec

Node2Vec is a legacy node embedding algorithm that uses random walks in the graph to create a vector representation of a node.

A random walk starts at a node and iteratively moves to a neighboring node, with each neighbor chosen according to an assigned probability. Repeating this from every node transforms the graph structure into a collection of linear sequences of nodes. Each node is then left with a list of other nodes from its local or extended neighborhood.
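The walk-generation step can be sketched in a few lines of Python. This is an illustrative sketch only, not the library's GSQL implementation: it assumes an in-memory adjacency dictionary and picks each neighbor with equal probability. The function name `random_walks` and the `seed` argument are hypothetical; `step` and `path_size` mirror the parameters of `tg_random_walk`.

```python
import random


def random_walks(adjacency, step=8, path_size=4, seed=None):
    """Return `step` walks of up to `path_size` hops starting from every node.

    Assumption: neighbors are chosen uniformly at random. The actual
    algorithm may assign different probabilities to neighbors.
    """
    rng = random.Random(seed)
    walks = []
    for start in adjacency:
        for _ in range(step):
            walk = [start]
            current = start
            for _ in range(path_size):
                neighbors = adjacency.get(current, [])
                if not neighbors:
                    break  # dead end: stop this walk early
                current = rng.choice(neighbors)
                walk.append(current)
            walks.append(walk)
    return walks


# Small undirected example graph stored as adjacency lists.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = random_walks(graph, step=2, path_size=3, seed=42)
```

Each entry of `walks` is one linear sequence of nodes. With 4 nodes and `step=2`, the sketch produces 8 walks, each containing the start node plus 3 hops.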

Once the walks have been generated, the algorithm uses a variation of the word2vec model from the language modeling community to turn each node into a vector of probabilities. The probabilities represent the likelihood of visiting a given node in a random walk from each starting node.
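To make the connection to word2vec more concrete, the sketch below shows one common way walks are prepared for a skip-gram style model: each node is paired with the nodes that appear near it in the same walk. This is an assumption about the general technique, not a description of the library's UDF. The function name `skipgram_pairs` and the `window` parameter are hypothetical.

```python
def skipgram_pairs(walks, window=2):
    """Build (center, context) training pairs from node sequences.

    Assumption: a symmetric context window, as in the skip-gram
    variant of word2vec. Walks play the role of sentences and
    nodes play the role of words.
    """
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs


# One short walk, pairing each node with its immediate neighbors in the walk.
pairs = skipgram_pairs([["a", "b", "c"]], window=1)
```

A model trained on such pairs learns similar vectors for nodes that tend to occur in the same walks.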

Notes

Node2Vec consumes a lot of memory and is less scalable than Fast Random Projection. It is included in the library for legacy reasons, but in most cases, Fast Random Projection is recommended instead.

This algorithm ignores edge weights.

Specifications

tg_random_walk(INT step = 8, INT path_size = 4,
    STRING filepath = "/home/tigergraph/path.csv", SET<STRING> edge_types,
    INT sample_num)

tg_node2vec_query(STRING filepath = "/home/tigergraph/path.csv",
    STRING output_file = "/home/tigergraph/embedding.csv",
    INT dimension)

Installing this query requires installing a UDF, which can be found in the GitHub repository of the query. If you are running the query on a cluster, you need to manually install the UDF on every node of the cluster.

Parameters

Parameter	Description	Data type
`step`	Number of random walks per node	`INT`
`path_size`	Number of hops per walk	`INT`
`filepath`	File path to output results to	`STRING`
`edge_types`	Edge types to traverse	`SET<STRING>`
`sample_num`	Number of nodes to be used in the random sample	`INT`
