Node2Vec
Node2Vec is a legacy node embedding algorithm that uses random walks in the graph to create a vector representation of a node.
A random walk starts at a node and iteratively selects a neighboring node to visit next, where each neighbor has an assigned probability of being chosen. This transforms the graph structure into a collection of linear sequences of nodes. For each node, the result is a list of other nodes from its local or extended neighborhood.
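The walk generation described above can be sketched in a few lines of Python. This is a minimal illustration, not the library's implementation: the function name `random_walks`, its parameters `walk_length` and `walks_per_node`, and the toy graph are all hypothetical, and the sketch assumes every neighbor is chosen with equal probability (consistent with ignoring edge weights).

```python
import random

def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Return a list of walks; each walk is a list of node ids.

    Hypothetical helper for illustration only. It assumes a uniform
    probability over neighbors, which is a simplification.
    """
    rng = random.Random(seed)
    walks = []
    for start in adjacency:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                # Uniform choice: each neighbor has probability 1/len(neighbors).
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Hypothetical toy graph, stored as an undirected adjacency list.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

walks = random_walks(graph, walk_length=4, walks_per_node=2)
for walk in walks[:3]:
    print(" -> ".join(walk))
```

Each printed line is one linear sequence of nodes. Collecting several walks per starting node gives every node a sample of its local and extended neighborhood.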
Once the random walks are complete, the algorithm uses a variation of the word2vec model from the language modeling community to turn each node into a vector. The vectors are trained to capture the likelihood of visiting a given node in a random walk from each starting node.
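To make the connection to word2vec more concrete, the sketch below treats each walk as a sentence and extracts (center, context) node pairs within a sliding window, which is the kind of training data a skip-gram style model consumes. It is an illustration under assumptions: the function `skipgram_pairs`, the `window` parameter, and the example walks are hypothetical, and the sketch stops before any training, so it does not produce embeddings.

```python
from collections import Counter

def skipgram_pairs(walks, window):
    """Return (center, context) pairs for nodes that appear within
    `window` positions of each other in the same walk.

    Hypothetical helper for illustration only.
    """
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a node is not its own context
                    pairs.append((center, walk[j]))
    return pairs

# Hypothetical walks, as they might come out of the random walk step.
example_walks = [["a", "b", "c", "d"], ["d", "c", "a", "b"]]

pairs = skipgram_pairs(example_walks, window=1)
cooccurrence = Counter(pairs)

# Pairs that co-occur more often in walks are the ones a skip-gram
# model would learn to score as more likely.
for (center, context), count in sorted(cooccurrence.items()):
    print(center, context, count)
```

Nodes that never appear near each other in any walk, such as `a` and `d` here, produce no pairs, so a model trained on this data would have no direct reason to place them close together.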
Notes
Node2Vec consumes a lot of memory and is less scalable than Fast Random Projection. It is included in the library for legacy reasons, but in most cases, Fast Random Projection is recommended instead.
This algorithm ignores edge weights.
Specifications
tg_random_walk(INT step = 8, INT path_size = 4,
    STRING filepath = "/home/tigergraph/path.csv", SET<STRING> edge_types,
    INT sample_num)

tg_node2vec_query(STRING filepath = "/home/tigergraph/path.csv",
    STRING output_file = "/home/tigergraph/embedding.csv",
    INT dimension)
Installing this query requires installing a UDF, which can be found in the GitHub repository of the query. If you are running the query on a cluster, you need to manually install the UDF on every node of the cluster.