Using an Algorithm
This page provides information on how to use a TigerGraph Graph Data Science (GDS) Library algorithm.
All GDS algorithms can be found in the algorithms folder of the tigergraph/gsql-graph-algorithms repository on GitHub.
The algorithms are divided into several categories.
Each algorithm has its own folder, which contains the implementation of the algorithm, any user-defined functions required to install it, and alternative versions of the algorithm where they exist.
GDS Library algorithms are implemented in GSQL as GSQL queries.
Therefore, the process to run an algorithm query is the same as the process to run any other query.
The names of GDS algorithm queries as provided by the library begin with the prefix tg_ only to distinguish them from user-written queries.
However, since the library is open-source, you can change the name of the queries to any name of your choosing.
Check for data or schema constraints
Most algorithm queries in the GDS Library are schema-free, meaning that you are able to run the query on any schema. However, some algorithms have certain schema or data constraints by nature. Make sure to read the documentation for the algorithm to determine the following:
- Does the algorithm require edges to be directed/undirected?
- Does the algorithm require edges to be weighted/unweighted?
- Does the algorithm require any vertex type to have an attribute of a certain data type?
- Does the algorithm require your data to have been processed in a certain way before it runs?
For example, k-Nearest Neighbors runs on graphs with either directed or undirected edges, but the edges must have a weight attribute.
Another example is the Fast Random Projection algorithm, which expects the vertex type to have an attribute of type LIST<DOUBLE> if you want to store the embedding results in your graph data.
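The checklist above can be written down as a small pre-flight check before you install anything. The sketch below is illustrative only: SchemaSummary, AlgorithmRequirements, unmet_constraints, and the attribute names "weight" and "embedding" are all assumptions made for this example. They are not part of the GDS Library or any TigerGraph API, and the summaries are filled in by hand from your schema and the algorithm's documentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SchemaSummary:
    """Hand-written summary of a graph schema (hypothetical structure)."""
    directed_edges: bool
    edge_attributes: Dict[str, str] = field(default_factory=dict)    # name -> GSQL type
    vertex_attributes: Dict[str, str] = field(default_factory=dict)  # name -> GSQL type


@dataclass
class AlgorithmRequirements:
    """Constraints copied by hand from an algorithm's documentation (hypothetical)."""
    directed_edges: Optional[bool] = None         # None: directed and undirected both work
    edge_weight_attribute: Optional[str] = None   # None: no weight attribute is needed
    vertex_attributes: Dict[str, str] = field(default_factory=dict)  # name -> GSQL type


def unmet_constraints(schema: SchemaSummary, req: AlgorithmRequirements) -> List[str]:
    """Return a human-readable list of constraints the schema does not satisfy."""
    problems = []
    if req.directed_edges is not None and schema.directed_edges != req.directed_edges:
        kind = "directed" if req.directed_edges else "undirected"
        problems.append(f"edges must be {kind}")
    if req.edge_weight_attribute and req.edge_weight_attribute not in schema.edge_attributes:
        problems.append(f"edges need a weight attribute {req.edge_weight_attribute!r}")
    for name, gsql_type in req.vertex_attributes.items():
        if schema.vertex_attributes.get(name) != gsql_type:
            problems.append(f"vertex attribute {name!r} must have type {gsql_type}")
    return problems


# Example schema: directed edges with a FLOAT weight, no vertex attributes.
schema = SchemaSummary(directed_edges=True, edge_attributes={"weight": "FLOAT"})

# Weighted edges in either direction, as described for k-Nearest Neighbors above.
knn = AlgorithmRequirements(edge_weight_attribute="weight")

# Storing embeddings needs a LIST<DOUBLE> vertex attribute ("embedding" is a made-up name).
fastrp = AlgorithmRequirements(vertex_attributes={"embedding": "LIST<DOUBLE>"})

print(unmet_constraints(schema, knn))
print(unmet_constraints(schema, fastrp))
```

An empty list means that none of the constraints you recorded is violated. It does not replace reading the algorithm's documentation, which remains the source of those constraints.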
Create query
If the algorithm you wish to use is not yet installed in your TigerGraph instance, you need to create and install it.
You can create the query in the following ways:
Using the GSQL shell:
- Locate the query in the GDS Library GitHub repository. It is a .gsql file named after the query.
- Copy the entire content of the query file, which is the command to create the query, and paste it into a file on the machine running TigerGraph.
- Log in to the GSQL shell as a user with query writing privileges for the graph on which you want to create the query.
- Run @<filepath> from the GSQL shell, replacing <filepath> with the absolute path to the file where you copied the query. For example, if your filepath is /home/tigergraph/query/pagerank.gsql, run @/home/tigergraph/query/pagerank.gsql from the GSQL shell.
Saving a query in GraphStudio does not create the query in GSQL.
Using GraphStudio:
- Locate the query in the GDS Library GitHub repository. It is a .gsql file named after the query.
- Copy the entire content of the query file. This is the command to create the query.
- Log in to GraphStudio as a user with query writing privileges for the graph on which you want to create the query.
- Click Global View in the top-left corner and choose the graph to use.
- Click Write Queries in the left-side navigation. Click + to add a new query and enter the query name. This name must be the same as the name in the CREATE QUERY command.
- Paste the CREATE QUERY command into the query and save the query.
Install query
Installing a query allows the algorithm query to access all features offered by the GSQL Query language. It also increases the performance of the query.
To install a query, run INSTALL QUERY <query_name> in the GSQL shell.
Alternatively, you can click Install on the Write Queries page of GraphStudio.
Install query in distributed mode
If you are running the query on a TigerGraph cluster, you may consider installing the query in distributed mode.
In general, distributed mode is likely to improve the performance of a query if the query meets the following conditions:
- The query starts from a very large set of vertices.
- The query performs many hops.
For example, algorithms that compute a value for every vertex or one value for the entire graph should use distributed mode. This includes PageRank, Centrality, and Connected Component algorithms.
To install a query in distributed mode in the GSQL shell, run the command INSTALL QUERY <query_name> -DISTRIBUTED.
To install a query in distributed mode from GraphStudio, change the CREATE QUERY at the beginning of the command to CREATE DISTRIBUTED QUERY, and then click Install.
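If you keep local copies of the query files, the CREATE QUERY to CREATE DISTRIBUTED QUERY edit described above can be scripted. The following is a minimal sketch of that text substitution and nothing more: make_distributed is a hypothetical helper, the sample query text is a placeholder rather than a real library algorithm, and the sketch assumes the file begins with a plain CREATE QUERY.

```python
import re

# Matches a leading "CREATE QUERY", allowing leading whitespace and any letter case.
_CREATE = re.compile(r"\A(\s*)CREATE\s+QUERY\b", re.IGNORECASE)
# Matches text that already begins with "CREATE DISTRIBUTED QUERY".
_ALREADY = re.compile(r"\A\s*CREATE\s+DISTRIBUTED\s+QUERY\b", re.IGNORECASE)


def make_distributed(query_text: str) -> str:
    """Return query_text with its leading CREATE QUERY changed to CREATE DISTRIBUTED QUERY."""
    if _ALREADY.match(query_text):
        return query_text  # nothing to do
    new_text, count = _CREATE.subn(r"\g<1>CREATE DISTRIBUTED QUERY", query_text, count=1)
    if count == 0:
        raise ValueError("query text does not begin with CREATE QUERY")
    return new_text


# Placeholder query text, standing in for the content of a .gsql file.
sample = 'CREATE QUERY tg_example() {\n  PRINT "placeholder";\n}'
print(make_distributed(sample))
```

Only the first statement is rewritten, so anything else in the file is left untouched. Paste the result into GraphStudio as you would the original command.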
Run query
Once the query has been installed, you can run the query on your graph data. Installing a query also creates a REST endpoint you can use to call the query.
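As a rough illustration of calling that REST endpoint, the sketch below only builds the request URL. It assumes the endpoint follows the /query/<graph_name>/<query_name> pattern on a RESTPP server listening on port 9000. Both are assumptions to check against the documentation for your TigerGraph version, as are the graph name social, the vertex type Person, and the parameter names v_type and top_k.

```python
from urllib.parse import quote, urlencode


def query_url(host: str, graph: str, query: str, params: dict, port: int = 9000) -> str:
    """Build the URL for an installed query's REST endpoint.

    The port and path layout are assumptions about the deployment; adjust
    them to match your TigerGraph version and configuration.
    """
    path = f"/query/{quote(graph)}/{quote(query)}"
    return f"http://{host}:{port}{path}?{urlencode(params, doseq=True)}"


# Hypothetical graph name and parameters, for illustration only.
url = query_url("localhost", "social", "tg_pagerank", {"v_type": "Person", "top_k": 10})
print(url)

# To send the request, something like the following could be used
# (a running server is required, and it may also require an authentication token):
#   import urllib.request
#   with urllib.request.urlopen(url) as response:
#       print(response.read().decode())
```

Query parameters are passed in the URL query string, so list-valued parameters are expanded into repeated keys by doseq=True.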
Most of the algorithm queries provide multiple output formats. You usually have the option to output one or more of the following:
- JSON output to console
- CSV output to a file
- Results written to an attribute of vertices or edges