Per Clause (Beta)
Introduction
Pattern matching produces a virtual match table, and the ACCUM clause acts like a FOREACH loop, executing the clause’s statement once for each row of the match table.
Patterns are paths in the graphs, and each row in the match table is a distinct path. However, paths may share some vertices or edges. Some applications do not want to do aggregations per path. Instead, they want to execute the ACCUM clause per distinct group of vertex aliases.
For example, consider the following query which counts the number of paths in a simple 2-hop pattern:
SumAccum<int> @@cnt;
S = SELECT t
FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
ACCUM @@cnt += 1;
Suppose the query produces the following match table.
(s, edge1, m , edge2, t)//match table schema
v1, e1, v3, e2, v2 //match 1
v1, e3, v4, e4, v2 //match 2
v5, e5, v6, e6, v7 //match 3
v8, e7, v9, e8, v7 //match 4
By default, the ACCUM clause will execute the @@cnt += 1
statement 4 times, for each row in the match table. The result will be @@cnt = 4
.
For the same query, what if the user wants to
-
count the number of distinct path endings in the match table? For this case, we would want to iterate on the alias
t
. -
count the number of distinct (start, end) pairs in the match table? For that case, we would want to iterate on distinct pairs of the aliases
(s, t)
.
To provide users with this added flexibility and finer control over ACCUM iteration, TigerGraph 3.0 adds the PER clause to pattern matching (V2) syntax.
Syntax
The PER Clause is an optional clause that comes at the start of the ACCUM clause in a SELECT statement. As illustrated below, it starts with the keyword PER, and followed by a pair of parenthesis, in which user can put one or more distinct vertex aliases found in the FROM pattern.
selectBlock := SELECT alias
FROM pattern
[sampleClause]
[whereClause]
[[perClause] accumClause]
[postAccumClause]*
[havingClause]
[orderClause]
[limitClause]
perClause := PER (vertex_alias_1, vertex_alias_2, ...)
Examples. Below are multiple examples of the PER Clause using the same FROM clause.
S1 = SELECT s
FROM S:s - (E1:edge1) - M:m - (E2:edge2) - T:t
PER (s)
ACCUM @@cnt += 1;
S2 = SELECT t
FROM S:s - (E1:edge1) - M:m - (E2:edge2) - T:t
PER (t)
ACCUM @@cnt += 1;
S3 = SELECT m
FROM S:s - (E1:edge1) - M:m - (E2:edge2) - T:t
PER (m)
ACCUM @@cnt += 1;
S4 = SELECT t
FROM S:s - (E1:edge1) - M:m - (E2:edge2) - T:t
PER (s, t)
ACCUM @@cnt += 1;
S5 = SELECT t
FROM S:s - (E1:edge1) - M:m - (E2:edge2) - T:t
PER (s, m, t)
ACCUM @@cnt += 1;
Semantics
The PER Clause specifies a list of vertex aliases, which are used to group the rows in the match table, one group per distinct value of the alias or of the alias list. If there are N distinct groups, we will execute the ACCUM clause N times, once per distinct vertex aliases' binding. Note that the PER clause has no effect on POST-ACCUM clauses semantic, except confining the POST-ACCUM vertex alias.
Suppose s, m, and t are vertex aliases in a pattern. Below are some interpretations of the PER Clause based on the graph element bindings found in the match table.
-
PER (s) ACCUM
means that per each distinct s vertex, execute the ACCUM clause once. -
PER (s,t) ACCUM
means that per each distinct (s, t) pair, execute the ACCUM clause once. -
PER (s,m,t) ACCUM
means that per each distinct (s, m, t) tuple, execute the ACCUM clause once.
Examples to show PER clause semantics.
//match table
(s, edge1, m , edge2, t)//schema
v1, e1, v3, e2, v2 //match 1
v1, e3, v4, e4, v2 //match 2
v5, e5, v6, e6, v7 //match 3
v8, e7, v9, e8, v7 //match 4
//since we have v1, v5, and v8 three distinct vertices bind to s,
//we execute ACCUM clause 3 times.
S1 = SELECT s
FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
PER (s)
ACCUM @@cnt += 1;
//since we have v2, v7 two distinct vertices bind to t,
//we execute ACCUM clause twice.
S2 = SELECT t
FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
PER (t)
ACCUM @@cnt += 1;
//since we have v3, v4, v6, v9 four distinct vertices bind to m,
//we execute ACCUM clause 4 times.
S3 = SELECT m
FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
PER (m)
ACCUM @@cnt += 1;
//since we have (v1, v2), (v5, v7) and (v8, v7) three distinct vertex pairs
//bind to (s,t), we execute ACCUM clause 3 times.
S4 = SELECT t
FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
PER (s, t)
ACCUM @@cnt += 1;
//since we have (v1, v3, v2), (v1, v4, v2), (v5, v6, v7) and (v8, v9, v7) four
//distinct vertex groups bind to (s,m,t), we execute ACCUM clause 4 times.
S5 = SELECT t
FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
PER (s, m, t)
ACCUM @@cnt += 1;
If the PER Clause is used in a SELECT query block, then the vertex aliases used in the SELECT, ACCUM , and POST-ACCUM clauses must be confined to the aliases that appear in the PER clause. |
Below are some illegal cases.
//semantic error. SELECT t, but t doesn't appear in PER clause.
S1 = SELECT t
FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
PER (s, m)
ACCUM @@cnt += 1;
//semantic error. ACCUM t.@cnt, but t doesn't appear in PER clause.
S2 = SELECT t
FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
PER (s, m)
ACCUM t.@cnt += 1;
//semantic error. POST-ACCUM t.@cnt, but t doesn't appear in PER clause.
S3 = SELECT s
FROM S:s - (E1:edge1) - M:m -(E2:edge2) - T:t
PER (s)
ACCUM s.@cnt += 1
POST-ACCUM t.@cnt =1;
PER Clause Examples
Example 1. Count the number of Countries that has a City which has a resident that likes a post.
//Example 1.
USE GRAPH ldbc_snb
INTERPRET QUERY () SYNTAX v2 {
SumAccum<int> @@cnt;
R = SELECT c
FROM Country:c -(<IS_PART_OF.<IS_LOCATED_IN.LIKES>)- Post:p
PER (c)
ACCUM @@cnt +=1;
PRINT @@cnt;
}
//results
Using graph 'ldbc_snb'
The query AA is dropped.
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"@@cnt": 111}]
}
Example 2. Count the number of posts liked by a person who is located in a city that belongs to a country. (All cities are in a country, but humor us. We are reusing the same FROM pattern in several examples.)
//Example 2.
USE GRAPH ldbc_snb
INTERPRET QUERY () SYNTAX v2 {
SumAccum<int> @@cnt;
R = SELECT p
FROM Country:c -(<IS_PART_OF.<IS_LOCATED_IN.LIKES>)- Post:p
PER (p)
ACCUM @@cnt +=1;
PRINT @@cnt;
//result
Using graph 'ldbc_snb'
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"@@cnt": 70668}]
}
Example 3. Find for each country in ("Dominican_Republic","Angola", "Cambodia") the number of posts that is liked by a person living in that country.
//Exmaple 3
USE GRAPH ldbc_snb
INTERPRET QUERY () SYNTAX v2{
MapAccum<string, SumAccum<int>> @@postPerCountry;
R = SELECT p
FROM Country:c -(<IS_PART_OF.<IS_LOCATED_IN.LIKES>)- Post:p
WHERE c.name in ("Dominican_Republic","Angola", "Cambodia")
PER (c, p)
ACCUM @@postPerCountry += (c.name -> 1);
PRINT @@postPerCountry;
}
//results
Using graph 'ldbc_snb'
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"@@postPerCountry": {
"Dominican_Republic": 395,
"Angola": 12,
"Cambodia": 4002
}}]
}
Example 4. Find for each country in ("Dominican_Republic","Angola", "Cambodia") the number of posts that is liked by a person living in that country. Use local accumulators this time.
USE GRAPH ldbc_snb
INTERPRET QUERY () SYNTAX v2{
SumAccum<int> @postCnt;
R = SELECT c
FROM Country:c -(<IS_PART_OF.<IS_LOCATED_IN.LIKES>)- Post:p
WHERE c.name in ("Dominican_Republic","Angola", "Cambodia")
PER (c, p) //per (country, post), add 1 to c.@postCnt
ACCUM c.@postCnt += 1;
PRINT R;
}
//results
Using graph 'ldbc_snb'
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"R": [
{
"v_id": "2",
"attributes": {
"@postCnt": 12,
"name": "Angola",
"id": 2,
"url": "http://dbpedia.org/resource/Angola"
},
"v_type": "Country"
},
{
"v_id": "67",
"attributes": {
"@postCnt": 4002,
"name": "Cambodia",
"id": 67,
"url": "http://dbpedia.org/resource/Cambodia"
},
"v_type": "Country"
},
{
"v_id": "11",
"attributes": {
"@postCnt": 395,
"name": "Dominican_Republic",
"id": 11,
"url": "http://dbpedia.org/resource/Dominican_Republic"
},
"v_type": "Country"
}
]}]
}
Performance and Best Practices
The PER Clause not only helps users to control the semantics of the ACCUM clause, it also boosts the performance of the pattern match query, as it uses the PER clause to optimize the query execution.
To get the best performance, we recommend three guidelines for writing efficient queries.
Use PER (target) If Possible
Per target is in general faster than Per source. In the example below, query q2 is faster than q1. The only difference between these two queries is q2’s FROM pattern is the flip of q1’s FROM pattern.
USE GRAPH ldbc_snb
# not recommended, since it does per (src).
CREATE QUERY q1 () SYNTAX v2 {
SumAccum<int> @@cnt ;
T = SELECT c
FROM Comment:c - (<LIKES) - Person:ps - (IS_LOCATED_IN>) - City:city
WHERE year(c.creationDate) >= 2006
PER (c)
ACCUM @@cnt += 1;
PRINT @@cnt;
}
# recommended, since it does per (tgt)
CREATE QUERY q2 () SYNTAX v2 {
SumAccum<int> @@cnt ;
T = SELECT c
FROM City:city - (<IS_LOCATED_IN) - Person:ps - (LIKES>) - Comment:c
WHERE year(c.creationDate) >= 2006
PER (c)
ACCUM @@cnt += 1;
PRINT @@cnt;
}
Write Patterns with Smallest Expected Vertex Set on the Left
The match table is built by traversing the pattern from left to right. Follow the basic principle of pruning early rather than late by orienting the query the smaller cardinality set on the left. This practice will result in producing the least number of candidate matches during the query computation. For example, if there are fewer distinct tags than persons, then query q4 is faster than q3.
USE GRAPH ldbc_snb
# not recommended, since the pattern starts from a large cardinality vertex type
# (Person), and ends at a small cardinality vertex type (Tag).
CREATE QUERY q3 () SYNTAX v2 {
SumAccum<int> @cnt;
V = SELECT s
FROM Person:s- (LIKES>)-Post:p - (<CONTAINER_OF)-:f - (HAS_TAG>) - :t
PER (s)
ACCUM s.@cnt += 1;
PRINT V.size();
}
# recommended, start from small cardinality end (Tag), and use per tgt
CREATE QUERY q4 () SYNTAX v2 {
SumAccum<int> @cnt;
V = SELECT s
FROM Tag:t-(<HAS_TAG)-Forum:f -(CONTAINER_OF>)-Post:p - (<LIKES)- Person:s
PER (s)
ACCUM s.@cnt += 1;
PRINT V.size();
}
Specify Complete Type Information
Specifying complete type information improves performance. For example, query q6 is faster than q5 even though they are known to be logically identical. Forum
is the CONTAINER_OF
Post
, so it does not need to be specified in q5, but explicitly saying Forum
in q6 speeds up performance.
USE GRAPH ldbc_snb
#we do not put Forum befoe :f, even if we know it.
CREATE QUERY q5 () SYNTAX v2 {
SumAccum<int> @@person_cnt;
V = SELECT s
FROM Person:s- (LIKES>)-Post:p - (<CONTAINER_OF)-:f - (HAS_TAG>) - :t
PER (s)
ACCUM @@person_cnt += 1;
PRINT @@person_cnt;
}
#recommended: we put Forum as the type info.
CREATE QUERY q6 () SYNTAX v2 {
SumAccum<int> @@person_cnt;
V = SELECT s
FROM Person:s- (LIKES>)-Post:p - (<CONTAINER_OF)-Forum:f - (HAS_TAG>) - :t
PER (s)
ACCUM @@person_cnt += 1;
PRINT @@person_cnt;
}
LDBC Benchmark Queries
Using the PER clause and linear regular path pattern, we have translated all of the LDBC-SNB queries. You can find them on github at https://github.com/tigergraph/ecosys/tree/ldbc/ldbc_benchmark/tigergraph/queries_linear/queries. Most of the queries are installed as functions. You can find sample parameter(s) of the functions from https://github.com/tigergraph/ecosys/tree/ldbc/ldbc_benchmark/tigergraph/queries/seeds.