Accumulators Tutorial
Introduction
GSQL is a Turing complete Graph Database query language. Comparing to other graph query languages, the biggest advantages is its support of accumulators — global or vertex local.
In addition to provide the classic pattern match syntax, which is easy to master, GSQL supports powerful run-time vertex attributes (a.k.a local accumulators) and global state variables (a.k.a global accumulators).
This short tutorial aims to shorten the learning curve of accumulator. Supposedly, after reading this article, everyone can master the essence of accumulator by heart, and start solving real-life graph problems with this handy language feature.
What is an Accumulator?
An accumulator is a state variable in GSQL. Its state is mutable throughout the life cycle of a query. It has an initial value, and users can keep accumulating (using its "+=" built-in operator) new values into it. Each accumulator variable has a type. The type decides what semantics the declared accumulator will use to interpret the "+=" operation.
In Figure 1’s left box, from line 3 to line 8, six different accumulator variables (those with prefix @@) are declared, each with a unique type. Below we explain the semantic and usage of them.
-
SumAccum<INT> allows user to keep adding INT values into its internal state variable. As the line 10 and 11 have shown, we added 1 and 2 to the accumulator, and end up with the value 3 (shown on line 3 in the right box).
-
MinAccum<INT> keeps the smallest INT number it has seen. As the line 14 and 15 have shown, we accumulated 1 and 2 to the MinAccum accumulator, and end up with the value 1 (shown on line 6 in the right box).
-
MaxAccum<INT> is symmetric to MinAccum. It returns the MAX INT value it has seen. Lines 18 and 19 show that we send 1 and 2 into it, and end up with the value 2 (shown on line 9 in the right box).
-
OrAccum keeps OR-ing the internal boolean state variable with new boolean variables that accumulate to it. The initial default value is FALSE. Lines 22 and 23 show that we send TRUE and FALSE into it, and end up with the TRUE value (shown on line 12 in the right box).
-
AndAccum is symmetric to OrAccum. Instead of using OR, it uses the AND accumulation semantics. Line 26 and 27 show that we accumulate TRUE and FALSE into it, and end up with the FALSE value (shown on line 15 in the right box).
-
ListAccum<INT> keeps appending new integer(s) into its internal list variable. Line 30 - 32 show that we append 1, 2, and [3,4] to the accumulator, and end up with [1,2,3,4] (shown on lines 19-22 in the right box).
Global vs. Vertex-attached Accumulator
At this point, we have seen that accumulators are special typed variable in GSQL language. We are ready to explain their global and local scopes.
Global accumulator belongs to the entire query. Anywhere in a query, a statement can update its value. Local accumulator belongs to each vertex. It can only be updated when its owning vertex is accessible. To differentiate them, we use special prefixes in the identifier when we declare them.
-
@@ prefix is used for declaring global accumulator variable. It is always used stand-alone. E.g **`@@cnt +=1`
-
@ prefix is used for declaring local accumulator variable. It must be used with a vertex alias in a query block. E.g.
v.@cnt += 1,
where v is a vertex alias specified in a FROM clause of a SELECT-FROM-WHERE query block.
Consider a toy social graph modeled by a person vertex type and a person-to-person friendship edge type shown in Figure 2. Below we write a query, which accepts a person, and does a 1-hop traversal from the input person to its neighbors. We use the @@global_edge_cnt accumulator to accumulate the total number of edges we traverse. And we use @vertex_cnt to write to the input person’s each friend vertex an integer 1.
As Figure 2 shows, Dan has 4 direct friends — Tom, Kevin, Jenny, and Nancy, each of which holds a local accumulator @vertex_cnt. And the @@global_edge_cnt has value 4, reflecting the fact that for each edge, we have accumulated 1 into it.
ACCUM vs. POST-ACCUM
ACCUM and POST-ACCUM clauses are computed in stages, where-in a SELECT-FROM-WHERE query block, ACCUM is executed first, followed by the POST-ACCUM clause.
-
ACCUM executes its statement(s) once for each matched subgraph specified by the FROM, WHERE, and PER clauses. Further, ACCUM parallelly executes its statements for all the matches.
-
POST-ACCUM executes its statement(s) once for each involved vertex. Note that each POST-ACCUM clause should bind to one and only one vertex alias specified in the FROM clause. Its statements can access the aggregated accumulator result computed in the ACCUM clause.
Conclusion
We have explained the mechanism of accumulators, their types, and the two different scopes—global and local. We also elaborate the ACCUM and POST-ACCUM clause semantics. Once you master the basics, the rest is to practice more. We have made available 46 queries based on the LDBC schema. These 46 queries are divided into three groups.
You can follow Pattern Matching Tutorial to setup the environment. You can also post your feedback and questions on the GSQL community forum. Our community members and developers love to hear any feedback from your graph journey of using GSQL and are ready to help clarifying any doubts.