Approximate Closeness Centrality
In the Closeness Centrality algorithm, to obtain the closeness centrality score for a vertex, we measure the distance from the source vertex to every single vertex in the graph.
In large graphs, running this calculation for every vertex can be highly time-consuming.
This algorithm approximates the closeness centrality score for each vertex pair by combining two estimation approaches: sampling and pivoting. This hybrid estimation approach offers near-linear time processing and linear space overhead within a small relative error.
Notes
This query uses another subquery closeness_cent_approx_sub
,
which needs to be installed before closeness_approx
can be successfully run.
References
For further information about the algorithm design, see Cohen et al., 2013, Computing Classic Closeness Centrality, at Scale.
Specifications
tg_closeness_approx (
SET<STRING> v_type,
SET<STRING> e_type,
INT k = 100, # sample num
INT max_hops = 10, # max BFS explore steps
DOUBLE epsilon = 0.1, # error parameter
BOOL print_accum = true, # output to console
STRING file_path = "", # output file
INT debug = 0, # debug flag -- 0: No LOG;1: LOG without the sample-node bfs loop;2: ALL LOG.
INT sample_index = 0, # random sample group
INT maxsize = 1000, # max size of connected components using exact closeness algorithm
BOOL wf = True # Wasserman and Faust formula
)
Parameters
Name | Description |
---|---|
|
Vertex types to calculate approximate closeness centrality for. |
|
Edge types to traverse. |
|
Size of the sample. |
|
Upper limit of how many jumps the algorithm will perform from each vertex. |
|
The maximum relative error, between 0 and 1. Setting a lower value produces a more accurate estimate but increases run time. |
|
Whether to print the output to console in JSON format. |
|
If provided, the algorithm will output to this file path in CSV format |
|
There are many conditional logging statements inside the query. If the input is 0, nothing will be logged. If the input is 1, everything else but the breadth-first-search process from the sample-node will be logged. If the input is 2, everything will be logged. |
|
The algorithm will partition the graph based on the sample size. This index indicates which partition to use when estimating closeness centrality. |
|
If the number of vertices in the graph is lower than |
|
Whether to use the Wasserman and Faust formula to calculate closeness centrality rather than the classic formula. |
Example
Below is an example of running the algorithm on the social10 test graph and an excerpt of the response.
RUN QUERY tg_closeness_approx(["Person"], ["Friend", "Coworker"], 6, 3 \
0.1, true, "", 0, 0, 100, false)
[
{
"Start": [
{
"attributes": {
"Start.@closeness": 0.58333
},
"v_id": "Fiona",
"v_type": "Person"
},
{
"attributes": {
"Start.@closeness": 0.44444
},
"v_id": "Justin",
"v_type": "Person"
},
{
"attributes": {
"Start.@closeness": 0.53333
},
"v_id": "Bob",
"v_type": "Person"
}
]