Personalized PageRank
In the original PageRank, the damping factor is the probability of the surfer continues browsing at each step. The surfer may also stop browsing and start again from a random vertex. In personalized PageRank, the surfer can only start browsing from a given set of source vertices both at the beginning and after stopping.
Specifications
tg_pagerank_pers(SET<VERTEX> source, STRING e_type,
FLOAT max_change=0.001, INT max_iter=25, FLOAT damping = 0.85, INT top_k = 100
BOOL print_accum = TRUE, STRING result_attr = "", STRING file_path = "")
Parameters
Parameter | Description | Default |
---|---|---|
|
Names of vertex types to use |
(empty set of strings) |
|
Name of edge types to use |
(empty string) |
|
PageRank will stop iterating when the largest
difference between any vertex’s current score and its previous score ≤
|
0.001 |
|
Maximum number of iterations. |
25 |
|
Fraction of score that is due to the score of neighbors. The balance (1 - damping) is a minimum baseline score that every vertex receives. |
0.85 |
|
Sort the scores highest first and output only this many scores |
100 |
|
If True, output JSON to standard output |
True |
|
If not empty, store PageRank values in FLOAT format to this vertex attribute |
(empty string) |
|
If not empty, write output to this file. |
(empty string) |
Example
We ran Personalized PageRank on the graph social10
using Friend edges with the following parameter values:
# Using "_" to use default values
RUN QUERY tg_pagerank_pers([("Fiona","Person")], "Friend", _, _, _, _, _, _,
_)
In this case, the random walker can only start or restart walking from Fiona. In the figure below, we see that Fiona has the highest PageRank score in the result. Ivy and George have the next highest scores because they are direct out-neighbors of Ivy and there are looping paths that lead back to them again. Half of the vertices have a score of 0 since they can not be reached from Fiona.