k-Nearest Neighbors (Cross-Validation Version)
k-Nearest Neighbors (kNN) is a widely used machine learning algorithm.
You can choose the value of topK (the number of neighbors, k)
based on your experience, or use cross-validation to optimize this hyperparameter.
Our library provides leave-one-out cross-validation for selecting the optimal k. Given a value of k, we run the algorithm repeatedly, using each vertex with a known label in turn as the source vertex and predicting its label. We assess the accuracy of those predictions for that value of k, and then repeat for every other value of k in the given range.
The goal is to find the value of k in the given range that gives the highest prediction accuracy for that dataset.
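The procedure described above can be sketched in plain Python. This is a minimal illustration, not the library's GSQL implementation: the function names (`cosine`, `predict_label`, `knn_cosine_cv`), the dict-based feature vectors standing in for weighted edges, the majority vote, and the tie-breaking rules are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {key: weight} dicts."""
    dot = sum(w * b[key] for key, w in a.items() if key in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_label(source, vectors, labels, k):
    """Majority vote among the k labeled vertices most similar to `source`."""
    others = [v for v in labels if v != source]  # leave the source vertex out
    others.sort(key=lambda v: cosine(vectors[source], vectors[v]), reverse=True)
    # On a tied vote, Counter keeps the label seen first (the nearer neighbor).
    votes = Counter(labels[v] for v in others[:k])
    return votes.most_common(1)[0][0]


def knn_cosine_cv(vectors, labels, min_k, max_k):
    """Return (best_k, accuracy_by_k) for k in [min_k, max_k], both inclusive."""
    accuracy_by_k = {}
    for k in range(min_k, max_k + 1):
        correct = sum(predict_label(v, vectors, labels, k) == labels[v] for v in labels)
        accuracy_by_k[k] = correct / len(labels)
    # If several k share the top accuracy, keep the smallest (assumption of this sketch).
    best_k = max(accuracy_by_k, key=lambda k: (accuracy_by_k[k], -k))
    return best_k, accuracy_by_k


# Hypothetical toy data: two groups of vertices that share few features.
vectors = {
    "a1": {"x": 1.0, "y": 1.0},
    "a2": {"x": 1.0, "y": 0.5},
    "a3": {"x": 0.5, "y": 1.0},
    "b1": {"z": 1.0, "w": 1.0},
    "b2": {"z": 1.0, "w": 0.5},
    "b3": {"z": 0.5, "w": 1.0, "x": 0.2},
}
labels = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}

best_k, accuracy_by_k = knn_cosine_cv(vectors, labels, min_k=1, max_k=5)
print(best_k, accuracy_by_k)
```

On this toy data every prediction is correct at k = 1, while at k = 5 all five remaining vertices vote, the other group always outnumbers the source's own group, and accuracy drops to 0. That is the kind of difference the cross-validation is meant to expose when it compares values of k.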
Specifications
tg_knn_cosine_cv(SET<STRING> v_type_set, SET<STRING> e_type_set, SET<STRING> reverse_e_type_set,
    STRING weight_attribute, STRING label, INT min_k, INT max_k) RETURNS (INT)
Parameters
Parameter | Description | Default Value |
---|---|---|
v_type_set | The vertex types for which to calculate the distance to the source vertex. | (empty set of strings) |
e_type_set | The edge types to use. | (empty set of strings) |
reverse_e_type_set | The reverse edge types to use. | (empty set of strings) |
weight_attribute | If not empty, use this edge attribute as the edge weight. | (empty string) |
label | If not empty, read an existing label from this attribute. | (empty string) |
min_k | The lower bound of k (inclusive). | N/A |
max_k | The upper bound of k (inclusive). | N/A |