Memory management
This page explains the key factors that affect memory usage in TigerGraph, as well as how to:
-
Monitor memory usage
-
Configure memory thresholds
-
Manage query workload distribution
Memory usage key factors
Below are the major factors that impact memory usage in TigerGraph:
-
Data volume
-
The biggest factor that affects memory usage in TigerGraph is the amount of data you have. Most database operations require more memory the more data you have.
-
-
Queries
-
Queries that read or write a high volume of data use more memory.
-
In a distributed cluster, a non-distributed query can be memory-intensive for the node where the query is run. See Distributed Query Mode.
-
Monitor memory usage
You can monitor memory usage by query and by machine.
Monitor query memory usage
TigerGraph’s Graph Processing Engine (GPE) records memory usage by query at different stages of the query and saves it to the GPE logs at $(gadmin config get System.LogRoot)/gpe/log.INFO
.
This is an estimation of the query’s memory usage on a particular node in a TigerGraph cluster.
You can monitor how much memory a query is using by searching the log file for the request ID and filter for lines that contain "QueryMem"
:
$ grun all 'grep -i <request_id> $(gadmin config get System.LogRoot)/gpe/log.INFO | grep -i "querymem"'
You can also run a query first, and then run the following command immediately after to retrieve the most recent query logs and filter for "QueryMem"
:
$ grep 'tail -n 50 $(gadmin config get System.LogRoot)/gpe/log.INFO |
grep -i "querymem"'
Interpreting log response
This type of query memory logging may not appear for some queries due to the way GSQL handles querying internally.
Specifically, it applies for queries run in GPR mode: Furthermore, it only appears at log levels |
The logs show memory usage by the query at different stages of its execution. The number at the end of each line indicates the number of bytes of memory utilized by the query.
The reported query memory usage is only a partial estimation of the query’s memory usage.
When The best way to evaluate the reported memory usage number is to compare the memory usage of different runs of the same query. If two runs of the same query report different numbers in terms of their memory usage, it is very likely that the query run reporting higher memory usage actually used more memory than the query run reporting lower memory usage. |
For example, if we query for the memory usage for request ID 2.RESTPP_1_1.1665503463466.N
with the following command:
$ grun all 'grep -i 2.RESTPP_1_1.1665503463466.N $(gadmin config get System.LogRoot)/gpe/log.INFO | grep -i "querymem"'
We get the following response from two different nodes in a cluster:
I1011 08:51:03.478861 199860 gpr.cpp:206] Engine_MemoryStats|{request}|MONITORING Step{step} BeforeRun[GPR][QueryMem]{current_mem},watermark:{mem_watermark}|request:WorkerManager,4.GPE_1_2.1665503463470.N:2.RESTPP_1_1.1665503463466.N,NNN,15,0,0,0,S|step:1|current_mem:0|mem_watermark:0
I1011 08:51:04.369922 199860 gpr.cpp:265] Engine_MemoryStats|{request}|MONITORING Step{step} AfterRun[GPR][QueryMem]{current_mem},watermark:{mem_watermark}|request:WorkerManager,4.GPE_1_2.1665503463470.N:2.RESTPP_1_1.1665503463466.N,NNN,15,0,0,0,S|step:1|current_mem:132960052|mem_watermark:157689448
I1011 08:51:04.659490 199860 gpr.cpp:206] Engine_MemoryStats|{request}|MONITORING Step{step} BeforeRun[GPR][QueryMem]{current_mem},watermark:{mem_watermark}|request:WorkerManager,4.GPE_1_2.1665503463470.N:2.RESTPP_1_1.1665503463466.N,NNN,15,0,0,0,S|step:2|current_mem:132960052|mem_watermark:157689448
I1011 08:51:04.819167 199860 gpr.cpp:265] Engine_MemoryStats|{request}|MONITORING Step{step} AfterRun[GPR][QueryMem]{current_mem},watermark:{mem_watermark}|request:WorkerManager,4.GPE_1_2.1665503463470.N:2.RESTPP_1_1.1665503463466.N,NNN,15,0,0,0,S|step:2|current_mem:178609044|mem_watermark:19465093
The following uses the third line as an example:
I1011 08:51:04.659490 199860 gpr.cpp:206] \
Engine_MemoryStats|{request}|MONITORING Step{step} BeforeRun[GPR][QueryMem]{current_mem}, \
watermark:{mem_watermark}| \
request:WorkerManager,4.GPE_1_2.1665503463470.N:2.RESTPP_1_1.1665503463466.N,NNN,15,0,0,0,S (1)
|step:2| \ (2)
current_mem:132960052 \ (3)
|mem_watermark:157689448 (4)
1 | The request’s full ID.
Note GPE_1_2 which tells you this line of log is coming from the second node in the primary replica. |
2 | Internally, a query’s execution consists of many steps. This indicates the step at which the log is generated, which is helpful for TigerGraph Support during debugging. You can usually disregard this data. |
3 | The current amount of memory held by the query at the time the log is taken. |
4 | The maximum amount of memory held by the query during the course of its run. |
Monitor system free memory percentage
Through Admin Portal
If you have access to Admin Portal, you can monitor memory usage by node through the cluster monitoring tool in the Dashboard.
Through Linux commands
The following is a list of Linux commands to measure system memory and check for out-of-memory errors:
-
To check available memory on a node, run
free -h
.-
Use with
grun
to check available memory on every node.
-
-
To view CPU and memory details interactively, run
top
. -
When a query is aborted, run
dmesg -T
| grep -i “oom” to check for out of memory errors[Thu Feb 4 00:41:08 2021] google_osconfig invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 (1) [Thu Feb 4 00:41:08 2021] [<ffffffffafdc208d>] oom_kill_process+0x2cd/0x490 [Thu Feb 4 00:41:08 2021] [ pid ] uid tigergraph total_vm rss nr_ptes swapents oom_score_adj name [Thu Feb 4 00:41:09 2021] [20183] 1004 20183 20200397 377046 5701 0 0 tg_dbs_restd [Thu Feb 4 00:41:09 2021] Out of memory: Kill process 20183 (tg_dbs_restd) score 239 or sacrifice child [Thu Feb 4 00:41:09 2021] Killed process 20183 (tg_dbs_restd), UID 1004, total-vm:80801588kB, anon-rss:1508400kB, file-rss:0kB, shmem-rss:0kB
1 If you see a line that says something invoked oom-killer, it means the node ran out of memory.
Memory state thresholds
GPE has memory protection to prevent out-of-memory issues. There are three memory states:
- Healthy
-
There is a healthy amount of free memory. All system operations run normally for best performance.
- Alert
-
There is a shortage of free memory. TigerGraph’s GPE system enables mechanisms to alleviate the memory pressure by moving some data onto disks and trying to use the maximum allowed threads for rebuilding. This starts to slow down the processing of new requests until the long-running queries finish and release memory.
- Critical
-
There is a critical shortage of free memory. TigerGraph’s GPE starts to abort queries to ensure system stability.
TigerGraph implements memory protection thresholds through the following environment variables. By default, the thresholds are only effective when a machine has more than 30 GB of total memory:
SysAlertFreePct
-
The free memory threshold that causes TigerGraph to enter the Alert state. Default value is 30%.
SysMinFreePct
-
The free memory threshold that causes TigerGraph to enter the Critical state. Default value is 10%.
Configure memory state thresholds
To configure these environment variables, run gadmin config entry GPE.BasicConfig.Env
.
This shows the current values of the environment variables and allows you to add new entries:
$ gadmin config entry GPE.BasicConfig.Env
✔ New: LD_PRELOAD=$LD_PRELOAD; LD_LIBRARY_PATH=$LD_LIBRARY_PATH; CPUPROFILE=/tmp/tg_cpu_profiler; CPUPROFILESIGNAL=12; MALLOC_CONF=prof:true,prof_active:false▐
Add your desired memory threshold configuration after the existing environment values.
Use a semicolon ;
to separate the different environment variables:
✔ New: LD_PRELOAD=$LD_PRELOAD; LD_LIBRARY_PATH=$LD_LIBRARY_PATH; CPUPROFILE=/tmp/tg_cpu_profiler; CPUPROFILESIGNAL=12; MALLOC_CONF=prof:true,prof_active:false;SysMinFreePct=5;SysAlertFreePct=25; (1)
1 | This sets the critical threshold to 5 percent and the alert threshold to 25 percent. |
Spaces have been added to the following full example for readability.
> gadmin config entry GPE.BasicConfig.Env
GPE.BasicConfig.Env [ LD_PRELOAD=$LD_PRELOAD; LD_LIBRARY_PATH=$LD_LIBRARY_PATH; CPUPROFILE=/tmp/tg_cpu...(too long to show the full content, please use 'gadmin config get GPE.BasicConfig.Env' to get it) ]:
The runtime environment variables, separated by ';'
✔ New: LD_PRELOAD=$LD_PRELOAD; LD_LIBRARY_PATH=$LD_LIBRARY_PATH; CPUPROFILE=/tmp/tg_cpu_profiler; CPUPROFILESIGNAL=12; MALLOC_CONF=prof:true,prof_active:false
; SysMinFreePct=20;SysAlertFreePct=30 (1)
1 | In this example, the user has set SysMinFreePct to 20 , meaning that queries will start aborting automatically for stability when 20% of system memory is free (80% utilization).
The user has also set SysAlertFreePct to 30 , so queries will start being throttled at 30% free memory (70% utilization). |
After making a change, run gadmin config apply
to apply the changes and gadmin restart gpe
to restart the GPE service.
Changes will take effect after the restart.
Limit query memory usage
There are two ways to limit the memory usage of queries:
-
By system configuration. This affects all queries on your TigerGraph instance.
-
By HTTP request header. This affects one specific query run only and overrides the system configuration.
By system configuration
You can set a limit of how much memory a query is allowed to use on any single node in a cluster. If a query’s memory usage exceeds this limit on any node in a cluster, the query is aborted automatically.
To set a limit for memory usage on any node for a cluster, use the gadmin config
command to configure the value of the parameter GPE.QueryLocalMemLimitMB
.
For example, to set the limit to 100 MB, run the following command:
$ gadmin config set GPE.QueryLocalMemLimitMB 100
You must restart the GPE service for the change to take effect.
By HTTP header
Another way to limit the query memory usage is to specify the memory limit at the time of the request through the HTTP header GSQL-QueryLocalMemLimitMB
when using the Run Query REST endpoint.
This applies to the specific request being run only, and overrides the system configuration.
For example, to set the limit to 100 MB, make the following request:
curl -X POST -H "GSQL-QueryLocalMemLimitMB: 100" -d '{"p":{"id":"Tom","type":"person"}}'
"http://localhost:9000/query/social/hello"