Managing and Inspecting a Loading Job
This page describes the commands to check loading job status, abort a loading job, and restart a loading job.
Job ID and status
When a loading job starts, the GSQL server assigns it a job ID and displays it for the user to see.
The job ID format is typically the name of the graph, followed by the machine alias, following by a code number, e.g., gsql_demo_m1.1525091090494
SHOW LOADING STATUS
outputKick off the following job, i.e.
JobName: load_test1, jobid: demo_graph_m1.1523663024967
Loading log: '/home/tigergraph/tigergraph/logs/restpp/restpp_loader_logs/demo_graph/demo_graph_m1.1523663024967.log'
Job "demo_graph_m1.1523663024967" loading status
[RUNNING] m1 ( Finished: 3 / Total: 4 )
[LOADING] /data/output/company.data
[============= ] 20%, 200 kl/s
[LOADED]
+--------------------------------------------------------------------------------------+
| FILENAME | LINES | OBJECTS | ERRORS | AVG SPEED | DURATION|
|/data/output/company.data | 6 | 3 | 2 | 60 l/s | 0.10 s|
+--------------------------------------------------------------------------------------+
[RUNNING] m2 ( Finished: 1 / Total: 2 )
[LOADING] /data/output/company.data
[========================== ] 60%, 200 kl/s
[LOADED]
+--------------------------------------------------------------------------------------+
| FILENAME | LINES | OBJECTS | ERRORS | AVG SPEED | DURATION|
|/data/output/company.data | 6 | 3 | 2 | 60 l/s | 0.10 s|
+--------------------------------------------------------------------------------------+
By default, an active loading job will display periodic updates of its progress. There are two ways to prevent these automatic output displays from appearing:
-
Run the loading job with the
-noprint
option. -
After the loading job has started, enter CTRL+C. This will abort the output display process, but the loading job will continue.
SHOW LOADING STATUS
The command SHOW LOADING STATUS
shows the current status of either a specified loading job or all current jobs:
SHOW LOADING STATUS job_id|ALL
gsql
The display format is the same as that displayed during the periodic progress updates of the RUN LOADING JOB
command.
If you do not know the job ID, but you know the job name and possibly the machine, then the ALL
option is a handy way to see a list of active job IDs.
ABORT LOADING JOB
The command ABORT LOADING JOB aborts either a specified load job or all active loading jobs:
ABORT LOADING JOB job_id|ALL
gsql
The output will show a summary of aborted loading jobs.
gsql -g demo_graph "abort loading job all"
Job "demo_graph_m1.1519111662589" loading status
[ABORT_SUCCESS] m1
[SUMMARY] Finished: 0 / Total: 2
+--------------------------------------------------------------------------------------------------+
| FILENAME | LINES | OBJECTS | ERRORS | AVG SPEED | DURATION|
| /home/tigergraph/data.csv | 6 | 3 | 2 | 60 l/s | 0.10 s|
|/home/tigergraph/data1.csv | 6 | 3 | 2 | 60 l/s | 0.10 s|
+--------------------------------------------------------------------------------------------------+
Job "demo_graph_m2.1519111662615" loading status
[ABORT_SUCCESS] m2
[SUMMARY] Finished: 0 / Total: 2
+--------------------------------------------------------------------------------------------------+
| FILENAME | LINES | OBJECTS | ERRORS | AVG SPEED | DURATION|
| /home/tigergraph/data.csv | 6 | 3 | 2 | 60 l/s | 0.10 s|
|/home/tigergraph/data1.csv | 6 | 3 | 2 | 60 l/s | 0.10 s|
+--------------------------------------------------------------------------------------------------+
gsql
RESUME LOADING JOB
The command RESUME LOADING JOB will restart a previously-run job which ended for some reason before completion.
RESUME LOADING JOB job_id
gsql
If the job is finished, this command will do nothing. The RESUME command should pick up where the previous run ended; that is, it should not load the same data twice.
gsql -g demo_graph "RESUME LOADING JOB demo_graph_m1.1519111662589"
[RESUME_SUCCESS] m1
[MESSAGE] The current job got resummed
gsql
Verifying and debugging a loading job
Every loading job creates a log file. When the job starts, GSQL displays the location of the log file.
This file contains the following information which most users will find useful:
-
A list of all the parameter and option settings for the loading job
-
A copy of the status information that is printed
-
A report on the number of lines successfully read and parsed
The report includes how many objects of each type are created, and how many lines are invalid due to different reasons. This report also shows which lines cause the errors.
There are two types of statistics shown in the report:
-
File-level: The number of lines
-
Data-object-level: The number of objects
If a file level error occurs, e.g., a line does not have enough columns, this line of data is skipped for all LOAD statements in this loading job. If an object level error or failed condition occurs, only the corresponding object is not created, i.e., all other objects in the same loading job are still created if there is no object level error or failed condition for each corresponding object.
File level statistics | Explanation |
---|---|
Valid lines |
The number of valid lines in the source file |
Reject lines |
The number of lines which are rejected by reject_line_rules |
Invalid JSON format |
The number of lines with invalid JSON format |
Not enough token |
The number of lines with missing column(s) |
Oversize token |
The number of lines with oversize token(s). Please increase |
Object level statistics | Explanation |
---|---|
Valid Object |
The number of objects which have been loaded successfully |
No ID found |
The number of objects in which PRIMARY_ID is empty |
Invalid Attributes |
The number of invalid objects caused by wrong data format for the attribute type |
Invalid primary id |
The number of invalid objects caused by wrong data format for the PRIMARY_ID type |
incorrect fixed binary length |
The number of invalid objects caused by the mismatch of the length of the data to the type defined in the schema |
Note that failing a WHERE
clause is not necessarily a bad result.
If the user’s intent for the WHERE
clause is to select only certain lines, then it is natural for some lines to pass and some lines to fail:
CREATE VERTEX Movie (PRIMARY_ID id UINT, title STRING, country STRING, year UINT)
CREATE DIRECTED EDGE Sequel_Of (FROM Movie, TO Movie)
CREATE GRAPH Movie_Graph(*)
CREATE LOADING JOB load_movie FOR GRAPH Movie_Graph{
DEFINE FILENAME f;
LOAD f TO VERTEX Movie VALUES ($0, $1, $2, $3) WHERE to_int($3) < 2000;
}
RUN LOADING JOB load_movie USING f="movie.dat"
gsql
0,abc,USA,-1990
1,abc,CHN,1990
2,abc,CHN,1990
3,abc,FRA,2015
4,abc,FRA,2005
5,abc,USA,1990
6,abc,1990
gsql
The above loading job and data generate the following reports:
[LOADED]
+--------------------------------------------------------------------------------------+
| FILENAME | LINES | OBJECTS | ERRORS | AVG SPEED | DURATION|
|/home/tigergraph/movie.dat | 6 | 3 | 2 | 60 l/s | 0.10 s|
+--------------------------------------------------------------------------------------+
[WARNING] bad data in m1 (replica 1) /home/tigergraph/movie.dat: 1 line(s) do not have enough number of tokens.
[WARNING] bad data in m1 (replica 1) /home/tigergraph/movie.dat:Movie: 1 object(s) have invalid attributes.
Sampling error data can be viewed by executing the 'SHOW LOADING ERROR <job_id>'.
gsql
SHOW LOADING ERROR
Beginning with TigerGraph 4.1, the loader will log a sample of the data lines that it could not load.
The command SHOW LOADING ERROR <job_id>
presents a table of the sample of malformed data lines.
Each row reports not only a data line and its location, but also its error type. This information helps users to quickly identify and understand errors, leading to quicker resolution of data format problems.
Error Data Example
+----------------+------------------------------+------------+--------------------------------------------------------+
| Source | Reason | Offset | Content |
+----------------+------------------------------+------------+--------------------------------------------------------+
|/home/tigergraph|InvalidAttribute(load vertex:M|1 |0,abc,USA,-1990 |
|/movie.dat |ovie) | | |
+----------------+------------------------------+------------+--------------------------------------------------------+
|/home/tigergraph|ShortOfToken(file) |7 |6,abc,1990 |
|/movie.dat | | | |
+----------------+------------------------------+------------+--------------------------------------------------------+
Done
gsql
The command shows two line of error data:
* The data at line 1, it has an invalid attribute(year).
* The data at line 7, it doesn’t have enough tokens(should have 4 attributes).
By default, only the earliest 100 errors are shown, but you can limit the number of data displayed by using the LIMIT
option:
SHOW LOADING ERROR <job_id> LIMIT <num_of_lines>`
gsql
User privileges for showing loading errors are the same as for running loading jobs. |
The file level report points out the two data level errors in this Movie loading job.
The load_output
log file contains more detailed information.
===============================================================================
Source File Name: /home/tigergraph/movie.dat
Lines with not enough tokens: 1 [ERROR] (e.g. Line 7:6,abc,1990)
Lines successfully tokenized: 6
Vertex:Movie
Lines failed condition: 2 (e.g. Line 4:3,abc,FRA,2015, Line 5:4,abc,FRA,2005)
Lines passed condition: 4
Invalid Attributes: 1 [ERROR] (e.g. Line 1:0,abc,USA,-1990)
Valid Object: 3
gsql
There are a total of 7 data lines. The report shows that:
-
One line - Line 7 - does not have enough tokens.
-
Six of the lines are valid data lines and were successfully tokenized.
Of the 6 valid lines,
-
Three of the 6 valid lines generate valid movie vertices.
-
One line has an invalid attribute (Line 1: year).
-
Two lines (Lines 4 and 5) do not pass the
WHERE
clause.
The errors that appear are the first line with an error in each batch in the loading job. Up to 100 sample lines showing errors appear in the log.