Running a Loading Job
Clearing and Initializing the Graph Store
There are two aspects to clearing the system: flushing the data and clearing the schema definitions in the catalog. Two different commands are available.
CLEAR GRAPH STORE
The CLEAR GRAPH STORE command flushes all the data out of the graph store (database).
By default, the system asks the user to confirm that they really want to discard all the graph data.
To force the clear operation and bypass the confirmation question, use the -HARD option:
CLEAR GRAPH STORE -HARD
Clearing the graph store does not affect the schema.
DROP ALL clears both the data and the schema.
Running a Loading Job
Running a loading job executes a previously installed loading job. The job reads lines from an input source, parses each line into data tokens, and applies loading rules and conditions to create new vertex and edge instances to store in the graph data store.
The input sources can be defined in the loading job or provided when running the job. Loading jobs can also be run by submitting an HTTP request to the REST++ server.
User privileges for running loading jobs are treated as separate from privileges regarding reading and writing data to vertices and edges. A user can create and run loading jobs even without the privileges to modify vertex and edge data. For more information, see Access Control Model in TigerGraph.
RUN LOADING JOB
RUN LOADING JOB syntax for concurrent loading:
RUN LOADING JOB [-noprint] [-dryrun] [-n [i],j] job_name [
USING file_var [="file_path_string"][, file_var [="file_path_string"]]*
[, CONCURRENCY=cnum][,BATCH_SIZE=bnum][,EOF="eof_mode"]
]
When a concurrent loading job is submitted, it is assigned a job ID number, which is displayed on the GSQL console. The user can use this job ID to refer to the job, for a status update, to abort the job, or to restart the job. These operations are described later in this section.
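As an illustration of how the pieces of the syntax fit together, the following Python sketch assembles a RUN LOADING JOB statement from its options. The helper name build_run_loading_job, the file variable f1, and the file path are hypothetical; the sketch does no validation and leaves out the EOF parameter.

```python
def build_run_loading_job(job_name, files=None, noprint=False, dryrun=False,
                          line_range=None, concurrency=None, batch_size=None):
    """Assemble a RUN LOADING JOB statement string (hypothetical helper).

    files maps a file variable name to a path string, or to None to
    mention the variable without assigning a path.
    """
    parts = ["RUN LOADING JOB"]
    # Options come before the job name.
    if noprint:
        parts.append("-noprint")
    if dryrun:
        parts.append("-dryrun")
    if line_range is not None:
        parts.append("-n " + line_range)
    parts.append(job_name)

    # File variables come first in the USING clause, then the parameters.
    using = []
    for var, path in (files or {}).items():
        using.append(var if path is None else f'{var}="{path}"')
    if concurrency is not None:
        using.append(f"CONCURRENCY={concurrency}")
    if batch_size is not None:
        using.append(f"BATCH_SIZE={batch_size}")
    if using:
        parts.append("USING " + ", ".join(using))
    return " ".join(parts)


statement = build_run_loading_job(
    "load_videoE",
    files={"f1": "/data/video_edges.csv"},  # hypothetical variable and path
    dryrun=True,
    line_range="10,50",
    concurrency=8,
)
print(statement)
```

The resulting string can be pasted into the GSQL console; combining -dryrun with a small -n range is a cheap way to check a job against a sample of the data before loading anything.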
Options
-noprint
By default, the command will print several lines of status information while the loading is running.
If the -noprint option is included, the output omits the progress and summary details, but it still displays the job ID and the location of the log file.
Output when the -noprint option is used:
Kick off the following job:
JobName: load_videoE, jobid: gsql_demo_m1.1525091090494
Loading log: '/usr/local/tigergraph/logs/restpp/restpp_loader_logs/gsql_demo/gsql_demo_m1.1525091090494.log'
-dryrun
If -dryrun is used, the system will read the data files and process the data as instructed by the job, but will NOT load any data into the graph. This option can be a useful diagnostic tool.
-n [i], j
The -n option limits the loading job to processing only a range of lines of each input data file. The -n flag accepts one or two arguments.
For example, -n 50 means read lines 1 to 50.
-n 10,50 means read lines 10 to 50.
The special symbol $ is interpreted as "last line", so -n 10,$ means read from line 10 to the end.
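The line-range rules can be modeled with a short Python sketch. This is not how the loader is implemented; it only mirrors the behavior described above, assuming that line numbers are 1-based and that both ends of the range are inclusive. The function names are hypothetical.

```python
def parse_line_range(spec, total_lines):
    """Return (first, last) 1-based inclusive line numbers for a -n argument."""
    fields = [field.strip() for field in spec.split(",")]
    if len(fields) == 1:
        # A single argument j means lines 1 to j.
        first_text, last_text = "1", fields[0]
    elif len(fields) == 2:
        first_text, last_text = fields
    else:
        raise ValueError("-n accepts one or two arguments")
    first = int(first_text)
    # "$" stands for the last line of the file.
    last = total_lines if last_text == "$" else int(last_text)
    return first, last


def select_lines(lines, spec):
    """Return the lines that a job run with -n spec would process."""
    first, last = parse_line_range(spec, len(lines))
    return lines[first - 1:last]


lines = [f"row{i}" for i in range(1, 101)]   # stand-in for a 100-line file
print(len(select_lines(lines, "50")))        # lines 1 to 50
print(len(select_lines(lines, "10,50")))     # lines 10 to 50
print(select_lines(lines, "10,$")[-1])       # from line 10 to the last line
```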
Parameters
Below are the parameters available for the RUN LOADING JOB command, introduced by the USING clause.
filevar list
The optional USING clause may contain a list of file variables.
Each file variable may optionally be assigned a filepath_string, obeying the same format as in CREATE LOADING JOB.
This list of file variables determines which parts of a loading job are run and what data files are used.
- When a loading job is compiled, it generates one RESTPP endpoint for each filevar and filepath_string. As a consequence, a loading job can be run in parts. When RUN LOADING JOB is executed, only those endpoints whose filevar or file identifier (__GSQL_FILENAME_n__) is mentioned in the USING clause will be used. However, if the USING clause is omitted, then the entire loading job will be run.
- If a filepath_string is given, it overrides the filepath_string defined in the loading job. If a particular filevar is not assigned a filepath_string either in the loading job or in the RUN LOADING JOB statement, then an error is reported and the job exits.
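The selection and override rules above can be summarized in a small Python model. It is only a sketch of the documented behavior, not the actual implementation: the function name resolve_run, the file variables f1 and f2, and all paths are hypothetical, and file identifiers of the form __GSQL_FILENAME_n__ are not modeled.

```python
def resolve_run(job_files, using=None):
    """Decide which file variables run and which path each one reads.

    job_files: file variable -> path defined in the loading job, or None
               if the job declares the variable without a path.
    using:     file variable -> path given in the USING clause, or None
               for a variable mentioned without a path. Pass using=None
               to model a statement with no USING clause.
    """
    if using is None:
        # No USING clause: every part of the loading job is run.
        selected = dict.fromkeys(job_files)
    else:
        # Only the file variables mentioned in USING are run.
        selected = using

    resolved = {}
    for var, override in selected.items():
        # A path in the RUN statement overrides the one in the job.
        path = override if override is not None else job_files.get(var)
        if path is None:
            raise ValueError(f"no file path assigned to {var}")
        resolved[var] = path
    return resolved


job_files = {"f1": "/data/persons.csv", "f2": None}

print(resolve_run(job_files, {"f1": None}))                 # path from the job
print(resolve_run(job_files, {"f1": "/tmp/sample.csv"}))    # path overridden
print(resolve_run(job_files, {"f2": "/data/friends.csv"}))  # path supplied at run time

try:
    resolve_run(job_files)  # whole job, but f2 has no path anywhere
except ValueError as err:
    print("error:", err)
```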
CONCURRENCY
The CONCURRENCY parameter sets the maximum number of concurrent requests that the loading job may send to the GPE. The default is 256.
Running Loading Jobs as REST Requests
Another way to run a loading job is through the POST /ddl/{graph_name} endpoint of the REST++ server. Since the REST++ server has more direct access to the graph processing engine, this can execute more quickly than a RUN LOADING JOB statement in GSQL. For details on how to use the endpoint, see Run a loading job.
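As a rough illustration, the Python sketch below builds a request URL for this endpoint without sending anything. Only the /ddl/{graph_name} path comes from this section. The query parameter names tag (loading job name) and filename (file variable), the port 9000, and the graph name are assumptions made for the example, so check them against the endpoint reference before use. The helper name loading_job_url is hypothetical.

```python
from urllib.parse import quote, urlencode


def loading_job_url(host, graph_name, job_name, file_var, port=9000):
    """Build a URL for POST /ddl/{graph_name} (hypothetical helper).

    Assumed, not confirmed by this section: the query parameters
    'tag' and 'filename', and the default port 9000.
    """
    query = urlencode({"tag": job_name, "filename": file_var})
    return f"http://{host}:{port}/ddl/{quote(graph_name)}?{query}"


# Hypothetical graph name and file variable.
url = loading_job_url("localhost", "gsql_demo", "load_videoE", "f1")
print(url)
```

Under these assumptions, the data to load would be sent as the body of the POST request to this URL.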