Explaining queries
If it is unclear how a given query will perform, clients can retrieve a query's execution plan from the AQL query optimizer without actually executing the query. Getting the query execution plan from the optimizer is called explaining.
An explain will throw an error if the given query is syntactically invalid. Otherwise, it will return the execution plan and some information about what optimizations could be applied to the query. The query will not be executed.
Explaining a query can be achieved by calling the HTTP REST API or via arangosh.
A query can also be explained from the ArangoShell using the explain method of ArangoDatabase, or in detail via the explain method of ArangoStatement.
Inspecting query plans
The explain method of ArangoStatement, as shown in the next chapters, creates very verbose output.
To get a human-readable output of the query plan, you can use the explain method on the database object in arangosh. You may use it like this (syntax highlighting is disabled here):
arangosh> db._explain("LET s = SLEEP(0.25) LET t = SLEEP(0.5) RETURN 1", {}, {colors: false});
Query String:
 LET s = SLEEP(0.25) LET t = SLEEP(0.5) RETURN 1

Execution plan:
 Id   NodeType          Est.   Comment
  1   SingletonNode        1   * ROOT
  4   CalculationNode      1     - LET #2 = 1   /* json expression */   /* const assignment */
  2   CalculationNode      1     - LET s = SLEEP(0.25)   /* simple expression */
  3   CalculationNode      1     - LET t = SLEEP(0.5)   /* simple expression */
  5   ReturnNode           1     - RETURN #2

Indexes used:
 none

Functions used:
 Name    Deterministic   Uses V8
 SLEEP   false           false

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
The plan contains all execution nodes that are used during a query. These nodes represent different stages in a query. Each stage gets the input from the stage directly above (its dependencies). The plan will show you the estimated number of items (results) for each query stage (under Est.). Each query stage roughly equates to a line in your original query, which you can see under Comment.
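The relationship between stages and their dependencies can be sketched in plain JavaScript. The node list below is transcribed from the plan table of the SLEEP example; the dependencies arrays are an assumption derived from the rule that each stage reads from the stage directly above it, and describeStages is a hypothetical helper, not an arangosh function.

```javascript
// Nodes transcribed from the SLEEP example's plan table. The dependencies
// arrays are an assumption: each stage reads from the stage listed above it.
const sleepPlanNodes = [
  { id: 1, type: "SingletonNode", dependencies: [], estimatedNrItems: 1 },
  { id: 4, type: "CalculationNode", dependencies: [1], estimatedNrItems: 1 },
  { id: 2, type: "CalculationNode", dependencies: [4], estimatedNrItems: 1 },
  { id: 3, type: "CalculationNode", dependencies: [2], estimatedNrItems: 1 },
  { id: 5, type: "ReturnNode", dependencies: [3], estimatedNrItems: 1 }
];

// Hypothetical helper: one line per stage, naming where its input comes from.
function describeStages(nodes) {
  return nodes.map((node) => {
    const input = node.dependencies.length > 0
      ? "input from node " + node.dependencies.join(", ")
      : "no input";
    return node.id + " " + node.type +
      " (est. " + node.estimatedNrItems + "): " + input;
  });
}

describeStages(sleepPlanNodes).forEach((line) => console.log(line));
```

The sketch only reads the id, type, dependencies and estimatedNrItems attributes, which also appear in the JSON plans returned by the explain method of ArangoStatement.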
Executing an instrumented query
When a query is complex, it can be unclear where time is spent during execution, even for intermediate ArangoDB users. For this reason, you can execute a query with special instrumentation code enabled. The result will contain a copy of the query plan as well as detailed execution statistics.
To use this interactively in the shell, call the _profileQuery method on the ArangoDatabase object. It displays all the usual information contained in the explain output, plus the runtime statistics, the query profile and per-node statistics.
The execution plan contains three additional columns: Calls (number of times this query stage was executed), Items (number of temporary results at this stage) and Runtime (the total time spent in this stage). Below the execution plan there are additional sections for the overall runtime statistics and the query profile.
arangosh> db._profileQuery("LET s = SLEEP(0.25) LET t = SLEEP(0.5) RETURN 1", {}, {colors: false});
Query String:
 LET s = SLEEP(0.25) LET t = SLEEP(0.5) RETURN 1

Execution plan:
 Id   NodeType          Calls   Items   Runtime [s]   Comment
  1   SingletonNode         2       1        0.0000   * ROOT
  4   CalculationNode       2       1        0.0000     - LET #2 = 1   /* json expression */   /* const assignment */
  2   CalculationNode       2       1        0.2710     - LET s = SLEEP(0.25)   /* simple expression */
  3   CalculationNode       2       1        0.5118     - LET t = SLEEP(0.5)   /* simple expression */
  5   ReturnNode            2       1        0.0000     - RETURN #2

Indexes used:
 none

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up

Query Statistics:
 Writes Exec   Writes Ign   Scan Full   Scan Index   Filtered   Exec Time [s]
           0            0           0            0          0        0.78453s

Query Profile:
 Query Stage           Duration [s]
 initializing          0.00001
 parsing               0.00014
 optimizing ast        0.00002
 loading collections   0.00002
 instantiating plan    0.00011
 optimizing plan       0.00051
 executing             0.78313
 finalizing            0.00055
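To illustrate how the Runtime column can be read, the following sketch finds the stage with the largest runtime and its share of the summed per-node runtimes. The rows are transcribed from the profile table above; the object layout and field names (calls, items, runtime) are hypothetical and chosen only for this example, they are not the structure returned by the server.

```javascript
// Rows transcribed from the profile table above. The object layout and the
// field names are hypothetical, chosen only for this sketch.
const profiledStages = [
  { id: 1, type: "SingletonNode", calls: 2, items: 1, runtime: 0.0000 },
  { id: 4, type: "CalculationNode", calls: 2, items: 1, runtime: 0.0000 },
  { id: 2, type: "CalculationNode", calls: 2, items: 1, runtime: 0.2710 },
  { id: 3, type: "CalculationNode", calls: 2, items: 1, runtime: 0.5118 },
  { id: 5, type: "ReturnNode", calls: 2, items: 1, runtime: 0.0000 }
];

// Returns the stage with the largest runtime.
function slowestStage(stages) {
  return stages.reduce((slowest, stage) =>
    stage.runtime > slowest.runtime ? stage : slowest);
}

// Sums the per-node runtimes.
function totalRuntime(stages) {
  return stages.reduce((sum, stage) => sum + stage.runtime, 0);
}

const slowest = slowestStage(profiledStages);
const share = slowest.runtime / totalRuntime(profiledStages);
console.log("slowest stage: node " + slowest.id + " (" + slowest.type + "), " +
  (share * 100).toFixed(1) + "% of the per-node runtime");
```

In this example the two SLEEP calculations account for practically all of the per-node runtime, which matches the sleep durations requested in the query.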
Execution plans in detail
By default, the query optimizer will return what it considers to be the optimal plan. The optimal plan will be returned in the plan attribute of the result. If explain is called with option allPlans set to true, all plans will be returned in the plans attribute instead. The result object will also contain an attribute warnings, which is an array of warnings that occurred during optimization or execution plan creation.
Each plan in the result is an object with the following attributes:
- nodes: the array of execution nodes of the plan. The list of available node types can be found here
- estimatedCost: the total estimated cost for the plan. If there are multiple plans, the optimizer will choose the plan with the lowest total cost.
- collections: an array of collections used in the query
- rules: an array of rules the optimizer applied. The list of rules can be found here
- variables: array of variables used in the query (note: this may contain internal variables created by the optimizer)
Here is an example for retrieving the execution plan of a simple query:
arangosh> var stmt = db._createStatement(
........> "FOR user IN _users RETURN user");
arangosh> stmt.explain();
{
"plan" : {
"nodes" : [
{
"type" : "SingletonNode",
"dependencies" : [ ],
"id" : 1,
"estimatedCost" : 1,
"estimatedNrItems" : 1
},
{
"type" : "EnumerateCollectionNode",
"dependencies" : [
1
],
"id" : 2,
"estimatedCost" : 3,
"estimatedNrItems" : 1,
"random" : false,
"outVariable" : {
"id" : 0,
"name" : "user"
},
"projections" : [ ],
"producesResult" : true,
"database" : "_system",
"collection" : "_users",
"satellite" : false
},
{
"type" : "ReturnNode",
"dependencies" : [
2
],
"id" : 3,
"estimatedCost" : 4,
"estimatedNrItems" : 1,
"inVariable" : {
"id" : 0,
"name" : "user"
},
"count" : true
}
],
"rules" : [ ],
"collections" : [
{
"name" : "_users",
"type" : "read"
}
],
"variables" : [
{
"id" : 0,
"name" : "user"
}
],
"estimatedCost" : 4,
"estimatedNrItems" : 1,
"initialize" : true,
"isModificationQuery" : false
},
"warnings" : [ ],
"stats" : {
"rulesExecuted" : 32,
"rulesSkipped" : 0,
"plansCreated" : 1
},
"cacheable" : true
}
As the output of explain is very detailed, it is recommended to use some scripting to make the output less verbose:
arangosh> var formatPlan = function (plan) {
........> return { estimatedCost: plan.estimatedCost,
........> nodes: plan.nodes.map(function(node) {
........> return node.type; }) }; };
arangosh> formatPlan(stmt.explain().plan);
{
"estimatedCost" : 4,
"nodes" : [
"SingletonNode",
"EnumerateCollectionNode",
"ReturnNode"
]
}
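Along the same lines, a slightly richer summary can include the other plan attributes described earlier (collections, rules and variables). The sketch below is a hypothetical helper named summarizePlan, not part of arangosh, and it runs against an abbreviated copy of the plan shown above so that it can be tried without a server.

```javascript
// Abbreviated copy of the plan shown above; only the attributes read by
// the sketch are kept.
const samplePlan = {
  nodes: [
    { type: "SingletonNode", id: 1 },
    { type: "EnumerateCollectionNode", id: 2 },
    { type: "ReturnNode", id: 3 }
  ],
  rules: [],
  collections: [{ name: "_users", type: "read" }],
  variables: [{ id: 0, name: "user" }],
  estimatedCost: 4
};

// Hypothetical helper, not part of arangosh.
function summarizePlan(plan) {
  return {
    estimatedCost: plan.estimatedCost,
    nodes: plan.nodes.map((node) => node.type),
    collections: plan.collections.map((c) => c.name + " (" + c.type + ")"),
    rules: plan.rules,
    variables: plan.variables.map((v) => v.name)
  };
}

console.log(JSON.stringify(summarizePlan(samplePlan), null, 2));
```

In arangosh, the same function could presumably be applied to stmt.explain().plan instead of the sample object.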
If a query contains bind parameters, they must be added to the statement before explain is called:
arangosh> var stmt = db._createStatement(
........> `FOR doc IN @@collection FILTER doc.user == @user RETURN doc`
........> );
arangosh> stmt.bind({ "@collection" : "_users", "user" : "root" });
arangosh> stmt.explain();
{
"plan" : {
"nodes" : [
{
"type" : "SingletonNode",
"dependencies" : [ ],
"id" : 1,
"estimatedCost" : 1,
"estimatedNrItems" : 1
},
{
"type" : "IndexNode",
"dependencies" : [
1
],
"id" : 6,
"estimatedCost" : 1.95,
"estimatedNrItems" : 1,
"outVariable" : {
"id" : 0,
"name" : "doc"
},
"projections" : [ ],
"producesResult" : true,
"database" : "_system",
"collection" : "_users",
"satellite" : false,
"needsGatherNodeSort" : false,
"indexCoversProjections" : false,
"indexes" : [
{
"id" : "11",
"type" : "hash",
"fields" : [
"user"
],
"selectivityEstimate" : 1,
"unique" : true,
"sparse" : true,
"deduplicate" : true
}
],
"condition" : {
"type" : "n-ary or",
"typeID" : 63,
"subNodes" : [
{
"type" : "n-ary and",
"typeID" : 62,
"subNodes" : [
{
"type" : "compare ==",
"typeID" : 25,
"subNodes" : [
{
"type" : "attribute access",
"typeID" : 35,
"name" : "user",
"subNodes" : [
{
"type" : "reference",
"typeID" : 45,
"name" : "doc",
"id" : 0
}
]
},
{
"type" : "value",
"typeID" : 40,
"value" : "root",
"vType" : "string",
"vTypeID" : 4
}
]
}
]
}
]
},
"sorted" : true,
"ascending" : true,
"reverse" : false,
"evalFCalls" : true,
"fullRange" : false,
"limit" : 0
},
{
"type" : "ReturnNode",
"dependencies" : [
6
],
"id" : 5,
"estimatedCost" : 2.95,
"estimatedNrItems" : 1,
"inVariable" : {
"id" : 0,
"name" : "doc"
},
"count" : true
}
],
"rules" : [
"use-indexes",
"remove-filter-covered-by-index",
"remove-unnecessary-calculations-2"
],
"collections" : [
{
"name" : "_users",
"type" : "read"
}
],
"variables" : [
{
"id" : 2,
"name" : "1"
},
{
"id" : 0,
"name" : "doc"
}
],
"estimatedCost" : 2.95,
"estimatedNrItems" : 1,
"initialize" : true,
"isModificationQuery" : false
},
"warnings" : [ ],
"stats" : {
"rulesExecuted" : 32,
"rulesSkipped" : 0,
"plansCreated" : 1
},
"cacheable" : true
}
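The condition attribute of the IndexNode above is a nested tree that is hard to read at a glance. As a minimal sketch, the tree can be rendered back into an expression string. The function renderCondition is hypothetical and only handles the node types that occur in this example; it also does not add parentheses, so it is only reliable for simple conditions like this one.

```javascript
// Condition tree copied from the IndexNode in the output above
// (typeID and vType attributes omitted).
const condition = {
  type: "n-ary or",
  subNodes: [{
    type: "n-ary and",
    subNodes: [{
      type: "compare ==",
      subNodes: [
        {
          type: "attribute access",
          name: "user",
          subNodes: [{ type: "reference", name: "doc", id: 0 }]
        },
        { type: "value", value: "root" }
      ]
    }]
  }]
};

// Hypothetical helper: renders the node types seen in this example only.
function renderCondition(node) {
  switch (node.type) {
    case "n-ary or":
      return node.subNodes.map(renderCondition).join(" || ");
    case "n-ary and":
      return node.subNodes.map(renderCondition).join(" && ");
    case "compare ==":
      return renderCondition(node.subNodes[0]) + " == " +
        renderCondition(node.subNodes[1]);
    case "attribute access":
      return renderCondition(node.subNodes[0]) + "." + node.name;
    case "reference":
      return node.name;
    case "value":
      return JSON.stringify(node.value);
    default:
      return "<" + node.type + ">"; // unknown node types are only named
  }
}

console.log(renderCondition(condition)); // prints: doc.user == "root"
```

The rendered string shows that the bind parameter @user has already been replaced with its value "root" in the plan.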
In some cases the AQL optimizer creates multiple plans for a single query. By default only the plan with the lowest total estimated cost is kept, and the other plans are discarded. To retrieve all plans the optimizer has generated, explain can be called with the option allPlans set to true.
In the following example, the optimizer has created only a single plan, so the plans attribute contains one element:
arangosh> var stmt = db._createStatement(
........> "FOR user IN _users FILTER user.user == 'root' RETURN user");
arangosh> stmt.explain({ allPlans: true }).plans.length;
1
To see a slightly more compact version of the plan, the following transformation can be applied:
arangosh> stmt.explain({ allPlans: true }).plans.map(
........> function(plan) { return formatPlan(plan); });
[
{
"estimatedCost" : 2.95,
"nodes" : [
"SingletonNode",
"IndexNode",
"ReturnNode"
]
}
]
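When several plans are returned, the optimizer's choice can be reproduced by picking the plan with the lowest estimatedCost. The sketch below does this in plain JavaScript. Note that the two sample plans are taken from two different explain calls in this section (the default plan and the plan produced with nearly all rules disabled) and are combined here for illustration only; the optimizer did not return them together.

```javascript
// Two plan summaries taken from different explain calls in this section,
// combined for illustration only.
const samplePlans = [
  {
    estimatedCost: 6,
    nodes: ["SingletonNode", "EnumerateCollectionNode", "CalculationNode",
      "FilterNode", "ReturnNode"]
  },
  {
    estimatedCost: 2.95,
    nodes: ["SingletonNode", "IndexNode", "ReturnNode"]
  }
];

// Hypothetical helper: returns the plan with the lowest total estimated cost.
function cheapestPlan(plans) {
  return plans.reduce((best, plan) =>
    plan.estimatedCost < best.estimatedCost ? plan : best);
}

const best = cheapestPlan(samplePlans);
console.log(best.estimatedCost + ": " + best.nodes.join(" -> "));
```

In arangosh, the input would be the formatted plans from stmt.explain({ allPlans: true }).plans rather than the sample array.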
explain will also accept the following additional options:
- maxPlans: limits the maximum number of plans that are created by the AQL query optimizer
- optimizer.rules: an array of to-be-included or to-be-excluded optimizer rules can be put into this attribute, telling the optimizer to include or exclude specific rules. To disable a rule, prefix its name with a -, to enable a rule, prefix it with a +. There is also a pseudo-rule all, which will match all optimizer rules.
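The prefix convention for optimizer.rules can be captured in a small helper. This is a sketch only: buildRuleOptions is a hypothetical function, not part of arangosh, and placing the disabled rules before the enabled ones simply mirrors the order used in this section's example.

```javascript
// Hypothetical helper, not part of arangosh: builds the options object for
// explain from lists of rule names to disable and to enable.
function buildRuleOptions(disable, enable) {
  const rules = disable.map((name) => "-" + name)
    .concat(enable.map((name) => "+" + name));
  return { optimizer: { rules: rules } };
}

const ruleOptions = buildRuleOptions(["all"], ["remove-redundant-calculations"]);
console.log(JSON.stringify(ruleOptions));
// In arangosh, the object could then be passed on: stmt.explain(ruleOptions);
```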
The following example disables all optimizer rules but remove-redundant-calculations:
arangosh> stmt.explain({ optimizer: {
........> rules: [ "-all", "+remove-redundant-calculations" ] } });
{
"plan" : {
"nodes" : [
{
"type" : "SingletonNode",
"dependencies" : [ ],
"id" : 1,
"estimatedCost" : 1,
"estimatedNrItems" : 1
},
{
"type" : "EnumerateCollectionNode",
"dependencies" : [
1
],
"id" : 2,
"estimatedCost" : 3,
"estimatedNrItems" : 1,
"random" : false,
"outVariable" : {
"id" : 0,
"name" : "user"
},
"projections" : [ ],
"producesResult" : true,
"database" : "_system",
"collection" : "_users",
"satellite" : false
},
{
"type" : "CalculationNode",
"dependencies" : [
2
],
"id" : 3,
"estimatedCost" : 4,
"estimatedNrItems" : 1,
"expression" : {
"type" : "compare ==",
"typeID" : 25,
"subNodes" : [
{
"type" : "attribute access",
"typeID" : 35,
"name" : "user",
"subNodes" : [
{
"type" : "reference",
"typeID" : 45,
"name" : "user",
"id" : 0
}
]
},
{
"type" : "value",
"typeID" : 40,
"value" : "root",
"vType" : "string",
"vTypeID" : 4
}
]
},
"outVariable" : {
"id" : 2,
"name" : "1"
},
"canThrow" : false,
"expressionType" : "simple"
},
{
"type" : "FilterNode",
"dependencies" : [
3
],
"id" : 4,
"estimatedCost" : 5,
"estimatedNrItems" : 1,
"inVariable" : {
"id" : 2,
"name" : "1"
}
},
{
"type" : "ReturnNode",
"dependencies" : [
4
],
"id" : 5,
"estimatedCost" : 6,
"estimatedNrItems" : 1,
"inVariable" : {
"id" : 0,
"name" : "user"
},
"count" : true
}
],
"rules" : [ ],
"collections" : [
{
"name" : "_users",
"type" : "read"
}
],
"variables" : [
{
"id" : 2,
"name" : "1"
},
{
"id" : 0,
"name" : "user"
}
],
"estimatedCost" : 6,
"estimatedNrItems" : 1,
"initialize" : true,
"isModificationQuery" : false
},
"warnings" : [ ],
"stats" : {
"rulesExecuted" : 3,
"rulesSkipped" : 29,
"plansCreated" : 1
},
"cacheable" : true
}
The contents of an execution plan are meant to be machine-readable. To get a human-readable version of a query's execution plan, the following commands can be used:
arangosh> var query = "FOR doc IN mycollection FILTER doc.value > 42 RETURN doc";
arangosh> require("@arangodb/aql/explainer").explain(query, {colors:false});
Query String:
 FOR doc IN mycollection FILTER doc.value > 42 RETURN doc

Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  2   EnumerateCollectionNode    302     - FOR doc IN mycollection   /* full collection scan */
  3   CalculationNode            302       - LET #1 = (doc.`value` > 42)   /* simple expression */   /* collections used: doc : mycollection */
  4   FilterNode                 302       - FILTER #1
  5   ReturnNode                 302       - RETURN doc

Indexes used:
 none

Optimization rules applied:
 none
The above command prints the query's execution plan in the ArangoShell directly, focusing on the most important information.
Gathering debug information about a query
If an explain provides no suitable insight into why a query does not perform as expected, it may be reported to the ArangoDB support. In order to make this as easy as possible, there is a built-in command in ArangoShell for packaging the query, its bind parameters and all data required to execute the query elsewhere.
The command will store all data in a file with a configurable filename:
arangosh> var query = "FOR doc IN mycollection FILTER doc.value > 42 RETURN doc";
arangosh> require("@arangodb/aql/explainer").debugDump("/tmp/query-debug-info", query);
Entitled users can send the generated file to the ArangoDB support to facilitate reproduction and debugging.
If a query contains bind parameters, they will need to be specified along with the query string:
arangosh> var query = "FOR doc IN @@collection FILTER doc.value > @value RETURN doc";
arangosh> var bind = { value: 42, "@collection": "mycollection" };
arangosh> require("@arangodb/aql/explainer").debugDump("/tmp/query-debug-info", query, bind);
It is also possible to include example documents from the underlying collection in order to make reproduction even easier. Example documents can be sent as they are, or in an anonymized form. The number of example documents can be specified in the examples options attribute, and should generally be kept low. The anonymize option will replace the contents of string attributes in the examples with "XXX". It will however not replace any other types of data (e.g. numeric values) or attribute names. Attribute names in the examples will always be preserved because they may be indexed and used in queries:
arangosh> var query = "FOR doc IN @@collection FILTER doc.value > @value RETURN doc";
arangosh> var bind = { value: 42, "@collection": "mycollection" };
arangosh> var options = { examples: 10, anonymize: true };
arangosh> require("@arangodb/aql/explainer").debugDump("/tmp/query-debug-info", query, bind, options);
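To make the effect of the anonymize option concrete, the following sketch mimics the behavior described above: string values become "XXX", while numeric values and attribute names are kept. It is not the actual debugDump implementation. The recursive handling of nested objects and arrays is an assumption, and the sample document is made up for illustration.

```javascript
// Sketch of the described anonymization, NOT the actual debugDump code.
// Assumption: nested objects and arrays are processed recursively.
function anonymizeExample(value) {
  if (typeof value === "string") {
    return "XXX";
  }
  if (Array.isArray(value)) {
    return value.map(anonymizeExample);
  }
  if (value !== null && typeof value === "object") {
    const result = {};
    Object.keys(value).forEach((key) => {
      result[key] = anonymizeExample(value[key]); // attribute names are kept
    });
    return result;
  }
  return value; // numbers, booleans and null are left as they are
}

// Made-up example document, for illustration only.
const exampleDoc = { name: "Alice", value: 42, tags: ["a", "b"] };
console.log(JSON.stringify(anonymizeExample(exampleDoc)));
```

Because the value attribute stays numeric, a filter such as doc.value > @value would still behave the same way on the anonymized examples.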