Qleverfile settings
The Qleverfile contains the full configuration for the qlever command, see
Quickstart. The variables in the Qleverfile are written in
UPPER_SNAKE_CASE and are grouped into sections. The sections are
[data], [index], [server], [runtime], and [ui]. See
https://github.com/qlever-dev/qlever-control/tree/main/src/qlever/Qleverfiles
for a wide selection of example Qleverfiles.
For each Qleverfile variable, there is a corresponding command-line option for
one or more of the qlever commands, which are written in --snake-case. For
example, the Qleverfile variable ACCESS_TOKEN corresponds to the
--access-token option of the commands qlever start, qlever settings,
qlever clear-cache, and qlever query.
The command-line option always takes precedence over the Qleverfile variable,
in case both are specified. There are some command-line options that are
specific to a command and do not have a corresponding Qleverfile variable.
For example, the option --kill-existing-with-same-port of qlever start,
which does what the name suggests.
The following sections describe all Qleverfile variables and their
corresponding command-line options, with their respective default values. The
options specific to a particular command are not listed here, you get them
via qlever <command> --help.
If a variable is missing from the documentation below, please open an
issue. In the meantime, you
can always resort to qlever <command> --help. If the variable / option exists,
it will be listed there.
Section [data]
NAME, --name: The base name of all files created by various qlever
commands. We strongly recommend to stick to the convention to have one separate
directory for each dataset (with a Qleverfile in it). The name of the
directory is up to you. Default: none.
GET_DATA_CMD, --get-data-cmd: The command invoked by qlever get-data to
obtain the dataset. This can be anything that works on your system; see the
many example Qleverfiles.
Default: none.
DESCRIPTION, --description: A concise description of the dataset, set by
qlever index. Default: none.
TEXT_DESCRIPTION, --text-description: A concise description of the
additional text data if any, set by qlever index. Default: none.
FORMAT, --format: The format of the data, one of ttl, nt, or nq. Default:
ttl.
Section [index]
INPUT_FILES, --input-files: A space-separated list of input files or patterns
(you can use * and ? as wildcards). This is used in two ways. First,
qlever index checks whether these files exist, and if not, reports an error.
Second, it is often useful (but not mandatory) to use this variable in your
definition of CAT_INPUT_FILES or MULTI_JSON_INPUT, see the many
example Qleverfiles.
Default: none.
CAT_INPUT_FILES, --cat-input-files: The command used to create a single
input stream for qlever index. This can be any command that works on your
system, see the many example Qleverfiles.
In particular, you can use commands like zcat or bzcat or xzcat, in order
to read compressed files directly. Default: none.
PARALLEL_PARSING, --parallel-parsing: Whether to parse the single input
stream in parallel (true) or sequentially (false). Parallel parsing is much
faster, but requires that all prefix declarations are at the beginning of the
input stream. Default: true if you use CAT_INPUT_FILES, but deprecated to
encourage setting it explicitly or using MULTI_INPUT_JSON instead.
MULTI_INPUT_JSON, --multi-input-json: A comma-separated list of JSON
objects to define multiple input streams for qlever index. Each JSON object
must specify a "cmd" (the command that produces the input stream) and a
"format" (one of ttl, nt, or nq), and can optionally specify a
"graph" (the name of the graph to which the triples from this input stream
are added, use - for the default graph) and "parallel" ("true" if this
input stream should be parsed in parallel, which is faster but requires that
all prefix declarations are at the beginning of the input stream, or "false"
if not). Additionally, each JSON object can specify "for-each" (a
space-separated list of files or patterns), with the effect that the command
from "cmd" is run once for each file matching one of the patterns, with {}
in the command replaced by the file name. In particular, this is useful if you
have many files that belong to the same graph and have the same format.
Default: none.
SETTINGS_JSON, --settings-json: A JSON object (as a string) that can be
used to pass additional settings for qlever index. The recognized keys are
documented in SETTINGS_JSON keys at the end of this
page. This exists for historical reasons and will be deprecated soon, with the
individual keys migrated to their own Qleverfile variables.
ULIMIT, --ulimit: The maximum number of open files allowed during qlever
index. If this number is too low, qlever index will fail with an error that
will make it clear that you need to increase this value. Default: depends on
your system, but is often as low as 1024 (which is too low for large
datasets).
STXXL_MEMORY, --stxxl-memory: The amount of memory that can be used by
qlever index, specified with standard suffixes like k, M, G, and T.
This is only an approximate upper bound, the actual memory consumption might be
higher. When too low, qlever index might fail with an error message (which usually
makes it clear that you need to increase this value). The strange name of the
variable / option is an artifact from when QLever used the STXXL
library for external memory sorting, which it
no longer does. The name will be changed soon. Defaut: 1G.
PARSER_BUFFER_SIZE, --parser-buffer-size: The size of the buffer used by
qlever index when parsing an input stream, specified with standard suffixes
like k, M, G, and T. This must be large enough to hold the longest
predicate-object list in your dataset (everything from a subject until the next
.). Predicate-object lists are usually short, but can be long for TTL
datasets with many long literals, see Qleverfile.osm-planet. Default: 10M.
VOCABULARY_TYPE, --vocabulary-type: Whether the vocabulary is stored
compressed or not (trade-off between index size and query speed), whether to
store it on disk or in memory (trade-off between memory consumption and query
speed), and whether to store geometry data in a separate file (always a good
idea if your dataset contains geometry data). The options are
on-disk-compressed, in-memory-compressed, on-disk-uncompressed,
in-memory-uncompressed, and on-disk-compressed-geo-split. Default:
on-disk-compressed.
ENCODE_AS_IDS, --encode-as-ids: List of IRI prefixes (separated by spaces)
with the effect that all IRIs starting with one of these prefixes and followed
by a sequence of at most 12 digits will not be stored as strings in the
vocabulary, but stored directly in one of QLever's internal 64-bit identifiers.
See
Qleverfile.osm-planet for an example. Default: none.
TEXT_INDEX, --text-index: Four options: none (no text index),
from_literals (create a text index from all literals in the dataset),
from_text_records (create a text index from the givens "words" and "docs"
file, see TEXT_WORDS_FILE and TEXT_DOCS_FILE below), and
from_literals_and_text_records (create a text index from both literals and
the given "words" and "docs" file). Default: none.
TEXT_WORDS_FILE, --text-words-file: The name of the file containing the
word occurrences for the text index, one line per occurrence with four
tab-separated columns each, in the format word or <IRI> TAB 0 or 1 TAB text record
id TAB always 1. Default: <NAME>.wordsfile.tsv.
TEXT_DOCS_FILE, --text-docs-file: The name of the file containing the text
records for the text index, one line per record with two tab-separated columns
each, in the format text record id TAB text. Default: <NAME>.docsfile.tsv.
ADD_HAS_WORD_TRIPLES, --add-has-word-triples: Whether to additionally store,
for each triple with a literal object and for each word in that literal, an
internal triple of the form <literal> ql:has-word "word". The (named) graph
of these internal triples is used to store the term frequency of the word in
the literal. The triples can be used to implement custom full-text search
queries, in particular in combination with materialized
views. Default: false.
INDEX_BINARY, --index-binary: The binary for building the index, when
using SYSTEM = native. The binary must either be in your PATH or you must
specify the full path. Default: qlever-index (which is the default name
of the binary for index building when compiling QLever).
MATERIALIZED_VIEWS, --materialized-views: Materialized views to be written at indexing time, given as a JSON object. The keys in the JSON are used as view names and the string values as the SPARQL queries for writing the respective view. By default, no materialized views are written.
Section [server]
PORT, --port: The port of the SPARQL endpoint created by qlever start.
Default: none.
ACCESS_TOKEN, --access-token: The access token required for privileged
operations such as qlever clear-cache --complete, qlever query
--pin-to-cache, and qlever settings (when modifying a setting). Default:
none (in which case no privileged operations are possible at all).
MEMORY_FOR_QUERIES, --memory-for-queries: The amount of memory that can be
used for queries, specified with standard suffixes like k, M, G, and T.
Default: 5G.
TIMEOUT, --timeout: The maximum time a query is allowed to run, specified
with standard suffixes like s, m, and h. This is an approximate upper
bound, queries might run longer in some cases. Default: 30s.
CACHE_MAX_SIZE, --cache-max-size: The maximum size of the cache used for
caching query results, specified with standard suffixes like k, M,
G, and T. When the total size of the cached results exceeds this value, the
eviction strategy is Least Recently Used (LRU). Default: 2G.
CACHE_MAX_SIZE_SINGLE_ENTRY, --cache-max-size-single-entry: The maximum
size of a single cache entry, specified with standard suffixes like k, M,
G, and T. Default: 1G.
CACHE_MAX_NUM_ENTRIES, --cache-max-num-entries: The maximum number of
cached results held in the cache at the same time. When the number of cached
results exceeds this value, the eviction strategy is Least Recently Used (LRU).
Default: 200.
PERSIST_UPDATES, --persist-updates: When true (or the command-line
option --persist-updates is given), all update requests processed after
qlever start are persisted to disk, in a single file <NAME>.update-triples.
When the server is stopped and qlever start is run again, the updates are
replayed and new updates are appended to the same file. This is rudimentary
for now, a more sophisticated mechanism is currently being developed. For an
alternative, see qlever update-wikidata, where updates come from an SSE
stream, and can be replayed any time from an arbitrary point in time, and the
date until which the dataset is up to date is stored in dedicated triples.
Default: false.
PRELOAD_MATERIALIZED_VIEWS, --preload-materialized-views: Materialized views to be loaded upon server start. Takes an arbitrary number of arguments. By default, no views are loaded on server start.
Section [runtime]
SYSTEM, --system: Three options: native (run natively, assuming that the
QLever binaries qlever-server and qlever-index are in your PATH),
docker (pull Docker image if none is present locally, and run in a
Docker container), and podman (same as docker, but using Podman
instead of Docker). Default: docker.
IMAGE, --image: The name of the image when using SYSTEM = docker or
SYSTEM = podman. Default: docker.io/adfreiburg/qlever:latest.
INDEX_CONTAINER, --index-container: The name of the container used by
qlever index, when using SYSTEM = docker or SYSTEM = podman.
SERVER_CONTAINER, --server-container: The name of the container used by
qlever start, when using SYSTEM = docker or SYSTEM = podman.
Section [ui]
UI_CONFIG, --ui-config: The name of one of the preconfigurations from
https://qlever.dev (the slug after the https://qlever.dev/ is the name of the
preconfiguration). You cannot choose your own name here yet; this will be fixed
soon. But once you have picked a preconfiguration, you can modify it
arbitrarily (except for the name) after running qlever ui once, see the
instructions printed by qlever ui. Default: default.
UI_PORT, --ui-port: The port of the Qlever UI started with qlever ui.
The URL at which the UI can be accessed is printed by qlever ui. Default:
8176 (the ASCII codes for Q and L).
UI_SYSTEM, --ui-system: Which container system to use for qlever ui,
either docker or podman. Note that unlike for qlever index and qlever
start, there is no native option (the only reason for a native mode there
is that it is more efficient, but that is not a concern for the UI). Default:
docker.
UI_IMAGE, --ui-image: The name of the image used for qlever ui. Default:
docker.io/adfreiburg/qlever-ui.
UI_CONTAINER, --ui-container: The name of the container used for qlever
ui, when using UI_SYSTEM = docker or UI_SYSTEM = podman. Default:
qlever.ui.<NAME>.
SETTINGS_JSON keys
The following keys are recognized in the JSON object given to SETTINGS_JSON
in the [index] section. As mentioned there, SETTINGS_JSON exists for
historical reasons and will be deprecated soon, with the individual keys
migrated to their own Qleverfile variables. The keys are listed in roughly
decreasing order of relevance.
"num-triples-per-batch": How many triples are processed in one batch during
qlever index. All data from a batch is kept in memory until it has been
fully processed, and when parsing input streams in parallel, multiple batches
are kept in memory at the same time. Thus, choosing a large value can lead to
high memory consumption or an out-of-memory crash. On the other hand, two
files per batch are produced during qlever index, so a small value can
require increasing your ULIMIT. Default: 10000000 (ten million).
"parser-batch-size": The number of triples the parser hands off to the index
builder in one batch through the parallel parsing pipeline, controlling the
granularity of that pipeline. This is a triple count, not a memory size, and
is unrelated to PARSER_BUFFER_SIZE (the byte size of the parser's I/O
buffer) and to "num-triples-per-batch" (which controls the size of partial
vocabularies). Usually there is no reason to change it. Default: 1000000
(one million).
"parser-integer-overflow-behavior": How the TTL parser handles integer
literals that do not fit into a 64-bit integer. One of
overflowing-integers-throw (abort qlever index with an error message
identifying the offending literal), overflowing-integers-become-doubles
(silently convert only the overflowing values to doubles), and
all-integers-become-doubles (convert every integer literal to a double,
regardless of its size). Note that any conversion to a double can lose
precision, so silently converting overflowing integers may make subsequent
queries return slightly different numeric results than expected. Default:
overflowing-integers-throw.
"ascii-prefixes-only": If true, the TTL parser assumes that all prefix
declarations contain only ASCII characters, which is faster to parse but
rejects datasets that use non-ASCII characters in their prefix declarations.
Default: false. This is deprecated; there is no longer a noticeable
performance difference.
"locale": A JSON object with the keys "language", "country", and
"ignore-punctuation" that determines the locale used by QLever for
collation, which in turn influences the sort order of strings and the result
of string comparisons in SPARQL queries. Note that changing the locale
requires a complete rebuild of the index, and that any non-default value is
currently untested by the QLever team, so be prepared to file bug reports.
Default: {"language": "en", "country": "US", "ignore-punctuation": false}.
"prefixes-external": A list of IRI prefixes with the effect that all IRIs
and literals starting with one of these prefixes are stored in the external
(on-disk) part of the vocabulary instead of the internal (in-memory) part. To
keep certain IRIs in memory for faster lookup, specify a more restrictive
list. Default: [""], that is, a list with the single empty string, which
matches every IRI and every literal, so the entire vocabulary is external.
"languages-internal": A list of language tags with the effect that all
language-tagged literals carrying one of these tags are stored in the
internal (in-memory) part of the vocabulary instead of the external (on-disk)
part. Use this for languages whose literals occur often in query results, to
make those queries faster. Default: none.
"ignore-case": Removed. qlever index aborts with an error if this key is
present. QLever no longer supports building a case-insensitive index. If
needed, case-insensitive matching has to be done at query time (e.g. via
FILTER (LCASE(?x) = "...") or FILTER REGEX(?x, "...", "i")).