Qleverfile settings

The Qleverfile contains the full configuration for the qlever command, see Quickstart. The variables in the Qleverfile are written in UPPER_SNAKE_CASE and are grouped into sections. The sections are [data], [index], [server], [runtime], and [ui]. See https://github.com/qlever-dev/qlever-control/tree/main/src/qlever/Qleverfiles for a wide selection of example Qleverfiles.
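As a rough illustration of the layout described above, the following Python sketch parses a small Qleverfile-like text with the standard `configparser` module. The dataset name, file name, and port are made-up values, and using `configparser` here is an assumption for illustration, not a statement about how `qlever` itself reads the file.

```python
import configparser

# Hypothetical Qleverfile content; all values are made up for illustration.
EXAMPLE_QLEVERFILE = """
[data]
NAME = example
FORMAT = ttl

[index]
INPUT_FILES = example.ttl
CAT_INPUT_FILES = cat example.ttl

[server]
PORT = 7001

[runtime]
SYSTEM = docker

[ui]
UI_CONFIG = default
"""


def read_qleverfile(text):
    """Return a dict of the form {section: {VARIABLE: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep variable names in upper case
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


config = read_qleverfile(EXAMPLE_QLEVERFILE)
print(list(config))              # ['data', 'index', 'server', 'runtime', 'ui']
print(config["server"]["PORT"])  # 7001
```

Note that `configparser` lowercases keys by default, which is why the sketch overrides `optionxform`.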

For each Qleverfile variable, one or more of the qlever commands have a corresponding command-line option, written in --kebab-case. For example, the Qleverfile variable ACCESS_TOKEN corresponds to the --access-token option of the commands qlever start, qlever settings, qlever clear-cache, and qlever query.

If both are specified, the command-line option always takes precedence over the Qleverfile variable. Some command-line options are specific to a command and have no corresponding Qleverfile variable, for example the option --kill-existing-with-same-port of qlever start, which does what the name suggests.
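The naming rule and the precedence rule can be sketched in a few lines of Python. The helper names `option_name` and `effective_value` are hypothetical and exist only for this illustration; they are not part of the qlever command.

```python
from typing import Optional


def option_name(variable: str) -> str:
    """Convert a Qleverfile variable name such as ACCESS_TOKEN to --access-token."""
    return "--" + variable.lower().replace("_", "-")


def effective_value(cli_value: Optional[str],
                    qleverfile_value: Optional[str]) -> Optional[str]:
    """Return the command-line value if one was given, else the Qleverfile value."""
    return cli_value if cli_value is not None else qleverfile_value


print(option_name("ACCESS_TOKEN"))      # --access-token
print(effective_value("8888", "7001"))  # 8888
print(effective_value(None, "7001"))    # 7001
```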

The following sections describe all Qleverfile variables and their corresponding command-line options, with their respective default values. The options specific to a particular command are not listed here; you can get them via qlever <command> --help.

If a variable is missing from the documentation below, please open an issue. In the meantime, you can always resort to qlever <command> --help. If the variable / option exists, it will be listed there.

Section `[data]`

NAME, --name: The base name of all files created by the various qlever commands. We strongly recommend keeping a separate directory for each dataset (with a Qleverfile in it). The name of the directory is up to you. Default: none.

GET_DATA_CMD, --get-data-cmd: The command invoked by qlever get-data to obtain the dataset. This can be anything that works on your system; see the many example Qleverfiles. Default: none.

DESCRIPTION, --description: A concise description of the dataset, set by qlever index. Default: none.

TEXT_DESCRIPTION, --text-description: A concise description of the additional text data if any, set by qlever index. Default: none.

FORMAT, --format: The format of the data, one of ttl, nt, or nq. Default: ttl.

Section `[index]`

INPUT_FILES, --input-files: A space-separated list of input files or patterns (you can use * and ? as wildcards). This is used in two ways. First, qlever index checks whether these files exist, and if not, reports an error. Second, it is often useful (but not mandatory) to use this variable in your definition of CAT_INPUT_FILES or MULTI_INPUT_JSON; see the many example Qleverfiles. Default: none.

CAT_INPUT_FILES, --cat-input-files: The command used to create a single input stream for qlever index. This can be any command that works on your system, see the many example Qleverfiles. In particular, you can use commands like zcat or bzcat or xzcat, in order to read compressed files directly. Default: none.

PARALLEL_PARSING, --parallel-parsing: Whether to parse the single input stream in parallel (true) or sequentially (false). Parallel parsing is much faster, but requires that all prefix declarations are at the beginning of the input stream. Default: true if you use CAT_INPUT_FILES, but relying on this default is deprecated, to encourage setting the value explicitly or using MULTI_INPUT_JSON instead.

MULTI_INPUT_JSON, --multi-input-json: A comma-separated list of JSON objects to define multiple input streams for qlever index. Each JSON object must specify a "cmd" (the command that produces the input stream) and a "format" (one of ttl, nt, or nq), and can optionally specify a "graph" (the name of the graph to which the triples from this input stream are added, use - for the default graph) and "parallel" ("true" if this input stream should be parsed in parallel, which is faster but requires that all prefix declarations are at the beginning of the input stream, or "false" if not). Additionally, each JSON object can specify "for-each" (a space-separated list of files or patterns), with the effect that the command from "cmd" is run once for each file matching one of the patterns, with {} in the command replaced by the file name. In particular, this is useful if you have many files that belong to the same graph and have the same format. Default: none.
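A minimal sketch of how a MULTI_INPUT_JSON value could be assembled, and of what the "for-each" key means. The commands, file names, and graph IRI are made up, and the helper `expand_for_each` is hypothetical: it only mimics the substitution described above and is not QLever's implementation.

```python
import json

# Hypothetical input streams; commands, file names, and the graph IRI are made up.
streams = [
    {"cmd": "zcat data.ttl.gz", "format": "ttl", "graph": "-", "parallel": "true"},
    {"cmd": "cat {}", "format": "nt", "graph": "http://example.org/graph",
     "for-each": "parts/*.nt"},
]

# The value is described as a comma-separated list of JSON objects, so each
# object is serialized on its own and the results are joined with commas.
multi_input_json = ", ".join(json.dumps(stream) for stream in streams)
print(multi_input_json)


def expand_for_each(cmd, files):
    """One command per file, with {} replaced by the file name."""
    return [cmd.replace("{}", name) for name in files]


print(expand_for_each("cat {}", ["parts/a.nt", "parts/b.nt"]))
```

Generating the value with a JSON library rather than by hand avoids quoting mistakes, which are easy to make in a long single-line value.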

SETTINGS_JSON, --settings-json: A JSON object (as a string) that can be used to pass additional settings for qlever index. This exists for historical reasons and will be deprecated soon. In the meantime, the most relevant key is "num-triples-per-batch", which controls how many triples are parsed in one batch. All data from a batch is kept in memory until it has been fully processed, and when parsing input streams in parallel, multiple batches are kept in memory at the same time. Thus, choosing a large value for "num-triples-per-batch" can lead to high memory consumption or an out-of-memory crash. On the other hand, qlever index produces two files per batch, so a small value leads to many files, which might require increasing your ULIMIT, see below. The default value for "num-triples-per-batch" is 10000000 (ten million).

ULIMIT, --ulimit: The maximum number of open files allowed during qlever index. If this number is too low, qlever index will fail with an error that will make it clear that you need to increase this value. Default: depends on your system, but is often as low as 1024 (which is too low for large datasets).
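The interplay between "num-triples-per-batch" and ULIMIT can be estimated with simple arithmetic. The sketch below assumes exactly two files per batch, as stated above, and ignores any other files the process may have open. The dataset size is a made-up example, and the helper `files_needed` is hypothetical.

```python
import math


def files_needed(num_triples, triples_per_batch=10_000_000):
    """Estimate the number of files produced, assuming two files per batch."""
    num_batches = math.ceil(num_triples / triples_per_batch)
    return 2 * num_batches


# Hypothetical dataset with 20 billion triples and the default batch size.
needed = files_needed(20_000_000_000)
print(needed)         # 4000
print(needed > 1024)  # True, so a limit of 1024 open files would likely be too low
```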

STXXL_MEMORY, --stxxl-memory: The amount of memory that can be used by qlever index, specified with standard suffixes like k, M, G, and T. This is only an approximate upper bound; the actual memory consumption might be higher. If the value is too low, qlever index might fail with an error message (which usually makes it clear that you need to increase this value). The strange name of the variable / option is an artifact from when QLever used the STXXL library for external memory sorting, which it no longer does. The name will be changed soon. Default: 1G.

PARSER_BUFFER_SIZE, --parser-buffer-size: The size of the buffer used by qlever index when parsing an input stream, specified with standard suffixes like k, M, G, and T. This must be large enough to hold the longest predicate-object list in your dataset (everything from a subject until the next .). Predicate-object lists are usually short, but can be long for TTL datasets with many long literals, see Qleverfile.osm-planet. Default: 10M.

VOCABULARY_TYPE, --vocabulary-type: Whether the vocabulary is stored compressed or not (trade-off between index size and query speed), whether to store it on disk or in memory (trade-off between memory consumption and query speed), and whether to store geometry data in a separate file (always a good idea if your dataset contains geometry data). The options are on-disk-compressed, in-memory-compressed, on-disk-uncompressed, in-memory-uncompressed, and on-disk-compressed-geo-split. Default: on-disk-compressed.

ENCODE_AS_IDS, --encode-as-ids: List of IRI prefixes (separated by spaces) with the effect that all IRIs starting with one of these prefixes and followed by a sequence of at most 12 digits will not be stored as strings in the vocabulary, but stored directly in one of QLever's internal 64-bit identifiers. See Qleverfile.osm-planet for an example. Default: none.
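To make the matching rule concrete, here is a small sketch that checks whether an IRI would qualify under the description above. The prefix is a made-up example value, the helper `is_encoded_as_id` is hypothetical, and the sketch assumes that at least one digit must follow the prefix. It does not reflect how QLever implements the check internally.

```python
import re


def is_encoded_as_id(iri, prefixes, max_digits=12):
    """True if iri is one of the prefixes followed by 1 to max_digits digits."""
    digits = re.compile(rf"[0-9]{{1,{max_digits}}}")
    for prefix in prefixes:
        if iri.startswith(prefix) and digits.fullmatch(iri[len(prefix):]):
            return True
    return False


# Hypothetical prefix list, as it might appear in ENCODE_AS_IDS.
prefixes = ["https://www.openstreetmap.org/node/"]

print(is_encoded_as_id("https://www.openstreetmap.org/node/123456", prefixes))         # True
print(is_encoded_as_id("https://www.openstreetmap.org/node/1234567890123", prefixes))  # False (13 digits)
print(is_encoded_as_id("https://www.openstreetmap.org/node/12a", prefixes))            # False (not only digits)
```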

TEXT_INDEX, --text-index: Four options: none (no text index), from_literals (create a text index from all literals in the dataset), from_text_records (create a text index from the given "words" and "docs" files, see TEXT_WORDS_FILE and TEXT_DOCS_FILE below), and from_literals_and_text_records (create a text index from both the literals and the given "words" and "docs" files). Default: none.

TEXT_WORDS_FILE, --text-words-file: The name of the file containing the word occurrences for the text index, one line per occurrence with four tab-separated columns each, in the format word or <IRI> TAB 0 or 1 TAB text record id TAB always 1. Default: <NAME>.wordsfile.tsv.

TEXT_DOCS_FILE, --text-docs-file: The name of the file containing the text records for the text index, one line per record with two tab-separated columns each, in the format text record id TAB text. Default: <NAME>.docsfile.tsv.
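The two file formats can be illustrated by generating a few lines for a single text record. The record text and IRI are made up, and the helpers `words_line` and `docs_line` are hypothetical. The sketch also assumes that the second column of the words file is 1 for an <IRI> and 0 for a plain word, which the description above does not spell out.

```python
def words_line(token, record_id):
    """One line of the words file: token TAB 0 or 1 TAB record id TAB 1."""
    # Assumption: the second column is 1 for an <IRI> and 0 for a plain word.
    is_iri = 1 if token.startswith("<") and token.endswith(">") else 0
    return f"{token}\t{is_iri}\t{record_id}\t1"


def docs_line(record_id, text):
    """One line of the docs file: record id TAB text."""
    return f"{record_id}\t{text}"


# Hypothetical text record with id 1 that mentions one entity.
print(docs_line(1, "Freiburg is a city"))
for token in ["<http://example.org/Freiburg>", "freiburg", "is", "a", "city"]:
    print(words_line(token, 1))
```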

INDEX_BINARY, --index-binary: The binary for building the index, when using SYSTEM = native. The binary must either be in your PATH or you must specify the full path. Default: IndexBuilderMain (which is the default name of the binary for index building when compiling QLever).

Section `[server]`

PORT, --port: The port of the SPARQL endpoint created by qlever start. Default: none.

ACCESS_TOKEN, --access-token: The access token required for privileged operations such as qlever clear-cache --complete, qlever query --pin-to-cache, and qlever settings (when modifying a setting). Default: none (in which case no privileged operations are possible at all).

MEMORY_FOR_QUERIES, --memory-for-queries: The amount of memory that can be used for queries, specified with standard suffixes like k, M, G, and T. Default: 5G.

TIMEOUT, --timeout: The maximum time a query is allowed to run, specified with standard suffixes like s, m, and h. This is an approximate upper bound, queries might run longer in some cases. Default: 30s.

CACHE_MAX_SIZE, --cache-max-size: The maximum size of the cache used for caching query results, specified with standard suffixes like k, M, G, and T. When the total size of the cached results exceeds this value, the eviction strategy is Least Recently Used (LRU). Default: 2G.

CACHE_MAX_SIZE_SINGLE_ENTRY, --cache-max-size-single-entry: The maximum size of a single cache entry, specified with standard suffixes like k, M, G, and T. Default: 1G.

CACHE_MAX_NUM_ENTRIES, --cache-max-num-entries: The maximum number of cached results held in the cache at the same time. When the number of cached results exceeds this value, the eviction strategy is Least Recently Used (LRU). Default: 200.
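How the three cache limits interact can be pictured with a toy LRU cache. This is a simplified sketch under the assumptions stated in the comments; the class `LruCache` is hypothetical and is not QLever's cache implementation.

```python
from collections import OrderedDict


class LruCache:
    """Toy LRU cache with limits on entry count, total size, and single-entry size."""

    def __init__(self, max_num_entries, max_size, max_size_single_entry):
        self.max_num_entries = max_num_entries
        self.max_size = max_size
        self.max_size_single_entry = max_size_single_entry
        self.entries = OrderedDict()  # key -> size, least recently used first

    def get(self, key):
        """Return True on a cache hit and mark the entry as most recently used."""
        if key not in self.entries:
            return False
        self.entries.move_to_end(key)
        return True

    def put(self, key, size):
        """Insert an entry; return False if it is too large to be cached at all."""
        if size > self.max_size_single_entry:
            return False
        self.entries.pop(key, None)
        self.entries[key] = size
        # Evict least recently used entries until both limits hold again.
        while (len(self.entries) > self.max_num_entries
               or sum(self.entries.values()) > self.max_size):
            self.entries.popitem(last=False)
        return True


cache = LruCache(max_num_entries=2, max_size=100, max_size_single_entry=50)
cache.put("query-a", 10)
cache.put("query-b", 10)
cache.get("query-a")       # "query-a" is now the most recently used entry
cache.put("query-c", 10)   # exceeds two entries, so "query-b" is evicted
print(list(cache.entries))         # ['query-a', 'query-c']
print(cache.put("query-big", 60))  # False
```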

PERSIST_UPDATES, --persist-updates: When true (or the command-line option --persist-updates is given), all update requests processed after qlever start are persisted to disk, in a single file <NAME>.update-triples. When the server is stopped and qlever start is run again, the updates are replayed and new updates are appended to the same file. This is rudimentary for now; a more sophisticated mechanism is currently being developed. For an alternative, see qlever update-wikidata, where updates come from an SSE stream and can be replayed at any time from an arbitrary point in time, and where the date until which the dataset is up to date is stored in dedicated triples. Default: false.

Section `[runtime]`

SYSTEM, --system: Three options: native (run natively, assuming that the QLever binaries ServerMain and IndexBuilderMain are in your PATH), docker (pull Docker image if none is present locally, and run in a Docker container), and podman (same as docker, but using Podman instead of Docker). Default: docker.

IMAGE, --image: The name of the image when using SYSTEM = docker or SYSTEM = podman. Default: docker.io/adfreiburg/qlever:latest.

INDEX_CONTAINER, --index-container: The name of the container used by qlever index, when using SYSTEM = docker or SYSTEM = podman.

SERVER_CONTAINER, --server-container: The name of the container used by qlever start, when using SYSTEM = docker or SYSTEM = podman.

Section `[ui]`

UI_CONFIG, --ui-config: The name of one of the preconfigurations from https://qlever.dev (the slug after the https://qlever.dev/ is the name of the preconfiguration). You cannot choose your own name here yet; this will be fixed soon. But once you have picked a preconfiguration, you can modify it arbitrarily (except for the name) after running qlever ui once, see the instructions printed by qlever ui. Default: default.

UI_PORT, --ui-port: The port of the QLever UI started with qlever ui. The URL at which the UI can be accessed is printed by qlever ui. Default: 8176 (the ASCII codes for Q and L).
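The origin of the default port can be checked with a one-line computation, shown here purely as a mnemonic:

```python
# Concatenate the decimal ASCII codes of "Q" (81) and "L" (76).
default_ui_port = int(f"{ord('Q')}{ord('L')}")
print(default_ui_port)  # 8176
```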

UI_SYSTEM, --ui-system: Which container system to use for qlever ui, either docker or podman. Note that unlike for qlever index and qlever start, there is no native option (the only reason for a native mode there is that it is more efficient, but that is not a concern for the UI). Default: docker.

UI_IMAGE, --ui-image: The name of the image used for qlever ui. Default: docker.io/adfreiburg/qlever-ui.

UI_CONTAINER, --ui-container: The name of the container used for qlever ui, when using UI_SYSTEM = docker or UI_SYSTEM = podman. Default: qlever.ui.<NAME>.
