Talos is controlled by a single TOML configuration file. A fully commented template lives at
src/talos/example_config.toml.
This document explains the configuration options, the default values, and sensible alternatives.
Description: Talos uses PanelApp as its source of disease-associated genes. Optionally, PanelApp can also supply disease-specific gene panels for the analysis of disease-specific cohorts.
These settings control which PanelApp deployment is queried, which panel serves as the base for analysis, which additional genes or panels to include, and which genes to exclude from the analysis.

| Field | Purpose | Default | Alternative |
|---|---|---|---|
| panelapp | PanelApp server to query | https://panelapp-aus.org/api/v1/panels | It is highly recommended to use PanelApp Aus due to enhanced annotations |
| default_panel | Base panel for the analysis, applied to all participants | | |
| | List of gene symbols to remove from analysis unless phenotype matched to a participant. This is useful to exclude genes that frequently result in noise. | FLG, GJB2, F2, F5, HFE | |
| forbidden_genes | List of genes that will be excluded from all results. | N/A | Late onset/incidental findings |
| forced_panels | List of PanelApp panel IDs to apply to all cohort members. Typically used during the analysis of disease-specific cohorts. For example, if analysing a cohort consisting of patients who all have a suspected mitochondrial disorder, you could set this to 203 to highlight any genes associated with the Aus PanelApp Mitochondrial disease panel. | N/A | Phenotype specific panels |
| within_x_months | Integer, genes added to panels within X months are flagged in the report and used to determine “new” genes for the "ClinVar Recent Gene" category | 24 | |
| manual_overrides | A manually defined panel: gene symbols, MOI. Can be used to generate an artificial panel, e.g. for recently discovered genes which are not yet in PanelApp | | |

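
As a rough illustration of how within_x_months could be applied, the sketch below flags a gene as new when it joined a panel fewer than X calendar months before a reference date. The helper names and the whole-calendar-month arithmetic are assumptions made for illustration; Talos may count the window differently.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later`, ignoring the day of month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def is_new_gene(added_to_panel: date, today: date, within_x_months: int = 24) -> bool:
    """Hypothetical helper: True if the gene joined the panel inside the window."""
    return months_between(added_to_panel, today) < within_x_months


reference = date(2025, 6, 1)
print(is_new_gene(date(2024, 1, 15), reference))  # added 17 months earlier
print(is_new_gene(date(2022, 1, 15), reference))  # added 41 months earlier
```

With the default of 24, the first gene would be flagged as new and the second would not.
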
if using VEP annotations, use altered VEP equivalents (e.g. "frameshift" -> "frameshift_variant")

| Field | Purpose | Default |
|---|---|---|
| additional_csq | During de novo variant filtering, we reduce the variant set to critical consequences + this list | missense |
| af_semi_rare | A coarse global population frequency filter - variants more frequent than this are hard filtered during initial processing. Later in the process more rigorous filters are applied, so this is a 'mild' filter to retain plausible compound-het candidates | 0.01 |
| callset_af_sv_recessive | A coarse population and callset frequency filter - if an SV variant is more common than this either within the callset, or in the gnomAD annotations, remove | 0.03 |
| am_pathogenicity | A float, defining the minimum AlphaMissense pathogenicity score to be treated as pathogenic. Replaces deference to AlphaMissense's default am_class value, which is using the 0.564 score as a Likely Pathogenic threshold | 0.564 |

Config Section: RunHailFiltering.de_novo
Description: Controls filtering thresholds and de novo variant detection parameters during the Hail Stage

| Field | Purpose | Default | Alternative |
|---|---|---|---|
| min_child_ab | Minimum proportion of alt-supporting reads relative to overall read depth for the proband | 0.20 | |
| min_depth | Minimum read depth for calls. In order of preference/availability this is from DP, total of AD, or if both are absent, a dummy value which will always pass | 5 | |
| max_depth | Maximum read depth for calls. In order of preference/availability this is from DP, total of AD, or if both are absent, a dummy value which will always pass | 1000 | |
| min_alt_depth | If all other conditions succeed, at least this many alt observations are required to confirm a de novo call | 5 | |
| min_proband_gq | Minimum genotype quality score for all variants in probands/affected participants (detected via Pedigree) | | |

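
To make the interaction of these thresholds concrete, here is a minimal sketch of a proband-level check using the documented defaults. The function name, the inclusive comparisons, and the [ref, alt] layout of AD are assumptions for illustration, not the Hail expressions Talos actually runs. min_proband_gq is omitted because no default is listed above, and the "dummy depth" fallback is not modelled because this sketch needs AD for the allele checks.

```python
DE_NOVO = {
    "min_child_ab": 0.20,
    "min_depth": 5,
    "max_depth": 1000,
    "min_alt_depth": 5,
}


def proband_call_passes(ad, dp=None, cfg=DE_NOVO):
    """Hypothetical check of one proband call against the de novo thresholds.

    ad: [ref_reads, alt_reads]; dp: total depth if the caller reported one.
    """
    # Prefer DP when present, otherwise fall back to the total of AD
    depth = dp if dp is not None else sum(ad)
    alt_reads = ad[1]

    if not cfg["min_depth"] <= depth <= cfg["max_depth"]:
        return False
    if alt_reads < cfg["min_alt_depth"]:
        return False
    # Allele balance: alt-supporting reads relative to overall depth
    return alt_reads / depth >= cfg["min_child_ab"]


print(proband_call_passes([20, 10]))        # depth 30, AB 0.33
print(proband_call_passes([40, 5], dp=45))  # AB 0.11, below min_child_ab
```
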
Description: Controls all filtering parameters for mode-of-inheritance (MOI) validation during per-family analysis.
Talos applies four distinct MOI filter models, each tailored to different variant types and inheritance patterns. For each variant, Talos selects the most appropriate model based on both the variant’s attributes and the MOI being evaluated.
Parameters shared by all models:

| Field | Purpose | Default |
|---|---|---|
| min_callset_ac_to_filter | Minimum allele count (AC) in the callset before intra-callset allele frequency (AF) filtering is applied. Intra-callset AF filters are highly effective for removing callset-specific artefacts, but may be too stringent for small cohorts. This parameter prevents the exclusion of rare variants in smaller callsets by only applying AF filtering when a variant reaches the specified AC threshold. | 10 |
| min_alt_depth | Min. alt-supporting reads | 5 |
| minimum_depth | Min. total read depth | 10 |

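
The effect of min_callset_ac_to_filter is easiest to see with numbers. The sketch below is an assumption-laden illustration (the function name and the exact comparison operators are not taken from Talos): the intra-callset AF filter only comes into play once the variant's AC reaches the threshold.

```python
def fails_callset_af_filter(callset_ac, callset_af, max_af, min_callset_ac_to_filter=10):
    """Hypothetical gate: apply the intra-callset AF filter only at sufficient AC."""
    if callset_ac < min_callset_ac_to_filter:
        # Too few observations for the callset AF to be meaningful: keep the variant
        return False
    return callset_af > max_af


# Small cohort: 2 alt alleles among 40 gives AF 0.05, but AC is below 10, so it is kept
print(fails_callset_af_filter(callset_ac=2, callset_af=2 / 40, max_af=0.01))

# Large cohort: 50 alt alleles among 2000 gives AF 0.025, and AC is high enough to filter
print(fails_callset_af_filter(callset_ac=50, callset_af=50 / 2000, max_af=0.01))
```
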
The four MOI filter models are:
ClinVarDominant - Applied when considering Dominant inheritance, if the variant has a P/LP ClinVar rating. Strict, but less strict than the Dominant model applied to variants without a ClinVar P/LP rating.

| Field | Purpose | Default |
|---|---|---|
| clinvar_dominant_gnomad_max_af | Max gnomAD 4.1 Combined AF | 0.00005 |
| clinvar_dominant_callset_max_af | Max intra-callset AF | 0.05 |

ClinVar - Applied to ClinVar P/LP variants when considering non-Dominant inheritance patterns.

| Field | Purpose | Default |
|---|---|---|
| clinvar_gnomad_max_af | Max gnomAD 4.1 Combined AF | 0.05 |
| clinvar_callset_max_af | Max intra-callset AF | 0.05 |

Dominant - Applied to all non-ClinVar Dominant MOIs

| Field | Purpose | Default |
|---|---|---|
| dominant_callset_max_ac | Max intra-callset AC | 10 |
| dominant_callset_max_af | Max intra-callset AF | 0.01 |
| dominant_callset_sv_max_af | Max intra-callset AF (Structural Variants) | 0.01 |
| dominant_gnomad_max_af | Max gnomAD 4.1 Combined AF | 0.00001 |
| dominant_gnomad_max_ac | Max gnomAD 4.1 Combined AC | 10 |
| dominant_gnomad_max_homozygotes | Max gnomAD 4.1 Combined Homozygotes | 0 |
| dominant_gnomad_sv_max_af | Max gnomAD 4.1 Combined AF (Structural Variants) | 0 |

Global - Applied to non-Dominant MOIs when the variant is not in ClinVar

| Field | Purpose | Default |
|---|---|---|
| callset_max_af | Max intra-callset AF | 0.01 |
| callset_sv_max_af | Max intra-callset AF (Structural Variants) | 0.01 |
| gnomad_max_af | Max gnomAD 4.1 Combined AF | 0.01 |
| gnomad_max_homozygotes | Max gnomAD 4.1 Combined Homozygotes | 5 |
| gnomad_max_hemizygotes | Max gnomAD 4.1 Combined Hemizygotes | 5 |
| gnomad_sv_max_af | Max gnomAD 4.1 Combined AF (Structural Variants) | 5 |

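
The choice between the four models can be summarised as a two-way decision on ClinVar status and the MOI under evaluation. The sketch below encodes that decision as described in this section; the function name is hypothetical, and the gnomAD AF ceilings are the documented defaults, shown only to compare strictness.

```python
# Documented default gnomAD AF ceilings for each model
GNOMAD_MAX_AF = {
    "ClinVarDominant": 0.00005,
    "ClinVar": 0.05,
    "Dominant": 0.00001,
    "Global": 0.01,
}


def select_moi_filter_model(is_clinvar_plp: bool, is_dominant_moi: bool) -> str:
    """Hypothetical selector mirroring the four model descriptions."""
    if is_clinvar_plp:
        return "ClinVarDominant" if is_dominant_moi else "ClinVar"
    return "Dominant" if is_dominant_moi else "Global"


for clinvar in (True, False):
    for dominant in (True, False):
        model = select_moi_filter_model(clinvar, dominant)
        print(clinvar, dominant, model, GNOMAD_MAX_AF[model])
```
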
And finally, there are some further meta-parameters which control the variants being considered. These parameters are used to facilitate inclusion of less confident annotations (e.g. in silico predictions) whilst preventing noise from these tools dominating the overall results:

| Field | Purpose | Default |
|---|---|---|
| ignore_categories | Categories which are ignored for this analysis. Variants with only these categories assigned will never reach the report | [exomiser, svdb] |
| exomiser_rank_threshold | If Exomiser is used, limit results to the top n hits | 2 |
| phenotype_match | List of categories, variants with only these assigned will be removed from the report unless a phenotype-match is detected | [6] |
| support_categories | List of categories, variants with only these assigned will be removed from the report unless detected with a comp-het partner having a 'full' value category | |

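
One way to picture how the three category lists could interact is the sketch below. It is a simplified reading of the descriptions above, not Talos's implementation: the function name, the order in which the rules are applied, the category label "1", and the support_example label (no default for support_categories is listed here) are all assumptions.

```python
CONFIG = {
    "ignore_categories": ["exomiser", "svdb"],
    "phenotype_match": [6],
    "support_categories": ["support_example"],  # hypothetical value for illustration
}


def reaches_report(categories, cfg=CONFIG, has_phenotype_match=False, has_full_partner=False):
    """Hypothetical decision: does a variant with these categories reach the report?"""
    ignored = {str(c) for c in cfg["ignore_categories"]}
    pheno_only = {str(c) for c in cfg["phenotype_match"]}
    support = {str(c) for c in cfg["support_categories"]}

    # Ignored categories never count towards reporting
    remaining = {str(c) for c in categories} - ignored
    if not remaining:
        return False
    # Only phenotype-gated categories: require a phenotype match
    if remaining <= pheno_only and not has_phenotype_match:
        return False
    # Only support categories: require a comp-het partner with a 'full' category
    if remaining <= support and not has_full_partner:
        return False
    return True


print(reaches_report(["exomiser"]))
print(reaches_report([6], has_phenotype_match=True))
print(reaches_report(["support_example"], has_full_partner=False))
```
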
Description: Controls how URLs linking out to external resources are generated and embedded into the report
Hyperlinks: The CreateTalosHTML.hyperlinks block can be absent, in which case no URL links are generated. If present, a LinkEngine will be generated to create all external resource links for the report

| Field | Purpose |
|---|---|
| hyperlinks.template | Mandatory if the hyperlinks section is present. A string template containing an instance of {sample} for interpolating with the chosen Sample ID. This should define a template link to a per-family or per-proband resource |
| hyperlinks.variant_template | Similar to template, but containing both {variant} and {sample}. This is used to generate a link to a per-variant resource, embedding both Sample ID and chr-pos-ref-alt (seqr/gnomAD-style) |

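
The interpolation described for the two templates can be pictured with ordinary Python string formatting. The sketch below is illustrative only: the example.org URLs, the function names, and the fallback to the raw VCF ID when no mapping exists are assumptions, and the SAM1 to ExtSam1 mapping mirrors the worked example used in this document.

```python
# Hypothetical templates; a real deployment would point at its own resource
HYPERLINKS = {
    "template": "https://example.org/family/{sample}",
    "variant_template": "https://example.org/variant/{variant}/sample/{sample}",
}

# VCF sample ID -> ID used by the target system
ID_LOOKUP = {"SAM1": "ExtSam1"}


def sample_link(sample_id, hyperlinks=HYPERLINKS, lookup=ID_LOOKUP):
    """Build a per-sample link, translating the VCF ID when a mapping exists."""
    external_id = lookup.get(sample_id, sample_id)
    return hyperlinks["template"].format(sample=external_id)


def variant_link(sample_id, variant, hyperlinks=HYPERLINKS, lookup=ID_LOOKUP):
    """Build a per-variant link from a chr-pos-ref-alt string and a sample ID."""
    external_id = lookup.get(sample_id, sample_id)
    return hyperlinks["variant_template"].format(variant=variant, sample=external_id)


print(sample_link("SAM1"))
print(variant_link("SAM1", "1-12345-A-G"))
```
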
To generate hyperlinks to seqr, this section should contain the hyperlinks block with the relevant templates, alongside a file mapping the VCF sample IDs to the IDs used in the target system. The mapping file should be added to the Nextflow config as params.seqr_lookup, and can be a TSV, CSV, or JSON file. This is designed to be as flexible as possible, but may be daunting if you're starting from scratch. Worked examples:
Scenario: I want to generate links using a template, but the sample ID from the VCF is not suitable for the target URL, so I want to map from the VCF sample ID to a URL-appropriate ID in a TSV/CSV/JSON file
VCF ID: SAM1
Link ID: ExtSam1
Lookup JSON (file identified by params.seqr_lookup in the Nextflow config)