Target class
The following Target classes are available to use:
Target
Cohort
Dataset
MultiCohort
SequencingGroup
You can import them from the cpg_flow package:
from cpg_flow.targets import Cohort, Dataset, MultiCohort, SequencingGroup, Target
cpg_flow.targets.Target
Target()
Defines a target that a stage can act upon.
Source code in src/cpg_flow/targets/target.py
target_id (property)
target_id
The ID should be unique across targets of all levels.
We raise NotImplementedError instead of making Target an abstract class because mypy does not accept binding a TypeVar to an abstract class; see https://stackoverflow.com/questions/48349054/how-do-you-annotate-the-type-of-an-abstract-class-with-mypy
Specifically, with
TypeVar('TargetT', bound=Target)
mypy reports:
Only concrete class can be given where "Type[Target]" is expected
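The pattern described above can be sketched in isolation. This is a minimal illustration, not the cpg_flow source: ToyTarget, ToySequencingGroup and make_target are hypothetical names, and the point is only that a concrete base class raising NotImplementedError can be used as a TypeVar bound and passed around as a class object.

```python
from typing import TypeVar


class ToyTarget:
    """Hypothetical stand-in for a concrete (non-abstract) base class."""

    @property
    def target_id(self) -> str:
        # Subclasses are expected to override this; the base class stays
        # concrete so that it can be used where a class object is expected.
        raise NotImplementedError


class ToySequencingGroup(ToyTarget):
    def __init__(self, sg_id: str) -> None:
        self._id = sg_id

    @property
    def target_id(self) -> str:
        return self._id


TargetT = TypeVar('TargetT', bound=ToyTarget)


def make_target(cls: type[TargetT], *args: str) -> TargetT:
    """Hypothetical factory that accepts any ToyTarget subclass."""
    return cls(*args)


sg = make_target(ToySequencingGroup, 'SG01')
print(sg.target_id)
```

The trade-off is that a missing override is only detected at runtime, when the property is first read, rather than at instantiation time as it would be with an abstract base class.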
get_sequencing_groups
get_sequencing_groups(only_active=True)
Get flat list of all sequencing groups corresponding to this target.
Source code in src/cpg_flow/targets/target.py
get_sequencing_group_ids
get_sequencing_group_ids(only_active=True)
Get flat list of all sequencing group IDs corresponding to this target.
Source code in src/cpg_flow/targets/target.py
get_alignment_inputs_hash
get_alignment_inputs_hash()
If this hash has been set, return it; otherwise set it, then return it.
This should be safe as it matches the current usage:
- we set up the Targets in this workflow (populating SGs, Datasets, Cohorts)
- at this point the targets are malleable (e.g. addition of an additional Cohort may add SGs to Datasets)
- we then set up the Stages, where alignment input hashes are generated
- at this point, the alignment inputs are fixed
- all calls to get_alignment_inputs_hash() need to return the same value
Source code in src/cpg_flow/targets/target.py
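A minimal sketch of the set-once, return-thereafter behaviour described above. ToyTarget and its hashing scheme (a truncated SHA-256 over the sorted input paths) are assumptions made for illustration; they are not taken from the cpg_flow source.

```python
import hashlib
from typing import Optional


class ToyTarget:
    """Hypothetical target holding a list of alignment input paths."""

    def __init__(self, alignment_inputs: list[str]) -> None:
        self.alignment_inputs = alignment_inputs
        self._alignment_inputs_hash: Optional[str] = None

    def set_alignment_inputs_hash(self) -> None:
        # Sort first so the hash does not depend on input order.
        joined = ' '.join(sorted(self.alignment_inputs))
        self._alignment_inputs_hash = hashlib.sha256(joined.encode()).hexdigest()[:16]

    def get_alignment_inputs_hash(self) -> str:
        # Compute on first access only; later calls reuse the stored value.
        if self._alignment_inputs_hash is None:
            self.set_alignment_inputs_hash()
        assert self._alignment_inputs_hash is not None
        return self._alignment_inputs_hash


target = ToyTarget(['sg1.cram', 'sg2.cram'])
first = target.get_alignment_inputs_hash()
target.alignment_inputs.append('sg3.cram')  # mutation after the first call
second = target.get_alignment_inputs_hash()
print(first == second)  # True: the cached value is returned
```

The example also shows why the ordering in the list above matters: once the hash has been read, later changes to the inputs are not reflected in it.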
set_alignment_inputs_hash
set_alignment_inputs_hash()
Unique hash string of sample alignment inputs. Useful to decide whether the analysis on the target needs to be rerun.
Source code in src/cpg_flow/targets/target.py
get_job_attrs
get_job_attrs()
Attributes for Hail Batch job.
Source code in src/cpg_flow/targets/target.py
get_job_prefix
get_job_prefix()
Prefix job names.
Source code in src/cpg_flow/targets/target.py
rich_id_map
rich_id_map()
Map of internal IDs to participant or external IDs, if the latter are provided.
Source code in src/cpg_flow/targets/target.py
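As a rough illustration of what such a map might look like, the sketch below builds one from toy objects. The ToySequencingGroup class, the "internal|other" format of the rich ID, and the rule for which entries are included are all assumptions, not the library's behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySequencingGroup:
    """Hypothetical record with an internal ID and optional extra IDs."""

    id: str
    participant_id: Optional[str] = None
    external_id: Optional[str] = None

    @property
    def rich_id(self) -> str:
        # Assumed format: internal ID, then participant or external ID.
        extra = self.participant_id or self.external_id
        return f'{self.id}|{extra}' if extra else self.id


def rich_id_map(groups: list[ToySequencingGroup]) -> dict[str, str]:
    # Only include groups that actually carry an additional ID.
    return {
        sg.id: sg.rich_id
        for sg in groups
        if sg.participant_id or sg.external_id
    }


groups = [
    ToySequencingGroup('SG01', participant_id='P1'),
    ToySequencingGroup('SG02'),
    ToySequencingGroup('SG03', external_id='EXT3'),
]
id_map = rich_id_map(groups)
print(id_map)
```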
cpg_flow.targets.Cohort
Cohort(id=None, name=None)
Bases: Target
Represents a "cohort" target - all sequencing groups from a single CustomCohort (potentially spanning multiple datasets) in the workflow. Analysis dataset name is required and will be used as the default name for the cohort.
Source code in src/cpg_flow/targets/cohort.py
analysis_dataset (instance-attribute)
analysis_dataset = Dataset(
    name=get_config()["workflow"]["dataset"]
)
get_sequencing_group_ids
get_sequencing_group_ids(only_active=True)
Get flat list of all sequencing group IDs corresponding to this target.
Source code in src/cpg_flow/targets/target.py
get_alignment_inputs_hash
get_alignment_inputs_hash()
If this hash has been set, return it; otherwise set it, then return it.
This should be safe as it matches the current usage:
- we set up the Targets in this workflow (populating SGs, Datasets, Cohorts)
- at this point the targets are malleable (e.g. addition of an additional Cohort may add SGs to Datasets)
- we then set up the Stages, where alignment input hashes are generated
- at this point, the alignment inputs are fixed
- all calls to get_alignment_inputs_hash() need to return the same value
Source code in src/cpg_flow/targets/target.py
set_alignment_inputs_hash
set_alignment_inputs_hash()
Unique hash string of sample alignment inputs. Useful to decide whether the analysis on the target needs to be rerun.
Source code in src/cpg_flow/targets/target.py
rich_id_map
rich_id_map()
Map of internal IDs to participant or external IDs, if the latter are provided.
Source code in src/cpg_flow/targets/target.py
get_cohort_id
get_cohort_id()
Get the cohort ID
Source code in src/cpg_flow/targets/cohort.py
write_ped_file
write_ped_file(out_path=None, use_participant_id=False)
Create a PED file for all samples in the whole cohort. The PED is written with no header line, to comply strictly with the specification.
Source code in src/cpg_flow/targets/cohort.py
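To make the "no header line" point concrete, here is a small sketch that writes tab-separated pedigree rows without a header. The column names in PED_COLUMNS and the write_ped helper are assumptions for illustration; only the six-column layout (family, individual, father, mother, sex, phenotype) follows the PED convention.

```python
import csv
import io

# Assumed column keys; the file itself never contains them.
PED_COLUMNS = ['Family.ID', 'Individual.ID', 'Father.ID', 'Mother.ID', 'Sex', 'Phenotype']


def write_ped(rows: list[dict], handle) -> None:
    """Hypothetical helper: write PED rows as tab-separated text, no header."""
    writer = csv.DictWriter(
        handle,
        fieldnames=PED_COLUMNS,
        delimiter='\t',
        lineterminator='\n',
    )
    # Deliberately no writer.writeheader(): strict PED has data rows only.
    writer.writerows(rows)


rows = [
    {'Family.ID': 'FAM1', 'Individual.ID': 'SG01', 'Father.ID': '0',
     'Mother.ID': '0', 'Sex': '1', 'Phenotype': '0'},
    {'Family.ID': 'FAM1', 'Individual.ID': 'SG02', 'Father.ID': '0',
     'Mother.ID': '0', 'Sex': '2', 'Phenotype': '0'},
]

buffer = io.StringIO()
write_ped(rows, buffer)
ped_text = buffer.getvalue()
print(ped_text, end='')
```

Writing to an in-memory buffer keeps the example self-contained; a real call would pass an open file handle for out_path instead.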
add_sequencing_group_object
add_sequencing_group_object(s, allow_duplicates=True)
Add a sequencing group object to the Cohort.
Args:
    s: SequencingGroup object
    allow_duplicates: if True, allow adding the same object twice
Source code in src/cpg_flow/targets/cohort.py
get_sequencing_groups
get_sequencing_groups(only_active=True)
Gets a flat list of all sequencing groups from all datasets. Include only "active" sequencing groups (unless only_active is False)
Source code in src/cpg_flow/targets/cohort.py
get_job_attrs
get_job_attrs()
Attributes for Hail Batch job.
Source code in src/cpg_flow/targets/cohort.py
get_job_prefix
get_job_prefix()
Prefix job names.
Source code in src/cpg_flow/targets/cohort.py
to_tsv
to_tsv()
Export to a parsable TSV file
Source code in src/cpg_flow/targets/cohort.py
cpg_flow.targets.Dataset
Dataset(name)
Bases: Target
Represents a CPG dataset.
Each dataset at the CPG corresponds to:
* a GCP project: https://github.com/populationgenomics/team-docs/tree/main/storage_policies
* a Pulumi stack: https://github.com/populationgenomics/analysis-runner/tree/main/stack
* a metamist project
Source code in src/cpg_flow/targets/dataset.py
get_sequencing_group_ids
get_sequencing_group_ids(only_active=True)
Get flat list of all sequencing group IDs corresponding to this target.
Source code in src/cpg_flow/targets/target.py
get_alignment_inputs_hash
get_alignment_inputs_hash()
If this hash has been set, return it; otherwise set it, then return it.
This should be safe as it matches the current usage:
- we set up the Targets in this workflow (populating SGs, Datasets, Cohorts)
- at this point the targets are malleable (e.g. addition of an additional Cohort may add SGs to Datasets)
- we then set up the Stages, where alignment input hashes are generated
- at this point, the alignment inputs are fixed
- all calls to get_alignment_inputs_hash() need to return the same value
Source code in src/cpg_flow/targets/target.py
set_alignment_inputs_hash
set_alignment_inputs_hash()
Unique hash string of sample alignment inputs. Useful to decide whether the analysis on the target needs to be rerun.
Source code in src/cpg_flow/targets/target.py
rich_id_map
rich_id_map()
Map of internal IDs to participant or external IDs, if the latter are provided.
Source code in src/cpg_flow/targets/target.py
create (staticmethod)
create(name)
Create a dataset.
Source code in src/cpg_flow/targets/dataset.py
prefix
prefix(**kwargs)
The primary storage path.
Source code in src/cpg_flow/targets/dataset.py
tmp_prefix
tmp_prefix(**kwargs)
Storage path for temporary files.
Source code in src/cpg_flow/targets/dataset.py
analysis_prefix
analysis_prefix(**kwargs)
Storage path for analysis files.
Source code in src/cpg_flow/targets/dataset.py
web_prefix
web_prefix(**kwargs)
Path for files served by an HTTP server. Matches the corresponding URLs returned by self.web_url().
Source code in src/cpg_flow/targets/dataset.py
web_url
web_url()
URLs matching self.storage_web_path() files served by an HTTP server.
Source code in src/cpg_flow/targets/dataset.py
add_sequencing_group
add_sequencing_group(
    id,
    *,
    sequencing_type,
    sequencing_technology,
    sequencing_platform,
    external_id=None,
    participant_id=None,
    meta=None,
    sex=None,
    pedigree=None,
    alignment_input=None
)
Create a new sequencing group and add it to the dataset.
Source code in src/cpg_flow/targets/dataset.py
add_sequencing_group_object
add_sequencing_group_object(s)
Add a sequencing group object to the dataset.
Args:
    s: SequencingGroup object
Source code in src/cpg_flow/targets/dataset.py
get_sequencing_group_by_id
get_sequencing_group_by_id(id)
Get sequencing group by ID
Source code in src/cpg_flow/targets/dataset.py
get_sequencing_groups
get_sequencing_groups(only_active=True)
Get dataset's sequencing groups. Include only "active" sequencing groups, unless only_active=False
Source code in src/cpg_flow/targets/dataset.py
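The only_active behaviour can be sketched with toy classes. ToyDataset, ToySequencingGroup and the boolean active attribute are hypothetical stand-ins used for illustration; the sketch only shows the filtering rule stated in the description.

```python
from dataclasses import dataclass


@dataclass
class ToySequencingGroup:
    """Hypothetical record; 'active' is an assumed attribute name."""

    id: str
    active: bool = True


class ToyDataset:
    def __init__(self, groups: list[ToySequencingGroup]) -> None:
        self._groups = list(groups)

    def get_sequencing_groups(self, only_active: bool = True) -> list[ToySequencingGroup]:
        # Keep a group if it is active, or if the caller asked for everything.
        return [sg for sg in self._groups if sg.active or not only_active]

    def get_sequencing_group_ids(self, only_active: bool = True) -> list[str]:
        return [sg.id for sg in self.get_sequencing_groups(only_active=only_active)]


dataset = ToyDataset([
    ToySequencingGroup('SG01'),
    ToySequencingGroup('SG02', active=False),
])
active_ids = dataset.get_sequencing_group_ids()
all_ids = dataset.get_sequencing_group_ids(only_active=False)
print(active_ids, all_ids)
```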
get_job_attrs
get_job_attrs()
Attributes for Hail Batch job.
Source code in src/cpg_flow/targets/dataset.py
get_job_prefix
get_job_prefix()
Prefix job names.
Source code in src/cpg_flow/targets/dataset.py
write_ped_file
write_ped_file(out_path=None, use_participant_id=False)
Create a PED file for all sequencing groups. The PED is written with no header line, to comply strictly with the specification.
Source code in src/cpg_flow/targets/dataset.py
cpg_flow.targets.MultiCohort
MultiCohort()
Bases: Target
Represents a "multi-cohort" target - multiple cohorts in the workflow.
Source code in src/cpg_flow/targets/multicohort.py
name (instance-attribute)
name = hash_from_list_of_strings(
    sorted(input_cohorts), suffix="cohorts"
)
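Sorting the cohort IDs before hashing makes the resulting name independent of the order in which cohorts are listed. The sketch below illustrates that idea with a hypothetical stable_name helper; it is not the implementation of hash_from_list_of_strings, whose exact output format is not shown here.

```python
import hashlib


def stable_name(strings: list[str], suffix: str) -> str:
    """Hypothetical helper: short digest of the joined strings plus a suffix."""
    digest = hashlib.sha256(' '.join(strings).encode()).hexdigest()[:12]
    return f'{digest}_{suffix}'


# The caller sorts first, so input order does not change the name.
name_a = stable_name(sorted(['COH2', 'COH1']), suffix='cohorts')
name_b = stable_name(sorted(['COH1', 'COH2']), suffix='cohorts')
print(name_a == name_b)  # True
```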
analysis_dataset (instance-attribute)
analysis_dataset = Dataset(
    name=get_config()["workflow"]["dataset"]
)
get_sequencing_group_ids
get_sequencing_group_ids(only_active=True)
Get flat list of all sequencing group IDs corresponding to this target.
Source code in src/cpg_flow/targets/target.py
get_alignment_inputs_hash
get_alignment_inputs_hash()
If this hash has been set, return it; otherwise set it, then return it.
This should be safe as it matches the current usage:
- we set up the Targets in this workflow (populating SGs, Datasets, Cohorts)
- at this point the targets are malleable (e.g. addition of an additional Cohort may add SGs to Datasets)
- we then set up the Stages, where alignment input hashes are generated
- at this point, the alignment inputs are fixed
- all calls to get_alignment_inputs_hash() need to return the same value
Source code in src/cpg_flow/targets/target.py
set_alignment_inputs_hash
set_alignment_inputs_hash()
Unique hash string of sample alignment inputs. Useful to decide whether the analysis on the target needs to be rerun.
Source code in src/cpg_flow/targets/target.py
get_job_prefix
get_job_prefix()
Prefix job names.
Source code in src/cpg_flow/targets/target.py
rich_id_map
rich_id_map()
Map of internal IDs to participant or external IDs, if the latter are provided.
Source code in src/cpg_flow/targets/target.py
create_dataset
create_dataset(name)
Create a dataset and add it to the cohort.
Source code in src/cpg_flow/targets/multicohort.py
get_cohorts
get_cohorts(only_active=True)
Gets list of all cohorts. Include only "active" cohorts (unless only_active is False)
Source code in src/cpg_flow/targets/multicohort.py
get_cohort_ids
get_cohort_ids(only_active=True)
Get list of cohort IDs. Include only "active" cohorts (unless only_active is False)
Source code in src/cpg_flow/targets/multicohort.py
get_cohort_by_id
get_cohort_by_id(id, only_active=True)
Get cohort by id. Include only "active" cohorts (unless only_active is False)
Source code in src/cpg_flow/targets/multicohort.py
get_datasets
get_datasets(only_active=True)
Gets list of all datasets. Include only "active" datasets (unless only_active is False)
Source code in src/cpg_flow/targets/multicohort.py
get_sequencing_groups
get_sequencing_groups(only_active=True)
Gets a flat list of all sequencing groups from all datasets. Uses a dictionary to avoid duplicates (the same sequencing group could appear in multiple cohorts). Includes only "active" sequencing groups (unless only_active is False).
Source code in src/cpg_flow/targets/multicohort.py
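The dictionary-based de-duplication mentioned above can be sketched as follows. ToySequencingGroup and flatten_unique are hypothetical names, and cohorts are modelled as plain lists; the sketch shows the technique, not the library's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySequencingGroup:
    """Hypothetical record identified by its ID."""

    id: str


def flatten_unique(cohorts: list[list[ToySequencingGroup]]) -> list[ToySequencingGroup]:
    by_id: dict[str, ToySequencingGroup] = {}
    for cohort in cohorts:
        for sg in cohort:
            # A repeated ID does not add a second entry; the first one is kept.
            by_id.setdefault(sg.id, sg)
    # Dicts preserve insertion order, so first-seen order is retained.
    return list(by_id.values())


cohort_a = [ToySequencingGroup('SG01'), ToySequencingGroup('SG02')]
cohort_b = [ToySequencingGroup('SG02'), ToySequencingGroup('SG03')]
unique = flatten_unique([cohort_a, cohort_b])
print([sg.id for sg in unique])  # ['SG01', 'SG02', 'SG03']
```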
create_cohort
create_cohort(id, name)
Create a cohort and add it to the multi-cohort.
Source code in src/cpg_flow/targets/multicohort.py
add_dataset
add_dataset(d)
Add a Dataset to the MultiCohort.
Args:
    d: Dataset object
Source code in src/cpg_flow/targets/multicohort.py
get_dataset_by_name
get_dataset_by_name(name, only_active=True)
Get dataset by name. Include only "active" datasets (unless only_active is False)
Source code in src/cpg_flow/targets/multicohort.py
get_job_attrs
get_job_attrs()
Attributes for Hail Batch job.
Source code in src/cpg_flow/targets/multicohort.py
write_ped_file
write_ped_file(out_path=None, use_participant_id=False)
Create a PED file for all samples in the whole MultiCohort. This duplicates the Cohort method. The PED is written with no header line, to comply strictly with the specification.
Source code in src/cpg_flow/targets/multicohort.py
cpg_flow.targets.SequencingGroup
SequencingGroup(
    id,
    dataset,
    *,
    sequencing_type,
    sequencing_technology,
    sequencing_platform,
    external_id=None,
    participant_id=None,
    meta=None,
    sex=None,
    pedigree=None,
    alignment_input=None,
    assays=None,
    forced=False
)
Bases: Target
Represents a sequencing group.
Source code in src/cpg_flow/targets/sequencing_group.py
pedigree (instance-attribute)
pedigree = pedigree or PedigreeInfo(
    sequencing_group=self,
    fam_id=participant_id,
    sex=sex or UNKNOWN,
)
participant_id (property, writable)
participant_id
Get ID of participant corresponding to this sequencing group, or substitute it with external ID.
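A writable property with a fallback can be sketched like this. ToySequencingGroup is a hypothetical class; the further fallback from a missing external ID to the internal ID is an assumption added so the example always returns a string.

```python
from typing import Optional


class ToySequencingGroup:
    """Hypothetical record with an optional participant and external ID."""

    def __init__(
        self,
        id: str,
        external_id: Optional[str] = None,
        participant_id: Optional[str] = None,
    ) -> None:
        self.id = id
        self._external_id = external_id
        self._participant_id = participant_id

    @property
    def external_id(self) -> str:
        # Assumption: fall back to the internal ID when no external ID is set.
        return self._external_id or self.id

    @property
    def participant_id(self) -> str:
        # Substitute the external ID when no participant ID has been set.
        return self._participant_id or self.external_id

    @participant_id.setter
    def participant_id(self, value: str) -> None:
        self._participant_id = value


sg = ToySequencingGroup('SG01', external_id='EXT1')
before = sg.participant_id   # falls back to the external ID
sg.participant_id = 'P1'     # the property is writable
after = sg.participant_id
print(before, after)
```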
rich_id (property)
rich_id
ID for reporting purposes: composed of internal as well as external or participant IDs.
make_sv_evidence_path (property)
make_sv_evidence_path
Path to the evidence root for GATK-SV evidence files.
get_sequencing_group_ids
get_sequencing_group_ids(only_active=True)
Get flat list of all sequencing group IDs corresponding to this target.
Source code in src/cpg_flow/targets/target.py
get_alignment_inputs_hash
get_alignment_inputs_hash()
If this hash has been set, return it; otherwise set it, then return it.
This should be safe as it matches the current usage:
- we set up the Targets in this workflow (populating SGs, Datasets, Cohorts)
- at this point the targets are malleable (e.g. addition of an additional Cohort may add SGs to Datasets)
- we then set up the Stages, where alignment input hashes are generated
- at this point, the alignment inputs are fixed
- all calls to get_alignment_inputs_hash() need to return the same value
Source code in src/cpg_flow/targets/target.py
set_alignment_inputs_hash
set_alignment_inputs_hash()
Unique hash string of sample alignment inputs. Useful to decide whether the analysis on the target needs to be rerun.
Source code in src/cpg_flow/targets/target.py
rich_id_map
rich_id_map()
Map of internal IDs to participant or external IDs, if the latter are provided.
Source code in src/cpg_flow/targets/target.py
get_ped_dict
get_ped_dict(use_participant_id=False)
Returns a dictionary of pedigree fields for this sequencing group, corresponding to a PED file entry.
Source code in src/cpg_flow/targets/sequencing_group.py
make_cram_path
make_cram_path()
Path to a CRAM file. Not checking its existence here.
Source code in src/cpg_flow/targets/sequencing_group.py
make_gvcf_path
make_gvcf_path()
Path to a GVCF file. Not checking its existence here.
Source code in src/cpg_flow/targets/sequencing_group.py
get_sequencing_groups
get_sequencing_groups(only_active=True)
Implementing the abstract method.
Source code in src/cpg_flow/targets/sequencing_group.py
get_job_attrs
get_job_attrs()
Attributes for Hail Batch job.
Source code in src/cpg_flow/targets/sequencing_group.py
get_job_prefix
get_job_prefix()
Prefix job names.
Source code in src/cpg_flow/targets/sequencing_group.py