Resources
The following resources are available for use:
cpg_flow.resources.gcp_machine_name
gcp_machine_name(name, ncpu)
Machine type name in the GCP world
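As a rough illustration only: this page does not show the name format, so the sketch below assumes the GCP N1 naming scheme (for example `n1-standard-16`). The function name `gcp_machine_name_sketch` and the list of accepted families are hypothetical stand-ins, not the library's code.

```python
def gcp_machine_name_sketch(name: str, ncpu: int) -> str:
    """Build an N1-style GCP machine type name (assumed format)."""
    # Assumption: only these N1 families are relevant here.
    if name not in ("standard", "highmem", "highcpu"):
        raise ValueError(f"Unknown machine family: {name}")
    return f"n1-{name}-{ncpu}"


print(gcp_machine_name_sketch("standard", 16))  # n1-standard-16
```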
Source code in src/cpg_flow/resources.py
cpg_flow.resources.MachineType
dataclass
MachineType(
name,
ncpu,
mem_gb_per_core,
price_per_hour,
disk_size_gb,
)
Hail Batch machine type on GCP
Source code in src/cpg_flow/resources.py
max_threads
max_threads()
Number of available threads
Source code in src/cpg_flow/resources.py
calc_instance_disk_gb
calc_instance_disk_gb()
The maximum available storage on an instance is calculated
in batch/batch/utils.py/unreserved_worker_data_disk_size_gib()
as the disk size (375G) minus reserved image size (30G) minus
reserved storage per core (5G*ncpu, i.e. 80G for a 16-core instance or 160G for a 32-core instance).
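The subtraction described above can be worked through for a 16-core machine with a 375 GB disk. This is a minimal sketch using only the figures quoted in the description; the function name `instance_disk_gb_sketch` is hypothetical and the real method may differ in detail.

```python
def instance_disk_gb_sketch(disk_size_gb: int, ncpu: int) -> int:
    """Disk left for jobs after the reservations quoted in the description."""
    reserved_image_gb = 30     # reserved image size
    reserved_gb_per_core = 5   # reserved storage per core
    return disk_size_gb - reserved_image_gb - reserved_gb_per_core * ncpu


# 375 - 30 - 5 * 16 = 265
print(instance_disk_gb_sketch(disk_size_gb=375, ncpu=16))  # 265
```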
Source code in src/cpg_flow/resources.py
set_resources
set_resources(
j,
fraction=None,
ncpu=None,
nthreads=None,
mem_gb=None,
storage_gb=None,
)
Set resources on a Job object. Any optional parameters that are set are used as bounds when requesting a fraction of an instance.
Source code in src/cpg_flow/resources.py
request_resources
request_resources(
fraction=None,
ncpu=None,
nthreads=None,
mem_gb=None,
storage_gb=None,
)
Request resources from the machine, satisfying all provided requirements. If no requirements are provided, the minimum number of cores (self.MIN_NCPU) will be used.
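One way to read "satisfying all provided requirements" is that each requirement is first translated into a core count and the largest of those counts wins. The sketch below illustrates that reading only; `combine_core_requirements` is a hypothetical helper, and the default of 2 for the minimum core count is an assumption, since the value of MIN_NCPU is not given on this page.

```python
from typing import Iterable, Optional


def combine_core_requirements(
    per_requirement_ncpu: Iterable[Optional[int]],
    min_ncpu: int = 2,  # assumed value, stands in for MIN_NCPU
) -> int:
    """Pick the smallest core count that satisfies every provided requirement."""
    provided = [n for n in per_requirement_ncpu if n is not None]
    # The largest individual demand satisfies all the others as well;
    # with nothing provided, fall back to the minimum.
    return max([min_ncpu, *provided])


print(combine_core_requirements([4, None, 8]))  # 8
print(combine_core_requirements([]))            # 2
```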
Source code in src/cpg_flow/resources.py
fraction_to_ncpu
fraction_to_ncpu(fraction)
Converts a fraction of the machine into a number of CPUs (e.g. fraction=1.0 to take the entire machine, fraction=0.5 to take half of it, etc.).
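A minimal sketch of the conversion, assuming a 16-core machine and assuming that fractional results are rounded up so the request is never smaller than asked for. The rounding rule and the name `fraction_to_ncpu_sketch` are assumptions, not taken from the source.

```python
import math


def fraction_to_ncpu_sketch(machine_ncpu: int, fraction: float) -> int:
    """Convert a fraction of a machine into a whole number of cores."""
    # Assumption: round up, so 0.3 of 16 cores becomes 5 rather than 4.
    return math.ceil(machine_ncpu * fraction)


print(fraction_to_ncpu_sketch(16, 1.0))  # 16
print(fraction_to_ncpu_sketch(16, 0.5))  # 8
print(fraction_to_ncpu_sketch(16, 0.3))  # 5
```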
Source code in src/cpg_flow/resources.py
mem_gb_to_ncpu
mem_gb_to_ncpu(mem_gb)
Converts a memory requirement into a CPU count requirement.
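Because memory is allocated per core (mem_gb_per_core on MachineType), a memory request presumably maps to cores by division. The sketch below assumes the result is rounded up; `mem_gb_to_ncpu_sketch` is a hypothetical name, and the per-core figures are the ones listed for STANDARD and HIGHMEM on this page.

```python
import math


def mem_gb_to_ncpu_sketch(mem_gb: float, mem_gb_per_core: float) -> int:
    """Number of cores needed to obtain at least mem_gb of memory."""
    # Assumption: round up so the memory request is always covered.
    return math.ceil(mem_gb / mem_gb_per_core)


print(mem_gb_to_ncpu_sketch(10, 3.75))  # 3  (standard: 3 * 3.75 = 11.25 GB)
print(mem_gb_to_ncpu_sketch(10, 6.5))   # 2  (highmem: 2 * 6.5 = 13 GB)
```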
Source code in src/cpg_flow/resources.py
storage_gb_to_ncpu
storage_gb_to_ncpu(storage_gb)
Converts a storage requirement into a CPU count requirement.
We want to avoid attaching disks: attaching a disk to an existing instance
might fail with a mkfs.ext4 ... error, see:
https://batch.hail.populationgenomics.org.au/batches/7488/jobs/12
So this function calculates the number of CPUs to request so that your jobs
can be packed to fit the default instance's available storage
(calculated with self.calc_instance_disk_gb()).
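The packing idea can be sketched as follows: take the storage request as a fraction of the instance's available disk, then claim the same fraction of its cores. This is an illustration under assumptions, not the library's code: the name `storage_gb_to_ncpu_sketch` is hypothetical, results are rounded up, and a request larger than the available disk simply raises an error here, whereas the real method may handle that case differently.

```python
import math


def storage_gb_to_ncpu_sketch(
    storage_gb: float,
    machine_ncpu: int = 16,
    disk_size_gb: int = 375,
) -> int:
    """Cores to request so that storage_gb fits the job's share of the disk."""
    # Available disk: size minus 30 GB image minus 5 GB per core.
    available_gb = disk_size_gb - 30 - 5 * machine_ncpu
    if storage_gb > available_gb:
        # Sketch-only behaviour; the real handling is not documented here.
        raise ValueError(f"{storage_gb} GB exceeds {available_gb} GB available")
    fraction = storage_gb / available_gb
    return math.ceil(machine_ncpu * fraction)


# 100 GB of 265 GB is about 38% of the machine: ceil(16 * 0.377) = 7 cores
print(storage_gb_to_ncpu_sketch(100))  # 7
```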
Source code in src/cpg_flow/resources.py
nthreads_to_ncpu
nthreads_to_ncpu(nthreads)
Converts a number of threads into a number of cores/CPUs.
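For illustration, the sketch below assumes two threads per core (hyper-threading). That ratio is not stated on this page, so treat both it and the name `nthreads_to_ncpu_sketch` as assumptions.

```python
import math

THREADS_PER_CORE = 2  # assumed ratio, not confirmed by this page


def nthreads_to_ncpu_sketch(nthreads: int) -> int:
    """Cores needed to run nthreads threads, never fewer than one."""
    return max(1, math.ceil(nthreads / THREADS_PER_CORE))


print(nthreads_to_ncpu_sketch(4))  # 2
print(nthreads_to_ncpu_sketch(5))  # 3
```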
Source code in src/cpg_flow/resources.py
adjust_ncpu
adjust_ncpu(ncpu)
Adjust the requested number of CPUs to a number allowed by Hail, i.e. the nearest power of 2 that is not less than the minimum number of cores allowed.
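A small sketch of this adjustment, assuming that "nearest" means rounding up to the next power of 2 (so a request is never reduced) and assuming a minimum of 2 cores. Both assumptions, and the name `adjust_ncpu_sketch`, are illustrative rather than taken from the source.

```python
def adjust_ncpu_sketch(ncpu: int, min_ncpu: int = 2) -> int:
    """Round a core request up to a power of 2, respecting a minimum."""
    ncpu = max(ncpu, min_ncpu)
    # (n - 1).bit_length() is the exponent of the smallest power of 2 >= n.
    return 1 << (ncpu - 1).bit_length()


for requested in (1, 3, 5, 8):
    print(requested, "->", adjust_ncpu_sketch(requested))
# 1 -> 2
# 3 -> 4
# 5 -> 8
# 8 -> 8
```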
Source code in src/cpg_flow/resources.py
cpg_flow.resources.STANDARD
module-attribute
STANDARD = MachineType(
"standard",
ncpu=16,
mem_gb_per_core=3.75,
price_per_hour=1.0787,
disk_size_gb=375,
)
cpg_flow.resources.HIGHMEM
module-attribute
HIGHMEM = MachineType(
"highmem",
ncpu=16,
mem_gb_per_core=6.5,
price_per_hour=1.3431,
disk_size_gb=375,
)
cpg_flow.resources.JobResource
dataclass
JobResource(
machine_type, ncpu=None, attach_disk_storage_gb=None
)
Represents a fraction of a Hail Batch instance.
@param machine_type: Hail Batch machine pool type.
@param ncpu: number of CPUs requested. Used to calculate the fraction of the machine to take. If not set, all of the machine's CPUs will be used.
@param attach_disk_storage_gb: if set to > MachineType.max_default_storage_gb, a larger disk will be attached by Hail Batch.
Source code in src/cpg_flow/resources.py
get_mem_gb
get_mem_gb()
Memory resources in GB
Source code in src/cpg_flow/resources.py
java_mem_options
java_mem_options(overhead_gb=1)
Returns -Xms and -Xmx options that set the JVM heap to use all the memory resources represented.
@param overhead_gb: amount of memory (in decimal GB) to leave available for other purposes.
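To make the shape of the returned string concrete, here is a hedged sketch that subtracts the overhead from the job's memory and formats the remainder as whole gigabytes. The exact units and rounding used by the real method are not shown on this page, so the formatting below and the name `java_mem_options_sketch` are assumptions.

```python
def java_mem_options_sketch(mem_gb: float, overhead_gb: float = 1) -> str:
    """Build -Xms/-Xmx flags leaving overhead_gb outside the JVM heap."""
    # Assumption: truncate to whole gigabytes for the flag value.
    heap_gb = int(mem_gb - overhead_gb)
    return f"-Xms{heap_gb}G -Xmx{heap_gb}G"


# 4 cores at 3.75 GB per core is 15 GB; 1 GB is held back as overhead.
print(java_mem_options_sketch(15.0))  # -Xms14G -Xmx14G
```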
Source code in src/cpg_flow/resources.py
java_gc_thread_options
java_gc_thread_options(surplus=2)
Returns -XX options to set Java JVM garbage collection threading.
@param surplus: number of threads to leave available for other purposes.
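As an illustration of what such options might look like, the sketch below uses the HotSpot flags `-XX:+UseParallelGC` and `-XX:ParallelGCThreads`. Which flags the real method emits is not stated on this page, so this choice, the floor of one GC thread, and the name `java_gc_thread_options_sketch` are all assumptions.

```python
def java_gc_thread_options_sketch(nthreads: int, surplus: int = 2) -> str:
    """Build GC threading flags, leaving `surplus` threads for other work."""
    # Keep at least one GC thread even when few threads are available.
    gc_threads = max(1, nthreads - surplus)
    return f"-XX:+UseParallelGC -XX:ParallelGCThreads={gc_threads}"


print(java_gc_thread_options_sketch(8))
# -XX:+UseParallelGC -XX:ParallelGCThreads=6
```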
Source code in src/cpg_flow/resources.py
get_ncpu
get_ncpu()
Number of cores/CPU
Source code in src/cpg_flow/resources.py
get_nthreads
get_nthreads()
Number of threads
Source code in src/cpg_flow/resources.py
get_storage_gb
get_storage_gb()
Calculate storage in GB
Source code in src/cpg_flow/resources.py
set_to_job
set_to_job(j)
Set the resources to a Job object. Return self to allow chaining, e.g.:
nthreads = STANDARD.request_resources(nthreads=4).set_to_job(j).get_nthreads()
Source code in src/cpg_flow/resources.py
cpg_flow.resources.storage_for_cram_qc_job
storage_for_cram_qc_job()
Get the storage request, in GB, for a CRAM QC processing job.
Source code in src/cpg_flow/resources.py
cpg_flow.resources.joint_calling_scatter_count
joint_calling_scatter_count(sequencing_group_count)
Number of partitions for joint-calling jobs (GenotypeGVCFs, VQSR, VEP), as a function of the number of sequencing groups.
Source code in src/cpg_flow/resources.py
cpg_flow.resources.storage_for_joint_vcf
storage_for_joint_vcf(
sequencing_group_count, site_only=True
)
Storage sufficient to fit and process a joint-called VCF.
Source code in src/cpg_flow/resources.py