primertool package
Submodules
primertool.exceptions module
Exceptions for the primertool package.
- class primertool.exceptions.InSilicoPCRError[source]
Bases:
object
Base class for exceptions in insilicopcr module.
- class primertool.exceptions.InSilicoPCRRequestError[source]
Bases:
InSilicoPCRError
Exception raised for request errors to the InSilicoPCR API.
- exception primertool.exceptions.PrimertoolError[source]
Bases:
Exception
Base class for exceptions in the primertool module.
- exception primertool.exceptions.PrimertoolExonLengthError[source]
Bases:
PrimertoolError
Exception raised when the exon length exceeds the max insert size.
- exception primertool.exceptions.PrimertoolGenomeError[source]
Bases:
PrimertoolError
Exception raised for errors in the genome.
- exception primertool.exceptions.PrimertoolInputError[source]
Bases:
PrimertoolError
Exception raised for errors in the input.
- exception primertool.exceptions.PrimertoolIntronicPositionError[source]
Bases:
PrimertoolError
Exception raised when the given position is intronic.
- exception primertool.exceptions.PrimertoolMutalyzerError[source]
Bases:
PrimertoolError
Exception raised for errors in the Mutalyzer API.
- exception primertool.exceptions.PrimertoolNoPrimerFoundError[source]
Bases:
PrimertoolError
Exception raised when no primer is found.
primertool.functions module
This module contains static auxiliary functions that are used in the primertool package.
- primertool.functions.calculate_targets(target_start: int, target_end: int, primer_bases: int) dict [source]
Define the sequence start and end positions.
Primer bases are the number of bases on each side of the target sequence in which primer3 looks for a possible primer. They are applied to the target start and end, respectively, to calculate the sequence start and end positions. The size range is a list containing the length of the target sequence and the length of the target sequence with the primer bases added. Primer3 should design primers within this size range.
- Parameters:
target_start (int) – Start position of the target sequence
target_end (int) – End position of the target sequence
primer_bases (int) – Number of bases to each side of the target sequence in which primer3 looks for a possible primer
- Returns:
A dictionary containing the sequence start and end positions, the number of bases to each side of the target sequence in which primer3 looks for a possible primer, the length of the target sequence, and the size range for primer3.
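The arithmetic described for calculate_targets can be sketched as follows. This is an illustration only: the dictionary keys are hypothetical, and the sketch assumes that the primer bases are subtracted from the start, added to the end, and counted on both sides for the upper bound of the size range.

```python
def calculate_targets_sketch(target_start: int, target_end: int, primer_bases: int) -> dict:
    """Illustrative re-implementation; key names are assumptions, not the package's."""
    target_length = target_end - target_start
    return {
        # Widen the window by primer_bases on each side of the target.
        "seq_start": target_start - primer_bases,
        "seq_end": target_end + primer_bases,
        "primer_bases": primer_bases,
        "target_length": target_length,
        # Lower bound: bare target; upper bound: target plus both flanks (assumed).
        "size_range": [target_length, target_length + 2 * primer_bases],
    }
```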
- primertool.functions.correct_intronic_variant(response: dict, variant: str) str [source]
Handle the case where the given variant is intronic. If the offset is <= 5, drop the offset. Otherwise, raise an exception.
- Parameters:
response (dict) – The response from the mutalyzer API
variant (str) – The variant to correct
- Raises:
PrimertoolIntronicPositionError – If the offset is too large to be corrected automatically (> 5)
- Returns:
The corrected variant
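The offset rule can be illustrated on the variant string alone. The sketch below does not use the Mutalyzer response at all, which the real function takes as input; it simply strips intronic offsets such as `+3` or `-2` from the HGVS description when they are at most 5. `IntronicPositionError` and `drop_small_offsets` are hypothetical names introduced here.

```python
import re


class IntronicPositionError(Exception):
    """Local stand-in for PrimertoolIntronicPositionError."""


# A '+' or '-' directly after a digit, followed by the offset digits,
# e.g. the "+3" in "c.123+3A>G".
_OFFSET = re.compile(r"(?<=\d)[+-](\d+)")


def drop_small_offsets(variant: str, max_offset: int = 5) -> str:
    """Remove intronic offsets <= max_offset; raise for larger ones (sketch)."""
    # Only touch the description after the last ':' so the accession stays intact.
    accession, separator, description = variant.rpartition(":")

    def _strip(match: re.Match) -> str:
        if int(match.group(1)) > max_offset:
            raise IntronicPositionError(f"offset too large in {variant!r}")
        return ""

    return accession + separator + _OFFSET.sub(_strip, description)
```

For example, `drop_small_offsets("NM_000451.3:c.123+3A>G")` yields `NM_000451.3:c.123A>G`, while an offset of 12 raises the stand-in exception.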
- primertool.functions.filter_unique_primers(primer3_dict: dict) Tuple[dict, bool] [source]
Check primer uniqueness and remove all primers with multiple binding sites, so that only the uniquely binding ones remain. Also returns a flag if all primers were invalid (i.e. not uniquely binding), but only if there was at least one primer to begin with.
- Parameters:
primer3_dict (dict) – Primer3 output dictionary
- Returns:
Tuple of the filtered primer3_dict and a flag indicating if all primers were invalid
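The flag semantics (set only when there was at least one primer and none survived) can be sketched independently of the primer3 dictionary layout. Here the primer pairs are plain tuples and the uniqueness check is passed in as a callable; both choices are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

PrimerPairSeq = Tuple[str, str]  # (forward, reverse); hypothetical representation


def filter_unique_sketch(
    pairs: List[PrimerPairSeq],
    is_unique: Callable[[str, str], bool],
) -> Tuple[List[PrimerPairSeq], bool]:
    """Keep uniquely binding pairs and report whether all candidates were rejected."""
    kept = [pair for pair in pairs if is_unique(*pair)]
    # The flag is only raised if there was something to filter in the first place.
    all_invalid = bool(pairs) and not kept
    return kept, all_invalid
```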
- primertool.functions.find_sequence_positions(exon_starts: list, exon_ends: list, exon_count: int, strand: str, mutation_position: Interval) dict [source]
Establish if variant position is in an exon.
The hgvs.parser object has the start and end position of the variant (in case of an indel rather than a SNP). Using these positions and the lists of exon starts and ends, determine if the variant is located in an exon.
The information is returned as a dictionary. If the gene is based on the - strand, the exon number needs to be inverted.
- Parameters:
exon_starts (list) – List of exon start positions
exon_ends (list) – List of exon end positions
exon_count (int) – Number of exons
strand (str) – Strand of the gene (‘+’ or ‘-’)
mutation_position (hgvs.location.Interval) – Start and end position of the variant
- Returns:
Dictionary containing the exon number, start and end position of the variant, the length of the variant, and a boolean indicating if the variant is in an exon
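A minimal sketch of the exon lookup, assuming plain integer coordinates instead of an hgvs Interval and assuming that the exon lists are ordered by genomic position. The dictionary keys and the length calculation are hypothetical.

```python
from typing import List


def locate_variant_sketch(
    exon_starts: List[int],
    exon_ends: List[int],
    exon_count: int,
    strand: str,
    var_start: int,
    var_end: int,
) -> dict:
    """Find the exon that fully contains the variant (illustrative only)."""
    result = {
        "exon_number": None,
        "start": var_start,
        "end": var_end,
        "length": var_end - var_start,  # assumed definition of variant length
        "is_in_exon": False,
    }
    for index, (exon_start, exon_end) in enumerate(zip(exon_starts, exon_ends)):
        if exon_start <= var_start and var_end <= exon_end:
            # On the minus strand the genomic order is the reverse of the exon order.
            result["exon_number"] = index + 1 if strand == "+" else exon_count - index
            result["is_in_exon"] = True
            break
    return result
```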
- primertool.functions.get_gene_information(genome_assembly: str, nm_number: str) dict [source]
Retrieve gene information from RefSeq database for a given NM number.
- Parameters:
genome_assembly (str) – The genome assembly to use (‘hg38’ or ‘hg19’)
nm_number (str) – The NM number of the gene to retrieve information for
- Returns:
A dictionary containing the gene information
- primertool.functions.get_snps(chromosome: str, seq_start: int, seq_end: int, genome_assembly: str) list [source]
Retrieve common SNPs from the UCSC database.
Query the UCSC database with the chromosome and position data to find common SNPs in the sequence, which need to be masked. SNPs are returned as list.
- Parameters:
chromosome (str) – chromosome name as string (e.g. ‘chr1’)
seq_start (int) – start position of the sequence
seq_end (int) – end position of the sequence
genome_assembly (str) – genome assembly as string (e.g. ‘hg38’)
- Returns:
List of common SNPs in the sequence
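A region query of this kind could be built roughly as shown below. This is a sketch under assumptions: the table name `snp151Common` and the selected columns are illustrative choices, and the table actually used by the package is not stated in this documentation. `build_snp_query` is a hypothetical helper.

```python
import re


def build_snp_query(chromosome: str, seq_start: int, seq_end: int,
                    table: str = "snp151Common") -> str:
    """Build an SQL string selecting SNPs that overlap a region (illustrative)."""
    # Reject anything that does not look like a UCSC chromosome name.
    if not re.fullmatch(r"chr[0-9A-Za-z_]+", chromosome):
        raise ValueError(f"unexpected chromosome name: {chromosome!r}")
    if not re.fullmatch(r"[0-9A-Za-z_]+", table):
        raise ValueError(f"unexpected table name: {table!r}")
    # Two intervals overlap if each one starts before the other ends.
    return (
        f"SELECT chromStart, chromEnd, name FROM {table} "
        f"WHERE chrom = '{chromosome}' "
        f"AND chromEnd > {int(seq_start)} AND chromStart < {int(seq_end)}"
    )
```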
- primertool.functions.mask_snps(genome: Genome, chromosome: str, seq_start: int, seq_end: int, genome_assembly: str) str [source]
Mask common SNP positions with an N in the sequence.
Uses the get_snps() function to retrieve common SNPs in the given sequence from UCSC. The genomic sequence between the given positions is extracted via genomepy, and the bases at common SNP positions are replaced with an N.
- Parameters:
genome (genomepy.Genome) – genomepy genome object
chromosome (str) – chromosome name as string (e.g. ‘chr1’)
seq_start (int) – start position of the sequence
seq_end (int) – end position of the sequence
genome_assembly (str) – genome assembly as string (e.g. ‘hg38’)
- Returns:
The sequence with common SNPs masked as a string
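The masking step itself can be sketched on a plain string. The sketch assumes that SNPs are given as absolute positions in the same coordinate system as `seq_start`; the format actually returned by get_snps() is not documented here, and `mask_positions` is a hypothetical helper.

```python
from typing import Iterable


def mask_positions(sequence: str, seq_start: int, snp_positions: Iterable[int]) -> str:
    """Replace the bases at the given absolute positions with 'N' (illustrative)."""
    bases = list(sequence)
    for position in snp_positions:
        offset = position - seq_start
        # Ignore SNPs that fall outside the extracted sequence.
        if 0 <= offset < len(bases):
            bases[offset] = "N"
    return "".join(bases)
```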
- primertool.functions.mutalyzer_error_handler(response: dict) PrimertoolInputError [source]
Checks for errors in the mutalyzer response and raises an exception if there is an error.
- Parameters:
response (dict) – The response from the mutalyzer API
- Raises:
PrimertoolInputError – If there is an error in the response
- primertool.functions.parse_mutation(mutation: str) SequenceVariant [source]
Parse mutation with hgvs parser.
Gene names in brackets are removed from the variant (e.g. NM_003165.6(STXBP1):c.1702G>A). The mutation is then parsed using hgvs.parser and an hgvs tree object (see https://hgvs.readthedocs.io/en/stable/key_concepts.html#variant-object-representation) is returned. Parses coding and genomic variants.
- Parameters:
mutation (str) – mutation in HGVS nomenclature
- Raises:
PrimertoolInputError – If there is an error in the mutation nomenclature
- Returns:
hgvs.sequencevariant.SequenceVariant
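The bracket-removal step can be sketched with a regular expression. This covers only the preprocessing; the subsequent parsing with hgvs.parser is not shown. `strip_gene_name` is a hypothetical helper, and restricting the match to brackets directly before the colon is an assumption made so that parentheses inside the variant description are left alone.

```python
import re


def strip_gene_name(mutation: str) -> str:
    """Remove a bracketed gene name that directly precedes the ':' (illustrative)."""
    return re.sub(r"\([^()]*\)(?=:)", "", mutation)
```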
- primertool.functions.purge_primer_pair(primer3_dict: dict, index: int) dict [source]
Removes primer pair of given index from primer3_dict and updates the dictionary so that the remaining data stays consistent. (I.e. reset indices so enumeration does not have gaps and update)
- Parameters:
primer3_dict (dict) – Primer3 output dictionary
index (int) – Index of primer pair to remove
- Returns:
Primer3 output dictionary with the primer pair of the given index removed
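A sketch of how such a removal with re-indexing might look, assuming the flat primer3 output layout in which per-primer keys carry the pair index (for example `PRIMER_LEFT_0_SEQUENCE`) and counts are stored under keys ending in `_NUM_RETURNED`. Decrementing those counts is an assumption about what keeping the data consistent involves.

```python
import re

# Matches indexed keys such as PRIMER_LEFT_0, PRIMER_PAIR_1_PRODUCT_SIZE.
_INDEXED_KEY = re.compile(r"^(PRIMER_(?:LEFT|RIGHT|PAIR|INTERNAL))_(\d+)(.*)$")


def purge_pair_sketch(primer3_dict: dict, index: int) -> dict:
    """Return a copy without pair `index`, closing the gap in the numbering."""
    result = {}
    for key, value in primer3_dict.items():
        match = _INDEXED_KEY.match(key)
        if match is None:
            # Non-indexed key: keep it, adjusting the counters (assumed behaviour).
            if key.endswith("_NUM_RETURNED") and isinstance(value, int) and value > 0:
                value -= 1
            result[key] = value
            continue
        prefix, number, suffix = match.group(1), int(match.group(2)), match.group(3)
        if number == index:
            continue  # drop every entry of the purged pair
        if number > index:
            number -= 1  # shift later pairs down by one
        result[f"{prefix}_{number}{suffix}"] = value
    return result
```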
- primertool.functions.reduce_numbers_in_string(input_string: str) str [source]
Takes an input string and reduces any (positive) integer in it by one.
- Parameters:
input_string (str) – String to reduce numbers in
- Returns:
String with all numbers reduced by one
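One way to express this is a regular-expression substitution with a callback. The sketch below is illustrative and assumes that a literal zero is left unchanged, since the description only mentions positive integers.

```python
import re


def reduce_numbers_sketch(input_string: str) -> str:
    """Decrement every positive integer found in the string by one (illustrative)."""

    def _decrement(match: re.Match) -> str:
        value = int(match.group())
        # Zero is not positive, so it is kept as-is (assumption).
        return str(value - 1) if value > 0 else match.group()

    return re.sub(r"\d+", _decrement, input_string)
```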
- primertool.functions.remove_whitespaces(func: callable) callable [source]
Decorator to remove whitespaces from all string arguments before passing them to the decorated function.
- Parameters:
func (callable) – The function to decorate
- Returns:
The decorated function
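A decorator of this kind might look as follows. The sketch assumes that all whitespace inside a string is removed, not only leading and trailing whitespace; `echo` is a hypothetical function used for demonstration.

```python
import functools


def remove_whitespaces_sketch(func):
    """Strip all whitespace from string arguments before calling func (illustrative)."""

    def _clean(value):
        # Non-string arguments are passed through untouched.
        return "".join(value.split()) if isinstance(value, str) else value

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        clean_args = tuple(_clean(arg) for arg in args)
        clean_kwargs = {name: _clean(value) for name, value in kwargs.items()}
        return func(*clean_args, **clean_kwargs)

    return wrapper


@remove_whitespaces_sketch
def echo(variant, padding=0):
    return variant, padding
```

Using `functools.wraps` keeps the decorated function's name and docstring intact, which matters for generated documentation.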
- primertool.functions.split_nm(nm_number: str) Tuple[str, int] [source]
Split variant.ac from the hgvs object into transcript and version number.
If the NM number includes a version number, split at ‘.’. If not, set the version number to 1.
- Parameters:
nm_number (str) – NM number (e.g. NM_000451.3)
- Returns:
Tuple of transcript and version number
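The described behaviour can be sketched in a few lines (illustrative re-implementation, not the package's code):

```python
from typing import Tuple


def split_nm_sketch(nm_number: str) -> Tuple[str, int]:
    """Split 'NM_000451.3' into ('NM_000451', 3); default the version to 1."""
    if "." in nm_number:
        transcript, version = nm_number.split(".", 1)
        return transcript, int(version)
    return nm_number, 1
```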
primertool.insilicopcr module
This module provides a class to perform in-silico PCR. It serves as a python interface to the UCSC In-Silico PCR tool (https://genome.ucsc.edu/cgi-bin/hgPcr).
- class primertool.insilicopcr.InSilicoPCR(forward_primer: str, reverse_primer: str, max_product_size: int = 4000, min_perfect_match: int = 15, min_good_match: int = 15, flip_reverse_primer: bool = False)[source]
Bases:
object
This class provides a python interface to the UCSC In-Silico PCR tool. It takes a forward and a reverse primer and returns the PCR product.
- forward_primer
forward primer sequence
- Type:
str
- reverse_primer
reverse primer sequence
- Type:
str
- fasta_pcr
fasta output from the UCSC In-Silico PCR
- Type:
list
- is_uniquely_binding() bool [source]
Check if the PCR product is uniquely binding to just one site. A PCR primer is uniquely binding if it binds to only one site in the genome. Note: sometimes the same PCR product is found in a chromosome and also an alt version of the chromosome, i.e. two entries describe the same PCR product and should only be counted as one.
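The note about alt contigs can be illustrated by collapsing entries that describe the same product before counting. The sketch assumes that the FASTA output has already been parsed into (header, sequence) tuples and that two entries describe the same product when their sequences are identical; neither assumption is confirmed by this documentation.

```python
from typing import List, Tuple

FastaRecord = Tuple[str, str]  # (header, sequence); hypothetical representation


def count_distinct_products(records: List[FastaRecord]) -> int:
    """Count PCR products, treating identical sequences as one product."""
    # Case is normalised so that differently cased copies still compare equal.
    return len({sequence.upper() for _header, sequence in records})


def is_uniquely_binding_sketch(records: List[FastaRecord]) -> bool:
    """True if exactly one distinct product was found (illustrative)."""
    return count_distinct_products(records) == 1
```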
primertool.logger module
This module contains the logger configuration for the primertool package.
- class primertool.logger.CustomFormatter(fmt: str)[source]
Bases:
Formatter
Logging colored formatter, adapted from https://stackoverflow.com/a/56944256/3638629
- fmt
The format string to use for the log messages
- Type:
str
- FORMATS
A dictionary mapping log levels to their respective format strings
- Type:
dict
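A colored formatter with a per-level FORMATS mapping can be sketched with the standard logging module alone. The ANSI color codes and the class name `ColorFormatterSketch` are illustrative choices and may differ from those used by CustomFormatter.

```python
import logging


class ColorFormatterSketch(logging.Formatter):
    """Wrap the format string in an ANSI color that depends on the log level."""

    RESET = "\x1b[0m"
    COLORS = {
        logging.DEBUG: "\x1b[38;20m",     # grey
        logging.INFO: "\x1b[38;20m",      # grey
        logging.WARNING: "\x1b[33;20m",   # yellow
        logging.ERROR: "\x1b[31;20m",     # red
        logging.CRITICAL: "\x1b[31;1m",   # bold red
    }

    def __init__(self, fmt: str):
        super().__init__()
        self.fmt = fmt
        # One fully assembled format string per log level.
        self.FORMATS = {
            level: color + fmt + self.RESET for level, color in self.COLORS.items()
        }

    def format(self, record: logging.LogRecord) -> str:
        # Fall back to the plain format string for unknown levels.
        log_fmt = self.FORMATS.get(record.levelno, self.fmt)
        return logging.Formatter(log_fmt).format(record)
```

Such a formatter would typically be attached to a handler with `handler.setFormatter(ColorFormatterSketch("[%(levelname)s] %(message)s"))`.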
- primertool.logger.init_logger(fmt: str = '[%(levelname)s] %(message)s', level: int = 10, save_to: str | None = None) Logger [source]
Initialize the logger for the primertool package.
- Parameters:
fmt (str) – The format string to use for the log messages
level (int) – The logging level to use
save_to (str) – The file to save the logs to (optional)
- Returns:
The configured logger
- Return type:
logging.Logger
primertool.primertool module
This module contains the classes for generating primers for a given genomic position, gene, exon or variant.
- class primertool.primertool.ExonPrimerGenerator(**kwargs)[source]
Bases:
PrimerGenerator
Class for generating primers for a given exon.
- nm_number
NM number of the gene
- Type:
str
- exon_number
Exon number
- Type:
int
- variant_pos
Dictionary containing the variant position
- Type:
dict
- gene_info
Gene information
- Type:
dict
- chromosome
Chromosome of the gene
- Type:
str
- ordertable
DataFrame containing the order information for the primer pair
- Type:
pd.DataFrame
- check_exon()[source]
Check if the exon exists in the gene.
- Raises:
PrimertoolInputError – If the given exon number is larger than the number of exons in the gene
- check_nm_number()[source]
Check if nm_number is valid.
- Raises:
PrimertoolInputError – If the given NM number is invalid
- get_exon_boundaries() Tuple[int, int] [source]
Get exon boundaries for the given exon number.
- Returns:
Tuple of exon start and exon end (tuple)
- get_ordertable(gene_info: dict, list_primers: list) DataFrame [source]
Get ordertable for exon primers.
- Parameters:
gene_info (dict) – Gene information
list_primers (list) – List of primers
- Raises:
PrimertoolNoPrimerFoundError – If no primers were found for the given exon
- Returns:
Pandas DataFrame containing the order information for the primer pair
- class primertool.primertool.ExonPrimerPair(gene_info: dict, primer_info: list, exon_number: int)[source]
Bases:
PrimerPair
Class for storing and handling primer pairs for a given exon.
- gene_info
Gene information
- Type:
dict
- gene_name
Name of the gene (e.g. BRCA1)
- Type:
str
- chromosome
Chromosome of the gene (e.g. chr1)
- Type:
str
- nm_number
NM number of the gene (e.g. NM_000451)
- Type:
str
- strand
Strand of the gene (either ‘+’ or ‘-’)
- Type:
str
- exon_number
Exon number
- Type:
int
- orderprimer_forwards
Forward primer in Gene-E(Exonnumber)F;Sequence format for ordering.
- Type:
str
- orderprimer_reverse
Reverse primer in Gene-E(Exonnumber)R;Sequence format for ordering.
- Type:
str
- primer_forwards
Forward primer sequence.
- Type:
str
- primer_reverse
Reverse primer sequence.
- Type:
str
- ordertable
DataFrame containing the order information for the primer pair.
- Type:
pd.DataFrame
- class primertool.primertool.GenePrimerGenerator(**kwargs)[source]
Bases:
PrimerGenerator
Class for generating primers for a given gene.
- nm_number
NM number of the gene
- Type:
str
- gene_info
Gene information
- Type:
dict
- chromosome
Chromosome of the gene
- Type:
str
- ordertable
DataFrame containing the order information for the primer pair
- Type:
pd.DataFrame
- check_nm_number()[source]
Check if nm_number is valid.
- Raises:
PrimertoolInputError – If the given NM number is invalid
- class primertool.primertool.GenomicPositionPrimerGenerator(**kwargs)[source]
Bases:
PrimerGenerator
Class for generating primers for a given genomic position.
- chromosome
Chromosome of the genomic position
- Type:
str
- start
Start position of the genomic position
- Type:
int
- end
End position of the genomic position
- Type:
int
- ordertable
DataFrame containing the order information for the primer pair
- Type:
pd.DataFrame
- static check_chromosome(chromosome: str) str [source]
Check if the given chromosome is valid and fix formatting if necessary.
- Parameters:
chromosome (str) – Chromosome (e.g. chr1)
- Raises:
PrimertoolInputError – If the given chromosome is invalid
- Returns:
The chromosome, reformatted if necessary; an exception is raised if it is invalid
- Return type:
str
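Validation with format fixing could look like the sketch below. The accepted names (1 to 22, X, Y, M), the fixes applied (trimming, adding a missing `chr` prefix, normalising case) and the use of ValueError in place of PrimertoolInputError are all assumptions made for illustration.

```python
VALID_NAMES = {str(number) for number in range(1, 23)} | {"X", "Y", "M"}


def normalize_chromosome(chromosome: str) -> str:
    """Return the chromosome in 'chrN' form or raise ValueError (illustrative)."""
    name = chromosome.strip()
    # Drop an existing prefix in any letter case so it can be re-added uniformly.
    if name.lower().startswith("chr"):
        name = name[3:]
    name = name.upper()
    if name not in VALID_NAMES:
        raise ValueError(f"invalid chromosome: {chromosome!r}")
    return "chr" + name
```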
- class primertool.primertool.GenomicPositionPrimerPair(primer_info, chromosome: str, start: int, end: int)[source]
Bases:
PrimerPair
Class for storing and handling primer pairs for a given genomic position.
- chromosome
Chromosome of the genomic position
- Type:
str
- start
Start position of the genomic position
- Type:
int
- end
End position of the genomic position
- Type:
int
- orderprimer_forwards
Forward primer in ChrStartF;Sequence format for ordering.
- Type:
str
- orderprimer_reverse
Reverse primer in ChrStartR;Sequence format for ordering.
- Type:
str
- primer_forwards
Forward primer sequence.
- Type:
str
- primer_reverse
Reverse primer sequence.
- Type:
str
- ordertable
DataFrame containing the order information for the primer pair.
- Type:
pd.DataFrame
- class primertool.primertool.PrimerGenerator(**kwargs)[source]
Bases:
object
Base class for generating primers for a given genomic position, gene, exon or variant.
- genome_assembly
Genome assembly (e.g. hg38)
- Type:
str
- kuerzel
Kuerzel (short identifier, e.g. initials) of the person who is ordering the primers
- Type:
str
- genome_dir
Directory containing the genome files
- Type:
str
- genome
Genome object
- Type:
genomepy.Genome
- max_insert
Maximum insert size for the primer (default: 800)
- Type:
int
- min_insert
Minimum insert size for the primer (default: 200)
- Type:
int
- dist_exon_borders
Distance to the exon borders (default: 40)
- Type:
int
- chromosome
Chromosome of the primer pair
- Type:
str
- static check_genome_assembly(genome_assembly: str) str [source]
Check if the given genome assembly is valid.
- Parameters:
genome_assembly (str) – Genome assembly (e.g. hg38)
- Raises:
PrimertoolInputError – If the given genome assembly is invalid
- Returns:
The genome assembly; an exception is raised if it is invalid
- Return type:
str
- check_insert_size(start: int, end: int) list [source]
Check if insert size is between min/max insert size.
The insert size for the primer needs to be between the min and max insert size, given the sequencing method. If the insert size is smaller than the min insert size, the difference to the min insert size is added/subtracted from the start/end position. If the insert size is bigger than the max insert size, the range is split into multiple chunks.
- Parameters:
start (int) – Start position
end (int) – End position
- Returns:
List of positions for primer generation (list)
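The two adjustments can be sketched as follows. How the padding is distributed and how the chunks are sized is not specified in this documentation, so the sketch assumes symmetric padding and roughly equal chunks; the default limits are taken from the documented defaults for min_insert and max_insert, and the return format (a list of (start, end) tuples) is hypothetical.

```python
import math
from typing import List, Tuple


def insert_windows_sketch(start: int, end: int,
                          min_insert: int = 200,
                          max_insert: int = 800) -> List[Tuple[int, int]]:
    """Pad short ranges and split long ones into windows (illustrative)."""
    size = end - start
    if size < min_insert:
        # Distribute the missing bases over both sides of the range (assumed).
        missing = min_insert - size
        left = missing // 2
        right = missing - left
        return [(start - left, end + right)]
    if size > max_insert:
        # Use the smallest number of roughly equal chunks that fit max_insert.
        chunk_count = math.ceil(size / max_insert)
        step = math.ceil(size / chunk_count)
        return [(pos, min(pos + step, end)) for pos in range(start, end, step)]
    return [(start, end)]
```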
- design_primer(start: int, end: int, primer_bases: int) Tuple[dict, int] [source]
Design a primer pair using primer3.
Firstly the targets are defined. Then the genomic sequence is retrieved and the most common SNPs are masked in the sequence.
- Parameters:
start (int) – Start position
end (int) – End position
primer_bases (int) – Number of primer bases
- Returns:
tuple of primer3 results as dict and insert size as int
- fetch_genome() Genome [source]
Check if the given genome assembly is available in the genome directory. If not, download it from UCSC.
- Raises:
PrimertoolInputError – If the genome assembly is invalid
PrimertoolGenomeError – If the genome assembly is not available in the genome directory and cannot be downloaded
- Returns:
Genome object (genomepy.Genome)
- class primertool.primertool.PrimerPair(primer_info: list)[source]
Bases:
object
Base class for storing and handling primer pairs.
- primer_info
Primer3 output containing the information about the primer pairs.
- Type:
dict
- mt
Melting temperature of the primer pair.
- Type:
float
- bp
Base pairs of the primer pair.
- Type:
int
- chromosome
Chromosome of the primer pair.
- Type:
str
- ordertable
DataFrame containing the order information for the primer pair.
- Type:
pd.DataFrame
- orderprimer_forwards
Forward primer in Gene-E(Exonnumber)F;Sequence format for ordering.
- Type:
str
- orderprimer_reverse
Reverse primer in Gene-E(Exonnumber)R;Sequence format for ordering.
- Type:
str
- primer_forwards
Forward primer sequence.
- Type:
str
- primer_reverse
Reverse primer sequence.
- Type:
str
- class primertool.primertool.VariantPrimerGenerator(**kwargs)[source]
Bases:
PrimerGenerator
Class for generating primers for a given variant.
- variant
Variant in HGVS format (e.g. NM_000451.3:c.1702G>A)
- Type:
str
- nm_number
NM number of the gene
- Type:
str
- ordertable
DataFrame containing the order information for the primer pair
- Type:
pd.DataFrame
- check_mutation() Tuple[SequenceVariant, SequenceVariant] [source]
Check the given mutation using the Mutalyzer API and convert it into a genomic position.
First, the Mutalyzer name checker (runMutalyzer) is run to check the given mutation nomenclature while trying to catch any possible errors. Then the mutation is parsed using the hgvs parser and converted into a genomic position using the Mutalyzer API.
- Raises:
PrimertoolInputError – If the given mutation is invalid
PrimertoolMutalyzerError – If the mutalyzer api request fails
- Returns:
Tuple of coding mutation and genomic mutation (tuple)
- static check_variant(variant: str) str [source]
Check if the given variant is valid and in HGVS format. Also fix formatting if possible.
- Parameters:
variant (str) – Variant in HGVS format (e.g. NM_000451.3:c.1702G>A)
- Raises:
PrimertoolInputError – If the given variant is invalid
- Returns:
The variant, reformatted if possible; an exception is raised if it is invalid
- Return type:
str
primertool.ucsc_database module
This module handles the connection and queries to the UCSC SQL database.
- primertool.ucsc_database.query(genome_assembly: str, query: str, local: bool = False, password_local: str = 'password') list [source]
Query the UCSC SQL database or a local copy of the database.
- Parameters:
genome_assembly (str) – The genome assembly to query, e.g. ‘hg38’.
query (str) – The SQL query to execute.
local (bool) – If True, a local copy of the UCSC database is used.
password_local (str) – The password for the local database.
- Returns:
The result of the query as a list of tuples or None if the query did not return any results.
- Return type:
list or None