primertool package

Submodules

primertool.exceptions module

Exceptions for the primertool package.

class primertool.exceptions.InSilicoPCRError[source]

Bases: object

Base class for exceptions in the insilicopcr module.

class primertool.exceptions.InSilicoPCRRequestError[source]

Bases: InSilicoPCRError

Exception raised for request errors to the InSilicoPCR API.

exception primertool.exceptions.PrimertoolError[source]

Bases: Exception

Base class for exceptions in the primertool module.

exception primertool.exceptions.PrimertoolExonLengthError[source]

Bases: PrimertoolError

Exception raised when the exon length exceeds the max insert size.

exception primertool.exceptions.PrimertoolGenomeError[source]

Bases: PrimertoolError

Exception raised for errors in the genome.

exception primertool.exceptions.PrimertoolInputError[source]

Bases: PrimertoolError

Exception raised for errors in the input.

exception primertool.exceptions.PrimertoolIntronicPositionError[source]

Bases: PrimertoolError

Exception raised when the given position is intronic.

exception primertool.exceptions.PrimertoolMutalyzerError[source]

Bases: PrimertoolError

Exception raised for errors in the Mutalyzer API.

exception primertool.exceptions.PrimertoolNoPrimerFoundError[source]

Bases: PrimertoolError

Exception raised when no primer is found.
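Because the Primertool exceptions listed above share PrimertoolError as their base, a caller can catch the base class and still distinguish specific subclasses. The following is a minimal sketch of that pattern. The classes are local stand-ins that mirror the documented hierarchy (real code would import them from primertool.exceptions), and describe_failure, run_safely and failing_input are hypothetical helpers invented for illustration.

```python
# Local stand-ins mirroring the documented hierarchy; real code would
# import these from primertool.exceptions instead of redefining them.
class PrimertoolError(Exception):
    """Base class for exceptions in the primertool module."""


class PrimertoolInputError(PrimertoolError):
    """Raised for errors in the input."""


class PrimertoolNoPrimerFoundError(PrimertoolError):
    """Raised when no primer is found."""


def describe_failure(error: Exception) -> str:
    """Map an exception to a short category (hypothetical helper)."""
    if isinstance(error, PrimertoolInputError):
        return "check the input"
    if isinstance(error, PrimertoolError):
        return "primer design failed"
    return "unexpected error"


def run_safely(task) -> str:
    """Run a callable and translate any Primertool error into a message."""
    try:
        task()
    except PrimertoolError as error:  # also catches every subclass
        return describe_failure(error)
    return "ok"


def failing_input():
    """Hypothetical task that fails the way an input check might."""
    raise PrimertoolInputError("exon number out of range")


def failing_design():
    """Hypothetical task that fails because no primer was found."""
    raise PrimertoolNoPrimerFoundError("no primer found")


print(run_safely(failing_input))
print(run_safely(failing_design))
```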

primertool.functions module

This module contains static auxiliary functions that are used in the primertool package.

primertool.functions.calculate_targets(target_start: int, target_end: int, primer_bases: int) → dict[source]

Define the sequence start and end positions.

Primer bases are the number of bases on each side of the target sequence in which primer3 looks for a possible primer. They extend the target on both sides to give the sequence start and end positions. The size range is a list of two values: the length of the target sequence, and the length of the target sequence with the primer bases added. Primer3 should design primers within this size range.

Parameters:

target_start (int) – Start position of the target sequence
target_end (int) – End position of the target sequence
primer_bases (int) – Number of bases to each side of the target sequence in which primer3 looks for a possible primer

Returns:

A dictionary containing the sequence start and end positions, the number of bases to each side of the target sequence in which primer3 looks for a possible primer, the length of the target sequence, and the size range for primer3.
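The arithmetic described above can be sketched as follows. This is not the package's implementation: the function name sketch_targets and all dictionary keys are hypothetical, and it assumes that the primer bases are subtracted from the target start, added to the target end, and counted once per side in the upper bound of the size range.

```python
def sketch_targets(target_start: int, target_end: int, primer_bases: int) -> dict:
    """Illustrate the window and size-range arithmetic (hypothetical keys)."""
    target_length = target_end - target_start
    return {
        # Assumption: the search window grows by primer_bases on each side.
        "seq_start": target_start - primer_bases,
        "seq_end": target_end + primer_bases,
        "primer_bases": primer_bases,
        "target_length": target_length,
        # Assumption: lower bound is the bare target, upper bound adds both flanks.
        "size_range": [target_length, target_length + 2 * primer_bases],
    }


targets = sketch_targets(target_start=1000, target_end=1200, primer_bases=150)
print(targets["seq_start"], targets["seq_end"], targets["size_range"])
```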

primertool.functions.correct_intronic_variant(response: dict, variant: str) → str[source]

Handle the case where the given variant is intronic. If the offset is <= 5, drop the offset. Otherwise, raise an exception.

Parameters:

response (dict) – The response from the mutalyzer API
variant (str) – The variant to correct

Raises:

PrimertoolIntronicPositionError – If the offset is too large to be corrected automatically (> 5)

Returns:

The corrected variant
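To make the offset rule concrete, here is a rough sketch that works on the variant string alone. It does not use the Mutalyzer response and is not the package's code: drop_small_offset is a hypothetical name, ValueError stands in for PrimertoolIntronicPositionError, and the regular expression is an assumption about how intronic offsets such as +4 or -3 appear after a position.

```python
import re

# An intronic offset is a sign plus digits directly after a position digit,
# e.g. the "+4" in "c.123+4G>A". A leading "-" as in "c.-15G>A" is not matched.
OFFSET = re.compile(r"(?<=\d)[+-](\d+)")


def drop_small_offset(variant: str, max_offset: int = 5) -> str:
    """Remove intronic offsets up to max_offset, otherwise raise (sketch)."""

    def strip_offset(match: re.Match) -> str:
        if int(match.group(1)) > max_offset:
            # Stand-in for PrimertoolIntronicPositionError.
            raise ValueError(f"offset {match.group(0)} is too large to correct")
        return ""

    # Only touch the part after the accession, so its digits stay intact.
    accession, separator, position = variant.partition(":")
    return accession + separator + OFFSET.sub(strip_offset, position)


print(drop_small_offset("NM_000451.3:c.123+4G>A"))
```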

primertool.functions.filter_unique_primers(primer3_dict: dict) → Tuple[dict, bool][source]

Check primer uniqueness and remove all primers with multiple binding sites, so that only the uniquely binding ones remain. Also returns a flag indicating whether all primers were invalid (i.e. not uniquely binding); the flag is only set if there was at least one primer to begin with.

Parameters:: primer3_dict (dict) – Primer3 output dictionary
Returns:: Tuple of the filtered primer3_dict and a flag indicating if all primers were invalid

primertool.functions.find_sequence_positions(exon_starts: list, exon_ends: list, exon_count: int, strand: str, mutation_position: Interval) → dict[source]

Establish if variant position is in an exon.

The hgvs.parser object has the start and end position of the variant (in case of an indel rather than a SNP). Using these positions and the lists of exon starts and ends, determine if the variant is located in an exon.

The information is returned as a dictionary. If the gene is based on the - strand, the exon number needs to be inverted.

Parameters:

exon_starts (list) – List of exon start positions
exon_ends (list) – List of exon end positions
exon_count (int) – Number of exons
strand (str) – Strand of the gene (‘+’ or ‘-’)
mutation_position (hgvs.location.Interval) – Start and end position of the variant

Returns:

Dictionary containing the exon number, start and end position of the variant, the length of the variant, and a boolean indicating if the variant is in an exon
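The exon lookup and the strand-dependent numbering can be illustrated with a small stand-alone sketch. The function name locate_exon, the dictionary keys, and the use of plain integers instead of an hgvs Interval are assumptions made for this example; inclusive exon boundaries are assumed as well.

```python
def locate_exon(exon_starts: list, exon_ends: list, strand: str,
                variant_start: int, variant_end: int) -> dict:
    """Find the exon containing the variant, if any (hypothetical keys)."""
    exon_count = len(exon_starts)
    result = {
        "exon_number": None,
        "start": variant_start,
        "end": variant_end,
        "length": variant_end - variant_start + 1,
        "in_exon": False,
    }
    for index, (start, end) in enumerate(zip(exon_starts, exon_ends)):
        if start <= variant_start and variant_end <= end:
            # Exons are listed in genomic order; on the minus strand the
            # transcript runs the other way, so the numbering is inverted.
            number = index + 1 if strand == "+" else exon_count - index
            result.update(exon_number=number, in_exon=True)
            break
    return result


starts, ends = [100, 300, 500], [200, 400, 600]
print(locate_exon(starts, ends, "-", 110, 110)["exon_number"])
```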

primertool.functions.get_gene_information(genome_assembly: str, nm_number: str) → dict[source]

Retrieve gene information from RefSeq database for a given NM number.

Parameters:

genome_assembly (str) – The genome assembly to use (‘hg38’ or ‘hg19’)
nm_number (str) – The NM number of the gene to retrieve information for

Returns:

A dictionary containing the gene information

primertool.functions.get_snps(chromosome: str, seq_start: int, seq_end: int, genome_assembly: str) → list[source]

Retrieve common SNPs from the UCSC database.

Query the UCSC database with the chromosome and position data to find common SNPs in the sequence, which need to be masked. SNPs are returned as a list.

Parameters:

chromosome (str) – chromosome name as string (e.g. ‘chr1’)
seq_start (int) – start position of the sequence
seq_end (int) – end position of the sequence
genome_assembly (str) – genome assembly as string (e.g. ‘hg38’)

Returns:

List of common SNPs in the sequence

primertool.functions.mask_snps(genome: Genome, chromosome: str, seq_start: int, seq_end: int, genome_assembly: str) → str[source]

Mask common SNP positions with an N in the sequence.

Uses the get_snps() function to retrieve common SNPs in the given sequence from UCSC. The genomic sequence between the given positions is extracted via genomepy, and the bases at common SNP positions are replaced with an N.

Parameters:

genome (genomepy.Genome) – genomepy genome object
chromosome (str) – chromosome name as string (e.g. ‘chr1’)
seq_start (int) – start position of the sequence
seq_end (int) – end position of the sequence
genome_assembly (str) – genome assembly as string (e.g. ‘hg38’)

Returns:

The sequence with common SNPs masked as a string
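The masking step itself is simple string manipulation and can be sketched without genomepy or a UCSC connection. In this sketch, mask_positions is a hypothetical helper, the SNPs are given as plain genomic positions, and the coordinate convention (the first base of the sequence sits at seq_start) is an assumption; the real function may use a different convention.

```python
def mask_positions(sequence: str, seq_start: int, snp_positions: list) -> str:
    """Replace the bases at the given genomic positions with N (sketch)."""
    bases = list(sequence)
    for position in snp_positions:
        offset = position - seq_start  # index of the SNP within the sequence
        if 0 <= offset < len(bases):   # ignore SNPs outside the window
            bases[offset] = "N"
    return "".join(bases)


print(mask_positions("ACGTACGT", seq_start=100, snp_positions=[101, 104, 200]))
```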

primertool.functions.mutalyzer_error_handler(response: dict) → PrimertoolInputError[source]

Checks for errors in the mutalyzer response and raises an exception if there is an error.

Parameters:: response (dict) – The response from the mutalyzer API
Raises:: PrimertoolInputError – If there is an error in the response

primertool.functions.parse_mutation(mutation: str) → SequenceVariant[source]

Parse mutation with hgvs parser.

Gene names in brackets are removed from the variant (e.g. NM_003165.6(STXBP1):c.1702G>A). The mutation is then parsed using hgvs.parser and an hgvs tree object (see https://hgvs.readthedocs.io/en/stable/key_concepts.html#variant-object-representation) is returned. Parses coding and genomic variants.

Parameters:: mutation (str) – mutation in HGVS nomenclature
Raises:: PrimertoolInputError – If there is an error in the mutation nomenclature
Returns:: hgvs.sequencevariant.SequenceVariant
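The clean-up step before parsing can be sketched with a regular expression. This example covers only the removal of the bracketed gene name and does not call hgvs.parser; strip_gene_name and the pattern are illustrative assumptions, not the package's code.

```python
import re

# Matches an accession followed by "(GENE)" directly before the colon,
# e.g. "NM_003165.6(STXBP1):". Brackets elsewhere in the variant are untouched.
GENE_IN_BRACKETS = re.compile(r"^([^:(\s]+)\([^)]*\)(?=:)")


def strip_gene_name(mutation: str) -> str:
    """Drop a bracketed gene name between accession and colon (sketch)."""
    return GENE_IN_BRACKETS.sub(r"\1", mutation.strip())


print(strip_gene_name("NM_003165.6(STXBP1):c.1702G>A"))
```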

primertool.functions.purge_primer_pair(primer3_dict: dict, index: int) → dict[source]

Removes primer pair of given index from primer3_dict and updates the dictionary so that the remaining data stays consistent. (I.e. reset indices so enumeration does not have gaps and update)

Parameters:

primer3_dict (dict) – Primer3 output dictionary
index (int) – Index of primer pair to remove

Returns:

Primer3 output dictionary with the primer pair of the given index removed
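One way to keep a primer3 result consistent after removing a pair is to drop every key that carries the removed index and shift higher indices down by one. The sketch below assumes primer3-style keys such as PRIMER_LEFT_0_SEQUENCE and PRIMER_PAIR_NUM_RETURNED; the function name purge_pair and the choice to decrement every *_NUM_RETURNED counter are assumptions, not a description of the package's behaviour.

```python
import re

# Keys that belong to one numbered primer or pair, e.g. PRIMER_LEFT_1_TM.
INDEXED_KEY = re.compile(r"^(PRIMER_(?:LEFT|RIGHT|INTERNAL|PAIR))_(\d+)(.*)$")


def purge_pair(primer3_dict: dict, index: int) -> dict:
    """Return a copy without pair `index`, renumbering later pairs (sketch)."""
    result = {}
    for key, value in primer3_dict.items():
        match = INDEXED_KEY.match(key)
        if match is None:
            # Assumption: counters shrink by one, but never below zero.
            if key.endswith("_NUM_RETURNED"):
                value = max(value - 1, 0)
            result[key] = value
            continue
        prefix, number, rest = match.group(1), int(match.group(2)), match.group(3)
        if number == index:
            continue  # this entry belongs to the removed pair
        if number > index:
            number -= 1  # close the gap in the enumeration
        result[f"{prefix}_{number}{rest}"] = value
    return result


example = {
    "PRIMER_PAIR_NUM_RETURNED": 2,
    "PRIMER_LEFT_0_SEQUENCE": "AAA",
    "PRIMER_RIGHT_0_SEQUENCE": "TTT",
    "PRIMER_LEFT_1_SEQUENCE": "CCC",
    "PRIMER_RIGHT_1_SEQUENCE": "GGG",
}
print(purge_pair(example, 0))
```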

primertool.functions.reduce_numbers_in_string(input_string: str) → str[source]

Takes an input string and reduces any (positive) integer in it by one.

Parameters:: input_string (str) – String to reduce numbers in
Returns:: String with all numbers reduced by one
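A compact way to get this behaviour is a regular-expression substitution with a callback. The sketch below is an illustration under the name reduce_numbers (hypothetical); note that with this approach a literal 0 would become -1 and leading zeros are lost, which may or may not match the real function.

```python
import re


def reduce_numbers(text: str) -> str:
    """Decrease every run of digits in the text by one (sketch)."""
    return re.sub(r"\d+", lambda match: str(int(match.group()) - 1), text)


print(reduce_numbers("exon 3 of 10"))
```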

primertool.functions.remove_whitespaces(func: callable) → callable[source]

Decorator to remove whitespaces from all string arguments before passing them to the decorated function.

Parameters:: func (callable) – The function to decorate
Returns:: The decorated function
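A decorator of this kind can be sketched as shown below. The name strip_whitespace_args and the demo function describe are hypothetical, and the sketch assumes that all whitespace inside a string argument is removed (not only leading and trailing whitespace); non-string arguments are passed through unchanged.

```python
import functools


def strip_whitespace_args(func):
    """Remove whitespace from string arguments before calling func (sketch)."""

    def clean(value):
        # str.split() without arguments splits on any run of whitespace.
        return "".join(value.split()) if isinstance(value, str) else value

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        clean_args = [clean(arg) for arg in args]
        clean_kwargs = {name: clean(value) for name, value in kwargs.items()}
        return func(*clean_args, **clean_kwargs)

    return wrapper


@strip_whitespace_args
def describe(name, number, suffix=""):
    """Hypothetical function used only to demonstrate the decorator."""
    return f"{name}:{number}{suffix}"


print(describe(" NM_000451.3 ", 5, suffix=" x "))
```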

primertool.functions.split_nm(nm_number: str) → Tuple[str, int][source]

Split variant.ac from hgvs object into transcript and version number

If the transcript number includes a version number, split at ‘.’. If not, set the version number to 1.

Parameters:: nm_number (str) – NM number (e.g. NM_000451.3)
Returns:: Tuple of transcript and version number
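The splitting rule can be written in a few lines. This is a sketch under the hypothetical name split_accession; it assumes that anything after the first dot is a valid integer version.

```python
def split_accession(nm_number: str) -> tuple:
    """Split 'NM_000451.3' into ('NM_000451', 3); default version is 1 (sketch)."""
    transcript, dot, version = nm_number.partition(".")
    return transcript, (int(version) if dot else 1)


print(split_accession("NM_000451.3"))
print(split_accession("NM_000451"))
```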

primertool.insilicopcr module

This module provides a class to perform in-silico PCR. It serves as a python interface to the UCSC In-Silico PCR tool (https://genome.ucsc.edu/cgi-bin/hgPcr).

class primertool.insilicopcr.InSilicoPCR(forward_primer: str, reverse_primer: str, max_product_size: int = 4000, min_perfect_match: int = 15, min_good_match: int = 15, flip_reverse_primer: bool = False)[source]

Bases: object

This class provides a python interface to the UCSC In-Silico PCR tool. It takes a forward and a reverse primer and returns the PCR product.

forward_primer

forward primer sequence

Type:: str

reverse_primer

reverse primer sequence

Type:: str

fasta_pcr

fasta output from the UCSC In-Silico PCR

Type:: list

is_uniquely_binding() → bool[source]

Check if the PCR product is uniquely binding to just one site. A PCR primer is uniquely binding if it binds to only one site in the genome. Note: sometimes the same PCR product is found in a chromosome and also an alt version of the chromosome, i.e. two entries describe the same PCR product and should only be counted as one.
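The note about duplicate entries can be illustrated by counting distinct products rather than FASTA records. The sketch below is one possible approach, not the class's actual logic: count_distinct_products is a hypothetical helper, the input is assumed to be a list of FASTA lines in which header lines start with ">", and two records are treated as the same product when their sequences are identical (ignoring case).

```python
def count_distinct_products(fasta_lines: list) -> int:
    """Count PCR products, treating identical sequences as one (sketch)."""
    products = set()
    sequence_parts = []
    # The trailing ">" acts as a sentinel that flushes the last record.
    for line in list(fasta_lines) + [">"]:
        if line.startswith(">"):
            if sequence_parts:
                products.add("".join(sequence_parts).upper())
            sequence_parts = []
        else:
            sequence_parts.append(line.strip())
    return len(products)


# Made-up records: the same product on a chromosome and on an alt contig.
lines = [
    ">chr1:100+150 51bp AAA TTT",
    "AAAcgtTTT",
    ">chr1_KI270762v1_alt:10+60 51bp AAA TTT",
    "AAAcgtTTT",
]
print(count_distinct_products(lines) == 1)
```

Under these assumptions, a primer pair would count as uniquely binding when exactly one distinct product remains.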

primertool.logger module

This module contains the logger configuration for the primertool package.

class primertool.logger.CustomFormatter(fmt: str)[source]

Bases: Formatter

Logging colored formatter, adapted from https://stackoverflow.com/a/56944256/3638629

fmt

The format string to use for the log messages

Type:: str

FORMATS

A dictionary mapping log levels to their respective format strings

Type:: dict

format(record: LogRecord) → str[source]

Format the log record according to its level.

Parameters:: record (logging.LogRecord) – The log record to format
Returns:: The formatted log message
Return type:: str
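A level-dependent formatter in the spirit of the description above can be built on logging.Formatter from the standard library. The class below is a sketch, not the package's CustomFormatter: the name LevelFormatter and the chosen ANSI colour codes are assumptions; only the fmt and FORMATS attribute names are taken from the documentation.

```python
import logging


class LevelFormatter(logging.Formatter):
    """Wrap the format string in an ANSI colour that depends on the level."""

    RESET = "\x1b[0m"
    COLORS = {  # assumption: one colour per standard level
        logging.DEBUG: "\x1b[37m",
        logging.INFO: "\x1b[34m",
        logging.WARNING: "\x1b[33m",
        logging.ERROR: "\x1b[31m",
        logging.CRITICAL: "\x1b[1;31m",
    }

    def __init__(self, fmt: str):
        super().__init__(fmt)
        self.fmt = fmt
        self.FORMATS = {
            level: color + fmt + self.RESET for level, color in self.COLORS.items()
        }

    def format(self, record: logging.LogRecord) -> str:
        # Fall back to the plain format string for non-standard levels.
        level_format = self.FORMATS.get(record.levelno, self.fmt)
        return logging.Formatter(level_format).format(record)


handler = logging.StreamHandler()
handler.setFormatter(LevelFormatter("[%(levelname)s] %(message)s"))
demo_logger = logging.getLogger("formatter_demo")
demo_logger.addHandler(handler)
demo_logger.warning("no unique primer pair found")
```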

primertool.logger.init_logger(fmt: str = '[%(levelname)s] %(message)s', level: int = 10, save_to: str | None = None) → Logger[source]

Initialize the logger for the primertool package.

Parameters:

fmt (str) – The format string to use for the log messages
level (int) – The logging level to use
save_to (str) – The file to save the logs to (optional)

Returns:

The configured logger

Return type:

logging.Logger

primertool.primertool module

This module contains the classes for generating primers for a given genomic position, gene, exon or variant.

class primertool.primertool.ExonPrimerGenerator(**kwargs)[source]

Bases: PrimerGenerator

Class for generating primers for a given exon.

nm_number

NM number of the gene

Type:: str

exon_number

Exon number

Type:: int

variant_pos

Dictionary containing the variant position

Type:: dict

gene_info

Gene information

Type:: dict

chromosome

Chromosome of the gene

Type:: str

ordertable

DataFrame containing the order information for the primer pair

Type:: pd.DataFrame

check_exon()[source]

Check if the exon exists in the gene.

Raises:: PrimertoolInputError – If the given exon number is larger than the number of exons in the gene

check_nm_number()[source]

Check if nm_number is valid.

Raises:: PrimertoolInputError – If the given NM number is invalid

get_exon_boundaries() → Tuple[int, int][source]

Get exon boundaries for the given exon number.

Returns:: Tuple of exon start and exon end (tuple)

get_ordertable(gene_info: dict, list_primers: list) → DataFrame[source]

Get ordertable for exon primers.

Parameters:

gene_info (dict) – Gene information
list_primers (list) – List of primers

Raises:

PrimertoolNoPrimerFoundError – If no primers were found for the given exon

Returns:

Pandas DataFrame containing the order information for the primer pair

class primertool.primertool.ExonPrimerPair(gene_info: dict, primer_info: list, exon_number: int)[source]

Bases: PrimerPair

Class for storing and handling primer pairs for a given exon.

gene_info

Gene information

Type:: dict

gene_name

Name of the gene (e.g. BRCA1)

Type:: str

chromosome

Chromosome of the gene (e.g. chr1)

Type:: str

nm_number

NM number of the gene (e.g. NM_000451)

Type:: str

strand

Strand of the gene (either ‘+’ or ‘-’)

Type:: str

exon_number

Exon number

Type:: int

orderprimer_forwards

Forward primer in Gene-E(Exonnumber)F;Sequence format for ordering.

Type:: str

orderprimer_reverse

Reverse primer in Gene-E(Exonnumber)R;Sequence format for ordering.

Type:: str

primer_forwards

Forward primer sequence.

Type:: str

primer_reverse

Reverse primer sequence.

Type:: str

ordertable

DataFrame containing the order information for the primer pair.

Type:: pd.DataFrame

get_order_primers() → Tuple[str, str, str, str][source]

Returns the forward and reverse primers in Gene-E(Exonnumber)F;Sequence and Gene-E(Exonnumber)R;Sequence format for ordering.

Returns:: Tuple of orderprimer forwards, orderprimer reverse, forwards primer and reverse primer
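As an illustration of the naming scheme, the sketch below builds the four strings from plain values. It assumes that Gene-E(Exonnumber)F;Sequence means the gene name, a hyphen, the letter E with the exon number, F or R for the direction, a semicolon, and the primer sequence; exon_order_names and the example values are hypothetical.

```python
def exon_order_names(gene_name: str, exon_number: int,
                     forward: str, reverse: str) -> tuple:
    """Build order names for an exon primer pair (assumed format, sketch)."""
    order_forward = f"{gene_name}-E{exon_number}F;{forward}"
    order_reverse = f"{gene_name}-E{exon_number}R;{reverse}"
    return order_forward, order_reverse, forward, reverse


print(exon_order_names("BRCA1", 2, "ACGTACGT", "TGCATGCA"))
```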

make_order_table() → DataFrame[source]

Create a pandas DataFrame for the order table.

Returns:: Pandas DataFrame containing the order information for the primer pair.

class primertool.primertool.GenePrimerGenerator(**kwargs)[source]

Bases: PrimerGenerator

Class for generating primers for a given gene.

nm_number

NM number of the gene

Type:: str

gene_info

Gene information

Type:: dict

chromosome

Chromosome of the gene

Type:: str

ordertable

DataFrame containing the order information for the primer pair

Type:: pd.DataFrame

check_nm_number()[source]

Check if nm_number is valid.

Raises:: PrimertoolInputError – If the given NM number is invalid

class primertool.primertool.GenomicPositionPrimerGenerator(**kwargs)[source]

Bases: PrimerGenerator

Class for generating primers for a given genomic position.

chromosome

Chromosome of the genomic position

Type:: str

start

Start position of the genomic position

Type:: int

end

End position of the genomic position

Type:: int

ordertable

DataFrame containing the order information for the primer pair

Type:: pd.DataFrame

static check_chromosome(chromosome: str) → str[source]

Check if the given chromosome is valid and fix formatting if necessary.

Parameters:: chromosome (str) – Chromosome (e.g. chr1)
Raises:: PrimertoolInputError – If the given chromosome is invalid
Returns:: Chromosome if valid or raises an exception
Return type:: str
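What "fix formatting" could look like is sketched below. The accepted set of chromosomes (1 to 22, X, Y and M), the handling of case and of a missing chr prefix, and the name normalize_chromosome are all assumptions; ValueError stands in for PrimertoolInputError.

```python
# Assumption: human autosomes, sex chromosomes and the mitochondrial genome.
VALID_NAMES = {str(number) for number in range(1, 23)} | {"X", "Y", "M"}


def normalize_chromosome(chromosome: str) -> str:
    """Return the chromosome in 'chrN' form or raise ValueError (sketch)."""
    name = chromosome.strip()
    if name.lower().startswith("chr"):
        name = name[3:]
    name = name.upper()
    if name == "MT":  # UCSC assemblies name the mitochondrial genome chrM
        name = "M"
    if name not in VALID_NAMES:
        # Stand-in for PrimertoolInputError.
        raise ValueError(f"invalid chromosome: {chromosome!r}")
    return "chr" + name


print(normalize_chromosome(" CHRx "))
```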

get_ordertable(list_primers: list) → DataFrame[source]

Get ordertable for genomic position primers.

Parameters:: list_primers (list) – List of primers
Returns:: Pandas DataFrame containing the order information for the primer pair

class primertool.primertool.GenomicPositionPrimerPair(primer_info, chromosome: str, start: int, end: int)[source]

Bases: PrimerPair

Class for storing and handling primer pairs for a given genomic position.

chromosome

Chromosome of the genomic position

Type:: str

start

Start position of the genomic position

Type:: int

end

End position of the genomic position

Type:: int

orderprimer_forwards

Forward primer in ChrStartF;Sequence format for ordering.

Type:: str

orderprimer_reverse

Reverse primer in ChrStartR;Sequence format for ordering.

Type:: str

primer_forwards

Forward primer sequence.

Type:: str

primer_reverse

Reverse primer sequence.

Type:: str

ordertable

DataFrame containing the order information for the primer pair.

Type:: pd.DataFrame

get_order_primers() → Tuple[str, str, str, str][source]

Returns the forward and reverse primers in ChrStartF;Sequence and ChrStartR;Sequence format for ordering.

Returns:: Tuple of orderprimer forwards, orderprimer reverse, forwards primer and reverse primer

make_order_table() → DataFrame[source]

Create a pandas DataFrame for the order table.

Returns:: Pandas DataFrame containing the order information for the primer pair.

class primertool.primertool.PrimerGenerator(**kwargs)[source]

Bases: object

Base class for generating primers for a given genomic position, gene, exon or variant.

genome_assembly

Genome assembly (e.g. hg38)

Type:: str

kuerzel

Kuerzel (short identifier, e.g. initials) of the person who is ordering the primers

Type:: str

genome_dir

Directory containing the genome files

Type:: str

genome

Genome object

Type:: genomepy.Genome

max_insert

Maximum insert size for the primer (default: 800)

Type:: int

min_insert

Minimum insert size for the primer (default: 200)

Type:: int

dist_exon_borders

Distance to the exon borders (default: 40)

Type:: int

chromosome

Chromosome of the primer pair

Type:: str

static check_genome_assembly(genome_assembly: str) → str[source]

Check if the given genome assembly is valid.

Parameters:: genome_assembly (str) – Genome assembly (e.g. hg38)
Raises:: PrimertoolInputError – If the given genome assembly is invalid
Returns:: Genome assembly if valid or raises an exception
Return type:: str

check_insert_size(start: int, end: int) → list[source]

Check if insert size is between min/max insert size.

The insert size for the primer needs to be between the min and max insert size, given the sequencing method. If the insert size is smaller than the min insert size, the difference to the min insert size is added/subtracted from start/end position. If the insert size is bigger than the max insert size, the range is split into multiple chunks.

Parameters:

start (int) – Start position
end (int) – End position

Returns:

List of positions for primer generation (list)
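The two cases can be sketched as follows. This is an interpretation, not the method's code: plan_inserts is a hypothetical name, the result is assumed to be a list of (start, end) tuples, a range that is too short is assumed to be widened by half of the shortfall on each side, and a range that is too long is cut into consecutive chunks of at most max_insert bases. The defaults 200 and 800 are taken from the documented min_insert and max_insert defaults.

```python
def plan_inserts(start: int, end: int,
                 min_insert: int = 200, max_insert: int = 800) -> list:
    """Return (start, end) ranges that respect the insert size limits (sketch)."""
    size = end - start
    if size < min_insert:
        # Assumption: spread the missing bases over both sides.
        missing = min_insert - size
        left = missing // 2
        return [(start - left, end + (missing - left))]
    if size <= max_insert:
        return [(start, end)]
    # Too long: cut into consecutive chunks of at most max_insert bases.
    chunks = []
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + max_insert, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


print(plan_inserts(1000, 1100))
print(plan_inserts(0, 2000))
```

In this sketch the last chunk of a long range can be shorter than min_insert; a real implementation might pad or rebalance it.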

design_primer(start: int, end: int, primer_bases: int) → Tuple[dict, int][source]

Design a primer pair using primer3.

First, the targets are defined. Then the genomic sequence is retrieved and the most common SNPs are masked in the sequence.

Parameters:

start (int) – Start position
end (int) – End position
primer_bases (int) – Number of primer bases

Returns:

tuple of primer3 results as dict and insert size as int

fetch_genome() → Genome[source]

Check if the given genome assembly is available in the genome directory. If not, download it from UCSC.

Raises:

PrimertoolInputError – If the genome assembly is invalid
PrimertoolGenomeError – If the genome assembly is not available in the genome directory and cannot be downloaded

Returns:

Genome object (genomepy.Genome)

iterate_positions(positions: list) → list[source]

Generate primers with the given positions.

If primer3 does not return a result, increase the sequence length in which primers can be generated.

Parameters:: positions (list) – List of positions for primer generation
Returns:: List of primers (list)
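The retry idea can be sketched with a design function passed in as a parameter, so the example runs without primer3. Everything here is hypothetical: the name design_with_growing_window, the starting value, step and upper limit for primer_bases, and the convention that an empty result means "no primer found".

```python
from typing import Callable, Optional


def design_with_growing_window(design: Callable[[int, int, int], dict],
                               start: int, end: int,
                               primer_bases: int = 150, step: int = 50,
                               max_bases: int = 400) -> Optional[dict]:
    """Call design() with a growing search window until it succeeds (sketch)."""
    bases = primer_bases
    while bases <= max_bases:
        result = design(start, end, bases)
        if result:  # a non-empty result counts as success
            return result
        bases += step  # widen the region in which primers may be placed
    return None


def fake_design(start: int, end: int, primer_bases: int) -> dict:
    """Stand-in for a primer3 call that only succeeds with a wide window."""
    return {"primer_bases": primer_bases} if primer_bases >= 250 else {}


print(design_with_growing_window(fake_design, 1000, 1200))
```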

class primertool.primertool.PrimerPair(primer_info: list)[source]

Bases: object

Base class for storing and handling primer pairs.

primer_info

Primer3 output containing the information about the primer pairs.

Type:: dict

mt

Melting temperature of the primer pair.

Type:: float

bp

Base pairs of the primer pair.

Type:: int

chromosome

Chromosome of the primer pair.

Type:: str

ordertable

DataFrame containing the order information for the primer pair.

Type:: pd.DataFrame

orderprimer_forwards

Forward primer in Gene-E(Exonnumber)F;Sequence format for ordering.

Type:: str

orderprimer_reverse

Reverse primer in Gene-E(Exonnumber)R;Sequence format for ordering.

Type:: str

primer_forwards

Forward primer sequence.

Type:: str

primer_reverse

Reverse primer sequence.

Type:: str

class primertool.primertool.VariantPrimerGenerator(**kwargs)[source]

Bases: PrimerGenerator

Class for generating primers for a given variant.

variant

Variant in HGVS format (e.g. NM_000451.3:c.1702G>A)

Type:: str

nm_number

NM number of the gene

Type:: str

ordertable

DataFrame containing the order information for the primer pair

Type:: pd.DataFrame

check_mutation() → Tuple[SequenceVariant, SequenceVariant][source]

Check the given mutation using the Mutalyzer API and convert it into a genomic position.

First, the Mutalyzer name checker (runMutalyzer) is run to check the given mutation nomenclature while trying to catch any possible errors. Then the mutation is parsed using the hgvs parser and converted into a genomic position using the Mutalyzer API.

Raises:

PrimertoolInputError – If the given mutation is invalid
PrimertoolMutalyzerError – If the mutalyzer api request fails

Returns:

Tuple of coding mutation and genomic mutation (tuple)

static check_variant(variant: str) → str[source]

Check if the given variant is valid and in HGVS format. Also fix formatting if possible.

Parameters:: variant (str) – Variant in HGVS format (e.g. NM_000451.3:c.1702G>A)
Raises:: PrimertoolInputError – If the given variant is invalid
Returns:: Variant if valid or raises an exception
Return type:: str

primertool.ucsc_database module

This module handles the connection and queries to the UCSC SQL database.

primertool.ucsc_database.query(genome_assembly: str, query: str, local: bool = False, password_local: str = 'password') → list[source]

Query the UCSC SQL database or a local copy of the database.

Parameters:

genome_assembly (str) – The genome assembly to query, e.g. ‘hg38’.
query (str) – The SQL query to execute.
local (bool) – If True, a local copy of the UCSC database is used.
password_local (str) – The password for the local database.

Returns:

The result of the query as a list of tuples or None if the query did not return any results.

Return type:

list or None

primertool.unittest module

class primertool.unittest.PrimertoolTest(methodName='runTest')[source]

Bases: TestCase

test_get_gene_information()[source]

test_query_ucsc_database()[source]
