primertool package

Submodules

primertool.exceptions module

Exceptions for the primertool package.

class primertool.exceptions.InSilicoPCRError

Bases: object

Base class for exceptions in insilicopcr module.

class primertool.exceptions.InSilicoPCRRequestError

Bases: InSilicoPCRError

Exception raised for request errors to the InSilicoPCR API.

exception primertool.exceptions.PrimertoolError

Bases: Exception

Base class for exceptions in the primertool module.

exception primertool.exceptions.PrimertoolExonLengthError

Bases: PrimertoolError

Exception raised when the exon length exceeds the max insert size.

exception primertool.exceptions.PrimertoolGenomeError

Bases: PrimertoolError

Exception raised for errors in the genome.

exception primertool.exceptions.PrimertoolInputError

Bases: PrimertoolError

Exception raised for errors in the input.

exception primertool.exceptions.PrimertoolIntronicPositionError

Bases: PrimertoolError

Exception raised when the given position is intronic.

exception primertool.exceptions.PrimertoolMutalyzerError

Bases: PrimertoolError

Exception raised for errors in the Mutalyzer API.

exception primertool.exceptions.PrimertoolNoPrimerFoundError

Bases: PrimertoolError

Exception raised when no primer is found.

primertool.functions module

This module contains static auxiliary functions that are used in the primertool package.

primertool.functions.calculate_targets(target_start: int, target_end: int, primer_bases: int) → dict

Define the sequence start and end positions.

primer_bases is the number of bases on each side of the target sequence in which primer3 looks for a possible primer. These bases are applied to the target start and end respectively to calculate the sequence start and end positions. The size range is a list of two values: the length of the target sequence, and the length of the target sequence with the primer bases added. Primer3 should design primers within this size range.

Parameters:
  • target_start (int) – Start position of the target sequence

  • target_end (int) – End position of the target sequence

  • primer_bases (int) – Number of bases to each side of the target sequence in which primer3 looks for a possible primer

Returns:

A dictionary containing the sequence start and end positions, the number of bases to each side of the target sequence in which primer3 looks for a possible primer, the length of the target sequence, and the size range for primer3.
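
The arithmetic described above can be sketched as follows. This is only an illustration, not the package's implementation: the function name, the dictionary keys, the choice of end - start as the target length, and the choice to add the flank on both sides for the upper bound are all assumptions.

```python
def calculate_targets_sketch(target_start: int, target_end: int, primer_bases: int) -> dict:
    """Illustrative only; key names and length convention are assumptions."""
    target_length = target_end - target_start
    return {
        "seq_start": target_start - primer_bases,  # widen the window upstream
        "seq_end": target_end + primer_bases,      # widen the window downstream
        "primer_bases": primer_bases,
        "target_length": target_length,
        # Lower bound: the bare target. Upper bound: target plus both flanks.
        "size_range": [target_length, target_length + 2 * primer_bases],
    }
```

For a target from 1000 to 1200 with 100 primer bases, this sketch yields a window from 900 to 1300.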

primertool.functions.correct_intronic_variant(response: dict, variant: str) → str

Handle the case where the given variant is intronic. If the offset is <= 5, drop the offset. Otherwise, raise an exception.

Parameters:
  • response (dict) – The response from the mutalyzer API

  • variant (str) – The variant to correct

Raises:

PrimertoolIntronicPositionError – If the offset is too large to be corrected automatically (> 5)

Returns:

The corrected variant
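
The offset rule can be illustrated on the variant string alone. The sketch below is an assumption-laden stand-in: it does not use the mutalyzer response, it only handles a single coding position with one offset (no ranges), and IntronicPositionError and drop_small_offset are hypothetical names.

```python
import re


class IntronicPositionError(Exception):
    """Hypothetical stand-in for PrimertoolIntronicPositionError."""


MAX_OFFSET = 5  # threshold stated in the description above

# A coding position followed by an intronic offset, e.g. "123+4" or "456-12".
_OFFSET = re.compile(r"(?<=c\.)(\d+)([+-])(\d+)")


def drop_small_offset(variant: str) -> str:
    match = _OFFSET.search(variant)
    if match is None:
        return variant  # no intronic offset present
    offset = int(match.group(3))
    if offset > MAX_OFFSET:
        raise IntronicPositionError(f"offset {offset} exceeds {MAX_OFFSET}: {variant}")
    # Keep the exonic position, discard sign and offset.
    return variant[:match.start()] + match.group(1) + variant[match.end():]
```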

primertool.functions.filter_unique_primers(primer3_dict: dict) → Tuple[dict, bool]

Check primer uniqueness and remove all primers with multiple binding sites, so that only the uniquely binding ones remain. Also returns a flag if all primers were invalid (i.e. not uniquely binding), but only if there was at least one primer to begin with.

Parameters:

primer3_dict (dict) – Primer3 output dictionary

Returns:

Tuple of the filtered primer3_dict and a flag indicating if all primers were invalid

primertool.functions.find_sequence_positions(exon_starts: list, exon_ends: list, exon_count: int, strand: str, mutation_position: Interval) → dict

Establish if variant position is in an exon.

The hgvs.parser object has the start and end position of the variant (in case of an indel rather than a SNP). Using these positions and the lists of exon starts and ends, determine if the variant is located in an exon.

The information is returned as a dictionary. If the gene is based on the - strand, the exon number needs to be inverted.

Parameters:
  • exon_starts (list) – List of exon start positions

  • exon_ends (list) – List of exon end positions

  • exon_count (int) – Number of exons

  • strand (str) – Strand of the gene (‘+’ or ‘-’)

  • mutation_position (hgvs.location.Interval) – Start and end position of the variant

Returns:

Dictionary containing the exon number, start and end position of the variant, the length of the variant, and a boolean indicating if the variant is in an exon
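
A minimal sketch of the exon lookup and the strand inversion, assuming plain integer positions instead of an hgvs Interval, inclusive coordinates, and exon lists in genomic order. The function name and dictionary keys are hypothetical.

```python
def locate_exon(exon_starts, exon_ends, exon_count, strand, var_start, var_end):
    result = {
        "exon_number": None,
        "start": var_start,
        "end": var_end,
        "length": var_end - var_start + 1,
        "in_exon": False,
    }
    for index, (start, end) in enumerate(zip(exon_starts, exon_ends)):
        if start <= var_start and var_end <= end:
            # Exons are listed in genomic order; on the minus strand the
            # transcript runs the other way, so count from the far end.
            number = index + 1 if strand == "+" else exon_count - index
            result.update(exon_number=number, in_exon=True)
            break
    return result
```

With three exons, a variant in the last genomic exon is reported as exon 3 on the plus strand and as exon 1 on the minus strand.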

primertool.functions.get_gene_information(genome_assembly: str, nm_number: str) → dict

Retrieve gene information from RefSeq database for a given NM number.

Parameters:
  • genome_assembly (str) – The genome assembly to use (‘hg38’ or ‘hg19’)

  • nm_number (str) – The NM number of the gene to retrieve information for

Returns:

A dictionary containing the gene information

primertool.functions.get_snps(chromosome: str, seq_start: int, seq_end: int, genome_assembly: str) → list

Retrieve common SNPs from the UCSC database.

Query the UCSC database with the chromosome and position data to find common SNPs in the sequence, which need to be masked. SNPs are returned as a list.

Parameters:
  • chromosome (str) – chromosome name as string (e.g. ‘chr1’)

  • seq_start (int) – start position of the sequence

  • seq_end (int) – end position of the sequence

  • genome_assembly (str) – genome assembly as string (e.g. ‘hg38’)

Returns:

List of common SNPs in the sequence

primertool.functions.mask_snps(genome: Genome, chromosome: str, seq_start: int, seq_end: int, genome_assembly: str) → str

Mask common SNP positions with an N in the sequence.

Uses the get_snps() function to retrieve common SNPs in the given sequence from UCSC. The genomic sequence between the given positions is extracted via genomepy, and the bases at common SNP positions are replaced with an N.

Parameters:
  • genome (genomepy.Genome) – genomepy genome object

  • chromosome (str) – chromosome name as string (e.g. ‘chr1’)

  • seq_start (int) – start position of the sequence

  • seq_end (int) – end position of the sequence

  • genome_assembly (str) – genome assembly as string (e.g. ‘hg38’)

Returns:

The sequence with common SNPs masked as a string
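
The masking step itself can be sketched without genomepy or a database connection. The helper below is hypothetical: it assumes the sequence is already available as a string and that SNPs are given as absolute positions in the same coordinate system as seq_start.

```python
def mask_positions(sequence: str, seq_start: int, snp_positions: list) -> str:
    bases = list(sequence)
    for position in snp_positions:
        offset = position - seq_start      # index of the SNP within the window
        if 0 <= offset < len(bases):       # ignore SNPs outside the window
            bases[offset] = "N"
    return "".join(bases)
```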

primertool.functions.mutalyzer_error_handler(response: dict) → PrimertoolInputError

Checks for errors in the mutalyzer response and raises an exception if there is an error.

Parameters:

response (dict) – The response from the mutalyzer API

Raises:

PrimertoolInputError – If there is an error in the response

primertool.functions.parse_mutation(mutation: str) → SequenceVariant

Parse mutation with hgvs parser.

Gene names in brackets are removed from the variant (e.g. NM_003165.6(STXBP1):c.1702G>A). The mutation is then parsed using hgvs.parser and a hgvs tree object (see https://hgvs.readthedocs.io/en/stable/key_concepts.html#variant-object-representation) is returned. Parses coding and genomic variants.

Parameters:

mutation (str) – mutation in HGVS nomenclature

Raises:

PrimertoolInputError – If there is an error in the mutation nomenclature

Returns:

hgvs.sequencevariant.SequenceVariant
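
The clean-up step before parsing might look like the sketch below. The helper name and the regular expression are assumptions; the actual parsing with the hgvs package is left out so that the example runs without third-party dependencies.

```python
import re


def strip_gene_name(mutation: str) -> str:
    # Remove a parenthesised gene symbol that directly precedes the colon.
    return re.sub(r"\([^()]*\)(?=:)", "", mutation)
```

The cleaned string (here NM_003165.6:c.1702G>A) is what would then be handed to the hgvs parser.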

primertool.functions.purge_primer_pair(primer3_dict: dict, index: int) → dict

Removes the primer pair at the given index from primer3_dict and updates the dictionary so that the remaining data stays consistent (i.e. the indices are reset so that the enumeration has no gaps).

Parameters:
  • primer3_dict (dict) – Primer3 output dictionary

  • index (int) – Index of primer pair to remove

Returns:

Primer3 output dictionary with the primer pair of the given index removed
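
One way to keep the numbering gap-free is sketched below. It assumes primer3-style keys such as PRIMER_LEFT_0_SEQUENCE and counters ending in _NUM_RETURNED. Decrementing every such counter is an assumption, and purge_pair is a hypothetical name rather than the package's code.

```python
import re

# Keys that carry a pair index, e.g. "PRIMER_LEFT_1_SEQUENCE" or "PRIMER_PAIR_0_PRODUCT_SIZE".
_INDEXED = re.compile(r"^(PRIMER_(?:LEFT|RIGHT|PAIR|INTERNAL))_(\d+)(.*)$")


def purge_pair(primer3_dict: dict, index: int) -> dict:
    purged = {}
    for key, value in primer3_dict.items():
        match = _INDEXED.match(key)
        if match is None:
            # Unindexed keys: decrement the "returned" counters, copy the rest.
            if key.endswith("_NUM_RETURNED") and isinstance(value, int):
                value = max(value - 1, 0)
            purged[key] = value
            continue
        prefix, number, suffix = match.group(1), int(match.group(2)), match.group(3)
        if number == index:
            continue          # drop everything belonging to the purged pair
        if number > index:
            number -= 1       # shift later pairs down to close the gap
        purged[f"{prefix}_{number}{suffix}"] = value
    return purged
```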

primertool.functions.reduce_numbers_in_string(input_string: str) → str

Takes an input string and reduces any (positive) integer in it by one.

Parameters:

input_string (str) – String to reduce numbers in

Returns:

String with all numbers reduced by one
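
A compact way to express this behaviour is a regular-expression substitution with a callback. This is a sketch under assumptions: reduce_numbers is a hypothetical name, every run of digits is treated as one integer, and leading zeros are not preserved.

```python
import re


def reduce_numbers(input_string: str) -> str:
    # Replace each run of digits with its value minus one.
    return re.sub(r"\d+", lambda m: str(int(m.group()) - 1), input_string)
```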

primertool.functions.remove_whitespaces(func: callable) → callable

Decorator to remove whitespaces from all string arguments before passing them to the decorated function.

Parameters:

func (callable) – The function to decorate

Returns:

The decorated function
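
A decorator of this kind could be written as in the sketch below. It assumes that all whitespace is removed (not only leading and trailing) and that both positional and keyword arguments are cleaned. The names strip_all_whitespace and describe are hypothetical.

```python
import functools


def strip_all_whitespace(func):
    def clean(value):
        # "".join(value.split()) removes every whitespace run, not only the ends.
        return "".join(value.split()) if isinstance(value, str) else value

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*(clean(a) for a in args),
                    **{name: clean(v) for name, v in kwargs.items()})

    return wrapper


@strip_all_whitespace
def describe(variant, exon=None):
    return variant, exon
```

Non-string arguments pass through unchanged, and functools.wraps keeps the decorated function's name and docstring.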

primertool.functions.split_nm(nm_number: str) → Tuple[str, int]

Split variant.ac from hgvs object into transcript and version number.

If the transcript number includes a version number, split at ‘.’. If not, set the version number to 1.

Parameters:

nm_number (str) – NM number (e.g. NM_000451.3)

Returns:

Tuple of transcript and version number
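
The splitting rule can be sketched with str.partition. The function name is hypothetical and the sketch does not validate the accession format.

```python
def split_accession(nm_number: str):
    # partition() returns an empty separator when no "." is present.
    transcript, dot, version = nm_number.partition(".")
    return transcript, (int(version) if dot else 1)
```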

primertool.insilicopcr module

This module provides a class to perform in-silico PCR. It serves as a python interface to the UCSC In-Silico PCR tool (https://genome.ucsc.edu/cgi-bin/hgPcr).

class primertool.insilicopcr.InSilicoPCR(forward_primer: str, reverse_primer: str, max_product_size: int = 4000, min_perfect_match: int = 15, min_good_match: int = 15, flip_reverse_primer: bool = False)

Bases: object

This class provides a python interface to the UCSC In-Silico PCR tool. It takes a forward and a reverse primer and returns the PCR product.

forward_primer

forward primer sequence

Type:

str

reverse_primer

reverse primer sequence

Type:

str

fasta_pcr

fasta output from the UCSC In-Silico PCR

Type:

list

is_uniquely_binding() → bool

Check whether the primer pair binds uniquely, i.e. whether it yields a PCR product at only one site in the genome. Note: sometimes the same PCR product is reported for a chromosome and also for an alt version of that chromosome; such entries describe the same PCR product and should be counted only once.
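
One possible way to treat a chromosome and its alt version as a single hit is to count distinct product sequences rather than FASTA records. The sketch below assumes the FASTA output is available as a list of lines. Deduplicating by sequence is an illustrative choice, not necessarily what the class does, and count_distinct_products is a hypothetical helper.

```python
def count_distinct_products(fasta_lines: list) -> int:
    products, current = set(), []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A header starts a new record; store the previous one, if any.
            if current:
                products.add("".join(current).upper())
            current = []
        elif line:
            current.append(line)
    if current:
        products.add("".join(current).upper())
    return len(products)
```

Under this assumption a primer pair would count as uniquely binding when the function returns 1.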

primertool.logger module

This module contains the logger configuration for the primertool package.

class primertool.logger.CustomFormatter(fmt: str)

Bases: Formatter

Logging colored formatter, adapted from https://stackoverflow.com/a/56944256/3638629

fmt

The format string to use for the log messages

Type:

str

FORMATS

A dictionary mapping log levels to their respective format strings

Type:

dict

format(record: LogRecord) → str

Format the log record according to its level.

Parameters:

record (logging.LogRecord) – The log record to format

Returns:

The formatted log message

Return type:

str
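
A level-dependent formatter in the spirit of the description above can be sketched as follows. The class name ColorFormatter and the specific ANSI colour codes are assumptions chosen for illustration; only the fmt and FORMATS attributes mirror the documented ones.

```python
import logging


class ColorFormatter(logging.Formatter):
    RESET = "\x1b[0m"
    # ANSI colour codes per level (illustrative choices).
    COLORS = {
        logging.DEBUG: "\x1b[37m",
        logging.INFO: "\x1b[34m",
        logging.WARNING: "\x1b[33m",
        logging.ERROR: "\x1b[31m",
        logging.CRITICAL: "\x1b[1;31m",
    }

    def __init__(self, fmt: str):
        super().__init__()
        self.fmt = fmt
        # One format string per level: colour prefix + format + reset suffix.
        self.FORMATS = {level: color + fmt + self.RESET
                        for level, color in self.COLORS.items()}

    def format(self, record: logging.LogRecord) -> str:
        log_fmt = self.FORMATS.get(record.levelno, self.fmt)
        return logging.Formatter(log_fmt).format(record)
```

Such a formatter is attached to a handler with handler.setFormatter(ColorFormatter("[%(levelname)s] %(message)s")).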

primertool.logger.init_logger(fmt: str = '[%(levelname)s] %(message)s', level: int = 10, save_to: str | None = None) → Logger

Initialize the logger for the primertool package.

Parameters:
  • fmt (str) – The format string to use for the log messages

  • level (int) – The logging level to use (default: 10, i.e. logging.DEBUG)

  • save_to (str) – The file to save the logs to (optional)

Returns:

The configured logger

Return type:

logging.Logger

primertool.primertool module

This module contains the classes for generating primers for a given genomic position, gene, exon or variant.

class primertool.primertool.ExonPrimerGenerator(**kwargs)

Bases: PrimerGenerator

Class for generating primers for a given exon.

nm_number

NM number of the gene

Type:

str

exon_number

Exon number

Type:

int

variant_pos

Dictionary containing the variant position

Type:

dict

gene_info

Gene information

Type:

dict

chromosome

Chromosome of the gene

Type:

str

ordertable

DataFrame containing the order information for the primer pair

Type:

pd.DataFrame

check_exon()

Check if the exon exists in the gene.

Raises:

PrimertoolInputError – If the given exon number is larger than the number of exons in the gene

check_nm_number()

Check if nm_number is valid.

Raises:

PrimertoolInputError – If the given NM number is invalid

get_exon_boundaries() → Tuple[int, int]

Get exon boundaries for the given exon number.

Returns:

Tuple of exon start and exon end (tuple)

get_ordertable(gene_info: dict, list_primers: list) → DataFrame

Get ordertable for exon primers.

Parameters:
  • gene_info (dict) – Gene information

  • list_primers (list) – List of primers

Raises:

PrimertoolNoPrimerFoundError – If no primers were found for the given exon

Returns:

Pandas DataFrame containing the order information for the primer pair

class primertool.primertool.ExonPrimerPair(gene_info: dict, primer_info: list, exon_number: int)

Bases: PrimerPair

Class for storing and handling primer pairs for a given exon.

gene_info

Gene information

Type:

dict

gene_name

Name of the gene (e.g. BRCA1)

Type:

str

chromosome

Chromosome of the gene (e.g. chr1)

Type:

str

nm_number

NM number of the gene (e.g. NM_000451)

Type:

str

strand

Strand of the gene (either ‘+’ or ‘-’)

Type:

str

exon_number

Exon number

Type:

int

orderprimer_forwards

Forward primer in Gene-E(Exonnumber)F;Sequence format for ordering.

Type:

str

orderprimer_reverse

Reverse primer in Gene-E(Exonnumber)R;Sequence format for ordering.

Type:

str

primer_forwards

Forward primer sequence.

Type:

str

primer_reverse

Reverse primer sequence.

Type:

str

ordertable

DataFrame containing the order information for the primer pair.

Type:

pd.DataFrame

get_order_primers() → Tuple[str, str, str, str]

Returns the forward and reverse primers in Gene-E(Exonnumber)F;Sequence and Gene-E(Exonnumber)R;Sequence format for ordering.

Returns:

Tuple of orderprimer forwards, orderprimer reverse, forwards primer and reverse primer
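
The order-name format might be assembled as in the sketch below. How the placeholder Gene-E(Exonnumber)F is rendered (here as, for example, BRCA1-E2F) is an assumption, and order_names is a hypothetical helper rather than the method itself.

```python
def order_names(gene_name: str, exon_number: int, forward_seq: str, reverse_seq: str):
    # Name and sequence are separated by a semicolon, as in the documented format.
    forward = f"{gene_name}-E{exon_number}F;{forward_seq}"
    reverse = f"{gene_name}-E{exon_number}R;{reverse_seq}"
    return forward, reverse, forward_seq, reverse_seq
```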

make_order_table() → DataFrame

Create a pandas DataFrame for the order table.

Returns:

Pandas DataFrame containing the order information for the primer pair.

class primertool.primertool.GenePrimerGenerator(**kwargs)

Bases: PrimerGenerator

Class for generating primers for a given gene.

nm_number

NM number of the gene

Type:

str

gene_info

Gene information

Type:

dict

chromosome

Chromosome of the gene

Type:

str

ordertable

DataFrame containing the order information for the primer pair

Type:

pd.DataFrame

check_nm_number()

Check if nm_number is valid.

Raises:

PrimertoolInputError – If the given NM number is invalid

class primertool.primertool.GenomicPositionPrimerGenerator(**kwargs)

Bases: PrimerGenerator

Class for generating primers for a given genomic position.

chromosome

Chromosome of the genomic position

Type:

str

start

Start position of the genomic position

Type:

int

end

End position of the genomic position

Type:

int

ordertable

DataFrame containing the order information for the primer pair

Type:

pd.DataFrame

static check_chromosome(chromosome: str) → str

Check if the given chromosome is valid and fix formatting if necessary.

Parameters:

chromosome (str) – Chromosome (e.g. chr1)

Raises:

PrimertoolInputError – If the given chromosome is invalid

Returns:

Chromosome if valid or raises an exception

Return type:

str
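
Validation with format fixing could look like the sketch below. It assumes human chromosomes only (1 to 22, X, Y, M), treats MT as M, and raises ValueError as a stand-in for PrimertoolInputError. The function name and the accepted spellings are assumptions.

```python
import re

# Accepted chromosome names without the "chr" prefix (human only, by assumption).
VALID = {str(n) for n in range(1, 23)} | {"X", "Y", "M"}


def normalise_chromosome(chromosome: str) -> str:
    # Drop an optional "chr" prefix in any letter case, then upper-case the rest.
    name = re.sub(r"^chr", "", chromosome.strip(), flags=re.IGNORECASE).upper()
    if name == "MT":
        name = "M"
    if name not in VALID:
        raise ValueError(f"invalid chromosome: {chromosome!r}")
    return f"chr{name}"
```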

get_ordertable(list_primers: list) → DataFrame

Get ordertable for genomic position primers.

Parameters:

list_primers (list) – List of primers

Returns:

Pandas DataFrame containing the order information for the primer pair

class primertool.primertool.GenomicPositionPrimerPair(primer_info, chromosome: str, start: int, end: int)

Bases: PrimerPair

Class for storing and handling primer pairs for a given genomic position.

chromosome

Chromosome of the genomic position

Type:

str

start

Start position of the genomic position

Type:

int

end

End position of the genomic position

Type:

int

orderprimer_forwards

Forward primer in ChrStartF;Sequence format for ordering.

Type:

str

orderprimer_reverse

Reverse primer in ChrStartR;Sequence format for ordering.

Type:

str

primer_forwards

Forward primer sequence.

Type:

str

primer_reverse

Reverse primer sequence.

Type:

str

ordertable

DataFrame containing the order information for the primer pair.

Type:

pd.DataFrame

get_order_primers() → Tuple[str, str, str, str]

Returns the forward and reverse primers in ChrStartF;Sequence and ChrStartR;Sequence format for ordering.

Returns:

Tuple of orderprimer forwards, orderprimer reverse, forwards primer and reverse primer

make_order_table() → DataFrame

Create a pandas DataFrame for the order table.

Returns:

Pandas DataFrame containing the order information for the primer pair.

class primertool.primertool.PrimerGenerator(**kwargs)

Bases: object

Base class for generating primers for a given genomic position, gene, exon or variant.

genome_assembly

Genome assembly (e.g. hg38)

Type:

str

kuerzel

Short code (German: Kürzel) identifying the person who is ordering the primers

Type:

str

genome_dir

Directory containing the genome files

Type:

str

genome

Genome object

Type:

genomepy.Genome

max_insert

Maximum insert size for the primer (default: 800)

Type:

int

min_insert

Minimum insert size for the primer (default: 200)

Type:

int

dist_exon_borders

Distance to the exon borders (default: 40)

Type:

int

chromosome

Chromosome of the primer pair

Type:

str

static check_genome_assembly(genome_assembly: str) → str

Check if the given genome assembly is valid.

Parameters:

genome_assembly (str) – Genome assembly (e.g. hg38)

Raises:

PrimertoolInputError – If the given genome assembly is invalid

Returns:

Genome assembly if valid or raises an exception

Return type:

str

check_insert_size(start: int, end: int) → list

Check if insert size is between min/max insert size.

The insert size for the primer needs to be between min and max insert size, given the sequencing method. If the insert size is smaller, the difference to the min insert size is added/subtracted from start/end position. If the insert size is bigger than the max insert size, the range is split into multiple chunks.

Parameters:
  • start (int) – Start position

  • end (int) – End position

Returns:

List of positions for primer generation (list)
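
The two cases can be sketched as below. Several details are assumptions: the shortfall is split evenly between both ends so that the range grows, chunks are non-overlapping and of near-equal size, and the result is a list of (start, end) tuples. plan_windows is a hypothetical name, and the defaults follow the documented min_insert and max_insert values.

```python
import math


def plan_windows(start: int, end: int, min_insert: int = 200, max_insert: int = 800) -> list:
    size = end - start
    if size < min_insert:
        # Too short: widen the range symmetrically until it reaches min_insert.
        pad = math.ceil((min_insert - size) / 2)
        return [(start - pad, end + pad)]
    if size <= max_insert:
        return [(start, end)]
    # Too long: split into the smallest number of chunks that each fit max_insert.
    n_chunks = math.ceil(size / max_insert)
    chunk = math.ceil(size / n_chunks)
    return [(s, min(s + chunk, end)) for s in range(start, end, chunk)]
```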

design_primer(start: int, end: int, primer_bases: int) → Tuple[dict, int]

Design a primer pair using primer3.

First, the targets are defined. Then the genomic sequence is retrieved and the most common SNPs are masked in the sequence.

Parameters:
  • start (int) – Start position

  • end (int) – End position

  • primer_bases (int) – Number of primer bases

Returns:

tuple of primer3 results as dict and insert size as int

fetch_genome() → Genome

Check if the given genome assembly is available in the genome directory. If not, download it from UCSC.

Returns:

Genome object (genomepy.Genome)

iterate_positions(positions: list) → list

Generate primers with the given positions.

If primer3 does not return a result, increase the sequence length in which primers can be generated.

Parameters:

positions (list) – List of positions for primer generation

Returns:

List of primers (list)
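
The retry idea (widen the search region when primer3 returns nothing) can be sketched with a pluggable design function. The starting flank, the step, the upper limit and the names design_with_retries and fake_design are all assumptions. The stub replaces a real primer3 call so that the example runs on its own.

```python
def design_with_retries(design, start, end, primer_bases=100, step=50, max_bases=500):
    while primer_bases <= max_bases:
        result = design(start, end, primer_bases)
        if result.get("PRIMER_PAIR_NUM_RETURNED", 0) > 0:
            return result, primer_bases
        primer_bases += step  # widen the flanks and try again
    return None, primer_bases - step  # nothing found within the limit


def fake_design(start, end, primer_bases):
    # Stand-in for a primer3 call: succeeds once the flanks reach 200 bases.
    return {"PRIMER_PAIR_NUM_RETURNED": 1 if primer_bases >= 200 else 0}
```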

class primertool.primertool.PrimerPair(primer_info: list)

Bases: object

Base class for storing and handling primer pairs.

primer_info

Primer3 output containing the information about the primer pairs.

Type:

dict

mt

Melting temperature of the primer pair.

Type:

float

bp

Base pairs of the primer pair.

Type:

int

chromosome

Chromosome of the primer pair.

Type:

str

ordertable

DataFrame containing the order information for the primer pair.

Type:

pd.DataFrame

orderprimer_forwards

Forward primer in Gene-E(Exonnumber)F;Sequence format for ordering.

Type:

str

orderprimer_reverse

Reverse primer in Gene-E(Exonnumber)R;Sequence format for ordering.

Type:

str

primer_forwards

Forward primer sequence.

Type:

str

primer_reverse

Reverse primer sequence.

Type:

str

class primertool.primertool.VariantPrimerGenerator(**kwargs)

Bases: PrimerGenerator

Class for generating primers for a given variant.

variant

Variant in HGVS format (e.g. NM_000451.3:c.1702G>A)

Type:

str

nm_number

NM number of the gene

Type:

str

ordertable

DataFrame containing the order information for the primer pair

Type:

pd.DataFrame

check_mutation() → Tuple[SequenceVariant, SequenceVariant]

Check the given mutation using the mutalyzer API and convert it into a genomic position.

First, the mutalyzer name checker (runMutalyzer) is run to check the given mutation nomenclature and to catch any errors it reports. The mutation is then parsed using the hgvs parser and converted into a genomic position using the mutalyzer API.

Returns:

Tuple of coding mutation and genomic mutation (tuple)

static check_variant(variant: str) → str

Check if the given variant is valid and in HGVS format. Also fix formatting if possible.

Parameters:

variant (str) – Variant in HGVS format (e.g. NM_000451.3:c.1702G>A)

Raises:

PrimertoolInputError – If the given variant is invalid

Returns:

Variant if valid or raises an exception

Return type:

str

primertool.ucsc_database module

This module handles the connection and queries to the UCSC SQL database.

primertool.ucsc_database.query(genome_assembly: str, query: str, local: bool = False, password_local: str = 'password') → list

Query the UCSC SQL database or a local copy of the database.

Parameters:
  • genome_assembly (str) – The genome assembly to query, e.g. ‘hg38’.

  • query (str) – The SQL query to execute.

  • local (bool) – If True, a local copy of the UCSC database is used.

  • password_local (str) – The password for the local database.

Returns:

The result of the query as a list of tuples or None if the query did not return any results.

Return type:

list or None

primertool.unittest module

class primertool.unittest.PrimertoolTest(methodName='runTest')

Bases: TestCase

test_get_gene_information()
test_query_ucsc_database()

Module contents