Overview
This Python package is a wrapper for the useful TFM-Pvalue C++ program. It lets users determine the score threshold that corresponds to a specific p-value for a given transcription factor position frequency matrix. It can also perform the reverse, quickly calculating an accurate p-value from a score for a given motif matrix. pytfmpval makes this functionality easy to use within a Python script, module, or interactive session.
What Works and What Doesn’t
Currently, the TFMpvalue-pv2sc and TFMpvalue-sc2pv programs are implemented. There are also plans to implement the remaining TFMpvalue-fastpvalue, TFMpvalue-distrib, and TFMpvalue-lazydistrib programs.
A Simple Example
JASPAR is a highly touted transcription factor motif database from which motif count matrices can be downloaded for a large variety of organisms and transcription factors. Numerous other motif databases exist as well (TRANSFAC, CISBP, MEME, HOMER, WORMBASE, etc.), most of which use a similar format for their motifs. Typically, a motif file consists of four rows or columns, with each position in a given row or column corresponding to a base within the motif. Sometimes there is a comment line starting with >. The row or column order is always A, C, G, T. In this example, the motif file consists of four rows, each holding the counts for one base at each of the 16 positions of the motif.
>>> from pytfmpval import tfmp
>>> m = tfmp.create_matrix("MA0045.pfm")
>>> tfmp.score2pval(m, 8.7737)
9.992625564336777e-06
>>> tfmp.pval2score(m, 0.00001)
8.773708000000001
This could also be done by creating a string for the matrix by concatenating the rows (or columns) and using the read_matrix() function. This method is usually easier, as it allows the user to parse the motif file as necessary to ensure a proper input. It is also better suited to high-throughput use.
>>> from pytfmpval import tfmp
>>> mat = (" 3 7 9 3 11 11 11 3 4 3 8 8 9 9 11 2"
... " 5 0 1 6 0 0 0 3 1 4 5 1 0 5 0 7"
... " 4 3 1 4 3 2 2 2 8 6 1 4 2 0 3 0"
... " 2 4 3 1 0 1 1 6 1 1 0 1 3 0 0 5"
... )
>>> m = tfmp.read_matrix(mat)
>>> tfmp.pval2score(m, 0.00001)
8.773708000000001
>>> tfmp.score2pval(m, 8.7737)
9.992625564336777e-06
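The parsing step mentioned above can be sketched as follows. This is a minimal, hypothetical helper (motif_text_to_row_string is not part of pytfmpval) that assumes a row-oriented file in A, C, G, T order, an optional comment line starting with >, and optional row labels or brackets. The toy motif is made up for illustration.

```python
def _is_number(token):
    """Return True if the token parses as a number."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def motif_text_to_row_string(text):
    """Turn row-oriented motif text into one whitespace-delimited string.

    Hypothetical helper, not part of pytfmpval. Assumes four rows in
    A, C, G, T order and an optional comment line starting with '>'.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and the comment line
        # Drop brackets, then keep only numeric tokens (removes row labels)
        tokens = line.replace("[", " ").replace("]", " ").split()
        values = [t for t in tokens if _is_number(t)]
        if values:
            rows.append(values)
    if len(rows) != 4:
        raise ValueError("expected 4 rows (A, C, G, T), found %d" % len(rows))
    if len({len(r) for r in rows}) != 1:
        raise ValueError("rows have different lengths")
    return " ".join(" ".join(r) for r in rows)


# Toy three-position motif, invented for this example
example = """>toy_motif
A [ 3 7 9 ]
C [ 5 0 1 ]
G [ 4 3 1 ]
T [ 2 4 3 ]
"""
row_string = motif_text_to_row_string(example)
print(row_string)  # 3 7 9 5 0 1 4 3 1 2 4 3
```

The resulting string has the row-concatenated layout that read_matrix() expects, so it could be passed on as tfmp.read_matrix(row_string).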
Full tfmp Module Reference

tfmp.create_matrix(matrix_file, bg=[0.25, 0.25, 0.25, 0.25], mat_type='counts', log_type='nat')
From a JASPAR-formatted motif matrix count file, create a Matrix object.
This function also converts it to a log-odds (position weight) matrix if necessary.
Parameters:
 matrix_file (str) – Name of the JASPAR-formatted motif matrix file.
 bg (list of floats) – Background nucleotide frequencies for [A, C, G, T].
 mat_type (str) – Type of motif matrix provided. Options are: “counts”, “pfm”, “pwm”. “counts” is for raw count matrices for each base at each position. “pfm” is for position frequency matrices (frequencies already calculated). “pwm” is for position weight matrices (also referred to as position-specific scoring matrices).
 log_type (str) – Base to use for log. Default is to use the natural log. “log2” is the other option. This will affect the scores and p-values.
Returns: Matrix in pwm format.
Return type: m (pytfmpval Matrix)
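To illustrate what converting counts to a log-odds matrix involves, here is a rough sketch in plain Python. It is not the package's implementation: the function name counts_to_log_odds and the pseudocount scheme are assumptions made for illustration, so the numbers it produces may differ from the scores pytfmpval reports.

```python
import math


def counts_to_log_odds(rows, bg=(0.25, 0.25, 0.25, 0.25),
                       log_type="nat", pseudocount=1.0):
    """Convert four rows of counts (A, C, G, T) to log-odds scores.

    Hypothetical sketch. The pseudocount handling is an assumption and
    is not taken from pytfmpval or TFM-Pvalue.
    """
    log = math.log2 if log_type == "log2" else math.log
    width = len(rows[0])
    pwm = [[0.0] * width for _ in range(4)]
    for pos in range(width):
        total = sum(rows[b][pos] for b in range(4))
        for b in range(4):
            # Spread the pseudocount by background so zero counts stay finite
            freq = (rows[b][pos] + pseudocount * bg[b]) / (total + pseudocount)
            # Log-odds: observed frequency relative to background frequency
            pwm[b][pos] = log(freq / bg[b])
    return pwm


# One-position toy example with no pseudocount and base-2 logs:
# frequencies are 0.5, 0.25, 0.125, 0.125 against a uniform background
toy = counts_to_log_odds([[4], [2], [1], [1]], log_type="log2", pseudocount=0.0)
print([row[0] for row in toy])  # [1.0, 0.0, -1.0, -1.0]
```

Switching log_type only rescales the scores by a constant factor, which is why the choice of base changes the score that corresponds to a given p-value.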

tfmp.read_matrix(matrix, bg=[0.25, 0.25, 0.25, 0.25], mat_type='counts', log_type='nat')
From a string of space-delimited counts, create a Matrix object.
Break the string into 4 rows corresponding to A, C, G, and T. This function also converts it to a log-odds (position weight) matrix if necessary.
Parameters:
 matrix (str) – Whitespace-delimited string of row-concatenated motif matrix.
 bg (list of floats) – Background nucleotide frequencies for [A, C, G, T].
 mat_type (str) – Type of motif matrix provided. Options are: “counts”, “pfm”, “pwm”. “counts” is for raw count matrices for each base at each position. “pfm” is for position frequency matrices (frequencies already calculated). “pwm” is for position weight matrices (also referred to as position-specific scoring matrices).
 log_type (str) – Base to use for log. Default is to use the natural log. “log2” is the other option. This will affect the scores and p-values.
Returns: Matrix in pwm format.
Return type: m (pytfmpval Matrix)
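The row-splitting described for read_matrix can be pictured with a short sketch. The helper split_rows below is hypothetical and only mirrors the description (four equal-length rows in A, C, G, T order); it reuses the matrix string from the example earlier on this page.

```python
def split_rows(matrix, n_rows=4):
    """Split a row-concatenated matrix string into n_rows equal rows.

    Hypothetical helper mirroring the description of read_matrix;
    it is not the package's own code.
    """
    values = [float(v) for v in matrix.split()]
    if not values or len(values) % n_rows != 0:
        raise ValueError("value count must be a positive multiple of %d" % n_rows)
    width = len(values) // n_rows
    # Row i holds the values for base i of A, C, G, T
    return [values[i * width:(i + 1) * width] for i in range(n_rows)]


mat = (" 3 7 9 3 11 11 11 3 4 3 8 8 9 9 11 2"
       " 5 0 1 6 0 0 0 3 1 4 5 1 0 5 0 7"
       " 4 3 1 4 3 2 2 2 8 6 1 4 2 0 3 0"
       " 2 4 3 1 0 1 1 6 1 1 0 1 3 0 0 5")
rows = split_rows(mat)
print(len(rows), len(rows[0]))  # 4 16
```

Checking that the value count divides evenly by four is a cheap way to catch a truncated or mis-parsed motif before building a matrix from it.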

tfmp.score2pval(matrix, req_score)
Determine the p-value for a given score for a specific motif PWM.
Parameters:
 matrix (pytfmpval Matrix) – Matrix in pwm format.
 req_score (float) – Requested score for which to determine the p-value.
Returns: The calculated p-value corresponding to the score.
Return type: ppv (float)

tfmp.pval2score(matrix, pval)
Determine the score for a given p-value for a specific motif PWM.
Parameters:
 matrix (pytfmpval Matrix) – Matrix in pwm format.
 pval (float) – p-value for which to determine the score.
Returns: The calculated score corresponding to the p-value.
Return type: score (float)
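As a conceptual illustration of how scores and p-values relate, the sketch below uses the usual definition, assumed here: the p-value of a score is the probability that a random sequence drawn from the background reaches at least that score. It builds the score distribution naively, one position at a time, with scores rounded to a fixed number of decimals. This is not how TFM-Pvalue computes its results, and the function names are hypothetical.

```python
from collections import defaultdict

UNIFORM = (0.25, 0.25, 0.25, 0.25)


def score_distribution(pwm, bg=UNIFORM, decimals=3):
    """Distribution of rounded motif scores under an i.i.d. background."""
    dist = {0.0: 1.0}
    for pos in range(len(pwm[0])):
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for b in range(4):
                # Extend every partial score by each possible base
                nxt[round(score + pwm[b][pos], decimals)] += prob * bg[b]
        dist = dict(nxt)
    return dist


def naive_score2pval(pwm, req_score, bg=UNIFORM):
    """Probability of scoring at least req_score."""
    dist = score_distribution(pwm, bg)
    return sum(p for s, p in dist.items() if s >= req_score)


def naive_pval2score(pwm, pval, bg=UNIFORM):
    """Lowest score whose p-value does not exceed pval, or None."""
    dist = score_distribution(pwm, bg)
    tail, best = 0.0, None
    for s in sorted(dist, reverse=True):
        tail += dist[s]
        if tail > pval:
            break
        best = s
    return best


# Toy two-position PWM (rows are A, C, G, T), invented for this example
toy_pwm = [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [-1.0, -1.0]]
print(naive_score2pval(toy_pwm, 1.0))  # 0.3125
print(naive_pval2score(toy_pwm, 0.1))  # 2.0
```

The number of distinct rounded scores can grow quickly with motif length and precision, so this approach is only meant to make the definitions concrete on small matrices.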
Contribute
Any and all contributions are welcome. Bug reports via the Issue Tracker are much appreciated. Here’s how to contribute:
 Fork the pytfmpval repository on GitHub (see forking help).
 Make your changes/fixes/improvements locally.
 Optional, but much appreciated: write some tests for your changes. (Don’t worry about integrating your tests into the test framework. Writing some in your commit comments or providing a test script is fine. I will integrate them later.)
 Send a pull request (see pull request help).
Reference
License
This project is licensed under the GPL3 license. You are free to use, modify, and distribute it as you see fit. The program is provided as is, with no guarantees.