NPLB finds promoter architectures (PAs) and their corresponding promoter elements (PEs) in a given set of promoter sequences. It is available both as a web-based application and as downloadable software.
No Promoter Left Behind (NPLB) finds the optimal number of promoter architectures (PAs), each with its own set of promoter elements (PEs), from a FASTA file of promoter sequences.
NPLB has two commands: promoterLearn, which learns new models, and promoterClassify, which assigns a new set of promoter sequences to the PAs of an existing model.
The following files are saved upon execution:
PAlogo.html: An HTML file containing information about the model and a sequence logo for each PA of the model. The sequence logos are created using a modified form of WebLogo 3.3. promoterLearn also saves similar HTML files for each fold of every model learned.
modelOut.txt: A text file containing information about the best model. Similar information is saved for each fold of every model learned by promoterLearn.
architectureDetails.txt: A text file listing the PA label assigned to each sequence of the input FASTA file.
PAimage.png and rawImage.png: Image matrices for the clustered and unclustered data.
likelihood plots: When run in verbose mode, promoterLearn saves the likelihood plot for each fold of every model learned. These plots are helpful in determining whether the sampling has indeed converged.
settings.txt: A text file recording the execution settings of promoterLearn.
bestModel.p: The best learned model is saved in a binary format. This can be used later by promoterClassify to determine PAs of another set of promoter sequences.
boxplots and pie charts: A tab-separated text file containing information about the promoter sequences in the FASTA file can also be given as input, together with a column number. Depending on the type of data in that column, a pie chart or a boxplot is constructed for the PAs of the learned model. If the chosen column contains real numbers, it can also be used to arrange the PAs by the median value calculated for each PA.
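As an illustration of the median-based arrangement of PAs, the following Python sketch joins PA labels with a tab-separated annotation file and sorts the PAs by their per-PA median. It is not NPLB's own code. The layout of architectureDetails.txt assumed here (sequence ID, a tab, then the PA label), the use of the annotation file's first column as the sequence ID, the 1-based column number, the ascending sort order, and the function names read_labels and order_pas_by_median are all hypothetical; check them against the files your NPLB version actually writes.

```python
import csv
import io
import statistics
from collections import defaultdict


def read_labels(handle):
    """Map sequence ID -> PA label.

    Hypothetical layout: one line per sequence, "<sequence ID>\t<PA label>".
    """
    labels = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            labels[row[0]] = row[1]
    return labels


def order_pas_by_median(labels, annotation_handle, column):
    """Return (PA labels sorted by ascending median, {PA: median}).

    Assumes the annotation file's first column is the sequence ID and that
    `column` is a 1-based column number holding real numbers.
    """
    values = defaultdict(list)
    for row in csv.reader(annotation_handle, delimiter="\t"):
        if not row or row[0] not in labels or len(row) < column:
            continue  # blank line, unlabelled sequence, or short row
        try:
            values[labels[row[0]]].append(float(row[column - 1]))
        except ValueError:
            continue  # header or non-numeric entry
    medians = {pa: statistics.median(v) for pa, v in values.items()}
    return sorted(medians, key=medians.get), medians


if __name__ == "__main__":
    # Toy in-memory inputs standing in for the real files.
    labels = read_labels(io.StringIO("s1\t1\ns2\t1\ns3\t2\ns4\t2\n"))
    annotation = io.StringIO("id\twidth\ns1\t10\ns2\t30\ns3\t5\ns4\t7\n")
    order, medians = order_pas_by_median(labels, annotation, column=2)
    print(order)    # ['2', '1']
    print(medians)  # {'1': 20.0, '2': 6.0}
```

When reading real files, open them with `open(path, newline="")`, as the csv module recommends, and pass the file objects in place of the in-memory buffers.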
NPLB can be run with several options to tune the execution time and the quality of the results. Note that promoterLearn varies the number of PAs in a manner similar to binary search, because the cross-validation likelihood of a model typically increases with the number of PAs until the optimum is reached and decreases afterward. In other words, to save time, models are not learned for every PA count in turn.
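The idea behind this search can be illustrated with a generic binary search for the peak of a unimodal function. The Python sketch below is not promoterLearn's actual procedure. It assumes the cross-validation likelihood is strictly unimodal in the number of PAs, and `cv_likelihood` is a hypothetical stand-in for "learn a model with k PAs and return its cross-validation likelihood".

```python
from functools import lru_cache
from typing import Callable


def best_pa_count(cv_likelihood: Callable[[int], float], lo: int, hi: int) -> int:
    """Return the PA count in [lo, hi] that maximises a unimodal score.

    `cv_likelihood` is hypothetical: it stands in for learning a model with
    k PAs and scoring it by cross-validation. Results are cached so that
    each distinct k is evaluated at most once.
    """
    score = lru_cache(maxsize=None)(cv_likelihood)
    while lo < hi:
        mid = (lo + hi) // 2
        if score(mid) < score(mid + 1):
            lo = mid + 1  # still climbing: the peak lies to the right of mid
        else:
            hi = mid      # at or past the peak: it is at mid or to its left
    return lo


if __name__ == "__main__":
    evaluated = []

    def toy_likelihood(k: int) -> float:
        evaluated.append(k)
        return -(k - 7) ** 2  # toy curve peaking at 7 PAs

    print(best_pa_count(toy_likelihood, 1, 20))  # 7
    print(len(evaluated))  # 7 distinct counts evaluated instead of 20
```

If the likelihood curve has flat stretches or several local maxima, a search of this kind can stop at a local optimum; that is the trade-off for not learning a model at every PA count.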
Publication: Mitra S. and Narlikar L., No Promoter Left Behind (NPLB): learn de novo promoter architectures from genome-wide transcription start sites, Bioinformatics, 32(5):779-781, 2016.