PIntron

Gene-structure prediction based on spliced alignments of transcript sequences.

Get started

The following example briefly demonstrates how PIntron can be used to predict the full-length isoforms of a gene (TP53) starting from its associated UniGene cluster (Hs.437460).

This page is only a brief introduction to PIntron. Please refer to the full documentation for a complete description of its options.

Preparation

Download and install PIntron as described here.

In the rest of this example, we assume that the PIntron executables have been installed in the directory /home/pintron/bin. Replace /home/pintron/bin with the directory where PIntron is actually installed (or omit the path if the PIntron executables are in a directory listed in $PATH).

Input files

PIntron requires two input files: the genomic sequence (in this case genomic.txt) and the EST/mRNA sequences (in this case ests.txt), specified with the options -g (--genomic) and -s (--EST), respectively. The file genomic.txt is a FASTA file containing a single sequence, while ests.txt is a MultiFASTA file in which each sequence is treated as a single transcript (EST or mRNA).
A specific header format is strictly required. Please refer to the documentation for a detailed description.
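
Before running PIntron, it can be useful to check that the two files have the expected shape: exactly one sequence in the genomic file and at least one in the transcript file. The following is a minimal sketch in POSIX shell. The function names count_fasta_records and check_pintron_inputs are hypothetical helpers, not part of PIntron, and the check only counts FASTA header lines (lines starting with ">"); it does not validate the header format that PIntron requires.

```shell
# Hypothetical helper (not part of PIntron): count FASTA records,
# i.e. lines starting with '>', in the file given as $1.
count_fasta_records() {
    # grep -c exits non-zero when the count is 0, so ignore its status.
    grep -c '^>' -- "$1" || true
}

# Hypothetical helper: check that GENOMIC ($1) holds exactly one sequence
# and that ESTS ($2) holds at least one. Header contents are NOT checked.
check_pintron_inputs() {
    genomic=$1
    ests=$2
    if [ ! -r "$genomic" ] || [ ! -r "$ests" ]; then
        echo "error: cannot read $genomic or $ests" >&2
        return 1
    fi
    n_genomic=$(count_fasta_records "$genomic")
    n_ests=$(count_fasta_records "$ests")
    if [ "$n_genomic" -ne 1 ]; then
        echo "error: $genomic has $n_genomic sequences, expected 1" >&2
        return 1
    fi
    if [ "$n_ests" -lt 1 ]; then
        echo "error: $ests contains no sequences" >&2
        return 1
    fi
    echo "ok: 1 genomic sequence, $n_ests transcript sequences"
}
```

For the example, the check would be invoked as: check_pintron_inputs genomic.txt ests.txt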

The input files for this example are located in the subdirectory dist-docs/example and are also available for download (genomic.txt, ests.txt).

After preparation, the directory tree should look as follows (other files may also be present).

/home/pintron
  ├── bin
  │   ├── cds-annotation
  │   ├── compact-compositions
  │   ├── est-fact
  │   ├── gene-structure
  │   ├── intron-agreement
  │   ├── maximal-transcripts
  │   ├── min-factorization
  │   └── pintron
  └── doc
      └── example
          ├── ests.txt
          └── genomic.txt
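
The layout above can be verified with a short shell sketch. The function name missing_pintron_executables is a hypothetical helper, not part of PIntron; the list of executables is taken from the directory tree shown above and may differ in other versions.

```shell
# Hypothetical helper (not part of PIntron): print the name of every
# executable from the directory tree above that is missing, or not
# executable, in the bin directory given as $1.
missing_pintron_executables() {
    bin_dir=$1
    for exe in cds-annotation compact-compositions est-fact gene-structure \
               intron-agreement maximal-transcripts min-factorization pintron; do
        [ -x "$bin_dir/$exe" ] || echo "$exe"
    done
}
```

Running missing_pintron_executables /home/pintron/bin prints nothing when all the listed executables are in place.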

Execution

Assuming that we want to generate all output files in the current working directory, the following command executes PIntron on the example.

/home/pintron/bin/pintron                               \
       --bin-dir=/home/pintron/bin                      \
       --genomic=/home/pintron/doc/example/genomic.txt  \
       --EST=/home/pintron/doc/example/ests.txt         \
       --organism=human                                 \
       --gene=TP53                                      \
       --output=pintron-full-output.json                \
       --gtf=pintron-cds-annotated-isoforms.gtf         \
       --extended-gtf=pintron-all-isoforms.gtf          \
       --logfile=pintron-pipeline-log.txt               \
       --general-logfile=pintron-log.txt

Most options have sensible default values and a short form, so a shorter equivalent command line is:

/home/pintron/bin/pintron                        \
       -b /home/pintron/bin                      \
       -g /home/pintron/doc/example/genomic.txt  \
       -s /home/pintron/doc/example/ests.txt     \
       -n human                                  \
       -e TP53

Please note that the options --organism/-n and --gene/-e are optional and can be omitted (in that case, the default value unknown is assumed).

Output files

PIntron produces the following output files:

  • pintron-full-output.json, the complete description of the results computed by PIntron, in JSON format. This file is both human- and machine-readable (parsing libraries exist for all major programming languages). Moreover, the format is almost self-documenting and can easily be adapted to future needs. Please refer to its description for additional information.
  • pintron-all-isoforms.gtf/pintron-cds-annotated-isoforms.gtf, the set of all (respectively, CDS-annotated) full-length isoforms computed by PIntron, in standard GTF2.2 format. These files can be used for standard downstream analyses. For example, they can be uploaded to the UCSC Genome Browser as custom tracks (as shown in the example below).
  • pintron-log.txt/pintron-pipeline-log.txt, the logs of the main program and of each step of the pipeline. These files may contain important information if an error has occurred. Please attach them to any issue report.
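
As a quick check of the GTF output, the number of predicted isoforms can be obtained by counting the distinct transcript_id values in the file. The following sketch assumes only what the GTF format prescribes: tab-separated fields with the attributes in the ninth column, including an entry of the form transcript_id "...". The function name count_gtf_transcripts is a hypothetical helper, not part of PIntron.

```shell
# Hypothetical helper (not part of PIntron): count the distinct
# transcript_id values found in the attribute column (field 9) of the
# GTF file given as $1. Comment lines starting with '#' are skipped.
count_gtf_transcripts() {
    awk -F '\t' '
        /^#/ { next }
        match($9, /transcript_id "[^"]*"/) {
            # Use the whole matched attribute as the key of a set.
            seen[substr($9, RSTART, RLENGTH)] = 1
        }
        END {
            n = 0
            for (t in seen) n++
            print n
        }
    ' "$1"
}
```

For the example, the count would be obtained with: count_gtf_transcripts pintron-all-isoforms.gtf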

The output files produced by PIntron for the example gene TP53 are located in the subdirectory dist-docs/example/sample-output. A graphical representation of the reconstructed isoforms is as follows:

[Figure: graphical representation of the TP53 isoforms reconstructed by PIntron]