AARTI - Tutorial

► Job submission

After verifying the CAPTCHA, a user can upload multiple files or multiple FASTA sequences in a single file for the analysis. To submit a job in the AutomAted RepeaT Identifier, upload a file (with ".fasta/ .fna/ .fa" extension) containing at least two nucleotide sequences in FASTA format. The first line in a FASTA file starts with a ">" (greater-than) symbol followed by accession number and other details. In AARTI pipeline, first word upto the space in FASTA header line will be used to represent the sequence/ organism. Here, the accession number (e.g. NC_022714.1) is used to represent the sequence. Information from FASTA sequence header, which must be in the format shown in Fig. 1 will be used otherwise the information available after ">" symbol will be used to represent the submitted FASTA sequences.

Fig. 1: A sequence in FASTA format.

A user can also set the parameters for mono-hexa repeat motifs including distance between two microsatellites. Otherwise, the pipeline will be executed with default parameters (Fig. 2).

Fig. 2: Job submission to AutomAted RepeaT Identifier.

An error message will be displayed, if uploaded nucleotide sequences are not in proper FASTA format or ambiguous characters found in submitted sequences (Fig. 3).

Fig. 3: Error in FASTA sequences.

After successful submission a unique job ID will be generated. It will be used to view or download the results available for 10 days since initial job submission (Fig. 4). User may get information of the job on email, if provided during job submission.

Fig. 4: Successful submission of the job.

► Interpretation of results

Results of two FASTA files (Accession: NC_036024.1 and NC_022714.1), containing complete mitochondrial genome sequences of genus Triticum, are shown. These sequences were downloaded from National Center for Biotechnology Information (NCBI).

The main results page contains frequency of mono-hexa repeats identified in submitted sequences. The frequency of repeat motifs identified in nucleotide sequences (Fig. 5) can be retrieved by clicking on respective accession number.

Fig. 5: Frequency of mono - hexa repeats.

Here user can analyze all repeat motifs identified in selected sequence along with their frequency (Fig. 6).

Fig. 6: Frequency of repeat motifs.

Moreover, MISA results can be retrieved for more details of identified microsatellites (Fig. 7).

Fig. 7: Details of microsatellites mined using MISA, a perl script.

► Common, polymorphic, and unique microsatellites

Information of common, polymorphic, and unique microsatellites can be fetched by clicking on the link 'Common, Polymorphic, and Unique Microsatellites' given on the main page (Fig. 5). Identical repeat motifs with equal and varying length, showing significant similarity of flanking sequences (200 nucleotides from both upstream and downstream of microsatellites) on reciprocal BLAST (Basic Local Alignment Search Tool) among the submitted nucleotide sequences, were considered as common and polymorphic microsatellites, respectively. Other repeat motifs and identical repeat motifs showing no significant similarity of flanking sequences were considered as unique microsatellites.

Frequency of common (green) and polymorphic (red) microsatellites identified between each pair of nucleotide sequences appear in the form of a matrix, whereas, total number of unique microsatellites identified in each nucleotide sequence appears in second column (blue) of the matrix (Fig. 8).

Fig. 8: Common, polymorphic, and unique microsatellites identified using AARTI.

Details of common, polymorphic or unique microsatellites can be fetched by clicking on the respective frequency in the matrix. It will show the details of motif, length, start, and end position along with their sequence alignment results (Fig. 9).

Fig. 9: Details of polymorphic microsatellites identified between submitted sequences.

Note: User can download the sample input file or view example results used in this tutorial, which can also be accessed using 'Home' page of AARTI.

***