FastQC
Function | A quality control tool for high throughput sequence data. |
---|---|
Language | Java |
Requirements | A suitable Java Runtime Environment; the Picard BAM/SAM Libraries (included in download) |
Code Maturity | Stable. Mature code, but feedback is appreciated. |
Code Released | Yes, under GPL v3 or later. |
Initial Contact | Simon Andrews |
FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.
The main functions of FastQC are:
- Import of data from BAM, SAM or FastQ files (any variant)
- Providing a quick overview to tell you in which areas there may be problems
- Summary graphs and tables to quickly assess your data
- Export of results to an HTML based permanent report
- Offline operation to allow automated generation of reports without running the interactive application
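As a rough illustration of the offline mode, the sketch below builds and runs a non-interactive FastQC command from Python. It assumes a `fastqc` launcher is available on the `PATH` and uses the `--outdir`, `--threads` and `--quiet` options; the helper names `build_fastqc_command` and `run_fastqc` are hypothetical and not part of FastQC itself.

```python
import shutil
import subprocess
from pathlib import Path


def build_fastqc_command(inputs, outdir, threads=1, quiet=True):
    """Return the argument list for a non-interactive FastQC run."""
    cmd = ["fastqc", "--outdir", str(outdir), "--threads", str(threads)]
    if quiet:
        # Suppress progress messages; only errors are printed.
        cmd.append("--quiet")
    cmd.extend(str(path) for path in inputs)
    return cmd


def run_fastqc(inputs, outdir, threads=1):
    """Run FastQC on the given files (assumes `fastqc` is on the PATH)."""
    if shutil.which("fastqc") is None:
        raise FileNotFoundError("fastqc executable not found on PATH")
    # Create the output directory up front so the run cannot fail on a missing path.
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_fastqc_command(inputs, outdir, threads), check=True)
```

Calling `run_fastqc(["sample_1.fastq.gz", "sample_2.fastq.gz"], "qc_reports", threads=2)` would then write one report per input file into `qc_reports`; the file and directory names here are placeholders.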
Documentation
A copy of the FastQC documentation is available for you to try before you buy (well, download...).
Example Reports
- Good Illumina Data
- Bad Illumina Data
- Adapter dimer contaminated run
- Small RNA with read-through adapter
- Reduced Representation BS-Seq
- PacBio
- 454
Changelog
- 08-01-19: Version 0.11.9 released
  - Fixed a bug when analysing empty files
  - Added support for multi-read fast5 files
  - Fixed a corner case bug in adapter detection
  - Bundled a JRE with the OSX build so you don't have to install it
  - Fixed a hang if the program runs out of memory
- 04-10-18: Version 0.11.8 released
  - Fixed a performance bug in highly duplicated sequences
  - Changed the behaviour of the sequence length module when run with --nogroup
  - Other minor bug fixes
- 10-01-18: Version 0.11.7 released
  - Fixed a crash if the first sequence in a file was shorter than 12bp
- 21-12-17: Version 0.11.6 released
  - Disabled the Kmer plot by default
  - Fixed a bug when long custom adapters were being used
  - Changed the tile number cutoff to accommodate the NovaSeq
  - Fixed various format changes in nanopore data from ONT
  - Added new Clontech sequences to the contaminant list
  - Added a --min-length option to remove short sequences
  - Added an option to specify the output name of data streamed into the program
- 08-03-16: Version 0.11.5 released
  - Fixed the smallRNA adapter sequence so that abundance isn't under-represented in the adapter content plot
  - Fixed a bug in the warn / error code for the per-base sequence content plot
  - Fixed a typo in the documentation for the duplication plot
- 09-10-15: Version 0.11.4 released
  - Changed the OSX launcher to not rely on the internal JVM framework, but use any command line java which is found
  - Fixed a typo in one of the adapter sequences
  - Fixed a bug which meant that some file extensions weren't removed from report names in non-interactive mode
  - Made the per-tile module not collect any stats if it's disabled in limits.txt
  - Fixed a bug in the calculation of duplication for highly duplicated, ordered files with very small numbers of sequences
  - Fixed an incorrect error flag in the per-base quality module where there were fewer than 100 observations in a read group
- 25-3-15: Version 0.11.3 released
  - Fixed a bug when disabling the per-tile plot from limits.txt
  - Fixed a bug which caused the program to continue when processing of multiple files was actually complete
  - Fixed a bug which meant format selection in the interactive application didn't work
  - Added checks for mis-identifying tile numbers in confusing sample ids
  - Added the SOLiD smallRNA adapter to the standard search set
  - Fixed a bug when extracting casava names from uncompressed fastq files
  - Added support for processing files of Oxford Nanopore reads
- 6-6-14: Version 0.11.2 released
  - Fixed incorrect warn/fail defaults for per-seq quality plot
  - Fixed memory leaks in Kmer and per-seq quality modules
  - Added an option to use a custom limits file
  - Fixed a bug in the naming of the folder inside the zip output file
  - Fixed a bug in the --extract option
- 2-6-14: Version 0.11.1 released
  - Added configurable warn/fail thresholds for all modules
  - Allow modules to be selectively turned off
  - Added a per-tile quality plot for Illumina libraries
  - Added an adapter content plot
  - Improved the duplication plot
  - Improved the Kmer module
  - Used embedded graphics in the HTML output so you can distribute a single file
  - Added the ability to read data from stdin
  - Changed how base grouping works to better accommodate long reads
  - Dropped support for Solexa64 format (NB not Phred 64 which is still supported)
- 3-5-12: Version 0.10.1 released
  - Added a workaround to allow the analysis of concatenated gzipped files
  - Fixed a bug when FastQC was installed in a path containing characters needing to be escaped in a URL
  - Added an option to specify the location of the java interpreter on the command line
- 9-9-11: Version 0.10.0 released
  - Added a Casava mode to sanely process the multiple fastq files produced by the latest Illumina pipeline
  - Fixed a bug in Kmer analysis which missed off the last possible Kmer in each sequence
  - Fixed a classpath bug if using the wrapper script under Windows
- 31-8-11: Version 0.9.6 released
  - Fixed a crash in libraries where every sequence ended in poly-N
  - Fixed the launch wrapper to set the classpath correctly on OSX
- 16-8-11: Version 0.9.5 released
  - Fixed a bug in text output for the per-base sequence content module
  - Made progress reporting absolute, and not approximate
  - Added a print CSS style so reports are printable again
- 13-7-11: Version 0.9.4 released
  - Improved the error reporting for failed files in the offline application
- 16-6-11: Version 0.9.3 released
  - Added support for bzip2 compressed fastq files
  - Added new CSS theme for HTML reports, contributed by Phil Ewels
- 16-5-11: Version 0.9.2 released
  - Fixed a bug where grouped base numbers weren't reported in the per-base quality text report
  - Fixed a crash in the Kmer analysis when analysing small files
- 30-3-11: Version 0.9.1 released
  - Added --quiet and --nogroup options to command line
  - Added encoding type to the basic stats
  - Added detection of Illumina <1.3, 1.3, 1.5 and 1.9 encodings
- 10-2-11: Version 0.9.0 released
  - Added support for very long reads (esp 454 and PacBio)
  - Duplication detection now uses only the first 50bp of each read
- 21-1-11: Version 0.8.0 released
  - Made all graphs easier to interpret
  - Added an option to analyse only mapped sequences from a BAM/SAM file
  - Added an option to analyse two or more files in parallel
- 24-11-10: Version 0.7.2 released
  - Fixed a bug when analysing libraries with no unique sequences
  - Added an option to specify a custom contaminant list on the command line
- 24-11-10: Version 0.7.1 released
  - Improved the command line interface with proper options and error handling
  - Added an option to force the file format where guessing from the filename doesn't work
- 27-10-10: Version 0.7.0 released
  - Added a Kmer enrichment analysis to find non-aligned enriched sequences
  - Cleaned up axis labels on all graphs
- 27-10-10: Version 0.6.1 released
  - Fixed a bug which caused some sequences and qualities from BAM/SAM files to be reversed
- 18-10-10: Version 0.6.0 released
  - Sequences can now be read from SAM/BAM format files
  - Added smoother lines to the graphs
- 29-09-10: Version 0.5.1 released
  - Fixed a formatting bug in the text output
  - Fixed the %GC plot to work well with reads over 100bp
  - Improved the fitting of the modelled curve to the %GC plot
  - Added more Illumina oligos to the contaminants file
- 16-09-10: Version 0.5.0 released
  - Improved the fitting of the normal distribution to %GC plot
  - Calculated the total duplicated sequence % in the duplicate sequence module
  - Added pass/fail/warn icons next to each section of the HTML report
  - Put Icons and Images into subfolders in the HTML report
- 30-07-10: Version 0.4.3 released
  - Fixed the reporting of sequence counts in the Basic Stats module
  - Added a warning before overwriting reports in the interactive application
- 26-07-10: Version 0.4.2 released
  - Fixed y-axis scale on per-base quality plot
  - Added fail / warn checks to modules which lacked them and improved existing checks
  - Added a modelled distribution to the per-sequence GC plot
  - Scale the width of report graphs for long sequence reads
- 24-06-10: Version 0.4.1 released
  - Changed the duplicate module to reduce memory usage for long sequences
  - Changed the way duplicate levels are counted to be more realistic
- 18-06-10: Version 0.4 released
  - Added a sequence duplication level module
  - Added a launch wrapper for easier use from the command line
  - Added full machine parsable output for integration into pipelines
- 28-05-10: Version 0.3.1 released
  - Fixed a bug where invalid template files caused a crash
  - Non-interactive use now correctly reports progress for all files, not just the first one
  - Added some missing documentation
- 13-05-10: Version 0.3 released
  - Added support for gzip compressed fastq files
  - Added identification of overrepresented sequences
  - Improved colorspace support
  - Added an option to save non-interactive reports to a specific directory
- 06-05-10: Version 0.2 released
  - Added support for colorspace fastq files
  - Added templating support to allow customisation of HTML reports
  - Unzipped non-interactive reports by default, and added an option to turn this off
  - Added an easily computer-readable summary file to reports
- 28-04-10: Version 0.1.1 released
  - Fixed a bug which prevented non-interactive use on a headless system
- 26-04-10: Version 0.1 released
  - Initial set of 9 modules
  - Interactive and offline operation functional
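
The changelog mentions a computer-readable summary file (version 0.2) and machine-parsable output for pipelines (version 0.4). A minimal sketch of how a pipeline might consume the summary is shown below. It assumes the summary holds one tab-separated line per module in the order status, module name, input file name; the function names and the sample content are hypothetical and made up for illustration.

```python
from collections import Counter


def parse_summary(text):
    """Parse summary text into (status, module, filename) tuples.

    Assumes one tab-separated record per line: status, module name, file name.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        status, module, filename = line.split("\t", 2)
        rows.append((status, module, filename))
    return rows


def flagged_modules(rows):
    """Return the names of modules whose status is WARN or FAIL."""
    return [module for status, module, _ in rows if status in ("WARN", "FAIL")]


# Hypothetical sample content, not taken from a real report.
sample = (
    "PASS\tBasic Statistics\texample.fastq\n"
    "WARN\tPer base sequence content\texample.fastq\n"
    "FAIL\tAdapter Content\texample.fastq\n"
)

rows = parse_summary(sample)
print(Counter(status for status, _, _ in rows))  # tally of PASS / WARN / FAIL
print(flagged_modules(rows))                     # modules needing a closer look
```

In a pipeline, the text passed to `parse_summary` would come from the summary file of an extracted report, and a non-empty result from `flagged_modules` could be used to stop or flag a run.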