The 1000 Genomes Project sample cell lines are available for others to use. They can be obtained from the NHGRI collection at the Coriell Cell Repository.
A full listing of all the 1000 Genomes populations available from Coriell is available.
Because the 1000 Genomes Project data is freely available, no phenotype information was collected for any of the samples. All donors were over 18 and declared themselves to be healthy at the time of collection. We do provide a sample spreadsheet and a pedigree file, which contain ethnicity and gender for 1000 Genomes samples.
The 1000 Genomes Project is not accepting volunteers to be sequenced. For more information about how samples were recruited, please see the About page.
Another large-scale resequencing project that still has rounds of recruitment is the Personal Genome Project.
The most important available existing expression datasets involving 1000 Genomes individuals are probably the following:
RNAseq (mRNA & miRNA) on 465 individuals (CEU, TSI, GBR, FIN, YRI)
Pre-publication RNA-sequencing data from the Geuvadis project is available through http://www.geuvadis.org
http://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-1/samples.html
http://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-2/samples.html
RNAseq on 60 CEU individuals [1]
http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-197
Expression arrays on about 800 HapMap 3 individuals, many of whom are also in the 1000 Genomes data [1,2]
http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-198
http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-264
RNAseq for 69 YRI individuals [3]
http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-19480
References
These codes represent our populations. Each three-letter code represents a different population: for example, CEU means Northern Europeans from Utah and TSI means Tuscans from Italy. There is a summary of all these codes both in a readme on the FTP site and in the answer to the related question "Which populations are part of your study?"
There are 26 different populations which are part of our study from many different locations around the globe. The following table lists these populations and indicates what data we currently have available for them.
Population Code | Population Description | Super Population Code | Sequence Data Available | Alignment Data Available | Variant Data Available |
---|---|---|---|---|---|
CHB | Han Chinese in Beijing, China | EAS | 1 | 1 | 1 |
JPT | Japanese in Tokyo, Japan | EAS | 1 | 1 | 1 |
CHS | Southern Han Chinese | EAS | 1 | 1 | 1 |
CDX | Chinese Dai in Xishuangbanna, China | EAS | 1 | 1 | 1 |
KHV | Kinh in Ho Chi Minh City, Vietnam | EAS | 1 | 1 | 1 |
CEU | Utah Residents (CEPH) with Northern and Western European Ancestry | EUR | 1 | 1 | 1 |
TSI | Toscani in Italia | EUR | 1 | 1 | 1 |
FIN | Finnish in Finland | EUR | 1 | 1 | 1 |
GBR | British in England and Scotland | EUR | 1 | 1 | 1 |
IBS | Iberian Population in Spain | EUR | 1 | 1 | 1 |
YRI | Yoruba in Ibadan, Nigeria | AFR | 1 | 1 | 1 |
LWK | Luhya in Webuye, Kenya | AFR | 1 | 1 | 1 |
GWD | Gambian in Western Divisions in the Gambia | AFR | 1 | 1 | 1 |
MSL | Mende in Sierra Leone | AFR | 1 | 1 | 1 |
ESN | Esan in Nigeria | AFR | 1 | 1 | 1 |
ASW | Americans of African Ancestry in SW USA | AFR | 1 | 1 | 1 |
ACB | African Caribbeans in Barbados | AFR | 1 | 1 | 1 |
MXL | Mexican Ancestry from Los Angeles USA | AMR | 1 | 1 | 1 |
PUR | Puerto Ricans from Puerto Rico | AMR | 1 | 1 | 1 |
CLM | Colombians from Medellin, Colombia | AMR | 1 | 1 | 1 |
PEL | Peruvians from Lima, Peru | AMR | 1 | 1 | 1 |
GIH | Gujarati Indian from Houston, Texas | SAS | 1 | 1 | 1 |
PJL | Punjabi from Lahore, Pakistan | SAS | 1 | 1 | 1 |
BEB | Bengali from Bangladesh | SAS | 1 | 1 | 1 |
STU | Sri Lankan Tamil from the UK | SAS | 1 | 1 | 1 |
ITU | Indian Telugu from the UK | SAS | 1 | 1 | 1 |
These populations have been divided into 5 super populations, identified in the table by the codes EAS, EUR, AFR, AMR and SAS.
When the code ALL is used, it means that all individuals from that release are being considered.
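The grouping in the table can be written down as a simple lookup. The following is a minimal sketch, not part of the project's own tooling: the dictionary is transcribed from the table above, and the names `SUPER_POPULATION` and `populations_in` are made up for illustration. It treats ALL as covering every population, which is an assumption about how the code might be applied at the population level.

```python
# Population code -> super population code, transcribed from the table above.
SUPER_POPULATION = {
    "CHB": "EAS", "JPT": "EAS", "CHS": "EAS", "CDX": "EAS", "KHV": "EAS",
    "CEU": "EUR", "TSI": "EUR", "FIN": "EUR", "GBR": "EUR", "IBS": "EUR",
    "YRI": "AFR", "LWK": "AFR", "GWD": "AFR", "MSL": "AFR", "ESN": "AFR",
    "ASW": "AFR", "ACB": "AFR",
    "MXL": "AMR", "PUR": "AMR", "CLM": "AMR", "PEL": "AMR",
    "GIH": "SAS", "PJL": "SAS", "BEB": "SAS", "STU": "SAS", "ITU": "SAS",
}


def populations_in(code):
    """Return the sorted population codes covered by a super population code.

    The special code "ALL" (hypothetical handling) returns every population.
    """
    if code == "ALL":
        return sorted(SUPER_POPULATION)
    return sorted(pop for pop, sup in SUPER_POPULATION.items() if sup == code)
```

For example, `populations_in("AMR")` returns the four population codes listed under AMR in the table.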
A list of the samples that are part of the project is available from this spreadsheet. There is also a pedigree file available from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130606_sample_info/20130606_g1k.ped
Please note that this spreadsheet also lists samples who are related to the ones we are sequencing but aren’t themselves being sequenced. If a sample has no data in the Total LC or Total E Sequence columns, it was not sequenced for the main project.
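The rule above can be applied mechanically once the spreadsheet is exported as tab-delimited text. The sketch below is only an illustration under several assumptions: the column headers `Sample`, `Total LC Sequence` and `Total E Sequence` are guesses at the exported names and should be checked against the actual file, the function name `sequenced_samples` is hypothetical, and a sample is treated as sequenced if either column contains a value.

```python
import csv
import io


def sequenced_samples(tsv_text,
                      columns=("Total LC Sequence", "Total E Sequence")):
    """Return sample names that have data in at least one sequence column.

    Column names are assumptions; adjust them to match the real export.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        # Empty or missing cells in every column mean "not sequenced".
        if any((row.get(col) or "").strip() for col in columns):
            kept.append(row["Sample"])
    return kept


# Made-up rows for illustration only; these are not real project samples.
EXAMPLE = (
    "Sample\tPopulation\tTotal LC Sequence\tTotal E Sequence\n"
    "SAMPLE_A\tCEU\t12345\t\n"
    "SAMPLE_B\tCEU\t\t\n"
    "SAMPLE_C\tYRI\t\t6789\n"
)
```

Calling `sequenced_samples(EXAMPLE)` keeps `SAMPLE_A` and `SAMPLE_C` and drops `SAMPLE_B`, which has no data in either column.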