International HapMap Project

The International HapMap Project is an organization whose goal is to develop a haplotype map of the human genome (the HapMap), which will describe the common patterns of human genetic variation. The project is a collaboration among researchers at academic centers, non-profit biomedical research groups and private companies in Canada, China, Japan, Nigeria, the United Kingdom, and the United States.

The HapMap is expected to be a key resource that researchers can use to find genes affecting health, disease and responses to drugs and environmental factors. The information produced by the project will be made freely available to researchers around the world.

The International HapMap Project officially started with a meeting held from October 27 to 29, 2002, and was expected to take about three years. It comprises two phases; the complete data obtained in Phase I were published on October 27, 2005. Completion of the HapMap will enable follow-up disease studies: the Japanese teams will study 300,000 people to identify haplotypes that match 47 diseases, and the British will attempt to genotype patients with diabetes, bipolar disorder, rheumatoid arthritis, cardiovascular disease and other common diseases.

Background
Most common diseases and conditions, such as diabetes, cancer, heart disease, stroke, depression and asthma, are affected by many genes and environmental factors. Although any two unrelated people share about 99.9% of their DNA sequence, the remaining 0.1 percent is medically important because it contains the genetic variants that influence how people differ in their risk of disease and response to drugs. Studying these offers one of the best opportunities for understanding the complex causes of many common diseases in humans.

Sites in the genome where the DNA sequences of many individuals vary by a single base are called single nucleotide polymorphisms (SNPs). For example, some people may have a chromosome with an A at a particular site where others have a chromosome with a G. Each form is called an allele.

Each person has two copies of all chromosomes, except the sex chromosomes. The set of alleles that a person has is called a genotype. The term genotype can refer to the alleles that a person has at a particular SNP, or at many SNPs across the genome. The process of determining a person's genotype is called genotyping.
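As an illustration of these terms, the following minimal sketch counts alleles at a single SNP from a handful of genotypes and reports the frequency of each allele. The genotypes are invented for the example and the function name is hypothetical; it is not drawn from project data or software.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return the frequency of each allele, given genotypes such as "AG".

    Each genotype string holds the two alleles a person carries at one SNP,
    one per chromosome copy.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}

# Hypothetical genotypes of five people at one SNP with alleles A and G.
genotypes = ["AA", "AG", "GG", "AG", "AA"]

freqs = allele_frequencies(genotypes)
# The rarer allele is the one whose frequency decides whether a SNP is "common".
minor_allele = min(freqs, key=freqs.get)
print(freqs, "minor allele:", minor_allele)
```

Five people carry ten chromosome copies in total, so the frequencies are computed over ten alleles rather than five genotypes.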

Haplotypes
About 10 million SNPs exist in human populations for which the rarer SNP allele has a frequency of at least 1%. Alleles of SNPs that are close together tend to be inherited together. A set of associated SNP alleles in a region of a chromosome is called a haplotype. Most chromosome regions have only a few common haplotypes, which account for most of the variation from person to person in a population. A chromosome region may contain many SNPs, but researchers can use only a few "tag" SNPs to obtain most of the information on the pattern of genetic variation in the region.
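The idea of tag SNPs can be sketched with a toy example. The code below measures the association between pairs of SNPs with the squared correlation r², a commonly used measure of linkage disequilibrium, and then greedily picks SNPs until every SNP is strongly associated with a chosen tag. The haplotypes, the threshold of 0.8 and the greedy procedure are assumptions made for illustration; this is not presented as the selection method used by the project.

```python
# Toy phased haplotypes (hypothetical data): each string is one chromosome,
# each position is one SNP, and each letter is the allele observed there.
haplotypes = [
    "ACGT",
    "ACGT",
    "ACGA",
    "GTAT",
    "GTAT",
    "GTAA",
]

def allele_indicator(haps, site):
    """Code each chromosome as 1 if it carries the first chromosome's allele at `site`, else 0."""
    reference = haps[0][site]
    return [1 if h[site] == reference else 0 for h in haps]

def r_squared(x, y):
    """Squared correlation between two 0/1 allele indicator vectors."""
    n = len(x)
    p_x = sum(x) / n
    p_y = sum(y) / n
    p_xy = sum(a * b for a, b in zip(x, y)) / n
    d = p_xy - p_x * p_y
    denom = p_x * (1 - p_x) * p_y * (1 - p_y)
    return 0.0 if denom == 0 else d * d / denom

def greedy_tags(haps, threshold=0.8):
    """Pick SNP indices so that every SNP has r^2 >= threshold with some tag."""
    n_sites = len(haps[0])
    indicators = [allele_indicator(haps, s) for s in range(n_sites)]
    untagged = set(range(n_sites))
    tags = []
    while untagged:
        # Choose the SNP that is strongly associated with the most untagged SNPs.
        best = max(
            sorted(untagged),
            key=lambda s: sum(
                r_squared(indicators[s], indicators[t]) >= threshold
                for t in untagged
            ),
        )
        tags.append(best)
        untagged -= {
            t for t in untagged
            if r_squared(indicators[best], indicators[t]) >= threshold
        }
    return tags

tags = greedy_tags(haplotypes)
print("tag SNP positions:", tags)
```

In this toy data the first three SNPs are always inherited together, so one of them stands in for all three, while the fourth SNP varies independently and needs its own tag.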

The HapMap will describe the common patterns of genetic variation in humans. It will include the chromosome regions with sets of strongly associated SNPs, the haplotypes in those regions and the SNPs that tag those haplotypes. It will also note the chromosome regions where associations among SNPs are weak.

Researchers trying to discover the genes that affect a disease, such as diabetes, will compare a group of people with the disease to a group without the disease. Chromosome regions where the two groups differ in the haplotype frequencies might contain genes affecting the disease. Theoretically, researchers could look for these regions by genotyping 10 million SNPs. However, methods to do this are currently too expensive. The HapMap will identify the 200,000 to 1 million tag SNPs that will provide almost as much mapping information as all 10 million SNPs.
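The comparison described above can be sketched as a simple test on a 2×2 table of haplotype counts. The counts below are invented, and the Pearson chi-square test is only one of several ways such a comparison could be made; a real genome-wide study would also need to correct for the very large number of regions tested.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for the table

        [[a, b],
         [c, d]]

    with no continuity correction.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Hypothetical counts of chromosomes carrying a given haplotype in one region.
cases_with, cases_without = 60, 140        # 200 chromosomes from people with the disease
controls_with, controls_without = 40, 160  # 200 chromosomes from people without it

chi2 = chi_square_2x2(cases_with, cases_without, controls_with, controls_without)

# 3.841 is the standard 5% critical value for chi-square with 1 degree of freedom.
CRITICAL_VALUE_5_PERCENT = 3.841
differs = chi2 > CRITICAL_VALUE_5_PERCENT
print(f"chi-square = {chi2:.3f}; frequencies differ at the 5% level: {differs}")
```

A region whose statistic exceeds the critical value would be a candidate for closer study, not proof that it contains a gene affecting the disease.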

Populations sampled
Most of the common haplotypes occur in all human populations. However, their frequencies differ among populations. Therefore, data from several populations are needed to choose tag SNPs. Pilot studies have found sufficient differences in haplotype frequencies among population samples from Nigeria (Yoruba), Japan, China and the United States (residents with ancestry from Northern and Western Europe) to warrant developing the HapMap with large-scale analysis of haplotypes in these populations. The HapMap developed from information obtained from these populations should be useful for all populations in the world. However, to assess how much more information would be gained by including other populations, a parallel study will examine haplotypes in a set of chromosome regions in samples from several additional populations.

Specifically, the DNA samples for the HapMap will come from a total of 270 people: from the Yoruba people in Ibadan, Nigeria (30 adult-and-both-parents trios), Japanese in Tokyo (45 unrelated individuals), Han Chinese in Beijing (45 unrelated individuals) and the U.S. residents of northern and western European ancestry (30 trios). These numbers of samples will allow the project to find almost all haplotypes with frequencies of 5% or higher.
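The sample counts and the 5% figure can be checked with a short calculation. The sketch below assumes that the chromosomes in each sample are independent draws from the population, that only the parents in a trio contribute independent chromosomes, and that a haplotype is "found" if it is seen at least once; these simplifying assumptions are not stated in the project description.

```python
# Sample design as described above: trios are an adult child plus both parents.
samples = {
    "Yoruba (Ibadan)": {"trios": 30, "unrelated": 0},
    "Japanese (Tokyo)": {"trios": 0, "unrelated": 45},
    "Han Chinese (Beijing)": {"trios": 0, "unrelated": 45},
    "U.S. (N. and W. European ancestry)": {"trios": 30, "unrelated": 0},
}

def people(group):
    """Number of individuals sampled in a group."""
    return 3 * group["trios"] + group["unrelated"]

def independent_chromosomes(group):
    """Chromosome copies not inherited from another sampled person.

    Assumption: a child's chromosomes are copies of the parents', so only
    the two parents of each trio count, with two copies each.
    """
    return 2 * (2 * group["trios"] + group["unrelated"])

def prob_observed(frequency, n_chromosomes):
    """Probability that a haplotype of the given frequency is seen at least once."""
    return 1 - (1 - frequency) ** n_chromosomes

total_people = sum(people(g) for g in samples.values())
print("total people:", total_people)

for name, group in samples.items():
    n = independent_chromosomes(group)
    print(f"{name}: {n} chromosomes, P(5% haplotype seen) = {prob_observed(0.05, n):.4f}")
```

Under these assumptions the chance of missing a 5% haplotype entirely is small in every sample, which is consistent with the statement that almost all such haplotypes will be found.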

All of the new samples collected for the project are being obtained with protocols approved by the appropriate ethics committees, after culturally appropriate processes of community engagement or public consultation and individual informed consent. The community engagement process is designed to identify and attempt to respond to culturally specific concerns and give participating communities input into the informed consent and sample collection processes.

Ethical controls
The project raises a number of ethical issues. Since the samples will include no personal identifiers, the privacy risks to individual donors are minimal. However, each sample will be labeled by population to allow researchers to choose tag SNPs that are most useful for each future study population. The tag SNPs will be chosen based on the haplotype frequencies. The tag SNPs for some regions might differ among populations if the haplotype frequencies in those regions were considerably different among populations. Thus the SNP and haplotype frequencies for each population will be calculated, allowing comparisons. This could raise risks of group stigmatization or discrimination, if a higher frequency of a disease-associated variant were found in a population and risks associated with that variant were over-generalized to all or most members of the population.

Another potential concern is that the inclusion of populations based on ancestral geography could result in categories such as "race," which some contend are socially constructed, while others disagree. The project undertook the community consultations to understand community concerns about such issues.

Scientific strategy
To develop the HapMap, the samples will be genotyped for at least 1 million SNPs across the human genome. When the project started, 2.8 million SNPs were in the public database dbSNP. However, many chromosome regions had too few SNPs, and many SNPs were too rare to be useful, so millions of additional SNPs were needed to develop the HapMap. The project discovered another 2.8 million SNPs by September 2003, and SNP discovery continues. As of August 2006, more than 10 million SNPs were in the public databases.

For Phase I, the project was expected to produce a map of 600,000 common SNPs evenly spaced across the genome, which corresponds to a density of one SNP every 5,000 bases. The genotyping was carried out by 10 centres in Canada, China, Japan, the United Kingdom and the United States. Each centre genotyped all the samples for its assigned chromosomes. The centres used five different genotyping technologies.
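The stated density follows from simple division, assuming a genome length of roughly 3 billion bases. That length is a commonly quoted approximation and is not given in the text above.

```python
# Assumption: the human genome is roughly 3 billion bases long.
GENOME_LENGTH_BASES = 3_000_000_000
PHASE_I_SNPS = 600_000

# Average spacing between evenly spaced SNPs, in bases.
spacing = GENOME_LENGTH_BASES / PHASE_I_SNPS
print(f"about one SNP every {spacing:,.0f} bases")
```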

The Canadian team was led by Thomas J. Hudson at McGill University in Montreal and focused on chromosomes 2 and 4p. The Chinese team was led by Huanming Yang with centres in Beijing, Shanghai and Hong Kong and focused on chromosomes 3, 8p and 21. The Japanese team was led by Yusuke Nakamura at the University of Tokyo and focused on chromosomes 5, 11, 14, 15, 16, 17 and 19. The British team was led by David R. Bentley at the Sanger Institute and focused on chromosomes 1, 6, 10, 13 and 20. There were four American genotyping centres: a team led by Mark Chee and Arnold Oliphant located at Illumina Inc. in San Diego (chromosomes 8q, 9, 18q, 22 and X), a team led by David Altshuler at the Broad Institute in Boston (chromosomes 4q, 7q, 18p, Y and the mitochondrion), a team led by Richard A. Gibbs at the Baylor College of Medicine in Houston (chromosome 12) and a team led by Pui-Yan Kwok at the University of California, San Francisco (chromosome 7p).

During Phase II, more than 2 million additional SNPs were genotyped throughout the genome by the company Perlegen Sciences and 500,000 by the company Affymetrix. Genotyping quality was assessed by using duplicate or related samples and by periodic quality checks in which the centres had to genotype common sets of SNPs.

Data access and intellectual property
The project releases all data it produces into the public domain, enabling any researcher worldwide to use the information freely.

The new SNPs, assays for genotyping SNPs, and frequencies of SNP alleles, genotypes, and haplotypes have all been released publicly soon after they were produced. Haplotypes, individual genotypes, and tag SNPs have all been released publicly without restrictions.

The project does not include "specific utility" studies to relate genetic variation to particular phenotypes, such as a disease risk or drug response. Participants in the project do not believe that SNP, genotype or haplotype data for which a specific utility has not been generated are appropriately patentable inventions. However, the project's policy does not prevent researchers from applying for patents on SNPs or haplotypes for which they have demonstrated a specific utility, as long as they do not prevent others from obtaining access to project data.

Results of the study so far
The New York Times report "Still Evolving, Human Genes Tell New Story" states that the International HapMap Project is "providing the strongest evidence yet that humans are still evolving" and details some of that evidence.