Last Update: February 24, 2006
We are pleased to make a
preliminary release of genotype data from the haplotype
map of the inbred mouse project being led by the Daly Laboratory at The Broad Institute
of Harvard and MIT and the
Abstract
We aim to create a genome-wide haplotype map
in all commonly used inbred lab strains of mice in order to enable efficient
positional cloning and genotype-phenotype correlation studies. Recent data have
established that the genomes of commonly used inbred lab mice, the primary
mammalian model system, are simple mosaics of long segments from a limited
number of distinct subspecies of Mus musculus. By providing a more complete catalog of variation
patterns in each modern strain and their origins, the map will allow the use of
QTL mapping data from many crosses simultaneously, as well as strain phenotype
data, to accelerate the fine mapping and identification of genes responsible
for medically relevant phenotypes.
Strains examined to date
"Classical" Inbred Laboratory Strains:
129S1/SvImJ, 129S4/SvJae, 129X1/SvJ, A/J, AKR/J, BALB/cByJ, BTBR+Ttftf, BUB/BnJ, C3H/HeJ, C57BL/6J, C57BLKS/J, C57BR/cdJ, C57L/J, C58/J, CBA/J, CE/J, DBA/1J, DBA/2J, DDK/Pas,
FVB/NJ, I/LnJ, KK/HIJ, LG/J, LP/J, MA/MyJ, NOD/LtJ, NON/LtJ, NZB/B1NJ, NZW/LacJ, O20, PL/J, Qsi5, RIIIS/J, SEA/GnJ, SJL/J, SM/J, ST/bJ, SWR/J
Wild-derived Strains:
Mus m. castaneus: CAST/Ei
Mus m. musculus: CZECHII/Ei, PWD/Ph
Mus m. molossinus: JF1/Ms, MAI/Pas, MOLF/Ei, MSM/Ms
Mus m. domesticus: PERA/Ei, WSB/Ei
Mus spretus: SEG/Pas, SPRET/Ei
Markers
The initial data release consists of
138,793 single nucleotide polymorphisms (SNPs) spaced at ~20 kb intervals
across the entire euchromatic genome (with the
exception of the Y chromosome). Markers for the HapMap
chips were chosen from SNPs discovered in
several inbred laboratory strains, with supplemental discovery from a
wild-derived musculus strain (CzechII/Ei).
They were selected to be as evenly spaced as possible, with additional
reinforcement in sparsely covered regions to help guarantee successful
assays. All markers are placed relative to Mouse Build 33 (May 2004 assembly)
and are reported for the forward direction on that assembly.
Raw genotypes from the Affymetrix arrays designed for
this project were filtered to remove:
1) SNPs showing excess homology to the interrogated genome fraction,
2) Calls with Affymetrix quality scores greater than 0.25 (range 0-0.5, 0 being the best),
3) SNPs for which the average quality score of retained calls for each allele among all typed strains was greater than 0.1 and the per-allele averages were not within a factor of 2 of each other,
4) SNPs for which the particular strains used to discover the SNP did not show the expected allele calls.
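The two score-based filters (criteria 2 and 3) can be sketched roughly as follows. This is only an illustration, not the project's actual pipeline: the data layout (a list of `(allele, score)` pairs per SNP) and all function and constant names are hypothetical. The reading of criterion 3 is also an assumption: a SNP is flagged when the worse per-allele average exceeds 0.1 and is more than twice the better one.

```python
from collections import defaultdict

MAX_CALL_SCORE = 0.25   # criterion 2: calls scoring above this are dropped
MAX_MEAN_SCORE = 0.1    # criterion 3: threshold on the per-allele average
MAX_MEAN_RATIO = 2.0    # criterion 3: "within a factor of 2"


def retain_calls(calls):
    """Keep only (allele, score) calls whose score is at most 0.25."""
    return [(allele, score) for allele, score in calls if score <= MAX_CALL_SCORE]


def mean_score_per_allele(calls):
    """Average the quality scores of the given calls separately for each allele."""
    scores = defaultdict(list)
    for allele, score in calls:
        scores[allele].append(score)
    return {allele: sum(vals) / len(vals) for allele, vals in scores.items()}


def fails_allele_balance(calls):
    """Assumed reading of criterion 3, applied to the calls that survive criterion 2."""
    means = list(mean_score_per_allele(retain_calls(calls)).values())
    if len(means) < 2:
        return False  # only one allele observed, so there is nothing to compare
    lo, hi = min(means), max(means)
    return hi > MAX_MEAN_SCORE and hi > MAX_MEAN_RATIO * lo


# Made-up scores for one SNP typed in four strains.
calls = [("A", 0.02), ("A", 0.04), ("G", 0.2), ("G", 0.3)]
print(retain_calls(calls))
print(fails_allele_balance(calls))
```

In this made-up example the last call is dropped by criterion 2, and the remaining "G" call scores far worse on average than the "A" calls, so the SNP would be flagged under the assumed reading of criterion 3.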
Important Notes
More details about the design and performance of these arrays (two arrays similar in nature to the 500K human arrays), as well as SNP ascertainment and flanking sequence, will be added to this site soon. The data here are still under development. Internal consistency suggests that genotyping accuracy is high (~6500 SNPs duplicated on the two arrays show 99.8% consistency), but further extraction of data from the arrays and additional QC may be performed, so the data should be considered preliminary at this point. The final version of this data will be integrated with other existing SNP data and reflected at public resources at Jackson Labs, NCBI and elsewhere in short order, so this site should be viewed as a transient data release site.
Download
Files are tab-delimited and range in length from 1,239 lines (Chromosome X) to 12,104 lines (Chromosome 1), so they should be readable in Excel. The format is markers in rows and strains in columns.
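For readers who prefer a script to Excel, a minimal Python sketch for loading one of these files is shown below. It assumes (this page does not confirm it) that the first line is a header naming the strains and that the first column of every row holds the marker identifier. The function name and the sample data are hypothetical.

```python
import csv
import io


def read_genotypes(handle):
    """Return (strain_names, {marker: {strain: call}}) from a tab-delimited table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    strains = header[1:]          # assumption: column 0 holds the marker identifier
    genotypes = {}
    for row in reader:
        if not row:
            continue              # skip blank lines
        genotypes[row[0]] = dict(zip(strains, row[1:]))
    return strains, genotypes


# Hypothetical two-marker excerpt; marker names and calls are made up for illustration.
example = io.StringIO(
    "marker\tA/J\tC57BL/6J\tCAST/Ei\n"
    "snp_0001\tA\tA\tG\n"
    "snp_0002\tC\tT\tT\n"
)
strains, genotypes = read_genotypes(example)
print(genotypes["snp_0001"]["CAST/Ei"])  # prints G
```

To read a downloaded file instead of the in-memory example, pass an open file handle, for example `open(path, newline="")`, where `path` is wherever the file was saved.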
Chromosome | File Size |
| 1325956 |
| 1144265 |
| 1058666 |
| 951744 |
| 927466 |
| 889708 |
| 859335 |
| 849330 |
| 835596 |
| 801542 |
| 793624 |
| 711231 |
| 699475 |
| 698529 |
| 582587 |
| 567558 |
| 519499 |
| 495696 |
| 376238 |
| 135965 |
Download SNP Flank Sequences
Zipped File of SNP Flanks (35.1Mb)
· Each SNP has one line: <snp name/position>TAB<bracket notation snp sequence>
· All sequence is relative to the + strand of the 2004 sequence (NCBI Build 33).
· Sequence is 500 bases + SNP + 500 bases.
· Each SNP bracket has the B6 allele as the numerator.
· Nearby SNPs (whether on the HapMap or not) are N'd out. As a result, the sequence is intended for assay design rather than genomic placement per se.
· Lowercase sequence indicates repeat masking.
NOTE: This file is a simple tab-delimited text file and CANNOT be opened using Excel.
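Since the flank file cannot be opened in Excel, a short script is one way to read it. The sketch below splits one line into its parts. The exact bracket syntax is not spelled out on this page, so the form `[B6 allele/other allele]` embedded between the two flanks is an assumption, as are the function name, the field names, and the shortened sample record.

```python
import re

# Assumed bracket notation: <left flank>[B6 allele/other allele]<right flank>
BRACKET = re.compile(r"^([A-Za-z]*)\[([A-Za-z-]+)/([A-Za-z-]+)\]([A-Za-z]*)$")


def parse_flank_line(line):
    """Split one '<name>TAB<sequence>' line into its parts."""
    name, sequence = line.rstrip("\n").split("\t", 1)
    match = BRACKET.match(sequence)
    if match is None:
        raise ValueError(f"unrecognised bracket notation for {name!r}")
    left, b6_allele, other_allele, right = match.groups()
    return {
        "name": name,
        "left_flank": left,
        "b6_allele": b6_allele,        # the "numerator" of the bracket
        "other_allele": other_allele,
        "right_flank": right,
        # Lowercase marks repeat-masked bases; N marks a masked neighbouring SNP.
        "masked_snps": left.upper().count("N") + right.upper().count("N"),
    }


# Hypothetical, shortened record (real flanks are 500 bases on each side).
record = parse_flank_line("snp_0001\tACGTNacgt[A/G]TTGCAttAC\n")
print(record["b6_allele"], record["other_allele"], record["masked_snps"])
```

Lines that do not match the assumed pattern raise a `ValueError` rather than being silently skipped, which makes it easy to spot records that use a different notation.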
Acknowledgements
Broad
Affymetrix
Perlegen Sciences
NHGRI
NIEHS