USDA-MARC SNPs for parentage, traceback, and animal ID
SNP Selection Criteria
To achieve optimum power in U.S. and Canadian beef and dairy populations, the first selection criterion was for SNPs with average minor allele frequencies (MAF) > 0.40 in each of three groups of animals. These groups may be broadly characterized as "diverse U.S. beef cattle", "prominent U.S. dairy cattle", and "cross-bred Canadian beef cattle". The selection and assembly of an earlier version of the diverse U.S. beef cattle panel has been described previously [Heaton et al. 2001]. Version 2.9 of that panel was used here; it consisted of 96 sires from the following breeds: Angus (n = 6), Hereford (n = 6), Limousin (n = 6), Simmental (n = 6), Charolais (n = 6), Red Angus (n = 6), Gelbvieh (n = 6), Brahman (n = 5), Brangus (n = 5), Beefmaster (n = 5), Salers (n = 5), Shorthorn (n = 5), Maine-Anjou (n = 5), Longhorn (n = 4), St. Gertrudis (n = 4), Chianina (n = 4), Braunvieh (n = 4), Corriente (n = 4), and Tarentaise (n = 4). The U.S. dairy cattle panel was selected on the basis of sire prominence and diversity and consisted of 96 sires from the following breeds: Holstein (n = 85), Jersey (n = 7), Brown Swiss (n = 2), and Guernsey (n = 2). On the basis of the number of registered progeny for each breed, the beef and dairy panels represent greater than 99% of the germplasm used in the U.S. beef and dairy cattle industries, respectively. The composition of the cross-bred Canadian beef cattle panel has been described previously [Nkrumah 2007]; it comprised 464 cattle with germplasm primarily from Angus, Charolais, Hereford, Simmental, Galloway, and other breeds. Although these three groups represent the vast majority of alleles present in U.S. and Canadian cattle, their composition implies that alleles originating from Bos indicus germplasm account for less than 3% of the total.
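One standard way to see why a MAF threshold near 0.5 maximizes power is the single-parent exclusion probability of a biallelic SNP. The sketch below is an illustration, not the USDA-MARC calculation: it assumes Hardy-Weinberg proportions, no genotyping error, and unlinked loci, and the function names are hypothetical. With only the offspring and one alleged parent genotyped, an unrelated candidate is excluded only when the two are opposite homozygotes (AA vs. BB), which happens with probability 2p²q².

```python
from math import prod


def single_parent_exclusion(p: float) -> float:
    """Probability that one biallelic SNP excludes a random, unrelated alleged
    parent when only the offspring and that candidate are genotyped.

    Assumes Hardy-Weinberg proportions and no genotyping error; exclusion
    occurs only for opposite homozygotes (offspring AA vs. candidate BB, or
    the reverse), giving 2 * p^2 * q^2.
    """
    q = 1.0 - p
    return 2 * p**2 * q**2


def combined_exclusion(freqs) -> float:
    """Combine per-SNP exclusion probabilities, assuming independent (unlinked) loci."""
    return 1.0 - prod(1.0 - single_parent_exclusion(p) for p in freqs)


# Per-SNP exclusion rises steeply with MAF and peaks at 0.5.
for maf in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"MAF {maf:.1f}: per-SNP exclusion {single_parent_exclusion(maf):.4f}")

# Combined exclusion for a hypothetical panel of 100 SNPs, each with MAF 0.4.
print(f"100 SNPs at MAF 0.4: {combined_exclusion([0.4] * 100):.6f}")
```

At MAF 0.4 a single SNP already reaches 0.1152, about 92% of the 0.125 maximum at 0.5, whereas at MAF 0.2 it reaches only 0.0512. When the other parent is also genotyped, the corresponding biallelic formula is pq(1 − pq), which is likewise maximized at p = 0.5.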
A second selection criterion was approximately 20 to 30 cM spacing between markers on the autosomes, based on a composite bovine map [Snelling et al. 2007]. This spacing reduces linkage between neighboring markers, that is, the tendency of alleles at adjacent loci to be inherited together in related offspring, which would undermine treating the markers as independent. With a genome of approximately 3000 cM, between 100 and 150 parentage SNPs fit this spacing pattern (3000/30 = 100; 3000/20 = 150).
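The spacing arithmetic and the linkage it limits can be sketched as follows. The source does not say how map distance relates to co-inheritance; this sketch assumes Haldane's map function (no crossover interference) to convert the spacing into a recombination fraction r between adjacent markers, and it treats the genome as a single 3000 cM map, ignoring chromosome ends. The function name is hypothetical.

```python
import math

GENOME_CM = 3000  # approximate map length used in the text


def haldane_recombination(distance_cm: float) -> float:
    """Recombination fraction for a map distance using Haldane's map function,
    r = 0.5 * (1 - exp(-2d)) with d in Morgans (assumes no crossover interference)."""
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


for spacing in (20, 30):
    n_snps = GENOME_CM // spacing  # genome-wide approximation, as in the text
    r = haldane_recombination(spacing)
    print(f"{spacing} cM spacing: ~{n_snps} SNPs, adjacent-marker r = {r:.3f}")
```

Under this assumption, adjacent markers 20 to 30 cM apart recombine roughly 16% to 23% of the time. Unlinked markers would approach r = 0.5, so wider spacing trades marker count for independence.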
A third selection criterion was to eliminate markers with undesirable sequence features within the 500 bp immediately flanking either side of the target SNP. These features include large blocks of bovine repetitive elements, stem-loops with high melting temperatures (> 75 °C) near the target SNP, and high-frequency insertion/deletion polymorphisms and SNPs near the target. Any of these features can prevent genotyping or reduce its accuracy on all of the most popular genotyping platforms. To identify them, the region surrounding each target SNP was sequenced in the diverse U.S. beef cattle and prominent U.S. dairy cattle panels described above (n = 192). On average, there was one SNP per 78 bp sequenced in these 192 cattle (approximately 13 SNPs per kb). Where possible, additional groups of 24 animals from other available breeds were resequenced ad hoc. Additional information is added to the public database as samples from new breeds become available.
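A minimal sketch of how the repeat and nearby-variant checks might be automated. It assumes the flanking sequence is soft-masked (repetitive elements in lowercase, a common repeat-masking convention) and that the positions of high-frequency SNPs and indels found by resequencing are already known. MIN_CLEAR_BP and MAX_REPEAT_RUN are hypothetical thresholds, not values from the source. The stem-loop melting-temperature check requires a secondary-structure tool and is omitted.

```python
FLANK_BP = 500        # from the text: 500 bp on either side of the target SNP
MIN_CLEAR_BP = 20     # hypothetical: no other variant allowed closer than this
MAX_REPEAT_RUN = 100  # hypothetical: longest tolerated soft-masked (lowercase) block


def longest_lowercase_run(seq: str) -> int:
    """Length of the longest run of soft-masked (lowercase) bases."""
    longest = current = 0
    for base in seq:
        if base.islower():
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def passes_flank_filter(seq: str, target: int, other_variants: list[int]) -> bool:
    """Return True if the target SNP's flanks pass the repeat and nearby-variant checks.

    seq: soft-masked genomic sequence; target: 0-based index of the target SNP;
    other_variants: 0-based positions of high-frequency SNPs/indels in seq.
    """
    start = max(0, target - FLANK_BP)
    end = min(len(seq), target + FLANK_BP + 1)
    # Reject large blocks of repetitive sequence anywhere in the flanking window.
    if longest_lowercase_run(seq[start:end]) > MAX_REPEAT_RUN:
        return False
    # Reject if any other known variant lies too close to the target SNP.
    for pos in other_variants:
        if pos != target and abs(pos - target) < MIN_CLEAR_BP:
            return False
    return True
```

For example, `passes_flank_filter("A" * 1001, 500, [510])` rejects the marker because a variant lies 10 bp from the target, while the same sequence with a variant at position 600 passes. With roughly one SNP per 78 bp in this sample, some flanking SNPs are almost always present, so a proximity threshold of this kind is needed rather than a blanket requirement for SNP-free flanks.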
