Excess IBD Regions
by Kevin Alan Brook
As genetic genealogists, we know we need to verify our matches via triangulation to confirm that they are identical-by-descent (IBD) from a common ancestor. Triangulation means that every member of a match cluster (triangulation group) must match every other member with whom they are expected to overlap, across their respective portions of the same ancestral segment on the same chromosome. How much two members are expected to overlap depends on which portions of the segment each of them inherited: the ancestral segment has definable start and end points, but not every holder of the segment will have inherited the whole thing.
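The pairwise rule above can be sketched in code. This is a minimal illustration, not a feature of GEDmatch or any testing company: it assumes each member's portion of the ancestral segment is known as a (start, end) position on one chromosome, and that the segment each pair of members shares with each other is available in a dictionary. Both input formats and the function names `overlap` and `triangulates` are hypothetical. Real match boundaries rarely line up exactly, so a practical version would probably need some tolerance.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Tuple

Interval = Tuple[int, int]  # (start, end) in base pairs on one chromosome


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the part two intervals share, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def triangulates(
    segments: Dict[str, Interval],
    pair_matches: Dict[FrozenSet[str], Interval],
) -> bool:
    """Check one cluster on one chromosome.

    segments: each member's portion of the ancestral segment, e.g. as seen in
        that member's match with a shared tester (hypothetical input format).
    pair_matches: the segment each pair of members shares with each other,
        keyed by a frozenset of the two names (hypothetical input format).
    """
    for a, b in combinations(segments, 2):
        expected = overlap(segments[a], segments[b])
        if expected is None:
            continue  # these two inherited non-overlapping portions
        shared = pair_matches.get(frozenset((a, b)))
        # The pair must match across the whole expected overlap, not just part of it.
        if shared is None or shared[0] > expected[0] or shared[1] < expected[1]:
            return False
    return True


# Example: A and C inherited non-overlapping portions, so they need not match.
segs = {
    "A": (10_000_000, 20_000_000),
    "B": (12_000_000, 22_000_000),
    "C": (21_000_000, 25_000_000),
}
pairs = {
    frozenset(("A", "B")): (12_000_000, 20_000_000),
    frozenset(("B", "C")): (21_000_000, 22_000_000),
}
print(triangulates(segs, pairs))  # True
```

If B and C matched each other over only part of their expected overlap, the function would return False, which corresponds to the "incomplete triangulation" case described later in this essay.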
We should also try to phase matches using parent-child pairs. If a segment is longer than 3.5 centimorgans (cM) and phases from a parent to a child, it is usually (though not always) valid. For high confidence, it's best to stick with phased segments longer than 6 cM, and those segments must also belong to a triangulation group with at least 4 segment holders in total who are not very closely related to one another.
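The thresholds in the previous paragraph can be expressed as a small decision function. This is only a sketch: the name `segment_confidence` and its three labels are hypothetical, and deciding which group members are "not very closely related to one another" is left to the caller.

```python
def segment_confidence(length_cm: float, phased: bool, group_size: int) -> str:
    """Rough confidence label using the thresholds described above.

    group_size: number of segment holders in the triangulation group who are
    not very closely related to one another (the caller must judge that).
    The returned labels are illustrative, not an established scale.
    """
    if phased and length_cm > 6 and group_size >= 4:
        return "high confidence"
    if phased and length_cm > 3.5:
        return "usually valid"
    return "unverified"


print(segment_confidence(7.4, phased=True, group_size=5))  # high confidence
print(segment_confidence(4.2, phased=True, group_size=2))  # usually valid
```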
Clusters of false matches fail to triangulate; they appear in a person's match list because of a phenomenon called identical-by-chance (IBC). Certain areas of DNA are notorious for large numbers of matches that are either false or ancient. When a segment lies solely within an Excess IBD Region, it should probably be disregarded, because it is more likely than not to waste time by leading to false matches.
"Relationship Estimation from Whole-Genome Sequence Data" by Hong Li, Gustavo Glusman, et al., published in PLOS Genetics on January 30, 2014 (volume 10, number 1), identified the following widespread Excess IBD Regions (positions in base pairs):
- Chromosome 1 from 118 million to 153 million
- Chromosome 2 from 85 million to 99 million
- Chromosome 2 from 132 million to 141 million
- Chromosome 2 from 192 million to 198 million
- Chromosome 8 from 10 million to 13 million
- Chromosome 9 from 38 million to 72 million
- Chromosome 10 from 44 million to 53 million
- Chromosome 15 from 20 million to 25 million
- Chromosome 15 from 27 million to 30 million
- Chromosome 16 from 19 million to 24 million
- Chromosome 17 from 59 million to 65 million
- Chromosome 17 from 77 million to 78 million
- Chromosome 21 from 16 million to 19 million
- Chromosome 22 from 16 million to 25 million
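To screen a match list against these regions, the positions above can be turned into a lookup table. This sketch copies the list as written. The genome build those positions refer to isn't stated here, so it is an assumption that your match coordinates use the same build. "Solely within" is read as the segment's start and end both falling inside a single listed region. The names `EXCESS_IBD_REGIONS` and `solely_in_excess_ibd` are hypothetical.

```python
# Excess IBD Regions from the list above, as (chromosome, start, end) in base pairs.
EXCESS_IBD_REGIONS = [
    ("1", 118_000_000, 153_000_000),
    ("2", 85_000_000, 99_000_000),
    ("2", 132_000_000, 141_000_000),
    ("2", 192_000_000, 198_000_000),
    ("8", 10_000_000, 13_000_000),
    ("9", 38_000_000, 72_000_000),
    ("10", 44_000_000, 53_000_000),
    ("15", 20_000_000, 25_000_000),
    ("15", 27_000_000, 30_000_000),
    ("16", 19_000_000, 24_000_000),
    ("17", 59_000_000, 65_000_000),
    ("17", 77_000_000, 78_000_000),
    ("21", 16_000_000, 19_000_000),
    ("22", 16_000_000, 25_000_000),
]


def solely_in_excess_ibd(chrom: str, start: int, end: int) -> bool:
    """True if the segment lies entirely inside one listed region."""
    return any(
        c == chrom and r_start <= start and end <= r_end
        for c, r_start, r_end in EXCESS_IBD_REGIONS
    )


print(solely_in_excess_ibd("1", 120_500_000, 145_000_000))  # True
print(solely_in_excess_ibd("1", 100_000_000, 145_000_000))  # False
```

A segment that extends well outside a region is not flagged, which matches the advice to distrust only segments that appear solely within an Excess IBD Region.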
I regard the segment areas on chromosomes 1, 15, 21, and 22 as especially problematic. I usually get overwhelmed with false matches in those areas.
The segment area on chromosome 9 produces very old matches because it is a well-known recombination "cold spot". This holds even when the segment begins at an earlier position, such as 33 million. There may indeed be a common ancestor, but he or she lived very far back in time, sometimes beyond the 20-generation range to which direct-to-consumer autosomal DNA tests are limited. (By contrast, most matches we find in more typical segment areas are related to us within the past 16 generations and will be 14th cousins or closer.)
Having used GEDmatch since 2015, I've observed several more positions that frequently cause trouble:
- Chromosome 20 from 57 million to 59 million: I encounter many false matches there.
- Other sections of Chromosome 1 also tend to give false matches, such as the area near its beginning.
- Other sections of Chromosome 22 can yield false matches, such as between 46 million and 47 million.
- We also encounter many false matches that cross the gap between the two Excess IBD Regions on Chromosome 15, bridging the two sides.
- The Human Leukocyte Antigen (HLA) region on Chromosome 6 from 29 million to 33 million, which contains genes involved in the immune system, can produce pileups of matches that are sometimes very old when the match lies solely within that range (neither starting much earlier on the chromosome nor ending much later). Such matches will not necessarily prove invalid, but they may not be useful for genealogical purposes either. They will often be what can be considered a "population match" representing a common ethnicity.
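These additional observations can be added as a second screening step. The sketch below assumes that a "bridging" chromosome 15 match is one that starts inside the first Excess IBD Region and ends inside the second. The area "near the beginning" of chromosome 1 is left out because no coordinates are given for it. "Much earlier" and "much later" are judgment calls, so the hypothetical `slack_bp` parameter lets you loosen the strict containment test; `WATCH_LIST` and `watch_flags` are also hypothetical names.

```python
from typing import List

# Additional trouble spots listed above, as (chromosome, start, end, note).
# Chromosome 1 "near the beginning" is omitted because no coordinates are given.
WATCH_LIST = [
    ("20", 57_000_000, 59_000_000, "frequent false matches"),
    ("22", 46_000_000, 47_000_000, "frequent false matches"),
    ("6", 29_000_000, 33_000_000, "HLA region: often old or population matches"),
]

# The two chromosome 15 Excess IBD Regions, with a gap between them.
CHR15_FIRST = (20_000_000, 25_000_000)
CHR15_SECOND = (27_000_000, 30_000_000)


def watch_flags(chrom: str, start: int, end: int, slack_bp: int = 0) -> List[str]:
    """Return a note for every watch-list pattern the segment fits.

    slack_bp is a hypothetical tolerance for "doesn't start much earlier or
    end much later"; 0 means the segment must fit strictly inside the range.
    """
    flags = [
        note
        for c, r_start, r_end, note in WATCH_LIST
        if c == chrom and r_start - slack_bp <= start and end <= r_end + slack_bp
    ]
    # A chromosome 15 segment that starts in the first region and ends in the
    # second bridges the gap between them.
    if (
        chrom == "15"
        and CHR15_FIRST[0] <= start <= CHR15_FIRST[1]
        and CHR15_SECOND[0] <= end <= CHR15_SECOND[1]
    ):
        flags.append("chromosome 15 bridge between Excess IBD Regions")
    return flags


print(watch_flags("6", 29_500_000, 32_800_000))
print(watch_flags("15", 22_000_000, 28_500_000))
```

An empty list means none of these patterns applies. As noted below, it does not mean the match is valid.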
That is not to say that all false matches will appear in the above areas. A false match can appear just about anywhere.
Some types of false matches include:
- Failure to triangulate at all: This is also known as a false pileup.
- Incomplete triangulation: Some pairs of matches in a cluster don't "match" each other across the entirety of their expected overlap, even while they fully "match" some other members there.
- Low SNP counts: For example, when two Family Tree DNA kits are compared against each other, the ratio should usually be at least 100 SNPs per centimorgan (cM), because both kits use the same type of chip. This is also true for the fully compatible comparisons between Family Tree DNA kits and AncestryDNA v1 kits, Family Tree DNA kits and MyHeritage kits, Family Tree DNA kits and 23andMe v3 kits, MyHeritage and 23andMe v3 kits, MyHeritage and AncestryDNA v1 kits, and 23andMe v3 kits and AncestryDNA v1 kits. This rule does not apply to comparisons between Family Tree DNA kits and 23andMe v4 kits, nor between Family Tree DNA kits and AncestryDNA v2 kits, because those chips test many different SNPs, so the SNP counts that can be found are lower than we'd prefer. [Disclaimer: This website is a direct affiliate of Family Tree DNA and an indirect affiliate of AncestryDNA.]
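The SNP-density rule of thumb can be checked mechanically. In this sketch the kit labels are hypothetical strings chosen for illustration, and only the pairings the text names as fully compatible are covered. For any other pairing, including the Family Tree DNA comparisons with 23andMe v4 and AncestryDNA v2, the function returns None rather than guessing. The names `COMPATIBLE_PAIRS` and `snp_density_ok` are hypothetical.

```python
from typing import Optional

# Kit pairings the text calls fully compatible (order does not matter).
COMPATIBLE_PAIRS = {
    frozenset(pair)
    for pair in [
        ("FTDNA", "FTDNA"),
        ("FTDNA", "AncestryDNA v1"),
        ("FTDNA", "MyHeritage"),
        ("FTDNA", "23andMe v3"),
        ("MyHeritage", "23andMe v3"),
        ("MyHeritage", "AncestryDNA v1"),
        ("23andMe v3", "AncestryDNA v1"),
    ]
}

MIN_SNPS_PER_CM = 100  # rule of thumb from the text ("usually")


def snp_density_ok(kit_a: str, kit_b: str, snps: int, cm: float) -> Optional[bool]:
    """Apply the ~100 SNPs per cM rule to one matching segment.

    Returns None when the pairing is not one the rule is stated for.
    """
    if cm <= 0:
        raise ValueError("segment length in cM must be positive")
    if frozenset((kit_a, kit_b)) not in COMPATIBLE_PAIRS:
        return None
    return snps / cm >= MIN_SNPS_PER_CM


print(snp_density_ok("FTDNA", "MyHeritage", 900, 8.0))  # True (112.5 SNPs/cM)
print(snp_density_ok("FTDNA", "23andMe v4", 600, 8.0))  # None
```

Because the rule says the ratio "usually" should reach 100, a False result is a warning sign rather than proof that a match is false.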
Khazaria.com receives monetary commission payments from Family Tree DNA and Viglink from sales generated through links.