### Modeling Quantitative Trait Loci and Interpretation of Models

Abstract. Archaeogenomic research has proven to be a valuable tool for tracing the migrations of historic and prehistoric individuals and groups, whereas relationships within a group or burial site have rarely been investigated.

Knowing the genetic kinship of historic and prehistoric individuals would give important insights into the social structures of ancient and historic cultures. Most archaeogenetic research concerning kinship has been restricted to uniparental markers, while studies using genome-wide information have mainly focused on comparisons between populations. Applications that infer the degree of relationship from modern-day DNA information typically require diploid genotype data. Low concentrations of endogenous DNA, fragmentation and other post-mortem damage to ancient DNA (aDNA) make the application of such tools unfeasible for most archaeological samples.

We show that our heuristic approach can successfully infer relationships up to the second degree with as little as 0. By applying READ to published aDNA data from human remains excavated in several cultural contexts, we uncover previously unknown relationships among prehistoric individuals. In particular, we find a group of five closely related males at a single Corded Ware culture site in modern-day Germany, suggesting patrilocality. This highlights the potential of READ, applied to genome-wide aDNA data, to uncover the social structures of ancient populations.

READ is publicly available from https:

These segments, shared between individuals, can be referred to as identical by descent (IBD). Knowledge about IBD segments has been used for haplotype phasing [1, 2], heritability estimation [3, 4], population history [5], inference of natural selection [6], and estimation of the degree of biological relationship among individuals [7].

A number of methods have been developed to estimate the degree of biological relationship by inferring IBD from SNP genotype or whole-genome sequencing data. Knowing whether a pair of individuals is directly related, and estimating the degree of relationship, is of interest in various fields: genome-wide association studies and population genetic analyses often try to exclude related individuals, since they do not represent statistically independent samples; in forensics, archaeology and genealogy, individuals and their relatives can be identified based on DNA extracted from human remains [15, 16]; and breeders and conservation biologists are interested in the relatedness of mating individuals [17, 18].

Current methods present significant limitations for the analysis of degraded samples, as they rely on diploid genotype calls, low proportions of missing data and sometimes even phase information. Especially in forensics and archaeology, the amount of endogenous DNA available for analysis is limited by postmortem degradation [19-21]. In archaeology, the analysis of IBD has the potential to provide an independent means to test kinship behavior and social organization [22, 23], but current methods would be restricted to exceptionally well-preserved samples.

In forensic science and practice, the dominant approach has been to type several short tandem repeat (STR) markers, which in most cases provide sufficient information for relatedness assessment, but STRs can be hard to type in degraded samples [24].

In addition to nuclear STRs, mitochondrial and Y-chromosome haplogroups have been widely used to infer family relationships. These uniparental markers can be typed from degraded samples, and can be used to exclude maternal or paternal relationships, but not to infer the actual degree of relationship.

The conclusion by Routman and Cheverud that one can use the UWR model rather than other models to find more epistasis in an F2 population is unfounded.

Incidentally, the regression model also provides a statistical way to analyze and test different genetic effects and variance components. If a model is orthogonal, the tests for different effects and variance components are independent.

This is an advantage of the orthogonal model. Otherwise, a test for epistasis can still be performed by the comparison of test statistics between the full and reduced models with and without epistatic terms. The orthogonal property of the F2 model applies only to a population where allelic frequencies are one-half. In terms of modeling QTL, it is desirable to have a model that has the orthogonal property for a variety of allelic frequency distributions.
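The full-versus-reduced comparison mentioned above can be sketched as a partial F-test. The example below is only an illustration under stated assumptions: an F2-style additive coding of 1, 0, -1 at each of two loci, a single additive-by-additive epistatic term, ordinary least squares, and a made-up balanced data set. The function names (`solve`, `rss`, `epistasis_f`) are hypothetical and do not come from the source.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination (exact with Fractions)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [Fraction(0)] * n
    for r in reversed(range(n)):
        rest = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - rest) / M[r][r]
    return x

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    X = [[Fraction(v) for v in row] for row in X]
    y = [Fraction(v) for v in y]
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    beta = solve(xtx, xty)  # normal equations
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def epistasis_f(w1, w2, y):
    """Partial F statistic for one additive-by-additive term (assumed model)."""
    full = [[1, a, b, a * b] for a, b in zip(w1, w2)]  # mean, two additive terms, epistasis
    reduced = [row[:3] for row in full]                # same model without epistasis
    rss_full, rss_reduced = rss(full, y), rss(reduced, y)
    df_num = 1                # one parameter dropped
    df_den = len(y) - 4       # residual degrees of freedom of the full model
    return ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)

# Made-up balanced data: two observations per two-locus genotype class.
levels = (1, 0, -1)
w1, w2, y, y_additive = [], [], [], []
for a in levels:
    for b in levels:
        for noise in (1, -1):
            w1.append(a)
            w2.append(b)
            y.append(10 + 2 * a + b + 3 * a * b + noise)   # with epistasis
            y_additive.append(10 + 2 * a + b + noise)      # purely additive

f_with_epistasis = epistasis_f(w1, w2, y)
f_without_epistasis = epistasis_f(w1, w2, y_additive)
```

The statistic would then be compared with an F distribution with `df_num` and `df_den` degrees of freedom. A full two-locus analysis would include all four epistatic terms, and the sketch assumes the full model does not fit perfectly (otherwise the denominator is zero).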

Define an indicator variable x for alleles, standardized to have mean zero. For regression model (1), we can use the genetic-effect design variables w and v of Equation (18), which are built from x1 and x2, the indicators for the two alleles in an individual.

This is called the G2A model. Note that the v variable is proportional to the product of x1 and x2, which explains why the dominance effect is an interaction effect between the two alleles within a locus.

In matrix notation, the G2A model is given by Equations (19) and (20). In this model, both w and v are scaled by design to have mean zero in a population in Hardy-Weinberg equilibrium. Note that the definition of the dominance effect is independent of allelic frequency for one locus, but not for multiple loci.
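The mean-zero property can be checked numerically. The sketch below assumes one coding that is consistent with the description but is not reproduced in the text here: the allele indicator is x = p2 for allele A1 and x = -p1 for allele A2 (with p1 + p2 = 1), the additive variable is w = x1 + x2, and the dominance variable is v = -2 x1 x2. The function names are hypothetical. At p1 = 1/2 this coding reduces to the F2 values w = (1, 0, -1) and v = (-1/2, 1/2, -1/2).

```python
from fractions import Fraction

def g2a_design(p1):
    """Return {genotype: (w, v)} under the assumed G2A coding for frequency p1 of A1."""
    p1 = Fraction(p1)
    p2 = 1 - p1
    x = {"A1": p2, "A2": -p1}  # assumed mean-zero allele indicator
    design = {}
    for first, second in (("A1", "A1"), ("A1", "A2"), ("A2", "A2")):
        w = x[first] + x[second]        # additive variable: sum over the two alleles
        v = -2 * x[first] * x[second]   # dominance variable: proportional to x1 * x2
        design[first + second] = (w, v)
    return design

def hwe_freqs(p1):
    """Genotype frequencies under Hardy-Weinberg equilibrium."""
    p1 = Fraction(p1)
    p2 = 1 - p1
    return {"A1A1": p1 * p1, "A1A2": 2 * p1 * p2, "A2A2": p2 * p2}

def moments(p1):
    """Return (E[w], E[v], E[w*v]) under Hardy-Weinberg equilibrium."""
    design, freqs = g2a_design(p1), hwe_freqs(p1)
    mean_w = sum(freqs[g] * design[g][0] for g in freqs)
    mean_v = sum(freqs[g] * design[g][1] for g in freqs)
    cross = sum(freqs[g] * design[g][0] * design[g][1] for g in freqs)
    return mean_w, mean_v, cross

# All three moments vanish for any allele frequency, not only p1 = 1/2.
checks = [moments(Fraction(k, 10)) for k in range(1, 10)]
```

Exact rational arithmetic is used so that the zeros are exact rather than approximate. The vanishing cross moment E[w v] is what makes the additive and dominance terms orthogonal under this coding.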

For two loci, the corresponding matrices are simply the direct products of the matrices for loci A and B in Equations (19) and (20), with some rearrangement of the columns and rows. Traditionally, a in this model is called the average effect: the allelic substitution effect averaged, by allelic frequencies, over the different genotypes.
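The direct-product construction can be illustrated with a small sketch. It assumes the single-locus coding w = (2 p2, p2 - p1, -2 p1) and v = (-2 p2^2, 2 p1 p2, -2 p1^2) for genotypes A1A1, A1A2, A2A2, which is an assumption consistent with the description here rather than a copy of the original matrices. The helper names are hypothetical, and the column order produced by the Kronecker product may differ from the original by a rearrangement.

```python
from fractions import Fraction

def locus_matrix(p1):
    """3x3 single-locus design matrix: rows A1A1, A1A2, A2A2; columns 1, w, v."""
    p1 = Fraction(p1)
    p2 = 1 - p1
    return [[1, 2 * p2, -2 * p2 ** 2],
            [1, p2 - p1, 2 * p1 * p2],
            [1, -2 * p1, -2 * p1 ** 2]]

def hwe(p1):
    """Hardy-Weinberg genotype frequencies in the same row order."""
    p1 = Fraction(p1)
    p2 = 1 - p1
    return [p1 ** 2, 2 * p1 * p2, p2 ** 2]

def kron(A, B):
    """Direct (Kronecker) product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def weighted_gram(D, freqs):
    """Frequency-weighted cross-product matrix of the design columns."""
    n = len(D[0])
    return [[sum(f * row[j] * row[k] for f, row in zip(freqs, D))
             for k in range(n)] for j in range(n)]

pA, pB = Fraction(3, 10), Fraction(3, 5)          # arbitrary allele frequencies
two_locus = kron(locus_matrix(pA), locus_matrix(pB))  # 9 genotypes x 9 effects
# Hardy-Weinberg and linkage equilibrium: two-locus frequencies are products.
freqs = [fa * fb for fa in hwe(pA) for fb in hwe(pB)]
gram = weighted_gram(two_locus, freqs)
off_diagonal = [gram[j][k] for j in range(9) for k in range(9) if j != k]
```

Under these equilibrium assumptions every off-diagonal entry of `gram` is zero, so the nine columns (mean, additive, dominance, and the four epistatic products) are mutually orthogonal.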

## Estimating genetic kin relationships in prehistoric populations

Genetically, a major advantage is that the partition of genetic effects is directly related to the partition of the genetic variance. In an equilibrium population (in Hardy-Weinberg and linkage equilibrium), the additive effects contribute to the additive variance, the dominance effects contribute to the dominance variance, etc.

There is no covariance between the genetic effects, due to the orthogonal property of the model. This orthogonal property is also convenient for statistical tests and estimation of QTL effects, as the effects can be tested and estimated separately, although simultaneous estimation will always perform better statistically.
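For a single locus, the link between the partition of effects and the partition of variance can be worked through numerically. The sketch below assumes the coding w = (2 p2, p2 - p1, -2 p1) and v = (-2 p2^2, 2 p1 p2, -2 p1^2), for which the three genotypic values determine the mean, the average effect a and the dominance effect d uniquely. The genotypic values and the function name `partition` are made up for illustration.

```python
from fractions import Fraction

def partition(g11, g12, g22, p1):
    """Solve for (mu, a, d) and split the genetic variance under HWE (assumed coding)."""
    p1 = Fraction(p1)
    p2 = 1 - p1
    values = (Fraction(g11), Fraction(g12), Fraction(g22))
    freqs = (p1 ** 2, 2 * p1 * p2, p2 ** 2)
    mu = sum(f * g for f, g in zip(freqs, values))   # w and v have mean zero
    d = values[1] - (values[0] + values[2]) / 2      # dominance effect
    a = (values[0] - values[2]) / 2 + d * (p2 - p1)  # average (substitution) effect
    var_total = sum(f * (g - mu) ** 2 for f, g in zip(freqs, values))
    var_add = 2 * p1 * p2 * a ** 2                   # a^2 * Var(w)
    var_dom = 4 * p1 ** 2 * p2 ** 2 * d ** 2         # d^2 * Var(v)
    return {"mu": mu, "a": a, "d": d,
            "var_total": var_total, "var_add": var_add, "var_dom": var_dom}

result = partition(10, 8, 2, Fraction(3, 10))
# Rebuild one genotypic value from the effects as a consistency check.
p1, p2 = Fraction(3, 10), Fraction(7, 10)
rebuilt_g11 = result["mu"] + 2 * p2 * result["a"] - 2 * p2 ** 2 * result["d"]
```

Because w and v are uncorrelated under Hardy-Weinberg equilibrium, the additive and dominance components add up to the total genetic variance with no covariance term. In this sketch d does not depend on p1, whereas a does.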

Hardy-Weinberg and linkage disequilibria change neither the definitions nor the statistical estimation of the genetic effects with respect to the loci defined in a full model. In the above discussion for two loci with nine genotypic values and nine parameters, given a genetic-effect design matrix, there is a unique solution for the parameter values in terms of the genotypic values.

In the next section, we give a numerical example of three loci to show that the genetic effects for each model are the same for different configurations of allele frequencies and linkage disequilibrium in the full model, but not necessarily in a reduced model.

In the Appendix, we show this for the relatively simple case of a haploid model with two loci. Disequilibrium will introduce genetic covariance between different effects.

Since the genetic effects estimated in a disequilibrium population in the full model are the same as those in the equilibrium population for the loci concerned (if the loci are not in disequilibrium with other loci), the additive, dominance, and epistatic variances estimated in a disequilibrium population are still the same as those in the equilibrium population.

There are, however, covariances between different genetic effects due to disequilibrium. Moreover, disequilibria will change the definition and estimation of genetic effects in a reduced model.
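The contrast between full and reduced models can be illustrated with a haploid two-locus sketch. It assumes mean-zero indicators (1 - p for the first allele, -p for the second) at each locus, a full model with a mean, two main effects and one interaction, and a reduced model that regresses the genotypic value on locus A alone. The haplotype values, the disequilibrium coefficient D and the function names are made up for illustration.

```python
from fractions import Fraction

def full_model_effects(G, pA, pB):
    """Effects (aA, aB, i) solving G = mu + aA*xA + aB*xB + i*xA*xB exactly.

    Four haplotype values and four parameters: the solution depends on the
    values and the allele frequencies, but D never enters.
    """
    i = G["AB"] - G["Ab"] - G["aB"] + G["ab"]
    aA = pB * (G["AB"] - G["aB"]) + (1 - pB) * (G["Ab"] - G["ab"])
    aB = pA * (G["AB"] - G["Ab"]) + (1 - pA) * (G["aB"] - G["ab"])
    return aA, aB, i

def haplotype_freqs(pA, pB, D):
    """Haplotype frequencies with linkage disequilibrium coefficient D."""
    return {"AB": pA * pB + D, "Ab": pA * (1 - pB) - D,
            "aB": (1 - pA) * pB - D, "ab": (1 - pA) * (1 - pB) + D}

def reduced_effect_A(G, pA, pB, D):
    """Slope of G on xA alone: difference between the two allele-class means."""
    f = haplotype_freqs(pA, pB, D)
    mean_A = (f["AB"] * G["AB"] + f["Ab"] * G["Ab"]) / pA
    mean_a = (f["aB"] * G["aB"] + f["ab"] * G["ab"]) / (1 - pA)
    return mean_A - mean_a

G = {"AB": 4, "Ab": 2, "aB": 1, "ab": 0}   # made-up haplotype values
pA = pB = Fraction(1, 2)
aA_full, aB_full, i_full = full_model_effects(G, pA, pB)
aA_reduced_le = reduced_effect_A(G, pA, pB, Fraction(0))      # linkage equilibrium
aA_reduced_ld = reduced_effect_A(G, pA, pB, Fraction(1, 10))  # linkage disequilibrium
```

With D = 0 the reduced-model effect of locus A coincides with the full-model effect. With D different from zero it absorbs part of the effect of locus B, so the reduced-model definition shifts while the full-model one does not.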