The purpose of this vignette is to show how to work with pedigrees and marker data in pedtools.
To get started, install the current CRAN version of the package:
Alternatively, you may want the latest development version from GitHub:
Now you should be able to load pedtools.
In pedtools and the ped suite packages, pedigrees
are stored as
ped objects. We start by explaining briefly
what these objects look like, and their basic constructor. If you are
reading this vignette simply to learn how to create a particular
pedigree, you may want to skip ahead to section 1.3 where we describe
practical shortcuts to common pedigree structures.
The constructor function
The most direct way to create a pedigree in pedtools is with the
ped() constructor. This takes as input 4 vectors
of equal length:
id: individual ID labels (numeric or character)
fid: id of the fathers (0 if not included)
mid: id of the mothers (0 if not included)
sex: gender codes, with entries 0 (unknown), 1 (male) or 2 (female)
In other words, the j’th pedigree member has label id[j], father fid[j], mother
mid[j], and gender given by sex[j].
For example, the following creates a family trio, i.e. father, mother and child:
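A minimal sketch of such a call, assuming pedtools is installed; the variable name x is only for illustration:

```r
library(pedtools)

# Trio: 1 = father, 2 = mother, 3 = daughter (sex code 2)
x = ped(id  = 1:3,
        fid = c(0, 0, 1),   # fathers; 0 means not included
        mid = c(0, 0, 2),   # mothers; 0 means not included
        sex = c(1, 2, 2))
x
```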
In this example the child (
id=3) is female, since the
associated entry in
sex is 2. Note that missing parents are
printed as *. Individuals without parents are called
founders of the pedigree, while the nonfounders have
both parents specified. It is not allowed to specify exactly one parent.
Instead of numerical labels as above, we could have used character
strings. Let us create the trio again, with more informative labels, and
store it in a variable named trio.
The special strings "0", "" and
NA are all interpreted as a missing parent.
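A sketch of the labelled trio, using the labels "fa", "mo" and "girl" that appear later in this vignette, and writing missing parents as "0":

```r
library(pedtools)

# Same trio as before, now with character labels
trio = ped(id  = c("fa", "mo", "girl"),
           fid = c("0", "0", "fa"),
           mid = c("0", "0", "mo"),
           sex = c(1, 2, 2))
trio
```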
How pedigrees are stored internally
From the way it is printed, the object
trio appears to be a
data frame, but this is not exactly true. Rather it is an object of
class ped, which is basically a list. We can see the actual
content of trio by unclassing it:
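For instance (a sketch only; the exact set of list entries may differ between package versions):

```r
library(pedtools)

trio = ped(id  = c("fa", "mo", "girl"),
           fid = c("0", "0", "fa"),
           mid = c("0", "0", "mo"),
           sex = c(1, 2, 2))

# Strip the class attribute to reveal the underlying list
trio_raw = unclass(trio)
str(trio_raw)
```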
In most cases it is not recommended for regular users to interact
directly with the internal slots of a
ped, since this can
have unfortunate consequences unless you know exactly what you are
doing. Instead, one should use accessor functions like
founderInbreeding(). The most important accessors are
described within this vignette, while others are documented in the package help pages.
To plot a pedigree, simply use plot().
Under the hood,
pedtools::plot() imports the excellent
alignment algorithm from the kinship2 package. The
plotting itself is done within pedtools, which has some
advantages over kinship2, including the ability to plot
singletons. A wealth of plot options are available. An overview can be
found in the documentation
?plot.ped, but a quick example
should get you started:
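The following sketch shows a plain plot followed by one with a few options. The options chosen here (hatched, deceased, title) are just an illustration; ?plot.ped remains the authoritative list.

```r
library(pedtools)

trio = nuclearPed(father = "fa", mother = "mo", children = "girl", sex = 2)

# Default plot
plot(trio)

# Same pedigree with a hatched symbol, a deceased marker and a title
plot(trio, hatched = "girl", deceased = "fa", title = "Trio")
```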
See Section 2.2 for how to add marker genotypes to pedigree plots and control their appearance.
Rather than using the
ped() function directly, it is
usually quicker and safer to build pedigrees step by step, applying the
arsenal of utility functions offered by pedtools. A
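typical workflow is as follows:
Choose one of the basic pedigree structures as a starting point.
Add or remove individuals as needed.
Modify attributes such as genders and labels.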
You will find several examples below, but first let us list the available tools for each of the 3 steps.
The following pedigree structures serve as starting points for pedigree constructions. For parameters and details, see ?ped_basic.
singleton(), a pedigree consisting of a single individual
nuclearPed(), a nuclear pedigree (parents+children)
halfSibPed(), two sibships with one parent in common
linearPed(), a straight line of successors
avuncularPed(), uncle-nephew and similar relationships
cousinPed(), cousins of specified degree/removal
halfCousinPed(), half cousins of specified degree/removal
ancestralPed(), a family tree containing the ancestors of a single person
There are also more specialized structures, including double cousins,
selfing pedigrees, and consecutive matings between full siblings. Look
them up in
?ped_complex if you are interested.
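To give a feel for these starting points, here is a small sketch calling several of them with simple arguments. The variable names and the particular argument values are chosen for illustration only.

```r
library(pedtools)

x1 = singleton("a")       # a single individual labelled "a"
x2 = nuclearPed(nch = 2)  # two parents and two children
x3 = halfSibPed()         # two sibships with one parent in common
x4 = linearPed(2)         # a straight line of descent over two generations
x5 = cousinPed(1)         # first cousins
x6 = ancestralPed(2)      # a person with parents and grandparents

# Number of members in each pedigree
sizes = sapply(list(x1, x2, x3, x4, x5, x6), pedsize)
sizes
```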
The functions below are used to modify an existing ped
object by adding/removing individuals, or extracting a sub-pedigree.
addChildren(), with special cases addSon() and addDaughter()
Edit labels and attributes
The following functions modify various attributes of a ped object. See
?ped_modify for parameters and details.
As our first example we will recreate the trio pedigree
without using the
ped() constructor. To give a hint of the
flexibility, we show 3 alternative ways to code this.
The obvious starting point is nuclearPed(), with
nch = 1 to indicate 1 child. By default, this creates a
trio with numeric labels (father=1; mother=2; child=3) and a male child.
Hence we fix the gender with
swapSex(), and edit the labels with relabel().
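These steps can be sketched as follows (the name trio2 is illustrative):

```r
library(pedtools)

trio2 = nuclearPed(nch = 1)                          # labels 1, 2, 3; male child
trio2 = swapSex(trio2, ids = 3)                      # make the child female
trio2 = relabel(trio2, new = c("fa", "mo", "girl"))  # new labels, in the current order
trio2
```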
Pedigree building with pedtools works very well with
R’s pipe operator
|>. For example, the above commands
could be written in one chain like this:
Even quicker than the pipe version is the following one-liner, where all details are specified directly in the call to nuclearPed().
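For comparison, here is a sketch of both the piped chain and the one-liner. The native pipe requires R 4.1 or later; the names trio2 and trio3 are illustrative.

```r
library(pedtools)

# Piped version of the step-by-step construction
trio2 = nuclearPed(nch = 1) |>
  swapSex(ids = 3) |>
  relabel(new = c("fa", "mo", "girl"))

# One-liner: labels and the child's sex given directly
trio3 = nuclearPed(father = "fa", mother = "mo", children = "girl", sex = 2)

# Check whether the two constructions agree
identical(trio2, trio3)
```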
Here is another possibility. We start by creating the father as a singleton, and then add the daughter:
addDaughter() automatically created the mother
as “NN_1”, so we needed to relabel her.
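A sketch of this approach is given below. Since the label given to the automatically created mother may depend on the package version, the sketch looks it up with mother() instead of hard-coding it; this lookup is our own precaution, not a required step.

```r
library(pedtools)

# Start with the father alone, then add a daughter named "girl"
trio4 = singleton("fa") |> addDaughter("fa", "girl")

# The mother was created automatically; find her label and rename her
auto_label = mother(trio4, "girl")
trio4 = relabel(trio4, new = "mo", old = auto_label)
trio4
```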
This time we will create this inbred family:
One approach is to first create the pedigree consisting of individuals 1-6, with
halfSibPed(), and then use addChildren()
to add the inbred child.
We could also view the half siblings 4 and 5 as half cousins of degree 0. The
halfCousinPed() function accepts an option
child = TRUE adding an inbred child. The labels will be
different with this approach, so we need to relabel at the end.
We can see that the two alternatives produce the same result:
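The two alternatives might be coded as in the sketch below. The arguments sex1, sex2 and the relabelling scheme "asPlot" are our assumptions about how to match the figure; the comparison is left for the reader to run.

```r
library(pedtools)

# Alternative 1: half siblings of opposite sex, then a child of 4 and 5
x1 = halfSibPed(sex1 = 1, sex2 = 2) |>
  addChildren(father = 4, mother = 5, nch = 1)

# Alternative 2: half cousins of degree 0 with a child, relabelled in plot order
x2 = halfCousinPed(0, child = TRUE) |>
  relabel("asPlot")

# Compare the two constructions
identical(x1, x2)
```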
Here we consider the family tree below, extending both upwards and
downwards from a single person. We will use this example to demonstrate
the mergePed() function, which “glues” together two
pedigrees by the indicated members.
Note the argument
by = c("7" = "2"), which means that
individual “7” in
x should be identified with “2” in
y. As seen in the plot below, this individual ends up as
“8” after relabelling in the final result.
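A sketch of such a merge is shown below. The choice of ancestralPed() and halfCousinPed() for the two parts is an assumption made for illustration; only the by argument and the relabelling are taken from the description above.

```r
library(pedtools)

# Upper part: a person ("7") with parents and grandparents
x = ancestralPed(g = 2)

# Lower part: a half cousin pedigree in which "2" is a founder
y = halfCousinPed(degree = 1)

# Glue the parts together: "7" in x and "2" in y are the same person
z = mergePed(x, y, by = c("7" = "2"), relabel = TRUE)
plot(z)
```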
Many situations call for selecting all pedigree members sharing some property, e.g., all females, or all descendants of some person. Several utilities in pedtools exist to help with such tasks. Generally these come in two flavours: 1) members with a certain global property, and 2) members with a certain relationship to a given individual.
Pedigree members with a certain property
Each of the following functions returns a vector specifying the members with the given property.
By default, the output of these functions is a character vector
containing ID labels. However, adding the option
internal = TRUE will give you an integer vector instead,
reporting the internal indices of the members. This is frequently used
in the source code of pedtools, but is usually not
intended for end users of the package.
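As a short illustration, here are a few such functions applied to a nuclear family with a son and a daughter (a sketch; the example pedigree is our own choice):

```r
library(pedtools)

# 1 = father, 2 = mother, 3 = son, 4 = daughter
x = nuclearPed(nch = 2, sex = c(1, 2))

founders(x)     # members without parents
nonfounders(x)  # members with parents
leaves(x)       # members without children
males(x)
females(x)

# Internal indices instead of labels
founders(x, internal = TRUE)
```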
Relatives of a given individual
The functions below take as input a
ped object and the
label of a single member. They return a vector of all members with the
given relation to that individual.
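A few examples of this kind, again on a small nuclear family chosen for illustration:

```r
library(pedtools)

# 1 = father, 2 = mother, 3 = son, 4 = daughter
x = nuclearPed(nch = 2, sex = c(1, 2))

father(x, 3)       # father of individual 3
mother(x, 3)       # mother of individual 3
children(x, 1)     # children of individual 1
siblings(x, 3)     # siblings of individual 3
spouses(x, 1)      # spouses of individual 1
descendants(x, 1)  # all descendants of individual 1
```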
The other main theme of the pedtools package (pedigrees being the first) is marker genotypes.
Marker objects are created with the
marker() function. For
example, the following command makes an empty marker associated with the trio pedigree:
As shown in the output, the marker is indeed empty: All pedigree
members have missing genotypes, and there is no assigned name or
position. By default, markers are diallelic, with alleles 1 and 2, with
equal frequencies. For a more interesting example, let us make a SNP
named “snp1”, with alleles “A” and “B”. The father is homozygous “A/A”,
while the mother is heterozygous. We store it in a variable
m1 for later use.
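Both markers can be sketched as follows. The trio is rebuilt here so that the block runs on its own, and writing the genotypes as strings of the form "A/B" is our choice of notation.

```r
library(pedtools)

trio = nuclearPed(father = "fa", mother = "mo", children = "girl", sex = 2)

# An empty marker: all genotypes missing, default alleles
marker(trio)

# A SNP named "snp1": father A/A, mother A/B, child untyped
m1 = marker(trio, fa = "A/A", mo = "A/B", name = "snp1")
m1
```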
This illustrates several points. Firstly, individual genotypes may be specified using the ID labels. Secondly, the different alleles occurring in the genotypes are interpreted as the complete set of alleles for the marker. Finally, these are assigned equal frequencies. Of course, this behaviour can be overridden by declaring allele frequencies explicitly:
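For instance, the sketch below declares three alleles with unequal frequencies, one of which does not occur in the genotypes. The frequency values are made up for illustration.

```r
library(pedtools)

trio = nuclearPed(father = "fa", mother = "mo", children = "girl", sex = 2)

# Named frequency vector: the names define the complete allele set
m = marker(trio, fa = "A/A", mo = "A/B",
           afreq = c(A = 0.2, B = 0.3, C = 0.5))
afreq(m)
```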
The marker’s chromosome can be declared using the chrom
argument, and similarly its position by posMb.
Markers with unknown chromosome are treated as autosomal. To define an
X-linked marker, put
chrom = "X". The fact that males are
hemizygous on X (i.e. they have only one allele) is reflected in the
printout of such markers:
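A sketch of an X-linked marker is given below. The position value is invented for illustration, and the name mX is our own.

```r
library(pedtools)

trio = nuclearPed(father = "fa", mother = "mo", children = "girl", sex = 2)

# X-linked marker with an illustrative position (in megabases)
mX = marker(trio, fa = "A/A", mo = "A/B", chrom = "X", posMb = 123)
mX
```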
A side note: It may come as a surprise that you don’t need quotes
around the ID labels (which are characters!) in the above commands. This
is because marker() uses non-standard evaluation
(NSE), a peculiarity of the R language which often leads to less
typing and more readable code (see the footnote at the end of this vignette). Unfortunately, this doesn’t work with
numerical ID labels. Thus to assign a genotype to someone labelled “1”
you need quotes, as in
marker(trio, "1" = "A/A").
Including marker data in a pedigree plot is straightforward:
The appearance of the genotypes can be tweaked in various ways, as
described in ?plot.ped. Here’s an example:
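A sketch of both a plain genotype plot and a tweaked one follows. The options showEmpty, missing and sep are used here as we understand them from ?plot.ped; consult that page if they behave differently in your version.

```r
library(pedtools)

trio = nuclearPed(father = "fa", mother = "mo", children = "girl", sex = 2)
m1 = marker(trio, fa = "A/A", mo = "A/B", name = "snp1")

# Genotypes printed below the pedigree symbols
plot(trio, marker = m1)

# Show missing genotypes as "?" and drop the allele separator
plot(trio, marker = m1, showEmpty = TRUE, missing = "?", sep = "")
```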
In most applications it is useful to attach markers to their
ped object. In particular for bigger projects with many
markers, this makes it easier to manipulate the dataset as a unit.
To attach a marker object
m (which could be a list of
several markers) to a pedigree
x, there are two main
options: setMarkers() and addMarkers().
The difference between these is that setMarkers()
replaces all existing markers, while addMarkers() appends
m to the existing ones. In our
case the two are equivalent since there are no existing markers.
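The two options can be sketched like this (the names x_set and x_add are illustrative):

```r
library(pedtools)

trio = nuclearPed(father = "fa", mother = "mo", children = "girl", sex = 2)
m1 = marker(trio, fa = "A/A", mo = "A/B", name = "snp1")

x_set = setMarkers(trio, m1)  # replaces any existing markers
x_add = addMarkers(trio, m1)  # appends to any existing markers

# With no markers attached beforehand, both contain one marker
nMarkers(x_set)
nMarkers(x_add)
```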
There is a handy shortcut,
addMarker() (without the
‘s’), allowing you to create and attach a single marker in one go. The
command addMarker(x, ...) is essentially equivalent to
addMarkers(x, marker(x, ...)). It is also well adapted to
piping, as in this example:
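A sketch of such a chain, attaching the SNP from before and an X-linked marker; the genotype given for the second marker is invented for illustration.

```r
library(pedtools)

trio = nuclearPed(father = "fa", mother = "mo", children = "girl", sex = 2)

# Create and attach two markers in one chain
trio = trio |>
  addMarker(fa = "A/A", mo = "A/B", name = "snp1") |>
  addMarker(girl = "1/2", chrom = "X")

nMarkers(trio)
```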
Selecting and removing attached markers
Four closely related functions are useful for manipulating markers attached to a pedigree:
selectMarkers(), returns a
ped object where only the indicated markers are retained
removeMarkers(), returns a
ped object where the indicated markers are removed
getMarkers(), returns a list of the indicated markers
whichMarkers(), returns the indices of the indicated markers
All of these have exactly the same arguments, described in more
detail in ?marker_select. Let us do a couple of examples
here. Recall that by now, our
trio has two attached
markers; the first is called “snp1”, and the other is on the X chromosome.
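The sketch below applies the four functions to such a trio, selecting by marker name. The pedigree and its two markers are rebuilt so the block runs on its own.

```r
library(pedtools)

trio = nuclearPed(father = "fa", mother = "mo", children = "girl", sex = 2) |>
  addMarker(fa = "A/A", mo = "A/B", name = "snp1") |>
  addMarker(girl = "1/2", chrom = "X")

whichMarkers(trio, markers = "snp1")             # index of "snp1"
only_snp1 = selectMarkers(trio, markers = "snp1")  # keep only "snp1"
no_snp1 = removeMarkers(trio, markers = "snp1")    # drop "snp1"
mlist = getMarkers(trio, markers = "snp1")         # list of marker objects
```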
Internally, a marker object is stored as a matrix with two columns (one for each allele) and one row for each pedigree member. The matrix is numeric (for computational convenience) while the allele labels and other meta information are added as attributes. The most important of these are:
alleles: The allele labels, stored as a character vector.
afreq: The allele frequencies, in the same order as the alleles. An error is issued if the frequencies do not sum to 1 after rounding to 3 decimals.
name: The marker name, which can be any character string not consisting solely of digits.
chrom: The chromosome name. This can be given as an integer, but is always converted to character. The special values “23” and “X” are recognized as the human X chromosome, which affects the way genotypes are printed.
posMb: Chromosomal position given in megabases.
In addition to those listed above, there are two more attributes:
pedmembers and sex. They store the ID labels
and genders of the pedigree associated with the marker, and are only
used to empower the printing method of marker objects.
Marker accessor functions
For each marker attribute listed above, there is a corresponding function with the same name for retrieving its content. These functions take as input either a
marker object, or a ped
object together with the name (or index) of an attached marker. This may
sound a bit confusing, but a few examples will make it clear!
Recall that our marker “snp1” exists in two copies: One is stored in
m1, while the other is attached to
trio. In both cases we can extract the allele frequencies
with the function afreq():
We can also modify the frequencies using this syntax. To
avoid confusion about the allele order, the frequencies must be named
with the allele labels (just as in the output of afreq()):
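Getting and setting frequencies might look as follows. This is a sketch; the new frequency values are made up.

```r
library(pedtools)

trio = nuclearPed(father = "fa", mother = "mo", children = "girl", sex = 2)
m1 = marker(trio, fa = "A/A", mo = "A/B", name = "snp1")
trio = setMarkers(trio, m1)

# Extract frequencies from the marker object and from the attached copy
afreq(m1)
afreq(trio, marker = "snp1")

# Replace the frequencies of the attached marker, named by allele
afreq(trio, marker = "snp1") = c(A = 0.9, B = 0.1)
afreq(trio, marker = "snp1")
```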
In addition to the functions getting and setting marker attributes,
there is one more important marker accessor, namely
genotype(). This returns the genotype of a specified
individual, and can also be used to modify genotypes. Like the others, it
can be applied to marker objects directly, or to pedigrees with attached
markers. Here we show a few examples of the latter type:
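A sketch of reading and setting a genotype on an attached marker is shown below. Giving the new genotype as a vector of two alleles is an assumption about the accepted input; see the help page of genotype() for the forms supported by your version.

```r
library(pedtools)

trio = nuclearPed(father = "fa", mother = "mo", children = "girl", sex = 2) |>
  addMarker(fa = "A/A", mo = "A/B", name = "snp1")

# Read the mother's genotype
genotype(trio, "snp1", id = "mo")

# Assign a genotype to the child, then read it back
genotype(trio, "snp1", id = "girl") = c("A", "B")
genotype(trio, "snp1", id = "girl")
```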
x always denotes a ped object.
|Use …||When you want to …||For example to …|
||extract all alleles as a matrix.||do summary stats on the marker alleles|
||extract allele frequencies as a data.frame in allelic ladder format.||transfer to other objects, or write the database to a file|
||extract a list of marker objects.|
||replace the genotypes of x.||erase all genotypes|
||replace all allele frequencies without changing the genotype data. The input is a data.frame in allelic ladder format.||change the frequency database|
||attach marker objects with or without genotype data. Locus attributes are indicated as a list; genotypes as a matrix or data.frame.||prepare joint manipulation of a pedigree and marker data|
||pretty-print ped objects|
||modify a pedigree with marker data|
||transfer genotypes and attributes between pedigree objects (or lists of such).||transfer simulated marker data|
You may have come across NSE before, for instance when
using subset() on a data.frame. To learn more about NSE, I
recommend this book chapter by Hadley Wickham: