Introduction
Families are social networks of related individuals belonging to
different generations. A social network may be approached from a
holistic perspective, which concentrates on network
characteristics, or an egocentric perspective, which
concentrates on network ties of a reference person or focal individual,
called ego. Ego may be a member of the oldest generation, the
youngest generation, or an intermediate generation. That leads to
different perspectives on population. If ego is a member of the oldest
generation, ego’s perspective is that of a parent, a grandparent and a
great-grandparent. If, on the other hand, ego is a member of the
youngest generation, the perspective is that of a child, grandchild and
great-grandchild. Members of an intermediate generation are both parents
and children. Some of these members may go through a stage of life with
old parents and young children and experience the double burden of
caring for parents and children. To study family ties from different
perspectives, simulation is often used. Simulation produces a virtual
population that mimics a real population. The multi-generation virtual
population offers unique opportunities to study aspects of family
demography that did not receive much attention until recently: (a) the
study of the population from a child’s perspective, (b) the demography
of grandparenthood (perspective of the elderly), and (c) the double
burden experienced by the sandwich generation.
The three subjects listed are receiving growing attention in the
demographic literature. The dominant method of analysis is
microsimulation and SOCSIM is the software package of choice. SOCSIM
creates individual life histories and genealogies from age-specific
mortality and fertility rates (Wachter et al.
1997). ‘VirtualPop’, a simulation package in R (Willekens 2022), uses a
similar method to produce individual life histories and genealogies in
multigeneration populations. There is an important difference, however:
simulation proceeds in continuous time. Microsimulation in continuous
time has two important advantages (Willekens
2009). First, no two events can occur in the same interval, which
resolves the issue of event sequence that requires the user to determine
which event comes first and which second. In SOCSIM, the simulation
proceeds month by month and events scheduled in a month are executed in
random order (Wachter et al. 1997, p.
93). Second, the duration between events can be
computed exactly, not approximately.
Microsimulation is often used to study subjects that cannot be
studied easily otherwise. Wolf (1994),
Arpino et al. (2018), Margolis (2016), Margolis
and Verdery (2019) and others adopt the perspective of the
elderly, and raise issues such as the presence of children and
grandchildren, the number of grandchildren, the timing of
grandparenthood, the duration of grandparenthood, and the age of
grandchildren at a particular moment in a grandparent’s life. A second
subject is population structure and demographic change from the
perspective of children. A child interprets demography as the presence
of parents, grandparents, siblings and peers. The perspective of
children is rarely adopted in demography (Mills
and Rahal 2021, p. 20). Lam and Marteleto
(2008) and Verdery (2015) adopt the
perspective of a child to interpret demographic change. Using survey
data, Bumpass and Lu (2000) study the
children’s family contexts in the United States. The third subject is
the double burden of child care and elderly care. The double burden was
recently studied by Alburez-Gutierrez et al.
(2021). The authors used mortality and fertility rates of 198
countries and territories included in the 2019 Revision of the UNWPP and
the period 1950-2100 to estimate the time spent caring for a child (and
grandchild) and an elderly parent. The output is a complete kinship
network for the 1970–2040 birth cohorts from which it is possible to
determine the time each simulated individual spent sandwiched as a
parent and grandparent. The authors conclude that “Demographic
microsimulation allowed us to overcome the lack of international and
comparable data on past, present, and future kinship structures.”
The purpose of the vignette is to show how to navigate the
multi-generation virtual population database to extract demographic
information of direct interest to aspects of family demography listed
above. The desired information is extracted using queries, formulated as
R functions. The code uses subsetting, which is a feature of vectorised
operations. Most of the functions in R are vectorised, which means that
a function operates on all elements of a vector simultaneously.
Vectorised operations avoid the explicit loops that make R slow.
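The difference between vectorised subsetting and an explicit loop can be illustrated with a small sketch. The data below are made up for illustration and are not taken from the virtual population; the object names (sex, gen, id, ids_vec, ids_loop) are assumptions of this sketch.

```r
# Hypothetical toy data: sex and generation of six individuals
sex <- c("Female", "Male", "Female", "Female", "Male", "Female")
gen <- c(1, 1, 2, 1, 2, 3)
id  <- seq_along(sex)

# Vectorised subsetting: the condition is evaluated for all elements at once
ids_vec <- id[gen == 1 & sex == "Female"]

# Equivalent explicit loop, shown only for comparison
ids_loop <- integer(0)
for (i in id) {
  if (gen[i] == 1 && sex[i] == "Female") ids_loop <- c(ids_loop, i)
}

# Both approaches select the same individuals
identical(ids_vec, ids_loop)
```

The queries in this vignette follow the first pattern: a logical condition selects the IDs of interest in a single expression.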
This document consists of eight sections in addition to the
introduction. The next section is a very brief introduction to
‘VirtualPop’. Section three presents six basic queries. They are
sufficient to retrieve family ties and produce kinship networks.
Section four combines the basic functions in queries to obtain
information on the kin of ego: siblings, cousins and aunts. The fifth
section zooms in on grandparenthood. It illustrates how the basic
queries may be used to answer questions about grandparenthood, such as
age at onset and duration. Section six concentrates on the double
burden. It addresses questions such as age at onset and termination, and
the proportion of people who experience a double burden. The last
section presents the fertility table, which is a multistate life table
of fertility histories. The application of the basic functions, alone or
in combination, to extract desired information on families and kinships
in the virtual population may be labeled kinship calculus.
The ‘VirtualPop’ package
The ‘VirtualPop’ package generates a multi-generation virtual
population. You select the number of generations and the size of the
first generation (generation 1). The sizes of the other generations
depend on the fertility rates. For each individual in the virtual
population, the lifespan and the fertility history from birth to death are
generated by sampling age-at-death distributions implicit in the death
rates by age (single years of age) and sex included in the Human
Mortality Database (HMD) and the time-to-event (waiting time)
distributions implicit in the fertility rates by age and birth order.
Rates may be downloaded from the Human Fertility Database (HFD).
‘VirtualPop’ uses conditional fertility rates (in HFD
parlance). They are more widely known as occurrence-exposure rates. For
details, see the tutorial and the other vignettes included in the
‘VirtualPop’ package.
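The idea of sampling an age at death from the distribution implicit in age-specific death rates can be sketched as follows. This is not the ‘VirtualPop’ implementation: the sketch assumes that the death rate is constant within each single year of age, it uses made-up Gompertz-like rates instead of HMD data, and the function name r_age_at_death is hypothetical.

```r
# Hypothetical death rates by single year of age (toy values, not HMD data)
ages <- 0:109
mx <- 0.0002 * exp(0.085 * ages)

# Inverse-transform sampling under a piecewise-constant hazard (assumption)
r_age_at_death <- function(n, mx) {
  H <- c(0, cumsum(mx))         # cumulative hazard at exact ages 0, 1, 2, ...
  target <- rexp(n)             # cumulative hazard at which death occurs
  i <- findInterval(target, H)  # age interval in which the target is reached
  i <- pmin(i, length(mx))      # beyond the last age, keep the last rate
  (i - 1) + (target - H[i]) / mx[i]  # exact (continuous) age at death
}

set.seed(1)
x <- r_age_at_death(10000, mx)
summary(x)
```

Because the age at death is a continuous quantity, the duration between events can be computed exactly, which is the advantage of continuous-time simulation mentioned in the introduction.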
Note that the life histories of members of the virtual population
produced by ‘VirtualPop’ are not projections or forecasts. They are the
life histories and genealogies that would result if the individuals in
successive generations experience the same mortality and fertility. The
demographic indicators computed from the histories convey information on
the actually observed mortality and fertility rates, unless the rates
are programmed to change in time. A virtual population is analogous to a
stable population. In demography, stable population theory is used to
assess the long-term consequences of sets of age-specific fertility and
mortality rates, and rates of transition between living states. In the
long run, life history characteristics and population characteristics
coincide (ergodicity theorem). The same theorem applies to a
multi-generation virtual population that results from a set of
age-specific fertility and mortality rates, and rates of transition
between living states. In the long run, i.e. after a large number of
generations, life history characteristics and population characteristics
coincide.
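The convergence referred to above can be illustrated with a small projection exercise. The sketch uses a hypothetical three-age-group projection (Leslie) matrix with made-up fertility and survival values; it is not part of ‘VirtualPop’ or ‘Families’. Two very different initial populations are projected with the same rates and end up with the same age structure, which is the one given by the dominant eigenvector of the matrix.

```r
# Hypothetical projection matrix: first row fertility, sub-diagonal survival
L <- matrix(c(0.0, 1.2, 0.8,
              0.9, 0.0, 0.0,
              0.0, 0.7, 0.0), nrow = 3, byrow = TRUE)

# Project a population for a number of steps and return its age structure
project <- function(n, L, steps) {
  for (t in seq_len(steps)) n <- L %*% n
  as.vector(n / sum(n))
}

s1 <- project(c(100, 0, 0), L, 200)    # initial population: young only
s2 <- project(c(10, 50, 200), L, 200)  # initial population: mostly old

# Stable age structure from the dominant eigenvector
v <- Re(eigen(L)$vectors[, 1])
stable <- v / sum(v)

round(rbind(s1, s2, stable), 4)
```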
The fertility and mortality rates of the United States in the
calendar year 2019 and the virtual population generated by the
‘VirtualPop’ from these rates are included in the data folder of the
‘Families’ package. To list the datasets in the package, type \(data(package = 'Families')\). To
attach the package and to load the data, the following code is used:
# Attach the package to the search path
library (Families)
# Load the data in your workspace (environment:R_GlobalEnv)
data(dataLH_F,package = 'Families')
data(rates,package = 'Families')
dLH <- dataLH_F
# Number of generations in the dataset
ngen <- max(unique (dLH$gen))
Note that the virtual population in \(dLH\) differs from the data file
distributed with ‘VirtualPop’ due to randomness. The virtual population
is produced from the same rates, but with a different random seed. The
virtual population consists of 2965 individuals belonging to 4
generations. The following table shows the number of individuals by
generation and sex:
addmargins(table (Generation=dLH$gen,Sex=dLH$sex))
#>           Sex
#> Generation Male Female  Sum
#>        1    535    465 1000
#>        2    416    368  784
#>        3    324    343  667
#>        4    266    248  514
#>        Sum 1541   1424 2965
Basic functions to navigate the virtual population database
The key to retrieve information on a reference person or multiple
reference persons is the individual identification number (ID). The ID
of a reference person is denoted by \(IDego\) or \(idego\). Since R treats a scalar as a
vector, a request for information on a single reference person or on a
group of reference persons is treated similarly. \(IDego\) (and \(idego\)) is a vector of reference persons.
In case of a single reference person, the vector consists of a single
element. To retrieve data on an individual’s social network (kinship
network), the ‘Families’ package includes four basic functions to
navigate the virtual population database. They retrieve the IDs of
mother, father, partner and children. A combination of basic functions
retrieves the IDs of siblings, aunts, uncles and cousins. In addition to
these four functions, two basic functions return dates. In
multi-generation populations, working with dates is more convenient than
working with ages. The six functions are:
- \(IDmother(idego)\) retrieves the ID of ego’s mother.
- \(IDfather(idego)\) retrieves the ID of ego’s father.
- \(IDpartner(idego)\) retrieves the ID of ego’s partner.
- \(IDch(idego)\) retrieves the IDs of ego’s children.
- \(Db(idego)\) retrieves the decimal date of birth.
- \(Dd(idego)\) retrieves the decimal date of death.
Ego’s age at an event is the date of the event minus the date of
birth. For instance, the age at death is the date of death minus the
date of birth: \(Dd(idego)-Db(idego)\).
To retrieve information on kin other than members of the nuclear
family, the basic functions are combined. For instance, the function
call \(IDmother(IDmother(idego))\)
gives the ID of ego’s grandmother and \(IDmother(IDmother(IDmother(idego)))\)
retrieves the ID of ego’s great-grandmother. If an individual for which
information is requested is not included in the database, the missing
value indicator NA is returned. The query \(IDfather(IDmother(idego))\) returns the ID
of ego’s maternal grandfather and \(IDfather(IDfather(idego))\) returns the ID
of the paternal grandfather. The function call \(IDch(IDch(idego))\) returns the IDs of
ego’s grandchildren, and \(IDch(IDch(IDch(idego)))\) the IDs of the
great-grandchildren.
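The way nested calls walk up the family tree can be sketched with a toy database. The data frame toy and the functions toy_mother, toy_Db and toy_Dd below are hypothetical stand-ins written for this illustration; they only mimic the described behaviour of the package functions and are not the package code. The sketch assumes, as in the virtual population, that the record of the individual with ID i is stored in row i.

```r
# Toy database: row i holds the record of the individual with ID i
toy <- data.frame(
  ID       = 1:5,
  IDmother = c(NA, 1, 1, 2, 4),
  bdated   = c(1950.3, 1975.8, 1978.1, 2001.5, 2026.2),
  ddated   = c(2030.1, 2058.4, 2060.0, 2085.9, 2110.7)
)

# Stand-ins for IDmother(), Db() and Dd() (assumed behaviour)
toy_mother <- function(id) toy$IDmother[id]
toy_Db     <- function(id) toy$bdated[id]
toy_Dd     <- function(id) toy$ddated[id]

# Maternal grandmother of individuals 2 and 5:
# NA for individual 2 (grandmother not in the database), 2 for individual 5
gm <- toy_mother(toy_mother(c(2, 5)))
gm

# Age at death: date of death minus date of birth
age_death <- toy_Dd(toy$ID) - toy_Db(toy$ID)
round(age_death, 1)
```

Because the functions accept vectors, the same call serves a single reference person and a group of reference persons.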
The names of the basic functions may not be unique to ‘Families’.
Other packages included in the Comprehensive R Archive Network (CRAN)
may have functions by the same name. CRAN has about 20 thousand
contributed packages. To ensure that \(IDmother\) and the other basic functions
attached during the computations are from ‘Families’ and not from
another package in the archive, the double colon operator is used. The
second line of the code chunk that follows ensures that \(IDmother\) of the package ‘Families’ is
attached and not a function by the same name in another package.
In the remainder of this section, information is retrieved on
mothers, grandmothers, children and grandchildren.
Mothers and grandmothers
By way of illustration, let’s retrieve the data on three individuals,
their mother and maternal grandmother. Before doing that, the functions
of the ‘Families’ package, which are in the package environment, are
copied to the workspace, which is the global environment.
# Create local copies of the functions (in workspace)
IDmother <- Families::IDmother
IDfather <- Families::IDfather
IDpartner <- Families::IDpartner
IDch <- Families::IDch
Db <- Families::Db
Dd <- Families::Dd
The individuals are selected at random from members of the third
generation. The retrieval of information on mother and grandmother
requires the retrieval of (a) the IDs of mother and grandmother, and (b)
the individual records of these persons. The code is:
base::set.seed(30)
idego <- sample (dLH$ID[dLH$gen==3],3)
z <- dLH[c(idego,IDmother(idego),IDmother(IDmother(idego))),]
rownames(z) <- NULL
z[,1:9]
The three individuals selected have IDs 2114, 2037, 2090. The first
three records contain data on the reference persons. Records 4-6 have
data on the mothers of the reference persons and records 7-9 have data
on the grandmothers. For each individual, the table includes ID of the
individual, generation, sex, date of birth, date of death and age at
death. In addition, it includes the ID of the partner, and the IDs of
the mother and father (if their generation is included in the virtual
population). Individual records also include the birth order of the
individual and data on her or his children: the number of children
(nch), the IDs of the children and the ages of the mother at
childbearing. In this R chunk, the information is limited to the first
3 children:
z[,c(10:14,21:23)]
The second individual has 2 children. The IDs of the children are
2462 and 2463, and the ages of the mother at the births of the children
are 29.43 and 33.92 years.
To list the IDs of egos, their mothers and grandmothers, the function
call is:
IDmother(IDmother(idego,keep_ego=TRUE))
with IDgm the ID of the grandmother of ego. The second argument of
the function instructs the function to keep the IDs of ego and ego’s
mother. For an explanation of the code, see the description of the
function \(IDmother()\) (by typing
?IDmother into the console).
Children and grandchildren
The children and grandchildren of the individual with ID 2 are retrieved
as follows. In the output shown here, the individual has one child
(ID 1663) and no grandchildren:
# Children
IDch(id=2)
#> [1] 1663
# Grandchildren
IDch(IDch(id=2))
#> numeric(0)
Consider all women in generation 1 (465 women) and let \(idego\) denote the vector of IDs of the
women. The IDs of women with children are:
# Select all females in generation 1
idego <- dLH$ID[dLH$gen==1 & dLH$sex=="Female"]
# IDs of children
idch <- IDch(idego)
# IDs of mother with children
idm <- unique(IDmother(idch))
with idm the IDs of women with children. Of the 465 women, 361 have
children and 104 remain childless. The IDs of women without children are
retrieved by typing \(idego[!(idego \%in\% idm)]\).
The proportion remaining childless is \(1-length(idm)/length(idego)\), which gives
22.37 percent. The number of women in generation 1 by number of children
is:
addmargins (table(dLH$nch[dLH$gen==1 & dLH$sex=="Female"]))
#>
#> 0 1 2 3 4 5 6 Sum
#> 104 114 131 75 27 9 5 465
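The proportion childless can be checked directly from the table. The sketch below re-enters the counts by hand; the object names (nch_count, n_women, n_children) are assumptions made for illustration.

```r
# Women in generation 1 by number of children (counts copied from the table)
nch_count <- c("0" = 104, "1" = 114, "2" = 131, "3" = 75,
               "4" = 27, "5" = 9, "6" = 5)

n_women <- sum(nch_count)                     # 465
prop_childless <- nch_count[["0"]] / n_women
round(100 * prop_childless, 2)                # 22.37

# Total number of children born to these women
n_children <- sum(as.numeric(names(nch_count)) * nch_count)
n_children                                    # 784

# Mean number of children per woman
round(n_children / n_women, 2)
```

The total of 784 children equals the size of generation 2 in the table of individuals by generation and sex, as it should, since generation 2 consists of the children of the women of generation 1.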
The IDs of the oldest and youngest child in a family are relatively
easily determined. The oldest child is the child with the earliest date of
birth. For each woman in generation 1, the ID of the oldest child is
obtained as follows:
idego <- dLH$ID[dLH$gen==1 & dLH$sex=="Female"]
# ID of children
idch <- IDch(idego,dLH)
# Date of birth of children
dbch <- dLH$bdated[idch]
# Create data frame
zz <-data.frame (ID=idch,dbch=dbch)
# Select, for each ego, child with lowest date of birth
ch_oldest=aggregate(zz,list(dLH$IDmother[idch]),function(x) min(x))
colnames(ch_oldest) <- c("idego","ID of oldest child","date of birth of oldest child")
The object \(ch\_oldest\) is a data
frame with three columns. The first has the ID of the mother, the second
the ID of the oldest child, and the third the date of birth of the
oldest child. The first lines of \(ch\_oldest\) can be displayed with \(head(ch\_oldest)\).
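Note that aggregate() applies min() to each column separately, so the ID and the date of birth in a row of the result belong to the same child only if, within a family, IDs increase with date of birth. A sketch of an alternative that keeps whole records, and therefore does not rely on that ordering, is given below. The data frame kids is made up for illustration; the IDs are deliberately not in birth order.

```r
# Hypothetical children: mother ID, child ID and decimal date of birth
kids <- data.frame(
  idmother = c(1, 1, 2, 2, 2),
  idchild  = c(12, 11, 21, 23, 22),
  dbch     = c(1990.5, 1993.2, 1988.0, 1985.4, 1991.7)
)

# Sort by mother and date of birth, then keep the first record per mother
kids_sorted <- kids[order(kids$idmother, kids$dbch), ]
oldest <- kids_sorted[!duplicated(kids_sorted$idmother), ]

# The last record per mother is the youngest child
youngest <- kids_sorted[!duplicated(kids_sorted$idmother, fromLast = TRUE), ]

oldest
youngest
```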
For each woman with children, the ID of the youngest child is obtained as follows:
# Select, for each ego, child with highest date of birth
# zz has ID of all children and dates of birth of children.
# They are used to select youngest child ever born (by mother)
ch_youngest=aggregate(zz,list(dLH$IDmother[idch]),function(x) max(x))
# Date of birth of mother
ch_youngest$db_mother <- Db(ch_youngest[,1])
# Age of mother at birth youngest child
ch_youngest$agem_chLast <- ch_youngest[,3] - ch_youngest$db_mother
colnames(ch_youngest) <- c("idego","ID_youngest_child","db_youngest_child","db_idego","agemLast")
\(ch\_youngest\) is a data frame
with five columns: the ID of the mother, the ID of the youngest child,
the date of birth of the youngest child, the date of birth of the mother
and the age of the mother at birth of the youngest child. The density
distribution of the ages of mothers at birth of youngest child is:
xmin <- 10
xmax <- 50
library(ggplot2)
# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
p <- ggplot(ch_youngest, aes(agemLast)) +
  geom_histogram(aes(y=after_stat(density)), alpha=0.5, position="identity", bins=50) +
  geom_density(alpha=0.2) +
  scale_x_continuous(breaks=seq(xmin,xmax,by=5)) +
  scale_y_continuous(breaks=seq(0,0.07,by=0.01)) +
  labs(x="Age")
title <- paste("Age of mother at birth of youngest child; ",attr(dLH,"country"),attr(dLH,"year"))
p <- p + ggtitle(title) +
  theme(plot.title = element_text(size = 10, face = "bold"))
p

The mean age is 33.31 years and the standard deviation is 5.25 years.
The median age at birth of the youngest child is given by \(round(median(ch\_youngest\$agemLast),2)\).
Child’s perspective
The age of a newborn’s mother is the difference between the date of
birth of the child and the date of birth of the mother. The age of a
mother at a given reference age of a child is the age of the mother at
birth of the child plus the reference age, provided the mother and the
child are both alive at that age. Suppose we are interested in the age
distribution of mothers of 10-year old children. A 10-year old is the
reference person (ego). Consider the children of females of generation 1. The
living status of egos and of their mothers at ego’s 10th birthday is determined as follows:
Status_refageEgo <- function(refage_ego)
{ # IDs of egos (children)
idego <- IDch(dLH$ID[dLH$gen==1 & dLH$sex=="Female"])
# Is ego alive at reference age?
alive_ego <- Dd(idego) >= Db(idego) + refage_ego
# Number of children (ego) alive (=TRUE) and dead (=FALSE) at reference age
t1 <- table (alive_ego)
# IDs of egos alive at reference age
idegoRefage <- idego[alive_ego]
# Is mother alive at ego's reference age?
alive_m <- Dd(IDmother(idego)) >= Db(idego) + refage_ego
# Number of mothers alive (=TRUE) or dead (=FALSE) at reference ages of egos
t2 <- table (alive_m)
aa <- list (idego=idego,
alive_ego=alive_ego,
alive_m=alive_m,
LivingStatus_ch_refage=t1,
LivingStatus_m_refage=t2)
return(aa)
}
refage_ego <- 10
out <- Status_refageEgo (refage_ego)
The proportion of egos with a living mother at their 10th birthday
is
round (length(out$alive_m[out$alive_m]) / length(out$alive_ego),2)
#> [1] 0.99
The age distribution of mothers at reference age of children is:
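# Age of living mothers at refage of ego.
# Only pairs with both ego and mother alive at refage are kept; the selection
# is made on the full vector of egos so that the logical index has the right length.
alive_both <- out$alive_ego & out$alive_m
idegoRefage <- out$idego[alive_both]
age_m <- Db(idegoRefage) + refage_ego - Db(IDmother(idegoRefage))
mean(age_m)
sd(age_m)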
hist(age_m,main=paste ("Age of living mother at age ",refage_ego," of ego",sep=""),breaks=40,xlab="Age of living mother")
box()
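The logic of the computation (age of the mother at ego's reference age, conditional on both being alive) can be followed on a toy example. The four mother-child pairs and their decimal dates below are made up for illustration; the object names are assumptions of this sketch.

```r
# Hypothetical decimal dates of birth and death for four mother-child pairs
db_child  <- c(1990.2, 1995.7, 2001.0, 2003.4)
dd_child  <- c(2070.0, 2001.3, 2080.5, 2090.1)
db_mother <- c(1962.5, 1970.1, 1968.9, 1975.0)
dd_mother <- c(2040.3, 2050.8, 2005.2, 2060.6)

refage <- 10
ref_date <- db_child + refage       # date at which the child reaches refage

alive_ego <- dd_child  >= ref_date  # second child dies before age 10
alive_m   <- dd_mother >= ref_date  # third mother dies before the child is 10
both      <- alive_ego & alive_m

# Age of mother at child's reference age, for pairs with both alive
age_m <- (ref_date - db_mother)[both]
age_m
```

The logical vector used for the selection has the same length as the vector it subsets; a longer index would introduce missing values in the result.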

The age distribution of living mothers at age 65 of children is:
refage_ego <- 65
out <- Status_refageEgo (refage_ego)
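# Age of living mothers at refage of ego.
# Only pairs with both ego and mother alive at refage are kept; the selection
# is made on the full vector of egos so that the logical index has the right length.
alive_both <- out$alive_ego & out$alive_m
idegoRefage <- out$idego[alive_both]
age_m <- Db(idegoRefage) + refage_ego - Db(IDmother(idegoRefage))
mean(age_m)
sd(age_m)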
hist(age_m,main=paste ("Age of living mother at age ",refage_ego," of ego",sep=""),breaks=40,xlab="Age of living mother")
box()

The age of a child at a given reference age of the mother is computed
in a similar way. The age is the date of birth of the mother plus the
reference age minus the date of birth of the child, provided mother and
child are alive. The age of ego at the 85th birthday of ego’s mother and
the age of ego at death of ego’s mother (provided ego is alive at these
ages) are:
idego <- dLH$ID[dLH$gen==3 & dLH$sex=="Female"]
age_m85 <- dLH$bdated[IDmother(idego)] + 85 - dLH$bdated[idego]
age_md <- dLH$ddated[IDmother(idego)] - dLH$bdated[idego]
d <- data.frame (idego,m85=age_m85,md=age_md)
library (ggplot2)
Plot_ages <- function (d)
{ # Age of ego at age 85 of mother and at death of mother
  dd <- reshape::melt.data.frame(d,id.vars="idego",measure.vars=c("m85","md"))
  colnames(dd)[2] <- "Age"
  xmin <- 10
  xmax <- 90
  # after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
  p <- ggplot(dd, aes(x=value,color=Age,fill=Age)) +
    geom_histogram(aes(y=after_stat(density)), alpha=0.5, position="identity", bins=50) +
    geom_density(alpha=0.2) +
    scale_x_continuous(breaks=seq(xmin,xmax,by=10)) +
    scale_y_continuous(breaks=seq(0,0.07,by=0.01)) +
    xlab("Age")
  # Place the legend inside the plot area
  p <- p + theme(legend.position=c(0.76,0.99),legend.justification=c(0,1))
  title <- paste("Age of ego at the 85th birthday of ego's mother and at death of mother; ",attr(dLH,"country"),attr(dLH,"year"))
  p <- p + ggtitle(title) +
    theme(plot.title = element_text(size = 10, face = "bold"))
  p
}
Plot_ages(d)
