The ipolygrowth
package calculates bacterial growth
curve parameters using 4th degree polynomial functions. Polynomial
growth curves are estimated from time series data from a single
biological sample or multiple samples. Technical replicates within
biological samples are allowed.
In this vignette, we will demonstrate examples with both single and
multiple samples as input data. To use this package, we need to start
with installing and loading the ipolygrowth
package.
# install.packages(c("ipolygrowth", "dplyr", "ggplot2"))
library(ipolygrowth)
library(dplyr)
library(ggplot2)
library(kableExtra) # for RMarkdown table output
Expected outputs are shown in the rendered version of the [vignette] (https://CRAN.R-project.org/package=ipolygrowth) on CRAN.
We will use the bacterial growth data from the R package growthrates for demonstration. This data contains growth data of 3 bacteria strains and 12 antibiotics tetracycline concentration levels. Each strain-concentration combination has 2 replicates. In each replicate, the growth of bacteria is measured 31 times (i.e. 31 time points in the time series). For our demonstration, we consider each strain-concentration combination as a sample (i.e. a biological replicate) and each replicate as a technical replicate.
For more details about this data, see https://CRAN.R-project.org/package=growthrates.
The bactgrowth.txt
can be loaded from the
growthrates
package directly or be downloaded from here
and read in via read.table()
. Let’s load the data and check
its structure.
if (!"growthrates" %in% installed.packages()) {install.packages("growthrates")}
df.gr <- growthrates::bactgrowth
# Download the `bactgrowth.txt` to the directory of your script. Then run:
# data <- read.table("bactgrowth.txt", header = TRUE) %>%
# mutate(strain = factor(strain, levels = c("D", "R", "T")))
str(df.gr)
#> 'data.frame': 2232 obs. of 5 variables:
#> $ strain : Factor w/ 3 levels "D","R","T": 3 3 3 3 3 3 3 3 3 3 ...
#> $ replicate: int 2 2 2 2 2 2 2 2 2 2 ...
#> $ conc : num 0 0 0 0 0 0 0 0 0 0 ...
#> $ time : int 0 1 2 3 4 5 6 7 8 9 ...
#> $ value : num 0.013 0.014 0.017 0.022 0.03 0.039 0.042 0.045 0.048 0.049 ...
We can also take a look at the first few rows of the data.
head(df.gr)
#> strain replicate conc time value
#> 1 T 2 0 0 0.013
#> 2 T 2 0 1 0.014
#> 3 T 2 0 2 0.017
#> 4 T 2 0 3 0.022
#> 5 T 2 0 4 0.030
#> 6 T 2 0 5 0.039
Let’s plot the data.
The ipolygrowth
functions require the input data to be
in long format with a time variable and a dependent variable (y) as the
measure of growth in the input data. Both variables need to be numeric.
Time points in the time series can be any length (uneven and different
spacing allowed, even between technical replicates), as long as the time
scale is measured uniformly (same units used) across all samples.
We will use the ipg_singlesample()
function to calculate
growth curve parameters for one sample. Since the
growthrates
data contains 36 strain-concentration
combinations, we will use data from donor (D) with concentration 15.63,
to represent two replicates from a single sample. We will take out the
variables strain
and conc
from the data frame
to avoid confusion.
df.singlesample <- df.gr %>% filter(strain == "D", conc == 15.63) %>% select(-strain, -conc)
str(df.singlesample)
#> 'data.frame': 62 obs. of 3 variables:
#> $ replicate: int 2 2 2 2 2 2 2 2 2 2 ...
#> $ time : int 0 1 2 3 4 5 6 7 8 9 ...
#> $ value : num 0.008 0.008 0.009 0.011 0.014 0.027 0.028 0.037 0.046 0.052 ...
table(df.singlesample$replicate, df.singlesample$time)
#>
#> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
#> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>
#> 28 29 30
#> 1 1 1 1
#> 2 1 1 1
Now we specify the data frame name, time variable, and y variable in
the ipg_singlesample()
function and save the results. The
output of ipg_singlesample()
contains a table of calculated
growth curve parameters, the polynomial model deriving the calculated
parameters, a table of \(\beta\)
coefficients, and a table of fitted values with time. All components can
be viewed by calling from the list output. Here, we show the table of
calculated growth curve parameters.
out.singlesample <- ipg_singlesample(data = df.singlesample, time.name = "time", y.name = "value")
#> max y time is equal to the largest value of "time"
out.singlesample$estimates %>%
kable %>% kable_styling("striped", full_width = F) # table formatting for rmarkdown
peak growth rate | peak growth time | doubling time | lag time | max y | max y time |
---|---|---|---|---|---|
0.0064197 | 7.246653 | 107.972 | 1.515281 | 0.1149354 | 30 |
We can use the fitted values and the original data to plot our results.
ggplot()+
geom_point(data = df.singlesample, aes(x = time, y = value, color = factor(replicate)))+
geom_line(data = out.singlesample$fitted, aes(x = time, y = fit))+
labs(color = "replicate")+
scale_x_continuous(n.breaks = 10)+
scale_y_continuous(n.breaks = 7)+
theme_bw()
Notice the printed message “max y time is equal to the largest value
of”time”” after ipg_singlesample()
is called. This message
appears when max y time is equal to the largest observed time point in
the sample and when the search algorithm to identify max y time did not
converge. Usually, this indicates the growth curve did not reach an
asymptote. If this message is printed, we recommend to plot the data
from the sample to ensure the calculated max y time is appropriate for
your data.
epsilon
is a tuning parameter to change the threshold in
the calculation of max y time. epsilon
must be between 0
and 1, and a small value is recommended as it represents the convergence
threshold as a fraction of the range of the dependent variable. The
default value is 0.2%. The table below shows how the max y time is
affected when epsilon
is set to 1%. For this data, a
reasonable value of epsilon
is 1% as it corresponds to max
growth at the end of the first growth phase.
out.singlesample2 <- ipg_singlesample(data = df.singlesample, time.name = "time", y.name = "value", epsilon = 1/100)
out.singlesample2$estimates %>%
kable %>% kable_styling("striped", full_width = F) # table formatting for rmarkdown
peak growth rate | peak growth time | doubling time | lag time | max y | max y time |
---|---|---|---|---|---|
0.0064197 | 7.246653 | 107.972 | 1.515281 | 0.1020701 | 23 |
If you have multiple samples, ipolygrowth
can calculate
polynomial growth curve parameters for each sample. An ID variable must
be included in the input data to uniquely identify each sample. For our
demonstration, we will create a new ID variable to represent multiple
biological samples.
df.gr2 <- df.gr %>% mutate(id = paste(strain, conc, sep = "-"))
str(df.gr2)
#> 'data.frame': 2232 obs. of 6 variables:
#> $ strain : Factor w/ 3 levels "D","R","T": 3 3 3 3 3 3 3 3 3 3 ...
#> $ replicate: int 2 2 2 2 2 2 2 2 2 2 ...
#> $ conc : num 0 0 0 0 0 0 0 0 0 0 ...
#> $ time : int 0 1 2 3 4 5 6 7 8 9 ...
#> $ value : num 0.013 0.014 0.017 0.022 0.03 0.039 0.042 0.045 0.048 0.049 ...
#> $ id : chr "T-0" "T-0" "T-0" "T-0" ...
table(df.gr2$strain, df.gr2$conc)
#>
#> 0 0.24 0.49 0.98 1.95 3.91 7.81 15.63 31.25 62.5 125 250
#> D 62 62 62 62 62 62 62 62 62 62 62 62
#> R 62 62 62 62 62 62 62 62 62 62 62 62
#> T 62 62 62 62 62 62 62 62 62 62 62 62
unique(df.gr2$id)
#> [1] "T-0" "T-0.24" "T-0.49" "T-0.98" "T-1.95" "T-3.91" "T-7.81"
#> [8] "T-15.63" "T-31.25" "T-62.5" "T-125" "T-250" "D-0" "D-0.24"
#> [15] "D-0.49" "D-0.98" "D-1.95" "D-3.91" "D-7.81" "D-15.63" "D-31.25"
#> [22] "D-62.5" "D-125" "D-250" "R-0" "R-0.24" "R-0.49" "R-0.98"
#> [29] "R-1.95" "R-3.91" "R-7.81" "R-15.63" "R-31.25" "R-62.5" "R-125"
#> [36] "R-250"
The input data foripg_multisample()
is the same long
format data used for ipg_singlesample()
with the addition
of the ID variable. epsilon
is specified as a single value
that will be applied across samples. It can also be input as a vector of
values, one for each sample, thus allowing different thresholds for each
sample.
out.multi.f <- ipg_multisample(data = df.gr2, id = "id", time.name = "time", y.name = "value", epsilon = 0.2/100)
#> max y time is equal to the largest value of "time" in 31 samples.
out.multi.f$estimates %>%
kable() %>% kable_styling("striped", full_width = F) %>% scroll_box(width = "800px", height = "300px") # table formatting for rmarkdown
id | peak growth rate | peak growth time | doubling time | lag time | max y | max y time |
---|---|---|---|---|---|---|
D-0 | 0.0054743 | 3.636922 | 126.61846 | 0.1231345 | 0.1073791 | 30 |
D-0.24 | 0.0047571 | 3.564325 | 145.70717 | 0.1410942 | 0.0765797 | 21 |
D-0.49 | 0.0051573 | 2.613987 | 134.40205 | 0.0384437 | 0.1027946 | 30 |
D-0.98 | 0.0054686 | 3.170291 | 126.75098 | 0.0728170 | 0.0962677 | 30 |
D-1.95 | 0.0055555 | 3.241725 | 124.76779 | 0.0779164 | 0.0961664 | 30 |
D-125 | 0.0010447 | 8.343843 | 663.46846 | 1.7678931 | 0.0312341 | 30 |
D-15.63 | 0.0064197 | 7.246653 | 107.97198 | 1.5152814 | 0.1149354 | 30 |
D-250 | 0.0011268 | 0.000000 | 615.12569 | 0.0000000 | 0.0426043 | 30 |
D-3.91 | 0.0056744 | 4.503602 | 122.15256 | 0.2438311 | 0.1014891 | 30 |
D-31.25 | 0.0084764 | 11.111193 | 81.77329 | 6.7640880 | 0.1395516 | 30 |
D-62.5 | 0.0053191 | 24.468397 | 130.31169 | 13.6297131 | 0.0906272 | 30 |
D-7.81 | 0.0058283 | 4.077346 | 118.92788 | 0.1753184 | 0.1074358 | 30 |
R-0 | 0.0053276 | 2.465975 | 130.10494 | 0.0481645 | 0.0667463 | 17 |
R-0.24 | 0.0013591 | 17.532117 | 510.01794 | 5.5878873 | 0.0456137 | 30 |
R-0.49 | 0.0026599 | 14.471380 | 260.59538 | 4.8918821 | 0.0732316 | 30 |
R-0.98 | 0.0018747 | 17.529965 | 369.73801 | 6.0976827 | 0.0486636 | 30 |
R-1.95 | 0.0015789 | 18.956604 | 438.99250 | 5.1036184 | 0.0542633 | 30 |
R-125 | 0.0007020 | 6.825708 | 987.42540 | 0.9884225 | 0.0212105 | 30 |
R-15.63 | 0.0015305 | 10.817963 | 452.89623 | 4.8301017 | 0.0416298 | 30 |
R-250 | 0.0008737 | 0.000000 | 793.31947 | 0.0000000 | 0.0363853 | 30 |
R-3.91 | 0.0012402 | 14.589281 | 558.89347 | 4.8800780 | 0.0415794 | 30 |
R-31.25 | 0.0009972 | 8.036184 | 695.10246 | 2.5311519 | 0.0251161 | 30 |
R-62.5 | 0.0013886 | 9.196834 | 499.15604 | 2.6981003 | 0.0344943 | 30 |
R-7.81 | 0.0013439 | 10.387510 | 515.76271 | 4.0522071 | 0.0373071 | 30 |
T-0 | 0.0065034 | 0.000000 | 106.58301 | 0.0000000 | 0.0701819 | 30 |
T-0.24 | 0.0074865 | 0.000000 | 92.58581 | 0.0000000 | 0.0668370 | 30 |
T-0.49 | 0.0072152 | 0.000000 | 96.06805 | 0.0000000 | 0.0874455 | 30 |
T-0.98 | 0.0056525 | 0.000000 | 122.62676 | 0.0000000 | 0.0886480 | 30 |
T-1.95 | 0.0068241 | 0.000000 | 101.57321 | 0.0000000 | 0.1030483 | 30 |
T-125 | 0.0008745 | 0.000000 | 792.60224 | 0.0000000 | 0.0176879 | 19 |
T-15.63 | 0.0049651 | 7.690180 | 139.60458 | 2.4044653 | 0.0852344 | 30 |
T-250 | 0.0012768 | 0.000000 | 542.88605 | 0.0000000 | 0.0407266 | 30 |
T-3.91 | 0.0062803 | 0.000000 | 110.36914 | 0.0000000 | 0.0956584 | 30 |
T-31.25 | 0.0074080 | 18.546689 | 93.56768 | 10.8612979 | 0.1182624 | 28 |
T-62.5 | 0.0010513 | 4.495226 | 659.34379 | 0.5534449 | 0.0206101 | 16 |
T-7.81 | 0.0056621 | 0.000000 | 122.41853 | 0.0000000 | 0.1005954 | 30 |
We can use the same method to plot the fitted growth curves.