Bioi

Zachary Colburn

2019-12-07

Bioi provides functions to perform connected component labeling on 1, 2, or 3 dimensional arrays, single linkage clustering, and identification of the nearest neighboring point in a second data set.

Determine the minimum separation distance between two sets of points

Dual channel PALM or iPALM data can represent the localizations of two distinct fluorophore-labeled proteins. To determine the minimum distance separating each localization of protein A from protein B, the following code could be implemented. Each row in the output of find_min_dists corresponds to the point denoted by the same row in mOne. “dist” is the distance to the nearest point in mTwo, whose row number in mTwo is given in the “index” column.

library(Bioi)

# Generate two data sets to simulate 3D PALM data.
set.seed(10)

mOne <- as.matrix(data.frame(
  x = rnorm(10),
  y = rbinom(10, 100, 0.5),
  z = runif(10)
))

mTwo <- as.matrix(data.frame(
  x = rnorm(20),
  y = rbinom(20, 100, 0.5),
  z = runif(20)
))

# Get separation distances.
find_min_dists(mOne, mTwo)
#>         dist index
#> 1  2.1030934    11
#> 2  0.1711268    12
#> 3  1.0770476     9
#> 4  2.3690046     4
#> 5  0.6385089    12
#> 6  1.5313912    15
#> 7  0.4466117    16
#> 8  0.8946710    10
#> 9  1.4894504    13
#> 10 0.5430501    15

Single linkage clustering

PALM/iPALM localizations in very close proximity may belong to the same superstructure, for example a single focal adhesion. By linking localizations separated by less than some empirically determined distance, these different super structures can be identified. Below, 2D PALM data is simulated and all points falling within a distance of 0.8 are linked.

# Generate random data.
set.seed(10)
input <- as.matrix(data.frame(x=rnorm(10),y=rnorm(10)))

# Perform clustering.
groups <- euclidean_linker(input, 0.8)
#> ||==================================================||
#> ||..................................................||

print(groups)
#>  [1] 3 3 2 3 3 3 2 3 1 3

Clusters can be easily visualized using ggplot2.

library(ggplot2)
input <- as.data.frame(input)
input$group <- as.character(groups)
ggplot(input, aes(x, y, colour = group)) + geom_point(size = 3)

Label connected components (i.e. find contiguous blobs)

Following background subtraction and thresholding, distinct cellular structures (focal adhesions for example) can be identified in fluorescent microscopy images. Array elements that are connected horizontally/vertically or diagonally are labeled with the same group number. Each contiguous object is labeled with a different group number. The function find_blobs can be used to implement this functionality in 1, 2, or 3 dimensions. Below, this operations is performed on a matrix structure.

# Generate a random matrix.
set.seed(10)
mat <- matrix(runif(70), nrow = 7)

# Arbitrarily say that everything below 0.8 is background.
logical_mat <- mat > 0.8

# Row names and column names are preserved in the output of find_blobs
rownames(logical_mat) <- letters[1:7]
colnames(logical_mat) <- 1:10

# Find blobs
find_blobs(logical_mat)
#> ||==================================================||
#> ||..................................................||
#>    1  2  3  4  5  6  7  8  9 10
#> a NA NA NA NA NA NA NA  3 NA NA
#> b NA NA NA NA NA  2 NA NA NA  5
#> c NA NA NA NA NA  2 NA  4 NA NA
#> d NA NA NA NA NA NA NA NA NA NA
#> e NA NA NA NA NA NA NA NA NA NA
#> f NA NA  1  1  1 NA NA NA NA NA
#> g NA NA  1 NA NA NA NA NA NA NA