Getting started with nfer

Sean Kauffman

2021-11-12

To get started with the nfer R interface, one option is to attach the library, as shown below.
Attaching it masks the base functions apply and load, so the recommended way to use nfer is to specify the nfer namespace whenever you call an nfer function. Throughout this vignette, we’ll use the nfer namespace.

library(nfer)
#> 
#> Attaching package: 'nfer'
#> The following objects are masked from 'package:base':
#> 
#>     apply, load
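
Once nfer is attached, a bare call to apply or load resolves to the nfer versions, as the masking message above indicates. As a small illustration that uses only base R, the base functions stay reachable through their own namespace prefix. The matrix is made up for the example.

```r
# A made-up 2 x 2 matrix, used only for illustration.
m <- matrix(1:4, nrow = 2)

# base::apply always refers to the base R function, whether or not
# nfer is attached and masking the unqualified name.
row_sums <- base::apply(m, 1, sum)
print(row_sums)
```

Qualifying both sides (base::apply and nfer::apply) keeps a script unambiguous regardless of which packages happen to be attached.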

There are four functions provided by the nfer package: load, apply, read, and learn.

To initialize a specification that can be applied to a dataframe of events, use the load function. This function takes two parameters: the path to an nfer specification file and the log level (optional).

ssps <- nfer::load(system.file("extdata", "ssps.nfer", package = "nfer"))

This specification can then be applied to a dataframe containing events. There should be at least two columns, the first of which is a character type containing the event names, and the second of which is either an integer or a character type containing the event timestamps.
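
As a minimal sketch of the layout just described, an event data frame can also be built by hand. The event names, column names, and timestamps below are invented for illustration; the description above identifies the columns by position, so the column names are assumed to be arbitrary labels.

```r
# Hypothetical events: names in column 1, timestamps in column 2.
# Timestamps are stored as character strings, not numbers.
events <- data.frame(
  name = c("BOOT", "READY", "BOOT", "READY"),
  time = c("1636700000000", "1636700004250",
           "1636700090000", "1636700093100"),
  stringsAsFactors = FALSE
)

# Inspect the column types before handing the frame to nfer::apply.
str(events)
```

A frame shaped like this could then be passed as the second argument to nfer::apply together with a loaded specification.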

The reason for representing timestamps as strings is that integers in R are limited to 32 bits, so larger numbers (for example, millisecond-granularity Unix timestamps) must be stored as character type. Numeric timestamp columns are technically supported but discouraged, because they risk loss of precision during floating-point conversion. Internally, nfer represents timestamps as 64-bit integers. Currently, the R wrappers automatically convert factor columns to character columns.
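
The limits behind this advice can be checked in base R. The sketch below uses a made-up millisecond timestamp and no nfer functions. It shows only the general behavior of R integers, doubles, and factors, not how nfer converts values internally.

```r
# Largest value an R integer can hold (32-bit signed).
print(.Machine$integer.max)

# A made-up millisecond-granularity Unix timestamp, kept as a string.
ts_chr <- "1636700000000"

# Coercing it to integer is out of range: R yields NA (with a warning,
# suppressed here to keep the output clean).
ts_int <- suppressWarnings(as.integer(ts_chr))
print(is.na(ts_int))

# Doubles represent integers exactly only up to 2^53; beyond that,
# neighbouring whole numbers can collapse to the same value.
print(2^53 == 2^53 + 1)

# A factor column can also be converted to character by hand.
f <- factor(c("1636700000000", "1636700004250"))
ts_from_factor <- as.character(f)
print(ts_from_factor)
```

Reading timestamps with colClasses = "character", as in the example that follows, avoids both the integer range limit and the floating-point rounding.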

ssps <- nfer::load(system.file("extdata", "ssps.nfer", package = "nfer"))
df <- read.table(system.file("extdata", "ssps.events", package = "nfer"), sep="|", header=FALSE, colClasses = "character")
intervals <- nfer::apply(ssps, df)
summary(intervals)
#>      name               start                end           
#>  Length:743         Min.   :8.238e+05   Min.   :1.080e+09  
#>  Class :character   1st Qu.:8.909e+11   1st Qu.:9.007e+11  
#>  Mode  :character   Median :1.791e+12   Median :1.800e+12  
#>                     Mean   :1.787e+12   Mean   :1.799e+12  
#>                     3rd Qu.:2.677e+12   3rd Qu.:2.699e+12  
#>                     Max.   :3.599e+12   Max.   :3.601e+12

If the data frame has more than two columns, the third column onward will be used as data.
Each event is assigned a data value named after a column whenever the cell for that event and column is not NA. The read function loads event files formatted for the command-line version of nfer into a data frame formatted for the R version.

test <- nfer::load(system.file("extdata", "ops.nfer", package = "nfer"))
ops <- nfer::read(system.file("extdata", "ops.events", package = "nfer"))
str(ops)
#> 'data.frame':    300 obs. of  4 variables:
#>  $ Name   : chr  "ON" "ON" "TEST" "ON" ...
#>  $ Time   : int  1090 1148 1760 2206 2330 2357 3106 3186 3298 3688 ...
#>  $ id     : chr  "idf0e9ad0e-5474-4ef7-a170-24503301e30f" "id46c21410-c8b3-4581-90b4-402248eb3483" "idf0e9ad0e-5474-4ef7-a170-24503301e30f" "id8e13ec1f-ae66-48b2-87d0-df7256f0ad1a" ...
#>  $ success: logi  NA NA TRUE NA TRUE NA ...
intervals <- nfer::apply(test, ops)
summary(intervals)
#>      name               start            end             s            
#>  Length:209         Min.   : 1090   Min.   : 2357   Length:209        
#>  Class :character   1st Qu.:15740   1st Qu.:17357   Class :character  
#>  Mode  :character   Median :28473   Median :29371   Mode  :character  
#>                     Mean   :29017   Mean   :29685                     
#>                     3rd Qu.:43630   3rd Qu.:44189                     
#>                     Max.   :55129   Max.   :55261                     
#>       id           
#>  Length:209        
#>  Class :character  
#>  Mode  :character  
#>                    
#>                    
#> 
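
To make the rule for data columns concrete, the following sketch lists, for each event in a small hand-built frame, which data columns it would carry under the "not NA" rule described earlier. The frame, its values, and the helper names (data_cols, carried) are hypothetical. The code only mirrors the rule as stated in the text and is not nfer's implementation.

```r
# Hypothetical events with two extra data columns, id and success.
events <- data.frame(
  name    = c("ON", "TEST", "OFF"),
  time    = c(1090L, 1760L, 2330L),
  id      = c("a1", "a1", NA),
  success = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Every column after the first two is treated as a data column.
data_cols <- names(events)[-(1:2)]

# For each event (row), keep the data columns whose cell is not NA.
carried <- lapply(seq_len(nrow(events)), function(i) {
  present <- vapply(data_cols,
                    function(col) !is.na(events[[col]][i]),
                    logical(1))
  data_cols[present]
})
names(carried) <- events$name
str(carried)
```

Here the ON event carries only id, the TEST event carries both id and success, and the OFF event carries no data at all.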

The nfer mining algorithm can also be used from R through the learn function. The function takes a single required parameter, a data frame of events.
There should be two columns, the first of which is a character type containing the event names, and the second of which is an integer, character, or numeric type containing the event timestamps. Like load, learn also accepts an optional log level argument.

The specification returned from learn can then be applied to a trace using apply, just as if it had been loaded from a specification file.

df <- read.table(system.file("extdata", "ssps.events", package = "nfer"), sep="|", header=FALSE)
learned <- nfer::learn(df)
intervals <- nfer::apply(learned, df)
summary(intervals)
#>      name               start                end           
#>  Length:197         Min.   :8.238e+05   Min.   :1.080e+09  
#>  Class :character   1st Qu.:8.909e+11   1st Qu.:8.937e+11  
#>  Mode  :character   Median :1.800e+12   Median :1.800e+12  
#>                     Mean   :1.786e+12   Mean   :1.788e+12  
#>                     3rd Qu.:2.676e+12   3rd Qu.:2.676e+12  
#>                     Max.   :3.599e+12   Max.   :3.601e+12