Phonetic fieldwork and experiments with phonfieldwork package

G. Moroz, NRU HSE Linguistic Convergence Laboratory

2024-04-08

Introduction

Phonetic research and experiments involve many recurring tasks. These include creating a presentation that contains all stimuli, renaming and concatenating multiple sound files recorded during a session, automatic annotation in ‘Praat’ TextGrids (this is one of the sound annotation standards provided by the ‘Praat’ software, see Boersma & Weenink 2020 https://www.fon.hum.uva.nl/praat/), creating an html table with annotations and spectrograms, and converting between multiple formats (‘Praat’ TextGrid, ‘EXMARaLDA’ Schmidt and Wörner (2009) and ‘ELAN’ Wittenburg et al. (2006)). All of these tasks can be solved by a mixture of different tools (any programming language has programs for automatic renaming, and Praat contains scripts for concatenating and renaming files, etc.). phonfieldwork provides functionality that makes it easier to solve those tasks independently of any additional tools. You can also compare its functionality with other packages: ‘rPraat’ Bořil and Skarnitzl (2016), ‘textgRid’ Reidy (2016), ‘pympi’ Lubbers and Torreira (2013) (thanks to Lera Dushkina and Anya Klezovich for letting me know about pympi).

There are a lot of different books about linguistic fieldwork and experiments (e.g. Gordon (2003), Bowern (2015)). This tutorial covers only the data organization part. I will focus on cases where the researcher clearly knows what she or he wants to analyze and has already created a list of stimuli that she or he wants to record. For now phonfieldwork works only with .wav(e) and .mp3 audio files and .TextGrid, .eaf, .exb, .srt, Audacity .txt and .flextext annotation formats, but the main functionality is available for .TextGrid files (I plan to extend its functionality to other types of data). In the following sections I will describe my workflow for phonetic fieldwork and experiments.

Install the package

Before you start, make sure that you have installed the package, for example with the following command:

install.packages("phonfieldwork")

This command will install the latest stable version of the phonfieldwork package from CRAN. Since CRAN runs multiple checks on a package before making it available, this is the safest option. Alternatively, you can download the development version from GitHub:

install.packages("remotes")
remotes::install_github("ropensci/phonfieldwork")

If you have any trouble installing the package, you will not be able to use its functionality. In that case you can create an issue on GitHub or send an email. Since this package could completely destroy your data, please do not use it until you are sure that you have made a backup.
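One possible way to make such a backup from within R is sketched below. The helper backup_folder() is hypothetical (it is not part of phonfieldwork) and uses only base R. The demonstration runs on a throwaway folder in tempdir(), so no real data is touched.

```r
# Hypothetical helper (not part of phonfieldwork): copy a whole data folder
# into a time-stamped sibling folder before any files are renamed or merged.
backup_folder <- function(path) {
  stopifnot(dir.exists(path))
  path <- normalizePath(path)
  stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
  target <- paste0(path, "_backup_", stamp)
  dir.create(target)
  files <- list.files(path, full.names = TRUE, all.files = TRUE, no.. = TRUE)
  copied <- file.copy(files, target, recursive = TRUE)
  if (!all(copied)) stop("Some files could not be copied to ", target)
  target
}

# Demonstration on a throwaway folder with a placeholder file
demo_dir <- file.path(tempdir(), "backup_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("placeholder", file.path(demo_dir, "01.wav"))

backup_dir <- backup_folder(demo_dir)
list.files(backup_dir)
```

The function returns the path of the new backup folder, so it can be stored or printed for later reference.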

Use the library() command to load the package:

library("phonfieldwork")

In order to work with some rmarkdown functions you will need to install pandoc, see vignette("pandoc") for the details.

This tutorial was made using the following version of phonfieldwork:

packageVersion("phonfieldwork")
## [1] '0.0.12'

This tutorial can be cited as follows:

citation("phonfieldwork")
## To cite package 'phonfieldwork' in publications use:
## 
##   Moroz G (2023). "Phonetic fieldwork research and experiments with the R
##   package phonfieldwork." In Kobozeva I, Semyonova K, Kostyuk A, Zakharov L,
##   Svetozarova N (eds.), _«…Vperyod i vverkh po lestnitse zvuchashey». Sbornik
##   statye k 80-letiyu Olgi Fyodorovny Krivnovoy [Festschrift in memoriam to Olga
##   Fyodorovna Krivnova]_. Buki Vedi, Moscow.
## 
##   Moroz G (2020). _Phonetic fieldwork and experiments with phonfieldwork
##   package_. <https://CRAN.R-project.org/package=phonfieldwork>.
## 
## To see these entries in BibTeX format, use 'print(<citation>, bibtex=TRUE)',
## 'toBibtex(.)', or set 'options(citation.bibtex.max=999)'.

If you have any trouble using the package, do not hesitate to create an issue on GitHub.

Philosophy of the phonfieldwork package

Most phonetic research consists of the following steps:

  1. Formulate a research question. Think of what kind of data is necessary to answer this question, what is the appropriate amount of data, what kind of annotation you will do, what kind of statistical models and visualizations you will use, etc.
  2. Create a list of stimuli.
  3. Elicit the list of stimuli from speakers who signed an Informed Consent statement, agreeing to participate in the experiment and to be recorded on audio and/or video. Keep an eye on recording settings: sampling rate, resolution (bit), and number of channels should be the same across all recordings.
  4. Annotate the collected data.
  5. Extract the collected data.
  6. Create visualizations and evaluate your statistical models.
  7. Report your results.
  8. Publish your data.

The phonfieldwork package was created to help with items 3, 5 and 8, and partially with item 4.
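The requirement in item 3 that recording settings be identical across files can be checked programmatically. The following is a minimal base R sketch, assuming canonical PCM .wav files whose 44-byte header starts with the RIFF, WAVE and fmt fields in the usual order. Both read_wav_settings() and write_demo_wav() are hypothetical helpers, not phonfieldwork functions, and the demo files contain only silence.

```r
# Hypothetical helper: read channels, sampling rate and bit depth from a
# canonical 44-byte PCM WAV header.
read_wav_settings <- function(file) {
  con <- file(file, "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 44)
  stopifnot(length(header) == 44,
            rawToChar(header[1:4]) == "RIFF",
            rawToChar(header[9:12]) == "WAVE")
  # numeric header fields are stored as little-endian unsigned integers
  as_int <- function(bytes) sum(as.integer(bytes) * 256^(seq_along(bytes) - 1))
  data.frame(file = basename(file),
             channels = as_int(header[23:24]),
             sampling_rate = as_int(header[25:28]),
             bits = as_int(header[35:36]))
}

# Hypothetical helper for the demo only: write a silent mono 16-bit WAV file.
write_demo_wav <- function(file, sampling_rate, n_samples = 100) {
  con <- file(file, "wb")
  on.exit(close(con))
  data_bytes <- n_samples * 2L
  writeBin(charToRaw("RIFF"), con)
  writeBin(as.integer(36L + data_bytes), con, size = 4, endian = "little")
  writeBin(charToRaw("WAVEfmt "), con)
  writeBin(16L, con, size = 4, endian = "little")  # size of the fmt chunk
  writeBin(1L, con, size = 2, endian = "little")   # format code 1 = PCM
  writeBin(1L, con, size = 2, endian = "little")   # number of channels
  writeBin(as.integer(sampling_rate), con, size = 4, endian = "little")
  writeBin(as.integer(sampling_rate * 2L), con, size = 4, endian = "little")
  writeBin(2L, con, size = 2, endian = "little")   # bytes per sample frame
  writeBin(16L, con, size = 2, endian = "little")  # bits per sample
  writeBin(charToRaw("data"), con)
  writeBin(as.integer(data_bytes), con, size = 4, endian = "little")
  writeBin(integer(n_samples), con, size = 2, endian = "little")
}

# Demo: the third file deliberately has a different sampling rate
demo_dir <- file.path(tempdir(), "settings_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
write_demo_wav(file.path(demo_dir, "01.wav"), 44100)
write_demo_wav(file.path(demo_dir, "02.wav"), 44100)
write_demo_wav(file.path(demo_dir, "03.wav"), 22050)

wav_files <- list.files(demo_dir, pattern = "\\.wav$", full.names = TRUE)
settings <- do.call(rbind, lapply(wav_files, read_wav_settings))
same_settings <- nrow(unique(settings[, c("channels", "sampling_rate", "bits")])) == 1
settings
same_settings
```

Files with a non-canonical header (for example with extra chunks before the fmt chunk) would need a proper WAV reader instead of this fixed-offset approach.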

To make the automatic annotation of data easier, I usually record each stimulus as a separate file. While recording, I carefully listen to my consultants to make sure that they are producing the kind of speech I want: three isolated pronunciations of the same stimulus, separated by a pause and contained in a carrier phrase. In case a speaker does not produce three clear repetitions, I ask them to repeat the task, so that as a result of my fieldwork session I will have:

There are some phoneticians who prefer to record everything, for language documentation purposes. I think that should be a separate task: you can’t have your cake and eat it too. But if you insist on recording everything, it is possible to run two recorders at the same time: one could run during the whole session, while the other is used to produce small audio files. You can also use special software to record your stimuli automatically on a computer (e.g. SpeechRecorder or PsychoPy).

You can show a native speaker your stimuli one by one, or not show them the stimuli but ask them to pronounce a certain stimulus or its translation. I use presentations to collect all stimuli in a particular order without the risk of omissions.

Since each stimulus is recorded as a separate audio file, it is possible to merge them into one file automatically and make an annotation in a Praat TextGrid (the same result can be achieved with the Concatenate recoverably command in Praat). After this step, the user needs to do some annotation of her/his own. When the annotation part is finished, it is possible to extract the annotated parts to a table, where each annotated object is a row characterised by some features (stimulus, repetition, speaker, etc.). You can play the sound file and view its oscillogram and spectrogram. Here is an example of such a file and instruction for doing it.

The phonfieldwork package in use

Make a list of your stimuli

There are several ways to enter information about a list of stimuli into R:

my_stimuli <- c("tip", "tap", "top")
my_stimuli_df <- read.csv("my_stimuli_df.csv")
my_stimuli_df
##   stimuli vowel
## 1     tip     ı
## 2     tap     æ
## 3     top     ɒ
library("readxl")
# run install.packages("readxl") in case you don't have it installed
my_stimuli_df <- read_xlsx("my_stimuli_df.xlsx")
my_stimuli_df
## # A tibble: 3 × 2
##   stimuli vowel
##   <chr>   <chr>
## 1 tip     ı    
## 2 tap     æ    
## 3 top     ɒ
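If no such file exists yet, the table can also be built in R and saved, so that the same list is reused in later sessions. This is only a sketch. It writes to a temporary folder and uses illustrative names (stimuli_table, csv_file, my_stimuli_demo) that do not come from the package.

```r
# Build the table of stimuli in R; the vowel symbols are written as Unicode
# escapes and correspond to the symbols shown in the table above
stimuli_table <- data.frame(stimuli = c("tip", "tap", "top"),
                            vowel = c("\u0131", "\u00e6", "\u0252"))

# Save the table as a .csv file (here in a temporary folder) ...
csv_file <- file.path(tempdir(), "my_stimuli_df.csv")
write.csv(stimuli_table, csv_file, row.names = FALSE, fileEncoding = "UTF-8")

# ... and read it back in the same way as in the example above
my_stimuli_demo <- read.csv(csv_file, encoding = "UTF-8")
my_stimuli_demo$stimuli
```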

Create a presentation based on a list of stimuli

When the list of stimuli is loaded into R, you can create a presentation for elicitation. It is important to define an output directory, so in the following example I use the getwd() function, which returns the path to the current working directory. You can set any directory as your current one using the setwd() function. It is also possible to provide a path to your intended output directory with output_dir (e. g. “/home/user_name/…”). This command (unlike setwd()) does not change your working directory.

create_presentation(stimuli = my_stimuli_df$stimuli,
                    output_file = "first_example",
                    output_dir = getwd())

As a result, a file “first_example.html” was created in the output folder. You can change the name of this file by changing the output_file argument. The .html file now looks as follows:

It is also possible to change the output format, using the output_format argument. By default it is “html”, but you can also use “pptx” (this is a relatively new feature of rmarkdown, so update the package in case you get errors). There is also an additional argument translations, where you can provide translations for the stimuli so that they appear next to the stimuli on the slide.

It is also possible to use images (or gifs, e.g. for sign language research) as stimuli. In order to do that you need to provide an absolute or relative path to the file instead of the stimulus and mark in the external argument which of the stimuli are external:

my_image <- system.file("extdata", "r-logo.png", package = "phonfieldwork")
my_image
## [1] "/home/agricolamz/R/x86_64-pc-linux-gnu-library/4.3/phonfieldwork/extdata/r-logo.png"
create_presentation(stimuli = c("rzeka", "drzewo", my_image),
                    external = 3,
                    output_file = "second_example",
                    output_dir = getwd())

Rename collected data

After collecting data and removing sound files with unsuccessful elicitations, one could end up with the following structure:

## ├── s1
## │   ├── 01.wav
## │   ├── 02.wav
## │   └── 03.wav
## ├── s2
## │   ├── 01.wav
## │   ├── 02.wav
## │   └── 03.wav

For each of the speakers s1 and s2 there is a folder that contains three audio files. Now let’s rename the files.

rename_soundfiles(stimuli = my_stimuli_df$stimuli,
                  prefix = "s1_",
                  path = "s1/")
## You can find change correspondences in the following file:
## /home/agricolamz/work/packages/phonfieldwork/vignettes/s1/backup/logging.csv

As a result, you obtain the following structure:

## ├── s1
## │   ├── 1_s1_tip.wav
## │   ├── 2_s1_tap.wav
## │   ├── 3_s1_top.wav
## │   └── backup
## │       ├── 01.wav
## │       ├── 02.wav
## │       ├── 03.wav
## │       └── logging.csv
## ├── s2
## │   ├── 01.wav
## │   ├── 02.wav
## │   └── 03.wav

The rename_soundfiles() function created a backup folder with all of the unrenamed files, and renamed all files using the prefix provided in the prefix argument. There is an additional argument backup that can be set to FALSE (it is TRUE by default), in case you are sure that the renaming function will work properly with your files and stimuli, and you do not need a backup of the unrenamed files. There is also an additional argument logging (TRUE by default) that creates a logging.csv file in the backup folder (or in the original folder if the backup argument has the value FALSE) with the correspondences between the old and new names of the files. Here is the content of logging.csv:

##     from           to
## 1 01.wav 1_s1_tip.wav
## 2 02.wav 2_s1_tap.wav
## 3 03.wav 3_s1_top.wav
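Such a table of correspondences can also be used to undo a renaming. The sketch below assumes a log with the columns from and to, as in the output above. The helper restore_from_log() is hypothetical (it is not a phonfieldwork function) and the demonstration works on empty placeholder files in a temporary folder.

```r
# Hypothetical helper: rename files back to their original names using a log
# table with the columns `from` (old name) and `to` (new name).
restore_from_log <- function(folder, log_file) {
  log <- read.csv(log_file, stringsAsFactors = FALSE)
  stopifnot(all(c("from", "to") %in% names(log)))
  renamed <- file.path(folder, log$to)
  original <- file.path(folder, log$from)
  present <- file.exists(renamed)
  if (any(file.exists(original[present]))) {
    stop("Refusing to overwrite existing files")
  }
  file.rename(renamed[present], original[present])
  invisible(log[present, ])
}

# Demo with empty placeholder files in a temporary folder
demo_dir <- file.path(tempdir(), "restore_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
new_names <- c("1_s1_tip.wav", "2_s1_tap.wav", "3_s1_top.wav")
file.create(file.path(demo_dir, new_names))
log_file <- file.path(demo_dir, "logging.csv")
write.csv(data.frame(from = c("01.wav", "02.wav", "03.wav"), to = new_names),
          log_file, row.names = FALSE)

restore_from_log(demo_dir, log_file)
list.files(demo_dir)
```

The folder and the log file are passed separately, because the log may be stored in a different folder than the renamed files (for example in the backup folder).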

An additional numeric prefix was added to each name, which makes it easier to keep the original order of the stimuli. If you do not want this autonumbering, set the autonumbering argument to FALSE:

rename_soundfiles(stimuli = my_stimuli_df$stimuli,
                  prefix = "s2_",
                  suffix = paste0("_", 1:3),
                  path = "s2/",
                  backup = FALSE,
                  logging = FALSE,
                  autonumbering = FALSE)
## ├── s1
## │   ├── 1_s1_tip.wav
## │   ├── 2_s1_tap.wav
## │   ├── 3_s1_top.wav
## │   └── backup
## │       ├── 01.wav
## │       ├── 02.wav
## │       ├── 03.wav
## │       └── logging.csv
## ├── s2
## │   ├── s2_tap_2.wav
## │   ├── s2_tip_1.wav
## │   └── s2_top_3.wav

The last command renamed the sound files in the s2 folder, adding the prefix s2_ (analogous to the previous example) and the suffixes _1 to _3. On most operating systems it is impossible to have two files with the same name in one folder, so sometimes it can be useful to add some kind of index at the end of the file names.
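The naming pattern seen in the two examples can be reproduced with a few lines of base R. This is only an illustration of the pattern, not the package's own code. The function compose_names() and its extension argument are hypothetical.

```r
# Hypothetical helper that mimics the naming pattern from the examples above
compose_names <- function(stimuli, prefix = "", suffix = "",
                          autonumbering = TRUE, extension = ".wav") {
  number <- if (autonumbering) paste0(seq_along(stimuli), "_") else ""
  paste0(number, prefix, stimuli, suffix, extension)
}

stimuli <- c("tip", "tap", "top")

numbered <- compose_names(stimuli, prefix = "s1_")
numbered
# "1_s1_tip.wav" "2_s1_tap.wav" "3_s1_top.wav"

unnumbered <- compose_names(stimuli, prefix = "s2_",
                            suffix = paste0("_", 1:3),
                            autonumbering = FALSE)
unnumbered
# "s2_tip_1.wav" "s2_tap_2.wav" "s2_top_3.wav"

# Without the numeric prefix, alphabetical sorting no longer follows the
# order of elicitation
sort(unnumbered, method = "radix")
# "s2_tap_2.wav" "s2_tip_1.wav" "s2_top_3.wav"
```

This is why the files in the s2 folder are listed in the order tap, tip, top, while the files in the s1 folder keep the order of the stimuli list.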

It is also possible that not all stimuli were obtained from the informant. To deal with that case there is an additional argument missing, where the user can provide the id numbers of the stimuli that are not present among the audio files:

rename_soundfiles(stimuli = my_stimuli_df$stimuli,
                  path = "s3/",
                  missing = c(1, 3))

Sometimes it is useful to get information about sound duration:

get_sound_duration("s1/2_s1_tap.wav")
##           file  duration
## 1 2_s1_tap.wav 0.4821542

It is also possible to analyze the whole folder using the read_from_folder() function. The first argument is the path to the folder. The second argument is the type of information or file type (possible values: “audacity”, “duration”, “eaf”, “exb”, “flextext”, “formant”, “intensity”, “picth”, “srt”, “textgrid”):

read_from_folder(path = "s2/", "duration")
##           file  duration
## 1 s2_tap_2.wav 0.5343991
## 2 s2_tip_1.wav 0.5866440
## 3 s2_top_3.wav 0.6650113

For now phonfieldwork works only with .wav(e) and .mp3 sound files.
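A duration table like the one above is an ordinary data frame, so it can be summarised with base R. The sketch below retypes the values printed above by hand instead of calling the package. The 50% threshold for flagging suspiciously short files is an arbitrary assumption.

```r
# Durations copied from the output above (in seconds)
durations <- data.frame(file = c("s2_tap_2.wav", "s2_tip_1.wav", "s2_top_3.wav"),
                        duration = c(0.5343991, 0.5866440, 0.6650113))

# Total duration of all recordings in the folder
total_duration <- sum(durations$duration)
total_duration

# Flag files that are much shorter than the median (possibly cut-off recordings)
durations$suspicious <- durations$duration < 0.5 * median(durations$duration)
durations
```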

Merge all data together

After all the files are renamed, you can merge them into one. Remember that sampling rate, resolution (bit), and number of channels should be the same across all recordings. It is possible to resample files with the resample() function from the bioacoustics package.

concatenate_soundfiles(path = "s1/",
                       result_file_name = "s1_all")

This command creates a new sound file s1_all.wav and an associated Praat TextGrid s1_all.TextGrid:

## ├── s1
## │   ├── 1_s1_tip.wav
## │   ├── 2_s1_tap.wav
## │   ├── 3_s1_top.wav
## │   ├── backup
## │   │   ├── 01.wav
## │   │   ├── 02.wav
## │   │   ├── 03.wav
## │   │   └── logging.csv
## │   ├── s1_all.TextGrid
## │   └── s1_all.wav
## ├── s2
## │   ├── s2_tap_2.wav
## │   ├── s2_tip_1.wav
## │   └── s2_top_3.wav

The resulting file can be parsed with Praat:

Sometimes recorded sounds do not have any silence at the beginning or the end, so after merging the resulting utterances will be too close to each other. It is possible to fix this using the separate_duration argument of the concatenate_soundfiles() function: just provide the desired duration of the separator in seconds.
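To get a feeling for how a separator shifts the utterances in the merged file, the start and end times can be computed by hand. This sketch assumes that a silent separator of fixed length is placed between consecutive files. It is not taken from the package code, and interval_boundaries() is a hypothetical helper.

```r
# Hypothetical helper: start and end time of each utterance in a merged file,
# assuming a silent separator of fixed length between consecutive files
interval_boundaries <- function(durations, separator = 0) {
  gaps <- separator * (seq_along(durations) - 1)
  end <- cumsum(durations) + gaps
  data.frame(start = end - durations, end = end)
}

# Three utterances of 0.5, 0.6 and 0.7 seconds with 0.1 seconds of silence
# between them
boundaries <- interval_boundaries(c(0.5, 0.6, 0.7), separator = 0.1)
boundaries
```

With these values the utterances start at 0, 0.6 and 1.3 seconds and the merged file is 2 seconds long; with separator = 0 each utterance starts exactly where the previous one ends.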

This is not the kind of task that would normally occur within the phonfieldwork philosophy, but it is also possible to merge multiple .TextGrids with the same tier structure using the concatenate_textgrids() function.

Annotate your data

It is possible to annotate words using an existing annotation:

my_stimuli_df$stimuli
## [1] "tip" "tap" "top"
annotate_textgrid(annotation = my_stimuli_df$stimuli,
                  textgrid = "s1/s1_all.TextGrid")