The SpiCE corpus is officially out!

If you follow my work, you probably know that I’ve spent the last few years working on a new open-access Cantonese-English bilingual speech corpus. Well, today marks the first official release of the data into the wild. I’m thrilled to share it with all of you, and 100% expect to see lots of great work with the corpus. There are many places you can learn more about it, starting here. ...

20 May 2021 · 2 min · Khia A. Johnson

Sibilant trajectories with Python + praat-parselmouth

Once I’ve identified a sample of speech sounds that I want to analyze, the next step is to do that analysis. There are obviously many ways to go about this process. Here, I’ll walk through an example of measuring sibilant trajectories with the fantastic praat-parselmouth Python package. It’s my current favorite technique for avoiding Praat scripting. ...

3 May 2021 · 5 min · Khia A. Johnson

You should use tidylog in your #rstats corpus phonetics workflow

Last week, I asked #rstats twitter for a bit of help with something that has always felt clunky in my R code but was never annoying enough to actually fix. In corpus phonetics, you typically start with a large data set, make measurements, and then use informed criteria to filter out errors to the best of your ability, because measurements can be wrong. When you go to share your findings, you need to report how many items were removed (and why). To do this, you have to keep track. Sure, alternating between filter() and print(nrow(df)) works, but it’s clunky. I’m starting to think that maybe I should have been annoyed earlier. ...

29 Mar 2021 · 5 min · Khia A. Johnson

TextGrid ⇨ SQLite database in a few steps

So you want to work with annotated speech? I find many (not all!) purpose-built tools for corpus phonetics to be slow, buggy, inflexible, or incomplete, while simultaneously promising to do way more than I actually need. Building a SQLite database from .TextGrid files ended up being a straightforward solution, and it wasn’t hard to do. Here’s a quick tutorial. ...

23 Mar 2021 · 7 min · Khia A. Johnson
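
For a rough sense of what the TextGrid-to-SQLite idea can look like, here is a minimal sketch using only Python's built-in sqlite3 module. It is not the code from the tutorial itself: the table layout, the `build_db` helper, and the sample rows are all assumptions made up for illustration, and the step of actually parsing .TextGrid files is skipped by starting from already-extracted intervals.

```python
import sqlite3

# Hypothetical, already-parsed intervals: (file, tier, start_s, end_s, label).
# In practice these would come from reading .TextGrid files.
intervals = [
    ("talker01.TextGrid", "word", 0.00, 0.42, "so"),
    ("talker01.TextGrid", "word", 0.42, 0.80, "you"),
    ("talker01.TextGrid", "phone", 0.00, 0.21, "s"),
    ("talker01.TextGrid", "phone", 0.21, 0.42, "ow"),
]


def build_db(rows, path=":memory:"):
    """Create one flat table of intervals (an assumed schema) and fill it."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS intervals (
               file TEXT, tier TEXT, start_s REAL, end_s REAL, label TEXT
           )"""
    )
    conn.executemany("INSERT INTO intervals VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


conn = build_db(intervals)

# Pull out every sibilant on the phone tier, in time order.
sibilants = conn.execute(
    """SELECT label, start_s, end_s FROM intervals
       WHERE tier = 'phone' AND label IN ('s', 'sh')
       ORDER BY start_s"""
).fetchall()
print(sibilants)
```

Once everything lives in one table, picking out a sample of sounds is just a query, which is most of the appeal.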

A handful of categorical variable coding resources

Categorical variables are something I think about a lot in my psycholinguistic research, and they aren’t often given enough time in introductions to mixed effects modeling. I had originally planned to do a write-up on them here, but have now found enough good resources (papers and talks) that I think I’ll just list them with some accompanying comments. ...

9 Mar 2021 · 3 min · Khia A. Johnson