git dissertation, a quick guide to the non-writing parts of my workflow

I’m currently writing my dissertation with LaTeX in a private GitHub repository. Props to the guy on Twitter writing his dissertation in full public view, but I’ll pass. Maybe I’ll make it public after my defense, and you can gaze into the crystal ball of my writing habits… 🔮✍🏼 (or more likely, adapt my analysis code). In the meantime, here’s a quick guide to the non-writing parts of writing with a LaTeX template, version control, and a cloud backup. It’s certainly given me some peace of mind. You can do it, too! ...

26 Jul 2021 · 6 min · Khia A. Johnson

Creating a speech corpus #1: Before you begin

Before you start collecting data, you need to do some due diligence. Because as important as speech data sets are, they are not trivial to create, and you need to balance what you want from the data with the time and resources you can access. I don’t mean to suggest that developing speech data sets isn’t important (it is), but rather that it needs to happen after careful consideration. This post gets at some of the things you’ll want to think about before you start planning your dream corpus. ...

20 Jul 2021 · 6 min · Khia A. Johnson

Creating a speech corpus: A new blog series

So you want to create a new speech dataset? There are a lot of things to consider at every stage of the process. This is the first (introductory) post in a series I’m starting on the topic, based on my experience developing the SpiCE corpus of Speech in Cantonese and English. There are undoubtedly things I could have done better, but in any case, I certainly learned a lot about speech data along the way. ...

29 Jun 2021 · 2 min · Khia A. Johnson

The SpiCE corpus is officially out!

If you follow my work, you probably know that I’ve spent the last few years working on a new open-access Cantonese-English bilingual speech corpus. Well, today marks the first official release of the data into the wild. I’m thrilled to share it with all of you, and 100% expect to see lots of great work with the corpus. There are many places you can learn more about it, starting here. ...

20 May 2021 · 2 min · Khia A. Johnson

Sibilant trajectories with Python + praat-parselmouth

Once I’ve identified a sample of speech sounds that I want to analyze, the next step is to do that analysis. There are obviously many ways to go about this process. Here, I’ll walk through an example of measuring sibilant trajectories with the fantastic praat-parselmouth Python package. It’s my current favorite technique for avoiding Praat scripting. ...

3 May 2021 · 5 min · Khia A. Johnson