Posts

Getting a start in the tech industry

I decided to move from academic linguistics into the tech industry after trying the academic job market for a year, getting nothing out of it, realizing that I didn’t want to move, and learning about some really interesting jobs outside of the academy. I was never really against the idea, so my “leaving academia” story doesn’t come with much emotional baggage. A few months into my job, I could already tell that it was the right decision for me. This post is a mix of my job-search story, the resources that helped me most, and miscellaneous bits of information I learned throughout the process. I don’t claim to be an expert in getting hired, but perhaps you’ll find something you can use. It’s a bit of a hodgepodge, and I may update it in the future. ...

New year, new linguistics doctor

This is a short blog post. If you’re here, maybe you already know this: I’m thrilled to say that I’m officially a doctor of linguistics, a program-completed, library-approved, out-of-the-academy doctor of linguistics. ...

git dissertation, a quick guide to the non-writing parts of my workflow

I’m currently writing my dissertation with LaTeX in a private GitHub repository. Props to the guy on Twitter writing his dissertation in full public view, but I’ll pass. Maybe I’ll make it public after my defense, and you can gaze into the crystal ball of my writing habits… 🔮✍🏼 (or more likely, adapt my analysis code). In the meantime, here’s a quick guide to the non-writing parts of writing with a LaTeX template, version control, and a cloud backup. It’s certainly given me some peace of mind. You can do it, too! ...

Creating a speech corpus #1: Before you begin

Before you start collecting data, you need to do some due diligence. As important as speech data sets are, they are not trivial to create, and you need to balance what you want from the data with the time and resources you can access. I don’t mean to suggest that developing speech data sets isn’t important (it is), but rather that it needs to happen after careful consideration. This post gets at some of the things you’ll want to think about before you start planning your dream corpus. ...

Creating a speech corpus: A new blog series

So you want to create a new speech data set? There are a lot of things to consider at every stage of the process. This is the first (introductory) post in a series I’m starting on the topic, based on my experience developing the SpiCE corpus of Speech in Cantonese and English. There are undoubtedly things I could have done better, but in any case, I certainly learned a lot about speech data along the way. ...