The Bilingual Annotation Tasks Force is an interdisciplinary research group of faculty and students from linguistics, computer science, and electrical engineering who share an interest in using computational techniques to analyze language mixing in bilingual speech.
We are seeking a data scientist with an interest in linguistics. Specifically, we seek someone who can scrape language data from the Web and/or Twitter; spread, gather, or unite variables; deal with missing values; visualize data; and work with strings and, possibly, relational data. Experience in Python and/or R and an interest in human languages other than English are helpful.
The project is ongoing, and interns typically serve for several semesters.
Assist in the creation of mixed-language corpora via web scraping and in the tidying and visualization of linguistic data.