Data scraping and wrangling

This project is closed.

The Bilingual Annotation TaskS Force is an interdisciplinary research group of faculty and students from linguistics, computer science, and electrical engineering who share an interest in using computational techniques to analyze language mixing in bilingual speech.

Qualifications

We are seeking a data scientist with an interest in linguistics. Specifically, we are looking for someone who can scrape language data from the Web and/or Twitter; reshape data by spreading, gathering, or uniting variables; handle missing values; visualize data; and work with strings and, possibly, relational data. Experience with Python and/or R and an interest in human languages other than English are helpful.

Project Timeline

The project is ongoing, and interns typically serve for several semesters.

Duties

Assist in creating mixed-language corpora via web scraping, and in tidying and visualizing linguistic data.

Typical Time Commitment
5 hours/week
Desired Length of Commitment
2 semesters

I'M INTERESTED IN THIS PROJECT. WHAT SHOULD I DO NEXT?

The Office of Undergraduate Research recommends that you attend an info session or an advising appointment before contacting faculty members or project contacts about research opportunities. These sessions cover the steps to get involved, tips for contacting faculty, funding possibilities, and options for course credit. Once you have attended an Office of Undergraduate Research info session or spoken with an advisor, you can use the "Who to contact" details for this project to get in touch with the project leader and express your interest in getting involved.

Have you already tried contacting professors and need more help? Schedule an appointment for additional support.