UC San Diego Data Science projects
July 15, 2017 by Bradley Voytek
In my last post I outlined how and why I taught my Data Science in Practice (Cognitive Science 108) class here at UC San Diego. The sole purpose of this post is to show off the incredible talent of my students and to showcase the work they put into their Final Projects. The goal of the project was for students to find real-world problems and datasets that could be analyzed with the techniques learned in class. Here are the instructions I gave the students:

It is imperative that by doing so you believe extra information will be gained – that you believe you can discover something new! Your question could be just for fun (e.g., “What are commonly misheard song lyrics?”), scientific (e.g., “How do different cultures perceive different colors?”), or, ideally, aimed at civic or social good (e.g., “What parts of San Diego are most in need of dedicated bike lanes?”).

Edit: I had the students put together a Final Project like this so that they could build a Data Science portfolio to show off their skills. Data Science is multifaceted and requires programming, statistics, data visualization, communication, information aggregation, and so on. When I was recruiting for Data Science positions back in my industry days, students with a portfolio of projects absolutely jumped out at us.

In addition, for this first offering, groups had the option of submitting their projects early in order to participate in a special judging panel led by former (and first) US Chief Data Scientist and friend, DJ Patil (twitter, wikipedia). DJ is a UC San Diego alumnus (Math, 1996) and happened to be in town the weekend before finals week to receive an Honored Alumni Award from the Chancellor. (You can also watch an interview I did with DJ that weekend below.) While he was in town, he agreed to watch eight of the best groups each give a 5-minute presentation on their final projects. To be considered for this challenge, groups had to follow these rules:
- Communicate your results effectively to both experts and laypersons.
- Use data scientific approaches to address questions specifically concerning civic utility and social good.
The upper left quadrant contains all census tracts that are below median income and above median public transportation use. Because we have found that lower-income neighborhoods use more public transportation in general, it makes sense to us that this quadrant represents the neighborhoods most reliant on public transportation infrastructure. Of course, improved infrastructure would likely increase public transportation use, bringing some census tracts from below the median public transit usage line into this quadrant, but we use the information contained in this plot as a preliminary measure of reliance. This population is actually the best served by public transportation: our multiple linear regression shows that low income and high public transportation use each predict a high transit score, even when the other is held fixed. You can see this plainly in the above plot, since most census tracts with a 'good' transit score (in blue) lie in this region. However, there are still many census tracts in this most reliant population with low transit scores. The census tracts that are in this quadrant and have transit scores below 50 (we will refer to these census tracts as 'underserved') account for 20.6% of San Diego's population. 41.2% of this underserved population spends over 30% of their income on rent, a critical indicator of poverty.

I wish I had the time here to expound on the virtues of each of the projects below in detail; sadly, I do not. But please do not take my lack of detailed comments as unspoken commentary on the amazingness of their work. Please do take the time to look at each of these, as they are truly remarkable!
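For readers curious how the 'underserved' measure in the census-tract excerpt above might be computed, here is a minimal sketch in Python. It is not the students' code. The Tract fields, the function name underserved_summary, and the toy numbers are all hypothetical. Only the rules come from the excerpt: below-median income, above-median transit use, and a transit score under 50. The multiple linear regression mentioned in the excerpt is not reproduced here.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Tract:
    """One census tract; field names and values are hypothetical."""
    population: int
    median_income: float   # median household income in the tract
    transit_share: float   # fraction of commuters using public transportation
    transit_score: float   # 0-100 transit score
    rent_burdened: int     # residents spending over 30% of income on rent


def underserved_summary(tracts, score_cutoff=50):
    """Return (share of total population in underserved tracts,
    share of that underserved population that is rent-burdened)."""
    income_median = median(t.median_income for t in tracts)
    transit_median = median(t.transit_share for t in tracts)

    # "Most reliant" quadrant: strictly below-median income and strictly
    # above-median transit use (tracts exactly at a median are excluded).
    reliant = [t for t in tracts
               if t.median_income < income_median
               and t.transit_share > transit_median]

    # Underserved: reliant tracts whose transit score is below the cutoff.
    underserved = [t for t in reliant if t.transit_score < score_cutoff]

    total_pop = sum(t.population for t in tracts)
    under_pop = sum(t.population for t in underserved)
    share_of_city = under_pop / total_pop if total_pop else 0.0
    rent_burden_share = (sum(t.rent_burdened for t in underserved) / under_pop
                         if under_pop else 0.0)
    return share_of_city, rent_burden_share


# Toy data, invented purely for illustration.
example = [
    Tract(1000, 30000, 0.20, 40, 500),
    Tract(2000, 35000, 0.15, 70, 600),
    Tract(1500, 80000, 0.05, 30, 100),
    Tract(500, 90000, 0.02, 20, 50),
    Tract(1000, 60000, 0.10, 55, 200),
]
share, burden = underserved_summary(example)
print(f"underserved share of population: {share:.1%}")
print(f"rent-burdened share of underserved: {burden:.1%}")
```

Both percentages are weighted by population rather than by tract count. This matches how the excerpt reports its figures, which describe shares of San Diego's population and not shares of census tracts.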
Finalists
"Crime 'n' Booze"
Jenny Hamer, Aparna Rangamani, Jairo Chavez
Chicago Traffic Violations
Arun Sugumar, Zichao Wu, Xiaoxin Xu, Qixin Ding, Lijiu Liang
San Diego Infrastructure
Anaelle Kim, Grant Sheagley, Dylan Christiano, Shawn Le
Flu Demographics
Vincent Tierra, Adrian Herrmann, Lynley Yamaguchi
San Diego Gentrification
Megan Chang, Abena Bonsu, Raymond Arevalo, Lauren Liao
Noteworthy Other Projects
Recurrent Neural Networks for Protein Secondary Structure Prediction
David Wang, Michelle Franc Ragsac, Jimmy Quach, Dhaivath Raghupathy, Shih-Cheng Huang
Exercise vs. Food Environment: Obesity Classification
Bryant Lin, Swarnakshi Kapil, Hendrik Hannes Holste
A Valuation of Public Parks
Chad Atalla, Alicia Chen, Nadah Feteih, Alan Chen, Joshua Van Gogh, Anjali Verma
Crime and Public Recreation Areas
Tyler Ly, Reginald Wu, Karen Ma, Ho Tsun Matthew Ho, Erika Morozumi
Causes of Car Accidents
Andy Thai, Johnson Pang, Ronald Baldonado, Haoyuan Wang
Are Universities Worth the Opportunity Cost?
Sharmaine Manalo, Madeline Hsia, Nathanyel Calero, Tianyu Zhang
Using Python to Analyze Billboard Top 100 Pop Songs
Christopher Lo, Ryan Yang, Ken Truong, Kevin Tan, Vivian Mach
Media Violence
Hazel Baker-harvey, Ting Lin, Pratyusha Meka
Optimization of Police Car Placement in San Diego
Emma Roth, Eric Mauritzen, Keven Nguyen, Taralyn Mcnabb
Exploring Desalination Plant Numbers
Christina Cook, Dominic Suares, Youxi Li, Linzhi Xie, Erik Mei
Chicago Crime
Kevin Li, Prithvi Narasimhan, Rajiv Pasricha, Andy Zhang, Matthew Ho
Patil says smart words whilst seated next to me: