In my last post I outlined how and why I taught my Data Science in Practice (Cognitive Science 108) class here at UC San Diego.
The sole purpose of this post is to show off the incredible talent of my students and to showcase the work they put into their Final Projects.
The purpose of the project was to find real-world problems and datasets that can be analyzed with the techniques learned in class. Here are the instructions I gave the students:
It is imperative that by doing so you believe extra information will be gained – that you believe you can discover something new! Your question could be just for fun (e.g., “What are commonly misheard song lyrics?”), scientific (e.g., “How do different cultures perceive different colors?”), or, ideally, aimed at civic or social good (e.g., “What parts of San Diego are most in need of dedicated bike lanes?”).
Edit: The rationale behind why I chose to have the students put together a Final Project like this was so that they could build a Data Science portfolio to show off their skills. Data Science is multifaceted, and requires programming, statistics, data visualization, communication, information aggregation, and so on. When I was recruiting for Data Science positions back in my industry days, students with a portfolio of projects absolutely jumped out at us.
In addition, for this first offering, groups had the option of submitting their projects early in order to participate in a special judging panel led by former (and first) US Chief Data Scientist and friend, DJ Patil (twitter, wikipedia). DJ is a UC San Diego alumnus (Math, 1996) and happened to be in town the weekend before finals week to receive an Honored Alumni Award from the Chancellor. (You can also watch an interview I did with DJ that weekend below.)
While he was in town, he agreed to watch eight of the best groups each give a 5-minute presentation on their final projects.
To be considered for this challenge, projects needed to follow these rules:
- Communicate your results effectively to both experts and laypersons.
- Use data scientific approaches to address questions specifically concerning civic utility and social good.
Based on these simple criteria, the panel selected the top three projects. In addition to DJ, the panel included Brandon Freeman (UC San Diego Alumni Board of Directors and Leidos Engineering Solutions Architect), Arnaud Vedy (Data Analytics, City of San Diego), and Liz Izhikevich (UC San Diego Computer Science undergrad, President and Co-founder of the UC San Diego Data Science Student Society, and Voyteklab star!)
You’ll note that in many of the notebooks below, the students sometimes make rudimentary statistical, logical, and/or visualization errors. That’s okay! Those are learning experiences… and they learned a lot in the short 10 weeks we have in each quarter.
Without further ado, here are the eight finalists:
“The Road to DJ Patil is Filled with a Multitude of Potholes”
Lee Anne Mercado, Maggie Chan, Tim Lee, Vinh Doan, Young Jin Yun
The panel unanimously agreed that this project, which took a data-driven approach to understanding how San Diego should allocate its pothole workers and resources, was strongest in adhering to the spirit of the competition: they took a problem relevant to civic good and, through diligent and careful data analysis, came to some actionable conclusions.
Diego Saldonid, Roger Ruan, Shu-Wei (Lucas) Hsu, James Mata
This group’s project was well-loved by us all. The primary reason it didn’t win first place was that it wasn’t quite as closely related to the civic/social good focus of the competition. Nevertheless, it is super interesting. In brief, the students scraped class reports from UC San Diego’s Course And Professor Evaluations (CAPE) website to look at how average grades differ across different professors teaching the same classes in different quarters. They then looked at how grades might differ between an average student who has a “harder grader” path versus an average student who takes the same classes with an “easier grader”. Amazingly, they found the difference can be almost an entire grade point: 2.49 vs. 3.42 overall GPA!
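To make the “harder grader” versus “easier grader” comparison concrete, here is a minimal sketch of the idea. It is not the students’ actual code: the course names, professor labels, and grade values are all made up for illustration, and the function name `grader_paths` is my own. The real analysis used scraped CAPE data and may well have weighted or filtered things differently.

```python
from collections import defaultdict

# Hypothetical records: (course, professor, average GPA awarded).
# These values are invented for illustration; they are NOT real CAPE data.
records = [
    ("COGS 1", "Prof A", 2.6),
    ("COGS 1", "Prof B", 3.4),
    ("MATH 18", "Prof C", 2.4),
    ("MATH 18", "Prof D", 3.2),
    ("CSE 8A", "Prof E", 2.8),
    ("CSE 8A", "Prof F", 3.6),
]


def grader_paths(records):
    """Return (harder-path GPA, easier-path GPA) for one set of courses.

    For every course we pick the lowest and the highest average grade
    given by any professor, then average those picks across courses.
    """
    by_course = defaultdict(list)
    for course, _professor, avg_gpa in records:
        by_course[course].append(avg_gpa)

    hardest = [min(grades) for grades in by_course.values()]
    easiest = [max(grades) for grades in by_course.values()]
    n_courses = len(by_course)
    return sum(hardest) / n_courses, sum(easiest) / n_courses


hard_gpa, easy_gpa = grader_paths(records)
print(f"harder-grader path: {hard_gpa:.2f}")
print(f"easier-grader path: {easy_gpa:.2f}")
```

The sketch treats every course as equally weighted; weighting by units or by enrollment would be a natural refinement.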
Hudson Cooper, Vlad Bakhurinskiy, Muhammad Islam, Marco Rivera, Wenshuo Li
This group begins by asking: “Can we quantify the quality of life with just a point on the map?” This group really took my advice to heart and pulled in data from several different resources to “try to measure the ‘liveability’ of a neighborhood by looking at different aspects of the availability of methods of transportation rather than just more metrics such as an individual’s level of education or income.” This was a beautiful project with serious civic infrastructure implications. While they focus on San Diego, their methods are easily replicable across different cities. Do yourself a favor and look at their entire analysis notebook.
The upper left quadrant contains all census tracts that are below median income and above median public transportation use. Because we have identified that lower income neighborhoods use more public transportation in general, it makes sense to us that this quadrant would represent the neighborhoods that are most reliant on public transportation infrastructure. Of course improved infrastructure would likely increase public transportation use, bringing some census tracts from below the median public transit usage line into this quadrant, but use the information contained in this plot as a preliminary measure of reliance. This population is actually the best served by public transportation. From our multiple linear regression, even when you hold one fixed and vary the other, low income and high public transportation use each predict high transit score. You can see this plainly from the above plot since most census tracts with ‘good’ transit score (in blue) lie in this region. However, there are still lots of census tracts in this most reliant population with low transit scores. The census tracts that are in this quadrant and have transit scores of less than 50 (we will refer to these census tracts as ‘underserved’) account for 20.6% of San Diego’s population. 41.2% of this underserved population spends over 30% of their income on rent, a critical indicator of poverty.
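The quadrant logic the students describe can be sketched in a few lines. This is only an illustration under my own assumptions: the tract records, field names, and numbers below are hypothetical, and the only detail taken from their write-up is the rule itself (below-median income, above-median public transit use, transit score under 50).

```python
from statistics import median

# Hypothetical census tracts; every value here is invented for illustration.
tracts = [
    {"income": 30000, "transit_use": 0.12, "transit_score": 35, "population": 4000},
    {"income": 35000, "transit_use": 0.10, "transit_score": 70, "population": 5000},
    {"income": 40000, "transit_use": 0.02, "transit_score": 40, "population": 3000},
    {"income": 80000, "transit_use": 0.09, "transit_score": 60, "population": 2000},
    {"income": 90000, "transit_use": 0.01, "transit_score": 20, "population": 6000},
    {"income": 100000, "transit_use": 0.03, "transit_score": 45, "population": 5000},
]

median_income = median(t["income"] for t in tracts)
median_use = median(t["transit_use"] for t in tracts)

# "Upper left quadrant": below-median income AND above-median transit use.
reliant = [
    t for t in tracts
    if t["income"] < median_income and t["transit_use"] > median_use
]

# "Underserved": reliant tracts whose transit score is below 50.
underserved = [t for t in reliant if t["transit_score"] < 50]

# Share of the total population living in underserved tracts.
total_population = sum(t["population"] for t in tracts)
share = sum(t["population"] for t in underserved) / total_population

print(f"reliant tracts: {len(reliant)}")
print(f"underserved tracts: {len(underserved)}")
print(f"underserved share of population: {share:.1%}")
```

Using medians as the cut points keeps the split robust to a handful of very wealthy or very transit-heavy tracts, which is presumably why the students chose them.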
I wish I had the time here to expound on the virtues of each of the projects below in detail; sadly, I do not. But please do not take my lack of detailed comments as an unspoken commentary on the amazingness of their work. Please do take the time to look at each of these, as they are truly remarkable!
“Crime ‘n’ Booze”
Jenny Hamer, Aparna Rangamani, Jairo Chavez
Chicago Traffic Violations
Arun Sugumar, Zichao Wu, Xiaoxin Xu, Qixin Ding, Lijiu Liang
San Diego Infrastructure
Anaelle Kim, Grant Sheagley, Dylan Christiano, Shawn Le
Vincent Tierra, Adrian Herrmann, Lynley Yamaguchi
San Diego Gentrification
Megan Chang, Abena Bonsu, Raymond Arevalo, Lauren Liao
Noteworthy Other Projects
Recurrent Neural Networks for Protein Secondary Structure Prediction
David Wang, Michelle Franc Ragsac, Jimmy Quach, Dhaivath Raghupathy, Shih-Cheng Huang
Exercise vs. Food Environment: Obesity Classification
Bryant Lin, Swarnakshi Kapil, Hendrik Hannes Holste
A Valuation of Public Parks
Chad Atalla, Alicia Chen, Nadah Feteih, Alan Chen, Joshua Van Gogh, Anjali Verma
Crime and Public Recreation Areas
Tyler Ly, Reginald Wu, Karen Ma, Ho Tsun Matthew Ho, Erika Morozumi
Causes of Car Accidents
Andy Thai, Johnson Pang, Ronald Baldonado, Haoyuan Wang
Are Universities Worth the Opportunity Cost?
Sharmaine Manalo, Madeline Hsia, Nathanyel Calero, Tianyu Zhang
Using Python to Analyze Billboard Top 100 Pop Songs
Christopher Lo, Ryan Yang, Ken Truong, Kevin Tan, Vivian Mach
Hazel Baker-harvey, Ting Lin, Pratyusha Meka
Optimization of Police Car Placement in San Diego
Emma Roth, Eric Mauritzen, Keven Nguyen, Taralyn Mcnabb
Exploring Desalination Plant Numbers
Christina Cook, Dominic Suares, Youxi Li, Linzhi Xie, Erik Mei
Kevin Li, Prithvi Narasimhan, Rajiv Pasricha, Andy Zhang, Matthew Ho
Patil says smart words whilst seated next to me: