Who comes to a Hackathon? — Using ML to Predict YHack Attendees Part 1

At YHack, Yale’s 36-hour hackathon held earlier this month, we were happy to see over 1,100 students. Since our founding, we’ve been proud to financially support the vast majority of our hackers, who come from different schools and economic backgrounds. In the years to come, we want to extend that support to as many people as possible. To do that, however, we’ll have to predict fairly accurately which of our applicants will actually come, so we don’t overspend our budget.

And so, I’m building a neural net.

Picking Demographic Features

In every YHack application, we ask for some demographic data, including gender, ethnicity, and whether you’re a first-time hacker. Like other hackathons, we typically use these as metrics to work toward a diverse group of hackers.

Features: Graduation Year, Gender, Race, First Generation, Previous Hackathons, Transportation Method/Reimbursement Cap, Short Answer Responses.

First, let’s look at the easy-to-categorize demographic features (italicized above). I filtered the data down to the applicants we accepted and computed the check-in rate for each value of those features.

Unfortunately, any difference here is insignificant.
   Gender     Checked In
0  female     0.400915
1  male       0.425472
2  nonbinary  0.294118
3  other      0.500000
4  pnd        0.459459
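A table like the one above can be produced with a filter followed by a group-by. The snippet below is only a minimal sketch: the column names (`accepted`, `gender`, `checked_in`), the helper `check_in_rate`, and the toy rows are all hypothetical, since the real application export isn't shown here.

```python
import pandas as pd

# Hypothetical toy data standing in for the real application export.
applications = pd.DataFrame({
    "accepted":   [True, True, True, True, False, True],
    "gender":     ["female", "male", "male", "female", "male", "female"],
    "checked_in": [True, False, True, False, True, False],
})

def check_in_rate(df, feature):
    """Share of accepted applicants who checked in, per value of `feature`."""
    accepted = df[df["accepted"]]          # keep only accepted applicants
    # The mean of a boolean column is the fraction of True values.
    return accepted.groupby(feature)["checked_in"].mean()

rates = check_in_rate(applications, "gender")
print(rates)
```

The same helper could be called with any other categorical column, such as a race or first-generation column, to get the corresponding rates.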

Similarly, for race and first generation, there weren’t any significant trends.

First Generation  Checked In
False             0.428683
True              0.399088

Race            Checked In
asian           0.419563
black           0.358025
hispanic        0.474074
nativeamerican  0.333333
other           0.351724
pnd             0.405229
white           0.429280

These three features didn’t give us much information about whether people would come. It turns out college students are just as flaky no matter what background they’re from. Let’s move on to experience-related information.

Picking Experience Related Features

Graduation Year  Checked In
2018             0.394794
2019             0.337864
2020             0.410385
2021             0.513915
2022             0.466667 (removed for graph)
2023             0.333333 (removed for graph)

Previous Hackathons  Checked In
0                    0.500682
1                    0.414487
2                    0.401813
3                    0.319444
4                    0.279279
5                    0.345588

Admittedly, ranges of 17 and 22 percentage points, respectively, aren’t groundbreaking, but they are more significant than the other results. The graduation year and previous hackathon features seem to show distinct trends. The class of 2021 are current freshmen, and the class of 2018 are current seniors. It seems that the first year of college is the best time to go to a hackathon. I decided to keep these two features.
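For reference, here is how those two ranges fall out of the tables. This is a quick sketch that assumes the graduation-year figure leaves out 2022 and 2023, the two rows marked as removed for the graph; with them included, the range would be closer to 18 points. The helper name `rate_range` is just for illustration.

```python
# Check-in rates copied from the tables above.
# Assumption: 2022 and 2023 are excluded, as they were for the graph.
grad_year_rates = {2018: 0.394794, 2019: 0.337864, 2020: 0.410385, 2021: 0.513915}
prev_hackathon_rates = {
    0: 0.500682, 1: 0.414487, 2: 0.401813,
    3: 0.319444, 4: 0.279279, 5: 0.345588,
}

def rate_range(rates):
    """Difference between the highest and lowest check-in rate."""
    return max(rates.values()) - min(rates.values())

print(f"Graduation year:     {rate_range(grad_year_rates):.1%}")       # 17.6%
print(f"Previous hackathons: {rate_range(prev_hackathon_rates):.1%}")  # 22.1%
```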

After training a couple of classifiers with sklearn, it turns out the best classifier has only 60.6% accuracy (10.6 percentage points better than guessing randomly). This is a bit disappointing, but not really surprising; after all, I expect most people to decide whether or not to come to YHack based more on location and funding than on experience. Tomorrow, we’ll take a look at some of the other categorical features, like the Transportation Method / Reimbursement Cap and the Short Answer Responses.
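To give a rough idea of what comparing a couple of sklearn classifiers on these two features might look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the data is synthetic, the two models (logistic regression and a random forest) are stand-ins rather than the classifiers actually tried, and the scores it prints say nothing about the real 60.6% result.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the accepted-applicant data (not the real YHack data).
rng = np.random.default_rng(0)
n = 500
years_after_2018 = rng.integers(0, 4, size=n)   # graduation year 2018..2021
prev_hackathons = rng.integers(0, 6, size=n)    # 0..5 previous hackathons

# Made-up check-in probability: higher for younger classes, lower with experience.
p_check_in = 0.5 + 0.03 * years_after_2018 - 0.04 * prev_hackathons
y = (rng.random(n) < p_check_in).astype(int)
X = np.column_stack([years_after_2018, prev_hackathons])

# Two illustrative classifiers, scored with 5-fold cross-validated accuracy.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Cross-validation is used here instead of a single train/test split so that the accuracy estimate depends less on which rows happen to land in the test set.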

Part 2




SWE @ Nuro | Formerly Facebook/Google | Yale ’18 | alanliu.dev

Alan Liu
