Who comes to a Hackathon? — Using ML to Predict YHack Attendees Part 1
At YHack, Yale’s 36-hour hackathon held earlier this month, we were happy to see over 1,100 students. Since our founding, we’ve been proud to financially support the vast majority of our hackers, who come from different schools and economic backgrounds. In the years to come, we want to extend that support to as many people as possible. To do that, though, we’ll have to predict fairly accurately which of our applicants will actually show up, so we don’t overspend our budget.
And so, I’m building a neural net.
Picking Demographic Features
In every YHack application, we ask for some demographic data, including gender, ethnicity, and whether you’re a first-time hacker. Like other hackathons, we typically use these as metrics to work toward a diverse group of hackers.
Features: Graduation Year, Gender, Race, First Generation, Previous Hackathons, Transportation Method/Reimbursement Cap, Short Answer Responses.
First, let’s take a look at the easy-to-categorize demographic features: gender, race, and first generation. I filtered the data down to the applicants we accepted and computed the check-in rate for each value of these features.
Gender Checked In
0 female 0.400915
1 male 0.425472
2 nonbinary 0.294118
3 other 0.500000
4 pnd 0.459459
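For anyone who wants to run the same kind of breakdown, here’s a minimal sketch of how rates like these could be computed with pandas. The toy data and the column names (`gender`, `accepted`, `checked_in`) are hypothetical stand-ins, not the actual YHack export.

```python
import pandas as pd

# Hypothetical toy data; the real application export and its column
# names are not shown in this post.
applications = pd.DataFrame({
    "gender": ["female", "male", "male", "female", "nonbinary", "male"],
    "accepted": [True, True, True, True, True, False],
    "checked_in": [True, False, True, False, False, False],
})

def check_in_rate(df, feature):
    """Fraction of accepted applicants who checked in, per value of `feature`."""
    accepted = df[df["accepted"]]
    # The mean of a boolean column is the share of True values.
    return accepted.groupby(feature)["checked_in"].mean()

rates = check_in_rate(applications, "gender")
print(rates)
```

The same helper works for any categorical column, so swapping in a race or first-generation column would give the other breakdowns.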
Similarly, for race and first generation, there weren’t any significant trends.
First Generation Checked In
True 0.399088
Race Checked In
These three features didn’t give us much information about whether people would come. It turns out college students are just as flaky no matter what background they’re from. Let’s move on to experience-related information.
Picking Experience-Related Features
Graduation Year Checked In
2022 0.466667 (removed for graph)
2023 0.333333 (removed for graph)
Previous Hackathons Checked In
Admittedly, ranges of 17 and 22 percentage points, respectively, aren’t groundbreaking, but they were more notable than the other results. The graduation year and previous hackathon features seemed to show distinct trends. The class of 2021 are current freshmen, and the class of 2018 are current seniors. It seems that the first year of college is the best time to go to a hackathon. I decided to keep these two experience features.
After training a couple of classifiers with sklearn, it turns out the best one has an accuracy of only 60.6% (10.6 percentage points better than guessing randomly). This is a bit disappointing, but not really surprising. After all, I expect most people to decide whether or not to come to YHack based more on location and funding than on experience. Tomorrow, we’ll take a look at some of the other categorical features, like the Transportation Method / Reimbursement Cap and the Short Answer Responses.
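To give a rough idea of what comparing a few sklearn classifiers can look like, here’s a minimal sketch. Everything in it is an assumption for illustration: the data is synthetic, the two features only mimic graduation year and previous hackathons, and the models (logistic regression, a random forest, and a majority-class baseline for reference) are not necessarily the ones I trained.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; NOT the real YHack applications.
rng = np.random.default_rng(0)
n = 500
years_until_grad = rng.integers(0, 4, size=n)  # 0 = senior ... 3 = freshman
prev_hackathons = rng.integers(0, 6, size=n)   # number of past hackathons
X = np.column_stack([years_until_grad, prev_hackathons])

# Made-up relationship: younger students check in slightly more often.
p_check_in = 0.3 + 0.05 * years_until_grad
y = (rng.random(n) < p_check_in).astype(int)

# Hypothetical model choices, plus a baseline that always predicts
# the most common class.
models = {
    "majority baseline": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Cross-validation keeps the comparison from depending on one lucky train/test split, and the baseline row shows how much of the accuracy comes from class imbalance alone.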