Who comes to a Hackathon? — Using ML to Predict YHack Attendees Part 2

Check out Part 1 first!

Last time we tried using a couple of demographic features to classify students and see whether they would attend YHack, but our results weren’t that great. Today, I’ll be adding in other features to try and improve our classification accuracy.

New Features: Transportation Reimbursement Amount & Method, Short Answers

For transportation this year, YHack offered no reimbursement to students from local Connecticut schools and up to $250 in flight costs for students coming from the West Coast, with amounts scaled accordingly for locations in between. For categorization, I split the reimbursement amounts into $100 buckets.

Transportation    Checked In (fraction)
x=0               0.519685
x<$100            0.491379
$100<x<$200       0.264249
x≥$200            0.408759
Free Bus          0.387534
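
Here is a minimal sketch of how that bucketing and the per-bucket check-in rate could be computed. The function names, the tuple layout, and the sample rows are made up for illustration, and which bucket an amount of exactly $100 or $200 falls into is an assumption, since the table doesn't say.

```python
from collections import defaultdict

def reimbursement_bucket(amount, free_bus=False):
    """Map a reimbursement amount in dollars to a coarse category label."""
    if free_bus:
        return "Free Bus"
    if amount == 0:
        return "x=0"
    if amount < 100:
        return "x<$100"
    if amount < 200:  # assumption: exactly $100 lands in the middle bucket
        return "$100<=x<$200"
    return "x>=$200"

def check_in_rate_by_bucket(applicants):
    """applicants: iterable of (amount, free_bus, checked_in) tuples."""
    totals = defaultdict(int)
    attended = defaultdict(int)
    for amount, free_bus, checked_in in applicants:
        bucket = reimbursement_bucket(amount, free_bus)
        totals[bucket] += 1
        attended[bucket] += int(checked_in)
    # Fraction of applicants in each bucket who checked in.
    return {bucket: attended[bucket] / totals[bucket] for bucket in totals}

# Made-up rows for illustration only, not YHack data.
sample = [
    (0, False, True),
    (0, False, False),
    (150, False, False),
    (250, False, True),
    (0, True, True),
]
rates = check_in_rate_by_bucket(sample)
```

Treating the free bus as its own category keeps the reimbursement method separate from the dollar amount, which matches the rows in the table.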

I also took a look at the short-answer responses. While I didn’t have a good way to judge the quality of a response, I used its character count, which was hard-capped at 500, as a rough measure of interest.

Why YHack Response Length       Checked In (fraction)
<250                            0.343533
<400                            0.406114
<450                            0.420765
≥450                            0.504836

Best Project Response Length    Checked In (fraction)
<250                            0.349662
<400                            0.379167
<450                            0.473684
≥450                            0.482026
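
As a rough sketch, the length buckets in the tables above could be assigned like this. The cut points come from the table rows; the names `LENGTH_EDGES`, `LENGTH_LABELS`, and `length_bucket` are hypothetical, and I'm assuming each row means "shorter than this but not in an earlier row".

```python
from bisect import bisect_right

# Cut points taken from the table rows; responses were capped at 500 characters.
LENGTH_EDGES = [250, 400, 450]
LENGTH_LABELS = ["<250", "<400", "<450", ">=450"]

def length_bucket(response):
    """Return the length bucket label for a short-answer response string."""
    # bisect_right counts how many cut points are <= the length,
    # so a 250-character response falls into "<400", not "<250".
    return LENGTH_LABELS[bisect_right(LENGTH_EDGES, len(response))]
```

Using the character count alone says nothing about quality, so this is only the same rough proxy for interest described above.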

Wow! It turns out that about half of the people who filled up the whole section came to YHack, compared to only about 35% of the people who wrote fewer than 250 characters.

I added these features to the set and tried running a couple of different classifiers again.
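
The post doesn't say how the bucketed categories were encoded before going into the classifiers. One common option is one-hot encoding, sketched below; the category lists and the `feature_row` helper are illustrative assumptions, not the original pipeline.

```python
def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of value."""
    return [1 if value == category else 0 for category in categories]

# Category labels mirroring the tables; the ordering is arbitrary.
TRANSPORT = ["x=0", "x<$100", "$100<x<$200", "x>=$200", "Free Bus"]
LENGTH = ["<250", "<400", "<450", ">=450"]

def feature_row(transport, why_len, project_len):
    """Concatenate the one-hot vectors for the three new features."""
    return (
        one_hot(transport, TRANSPORT)
        + one_hot(why_len, LENGTH)
        + one_hot(project_len, LENGTH)
    )

# One hypothetical applicant: took the free bus, wrote a long "Why YHack"
# answer and a short "Best Project" answer.
row = feature_row("Free Bus", ">=450", "<250")
```

Each applicant then contributes 13 extra 0/1 columns that can be appended to the demographic features from Part 1.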

So the best result out of the batch seemed to be 62.5%, about 1.5 percentage points better than yesterday. Given that this is only 12.5 percentage points better than blindly guessing, it’s unfortunately not accurate enough to base decisions on.
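
To make that comparison concrete, here is a small sketch of the arithmetic. It treats blind guessing as a 50% coin flip, as the post does. The `majority_baseline` helper and the 40% rate passed to it are hypothetical: they only illustrate that when the classes are unbalanced, always predicting the more common outcome beats a coin flip, so the fair baseline can be higher than 50%.

```python
def improvement(accuracy, baseline):
    """Return (absolute gain in accuracy, gain relative to the baseline)."""
    points = accuracy - baseline
    relative = points / baseline
    return points, relative

def majority_baseline(positive_rate):
    """Accuracy of always predicting the more common class."""
    return max(positive_rate, 1 - positive_rate)

# Best classifier accuracy from the post vs. a 50% coin flip.
points, relative = improvement(0.625, 0.50)
# points is 0.125 (12.5 percentage points); relative is 0.25 (25% better).

# Hypothetical overall check-in rate, NOT a figure from the post.
hypothetical_baseline = majority_baseline(0.40)
```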

Basically, college students are just as flaky no matter how much money you offer them, how much effort they put into the application, and how much experience they have. Looks like we’ll be stuck with good guesses for now!

SWE @ Nuro | Formerly Facebook/Google | Yale ’18 | alanliu.dev
