Saturday, August 8, 2020

Lies, Damned Lies, and Statistics

I do understand that human beings are not intuitively good at statistics. Daniel Kahneman won the Nobel Prize in Economics in 2002 for showing, among other things, that people in general are terrible at statistical thinking. From Kahneman's Nobel biography:

"The standard example of a framing problem, which was developed quite early, is the 'lives saved, lives lost' question, which offers a choice between two public-health programs proposed to deal with an epidemic that is threatening 600 lives: one program will save 200 lives, the other has a 1/3 chance of saving all 600 lives and a 2/3 chance of saving none. In this version, people prefer the program that will save 200 lives for sure. In the second version, one program will result in 400 deaths, the other has a 2/3 chance of 600 deaths and a 1/3 chance of no deaths. In this formulation most people prefer the gamble. If the same respondents are given the two problems on separate occasions, many give incompatible responses. When confronted with their inconsistency, people are quite embarrassed. They are also quite helpless to resolve the inconsistency."

Both versions describe exactly the same two programs (saving 200 of 600 people means 400 people die), so the only thing that changed is the wording.

Still, just because other people are bad at statistics doesn't mean I'll accept that as an excuse from you as a prospective MIT applicant. Most MIT majors require or suggest a course in probability and/or statistics, so you might as well get a head start on statistical thinking now.

First, a few facts on which to chew:
1. The overall admission rate for the class of 2009 was 14.3%. (From here.)
2. Applicants who interviewed (or had their interview waived) had a 19% admission rate; those who didn't interview had a 7% admission rate. (I don't have a citation for this, which is sketchy, so feel free not to believe me. But although I can't remember where I found the numbers, this is close enough to the truth for the purposes of this entry.)
3. Applicants with SAT scores in the 88th percentile (roughly a 1290 on the old SAT) have about a 5% admission rate, while those with perfect scores have about a 50% admission rate. (From here, a very fun read if you're into this kind of thing. I highly suggest it!)

So does this mean that you can pour all of your personal data into some magic admissions algorithm and have it spit out a number that reflects your chances of getting into MIT? First of all, no. Moreover, it wouldn't matter if it could. For example, if the computer said that you had a 33% chance, that would mean that if you applied to MIT many times, you would expect to get in on approximately 1 in 3 tries. (And we're not talking about applying 3 times here. I think applying 500 times would probably give a good result, but I don't feel like playing around with Matlab to see if that's true.) Of course, you can't apply 500 times to MIT in a single year, or even in your lifetime, so it's pointless to try to stick a number on your chances at MIT.

I guess the moral of the story here is that no one is a shoo-in for MIT, but the opposite is also true: nobody should think they have no hope. But it's pointless to overthink this issue, because you just can't control for all the variables.

For what it's worth, my Super Getting into MIT Guide goes something like this:
1. Do something that you really care about, and make sure you write about it glowingly on your application.
2. Interview, and don't be lame and fake at said interview.
3. Get good scores on the SAT I and SAT IIs.
4. Take difficult classes at your high school (or even local community college) and get good grades in them.

And, of course, you can get into MIT if you only have three of these four characteristics; you can get in if you only have two; you can get in if you only have one. But even if you have all four, you're not a sure thing.

My final statistics lesson has to do with something you may have heard: that MIT supposedly has a stratospherically high suicide rate.
This is a contention supported by the Boston Globe, a group of stellar journalists, I'm sure, but not so good at the statistics thing. (I can't find the original Globe article, but the article here makes all the points the original article made.) The Globe basically looked at the MIT suicide rate between 1990 and 1999, compared it to suicide rates at other schools, and decided it was too high. (Let's just say there's a reason the Globe article wasn't published in a scientific journal. Sweeping conclusions backed up by questionable data like that make scientists, including me, want to bang their heads on hard surfaces.)

Now let's look at some problems with the Globe's grandiose conclusions:
1. People who successfully commit suicide are significantly more likely to be young and male. In the 1990s, the average MIT student was both of those things; since then, the gender ratio has famously evened out. (Source here; relevant quote: "In fact, MIT's suicide rate is below the national average if one adjusts figures for the school's overwhelmingly male student body [during the years of the study].")
2. Moreover, science, engineering, and business students have significantly higher suicide rates than liberal arts students do. MIT undergraduates are almost exclusively science, engineering, and/or business majors. Given that both of those things are true, one would expect MIT to have a high suicide rate based on those demographics alone. (Source here; relevant quote: "Based on 10 undergraduate suicides over 11 years, the article concludes that suicide is a greater danger at MIT than elsewhere. When one factors in that science and business students have considerably higher suicide rates than liberal arts students, and that male college students kill themselves five times more often than female college students, the figures quoted prove nothing. MIT is cited as currently being composed of 59 percent male students; that fact alone would make the suicide rate differences with most other colleges understandable; but in the early 1990s an even higher percentage of the students at MIT were male.")
3. The Globe compared MIT to other schools with engineering programs, which is a terrible control. Other schools have engineering programs, yes, but few other schools have 50% of the undergraduate student body majoring in engineering. Without appropriate controls, the comparison doesn't tell you much, and it's difficult to think of a school that would be a good control: Caltech is science/engineering-focused too, but having only one school as the control population would be pretty sketchy.
4. Statistics like this are terribly vulnerable to small swings in absolute numbers. The absolute number of suicides is very small, so it takes data from many years to determine accurately whether the rate in one place is higher or lower than the rate in another. (Source here; quote: "Because of small number statistics, the true suicide rate (i.e., that which would be measured by a very large MIT in the limit of an infinite number of students) is, to 95% confidence, approximately 100,000*(11 +/- 2*sqrt(11))/48,000. At this level, MIT's suicide rate is consistent with the national average; it would take approximately another thirty-three years in order to obtain a measurement of the MIT suicide rate that could be distinguished from the national average at 95% confidence.")

So now you know. Go out and tell my story to the masses. ;)
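The "1 in 3 tries" reasoning earlier in the post can be checked without Matlab, under one loud assumption: each application is treated as an independent try that succeeds with the same probability p (real applications in different years would be neither independent nor identical). Under that assumption the observed success fraction over n tries has a standard error of sqrt(p(1 - p)/n). This is a minimal Python sketch; the function names and the random seed are arbitrary choices made for illustration.

```python
import math
import random


def observed_fraction(p: float, n: int, rng: random.Random) -> float:
    """Simulate n independent tries, each succeeding with probability p,
    and return the fraction of tries that succeeded."""
    successes = sum(rng.random() < p for _ in range(n))
    return successes / n


def standard_error(p: float, n: int) -> float:
    """Standard deviation of the observed success fraction over n tries."""
    return math.sqrt(p * (1 - p) / n)


p = 1 / 3
rng = random.Random(2020)  # fixed seed so the run is reproducible
for n in (3, 30, 500):
    runs = [observed_fraction(p, n, rng) for _ in range(5)]
    print(f"n={n:>3}: standard error {standard_error(p, n):.3f}, "
          f"five simulated fractions {[round(r, 2) for r in runs]}")
```

For p = 1/3 the standard error is about 0.27 with 3 tries, 0.086 with 30, and 0.021 with 500, so at 500 tries the observed fraction usually lands within roughly 0.04 of 1/3. The individual simulated fractions depend on the seed.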

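The formula quoted in point 4 is the usual small-count approximation: if the number of events N behaves like a Poisson count, its standard deviation is about sqrt(N), and N ± 2·sqrt(N) is a rough 95% interval. The Python sketch below reproduces the quoted calculation, assuming (the quote doesn't say) that 48,000 is the total number of student-years observed, so the result is a rate per 100,000 student-years. The helper names are hypothetical, and the normal approximation is crude for counts this small; an exact Poisson interval would come out somewhat different.

```python
import math


def poisson_rate_interval(events: int, exposure: float, z: float = 2.0,
                          per: float = 100_000):
    """Approximate interval for an event rate using events +/- z*sqrt(events).

    `exposure` is the total person-years observed (assumed here to be
    48,000 student-years); the result is scaled to a rate per `per`
    person-years. Returns (low, point estimate, high).
    """
    half_width = z * math.sqrt(events)
    scale = per / exposure
    return ((events - half_width) * scale,
            events * scale,
            (events + half_width) * scale)


def relative_half_width(events: int, z: float = 2.0) -> float:
    """Half-width of the interval as a fraction of the count: z / sqrt(events)."""
    return z / math.sqrt(events)


low, point, high = poisson_rate_interval(11, 48_000)
print(f"rate per 100,000: {point:.1f} (roughly {low:.1f} to {high:.1f})")

# More events (i.e. more years of data) shrink the relative uncertainty
# only as 1/sqrt(N): four times as many events halves the relative width.
for n_events in (11, 44, 110):
    print(f"{n_events:>3} events: +/- {relative_half_width(n_events):.0%}")
```

With 11 events the interval runs from about 9 to 37 per 100,000, roughly ±60% around the point estimate of about 23, which is why one decade of data from one school says little by itself. Working out how many more years would be needed also requires the national average, which the quote doesn't give, so the sketch stops there.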