www.whyville.net Dec 7, 2008 Weekly Issue



woohooyea
Guest Writer

How Numbers Can Lie


Some Internet exploring told me that the average school-age child (that's most of us!) spends 3 to 4 hours a day in front of the TV. That's a lot of messages getting drilled into a lot of learning brains. When we see percentages, surveys, and polls, it's easy to jump to conclusions. You already know to take media with a grain of salt, as they say, but just how much salt should we take?

You've probably heard some of the following statistical terms, but maybe you don't know exactly what they mean or how to use them. The people a survey is supposed to make observations about are called the population. The people who actually give the responses shown in the data are called the sample. If you want to know if your French class wants chocolate cake next vendredi, rather than asking each and every student, you might just ask the students in the front row. The front row would be your sample, while your whole class would be your population.

A survey can show that 85% of people prefer red, and only 15% of people prefer blue, but this difference in opinion alone doesn't make the survey biased. A biased survey gives you data that, on average, is different from the truth about the whole population. The red vs. blue data would be biased if only 55% of the whole population actually preferred red, and 45% preferred blue.
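If you'd like to see what "on average" means here, the Python sketch below may help. It is only an illustration, not a real survey: the 1,000-person population, the sample size of 50, and the rule that red fans are three times as likely to be picked are all made-up assumptions, and the names fair_sample and lopsided_sample are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Made-up population matching the example: 55% prefer red, 45% prefer blue.
population = ["red"] * 550 + ["blue"] * 450

# Assumption for illustration only: red fans are 3 times as likely to be picked.
weights = [3 if person == "red" else 1 for person in population]


def red_share(sample):
    """Fraction of a sample that prefers red."""
    return sample.count("red") / len(sample)


def fair_sample(size=50):
    """Every person is equally likely to be chosen."""
    return random.sample(population, size)


def lopsided_sample(size=50):
    """Red fans are over-represented, so this sampling method is biased."""
    return random.choices(population, weights=weights, k=size)


def average_red_share(draw_sample, repeats=2000):
    """Repeat the survey many times and average the results."""
    return sum(red_share(draw_sample()) for _ in range(repeats)) / repeats


fair_avg = average_red_share(fair_sample)
lopsided_avg = average_red_share(lopsided_sample)

print(f"True share preferring red:  {550 / 1000:.0%}")
print(f"Average over fair samples:  {fair_avg:.1%}")
print(f"Average over lopsided ones: {lopsided_avg:.1%}")
```

Any single sample of 50 will bounce around, but the fair average should land close to 55%, while the lopsided average should sit well above it. That leftover gap, the part that doesn't go away no matter how many times you repeat the survey, is the bias.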

I bet some of you are thinking, "But a fact is a fact! By nature it must be true!" Rather than argue this point with you in the BBS, let me give you some examples.

Every Thursday, I get my school newspaper, known as the oldest prep school newspaper in America, and lots of students, faculty, and alumni take it very seriously - but not my Statistics teacher. You see, along with the weekly newspaper, we also get a small weekly survey, from "What time do you go to bed?" to "How do you like the new organic apples in the dining hall?"

Imagine an editor, Dominique, opens the anonymous suggestion box to tally up the apple survey. Her slips of paper tell her that 168 students hate the new apples, 12 students don't care, and only 35 students actually like them. Dominique might run a headline screaming, "Students want old apples back!" but would she be right in saying this? I mean, nearly 5 times as many students say they hate the new apples as say they like them! But Dominique would be forgetting that there are about 1000 students in the whole school. Along with the 215 slips of paper that made it to the box, 785 are sitting in the bottoms of backpacks, in the recycling bin, etc. Her data is misleading because she took what statisticians call a volunteer sample. Let's think about who might respond to the survey; isn't someone who had a bad apple going to want to complain more than someone who hadn't even noticed the difference? It turns out most people thought the new apples were okay, so they didn't even bother to respond! Dominique's data was plagued by voluntary response bias.
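Here is a short Python sketch of the arithmetic behind Dominique's mistake. It uses only the numbers from the apple example (which are made up anyway); the variable names are just for illustration. The one thing it cannot tell us is what the 785 silent students think, so it works out the lowest and highest possible share of apple-haters in the whole school.

```python
# Dominique's tally from the suggestion box.
responses = {"hate": 168, "don't care": 12, "like": 35}
school_size = 1000  # about 1000 students in the whole school

respondents = sum(responses.values())
non_respondents = school_size - respondents
response_rate = respondents / school_size

# What the headline is based on: shares among the slips that came back.
share_of_respondents = {
    answer: count / respondents for answer, count in responses.items()
}

# What we actually know about the whole school: the haters who spoke up
# are the minimum; the maximum assumes every silent student hates the apples.
hate_share_min = responses["hate"] / school_size
hate_share_max = (responses["hate"] + non_respondents) / school_size

print(f"Slips returned: {respondents} (response rate {response_rate:.1%})")
print(f"Slips not returned: {non_respondents}")
print(f"'Hate' among respondents: {share_of_respondents['hate']:.1%}")
print(f"'Hate' in the whole school: between {hate_share_min:.1%} "
      f"and {hate_share_max:.1%}")
```

The share among respondents looks dramatic, but the range for the whole school is so wide that the slips alone can't support the headline. The silent majority decides the real answer, and a volunteer sample never hears from them.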

Last year, student council ran a very heated election for President between a boy and a girl, Steven and Louise. The newspaper published two pie charts showing the percentages of boys who voted for each candidate and the percentages of girls who voted for each candidate. The numbers seemed to show that while the girls tended to vote pretty evenly between Steven and Louise, the boys overwhelmingly voted for Steven. Is this evidence enough for a front page special on gender discrimination? Let's take a look.

The poll was conducted overnight by Dan, a boy in Steven's dorm. At a dorm meeting, Dan stood up and asked "Hey, everyone voted for Steven, right?" No one objected. "Okay, thanks guys," he said, marking down 60 votes for Steven. How accurate do you think this data is? His question was clearly biased, because what boy is going to speak up in front of everyone, including Steven, and say he voted for Louise? When the wording of the question itself pushes people toward a particular answer, we call that questionnaire bias.

What about the people Dan was asking? 60 boys in Steven's dorm probably won't vote the same way the rest of the boys on campus did. Dan took a convenience sample by asking 60 guys he could easily find.

It turns out there was a shy freshman in the back who secretly voted for Louise, but in the poll, his vote shows up as a vote for Steven. This is called incorrect response bias, which arises when people lie, are embarrassed by their answer, or just make a mistake.

It's really easy to misjudge information. I'm even guilty of it in this article (I hope some of you noticed!). I took a convenience sample at the beginning, by picking a couple of websites that were at the top of the list when I searched Google with a few keywords. I don't know how they conducted the polls. Did they ask their neighbors, who just bought a flatscreen? How many people did they ask? Did the people in the sample lie because they actually watch 7 hours a day? Did the researchers say, "Your kids watch way too much TV, don't they? How much do you think they watch?"

You don't have to be paranoid, though. There are more ways than I can mention in this article to take a biased sample, but when a friend says, "Some people told me there are tacos for lunch," you can probably take his or her word for it. But when magazines say "Readers say . . ." or "We asked you!", try to think back to the volunteer sample and be critical of what you're hearing and reading. Who is the sample? What kind of biases might be skewing the data? What conclusions are they drawing? You can always think for yourself!

Yours vigilantly,
woohooyea

Author's Note: I am not trying to accuse anyone specifically of statistical misdemeanors. Although the examples are taken from real situations, I made up all the student names and poll numbers.

Sources:
http://kidshealth.org/parent/positive/family/tv_affects_child.html
http://www.tvb.org/rcentral/MediaTrendsTrack/tvbasics/09_TimeViewingPersons.asp
http://www.med.umich.edu/1libr/yourchild/tv.htm
"Statistics in Action: Understanding a World of Data" by Ann E. Watkins, Richard L. Scheaffer, and George W. Cobb

 
