Survey Says (Part 4 of 5): Making Sense of Your Numerical Data
To be honest, I considered putting this installment first in the series instead of last. You may wonder why, but you won’t be surprised at the answer: planning.
By the time you’re looking at the data from your survey, the only things that should be unexpected are perhaps the magnitude of the values (how large or small), the distribution of the answers (did people answer similarly or differently to the same items), or the direction (are things positive or negative). These aspects should only be unexpected in the sense that you usually collect data to fill in gaps in your knowledge.
There are many other aspects of your data that should absolutely not surprise you. For example, you should know that the data you’ve collected should be able to answer the questions you have. You should also know how you can analyze the data. You similarly should have in mind a few ideas for displaying and communicating the data.
All of this is made possible through planning. When you decide to survey your stakeholders, it’s because you need answers. It’s not because you think people like taking surveys. That being the case, when you plan your survey, you should think carefully about what the data you need will have to look like in order to answer your questions.
I have worked with very large organizations with very large budgets that nevertheless ended up sitting atop mounds of stakeholder data they could not use. In some cases they had plenty of data related to outcomes but no data related to interventions. In other cases, they had plenty of data related to product usage but no way to connect it to the outcomes.
I don’t say this to demoralize you. Just because big organizations with lots of money can’t always get surveys right doesn’t mean you can’t. I promise that if you embrace the guidance below, you will find it easier to analyze and interpret your data, starting with your very next survey.
It’s not as easy as 123
One of the first things to consider when planning your data is whether you need numbers, textual data, or both. Historically, numbers have been easier to work with. Calculators, spreadsheet applications (e.g., Excel), and statistical software (e.g., SPSS) have made it possible to summarize and visualize large amounts of numerical data quickly and efficiently, whereas text data has to be interpreted primarily by reading through a lot of information.
But some quantitative analysis issues aren’t resolved just because you have a computer. You still have to have the right data for the software to use. For example, you may want to know the average number of hours your parents think their children spend on homework each night. If that is the case, the way you craft the item matters. For example, you may ask parents to select the category that best fits: Less than 1 hour, 2 to 4 hours, 5 hours or more. That’s a fine question to ask, but you are going to have to make some sacrifices to determine the average. Does less than one hour mean 15 minutes? Does 2 to 4 mean 3 hours? Maybe you look at this and say “No problem, we will just look at which group had the largest percentage of responses.”
There’s nothing wrong with choosing the group percentages over an average. The problem is that you said in the beginning that you wanted to know the average. In the end, you didn’t change your mind because you wanted to. You changed your mind because you didn’t write your item with the expected data in mind.
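To see why the category format makes an average hard to pin down, here is a minimal Python sketch. The response counts and the "stand-in" hours assigned to each category are invented for illustration; they are assumptions, not data from any real survey. The point is only that the estimated average moves with whatever value you decide each category represents.

```python
# Hypothetical counts of parents choosing each category (illustrative only).
counts = {
    "Less than 1 hour": 40,
    "2 to 4 hours": 50,
    "5 hours or more": 10,
}

# Two equally defensible guesses about what each category "means" in hours.
# Both sets of values are assumptions made up for this sketch.
low_guess = {"Less than 1 hour": 0.25, "2 to 4 hours": 2, "5 hours or more": 5}
high_guess = {"Less than 1 hour": 0.75, "2 to 4 hours": 4, "5 hours or more": 6}


def estimated_average(counts, hours_per_category):
    """Weighted average of the assumed hours, weighted by response counts."""
    total_responses = sum(counts.values())
    total_hours = sum(counts[cat] * hours_per_category[cat] for cat in counts)
    return total_hours / total_responses


low_average = estimated_average(counts, low_guess)
high_average = estimated_average(counts, high_guess)
print(f"Estimated average: between {low_average:.1f} and {high_average:.1f} hours")
```

With the same responses, the "average" lands anywhere from about an hour and a half to nearly three hours, depending on the guess. Asking parents to enter a number of hours directly would remove that guesswork.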
In reality, different question types are better suited to different kinds of analysis. Likert scale items (e.g., Highly Unlikely to Highly Likely) are often best communicated as the percentage of respondents in each group, as in the table below.
Item | Highly Unlikely | Unlikely | Likely | Highly Likely |
Attend an after-school sporting event this semester … | 3.4% | 17.1% | 56.1% | 23.4% |
The reason you want to present these types of items by group membership is that the differences between the group values are uncertain. In other words, the distance between Highly Unlikely and Likely isn’t fixed the way the difference is between values like Once, Twice, Three Times, Four Times. In that second set, the value increases by exactly one each time. You certainly could still use the percentage format for those values, but it is also appropriate to use measures like averages.
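If your responses arrive as one answer per respondent, the percentages in a table like the one above can be tallied with a few lines of Python. This is only a sketch: the individual responses are hypothetical, with counts chosen so that they reproduce the percentages shown in the table, and the function name is my own.

```python
from collections import Counter

# Scale labels in display order, taken from the table above.
SCALE = ["Highly Unlikely", "Unlikely", "Likely", "Highly Likely"]

# Hypothetical responses (205 in total), invented so the results match
# the example table. They are not real survey data.
responses = (
    ["Highly Unlikely"] * 7
    + ["Unlikely"] * 35
    + ["Likely"] * 115
    + ["Highly Likely"] * 48
)


def percent_by_category(answers, scale):
    """Return the percentage of answers in each category, to one decimal."""
    counts = Counter(answers)
    total = len(answers)
    return {label: round(100 * counts[label] / total, 1) for label in scale}


percentages = percent_by_category(responses, SCALE)
for label in SCALE:
    print(f"{label}: {percentages[label]}%")
```

Notice that nothing here assigns a number to "Likely" or "Unlikely". The code only counts how many people chose each option, which is why this approach avoids the uncertain-distance problem.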
Avoid nonsensical measures
We’ve all heard statistics like, “The average American household has 2.3 children.” You probably chuckle a bit at these statements, knowing of course that you cannot have three-tenths of a child. Many of us who laugh at this are just as guilty, though, in our own work. Perhaps you work for a high school and have reported something like, “Last year’s 9th graders earned an average of 4.3 credits.” Is it possible to earn .3 credits in a course? If not, then why would you report it that way? It’s much better to say something like, “65% of last year’s 9th graders earned fewer credits than they need to be on pace to graduate in 4 years.”
Hopefully you won’t have many issues like this when analyzing your survey data. Why do I say that? Because I’m confident that you will have taken care to PLAN your questions so they provide data in the format you need.
Practitioner’s Corner: Nested Items
There are many great survey building tools available for free. One thing that makes them so helpful is that they have many templates for building questions. There is one popular option, though, that I think you should avoid: Nested Items. You may see them called something else, but in general they allow you to ask multiple questions while making it look like just one question. Here’s an example:
Question 5: In a typical week, how many nights (0 – 5) does your child have homework in each subject? | Math | Science | English | Reading | Social Studies |
______ | ______ | ______ | ______ | ______ |
Survey items like this definitely are appealing. It looks like just one item, but you are getting five different data points! The problem is that when you download the data, the responses are often captured in a single column, like below:
Survey Respondent ID | Question 5 |
Parent 1 | 3, 1, 2, 1, 1 |
Parent 2 | 4, 1, 0, 2, 0 |
Parent 3 | 3, 3, 3, 3, 3 |
Parent 4 | 1, 0, 0, 1, 1 |
Analyzing data like this can be very time consuming. Take a moment and ask yourself how you would find the average nights per week for each subject if you had 200 rows of data like this. How would you find the number of parents who reported their child has science homework 3 nights a week?
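For a team member comfortable with a little scripting, the usual first step is to split each nested cell back into one value per subject. The Python sketch below does that for the four example rows above. It assumes the export always lists the values comma-separated and in the same subject order as the question, which you should verify against your own survey tool; the function and variable names are my own.

```python
from statistics import mean

# Subject order as it appears in Question 5.
SUBJECTS = ["Math", "Science", "English", "Reading", "Social Studies"]

# The four example rows from the export shown above.
raw_rows = {
    "Parent 1": "3, 1, 2, 1, 1",
    "Parent 2": "4, 1, 0, 2, 0",
    "Parent 3": "3, 3, 3, 3, 3",
    "Parent 4": "1, 0, 0, 1, 1",
}


def split_nested(cell):
    """Turn '3, 1, 2, 1, 1' into {'Math': 3, 'Science': 1, ...}."""
    values = [int(part) for part in cell.split(",")]
    if len(values) != len(SUBJECTS):
        raise ValueError(f"Expected {len(SUBJECTS)} values, got: {cell!r}")
    return dict(zip(SUBJECTS, values))


# One dictionary of subject -> nights for each respondent.
parsed = {parent: split_nested(cell) for parent, cell in raw_rows.items()}

# Average nights per week for each subject.
averages = {s: mean(row[s] for row in parsed.values()) for s in SUBJECTS}

# Number of parents reporting science homework exactly 3 nights a week.
science_three_nights = sum(1 for row in parsed.values() if row["Science"] == 3)

print(averages)
print("Science 3 nights a week:", science_three_nights)
```

Once each subject has its own column, both questions become simple. The extra parsing step, and the risk that a blank or oddly formatted cell breaks it, is the cost the nested format imposes.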
In general, these nested items are not worth the headache when it comes to analyzing results. On the other hand, you may have a team member who is very proficient with spreadsheet or other data analysis software who can handle this with little effort. Either way, it is a good idea to always have a few people take the survey unofficially before you administer it to your stakeholders. That way, you can look to see if any of the results are coming back in unexpected ways.