# How to analyse survey data

You’ve collected your survey results and have a survey data analysis plan in place. Now it’s time to really get stuck in and start sorting and analysing the data.

## Survey data analysis made easy

The results are back from your online surveys and your data analysis plan is ready, so it’s time to start calculating those results. Here’s how our Survey Research Scientists make sense of quantitative data (as opposed to qualitative data), from looking at the answers and focusing on their top research questions and survey goals to crunching the numbers and drawing conclusions.

### To begin calculating survey results more effectively, follow these 4 steps:

1. Take a look at your top research questions
2. Crosstabulate and filter your results
3. Crunch the numbers
4. Draw conclusions

### Take a look at your top research questions

First of all, let’s talk about how you’d go about calculating survey results from your top research questions. Did you feature empirical research questions? Did you consider probability sampling? Remember that you should have outlined your top research questions when you set a goal for your survey.

For example, if you held an education conference and gave attendees a post-event feedback survey, one of your top research questions may look like this: How did the attendees rate the conference overall? Now take a look at the answers you collected for a specific survey question that addresses that top research question:

Do you plan to attend this conference next year?

You'll notice that there are some percentages (71%, 18%) and some raw numbers (852, 216) in the responses.

The percentages are just that: the percentage of people who gave a particular answer. Put another way, the percentages represent the number of people who gave each answer as a proportion of the number of people who answered the question. So, 71% of your survey respondents (852 of the 1,200 surveyed) are planning to come back next year.

This table also shows you that 18% say they are not planning to return and 11% say they are not sure.
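As a minimal sketch of the arithmetic behind those percentages, the snippet below converts raw counts into shares of all responses. The counts 852 and 216 come from the text; 132 is simply what remains of the 1,200 respondents, and the answer labels (`Yes`, `No`, `Not sure`) are assumed for illustration.

```python
# Raw counts for "Do you plan to attend this conference next year?"
# 852 and 216 are quoted in the text; 132 is the remainder of the 1,200 respondents.
# The answer labels are assumptions made for this sketch.
counts = {"Yes": 852, "No": 216, "Not sure": 132}


def to_percentages(counts):
    """Return each answer's share of all responses as a whole-number percentage."""
    total = sum(counts.values())
    return {answer: round(100 * n / total) for answer, n in counts.items()}


percentages = to_percentages(counts)
for answer, pct in percentages.items():
    print(f"{answer}: {counts[answer]} responses ({pct}%)")
# Yes: 852 responses (71%)
# No: 216 responses (18%)
# Not sure: 132 responses (11%)
```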

### Crosstabulating and filtering results

You'll remember that when you set a goal for your survey and developed your analysis plan, you thought about which subgroups you were going to analyse and compare. Now is when that planning pays off. You might, for example, want to see how teachers, students and administrators compared to one another in terms of their answers to the question about next year’s conference. To work this out, you break the responses down by means of crosstabulation, where you show the results of the conference question according to subgroup:

Looking at this table, you can see that a large majority of the students (86%) and teachers (80%) plan to come back next year. However, it's a different story for administrators who attended your conference, with under half (46%) of them intending to come back. Hopefully, some of your other questions will help you work out why this is the case and what you can do to improve the conference for administrators so that more of them will return year after year.
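If you are working from raw responses rather than a survey tool, a crosstab can be sketched in a few lines. The records below are hypothetical and far smaller than a real survey, so the percentages will not match the ones quoted above; the point is only to show how each answer becomes a share of its own subgroup.

```python
from collections import Counter

# Hypothetical respondent records: (attendee type, answer).
responses = [
    ("Student", "Yes"), ("Student", "Yes"), ("Student", "No"),
    ("Teacher", "Yes"), ("Teacher", "Yes"), ("Teacher", "Yes"),
    ("Teacher", "Not sure"),
    ("Administrator", "Yes"), ("Administrator", "No"),
    ("Administrator", "No"), ("Administrator", "Not sure"),
]


def crosstab(records):
    """Count answers within each subgroup and convert them to row percentages."""
    cell_counts = Counter(records)                         # (group, answer) -> count
    group_totals = Counter(group for group, _ in records)  # group -> respondents
    table = {}
    for (group, answer), n in cell_counts.items():
        table.setdefault(group, {})[answer] = round(100 * n / group_totals[group])
    return table


table = crosstab(responses)
for group, row in table.items():
    print(group, row)
```

Each row sums to roughly 100%, which is what lets you compare subgroups of different sizes.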

A filter is another useful tool for analysing data. Filtering means narrowing down your focus to one particular subgroup and filtering out the others. So here, instead of comparing subgroups to one another, we’re just looking at how one subgroup answered the question. For instance, you could limit your focus to just women, or just men, and then re-run the crosstab by type of attendee to compare female administrators, female teachers and female students. One thing to be wary of as you slice and dice your results is that every time you apply a filter or crosstab, the sample size behind each figure decreases. To check that a subgroup is still large enough to give statistically significant results, it may be helpful to use a sample size calculator.
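The shrinking sample is easy to see in code. This sketch uses a handful of hypothetical records and an illustrative helper, `filter_by_gender`, to show how each extra condition leaves fewer respondents to analyse.

```python
# Hypothetical records: (gender, attendee type, answer).
records = [
    ("Female", "Teacher", "Yes"),
    ("Female", "Student", "Yes"),
    ("Female", "Administrator", "No"),
    ("Male", "Teacher", "Yes"),
    ("Male", "Administrator", "Not sure"),
    ("Male", "Student", "No"),
    ("Female", "Teacher", "Not sure"),
    ("Male", "Teacher", "Yes"),
]


def filter_by_gender(records, gender):
    """Keep only the respondents of one gender (the filter)."""
    return [r for r in records if r[0] == gender]


women = filter_by_gender(records, "Female")
# Narrowing further by attendee type shrinks the sample again.
female_teachers = [r for r in women if r[1] == "Teacher"]

print(f"All respondents: {len(records)}")
print(f"Women only:      {len(women)}")
print(f"Female teachers: {len(female_teachers)}")
```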

### Benchmarking, trending and comparative data

Let’s imagine that one key question on your conference feedback survey is “Overall, how satisfied were you with the conference?” and your results show that 75% of the attendees were satisfied with the conference. That sounds quite good, but wouldn’t you like to have some context, something to compare it against? Is that better or worse than last year? How does it compare to other conferences?

Well, if you did ask this question in your conference feedback survey after last year’s conference, you'd be able to make a trend comparison. Professional pollsters aren't exactly renowned for their sense of humour, but one of their favourite lines is “Trend is your friend.”

If last year’s satisfaction rate was 60%, then you would have increased satisfaction by 15 percentage points. What caused this increase in satisfaction? Hopefully, the responses to other questions in your survey will provide some answers.
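It is worth keeping percentage points separate from relative percentage change, because the two are easily confused when reporting trends. The sketch below computes both for the 60% and 75% figures above; the function names are illustrative.

```python
def percentage_point_change(previous, current):
    """Absolute difference between two percentages, in percentage points."""
    return current - previous


def relative_change(previous, current):
    """Change expressed as a percentage of the previous value."""
    return 100 * (current - previous) / previous


points = percentage_point_change(60, 75)
relative = relative_change(60, 75)

print(f"Satisfaction rose by {points} percentage points")    # 15
print(f"That is a {relative:.0f}% increase on last year")     # 25%
```

Saying "satisfaction rose by 15%" would be ambiguous here, so state which of the two measures you mean.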

If you don’t have data from the conferences held in previous years, make this the year you start collecting feedback after every conference. This is called benchmarking. You establish a benchmark or baseline number and, in the future, you can see whether and how this has changed. You can benchmark not only attendees’ satisfaction, but other questions as well. You’ll be able to track, year after year, what attendees think of the conference. This is called longitudinal data analysis.

You can even track data for different subgroups. If, for example, satisfaction rates are increasing year on year for students and teachers but not for administrators, you might want to look at administrators’ responses to various questions to see if you can gain an insight into why they are less satisfied than other attendees.

### Crunching the numbers

You know how many people said they were coming back, but how do you know if your survey has yielded answers that you can trust and answers that you can use with confidence to inform future decisions? It’s important to pay attention to the quality of your data and to understand the components of statistical significance.

In everyday conversation, the word “significant” means important or meaningful. In survey analysis and statistics, it refers to an assessment of accuracy. This is where the inevitable “plus or minus” comes into survey work. In particular, it means that survey results are accurate to within a stated margin of error at a certain confidence level and are unlikely to be due to random chance. Drawing an inference based on results that are not statistically significant is risky. The first factor to consider in any assessment of statistical significance is the representativeness of your sample, that is, the extent to which the group of people who were included in your survey “look like” the total population of people about whom you want to draw conclusions.

You have a problem if 90% of conference attendees who completed the survey were men yet only 15% of all your conference attendees were male. The more you know about the population you are interested in studying, the more confident you can be when your survey lines up with those numbers. When it comes to gender, at least, you’ll be quite happy if men make up 15% of survey respondents in this example.

If your survey sample is a random selection from a known population, statistical significance can be calculated in a straightforward manner. A primary factor here is sample size. Let's suppose that 50 of the 1,000 people who attended your conference replied to the survey. Fifty is a small sample and results in a broad margin of error. In short, your results won’t carry much weight.
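To put a rough number on "broad", here is a sketch of one common margin-of-error formula for a proportion, with a finite population correction. It assumes a simple random sample, a 95% confidence level (z = 1.96) and the most cautious proportion of 50%; none of these assumptions comes from the text, and a real survey may call for a different method.

```python
import math


def margin_of_error(sample_size, population_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumes a 95% confidence level (z=1.96) and worst-case proportion (0.5)
    unless told otherwise, and applies the finite population correction.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * correction


moe_50 = margin_of_error(50, 1000)    # the example above: 50 of 1,000 replied
moe_500 = margin_of_error(500, 1000)  # hypothetical comparison: 500 replies

print(f"50 responses:  plus or minus {moe_50:.1%}")
print(f"500 responses: plus or minus {moe_500:.1%}")
```

Under these assumptions, 50 responses give a margin of error of roughly 13 to 14 percentage points either way, while 500 responses bring it down to about 3.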

Let's suppose that you asked your survey respondents how many of the 10 available sessions they attended over the course of the conference and your results look like this:

You might want to analyse the average. As you may recall, there are three different types of averages: mean, median and mode.

In the table above, the average number of sessions attended is 6.1. The average reported here is the mean, which is the type of average that’s probably most familiar to you. To determine the mean, you add up the data and divide that by the number of figures you added. In this example, you have 100 people saying they attended one session, 50 people for four sessions and 100 people for five sessions, etc. So, you multiply all of these pairs together, add them up and divide by the total number of people.

The median is another type of average. The median is the middle value: the 50% mark. In the table above, we would locate the number of sessions where 500 people were to the left of the number and 500 were to the right. The median is, in this case, six sessions. Because the median depends only on the middle of the distribution, it is far less affected than the mean by outliers, which may otherwise distort your results.

The last type of average is mode. The mode is the most frequent response. In this case, the answer is six. 260 survey participants attended six sessions, more than attended any other number of sessions.
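The three averages can be computed from a frequency table as follows. The table in this sketch is hypothetical and much smaller than the one discussed above, so its mean differs from 6.1; it is only meant to show the mechanics.

```python
import statistics

# Hypothetical frequency table: sessions attended -> number of respondents.
session_counts = {1: 1, 4: 1, 5: 2, 6: 4, 8: 2}

# Mean: multiply each pair, add the products, divide by the number of people.
total_people = sum(session_counts.values())
mean = sum(sessions * people for sessions, people in session_counts.items()) / total_people

# Median and mode are easiest to read from one value per respondent.
values = [
    sessions
    for sessions, people in session_counts.items()
    for _ in range(people)
]
median = statistics.median(values)
mode = statistics.mode(values)

print(f"Mean: {mean}, median: {median}, mode: {mode}")
```

In this toy table the single respondent who attended only one session pulls the mean (5.5) below the median and mode (both 6).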

Means – and other types of averages – can also be used if your results were based on Likert scales.

### Drawing conclusions

When it comes to reporting on survey results, think about the story the data tells.

Let's suppose that, overall, your conference received mediocre ratings. You dig deeper to find out what’s going on. The data show that attendees gave very high ratings to almost all aspects of your conference (the sessions and classes, the social events and the hotel) but that they really disliked the city chosen for the conference. (Maybe the conference was held in Aberdeen in January and it was too cold for anyone to go outside!) That's part of the story right there: It was a great conference overall, but the location was totally unsuitable. Brighton, Nice or Barcelona might be a better choice for a winter conference.

One aspect of data analysis and reporting that you have to consider is causation vs. correlation: two results that move together do not necessarily mean that one causes the other.

## Appendix

### What is survey data collection?

Survey data collection uses surveys to gather information from specific respondents. Survey data collection can replace or supplement other data collection types, including interviews, focus groups and more. The data collected from surveys can be used to boost employee engagement, understand buyer behaviour and improve customer experiences.

### What is longitudinal analysis?

Longitudinal data analysis (often called “trend analysis”) is basically tracking how findings for specific questions change over time. Once a benchmark is established, you can determine whether and how numbers shift. Let's suppose that the satisfaction rate for your conference was 50% three years ago, 55% two years ago, 65% last year and 75% this year. In this case, congratulations are in order because your longitudinal data analysis shows a solid, upward trend in satisfaction.
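A trend like this can be summarised by its year-on-year changes. The sketch below uses the four satisfaction rates from the example; the labels and the helper name are illustrative.

```python
# Satisfaction rates from the example above, oldest first.
satisfaction_by_year = [
    ("three years ago", 50),
    ("two years ago", 55),
    ("last year", 65),
    ("this year", 75),
]


def year_on_year_changes(series):
    """Return the change in percentage points between consecutive years."""
    return [
        (later_label, later - earlier)
        for (_, earlier), (later_label, later) in zip(series, series[1:])
    ]


changes = year_on_year_changes(satisfaction_by_year)
total_change = satisfaction_by_year[-1][1] - satisfaction_by_year[0][1]

for label, change in changes:
    print(f"{label}: {change:+d} points")
print(f"Overall: {total_change:+d} points since the benchmark")
# two years ago: +5 points
# last year: +10 points
# this year: +10 points
# Overall: +25 points since the benchmark
```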

### What is the difference between correlation and causation?

Causation is when one factor causes another, whereas correlation is when two variables move together without one necessarily influencing or causing the other. For example, drinking hot chocolate and wearing a woolly hat are two variables that are correlated, in that they tend to go up and down together; however, one does not cause the other. In fact, they are both caused by a third factor: cold weather. Cold weather influences both hot chocolate consumption and the likelihood of wearing a woolly hat. Cold weather is the independent variable, and hot chocolate consumption and the likelihood of wearing a woolly hat are the dependent variables. In the case of our conference feedback survey, cold weather most probably influenced attendees' dissatisfaction with the conference city and the conference overall. Finally, to further examine the relationship between variables in your survey, you might need to perform a regression analysis.
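The hot chocolate example can be illustrated with a Pearson correlation coefficient, one common way of measuring how closely two variables move together. All the daily figures below are invented for this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical daily observations.
temperature_c = [-2, 0, 3, 7, 12, 18]
hot_chocolates_sold = [40, 35, 30, 22, 12, 5]
woolly_hats_seen = [55, 50, 41, 30, 15, 4]

chocolate_vs_hats = pearson(hot_chocolates_sold, woolly_hats_seen)
temperature_vs_chocolate = pearson(temperature_c, hot_chocolates_sold)
temperature_vs_hats = pearson(temperature_c, woolly_hats_seen)

print(f"Hot chocolate vs hats:        {chocolate_vs_hats:.2f}")
print(f"Temperature vs hot chocolate: {temperature_vs_chocolate:.2f}")
print(f"Temperature vs hats:          {temperature_vs_hats:.2f}")
```

Hot chocolate and hats come out strongly positively correlated, yet both are strongly negatively correlated with temperature. The coefficient alone cannot tell you that the weather is driving both.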

### What is regression analysis?

Regression analysis is an advanced method of data analysis that allows you to look at the relationship between two or more variables. There are many types of regression analysis and the one(s) a survey scientist chooses will depend on the variables they are examining. What all types of regression analysis have in common is that they look at the influence of one or more independent variables on a dependent variable. In analysing our survey data, we might be interested in knowing what factors have the greatest impact on attendees’ satisfaction with the conference. Is it a matter of the number of sessions? The keynote speaker? The social events? The site? Using regression analysis, a survey scientist can determine whether and to what extent satisfaction with these different attributes of the conference contributes to overall satisfaction.

This, in turn, provides insight into which aspects of the conference you might want to alter next time around. Let's suppose, for example, that you paid a hefty fee to secure the services of a top-flight keynote speaker for your opening session. Participants gave this speaker and the conference overall high marks. Based on these two facts, you might think that securing the services of a fabulous (and expensive) keynote speaker is the key to conference success. Regression analysis can help you determine whether this is indeed the case. You might find that the popularity of the keynote speaker was a major driver of satisfaction with the conference. If so, next year you’ll want to secure the services of a great keynote speaker again. However, if the regression shows that, although everyone liked the speaker, this did not contribute much to attendees’ satisfaction with the conference, the large sum of money spent on the speaker might be better spent elsewhere. If you take the time to carefully analyse the soundness of your survey data, you’ll be on your way to using the answers to help you make informed decisions.
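As a rough illustration of the keynote question, the sketch below fits two separate one-predictor least-squares lines to hypothetical 1-to-5 ratings and compares how much of the variation in overall satisfaction each explains. This is a deliberately simplified stand-in: a survey scientist would more likely fit a single multiple regression with all the attributes together, and the ratings here are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared), where r_squared is the share of
    the variation in y that the line accounts for.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical ratings from eight attendees (1 = very poor, 5 = excellent).
overall = [2, 3, 3, 4, 4, 5, 5, 2]
sessions = [1, 3, 2, 4, 4, 5, 5, 2]
keynote = [5, 4, 5, 5, 4, 5, 4, 5]   # everyone liked the speaker

sessions_fit = fit_line(sessions, overall)
keynote_fit = fit_line(keynote, overall)

print(f"Sessions: slope {sessions_fit[0]:.2f}, R^2 {sessions_fit[2]:.2f}")
print(f"Keynote:  slope {keynote_fit[0]:.2f}, R^2 {keynote_fit[2]:.2f}")
```

In this made-up data, session ratings account for most of the variation in overall satisfaction, while keynote ratings account for very little even though they are uniformly high. That is the pattern that would suggest spending the speaker's fee elsewhere.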