[PMC Weekly Insight] Survey marketing: Qualitative analysis of the de-biasing survey

Philip Morgan

My survey marketing experiment1 continues. As a quick reminder, I'm experimenting with using a survey to connect and build trust with a group of people.

It's been a few weeks since I've updated you on this2, so a quick recap seems helpful. As I previously wrote:

The three things that must simultaneously be true for survey marketing to work:

1. Question my audience can answer about themselves

2. Question is one the audience is curious about RE: their peers/the industry

3. Question is one I am very interested in the answer to

This confluence of factors makes it possible for me to serve my audience by answering a question both they and I care about, then sharing the answer back with them using a permission mechanism that was created using the same means I used to generate the answer in the first place: a survey.

I started this project with a very open-ended survey that I am referring to as my "de-biasing survey". The purpose of this survey is to align my thinking with how my sample group thinks about the question of investing in their career.

I distributed this survey to a sample I recruited from LinkedIn using scrappy, inexpensive methods. I also forked the survey and sent the fork to a sample recruited from my email list.

I shared some quantitative results here, and in that data there's a pretty clear difference between my LinkedIn sample and my email list sample, with the email list sample seemingly much more interested in and active in career development. And younger, too, you good-looking lot!

That pretty much brings us current. Now to dig into the qualitative data this research has yielded.

The qualitative data

For this project, I'm treating the responses to my open-ended questions as qualitative data. It's not as rich or nuanced a qualitative dataset as real-time audio or video or IRL interviews would yield, but it's still useful because it adds context to the quantitative data.

Here's an example of a few responses to one of my open-ended questions. The responses come from the LinkedIn sample, and the question was:

Consider your entire career as a self-employed software developer and times you have gotten new opportunities, better projects, or other forms of career improvement. What do you think led to these improvements in your career?

  • Coincidence. It's much harder now because the applicant pool is overloaded.
  • taking many shots
  • capacity to focus and deal with problems
  • How to get more customers
  • Being curious and open. I tell people about the things I'm interested in and the projects I hack together in my own time. Everytime that has come up in a "9-to-5" work environment it has led to me getting more money and interesting conversations (e.g. would you like to work here, would you like this project)
  • Networking & experience.
  • I am not a full-time software developer. I started because my place of work needed certain applications not commercially available.
  • I haven't been very successful in finding good projects.
  • longevity
  • Being friendly, honest, hard-working and producing quality results.

This list is the first 10 responses to that question, in the chronological order they arrived. You really get a sense of the range here, from terse one-word answers--some seemingly non sequiturs or misreadings of the question--to lengthier, more thoughtful responses. This is totally normal in the context of a survey like this one.

Bias alert!

At this point, I'm on the lookout for a subtle bias in myself: the temptation to discount the shorter responses in some way--to assume they're less valuable, less thoughtful, or less meaningful to my question. Remember, when you are starting with no data, the marginal value of additional data is huge until you get to about 30 data points, and then it tapers off pretty quickly. I'm referencing Douglas Hubbard here, who has said that beyond about 30 samples you need to quadruple the sample size to reduce error--which we can think of as uncertainty--by half. I can't find the nifty graph Douglas shared in a recent webinar for the Military Operations Research Society, but the graph below, from a different source, conveys the same idea. Notice how the curve flattens out around the 30-sample mark:

This shows the decreasing marginal value of additional data. The biggest gains happen between 0 and ~30 samples.
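Hubbard's rule of thumb falls out of the standard error formula, where error shrinks with the square root of the sample size. Here's a minimal Python sketch (the standard deviation value is an arbitrary placeholder, not from my data):

```python
import math

def standard_error(sample_sd, n):
    """Standard error of the mean: shrinks with the square root of n."""
    return sample_sd / math.sqrt(n)

sd = 10.0  # hypothetical sample standard deviation
for n in [30, 120, 480]:
    print(f"n={n}: error ~ {standard_error(sd, n):.3f}")
```

Because of the square root, each quadrupling of the sample (30 to 120 to 480) cuts the error exactly in half--which is why the first ~30 data points do most of the uncertainty-reduction work.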

Anyway! I think it would be a mistake for me to discount the value of any of my responses here, even if the responses are super short or don't make a lot of grammatical sense. They're still data, and they're still moving me from massive uncertainty to greatly-reduced uncertainty.

Cleaning the qual data

To address this bias in myself, and to make this qual data more useful, I need to normalize the responses to open-ended questions. In qualitative research, this is called coding the responses. I'll do it right here, as an example, for a few of the responses above.

  1. Actual response: "Coincidence. It's much harder now because the applicant pool is overloaded."
     Coded to: "s-chance, s-competition"
  2. Actual response: "taking many shots"
     Coded to: "a-volume"
  3. Actual response: "capacity to focus and deal with problems"
     Coded to: "a-problemsolving"

You'll notice each of my codes begins with a letter, which is a shorthand for one of two things: "a-" means action/activity, and "s-" means sentiment, or a sort of feeling/worldview being expressed. This allows me to sort and filter more easily, and it's a meaningful distinction here.

Any open-ended tagging or categorization system, such as my coding system here, presents a challenge because you can invent an infinite number of categories and become highly granular in your categorization. This is why almost every time I've set up a CRM for myself, I've eventually abandoned it. It collapses under the weight of its own complexity--complexity which, ironically, I created with a too-granular category/tagging system!

So... be careful with your coding system. :) You want it expressive enough that it doesn't conceal too much granularity and nuance, but you also want it to stay useful, which means avoiding excessive complexity, detail, and granularity.

What I'll do next with this research is code the qualitative answers, and I'll do so iteratively. I'll read through each column of responses to open-ended questions and set up an adjacent column in the same spreadsheet to hold the codes. If column C contains responses to an open-ended question, I'll add a column D for my codes, and so on. Theoretically an RDBMS--or perhaps Airtable--would be better here, but I'm sticking with a spreadsheet at this point because it's good enough.

For each new action or sentiment I find in the qual data, I'll create a new code. Then, I'll pull out a list of all the codes and look for opportunities to simplify the coding schema by collapsing sufficiently similar codes into one, and then search for the old codes in my spreadsheet and replace them with the new codes based on the now-simplified schema. This is where the process becomes more art than science.
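The collapse-and-replace step described above can be sketched in a few lines of Python. The merge map here is entirely hypothetical--the real one only emerges from reading the data:

```python
# Hypothetical map from near-duplicate codes to their collapsed replacement.
merge_map = {
    "a-referrals": "a-networking",
    "a-wordofmouth": "a-networking",
}

def simplify(codes, merge_map):
    """Replace old codes with their collapsed equivalents, dropping duplicates."""
    out = []
    for c in codes:
        new = merge_map.get(c, c)  # codes not in the map pass through unchanged
        if new not in out:
            out.append(new)
    return out

print(simplify(["a-referrals", "a-wordofmouth", "a-volume"], merge_map))
```

In a spreadsheet this is a series of find-and-replace passes over the codes column; the art-not-science part is deciding which codes are "sufficiently similar" to collapse in the first place.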

Next steps

Here's my list of next steps for this research project:

  • Code the open-ended responses into one of two tag types:
    • Crisply defined activities (somewhat objective on my end. More normalizing than interpreting.)
    • Sentiments (quite subjective on my end. More interpreting than normalizing. This is where I can skew objective or skew towards "rack the shotgun style filtering" where I apply my own worldview.)
      • Interesting to note my personal emotional reaction to some of the sentiments expressed. Judgey! :(
  • Analyze the coded responses:
    • Word cloud to facilitate easy, "cotton candy" sharing of results like those shitty infographics everywhere online do. :)
    • Simple quant analysis ("what % of respondents list this activity/sentiment?")
    • Really, really think about what the patterns I see with the above analysis methods might be saying. Is there a story in the data?
  • Compare the LinkedIn sample vs. the list sample
  • And of course, write up my findings into a report to share back with those who left their email address for me.
  • Decide whether to extend or pivot based on what I've learned.
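The "simple quant analysis" step above is straightforward once the responses are coded. A sketch using hypothetical placeholder data (not my actual responses), counting each code at most once per respondent:

```python
from collections import Counter

# Hypothetical coded responses: one list of codes per respondent.
responses = [
    ["s-chance", "s-competition"],
    ["a-volume"],
    ["a-volume", "s-chance"],
    ["a-problemsolving"],
]

# set() ensures a respondent who repeats a code is only counted once for it.
counts = Counter(code for codes in responses for code in set(codes))

for code, n in counts.most_common():
    print(f"{code}: {100 * n / len(responses):.0f}% of respondents")
```

Dividing by the number of respondents (rather than the number of codes) keeps the percentages interpretable as "what share of people mentioned this?"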

Interesting derivative questions

Thus far, this research--even though I'm nowhere near "done"--has raised some very interesting questions for me:

  • Wow, the responses from my email list sample are so immediately strikingly different. "Better" in my view. What filters for these kinds of people? What "racks the shotgun" for them? Where do they hang out in an already-filtered group?
  • Do I create two reports--one per sample group--or just one?
  • Was the structure of my survey questions redundant? I got a few comments to the effect that it was. I saw the later survey questions as drill-downs going deeper on earlier questions, but a few participants saw them as redundant. I need to be sensitive to this if I design a second survey to go bigger with this research.


I think I'll use the next few free Weekly Insight articles to update you on the continuation of this research and, incidentally, give myself helpful deadlines to keep it going. :)

Looking forward to sharing my method and what I learn with you.



  1. If you want to read up on this experiment:
    1. /pmc-survey-marketing/
    2. /pmc-the-de-biasing-survey/
    3. /pmc-survey-marketing-recruitment/
    4. philipmorganconsulting.com/pmc-survey-marketing-initial-data-from-the-de-biasing-survey/
  2. Moving is hella disruptive! :) Cheryl and I are partially moved into our long-term rental, waiting for the moving company to deliver our furniture and the possessions we didn't bring with us by car.