A few days ago, I successfully defended my MA thesis on crowd motivation. As I polish up a final copy for print, here is my draft. Details forthcoming.

Why Bother? Examining the Motivations of Users in Large-Scale Crowd-Powered Online Initiatives

As this is not the final copy, please do not distribute it. Rather, link back to this page, so that I can provide the final when available.


Everyone has their own stories, no? The time that Wikipedia made you understand something a textbook couldn’t.

Stack Overflow is a programming help site. The site is heavily controlled by its community; as users gain points and achievements for contributing to the site, they gain more administrative power.

I’ll admit, I went through a period where I watched the questions page daily, looking for something I could answer. However, I found that every single time, somebody gave a better answer much more quickly. I’m sure that is eventually discouraging, but in my experience it made me competitive: if you can’t give the first answer, at least give a higher-rated response. Programming is an endless process of puzzles and problem-solving, so it’s good to know that there’s a place where someone in need can get help within minutes. As Jeff Atwood notes in the tweet above, it’s quality help too.

Clay Shirky has a wonderful talk about this phenomenon, wherein he argues that passion (or ‘love’) is a resource that is unfairly underestimated. He uses the example of support for the open-source programming language perl: in an absence of commercial support, there are more than enough passionate perl users online to fill the need. Watch it below.

In recent months, there have been two large-scale crowdsourcing projects launched with the involvement of well-known internet personalities. I’m referring to Hunch—headed by Caterina Fake of Flickr fame—and Kickstarter—advised by Andy Baio of Upcoming.org and Waxy.org fame. While these projects are very different, it seems only appropriate for them to share a post.

Kickstarter is a site that lets projects collect funding pledges from users. One can add a project with a funding goal and set tiered rewards for patrons. You’re not expected to pay your pledge unless the project goal is met. Hunch is a crowdsourced decision-tree site. Users who create content add decision-making questions to a topic, each of which shapes the final recommendation.
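Kickstarter’s all-or-nothing pledge rule is simple enough to sketch in a few lines. This is a toy illustration with invented names and amounts, not Kickstarter’s actual implementation: pledges are only collected if their total meets the project’s goal.

```python
# Kickstarter's all-or-nothing mechanic in miniature (data invented):
# nobody is charged unless the pledges collectively reach the goal.
def collect(pledges, goal):
    """Return the charges to make: everything if the goal is met, nothing otherwise."""
    total = sum(pledges.values())
    if total < goal:
        return {}                 # goal missed: no one pays
    return dict(pledges)          # goal met: every pledge is charged

pledges = {"ana": 25, "ben": 50, "cara": 10}   # $85 pledged in total
assert collect(pledges, goal=100) == {}        # short of $100, no charges
assert collect(pledges, goal=75) == pledges    # $85 >= $75, all charged
```

The rule matters for motivation: a patron risks nothing on a project that fails to catch on, which lowers the barrier to pledging early.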

Had either of these been launched by anyone else, the project would not have made any particularly extraordinary impact. Indeed, the paradox of crowdsourcing is that the idea alone will not sell the project, because of the difficult barrier of critical mass. Critical mass, however, itself depends on having already sold the idea. In other words, a new crowdsourcing project can have a great idea in principle, but potential users know it will be difficult to achieve in practice. Such doubt is self-propagating: knowing that you have doubts means admitting that others may too, further eroding confidence in the objective.

A site like Fake’s former creation Flickr was able to grow a community on the back of a valuable individual experience. The community was not vital to the experience of Flickr. Kickstarter and Hunch, however, have no individual experience. If there were only one user, that user would have little to do (admitting that Hunch’s employee-created content can only go so far).

With strong talent behind both of the sites, there is no shortage of cleverness in their mechanics. However, the key to their first few months lies in the trusted celebrity behind them. Like doubt, confidence is self-propagating. Baio and Fake have audiences already in place, and seeing those audiences can swing cynics from “great idea, but it wouldn’t work” to “just maybe.”

“The prolonged, indiscriminate reviewing of books is a quite exceptionally thankless, irritating and exhausting job. It not only involves praising trash but constantly inventing reactions towards books about which one has no spontaneous feelings whatever. The reviewer, jaded though he may be, is professionally interested in books, and out of the thousands that appear annually, there are probably fifty or a hundred that he would enjoy writing about.

… Seeing the results, people sometimes suggest that the solution lies in getting book reviewing out of the hands of hacks. Books on specialized subjects ought to be dealt with by experts, and on the other hand a good deal of reviewing, especially of novels, might well be done by amateurs. Nearly every book is capable of arousing passionate feeling, if it is only a passionate dislike, in some or other reader, whose ideas about it would surely be worth more than those of a bored professional. But, unfortunately, as every editor knows, that kind of thing is very difficult to organize.”

– George Orwell, “Confessions of a Book Reviewer,” Tribune (1946)

Orwell, one would presume, would view aggregate amateur review as a boon, not a threat.

“Scepticism about Wikipedia’s basic viability made some sense back in 2001; there was no way to predict, even with the first rush of articles, that the rate of creation and the average quality would both remain high, but today those objections have taken on the flavor of the apocryphal farmer beholding his first giraffe and exclaiming, ‘Ain’t no such animal!’ Wikipedia’s utility for millions of users has been settled; the interesting questions are elsewhere.”
– Clay Shirky, Here Comes Everybody, p.117

In my work on crowdsourcing, my advisors warn me to be careful of how I speak about Wikipedia around academics, because scholars are still divided on it. Clay Shirky’s quote perfectly encapsulates the situation: if it is clear that it works and that it works well, the question shouldn’t be “does it work?” Rather, we should be asking why it works. Kevin Kelly suggests that Wikipedia is “impossible in theory, but possible in practice”: shouldn’t we be tweaking our theories, then? Perhaps the issue is that if an expert were to praise Wikipedia as reliable, they would undermine society’s need for experts. Larry Sanger, co-founder of Wikipedia, says no, but it’s certainly food for thought.

I’ve recently been mulling over the question, “Why does Genius suck so much?” and the implications that it has.

Genius is the playlist generation tool in Apple’s iTunes music software. You choose a song that you’re in the mood for, and it creates an entire playlist of similar songs. Essentially, it’s a recommender system: if you like x, you’ll like y. The problem is that you get a very narrow point of view, with very little genre-skipping and no pleasantly clever surprises.
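That “if you like x, you’ll like y” logic can be sketched in miniature. The listeners, songs, and counts below are invented for illustration, and real systems are far more sophisticated, but the core crowd signal is just this: count how often two songs appear in the same libraries.

```python
# Toy item-to-item recommender: songs are "similar" when they share listeners.
from collections import defaultdict
from itertools import combinations

# Each user's library is the only signal available (all data invented).
libraries = {
    "ana":  {"Song A", "Song B", "Song C"},
    "ben":  {"Song A", "Song B"},
    "cara": {"Song B", "Song C", "Song D"},
}

# Count how often each pair of songs co-occurs in a library.
co_counts = defaultdict(int)
for songs in libraries.values():
    for a, b in combinations(sorted(songs), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def similar_to(song):
    """Rank other songs by how many listeners they share with `song`."""
    scores = {b: n for (a, b), n in co_counts.items() if a == song}
    return sorted(scores, key=scores.get, reverse=True)

print(similar_to("Song A"))  # "Song B" ranks first: two shared listeners
```

The narrowness complained about above falls straight out of the method: the sketch can only ever recommend songs that already sit beside x in existing libraries, so genre-skipping surprises have no way in.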

What sets Genius apart from other song recommender systems is that it’s essentially powered by the crowds. Apple has the luxury of a rich data set of habits and ratings, which appears to factor heavily into the recommendations. Indeed, algorithmic playlist generators were creating better results years before Genius came on the scene. So, what does this mean for the crowd?

The fact that computers can be better than humans at understanding art is off-putting. I’m still working through this problem, but here are some thoughts toward untangling it.

Ratings data is emotionless. When you rate a song 1 or 5, you’re giving it a universal ‘like’/‘dislike’. This data doesn’t capture the mood of the song or the emotion of the listener; it is all very removed from circumstance. As I suggested to Bill Turkel, perhaps such simple crowd-based recommendations are better for high-level suggestions, like artists you may like, but useless at the micro-level (unless the data crowds contribute is more specific to the topic of the recommendation). In contrast, technology can be quite effective at interpreting the types and patterns of sound that represent an emotion. Certainly it can’t easily understand whether a song is good, but if you want a slow, jazzy rock song, that’s fairly achievable. In this, music recommendation is fairly unique: it is easier to interpret sound than it would be to interpret thousands of movie plots or millions of book themes.

Despite this, perhaps the most-cited example of a good music recommender is Pandora, an internet radio service based on the Music Genome Project (MGP). The MGP does use humans to categorize songs, having professionals tag each song with over 400 tags and using an algorithm to weigh the values. Pandora’s success shows that humans are indeed effective at understanding music, provided they’re looking at it in the right way.
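One way to picture the MGP approach is as weighted similarity over expert-assigned tag profiles. The tags, scores, and weights below are all invented (the real project’s attributes and weighting are proprietary); the point is only the shape of the technique: humans supply rich descriptions, and an algorithm compares them.

```python
# Sketch of expert-tagged song profiles compared by weighted cosine similarity.
# All tags, scores (0-1), and weights are hypothetical placeholders.
import math

profiles = {
    "song_x": {"slow tempo": 0.9, "jazz influence": 0.8, "male vocal": 1.0},
    "song_y": {"slow tempo": 0.8, "jazz influence": 0.7, "male vocal": 1.0},
    "song_z": {"slow tempo": 0.1, "jazz influence": 0.0, "male vocal": 1.0},
}

# How much each attribute matters when comparing songs (also invented).
weights = {"slow tempo": 2.0, "jazz influence": 2.0, "male vocal": 0.5}

def similarity(a, b):
    """Weighted cosine similarity between two tag profiles (1.0 = identical)."""
    tags = set(profiles[a]) | set(profiles[b])
    dot = sum(weights[t] * profiles[a].get(t, 0) * profiles[b].get(t, 0)
              for t in tags)
    na = math.sqrt(sum(weights[t] * profiles[a].get(t, 0) ** 2 for t in tags))
    nb = math.sqrt(sum(weights[t] * profiles[b].get(t, 0) ** 2 for t in tags))
    return dot / (na * nb)

# The slow, jazzy song_y sits much closer to song_x than song_z does.
assert similarity("song_x", "song_y") > similarity("song_x", "song_z")
```

Note where the human effort goes: not into voting on whether a song is good, but into describing what the song is, which is exactly the “right way of looking at it” argued for above.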

There’s also the effect of popular media, which leaves human-based recommendations unbalanced. If a lot of people like Coldplay, the range of music for which Coldplay is recommended will be broad. This additionally creates an echo loop where popular music simply grows more popular. Inversely, it is very difficult for new music to enter the loop. If everybody who likes The Strokes likes the Yeah Yeah Yeahs, the recommender will reinforce this, brushing aside any similar new bands.
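The echo loop has a well-known arithmetic cause, and a common correction. In the invented numbers below, raw co-listen counts always favor the popular band; dividing by each band’s overall popularity (a normalization sometimes called “lift”) lets a small band with a devoted overlapping audience compete.

```python
# The popularity echo loop, and a normalization that dampens it (data invented).
fans = {
    "Coldplay":       1000,  # total listeners
    "Small New Band":   20,
}
co_listens_with_strokes = {  # listeners shared with The Strokes
    "Coldplay":       120,
    "Small New Band":  15,
}

def raw_score(band):
    """Unnormalized co-occurrence: big bands win almost by default."""
    return co_listens_with_strokes[band]

def lift_score(band):
    """Fraction of the band's own fans who also like The Strokes."""
    return co_listens_with_strokes[band] / fans[band]

# Raw counts recommend Coldplay; popularity-normalized scores flip the ranking.
assert raw_score("Coldplay") > raw_score("Small New Band")
assert lift_score("Small New Band") > lift_score("Coldplay")   # 0.75 vs 0.12
```

As the next paragraph notes, this is a matter of tuning the algorithm’s balance rather than a flaw in crowd data itself.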

However, such problems come down to the balance of the algorithm. Last.fm, which tracks all the music its users listen to, is fairly effective at recommending similar music. Because of its detailed information on what a user has heard, it can also suggest less-listened-to songs. Though Last.fm doesn’t offer playlist generation, I wouldn’t put it beyond its abilities.

So where do crowds factor in here? If anything, Pandora suggests that this is best left to professionals. Certainly, you can’t get that sort of exhaustiveness with crowds. The answer may lie in reliability. Large groups would be able to make much simpler connections, but on a larger and more verified scale. When I make a playlist with Lou Reed’s Walk on the Wild Side, I always follow it with Urge Overkill’s Girl, You’ll Be a Woman Soon. The songs have little in common on paper, but something in me recognizes the similarly cool feeling they share. If you could somehow capture millions of these sorts of links, that could lead somewhere.
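Capturing those links is mechanically simple, which is part of the appeal. A hypothetical sketch, with playlists I’ve invented around the example above: record every “A is followed by B” pairing across users’ playlists, and the aggregate starts to encode exactly those felt-but-unstated connections.

```python
# Mining "what song follows what" links from user playlists (data invented).
from collections import Counter

playlists = [
    ["Walk on the Wild Side", "Girl, You'll Be a Woman Soon", "Loaded"],
    ["Perfect Day", "Walk on the Wild Side", "Girl, You'll Be a Woman Soon"],
    ["Walk on the Wild Side", "Sweet Jane"],
]

# Count each directed "A is followed by B" link across all playlists.
links = Counter()
for playlist in playlists:
    for a, b in zip(playlist, playlist[1:]):
        links[(a, b)] += 1

def follows(song, n=3):
    """The songs users most often queue right after `song`."""
    candidates = Counter({b: c for (a, b), c in links.items() if a == song})
    return [b for b, _ in candidates.most_common(n)]

# Two of three playlists share the Lou Reed -> Urge Overkill link,
# so it outranks the one-off pairing with "Sweet Jane".
top = follows("Walk on the Wild Side")
```

No one contributor explains why the pairing works; the signal only emerges once millions of such links are aggregated, which is where the crowd earns its keep.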

The absolute most important part of collective action is the collective. At the same time, it is the most difficult and unpredictable piece of such an effort. For many otherwise good ideas, the lack of a crowd deals a critical blow to their success.

In collecting a crowd, one should consider the incentives that motivate that crowd. However, collecting a crowd is not the only way to gain one. What I’m talking about is repurposing a crowd.

While obviously not an option for everybody, repurposing a crowd offers, in many cases, the best chances for crowdsourced success. This may entail piggybacking functions onto your widely-used product (especially useful if you’re Google or Yahoo—which most of us aren’t), but it can also mean borrowing from somebody else’s (as with the Facebook platform). The point comes down to this: it’s hard to provide incentive for users to add a new site to their online routine, but it is much easier to utilize the sites that they’re already visiting for other purposes.

Here are three directions to consider.

1. Making an audience into participants

Sometimes you have to make do with what you have. When you’re a local radio station competing with a television network, “what you have” can seem frustratingly limited. When it comes to traffic reporting, the difference may be that the big guys have helicopters in the sky and cameras on busy roads. How can you compete with that?

For radio stations, the answer lies in what they do have: an audience actively experiencing the traffic. What has emerged is traffic reporting based on mobile phone tips from that audience. This crowdsourced model helps narrow the competitive gap that expensive technologies create.

The magic lies in the clever mobilization of listeners. There is already a large, dormant audience armed with the information that a traffic watch needs; giving that audience a voice is all that is needed to get the information. A similar example is Are You Being Gouged from WNYC.

2. Crowdsourcing as a feature, not as the main event

Some of the most useful examples of crowdsourcing in the wild formed on the backs of other products. The gold standard here is the tagging feature of Flickr. Users have no rules forced upon them about how to encode their information. They just want to put up their photos, and tagging is simply offered to help them stay organized. On the larger scale, however, all the users who do end up tagging help create an extremely semantically relevant corpus of images (and, as I’ve mentioned before, the first place you should go when looking for images).

Returning to the earlier example of traffic information, such “incidental crowdsourcing” is being tested by using cellphone-tower data to determine how fast traffic is moving. Simply by keeping a cellphone in the cupholder, a driver contributes data. Similar to this are e-commerce recommendation engines, where simply by browsing a site, a user feeds an algorithm that predicts what similar users will want.

3. Reimagine Popular Actions

In this form of repurposing crowds, the question comes down to how one can squeeze extra juice out of something already being used. This is the approach that Luis von Ahn’s projects take, epitomized especially well in reCAPTCHA and the ESP Game. If people are already solving captchas, why not have them also digitize scanned texts? If people like to relax with an online game, why not also have them encode image metadata?

Before starting a crowd-assisted project, don’t bet on people finding their way to it. Think about how existing groups and communities can be used, and you’ll be much more likely to succeed.