Bill Dunphy recently directed me to Galaxy Zoo, an astronomy project out of Oxford that’s flown under my radar. Most basically, Galaxy Zoo presents images of hundreds of thousands of galaxies to the public, asking visitors to classify each one by shape. Like many “artificial artificial intelligence” tasks, this is something that is immense in scale but hard to computerize.
It’s interesting to note how seriously the researchers approached Galaxy Zoo: the data itself was the primary purpose, and the word “experiment” is rarely used in describing the method.
As of November 28th 2007, GZ had over 36 million classifications (called ’votes’ herein) for 893,212 galaxies from 85,276 users. (Land et al. 2)
“… we are able to produce catalogues of galaxy morphology which agree with those compiled by professional astronomers to an accuracy of better than 10%.” (Lintott et al. 9).
For greater reliability, two methods were employed. First, each galaxy was classified numerous times, and the researchers would “only use objects where one class-type receives a significant majority of the votes.” This technique of independent confirmation is used in most such undertakings, as it limits the impact of any individual unreliable or malicious user.
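The majority rule above can be sketched in a few lines. This is an illustrative implementation, not the paper’s exact procedure; the `threshold=0.8` cut and the class labels are assumptions for the example.

```python
from collections import Counter

def consensus(votes, threshold=0.8):
    """Return the majority class for one galaxy if it clears the
    threshold fraction of votes, else None (galaxy is excluded).

    `votes` is a list of class labels, e.g. "E" for elliptical
    or "S" for spiral; the 0.8 cut is illustrative only.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= threshold else None
```

With this rule, `consensus(["E", "E", "E", "E", "S"])` keeps the galaxy as an elliptical, while an evenly split ballot returns `None` and the galaxy is left out of the clean sample.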
There was also a weighted ranking, where “users who tended to agree with the majority have their votes up-weighted, while users who consistently disagreed with the majority have their votes down-weighted” (Land et al. 2). However, the researchers saw little difference between the two schemes, and chose to concentrate on the unweighted results.
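The up/down-weighting idea can be sketched as follows. This is a simplified version under my own assumptions (agreement rate as the weight, a plain weighted tally), not the formula from Land et al.:

```python
from collections import Counter, defaultdict

def user_weights(ballots):
    """Weight each user by how often they agree with the simple majority.

    `ballots` maps a galaxy id to a list of (user, label) pairs.
    The agreement-rate weight is an illustrative choice.
    """
    agree, total = defaultdict(int), defaultdict(int)
    for pairs in ballots.values():
        majority, _ = Counter(label for _, label in pairs).most_common(1)[0]
        for user, label in pairs:
            total[user] += 1
            agree[user] += int(label == majority)
    return {user: agree[user] / total[user] for user in total}

def weighted_consensus(pairs, weights):
    """Tally one galaxy's votes, scaled by each user's weight."""
    tally = defaultdict(float)
    for user, label in pairs:
        tally[label] += weights.get(user, 1.0)
    return max(tally, key=tally.get)
```

A user who consistently sides with the majority ends up with weight near 1, while a contrarian’s votes shrink toward 0, which is why the scheme mostly reproduces the unweighted result when most users are reliable.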
“More than 99.9% of the galaxies classified as MOSES ellipticals which are in the Galaxy Zoo clean sample are found to be ellipticals by Galaxy Zoo.” (Lintott et al. 2)
Bias and Validity
Numerous types of bias were recorded, notably colour bias (where more experienced users can classify based on the known colour tendencies of each galaxy type) and spiral bias. The latter, as noted in Galaxy zoo finds people are screwed up, not the Universe, appears to be a product of human psychology: users tended to classify a galaxy as rotating counterclockwise, when in theory CW and CCW should occur about equally.
To investigate this, the creators ran a bias sample, with occasional monochromatic, vertically-mirrored, and diagonally-mirrored images. We see this done in Luis von Ahn’s projects, where decoys are used in Recaptcha and the ESP Game to help gauge reliability. The GZ researchers note the Hawthorne Effect, in that “users may be more cautious with their classifications if they think that they are being tested for bias” (Lintott et al. 9). However, considering the example of Recaptcha—which offers one real word and one decoy—perhaps such an effect can even be put to work deliberately.
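The mirrored-image check can be illustrated with a small sketch: a CW vote cast on a mirrored image corresponds to a CCW galaxy in the sky frame, so after un-mirroring, an unbiased crowd should split its spin votes near 50/50. The `"cw"`/`"ccw"` labels and function names here are my own, not the project’s:

```python
def unmirror(vote, mirrored):
    """Map a recorded spin vote back to the sky frame.

    A "cw" vote on a mirrored image means the real galaxy
    appears "ccw", and vice versa (hypothetical labels).
    """
    if not mirrored:
        return vote
    return {"cw": "ccw", "ccw": "cw"}.get(vote, vote)

def ccw_fraction(records):
    """Fraction of CCW votes after correcting mirrored images.

    `records` is a list of (vote, was_mirrored) pairs; a value
    well above 0.5 would indicate the human spin bias.
    """
    corrected = [unmirror(vote, mirrored) for vote, mirrored in records]
    return corrected.count("ccw") / len(corrected)
```

Comparing this corrected fraction between the normal and mirrored subsets is what lets the researchers separate a real asymmetry in the Universe from a quirk of human perception.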
To get as many users as possible, simplicity and a low barrier to entry were extremely important considerations in the design. “Visitors to the site were asked to read a brief tutorial giving examples of each class of galaxy, and then to correctly identify a set of ‘standard’ galaxies. …Those who correctly classified more than 11 of the 15 standards were allowed to proceed to the main part of the site. The bar to entry was kept deliberately low in order to attract as many classifiers to the site as possible” (Lintott et al. 3).
The majority of users classified around 30 galaxies. As the following chart shows, however, some went up to the tens of thousands. Even though spreading the workload across a crowd, so that no individual bears much of it, is the core purpose of such a system, it is very beneficial to accommodate the “super-users”, who do hundreds of times as much work as the casual user.