The Andromeda Project

Anil Seth, aseth@astro.utah.edu, & Cliff Johnson, lcjohnso@astro.washington.edu

Every six months for the past three years, Hubble has been collecting images of the Andromeda galaxy (M31) as part of the Panchromatic Hubble Andromeda Treasury (PHAT) survey (Dalcanton 2011; Dalcanton et al. 2012).  When the survey is completed in the summer of 2013, Hubble will have collected over 40,000 separate exposures of M31 taken over more than 800 orbits.  Hidden in these images are ~3,000 star clusters, a sample larger than the set of star clusters known in the Milky Way.  These clusters are invaluable laboratories that we will use to study the initial mass function of massive stars, constrain rare stages of stellar evolution, and learn about the transition of stars from bound clusters to dispersed field populations.

To achieve any of these science goals, we first need to find the star clusters.  Following a long tradition of cluster finding, we started identifying clusters just by looking through the images ourselves. We hoped to follow this initial by-eye search with an automated search of the remaining data, using the visually identified sample as a training set for the algorithmic methods.  Using the first year of data collected for PHAT, eight members of the cluster science team each spent about a month searching the first ~20% of the survey’s images for clusters.  The team identified ~600 likely star clusters, a four-fold increase over the sample of previously known clusters within the same search region (Johnson et al. 2012).  These results were very encouraging, and we were excited to expand our work to the rest of the PHAT survey data.

Despite our best efforts, however, our attempts at automated cluster identification came up short.  Variations in cluster appearance due to differences in age, mass, and spatial distribution, together with large changes in the galaxy background, proved troublesome for objective identification techniques.  We were unable to create an automated cluster-finding methodology that did not require significant amounts of human input.  Without an automated algorithm, cluster identification became a major obstacle to progress.  Our path to addressing our scientific questions now seemed to include many months of visual identification and confirmation of cluster candidates.

The Birth of the Andromeda Project

To overcome this cluster-identification obstacle, we collaborated with the Zooniverse to create a citizen science project called the Andromeda Project.  The goal of “citizen science” is to engage untrained individuals in real science projects, for the benefit of both researchers and volunteers. Citizen science isn’t just outreach; it is research that can be performed better or faster by tapping into the surplus of brain power on the web.  The most famous of these projects is the Zooniverse’s Galaxy Zoo, whose ~100,000 users have morphologically classified nearly one million galaxies (Lintott et al. 2011).  The Zooniverse has since expanded into topics ranging from meteorology to animal behavior. It now involves a community of more than 750,000 citizen scientists, a powerful force for “crowdsourcing” scientific problems.

Our cluster-finding task was a good match for the strengths of citizen science.  We benefited greatly from previous Zooniverse projects; the Andromeda Project site borrowed heavily from the Seafloor Explorer project’s interface, and a community of engaged citizen scientists interested in astronomy already existed, thanks to Galaxy Zoo and other earlier projects.  Once initiated, the project moved quickly. Planning started during the summer of 2012, development followed in mid-September, and beta testing of the site occurred in early November.

The Andromeda Project website is designed to get users doing science quickly.  After taking a short tutorial on the basics of cluster identification and how to use the interface, users start classifying images immediately.  While non-expert volunteers may not individually perform as well as members of our science team, combining large numbers of user classifications ensures high quality in the aggregate.  In addition, we employ a number of cross-validation techniques to assign user weightings, which allow us to normalize user identifications based on the quality of an individual’s cluster selections.  Using these analysis techniques, we expect the cluster catalog from the Andromeda Project to be very reliable.
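
For readers curious how weighted aggregation of this kind can work, the following is a minimal sketch in Python, not the project's actual pipeline. It assumes each classification is reduced to a (user, candidate, marked-as-cluster) record and that a user's weight is simply the fraction of their votes that agree with an expert-labeled reference subset; the function names, the data layout, and this particular weighting rule are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def user_weights_from_reference(classifications, reference):
    """Hypothetical weighting rule: a user's weight is the fraction of their
    votes on expert-labeled candidates that agree with the expert label."""
    agree = defaultdict(int)
    seen = defaultdict(int)
    for user, candidate, is_cluster in classifications:
        if candidate in reference:
            seen[user] += 1
            agree[user] += int(is_cluster == reference[candidate])
    return {user: agree[user] / seen[user] for user in seen}


def aggregate_votes(classifications, weights, default_weight=1.0):
    """Return, per candidate, the weighted fraction of votes marking it a cluster."""
    marked = defaultdict(float)
    total = defaultdict(float)
    for user, candidate, is_cluster in classifications:
        w = weights.get(user, default_weight)  # users without a weight get the default
        total[candidate] += w
        if is_cluster:
            marked[candidate] += w
    return {c: marked[c] / total[c] for c in total if total[c] > 0}


# Toy data (invented for illustration): two users, three candidates.
votes = [
    ("a", "c1", True), ("b", "c1", False),
    ("a", "c2", True), ("b", "c2", True),
    ("a", "c3", True), ("b", "c3", False),
]
expert_labels = {"c1": True, "c2": True}  # hypothetical expert-labeled subset

weights = user_weights_from_reference(votes, expert_labels)
scores = aggregate_votes(votes, weights)
```

In this toy case user "a" agrees with both expert labels and user "b" with only one, so "a" counts twice as much as "b" when the votes on the unlabeled candidate "c3" are combined. A real analysis would need many more votes per candidate and a more careful weighting scheme.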