Massive Open Online Data

GloBI Explorer See-It-All Poster

Baron, Daniela; Caragol, Ri; Furrer, Stefan; Macmurchy, Peter; Stark, Adam (2015): GloBI Explorer Interactive Ecosystem Explorer. figshare. Retrieved on May 23, 2015.

MOOCs are all the rage. An enormous number of courses are now available online for anyone who has time and a networked device. When Battushig Myanganbayar, a 15-year-old high school student from Mongolia, aced MIT’s Circuits and Electronics MOOC, he found himself in the spotlight and was invited to study at the prestigious institution.

The folks at Indiana University take the idea to the next level: the Information Visualization MOOC organized by Katy Börner et al. not only provides free education, it also gets students to collaborate with real projects that give access to open data. I was excited that our project, Global Biotic Interactions (GloBI), was invited to participate in this unique course for a second year (see IVMOOC 2014 project).

GloBI Explorer Paper

Baron, Daniela; Caragol, Ri; Furrer, Stefan; Macmurchy, Peter; Stark, Adam (2015): GloBI Explorer: Interactive Ecosystem Explorer. figshare. Retrieved May 22, 2015.

The IVMOOC-GloBI challenge for this year was to create an engaging experience for high school students to explore food webs in and outside of the classroom. From the start, Daniela Baron, Ri Caragol, Stefan Furrer, Peter MacMurchy, and Adam Stark were eager to learn more about the dataset, provide improvement suggestions, and respond to feedback provided by Jeff Holmes, Marie Studer, and Jen Hammock of the Encyclopedia of Life. I was impressed by what they were able to create in such a short time: a web application, a paper, and a show-it-all poster.

GloBI Explorer Screenshot

GloBI Explorer Screenshot. Retrieved on May 22, 2015.

I think that this year’s IVMOOC project demonstrates the benefits of open data: openly accessible data lets anyone with an idea and an internet connection help us better understand the world around us. Not only that, it helps to create Massive Open Online Data (MOOD) communities of citizen scientists and engineers from all over the world who help make the data, and the tools used to access it, increasingly useful. For example, Sergey Slyusarev, an IVMOOC 2014 alumnus, has identified data issues and is a coauthor of rglobi, an R library for accessing GloBI.

Thanks to the IVMOOC class of 2015 (and their organizers) for making this happen!

Tea at Berkeley Institute for Data Science

Doe Memorial Library at UC Berkeley

On Feb 5, 2015, Global Biotic Interactions (GloBI) was the topic of an afternoon tea talk at the Berkeley Institute for Data Science (BIDS). Located in the historic Doe Memorial Library, BIDS helps to advance data-intensive science across the UC Berkeley campus. One of their many activities is a twice-a-week afternoon tea series where projects are presented and discussed.

Berkeley Institute for Data Science

After a short introduction by BIDS fellow Falk Schuetzenmeister, I introduced GloBI to the 20-30 audience members. Then the more interesting part of the afternoon started: an open discussion! I learned a few things from it. First, the audience considered the immediate utility of GloBI to be facilitating data discovery and literature research to help come up with original research questions. In line with this topic, Iryna Dronova suggested creating a real-time data source tracker to help visualize which data sources are available through GloBI.

Another discussion topic was how to encourage ecologists to share data. David Ackerly mentioned that a critical part of the success of GenBank was that editorial boards of genomics journals collaborated and mandated that data be deposited in this public resource before manuscripts were accepted for publication. Also, in-person, multi-day workshops were mentioned as a promising way to bring cross-disciplinary researchers together to share data and use new tools to help answer meaningful research questions.

GloBI slides presented at BIDS Tea

Sea otters and their lunch were a topic of discussion at BIDS Tea Feb 5, 2015 following a presentation.

Finally, an audience member asked a question: ‘Do otters really eat beavers?’ This told me that my short demo using available GloBI tools assisted in data review: dubious data was identified quickly during a short data excursion into the world of species interactions. My answer to this question was: Please look up the source reference and ask the data contributors. After a visit to the EOL Enhydra data tab, I discovered that this interaction was recorded by Joel Sachs et al. (2006).

Now the question remains: Do sea otters (Enhydra) really eat American beavers (Castor canadensis)? Are sea otters that vicious? I hope we’ll find out… I’ll make a point to share this blog post with the authors, in the hope that they can shed some light on the topic.

Many thanks to Falk Schuetzenmeister, Ali Ferguson and the engaged audience for a stimulating afternoon at BIDS.

Update Feb 19, 2015 – After an open discussion with data contributors, the trophic interaction between sea otters (Enhydra) and American beavers (Castor canadensis) has been removed. The changes will propagate into GloBI and EOL with some delay.

Eating Pudding

Poelen et al., 2014

Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics.

“The proof of the pudding is in the eating”, is a phrase that stuck out in detailed comments from Jan Willem Henfling on our recent paper (Poelen et al., 2014) in Ecological Informatics. With this, he pointed out that it is important to get the species interaction data into the hands of researchers and educators.

I was happy to read his comments, because it told me that our investment in writing and publishing an open-access paper (at a seemingly hefty price of $2500) is starting to pay off. Also, it highlighted that getting the interaction data out there for anyone to use is not enough: active collaborations are essential to show the use of our project. This is why I wanted to share some recent activities with you.

NESCent-EOL-BHL Research Sprint Feb. 4-7, 2014

Participants of NESCent-BHL-EOL Research Sprint on 4-7 February 2014 in Durham, North Carolina. Can you find the author?

After participating in the four-day research sprint organized by NESCent, the Biodiversity Heritage Library, and the Encyclopedia of Life in Durham, North Carolina, in February 2014, I have been working with Brian Hayden to use GloBI data to show how dietary niche relates to biodiversity around the globe. Preliminary results are encouraging and a manuscript is in the works. Also, I have continued to work with Jen Hammock (Encyclopedia of Life, Smithsonian Institution) and Jim Simons (Gulf of Mexico Species Interactions, Texas A&M Corpus Christi) to put GloBI data to public use.

Tree-for-All hackathon participants gather to hear a progress report.

In September 2014, I participated in the week-long Tree-for-All hackathon hosted at the University of Michigan and organized by Arbor Workflows and Open Tree of Life. Among many other things, this collaborative event helped create a method to retrieve phylogenetic trees related to species interactions (e.g., Pocket Gophers and Their Parasitic Chewing Lice) using the rglobi (part of rOpenSci) and rotl R libraries.

In the time to come, I look forward to continuing to help others eat (or make!) more of that delicious GloBI data pudding! Pudding, anyone?

A Food-Web Map of the World


A spatially integrated food web of the world derived from hundreds of thousands of interactions, across tens of thousands of species, and thousands of locations.

Sergey Slyusarev, Dimitrios-Georgios Kontopoulos, William Taysom, Adrian Guzman, and Bimlesh Wadhwa used GloBI data to create a food-web map [1] as part of the Information Visualization MOOC class of 2014 at Indiana University. The map was created by combining interaction data from GloBI’s Darwin Core Archive with terrestrial and marine ecoregions of the world and various openly available taxonomies (e.g., ITIS, NCBI, WoRMS). After eliminating taxa with few recorded interactions, species with similar predator-prey characteristics were grouped by a custom algorithm that was inspired by the Jaccard index, a similarity measure, and based on Infomap, a community-detection algorithm. The resulting interconnected taxa communities were then used to make an information-packed (gorgeous!) food-web visualization. The map was generated with a combination of custom R scripts, existing libraries (e.g., igraph, Reol, rgdal), Cytoscape, and Adobe Illustrator.
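To make the similarity idea a bit more concrete, here is a minimal Python sketch of the plain Jaccard index applied to diets. It is not the team's custom algorithm (which was only inspired by Jaccard and built on Infomap), and the toy `diets` data and function names are made up for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        # Two empty sets share nothing; treat their similarity as 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy data: consumer -> set of food items (not taken from GloBI).
diets = {
    "consumer A": {"item 1", "item 2", "item 3"},
    "consumer B": {"item 1", "item 3", "item 4"},
    "consumer C": {"item 5", "item 6"},
}


def pairwise_similarity(diets: dict) -> dict:
    """Jaccard similarity for every unordered pair of consumers."""
    return {
        (a, b): jaccard(diets[a], diets[b])
        for a, b in combinations(sorted(diets), 2)
    }


similarity = pairwise_similarity(diets)
for pair, score in similarity.items():
    print(pair, round(score, 2))
```

Consumers with a high score share most of their recorded food items, which is the kind of signal a grouping step could then build on.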


Explanation of how color, line width, and node size are used to encode spatial food-web information.


Color encoding of ecoregions around the world, plotted with interaction locations.

I find the integration of spatial information (e.g., marine, terrestrial) in this graph useful because I can quickly relate specific interactions to regions in the world. For instance, I can easily spot a coastal interaction as a filled node that also has a colored border. In addition, the directionality of the interactions is easy to understand thanks to color coding: predator is orange, prey is blue. Opening the high-resolution image in a run-of-the-mill image viewer, I can easily browse the map by zooming and moving with touch-pad gestures. With the help of this visualization, data anomalies in GloBI’s complex data collection were detected, reported through GloBI’s issue list (see here, here, here, and here), and corrected. This alone tells me that the visualization by Slyusarev et al. is a useful research tool.

Special thanks to all GloBI data contributors, Sergey for his suggestions for improving GloBI, and Scott Weingart of Indiana University for inviting GloBI as a client project of IVMOOC 2014. Can’t wait to work with the IVMOOC class of 2015!

[1] Slyusarev, Sergey; Kontopoulos, Dimitrios-Georgios; Taysom, William; Guzman, Adrian; Wadhwa, Bimlesh (2015): Global Biotic Interactions food web map. figshare. Retrieved 03:26, Feb 07, 2015 (GMT)

Exploring Antarctic Interactions Using GloBI’s Interaction Browser


The area selection tool in GloBI’s Interaction Browser provides access to raw data files in addition to a share link. The “show” link updates the visualizations in other parts of the page.

Rugged scientists frequently brave the elements to study who eats what in the frigid yet productive waters of the Southern Ocean. Earlier this year, Ben Raymond was kind enough to share with GloBI the Southern Ocean diet database that he developed with colleagues (Raymond et al. 2011). Having great data is one thing but . . . being able to (easily) explore the data is a challenge by itself. Enter Göran Bodenschatz, an enthusiastic, passionate web developer. Göran unleashed his skills to create a first pass at GloBI’s Interaction Browser using d3, a JavaScript visualization library, in combination with the GloBI API. His HTML/JavaScript source code is available here.

With Ben’s data and Göran’s tool, we can now “dial-up food webs” (phrase coined by Peter Roopnarine) all across the Antarctic and discover that many species feast on Eurythenes gryllus and its cousin Eurythenes obesus. Not only are the interactions visualized on the fly using dependency wheels, you can also access the raw CSV, JSON, or DOT files to do offline analysis. In addition, you can share the selected area with others using the Interaction Browser “share” link.
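As a rough idea of what such offline analysis could look like, the Python sketch below reads a small CSV of interactions, counts how often each taxon shows up as prey, and writes the same interactions as DOT text. The column names (`source_taxon`, `interaction_type`, `target_taxon`), the `preysOn` label, and the placeholder predators are assumptions for illustration; the actual export files may be laid out differently.

```python
import csv
import io
from collections import Counter

# Made-up sample in an assumed column layout; "Predator A/B" are placeholders.
SAMPLE_CSV = """source_taxon,interaction_type,target_taxon
Predator A,preysOn,Eurythenes gryllus
Predator B,preysOn,Eurythenes gryllus
Predator B,preysOn,Eurythenes obesus
"""


def read_interactions(text: str) -> list:
    """Parse CSV text into (source, interaction type, target) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (row["source_taxon"], row["interaction_type"], row["target_taxon"])
        for row in reader
    ]


def count_by_target(interactions: list) -> Counter:
    """Count how many recorded interactions point at each target taxon."""
    return Counter(target for _, _, target in interactions)


def to_dot(interactions: list) -> str:
    """Render the interactions as a directed graph in DOT syntax."""
    lines = ["digraph interactions {"]
    for source, kind, target in interactions:
        lines.append(f'  "{source}" -> "{target}" [label="{kind}"];')
    lines.append("}")
    return "\n".join(lines)


interactions = read_interactions(SAMPLE_CSV)
counts = count_by_target(interactions)
dot_text = to_dot(interactions)
print(counts.most_common())
print(dot_text)
```

The DOT text can be fed to a graph layout tool, while the counts give a quick first look at which taxa are eaten most often in the selected area.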

This particular experience tells me that simply collecting and aggregating data is not enough. Only after locating and illuminating data with search and visualization tools can I start to analyze and perhaps understand the biological mechanisms behind the data hidden inside GloBI. . .


Screenshots of circular diagrams that highlight predatory interactions for Eurythenes gryllus around the Antarctic Peninsula. The left diagram indicates the number of interactions by the width of the arc on the outside of the circle. The right diagram bundles the interactions to help detect highly interacting taxa. In the right diagram, red indicates incoming interactions of the selected taxon (e.g., prey), whereas green indicates outgoing interactions (e.g., predator).

What Parasites Does the Atlantic Croaker Host? Find Out on the Encyclopedia of Life


EOL’s Atlantic croaker species page with the GloBI data elements highlighted in pink.

In the spring of 2013, a friend of mine pointed me to an article in National Geographic about tongue-eating fish parasites. After suppressing my gag reflex upon seeing a picture of a parasite acting as the tongue of an Atlantic croaker (yes, the fish was still alive), I decided to request data from Colt W. Cook, author of a master’s thesis titled “The Early Life History and Reproductive Biology of Cymothoa excisa, a Marine Isopod Parasitizing Atlantic Croaker, (Micropogonias undulatus), along the Texas Coast.” Colt was kind enough to give me permission to add his dataset to GloBI.

Now that the Encyclopedia of Life has integrated GloBI data into its species pages, the Atlantic croaker page includes dietary habits as well as information about parasites such as Cymothoa excisa. It’s a win-win: users of the Encyclopedia of Life gain access to all sorts of structured species-interaction data, and the hardworking scientists who collected the data are attributed for their research.

Reference to Colt W. Cook dataset from the EOL's Atlantic Croaker data page.

Screenshot of the reference to Colt W. Cook’s thesis on the EOL Atlantic croaker data page.

At the time of writing (January 24, 2014), GloBI includes about half a million global interactions with close to four hundred references, spanning over a century of species-interactions data. As GloBI continues to aggregate existing datasets, we lower the barrier to accessing important data and put the scientists who’ve made contributions to the field of biology in the spotlight.

Acknowledgments: big thanks to Colt W. Cook for sharing his data, and Jen Hammock and Patrick Leary for helping to integrate GloBI’s aggregated Darwin Core Archive into the Encyclopedia of Life.

Want to contribute data?

Want to access species-interaction data?

The Anatomy of GloBI


An anatomy of GloBI: Sources (datasets, taxonomies, ontologies) are aggregated into a single normalized metadata set. This dataset can be accessed in various ways to suit offline data analysis (Darwin Core), data integration (Linked Data), or interactive apps (JavaScript APIs or web services).

In the past year, we’ve written a bunch of software to normalize, aggregate, and expose various existing species-interaction datasets. To help understand the bits and pieces of the software that drives GloBI, I’ve included a system diagram in this post. You’ll find the data sources on the top (ontologies/datasets), the normalization magic in the middle, and the exports or APIs on the bottom. Also, the current (known) users (e.g., EOL pages and GoMexSI) are included. If you have an interest in learning more or sharing ideas on any of these topics, I invite you to read our wiki, play around with the JavaScript API, download aggregate datasets, comment on this post, or create a GitHub issue. It is your input that helps us to build the right things at the right time. Past feedback has led us to make the improvements we’re working on now. For instance, we are working on improving the quality control of name mapping, providing examples for JavaScript APIs, and creating more mappings to existing ontologies, such as EnvO and Uberon, while extending our new interaction ontology.
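For readers who like to see ideas in code, here is a small Python sketch of what "normalize and aggregate" can mean in practice: records from datasets with different column names and wording are mapped onto one shared record shape, and exact duplicates are dropped. This is an illustration only, not GloBI's actual implementation; the `Interaction` fields, the `VERB_MAP` vocabulary, the column names, and the dataset labels are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One normalized interaction record (hypothetical shape)."""
    source_taxon: str
    interaction_type: str
    target_taxon: str
    reference: str


# Assumed mapping from dataset-specific wording to one shared vocabulary.
VERB_MAP = {
    "eats": "preysOn",
    "preys on": "preysOn",
    "parasite of": "parasiteOf",
    "parasitizes": "parasiteOf",
}


def normalize_name(name: str) -> str:
    """Collapse whitespace and use 'Genus species' capitalization."""
    return " ".join(name.split()).capitalize()


def normalize(record: dict, field_map: dict, reference: str) -> Interaction:
    """Map one raw record onto the shared Interaction shape."""
    verb = record[field_map["type"]].strip().lower()
    return Interaction(
        source_taxon=normalize_name(record[field_map["source"]]),
        interaction_type=VERB_MAP.get(verb, "interactsWith"),
        target_taxon=normalize_name(record[field_map["target"]]),
        reference=reference,
    )


def aggregate(datasets: list) -> list:
    """Normalize every record and keep the first copy of each duplicate."""
    seen, result = set(), []
    for records, field_map, reference in datasets:
        for record in records:
            interaction = normalize(record, field_map, reference)
            if interaction not in seen:
                seen.add(interaction)
                result.append(interaction)
    return result


# Two made-up datasets describing the same interaction in different layouts.
dataset_a = (
    [
        {"parasite": "cymothoa  EXCISA", "relation": "Parasite of",
         "host": "Micropogonias undulatus"},
        {"parasite": "Cymothoa excisa", "relation": "parasite of",
         "host": "micropogonias undulatus"},
    ],
    {"source": "parasite", "type": "relation", "target": "host"},
    "dataset A (hypothetical)",
)
dataset_b = (
    [{"subject": "Cymothoa excisa", "verb": "parasitizes",
      "object": "Micropogonias undulatus"}],
    {"source": "subject", "type": "verb", "target": "object"},
    "dataset B (hypothetical)",
)

aggregated = aggregate([dataset_a, dataset_b])
for interaction in aggregated:
    print(interaction)
```

Keeping the reference on every normalized record is what lets the original data contributors stay attributed after aggregation.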

Hoping to hear from you! Thank you for reading this post!