Yet another gift from the social networking behemoth – this one being a “big graph” processing library for the JVM.
Twitter’s recent open-source charm offensive has continued with the release of its big graph processing library, Cassovary, on GitHub.
The social networking giant has already been right on trend by offering Scalding, its Scala library for Cascading, and the Cassandra client Cassie to the masses, but now it’s Cassovary’s turn for attention. Cassovary is written in Scala (but can be used with other JVM languages) and is described as ‘a simple “big graph” processing library for the JVM’. Importantly, it is space efficient, unlike many JVM-hosted graph libraries.
Announcing the release on Twitter’s engineering blog, Pankaj Gupta said that Cassovary is designed from the ground up to efficiently handle graphs with billions of nodes and edges. Given the gargantuan operation that is Twitter, the company is surely among the best placed to know how to handle large-scale graph mining on a big network.
Gupta gave examples of Cassovary at work within Twitter too:
At Twitter, Cassovary forms the bottom layer of a stack that we use to power many of our graph-based features, including “Who to Follow” and “Similar to.” We also use it for relevance in Twitter Search and the algorithms that determine which Promoted Products users will see. Over time, we hope to bring more non-proprietary logic from some of those product features into Cassovary.
You may be thinking that there are already several graph mining libraries available, but Cassovary differs from the likes of Neo4j, the storage-sacrificing JUNG and the C/C++-based SNAP by deliberately being as simple as possible to use. With no need for persistence, database functionality or even partitioning, as in Apache Giraph, Cassovary appears to steer clear of that complexity so that it can run efficiently.