Confirmed Sessions at Graph Day SF

We'll be adding a few more sessions between now and the day of the conference.

Open Problems in the Universal Graph Theory

Marko Rodriguez - DataStax / Apache Software Foundation

The universal graph is a theoretical construct capturing the idea that every aspect of reality can be modeled as a graph composed of vertices and edges and, as such, reality is a graph. From the physical world of atoms, people, and galaxies to the mental planes of thoughts, words, and knowledge, there exists a universal graph hosting all such structures. While this idea is enticing, there are still strides to be made in coming to terms with a reality that is not composed of atoms bound by spacetime, but instead, a graph composed of vertices united by edges. This presentation will discuss three open problems in our understanding of the universal graph. It is strongly recommended that the attendee read the corresponding letter, Open Problems in the Universal Graph Theory, and bring a printed version of the letter to the talk as the presentation will serve as a visual tour of the contents therein.

A Cognitive Knowledge Base as an Enterprise Database

Haikal Pribadi - GRAKN.AI
Developing intelligent systems requires integrating large and diverse datasets, which results in data that are too complex for current databases to handle. Current databases cannot model complex domains, and their query languages are not capable of interpreting complex relationships. Intelligent systems, therefore, require a knowledge base to manage complex data and derive knowledge from it. Unfortunately, a knowledge base requires more intelligent modelling and querying capabilities than current databases provide.
In this talk, we will briefly discuss the history of AI to learn how the idea of a knowledge base came about. We will discuss why it matters to the enterprise, and how the state of the art can help enterprises tackle large problems with complex data.
We will also talk about GRAKN.AI, a distributed knowledge base with a reasoning query language that allows you to develop intelligent systems capable of interpreting complex data. We will discuss the knowledge representation model, the inference query engine, and how GRAKN.AI combines the logical integrity of SQL, the scale of relationships of graph DBs, and the horizontal scalability of NoSQL. To know more, intelligent systems need to collect and make sense of more information about the real world. To derive knowledge, intelligent systems need to interpret the relationships in real-world datasets. If a database can do both, we will enable the development of next-generation enterprise intelligent systems, built on top of a cognitive knowledge base.

Mimblewimble Transaction Graph Design

Dr. Denise Gosnell - PokitDok

Mimblewimble Transaction Graph is a design for a blockchain transaction graph that utilizes the mathematical properties of homomorphic encryption to enable compression, faster verification, and higher security in a transaction graph.
This talk will walk through the definition and examples of homomorphic encryption to build up to its application within the Mimblewimble transaction graph design. We will discuss the compression, security, and processing speed gained via a Mimblewimble transaction graph over the traditional bitcoin transaction graph.
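As a rough illustration of the homomorphic property the abstract refers to, the sketch below uses a toy Pedersen-style commitment, which is additively homomorphic. This is an assumption about the flavour of construction involved, not the speaker's material: the modulus, the generators `G` and `H`, and the function names are all invented for illustration, and the parameters are not secure.

```python
# Toy additively homomorphic commitment (Pedersen-style).
# All parameters are illustrative assumptions and are NOT secure.
P = 2**127 - 1   # a Mersenne prime used as the modulus
G = 3            # assumed base for the hidden value
H = 5            # assumed base for the blinding factor


def commit(value: int, blinding: int) -> int:
    """Return G^value * H^blinding mod P."""
    return (pow(G, value, P) * pow(H, blinding, P)) % P


def combine(commitments):
    """Multiply commitments together; this adds the hidden values."""
    result = 1
    for c in commitments:
        result = (result * c) % P
    return result


# Two inputs worth 30 and 12, two outputs worth 25 and 17.
inputs = [commit(30, 1111), commit(12, 2222)]
outputs = [commit(25, 3000), commit(17, 333)]

# Values sum to 42 and blinding factors to 3333 on both sides, so the
# combined commitments match without revealing any individual amount.
balanced = combine(inputs) == combine(outputs)
print(balanced)
```

A verifier in this toy setting only compares the two products, which hints at why a transaction graph built on such commitments can be checked without seeing the amounts.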

Scale Out Your Graph Across Servers and Clouds with OrientDB

Luca Garulli - OrientDB

Scaling a graph database isn't an easy task. Discover what we've done in OrientDB to scale out a native graph database on multiple servers by using a mix of distributed techniques in order to achieve the best performance. With OrientDB, it's finally possible to have a distributed graph across multiple data-centers on premise or on different cloud platforms. Don't lock yourself into one cloud vendor. With an Apache 2 Open Source license, OrientDB is the most popular distributed graph database.

The graph capabilities of a native multi-model database

Claudius Weinberger - ArangoDB

Polyglot persistence - maybe today much better interpreted as using the right data model for each part of an application - has just become a lot easier and cheaper with the advent of native multi-model databases. But can these compete with, for example, specialized graph databases, both in performance for smaller graphs that fit entirely in RAM and in handling larger graphs by scaling horizontally? It is also worth examining which graph functionality they support.
In this talk we will use the example of ArangoDB to present the wide range of graph capabilities of native multi-model databases. We explain the following topics and show concrete examples of each:
path pattern matching
graph traversals
sharding leveraging domain-specific locality knowledge
fully distributed parallel graph algorithms using Pregel
All these are readily available from the query language AQL which is common to all supported data models and thus allows mixing and matching of these graph capabilities with document queries, joins and key/value lookups.

SPARQL querying of Graph Databases using Gremlin

Harsh Thakkar - Universität Bonn

Knowledge graphs have become popular over the past decade and frequently rely on the Resource Description Framework (RDF) or property graph databases as data models. We present Gremlinator, the first translator from SPARQL -- the W3C standardised language for RDF -- to Gremlin -- a popular property graph traversal language. Gremlinator translates SPARQL queries to Gremlin path traversals for executing graph pattern matching queries over graph databases.
This allows a user who is well versed in SPARQL to access and query a wide variety of Graph Data Management Systems (DMSs), avoiding the steep learning curve of adapting to a new Graph Query Language (GQL). Gremlin is a traversal language that is agnostic to the underlying graph computing system (covering both OLTP graph databases and OLAP graph processors), making it a desirable choice for supporting interoperability when querying Graph DMSs.

GraphSQL - A Game Changer: A Complete High Performance Graph Data & Analytics Platform

Dr. Yu Xu - GraphSQL

GraphSQL develops a high-performance, enterprise-class graph data platform to simplify and empower real-time graph analytics of massive connected data from all sources in complex enterprise data ecosystems. GraphSQL’s technology enables businesses to transform structured, semi-structured, and unstructured data and massive enterprise data silos into an intelligent, interconnected data network of business entities and meaningful relationships, and to uncover implicit patterns and critical insights in order to achieve better business outcomes faster, more easily, and more cheaply.
GraphSQL’s technology excels at fast data loading to build graphs rapidly, high-performance parallel execution of graph algorithms, real-time streaming updates and inserts using REST, and unified real-time analytics with large-scale offline data processing in a single hassle-free environment. A complete high-level SDK package with an intuitive visualization library is provided for graph modeling, mapping, loading, and querying to ease and accelerate the analytic application development and delivery lifecycle. Application developers can significantly shorten their design and development cycle time with GraphSQL’s expressive SQL-like graph query language (GSQL). No additional coding in Java, C++, or other programming languages is necessary.
Through strong partnership and a co-innovation model with customers, GraphSQL is powering some renowned industry leaders in banking and financial services/payments/FinTech (Anti-Money Laundering, fraud detection and prevention), e-commerce (product search and recommendation), and social networks. With real-time access to big connected data about their customers, audience and visitors, GraphSQL customers significantly exceed their sales targets with smarter recommendations, intelligent product search, and more personalized offers and ads; relentlessly protect their customers by fighting financial fraud; and successfully comply with government AML regulations.

Knowledge Graph in Watson Discovery

Anshu Jain / Nidhi Rajshree - IBM

When we extracted information from 5 million Wikipedia documents, we obtained a knowledge graph of 30 million entities and 200 million relationships. With yet another Watson Discovery client, we ingested 80 million documents, resulting in 5 billion relationships. Our main goal was to discover knowledge from this graph. In doing so, we faced two key challenges: 1) discovering “un-obvious” knowledge from a knowledge graph of entities and relationships and 2) doing it at scale. In this talk, we describe the approach we took in tackling both these challenges: redefining data models, leveraging multiple backend technologies best suited for storing different aspects of the data, creating redundant and cache-like stores to optimize the query workloads, and re-thinking our rank and retrieval algorithms so that we don’t compromise on precision and the discovery quotient while designing for scale.
Technical skills and concepts required: Basic knowledge of Natural Language Processing, Information Retrieval and Database Retrieval concepts

Propel your performance: AgensGraph, the multi-model database

Junseok Yang - Bitnine

You don’t need to reinvent the wheel to create the next-generation DBMS. In this presentation, we will talk about how we built AgensGraph, a multi-model database based on an RDBMS. We will share some of the challenges we faced in reaching the performance we initially planned. Finally, we will present the results of the LDBC benchmark and use cases in which AgensGraph excelled.
The AgensGraph key differentiator is one fast single query for both SQL & Graph.

Dgraph: A native, distributed graph database

Manish Jain - DGraph

Dgraph is built with the aim of being used as a primary datastore and serving terabytes of data over commodity hardware. It is a real-time, horizontally scalable, high-throughput, fast graph database, designed for web-scale production environments. Within a short span of time, Dgraph has amassed 3300 GitHub stars and an actively growing community.
In this talk, we present Dgraph and explain the design choices and concepts behind the database. We showcase a new graph query language, GraphQL+-, a modification of Facebook's GraphQL, comparing it with existing popular graph query languages using real-world examples. This will be followed by a demo using Stack Overflow data to show how various pages can be rendered with complex queries to the database, keeping the application code simpler and allowing developers to iterate faster over their application.
Come and listen to this talk to understand what's going on behind the scenes and whether you should be using Dgraph for your next project.

Globally distributed, horizontally scalable graphs with Azure Cosmos DB

Aravind Krishna R - Microsoft

Learn about how you can work with massive scale graphs using Microsoft’s new Azure Cosmos DB service. Cosmos DB lets you interact with graphs using Apache TinkerPop’s Gremlin APIs along with providing turn-key global distribution, elastic scaling of storage and throughput,

Graphs All The Way Down: Building A Full Stack Graph Application With GraphQL and Neo4j

William Lyon - Neo4j

Despite what the name may imply, GraphQL is not a query language for graph databases. Instead it is a new way of building APIs where application data is treated as a graph on the frontend. GraphQL has been called "REST 2.0" as it offers many advantages over REST.
Although GraphQL can be used with any database or backend service, it becomes even more powerful when combined with a graph database such as Neo4j. Using graphs on the backend as well as the frontend allows for removing the mapping and translation layer, simplifying development. Translating GraphQL to a single graph database query offers performance benefits. Further, we can enhance the expressivity of GraphQL by exposing Cypher, the query language for graphs, in our GraphQL schema.
This talk will start with a brief overview of GraphQL and graph databases then dive into why they are awesome when used together! We will talk about how we can use GraphQL with Neo4j and walk through the code to build a full stack application. No experience with GraphQL or graph databases is necessary to benefit from this talk.

On-boarding with JanusGraph Performance

Chin Huang / Yi-Hong Wang - IBM Open Technologies

When approaching a new technology, an upfront evaluation of its performance is necessary. Graph databases support a flexible data model that allows users to easily represent and manage domain specific data. Meanwhile, there are a number of variables in graph modeling and implementation mechanisms that will influence the performance of loading and querying graph data. With one of the latest graph databases available, JanusGraph, we evaluated various graph workloads in order to understand the performance characteristics and to identify system requirements. In this talk, we will share with the audience our performance test approach, the data, schema, tools, and methodology we used. We will also show the results of JanusGraph performance, provide recommendations on achieving better graph performance, and investigate how to apply the same approach to other graph databases.

Intended audience:
Graph data model designer
Graph data application developer
Graph database operator

Technical skills required:
Basic understanding of graph concepts and performance metrics

Building a Graph Data Pipeline

Paul Sterk / George Tretyakov - Ten-X

Are you thinking about implementing a Graph Database? Are you wondering how to transform your existing datasets into a Graph model? At Ten-X we built a complex, multi-stage Graph Data Pipeline that sources, filters, de-dupes, transforms, loads and manages different sets of data in JanusGraph. We would like to share some of these insights and hard-earned lessons with you, especially on how to deal with poorly documented, complex and dirty legacy datasets. We will talk about a third-party service you can use to make it much easier to de-duplicate any geo-oriented records (such as customer addresses), as well as a compelling data enrichment story. We will also cover approaches for converting data records into vertices and edges, strategies for transforming and creating a graph database ‘load-ready’ dataset, and thoughts on our technology stack (Hadoop, Hive, Spark, TinkerPop, JanusGraph, Cassandra and Elasticsearch).
- Intended audience: engineers, architects
- technical skills and concepts required: familiarity with a Big Data Stack and a Graph Data
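To make the idea of converting data records into vertices and edges concrete, here is a minimal sketch. It is not the Ten-X pipeline: the record fields, the `LIVES_AT` edge label, and the simple text normalization used for de-duplication are all assumptions chosen for illustration.

```python
# Hypothetical legacy rows; the field names are assumptions for illustration.
records = [
    {"customer_id": "c1", "name": "Ada", "address": "12 Main St"},
    {"customer_id": "c2", "name": "Bob", "address": "12 main st "},
    {"customer_id": "c1", "name": "Ada", "address": "12 Main St"},  # duplicate row
]


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so near-identical values match."""
    return " ".join(text.lower().split())


def to_graph(rows):
    """Turn flat rows into de-duplicated vertices and edges."""
    vertices = {}   # vertex id -> properties
    edges = set()   # (source id, label, target id)
    for row in rows:
        cust = "customer:" + row["customer_id"]
        addr = "address:" + normalize(row["address"])
        vertices[cust] = {"label": "customer", "name": row["name"]}
        vertices[addr] = {"label": "address", "text": normalize(row["address"])}
        edges.add((cust, "LIVES_AT", addr))
    return vertices, sorted(edges)


vertices, edges = to_graph(records)
print(len(vertices), len(edges))
```

Keying vertices by a normalized identifier means duplicate rows and differently formatted addresses collapse into one vertex, which is the simplest form of a 'load-ready' dataset.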

Start Flying with Python and Apache TinkerPop

Jason Plurad - IBM

Gremlin, the graph traversal language from Apache TinkerPop, continues to evolve in support of the growing graph ecosystem. In this session, we'll take a deep dive into Gremlin Language Variants (GLV) to see how TinkerPop enables modern programming languages to leverage Gremlin natively. By converting Gremlin into bytecode, the same instructions can be transmitted and interpreted by graph systems from many different vendors. We'll uncover the benefits of this approach by demonstrating a Python-based graph architecture designed to empower your application developers and data scientists. By using popular open source Python packages, like the Flask microframework and Jupyter notebooks, we'll see how you can easily transition your app development from your machine to the IBM Cloud.
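The bytecode idea can be pictured with a toy model: a traversal object that records each chained step instead of executing it, then serializes the recorded steps for some backend to replay. The sketch below is not TinkerPop's API or wire format; `ToyTraversal` and `to_wire` are hypothetical names, and JSON is an assumed encoding.

```python
import json


class ToyTraversal:
    """Records each chained step instead of executing it (toy model only)."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def __getattr__(self, name):
        # Only reached for unknown attributes, i.e. traversal step names.
        if name.startswith("__"):
            raise AttributeError(name)

        def add_step(*args):
            # Return a new traversal so the original stays reusable.
            return ToyTraversal(self.steps + [(name, args)])

        return add_step


def to_wire(traversal):
    """Serialize the recorded steps to JSON so a backend could replay them."""
    return json.dumps([[name, list(args)] for name, args in traversal.steps])


g = ToyTraversal()
query = g.V().has("name", "marko").out("knows").values("name")
wire = to_wire(query)
print(wire)
```

Because the host language only records step names and arguments, the same recorded sequence could in principle be sent to any system that understands those steps.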

Cross-Device Pairing with Apache Giraph

Gargi Adhav / Obuli Krishnaraj Venkatesan - Drawbridge

In this talk we will discuss how Giraph is used at Drawbridge to match pairs of devices, and ultimately assign each pair to an anonymous consumer. Using Giraph, we identify more than 10 billion cross-device pairs among 5 billion identities spread across computers, smartphones, tablets, and ultimately connected TVs. We will also talk about how adopting Giraph has reduced the runtime while increasing quality and scale.
- Our graph has >5 Billion vertices and >10 Billion edges and generates >20 Billion pairs.
- When we used Hadoop MapReduce to generate the pairs, it took more than 24 hours to finish.
- Treating this problem as a graph problem and using Giraph brought down the runtime to less than an hour.
- The Giraph implementation is more flexible and gives us more capability to tune at the individual vertex level. It has also resulted in a double-digit improvement in precision.
- We use 800 workers on 100 machines with 20 TB of combined memory.

We will also talk about the hardware changes that we made to our Hadoop cluster, and how we are using Giraph as a Java action in an Oozie workflow. We will also cover some of the changes that we made in the computation and data representation, which improved performance by balancing CPU, memory and network bandwidth usage to achieve optimum run-time.

Tinkering the Graph

Karthik Karuppaiya - Ten-X / Chris Pounds - Expero

OK, so you had your Zen moment and suddenly you realized that Graphs are everywhere and the best way to model and store your data is as a Graph. The next big question is how you actually productize the graph database and let your products access the data. In this talk we will discuss how we did exactly that at Ten-X. JanusGraph, Cassandra, Ansible, SpringBoot, Docker and Mesos are some of the technologies we have used to make our platform production ready. We will share how JanusGraph is deployed in our production environment. We will also talk about the RESTful API layer we built using SpringBoot and TinkerPop for performing CRUD operations and search queries on the JanusGraph database.

Comparing Giraph and GraphX

Jenny Zhao / Yu Gan - Drawbridge

At Drawbridge we process one of the largest graphs in the industry, with more than 11 billion vertices and 70 billion edges. In order to run complex algorithms (e.g., intelligent clustering) efficiently at this scale, we rely on optimizing the best distributed graph processing technologies. In this talk we will compare Giraph and GraphX based on performance, tuning, integration with Spark and how they fit in our platform. We will present an actual use case, demonstrating feature generation for machine learning on our graph using 3 different technologies: Spark, Giraph, and GraphX. In addition, we showcase an in-house Python utility module used for collecting and monitoring Giraph performance metrics. Its current implementation not only covers critical data for debugging and profiling algorithms but also provides a framework that can be flexibly extended for various use cases.

JanusGraph: Today and Looking to the Future

Ted Wilmes - Expero

Graph databases are no longer just the new kids on the block, but maturity doesn't mean that they can't be a little edgy. Research in data engines can be applied in graph databases, and open-sourced projects like JanusGraph are a great place to do it. Join Ted as he looks into the internals of JanusGraph and considers how the engine can be extended and enhanced with modern-day research conjectures and proposals inspired by other database engines and academia.

Making Old Data New Again in the Original Social Network: real-time navigation and insights in massive family trees

Josh Perryman - Expero / Jeremy Hanna - DataStax

Family trees are natural graphs, and kindly are both directed and acyclic. However, ancestry data has been bogged down by legacy file formats (GEDCOM) and is often relegated to implementations in tabular data engines like relational databases. Join us in a quick journey through some common ancestry use cases. We’ll start off by finding our (pretend) ancestor Robert (or Bob) with fuzzy search capabilities. Then when we have a few candidates, we will compare their subgraphs, we mean family trees, and see if we can fit two together. Finally, we will merge the results and get a more complete picture of our faux heritage. The talk will include liberal amounts of code samples and maybe a few snarky comments as well.

Successful Techniques for a Large Distributed Database: the Comcast XNET Platform

Jaya Krishna - Tulasea / Ravi Lingam - Comcast

XNet is a platform that captures network change events in a graph database. Currently, XNet handles 10 data sources producing 5 billion change events/day across multiple regions/data centers and 200 million consumer queries/day across multiple regions/data centers. XNET supports 24/7 availability with inter-data center failover. XNET is designed for growth, to expand to 5 additional data centers and to handle hundreds of data sources and a trillion events/day.

Knowledge Graph Platform: Going Beyond the Database

Michael Grove - Stardog

Adoption of Graph Databases in the enterprise is gaining momentum, and while they are suitable alternatives to other database types for many use cases, the real power of graphs is not yet being widely utilized. Graphs offer more than just traversals and convenient analytics; they offer a transformative platform that lets an enterprise create knowledge from data. This flexibility, combined with a formal logical model, allows data in all its forms (structured, semi-structured, and unstructured) to blend seamlessly into a single, coherent Knowledge Graph. The formal logical model can not only encode business logic as part of the graph itself, but it is also an ideal, declarative way to define graph structure while retaining its flexibility. It can also be used as the basis for alignment between disparate data sources, or as a way to enrich the data before advanced NLP, machine learning, or analytics are performed over all of an enterprise's knowledge.

Project Konigsburg - A GraphAI

Denis Vrdoljak / Gunnar Kleemann - Berkeley Data Science Group

In this presentation, we will talk about our research and development towards creating an AI that can predict connections within graph networks. Unlike typical prediction methods based on counting wedges (e.g., counting “mutual friends”) or requiring outside knowledge (e.g., syncing with email or contacts lists), we will talk about how we employed different triadic measurements to engineer features for our machine learning models to predict connections based on relationship patterns, specific to different applications. We will also go over some of the applications we have in mind for our system – including recommending stores and restaurants based on social connections’ shopping patterns, predicting future social or professional contacts, and even possible applications in counterintelligence and counterterrorism.
We will cover some of the challenges that we faced, like biased uncertainty in training data, single classifier approaches, limitations of existing graph databases, and adapting heuristics based on application – after all, we don’t expect recommending coffee-shops to work with the same parameters as identifying sleeper cells!
Finally, we will review the different machine learning models that we evaluated, talk about their trade-offs, and conclude with a brief demo of our system in action, and talk about some of the new developments and possibilities we learned at Data Day Texas earlier this year.
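For context, the wedge-counting baseline that the abstract contrasts itself with (counting "mutual friends") can be sketched in a few lines. This shows only that baseline, not the speakers' triadic features or models; the toy friendship data and the function name are made up for illustration.

```python
from itertools import combinations

# Hypothetical undirected friendship graph as adjacency sets.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat"},
    "eve": set(),
}


def mutual_friend_scores(adj):
    """Score each unconnected pair by its number of shared neighbours."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already connected, nothing to predict
        common = len(adj[u] & adj[v])
        if common:
            scores[(u, v)] = common
    return scores


scores = mutual_friend_scores(friends)
print(scores)
```

Each shared neighbour closes one wedge between the pair, so a higher count is read as a more likely future connection; a vertex with no neighbours, like `eve` here, never receives a score under this baseline.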

Do I need a Graph Database? If so, what kind?

Juan Sequeda - Capsenta

When you have a hammer, you think everything is a nail. Now that graph databases have gained popularity and are starting to mature, you may think that you have a graph problem. But do you really? If you do have a graph problem, what graph database should you use? Property Graph or RDF Graph? In this talk, I will present characteristics that indicate that you may have a graph problem and discuss which type of graphs should be used. This talk is an evolution of many conversations at data meetups and discussions at previous talks.

Graph-based Taxonomy Generation

Rob McDaniel - Live Stories

How do you automatically generate a taxonomy from a corpus? Getting topics is easy, but organizing them into any meaningful hierarchy is expensive. This talk will cover the real-world application of graph-based taxonomy generation from a weighted topic graph, as proposed by Treeratpituk et al. Included in this lecture will be a brief overview of multi-level graph partitioning, the generation of an edge and vertex weighted graph and a basic open-source implementation and samples.

Graph navigation via pattern matching using the new OrientDB MATCH statement

Robert Schneider - WiseClouds LLC

No matter what type of graph database application you’re developing, fully leveraging the power of these exciting new technologies requires constructing dynamic, far-reaching queries that are capable of navigating your entire information collection. Unfortunately, this has often meant getting bogged down while attempting to become competent with proprietary, difficult-to-understand syntax. This steep learning curve has resulted in far too many graph initiatives struggling to achieve the insights that were promised to key stakeholders.
In response to these challenges, OrientDB recently introduced the MATCH statement. This presents a fresh approach to graph navigation that lets users and developers pair straightforward pattern matching syntax with familiar SQL. The outcome is a dramatically simpler yet more powerful way to fully realize the potential of your graph data.
In this session, we’ll introduce you to the new OrientDB MATCH statement, including how you can use it to quickly construct robust queries. To help you visualize what’s possible, we’ll provide numerous examples that are simple yet meaningful. The upshot will be that you’re ready to start utilizing this powerful command to transform your graph database interactions.

Graphs in Genomics

Jason Chin - Pacific Biosciences

Since the discovery of DNA molecules, graph theory and methods have been used in analyzing genomes. Recent progress in high-throughput DNA sequencing instrument development has further advanced the state of the art in using graphs to understand genomics. Jason will present recent advances in this field to the data science community.
Jason will begin by going over a few examples where graphs are used to encode genomics information for human health. He will then dive a bit into the graph theory used for a specific problem: "genome assembly" - essentially, how bioinformaticians currently use graphs in practice to put millions of smaller pieces of DNA sequences (hundred-gigabyte data) into contiguous genome sequences (several Mb to several GB).
Jason will 1) define the problem, 2) give an overview of the general approach, 3) compare different topological and statistical properties of assembly graphs to those of other kinds of graphs, e.g., social networks or small-world graphs, and 4) demonstrate a specific end-to-end example for people to see the whole process. Jason will wrap up the talk with a view toward future challenges: computation scaling, new related theoretical problems and standardization for related graph processing.
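The assembly problem described above can be illustrated with a toy overlap graph: reads are vertices, a suffix-to-prefix overlap is a directed edge, and a walk through the graph spells out a contig. This is a deliberately simplified sketch with error-free, made-up reads and a greedy walk; it is an assumption for illustration and not the method presented in the talk.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if too short)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def build_overlap_graph(reads, min_len=3):
    """Edges i -> j weighted by the overlap between read i and read j."""
    graph = {i: {} for i in range(len(reads))}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                n = overlap(a, b, min_len)
                if n:
                    graph[i][j] = n
    return graph


def assemble(reads, min_len=3):
    """Greedily walk the overlap graph from a read with no incoming edge."""
    graph = build_overlap_graph(reads, min_len)
    has_incoming = {j for targets in graph.values() for j in targets}
    starts = [i for i in graph if i not in has_incoming]
    current = starts[0] if starts else 0
    contig, seen = reads[current], {current}
    while True:
        candidates = {j: n for j, n in graph[current].items() if j not in seen}
        if not candidates:
            return contig
        nxt = max(candidates, key=candidates.get)  # prefer the longest overlap
        contig += reads[nxt][candidates[nxt]:]     # append only the new bases
        seen.add(nxt)
        current = nxt


# Hypothetical short reads, given out of order.
reads = ["GCGTACC", "ATGGCGT", "TACCGGA"]
contig = assemble(reads)
print(contig)
```

Real assemblers have to cope with sequencing errors, repeats, and both DNA strands, which is where the topological properties of the assembly graph become important.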

Investigating patterns of human trafficking through graph visualization

Christian Miles - Cambridge Intelligence

It is estimated that at any given time, 2.5 million people are in forced labour (including sexual exploitation) as a result of trafficking. The vast majority of victims are between 18 and 24 years of age. In this talk, Christian will walk the audience through the steps taken to collect, visualize and analyze a unique dataset of 22,500 classified advertisements in order to identify potential indicators of human trafficking. This talk and associated work follows a similar methodology for analysis as previous studies completed in Hawaii but applies a distinctly graph-oriented approach to a whole new geography. Methods used will include graph modelling/visualization, web scraping, text mining and geospatial analysis. Christian will demonstrate how his analysis reveals valuable new insights into trafficking routes and highlights patterns of exploitation that could be used to prevent trafficking crimes.

Time for a new relation: Going from RDBMS to Graph

Patrick McFadin - DataStax

Most of our introductory graph sessions come from practitioners with a heavy graph background. Patrick McFadin will present a session from the perspective of someone with a broad relational background (at scale) who has recently started working with graphs.
Like many of you, I have a good deal of experience building data models and applications using a relational database. Along the way you may have learned to data model for non-relational databases, but wait! Now we are seeing Graph databases increase in popularity and here’s yet another thing to figure out. I’m here to help! Let’s take all that hard-won database knowledge and apply it to building proper Graph-based applications. You should take away the following:
- How graph creates relations differently than an RDBMS
- How to insert and query data
- When to use a graph database
- When NOT to use a graph database
- Things that are unique to a graph database

Graph Query Language Task Force Update from LDBC

Juan Sequeda - Capsenta

The Linked Data Benchmark Council (LDBC) is a non-profit organization dedicated to establishing benchmarks, benchmark practices and benchmark results for graph data management software.
The goal of the Graph Query Language Task Force is to present a standardization proposal for a graph query language for Property Graphs. The task force consists of members from academia and industry (Neo4j, Oracle, IBM, SAP, Huawei, Ontotext, Sparsity and more). In this talk, we will provide an update, including a formal model for Property Graphs that treats paths as first-class citizens.

openCypher on the march: Composing graph queries, powerful path patterns and SQL integration

Ryan Boyd - Neo4j

Cypher is the first and by far the most widely used declarative language for querying and maintaining property graph databases. Today, it is used in Neo4j, SAP HANA Graph, AgensGraph and RedisGraph (as well as several tools and other research or incubating projects).
Cypher is a declarative query language optimized for graphs. We'll show the basic constructs of the language and talk about why it has been widely adopted and praised by the graph database community.
Cypher introduced the idea of matching “ASCII art” path descriptions to extract subgraphs for client processing and to shape new data to be created or merged into the database. This approach has been adopted by other languages like Oracle’s PGQL or Microsoft SQL Server’s extensions to standard SQL.
Cypher is not standing still. In two openCypher Implementers Meetings this year, researchers, vendors and end-users have discussed plans for Cypher to become the only graph query language to return graphs, support graph views, and process and output multiple named graphs. This will allow Cypher queries and sub-queries to be composed (including combining queries with other graph functions) and will allow graph set operations.
Designs are also emerging for very powerful, enhanced path patterns, capable of fulfilling the promise of the leading research conceptual language, GXPath. We’ll also talk about current industry discussions on options for SQL graph features, and Neo’s proposals for straightforward bi-directional interoperation rather than complex and far-reaching “over-extensions” which try to turn SQL into a dual-data model language.
openCypher welcomes all contributors and aims to make Cypher the practical, and ultimately the formal, open standard: the “SQL of graphs”.

Large scale graph analytics on distributed graph database and distributed graph computing in memory

Terry Yang - IBM

Graph approaches to structuring and analyzing data have been a significant area of interest. Graphs are well suited to expressing complex interconnections and clusters of highly related entities.
Large-scale graph analytics research has grown rapidly in recent years, and leveraging the Hadoop 2 ecosystem for graphs is a good approach. An enterprise graph computing platform needs to store large graphs and compute against them quickly. One part is the OLTP database system, which allows the user to query the graph in real time. HBase, a distributed NoSQL database, can serve as the backend storage to persist a large graph: the property graph stores its vertices and edges as key-value pairs in HBase, which also provides high reliability, scalability, and fault tolerance for the data. Solr, as the distributed index, makes queries more efficient, and Titan itself handles caching and transactions. The other part is the OLAP analytics system, which uses the TinkerPop hadoop-gremlin SparkGraphComputer to process a large graph in which every vertex and edge is analyzed; a cluster-computing platform helps with processing large, distributed, in-memory graph datasets.
A graph database based on HBase/Solr, combined with graph computing analysis based on Spark, is powerful for discovering valuable information about relationships in large and complex data, representing a significant business opportunity for the enterprise. It will help graph data analytics in a wide range of domains such as social networking, recommendation engines, advertisement optimization, knowledge representation, health care, education, and security.