Graph Day SF 2018 sessions

These were the featured sessions at Graph Day SF 2018.

Extensible RESTful Applications with Apache TinkerPop

Varun Ganesh / Harshvardhan Joshi - Nugit

You are into data analytics. You come across a source of data and you realise that it is an intuitive case for a Knowledge Graph and that there is much value to be gained by incorporating it into one. How do you take this from zero to product while ensuring that it is well-tested, extensible, scalable and plays nicely with other components and services?
Slack, with its various interactions among its users, is a prime candidate for this. Join us as we take you through our journey of conceptualizing Slack user data as a knowledge graph, evaluating different frameworks, incorporating business logic using TinkerPop with an extensible DSL, and exposing it all through a familiar RESTful interface that allows us to effectively handle an ever-growing and dynamic graph.
This talk will cover the Slack Knowledge Graph, why TinkerPop was chosen, Gremlin and the motivations behind the DSL, implementing custom workflows, remote traversals and making the graph RESTful.
Technical skills and concepts required: Basic understanding of the knowledge graph. Familiarity with Python, the Gremlin query language and the concept of REST will be helpful.

Navigating Time and Probability in Knowledge Graphs

Jans Aasman - Franz, Inc.

The market for knowledge graphs is rapidly developing and evolving to solve widely acknowledged deficiencies with data warehouse approaches. Graph databases are providing the foundation for these knowledge graphs, and in our enterprise customer base we see two approaches forming: static knowledge graphs and dynamic event-driven knowledge graphs. Static knowledge graphs focus mostly on metadata about entities and the relationships between these entities, but they don’t capture ongoing business processes. DBpedia, GeoNames, Census and PubMed are great examples of static knowledge.
Dynamic knowledge graphs are used in the enterprise to facilitate internal processes, facilitate the improvement of products or services, or gather dynamic knowledge about customers. I recently authored an IEEE article describing this evolution of knowledge graphs in the enterprise, and during this presentation I will describe two critical success factors for dynamic knowledge graphs: a uniform way to model, query and interactively navigate time, and the power of incorporating probabilities into the graph. The presentation will cover three use cases and live demos showing how the confluence of knowledge via machine learning, visual querying, distributed graph databases, and big data not only displays links between objects but also quantifies the probability of their occurrence.

Graph Database + Legacy Application = Hard

Dave Bechberger - GeneByGene

"Let's use a graph to add special project alpha to our product" - Big Boss
"Our application is 15 years old and built on SQL server" - Team
"This project alpha was funded as a graph problem so we are using one" - Big Boss

Whether you are new to graph databases or a seasoned veteran, implementing new technology in legacy applications is hard. Implementing one as relatively new and unknown as graph databases is even harder. The reality is that almost every application is a legacy application, and in order to make them better, taking the hard path turns out to be the best approach.
In this talk, we will walk through both pleasant and painful experiences adding graph databases into legacy applications. Dave will share war stories and battle scars, and walk through common patterns and anti-patterns to help you forge your roadmap to success. In the end you will hopefully come away with a sense of what warning signs to watch out for when starting this kind of project and a better understanding of what not to do.

Building a Graphy Time Machine

Max Neunhöffer - ArangoDB

Graph databases allow users to analyze highly interconnected datasets and find patterns within these relationships. Social networks, corporate hierarchies, fraud detection, network analytics, or building whole knowledge graphs are great use cases for graph databases. However, these datasets of nodes and connecting edges change over time. Whether you are a developer, architect or data scientist, you may want to time travel for analyzing the past or even predict tomorrow.
While your graph database may be lacking built-in support for managing the revision history of graph data, this talk will show you how to manage it in a performant manner for general classes of graphs. Best of all, this won't require any groundbreaking new ideas. We'll simply borrow a few tools and tricks from existing persistent data structure literature and adapt them for good performance within the graph database software. This will help enable new ways to manipulate and exploit graph data and hopefully power new and exciting applications.

Case Study: Visualize and Analyze the GDELT Global Knowledge Graph

Kevin Madden - Tom Sawyer

Every 15 minutes Google captures the world’s news into the Global Database of Events, Language, and Tone (GDELT) Global Knowledge Graph. The 2015 data contains nearly three quarters of a trillion emotional snapshots and more than 1.5 billion location references.
This talk describes the challenges, complexities, and technical solutions involved in creating a massive data lake from GDELT data that enables users to query, visualize and analyze the world’s news every 15 minutes, and save the results back to a graph database.
On the back end, this project included ingesting the entire underlying event and graph datasets – more than 2.5TB for last year alone – and then updating the data every 15 minutes. We migrated the link structure from a relational to a spatial and graph database. This enabled regional geographic clustering and provided the platform to support all-in-one analysis.
The front end required interactive, user-friendly search and filtering to automatically generate a unified view of diverse data and analytics. The view includes geospatial information, network topologies and sentiment analysis combined with timelines, link analysis, maps, trees, charts and tables.
The talk concludes with a demonstration.
Intended Audience: CTOs, CIOs, data architects, data engineers, data scientists, enterprise architects, solution architects, VPs of Engineering, VPs of Analytics.
Technical Skills and Concepts Required: Property Graphs, Knowledge Graphs, Graph Query Languages, Graph Theory and Algorithms, Graph Analytics and Visualization, Linked Data, Sentiment Analysis.

Building a Knowledge Graph

Dan Bennett - Thomson Reuters

Just a few years ago knowledge graphs were the domain of academic papers; today they underpin the natural language capabilities of Alexa, Siri, Cortana and Google Now. Graphs are a natural fit for this use case: treating every data item as equivalent and embracing rapid schema mutation. For the past few years, Thomson Reuters has been building a professional information knowledge graph to power our next generation of products. Our graph is RDF-based, fast-growing and supports a number of different products and user experiences. In this session, Dan will cover our experiences, architecture, tools and lessons learned from building, integrating and maintaining a 100bn triple graph.

From Theory to Production

Dr. Denise Gosnell - DataStax

We are here to build applications with graph data and deliver value. The graph community has spent years defining and describing our passion. In order to decipher graph thinking into a production application, there is a suite of hard decisions that have to be made. It's time for graph to go mainstream!
This talk will walk through some practical and tangible decisions that come into play when shipping distributed graph applications. Developers need a tangible set of playbooks to work from, and my years of experience have narrowed these decisions down to the most universal and the most difficult to spot. Let's see how well they match up with yours.

Comparing GraphFrames access methods in DSE Graph

Jim Hatcher - DataStax

GraphFrames is a powerful feature in Spark that allows you to harness Spark's distributed computing framework to operate on your graph. Tasks like data ingestion, schema migrations, and analytical jobs can all be run against your graph. In DSE Graph, there are several methods to leverage GraphFrames, including Gremlin, Spark SQL, and Motif. In this talk, Jim will walk through the basics of using GraphFrames with DSE Graph; he will then show how these different methods can be used and how you can evaluate which one is best for your use case.

Graph Based Malware Analysis

Florian Hockmann / Stefan Hausotte - G DATA Software

We will present our use case for graph databases where we search for similar malware samples based on their behavior. As an anti-virus vendor, we analyze several hundred thousand potential malware samples per day. These samples belong to only a few malware families whose members share a lot of behavior features. We use this fact to cluster all samples together that belong to the same family by connecting all samples that exhibit the same features via those common features. The behavior features are extracted from malware samples with the help of automatic analysis tools and inserted into a JanusGraph database. This talk shows the advantages a graph database has to offer for automatic and manual malware analysis.
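The clustering idea described above, linking samples through the behavior features they share, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the speakers' implementation: it uses an in-memory union-find instead of JanusGraph, and the function name, sample names and feature names are all hypothetical.

```python
from collections import defaultdict


def cluster_by_shared_features(sample_features):
    """Group samples that are connected through at least one common feature.

    sample_features maps a sample name to the set of behavior features
    observed for it. Samples and features are both treated as graph nodes;
    a sample is linked to each of its features, and every connected
    component of that graph becomes one cluster.
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for sample, features in sample_features.items():
        find(("sample", sample))  # keep samples that have no features
        for feature in features:
            union(("sample", sample), ("feature", feature))

    clusters = defaultdict(set)
    for sample in sample_features:
        clusters[find(("sample", sample))].add(sample)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical samples and features, for illustration only.
example = {
    "a.exe": {"writes_run_key", "contacts_c2_host"},
    "b.exe": {"contacts_c2_host", "drops_dll"},
    "c.exe": {"encrypts_documents"},
}
print(cluster_by_shared_features(example))
```

In this sketch a single very common feature would merge otherwise unrelated samples into one cluster, so a practical system would presumably filter or weight features before connecting samples.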

How Do *You* Graph? - Minimizing Developer Impedance

Ben Krug - DataStax

We're often told that graph databases are entirely different from relational and other databases. Graph traversal tools, like Gremlin, often look and feel imperative, whereas tools like SQL are basically declarative. But are they really completely different?
Upon examination, the real issue is which approaches will help a developer obtain the results they need most quickly and efficiently.
We will contrast different views of, and access methods to, graph data, focusing on examples from TinkerPop's Gremlin (does traversals, feels imperative and graphy) and Apache's Spark SQL (does queries, feels declarative and RDBMSy). Gremlin can be leveraged from a variety of programming languages, thanks in part to Gremlin Language Variants, or from the Gremlin Console. Spark SQL makes use of Spark architecture and concepts, and allows you to build on existing relational experience.
There are many ways to look at and use any data, including graph data. This talk will consider some different graph approaches, their strengths and weaknesses, and demonstrate the use of each to accomplish "graphy" tasks.
Intended audience: developers or admins, to give an overview of the tools and their use, and help elucidate which tools may be the best fit for them.
Required skills: some familiarity with relational and graph databases.

Data Modeling with an FU to Super Nodes

Jonathan Lacefield - DataStax

Graph databases are receiving a lot of hype these days because of the promise of fast and flexible queries that aren’t possible within either traditional RDBMSs or NoSQL stores built on simple/singular access patterns. There are some practical tips and tricks that ensure that your graph database project is going to live up to the hype. In this talk, we will walk through the data modeling tips and tricks that are being used to help graph users achieve success. We’ll also highlight how to avoid the largest graph problem that can plague any graph database project, the dreaded supernode. This will be a demo-led presentation with lots of examples. Participants from beginner to advanced are welcome, as there’s something for everyone to learn.

How to Destroy Your Graph Project with Terrible Visualization

Christian Miles - Cambridge Intelligence

We are all using graphs for a reason - in many cases, it's because the graph model presents an intuitive view of the data. Unfortunately, the most elegant graph data models can often be stymied by bad visualizations that obscure rather than enlighten. In this talk, Christian Miles will discuss a number of bad practices in graph visualization that are surprisingly common. He will then outline graph visualization best practices to help create visual interfaces to graph data that convey useful insight into the data.

Visualizing Graph Data - Moving Beyond Node Views

Lynn Pausic - Expero

When people talk about visualizing graph data, what typically comes to mind is the canonical node view. Node views typically display nodes (vertices) and the relationships (edges) between them. With large data sets consisting of millions of vertices and edges, node views can quickly become unwieldy to use and comprehend. Traditional UI patterns and visualizations conceived for relational schemas often don’t work with graph data. Relational schemas are predefined and relatively static, making it easy to tailor UI navigation to the available data dimensions. Due to the distinct mathematical nature of graph data, traversing data in a graph is fairly different. While this presents additional challenges, there are also opportunities. Traversing a graph with certain algorithms allows you to, for example, show key influencers in social networks, clusters of communities in customer reviews or weak points in electrical grids. These new insights into data provide novel tools to craft the user experience. But this opportunity comes at a price, namely more complexity. Through building and deploying dozens of applications driven by graph data, we’ve developed a unique approach to building UIs driven by graph data and an arsenal of data visualizations that work well across a broad range of contexts. In this talk we’ll share various tools and examples for displaying graph data in meaningful ways to users.

The Diffbot Knowledge Graph - Using AI to create a Knowledge Graph from the Web

Mike Tung - Diffbot

Mike Tung, Founder & CEO of Diffbot, describes how they built the world's largest Knowledge Graph, by building an AI to read and understand all of the pages on the web. Mike will discuss AI techniques of document layout classification, computer vision, natural language understanding, and knowledge fusion technologies. He will also give an overview of current research challenges and future directions.

Distributed ACID with JanusGraph on FoundationDB

Ted Wilmes - Expero

The popular open source JanusGraph property graph database supports a variety of different storage layers, each with their own operating characteristics and constraints, but up until now has not had a distributed ACID option. Earlier this year, Apple announced the open sourcing of FoundationDB, a high performance distributed key-value store with ACID guarantees. Could this be a match made in distributed serializable graph isolation heaven? This talk will explore this question in detail, starting with an overview of FoundationDB, followed by a discussion of why ACID matters in a graph database. We’ll finish with the implementation details of the new, experimental JanusGraph FoundationDB adapter and early performance results.

Vertex Programs or GraphFrames? Two approaches to Graph Analytics.

Paras Mehra - DataStax

TinkerPop Vertex Programs and the GraphFrames package for Spark enable engineers and data scientists to extract insights from their graphs. Several algorithms like connected components and PageRank are available out of the box. I will explain what Vertex Programs are and how they differ from GraphFrames. We will then walk through some examples of using both Vertex Programs and GraphFrames. This talk will be part theory and part demo, with practical tips and recommendations spread throughout.
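As a rough illustration of the vertex-centric style behind algorithms such as PageRank, the sketch below runs a fixed number of rounds in which every vertex sends an equal share of its rank along its outgoing edges. It is plain Python on an in-memory dictionary and uses neither the TinkerPop VertexProgram API nor GraphFrames; the function name, parameters and example graph are assumptions made for this example.

```python
def pagerank(out_edges, damping=0.85, iterations=20):
    """Iterative PageRank on a small in-memory directed graph.

    out_edges maps each vertex to a list of the vertices it points to.
    Each round, a vertex splits its current rank evenly across its
    outgoing edges; vertices without outgoing edges spread their rank
    evenly over the whole graph so the total rank stays at 1.
    """
    vertices = set(out_edges)
    for targets in out_edges.values():
        vertices.update(targets)
    if not vertices:
        return {}

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(iterations):
        inbox = {v: 0.0 for v in vertices}  # rank received this round
        dangling = 0.0                      # rank held by sink vertices
        for v in vertices:
            targets = out_edges.get(v, [])
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    inbox[t] += share
            else:
                dangling += rank[v]
        rank = {
            v: (1.0 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in vertices
        }
    return rank


# Hypothetical four-vertex graph, for illustration only.
ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]})
for vertex in sorted(ranks):
    print(vertex, round(ranks[vertex], 4))
```

A distributed engine would typically partition the vertices and exchange the per-round shares as messages between workers; the single-process loop here only mirrors that structure.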

Using GPUs & Design to Scale Visual Analysis of Digital Crime

Leo Meyerovich - Graphistry

What happens if the performance concerns around visually interacting with today's largest graphs somehow got solved?
We examine this question for several recent incidents: a cybersecurity breach logdump, a multi-million dollar Ethereum blockchain theft, and a trafficking extract. Graphs help tie many disparate events together, but enterprise-scale workloads cause performance and usability breakdowns. First, we demonstrate how GPU Open Analytics Initiative technologies (Arrow, PyGDF, NvGraph, and Graphistry) are tantalizingly close to scaling subsecond visual interactions. The basic idea is to connect GPUs in the data center to GPUs in the browser. Then, we show how the bottleneck is increasingly shifting to human-in-the-loop interaction design questions for tasks such as data wrangling. Finally, we provide examples of augmented interaction techniques that make large-scale graph analysis more practical. Put together, we provide a peek into the emerging generation of scalable visual graph tooling.

Apache Atlas and JanusGraph - Graph-based Metadata Management

Jing He - IBM

Data governance enables organizations to effectively and efficiently use their data. At the center of modern data governance is the metadata system, which describes the data and collects, stores, and exchanges the metadata. Apache Atlas is the data governance and metadata framework for Hadoop, with integration across the whole enterprise data ecosystem. The cornerstone of Apache Atlas is its JanusGraph-based metadata management and repository.
In this presentation, we primarily focus on how Apache Atlas's metadata type and entity system is modeled as a property graph and stored in JanusGraph. We first introduce Apache Atlas, with a focus on its metadata type and entity system. Then we will briefly talk about property graphs in general and the Apache TinkerPop-enabled JanusGraph in particular. Strategies and considerations on DSL, graph queries and storage used in Apache Atlas for JanusGraph will be presented.
Intended audience: Data Architect, Data Engineer, Graph and graph database professionals.
Required skills: data governance, metadata management, graph.

Large scale graph analytics with RDF and LPG parallel processing

Barry Zane - Cambridge Semantics

Analytics that traverse large portions of large graphs have been problematic for graph engines, both RDF and LPG systems. This talk describes the native parallel-computing approach taken in AnzoGraph to yield interactive, scalable performance for RDF and LPG graphs. Included will be a sample benchmark that highlights the performance possible with well understood industry-standard data. This talk is targeted to engineers and data scientists already familiar with at least one graph database system, and at least passing familiarity with RDF, SPARQL and other query languages. Familiarity with parallel processing systems such as large-scale relational and Hadoop systems is beneficial, but not required.

Eight Prerequisites of a Graph Query Language

Dr. Mingxi Wu - TigerGraph

A graph query language is the key to unleashing the value of interconnected data. The talk includes discussion of 8 prerequisites of a graph query language for successful implementation of real-world graph analytics use cases. The talk will present the pros and cons of three query languages - Cypher, Gremlin, and SPARQL. Finally, the talk will provide an overview of GSQL, a Turing-complete graph query language that is a conceptual descendant of Cypher, Gremlin and SPARQL and has incorporated design features from SQL as well as Hadoop MapReduce. The talk will compare the GSQL query language with Gremlin, Cypher and SPARQL, pointing out the differences, including pros and cons for each language.