Neo4j: What a graph database is and what it is used for
Whoever you talk to about data, they will tell you that the most important thing about Big Data is extracting value from the information (a concept that builds on the four famous Vs of Big Data: volume, velocity, variety and veracity). Many companies hold data that is of little use to them because it is unstructured and they do not know how the pieces relate to one another.
Graph databases help to find those relationships and make sense of the complete puzzle. One of the best known is Neo4j, a database implemented in Java. Its first version was released in February 2010, and it is now available under two types of license: a commercial license on the one hand and the Affero General Public License (AGPL) on the other. It is developed by Neo Technology, a Swedish startup based in San Francisco.
At Dare2Data, a recent event dedicated to data organized by the BBVA Innovation Center, David Montag, a software engineer at Neo Technology and consultant to giants like Cisco, Adobe and Viadeo, gave a lecture on what Neo4j is, what its advantages are and how it is used in today's market.
"Today all companies in the world are trying to do data-driven business," Montag stated. During his talk, the developer explained several use cases of Neo4j: eBay uses it to plan e-commerce delivery routes; Walmart analyzes each sale of a product to "understand what kind of items you like to buy and what kind of products it can recommend to you"; and Cisco, thanks to Neo4j, offers solutions tailored to its customers "without having to pick up the phone and talk to the helpdesk".
How Neo4j works and what its advantages are
Neo4j uses graphs to represent data and the relationships between them. A graph is a structure made up of vertices, or nodes (usually drawn as circles), and edges, or relationships (drawn as lines connecting the nodes). There are several types of graph:
- Undirected graphs: relationships have no direction, so they can be read either way. Friendships on the Facebook social network, for example, are of this type.
- Directed graphs: relationships have a direction and are not reciprocal by default. Twitter relationships are of this type: a user can follow certain profiles on this social network without those profiles following them back.
- Weighted graphs: in this type of graph, relationships between nodes carry a numerical value, which allows calculations to be performed on them later.
- Labeled graphs: these graphs carry labels that define the vertices and the relationships between them. On Facebook we might have nodes labeled with terms like 'friend' or 'co-worker' and relationships like 'friend of' or 'partner of'.
- Property graphs: a weighted, labeled graph in which properties can be assigned to both nodes and relationships (for example, name, age, country of residence or birth). This is the most complex type.
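To make the last item of the list more concrete, here is a minimal Python sketch of a property graph. It is a toy in-memory model for illustration only; the class names, the people and the `FOLLOWS` relationship are all hypothetical, and this is not how Neo4j stores data internally.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set                                   # e.g. {"Person"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int                                    # id of the source node
    end: int                                      # id of the target node
    rel_type: str                                 # label on the edge, e.g. "FOLLOWS"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: labeled nodes and typed, directed relationships."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, labels, **properties):
        self.nodes[node_id] = Node(node_id, set(labels), properties)

    def relate(self, start, end, rel_type, **properties):
        self.relationships.append(Relationship(start, end, rel_type, properties))

    def neighbours(self, node_id, rel_type):
        """Follow outgoing relationships of one type from a node."""
        return [self.nodes[r.end] for r in self.relationships
                if r.start == node_id and r.rel_type == rel_type]


graph = PropertyGraph()
graph.add_node(1, ["Person"], name="Ana", country="Spain")
graph.add_node(2, ["Person"], name="Ben", country="Sweden")
# Directed relationship with its own properties: Ana follows Ben.
graph.relate(1, 2, "FOLLOWS", since=2014, weight=0.8)

followed_by_ana = [n.properties["name"] for n in graph.neighbours(1, "FOLLOWS")]
followed_by_ben = [n.properties["name"] for n in graph.neighbours(2, "FOLLOWS")]
print(followed_by_ana)  # ['Ben']
print(followed_by_ben)  # []
```

The relationship is directed (Ana follows Ben, not the reverse), it has a type that acts as its label, and it carries a numeric `weight` property, so this one structure covers the directed, weighted and labeled cases described in the list.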
Neo4j uses property graphs to extract added value from any company's data with great performance and in an agile, flexible and scalable way.
1. Performance:
Graph databases such as Neo4j can perform better than relational (SQL) databases and other non-relational (NoSQL) databases on queries over highly connected data. The key is that, even when the data and the queries on it grow exponentially, the performance of Neo4j does not drop the way it does with relational databases such as MySQL.
Graph databases answer a query by traversing only the nodes and relationships involved in that search, not the complete graph. That optimizes the process.
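The idea of touching only the relevant part of the graph can be sketched in a few lines of Python. This is a simplified breadth-first traversal over a plain adjacency dictionary, not Neo4j's query engine; the ring-shaped data set and the function name `traverse` are assumptions made for the example.

```python
from collections import deque


def traverse(adjacency, start, max_depth):
    """Breadth-first traversal returning the nodes within max_depth hops of start."""
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency.get(node, ()):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, depth + 1))
    return visited


# Hypothetical graph: 10,000 nodes in a ring, each linked to the next two nodes.
TOTAL = 10_000
adjacency = {i: [(i + 1) % TOTAL, (i + 2) % TOTAL] for i in range(TOTAL)}

reached = traverse(adjacency, start=0, max_depth=2)
print(sorted(reached))  # [0, 1, 2, 3, 4]
print(len(reached))     # 5
```

The query visits 5 nodes out of 10,000. Its cost depends on the size of the neighbourhood being explored rather than on the total size of the graph, which is the property the paragraph above describes.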
Volker Pacher, a developer at eBay and a Neo4j customer, puts figures on what switching from MySQL to this graph database meant for the performance of Shutl, the platform that coordinates delivery between stores, couriers and buyers in eBay Now: "Our Neo4j solution is literally a thousand times faster than the previous MySQL solution, with searches that require between 10 and 100 times less code".
2. Agility:
Neo4j has many advantages, and one of them is its responsiveness in managing data. To reach the limits of its capacity, we would have to exceed a total volume of 34 billion nodes (data items), 34 billion relationships between those items, 68 billion properties and 32,000 types of relationship.
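These rounded figures line up with powers of two. The following check assumes that the limits correspond to 2^35, 2^35, 2^36 and 2^15 respectively; that mapping is inferred from the numbers and was not stated in the talk.

```python
# Assumed exponents (inferred from the rounded figures, not from the talk).
limits = {
    "nodes": 2 ** 35,
    "relationships": 2 ** 35,
    "properties": 2 ** 36,
    "relationship types": 2 ** 15,
}

for name, value in limits.items():
    print(f"{name}: {value:,}")
# nodes: 34,359,738,368
# relationships: 34,359,738,368
# properties: 68,719,476,736
# relationship types: 32,768
```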
3. Flexibility and scalability:
When a company's developers work with big data, they look for flexibility and scalability. Graph databases contribute a lot in this regard because, as needs grow, more nodes and relationships can readily be added to an existing graph.
Use cases of Neo4j
Neo4j is used today by international companies for several purposes. Neo Technology has published white papers analyzing each of these uses:
Neo4j already works with several corporations to detect fraud in sectors such as banking, insurance and e-commerce. This database can uncover patterns that would be difficult to detect with other databases.
Fraud networks operate through mechanisms that cannot be detected with a linear analysis of the data, record by record. A scalable analysis of the multiple relationships between data makes them much easier to spot.
A common fraud is to open lines of credit under false identities with no intention of repaying: at present, between 10% and 20% of unsecured debt at leading banks in the US and Europe is attributed to this kind of fraud.
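As a rough illustration of relationship-based fraud analysis, the Python sketch below groups accounts that are connected, directly or transitively, through a shared phone number or address. The account data and the function `find_rings` are hypothetical, and a real system would run this kind of query inside the graph database rather than in application code.

```python
from collections import defaultdict


def find_rings(accounts, min_size=2):
    """Group accounts linked, directly or transitively, by a shared identifier."""
    # Index: identifier value -> accounts that use it.
    by_identifier = defaultdict(set)
    for account, identifiers in accounts.items():
        for identifier in identifiers:
            by_identifier[identifier].add(account)

    rings, seen = [], set()
    for account in accounts:
        if account in seen:
            continue
        # Depth-first walk: account -> identifier -> other accounts.
        ring, stack = set(), [account]
        while stack:
            current = stack.pop()
            if current in ring:
                continue
            ring.add(current)
            for identifier in accounts[current]:
                stack.extend(by_identifier[identifier] - ring)
        seen |= ring
        if len(ring) >= min_size:
            rings.append(sorted(ring))
    return rings


# Hypothetical credit applications: each account lists its phone and address.
accounts = {
    "acc-1": {"phone:555-0101", "addr:12 Oak St"},
    "acc-2": {"phone:555-0101", "addr:9 Elm St"},
    "acc-3": {"phone:555-0199", "addr:9 Elm St"},
    "acc-4": {"phone:555-0142", "addr:3 Pine Rd"},
}

rings = find_rings(accounts)
print(rings)  # [['acc-1', 'acc-2', 'acc-3']]
```

Looked at one record at a time, none of these accounts stands out. The ring only appears once the shared phone number and the shared address are followed as relationships: acc-1 and acc-3 have nothing in common directly, but both are linked to acc-2.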
Neo4j effectively connects people with a company's products and services, based on their personal information, their profiles on social networks and their recent online activity. Graph databases are interesting here because they are able to connect people and interests.
With that information, a company can tailor its products and services to its target audience and personalize recommendations based on profiles. That is what increases commercial accuracy and customer engagement.
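A simple way to picture a graph-based recommendation is a two-hop walk: from a customer to the products they bought, then to other customers who bought the same products, and finally to what those customers bought in addition. The Python sketch below follows that idea with made-up purchase data; the function `recommend` and its scoring rule are assumptions for illustration, not Neo4j's or any retailer's actual algorithm.

```python
from collections import Counter


def recommend(purchases, customer, top_n=3):
    """Rank products bought by customers who share a purchase with `customer`."""
    own = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer or not (own & items):
            continue  # skip the customer themselves and unrelated customers
        for item in items - own:
            scores[item] += 1
    # Sort by score (descending), then by name, so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase history.
purchases = {
    "ana": {"tent", "stove"},
    "ben": {"tent", "stove", "lantern"},
    "cat": {"tent", "lantern", "boots"},
    "dan": {"novel"},
}

suggestions = recommend(purchases, "ana")
print(suggestions)  # ['lantern', 'boots']
```

The lantern ranks first because two customers with overlapping purchases bought it. The novel is never suggested, because its buyer has nothing in common with Ana.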
Graph databases are the perfect antidote to the overwhelming growth in data. The sheer quantity of information, devices and users means that traditional technologies cannot handle so much data. The flexibility, performance and scalability of Neo4j make it possible to manage, monitor and optimize all types of physical and virtual networks despite the large amount of data.
Master data management is a real headache for companies. Creating a centralized and reliable information system is always a complex issue. The ultimate goal is for every member of an organization to use the same formats and applications for data. This creates a working protocol that everyone else can use.
Neo4j helps create such systems with speed, agility and performance, without losing flexibility and scalability. The result would be a system that provides 360° insights into employees, customers and products.
After the Dare2Data event, held during InnovaChallenge Data Week at the BBVA Innovation Center, David Montag gave us the keys to understanding graph databases.