Published by Tan Zhenghai, Wu Yi on August 18, 2016
In the era of big data, data volumes are growing explosively and the demands on processing speed keep rising. Our earlier MySQL-based data processing solutions can no longer handle scenarios that need high-throughput, low-latency writes together with fast queries. Percent has put together a complete solution for this, and this article walks through how VoltDB is used for interactive queries on streaming data.
Streaming data interactive query scenarios
At Percent, 1 billion records are generated every day. This large volume of real-time data must not only be written in real time; queries such as recommendation tuning and data validation must also respond within seconds. Some are simple single-record validations, some are aggregations over a few hours or a full day, and some are join-and-aggregate queries across tables holding tens or hundreds of millions of rows. For example, the following SQL query:
In the previous MySQL solution, the database had already been manually sharded according to certain rules, yet the Event table in the SQL above still held tens of millions of rows on a single machine, and the Result table nearly ten million. A complex join-and-aggregate query between tables this large took MySQL about 30 minutes or longer, and sometimes never returned at all.
So for scenarios that require high-throughput, low-latency writes and fast queries, the existing MySQL-based solution simply could not meet the requirements. Without giving up the convenience of SQL, we evaluated a range of options and finally chose VoltDB to solve these problems.
As shown above, all online traffic flows through the Streaming bus into both VoltDB and offline Hive tables. The difference is that data is written to VoltDB in real time and to Hive in batches. New data must reach VoltDB with very short latency so that it can be queried immediately, while batch data written to Hive is flushed into the corresponding partition within an hour.
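The two write paths can be pictured with a small dispatcher: every record goes to a real-time sink immediately (standing in for the VoltDB insert), and is also buffered by hour for a later batch load into Hive. This is a minimal sketch under stated assumptions, not Percent's implementation: the class `DualPathDispatcher`, the `ts` field (epoch seconds), and the `dt=.../hour=...` partition layout are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


class DualPathDispatcher:
    """Hypothetical sketch of the Streaming bus fan-out: real-time path plus hourly batch path."""

    def __init__(self, realtime_sink):
        # realtime_sink: any callable that writes one record, e.g. a VoltDB insert (assumed)
        self.realtime_sink = realtime_sink
        # Hive partition spec -> records waiting for the next batch load
        self.hourly_buffers = defaultdict(list)

    @staticmethod
    def hive_partition(ts):
        # Hypothetical Hive partition layout: dt=YYYY-MM-DD/hour=HH (UTC)
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        return f"dt={t:%Y-%m-%d}/hour={t:%H}"

    def dispatch(self, record):
        # Low-latency path: the record is queryable as soon as this call returns
        self.realtime_sink(record)
        # Batch path: group by the hour the record belongs to
        self.hourly_buffers[self.hive_partition(record["ts"])].append(record)

    def flush(self, partition):
        # Hand one hour's records to the offline loader and drop them from the buffer
        return self.hourly_buffers.pop(partition, [])
```

Usage: `d = DualPathDispatcher(written.append)` followed by `d.dispatch({"ts": 1471525200})` appends the record to `written` at once and buffers it under `dt=2016-08-18/hour=13` until `flush` is called for that hour.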
Introduction to VoltDB
VoltDB is an open-source, extremely fast in-memory relational database. Its development was led by Mike Stonebraker, co-founder of Ingres and Postgres, and it is one of the leading NewSQL databases, available in both community and commercial editions. VoltDB uses a shared-nothing architecture: it gets the good scalability and high-throughput data processing of NoSQL without giving up the transaction support (ACID) of traditional relational databases.
A VoltDB cluster generally consists of many sites (partitions) spread across multiple machines, and data storage and processing are distributed among these sites. The architecture is shown below:
As shown above, the cluster has 3 nodes, each running 1 site, so the tables in the figure are divided into only 3 partitions. A table can of course be divided into more partitions, in which case a single node holds several partitions of that table.
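The relationship between nodes, sites, and partitions can be sketched as a simple layout function: each site hosts one partition, so the partition count is the number of nodes times the sites per node. This is an illustrative sketch only, assuming no replicas (each partition lives on exactly one node); the function name and the way partition ids are numbered are hypothetical, and VoltDB's own assignment of partitions to hosts may differ.

```python
def partition_layout(hosts, sites_per_host):
    """Hypothetical sketch: one partition per site, numbered consecutively per host.

    Assumes no replication, so every partition id appears on exactly one host.
    """
    layout = {}
    next_pid = 0
    for host in hosts:
        # Each site on this host owns one partition id
        layout[host] = list(range(next_pid, next_pid + sites_per_host))
        next_pid += sites_per_host
    return layout
```

With the figure's setup, `partition_layout(["node1", "node2", "node3"], 1)` gives each node a single partition (3 in total); raising `sites_per_host` to 2 yields 6 partitions, two per node.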
In practice, the following concepts matter:
The client can connect to any node in the cluster. All nodes in the cluster are peers, and data is partitioned horizontally;
Each table designates one column as its partitioning key, and VoltDB applies a hash algorithm to this key to distribute the table's rows across the partitions. In fact, VoltDB has two kinds of tables: partitioned tables and "replicated tables". A replicated table stores not a slice of the table's data but all of it, on every node, which suits tables with small amounts of data.
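The difference between the two table types can be modeled in a few lines: a partitioned table sends each row to exactly one partition chosen by hashing its partitioning key, while a replicated table keeps a full copy on every node. This toy model is an assumption-laden illustration, not VoltDB's code: it uses CRC32 modulo the partition count as a stand-in hash (VoltDB's real hashing scheme differs), and `ToyCluster` with its methods is hypothetical.

```python
import zlib


class ToyCluster:
    """Illustrative model of partitioned vs. replicated tables (not VoltDB's implementation)."""

    def __init__(self, num_nodes, partitions_per_node=1):
        self.num_partitions = num_nodes * partitions_per_node
        # Partitioned table: each partition holds only a slice of the rows
        self.partitions = [[] for _ in range(self.num_partitions)]
        # Replicated table: each node holds every row
        self.replicas = [[] for _ in range(num_nodes)]

    def partition_for(self, key):
        # Stand-in stable hash of the partitioning key -> partition id
        return zlib.crc32(str(key).encode("utf-8")) % self.num_partitions

    def insert_partitioned(self, row, key_column):
        self.partitions[self.partition_for(row[key_column])].append(row)

    def insert_replicated(self, row):
        for copy in self.replicas:
            copy.append(row)
```

Because the hash is deterministic, any node that receives a query for a given key value can compute which single partition owns it, which is what makes single-key lookups cheap. A replicated table, by contrast, can be read locally on any node.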
Here we focus mainly on partitioned tables. Choosing the partitioning column is critical: pick a column that spreads the data evenly across partitions.
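One way to compare candidate partitioning columns before committing to a schema is to measure how unevenly their values would spread. The sketch below reports the ratio of the fullest partition to the average (1.0 is perfectly even). It is a rough heuristic under assumptions: CRC32 modulo the partition count stands in for VoltDB's hashing, and the sample columns `user_id` and `event_type` are hypothetical examples of a high-cardinality and a low-cardinality column.

```python
import zlib
from collections import Counter


def partition_skew(keys, num_partitions):
    """Return (max/mean rows per partition, per-partition row counts) for a candidate key.

    Uses CRC32 modulo num_partitions as a stand-in for VoltDB's hashing.
    """
    counts = Counter(zlib.crc32(str(k).encode("utf-8")) % num_partitions for k in keys)
    per_partition = [counts.get(p, 0) for p in range(num_partitions)]
    mean = len(keys) / num_partitions
    return max(per_partition) / mean, per_partition


# Hypothetical sample: 1000 events with distinct user ids but only two event types
rows = [{"user_id": f"user-{i}", "event_type": "click" if i % 2 else "view"} for i in range(1000)]
skew_user, _ = partition_skew([r["user_id"] for r in rows], 3)
skew_type, counts_type = partition_skew([r["event_type"] for r in rows], 3)
```

A column with only two distinct values can occupy at most two of three partitions, so at least one partition sits idle while the others become hot spots; a column with many distinct values, such as a user id, stays close to 1.0.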
Client languages and interfaces supported by VoltDB:
C++
C#
Erlang
Go
Java
Python
Node.js
JDBC driver interface
HTTP/JSON interface (any language that can issue HTTP requests can be used to write a VoltDB client, and it is very intuitive)
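To show how little a client needs when going through the HTTP/JSON interface, here is a sketch using only Python's standard library. It assumes the `/api/1.0/` endpoint, the `Procedure` and `Parameters` query arguments (with parameters passed as a JSON array), and the default HTTP port 8080; verify all of these against the documentation for your VoltDB version. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_voltdb_json_url(host, procedure, params=(), port=8080):
    """Build a request URL for VoltDB's HTTP/JSON interface.

    Assumes the /api/1.0/ endpoint and default port 8080 (check your version's docs).
    """
    query = urllib.parse.urlencode({
        "Procedure": procedure,
        # Procedure parameters are sent as a JSON array
        "Parameters": json.dumps(list(params)),
    })
    return f"http://{host}:{port}/api/1.0/?{query}"


def call_procedure(host, procedure, params=(), port=8080, timeout=5.0):
    """Invoke a procedure over HTTP and return the decoded JSON response."""
    url = build_voltdb_json_url(host, procedure, params, port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `call_procedure("voltdb-node1", "@AdHoc", ["SELECT COUNT(*) FROM Event;"])` would run an ad hoc query through any node (the host name is hypothetical). Since all nodes are peers, the client can point at whichever node is reachable.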