This month, AliCloud held the "2022 AliCloud Data Storage Ecosystem Conference" in Beijing. EvenNumber Technology, a pioneer in cloud-native data warehouse technology in China, was invited to take part.
As a guest speaker, Zhenglin Tao, Chief Architect of EvenNumber Technology, traced the evolution of analytical databases along the line of technical development and presented EvenNumber Technology's current cutting-edge concepts and practices in lake warehouse integration.
Zhenglin Tao highlighted the six key features of ANCHOR: real-time T+0, one copy of the data, ultra-high concurrency, data consistency, cloud native, and multi-data support. Built on the latest version and architecture of OushuDB, the EvenNumber lake warehouse integration solution will help customers realize the value of their data on cloud infrastructure.
Why the split model of "lake" + "warehouse" is not the best choice
As Hadoop big data platforms have been rolled out in recent years, organizations have started using Hadoop for some of their non-core scenarios. However, Hadoop's performance and concurrency support are limited, its transaction support is weak, and its delivery, operation, and maintenance costs are high. It therefore cannot replace the core data warehouse and can serve only as a basic "data lake". To meet user requirements for performance, transactions, and so on, many enterprises have begun to consider a complementary approach: while building the data lake, they also deploy an MPP data warehouse. The lake and the warehouse are deployed independently of each other, and data is moved between them through ETL.
This is often referred to in the industry as the Hadoop+MPP "lake warehouse split" model.
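The cost of the split model can be illustrated with a minimal Python sketch. It is a toy model, not code from Hadoop, any MPP product, or OushuDB: the `Store` class, the `run_etl` function, and the sample rows are all hypothetical names chosen for illustration. It shows two side effects of moving data between independently deployed systems by batch ETL: the warehouse lags behind the lake until the next ETL run, and every row ends up stored twice.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """A toy stand-in for a storage system (a lake or a warehouse)."""
    name: str
    rows: list = field(default_factory=list)


def run_etl(lake: Store, warehouse: Store) -> int:
    """Copy rows the warehouse does not have yet; return how many were copied."""
    new_rows = lake.rows[len(warehouse.rows):]
    warehouse.rows.extend(new_rows)
    return len(new_rows)


lake = Store("lake")
warehouse = Store("warehouse")

# Day 1: events land in the lake, then the batch ETL job copies them over.
lake.rows.extend(["order-1", "order-2"])
run_etl(lake, warehouse)

# Day 2: a new event lands in the lake, but the ETL job has not run yet.
lake.rows.append("order-3")

# Rows that warehouse queries cannot see until the next ETL run.
stale_rows = len(lake.rows) - len(warehouse.rows)
# Total rows physically held across both systems.
stored_rows = len(lake.rows) + len(warehouse.rows)

print(f"rows missing from warehouse: {stale_rows}")
print(f"rows stored: {stored_rows} for {len(lake.rows)} distinct rows")
```

In this sketch, a query against the warehouse on day 2 misses the newest order, and the two systems together hold more rows than actually exist. An integrated architecture aims to avoid both effects by querying a single copy of the data.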
While this model allows the lake and the warehouse to complement each other's technical characteristics, it also creates serious problems that often trouble organizations.
These common problems are a headache for practitioners. Solving them requires an architecture that is integrated at both the data level and the query level, which removes these bottlenecks of the big data platform and greatly reduces IT operation and maintenance costs and the technical threshold of data management.
What's different about OushuDB lake warehouse integration
So how does OushuDB's integrated lake warehouse model differ from the Hadoop+MPP "lake warehouse split" model?
OushuDB, a next-generation analytic database engine developed by EvenNumber Technology and billed as the world's fastest, adopts a cloud-native architecture that separates storage from compute. In this data platform architecture, storage and compute can each scale elastically and independently of the other.
Traditional MPP and Hadoop are not suited to this requirement.
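A minimal sketch of the scaling difference, under stated assumptions: in a coupled design every node bundles compute and storage, so adding compute means adding whole nodes, while in a decoupled design the two are sized separately. The class names, core counts, and capacities below are hypothetical and do not describe OushuDB's actual API or any real cluster.

```python
import math
from dataclasses import dataclass


@dataclass
class CoupledCluster:
    """Each node bundles compute and storage (hypothetical sizes)."""
    nodes: int
    cores_per_node: int = 16
    tb_per_node: int = 10

    def scale_compute_to(self, cores_needed: int) -> None:
        # Adding cores means adding whole nodes, which also adds storage.
        needed_nodes = math.ceil(cores_needed / self.cores_per_node)
        self.nodes = max(self.nodes, needed_nodes)

    @property
    def cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def storage_tb(self) -> int:
        return self.nodes * self.tb_per_node


@dataclass
class DecoupledCluster:
    """Compute nodes and storage capacity are sized independently."""
    compute_nodes: int
    storage_tb: int
    cores_per_node: int = 16

    def scale_compute_to(self, cores_needed: int) -> None:
        # Only the compute tier grows; storage capacity is untouched.
        needed_nodes = math.ceil(cores_needed / self.cores_per_node)
        self.compute_nodes = max(self.compute_nodes, needed_nodes)

    @property
    def cores(self) -> int:
        return self.compute_nodes * self.cores_per_node


coupled = CoupledCluster(nodes=4)                            # 64 cores, 40 TB
decoupled = DecoupledCluster(compute_nodes=4, storage_tb=40)  # 64 cores, 40 TB

# A burst of queries needs twice the compute, but the data volume is unchanged.
coupled.scale_compute_to(128)
decoupled.scale_compute_to(128)

print(f"coupled:   {coupled.cores} cores, {coupled.storage_tb} TB")
print(f"decoupled: {decoupled.cores} cores, {decoupled.storage_tb} TB")
```

Both clusters reach the same compute capacity, but the coupled one also doubles its storage even though the data volume did not change. That over-provisioning is what separating storage from compute is meant to avoid.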
In addition, to meet the demands of real-time stream processing, real-time on-demand analytics, and offline analytics at the same time, EvenNumber Technology developed its own Omega fully real-time data processing architecture, which it presents as having clear advantages over the traditional Kappa and Lambda architectures.
It can be said that OushuDB largely resolves the technical bottlenecks of the "lake warehouse split" model, and its technical advantages are clear.
Lake warehouse selection: ANCHOR first
EvenNumber Technology believes that, to truly solve business pain points and choose the right lake warehouse product, enterprises can select against the ANCHOR criteria mentioned earlier. The six letters of ANCHOR stand for the six characteristics already named: All data types (multi-data support), Native on cloud, Consistency, High concurrency, One copy of data, and Real-time (T+0).
Industry Recognition and EvenNumber Technology's Continuous Breakthrough Innovation
Among the industries that commonly use big data, banking has some of the highest requirements for independent control, high availability, and high reliability of applications, so the adoption of EvenNumber Technology's solutions in banking is clear evidence of its technical strength and its understanding of user pain points. As early as 2020, EvenNumber Technology and China Construction Bank (CCB) established a joint high-performance big data laboratory to explore an implementation path for lake warehouse integration. After continued technical discussion and application verification, the two partners developed a fully real-time lake warehouse integration solution based on cloud-native database technology. It uses a single technology stack and unified storage to build both lake and warehouse capabilities, and it offers extreme performance, elastic scaling, on-demand allocation of computing resources, a single store for the full volume of data with no need for frequent data exports, and support for mixed workloads. The solution can fully support real-time application scenarios for the bank and its customers, helping CCB improve its real-time demand response, enhance system elasticity, and save operation and maintenance costs.
Recently, EvenNumber Technology was officially selected for the national list of "little giant" enterprises that are specialized, refined, distinctive, and innovative. As a startup helping the country break through "chokepoints" in key technology areas, EvenNumber Technology is seeing its efforts in database localization and independent, secure technology gradually verified and affirmed at the national level.
As the Internet of Things and the Industrial Internet take shape, the big data field will face ever broader data sources, larger data volumes, more unstructured data, richer application scenarios, and more complex technology stacks, and processing and analyzing big data will become more difficult still. From the databases of the 1960s through data warehouses and data lakes to today's lake warehouse integration, each new product has addressed, in performance and functionality, the business pain points that practitioners faced before it. Lake warehouse integration can be seen as the inevitable product of database development in the cloud-native era.
By using virtual compute cluster technology to achieve high concurrency on very large clusters of hundreds of thousands of nodes, guaranteeing transaction support, providing real-time capabilities, and keeping one copy of the data so that data islands no longer arise, the new generation of lake warehouse integration architecture will be the future development trend. As a leader in lake warehouse integration, EvenNumber Technology will continue to optimize its technology, bringing users higher-performance and more robust solutions and helping more industry users turn data into productivity.