impala concurrency settings
By examining the Impala source code, we found that this kind of error is generally caused by one of two situations:

One is that there is not enough available memory; the other is that the Impala service pool is full, meaning it has already reached its limit on concurrently running queries.
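The two failure causes can be modeled as a simple admission check. The Python below is a hypothetical sketch, not Impala's code: `PoolState`, its fields, and `admission_decision` are illustrative names. Impala's real admission controller is more involved and can also queue a query instead of failing it immediately. The sketch only shows that a query is turned away either because the pool's concurrency slots are used up or because its memory would not fit.

```python
from dataclasses import dataclass


@dataclass
class PoolState:
    """Hypothetical snapshot of one resource pool (field names are illustrative)."""
    max_running: int         # concurrency limit for the pool
    running: int             # queries currently executing in the pool
    mem_limit_bytes: int     # total memory the pool may use
    mem_reserved_bytes: int  # memory already reserved by running queries


def admission_decision(pool: PoolState, query_mem_bytes: int) -> str:
    """Return 'admit', or the reason the query cannot start right now."""
    # Case 2 from the text: every concurrency slot in the pool is taken.
    if pool.running >= pool.max_running:
        return "rejected: pool is full"
    # Case 1 from the text: the query's memory would exceed what is available.
    if pool.mem_reserved_bytes + query_mem_bytes > pool.mem_limit_bytes:
        return "rejected: not enough memory"
    return "admit"
```

For example, `admission_decision(PoolState(2, 2, 100, 0), 10)` reports a full pool even though memory is free. That is why raising the concurrency limit and raising memory limits address different errors.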

Impala is a new query system led by Cloudera that provides SQL semantics for querying petabytes of data stored in Hadoop's HDFS and HBase. The existing Hive system also provides SQL semantics, but because Hive executes queries on the MapReduce engine, it is still a batch-processing system and struggles to deliver interactive query performance. By contrast, Impala's defining feature and biggest selling point is speed.

Benefits: Impala does not need to write intermediate results to disk, which eliminates a large amount of I/O overhead. It also avoids the cost of starting MapReduce jobs. MapReduce task startup is very slow (the default interval between heartbeats is 3 seconds), whereas Impala schedules work directly through its own service processes, which is much faster.

Impala abandons MapReduce entirely, because that paradigm is poorly suited to SQL queries. Like Dremel, it instead borrows the design of MPP parallel databases and builds a new execution engine. This leaves more room for query optimization and eliminates unnecessary shuffles, sorts, and other overhead. It uses LLVM to compile code at runtime, avoiding the overhead that comes with supporting generic code paths. It is implemented in C++ with many hardware-specific optimizations, such as the use of SSE instructions. Finally, it uses a data-locality-aware I/O scheduling mechanism that places data and computation on the same machine whenever possible, reducing network overhead.
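The locality-aware scheduling idea can be illustrated with a small greedy assignment. This is a minimal sketch under simplifying assumptions, not Impala's actual scheduler. The function name `assign_scan_ranges`, the `block_replicas` mapping (block ID to the hosts holding a replica), and the least-loaded tie-breaking rule are all hypothetical. Each data block goes to a host that stores it locally when possible, and to the least-loaded host otherwise.

```python
def assign_scan_ranges(
    block_replicas: dict[str, list[str]], hosts: list[str]
) -> dict[str, list[str]]:
    """Greedily assign each block to a host, preferring hosts that store a replica.

    Hypothetical illustration: among candidate hosts, pick the one with the
    fewest blocks assigned so far (ties broken by host name). If no replica
    holder is in `hosts`, fall back to any host, which implies a remote read.
    Requires `hosts` to be non-empty.
    """
    assignment: dict[str, list[str]] = {h: [] for h in hosts}
    for block, replicas in sorted(block_replicas.items()):
        # Hosts that can read this block from local disk.
        local = [h for h in replicas if h in assignment]
        candidates = local if local else hosts
        target = min(candidates, key=lambda h: (len(assignment[h]), h))
        assignment[target].append(block)
    return assignment
```

For example, with replicas `{"b1": ["h1", "h2"], "b2": ["h1"], "b3": ["h3"], "b4": ["h2", "h1"]}` and hosts `["h1", "h2"]`, blocks b1, b2, and b4 are read locally. Only b3, whose sole replica is on a host without an executor, must be read over the network.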