Current location - Loan Platform Complete Network - Big data management - Understanding hive locking mechanism from a locked table problem
Understanding hive locking mechanism from a locked table problem

A coworker ran into an issue where no matter where the Hive SQL was submitted from the platform, there was no progress and no logs.

In view of the previous similar feedback, check the locks of the tables involved in the SQL,

found that there are a number of shared locks, unlocking operations on the table

But re-execute the sql still have locks on the table phenomenon, the main body of the sql is as follows (to find out the users of a certain day in the partition table B, and does not exist in the history of the table A, has been before the execution of add partition)

found that not only can not be a user of a certain day, and not in the history of the table A, before the execution of add partition

found that not only can not insert, which select user_id from A statement can not be executed, because it is a test table, on the rebuild (after the fact is estimated to be more than one partition is locked, and there is a X lock, only to unlock the table can not be recursively unlocked on the partitions). After rebuilding select disctinct... This whole statement will be GC memory exceeded, it will be rewritten as the following framework (window function de-emphasis to replace distinct, outside the connection to replace in):

so that the select statement can run, but the whole sql is still locked table.

Looking at the data

Hive Lock stuff

Official docs

blogs.com/barneywill/p/10185577.html

Found that the select ... T1 partition P1 statement requires S locks on T1, T1.P1, then select on the whole partitioned table requires S locks on all partitions .

Going back to the statement, the join operation relies on the S lock on the A table, but it ends up writing to a new partition of the A table, with an X lock, causing a deadlock. So you need to leave the partition of table A to be written to without the S lock :

Successfully executed.

Additionally, setting the hive parameter set hive.support.concurrency=false; before SQL execution can force locks to be ignored, but for data integrity, it is not recommended to use this operation.