When two systems synchronize data, some data goes missing. What is the reason?
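If the load pages through the source data ordered by update_time, a row's update_time may change while the load is still running. That changes the ordering of the rows that have not been loaded yet, so some rows are skipped and end up missing from the target.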
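The effect can be reproduced with a small simulation. This is only a sketch under stated assumptions: the in-memory table, the page size of 2, the function names, and the single concurrent update are all made up for illustration and are not taken from either system. It compares offset paging ordered by update_time with paging on a key that never changes (the row ID).

```python
def make_table():
    # Hypothetical source table: id -> update_time (made-up values).
    return {i: 100 + i for i in range(1, 7)}


def concurrent_update(table):
    # Simulates the business system touching row 1 while the load is running.
    table[1] = 200


def load_by_update_time(table, page_size=2):
    """Offset paging over rows ordered by update_time (the fragile approach)."""
    loaded, offset = [], 0
    while True:
        ordered = sorted(table, key=lambda row_id: (table[row_id], row_id))
        page = ordered[offset:offset + page_size]
        if not page:
            return loaded
        loaded.extend(page)
        offset += page_size
        if offset == page_size:  # right after the first page has been read
            concurrent_update(table)


def load_by_id(table, page_size=2):
    """Keyset paging: each page fetches the ids greater than the last id seen."""
    loaded, last_id, first_page = [], 0, True
    while True:
        page = sorted(row_id for row_id in table if row_id > last_id)[:page_size]
        if not page:
            return loaded
        loaded.extend(page)
        last_id = page[-1]
        if first_page:
            concurrent_update(table)
            first_page = False


all_ids = set(make_table())
missed_by_update_time = all_ids - set(load_by_update_time(make_table()))
missed_by_id = all_ids - set(load_by_id(make_table()))
print("missed with update_time paging:", sorted(missed_by_update_time))
print("missed with id paging:", sorted(missed_by_id))
```

In this sketch, updating row 1 after the first page moves it to the end of the update_time ordering, every unloaded row shifts forward by one position, and the next offset jumps over row 3. Paging on the ID is unaffected because the ordering key does not change. The row that was updated mid-load is still read with its old values in that run, so it would have to be picked up by the next incremental load.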

Paging by ID prevents rows from being missed when data changes during the load, because a row's ID does not change.

The current big data platform does not support update operations. Instead it uses full outer join + insert overwrite: with daily scheduling, for example, the day's incremental data is full-outer-joined with the previous day's full data, and the result is written back as the latest full data.

If you are worried about errors in the data update, keep one latest full-volume version per day and retain these versions for only a short period. (This approach can also be used when the business system physically deletes rows from a table and the data warehouse needs to retain all historical data: the latest full-volume snapshot is then kept in the data warehouse permanently.)
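The merge step can be sketched in a few lines. This is a minimal illustration, assuming each dataset is keyed by a row ID and that the incremental row wins when both sides contain the same key. The table names, field names, dates, and the dictionary standing in for daily versions are hypothetical, and the SQL in the comment only shows the general shape of such a statement.

```python
def merge_full_snapshot(previous_full, increment):
    """Full outer join on the key; the incremental row wins when both sides have it."""
    merged = {}
    for key in previous_full.keys() | increment.keys():
        merged[key] = increment[key] if key in increment else previous_full[key]
    return merged


# Hypothetical data: yesterday's full snapshot and today's incremental rows.
previous_full = {
    1: {"amount": 100, "update_time": "2024-01-01"},
    2: {"amount": 200, "update_time": "2024-01-01"},
}
increment = {
    2: {"amount": 250, "update_time": "2024-01-02"},  # changed row
    3: {"amount": 300, "update_time": "2024-01-02"},  # new row
}

# One full version per day, modeled as a dict keyed by date (an assumption).
versions = {"2024-01-01": previous_full}

# "Insert overwrite": today's version is replaced wholesale, never updated in place.
versions["2024-01-02"] = merge_full_snapshot(previous_full, increment)
new_full = versions["2024-01-02"]

# Roughly the same idea in SQL, with hypothetical table and column names:
#   INSERT OVERWRITE TABLE full_today
#   SELECT COALESCE(i.id, f.id), COALESCE(i.amount, f.amount)
#   FROM full_yesterday f FULL OUTER JOIN incr_today i ON f.id = i.id;
print(sorted(new_full.items()))
```

Row 1 appears only in the previous full data and is carried over unchanged. That is also why a row physically deleted in the business system stays in the warehouse snapshot under this scheme: a deletion never shows up in the incremental data, so nothing overwrites the old row.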