What does an ETL engineer need to learn?

Technical aspects: learn the basics of the tools used on both the data source side and the target side (for example Oracle, MySQL, and Hive), and learn how to install and configure ETL tools and resolve their common errors (for example Kettle, DataStage, Informatica, Sqoop, and DataX).

Theoretical aspects: understand layered data warehouse architecture, dimensional modeling, and related concepts.

Taken literally, ETL consists of three main phases: data extraction, data transformation, and data loading.

1. Data Extraction

The main goal of this phase is to gather data from multiple sources in preparation for the transformation phase that follows.
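As a rough illustration of pulling several sources together, the sketch below reads from a relational table and from a CSV export and returns one combined list of rows. Everything in it is an assumption made for the example: the `orders` table, its columns, the sample values, and the helper names are hypothetical, and an in-memory SQLite database stands in for a real source such as Oracle or MySQL.

```python
import csv
import io
import sqlite3


def extract_from_database(conn):
    """Read every row of the (hypothetical) orders table as a dictionary."""
    cursor = conn.execute("SELECT id, name, amount FROM orders")
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def extract_from_csv(text):
    """Read rows from CSV text; every value arrives as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_all():
    # Source 1 (assumed): an in-memory SQLite table standing in for Oracle/MySQL.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, name TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "alice", 10.5), (2, "bob", 20.0)],
    )
    db_rows = extract_from_database(conn)
    conn.close()

    # Source 2 (assumed): a CSV export, inlined here so the sketch is self-contained.
    csv_rows = extract_from_csv("id,name,amount\n3,carol,7.25\n")

    # Tag each row with its origin so later phases can trace where it came from.
    tagged = [dict(row, source="db") for row in db_rows]
    tagged += [dict(row, source="csv") for row in csv_rows]
    return tagged
```

Note that the two sources disagree on types (the database returns numbers, the CSV returns strings). The extraction step deliberately leaves that mismatch alone; reconciling it belongs to the transformation phase.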

2. Data Transformation

This phase is the core of ETL and its most complex part. Its main goal is to take the varied data that was extracted and apply operations such as data cleaning, format conversion, filling in missing values, and removing duplicates. The result should be uniformly formatted, highly structured, high-quality, and compatible data that provides reliable support for subsequent analysis and decision-making.
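The operations named above (cleaning, format conversion, filling missing values, removing duplicates) can be sketched in a few lines of Python. This is a minimal example under stated assumptions, not a general-purpose implementation: the field names (`id`, `name`, `amount`, `order_date`), the `DD/MM/YYYY` input date format, the rule of keeping the first occurrence of a duplicate `id`, and the default of `0.0` for a missing amount are all hypothetical choices.

```python
from datetime import datetime


def transform(rows, default_amount=0.0):
    """Clean, convert, fill, and deduplicate raw rows (assumed field names)."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        # Cleaning: drop rows that have no usable numeric id.
        raw_id = str(row.get("id", "")).strip()
        if not raw_id.isdigit():
            continue
        record_id = int(raw_id)

        # Deduplication: keep only the first row seen for each id.
        if record_id in seen_ids:
            continue
        seen_ids.add(record_id)

        # Missing values: fall back to a default when the amount is absent or blank.
        raw_amount = row.get("amount")
        if raw_amount is None or str(raw_amount).strip() == "":
            amount = default_amount
        else:
            amount = float(raw_amount)

        # Format conversion: DD/MM/YYYY (assumed input) to ISO 8601 YYYY-MM-DD.
        parsed = datetime.strptime(row["order_date"].strip(), "%d/%m/%Y")
        order_date = parsed.date().isoformat()

        cleaned.append({
            "id": record_id,
            "name": (row.get("name") or "").strip().lower(),
            "amount": amount,
            "order_date": order_date,
        })
    return cleaned
```

For instance, given two rows with `id` "1", one row with a blank amount, and one row with a blank `id`, the function would return two records: the first `id` 1 row, and the blank-amount row with its amount set to the default.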

3. Data Loading

The main goal of this phase is to load data into a destination, such as a data warehouse. A common practice is to write the processed data to a file in a specific format (for example Parquet or CSV) and then attach that file to a specified table partition. Some tables hold only a small amount of data; these are not partitioned, and the final data table is generated directly.
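The write-a-file-then-attach-it pattern described above might look roughly like the sketch below. It writes CSV with the standard library (Parquet would need a third-party library) into a Hive-style `dt=YYYY-MM-DD` directory and builds the HiveQL statement that would register that directory as a partition. The table name, column list, partition key `dt`, file name, and helper names are all assumptions for illustration, and the statement is only constructed as a string here, not executed against a warehouse.

```python
import csv
import os
import tempfile


def load_partition(rows, warehouse_dir, table, partition_date):
    """Write rows to <warehouse_dir>/<table>/dt=<partition_date>/part-00000.csv."""
    partition_dir = os.path.join(warehouse_dir, table, f"dt={partition_date}")
    os.makedirs(partition_dir, exist_ok=True)
    file_path = os.path.join(partition_dir, "part-00000.csv")
    with open(file_path, "w", newline="", encoding="utf-8") as handle:
        # No header row: a plain text-format table would read a header as data.
        writer = csv.DictWriter(handle, fieldnames=["id", "name", "amount"])
        writer.writerows(rows)
    return file_path


def add_partition_statement(table, partition_date, partition_dir):
    """Build the HiveQL that attaches an existing directory as a partition."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{partition_date}') LOCATION '{partition_dir}'"
    )


def demo():
    """Write one sample row to a temporary directory and return what was produced."""
    with tempfile.TemporaryDirectory() as warehouse_dir:
        path = load_partition(
            [{"id": 1, "name": "alice", "amount": 10.5}],
            warehouse_dir, "orders", "2024-01-03",
        )
        with open(path, encoding="utf-8") as handle:
            content = handle.read().strip()
        partition_name = os.path.basename(os.path.dirname(path))
    statement = add_partition_statement(
        "orders", "2024-01-03", "/warehouse/orders/dt=2024-01-03"
    )
    return partition_name, content, statement
```

For a small, unpartitioned table, the same `csv.DictWriter` step could write straight to the table's own directory, and the `ALTER TABLE` step would be skipped.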

With an understanding of what each part of ETL does, we can turn to the skills an ETL engineer needs. These are the key things to learn:

1. Proficiency in SQL, including the ability to develop stored procedures and to optimize SQL queries;

2. Familiarity with Hive data warehouse design, with an understanding of data warehouse models and dimensional modeling concepts;

3. Familiarity with Hadoop, Spark, Flink, Kafka, and related technologies;

4. Proficiency in at least one of Python or Java;

5. Familiarity with MySQL, NoSQL, and other common databases.