1. Understand your own situation
First, get a clear picture of both yourself and the industry. Many people cannot even tell big data development apart from data mining in job postings, yet announce that they want to change careers; that is simply irresponsible. And do not just chase whatever is hot: I myself am often looked down on with lines like "big data development is too low-end, anyone doing data should do data mining, otherwise the work will always be shallow." Ignore that kind of talk.
2. Choose a learning path
If you are genuinely sure you want to move into data development, weigh your time and energy: how many hours can you actually set aside? It also helps enormously to have someone give you pointers while you learn; otherwise it is far too easy to take detours.
When choosing a concrete learning path, think it through carefully; there are several options:
Self-study
Enroll in a class
Find a mentor
Beyond self-study, enrolling in a class is worth considering. Do not expect a tutorial class to carry you to the sky, but you can rely on it to help you sort out your ideas. Best of all is getting help from someone who actually works in this field; they do not need to be a technical star, just someone you can communicate with.
3. Learning route
Here is a general recommendation:
Phase 1
First build a basic foundation in Linux and Java. It does not have to be particularly deep; get things running first. For Linux, be able to carry out everyday operations on your own; for Java, be able to write small programs. Both are preparation for setting up a Hadoop environment.
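As a rough calibration of the Java level meant here (a made-up example, nothing more): a small program that reads a text file and counts its lines, the kind of thing that later grows into a word count on Hadoop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Small warm-up program: print how many lines a text file has.
// The file path comes from the command line.
public class LineCount {
  public static void main(String[] args) throws IOException {
    try (var lines = Files.lines(Path.of(args[0]))) {
      System.out.println(args[0] + ": " + lines.count() + " lines");
    }
  }
}
```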
Then learn Hadoop: first set up a single-node Hadoop, then a distributed cluster, and write some MapReduce (MR) programs.
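As a first MR program, the classic word count is hard to beat. The sketch below follows the well-known Hadoop example: the mapper emits (word, 1) pairs and the reducer sums them. Input and output paths come from the command line; the output directory must not already exist on HDFS.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE); // emit (word, 1) for every token
      }
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get(); // add up the 1s per word
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```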
Next, learn the other big data components in the Hadoop ecosystem, such as Spark, Hive, and HBase: try to set each one up and run some of the official demos.
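For contrast with the MR version above, here is a hedged sketch of the same word count on Spark's Java API. The `local[*]` master is only for trying it out on one machine; on a real cluster you would submit the job with spark-submit and drop that setting.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
  public static void main(String[] args) {
    // local[*] runs Spark in-process with one worker thread per core
    SparkConf conf = new SparkConf().setAppName("spark word count").setMaster("local[*]");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      JavaRDD<String> lines = sc.textFile(args[0]);
      lines.flatMap(line -> Arrays.asList(line.split("\\s+")).iterator()) // split into words
           .mapToPair(w -> new Tuple2<>(w, 1))                            // (word, 1)
           .reduceByKey(Integer::sum)                                     // sum per word
           .saveAsTextFile(args[1]);
    }
  }
}
```

Note how much of the MR boilerplate collapses into a few chained transformations; seeing the same job in both APIs is a good way to understand what each layer does.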
Once you have some grounding in Linux, Java, and the various components, get some project practice. At this point look for worked examples, such as video tutorials on building a recommender system, and apply what you have learned.
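As a taste of what such a project involves, here is a toy, single-machine sketch of one common recommender idea, item co-occurrence counting. All names and data are invented; a real system would run these counts as MR or Spark jobs over logs.

```java
import java.util.*;

// Toy item-based recommender: count how often two items appear in the same
// user's history, then recommend the unseen items that co-occur most with
// what the target user already has.
public class CooccurrenceRecommender {
  public static Map<String, Integer> recommend(Map<String, List<String>> userItems,
                                               String targetUser) {
    // Build the co-occurrence matrix over all users' histories.
    Map<String, Map<String, Integer>> cooc = new HashMap<>();
    for (List<String> items : userItems.values())
      for (String a : items)
        for (String b : items)
          if (!a.equals(b))
            cooc.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);

    // Score unseen items by summed co-occurrence with the user's items.
    Set<String> seen = new HashSet<>(userItems.getOrDefault(targetUser, List.of()));
    Map<String, Integer> scores = new HashMap<>();
    for (String item : seen)
      cooc.getOrDefault(item, Map.of()).forEach((other, count) -> {
        if (!seen.contains(other)) scores.merge(other, count, Integer::sum);
      });
    return scores; // higher score = stronger recommendation
  }

  public static void main(String[] args) {
    Map<String, List<String>> history = Map.of(
        "alice", List.of("hadoop-book", "spark-book"),
        "bob",   List.of("hadoop-book", "spark-book", "hive-book"),
        "carol", List.of("spark-book", "hive-book"));
    System.out.println(recommend(history, "alice")); // {hive-book=3}
  }
}
```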
Phase 2
The first phase is the foundation; by the end of it you should have a rough picture of data development. From here there is more interesting material to choose from:
Data warehouse systems: how data layering is done and how a warehouse is built; a general understanding is enough at this point.
User profiling and feature engineering: the earlier you get a feel for this area, the better.
Implementation ideas for common systems: how a scheduling system, a metadata system, or a recommendation system is actually put together (a minimal scheduler sketch follows this list).
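To make the scheduling-system point concrete, here is a minimal sketch assuming the usual design: tasks form a DAG through their dependencies, and a topological sort yields a legal execution order. Task names are invented; real schedulers add retries, timetables, and distributed workers on top of this core.

```java
import java.util.*;

// Core of a task scheduler: given each task's prerequisites, compute an
// execution order in which every task runs after everything it depends on
// (Kahn's algorithm). Throws if the dependency graph has a cycle.
public class DagScheduler {
  public static List<String> executionOrder(Map<String, List<String>> prereqs) {
    Map<String, Integer> indegree = new HashMap<>();       // #unmet prerequisites
    Map<String, List<String>> dependents = new HashMap<>(); // reverse edges
    prereqs.forEach((task, deps) -> {
      indegree.merge(task, deps.size(), Integer::sum);
      for (String d : deps) {
        indegree.putIfAbsent(d, 0);
        dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(task);
      }
    });
    Deque<String> ready = new ArrayDeque<>();
    indegree.forEach((t, deg) -> { if (deg == 0) ready.add(t); });
    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String t = ready.poll();
      order.add(t); // all of t's prerequisites have already run
      for (String dep : dependents.getOrDefault(t, List.of()))
        if (indegree.merge(dep, -1, Integer::sum) == 0) ready.add(dep);
    }
    if (order.size() != indegree.size())
      throw new IllegalStateException("cycle in task graph");
    return order;
  }

  public static void main(String[] args) {
    Map<String, List<String>> prereqs = Map.of(
        "ods_load",  List.of(),
        "dwd_clean", List.of("ods_load"),
        "report",    List.of("dwd_clean", "ods_load"));
    System.out.println(executionOrder(prereqs)); // one legal order, e.g. [ods_load, dwd_clean, report]
  }
}
```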
Phase 3
The following subfields each reward in-depth study; depending on your work and your interests, pick a few and go deep:
Distributed theory: Gossip, DHT, Paxos and the like make up the underlying protocols and algorithms of all sorts of distributed systems, and are worth learning at least a little (see the gossip sketch after this list).
Data mining algorithms: the algorithms do need to be learned, but not as pure theory alone; implementing an algorithm in a distributed environment is a sizable challenge in itself (see the k-means sketch after this list).
Source code of the major systems: Hadoop, Spark, Kafka and so on. If you want to go deep into big data, you cannot get away from reading source code.
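To give the distributed-theory point some flesh, here is a toy simulation of push-style gossip under a synchronous round model (an assumption made purely for illustration): each informed node pushes the update to one random peer per round, and the whole cluster learns it in roughly O(log N) rounds. Real protocols gossip versioned state maps rather than a single bit.

```java
import java.util.Random;

// Toy anti-entropy gossip: watch how fast a single update spreads when every
// informed node pushes it to one random peer per round.
public class GossipDemo {
  public static void main(String[] args) {
    int n = 1000;
    boolean[] knows = new boolean[n];
    knows[0] = true; // node 0 starts with the update
    Random rnd = new Random(42);
    int round = 0, informed = 1;
    while (informed < n) {
      round++;
      boolean[] next = knows.clone();
      for (int i = 0; i < n; i++)
        if (knows[i])
          next[rnd.nextInt(n)] = true; // push to a random peer (possibly itself)
      knows = next;
      informed = 0;
      for (boolean k : knows) if (k) informed++;
      System.out.printf("round %d: %d/%d nodes informed%n", round, informed, n);
    }
  }
}
```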
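And for the data-mining point, a hedged sketch of one k-means iteration written so the distributed structure shows: the assignment step is a "map" over points and the centroid update a "reduce" per cluster, which is exactly how the algorithm splits across an MR or Spark job. The data and starting centroids are made up.

```java
import java.util.Arrays;

// One k-means iteration, structured as map (assign each point to its nearest
// centroid) and reduce (average the points assigned to each centroid).
public class KMeansSketch {
  static double dist(double[] a, double[] b) {
    double s = 0;
    for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s; // squared Euclidean distance is enough for comparisons
  }

  static double[][] step(double[][] points, double[][] centroids) {
    int k = centroids.length, d = centroids[0].length;
    double[][] sums = new double[k][d];
    int[] counts = new int[k];
    // "map" phase: each point contributes (nearestCentroid, point)
    for (double[] p : points) {
      int best = 0;
      for (int c = 1; c < k; c++)
        if (dist(p, centroids[c]) < dist(p, centroids[best])) best = c;
      counts[best]++;
      for (int i = 0; i < d; i++) sums[best][i] += p[i];
    }
    // "reduce" phase: new centroid = mean of its assigned points
    double[][] next = new double[k][d];
    for (int c = 0; c < k; c++)
      for (int i = 0; i < d; i++)
        next[c][i] = counts[c] == 0 ? centroids[c][i] : sums[c][i] / counts[c];
    return next;
  }

  public static void main(String[] args) {
    double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
    double[][] centroids = {{0, 0}, {10, 10}};
    for (int iter = 0; iter < 5; iter++) centroids = step(points, centroids);
    System.out.println(Arrays.deepToString(centroids)); // ~[[0.0, 0.5], [10.0, 10.5]]
  }
}
```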