Finding Data Scientists in the Age of Big Data

It's no secret that data scientists are in short supply. The explosion of data, and of the tools to process it, combined with the ripple effects of Moore's and Metcalfe's Laws, has produced more data, more connections, and more technology than ever before. Over the past year, the Hadoop world has seen a frenzied rush to hire two kinds of people:

1. Data scientists: rare, potential-MacArthur-Grant-recipient types with a passion for and insight into data, grounded in math and statistics, who understand the numbers, have the artistry to paint pictures with them, and can orient themselves in any dataset.

2. Data architects or data engineers: people who understand the big data platforms themselves.

The data architect will be the easier gap to fill. Understanding big data platforms (Hadoop, MongoDB, Riak) and the emerging advanced SQL offerings (Exadata, Netezza, Greenplum, Vertica, and newer entrants on the rise such as Calpont) is a technical skill that can be taught through an explicit curriculum. The laws of supply and demand will solve this problem, just as the dot-com bubble did for Java programmers back in 1999.

Behind all the cries for Hadoop programmers is a similar, but much quieter, scramble to hire data scientists. As much as some dismiss "data scientist" as a buzzword, the need is real. Data science, however, faces higher hurdles. It's all about connecting the dots, and that is not as easy as it sounds. The V's of big data (volume, variety, velocity, and value) all require people who can make discoveries from insights into the data; traditionally, that role has been filled by data developers. But data developers handle only a limited set of well-understood problems and bounded (known) datasets, which keeps the problem space two-dimensional.

Big data, in all its variety, introduces an element of the unknown in both form and source. Interpreting it requires savvy research and communication skills, creativity and artistry, and the ability to think intuitively about numbers. And all of this must be built on a solid background in statistics and machine learning, plus technical knowledge of the tools and programming languages of the trade.

Sometimes it seems as if we are looking for Einstein, or at least a polymath.

Nature abhors a vacuum

Just as nature abhors a vacuum, there is now a rush not only to define what kind of person a data scientist is, but also to develop programs to teach data science and software packages that embody some of that expertise. EMC and others are stepping up to the plate with training offerings, covering not just platforms but data science itself. Kaggle takes an innovative cloud-based, crowdsourced approach: a predictive-modeling platform that runs competitions in which data scientists compete to develop the best solution to a particular problem (which brings to mind the Netflix Prize, a $1 million contest to devise a smarter algorithm for predicting viewer tastes).

With data science talent in short supply, we expect consulting firms to buy up more of it that they can then "rent" to multiple clients. With the exception of a few offshore firms, few system integrators (SIs) have stepped up to formally launch a big data practice (where data scientists would logically reside), but we expect that to change soon. Opera Solutions, which has been in the predictive analytics consulting game since 2004, is taking the next step down the packaging route. After adding $84 million in Series A funding last year, the company has staffed up to nearly 200 data scientists, making it one of the largest concentrations of such talent this side of Google. Its predictive analytics solutions are designed for a variety of platforms, SQL and Hadoop alike, and today, amid the wave of SAP SAPPHIRE announcements, it released an offering for the HANA in-memory database. Andrew Brewster has a good in-depth analysis of the details of this announcement.

From SAP's perspective, Opera Solutions' predictive analytics offerings are a logical fit for HANA because they involve the kinds of complexity (e.g., one computation triggering other computations) that its new in-memory database platform is specifically designed to handle.

We don't expect Opera Solutions to remain the only large aggregator of data scientists for rent; ironically, though, barriers to entry will keep that competitive space narrow and highly concentrated. And of course, as market demand increases, the definition of "data scientist" will inevitably be watered down so that more and more firms can claim to have one, or many.

The laws of supply and demand will stay skewed in favor of data scientists, because their supply will not rise as quickly as that of the more platform-focused data architects and engineers. Inevitably, the supply of data scientists will be augmented by software that automates parts of machine learning and its interpretation, but software can only go so far: you cannot program creativity and counterintuitive insight into a machine.