In my sessions at NYCC, I'll teach simple strategies for moving from a technology-stack mindset to an insight-driven application mindset.
4. Ensure data availability
Data science stories are usually told in the reverse order of how they actually happened. In a well-written story, the author starts with an important question, guides you through collecting data to answer it, describes the steps of the experiment, and presents the final conclusion. But in real data science practice, the story usually starts with someone looking at existing data and asking, "Hey, I wonder if there's something cool we can do with this data?" That question leads to change, which leads to building something useful, which in turn leads to a search for people who might benefit from it. Much of the work goes into bridging the gap between the newly discovered insights and the needs of stakeholders. But when the story is told, the reader sees a smooth progression from the stakeholders' needs to the discovery of new insights.
The questions you ask are usually the ones you already have enough data to answer. Real data science usually requires a robust storage system for discretionary data. In this tutorial, I'll cover building and using data channels to make sure you always have enough data to do something useful.
3. Have a strategy
Data strategy is often confused with data governance. When I think of strategy, I think of chess. To play chess you have to know the rules of the game, but to win you have to have a strategy. You have to know the rule that "a pawn on D2 can move to D3 unless there is an obstruction on D3 or the move exposes the king to a direct attack". But just knowing this rule doesn't help me make the winning move. What I really need is a pattern that helps me put my pieces in better positions to win the game: "If I can get my rook and queen together in the middle of the board, then I can force my opponent's king into a corner trap".
This lesson from chess also applies to winning with data. Professional data scientists understand that to win you must have a strategy, and to build a strategy you must have a strategy map. In this tutorial we'll cover how to build a strategy map based on the most important business problems, build a data strategy, and execute that strategy based on application thinking.
2. Hacking
Hacking in this context certainly doesn't mean engaging in destructive or illegal activities; I'm talking about the ability to piece together useful solutions. Professional data scientists often need to build solutions quickly. Tools can make you more efficient, but tools alone will not deliver efficiency when you need it.
To reach the level of a professional data scientist, you must master the art of hacking. You need to be adept at using the resources you already have to generate new, minimally viable data products. In New York we'll cover some techniques for putting together data products and building solutions that you can understand and that are fit for purpose.
1. Experimentation
By experimentation, I don't mean simply trying different things and seeing what happens. I mean more formal experiments guided by the scientific method. Remember all those experiments you did in elementary school science class, all those reports you wrote, and all those presentations you gave in class? Yes, like that.
Conducting experiments and evaluating the results is one of the most effective ways for data scientists to make an impact. I've found that in organizations, good stories and macros aren't enough to convince others to adopt new methods. The only thing I've found powerful enough to drive change is a success story. Few people are willing to try a new approach unless it's proven successful, but you can't prove a method is successful unless you get people to try it. The way out of this vicious circle is to conduct a series of small experiments.
In the tutorial, we'll also learn the technique of conducting experiments in very short sprints, which will force us to focus on discovering new insights and improving the business in small, meaningful batches.
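As a rough illustration of what evaluating one such small experiment might look like, here is a minimal sketch that assumes a simple two-group comparison (a current approach versus a new one). The function name `permutation_test` and the sample numbers are hypothetical and not taken from the tutorial. It uses a permutation test on the difference in means, with only the Python standard library.

```python
import random
from statistics import mean


def permutation_test(control, treatment, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed_difference, p_value). Hypothetical helper for
    illustration only.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    observed = mean(treatment) - mean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)

    extreme = 0
    for _ in range(n_permutations):
        # Reassign observations to groups at random and recompute the gap.
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up daily measurements from one short sprint (assumption).
    old_method = [12.1, 11.8, 12.5, 11.9, 12.3, 12.0]
    new_method = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0]
    diff, p = permutation_test(old_method, new_method, n_permutations=2_000)
    print(f"difference in means: {diff:.2f}, p-value: {p:.4f}")
```

A small p-value suggests the observed gap is unlikely to come from chance alone, which is the kind of evidence a short experiment can contribute to a success story. With only a handful of observations per group, it is better read as a prompt for the next sprint than as proof.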
We are at the beginning of a new phase in big data. This phase has less to do with the technical details of acquiring and storing data at scale and more to do with discovering impactful and scalable new insights. Organizations that can adapt and learn to make the most of their data will, as always, outperform their peers. What organizations need most are people who can conceptualize data-driven business improvements, make them a reality, and drive change. I don't know how many people are really interested in taking on this challenge, but I really look forward to meeting them.