Crowdsourcing and the Shortage of Data Scientists

I first connected with CrowdANALYTIX in 2012, during its early growth phase after it was funded by Accel. Since then, the company has made strides solving complex problems for Big Four consulting firms, leading pharmaceutical companies, and financial services organizations.

Recently I spoke with Divyabh Mishra, CEO and Founder, about the shortage of skilled data workers, data privacy, and upcoming trends in crowdsourcing.


Ian Paterson: First, what is CrowdANALYTIX?

Divyabh Mishra: CrowdANALYTIX is an on-demand crowdsourcing platform for data analytics services. It focuses on uncovering leading predictors of financial performance from publicly available data sources by orchestrating a global community of 5,800+ data scientists and pitting them against one another to find the best solutions.

We work with leading strategy and financial consulting firms to develop custom analytical models that uncover leading indicators of financial performance, and then monitor these indicators to produce forecasts and surface actionable signals.

Privacy

IP: In most organizations, data is a closely guarded commodity. How do you handle privacy and security concerns?

DM: That’s true. We overcome this by offering a variety of engagement models to match the risk appetite of clients.

  1. Deliver value using public or syndicated datasets. These datasets are collected and processed by our community, and the client doesn’t need to share any data with us.
  2. Anonymize both content and context before sharing the data, and the problem statement, with the open community.
  3. Host private competitions restricted to consultants who meet the confidentiality needs of the client.

Of course, the more restrictive you make the competition, the less the client benefits from the wisdom of the crowd.

Data Science

IP: Which companies do you consider to be in the CrowdANALYTIX peer group?

DM: Several crowdsourcing firms like Kaggle, Topcoder and Innocentive use a similar approach to crowdsourcing, but few work with clients to translate their business problems into data competitions. CrowdANALYTIX takes complete ownership of validating responses from the competitions, and delivers solutions that are directly usable by the client.

From a utility perspective, we are similar to analytics services firms like Mu Sigma and Fractal Analytics.


IP: Where do you see data science headed?

DM: Data science as a discipline has existed for hundreds of years – the statistical techniques used today aren’t new. What is new is the sudden surge in raw data, the different forms that data takes, and the technology available to process huge data volumes.

As organizations mature, they realize that tools which allow you to process terabytes of data don’t automatically deliver insights. Access to data scientists, and more importantly, access to a variety of perspectives, is a requirement for building data algorithms that are customized to business needs and optimized to deliver accurate results.

Data science will spread through every industry and become a basic requirement for companies. Enterprises that are not data-driven will struggle to survive. Financial markets are already driven by algorithms. Decision making in all industries will become algorithmic, with humans involved in tuning and testing these algorithms.

Talent


IP: We’re already seeing a severe shortage of qualified individuals to work with data, and as you mentioned, data will continue to grow in volume. How does this scarcity of skills affect CrowdANALYTIX, which depends on a large number of contributors for its success?

DM: Good question.

In order to benefit from the wisdom of the crowd, you only need 5 to 10 high-quality data scientists competing against each other. We have 5,800 data scientists, we add 100 new data scientists per week, and each competition attracts at least 100 participants.

However, even more important than the number of contributors is their quality, as data science relies heavily on the skills and experience of the individual. We invest in “learning competitions” that improve the quality of the fringe contributors on our platform. As demand for quality resources grows, more individuals will be drawn to the field of data science too.

Scarcity of resources exists, but crowdsourcing is the best way of aggregating top professionals from over 50 countries. In fact, CrowdANALYTIX is well positioned to make these resources available to global enterprises.

IP: What do you see for the future of crowdsourcing?

DM: Crowdsourcing is a phenomenon that’s going to change the way we work in the future. The new generation will want to work on their own terms, and this will be especially true of the best among them. Crowdsourcing is already disrupting industries like taxi services (Uber), hotels (AirBnB), simple tasks (Amazon Mechanical Turk), logo design (99designs), and more. This will proliferate into other disciplines, including more complex ones such as data science. The best data scientists will choose to be part of crowdsourcing platforms, as they will make much more on the platform than as employees. They will also have the opportunity to work on a variety of challenges, which wouldn’t necessarily be true as part of a large organization.

The need to work on your own terms and the continuing enhancement of technology that enables people to work remotely will eventually lead to many business functions being outsourced. This will also allow firms to be agile and adapt to rapid shifts in their industry.

IP: To your point, the best data scientists might earn more on the platform than as employees, but if only the top contributors make money, is there concern about the sustainability of the workforce?

DM: Not really. Folks join us with different motivations, and money is only one type of satisfaction.

A platform like ours enables data scientists to pit their skills against the best and see where they stand. It also allows them to try their hand at new types of problems without the risk associated with failing at the task: doing badly doesn’t harm them in any way. Plus, if they do well, they get visibility through the leaderboard.

Like any profession, there are 5-10% who are exceptional. To date, the only option these individuals have had is to be employed by the top firms in each industry. What we are saying is that in the future, the cream of every industry will prefer to be part of a crowdsourcing platform rather than be bound to any one organization. They would want to contribute to SpaceX and Google and Shell, all at once!


My thanks to Divyabh Mishra for the insightful conversation. The interview has been edited for length and format.

 

This is the first in the Spotlight series, highlighting new, disruptive data companies.