Emerging Tech: the profile of a data scientist
Technical skill is obviously important in a data scientist's profile, but today many other qualities are sought as well. To find out just what those qualities are, the university center U-Tad organized a debate on the subject as part of its Emerging Tech: What To Do When Data Change Everything series.
“It's a fallacy to look for a programmer with six years' experience in big data because there aren't any. You have to look for growth potential and the capacity to learn and work well with a team.” Those were the opening words of Luis González, a cloud architect with Beeva. Óscar Méndez, CEO of Stratio, followed in the same vein:
“It's hard to find people with more than three years' experience, even though the technology was invented 10 years ago.”
In the opinion of Beeva architect González, “The most technical data analysis profiles are unusual by their very nature. They usually do things at home to complete their training.” As he went on to point out, companies are looking for professionals with a "more horizontal" profile: in other words, what are known today as full-stack engineers, who can work in PHP, Python, HTML, etc.
This has to be complemented by a good grounding in math, so that they "are familiar with and can understand mathematical codes and algorithms", and even by business knowledge "to be able to understand the services that can be offered with data".
Óscar Méndez of Stratio pointed out that all these qualities are more necessary now because, at least in his start-up, they "don't use big data but build it", and the difference is like that between "driving the car and building it. The engineering is key".
Which is why he insisted that the most sought-after profiles today need technical knowledge "but also the right values and attitude". Something crucial for adapting to this booming start-up is that "there are no hierarchies. We work in circles on which no decisions are imposed".
In this respect, humility has to be one of the essential pillars. “In a model without any hierarchies and procedures, someone who is very good technically but has no humility will end up imposing their opinion on everyone else, so instead of co-creating as a team something is being taken away.”
Sonia Casado, senior manager at Accenture Digital, agreed with both González and Méndez: “We look for people who are passionate, inquisitive, innovative, and who know full well that what people were doing three years ago has changed but also that what they were doing just six months ago has changed enormously as well”.
Casado stressed the three pillars they look for: first of all, the business; second, analytics; and third, the architecture and technology.
“They have to know how to develop business applications and what they can offer to each company. In the case of analytics, they need an expert knowledge of math and statistics, but what really makes a difference is a talent for applying business solutions in different languages. And all of that accompanied by an advanced knowledge of data processing.”
Susana Ferreras, a data scientist at Telefónica, supported Casado's thesis. “We look for developers with basic skills and analytics experts with a knowledge of statistics, but we also need to take care of the business leg: in other words, the direct application.”
In her opinion, it's not a matter of "finding the best programmer or best statistician, but people with an intermediate level in these three fields". However, she also said that it was no easy task to find people with this profile, because they demand “people who are motivated and curious, and who show it as well”. More specifically, she said the key was to be proactive and to enjoy always learning something new.
The recommended technologies
The million-dollar question is this: What do I need to know to have everyone queuing up to hire me? And the answer is, everything that has to do with big data technology.
This was how Stratio CEO Méndez summed up the great debate on the type of training required.
“If you know Hadoop, rest assured: they'll hire you. If you study, are familiar with and can use Hadoop confidently, the market is sure to want you. The same goes for Spark and Python as well.”
He also gave a great tip to all aspiring data scientists: “A lot of languages have been underrated in the past, like Python and Node.js, but they're actually very interesting because many different profiles can use them at the same time.”
According to González, the star technology of the moment is Spark: “There's no doubt that it's the most important thing in analytics. It's something that's live and is constantly being improved.”
Lastly, Susana Ferreras praised Python, especially in the world of analytics, where it competes with the programming language R. But she also pointed out the merits of the MapReduce paradigm for processing and distributing large data sets.
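For readers unfamiliar with the paradigm Ferreras mentioned, the idea can be sketched in a few lines of plain Python. This is only an illustration on a single machine, not how Hadoop or Spark implement it, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical ones chosen for this example. It counts words across a handful of documents in three steps: map each document to key-value pairs, group the pairs by key, then reduce each group to one value.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit one (word, 1) pair per word in a document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle and reduce in sequence over all documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical sample input, used only for illustration.
    print(word_count(["big data", "Big deal"]))
```

In a real cluster the map and reduce steps would run in parallel on different machines, each handling a slice of the data; that distribution is what makes the paradigm suitable for large data sets.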