What does it take to be a data scientist?
Alan O’Loughlin, Director of Analytics and Statistical Modelling, International, LexisNexis Risk Solutions, Insurance, explains the growing importance of data science within the insurance industry and looks at how insurance providers can leverage the vital skills of data scientists.
Data is a game-changer for many industries, not least insurance, but it takes skill to derive value from it. As a result, the demand for good data scientists is growing. The role calls for a combination of key personality traits and skillsets, and an effective Data Scientist also needs a certain mentality.
The problem with data
In the nearly 18 years I’ve been doing this across many different industries, I have never, outside of college, been provided with a clean dataset to work on and build from. Normalising and cleansing data, intertwining multiple sources and bringing it all to a point where it can be used to build a valuable solution is never plain sailing.
It is possible to build valuable insights from the data an organisation already holds, although larger businesses often find this more challenging because the information is typically owned by separate departments or business areas. Gathering the data requires working closely with each area and encouraging buy-in to the data pooling strategy, which usually involves a certain amount of trust building too.
Once the data is pooled, the next challenge begins. Each business area will produce different information in different formats. Sources may range from structured databases like PostgreSQL, MySQL, HPCC and HANA, through unstructured raw XML API responses, to Excel and text files. These will contain anything from transactional information (live quotes) to historical data and contributory assets from multiple insurers. Then, before the data can be used, it must be connected and normalised, which usually means solving a string of problems along the way. This is where the skills of a Data Scientist really come to the fore.
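As a rough illustration of what connecting and normalising can involve, the sketch below pools a CSV export and a raw XML API response on a shared key. The field names (Policy ID, Premium, postcode), the sample values and the helper functions are all hypothetical, chosen only to show the shape of the problem; real sources will need far more cleansing than this.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical inputs: a CSV export from one business area and a raw
# XML API response from another. All names and values are made up.
CSV_TEXT = "Policy ID,Premium\n P-001 ,420.50\nP-002,  310\n"
XML_TEXT = """<quotes>
  <quote policy="p-001"><postcode>ab1 2cd</postcode></quote>
  <quote policy="P-003"><postcode>EF3 4GH</postcode></quote>
</quotes>"""


def normalise_key(raw: str) -> str:
    """Trim whitespace and upper-case so keys from different sources match."""
    return raw.strip().upper()


def load_csv(text: str) -> dict:
    """Read the CSV export into {policy key: {"premium": float}}."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = normalise_key(row["Policy ID"])
        rows[key] = {"premium": float(row["Premium"].strip())}
    return rows


def load_xml(text: str) -> dict:
    """Read the XML response into {policy key: {"postcode": str}}."""
    rows = {}
    for quote in ET.fromstring(text).iter("quote"):
        key = normalise_key(quote.get("policy"))
        rows[key] = {"postcode": quote.findtext("postcode").strip().upper()}
    return rows


def connect(*sources: dict) -> dict:
    """Outer-join the sources on the normalised key.

    Records missing from one source keep only the fields they have,
    so gaps in the data stay visible rather than being dropped.
    """
    merged = {}
    for source in sources:
        for key, fields in source.items():
            merged.setdefault(key, {}).update(fields)
    return merged


pooled = connect(load_csv(CSV_TEXT), load_xml(XML_TEXT))
```

Here only P-001 appears in both sources, so it is the only record that ends up with both a premium and a postcode; P-002 and P-003 are kept with whatever fields they have.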
The essential skills
Coding fluency – Pulling data from many sources often means dealing with multiple coding languages. A Data Scientist fluent in Python, R, Java, SQL and Scala, to name but a few, will be able to work independently and efficiently, which removes dependency on other teams.
Mathematics – The ability to analyse data will allow a Data Scientist to quickly recognise whether a dataset can be enriched and used in predictive models, and to validate any hypotheses surrounding that data. It is also extremely important to understand the concept of enabling the data to ‘time travel’, that is, viewing it through the lens of what information was available at a given point in time.
Problem Solving – Even conceptually, tying data together never goes smoothly, and it requires innovative problem-solving skills to produce the best results. I have long since lost count of the number of perfect products built in a sandbox environment which either fail or take an age to release to a live audience. Data Scientists become data detectives, tracing back to figure out what has gone awry, where the solution differs between trial and real life, and how it can be resolved to deliver the best solution.
Adaptability – When those ‘best laid plans’ do not work out, quick reactions and delivery of solutions are essential. Data Scientists must be clever and adapt, figuring out why something is not working and what can be changed to fix the issue. Once a product is built and delivered, the Data Scientist cannot move immediately on to the next shiny new product: they have a responsibility for the solution, to ensure it works correctly and, when it doesn’t, to figure out why and how it can be fixed. This can mean investing more time in analysis or bringing in additional datasets to compensate for the underlying issue.
Hunger – The cycle of pooling, connecting, normalising, cleansing, analysing, building, adapting, correcting and maintaining data is essential for success in the area of Data Science. Data Scientists must therefore have a hunger to deliver the best solutions. They cannot be deterred by the monotony that can exist in that cycle, and must keep working until they have the most robust and valuable solution live. A true Data Scientist must evolve and move beyond just reporting out, being able to build products and recommend business decisions off the back of their analysis.
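The ‘time travel’ idea mentioned under Mathematics can be sketched as a point-in-time lookup: when building a predictive model, each record should only see the attribute values that were known on the date in question. The example below is a minimal sketch under assumed data; the history list, its dates and the value_as_of helper are hypothetical and not taken from any particular product.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical history of one attribute for one policyholder:
# (date the value became known, value), sorted by date.
CLAIM_COUNT_HISTORY = [
    (date(2019, 3, 1), 0),
    (date(2020, 7, 15), 1),
    (date(2022, 1, 10), 2),
]


def value_as_of(history, as_of):
    """Return the latest value known on or before `as_of`.

    Returns None when nothing was known yet, so that later
    information cannot leak into an earlier point in time.
    """
    known_dates = [known_on for known_on, _ in history]
    idx = bisect_right(known_dates, as_of)
    return history[idx - 1][1] if idx else None
```

For a quote dated 1 January 2021, value_as_of(CLAIM_COUNT_HISTORY, date(2021, 1, 1)) returns 1, even though a second claim is recorded in the data later on. Using the latest value instead would let the model see the future.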
Underwriters of the future
Many insurance providers will already be using these methodologies and toolkits to deliver their analysis, find the value and respond to the changing market. Having the right skillsets on board to properly interrogate the data, and using attributes built specifically for this purpose, has become increasingly important. It’s also key that people analysing the data don’t get trapped using legacy systems which lack the necessary range of capabilities.
A Data Scientist will understand how data is captured in the real world and why there are gaps, which is not something that can be learned simply by studying the theory. They can then figure out intuitively what needs to be done with gaps in the data, using their problem-solving abilities, patience throughout the process of trial and error and, of course, robust data interrogation skills.
There’s no denying that the traditional actuarial underwriter has a lot of the base skillsets of a Data Scientist. What is new is that data has grown in volume, quality and accessibility. There is now the ability to combine and analyse multiple data sources, offering insurance providers the opportunity to derive new insights that may offer a competitive advantage.