Bootstrapping an AI product... without any data

Wrisk CDO Stewart Duncan explains how the team approached the issue of building a brand new insurance experience.

As a small insurtech, Wrisk is free of legacy technology and can create a truly modern insurance experience for customers. 

However, it also means we lack what incumbent insurers have been collecting and using for years: data on who has claimed, how often and for how much. 

Here are some insights into how we’ve used Product-led thinking to work around our data paucity and bring a compelling proposition to market.

Making things simple is complex

For those who don’t know us, Wrisk is making the complex world of insurance simple, personal and transparent. We’ve built a platform designed to seamlessly cross-sell lines of insurance such as contents and car with just a few taps. Customers can manage these effortlessly through the Wrisk app. 

We are working with some innovative and well-respected partners such as BMW and Allianz to bring this experience to their customers. We think building consumer trust in insurance is important, especially with newer consumers, where the relationship might already be a little strained.

Our commitment to keeping things simple and customer-focused has helped us to create an insurance experience that stands out in the industry: 

  • Our unique Wrisk Score helps customers understand the things that affect their premium, offering transparency not usually disclosed by an insurer 
  • A personalised price and experience which provides full control, so customers only pay for the cover they need 
  • It’s simple to make changes anytime, anywhere, and even cancel in an instant - all without a fee

Those who follow the field will recognise that two of the above are inherently AI* problems. First, how do we use patterns in data to predict the appropriate price for each customer, so that we can collect enough premium to pay all their claims (and our bills)? Second, how do we explain a complex black-box algorithm in a way the customer can trust? 

Making up for not having data

Building a properly priced car insurance product without a set of historical claims data posed a significant challenge. However, the world of motor insurance in the UK has a few characteristics we could use:

  • It’s highly competitive, which means most incumbents are not making large margins on their insurance
  • It’s sold primarily through comparison sites (aggregators), which means they see most of the market
  • It’s commoditised, because of the oligopolistic strength of those same comparison websites
  • It has limited customer loyalty because of aggressive renewal pricing, which drives churn and returns customers to the comparison sites

These are the things we plan to challenge, but first, we decided to use them to our advantage.

Through industry contacts, we bought an anonymised set of 2.2 million quotes from one of the leading aggregators, comprising some 60 features (basically the questions you’d typically answer when getting an insurance quote).

Since car insurance is competitive, we reasoned that if we could build a model that predicted the market price with sufficient accuracy, and make some sensible assumptions about how that price related to actual risk, we could satisfy our underwriters and get trading.

Combining machine learning approaches

Of course, this wasn’t as simple as it sounded. We wanted the end model to be expressed in an industry-standard form, essentially a generalised linear model (GLM). Not particularly revolutionary, but in building our own backend pricing engine we wanted to be able to support typical actuarial models. 

Furthermore, GLMs are easy to understand, meaning underwriters can follow them and add their own expert experience in the form of simple overlays.

But most importantly, our UX designers could externalise the impact of model features to our customers as our Wrisk Score, fulfilling our ambition of throwing back the curtain on the inner workings of pricing (or so we thought).
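To make the appeal of a GLM concrete, here is a minimal sketch of how a log-link GLM turns into a price. It is not Wrisk's actual model: the feature names, levels, coefficients and base premium are all invented for illustration. With a log link, each feature contributes a multiplicative "relativity" to a base premium, which is what lets an underwriter apply an overlay to a single factor and lets a designer show a customer which factors push the price up or down.

```python
import math

# Hypothetical log-link GLM. All names and numbers are illustrative assumptions.
BASE_LOG_PREMIUM = math.log(400.0)  # intercept: log of a base annual premium
COEFFICIENTS = {
    ("driver_age_band", "17-24"): 0.60,
    ("driver_age_band", "25-59"): 0.00,
    ("postcode_group", "urban_high"): 0.25,
    ("postcode_group", "rural_low"): -0.10,
    ("no_claims_years", "0"): 0.00,
    ("no_claims_years", "5+"): -0.35,
}


def price_quote(quote, overlays=None):
    """Return (premium, relativities) for a quote.

    quote:    dict of feature -> level
    overlays: optional dict of feature -> extra multiplier set by an underwriter
    """
    overlays = overlays or {}
    relativities = {}
    for feature, level in quote.items():
        coefficient = COEFFICIENTS.get((feature, level), 0.0)
        # exp(coefficient) is the multiplicative effect of this feature level.
        relativities[feature] = math.exp(coefficient) * overlays.get(feature, 1.0)

    premium = math.exp(BASE_LOG_PREMIUM)
    for factor in relativities.values():
        premium *= factor
    return premium, relativities


quote = {
    "driver_age_band": "17-24",
    "postcode_group": "rural_low",
    "no_claims_years": "5+",
}
premium, relativities = price_quote(quote)
for feature, factor in relativities.items():
    direction = "raises" if factor > 1 else "lowers"
    print(f"{feature} {direction} the price (x{factor:.2f})")
print(f"premium: {premium:.2f}")
```

Because each factor is a separate multiplier, an overlay such as `price_quote(quote, overlays={"postcode_group": 1.1})` changes one relativity and leaves the rest of the model untouched.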

However, we were unable to fit a GLM to the market data with sufficient accuracy - there just wasn’t enough of a pattern. We were going to have to engineer some new features (essentially, build new data points from the ones we had). The question was, where to start? 

Gradient boosted models (GBMs) are a useful technique for uncovering hidden patterns in a high-dimensional dataset. Using this approach, we built a “champion” model that fit our market prices well. By deconstructing our GBM, we could create new features that would help us find a GLM that worked for us, fulfilling our functional system requirements while also producing a sufficiently accurate quote.
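The article does not say exactly how the GBM was deconstructed. One common reading is that the split points chosen by the boosted trees are turned into banded features for the GLM. The sketch below is an assumption along those lines: a single depth-1 tree (a stump) stands in for one tree of the ensemble, is fitted to the GLM's residuals, and the threshold it finds becomes a new binary feature. The data and the `young_driver` name are made up.

```python
def best_split(values, residuals):
    """Find the threshold on one feature that most reduces squared error.

    This is the search a single depth-1 regression tree (a stump) performs.
    Returns None if no split improves on predicting a single mean.
    """
    pairs = sorted(zip(values, residuals))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    best_threshold, best_gain = None, 0.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical feature values
        right_sum = total - left_sum
        # Reduction in squared error versus predicting one overall mean.
        gain = left_sum**2 / i + right_sum**2 / (n - i) - total**2 / n
        if gain > best_gain:
            best_gain = gain
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold


# Illustrative data: driver ages and what a first-pass GLM failed to explain
# (log market price minus log GLM prediction).
ages = [18, 19, 21, 23, 30, 35, 42, 50, 61]
residuals = [0.50, 0.45, 0.40, 0.42, 0.00, -0.05, 0.02, -0.03, 0.01]

threshold = best_split(ages, residuals)

# The discovered threshold becomes an engineered feature for the next GLM fit.
young_driver = [int(age < threshold) for age in ages]
print(threshold, young_driver)
```

The same idea extends to interactions: a deeper tree that splits on age and then on mileage suggests an age-by-mileage band that a plain GLM would not find on its own.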

On the way, we had to create some additional intelligence around the location, as the types of roads, traffic conditions and crime propensity all influence your insurance premium. We had postcodes in our training data, but we needed to assign them to groups so we could train our model. To achieve this, we did some spatial smoothing using a K-nearest neighbours algorithm weighted with a population distance metric to account for the difference between urban and rural areas.

(Figure: smoothed postcode risk for London, shown as an example, with red being greater risk.)
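The smoothing step can be sketched as follows. The article does not define the population distance metric, so this example makes an assumption: each of the k nearest areas (including the area itself) is weighted by its population divided by one plus its distance in kilometres. Sparse areas with noisy values are then pulled towards their better-populated neighbours. The coordinates, populations and values are invented.

```python
import math


def smooth_areas(areas, k=3):
    """Smooth a per-area value using its k nearest areas.

    areas: dict of name -> (x_km, y_km, population, raw_value)
    Weighting (an assumption): population / (1 + distance_km).
    """
    smoothed = {}
    for name, (x, y, _, _) in areas.items():
        # Rank every area, including this one at distance 0, by distance.
        by_distance = sorted(
            (math.hypot(x - ox, y - oy), population, value)
            for ox, oy, population, value in areas.values()
        )
        nearest = by_distance[:k]
        weights = [population / (1.0 + dist) for dist, population, _ in nearest]
        weighted_sum = sum(w * value for w, (_, _, value) in zip(weights, nearest))
        smoothed[name] = weighted_sum / sum(weights)
    return smoothed


# Hypothetical postcode areas: (x_km, y_km, population, raw relative price).
areas = {
    "A": (0.0, 0.0, 1000, 1.0),
    "B": (1.0, 0.0, 1000, 1.2),
    "C": (0.0, 1.0, 50, 3.0),    # sparse area with a noisy raw value
    "D": (10.0, 10.0, 800, 0.8), # populous area far from the others
}

smoothed = smooth_areas(areas, k=3)
for name, value in smoothed.items():
    print(f"{name}: raw={areas[name][3]:.2f} smoothed={value:.2f}")
```

In this toy data the sparse area C moves most of the way towards its neighbours, while the populous, isolated area D barely changes. The smoothed values can then be banded into postcode groups for the model.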

Using these and other techniques we gradually added further structure to our original data set, until we could fit a suitably accurate GLM that satisfied our underwriters and let us get to market. 

Using Machine Learning to explain the Machine

Now we could create a price for a customer that was sophisticated enough to work in the complex world of motor insurance, but that didn’t mean we could explain it easily and fulfil our transparency ambition. 

Our designers wanted to create a library of messages to explain the factors that affect a customer’s Wrisk Score. But how do you do this when the model uses over 50 features that combine in nearly limitless ways - without an army of copywriters?

Finding the meaningful clusters that explained price movements was the trick. We trained a random forest algorithm across a large sample of quotes. Through observing where our algorithm divided the quotes, we could explain the important clusters and their characteristics to our copywriters, who wrote messages for each.

This boiled the problem down to a few hundred messages across groups of factors, while still giving meaningful (and fun) explanations to customers.
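As a rough illustration of reading clusters off a tree, the sketch below grows one small regression tree over a handful of invented quotes and lists each leaf with the rules that lead to it. It is a simplification: a single shallow tree stands in for the random forest, and the features, prices and depth are assumptions. Each leaf is a candidate cluster that a copywriter could write one message for.

```python
def split_gain(left, right):
    """Reduction in squared error from splitting targets into two groups."""
    total, n = sum(left) + sum(right), len(left) + len(right)
    return sum(left) ** 2 / len(left) + sum(right) ** 2 / len(right) - total**2 / n


def grow(rows, targets, depth, rules=()):
    """Recursively split quotes; return leaves as (rules, mean_price, size)."""
    best = None
    if depth > 0:
        for feature in rows[0]:
            # Skipping the smallest value keeps both sides of a split non-empty.
            for threshold in sorted({row[feature] for row in rows})[1:]:
                left = [t for r, t in zip(rows, targets) if r[feature] < threshold]
                right = [t for r, t in zip(rows, targets) if r[feature] >= threshold]
                gain = split_gain(left, right)
                if best is None or gain > best[0]:
                    best = (gain, feature, threshold)
    if best is None:
        return [(rules, sum(targets) / len(targets), len(targets))]

    _, feature, threshold = best
    leaves = []
    for op, keep in (("<", lambda v: v < threshold), (">=", lambda v: v >= threshold)):
        subset = [(r, t) for r, t in zip(rows, targets) if keep(r[feature])]
        leaves += grow(
            [r for r, _ in subset],
            [t for _, t in subset],
            depth - 1,
            rules + (f"{feature} {op} {threshold}",),
        )
    return leaves


# Invented quotes and prices, for illustration only.
quotes = [
    {"age": 19, "mileage": 5000},
    {"age": 22, "mileage": 12000},
    {"age": 45, "mileage": 4000},
    {"age": 50, "mileage": 15000},
    {"age": 38, "mileage": 14000},
    {"age": 60, "mileage": 3000},
]
prices = [900, 1100, 350, 520, 500, 330]

leaves = grow(quotes, prices, depth=2)
for rules, mean_price, size in leaves:
    print(" and ".join(rules), f"-> mean price {mean_price:.0f} ({size} quotes)")
```

A forest repeats this over many resampled trees. Splits that recur across trees point to the clusters worth writing messages for.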

The importance of standards

Building our proprietary pricing engine in a way that supported standard actuarial models has had another significant commercial benefit.

We were approached by a well-known motoring company who had collected a proprietary data set over many years of providing car-related services to the UK public. Their data pointed to a segment of customers who were routinely overcharged by other insurers.

They liked our Wrisk innovations and the experience our platform offered to customers. Their data scientists were able to create a pricing model using their proprietary insight that we could merge with our own factors created above. It was a simple task for them to express this as a GLM that could be managed on our technology stack, and we aim to bring this unique product to market shortly.

Putting AI at the heart of the Product

Having fine-grained control of our policy management and pricing engines is a large part of what has made the Wrisk experience so magical and user friendly. 

Without these, it would be impossible to provide the slick instant changes, the clever data enrichments, and our unified billing and disclosure across multiple products (if you move house, for example, that effect ripples seamlessly through all your policies, even though they are priced differently).

This is the true advantage of applying AI in a Product-focused organisation: being able to shape the offering so that AI works seamlessly with and reinforces the experience we want to provide, which also happens to be our core strategic differentiator and the centrepiece of our business model.

* For those less familiar with the field, Artificial Intelligence is a broad term that is gradually converging to include both machine learning (spotting patterns in data to make predictions) and cognitive functions associated with human minds (such as vision and language). It slots into the broader data specialism alongside data engineering, business intelligence, and various forms of advanced analytics.