Monday, March 31, 2008

The Secret Laws of Analytic Projects

The First Certainty Principle: C ~ 1/K ; Certainty is inversely proportional to knowledge.
A person who really understands data and analysis will understand all the pitfalls and limitations, and hence be constantly caveating what they say. Somebody who is simple, straightforward, and 100% certain usually has no idea what they are talking about.

The Second Certainty Principle: A ~ C ; The attractiveness of results is directly proportional to the certainty of the presenters.
Decision-makers are attracted to certainty. Decision-makers usually have no understanding of the intricacies of data mining. What they often need is simply someone to tell them what they should do.

Note that #1 and #2 together cause a lot of problems.

The Time-Value Law: V ~ 1/P ; The value of analysis is inversely proportional to the time-pressure to produce it.

If somebody wants something right away, that means they want it on a whim, not out of real need. The request that comes in at 4:00 for a meeting at 5:00 will be forgotten by 6:00. The analysis that can really affect a business has been identified through careful thought, and people are willing to wait for it. (A cheery thought for those late-night fire drills.)

The First Bad Analysis Law: Bad analysis drives out good analysis.
Bad analysis invariably conforms to people's pre-conceived notions, so they like hearing it. It's also 100% certain in its results, with no caveats and nothing hard to understand, and it usually gets produced first. This means the good analysis always has an uphill fight.

The Second Bad Analysis Law: Bad Analysis is worse than no analysis.
If there is no analysis, people muddle along by common sense which usually works out OK. To really mess things up requires a common direction which requires persuasive analysis pointing in that direction. If that direction happens to be into a swamp, it doesn't help much.

Sunday, March 30, 2008


I've written quite a bit about unsuccessful Information Engineering projects; now I want to write about a successful one.

How can you change a company? Give people the information they need to make decisions they never thought they could and that changes how they think about the enterprise. The trouble is, any organization will put up a lot of resistance to change.

In 2002 I managed a Life-Time Value (LTV) project at a Large Telecommunications Company (LTC) that did change the enterprise. LTV is an attempt to measure the overall economic impact of each customer to the enterprise over their expected life. Ideally this is concrete numeric data, so we can ask “Is this customer worth $300 in new equipment if they will stay with us for two more years?”
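To make the $300 question concrete, here is a minimal sketch of the arithmetic. It is not the calculation the LTC project used; the function names, the flat monthly margin, and the optional discount rate are all assumptions made for illustration.

```python
def lifetime_value(monthly_margin, expected_months,
                   upfront_cost=0.0, monthly_discount_rate=0.0):
    """Discounted sum of expected monthly margins, minus any upfront cost.

    All inputs are hypothetical: a real LTV model would estimate margin and
    expected life per customer rather than take them as flat numbers.
    """
    value = 0.0
    for month in range(1, expected_months + 1):
        value += monthly_margin / (1.0 + monthly_discount_rate) ** month
    return value - upfront_cost


def worth_retaining(monthly_margin, expected_months, retention_cost):
    """True if the customer is still worth more than the retention spend."""
    return lifetime_value(monthly_margin, expected_months, retention_cost) > 0


# A customer who brings in $20/month of margin for two more years
# is worth the $300 of equipment; one who brings in $10/month is not.
print(worth_retaining(20.0, 24, 300.0))
print(worth_retaining(10.0, 24, 300.0))
```

The point of the sketch is only that the decision becomes a comparison of two numbers once the value of the customer is written down.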

The LTV project allowed people to think about the business in new ways, and it was embraced by the Chief Marketing Officer. It saved $15 million each year in direct marketing costs while adding to the revenue from marketing programs, simply by not spending money to retain customers that LTC was losing money on.

There are a lot of articles about how to do LTV calculations. This time I want to talk about all the corporate politics around shepherding the LTV project to success.

Wednesday, March 26, 2008

Data-Driven Organizations are a Bad Idea

Consider: it really takes only a few facts to make a decision, but it takes a wealth of insight to know what the relevant facts are for the decision.

In a data-driven company, every single analysis generates facts, and every single one of those facts indicates a possible decision. In a data-driven organization people really have very little guidance to make decisions. Even worse, the uncertainty created by all the possible decisions that could be made drives people to ask for more analysis. More analysis means more facts generated, which means more possible decisions suggested, which means even greater confusion, and the problem gets worse. The end result is that decisions get made for really very arbitrary reasons, usually the last fact someone saw before they were forced to decide. I think it's better to rely on intuition and experience than to try to make sense out of a sea of random, contradictory facts.

What works is to have a decision-driven organization. Understand what kind of decisions the organization needs to make, understand the basis on which these decisions should be made, and be explicit about it. Once that blueprint for decision-making exists, build the information needed for the decisions.

Tuesday, March 25, 2008

I don't like books

I'm not that big a fan of data mining books. Every article I've read, or book I've read, or class I've taken, has been about what works. About the only way to find out what doesn't work is to have a project blow up on you and be sweating blood at 2 a.m. trying to figure out why all the nice algorithms didn't work out the way they were supposed to.

Friday, March 21, 2008

Good Data, Bad Decisions

Barnaby S. Donlon in the BI Review gives a good description of how data goes to information, to knowledge, and then to decisions. He's saying all the right things, and all the things I've been hearing for years, but you know -- I don't think it works anything like that.

When we start with the data, it's all too much. It's too easy to generate endless ideas, endless leads, endless stories. I've seen it happen when an organization suddenly gets analytic capability.

Before, the organization was very limited in its ability to make decisions because it had limited information. The organizational leaders have ideas, and because of the lack of information they have no way of deciding what is a good idea or a bad idea. After the organization starts an analytic department, suddenly every idea that the leadership gets can be investigated. The paradoxical result is that the leadership still can't make informed decisions. Every idea generates an analysis, and virtually every analysis can generate some kind of results. Without data, the result is inertia; with too much data, the result is tail-chasing.

The right way to do this is to begin with the end. Think about the decisions that need to be made. Then think about how to make those decisions in the best possible way. Starting with the end means the beginning -- the data, the analysis, the information -- is focused and effective.

Information Design: What does it take to be successful?

All of the examples that I have given are of poor information design. Some of them have had more or less success, but they all had substantial flaws. There's a reason I'm saying that information design is a missing profession.

Why is it so hard? First off, true information design projects are fairly rare. BI is usually about straightforward reporting and ad-hoc analysis. People don't get much of a chance to practice the discipline.

Information design requires a lot of other disciplines. It takes statistics but isn't limited to statistics. Data mining can help but can easily bog down a project in complicated solutions. It requires being able to think about information in very sophisticated ways and then turn around and think about information very naively.

It requires knowing the nuances of an organization. Who are the clients? The users? What is the organizational culture? What does the organization know about itself? What does the organization strongly believe that just isn't so? It's not impossible for an outside consultant to come in and do information design, but it is impossible for a company to come in with a one-size-fits-all solution. When it comes to information design, one size fits one.

Because the profession of information design hasn't been developed yet, it isn't included in project plans and proposals. For two of the projects above, information design wasn't even thought of, and for the third it wasn't done well because the client's true needs weren't uncovered.

Thursday, March 20, 2008

Daily Churn: The Project was a Complete Success and the Client Hated Us

The story eventually had a less than desirable ending. After we had produced accurate daily forecasts for months, our work was replaced by another group's work, with predictions that were much higher than ours. It turned out that having attrition sometimes higher than predictions and sometimes lower was very stressful to upper management, and what they really wanted to be told wasn't an accurate prediction of attrition but that they were beating the forecast.

Ultimately the problem was a large difference between what management wanted and what they said they wanted. What management said they wanted was an attrition forecast at a daily level that was very accurate. To this end my group was constantly refining and testing models using the most recent data we could get. What this meant was that all the most recent attrition programs were already baked into the forecasts.

What management really wanted to be told was the effect of their attrition programs, and by the design of the forecasts there was no way they could see any effect. It must have been very disheartening to look at the attrition forecasts month after month and be told, in essence, that your programs were having no effect.

What my group should have done is to go back roughly a year, before all of the new attrition programs started, and to build our forecasts using older data. Then we could make the comparison between actual and forecasts and hopefully see an effect of programs.
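A minimal sketch of that comparison, assuming daily attrition counts are available for a period before the programs started. The function names are hypothetical, and the flat pre-program mean stands in for whatever forecasting model would really be fit to the older data.

```python
from statistics import mean


def baseline_forecast(pre_program_daily, horizon_days):
    """Naive baseline: repeat the pre-program daily mean over the horizon.

    A real baseline would be a full forecast model trained only on data
    from before the attrition programs began (an assumption of this sketch).
    """
    level = mean(pre_program_daily)
    return [level] * horizon_days


def program_effect(baseline, actual):
    """Attrition avoided relative to the baseline (positive = fewer losses)."""
    return sum(baseline) - sum(actual)


pre_program = [100, 110, 90, 100]   # made-up daily attrition before the programs
actual = [95, 90, 97]               # made-up daily attrition with the programs running
baseline = baseline_forecast(pre_program, len(actual))
print(program_effect(baseline, actual))
```

Because the baseline never sees post-program data, the programs are not baked into it, so any gap between baseline and actual is at least a candidate for the programs' effect.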

Surprisingly, I've met other forecasters that found themselves with this same problem: their forecasts were accurate and they got the project taken away and given to a group that just made sure management was beating the forecast.

Wednesday, March 19, 2008

Daily Churn Prediction

The next project gone wrong that I want to talk about is the one where my group created daily attrition forecasts for a company.

Attrition is when a customer leaves a company. I was charged with producing daily attrition forecasts that had to be within 5% of the actual values over a month. The forecast vs. actual numbers would be fed up to upper management to understand the attrition issues of the company and the effect new company programs were having on attrition.

Because my group had been working at the company for a few years, we were able to break the attrition down by line of business and into voluntary and involuntary (when customers don't pay their bills). We were also able to build day-of-week factors (more people call to leave the company on a Monday) and system processing factors (delays between the time a person calls to have their service canceled and when the service is actually canceled). Our forecasts performed within 3% of actual attrition. Often we were asked to explain individual days' deviations from predictions, which we were always able to do: invariably major deviations were the result of processing issues, such as the person who processed a certain type of attrition taking a vacation and doubling up their processing the next week.
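As an illustration of the day-of-week idea, here is a small sketch that computes weekday factors from daily counts and checks a monthly total against a tolerance. The function names, the data layout, and the sample counts are assumptions for the example, not the system we built.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def day_of_week_factors(daily_counts):
    """Map weekday (0 = Monday) to mean attrition on that weekday
    divided by the overall daily mean.

    daily_counts: dict of datetime.date -> attrition count (hypothetical layout).
    """
    by_weekday = defaultdict(list)
    for day, count in daily_counts.items():
        by_weekday[day.weekday()].append(count)
    overall = mean(list(daily_counts.values()))
    return {wd: mean(counts) / overall for wd, counts in by_weekday.items()}


def within_tolerance(forecast_total, actual_total, tolerance=0.05):
    """True if the forecast total is within the given fraction of actual."""
    return abs(forecast_total - actual_total) / actual_total <= tolerance


counts = {
    date(2008, 3, 3): 150,   # Monday
    date(2008, 3, 4): 100,   # Tuesday
    date(2008, 3, 10): 150,  # Monday
    date(2008, 3, 11): 100,  # Tuesday
}
factors = day_of_week_factors(counts)
print(factors[0], factors[1])
print(within_tolerance(103, 100))
```

A factor above 1 marks a heavier-than-average day; multiplying a base daily level by the factor for each date gives a simple weekday-adjusted forecast.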

We were able to break down the problem like this because we knew the structure of the information that the company data contained and we were able to build a system that respected that information.

The analysis was a complete success but the project died. Why? Tomorrow.

Tuesday, March 18, 2008

Premiums from Credit Data II

A new team, including myself, was brought in to take a second pass at the project. What we did was to 1) look at the data and validate with the client that we had a valid data set, 2) make sure we had standards to meet that were appropriate to the project, and 3) start with a simple solution and then build more complex solutions. What approach 3) meant was that very quickly we had some solution in hand, and then we could proceed to improve our solution through project iterations.

The project didn't work out in the end. The relationship with the client had been irrevocably poisoned by the previous failure. But we were able to do the project the right way the second time.

Monday, March 17, 2008

Premiums from Credit Data: Going Wrong

The modeling effort ran into trouble. The models were drastically underperforming relative to what was anticipated. The team tried every modeling approach they could think of, with little success. Eventually the whole project budget was used up in this first unsuccessful phase with little to show for it. I was brought in at the end but couldn't help much.

There's a long list of things that went wrong.

The team forgot the project they were on. They were using approaches appropriate to marketing response models and they were working in a different world. Doing 40% better than random doesn't work well for marketing response models but here it meant we could improve the insurance company rate models by 40% which is fairly impressive. Before the project started the team needed to put serious thought into what success would look like.

The team let an initial step in the project take over the project. At the least, that initial step should have been ruthlessly time-boxed. Since that initial step wasn't directly on the path towards the outcome it should not have been in the project.

The team didn't do any data exploration. When I was brought onto the project near the end, one of the first things that I did was to look closely at the data. What I found was that over 10% of the file had under $10 in six-month premiums, and many other records had extremely low six-month premiums. In other words, a large chunk of the data we were working with wasn't what we think of as insurance policies.

This goes to an earlier point, that DBAs often know the structure of their data very well but have very little idea of the distribution and informational content of their data. Averages, minimums, maximums, most of what we can get easily through SQL, don't tell the story. One has to look closely at all the values, and usually this means using specialized software packages to analyze data.
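A small sketch of the kind of check that would have caught the low-premium records. The premium values below are made up for illustration, and the function names are hypothetical; the point is only that the mean looks unremarkable while a simple share-below-threshold count does not.

```python
from statistics import mean, quantiles


def share_below(values, threshold):
    """Fraction of records whose value falls below the threshold."""
    return sum(1 for v in values if v < threshold) / len(values)


def deciles(values):
    """Nine cut points dividing the values into ten equal-sized groups."""
    return quantiles(values, n=10)


# Made-up six-month premiums: two records are clearly not real policies.
premiums = [5, 8, 400, 450, 500, 520, 600, 610, 700, 900]

print(mean(premiums))             # the average alone looks plausible
print(share_below(premiums, 10))  # but a fifth of the file is under $10
print(deciles(premiums))          # and the lowest cut points show it too
```

On a real file the same two functions would be run on every numeric field before any modeling starts.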

We got a second chance later, fortunately.

Wednesday, March 12, 2008

The Next Fiasco - Premiums from Credit Data

A company I was with was building a modeling system to look at individual credit history, compare it with insurance premiums and losses, and identify customers where the insurance premium was either too high or too low. I was only peripherally involved with the project and only brought in at the end. What we were asked to predict was the overpayment or underpayment ratio so the insurance companies could adjust their premiums.

The project started by receiving large files from the client and starting the model building process. The team decided to start out with a simpler problem by predicting if there was a claim or not, and once that problem was solved using the understanding gained to move on to the larger problem.

Things didn't work out so well.

Monday, March 10, 2008

Righting the Wrong-Sizer

In order to fix this problem the company has to do some hard thinking about what kind of company they want to be and what kind of customers they want to have. Other things being equal companies want the customers to pay more for goods and services and the customers want to pay less; on the other hand companies want to attract customers and customers are willing to pay for goods and services they want. This means that in order to maximize the total return there is a real tension between maximizing the price (to get as much as possible from each customer) and minimizing the price (to attract customers and make sure they stay). How to resolve that tension is by no means trivial. One option is to assume that “our customers are stupid people and won't care that their bill just went up” but I don't think that's a good long-term strategy.

Ideally we want to find services that are cheap for the company but that customers like a lot. Standard customer surveys will just give us average tendencies, when what we care about is the preferences of each individual customer. Fortunately we have an excellent source of that customer's preferences: the rate plan they are on. Let's assume that the customers are in fact decently smart and are using roughly the best rate plan for them, but they might need some help fine-tuning their plan.

Take the customer rate plans and divide them up into families. When a customer calls up, look at their actual usage and calculate their monthly bill under the different rate plans in their family. If a customer can save money by switching rate plans, move them, but keep them in their rate plan family. This method makes sure the customer is getting a good deal and sticking within their known preferences, and the company is still maintaining a profitable relationship with the customer.
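The family-based recommendation can be sketched in a few lines. Everything here is an assumption for illustration: the `RatePlan` fields, the simple fee-plus-overage billing rule, and the sample plans are hypothetical, and real rate plans have far more structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatePlan:
    name: str
    family: str
    monthly_fee: float
    included_minutes: int
    overage_per_minute: float


def monthly_bill(plan, minutes_used):
    """Bill under a simplified fee-plus-overage rule (an assumption)."""
    overage = max(0, minutes_used - plan.included_minutes)
    return plan.monthly_fee + overage * plan.overage_per_minute


def recommend_plan(current, plans, usage_history):
    """Cheapest plan in the customer's current family for their actual
    usage, or the current plan if nothing in the family is cheaper."""
    def average_bill(plan):
        return sum(monthly_bill(plan, m) for m in usage_history) / len(usage_history)

    candidates = [p for p in plans if p.family == current.family]
    best = min(candidates, key=average_bill, default=current)
    return best if average_bill(best) < average_bill(current) else current


basic_300 = RatePlan("basic_300", "basic", 30.0, 300, 0.40)
basic_600 = RatePlan("basic_600", "basic", 45.0, 600, 0.40)
premium = RatePlan("premium_2000", "premium", 35.0, 2000, 0.25)
plans = [basic_300, basic_600, premium]

# A heavy user on basic_300 is moved up within the basic family,
# even though a plan in another family would be cheaper still.
print(recommend_plan(basic_300, plans, [500, 550, 450]).name)
```

Restricting candidates to the current family is the design choice that matters: it treats the customer's existing plan as evidence of their preferences rather than optimizing on price alone.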

Saturday, March 8, 2008

What's Wrong with the Wrong-Sizer?

Let's start with the customer usage profile. To start a project that's intended to give individual recommendations to customers by assuming that all customers act the same is amazingly dense. The Rate Plan Optimizer project manager explained that they had a study done several years ago saying that most customers were fit pretty well by their profile.
First off, a study done a few years ago doesn't mean that much in a constantly changing world, not when data can be updated easily. Second, even if most customers are pretty well fit by the profile that means that some customers are badly fit by the profile and will be negatively impacted by the system's recommendations.

The reason that the IT department went with a one-size-fits-all usage pattern was that the customer data warehouse did not actually have customer usage data in it, only how the customer was billed. The IT department should have taken this project as an excuse to get the usage data into the data warehouse. The customer recommendations could then have been done at an actual customer level.

The next major problem with the Rate Plan Optimization project was choosing the rate plan that was most profitable to the company and then suggesting the customer adopt that plan. In other words, the Rate Plan Optimizer had the goal of making the customer's bills as large as possible and making sure the customer got the worst possible plan from the customer's standpoint.

How to fix it? That's tomorrow.

Thursday, March 6, 2008

Real Examples: The Rate Plan Wrong-Sizer

That's a hypothetical example of information design; let's talk about some examples where I don't think that design was done so well, and how I think it could have been made better.

The Rate Plan Wrong-Sizer

I was working for a telecommunications company when my group was introduced to the Rate Plan Optimizer Project. IT had just spent one million dollars in development budget and they needed a group to take over the product.
The goal of the Rate Plan Optimizer was to help customer service reps suggest rate plan improvements to customers. The product did this by

  1. Assuming every customer had exactly the same usage patterns, with the only difference being their minutes of use, and then
  2. Looking at a series of rate plans and suggesting to the customer the plan that would be most profitable to the company.

The product had a number of parameters that could be managed, and IT wanted our group to do the managing.

I can't tell you that much about the parameters because my group got as far away from the project as quickly as we could. The project was broken enough that no amount of parameter tweaking could fix it and we didn't want to take the blame for generating bad customer experiences.

What's wrong with the Rate Plan Optimization Project and how should it have been designed? More tomorrow.

Wednesday, March 5, 2008

Building an Attrition System

We're talking about setting up an attrition intervention system.

This is all about information: how to get customer care reps the exact information they need to help out our customers.

The first big step is getting commitment to build a system and do it right. A well-done simple policy is a lot better than a badly done sophisticated policy. The next step is getting commitment to test the system at every level. Customers are fickle creatures and we don't understand how they will react to our best efforts. I'll have to say something about how to measure campaigns soon, but right now let's just say that we need to do it.

Let's start with the intervention. The obvious thing is to try to throw money at customers, but buying customers can get very expensive quickly. What will often work better is to talk with them and just solve their problems. But here you need a good understanding of what their problems are. We can do this by a combination of data analysis, focus groups, surveys, and talking to customer reps. There are a couple of dangers here. 1) Trying to do this by simply building an attrition model. Attrition models will typically tell us the symptoms of attrition, but not the root causes. 2) Relying on the intuitions of executive management. Executives often have some ideas about attrition but rarely have a comprehensive understanding of why customers actually leave.

The next step is trying to get an understanding of the finances involved. What are the financial implications of, say, reversing a charge the customer didn't understand? It's going to be different for one customer that has done this once and another customer that habitually tries to take advantage of the system.

Everything, everything, everything needs to be checked against hard numbers. We have experiences and form opinions based on these experiences, but until we check we don't know what's really going on.

The last step is what people usually start with: building an attrition model to tell when customers are likely to leave. A standard attrition model won't really give us the information we need. We don't just need the chance someone is going to leave. We need to match customer with intervention; that's a much more specific type of information.

Monday, March 3, 2008

An Example of Information Design

Let's say we're designing an attrition system (attrition is when a customer leaves a company). When a customer calls customer care, we give the representative a recommendation. We have a lot of options:
  1. We can't ever know exactly who is going to leave so let's not address the problem.
  2. Have an overall policy that treats all customers exactly the same.
  3. Present an attrition score to the customer representative.
  4. Present an attrition threat flag to the customer representative.
  5. Give a graded response with reasons and some specific recommendations to the representative.
Depending on the company any one of the solutions may be appropriate.

If a company is going through a period of low attrition, ignoring attrition may be the best response. There can easily be more important problems for an organization to worry about. I have seen this happen in companies where attrition has been a critical focus of the company: the incremental effect of a new attrition-focused system is small. However, if attrition is a problem in the company, (1) can be a foolish approach. It is usually impossible to tell exactly who is going to leave, but good analytic design can tell you how to make bets and get a good return on your efforts.

Solution (2) is what companies usually do, and if the policy is well thought out this can be sufficient.

Solutions (3) and (4), while apparently more sophisticated than solutions (1) and (2), are asking for trouble. How are service representatives supposed to interpret the data they are given? If we give customer service representatives a raw score without guidance, then good representatives will worry about their interpretations and the attrition score will become a source of stress. If we give an “Attrition Threat: Yes / No” flag, then we've lost the ability to distinguish between a slight risk and a substantial risk. We'll be giving the representatives clear guidance, but that guidance will probably not be appropriate, and the company will be worse off than if they had no policy.

What we want to do is solution (5): break the base down into segments with guidance and insight in each segment, making sure that our intervention is appropriate and effective in every case.
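To show how solution (5) differs from a raw score or a yes/no flag, here is a minimal sketch of a graded response. The risk thresholds, reason codes, and recommended actions are all hypothetical placeholders; in practice each would come out of the analysis and testing for a particular company.

```python
from typing import NamedTuple


class Guidance(NamedTuple):
    risk_band: str
    reason: str
    recommendation: str


# Hypothetical thresholds, checked from highest to lowest.
RISK_BANDS = [(0.6, "substantial"), (0.3, "moderate"), (0.0, "slight")]

# Hypothetical segment-to-action table keyed by (risk band, likely reason).
ACTIONS = {
    ("substantial", "billing_dispute"):
        "Review the disputed charge with the customer and resolve it on this call.",
    ("substantial", "poor_coverage"):
        "Open a network ticket and offer a follow-up call from a specialist.",
    ("moderate", "billing_dispute"):
        "Explain the charge and check that the rate plan fits their usage.",
}
DEFAULT_ACTION = "Handle the call normally; no retention offer."


def risk_band(score):
    """Translate a raw attrition score in [0, 1] into a named band."""
    for floor, label in RISK_BANDS:
        if score >= floor:
            return label
    return "slight"


def graded_response(score, reason):
    """What the representative sees: a band, a reason, and a specific action."""
    band = risk_band(score)
    return Guidance(band, reason, ACTIONS.get((band, reason), DEFAULT_ACTION))


print(graded_response(0.7, "billing_dispute"))
print(graded_response(0.1, "billing_dispute"))
```

The representative never has to interpret the score: the score only selects a segment, and the segment carries the guidance.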

Still, there are lots of right and wrong ways of doing this -- more tomorrow.