You know, at their heart, statistical measurements are about the easiest thing in the world to do, especially in direct marketing. Set up your test, randomly split the population, run the test, measure the results. It takes serious work to mess this up. It's amazing how many bright people leap at the chance to go the extra mile and find an inventive way to bozo a measurement.
The first exhibit is a database expert working on a customer contact project at a bank. A customer comes in and talks to the teller, and the system 1) randomly assigns the customer to either the control group or the treatment group if this is the customer's first time in the system (otherwise it looks up the stored status), and then 2) makes a suggestion for a product cross-sell. The teller may or may not use the suggestion, depending on how appropriate the teller thinks the offer is for the customer, how busy the branch is, and whether there is time to talk to the customer.
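That assignment flow can be sketched in a few lines. This is a toy sketch, not the bank's actual system: the function names, the dictionary standing in for the customer database, and the placeholder product are all invented for illustration.

```python
import random

assignments = {}  # customer_id -> "control"/"treatment"; stands in for a database table

def get_group(customer_id):
    # First visit: flip a fair coin and remember the result.
    # Later visits reuse the stored assignment, so the split stays stable.
    if customer_id not in assignments:
        assignments[customer_id] = random.choice(["control", "treatment"])
    return assignments[customer_id]

def suggest_cross_sell(customer_id):
    # The system always produces a suggestion for treatment customers;
    # whether the teller actually delivers it is out of the system's hands.
    if get_group(customer_id) == "treatment":
        return "Suggest: savings account upgrade"  # placeholder product
    return None
```

The important property is that `get_group` is the only place randomness enters: the split is clean, and everything downstream (including the teller's behavior) is not.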
So now we've got the simplest test/control situation possible. What the DBA decided to do was toss out every customer to whom no offer was made, on the theory that if no offer was made, the program had no effect. So all the reporting compared the total control group against only the treatment customers who actually received the offer, and that comparison is confounded: the teller's decision to make the offer or not was highly non-random. The kind of person who comes in at rush hour (when the teller's primary concern is handling customers and keeping wait times down) is very different from the kind of person who comes in during the slow middle of the afternoon.
The project team understood the confounding, that their reporting was mixing up two different effects, and talked for over two years about how to overcome it, when all they had to do was be lazier and report on the original random split.
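A small simulation makes the damage concrete. Assume (numbers invented for illustration) that slow-afternoon customers both buy more at baseline and are the only ones tellers have time to pitch, and that the offer itself adds two points to the purchase rate. Comparing "offered" against "total control" then wildly overstates the lift, while the lazy comparison of the full random split reads the true (diluted) program effect:

```python
import random

random.seed(42)

TRUE_LIFT = 0.02  # assumed: the offer adds 2 points to purchase probability
N = 200_000

def simulate():
    control_buys = control_n = 0
    treat_buys = treat_n = 0        # full treatment group (the random split)
    offered_buys = offered_n = 0    # only those the teller actually pitched

    for _ in range(N):
        # The confounder: slow-afternoon customers have a higher baseline
        # purchase rate AND are the ones tellers have time to pitch.
        slow_hours = random.random() < 0.4
        base_rate = 0.10 if slow_hours else 0.04

        in_treatment = random.random() < 0.5      # the clean random split
        offered = in_treatment and slow_hours     # the teller's non-random call

        bought = random.random() < base_rate + (TRUE_LIFT if offered else 0.0)

        if in_treatment:
            treat_n += 1
            treat_buys += bought
            if offered:
                offered_n += 1
                offered_buys += bought
        else:
            control_n += 1
            control_buys += bought

    control_rate = control_buys / control_n
    lazy_lift = treat_buys / treat_n - control_rate       # report the random split
    naive_lift = offered_buys / offered_n - control_rate  # the DBA's comparison
    return lazy_lift, naive_lift

lazy, naive = simulate()
print(f"full-split lift:      {lazy:+.3f}")
print(f"offered-only 'lift':  {naive:+.3f}")
```

Under these assumptions the full-split lift lands near +0.008 (the 2-point effect diluted by a 40% offer rate), while the offered-only number comes out around +0.056, most of which is just the slow-hours customers' higher baseline, not the program.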