Friday, October 29, 2010

Don’t stop – there is a lot more to discover out there!

Rumour has it that the age of the Zettabyte is near. That is a realistic number when measuring internet traffic. But could there possibly be analytical environments of this size? Plenty of Teradata customers are in the Terabyte range, and they do unique things with their data. Is there any point in having huge data volumes available for their own sake? A Zettabyte equals 1024 Exabytes, and an Exabyte in turn equals 1024 Petabytes, yet the Petabyte club is still an exclusive circle of Teradata customers (although their volumes are increasing at a breathtaking pace). What is their interest in “big data”? Couldn’t there be a point at which large data volumes become an obstacle rather than an opportunity for analytical purposes?

Listening to the big guys, you don’t get the impression that they worry about size. Rather, large data volumes seem to be the playing field on which they try out new tricks. While many data warehouse architects are heavily concerned with economics, these guys worry that they are not capturing enough detail. Oliver Ratzesberger of eBay specifically quoted one of his colleagues, who said that you should never throw away any data at all because you never know what you may want them for in a few years’ time. The knack is to have the data ready when the business idea pops up, rather than having to wait an extra 12 or 30 months until you have collected enough material to test it on.

This appears to be the attitude that sets the Petabyte club apart from (and often ahead of) other players: they are moving away from using data to confirm suspected answers and towards discovering genuinely new things. That requires them to capture high-quality, detailed data as opposed to summary data. For example, storing information about transactions will enable you to identify some long-term patterns. But it is the interactions a customer has with the company before and during the purchase that tell the whole story, especially when the sale does not take place in the end (which is clearly the more interesting case). On websites, simple usability details may turn out to be deal-breakers en masse. But you will never find them unless you track and analyze customer interactions. The club members make sure the data is there when they notice that they need it.

The lesson to learn from these giants is that when you are laying out your data warehouse, it makes sense to look beyond the first few years, when you do standard reporting and a bit of analysis. At a later stage, when you have learned to make full use of your insights and your organization has begun to direct more and more questions at the analytics department, you will want to predict, react to and trigger business events. Look at the telecommunications sector, for example. A few years ago, some people questioned the benefits of storing call detail records. The reality today is that if you cannot analyze these data by now, you’ll probably find it hard to stay in business at all, given your competitors’ analytical advantages. The same is happening now with OSS (operations support system) data. This data is turning out to be extremely valuable if you want to understand customers’ experiences with your services. Are there, for example, bandwidth bottlenecks in any area that actively drive customers away? And isn’t there a way to spot these problems in real time and rectify them right away? Well, there will be, unless you have decided against capturing those data at a detailed level in the first place.

It’s worth keeping this in mind as social media data, and all those other new data sources, are about to be integrated into enterprise analytics. A mere assortment of data won’t keep you at the head of the field. I am aware that this looks like a massive challenge to many enterprises. If handling the Petabyte was like breaking the sound barrier, handling the Zettabyte appears to be like trying to travel at the speed of light. Well, I’d say just don’t stop (as Freddie Mercury put it), keep on pushing, and we will get there all together! The consensus at the panel was that we will see the required innovations in the next 3-5 years. Now ain’t that worth keeping the faith?
