The Challenges of Dealing With Big Data
I have had a series of blog posts on Enterprise Data Architecture and related topics sitting in my TODO list for ages and just never seem to get around to writing them. So here is a start, and you can thank Semil Shah for getting me off my butt and writing this.
So I happened to be on Twitter yesterday and saw a retweet of a request from Semil (@semilshah) asking for assistance from big data experts. I regularly design systems that store and analyze many terabytes or more of data per day and can grow to over 20 petabytes of live data (I think that’s my high-water mark so far), so I figured I probably qualified as a Big Data guru and reached out. What Semil wanted was comments and perspective on a post he had written discussing ways to create value from big data, as he is doing additional research on this topic for another project. Semil postulates three levels of value creation from Big Data:
- Visualization
- Actionable Insight
- Discovery
Semil’s closing statement resonated with me very strongly. He says, and I quote:
“The real value will be created by one of these companies that help big companies in the life sciences, in oil & gas exploration, and in the social graph discover new markets, resources, and relationships that will power their businesses in the future.”
I must say that I absolutely agree with Semil that this is the Holy Grail for getting value from Big Data. I also think there are some companies building technology to allow this data to be analyzed in new and interesting ways and some of them are going to produce some truly awesome tools. There is only one problem.
All of this assumes that we can capture our data accurately and consistently, or even capture it at all. Unfortunately, in my 25+ years of experience designing and building large-scale systems for the world’s largest enterprises, this is not a valid assumption. In fact, I had this discussion on many occasions with my colleagues in the Cognos division of IBM. Even though Cognos has some great tooling and smart people making yet better tooling, all the reporting and analysis tools in the world will not help if the data on which they report and base their analysis is itself inaccurate, incomplete or inconsistent. It’s the proverbial case of Garbage In = Garbage Out.
In order for the tools to be able to provide true value, firms first need to ensure they can do the following:
- Capture all of their data completely and accurately
- Ensure a consistent representation of the data across the entire enterprise and all of the systems within it
- Be able to capture, store, maintain and retrieve the data in a manner which can keep up with the pace at which it’s being generated or ingested (this is technologically more difficult than you might realize for Big Data firms). This is somewhat related to the completeness requirement in the first item, but completeness means much more than just handling the volume
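To make those three requirements a bit more concrete, here is a minimal Python sketch of what checking a batch of captured records against them might look like. Everything in it is an assumption made up for illustration: the field names (`trade_id`, `amount`, `currency`), the `check_batch` function, and the idea of comparing a processing rate against an arrival rate. It is not how any particular product works, just one simple way to express completeness, consistency and keeping up with the pace in code.

```python
from dataclasses import dataclass

# Hypothetical required fields; a real enterprise schema would differ.
REQUIRED_FIELDS = ("trade_id", "amount", "currency")


@dataclass
class QualityReport:
    total: int          # records seen in the batch
    incomplete: int     # records missing a required field
    inconsistent: int   # entities whose sources disagree
    keeping_up: bool    # processed at least as fast as data arrived


def normalize_currency(value):
    """Map differing source-system spellings onto one representation."""
    return str(value).strip().upper()


def check_batch(records, elapsed_seconds, arrival_rate_per_second):
    """Check one batch for completeness, consistency and throughput."""
    incomplete = 0
    values_by_id = {}
    for rec in records:
        # Completeness: every required field must be present and non-empty.
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            incomplete += 1
            continue
        # Consistency: collect the normalized value each source reports
        # for the same entity so that conflicts can be counted afterwards.
        value = (rec["amount"], normalize_currency(rec["currency"]))
        values_by_id.setdefault(rec["trade_id"], set()).add(value)

    inconsistent = sum(1 for vals in values_by_id.values() if len(vals) > 1)

    # Pace: compare how fast the batch was handled with how fast data arrives.
    if elapsed_seconds > 0:
        processed_rate = len(records) / elapsed_seconds
    else:
        processed_rate = float("inf")

    return QualityReport(
        total=len(records),
        incomplete=incomplete,
        inconsistent=inconsistent,
        keeping_up=processed_rate >= arrival_rate_per_second,
    )
```

Calling `check_batch(records, elapsed_seconds, arrival_rate_per_second)` on a list of dictionaries returns a small report. In this sketch, two systems that spell a currency as `usd` and `USD ` are treated as agreeing once normalized, while two systems that report different amounts for the same `trade_id` are counted as one inconsistent entity.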
Only after this is achieved can one start to ponder the levels of value creation which Semil postulates.
You know, it’s kind of funny, but as I pondered Semil’s question and started writing this post, I realized that while we never really think about Nebility as a Big Data company, in many ways we are. Sure, we do large-scale Service Oriented Architecture (SOA) based apps and wrap them up for use as Software as a Service (SaaS) or standalone in a client’s datacenter using our NebulaBlocks technology, but we never actively think of ourselves as a big data company. Yet as I sit here pondering it, everything we designed into our NebulaBlocks platform, from the approach we take to data modeling and storage through to the way we implemented all of our components to scale, is there because we are specifically trying to address the three items in my list above, with a whole lot of security stuff thrown in on top. In fact, the whole reason we started Nebility was to provide tools and components to solve these issues. It seems clear now that I think about it, but until Semil raised his question I would typically not have positioned Nebility as a big data company. But I guess we are one.
Anyhow, now that I have started, I will try to provide some more material on this topic in the near future in other articles, so watch for them. I will probably tackle each of the three items above in a separate post, but first I think I will throw in an article with some anecdotal evidence of the real issues, drawn from my personal experiences in the financial world, so you can put things in perspective.
As always you can reach me through Twitter, LinkedIn, by using the contact links in the author box or here through the website.