Real Life Issues With Big Data In The Enterprise – The Issues With Data Consistency (Or Lack Thereof)

Posted on: 03-7-2011 in Big Data, Business Intelligence, Challenges Of Dealing With Big Data, CIO, Data Architecture
This entry is part 2 of 3 in the series Challenges Of Dealing With Big Data

Large enterprises face huge challenges when dealing with their Big Data. In this article I outline some of the common Big Data challenges I see firms dealing with on a day-to-day basis. This continues the discussion started in the article titled “The Challenges of Dealing With Big Data”.

In the previous article we discussed how many firms and commentators, in and out of the press, focus on how to analyze and gain insight from Big Data (whether it be on Twitter or in the traditional enterprise). I also outlined how, in my personal experience, the root of the real problems with Big Data is often not how we analyze the data or which tools we use, but how we capture it, or fail to capture it, in the first place. In essence, our failure to capture the data accurately and consistently often renders analysis of it a meaningless exercise, because of the Garbage In = Garbage Out (GIGO) principle. To make this issue clearer, I am going to provide some real-world examples of the Big Data issues I come across with my clients on a regular basis. Unfortunately, as I started writing this it was getting more than a bit long, so I have broken it into three shorter posts, of which this is the first.

The Issues With Data Consistency (Or Lack Thereof)

Consider a large enterprise. My typical clients have anywhere from 500 to more than 2,000 different applications running within their data centers, spread across 1,500 to 100,000 servers. Now imagine that Paul Michaud is a customer of this enterprise. I have lived and worked all over the world, and with all the temporary corporate housing I have had, I have probably accumulated more than 20 addresses in my adult lifetime. As I have been a customer of said enterprise for most of that time, they have had to enter my information into some meaningful fraction of those systems in order to handle me as a customer. As a result, it's likely that some have me as Paul Michaud, some as P. Michaud, some as Paul K Michaud and some even as K Michaud. Believe me, it happens a lot. In addition, they probably hold many addresses for me across the different systems, some current and some stale. So the question is this: how does the firm analyze its business relationship with Paul Michaud when it can't even guarantee that all of these different versions of Paul Michaud are the same person?
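To make the name problem concrete, here is a minimal sketch in Python of a conservative pairwise check on the name variants mentioned above. The function names and the matching rule (same surname, given names agree as full names or initials) are my own assumptions for illustration, not a description of any particular product. Real record linkage would also weigh addresses, dates of birth, account numbers and so on.

```python
import re

def name_tokens(raw: str) -> list:
    """Lowercase the name, drop punctuation, and split it into tokens."""
    return re.sub(r"[^a-z\s]", " ", raw.lower()).split()

def tokens_agree(a: str, b: str) -> bool:
    """Two given-name tokens agree if equal, or if one is the other's initial."""
    if a == b:
        return True
    short, long_ = sorted((a, b), key=len)
    return len(short) == 1 and long_.startswith(short)

def could_be_same_person(name_a: str, name_b: str) -> bool:
    """Hypothetical rule: surnames must match, and every given-name token in
    the shorter name must agree with some token in the longer name."""
    ta, tb = name_tokens(name_a), name_tokens(name_b)
    if not ta or not tb or ta[-1] != tb[-1]:
        return False
    fewer, more = sorted((ta[:-1], tb[:-1]), key=len)
    return all(any(tokens_agree(x, y) for y in more) for x in fewer)

if __name__ == "__main__":
    variants = ["Paul Michaud", "P. Michaud", "Paul K Michaud", "K Michaud"]
    for i, a in enumerate(variants):
        for b in variants[i + 1:]:
            print(f"{a!r} vs {b!r}: {could_be_same_person(a, b)}")
```

Note that "Paul Michaud" and "K Michaud" do not match each other directly under this rule; they are only connected through "Paul K Michaud". The same rule would also happily link "P. Michaud" to a Peter Michaud, which is exactly why names alone cannot guarantee that two records are the same person.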

Now consider a corporation as a client. Let's use IBM as an example. If you are, say, a large bank, you may deal with IBM and its subsidiaries in many capacities. IBM may be a client, a trading counterparty, a supplier, a customer, etc. To make it more interesting, you may also deal with some of the subsidiaries directly in their own right, and you might have dealt in the past with a firm that IBM has since purchased (say, Cognos). The opportunities for data errors here are virtually boundless. Is IBM in some systems as IBM, in others as International Business Machines, or even as Intl. Bus Mach? Does the system even know that IBM bought Cognos, and when you try to determine all the business you do with IBM, does that Cognos business show up in the analysis? It is likely that some of the systems in your enterprise don't even have a concept of a corporate hierarchy in their data structures, so it is impossible for them to understand that there is a relationship between Cognos and IBM.
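As a rough illustration of what a system needs in order to answer "how much business do we do with IBM?", the Python sketch below combines an alias table with a parent link between entities. The table contents, the function names and the amounts are all hypothetical and made up for the example; a real enterprise would maintain this as governed reference data, including the dates on which ownership changed.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical alias table: spellings found in different systems -> canonical entity.
ALIASES = {
    "ibm": "IBM",
    "international business machines": "IBM",
    "intl. bus mach": "IBM",
    "cognos": "COGNOS",
}

# Hypothetical ownership links: child entity -> parent entity.
PARENT = {"COGNOS": "IBM"}

def canonical(name: str) -> Optional[str]:
    """Map a raw spelling to a canonical entity, or None if it is unknown."""
    return ALIASES.get(name.strip().lower())

def ultimate_parent(entity: str) -> str:
    """Walk the ownership links upward; the 'seen' set guards against cycles."""
    seen = set()
    while entity in PARENT and entity not in seen:
        seen.add(entity)
        entity = PARENT[entity]
    return entity

def roll_up(records):
    """Sum amounts per ultimate parent and collect spellings nobody has mapped."""
    totals = defaultdict(float)
    unmatched = []
    for name, amount in records:
        entity = canonical(name)
        if entity is None:
            unmatched.append(name)
        else:
            totals[ultimate_parent(entity)] += amount
    return dict(totals), unmatched

if __name__ == "__main__":
    # Made-up amounts, purely for illustration.
    records = [("IBM", 100.0), ("Intl. Bus Mach", 50.0),
               ("Cognos", 25.0), ("I.B.M. Corp", 10.0)]
    print(roll_up(records))
```

A system with no equivalent of the `PARENT` table simply cannot fold the Cognos business into the IBM total, and any spelling missing from the alias table silently drops out of the analysis unless someone reviews the unmatched list.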

Now some of you are probably thinking this is an exaggeration, but let me tell you this. A few years ago I had a client who had to have the same data replicated in about 450 applications. They tried to keep it synchronized in real time during the day, and also ran a major replication each night using Extract, Transform and Load (ETL) processes. This seems pretty standard, I am sure. What may surprise you, though, is that this firm had to employ over 200 full-time staff whose sole job was to fix the data errors that occurred every night in that ETL process. These errors come from three sources:

  1. Human error in how the data is entered
  2. Differences in how each application stores the data in its internal data model (probably no two systems record a customer in exactly the same way, and the need to translate between them results in either errors or loss of fidelity in the data)
  3. Errors in the ETL process itself
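The second source in the list above, loss of fidelity when translating between data models, can be sketched with a simple round-trip check in Python. Both data models here are hypothetical: the source keeps separate name fields, while the assumed target system stores a single upper-case NAME capped at 12 characters and has no middle initial. The field names and the cap are invented for the example.

```python
def to_target(rec: dict) -> dict:
    """Translate into the hypothetical target model: one NAME field,
    upper-cased and truncated to 12 characters, with no middle initial."""
    full = f"{rec['first']} {rec['last']}"
    return {"NAME": full[:12].upper(), "CITY": rec["city"].upper()}

def from_target(row: dict) -> dict:
    """Translate back, recovering what the target model still holds."""
    first, _, last = row["NAME"].partition(" ")
    return {"first": first.title(), "middle": "",
            "last": last.title(), "city": row["CITY"].title()}

def lost_fields(rec: dict) -> list:
    """Return the fields that do not survive a round trip through the target."""
    back = from_target(to_target(rec))
    return sorted(k for k in rec if rec[k] != back.get(k))

if __name__ == "__main__":
    # Hypothetical customer records.
    print(lost_fields({"first": "Paul", "middle": "K",
                       "last": "Michaud", "city": "Austin"}))
    print(lost_fields({"first": "Alexander", "middle": "",
                       "last": "Michaud", "city": "Austin"}))
```

In the first record only the middle initial is lost; in the second the surname is truncated. Neither translation raises an error, which is part of why problems like these tend to surface only when someone compares the systems afterwards.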

The bottom line is that large enterprises often have huge numbers of errors in their data, and if we don't correct this problem at the source, all the fancy analysis tools in the world can only do so much. At their core, those tools all have to assume that the data they are applied to is basically good data. Unfortunately, this is often not the case.

Watch for the second part of this post to be published in the next day or so.

As always you can reach me through Twitter, LinkedIn, by using the contact links in the author box or here through the website.

Paul Michaud

Paul Michaud is a co-founder and CEO of Nebility, an enterprise solutions company. Paul has been designing and building some of the world’s largest, most scalable and highest-performing applications for over 25 years. Immediately prior to Nebility, Paul was Global Executive IT Architect for Financial Services at IBM. To learn more about Paul, check him out on LinkedIn.

© 2011 Nebility Inc. All Rights Reserved