Is anyone out there a fan of the new television series “Person of Interest”? In that show, a computer genius gathers big data from multiple sources (traffic cameras, crime reports, facial recognition, customs, etc.) to determine who is about to commit a crime *and* to stop them from committing it. I’ve been meaning to write about the show for a while, since the data management challenges are rampant. Then New York announced it is actually trying to do it. Wow.
Short run-down from the newspaper article on Slashdot:
The Domain Awareness System will draw data from 911 calls, previous crime reports, license-plate readers, law-enforcement databases, environmental sensors, and roughly 3,000 closed-circuit cameras. It will rely on the New York City Wireless Network (NYCWiN), a high-speed wireless broadband infrastructure that allows city agencies to rapidly transmit data and that is used for everything from emergency response to meter reading.
I love this snippet, because it highlights the approach many are taking to big data.
Challenge: How do I analyze large volumes of data from many different sources? Sources that were never intended to work together, defined and maintained by organizations that never intended to work together (utility meters and license plate readers, anyone?).
Solution: Make it possible to move (and store) those large amounts of data.
Storage and movement are definitely required. No issues there. But New York is going to be most unimpressed with its return on investment if it ignores the following key requirements:
- Comprehensive information modeling across multiple sources. These disparate sources were never designed to come together in one great analytics whoosh. How does the customer information map to the tenant information (utility meters)? And how does that compare to the license plate owner (corporate information in the DMV) and the driver of the vehicle?
- Metadata definitions. Which metadata will the closed circuit cameras log? Which elements will be searchable? How will that metadata map to other systems’ metadata (license plate readers, for example)?
- Ownership. Love this one. The owner of “citizen” (the customer in the public sector) can’t effect change in any of the subscribing systems. The utility company, for example, isn’t going to change the definition of customer/tenant to conform with previous crime reports, which means that the data owner must instead understand all of the data elements, exactly how they’ll be transformed (and check the accuracy of that transformation), and how the elements from all of the different sources are aggregated. Big job. You should be catching a whiff of information governance here.
- Quality. First, let’s make the very risky assumption that each subscribing agency is cleaning its own data. (Let’s hope there’s an SLA in place to ensure that.) The written crime report could say 205 Avenue of the Americas, and the license plate could be registered to 205 Sixth Avenue. Both are correct. Should those roll up to the same citizen? (Yes, but you wouldn’t know that by looking at the record.) Are L. Eric Johnson, Lawrence E. Jonson, Larry Johnson, and Eric Johnson the same person? Can we rely on their address for verification? (Not with this wide swath of historical data and the relocation frequency.)
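To make the information-modeling and metadata-mapping points concrete, here’s a minimal Python sketch of a field crosswalk that projects records from two sources onto one shared “citizen” model. Every schema, field name, and record in it is hypothetical; real agency schemas will differ, and a real crosswalk would live in a metadata repository, not in code.

```python
# Hypothetical raw records from two sources that never planned to meet.
utility_record = {"tenant_name": "L. Eric Johnson",
                  "service_addr": "205 Sixth Avenue"}
dmv_record = {"registered_owner": "JOHNSON, LARRY",
              "plate": "ABC1234",
              "owner_addr": "205 Avenue of the Americas"}

# Crosswalk: each source's fields mapped onto one shared "citizen" model.
# (Illustrative assumption -- the actual field names are invented here.)
CROSSWALK = {
    "utility": {"tenant_name": "full_name", "service_addr": "address"},
    "dmv": {"registered_owner": "full_name", "owner_addr": "address"},
}

def to_citizen(source: str, record: dict) -> dict:
    """Project a source-specific record onto the shared citizen model,
    keeping unmapped fields under a source-qualified name for traceability."""
    mapping = CROSSWALK[source]
    citizen = {"_source": source}
    for field, value in record.items():
        citizen[mapping.get(field, f"{source}.{field}")] = value
    return citizen

print(to_citizen("utility", utility_record))
print(to_citizen("dmv", dmv_record))
```

Note what the sketch deliberately leaves unsolved: the two `full_name` values still don’t agree, which is exactly the quality problem the owner of “citizen” inherits.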
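The quality bullet can also be sketched in a few lines of Python, using only the standard library. This is a toy illustration of the matching problem, not production entity resolution: the alias table is a single invented entry, and real systems use trained matchers, nickname tables (Larry ~ Lawrence), and survivorship rules.

```python
import difflib

# Illustrative alias table (assumption): real address standardization
# relies on full reference data, not a hand-maintained dict.
STREET_ALIASES = {"avenue of the americas": "sixth avenue"}

def normalize_address(addr: str) -> str:
    """Lowercase and rewrite known street aliases to a canonical form."""
    addr = addr.lower().strip()
    for alias, canonical in STREET_ALIASES.items():
        addr = addr.replace(alias, canonical)
    return addr

def same_address(a: str, b: str) -> bool:
    return normalize_address(a) == normalize_address(b)

def name_similarity(a: str, b: str) -> float:
    """Crude word-order-insensitive similarity score between 0 and 1."""
    ta = " ".join(sorted(a.lower().replace(".", "").split()))
    tb = " ".join(sorted(b.lower().replace(".", "").split()))
    return difflib.SequenceMatcher(None, ta, tb).ratio()

print(same_address("205 Avenue of the Americas", "205 Sixth Avenue"))  # True
print(round(name_similarity("L. Eric Johnson", "Eric Johnson"), 2))
```

The two addresses from the example above now roll up to the same citizen, but the name comparison only yields a score; deciding where to set the match threshold (and who owns that decision) is a governance question, not a coding one.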
What a massive coordination job! Let me throw one last fly in the soup: collection methods and human incentives. What’s the incentive for the 911 operator? To determine the scope of the emergency and dispatch help. They’re not rewarded for capturing the full legal name of a caller or victim, nor for capturing exactly where the caller lives; what they need is the scene of the crime.
How does this relate to you, a business professional? How is your company attacking big predictive analytics projects? Are you addressing the breadth of challenges listed above? If not, your big data challenge will remain just that: a challenge. You may end up with a report, but the report can’t spur real insight unless you follow classic enterprise information management (EIM) best practices.