I took a stab at a discussion of big data in a recent note (see: Big Data: The Hottest New Thing in Computing Attracts Big Money). It was largely about one of the popular database products but didn't go into much detail about what constitutes big data or why one needs to be concerned about it. A recent article helps to close this gap (see: If 'Big Data' Simply Meant Lots of Data, We Would Call It 'Lots of Data'). An excerpt appears below; it's a longish article, so follow the link for a deeper dive.
It’s official: Big Data is the flavor of the month... According to McKinsey Global Institute, “Big Data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.” IT guru Buck Woody says, “Big data is the data that you aren’t able to process and use quickly enough with the technology you have now.” But Big Data is more than “lots of data.” It goes beyond quantity and speaks to the challenges of velocity and diversity. In today’s age of digital media and channels, vast amounts and types of data are coming at us fast, and from every online and offline direction. Big Data refers to the ability (or inability) to effectively deal with all these aspects of consumer information...
Substantial changes in technology have given us the opportunity to process the mounds of data we are generating and actually do something with it... Twenty-five years ago, most data systems processed flat files in mainframe environments... The relational database represented a fundamental shift from the mainframe, storing data more effectively and efficiently by “remembering,” while a mainframe “processes and forgets.” Now, these relational databases have gotten even more sophisticated. As the internet created new industries..., companies began dealing with data that’s not only real-time but also unstructured in nature. Consider Google, whose “data” is effectively the internet – they basically download and index the internet as a business. This data change forced new technology advancements and caused a paradigm shift in data management. Enter things like NoSQL... environments to the mix, and suddenly, we’re in the era of Big Data. But through the evolution of mainframe to database to NoSQL, have we really created competitive advantage for companies?...
The digitization of – well, everything – is creating “data” at an unprecedented rate. And data is only valuable when it can be converted to information, understood, and rationalized. With Big Data, the vast majority of the data is “information poor” (i.e., worthless). As the data grows and changes, how has analytics evolved to find meaning in a growing sea of data noise? Twenty years ago, we used predictive modeling to forecast direct mail responsiveness. Back then, we had to significantly “sample down” the data so our mainframes and desktop computers could process statistical regression models in a timely manner. But with today’s advancements in technology, our ability to process all the data has changed dramatically. I believe the first main difference with Big Data is that we no longer have to rely on sampling to determine the likely outcome of a population. Today’s technology is vast enough to process all the data we couldn’t study before.
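The sampling point is worth making concrete. Below is a minimal sketch in Python (using NumPy and scikit-learn) of the two workflows the author contrasts: fitting a response model on a small sample, as the mainframe era required, versus fitting it on the entire population. The population size, predictors, and response signal are all invented for illustration; nothing here comes from the quoted article.

```python
# Sketch: sampled-down regression vs. full-population regression.
# All data here is synthetic; the "direct mail response" framing is
# borrowed from the excerpt, but the model and sizes are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical "full population": one million prospects, three predictors,
# with response probability driven by a known linear signal.
N = 1_000_000
X = rng.normal(size=(N, 3))
logits = X @ np.array([0.8, -0.5, 0.3]) - 2.0
y = rng.random(N) < 1.0 / (1.0 + np.exp(-logits))

# The old constraint: "sample down" so the regression runs in a timely manner.
sample = rng.choice(N, size=5_000, replace=False)
model_sampled = LogisticRegression().fit(X[sample], y[sample])

# The claimed new normal: fit on the entire population at once.
model_full = LogisticRegression().fit(X, y)

print("coefficients, 5K sample :", model_sampled.coef_.round(2))
print("coefficients, full 1M   :", model_full.coef_.round(2))
```

On a well-behaved signal like this one, the two coefficient sets agree closely, which was always the statistical justification for sampling. The excerpt's argument is that modern hardware removes even that compromise: rare segments that a 5,000-record sample would miss entirely remain visible in the full-population fit.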
So, what's the relevance of all of this for healthcare? Well, for one thing, hospitals sit on a mountain of data. The labs churn out much of it. One might speculate that healthcare would provide a golden opportunity to test some of the basic concepts in this emerging field. However, David Kibbe and Vince Kuraitis suggest in a blog note that this would be premature (see: The Power of Small). Another article continued the discussion (see: Think Small Data Before Big Data, Healthcare Gurus Argue). Below is an excerpt from the latter article:
Kibbe and Kuraitis assert that it's premature and unnecessary to use big data in patient care. Instead, they maintain, providers should use "relatively low tech, high touch, data-driven interventions" to improve care management. They suggest establishing electronic registries to identify and manage high-risk, high-cost patients. Electronic health records certified for Meaningful Use Stage 1, they note, can generate clinical summaries known as continuity of care documents (CCDs) that providers can use to export data to those registries... "Simple data mining" of registries, they point out, has been used successfully to do the following (see the query sketch after this list):
- Identify patients who have multiple diagnoses and need nurse case management;
- Report physician-specific patient benchmarks for people with diabetes, such as patients meeting targets for blood pressure and LDL cholesterol;
- Find patients with specific ER visits and assign them care managers; and
- Recognize preventive care gaps for diabetic patients during visits and assign a care manager to make sure those gaps are filled.
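Since the "simple data mining" described above amounts to straightforward registry queries, here is a minimal sketch in Python with SQLite of what the first two bullets might look like in practice. The schema, patient rows, ICD-10 codes, and the LDL threshold are all hypothetical stand-ins of my own, not taken from the article or from any real registry fed by CCD exports.

```python
# A toy patient registry in SQLite, illustrating the kind of "simple data
# mining" the excerpt describes. Tables, columns, sample patients, and
# clinical thresholds are invented for illustration only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients   (id INTEGER PRIMARY KEY, name TEXT, case_manager TEXT);
CREATE TABLE diagnoses  (patient_id INTEGER, icd_code TEXT);
CREATE TABLE lab_results(patient_id INTEGER, test TEXT, value REAL);

INSERT INTO patients  VALUES (1,'Adams',NULL),(2,'Baker','RN Jones'),(3,'Cruz',NULL);
INSERT INTO diagnoses VALUES (1,'E11.9'),(1,'I10'),(2,'E11.9'),(3,'J45.909');
INSERT INTO lab_results VALUES (1,'LDL',162),(2,'LDL',95),(1,'SBP',150),(2,'SBP',124);
""")

# Query 1: patients with multiple diagnoses and no assigned case manager --
# candidates for nurse case management, per the first bullet above.
multi_dx = conn.execute("""
    SELECT p.id, p.name, COUNT(d.icd_code) AS dx_count
    FROM patients p JOIN diagnoses d ON d.patient_id = p.id
    WHERE p.case_manager IS NULL
    GROUP BY p.id HAVING dx_count >= 2
""").fetchall()
print("Needs case management:", multi_dx)

# Query 2: diabetic patients (ICD-10 E11.* here) missing an LDL target
# (<100 mg/dL is an assumed threshold), per the second bullet above.
ldl_gaps = conn.execute("""
    SELECT DISTINCT p.id, p.name, l.value AS ldl
    FROM patients p
    JOIN diagnoses   d ON d.patient_id = p.id AND d.icd_code LIKE 'E11%'
    JOIN lab_results l ON l.patient_id = p.id AND l.test = 'LDL'
    WHERE l.value >= 100
""").fetchall()
print("Diabetics above LDL target:", ldl_gaps)
```

The point Kibbe and Kuraitis are making is that queries at this scale, run against a registry a practice already controls, can deliver real care-management value without any of the distributed-computing machinery that "big data" implies.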