Data capture. This reblog has a fantastic video explaining the extreme detail of massive data collection: low cost for a high Return On Investment. This data set can be fed into a Data Lake to find signal in the noise, such as ways to relieve traffic and flooding. The Data Lake can be tapped for more innovation as well.
Posts Tagged → data lake
Elephant(s) In the Room – Big Data Elephant Conservation Info Graphic
Infographic for elephant conservation utilizing Big Data. Follow-up to Elephant(s) in the Room – Just How Many Are There?
Does your Doctor Know It’s Safe to Take That? Big Data Replies
Friday’s post contended that the scientific method has many holes in its application. Ben Goldacre’s “Battling Bad Science” TED Talk explains one facet of this concept.
Big Data addresses:
Why should the ancient practice of the scientific method be questioned?
AUTHORITY.
As individuals in society, we hold others in regard for accomplishments that give them authority, such as a doctor for a medical degree. With the internet at our fingertips we have gained access to ever-greater amounts of information and learned some skepticism, but we still retain some sheep mentality.
Goldacre points out that we still retain an awe for authority. With a simple example, he explains how an authority can be accepted by a large, popular audience when that authority is actually less than ideal.
With the ubiquity of the internet, authority will only continue to be an issue for any organization or for society at large. Big Data is more of an open source platform, one that involves creating data lakes. These currently draw from the data silos of an organization or, in the case of drug efficacy, corporate secrets.
“SCIENTIFIC” STUDIES
Goldacre expounds upon how cause and effect studies are “published” with basic flaws in even the simplest cases. The testing environment does not accurately, or sometimes even remotely, simulate the conditions behind the results touted. In addition, the plethora of factors involved is rarely accounted for. The test sample sets are representative of general or specific populations, but are these representative of YOU?
Because Big Data is able to consume a vast variety of data, not adhering to the strict control methods of the traditional scientific method frees the data to more readily present a viable pattern. Trying to hold all other variables constant in a scientific experiment is challenging at best and completely unrealistic at worst. (In real life, you can’t hold all of a scientific experiment’s environmental factors constant to obtain the same favorable results.)
OUTCOMES
Goldacre somberly explains then that these simple examples are just that – simple. Drug studies that are the basis of doctors’ “knowledge” of treating YOU and society are based upon far more complex … and jaded processes.
Our beliefs and expectations of a drug’s efficacy shape the outcome. He gives several examples of how data is effectively rigged to produce a carefully prepared outcome, making the result look … like what they want you to see.
One of the premises of Big Data is finding patterns in the data, not looking to prove or disprove a theory. Therefore, trying to rig an outcome in one direction or the other is not a Big Data practice.
(…so would a drug company ever want to use it?)
MISSING DATA
Goldacre’s final, sobering point was actually the jumping-off point for his next TED Talk on how drug trials have dangerously biased results.
Missing data is one of the greater challenges to Big Data execution. Several methods are in practice to compensate for gaps such as null values or incongruous data sets. The difference with Big Data is that it readily addresses missing data, as opposed to discounting it as in the examples Ben Goldacre explains. Because Big Data involves huge volumes of data points, the missing data compensation practices more readily present an accurate representation of the information.
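To make the idea of compensating for gaps concrete, here is a minimal sketch in Python. It is only an illustration, not a method from Goldacre's talk or from any particular Big Data tool: the readings are made up, the function names are hypothetical, and mean imputation is just one of the simplest compensation practices. It contrasts discarding missing values with filling them in.

```python
from statistics import mean

# Hypothetical readings (made-up values); None marks a missing data point.
readings = [4.0, None, 6.0, 8.0, None, 2.0]


def drop_missing(values):
    """Discount the gaps: keep only the observed values."""
    return [v for v in values if v is not None]


def impute_mean(values):
    """Address the gaps: fill each missing value with the mean of the observed ones."""
    observed = drop_missing(values)
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


dropped = drop_missing(readings)   # shorter list, gaps thrown away
imputed = impute_mean(readings)    # same length as the input, gaps filled

print(len(dropped), "values kept after dropping")
print(len(imputed), "values kept after imputing")
```

Dropping shrinks the data set, while imputing keeps every record in place. With only a handful of points the filled-in values are crude guesses; the argument above is that such estimates become more trustworthy as the volume of observed data grows.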
What is Data Munging?
A definition and discussion of data munging.
Big Data Implementation in the Middle
Digital age techies come from an open source world. Middle and top managers are used to silos. Addressing this difference is imperative to Big Data implementation.