8 Best application examples for blockchain in the US Navy (or your organization) – Part 3

Part 3

Expanding Operational: Blockchain Deployments for Impact



In Part 1, we explored the building blocks of blockchain: bitcoin and smart contracts. These top-level basics quickly make more complex operations possible. Applied step by step, blockchain is already making progress in today’s industries.

In Part 2, we began moving from tactical to operational. The tactical use of bitcoin and smart contracts for stand-alone functions in test and evaluation morphed into the operational level, with the isolated applications pulled into a third dimension, kinda like the third semester of calculus.

In this Part 3, we move further into operational with more complexity and subsequently a greater demand for coordination of resources.  Using these novel concepts also further intertwines cultural change both internal and external to the organization. Instead of modifying or enhancing current business practices, blockchain replaces the process entirely.

Scary? Because replacing a current practice requires extensive planning and considerable disruption to the business process, the effort must exact a significant return on investment. So, let’s start with a strong and somewhat clean candidate for substituting a process entirely.  

NO Sugar Coating

Blockchain can eliminate travel claims. Travel claims are a huge administrative burden to any organization and the Navy is no exception. The present digitized paper process, although cumbersome, has been necessary because travel claims historically have been riddled with fraud. A significant check-and-balance system has been necessary not only to counter the financial risk but also to preserve integrity in the use of government funds.

The essence of blockchain is trust and the point of a travel claim has been verifying trust in a complicated (but not complex) process determining whether travel costs are true to the mission, in line with the command operations, and in adherence to multiple legal rules and guidelines.

By integrating smart contracts for mission validation and order generation, a blockchain solution ensures that individual travel arrangements are ticketed only if they follow the smart contract requirements. A traveler can’t make a first-class airline reservation to the Caribbean unless the orders include that provision. The traveler can’t accidentally book a rental car in Bangor, Maine, when he or she should be in Bangor, Washington. They can’t book a hotel that exceeds the maximum lodging rate, again unless the orders permit such exceptions. Although the user-unfriendly Defense Travel System (DTS) flags such transgressions, it does so in a cryptic procedure that still requires verification in both the creation and execution of the process, adding administrative burden as well as risk – to the traveler, the authorizing official and subsequently to the organization.
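To make the idea concrete, here is a minimal sketch in Python of the rule check described above. It is not a real smart contract platform or the DTS interface; the names TravelOrders, Booking, violations and can_ticket, and the fields they carry, are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TravelOrders:
    """Hypothetical orders: the rules a booking must satisfy."""
    destination: str                 # e.g. "Bangor, WA"
    max_lodging_rate: float          # nightly cap in dollars
    allowed_fare_classes: frozenset  # e.g. frozenset({"economy"})


@dataclass(frozen=True)
class Booking:
    """Hypothetical booking request: "air", "hotel", or "car"."""
    kind: str
    location: str
    fare_class: str = ""
    nightly_rate: float = 0.0


def violations(orders: TravelOrders, booking: Booking) -> list:
    """Return every rule the booking breaks; an empty list means it may be ticketed."""
    problems = []
    if booking.location != orders.destination:
        problems.append(
            f"{booking.kind} booked in {booking.location}, orders say {orders.destination}"
        )
    if booking.kind == "air" and booking.fare_class not in orders.allowed_fare_classes:
        problems.append(f"fare class '{booking.fare_class}' is not in the orders")
    if booking.kind == "hotel" and booking.nightly_rate > orders.max_lodging_rate:
        problems.append(
            f"rate {booking.nightly_rate:.2f} exceeds cap {orders.max_lodging_rate:.2f}"
        )
    return problems


def can_ticket(orders: TravelOrders, booking: Booking) -> bool:
    """Ticket only when no rule is violated."""
    return not violations(orders, booking)
```

With orders for Bangor, WA and a lodging cap of 150, a rental car requested in Bangor, ME would be refused before it is ever ticketed, which is the point: the check happens at booking time, not in a claim filed afterward.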

Blockchain ledgers reside on several distributed processing nodes, which miners use to validate entries. A complete copy of the database therefore exists on each node. This makes it very difficult for anyone to misuse the technology for fraudulent purposes. A person would need to fool most of the nodes in the system, not just one, to create a fraudulent entry.
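A rough Python sketch shows why a copy on every node makes tampering visible. It is a toy hash chain, not Bitcoin or any production ledger: there is no mining and no network, and the function names (block_hash, append_block, is_valid, majority_tip) are invented for this example. Each block stores the hash of the one before it, so changing an old entry breaks that copy, and the other nodes still agree with each other.

```python
import hashlib
import json
from collections import Counter

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, payload, prev_hash):
    """SHA-256 over the block contents, including the previous block's hash."""
    body = json.dumps({"index": index, "payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Add a transaction to the end of a ledger (a plain list of dicts)."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append({
        "index": index,
        "payload": payload,
        "prev": prev_hash,
        "hash": block_hash(index, payload, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash; any edited block or broken link fails the check."""
    prev_hash = GENESIS
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, block["payload"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


def majority_tip(node_copies):
    """Return the latest hash that more than half of all nodes hold in a valid copy."""
    tips = Counter(c[-1]["hash"] for c in node_copies if c and is_valid(c))
    if not tips:
        return None
    tip, votes = tips.most_common(1)[0]
    return tip if votes > len(node_copies) / 2 else None
```

If one of three nodes quietly edits the amount on an old hotel charge, is_valid fails for that copy while the other two still share the same tip, so the honest record wins.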



Furthermore, changing travel arrangements, even to save the overall cost of the mission, requires significant staffing of command personnel as well as a 24/7 help desk.  Resolving those changes works well sometimes and not other times, making the process clumsy and flex-deterrent. Travelers avoid modifications because the process often doesn’t cooperate and changes cause ambiguities in cost accountability, shifting the risk to the traveler. It’s safer for the traveler, but more expensive for the government, to stick to the original itinerary.


With smart contracts, the travel payment and former claim process actually execute simultaneously in real time as travel occurs. There is no after-the-fact report, which is what today’s travel claim is. When the traveler boards a plane, the transaction is verified and paid. When the traveler checks into the hotel, the night’s stay is paid, and the next, and the next until the traveler leaves. The metro ticket or Uber ride is verified – and paid – as it happens. Per diem clocks in at midnight every day. Per diem might morph into per minuta (prima/secunda) more relevantly. Each transaction is a block – communicated and verified as it happens.


The immediate exchange is possible because accountability is pervasive and simultaneous. The command, the travel authority, and the financial auditing are all part of the distributed network. All receive identical copies that cannot be altered or corrupted as the traveler progresses. The smart contracts are created to execute only with valid transactions. By definition, all costs are incurred and audited in situ – as they happen. Travel claims are not necessary because the transaction cannot happen without valid quid pro quo. Get it?

Smart contracts also provide detailed record keeping on a Big Data level. Because the transactions are distributed to several parties, each one monitors for transactions that are out of context. This is more efficient than verifying each travel claim: individual anomalies are detected and resolved more readily, and the anomaly data feeds back into the system as a whole.


Pay Off

The Defense Travel System (DTS) is basically a digitized paper process, enhanced with the ability to flag certain items and complete select transactions such as airline tickets, hotel reservations and rental cars (most of the time). A blockchain smart contract is a true digital process inherently built with trust to facilitate transactions without undue verification. Smart contracts would understand cost trade-offs without manning redundant staffs.

APPLICATION 5: substitute the travel claim process with travel order smart contracts


Replacing a digitized paper process with a digital system is a foundation for operational blockchain applications.  So let’s pick another example.

Pass the Test

Physical fitness is and always will be a personal measurement.  No one can be your fitness for you; it’s a bank account only you create through deposits and withdrawals.  However, it no longer needs to be a command function. Like most standardized tests, the Navy’s Physical Fitness Assessment (PFA) doesn’t measure fitness; it measures the ability to take the test. Blockchain can eliminate the administrative burden of physical fitness assessments currently required of each command by replacing them with continuous monitoring and smart contracts.

To understand the solution, let’s first look into the natural stasis of physical fitness testing within the Navy lens.  Personal physical fitness – and the test thereof – falls into three categories.


The first group – hopefully the largest within the Navy – already routinely exercises without monitoring or testing, often far exceeding any written instructions. Whether they hit the gym three times a week or hit the trail every day or train for triathlons or all of the above, they just do it.  Working out doesn’t have to make sense or be convenient, these folks know it feels good and it is good. They don’t need an instruction or direction, let alone a minimum test.

The second group does not have any workout regimen, yet they appear twice a year to pass the current fitness testing at whatever competency level. This “3 mile club” demonstrates that testing does not measure fitness so much as underline the administrative burden it takes to execute the command physical fitness assessment. They naturally pass the minimum standard and do not need training or workouts. They do not need further monitoring or assistance unless they begin missing the mark.

The third group does not meet the minimum requirements. However much or little they work out, these folks could go in either direction. Not everyone has the natural ability to pass like the second group, but the patterns of the first group’s regimen can be learned. Instead of spending time testing everyone, the attention can go to supporting the individuals who need help most, potentially by learning from those exceeding the bar.


The Minimum

One of the challenges of standards testing is the minimum requirement itself. The bar is ostensibly set to ensure that Navy personnel have the physical capability to thrive in combat during arduous duty. Historically, the need for physical capability has fluctuated greatly. Even within the lens of today’s standards, the Navy is bounded by the overall physical fitness of the recruiting population, which is famously becoming less fit and more overweight.

Within the Navy, too, the physical demands of a job vary from community to community. The pilot flying high-performance aircraft requires greater physical capability than the human resources officer ensuring the mission continues on the ground.  The combat corpsman needs to be in better shape than the submariner.

The Rest of the Story

At the end of the day, the bar is set not so much to ensure physical fitness as to meet the variety of goals required for the Navy’s overall mission. End strength – the overall numbers in uniform – and Fit & Fill – the right skill sets sitting in the appropriate job – are highly challenging tasks even without any friction. PFA testing has often been used for force shaping – the tool to manage end strength and fit & fill. Thus the bar rises during times of economic downturn and reduced budgets in order to pare down numbers. The bar settles downward to retain Sailors in less austere times.

The Navy will grant a clean slate to nearly 50,000 sailors with fitness failures in their records, part of new shakeup for fleet-wide fitness rules announced Thursday.

So what replaces “testing”?

Blockchain validates a transaction, and for the PFA, a smart contract is fulfilled through individual accomplishment. That data aggregates into a Navy-wide physical fitness measurement. Whereas a standardized test measures the ability to pass a test at a given level, flipping that idea means recording actual fitness participation and determining fitness from the data. The smart contract fulfills the testing requirement, but the real value is the Big Data capture. One more time – knowing how fit the Sailors are is far more valuable than passing any test. Time and policy have proven the test is variable. If followed effectively, this methodology actually removes the need for a test.



What Does it Look Like?

Implementation would start with a morph. The first group is the model. Their individual workouts fulfill the requirements of physical fitness for the organization day after day. For this group, the smart contract obligations are integrally and continuously verified. For the second group, the 3 mile club makes a trip to the gym for specific measurements at a periodicity to fulfill the obligation, like an inoculation that has to be fulfilled. Finally, the third group gets flagged immediately, which provides the quality attention for establishing the routines of the first group.
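As a rough sketch of that morph, the three groups could be sorted from logged data along these lines. This is Python with made-up names and a made-up threshold (FitnessRecord, REGULAR_WORKOUTS_PER_WEEK, classify); none of it comes from Navy instructions, and a real standard would be set by policy.

```python
from dataclasses import dataclass, field

# Hypothetical threshold for "routinely exercises"; not an official Navy figure.
REGULAR_WORKOUTS_PER_WEEK = 3


@dataclass
class FitnessRecord:
    """Illustrative record of one Sailor's logged activity."""
    name: str
    weekly_workouts: list = field(default_factory=list)  # workouts logged in each recent week
    meets_minimum: bool = True                            # latest measurement at or above the bar


def classify(record: FitnessRecord) -> str:
    """Sort a record into one of the three groups described above."""
    if not record.meets_minimum:
        return "needs support"              # third group: flagged immediately
    weeks = record.weekly_workouts
    if weeks and min(weeks) >= REGULAR_WORKOUTS_PER_WEEK:
        return "continuously verified"      # first group: workouts fulfill the contract
    return "periodic measurement"           # second group: passes, measured on a schedule
```

The design choice worth noticing is the order of the checks: anyone below the minimum is flagged first, regardless of how often they work out, so attention goes where it is needed.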

Eventually, the fitness assessment would be seamless, ubiquitous, and transparent. Like your phone knows where you are, the Navy would know fitness as a whole and as individuals. The notion of twice-a-year testing is bound by the discrete, paper limitations in the box of analog thinking. Today’s Sailors are not draftees. The all-volunteer force is increasingly made up of millennials, born into a connected, continuous world. Making a digital process – not digitizing the current one – is what serves them as well as the Navy.

APPLICATION 6: substitute the Command Physical Fitness Assessment test with personal continuous fitness smart contracts


Next up: Part 4

Ain’t Talkin’ ‘Bout Love and The Edge of the Internet of Things



I’ve been to the edge

And there I stood and looked down

You know I lost a lot of friends there, baby

I got no time to mess around.

– Van Halen, Ain’t Talkin’ ‘Bout Love


Ah … nothing like 1978 classic rock lyrics about getting/giving an STD to start off discussion on emerging technology.

The Edge, or Edge computing, is an important tenet in understanding the Internet of Things (IoT). Whereas you may never reach the end of the internet, you can actually see the edge of the Internet of Things. “Things” or sensors exist everywhere – anywhere in a process or across multiple processes – but at some point there is an end of the line. From The Edge, sensors stand and look down … and out and around at the physical world. Because they are at the Edge, these nodes can be at the furthest extent from the people and processes that are interested in the information, or they could be right there where they are needed. Thus this can be where you gain or lose the best data (friends). This is where measurements are real time… when there’s “no time to mess around”. The Edge is also a current topic of discussion in security: keeping the systems and processes free from disease and evildoers. The Edge is an important feature in building and utilizing IoT.


For Example

To explain how the Edge works, let’s go back to an old school intrusion detection system and smoke/fire alarm (I used this in my IoT Connections blog post.) Pre-Internet, intrusion detection or smoke/fire alarm systems had various sensors hard-wired into place to determine whether the desirable conditions (no fire or intruders) were being met. Smoke and fire devices triggered alarms for heat or chemical substances within a physical building. Burglar alarms were usually a circuit that, once broken, sounded the alarm. For both, the alarm could be sounds or lights noticed only by anyone on site, or it could be connected via land lines to other parties who could call emergency services.

When the sensors are triggered, the alarm sounds there at The Edge. Anyone in the building who is aware of the alarm understands the dangers and has the ability to make decisions from that information, such as calling the police or fire department or taking other actions such as evacuations or defense procedures. If the sensors are connected to a monitoring service, the responders are trained and ready to act appropriately. Perhaps the system may be able to notify the emergency services directly. That is Cloud decision space.

With the pervasiveness of the Internet and the autonomy of the design, you can easily understand the most preferable choice – an instantaneous, specific and desirable response to an emergent situation.  Thus the IoT has grown exponentially, leveraging the combination of ubiquitous sensors (active and passive, deliberate and advantageous) and omnipresent Cloud.  However, this popularity is changing.


Keeping it Local

So Why NOT Use the Cloud?

For the 15 years that IoT has been growing, it has crawled the Internet as a natural progression. Utilizing the ubiquity and ease of the Cloud made undeniable leverage of current operations and projected expansion. After all, so far the Cloud is an amorphous, expanding universe that has served our needs. We haven’t reached the end of the internet, so why not continue mining a perceived inexhaustible resource? However, recent developments have begun shifting the processing of the sensors back to The Edge for decision making. Four reasons have driven it back.


Cost. The cost of sensors continues to drop, and the capabilities of those sensors are increasing. Subsequently, more data with more fidelity is possible at multiple touch points. Processing ALL the sensors in the cloud incurs a resource tax. Simply, you can buy more sensors by saving the cost of connecting to the Internet. Or you can even more simply spend less money.


Security.  Proprietary or personal information is risked with exposure to The Cloud.  Keeping the information local to the sensors for decision at the source can be more effective as well as provide better security.  The Edge sensors still need safe-keeping but the damage control is more easily prevented or contained.


Design. Just because the Cloud is there doesn’t mean you need to use it. KISS. The Cloud doesn’t necessarily fulfill the mission of the system created. The Cloud is actually getting pretty crowded, and for now, at this point in time, keeping the game local may be in the best interest of the system. Also, the capability now exists to collect data from multiple sensors while giving interested parties different access for different needs.


Speed. “Instantaneous” is highly measurable now. Even the fraction of a second needed to send data off for computation and back may not be fast enough. For example, autonomous cars need Edge computing because safe driving decisions must be made too quickly for the data to zoom out to the Cloud and back.
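The speed argument boils down to simple arithmetic, sketched here in Python. The function name and all the millisecond figures are assumptions for illustration, not measurements from any real vehicle or network: a decision has a deadline, the Cloud adds a round trip on top of its compute time, and the sketch prefers The Edge whenever The Edge can meet the deadline.

```python
def choose_processing_site(deadline_ms, edge_compute_ms, cloud_compute_ms, round_trip_ms):
    """Pick where a sensor reading should be processed.

    Prefers the edge when it can meet the deadline (keeping data local, per the
    cost and security points above); falls back to the cloud only if the cloud's
    compute time plus network round trip still fits the deadline.
    """
    cloud_total_ms = cloud_compute_ms + round_trip_ms
    if edge_compute_ms <= deadline_ms:
        return "edge"
    if cloud_total_ms <= deadline_ms:
        return "cloud"
    return "neither"  # the deadline cannot be met; the system needs a redesign
```

For a braking decision with a 10 ms budget, a Cloud that computes in 1 ms but sits 80 ms away loses to an Edge processor that takes 5 ms. For a monthly report with a generous deadline, the slower Edge device can hand the work to the Cloud.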


Farm to Table

A current application of Edge computing is sensors planted with crops.  These nodes constantly provide feedback as to the soil’s properties, such as moisture content, mineral composition, and density.  The automated watering systems then deploy precise amounts meter by meter, not acre by acre, determined by real time monitoring.  Cost savings are realized in both water and fertilizer consumption.  The harvest is more bountiful and The Edge is more likely cheaper than utilizing a Cloud structure.
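Here is a small Python sketch of that meter-by-meter decision, made at the sensor rather than in the Cloud. The plot labels, the 30 percent moisture target, and the liters-per-point conversion are invented numbers for illustration; real agronomy would supply its own.

```python
def water_needed_liters(moisture_pct, target_pct, liters_per_pct_point):
    """Water required to bring one plot up to the target moisture; never negative."""
    deficit = max(0.0, target_pct - moisture_pct)
    return deficit * liters_per_pct_point


def plan_irrigation(readings, target_pct=30.0, liters_per_pct_point=0.5):
    """Map each plot's moisture reading to its own watering amount."""
    return {
        plot: water_needed_liters(moisture, target_pct, liters_per_pct_point)
        for plot, moisture in readings.items()
    }


def uniform_plan_total(readings, target_pct=30.0, liters_per_pct_point=0.5):
    """Total water if every plot is watered as much as the driest one (acre by acre)."""
    worst = max(plan_irrigation(readings, target_pct, liters_per_pct_point).values())
    return worst * len(readings)
```

Comparing sum(plan_irrigation(readings).values()) against uniform_plan_total(readings) for the same readings shows where the water savings come from: only the dry plots get watered.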

Ain’t Talkin’ ‘Bout Love

But The Edge is where sensors are beginning to do more of the heavy lifting of data processing and decision making – for now.  Technology will evolve and we will rock and roll with it. Be careful though because just as with The Cloud, you wouldn’t want to catch a virus or malicious attack any more than you would give one.  Always practice safe sensor deployment.  Like the 80s, all trends don’t die; they just come back around.


Previous post on IoT: IoT Connections

Next up:  PIoT vs IIoT


Big Data & Your Vote: Did Trump Change Your Mind?

Two summers before the 2016 US Presidential election, I was sitting around a bonfire in the wilds of Kenya, lingering in the peace that comes from spending the day amongst the extraordinary wildlife of safari (that’s ordinary for Africa). We were an intimate gathering of around 15 guests from all continents, and the conversation was friendly and centered on the day’s sightseeing. Eventually though it meandered into the typical conversational vernacular: who you are, where you are from and what you do.

There was an extended pause as we all gazed into the fire, reminiscing on elephants, lions, and miles of wildebeest trekking the Mara. I was dreamily wondering about the potential of the universe when an unexpected verbal volley shot across the flames.

“So what do you think about Donald Trump becoming President?” Like a grandmother’s awkward question about a pregnant member at the holiday family dinner table, heads turned and all eyes rolled toward me, the sole American representative.

I wish I could say I easily conjured an interesting and insightful and perhaps even clever reply to demonstrate my thorough comprehension of American politics but honestly the thought in my mind a year before primaries was “… Donald Trump is running for President?”

With Trump’s victory in the history books, it seems pretty obvious now, but until Election Day, Hillary Clinton seemed to be walking away with the prize. As for me on safari a year plus before the election, I had been buried deep in personal and professional malaise for several months. I hadn’t given any thought to the election; those games would begin without me. Obviously, other people – from around the world – were looking into United States politics. At the moment, they were looking at me.

My mind went through several iterations of prospective responses, but each was rejected for lack of intelligence or wit. I gave a public-affairs response along the lines that at this point, many people put their hat in the ring early for a variety of reasons. Internally I thought that Trump had a very clever plan to position himself for something more viable to his operations. He was stumping for a cause or a better deal.

Oddly enough, he ended up becoming President (which means the causes and deals are still a fait accompli).

Elections as the Founding Fathers saw them

Selecting the chief executive of the United States was a point of contention at the Constitutional Convention, which met in 1787 to revise the Articles of Confederation that had originally bound the thirteen states together. As much as the delegates wanted the government to be of the people, they had their doubts about how far they trusted the average person to make a good decision. They also contended with exactly who counted (women and minorities did not), and the less populous states wanted an equal say in electing the chief executive. The Federalist Papers argued the merits of the proposed Constitution, and No. 68 specifically argued how the President should be chosen. The compromise is Article II of the Constitution, which spells out the Electoral voting process.

The result is the popular vote tips the hand of the Electoral College. An interesting arrangement, it nonetheless has stood the test of time. Mostly, the people vote the country’s conscience, but on those times where it’s a little dicey, the safe stop comes into play.

Evolution of elections

This year was not the first time that the electoral vote did not match the popular vote. Some can recall the famous “hanging chad” recount in Florida in 2000 between Al Gore and the eventual President, George W. Bush. The Supreme Court had to step in on that one, as the very essence of how a vote is counted came into question.

More than a century before that was the 1876 centennial race between Rutherford B. Hayes and Samuel Tilden. This – the most contentious race in US history – was settled by the famous (or infamous according to some) Compromise of 1877, which removed the last federal troops from the southern states and ended the Reconstruction Era. The race also recorded the greatest voter turnout in US history.

The Information Age – This or That

Coming back to more recent elections, reaching out electronically via the Internet came into play for President Obama’s campaign, which is credited with the first victory utilizing social media. His team developed a virtual grassroots capability, breaking ground with the now common practice of A/B testing. The website experts at Optimizely derived the magic that would test six media options and four call-to-action buttons. These web page variations tested subtle differences to compare conversion rates.



The media was video or picture, with President Obama by himself or with family. The four choices for “call to action” buttons seem the same, but what is the difference?

The famous “Combination 11” won. 

The winning variation had a sign-up rate of 11.6%. The original page had a sign-up rate of 8.26%. That’s an improvement of 40.6% in sign-up rate. What does an improvement of 40.6% translate into?

Well, if you assume this improvement stayed roughly consistent through the rest of the campaign, then we can look at the total numbers at the end of the campaign and determine the difference this one experiment had. Roughly 10 million people signed up on the splash page during the campaign. If we hadn’t run this experiment and just stuck with the original page that number would be closer to 7,120,000 signups. That’s a difference of 2,880,000 email addresses.

Sending email to people who signed up on our splash page and asking them to volunteer typically converted 10% of them into volunteers. That means an additional 2,880,000 email addresses translated into 288,000 more volunteers.

Each email address that was submitted through our splash page ended up donating an average of $21 during the length of the campaign. The additional 2,880,000 email addresses on our email list translated into an additional $60 million in donations.

https://blog.optimizely.com/2010/11/29/how-obama-raised-60-million-by-running-a-simple-experiment/ (boldface added for emphasis)

Many of us wouldn’t consider 8.26% significantly less than 11.6%, but this simplification exemplifies the extrapolation of data capability when the numbers become Big.
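The arithmetic in the quoted passage is easy to retrace. The Python sketch below uses only the figures quoted above (10 million signups, a 40.6% improvement, 10% volunteer conversion, $21 average donation); the function names are mine. One caveat: computing the lift from the rounded rates of 8.26% and 11.6% gives roughly 40.4%, so the quoted 40.6% presumably came from unrounded numbers, and the sketch takes the quoted lift as an input rather than recomputing it.

```python
def relative_lift(baseline_rate, variant_rate):
    """Improvement of the variant over the baseline, as a fraction of the baseline."""
    return (variant_rate - baseline_rate) / baseline_rate


def campaign_impact(total_signups, lift, volunteer_rate, donation_per_signup):
    """Estimate what the winning page added, assuming the lift held all campaign."""
    baseline_signups = total_signups / (1 + lift)   # signups the original page would have drawn
    extra_signups = total_signups - baseline_signups
    return {
        "baseline_signups": baseline_signups,
        "extra_signups": extra_signups,
        "extra_volunteers": extra_signups * volunteer_rate,
        "extra_donations": extra_signups * donation_per_signup,
    }


# Figures as quoted in the passage above.
impact = campaign_impact(
    total_signups=10_000_000,
    lift=0.406,
    volunteer_rate=0.10,
    donation_per_signup=21,
)
```

The results land close to the rounded numbers in the quote: a little over 7.1 million baseline signups, about 2.9 million extra email addresses, and on the order of $60 million in additional donations. A gap of a few percentage points only looks small until it is multiplied by ten million visitors.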

One of the big takeaways that Optimizely upholds is that the “knowledgeable” staff putting together the website instinctively felt certain features would be the top performers. They were wrong. They learned to accept that data has results that “trump” their gut.

Lessons Learned

  1. Every visitor to your website is an opportunity. Take advantage of that opportunity through website optimization and A/B testing.
  2. Question assumptions. Everyone on the campaign loved the videos. All the videos ended up doing worse than all the images. We would have never known had we not questioned our assumptions.
  3. Experiment early and often. We ran this experiment in December of 2007 and reaped the benefits for the rest of the campaign. Because this first experiment proved to be so effective we continued to run dozens of experiments across the entire website throughout the campaign.


Speaking of Trumping

President-elect Trump was noted for a great deal of controversy, and one of his earlier campaign comments badmouthed Big Data; however, in the waning months of the run he turned to his son-in-law Jared Kushner to spin the magic of Big Data. (Trump has since appointed Jared Kushner Senior White House Advisor.) Jared went to a Texas-based marketing firm, Cambridge Analytica. Because the company was untested in political waters, instinct was less of an influence. They were using the data. Ah yes, Big Data.

In comparison, the Obama campaign’s 24 message combinations were small data, easily countable. The Trump messaging system incorporated 4,000 messages read by over 1.5 billion people. Whereas the Obama campaign converted traffic arriving at the campaign website, the Big Data of Trump crawled all of Twitter, Snapchat, Instagram, Pandora and more to pick just the right ad to influence … you.

Like traditional politics, if your mind was made up, no ad would likely change it, but for the swing votes, and states, and Electoral College members, just enough created the tipping point.

So what does this tell us about the US election process?

No matter that the population has gone from an estimated 2.5 million in the original thirteen colonies to over 320 million, each vote does count, and that vote can be influenced, even until the last minute. Although the Founding Fathers worked with a set of social issues and technical capabilities that are quite antiquated, the premise of popular vote backed by the Electoral College has proven evergreen. Through Supreme tests and unforeseen circumstances, the tremendous impact and significance of the Constitution continues the bow wave of global proportions.

Big Data Elections

So how does Big Data influence the well-established process? The first iteration is evidenced in the striking Trump victory. By creating over 4,000 messages, the Trump campaign influenced individuals based on what it gleaned from their interactions with friends, family, colleagues and strangers via a suite of online platforms. The targeting itself is not a new concept; the ability to learn detail about your preferences and match them to a more specific message for you is what is evolving.

Future capability will be even more surgical. A study of your patterns will provide a unique message only you will see, not just one in four thousand. It will be oh-so-compelling and it will be timely. Pandora kept repeating the same message about Obamacare to me, which makes me question the targeting capability. It’s a subject I’m not passionate about either way so I doubt its efficacy and wonder at its origin. Future Big Data political influence will know your personal call to action, what makes you click the button in its favor.

“Big Data, Who Should I Vote For?”

As always until technologically replaced, there’s an app for that. Whereas the campaigning is the push, an app is the pull part of Big Data. Today’s voting apps provide information about the candidates, which is good, but they don’t utilize Big Data. The Big Data Voting app first understands all your personal data profile: social media, work habits, employer and pay history, geotags of your life. The app crawls the political schema to search the candidates and party habits and messages, even determining where the rubber doesn’t meet the road (do they do what they say?).

The app doesn’t tell you for whom to vote, but it does profile your activity against the candidates: what specific messages and actions truly align with who you are and what you do.

The app is even more effective at the levels below the Presidential race. The app will better sort out the voting records of incumbents to let you know how their actions align with your profile proclivities. Digging down even further, it would provide a more robust picture of those local officials you might not have heard of before seeing the ballot this election. Positions like school board members and county officials and judges have tremendous impact on the tactical level of our lives, and yet we rarely have much information on their backgrounds or voting records. (Or we don’t take the time to search the various resources for that information.) The app would do the heavy lifting to provide a decision dashboard. It’s less emotion and more metrics.

Big Data Creates the Slate

The rhyme and reason for individuals to pursue an elected position is far from a perfect process. As even in the Founding Fathers’ day, the credentials for running for office have more to do with whom you know and how much financial capability you have than actual capability for executing the duties.

Further in the future, Big Data would start the election process by picking potential candidates out of the crowd. A Big Data candidate proposal process would cull the general populace to find individuals with demonstrated good character and decision capability.

Big Data isn’t Big Brother though. At the polls, it doesn’t vote for you. It rises above the emotion and motion of the crowd to provide a better understanding of what your habits and proclivities actually mean in political perspective.

End of Day

The prescience of the US Founding Fathers is almost unfathomable. The Founding Fathers took great care and incredible insight to envision a government that would live in perpetuity – through a maelstrom of unforeseeable technological, social and political changes.

Perhaps had they known all the challenges ahead, it would have presented an impossible undertaking. They didn’t even have electricity. Consider the cascading inventions, capabilities and perturbations from that singular effect. I’m glad they didn’t have any idea how many ways the world would change or how many times their work would be challenged; otherwise, the Founding Fathers may have just stayed home.

The morning of the 2016 election held no obvious portent of one of the most unforeseen comeback victories in the US. I made my coffee hurriedly that morning. Although my employer had no qualms about my being late, I still had to fit voting into the day’s routine. It wasn’t a significant weather day, no hailstorms or raging battles or extremist groups to deter turnout. I had to travel a torturous (joking) four blocks to reach my voting location. The only army there was volunteers (some paid) that made the process about as efficient as possible for an ad hoc group of strangers enforcing the rules. How easy is that!

I thought of that night in Kenya again, realizing how my single vote made a difference not just in my country. As small as the planet becomes in a Big Data world, all the more so every one counts.



2018 Guide to Big Data (5 Easy Concepts you need to know today)

Big Data innovations continue to drive business intelligence and integrate into everyday life. Whether you are an experienced data scientist or an aspiring one, whether you are in big business or a one-man shop, whether you are worried about your weight or what your government is doing – Big Data is a part of everyone’s future.

Big Data made a Big Difference in the biggest story of 2016 – the US Presidential election. Although President Trump had pooh-poohed the impact of Big Data during his initial campaign, he rallied a last minute expert team just months before the polls that just may have made the difference.

Using sophisticated analytics and digital targeting, President Trump’s technology strategy collected characteristics from online and offline sources to find potential voters. With over 4,000 finely tuned messages, a specific one was placed after assessing the potential voter’s Facebook, Pandora and Snapchat activity. Virtual grassroots at its finest.

Bringing Big Data to the people.  So, what is Big Data and what concepts do you need to know right now?

What is the big deal about Big Data?

Big Data is the collective term for the accumulation, processing and utilization of lots and lots (and lots) of data.  Big Data is huge quantities of data – Volume. Big Data is an array of types of data, from an equally diverse set of sources – Variety.  Big Data is collection and interpretation at ever-faster rates – Velocity. These are the “3 Vs” often referred to in discussion of Big Data.

Although humans have been collecting information about what they do and create since the beginning of recorded history, arguably somewhere in the Roman Empire, Big Data is the relatively recent capability to capture and process significantly larger and more robust data sets. Although computers began the data accumulation in the 1950s-70s, the phenomenon of Big Data took shape as recently as 2001, when analyst Doug Laney framed it in terms of the 3 Vs. What makes the BIG in Big Data is the exponential increase in the 3 Vs discussed. Here are a couple of examples.

The Big Picture

When the Sloan Digital Sky Survey began in 2000, its telescope in New Mexico collected more information in the first few weeks than had been amassed in the entire history of astronomy. By 2010, there were over 140 terabytes of information. That amount of information can now be collected every 5 days.

When scientists first decoded the human genome in 2003, it took them a decade of intensive work to sequence the three billion base pairs. Now a single facility could sequence that much DNA in a day. The cost of that processing went from $40 million to $5000.

The data you can store and process on your phone today in 24 hours probably exceeds all computer processing up through the 1970s. In 2005, a cell phone – without even a camera – had more processing power than NASA’s mission control during the Apollo flights that put men on the moon.

To understand why you need to know about Big Data, let’s start with The Fab Five.



#1 Not a Fad

In the past decade-plus, the 3 Vs of Big Data – Volume, Velocity and Variety – have gotten a lot of attention from techies, industry and the public. There’s even been a fourth V (Veracity) and a fifth V (Value) to further explore its opportunities and challenges. Like any popular uprising, the hype or substance of Big Data (depending on how you look at it) reached a certain level of attention before the naysayers began to cast the first predictions of it being a passing fad.

To some, Big Data melts into a crucible of technology slugs and ingots that are pedestrian and passing. But it’s not. The volume, velocity and variety of data available today, versus last year or ten years ago are not about to peak. Following the Second Law of Thermodynamics, its disorder only increases.


We are still on the flat part of the learning curve of what Big Data is and isn’t, and what it can and cannot do. Utilizing its capability has considerable challenges, ranging from how it is initially collected to how to get to its mined “gold” – prediction. The philosophic trellis supporting Big Data is complexity and chaotic systems. It’s tricky stuff that the best experts are still beginning to explore.

It’s all emerging technology with all the unsteady stumbling of a toddler. As its potential is only unfolding, the impact of Big Data is less like a popular novel and more like the Gutenberg Bible. The bell can’t be unrung; it is here to stay.

Business uses it. Government uses it. Non-government organizations and non-state actors – both beneficent and malevolent (terrorist) – use it. And you use it too.



#2 You’re Wearing It

Wearables continue to infiltrate everyday life. Right now, the obvious example is your mobile phone. Somewhere in 2014, the number of cell phone subscriptions rose to equal the world population. (Land lines in the US never made that ratio, peaking way back in 2000.)

Cell phones provide you with more and more capability that is also your identity. It’s not just contacts and email connectivity. It’s not just communication. It has your banking information. It has your pics and music and social media, all brimming over with the 3 Vs of data. It entertains you and provides you with convenience. Some argue it is also security. It tells you where you are as well, as it captures everywhere you have been.

Wearables have become increasingly popular as they connect to more robust medical applications – blood content, vital signs, respiration. Shoes have been designed to give directions to the blind. Socks can charge batteries with walking. These may seem like cool or awkward technologies but their implementation will break barriers in ways that aren’t obvious to the casual technology observer.

Wearables aren’t just for humans either. Wildlife is tracked for numbers and habits. Domestic animals also wear their own version of biometric sensors. The data analysis is used to optimize breeding and feeding practices. Even a honey bee can be fitted out for tracking movement for scientific experiment. These are data points that have been available in small portions before, but as the cost has gone downward, the capacity of data to be analyzed has gone up. Before it was a few discrete points; now it is a flow with more robust and significant and actionable outcomes.

Wearables are moving into more platforms and becoming more ubiquitous. They can be literally woven into fabric and painted on or embedded into the skin. Big Data doesn’t stop capturing your life with wearables, though. It keeps going.

#3 It’s All Around

Wearables are just a subset of the propagation of sensors embedded in every aspect of life. Sensors will continue to combine with increased ability to interact and utilize that information. This – the Internet of Things (IoT) – started as a cool idea, but you can bet it already has effect in your life. You are always “on”.

Mobile phones and wearables are examples already provided, but there are others you already know. A suite of home monitoring products on the market provides remote control and observation to check on your electricity usage, environmental status, fire protection and door locks. You can add monitoring to your car as well, and new models are incorporating more and more sensors that analyze its operation, alerting the driver to hazardous operating conditions and providing maintenance observations.

The Internet of Things (IoT) monitors crop growth. It’s used to drive building space utilization and builds maintenance plans for that building. Big Data and the IoT predict the weather and provide direction for recovery efforts when weather goes awry. The IoT is toll tags in your car that don’t impede traffic and intelligent labels in your clothing that provide wardrobe inventory analysis and suggestions.

As the Internet itself is the eruption of software – bits and bytes that have become the blood of life – the Internet of Things (IoT) is essentially the physical hardware that we touch and manipulate connecting to the data flow. The embedded technologies weaving together your daily life are becoming more robust, providing an increase in productivity, an increase in relevance, and an increase in well-being.

Consumers and society want this capability and they are willing to sacrifice at least some privacy and security for the perceived benefits. See Who’s Betting On the IoT.

#4 It’s Your Business

Big business has been the early adopter of Big Data and it touches all aspects of business – product/service development, manufacturing, operations, distribution, marketing, sales. More importantly, Big Data affects the most important function of business – the bottom line. Big business has had the deep pockets to explore the emerging technology, recognizing not only the potential return on investment but also the danger of ceding competitive advantage. As Big Data expands, the cost of entry is decreasing as the availability of resources extends to smaller businesses and individuals.

At the 2015 Paris Air Show for example, Bombardier showcased its C Series jetliner that carries Pratt & Whitney’s Geared Turbo Fan (GTF) engine, which is fitted with 5,000 sensors that generate up to 10 GB of data per second. A single twin-engine aircraft with an average 12-hr. flight-time can produce up to 844 TB of data. In comparison, at the end of 2014, it was estimated that Facebook accumulated around 600 TB of data per day; but with an orderbook of more than 7,000 GTF engines, Pratt could potentially download zettabytes of data once all their engines are in the field. It seems therefore, that the data generated by the aerospace industry alone could soon surpass the magnitude of the consumer Internet.
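
The 844 TB figure can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes the 10 GB per second rate is sustained per engine for the whole flight and that 1 TB means 1,024 GB. Neither assumption is stated in the quoted figures, and the function names are mine.

```python
# Rough check of the engine data volumes quoted above.
# Assumptions (not stated in the quoted figures): the 10 GB/s rate is
# sustained per engine for the whole flight, and 1 TB = 1,024 GB.

GB_PER_TB = 1024  # binary units, an assumption


def flight_data_tb(rate_gb_per_s=10, flight_hours=12, engines=2):
    """Return the data produced on one flight, in terabytes."""
    seconds = flight_hours * 3600
    total_gb = rate_gb_per_s * seconds * engines
    return total_gb / GB_PER_TB


def ratio_to_facebook(flight_tb, facebook_tb_per_day=600):
    """Compare one flight with Facebook's estimated daily intake."""
    return flight_tb / facebook_tb_per_day


if __name__ == "__main__":
    per_flight = flight_data_tb()
    print(f"One 12-hour twin-engine flight: about {per_flight:.0f} TB")
    print(f"Versus a Facebook day: {ratio_to_facebook(per_flight):.2f}x")
```

With decimal units (1 TB = 1,000 GB) the same inputs come to 864 TB, so the quoted number appears to rest on binary units. Either way, one long flight lands in the same range as a full day of Facebook at its 2014 pace.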


We live in a world of increasing choices. The Mad Men marketing schema are iconic caricatures of what capability has begun and will continue to evolve. Your computer already learns from your search history what products and services you are just thinking about purchasing. That’s a linear example. You search; the sites you visit take the information from your activity to pitch you products and services you are more likely to want. In a way, it’s annoying. In a way, it is convenient.

Big Data will make the message more compelling and more satisfying as it is derived from multivariate activity that accumulates from the 3 Vs. It’s going to start pitching products and services you didn’t know you need (or want).

“A lot of times, people don’t know what they want until you show it to them.” – Steve Jobs

Big Data marketing will know your transaction history, your lifestyle patterns and deviations, and fashion a very, very personal sales message to you (whether you like it or not).


#5 Your Tax Dollars at Work

Governments are getting into Big Data, not so much by leaps and bounds, but more by specific experiments. The United States uses Big Data in several agencies. Fraud, default and illegal activities can be detected or even predicted by observing the huge volumes of transactional data held by agencies like the Social Security Administration, the Federal Housing Administration and the Securities and Exchange Commission. In the interest of public health, the Food and Drug Administration and Department of Health and Human Services utilize Big Data for better decision-making on the impact of individual lifestyle choices.

The Department of Homeland Security is another obvious player, utilizing the 3 Vs of data available from not just federal sources, but state and local law enforcement entities. In the aftermath of the Boston Marathon bombing, over 480,000 images were ingested for investigation. Cross-pollination of NASA and US Forest Service Big Data resources helped better predict weather patterns affecting ground and space events.

The next wave of Big Data in government goes even further. It’s a bit more “out there,” and it is a little scary. Chinese citizens have stopped using wallets and instead use their phones for all transactions. At first it was simple and convenient for buying groceries or renting a bike, but it has evolved into personal credit and social monitoring. Big Data or Big Brother, only the Chinese government algorithms know.


Greater Good

The 2018 Guide to Big Data has the 5 things to know about Big Data; it’s not just big business, although that group will continue to invest for both ROI and competitive advantage. Big Data also isn’t just about lifestyle choices. Wearables and the Internet of Things are building a Big Data trellis that grows the fruit of your life. Businesses that utilize Big Data will nurture that fruit, providing the tools and subsistence to grow the optimal grape.

Big Data is about a bigger picture too. Ill intent will continue to undermine the soil and bind the vines. The bad guys aren’t going away; they will continue to find new ways to steal, or worse.

Big Data can do really great things. It is used for disaster search and rescue as well as damage assessment. It’s used for wildlife assessment.  It brings together the people throughout the world who want to help.

Is Big Data a silver bullet or final solution? No. Big Data is only just beginning. Is all the technology in place? No. But we did see Big Data turn the tide of the US Presidential race.  What will happen in 2018??

Stay tuned.

Big Data, Bird Flocks and Figuring Out World Hunger

Do you notice the flocks of birds that pass overhead?

I love watching the graceful flow of the flying inhabitants of the beach: pelicans, sandpipers, seagulls, cranes. Some are ‘regulars’ – seen day after day. Some come and go. Last week I watched an array of over 20 stork-like creatures I’ve not seen before fly by. Another favorite is the transitory flights of geese that mark the passing of time through the change of seasons. I am a far cry from being a bird watcher though. I just enjoy observing.

Rewind a couple thousand years to the pre-republic days of Rome. Bird watching was more than a hobby. The augur or auspex was a religious official who observed natural signs, especially the behavior of birds, interpreting these as an indication of divine approval or disapproval of a proposed action. He (augurs were always men) derived the gods’ intent from how the birds flew. In this highly esteemed position, the augur watched for bird movements in the skies at specific times for signs regarding holidays or elections. Augurs also watched in general to portend evil activity or warn of possible enemy movement. This bird observation was reading the auspices. People would consult augurs for guidance on personal matters too – from business dealings to wedding dates. Government officials consulted the auspex for holidays. Roman military campaigns would utilize augurs before battle.

Big decisions … based on how the crow flies (figuratively)

Seems silly or crude? What do the birds know about politics, or war plans or whether this year’s crops will be fertile?

Bird traffic does provide information though.

Romans didn’t have computers or cell phones. Romans didn’t have weather forecasters; they didn’t have any way to know what weather was coming. The best they could do was look out the window or maybe across a field. How many times has that worked out for you when trying just to predict the commute home?

Bird activity does say something about current conditions in the air, water and earth. A single bird can go further and see farther than any human many times over day after day. Their action as a group signifies a coalition of instinct and knowledge. They also fly upon air currents, which are driven by barometric pressure, which is the result of uneven heating of the earth’s surface, which is … weather. If today we were stripped of so many data sources taken for granted, perhaps we might learn to study the signs of nature very, very, very carefully. We would want to be able to predict bad conditions, or worse – disasters.

Not Ancient History

First news from Galveston just received by train which could get no closer to the bay shore than six miles where the prairie was strewn with debris and dead bodies. About 200 corpses counted from the train. Large steamship stranded two miles inland. Nothing could be seen of Galveston. Loss of life and property undoubtedly most appalling. Weather clear and bright here with gentle southeast wind.
— G.L. Vaughan
Manager, Western Union, Houston,
in a telegram to the Chief of the U.S. Weather Bureau on the day after the hurricane, September 9, 1900

It was the early days of fall in 1900. The deadliest hurricane in US history struck Galveston, Texas with little portent. The day’s weather forecasting methods did not predict the 15 foot storm surge that covered the entire island, which lay a mere 7 feet above sea level. Entire buildings were pulled off their foundations and 145 mph winds ripped at whatever held above the tide. The deaths could only be estimated and reached 6,000-8,000.

By comparison, Hurricane Andrew struck Miami in 1992 with the full warning of the National Hurricane Center as the mighty Category 5 storm hit with winds of 165 mph. Miami’s population alone was hundreds of thousands more than turn-of-the-century Galveston, and over 1.2 million people were evacuated from Miami and surrounding counties. The result was a still unfortunate loss of life, but minimized to 65 persons.

Even by 1935, the Weather Bureau was able to send widespread warnings and Coast Guard aircraft even transited the shoreline dropping message blocks concerning an approaching storm. The effect was apparent when the most intense storm to ever hit the US struck the Florida Keys with over 185 mph winds and 18 foot storm surge. Deaths were curtailed to an amazing 465.

Obvious, and less obvious

Weather affects everyone, every day. What to wear? Need an umbrella? How about needing disaster response? That’s the direct, tangible effect. Weather also has indirect reach: how well crops grow, the cost of those crops, the economy that depends on people affording and eating those crops, the politics that influence all of those reaches.

So without telecommunications or computers or the mechanics of electricity or the knowledge of weather, perhaps studying the birds was actually pretty damn smart. The Romans had a lot of good ideas, tangibles such as roads, bridges and aqueducts that are still in use today. Their influence too is in our government, architecture, language, law, and military tactics and equipment.

Data use has been likened to searching a dark room with a penlight. The room is stacked to the ceiling with information, but we can only find what we need within the narrow confines of a very small beam. This is a great comparison to the Romans using birds. They were right, but context and content were still in the works. They did a helluva lot with what they had.

So How Does Bird Watching relate to Big Data?

Big Data gets a lot of attention. It’s not quite the reverence given the Roman augurs, but it does tend to attract believers and non-believers.

Like the augurs, Big Data is not a wholly left-brain activity. It is not a Newtonian equation that takes variables and outputs a product. But ever since Einstein got us bending time with his thought experiments, the Laws of Nature haven’t been as solid as we think.

If we stay within the Left Brain and Newton’s confines, we will eventually be trapped there. That’s why cancer, hunger, and terrorism are still very much a part of our world. These are Big Problems that require human interaction with data in ways we haven’t figured out yet. These challenges are dynamic and non-linear. Cause and effect thinking fails.

“Chaos theory becomes critical in understanding the way things work. We must look for flow patterns rather than linear cause-effect explanations.” – Jean Houston, Foreword to Chaos, Creativity and Cosmic Consciousness

Our world is chaotic, not in the conversational context of pure disorder but in the scientific posture of “behavior so unpredictable as to appear random.” The study of chaos has shown that things are not as random as they appear; it is only our ability to perceive the patterns that is still emerging. That is where Big Data begins.

Unlike the augurs

Big Data is a nascent capability. The tools and techniques to master its volume, velocity and variety are as yet quite experimental. The pen light has perhaps grown to searchlight proportions, but now the room has expanded into coliseum size. The beam too is not quite a surgeon’s hand but is more like an elephant meandering through the jungle. Strong, powerful, with significant intelligence and excellent latent memory but … not so delicate.

So there is knowledge in bird flight patterns. So there is more knowledge in the 3 Vs of Big Data. It won’t be Newton’s apple clunk on the head; it will be in the whispers and wails of the wind and our ability to interpret the direction.

Why Haven’t We Cured Cancer?

My mother died of breast cancer in 1989. She was 44. Ironically, she was considered a “survivor” because she lived for five years after diagnosis.   She was a teacher and during those five years, she never missed a day of school. She saved the summers in between for chemo, two rounds of radiation and a mastectomy, all at famed MD Anderson Cancer Center in Houston, which was considered then and remains today one of the world’s premier cancer treatment facilities.

So what has happened in the almost 30 years since?

Breast Cancer for the past 25 years


It’s not very impressive, is it?

Yes, there is a decline in the mortality rate – for less than 5 years. As grateful as I was for those five years, I would have liked to have 20 years or more. I’d like to know that today far fewer women get breast cancer. I want it to look like this chart of polio occurrences.


Polio hits the dirt

That makes me pretty demanding too. Cancer is tough stuff. Many brilliant minds and passionate hearts have dedicated their lives to research of the disease and care of its victims. Is it too much for anyone to figure out how to cut the numbers?

In my years of work in resourcing individuals for worldwide mobilization, where the numbers represent people, I came to realize that each node has excruciating significance. Who is that specific person that melts into those numbers? Was it your mother or aunt or sister? Did they live less or more than the survival 5?

Average Joe

We live by statistics. We toss around the words “average” and “normal” more commonly and casually than their less technical counterparts “mostly” or “seldom.” We actually abuse statistics, using them to create a desired response. Politicians report crime is down or up X% to catch your vote; business marketing emphasizes attributes and effects skewed to get you to buy; teeth are Y% brighter or items contain Z% fewer calories. The media wants to grab your attention and activists want to prove their point. It’s the notorious tail wagging the dog. Data collection, data sets and data analysis are manipulated with ignorance or intent to provide facts and figures that support a cause.

Even without using statistics, our brains are wired to measure risk in ways that are often unrealistic. We assess situations trying to find cause and effect relationships when actually only coincidence is present. We perceive great risk where there is little (shark attack), and we underestimate potential dangers (driving to the grocery store).

Treat to the mean

What does cancer look like to the “average” victim today? A diagnosis via testing, a specialist doctor, and a suite of supporting consultant experts and services. A team is built to craft a treatment. This is goodness but it’s not the best we can do.

In this TED Talk, Kristin Swanson explains her personal journey with cancer and the cold hard numbers.



To Kristin Swanson, this is personal. And to me, she is right. Treating every person as the mean is today’s standard, regardless of how many experts are in the room. Although that is amazingly better than even one generation ago, medical research, as scientific as it is, is also subject to misdirection. Poor methodology and outright plagiarism are likely in medicine, as in many disciplines and forums.[i]

Two heads are better than one, and a whole team behind you definitely improves your odds. Going back to the statistics, though, how much has that made a difference?

Going from Little Data to Big Data

Medical school is no walk in the park; it is notoriously long and rigorous. I imagine the volume, velocity and variety of information learned in medical school has grown exponentially from even just two generations ago. Although the human body itself hasn’t changed (appreciably) in a couple thousand years, the amount of research compiled about body parts, human habits and social environments has again expanded in quantity and quality in the past two generations. But that is small data.

We can do better

Kristin gives the example of Patient X whose tumor is larger after therapy. In a small data world, this is a “failure.” Therapy => larger tumor => failure. In reality, had Patient X been the famed “mean” participant, his life expectancy would have been 15 months. This Patient X lived another five years. So where is the true failure?
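
A few lines of code make the point about the mean concrete. The survival times below are invented purely for illustration; they are not from the talk or from any study. I picked them so that the mean comes out to 15 months and one patient lives five years.

```python
from statistics import mean, median

# Hypothetical survival times in months, invented for illustration only.
# They are chosen so the mean is 15 months, matching the "mean" patient
# described above, and so that one patient lives five years (60 months).
SURVIVAL_MONTHS = [3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 42, 60]


def summarize(months):
    """Return the mean, median, longest survival and count above the mean."""
    avg = mean(months)
    return {
        "mean": avg,
        "median": median(months),
        "longest": max(months),
        "above_mean": sum(1 for m in months if m > avg),
    }


if __name__ == "__main__":
    for name, value in summarize(SURVIVAL_MONTHS).items():
        print(f"{name}: {value}")
```

In this made-up group, most patients fall short of the mean while a couple far exceed it. The single number describes almost nobody in the room, which is the trouble with treating to the mean.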

This doesn’t mean doctors are wrong or idiotic about their techniques and therapy recommendations. These physicians are compassionate and intelligent resources for developing appropriate therapy according to standard practices. Unfortunately, they are making life decisions for patients with relatively very little information that is very biased.

Comprehending the intricacies of the infinite array of influences that affect a single person’s holistic health profile is impossible psychologically. Humans (at this stage of evolution) can only hold so many numbers in their brain and then the matrix of combinations thereof quickly pushes override. Even if doctors could study harder or lean harder on tried-and-true practices in an attempt to maximize survival, it is still small data.

A Streetlamp Named Desire

There’s a joke told several ways about a man at night searching for his car keys under a streetlamp. A passer-by/neighbor/police officer stops to help him look and begins by asking where he thinks he has dropped the keys. The man responds he lost them down the street near where his car is parked. “Why are you looking here then?”

“Because this is where the light is.”

Tackling Big Problems – such as cancer – has been looking for those car keys. What we know about cancer is under the streetlight. The complexity of cancer and other Big Problems has been beyond the tools and capabilities we have been utilizing.

In comparison, polio and smallpox were not complex; they were complicated. In these situations, enough experiments and enough coincidence unraveled the mystery and a vaccine was born. This scenario is like untying the ear buds that were wadded up at the bottom of a backpack. Patience and grit win the day. Neither polio nor smallpox is even curable, but they are preventable. That prevention significantly dropped mortality rates. Smallpox was even considered eradicated from the earth in 1980.




Sir Isaac Newton Needed Calculus

Big Problems need Big Data. Big Data’s capability has only come into play within the past ten years. Its nascent tools are just evolving. The information of the world previously has been only ether, captured in memory through unreliable replication or in select recording capability of finite data points. Now it is being captured in ways and means – subtle and overt – that even the most creative imaginations didn’t predict.

Big Data is possible now for a couple of reasons. Data accommodation was limited in a small data world but now digital capture has created an expanding flow that is not only easily made, but also sharable and searchable. Punch cards and floppy disks were the beginning of the end of small data. Storage is just the first tenet.

The power to manipulate the data is keeping pace with capacity. The cell phone of 10 years ago (think flip phone) had more power than what was possible for the Apollo missions to the moon. Today’s typical work environment still relies heavily on spreadsheets and traditional databases, an abacus of sorts, relatively speaking. This gap between what is possible with data that is collected and what is done with it sets up the huge demand signal for data scientists. We need intelligent people who can manipulate Big Data.

Finally, Big Data needs to be articulated – art meeting science. Data scientists are needed for accuracy and intent but the expression needs a handshake with the operators of the information. In the case of cancer, this is doctors and specialists and caregivers. Data visualization rounds it out because the reasonings and relationships are far more coquettish and piquant than dashboards and PowerPoints.

Heavy Lifting

Big Data is complex – not complicated. It’s chaotic. The earbuds cannot be unraveled; they continuously contort with each pull of a knot. Each of us carries around genetic predispositions which are either enhanced or deferred with the daily choices that compound into lifetime patterns and twist with chance events. How much you weigh or what you eat or drink or where you live or how you work or play or where you travel or stay put or how you relate to friends, family and coworkers – these all interplay until a tipping point is reached.

Big Data captures the minutiae of internal gyrations and external influences affecting that probability of cancer. Big Data is where the breakthrough lies.

Since Big Data has become a capability, we can go beyond treating the mean. A cancer treatment need no longer be determined by your doctor, or a suite of doctors. It can be determined by all the doctors who have ever treated cancer. A patient’s medical history isn’t a couple of pages of discrete data points. It is a continuous flow of information illuminating personal habits, events, and discretions. All the dedication to cancer research folds into Big Data.

Big Data is not a Holy Grail in itself. Big Data helps us see information in a way never before possible, evolving a holistic methodology. With that power comes great responsibility. Like statistics, Big Data outputs can become ugly monsters or heavenly returns.

Big Data can move the street lamp.


Further reading:

Big Data Medical Record

The Inherent Clumpiness of Randomness

Spurious Correlations

[i] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4401313/






It’s YOUR Fault!! Big Data Takes the Blame

It’s your fault.

No it’s the other person’s fault.

It’s the guy who cut you off in traffic.

It’s “customer service.”

It’s the government.

It’s the weather.

It’s global warming.

It’s the way things go.


A tree falls in the forest. A roof falls in. A business fails. A stock market crashes. A disease spreads.

Sh!t happens

Who or what caused it? Cause and effect – causality – is deeply woven into our lives.

Figuring out who or what caused something is a universal application of everyday life as well as one of global consequence. We want to know why something happened, regardless of good or bad outcomes.

Researchers use causality to test whether a drug has the desired effects as well as to control the less desirable side effects. It’s used by scientists to figure out the common cold and/or the secrets of the cosmos. What-caused-it figures out who is going to pay for damages – for the automobile accident or for global warming. It’s used by governments to develop regulations and uphold laws.

Knowing what caused what not only tries to explain what already happened, it also leans forward into the future. Causality is used for the ultimate gold – prediction. You use it to keep your finances and your safety. Business tries to predict who is going to buy what. Weather forecasters use causality to keep us dry as well as out of danger.

If we know what causes what, we can avoid the unfortunate and encourage the beneficial. So why doesn’t causality work?


Causality is a very flawed practice

Cause and effect is largely opinion. Consider the controversy over global warming. Do too many cars burning gas melt the polar ice caps? Are there too many people breathing too much? Is it a wobble in the earth’s rotation? Is it just … the way it goes? Is the globe even warming?

The debate consumes some of the greatest minds of science. Then politics and politicians become involved. Prominent non-experts such as actors and public figures weigh in. Finally, the “average” consumer and citizen have a say in the truth or perception as well. Should each person’s opinion have equal weight, even though in practice it doesn’t?

Causality is assailed by lots of variables and lots of interpretations, which seek not only to figure out what has happened but also to affect the future.

So can we change global warming? Do we want to change it? What happens when we do change it? Will we get the desired results?

Getting Personal

We all want to know cause and effect, so let’s look at some personal examples – weight control. Does eating a doughnut make you fat? Does it take eating a doughnut every day to make you fat? What if you’re one of those really skinny people? What if you ran a mile or a marathon after that doughnut?

What about cancer? Smoking = cancer, right? It’s not just one cigarette though. Is it one year of smoking? Day 366 or many years? Just how many does it take and what other factors increase or decrease that opportunity? That’s another causality factor. How can you explain those who smoke for decades without getting cancer?

Real or Perceived?

People like the security that numbers and calculations provide. Causality is no exception. There is a preconception of fairness that 2 + 2 = 4 and no one gets hurt. Simple addition though doesn’t exist in a bubble (unless you’re a mathematician). We use numbers for decisions. A small decision is whether to purchase those cool shoes, based on whether the bank account supports it. Traffic engineers quantify what is safe with speed limits, which makes enforcing those rules “easier”.

Big decisions are global warming or poverty or war or whether a corporation or a government is operating in the black or the red. Unfortunately like any good story, numbers can be manipulated, either by ignorance or intent – ask those who invested in Enron and Goldman Sachs.

Mistaken Identity

So causality often uses numbers for quantifying things that are a bit fuzzy. But when is it causality or its misunderstood cousin – correlation? This graph represents a strong correlation between US spending on science and technology and an increase in suicide by strangulation. A strong correlation does not mean that financing more STEM leads to more suicides. That’s the difference between correlation and causality, a fine line we are able to appreciate given an obvious scenario.
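One common way a strong but meaningless correlation arises is that two unrelated series simply trend upward over the same years. The sketch below illustrates that idea and nothing more. The numbers are invented for illustration (they are not the actual spending or suicide figures), and the helper names `pearson` and `yearly_changes` are assumptions of this sketch.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def yearly_changes(series):
    """Year-over-year differences, which strip out the shared upward trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Two made-up yearly series that both happen to rise over ten years.
series_a = [10, 12, 11, 14, 15, 17, 16, 19, 21, 22]
series_b = [5, 5, 7, 8, 8, 11, 12, 12, 14, 16]

r_levels = pearson(series_a, series_b)
r_changes = pearson(yearly_changes(series_a), yearly_changes(series_b))

print(f"correlation of raw levels:     {r_levels:.2f}")
print(f"correlation of yearly changes: {r_changes:.2f}")
```

The raw levels correlate above 0.9, yet the year-to-year changes barely move together at all. The shared trend produces the correlation; neither series tells us anything about what causes the other.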

What if it’s not so obvious?


Gorilla in the Room

In The Invisible Gorilla, Christopher Chabris and Daniel Simons explore intuition via a psychology experiment demonstrating how we overlook the obvious while concentrating on a task at hand. Participants are asked to count the passes while watching a video of players passing a basketball. During all this passing, a gorilla walks into the middle of the circle, beats his chest and then departs.

The experiment has been recreated in numerous scenarios in numerous countries, and half the subjects never notice the gorilla.

The lesson? The authors use this experiment to illustrate six areas in which our data collecting minds short circuit: attention, confidence, knowledge, memory, potential and yes, cause. Their chapter on causation centers on how people depend upon pattern recognition to solve problems or prevent them from occurring.

A physician cobbles symptoms together with personal experience and training to fix what ails you. A stock market trader does the same to make money. A parent tries to guide their child to safety and prosperity using their successes and failures. As crowds or governments or societies, we perpetually absorb environmental factors and interpolate results based on causality.

“Our world is systematically biased to perceive meaning rather than randomness and to infer cause rather than coincidence. And we are usually completely unaware of these biases.” P.154

Our minds have fascinatingly adapted to interpolate vastly more complicated situations. It’s also disturbing that biases build up like plaque, distorting our view without our being aware of it.

Even if you think you’re an EXPERT on correlation versus causality, watch this TED Talk and personal awareness quiz. Test your perspective of the world and learn some pretty cool data points.

Big Data Bias

How about Big Data? Big Data is not immune to the misdirections of faulty intuition. Like a bigger hammer, the momentum of Big Data could be perceived – or utilized – inaccurately with that much more destruction.

The glory though is that Big Data can overcome those causality challenges because of the Big-ness of the Data. The volume, velocity and variety supersede sampling “thinking”, hypothesis testing and small data limitations.

Small Thinking

This Khan Academy lesson in causality and correlation demonstrates the pitfalls of small data sets and hypothesis testing. Using very little data from a small but statistically significant sample, an article suggests that one thing – eating breakfast – can decrease childhood obesity. The author never states that eating breakfast prevents obesity. As the lesson explains, the careful word selection includes enough suggestion while omitting some relationships.
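One relationship that such a suggestion can omit is a hidden third factor that drives both things at once. The sketch below is a hypothetical illustration, not data from the article or the lesson: the counts are invented, and "active lifestyle" is an assumed third factor chosen only to show the mechanism.

```python
# Invented counts for illustration only.
# Each row: (is_active, eats_breakfast, people, obese)
POPULATION = [
    (True,  True,  800,  80),
    (True,  False, 200,  20),
    (False, True,  200,  60),
    (False, False, 800, 240),
]

def obesity_rate(rows):
    """Share of obese people across the given rows."""
    people = sum(row[2] for row in rows)
    obese = sum(row[3] for row in rows)
    return obese / people

def select(rows, is_active=None, eats_breakfast=None):
    """Filter rows by activity and/or breakfast habit (None means 'any')."""
    return [
        row for row in rows
        if (is_active is None or row[0] == is_active)
        and (eats_breakfast is None or row[1] == eats_breakfast)
    ]

# Ignoring activity, breakfast eaters look much healthier.
rate_breakfast = obesity_rate(select(POPULATION, eats_breakfast=True))
rate_skippers = obesity_rate(select(POPULATION, eats_breakfast=False))
print(f"all children: breakfast {rate_breakfast:.0%}, no breakfast {rate_skippers:.0%}")

# Within each activity group, breakfast makes no difference at all.
for is_active in (True, False):
    with_bf = obesity_rate(select(POPULATION, is_active, True))
    without_bf = obesity_rate(select(POPULATION, is_active, False))
    label = "active" if is_active else "inactive"
    print(f"{label} children: breakfast {with_bf:.0%}, no breakfast {without_bf:.0%}")
```

In this made-up population, active children are both more likely to eat breakfast and less likely to be obese. Breakfast and obesity are correlated overall, but once activity is held constant the difference disappears.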



It’s a Wild World

In this example and in every hypothesis test, a plethora of variables must be held constant in order to draw any deductions from the experiment. Like the examples above regarding eating doughnuts or preventing cancer, a hypothesis test tries to still a moment in time to get an answer when the reality of life is much more complex. Life is an open system, subject to a world of whims and multiplying factors. Life is chaotic: a vast dynamic system whose behavior is sensitive to its initial conditions.
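Sensitivity to initial conditions can be seen in a few lines of code. The sketch below uses the logistic map, a standard textbook example of a chaotic system that is not drawn from this article; the starting values and the function name `logistic_orbit` are assumptions chosen for illustration.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return every value visited."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1 - orbit[-1]))
    return orbit

# Two starting points that differ by one part in ten million.
orbit_a = logistic_orbit(0.2, 50)
orbit_b = logistic_orbit(0.2000001, 50)

# Distance between the two trajectories at each step.
gaps = [abs(a - b) for a, b in zip(orbit_a, orbit_b)]

for step in (0, 10, 20, 30, 40, 50):
    print(f"step {step:2d}: gap = {gaps[step]:.7f}")
```

The two runs start out indistinguishable, and within a few dozen steps they bear no resemblance to each other. A measurement error too small to notice at the start is enough to change the outcome entirely.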

The fault with small data can be overcome with Big Data capability. Big Data captures more, holds more and manipulates more data on a level we are really just beginning to comprehend. That transformation begins with removing the plaque of small data thinking.

Big Data is best deployed en masse, collecting and digesting massive volumes of data from a wide variety of sources. Without hypothesis testing, the information is observed for the patterns and outliers that arise. Unlike the doctors and stock brokers and parents, Big Data weaves without bias.

So It’s Only Natural

Causality is natural. We do it without thinking, which is both a survival mechanism and a fault. We draw conclusions from information and we sometimes suffer when those conclusions rest on incorrect assumptions. Although errant thinking can be overcome by careful thinking and rigorous process, some preconceptions will always elude us.

Big Data is subject to the same bias errors, and it can be even worse because of the volume, velocity and variety of data. But Big Data is a new methodological capability. That capability is still being explored, and the exploration needs to happen outside of a small data context.

The signal in the noise though is brilliant. It can comprehend solutions we would never dream of and it will solve problems we didn’t think possible. Think Big. It’s coming.


More resources: