Wednesday, July 3, 2013

Theory Thursday: Knowledge is Power

Welcome to the first installment of Theory Thursday! (The first installment is coming to you on a Wednesday because Thursday is the 4th of July. Give yourself a moment to adjust to the cognitive dissonance, then continue reading.) Sometimes, when considering the world of Big Data, it is instructive to consider the world of Very Very Small Data: how do the bits of information around us add up to the terabytes on our hard drives, and, more importantly, what constraints does the physical world place on us? Once upon a time, I was a Warrior of the Physicist Clan, and it was my clan's destiny to gaze at the heavens and at the minutiae, and ask what cosmic rules bind the two together. I hope to share some of that wonder with you here.

So, let us start with the mundane, and proceed from there to the sublime: You are the operations manager for a thriving startup in San Francisco, and as such, figuring out how to heat the building on those frigid June mornings is of paramount importance.  You conceive of a grand plan to leverage Big Data to heat the building:
Step One: Start with information about the location and velocity of every air molecule in the building.
Step Two: Stand by the front door.  When a colder-than-average molecule approaches the door from the inside, open the door very quickly to let it out, and then shut the door again.  When a warmer-than-average molecule approaches from the outside, do the same to let it in.
Step Three: Profit!  You have just raised the temperature of the building using nothing more than Big Data!  The devs are happy in their shorts and flip flops, and you have saved the company huge amounts of money on heating bills, allowing them to buy Red Bull and paleo snack packs for the micro-kitchen for another three months before they burn through the rest of their funding.
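The demon's door policy from Steps One and Two can be sketched as a toy simulation. This is a minimal model where each "molecule" is just a random kinetic energy; every name and number here is illustrative, not a real building's thermodynamics:

```python
import random

random.seed(42)

# Toy model: represent each molecule by its kinetic energy, drawn at random.
# Both rooms start from the same distribution, i.e. the same temperature.
inside = [random.gauss(0, 1) ** 2 for _ in range(10_000)]
outside = [random.gauss(0, 1) ** 2 for _ in range(10_000)]

def avg(energies):
    return sum(energies) / len(energies)

threshold = avg(inside + outside)  # the demon's notion of "average"

# The demon at the door: admit hotter-than-average molecules,
# eject colder-than-average ones.
new_inside = [e for e in inside + outside if e >= threshold]
new_outside = [e for e in inside + outside if e < threshold]

print(f"before: inside {avg(inside):.2f}, outside {avg(outside):.2f}")
print(f"after:  inside {avg(new_inside):.2f}, outside {avg(new_outside):.2f}")
```

With perfect information and cost-free door operations, the inside warms up and the outside cools down, which is exactly the sort of spontaneous un-mixing the 2nd Law forbids for free.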

This little thought experiment is often called "Maxwell's Demon", after its original publication by James Clerk Maxwell in 1871. It was raised as a paradox, of sorts, because it appears to violate the 2nd Law of Thermodynamics, which states that entropy should always increase. The demon takes a disordered system and sorts it into an ordered one, with "hot molecules" inside and "cold molecules" outside, seemingly violating the 2nd Law. However, there's a way out of the apparent paradox. Notice what the demon started with: information about the location and velocity of every air molecule in the building. The demon converted pure information into pure energy.

This works because information is interchangeable with energy. When Claude Shannon coined the concept of "information entropy", he did so fully understanding the implications: the two concepts are, in fact, one and the same. Entropy, the thermodynamic concept, is simply a measure of the number of accessible states a system can occupy; a molecular chain with 20 links has exponentially more possible configurations than one with 10, and hence far higher entropy. By the exact same token, a computer with 20 bits of storage can hold exponentially more possible binary strings than one with only 10 bits, leading us to speak of it as having a higher capacity for entropy. The measurement of entropy is simply an accounting trick, and it's the same for computers as it is for molecules.
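The state-counting argument can be made concrete in a couple of lines. This is just arithmetic, not tied to any particular physical system:

```python
import math

def num_states(bits: int) -> int:
    """Number of distinct binary strings a register of `bits` bits can hold."""
    return 2 ** bits

def entropy_bits(states: int) -> float:
    """Entropy (in bits) of a system with `states` equally likely states:
    S = log2(W), the same bookkeeping for memories and for molecules."""
    return math.log2(states)

print(num_states(10), entropy_bits(num_states(10)))  # 1024 10.0
print(num_states(20), entropy_bits(num_states(20)))  # 1048576 20.0
# Doubling the bits squares the number of states, but only doubles the entropy.
```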

So, where's the resolution of the demon paradox? There are many subtleties that have arisen over the years about how the demon makes measurements, and whether it can open the door without wasting more energy than it captures, but fundamentally, we can understand the answer by realizing that we forgot a step:

Step Zero: Measure the location and velocity of every air molecule in the building.

This step, gathering the information used in Step One, requires us to expend energy to measure (and record!) the state of every molecule. In essence, we are converting that energy into information, and then using that information to convert it back into energy (in the form of building heat). The 2nd Law tells us that we will never be able to do this in a way that gets more energy out of Step Two than we put in at Step Zero, because in each of these conversion steps, we create excess entropy and lose energy, driving the universe that much closer to heat death.
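To put a rough number on Step Zero: the minimum thermodynamic cost of recording (or erasing) one bit is k_B·T·ln 2, a bound known as Landauer's principle. The post doesn't invoke it by name, so treat this as a back-of-the-envelope sketch, and note that the room size and bits-per-molecule below are made-up illustrative figures:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K
T = 293.0           # room temperature, K

# Landauer's bound: recording or erasing one bit costs at least k_B * T * ln 2.
energy_per_bit = k_B * T * math.log(2)

# Illustrative estimate: ~2.5e25 air molecules per cubic meter at room
# conditions, times a made-up 100 m^3 office.
n_molecules = 2.5e25 * 100

# Suppose the demon records, say, 100 bits of position/velocity per molecule
# (an arbitrary precision chosen purely for illustration).
bits = n_molecules * 100

print(f"energy per bit: {energy_per_bit:.2e} J")
print(f"minimum bookkeeping energy: {energy_per_bit * bits:.2e} J")
```

Even at the theoretical minimum, the demon's ledger runs up an energy bill of hundreds of megajoules before the door ever opens, which is the 2nd Law collecting its dues.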

This relationship between energy and information is one of the most fundamental and befuddling in physics, and we'll revisit it many times on Theory Thursday. Compare, for instance, the heat sink and fan needed for a 50 MHz CPU with the equivalent cooling hardware for a 4 GHz CPU.

The fact that the latter crunches at 80 times the speed of the former is directly responsible for the fact that it requires much more cooling: processing information generates entropy, which must be dissipated in the form of heat...or does it?  We'll revisit this topic when we discuss reversible computing, or, How to Compute On The Cheap (If You Don't Mind Waiting For Your Answers).
