Wednesday, June 26, 2013

Data is Oil and the API is...Whatever Oil Comes Out Of!

How APIs Help Us Comprehend The Infinite Concept Of Data

"The API has emerged as the means for connecting software and services."  "The API", as opposed to...hacking into someone's system and extracting the data in pure binary form?  I don't mean to nitpick; APIs are real, and they're important.  Badly designed APIs (or even a lack thereof) are a horror, and can be a true economic drain.  Ask anybody whose data is locked away in a '00s-vintage "data warehouse."  Or ask Joshua Bloch, the guru of Java APIs.  But when you say something like:

By itself, data is irrelevant. The enterprise model has demonstrated that software on-premise has limited value when isolated in silos. But connect it with APIs, and transformations can occur that just were not possible before.

You're not saying anything about APIs.  You're saying something about data integration.  If you remove the words "with APIs" from the sentence, the meaning is unchanged, since there's no other way to connect data.  Data integration is the secret sauce.  The API is just the pipe.

Tuesday, June 25, 2013

Crowdsourcing Regulation

A few interesting recent regulatory moves:
- California state regulators try to ban ridesharing services such as Lyft as unlicensed taxi service
- New York City bans AirBnB as an unlicensed hotel
- Several states attempt to ban Tesla from selling direct-to-consumer cars because...well, it's not clear why except that car dealerships are threatened and have a lot of friends in high places.

These all have in common 20th century regulation colliding at high speed with 21st century commerce.  As another example, the American Psychological Association has long made it illegal to provide psychological counseling for patients unless both the patient and the doctor are licensed in the same state, making telemedicine a difficult proposition.

For the most part, these regulations are well-intentioned (with the exception of the Tesla direct sales ban.)  They are intended to prevent consumers from getting ripped off in an imperfect marketplace, and to impose minimum standards of safety, cleanliness, and service on service industries.  And, pre-Internet, they made a lot of sense.  How was I supposed to know if this pink-mustache car that I'm climbing into is going to take me for a ride?

But a funny thing happened on the way to ubiquitous informational awareness.  You can't book an AirBnB or a Lyft or buy a Tesla direct without an internet connection.  And, if you have an internet connection, you can in fact find out whether these services are reputable and safe.  There's no information asymmetry anymore for these services.  Is it possible that Yelp and other review services have actually solved the problem of crowdsourcing regulation?

Still, there's a need for enforcement.  How do we identify bad actors and punish them without a licensure model and an army of inspectors?  Enter: ADA compliance lawsuits.  The Americans with Disabilities Act imposes a number of requirements on small businesses to ensure accessibility for those with disabilities, such as adequate disabled parking and accessible toilets.  And, instead of employing inspectors, it empowers disabled individuals to sue for non-compliance, and receive damages.  Libertarian think tanks find this practice appalling, and call it frivolous.  But in my mind, this is a Libertarian dream come true: no messy government inspectors or agencies, we simply empower individuals with market-driven incentives, and they ensure appropriate compliance.

The explosion of cell phones and connected mobile devices means that the consumers of internet-based services have the tools to check compliance on their own.  For instance, services like Uber and Lyft routinely record the routes their drivers take, and so could any consumer, using a GPS enabled smartphone.  If they feel the route they took was inefficient and they were overcharged, they have all the evidence they need in their pocket to prove it.  If they feel the driver was unsafe, or the car was filthy, they can easily take a photo or video and instantly file a report.  When everybody is an inspector, who needs inspectors?

There's no doubt that the profusion of government regulation in this sphere is in part due to entrenched interests.  Taxi cabs don't want competition from ridesharing, and hotels don't want competition from house-sharing.  But the existing regulatory regimes aren't laudable, they're laughable, and they're long due for an overhaul.  Everybody would benefit from the expanded competition and vastly expanded compliance information that the public could provide.

Reports of Kindle Death Rays Have Been Exaggerated

Apparently the FAA has heard my complaints about the Kindle Death Rays:
FAA moving toward easing electronic device use

Friday, June 21, 2013

Should We Rely On Incompetence to Safeguard Our Civil Liberties? Or: How to Build a Better Call Trap

Robert Mueller's testimony today on the NSA phone monitoring (F.B.I. Director Warns Against Dismantling Surveillance Program) had some fascinating tidbits.  First, there's this (emphasis mine):

Testifying before the Senate Judiciary Committee, Mr. Mueller addressed a proposal to require telephone companies to retain calling logs for five years — the period the N.S.A. is keeping them — for investigators to consult, rather than allowing the government to collect and store them all. He cautioned that it would take time to subpoena the companies for numbers of interest and get the answers back.

“The point being that it will take an awful long time,” Mr. Mueller said.

“In this particular area, where you’re trying to prevent terrorist attacks, what you want is that information as to whether or not that number in Yemen is in contact with somebody in the United States almost instantaneously so you can prevent that attack,” he said. “You cannot wait three months, six months, a year to get that information, be able to collate it and put it together. Those are the concerns I have about an alternative way of handling this.”

Mr. Mueller did not explain why it would take so long for telephone companies to respond to a subpoena for calling data linked to a particular number, especially in a national security investigation.

I can tell you why it would take so long in one word: incentives.  The NSA and FBI are incentivized to build a system that actually works efficiently and effectively.  The phone companies, if faced with regulatory requirements to retain records, are incentivized to do it cheaply.  Let's do some back of the envelope math here:
- The average person probably makes 5 - 10 phone calls/text messages a day on their mobile device.
- Wikipedia tells us that there are about 300,000,000 mobile phones in the US.
- That comes out to about 3 trillion phone calls in 5 years.  Let's say a single carrier handles maybe 1/5 of that traffic, or 600 billion calls they have to retain.
- Assume the metadata on a single call (from, to, duration, date, time, and maybe IMEI) takes up 1 kilobyte of data.
- Then the carrier is required to keep a rolling log of about 500 to 600 terabytes of call data.
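
The arithmetic in the list above can be replayed in a few lines of Python.  This is only a sketch: every input is one of the rough guesses from the list (300 million phones, 5 - 10 calls or texts a day, a one-fifth carrier share, 1 kilobyte per record), and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope sizing of one carrier's five-year call log.
# All inputs are the rough assumptions from the list above, not measured data.

def call_log_estimate(phones, events_per_day, years, carrier_share, bytes_per_record):
    """Return (total events, one carrier's events, that carrier's storage in TB)."""
    total_events = phones * events_per_day * 365 * years
    carrier_events = total_events * carrier_share
    terabytes = carrier_events * bytes_per_record / 1e12  # 1 TB = 10**12 bytes
    return total_events, carrier_events, terabytes

PHONES = 300_000_000      # mobile phones in the US
YEARS = 5                 # retention period under discussion
CARRIER_SHARE = 0.2       # one carrier handles about 1/5 of the traffic
BYTES_PER_RECORD = 1_000  # about 1 kilobyte of metadata per call

# Run both ends of the 5 - 10 calls/texts a day range.
for events_per_day in (5, 10):
    total, carrier, tb = call_log_estimate(
        PHONES, events_per_day, YEARS, CARRIER_SHARE, BYTES_PER_RECORD
    )
    print(f"{events_per_day} per day: {total:.2e} events in total, "
          f"{carrier:.2e} at one carrier, about {tb:,.1f} TB")
```

At 5 events a day this lands at roughly 550 terabytes for a single carrier, and at 10 a day it is closer to 1.1 petabytes, so the estimate is good to within a factor of two at best.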

As bad as this sounds, it's not actually that big a deal.  Facebook handles about this much data each day.  And using horizontally scalable NoSQL stores, like Cassandra or MongoDB, you can easily store the data and return the results in near real time, as long as you're willing to throw enough commodity hardware at it.  But that's the real issue: the willingness.  Verizon, AT&T, these guys don't really want to be in the business of storing call log data and providing it to the government.  It doesn't make them any money.  So they would simply throw it onto a disk, making it unsearchable, and tell the government, "Sorry, your request will return in 3 - 6 weeks."  You could in theory legislate that they return the results faster, but you can't actually legislate that people build competent technology infrastructure.  Failure is a more likely scenario than compliance.
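
To make the "near real time" point concrete, here is a toy sketch of why a store keyed by phone number answers this kind of question quickly.  It uses a plain in-memory Python dict as a stand-in, not the Cassandra or MongoDB API, and the `CallRecord` and `CallLogIndex` names, the fields, and the phone numbers are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    caller: str
    callee: str
    start: str        # ISO 8601 timestamp, kept as a string for simplicity
    duration_s: int


class CallLogIndex:
    """Toy in-memory stand-in for a store keyed by phone number."""

    def __init__(self):
        self._by_number = defaultdict(list)

    def add(self, record):
        # File the record under both endpoints, so a question about either
        # number is a single key lookup rather than a scan of every record.
        self._by_number[record.caller].append(record)
        if record.callee != record.caller:
            self._by_number[record.callee].append(record)

    def calls_for(self, number):
        """All records in which `number` is the caller or the callee."""
        return list(self._by_number.get(number, []))

    def contacts_of(self, number):
        """Every other number that has called or been called by `number`."""
        contacts = set()
        for record in self._by_number.get(number, []):
            other = record.callee if record.caller == number else record.caller
            if other != number:
                contacts.add(other)
        return contacts


index = CallLogIndex()
index.add(CallRecord("+1-555-0100", "+1-555-0101", "2013-06-01T10:00:00", 120))
index.add(CallRecord("+1-555-0102", "+1-555-0100", "2013-06-02T09:30:00", 45))
index.add(CallRecord("+1-555-0101", "+1-555-0103", "2013-06-03T18:15:00", 300))

print(sorted(index.contacts_of("+1-555-0100")))
```

The design choice is to pay for the index at write time: each record is stored twice, once per endpoint, in exchange for never having to scan the whole log at query time.  A distributed store spreads those keys across machines, which is where the commodity hardware comes in.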

With all that said, though, the fundamental question in my mind is this: What is the real difference between the NSA storing the data and the phone carriers storing it and producing it on request?  I think this is an interesting philosophical question, and as a civil libertarian, not one I take lightly.  The process is essentially the same:

Case 1: The FBI asks Verizon for calls relating to X, and they get an answer back.
Case 2: The FBI asks the NSA for calls relating to X, and they get an answer back.

Going through Verizon for the request may make it take longer, and that may be a good thing, if you're worried about abuse of the data.  But, should we really be relying on incompetence as a safeguard against abuse?  Frankly, incompetence is often the only thing that stands between us and abuse by corporations and the government.  People who ascribe all things to vast complex conspiracies fail to appreciate the true depths of human fallibility and incompetence, in most cases.  But, if the question is one of principle (legal, moral, or otherwise), it's worth asking ourselves: if we'd be comfortable with Case 1, why are we fundamentally less comfortable with Case 2?

Transparency Drives up the Value of Secrets

I do almost all of my magazine reading on airplanes in the 20 minutes at takeoff and landing when I can't use my laptop or eReader (because dangerous Kindle rays have been implicated in over a dozen fatal airplane crashes in just the past 12 months).  Two items in this month's New Yorker caught my interest (emphasis mine).  The first by James Surowiecki:
The consequences of being caught [insider trading] have never been higher...but hard-pressed fund managers continue to be tempted.  Competition in the investing world is fierce: there are now nearly eight thousand hedge funds, and on average they have underperformed the stock market for nine of the past ten years.  Whatever your supposed market-beating strategy is, someone else is probably duplicating it, and everyone is desperate to find an informational edge.  There was a time when big investors could come by that edge quasi-legally, as companies leaked information to select investors and analysts.  [cf. the Facebook IPO -ed]  But in 2000 the S.E.C. passed a rule called Regulation F.D., which required companies to disclose material information publicly or not at all.

And, from The President and The Press: "Obama said that he would make 'no apologies' for zealous press-leak investigations, since unauthorized disclosures of secrets jeopardized the lives of soldiers and the spies he sent in danger's way."

The common thread between these two articles is the rising value of secret information.  The internet has given us a huge amount of information, which can be used by almost anybody to help them understand market trends, world changing events, and communities.  Governments are increasingly being pressured to put everything online, and not just DMV forms, but raw data in computer readable formats ("Empowering People").  And others are using that data to create ever more sophisticated analyses and make them available to others, such as Open Secrets.

In a society where everybody knows everything, trading on secret information becomes impossible.  The problem is that trading on insider information is a really, really lucrative way to make Money for Nothing and your Checks for Free.  Surowiecki suggests that the solution (to the insider trading problem) is to simply disclose yet more and more.  But companies have very little incentive to deter insider trading, especially when the 'tips' come from third party sources (like expert networks).  Moreover, this strategy will simply drive the value of the secrets up further, and drive up the lengths people will go to in order to obtain them.  The explosion in top-secret data (as well as the explosion in leaking of that data to the press) is based on the same principle.  The internet makes it possible for everyone to know everything that's publicly available at the speed of light, so where is my next Huge Scoop going to come from?  It's got to come from a secret source.

Hypothesis: The demand for fraud in the world is pretty much a constant; if you squeeze it via regulation, prosecution, or disclosure, it simply increases the value until supply reaches demand, at least to first order.

Thursday, June 20, 2013

A Public Service Announcement for Tech Bloggers

If you look at the web site of an enterprise software company and declare that "nobody understands what they do" after three minutes of reading, here's a hint: it's probably because you cover mobile-social-smartphone-cloud-game consoles, and have no idea what the challenges are of running an enterprise with dozens of databases and terabytes of data spanning 10 years and multiple architectures.  One of my problems with blogging has always been, whenever I try to post on a topic, the acute awareness of how little I actually know.  Most bloggers are blissfully unburdened by the Dunning-Kruger effect.

Tuesday, June 18, 2013

The Knowledge and The Information

Right after one of my first really brutal breakups, at the suggestion of a friend, I picked up The Information by Martin Amis.  It's incredibly depressing, but it has a lot to say about what's important in life, and what isn't.  I read it the same way some people like to listen to maudlin emo albums after a breakup.  The main character Richard likes to frequent a divey joint called The Warlock, which has, alongside the snooker table and other accouterments, a bar trivia machine which is colloquially referred to by the denizens of The Warlock as "The Knowledge" (named after the test that London cabbies have to take before they can get their hack license).  Richard is the hands-down world champion at The Knowledge.  He knows every answer to every crossword puzzle.  He has read everything by everyone.  He is a fount of trivia.  But, the quiz machine (and, indeed, the modern sense of the word "trivia" itself) slyly winks at the central paradox of The Information.  Richard has all The Information, but he has none of The Knowledge.  He can answer questions about dead authors and living cricketers; he's a novelist who has read every piece of fiction there is to read about the meaning of life and the human condition, but he has no idea what the meaning of his own life is.

The 3 P's: Plumbing, Payment, and Politics

I've spent the past several years on the front lines of the Big Data wars, and I spend a lot of time thinking about issues around data, information, meaning, and intelligence from a very pragmatic perspective: How do you get organizations to share information?  What are the obstacles to actually moving information around within and between organizations?  How do you bring value to organizations through data?  Where are the trends taking us, and what are the emerging contours?  (In my recent talk at Georgetown, I coquettishly categorized these under "The Three P's: Plumbing, Payment, and Politics".)  So, consider this an invitation to a discussion about (or, more likely, a solipsistic ramble through) getting us from Big Data to Big Knowledge, the "ecstasy of consciousness", and/or a marginally functioning information economy.