27Aug12

BigHook2012: Data

Data, the theme of BigHook2012, is the highest form that the Internet Protocol deals with. Data are the stuff of IP packet payloads. Apps, and other networky stuff like error-checking protocols, were implemented at a higher layer than the Internet Protocol itself for good reasons, the most important of which was humility-born ignorance about future uses of the Internet.

Data are what we care about. Public data, private data, open data, big data, metadata, real time data, data rates, data sets, stored data, data representation, data visualization, data analysis, confirmatory data, contradictory data, data in the cloud, data mining, data encryption, fabricated data, data centers, data archives, data back-up, data caps, data, data, data . . .

[Cartoon used here under Fair Use -- David I]

. . . data, data, data . . .

. . . [Target statistician Andrew Pole] was able to identify about 25 products that, when analyzed together, allowed him to assign each shopper a “pregnancy prediction” score. More important, he could also estimate her due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy. One Target employee I spoke to provided a hypothetical example. Take a fictional Target shopper named Jenny Ward, who is 23, lives in Atlanta and in March bought cocoa-butter lotion, a purse large enough to double as a diaper bag, zinc and magnesium supplements and a bright blue rug. There’s, say, an 87 percent chance that she’s pregnant and that her delivery date is sometime in late August . . . About a year after Pole created his pregnancy-prediction model, a man walked into a Target outside Minneapolis and demanded to see the manager. He was clutching coupons that had been sent to his daughter, and he was angry . . . “My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”

from How Companies Learn Your Secrets, NYT (link is to local copy)

. . . data, data, data . . .

. . . the world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. But they are also creating a host of new problems. Despite the abundance of tools to capture, process and share all this information—sensors, computers, mobile phones and the like—it already exceeds the available storage space (see chart 1). Moreover, ensuring data security and protecting privacy is becoming harder as the information multiplies and is shared ever more widely around the world.

from Data, Data Everywhere, Information has gone from scarce to superabundant. That brings huge new benefits, says Kenneth Cukier (interviewed here)—but also big headaches in The Economist, February 15, 2010

. . . data, data, data . . .

The White House blog today featured a new post about the "Building Blocks of a 21st Century Digital Government." If these are the building blocks of reinvented government, however, we're on shaky ground. Most agency CIOs don't know what their agency's major IT holdings are. Really. Decisions determining what data will be released, and how it gets released, are routinely made by individual departments, outside public view, and without review from the federal CIO or CTO, Congress, or the public. This is a shame, because the $80 Billion+ federal IT budget contains a wealth of vital information, and should be considered a national asset . . .

from Shaky Foundations of Federal IT, by John Wonderlich, The Sunlight Foundation

. . . data, data, data . . .

. . . meteorology, genomics, connectomics, complex physics simulations, biological and environmental research . . . Internet search, finance and business informatics . . . ubiquitous information-sensing mobile devices, aerial sensory technologies, remote sensing, software logs, cameras, microphones, radio-frequency identification readers, and wireless sensor networks . . . A/B testing, association rule learning, classification, cluster analysis, crowdsourcing, data fusion, data integration, ensemble learning, genetic algorithms, machine learning, natural language processing, neural networks, pattern recognition, predictive modelling, regression, sentiment analysis, signal processing, supervised learning, unsupervised learning, simulation, time series analysis and visualisation. Additionally, massively parallel-processing (MPP) databases, search-based applications, data-mining grids, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.

from Big Data article in Wikipedia

. . . data, data, data . . .

The SG-3 room (Room 641A at the AT&T Building, 641 Folsom Street) was created under the supervision of the NSA, and contains powerful computer equipment connecting to separate networks. This equipment is designed to analyze communications at high speed, and can be programmed to review and select out the contents and traffic patterns of communications according to user-defined rules. Only personnel with NSA clearances – people assisting or acting on behalf of the NSA – have access to this room. AT&T’s deployment of NSA-controlled surveillance capability apparently involves considerably more locations than would be required to catch only international traffic. The evidence of the San Francisco room is consistent with an overall national AT&T deployment to from 15 to 20 similar sites, possibly more. This implies that a substantial fraction, probably well over half, of AT&T’s purely domestic traffic was diverted to the NSA. At the same time, the equipment in the room is well suited to the capture and analysis of large volumes of data for purposes of surveillance. [Emphasis added -- David I]

from EFF summary of former AT&T technician Mark Klein [link.pdf]

. . . data, data, data . . .

My name is Babak Pasdar, president and CEO of Bat Blue Corporation . . . In late September 2003 I received a call from a technology partner about an urgent high visibility project for large wireless carrier . . . At one point I overheard [client employee co-workers on the project] Client 1 and Client 2 talk about skipping a location. Not wanting to do a shoddy job, I stopped and said we should migrate all sites. Client 1 told me this site is different. I inquired who is it? Carrier owned or affiliate? Client one said this is the Quantico Circuit. I remembered that he paused and looked at me as did Client 2. I inquired "Quantico, Virginia? is this a store location?" Client 1 responded, "No." "Is this what I think it is?" I asked. Client 1 did not reply but just smiled . . .

from Affidavit of Babak Pasdar for the Government Accountability Project, 2/28/08 [link.pdf]

. . . data, data, data . . .

 

Agenda

[nb: the mapping of time slots to people/subjects is very much subject to change.]

Wednesday, August 29

Noon to 1:30 PM: Check in, lunch, swimming, meet fellow participants
1:30 to 3:30 PM, Session 1a: Introductions
3:30 to 4:00 PM: break
4:00 to 5:30 PM, Session 1b: More Intros, Intro to Data
5:30 to 8:00 PM: Dinner, fishing
8:00 to 9:00 PM, Session 2: Music, Jesse Ausubel on the Encyclopedia of Life, etc.
9:00 to Whenever: Whatever

Thursday, August 30

7:00 to 8:30 AM: Breakfast, fishing
8:30 to 10:00 AM, Session 3a: New Prospects for Access Networks (Susan Crawford, discussion starter)
10:00 to 10:30 AM: break
10:30 AM to Noon, Session 3b: ITU and the Internet (panel)
Noon to 2:00 PM: Lunch, swimming
2:00 to 3:30 PM, Session 4a: Discussion w/ Bruce Schneier
3:30 to 4:00 PM: break
4:00 to 5:30 PM, Session 4b: Spectrum, Radios, Regulation (panel)
5:30 to 8:00 PM: Dinner, fishing
8:00 to 9:30 PM, Session 5: Spectacular Musical Event plus Something Else Awesome, TBD
9:30 PM to Whenever: BOF Sessions

Friday, August 31

7:00 to 8:30 AM: Breakfast, fishing
8:30 to 10:00 AM, Session 6a: Discussion w/ Brewster Kahle
10:00 to 10:30 AM: break
10:30 AM to Noon, Session 6b: Summaries, Learnings, Action Items
Noon to 2:00 PM: Lunch, swimming
2:00 PM-ish: Adjourn

Travel & Lodging

Information about airports, buses, and lodging for BigHook is here. Providence (PVD) is a small airport, and traffic there is lighter than around Boston, though "The Big Dig" has made Boston's Logan Airport much more accessible. Also, the bus service from Logan to Woods Hole is MUCH better than the bus service from Providence.

Lodging establishments are depicted below. Click the map for a "live" G-Map of Airplane House, etc.:

 

Music

The BigHook2012 musicians in residence are the band Choro das 3, namely Corina Meyer Ferreira, 24 (flute), Lia Meyer Ferreira, 22 (7-string guitar), Elisa Meyer Ferreira, 19 (mandolin, clarinet, banjo & piano), and Eduardo Ferreira (pandeiro).

Sponsors & Acknowledgements

The BigHook community and isen.com, LLC gratefully acknowledge the sponsorship of Afilias, thanks to Ram Mohan and Desiree Miloshevic, and Google via the initiative of Rick Whitt and the good offices of Vint Cerf and Johanna Shelton. Also, thanks to all BigHook2012 participants who dug a bit deeper into their budgets to support BigHook this year.

Thanks also to

  • Chef Roland and his fine crew
  • Dewayne Hendricks, Hartley Hoskins, Art Gaylord, and the Woods Hole Oceanographic Institution for Internet connectivity
  • Judi Clark for Web work
  • Gardner Miller, the Point man
  • Paula Blumenthal

Fine Print:

All of the above is on a best effort basis. I might fail to deliver on any of the above, so none of it is a promise, and no guarantees or warranties are implied. Here's my actual, real world promise: I'll do my best, and if things screw up or stuff happens that causes plans to change, I'll do my best to give as much notice as I practically can. In other words, if you don't expect the impossible, I'll do my best to deliver it. -- David I

elsewhere:

Data

Data.Gov

The Shaky Foundations of Federal IT by John Wonderlich

Big Data

Information

Metadata

Shaping the Future by Charlie Stross

How Companies Learn Your Secrets, NYT (local copy)

Too Big to Know by David Weinberger

Cory Doctorow's review of Too Big to Know

The Information: A History, a Theory, a Flood by James Gleick

Andrew Odlyzko's review of The Information

The Volume and Value of Information (.pdf), by Andrew Odlyzko