DIVERSITY AND RELIABILITY
Flash! Satellite Outage:
" . . . not only were medical
professionals, repair people and stock watchers left hung
out to dry, but CBS, Reuters, NPR, UPI and other news
organizations were left looking for backup. By all
accounts, CBS had seamless backup, but NPR was in the
middle of broadcasting 'All Things Considered' and had to
switch to alternate satellites, ISDN, phone bridges and --
surprise -- RealAudio to get its feed to local stations."
(Industry Standard Media Grok, May 21, 1998.)
Two recent network failures serve as "emblematic events"
for the new telecommunications: (1) the PanAmSat Galaxy IV
satellite outage, which struck at 6PM EDT on May 19, and
(2) AT&T's Frame Relay Network shutdown of several weeks
ago. Both drew attention to the pervasive, underlying
dependence of civilization-as-we-know-it upon single
specific communications links.
There are lessons here. "Learn From History, or . . . "
kind of lessons. For example:
1. Systems that are carefully crafted and managed by the
"reliability, reliability, reliability" crowd to be 24-by-7
and fault-free, aren't.
2. If you have not second-sourced your data transport yet,
do it now.
3. The Year 2000 is coming, better make that triple-
Humanity has an amazing inability to plan. Not too many
generations ago, when our relatives lived by hunting and
gathering, the inability to plan for the next season meant
death. Planners survived. The clueless died. But today,
Homo sapiens eats at McDonald's - for the moment, planning
and survival are not strongly linked.
Inability to plan made news during the big United Parcel
Service strike of 1997. Anybody who cared to look could
have seen the strike coming weeks before it occurred. When
it hit, a school supply company made news. Its entire
revenue stream hinged on a single end-of-summer shipment.
UPS was its only shipper. Yet, in several poignant
interviews with the owner, no happy-talk reporter ever
asked, "Before the strike, did you ever think what would
happen if it occurred during your critical week?"
In the old days, before telecom competition, Ma Bell was
into Physical Diversity. Physical Diversity meant that
there were several alternate routes, each with different
geography, different technology, and different physical
infrastructure. For example, a phone call from New York to
San Francisco, using the modern technologies of the 80s,
might have traveled underground by cable, or hopped line-
of-sight from microwave tower to microwave tower, or made
one long leap to a satellite and another to Earth again.
Today, in the era of competition, telcos want to be "Low
Cost Providers." Physical Diversity is an "unnecessary
expense." They want to skinny down, remove redundancy,
reduce inventory, refine processes - find the best solution
and stick with it. They don't need no steenkin' diversity.
Fiber *has* become very reliable, and SONET makes it even
more so. This raises the reliability coefficient, but does
not make it 1.0. Too bad the whole system isn't just rings
of passive glass. There are all kinds of hardware and
software components to light the fiber, set up calls, parse
headers, route data, relay frames, and monitor the system.
There are systems for connecting customers, and systems for
interconnecting with other networks. There are systems to
check on the other systems.
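The arithmetic behind "does not make it 1.0" can be sketched
in a few lines. This is only an illustration: it assumes every
component must work for the service to work (a series chain)
and that components fail independently. The count of 20
components and the per-component figure are hypothetical
numbers, not measurements of any real network.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600


def series_reliability(component_reliabilities):
    """Reliability of a chain in which every component must work.

    Assumes independent failures, which is the optimistic case.
    """
    return math.prod(component_reliabilities)


def downtime_seconds_per_year(reliability):
    """Expected seconds of outage per year at a given reliability."""
    return (1.0 - reliability) * SECONDS_PER_YEAR


six_nines = 0.999999

# Hypothetical: 20 components in series, each at "six nines".
chain = [six_nines] * 20
end_to_end = series_reliability(chain)

print(f"one component: {downtime_seconds_per_year(six_nines):.1f} s/year")
print(f"20 in series:  {downtime_seconds_per_year(end_to_end):.1f} s/year")
```

Under these assumptions one six-nines component costs about
31.5 seconds a year, while twenty of them in series cost
roughly ten and a half minutes. Real components are not
independent, so the true figure is likely worse.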
The result is that a complex, linearly engineered system -
designed to be 99.9999 percent reliable - ISN'T. It is
still a chaotic, adaptive system, even though it wasn't
designed to be.
ONUS ON THE CUSTOMER
The onus for Physical Diversity is on the customer. That's
why, in the era of telecom competition, business-critical
operations demand a couple of interexchange carriers, an
office phone and a cell phone, a desktop and a laptop, a
cable modem and a dial-up, and at least two Internet
Service Providers (ISPs). It is a good thing that
infrastructure is getting cheaper.
Physical Diversity, along as many dimensions as possible,
is the most reliable route to reliability. Note that
Galaxy IV's backup system failed too! Any time that
parallel systems share components, the event that takes one
system out is also likely to bring the second system down.
If it's a software bug, and both primary and backup systems
are running the same faulty code, too bad. If radio
interference is the problem, and both systems use the same
frequencies or modulations, sorry. If you rely on
satellites and there are solar storms or meteor showers,
look out. If you get primary and backup from the same
company, and the company fails (or goes on strike, or . .
.), remember you read it here first.
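One way to make the shared-component point concrete is to
list what each path depends on and look at the overlap. The
sketch below is a toy model, not anybody's real audit tool.
The dependency labels (carrier, medium, software version) are
hypothetical examples chosen to mirror the cases above.

```python
def shared_dependencies(primary, backup):
    """Return the dependencies that both paths rely on."""
    return set(primary) & set(backup)


def survives(failed, primary, backup):
    """True if at least one path has no failed dependency."""
    failed = set(failed)
    primary_ok = not (failed & set(primary))
    backup_ok = not (failed & set(backup))
    return primary_ok or backup_ok


# Hypothetical dependency lists for illustration only.
primary = {"carrier:A", "medium:satellite", "software:v1"}
backup_same = {"carrier:A", "medium:satellite", "software:v1"}
backup_diverse = {"carrier:B", "medium:fiber", "software:v2"}

# A backup that shares everything with the primary shares its fate.
print(shared_dependencies(primary, backup_same))
print(survives({"software:v1"}, primary, backup_same))

# A backup with no overlap rides out any single failure.
print(shared_dependencies(primary, backup_diverse))
print(survives({"software:v1"}, primary, backup_diverse))
```

An empty overlap is the goal. Anything left in the overlap is
a single event that can take out both paths at once.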
When the onus for Physical Diversity is on the customer, the
customer needs alternatives. That's a problem when 90% of
computers run one company's operating system, no matter how
innovative that company might be. And it's a problem when
a single telco controls local telephone service, no matter
how big the telco's territory.
EXODUS TO THE PROMISED BANDS
Exodus Communications is a Stupid company - they are into
over-provisioning and Physical Diversity. They call
themselves an "Internet Data Center." Actually Exodus has
about 8 Internet Data Centers around the world. They'll
buy data feeds - DS-3, OC-3, or more - from any carrier
that'll sell them. UUNet, GTE, Sprint - Exodus buys it
all. Their customers are ISPs. An ISP gets a cage on the
Exodus floor, data feeds to order, and a Chinese menu of
In one Exodus customer configuration, the ISP has two
redundant racks. Rack #1 gets a primary 100BaseT feed
from, say, UUNet and a secondary, totally redundant feed
from, say, Sprint. Rack #2 gets its primary from Sprint,
and its secondary from UUNet.
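The crossed feeds are easier to see written down as data.
The sketch below encodes the two-rack arrangement just
described and checks what happens when a carrier goes dark.
The failover rule (use the primary if it is up, otherwise the
secondary) is my assumption for illustration. How Exodus
actually switches feeds is not described here.

```python
# The two-rack configuration, with carriers crossed between racks.
racks = {
    "rack1": {"primary": "UUNet", "secondary": "Sprint"},
    "rack2": {"primary": "Sprint", "secondary": "UUNet"},
}


def active_feed(rack, failed_carriers):
    """Return the carrier serving a rack, or None if both feeds are down.

    Assumed rule: prefer the primary feed, fall back to the secondary.
    """
    for role in ("primary", "secondary"):
        carrier = racks[rack][role]
        if carrier not in failed_carriers:
            return carrier
    return None


# Check each single-carrier failure in turn.
for failed in ("UUNet", "Sprint"):
    status = {rack: active_feed(rack, {failed}) for rack in racks}
    print(f"{failed} down -> {status}")
```

With either carrier down, both racks still have a feed. And
in normal operation the load is split across both carriers
rather than parked on one.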
Exodus maintains a 200% headroom policy. It attempts to
have twice as much bandwidth as it needs in its busy hour.
Its 200% and Physical Diversity policies extend to electric
power and heating-cooling too. It has contracts with two
different power companies, and it has a back-up generator
on the roof and another in the basement. The rooftop
generator has a different fuel tank than the basement one.
There are four air conditioners, one in each corner of the
data center. Each of the four electrical feeds supplies
one AC. And so on.
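The headroom rule reduces to one comparison, and it combines
naturally with Physical Diversity. A rough sketch follows.
The feed sizes and busy-hour load are hypothetical numbers,
and the even split of capacity across two carriers is my
assumption, not a statement of Exodus practice.

```python
def meets_headroom(provisioned_mbps, busy_hour_mbps, factor=2.0):
    """True if provisioned bandwidth is at least factor x busy-hour need."""
    return provisioned_mbps >= factor * busy_hour_mbps


def capacity_after_loss(feeds_mbps, lost_carrier):
    """Total bandwidth remaining after one carrier's feed is lost."""
    return sum(mbps for carrier, mbps in feeds_mbps.items()
               if carrier != lost_carrier)


# Hypothetical numbers: two equal feeds, busy hour needs 100 Mbps.
feeds_mbps = {"UUNet": 100, "Sprint": 100}
busy_hour_mbps = 100

total = sum(feeds_mbps.values())
print(meets_headroom(total, busy_hour_mbps))

# If capacity is split evenly over two carriers, losing either one
# still leaves enough to carry the busy hour.
for carrier in feeds_mbps:
    remaining = capacity_after_loss(feeds_mbps, carrier)
    print(carrier, "lost:", remaining >= busy_hour_mbps)
```

Seen this way, the headroom is not waste. Under the assumed
even split, it is exactly what lets one feed carry the whole
busy hour when the other is gone.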
A facilities-based telco can't do this. (Imagine AT&T
advertising that its redundancy is due to secondary
facilities by MCI!) Exodus can, because it buys facilities
from all comers. Reliability emerges at a different point
in the value space.
Exodus is an excellent example of what SMART Person Paul
Saffo calls "disinterREmediation." Once upon a time, telcos
mediated Physical Diversity for their customers, but
competition and the resulting drive to lower costs made it
prudent for them to stop. Customers can still buy Physical
Diversity in the age of telecom competition, but they have
to do it piecemeal . . . one from GTE, one from MCI, etc.
Exodus REmediates by providing one-stop shopping for
Physical Diversity. The whole process is called
Gentle Reader, if you are still asking what is the
relationship of the Y2K Problem to The Stupid Network, I
don't think I can help further. To unsubscribe, send me a
brief message to that effect.
For the rest of us, let's take the lessons of Physical
Diversity home. We could ask ourselves now what might
happen if our communications, our food, our electricity,
our heat, our transportation, our money, our employment are
disrupted. Physical Diversity is part of the solution
space. It can protect individuals and defined groups from
potential technological failures. I wonder whether Exodus
will rent cages for living spaces next year :-)
Physical Diversity offers much less protection against the
kinds of sociological phenomena that could plausibly occur
when physical systems are disrupted. I have no idea where
this discussion will lead, but it is time to begin talking . . . .
<<to unsubscribe to the SMART List, send a brief
unsubscribe message to firstname.lastname@example.org>>
<<for past SMART Letters,
David S. Isenberg email@example.com
d/b/a isen.com http://www.isen.com/
18 South Wickom Drive 888-isen-com (anytime)
Westfield NJ 07090 USA 908-875-0772 (direct line)
-- Technology Analysis and Strategy --
Rethinking the value of networks
in an era of abundant infrastructure.
Date last modified: 27 May 1998