Peer to Peer: Analyzing the Network Traffic
December 5, 2000
"Napster opens a link between your PC and the hard drive of the user you've selected. Then -- bam! -- it zaps [a song] your way."
Hmmm. The only "bam!" sound I usually hear when transferring a five-megabyte MP3 from some random machine that Napster dug up is the sound of my head hitting the desk when I fall asleep waiting. So is peer-to-peer really a good way to transfer data around the Internet?
The Fortune article compares a Napster transfer to a transfer from a Web site. In discussing the latter, Fortune writes that "the song you wanted was stored there on a server, and thousands of people might be trying to access the same track at once. No wonder overtaxed servers would often conk out before you could download a single note."
Fortune claims that with Napster, "while the music travels to you via the Internet, the files Napsterites trade aren't stored on the Web at all."
OK, hold on a minute. This makes it sound like Napster files are floating around in the ether, waiting for me to grab as they drift by. In reality, the two aren't very different. If I have one machine in my office running an HTTP server and the other running Napster, it makes no sense to say that the Napster files are somehow stored and served in a fundamentally different way.
The selling point of peer-to-peer is that it claims to eliminate bottlenecks when transferring data. (There is also a claim that it's easier to find and transfer multiple files with Napster, but as I wrote in a previous article, this is because Napster is a custom application dedicated to music swapping, as compared to starting at Sony.com's site, trying to navigate down to where they sell online music, and then navigating there all over again if you want to transfer several files.)
Traffic moving over the Internet is in some ways similar to traffic moving over roads and highways. Client-server computing, or whatever you want to call the "old way" that peer-to-peer is supposed to replace, is like a city in which everyone lives in the suburbs and commutes to work downtown.
When everyone is using roads to go to the same place, you can improve specific routes, with carpool lanes and rapid transit. Traffic management companies like Akamai and InterNAP are trying to optimize Internet traffic that is flowing to or from a common point, effectively building carpool lanes on the Internet.
Meanwhile peer-to-peer is more like taking the side streets to avoid congestion, which can be a risky maneuver. And while Web servers are often in colocation facilities built near the "exits" on the Internet "highway," in the case of peer-to-peer networking, you may be doing the network equivalent of driving to a house at the end of a dirt road in the middle of nowhere.
Now, the Internet is not quite like driving. When copying a file to your machine, the final leg -- the last few hops to your machine -- will be the same no matter where the data is coming from, and the intermediate Internet will be a crapshoot as usual. The key differences are the machines that are serving the data, and the first few hops away from them on the network.
For purposes of evaluating peer-to-peer networking, we can basically toss out the issue of the machines that are serving the data. Any recent PC can fill most of a 100 megabit-per-second fast Ethernet connection, which is way faster than any throughput you are going to get on the Internet.
And if a Web site is serious about serving data and discovers that the bottleneck is something other than the Internet -- the processor, or memory, or disk, etc. -- then they can solve the problem locally by adding more machines.
So the real issue is the first few hops away from the serving machine. The network protocol TCP, on top of which both HTTP and Napster are built, has had decades of research done to make it adapt well to variable-bandwidth, variable-latency networks like the Internet. If TCP is given a big chunk of data to move, it will quickly reach a state where it is sending data at the maximum bandwidth allowed by the path the packets are taking through the Internet.
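One reason TCP reaches full speed quickly is its slow-start phase, in which the sending window doubles every round trip until it hits the path's capacity. Here is a toy sketch of that behavior; the capacity figure is a made-up number for illustration, not a measurement:

```python
# Toy illustration of TCP slow start: the congestion window (cwnd)
# doubles each round trip until it reaches the path's capacity,
# so TCP gets up to full speed after only a handful of round trips.
# The capacity below is a hypothetical number, in TCP segments.
capacity_segments = 100

cwnd = 1          # congestion window starts at one segment
round_trips = 0
while cwnd < capacity_segments:
    cwnd *= 2     # slow start: exponential growth per round trip
    round_trips += 1

print(f"Reached path capacity after {round_trips} round trips")
```

Even for a fat path, the exponential growth means the ramp-up takes only a few round trips, which is why the length of the transfer, not the startup, dominates for a five-megabyte file.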
This maximum bandwidth turns out to be completely determined by the lowest-bandwidth part of the path. That is, a path is only as fast as its slowest link. The length of the path -- the number of hops involved -- is not nearly as important.
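The weakest-link rule is easy to see in a small sketch. The link names and speeds below are hypothetical, chosen only to illustrate the calculation:

```python
# Sketch: end-to-end TCP throughput is limited by the slowest link
# on the path. All link speeds are made-up numbers, in kilobits/sec.
path = [
    ("server to colo switch",  100_000),  # fast Ethernet
    ("colo to backbone",        45_000),  # T3 line
    ("backbone to local ISP",   45_000),
    ("ISP to cable modem",       1_500),  # typical downstream link
]

# The path's throughput is the minimum over all its links.
bottleneck_name, bottleneck_kbps = min(path, key=lambda link: link[1])
print(f"Bottleneck: {bottleneck_name} at {bottleneck_kbps} kbps")
```

Note that the number of hops never enters the calculation; only the minimum bandwidth along the path does.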
So the concern for peer-to-peer is really that the first few hops away from the machine that is sourcing the data will have low bandwidth. In particular, the first hop may be suspect.
Given the cable infrastructure that exists in this country, a lot of cable modems have much lower up bandwidth than down bandwidth (keep in mind that the up bandwidth of the machine that is sourcing the data is what matters). Some DSL connections are also asymmetric, biased toward providing more down bandwidth, since that is generally what home Web surfers want.
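A back-of-the-envelope calculation shows how much that asymmetry matters for the five-megabyte MP3 from earlier. The uplink speeds below are illustrative assumptions, not measurements of any particular connection:

```python
# Rough transfer-time estimate for a five-megabyte MP3, ignoring
# TCP overhead and congestion. All uplink speeds are hypothetical
# illustrations, in kilobits/sec.
FILE_BITS = 5 * 1024 * 1024 * 8  # 5 MB expressed in bits

uplinks_kbps = {
    "cable modem upstream":  128,
    "asymmetric DSL upstream": 256,
    "colocated Web server": 10_000,
}

for source, kbps in uplinks_kbps.items():
    seconds = FILE_BITS / (kbps * 1000)
    print(f"{source}: roughly {seconds / 60:.1f} minutes")
```

At these (assumed) speeds, the same file that a colocated server can push out in seconds takes a peer's cable modem several minutes to send, even before the rest of the Internet gets a vote.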
Web servers in colocation facilities, by contrast, will be set up with fat pipes to the Internet, and if anything they will be biased toward having larger up bandwidth.
So things may look a little grim for peer-to-peer networking. But does it really work out that way? In Part 2 of this editorial, I will discuss the results of some real-life testing I did.
Adam Barr worked at Microsoft for over 10 years before leaving in April. He is working on a book about his time there. He lives in Redmond, Washington.