Dan Bricklin's Web Site: www.bricklin.com
A Taxonomy of Computer Systems and Different Topologies: Standalone to P2P
Looking at where there is computation, data storage, and input/output, at what is standard and what is custom, and at what is on a PC and what is on a shared managed server, helps in understanding different systems.
I'm part of yet another presentation about Peer-to-Peer (P2P) at InfoWorld's CTO Forum. In preparation for that panel, I discussed my thoughts with Della Van Heyst, who's involved in setting up the conference, and Ray Ozzie of Groove, who will be on the panel and is a major voice in the P2P world. This essay is an expression of some of those thoughts and should help position Groove's offering.
When thinking about the excitement around P2P, you should look into it deeply and find out why some of its elements are clearly full of promise, and why it clearly differs from what many people were used to before. You also need to understand why so many different things are called "Peer-to-Peer" even though they are quite different.
We all know that P2P relates to some sort of topological difference in using computers and networks compared to what was common in the near past. What we struggle with is: what is that difference? People want to believe that there is one new topology and that's what's happening. I think we're confused because all sorts of things, each of which has merit, claim to be "Peer-to-Peer", yet they all differ in many fundamental ways. They all use multiple PCs, which we think of as "Peers", connected by the Internet "to" something, so they each can lay claim to that moniker.
Let me propose some taxonomy that describes the topologies and roles of computers in a network, and then let's examine the taxonomy of various systems, such as Napster, Instant Messaging, and Groove, in light of that taxonomy.
First, let's distinguish between two types of computers. I'll call them "servers" and "PCs". The servers are computers that are often shared by many users, managed by dedicated people, and often dedicated to a particular application. PCs are personal devices that individuals have near their persons (often on their desks, in their homes, or carry with them). The same PC is often used for a variety of different purposes.
Second, let's distinguish between different types of software running on those computers. I'll break them down into "standard" and "custom". Standard software implements a general purpose application that communicates in a standard way. Usually many computers run the same standard software for a wide variety of purposes with different data. Custom software is specific to a particular task or system. It is usually used by only a small number of people, sometimes all working with the same application and shared data.
Finally, most applications have "computation", "data storage", "input", and "output". Computation is what the CPU does, running the program. Data storage is what is specific to a particular use and results from the input and is used to make the output, sometimes after doing some computation. Input is created by people or devices or other systems. Output is usually read by people or results in some other change in the outside world.
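The taxonomy above can be sketched as a small data structure: assign each role of an application (computation, data storage, input, output) to either a "server" or a "pc", and mark the software "standard" or "custom". This is a minimal illustrative model, not anything from a real system; the configuration names follow the essay.

```python
from dataclasses import dataclass

# A hypothetical model of the essay's taxonomy. Each field says where
# that role of the application lives ("server" or "pc"); the software
# field says whether the software is "standard" or "custom".

@dataclass
class Configuration:
    name: str
    software: str        # "standard" or "custom"
    computation: str     # "server" or "pc"
    data_storage: str
    input_on: str
    output_on: str

standalone = Configuration("standalone", "standard", "pc", "pc", "pc", "pc")
transaction = Configuration("transaction processing", "custom",
                            "server", "server", "pc", "pc")
web_site = Configuration("web site", "standard",
                         "server", "server", "pc", "pc")

# One question the taxonomy lets us ask directly: which configurations
# keep the main data on the user's own machine?
local_data = [c.name for c in (standalone, transaction, web_site)
              if c.data_storage == "pc"]
print(local_data)  # only the standalone configuration qualifies
```

Classifying systems this way makes the later comparisons mechanical: two systems that feel very different often differ in only one or two fields.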
Let's look at various types of configurations and examine their topology in light of this taxonomy. (I have a simple chart that will make more sense after reading the detailed explanation.)
We'll start with standalone computing.
When there is no network, you have either a server or a PC on which either a standard program (e.g., an accounting package, image editor, or word processor) or a custom program (e.g., company-specific inventory management, actuarial table calculation, mortgage calculator) takes input on that computer, does some calculations, stores intermediate data, and produces output.
Organizations and individuals liked this configuration for doing local things that could be confined to one location. It opened up the prepackaged software industry, because there were many applications that were general enough that you could sell to many locations. The ability to choose locally which software to run (either on a managed machine or a personal machine) was a great source of empowerment and led to a surge in the purchase of first managed corporate machines in the 1960's and 1970's, and then the PCs in the 1980's. Every system could be different.
Transaction processing systems have a single server with custom software doing computation and data storage. Each client machine is usually a standard "smart" terminal or a personal computer running a standard program acting as one. The input and output occurs on the PC. The PC has no storage and is only connected directly to the single server through a dedicated communications system. Part of the input sometimes comes from other sources.
This configuration let organizations build applications that involved people at diverse locations working with a single database. It was best suited to applications valuable enough to justify a dedicated communications system as well as dedicated client machines. Since the client machines were standard, their cost and maintenance could be commodities. The applications were often mission-critical to the company employing them. Interacting with computers started to be integral to the operation of an enterprise, for example at banks or airlines.
Client-server systems have servers similar to the transaction processing systems, with computation, data storage, and maybe some input, and run custom software. The PC is connected by a communications system (a LAN or WAN) that is usable for more than one application. The PCs run custom software that does computation as well as input and output.
As corporations started learning the value of using computers as part of their operation, client-server configurations let them take advantage of the power of the PCs on people's desks and the sharing afforded by the LAN. The standalone applications showed them the value of user-friendly, usable user interfaces. By running custom applications communicating with the servers they could make more efficient applications. The LAN let them share expensive hardware like large disks and expensive printers. Because they used standard equipment on the client side, they could take advantage of commodity pricing.
A web site has a standard server, with standard software, doing the simple computation to serve up data from connected storage. The data on the server is input by people associated with the server, usually through standard but special connections. The viewers of the web site use standard PCs, running standard software, connected over a standard communications system. They do very simple input (requesting a particular piece of data) and get back data in a standard form that is processed locally by a complex program (a browser) in a standard way.
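The web-site configuration can be sketched end to end with nothing but Python's standard library: standard server software returning stored data to any standard client that asks. The page content and loopback address are illustrative only.

```python
import threading
import urllib.request
from http.server import HTTPServer, BaseHTTPRequestHandler

# The "data storage" on the server side: a fixed piece of content.
PAGE = b"<html><body>Hello from a standard server</body></html>"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The server's "computation" is simple: look up and return data.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep the demo quiet

# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The "PC" side: a standard client making a very simple request.
url = f"http://127.0.0.1:{server.server_port}/"
body = urllib.request.urlopen(url).read()
server.shutdown()
print(body == PAGE)  # True: the client got exactly the stored data
```

Note that nothing here is custom to one application: the same server software, client software, and protocol serve any number of different web sites.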
The idea of web sites excited people. It was completely different from transaction processing or client-server with their custom software that took the IT department months or years to develop. Setting up a server was relatively simple, and the software was standard. In fact, many applications could share the same server with little additional work. The data could be created with very simple tools, all standard in that they could be used by many different people for different purposes. The readers needed only one standard set of software and one form of communication to access any server on the Internet. The number of potential users who could access any particular data was huge, and the number of "applications" (web sites) accessible by a single PC was also huge.

Readers and writers paid for their own equipment and communications costs. Adding more readers or writers only increased the value of the whole network without significantly increasing the cost of the whole system over what they paid incrementally to join. When people learned they could use a single PC to quickly switch between multiple applications as they needed them, the utility of computing grew. When they learned they could instantly choose among thousands or hundreds of thousands or more useful sources of information, that understanding deepened. Also, this is a broadcast medium: one server acts the same to any number of PCs. This type of configuration led to business models similar to paper publishing, but less expensive for each additional copy. The cost of publishing educational and marketing material dropped substantially.
I use web applications as a term for custom software on web servers. This has the same characteristics as the web sites, except that there is a much higher cost to construct the web sites and much more of the data comes from the PCs connecting to the web application.
There was much excitement about this form of computing at first. It is similar to transaction processing, except that the client machines are usually shared with many applications from many servers. Unlike transaction processing systems, though, you couldn't always predict the maximum capacity needed. With transaction processing, you would peak out when you didn't have enough of your own people connected to your terminals. Here you peak out when outside demand exceeds what your system can handle.
It was hoped that rather than hooking sales people to transaction terminals that were connected to a purchasing system you could directly hook purchasers to the purchasing system. The fear that this would displace all other forms of purchasing caused some companies to create such systems at any cost (and they are costly to do well). This change in business has become important, but has not displaced the entire set of other systems. For many applications, though, when the costs are right, this form opens up new ways to serve employees, customers, and citizens. It is very valuable within corporations and in connecting with their customers.
It was also hoped that companies and individuals using standalone or client-server applications would switch to similar applications running as web applications. This would let them have those applications managed centrally by specialists, and let them share the application beyond the confines of a single LAN. It would also enable business models where use of the server could be charged for like a service, bringing in recurring revenue without additional development. Unfortunately for most providers of such applications, most people like controlling their own data and applications, and these offerings have not caught on except where an externally managed server would have been used anyway (such as web site hosting), or as replacements or expansions of transaction processing systems. Running your own software continues to be desirable. In some cases, a business model reminiscent of the troll under the bridge in children's stories, charging every time you pass by, has not helped.
All of the preceding systems use single main systems (either a standalone PC or a server) to do the main computation and store the main data. The only change is where the I/O is done. The view of the world is always through this shared main data or computation. With the growth of the Internet because of email and the web, the communications system to connect dispersed, independent PCs together became available. PCs also became more powerful and data storage costs dropped so much that most PCs had huge capacities. New configurations came about that didn't depend on the server so much. These configurations became known as Peer-to-Peer. There was another boost in excitement about computing from the realization that there were other topologies that gave the user of a PC even more options for choosing applications and more options where to have data created, stored, or used. A lure of the PC, the programmable personal device, is in knowing you can find new uses and mix and match those uses as you see fit for your own needs when you want. P2P topologies added other people and what they did, wrote, or had to the mix. Applications seemed possible that didn't require a "managed" server, just the agreement of individuals and equipment they controlled and the software they used. Avoiding the "troll" business models also helped. It was like the boom that came with the early web, when suddenly you didn't need special programming or dedicated communications lines to do incredible things.
A distinguishing characteristic among various forms of P2P is the role of the other PCs relative to a given user. In the standalone or central-server world, there is one computer that is "special". In the P2P world, there are configurations where the mass of other PCs serve as a "super-server" for purposes of redundancy, cost reduction, parallel processing, privacy, or the like. Which person, if any, uses a given PC doesn't matter. At most, what matters is the anonymous properties of the data, or which other computers it is connected to. I call these "anonymous distributed serving" configurations. Other configurations are for connecting particular PCs because of the individuals using, or the location of, each PC. I call these "person to person" configurations.
Many systems calling themselves P2P are for file sharing. In that case, you need to ask yourself if it is for distributing static, possibly widely desired, files, or dynamic, narrowly needed streams of data. See my essays on "Thoughts on Peer-to-Peer" and "Friend-to-Friend Networks". I think that long-term, the most interesting P2P applications will be dynamic data, especially the Person-to-Person ones.
In many P2P systems, a server is used to simplify a particular aspect of operation, such as organizing the PCs or dealing with Internet complexities. This opens up the possibilities of running that server as a service, and the business models that go with it. These business models don't always fit in with the goals of the rest of the system. Susceptibility to, and ease of, charging doesn't always fit with the most efficient and standard way of doing things. Remember that it is the wide use of standards that helped the Web beat out competing custom systems that had "better" business models with easier ways to charge.
A problem is that P2P systems use complex software on the PCs. Developing and marketing such software takes time and money (real or volunteer). Some of the more interesting applications will have very narrow uses. We need a renewed market for PC software that can support the masses of specialized P2P software that could be developed and enhanced.
Here are some of the early P2P systems:
In the mid-1990's another configuration became popular. I'll call it instant messaging after its most popular version. Users run a particular but common piece of software on their PC. (It's "standard" but proprietary in many cases so far.) All the input and output is done on the PCs. The PCs connect to other PCs either directly through the standard communication of the Internet, or through a very simple server acting as an intermediary. A server is also often used to set up the initial connection (though with systems like NetMeeting, and PCs such as those on some cable modem systems that are locatable by DNS name, the server isn't needed). Little data is stored. The most important data is the list of names used to locate other systems (a "buddy list"), and that is often stored on the PC. This is a Person-to-Person system.
This system is different in that all the data is created and consumed on the PC and a server plays a very simple role. When the Internet is able to better address more devices (e.g., with IPv6) the DNS could be used as a distributed system of servers for doing the locating. If many people want to send many messages, the load is just on their computers and the communication facilities between them. Even if they do want to use centralized servers to relay the data (to help penetrate firewalls, for example) the servers can be relatively simple and not computationally or database heavy, and they can grow linearly since there is no sharing of any single resource. If standard protocols are used, there need be no dependencies on any given server, and the serving can be commoditized.
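How simple such a relay server can be is worth seeing. Here is a minimal in-process sketch, with invented names and message format: the "server" is nothing but a directory of inboxes that passes messages along, while the data is created and consumed entirely on the PCs.

```python
from collections import defaultdict

# A hypothetical relay in the instant-messaging configuration. It does
# no heavy computation and keeps no permanent data: it only locates
# users and hands messages from one PC to another.

class RelayServer:
    def __init__(self):
        self.inboxes = defaultdict(list)

    def register(self, name):
        self.inboxes[name]  # create an empty inbox for this user

    def relay(self, sender, recipient, text):
        # The server just passes the message along; the content came
        # from one PC and is destined for another.
        self.inboxes[recipient].append((sender, text))

    def fetch(self, name):
        messages, self.inboxes[name] = self.inboxes[name], []
        return messages

server = RelayServer()
server.register("alice")
server.register("bob")

# As the essay notes, the "buddy list" lives on the PC, not the server.
alice_buddies = ["bob"]

server.relay("alice", "bob", "hello")
print(server.fetch("bob"))  # [('alice', 'hello')]
```

Because each message touches only the two parties' resources (plus this trivial relay), capacity grows linearly with users, with no shared database to become a bottleneck.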
The configuration used by Napster is a server with a simple database of user names, a dynamic database of titles currently available with the Internet location of the PCs that hold them, computation and data access to search the titles, and Internet communication to a common custom client. The PCs run the clients that follow a single standard, and have the main data -- the MP3 files. The heavy computation of compressing and decompressing the files, the effective user interface, and all the I/O is done on the PC. Transfer of files, the main use of communications expense, is done between PCs with no intermediary server. This is an Anonymous Distributed Serving configuration. Its configuration lets it grow without heavy costs to the server for each additional user. For other benefits, see my "The Cornucopia of the Commons: How to get volunteer labor" essay.
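The Napster-style split — a central index of who has what, with the files themselves moving directly between PCs — can be sketched as follows. All names, addresses, and file contents are invented for illustration; this is not Napster's actual protocol.

```python
# A hypothetical central index in the Napster-style configuration.
# The server answers "who has this title", never "here is the file".

class IndexServer:
    def __init__(self):
        self.index = {}  # title -> set of peer addresses holding it

    def announce(self, peer_address, titles):
        for title in titles:
            self.index.setdefault(title, set()).add(peer_address)

    def search(self, title):
        return sorted(self.index.get(title, set()))

class Peer:
    def __init__(self, address, files):
        self.address = address
        self.files = dict(files)  # title -> file bytes

    def download(self, other, title):
        # Direct PC-to-PC transfer: the server is not in the data path.
        self.files[title] = other.files[title]

server = IndexServer()
p1 = Peer("10.0.0.1", {"song.mp3": b"...audio..."})
p2 = Peer("10.0.0.2", {})
server.announce(p1.address, p1.files.keys())

holders = server.search("song.mp3")
print(holders)  # ['10.0.0.1']
p2.download(p1, "song.mp3")  # the big transfer happens between PCs
```

The economics follow directly from the shape: the server stores and moves only tiny index entries, so each additional user adds almost nothing to its cost while adding storage and bandwidth to the system as a whole.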
Gnutella, Freenet, etc.
There are other systems that do the same file sharing but use no central server for connecting PCs. They use forms of relaying to connect machines together, where each machine can act as both the client and the shared resource connecting multiple other machines. These Anonymous Distributed Serving P2P systems avoid having any server at all, to aid privacy and to avoid single points of failure or shutdown.
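Searching without any server can be sketched as query flooding: each node forwards a query to its neighbors with a time-to-live, and any node holding the file answers. This is a simplified illustration of the relaying idea, not Gnutella's or Freenet's actual protocol; the topology and filenames are invented.

```python
# A hypothetical serverless search by relaying. Every node is both a
# client and a relay for its neighbors.

class Node:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbors = []

    def query(self, title, ttl, seen=None):
        """Flood the query outward; return names of nodes holding the file."""
        seen = seen if seen is not None else set()
        if self.name in seen or ttl < 0:
            return set()          # already visited, or query expired
        seen.add(self.name)
        hits = {self.name} if title in self.files else set()
        for neighbor in self.neighbors:
            hits |= neighbor.query(title, ttl - 1, seen)
        return hits

# A small chain of four PCs: a - b - c - d, with the file on c and d.
a, b = Node("a", []), Node("b", [])
c, d = Node("c", ["doc.txt"]), Node("d", ["doc.txt"])
a.neighbors = [b]
b.neighbors = [a, c]
c.neighbors = [b, d]
d.neighbors = [c]

print(sorted(a.query("doc.txt", ttl=2)))  # reaches c but not d: ['c']
```

The time-to-live is what keeps a query from circulating forever, and it also means results depend on where you sit in the network: no node ever sees, or needs, a global index.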
There are configurations that use central servers to send different data to multiple PCs and make use of the computation on those PCs. The PCs then send the results back to the server. This is used when there are PCs that are unused for large periods of time (as most are) and someone can either convince or cause (because they own them) their users to run custom programs to augment the computation on a server. These Anonymous Distributed Serving systems don't even need PC to PC connections.
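The shape of such distributed-computation systems can be sketched with a toy job: the server splits work into independent units, each idle PC fetches a unit and computes it, and only small results travel back. Summing squares stands in for the real computation; everything here is illustrative.

```python
# A hypothetical work-distribution server. The job is split into units
# that need no communication with each other, so the PCs never have to
# connect to one another -- only to the server.

class WorkServer:
    def __init__(self, numbers, chunk_size):
        self.pending = [numbers[i:i + chunk_size]
                        for i in range(0, len(numbers), chunk_size)]
        self.results = []

    def get_work(self):
        return self.pending.pop() if self.pending else None

    def submit(self, result):
        self.results.append(result)

def idle_pc(server):
    """One otherwise-idle PC: fetch a unit, compute, send back a result."""
    unit = server.get_work()
    if unit is not None:
        server.submit(sum(n * n for n in unit))

server = WorkServer(list(range(10)), chunk_size=3)
for _ in range(4):   # four "PCs" take one unit each; in reality, in parallel
    idle_pc(server)

total = sum(server.results)
print(total)  # 285 == 0*0 + 1*1 + ... + 9*9
```

Because units are independent and results are small, the server's load stays modest no matter how much computation the PCs contribute, which is what makes borrowing idle machines attractive.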
The purpose of the system from Groove Networks is to let individuals build configurations that have PCs with custom software, running local data, connected to other PCs running compatible software, all in a secure, controllable way acceptable to corporations. They do it by providing standard software for the PCs for assembling the components and doing the communications, as well as standard server software to do the locating, distribution of the custom software, and control of access. The applications created with Groove are Person to Person.
- Dan Bricklin, 18 June 2001
© Copyright 1999-2010 by Daniel Bricklin
All Rights Reserved.