Lesson 1: I P, You P, we all P for TCP/IP


Terms

byte: a grouping of bits that signifies one character of information. Modern operating systems all use 8-bit bytes, but early work on TCP/IP was done on systems whose bytes were not always 8 bits wide.
datagram: a packet format defined by the Internet Protocol (IP). All packets sent by the TCP/IP protocol suite use datagrams.
gateway: a machine that converts from one protocol suite to another, such as from TCP/IP to AppleTalk. This is the modern networking usage; the term used to be interchangeable with router, but is not anymore. Gateway can also refer to a machine that converts between specific protocols within a suite (such as converting email between SMTP and X.400) rather than the entire suite. If multiple protocol suites are being run over the same network, then a gateway does not need to be multi-homed.
host: any machine with an interface onto a TCP/IP network. A host can perform several functions within that network, or it can serve a single purpose.
interface: a computer's connection to the network, regardless of the physical type of network. This can be a point-to-point link such as a modem or serial interface, or a multipoint interface such as an Ethernet adapter. An interface is not the same thing as the physical connection; it is the computer's software construct that allows TCP/IP to function. Because of this, a computer may have several virtual interfaces, including the loopback interface.
IP address: a 32-bit value that is a unique identifier for a given interface. This value is used as the source address of any datagram that originates from this interface, and is the destination address for any datagram meant to be delivered to this interface.
multi-homed: describes a host that has multiple interfaces. Usually used to refer to hosts that have multiple physical interfaces.
octet: an 8-bit block of information. In modern operating systems, this is synonymous with byte, but the term octet is used in most RFCs and documentation regarding TCP/IP, since much of the early work on TCP/IP was done on machines whose bytes were not 8 bits wide.
packet: a block of data that, in addition to the data, also carries the information necessary to deliver it, similar in fashion to a postal letter.
protocol: a set of rules that is used to standardize communication between two or more possibly dissimilar entities, such as nations or computers. In the computer sense, most protocols perform one specific purpose (updating routing information, transporting email, etc.), and themselves rely on other, more basic protocols. When talking about such a family of protocols, the term "suite" is usually used.
router: a host whose function includes routing IP datagrams. This usually implies that the host is multi-homed. Mainly for performance reasons, most modern routers are dedicated machines with specialized hardware and software, but just about any machine capable of running TCP/IP can also act as a router.

Would you like a suite with that network?

TCP/IP is not just one protocol for communicating between computers. It is an entire family of protocols, all adapted for specific purposes, all designed to work together. There are two protocols, however, that form the backbone of the suite and give it its name: the Transmission Control Protocol (TCP) and the Internet Protocol (IP).

TCP/IP is used in many places because it is fast, efficient, and reliable. Since it has been in development for many years, it has attained a high level of maturity, despite the fact that it can be quite complex, both to implement and to maintain. There are many other protocol suites: OSI, X.25, SNA, IPX/SPX, NetBEUI, SMB, AppleTalk, and more. Many of these other suites have some distinct advantages over TCP/IP, yet TCP/IP is more popular than all of them. Why?

TCP/IP has two distinct advantages that no other protocol suite has:

  1. Source implementations are freely available. This means that a company that would like to develop a version of TCP/IP can start by looking at known versions and seeing how they do it. The most influential implementation was developed as part of the networking code in BSD UNIX, first shipped in 4.2BSD and later distributed as the Net/1, Net/2, and Net/3 releases (the last derived from 4.4BSD). Many implementations of TCP/IP are based on the Net/2 and Net/3 code in some fashion. Unfortunately, because it is extremely complicated to write a full TCP/IP implementation, many of these versions suffer from bugs that were present in the original Net/2 and Net/3 releases.
  2. TCP/IP is governed by a set of open standards that anyone can provide input to. By contrast, proprietary suites such as SMB (governed by Microsoft) and AppleTalk (governed by Apple) can be and have been changed by their owners several times during their lifetimes, which often causes incompatibilities or problems. While TCP/IP has also changed over its lifetime, the fact that no one vendor or group determines the direction that TCP/IP will take means that many of the best minds in computer science can provide information and viewpoints. This has resulted in a suite of protocols that continues to (for the most part) work just as well with older implementations as it does with modern, up-to-date machines.

The Internet Engineering Task Force (IETF) is now the group that manages the discussions over the TCP/IP suite. The IETF membership is made up of representatives of most of the main TCP/IP vendors (such as Sun Microsystems, Cisco, Bay Networks, Microsoft, and others) as well as several highly respected computer science researchers and consultants. The IETF maintains a complete list of the current documentation for the TCP/IP protocols and oversees the process by which anyone may propose changes or additions to TCP/IP.

Documents called Requests for Comments (RFCs) are the documentation for TCP/IP. Collectively, these documents describe how the various protocols within TCP/IP operate, and determine which protocols and options are mandatory and which are optional. You can find the complete listing and text of all RFCs at http://www.rfc-editor.org/, the home site of the RFC Editor.

Although it is not necessary to read the RFCs in order to use TCP/IP, the RFCs are the final authority in the networking world. There are also several RFCs that deal with broader matters, such as routing practices, that are the result of years of experience from hundreds of network administrators. Having such information available is necessary if you truly desire to understand how TCP/IP works, why certain things are possible but less than optimal, and how to get around TCP/IP's shortcomings and flaws.

RFCs cover many different aspects of networking, not just the nitty-gritty details of TCP/IP. To quote from the RFC Editor website:

The Requests for Comments (RFCs) are a series of notes, started in 1969, about the Internet (originally the ARPANET). The notes discuss many aspects of computing and computer communication focusing in networking protocols, procedures, programs, and concepts, but also including meeting notes, opinion, and sometimes humor.
The specification documents of the Internet protocol suite, as defined by the Internet Engineering Task Force (IETF) and its steering group (the IESG), are published as RFCs.
The RFC Editor is the publisher of the RFCs and is responsible for the final editorial review of the documents.

You should have RFC 1878, "Variable Length Subnet Table For IPv4" as a handout for this course, and will probably be receiving several more as this course continues. Spend a few minutes looking them over to get familiar with how they're formatted and what sorts of information they're likely to contain.

Blueprints for a network protocol

Because TCP/IP is so complex and dynamic, the original designers wanted to avoid a common problem with computer software: having changes break old software. Although several modern software design philosophies, such as Object-Oriented Programming, provide acceptable answers to this problem, they didn't exist when TCP/IP was being designed. Instead, the designers used an architectural model for TCP/IP that would fulfill much of the same purpose. Although not all descriptions of TCP/IP agree on the number of layers in the protocol architecture, they all show a number between three and five (as opposed, for example, to the OSI reference model, which has seven layers).

Figure 1-1

Figure 1-1 is a pictorial representation of the TCP/IP suite. In this model, all data being sent to the network is passed from Layer 4, the Application Layer, down through each successive layer until it is placed upon the physical network medium by Layer 1, the Network Access Layer. As the data passes down, each layer wraps another envelope around the data, containing the control information necessary for that layer. The control information is called a header, since it is transmitted before the data. This process of adding headers is called encapsulation, shown in Figure 1-2. One thing to remember is that only the Application Layer handles the original data; the lower layers receive a block of data that includes the original data as well as the headers from the intervening layers, and treat that complete block as their data.

Figure 1-2

When the network stack receives a packet of data, the reverse process is followed. Each layer in turn strips off its corresponding header, processes the control information as necessary, and passes the data up to the next layer as directed. Each layer has its own data structures and, in theory, is unaware of the data structures of other layers. In practice, the data structures are chosen to be compatible with those of other layers. However, this design ensures that, should the details of one layer change, the only change needed on a system would be to replace the common code that directly manipulates that layer; all other networking code on the system would continue to run normally.
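
The wrapping and unwrapping described above can be sketched in a few lines of Python. This is only a toy model: the text names Layer 4 (Application) and Layer 1 (Network Access), so the names used here for the two middle layers are an assumption, and the bracketed text headers are invented for illustration. Real headers are binary structures defined in the RFCs.

```python
# Toy model of encapsulation. The layer names for Layers 3 and 2 and the
# "[name]" header format are assumptions made for this sketch only.
LAYERS = ["transport", "internet", "network-access"]  # Layers 3, 2, 1


def encapsulate(app_data: bytes) -> bytes:
    """Pass data down the stack; each layer prepends its own header."""
    block = app_data
    for layer in LAYERS:
        header = f"[{layer}]".encode("ascii")
        # The layer treats everything it was handed as opaque data.
        block = header + block
    return block


def decapsulate(frame: bytes) -> bytes:
    """Pass a received block up the stack; each layer strips its header."""
    block = frame
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode("ascii")
        if not block.startswith(header):
            raise ValueError(f"missing {layer} header")
        block = block[len(header):]
    return block


frame = encapsulate(b"hello")
print(frame)               # b'[network-access][internet][transport]hello'
print(decapsulate(frame))  # b'hello'
```

Note that the header added last (by the lowest layer) ends up first on the wire, which is why the receiving side strips headers in the opposite order.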

The little black address book

Undoubtedly, the heart and soul of TCP/IP is the IP address, which is part of the Internet Protocol, defined in RFC 791. Just as correct addresses are the key to moving the vast amounts of mail and packages that the US Postal Service handles each day, IP addressing allows vastly different machines from all over the world to communicate every day. An IP address is a unique 32-bit number assigned to each interface (virtual or physical) within a host that is running TCP/IP. This provides over 4 billion possible addresses in theory; in practice, we are in danger of depleting the IP address pool within a few years.

The most common way of signifying an IP address is to use the dotted quad notation. Since an IP address is a 32-bit number, it can easily be broken up into 4 octets. Each octet may range in value from 0 to 255 as an unsigned 8-bit value.
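
As a rough illustration of the notation, the following Python sketch converts between a 32-bit number and its dotted quad form. The function names are made up for this example, and the address used is the news server's address from the example that follows.

```python
def to_dotted_quad(value: int) -> str:
    """Split a 32-bit number into four octets, most significant first."""
    if not 0 <= value < 2**32:
        raise ValueError("an IP address must fit in 32 bits")
    octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


def from_dotted_quad(text: str) -> int:
    """Combine four decimal octets back into one 32-bit number."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four octets")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError(f"octet out of range: {part}")
        value = (value << 8) | octet
    return value


print(2**32)                              # 4294967296 possible addresses
print(from_dotted_quad("192.168.0.101"))  # 3232235621
print(to_dotted_quad(3232235621))         # 192.168.0.101
```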

An example is our news server, news. It has a single Ethernet interface, which is assigned the IP address of 192.168.0.101.

Our web server www provides another example. It has one Ethernet interface, which is assigned the IP address of 192.168.0.102. However, it also hosts several virtual domains; although every domain on the computer shares the same physical interface, administratively each domain is given a virtual interface, and each virtual interface is assigned its own unique IP address (such as 192.168.0.103).

A final example is our border router. It has approximately 16 different interfaces currently assigned, some of which are physical, some of which are virtual. For example, the Ethernet interface that is connected to the servers has an IP address of 192.168.0.1, while the Ethernet interface that connects to the office network is assigned the IP address 192.168.1.1. On its one ATM/OC-3c interface, it has several virtual interfaces; one of those is our connection to the Internet, assigned the IP address of 192.168.2.130.
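
One way to picture the three examples is as a small data model in which addresses belong to interfaces rather than to hosts. The sketch below is illustrative only: the class names and interface names (eth0, atm0.1, and so on) are hypothetical, and only a few of the router's interfaces are listed.

```python
from dataclasses import dataclass, field


@dataclass
class Interface:
    name: str              # hypothetical interface name
    address: str           # IP address in dotted quad notation
    virtual: bool = False  # True if it shares another interface's hardware


@dataclass
class Host:
    name: str
    interfaces: list[Interface] = field(default_factory=list)

    def is_multi_homed(self) -> bool:
        # Following the usual usage, count only physical interfaces.
        physical = [i for i in self.interfaces if not i.virtual]
        return len(physical) > 1


news = Host("news", [Interface("eth0", "192.168.0.101")])
www = Host("www", [
    Interface("eth0", "192.168.0.102"),
    Interface("eth0:0", "192.168.0.103", virtual=True),
])
router = Host("router", [
    Interface("eth0", "192.168.0.1"),
    Interface("eth1", "192.168.1.1"),
    Interface("atm0.1", "192.168.2.130", virtual=True),
])

for host in (news, www, router):
    addresses = [i.address for i in host.interfaces]
    print(host.name, addresses, host.is_multi_homed())
```

In this model www has two addresses but is not multi-homed in the usual sense, since both addresses sit on one physical interface, while the router is.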

A note about uniqueness of IP addresses: in the early days of the Internet, when it was still called the ARPANET and run by the Department of Defense's Advanced Research Projects Agency (ARPA), there were many different universities, military bases, defense contractors, and computer science companies that all had their own in-house networks. Many different networks sprang up to connect these organizations in different ways; for example, there were regional networks for many of the educational institutions to join, the Milnet joined military networks, and BITNET joined all sorts of different organizations. The ARPANET became one of many internets: networks that, instead of providing connectivity to hosts, provide connectivity to other networks. Over time, as many of the networks it connected were in themselves internets (the BITNET, for example, and many of the regional educational networks) it became THE internet, and therefore the Internet.

Getting back to IP addresses, each IP address needs to be unique on the internet the network is connected to. If you are running your own network and never expect to provide Internet connectivity, then you can use any addressing scheme that you want and be fairly confident of having enough IP addresses. On the other hand, if you are going to connect to the Internet, then you need to contact the proper authorities (usually your service provider) to get assigned a valid, unique range of addresses.
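
On a network you administer yourself, the uniqueness requirement is easy to check mechanically. The following is a minimal sketch that assumes you keep a list of (interface, address) pairs; the function name and the misconfigured "test" host are invented for the example.

```python
from collections import defaultdict


def find_duplicates(assignments):
    """Return each address that is claimed by more than one interface."""
    owners = defaultdict(list)
    for interface, address in assignments:
        owners[address].append(interface)
    return {addr: names for addr, names in owners.items() if len(names) > 1}


assignments = [
    ("news/eth0", "192.168.0.101"),
    ("www/eth0", "192.168.0.102"),
    ("www/eth0:0", "192.168.0.103"),
    ("test/eth0", "192.168.0.102"),  # hypothetical misconfigured host
]

print(find_duplicates(assignments))
# {'192.168.0.102': ['www/eth0', 'test/eth0']}
```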

The other thing to note is that you need a unique IP address per interface; despite this being clearly stated in the RFCs, some equipment has been manufactured that tries to maintain one IP address per host. This practice is called host addressing. Although it seems on the surface to be much simpler, in practice it is both harder to understand (partly because the assumption in any TCP/IP literature is that you are following the RFCs and using interface addressing) and harder to use (since it isn't nearly as flexible and versatile).

Even with interface addressing, it seems hard to believe that we're running out of 4 billion IP addresses. Lesson 2 will help us to understand exactly why this is, in fact, the case.

