Networking Basics

An introduction to how the Internet works.

© L. Madhavan

Introduction

When you open your web browser, you usually enter the address of a website, for example www.google.com. Have you ever wondered what actually goes on inside your computer and how the webpage reaches it?

If you look at the status bar of your browser window while loading a webpage, you will see several messages similar to the ones below:

	Looking up www.google.com
	Connecting to www.google.com
	Transferring data from www.google.com

Firefox explicitly displays these messages. Internet Explorer too has to perform these steps, but it hides the details and simply says Opening page www.google.com.

So what do these messages mean? A lot! Each message denotes one or more of the various steps involved in transferring the webpage from the server (the computer which contains the page you are requesting) to the client (the computer which is requesting the page, i.e. your computer).

These steps are not unique to webpage retrieval - they are the basic steps in almost every network service, including e-mail, file download and instant messaging. By learning what these messages mean, you gain a broad understanding of the working of the entire Internet!

Before proceeding, let us define an IP address.

An IP address is written as four numbers separated by dots, in the form a.b.c.d, where each of the letters can take a value in the range 0 to 255. (This is the IPv4 form of address, which is the form used throughout this article.) Every computer on a network has a unique IP address by which it is identified. All communication between computers on the network takes place based on the computers' IP addresses. Examples of IP addresses are 64.233.187.99 and 192.168.0.1.
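
As a minimal sketch of the dotted form described above, the following Python snippet splits an address into its four fields and checks that each lies in the range 0 to 255. The function name parse_ipv4 is hypothetical and chosen only for illustration.

```python
def parse_ipv4(address):
    """Split a dotted address a.b.c.d into four integers in the range 0-255."""
    fields = address.split(".")
    if len(fields) != 4:
        raise ValueError("expected four dot-separated fields: %r" % address)
    values = []
    for field in fields:
        if not field.isdigit():
            raise ValueError("field is not a number: %r" % field)
        value = int(field)
        if not 0 <= value <= 255:
            raise ValueError("field out of range 0-255: %r" % field)
        values.append(value)
    return tuple(values)


print(parse_ipv4("64.233.187.99"))  # (64, 233, 187, 99)
print(parse_ipv4("192.168.0.1"))    # (192, 168, 0, 1)
```

An address such as 256.1.1.1 is rejected, because 256 does not fit in the range allowed for a single field.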

Now we are ready to look into each of these messages and find out what they mean.

Step 1 - Name Resolution

Although it is easier for humans to remember domain names such as google.com and yahoo.com, computers are more efficient at handling numbers and hence, they need the IP address of a computer to communicate with it.

This calls for a DNS (Domain Name System) server to translate domain names into IP addresses. This process is known as name resolution and this is what happens when your browser says Looking up www.google.com. The IP address of the DNS server itself has to be provided directly, and this is generally done by your ISP (Internet Service Provider).
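
A program can ask the operating system to perform name resolution on its behalf. The sketch below uses Python's standard socket.gethostbyname function; the wrapper name resolve is hypothetical. The lookup of a real domain name is left commented out, since its result depends on your network connection and DNS server.

```python
import socket


def resolve(name):
    """Return one IPv4 address for a domain name, as a dotted string."""
    return socket.gethostbyname(name)


# A name that is already an IP address is returned unchanged.
print(resolve("192.168.0.1"))

# Resolving a real domain name needs a working network connection and a
# reachable DNS server, so this line is left commented out:
# print(resolve("www.google.com"))
```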

Step 2 - Connecting to the server

Once name resolution is complete, your browser next connects to the target server. These two processes are independent of each other, i.e. the DNS server returns the IP address to your computer and then your computer again initiates a connection using this IP address.

When you enter the website address into your browser, you will notice that it automatically prefixes http:// if it is not already present. HTTP stands for Hyper-Text Transfer Protocol and it is the default protocol for transferring webpages.

A protocol is like a language that computers use to communicate with each other. A protocol defines a common set of rules that computers follow so that they can understand each other.

A port is a number assigned to a service running on a server. Standard port numbers are assigned to various protocols, such as 80 for HTTP and 21 for FTP. For further details on ports and client/server communication, see the section on Sockets.

Whereas HTTP is used for transferring the webpage, another protocol called TCP (Transmission Control Protocol) is used to connect to the server. The browser uses TCP to connect to port 80 of the target server (80 is the default port for HTTP).

Step 3 - Transferring data

The final step uses HTTP commands to transfer the required data from the server to your computer. A simple command to retrieve a webpage begins with GET, which is known as an HTTP verb, and looks like:

	GET http://www.google.com/ HTTP/1.0

The server responds with various HTTP headers containing information about the page, followed by the actual content of the webpage. A sample response looks like:

	HTTP/1.0 200 OK
	Cache-Control: private
	Content-Type: text/html
	Server: GWS/2.1
	Date: Sat, 03 Mar 2007 06:32:10 GMT
	Connection: Close

	<html>
	... (webpage contents here)
	</html>
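
To make the structure of the exchange concrete, here is a minimal Python sketch that builds a request like the one above and splits a response into its status code, headers and content. It works on a shortened copy of the sample response rather than on a live connection, and the function names build_request and parse_response are hypothetical.

```python
def build_request(target, version="HTTP/1.0"):
    """Build a minimal GET request; a blank line ends the request."""
    return ("GET %s %s\r\n\r\n" % (target, version)).encode("ascii")


def parse_response(raw):
    """Split a raw response into (status code, header dict, body bytes)."""
    # The headers and the content are separated by one blank line.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The first line holds the protocol version, status code and reason.
    _version, code, _reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return int(code), headers, body


sample = (b"HTTP/1.0 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Connection: Close\r\n"
          b"\r\n"
          b"<html>...</html>")

print(build_request("http://www.google.com/"))
code, headers, body = parse_response(sample)
print(code, headers["Content-Type"], body)
```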

Using the ping command

ping is a command-line tool used to check whether another computer is reachable from your own. You can use the command either from the Windows Command Prompt or from a Linux terminal. The following shows the sample output under Windows:

	C:\>ping www.google.com

	Pinging www.l.google.com [216.239.59.103] with 32 bytes of data:

	Reply from 216.239.59.103: bytes=32 time=256ms TTL=239
	Reply from 216.239.59.103: bytes=32 time=258ms TTL=239
	Reply from 216.239.59.103: bytes=32 time=258ms TTL=239
	Reply from 216.239.59.103: bytes=32 time=259ms TTL=239

	Ping statistics for 216.239.59.103:
	    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
	Approximate round trip times in milli-seconds:
	    Minimum = 256ms, Maximum = 259ms, Average = 257ms

You can give either a domain name or an IP address to the ping command. In case you pass a domain name, it is first resolved to the IP address.

The output gives various details about the remote computer:

  1. The IP address of the computer
  2. Whether the remote host is currently reachable, and if so, the time in milliseconds taken for the message to reach the remote computer and back to you

Thus, the ping tool is extremely valuable for network diagnostics. Note, however, that it is possible for the remote computer to disable ping requests, and hence failure of ping does not necessarily mean that the host is unreachable.

Sockets

Simply put, a socket is an end-point of network communication. All forms of network communication require sockets.

End-users are usually not bothered about sockets. It is generally the programmer who uses sockets to implement network services. The functions related to socket creation, connection and data transfer over sockets are provided by the operating system.

The two protocols commonly used with sockets are:

  1. TCP (Transmission Control Protocol)
  2. UDP (User Datagram Protocol)

Sockets using TCP (connection-oriented sockets)

Sockets using TCP are also known as connection-oriented sockets because they require a connection to be made between the two computers before data can be transmitted. In order to do this, one of the computers acts as a server and the other acts as a client. Note that although one computer is called a server, there is no distinction between the two once the connection is established - data can flow in either direction.

Step 1: On the server side, a new socket is created and put into listening mode, wherein the socket waits for a connection (listens) on a specified port number. This port number is used to identify the service that the server is providing. For example, HTTP uses port 80.

Step 2: The client, after creating a socket, issues a connect request to the server by specifying its hostname or IP address. The listening socket on the server then accepts this request and the connection is established. Once the connection is established, data can flow from either computer to the other.

Since data is sent in small chunks (packets) at a time, these individual chunks may arrive in any order on the other computer. TCP makes sure that these packets are reassembled in the correct order on the target computer, and also requests the sender to resend packets that are lost in transit. Hence, TCP is more reliable than UDP, which is discussed in the following section.
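
The two steps above can be sketched in Python using the socket functions provided by the operating system. This is only an illustration under some assumptions: both the server and the client run on the same computer (address 127.0.0.1), the server simply echoes back what it receives, and port 0 is used so that the operating system picks any free port. The names serve_once and echo_roundtrip are hypothetical.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one connection, read up to 1024 bytes and echo them back."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def echo_roundtrip(message):
    # Step 1: the server creates a socket and listens on a port.
    # Port 0 asks the operating system for any free port (an assumption
    # made here so that the sketch does not need a fixed port such as 80).
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()

    # Step 2: the client creates a socket and connects to the server.
    client_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_sock.connect(("127.0.0.1", port))

    # Once connected, data can flow in either direction.
    client_sock.sendall(message)
    reply = client_sock.recv(1024)
    client_sock.close()

    worker.join()
    server_sock.close()
    return reply


print(echo_roundtrip(b"hello"))
```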

Sockets using UDP (connection-less sockets)

Sockets using UDP are known as connection-less sockets. Unlike sockets using TCP, these sockets do not establish a connection with the other end. Hence, the mode of operation for UDP sockets is comparatively simple.

Sending data: Once the UDP socket has been created, the computer simply specifies the destination IP address and the port number to send the data to. Because of this connection-less approach, nobody is responsible for keeping track of packets sent/received, so it is possible for packets to get lost on the route to their destination. Hence, UDP is less reliable than TCP.

Receiving data: A socket is created, specifying the port on which the computer wants to receive data. Once created, the socket can receive data on this port from any computer.
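
The sending and receiving steps above can be sketched in Python as follows. As an illustration only, both sockets live on the same computer (127.0.0.1), port 0 lets the operating system choose a free port, and a timeout is set so that the program does not wait forever if the packet is lost. The function name udp_roundtrip is hypothetical.

```python
import socket


def udp_roundtrip(message):
    # Receiving side: create a socket and name the port to receive on.
    # Port 0 asks the operating system for any free port.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)
    port = receiver.getsockname()[1]

    # Sending side: there is no connect step. The destination address
    # and port are simply given along with the data.
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(message, ("127.0.0.1", port))

    try:
        data, _addr = receiver.recvfrom(1024)
    finally:
        sender.close()
        receiver.close()
    return data


print(udp_roundtrip(b"hello"))
```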

Another disadvantage of UDP arises from the fact that no connection is maintained between the computers. Because of this, there is no way of keeping track of the order in which packets are sent. Hence, packets may arrive in any order at the destination. This is why UDP is generally used only for very small amounts of data which can fit completely in a single packet.

So if TCP is better, why would one use UDP? The answer to that question comes a little later, in the section on Network Broadcasts.

Subnet Masks

A subnet mask (or simply netmask) is an address that determines the network to which an IP address belongs. The process of dividing a range of IP addresses into different networks is known as subnetting, hence the name subnet mask.

It would be easier to understand subnetting through an example. Suppose we are to set up a new network consisting of 100 computers. Now, there must be a means of identifying which network each computer belongs to, and also that all the 100 computers belong to the same network. Consider the following sample configuration for the network:

	IP addresses: 10.1.2.1 - 10.1.2.100 (100 computers)
	Subnet mask:  255.255.255.0

The network address of a network is a unique address that identifies the network as a whole. All computers on a network have the same network address, but no computer can have the network address as its IP address. The network address is obtained by performing a bitwise AND of the IP address and the subnet mask.

Although each field of the subnet mask can hold values other than 0 and 255, it usually consists of only 0s and 255s. This makes the job of calculating the network address simpler. Whenever you see a 255, copy the corresponding field of the IP address unchanged, and whenever you see a 0, put a 0 in the corresponding field. Applying this technique to our sample network:

	10.1.2.1 & 255.255.255.0 = 10.1.2.0
	10.1.2.2 & 255.255.255.0 = 10.1.2.0
	etc.

We find that all the computers (10.1.2.1 - 10.1.2.100) have the same network address and hence they belong to the same network. If you try the same for the address 10.1.3.1, you will get a network address of 10.1.3.0, which means that 10.1.3.1 belongs to a different network.
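
The bitwise AND itself can be sketched in a few lines of Python. Each address is first converted to a single 32-bit number so that the AND can be applied to all four fields at once. The helper names to_int, to_dotted and network_address are hypothetical.

```python
def to_int(address):
    """Convert a dotted address a.b.c.d into one 32-bit integer."""
    a, b, c, d = (int(field) for field in address.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def to_dotted(value):
    """Convert a 32-bit integer back into the dotted form a.b.c.d."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def network_address(ip, mask):
    """Bitwise AND of the IP address and the subnet mask."""
    return to_dotted(to_int(ip) & to_int(mask))


for ip in ("10.1.2.1", "10.1.2.100", "10.1.3.1"):
    print(ip, "->", network_address(ip, "255.255.255.0"))
# 10.1.2.1 -> 10.1.2.0
# 10.1.2.100 -> 10.1.2.0
# 10.1.3.1 -> 10.1.3.0
```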

One more thing to note is that the network 10.1.2.0 with a subnet mask of 255.255.255.0 can have a maximum of 256 IP addresses (10.1.2.0 - 10.1.2.255). Any address outside this range belongs to a different network. If we change the subnet mask to 255.255.0.0, the network can have 256 x 256 different addresses. The network address correspondingly changes to 10.1.0.0. Thus, the subnet mask also decides the maximum number of computers on the network.
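
These counts can be checked with Python's standard ipaddress module. The sketch below is only a convenience for verifying the arithmetic; the function name describe_network is hypothetical.

```python
import ipaddress


def describe_network(ip, mask):
    """Return (network address, number of addresses) for an IP and mask."""
    # strict=False allows a computer's address to be passed in; the module
    # then works out the network that the address belongs to.
    network = ipaddress.ip_network(ip + "/" + mask, strict=False)
    return str(network.network_address), network.num_addresses


print(describe_network("10.1.2.1", "255.255.255.0"))  # ('10.1.2.0', 256)
print(describe_network("10.1.2.1", "255.255.0.0"))    # ('10.1.0.0', 65536)
```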

Network Broadcasts

A broadcast message is a special kind of message that reaches every computer on the network.

Every network has a broadcast address, which is formed by replacing the 0s at the end of the network address with 255s (the number of 255s depends on the subnet mask). For example, for the network 10.1.0.0 with subnet mask 255.255.0.0, the broadcast address is 10.1.255.255. For the same network with subnet mask 255.255.255.0, the broadcast address will be 10.1.0.255.
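
In terms of bits, replacing the trailing 0s with 255s amounts to switching on every bit that is 0 in the subnet mask. A minimal Python sketch of this calculation follows; it uses the standard ipaddress module only to convert between the dotted form and integers, and the function name broadcast_address is hypothetical.

```python
import ipaddress


def broadcast_address(network, mask):
    """Set every bit that is 0 in the subnet mask to 1 in the network address."""
    net = int(ipaddress.IPv4Address(network))
    # Invert the mask, keeping only the lowest 32 bits.
    host_bits = ~int(ipaddress.IPv4Address(mask)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(net | host_bits))


print(broadcast_address("10.1.0.0", "255.255.0.0"))    # 10.1.255.255
print(broadcast_address("10.1.0.0", "255.255.255.0"))  # 10.1.0.255
```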

Any message sent to the broadcast address of a network is forwarded to all computers on the network. Only UDP can be used to send broadcast messages. TCP cannot be used since it requires a connection with a single computer.

Routers

A router is a network device that interconnects two or more computer networks. Consider the following set-up of 3 networks and a router:

	[Figure: Sample Network Setup]

The router itself belongs to all 3 networks, i.e. it has an IP address on each network. Suppose the router's IP address on each network is:

	Network 1 - 10.0.1.1
	Network 2 - 10.1.0.1
	Network 3 - 10.2.1.1

These addresses are known as interfaces to the router. For convenience, we usually assign the first address on the network to the router.

Since a computer can only communicate with another computer on the same network, whenever it wants to access an external network, it does so through the router. Hence, the router is also known as a gateway.

For example, if a computer on network 1 wants to send data to a computer on network 3, it does so through 10.0.1.1. The router then passes on this data to its other interface, namely 10.2.1.1, and from there, the data directly reaches the destination on network 3. This process is known as routing.

To achieve routing, a router uses what is known as a routing table. The routing table for our sample network would look like:

	Destination    Netmask          Interface
	10.0.1.0       255.255.255.0    10.0.1.1
	10.1.0.0       255.255.0.0      10.1.0.1
	10.2.1.0       255.255.255.0    10.2.1.1

The table is quite self-explanatory. It simply says: "send all packets with this network as their destination through this interface."
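
The lookup that the routing table describes can be sketched in Python. For each row, the destination IP address is ANDed with the netmask and compared with the destination network; the first matching row decides the interface. This is a simplified illustration: the names ROUTES and find_interface are hypothetical, and using the first match is an assumption made for brevity (real routers prefer the most specific matching network).

```python
import ipaddress

# The sample routing table: (destination network, netmask, interface).
ROUTES = [
    ("10.0.1.0", "255.255.255.0", "10.0.1.1"),
    ("10.1.0.0", "255.255.0.0", "10.1.0.1"),
    ("10.2.1.0", "255.255.255.0", "10.2.1.1"),
]


def find_interface(destination_ip):
    """Return the interface for a destination, or None if no row matches."""
    dest = int(ipaddress.IPv4Address(destination_ip))
    for network, netmask, interface in ROUTES:
        mask = int(ipaddress.IPv4Address(netmask))
        # The packet belongs to this row if its network address matches.
        if (dest & mask) == int(ipaddress.IPv4Address(network)):
            return interface
    return None


print(find_interface("10.2.1.57"))    # 10.2.1.1
print(find_interface("10.1.200.3"))   # 10.1.0.1
print(find_interface("192.168.0.1"))  # None
```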


Last modified on June 13, 2007