[Network]TCP/IP Protocol

22 Nov 2023

Reading time ~37 minutes

1.The TCP/IP Architecture

The PDUs exchanged by the peer TCP protocols are called TCP segments or segments
The PDUs exchanged by UDP protocols are called UDP datagrams or datagrams.
The PDUs exchanged by IP protocols are called IP packets or packets.

TCP/IP Protocol suits.

IP multiplexes TCP segments and UDP datagrams and performs fragmentation. IP packets are sent to the network interface for delivery across the physical network.

At the receiver: (1) packets passed up by the network interface are demultiplexed to the appropriate protocol(IP, ARP, RARP).

(2) The receiving IP entity determines whether a packet should be sent to TCP or UDP.

(3) The TCP(UDP) sends each segment(datagram) to the appropriate application based on the port number. such as Ethernet, Token Ring, ATM, PPP.

The logic addresses need to be converted into specific physical addresses to carry out the transfer of bits from one device to the other. This conversion is done by an address resolution protocol.

Each host in the Internet is identified by a globally unique IP address.

An IP address is divided into two parts:

a network ID
a host ID

The Internet layer provides for the transfer of information across multiple networks through the routers.

At each router the network interface layer is used to encapsulate the IP packet into a packet or frame of the underlying network or link.
The IP packet is recovered at an exit router of the given network. This router must determine the next hop in the route to the destination and then encapsulate the IP packet into the frame of the type of the next network or link.

To enhance the scalability of the routing algorithms and to control the size of a routing tables, additional levels of hierarchy are introduced in the IP addresses. Within a domain the host address is further subdivided into a subnetwork part and an associated host part.

Addresses of multiple domains can be aggregated to create supernets.

Multi Media.

2.The Internet Protocol

IP corresponds to the network layer in the OSI reference model and provides a connectionless best-effort delivery service to the transport layer.

best-effort: IP will try its best to forward packets to the destination, but does not guarantee that a packet will be delivered to the destination.

2.1 IP Packet

The header has a fixed-length component of 20 bytes plus a variable-length component consisting of options that can be up to 40 bytes. IP packets are transmitted according to network byte order in the following groups: bits 0–7, bits 8–15, bits 16–23, and finally bits 24–31 for each row

Version: The version field indicates the version number used by the IP packet so that revisions can be distinguished from each other.
Internet Header Length: The Internet Header Length(IHL) specifies the length of the header in 32-bit words. If no options are present, IHL will have a value of 5
Type of Service: The type of service (TOS) field traditionally specifies the priority of the packet based on delay, throughput, reliability, and cost requirements.
- Three bits are assigned for priority levels (called “precedence”)
- Four bits for the specific requirement (i.e., delay, throughput, reliability, and cost).
- delay bit to one to get the high priority.
Total Length: The total length specifies the number of bytes of the IP packet including header and data. with 16 bits assigned, the maximum packet length is 65535 bytes. However, the Ethernet limits(Physical Networks) has limitation of 1500 bytes.
Identification, Flags, and Fragment Offset: These fields are used for fragmentation and reassembly.
Time to live: The time-to-live (TTL) field is defined to indicate the amount of time in seconds the packet is allowed to remain in the network. However, most routers interpret this field to indicate the number of hops the packet is allowed to traverse in the network.
- this field prevents packets from wandering aimlessly in the Internet
Protocol: The protocol field specifies the upper-layer protocol that is to receive the IP data at the destination host. Examples of the protocols include TCP (protocol = 6), UDP (protocol = 17), and ICMP (protocol = 1).
Header checksum: The header checksum field verifies the integrity of the header of the IP packet.
Source IP address and destination IP address: These fields contain the addresses of the source and destination hosts.
Options: The options field, which is of variable length, allows the packet to request special features such as security level, route to be taken by the packet, and timestamp at each router.
Padding: This field is used to make the header a multiple of 32-bit words.

The IP packet is passed to the Router, the following process take place:

(1) The header checksum is computed and the fields in the headerr (version and total length) are checked to see if they contain valid values.

(2) the router identifies the next hop for the IP packet by consulting its routing table.

(3) Update the TTL and header checksum fields. The IP packet is then forwarded along the next hop.

2.2 IP Addressing

A node (such as a router or a multihomed host) may have multiple network interfaces with each interface connected to a different network.

An IP address has a fixed length of 32 bits.
The network ID identifies the network the host is connected to. , all hosts connected to the same network have the same network ID.
The host ID identifies the network connection to the host rather than the actual host.

An implication of this powerful aggregation concept is that a router can forward packets based on the network ID only, thereby shortening the size of the routing table significantly.

Class A addresses have seven bits for network IDs and 24 bits, for host IDs, allowing up to 126 networks and about 16 million hosts per network.
Class B addresses have 14 bits for network IDs and 16 bits for host IDs, allowing about 16,000 networks and about 64,000 hosts for each network.
Class C addresses have 21 bits for network IDs and 8 bits for host IDs, allowing about 2 million networks and 254 hosts per network.
Class D addresses are used for multicast services that allow a host to send information to a group of hosts simultaneously.
Class E addresses are reserved for experiments.

2.3 Subnet Addressing

Problem: With the original addressing scheme, it would be a gigantic task for the local network administrator to manage all 64,000 hosts.

Solution: Subnetting. Add another hierarchical level called ‘subnet’.

Subnet mask: the router needs to store an additional quantity to find the subnet number.

Calculation Process:

(1) The IP Address is:

10010110 01100100 00001100 10110000

(2) The Subnet mask is:

11111111 11111111 11111111 10000000

(3) The subnet number can be determined by performing a binary AND between the subnet mask and the IP address.

10010110 01100100 00001100 10000000

The Router Process Example

(1) a Packet having a destination IP address of 150.100.15.11 arrives from the outside network R1 has to know the next-hop router to send the packet to.

(2) Change the IP address to binary string, and apply the subnet mask. Finally get the subnet binary string.

(3) Router R1 looks up this subnet number in its routing table, and findsthe corresponding entry to specify the next-hop router address for R2.

(4) When R2 recieves the packet, R2 performs the same process and finds out that the destination host is connected to one of its network interfaces.

Special IP address.

2.4 IP Routing

The IP layer in the end-system hosts and in the routers work together to route packets from IP network sources to destinations. The IP layer in each host and router maintains a routing table that it uses to determine how to handle each IP packet.

If its routing table indicates that the destination host is directly connected to the originating host by a link or by a LAN, then the IP packet is sent directly to the destination host using the appropriate network interface.
Otherwise, the routing table typically specifies that the packet is to be sent to a default router that is directly connected to the originating host.

Now consider the action of a router. When a router receives an IP packet from one of the network interfaces:

the router examines its routing table to see whether the packet is destined to itself, and if so, delivers the packet to the appropriate higher-layer protocol.
If the destination IP address is not the router’s own address, then the router determines the next-hop router and the associated network interface, and then forwards the packet.

Each row in the routing table must provide the following information:

(1) destination IP address

(2) IP address of next-hop router

(3) several flag fields

(4) outgoing network interface

(5) other information such as subnet mask, physical address, and statistics information.

the H flag indicates whether the route in the given row is to a host (H = 1) or to a network (H = 0).
The G flag indicates whether the route in the given row is to a router (gateway; G = 1) or to a directly connected destination (G = 0).

The routing table is searched in the following order:

(1) The destination column is searched to see whether the table contains an entry for the complete destination IP address. If so, then the IP packet is forwarded according to the next-hop entry and the G flag.

(2) If the table does not contain the complete destination IP address, then the routing table is searched for the destination network ID. If an entry is found, the IP packet is forwarded according to the next-hop entry and the G flag.

(3) If the table does not contain the destination network ID, the table is searched for a default router entry, and if one is available, the packet is forwarded there.

(4) If none of the above searches are successful, then the packet is declared undeliverable and an ICMP “host unreachable error” packet is sent back to the originating host.

2.5 Classless Interdomain Routing(CIDR)

CIDRE: An arbitrary prefix length to indicate the network number. Using a CIDR notation, a prefix 205.100.0.0 of length 22 is written as 205.100.0.0/22.

With CIDR, packets are routed according to the prefix of the IP address without distinguishing different address classes.

The entries in a CIDR routing table contain a 32-bit IP address and a 32-bit mask.

CIDR enables a technique called supernetting to allow a single routing entry to cover a block of classful addresses.

Example:

(1) the original contiguous four class c entries:

Class C address 205.100.0.0 = 11001101 01100100 00000000 00000000

Class C address 205.100.1.0 = 11001101 01100100 00000001 00000000

Class C address 205.100.2.0 = 11001101 01100100 00000010 00000000

Class C address 205.100.3.0 = 11001101 01100100 00000011 00000000

(2) the route aggregation:

Mask 255.255.252.0 = 11111111 11111111 11111100 00000000

Supernet address 205.100.0.0 = 11001101 01100100 00000000 00000000

The use of variable-length prefixes requires that the routing tables be searched to find the longest prefix match.

2.5 Address Resolution Protocol(ARP)

In reality IP packets must eventually be delivered by the underlying network technology, which uses its own addressing scheme.

Ethernet is the most common underlying network technology that IP runs on
the Ethernet hardware can understand only its own 48-bit MAC address format

Address Resolution Protocol.

Address Resolution Protocol ARP: Question: Suppose H1 wants to send an IP packet to H3 but does not know the MAC address of H3.

Solution:

(1) H1 first broadcasts an ARP request packet asking the destination host, which is identified by H3’s IP address, to reply. All hosts in the network receive the packet, but only the intended host, which is H3, responds to H1.

(2) The ARP response packet contains H3’s MAC and IP addresses.

From now on H1 knows how to send packets to H3.

(3) H1 caches H3’s IP and MAC addresses in its ARP table so that H1 can simply look up H3’s MAC address in the table for future use.

(4) Each entry in the ARP table is usually “aged” so that the contents are erased if no activity occurs within a certain period, typically around 5 to 30 minutes.

2.6 Reverse Address Resolution

Question: In some situations a host may know its MAC address but not its IP address.

Solution:

(1) To obtain its IP address, the host first broadcasts an RARP request packet containing its MAC address on the network.

(2) All hosts on the network receive the packet, but only the server replies to the host by sending an RARP response packet containing the host’s MAC and IP addresses.

(3) One limitation with RARP is that the server must be located on the same physical network as the host.

2.7 Fragementation and Reassembly

One of the strengths of IP is that it can work on a variety of physical networks.

Each physical network usually imposes a certain packet-size (the maximum transmission unit (MTU)).
- Ethernet specifies an MTU of 1500 bytes
- FDDI specifies an MTU of 4464 bytes.
When IP has to send a packet that is larger than the MTU of the physical network, IP must break the packet into smaller fragments whose size can be no larger than the MTU.

Packet Fragmentation.

If the MTU of some other network downstream is found to be smaller than the fragment size, the fragment will be broken again into smaller fragments.

The destination IP is the only entity that is responsible for reassembling the fragments into the original packet.

To reassemble the fragments, the destination waits until it has received all the fragments belonging to the same packet.

If one or more fragments are lost in the network, the destination abandons the reassembly process and discards the rest of the fragments.
To detect lost fragments, the destination host sets a timer once the first fragment of a packet arrives.
If the timer expires before all fragments have been received, the host assumes the missing fragments were lost in the network and discards the other fragments.

Three fields in the IP header (identification, flags, and fragment offset) have been assigned to manage fragmentation and reassembly:

The identification field is used to identify which packet a particular fragment belongs to so that fragments for different packets do not get mixed up.
The source host must not repeat the identification value of the packet destined to the same host until a sufficiently long period of time has passed.
The flags field has three bits:
- If the “don’t fragment” DF bit is set to 1, it forces the router not to fragment the packet. If the packet length is greater than the MTU, the router will have to discard the packet and send an error message to the source host.
- The “more fragment”MF bit tells the destination host whether or not more fragments follow.
- The fragment offset field identifies the location of a fragment in a packet.

Example.

2.8 ICMP: Error and Control Messages

The Internet Control Message Protocol (ICMP) is the protocol that handles error and other control messages.

Although ICMP messages are encapsulated by IP packets (with protocol number 1), ICMP is considered to be in the same layer as IP.

ICMP basic error message format.

Basic error messages generally follow the format:

Type: This field identifies the type of the message.
Code: Foragiven type, the code field describes the purpose of the message.
Checksum: This field is used to detect errors in the ICMP message.
IP header plus original datagram: This field can be used for diagnostic purposes by matching the information (e.g., port numbers) in the ICMP message with the original data in the IP packet.

Type3: Destination Unreachable. The following codes:

0 Network Unreachable
1 Host Unreachable
2 Protocol Unreachable
3 Port Unreachable
4 Fragmentation needed and DF set
5 Source route failed

Type11: Time exceeded problem

0: TTL value exceeded
1: Fragment reassembly time exceeded

3. User Datagram Protocol

The User Datagram Protocol (UDP) is an unreliable, connectionless transport layer protocol. Provides two services beyond IP protocol:

demultiplexing
error checking on data

The IP and UDP responsibility:

IP knows how to deliver packets to a host.
- IP checks only the integrity of its header.
UDP adds a mechanism that distinguishes among multiple applications in the host.
- UDP can optionally check the integrity of the entire UDP datagram.

Usage: Applications that do not require zero packet loss such as in packet voice systems are well suited to UDP. In practice, applications that use UDP include Trivial File Transfer Protocol, DNS, SNMP, and Real-Time Protocol (RTP).

UDP frame format.

The destination port allows the UDP module to demultiplex datagrams to the correct application in a given host.
The source port identifies the particular application in the source host to receive replies.
The UDP length field indicates the number of bytes in the UDP datagram (including header and data).
The UDP checksum field detects errors in the datagram, and its use is optional.

3.1 Check Sum Algorithm

If a source host does not want to compute the checksum, the checksum field should contain all 0s so that the destination host knows that the checksum has not been computed.

The checksum computation procedure is similar to that in computing IP checksum except for two new twists.

if the length of the datagram is not a multiple of 16 bits, the datagram will be padded out with 0s to make it a multiple of 16 bits. the actual UDP datagram is not modified. The pad is used only in the checksum computation and is not transmitted.
UDP adds a pseudoheader (shown in Figure 8.25) to the beginning of the datagram when performing the checksum computation

UDP pseudo header.

The pseudoheader is also created by the source and destination hosts only during the checksum computation and is not transmitted.

The pseudoheader is to ensure that the datagram has indeed reached the correct destination host and port.
Finally, if a datagram is found to be corrupted, it is simply discarded and the source UDP entity is not notified.

4. Transmission Control Protocol

TCP [RFC 793] and IP are the workhorses of the Internet. The Transmission Control Protocol (TCP) provides a logical full-duplex (two-way) connection between two application layer processes across the Internet.

TCP provides these application processes with a connection-oriented, reliable, in-sequence, byte-stream service.

flow control: a TCP receiver to control the rate at which the sender transmits information so that the receiver buffers do not overflow.
congestion control: senders to reduce the rate at which they send packets when there is congestion in the routers.
TCP provides reliable byte stream service using a form of Selective Repeat ARQ.

4.1 TCP Operation and Reliable Stream Service

A TCP connection goes through the three phases of a connection-oriented service.

(1) The TCP connection establishment phase

This phase sets up a connection between the two application processes by creating and initializing variables that are used in the protocol.

These variables are stored in a connection record that is called the transmission control block (TCB).

(2) The Data Transfer Phase

This phase delivers data over each direction in the connection correctly and in sequence.

TCP was designed to operate over IP and does not assume that the underlying network service is reliable.

Selective Repeat ARQ.

(3) Connection Termination Phase

Each direction of the connection is terminated independently, allowing data to continue flowing in one direction after the other direction has been closed.

A TCP connection is uniquely identified by four parameters:

the sender IP address and port number
the destination IP address and port number.

An end system can therefore support multiple simultaneous TCP connections.

Typically the server is assigned a well-known port number and the client is assigned an ephemeral port number that is selected from a pool of available numbers when the connection is set up.

TCP preview.

The application layer writes the data it needs to transmit into a buffer. TCP does not preserve message boundaries and treats the data it gets from the application layer as a byte stream.

TCP may split or combine the application information and pack it into segments in the way it finds most appropriate for transfer over the underlying network.
The TCP transmitter arranges a consecutive string of bytes into a segment.
The segment contains a header with address information that enables the network to direct the segment to its destination application process.
The segment contains a sequence number that corresponds to the number of the first byte in the string that is being transmitted.

The receiver receives data process:

The TCP receiver performs an error check on each segment it receives. If the segment is error-free and is not a duplicate segment, the receiver inserts the bytes into the appropriate locations in the receive buffer if the bytes fall within the receive window.
The receiver will accept out-of-order but error-free segments, so the receive buffer can have gaps where bytes have not been received.
The receiver keeps track of the oldest byte it has not yet received.
It sends the sequence number of this byte in the acknowledgments it sends back to the transmitter.
The receive window slides forward whenever the desired next oldest byte is received.

Time-out mechanism to trigger retransmission when a segment is not acknowledged within a certain time.

The TCP transmitter send window slides forward whenever the acknowledgments for pending bytes are received.

TCP was designed to deliver a connection-oriented service in the Internet environment, where different IP packets may traverse a different path from the same source to the same destination and may therefore arrive out of order.

Questions: duplicate segments, it is possible for old segments from previous connections to arrive at a receiver.

Solution:

using long sequence numbers (32 bits) and by establishing randomly selected initial sequence numbers during connection setup.
TCP enforces a time-out period at the end of each connection to allow the network to clear old segments from the network.

TCP separates the flow control function from the acknowledgment function. The flow control function is implemented through an advertised window field in the segment header.

4.2 TCP Protocol

TCP segment.

TCP segment header consists of a 20-byte fixed part plus a variable-size options field.

(1) Source port and destination port: The source and destination ports identify the sending and receiving applications, respectively.

(2) Sequence number: The 32-bit sequence number field identifies the position of the first data byte of this segment in the sender’s byte stream during data transfer (when SYN bit is not set). The sequence number wraps back to 0 after 2^32 − 1.

If the SYN bit is set to 1 (during connection establishment), the sequence number indicates the initial sequence number (ISN) to be used in the sender’s byte stream.

The sequence number of the first byte of data for this byte stream will be ISN + 1.

(3) Acknowledgment number: This field identifies the sequence number of the next data byte that the sender expects to receive if the ACK bit is set.

This field also indicates that the sender has successfully received all data up to but not including this value.
If the ACK bit is not set (during connection establishment), this field is meaningless. Once a connection is established, the ACK bit must be set.

(4) Header length: This field specifies the length of the TCP header in 32-bit words. This information allows the receiver to know the beginning of the data area because the options field is variable length.

(5) Reserved: As the name implies, this field is reserved for future use and set to 0.

(6) URG: If this bit is set, the urgent pointer is valid.

(7) ACK: If this bit is set, the acknowledgment number is valid.

(8) PSH: When this bit is set, it tells the receiving TCP module to pass the data to the application immediately. Otherwise, the receiving TCP module may choose to buffer the segment until enough data accumulates in its buffer.

(9) RST: When this bit is set, it tells the receiving TCP module to abort the connection because of some abnormal condition.

(10) SYN: This bit requests a connection.

(11) FIN: When this bit is set, it tells the receiver that the sender does not have any more data to send. The sender can still receive data from the other direction until it receives a segment with the FIN bit set.

(12) Window size: The window size field specifies the number of bytes the sender is willing to accept. This field can be used to control the flow of data and congestion.

(13) Checksum: This field detects errors on the TCP segment. The procedure is discussed below.

(14) Urgent pointer: When the URG bit is set, the value in the urgent pointer field added to that in the sequence number field points to the last byte of the “urgent data” (data that needs immediate delivery).

However, the first byte of the urgent data is never explicitly defined. Because the receiver’s TCP module passes data to the application in sequence, any data in the receiver’s buffer up to the last byte of the urgent data may be considered urgent.

(15) Options: The options field may be used to provide other functions that are not covered by the header. If the length of the options field is not a multiple of 32 bits, extra padding bits will be added.

The most important option is used by the sender to indicate the maximum segment size (MSS) it can accept. This option is specified during connection setup. T
The window scale option allows the use of a larger advertised window size. The window can be scaled upward by a factor of up to 2^14.
The timestamp option is intended for high-speed connections where the sequence numbers may wrap around during the lifetime of the connection.

TCP pseudoheader.

4.2.1 TCP Checksum

The purpose of the TCP checksum field is to detect errors. The checksum computation procedure is similar to that used to compute an IP checksum except for two features

if the length of the segment is not a multiple of 16 bits, the segment will be padded with zeros to make it a multiple of 16 bits.
a pseudoheader (shown in Figure 8.28) is added to the beginning of the segment when performing the checksum computation.

The pseudoheader is created by the source and destination hosts during the checksum computation and is not transmitted. This mechanism ensures the receiver that the segment has indeed reached the correct destination host and port and that the protocol type is TCP (which is assigned the value 6).

4.2.2 Connection Establishment

(1) Host A sends a connection request to host B by setting the SYN bit. Host A also registers its initial sequence number to use (Seq no = x).

(2) Host B acknowledges the request by setting the ACK bit and indicating the next data byte to receive (Ack no = x + 1). The “plus one” is needed because the SYN bit consumes one sequence number. At the same time, host B also sends a request by setting the SYN bit and registering its initial sequence number to use (Seq no = y).

(3) Host A acknowledges the request from B by setting the ACK bit and confirming the next data byte to receive (Ack no = y + 1). Note that the sequence number is set to x + 1. On receipt at B the connection is established.

3 times hand shakes.

If during a connection establishment phase, one of the hosts decides to refuse a connection request, it will send a reset segment by setting the RST bit.

Each SYN message can specify options such as maximum segment size, window scaling, and timestamps.

Because TCP segments can be delayed, lost, and duplicated, the initial sequence number should be different each time a host requests a connection.

4.2.3 Data Transfer

To provide a reliable delivery service to applications, TCP uses the Selective Repeat ARQ protocol with positive acknowledgment implemented by a sliding-window mechanism.

The difference here is that the window slides on a byte basis instead of on a packet basis.
TCP can also apply flow control over a connection by dynamically advertising the window size.

Flow control is the process of regulating the traffic between two points and is used to prevent the sender from overwhelming the receiver with too much data.

TCP window flow control.

At first transmission, the advertised window allows host A to transmit up to 2048 bytes of unacknowledged data.

When the segment arrives, host B chooses to delay the acknowledgment in the hope that the acknowledgment can ride freely with the data.

At this time, host B finds out that it can allocate only 512 bytes of receive buffer space for this connection because other connections are also competing for the precious memory. So it shrinks the advertised window from 2048 bytes to 512 bytes. When host A receives the segment, the host changes its sending window to 512 bytes.

TCP can delay transmission:

the acknowledgments can be piggybacked to the data segment.
reduce bandwidth waste.

the Nagle algorithm: reduce the waste:

When an interactive application wants to send a character, the TCP module transmits the data and waits for the acknowledgment from the receiver.
In the meantime, if the application generates more characters before the acknowledgment arrives, TCP will not transmit the characters but buffer them instead.
When the acknowledgment eventually arrives, TCP transmits all the characters that have been waiting in the buffer in a single segment.

In LAN environment where delay is relatively small and bandwidth is plentiful, the acknowledgment usually comes back before another character arrives from the application. Thus the Nagle algorithm is essentially disabled.

In the WAN environment where acknowledgments can be delayed unpredictably, the algorithm is self-adjusting.

Question: silly window syndrome. Another problem that wastes network bandwidth occurs when the sender has a large volume of data to transmit and the receiver can only deplete its receive buffer a few bytes at a time. Sooner or later the receive buffer becomes full.

Solution:

It can be avoided by having the receiver not advertise the window until the window size is at least as large as half of the receive buffer size, or the maximum segment size.
The sender side can cooperate by refraining from transmitting small segments.

Maximum segment size(MSS) increases, the bandwidth efficiency decreases.

4.2.4 Connection Termination

TCP provides for a graceful close that involves the independent termination of each direction of the connection.

A termination is initiated when an application tells TCP that it has no more data to send. The TCP entity completes transmission of its data and, upon receiving acknowledgment from the receiver, issues a segment with the FIN bit set.

TCP graceful close.

(1) Upon receiving a FIN segment, a TCP entity informs its application that the other entity has terminated its transmission of data.

(2) Host B sends an ACK segment to acknowledge receipt of the FIN segment from A. Note that the FIN segment uses one byte, so the ACK is 5087 in the example.

(3) The direction of the flow from B to A is still open. host B sends 150 bytes in one segment, followed later by a FIN segment. Host A sends an acknowledgment.

(4) The TCP in host A then enters the TIME WAIT state and starts the TIME WAIT timer with an initial value set to twice the maximum segment lifetime (2MSL).

The only valid segment that can arrive while host A is in the TIME WAIT state is a retransmission of the FIN segment from host B (for example, if host A’s ACK was lost, and host B’s retransmission time-out has expired). If such a FIN segment arrives while host A is the TIME WAIT state, then the ACK segment is retransmitted and the TIME WAIT timer is restarted at 2MSL.

(5) When the TIME WAIT timer expires, host A closes the connection and then deletes the record of the connection.

The TIME WAIT state protects future incarnations of the connection from delayed segments.

The first MSL accounts for the maximum time a segment in one direction can remain in the network
the second MSL allows for the maximum time a reply in the other direction can be in the network.

TCP provides for an abrupt connection termination through reset (RST) segments.

An RST segment is a segment with the RST bit set.

If an application decides to terminate the connection abruptly, it issues an ABORT command, which causes TCP to discard any data that is queued for transmission and to send an RST segment. A TCP module that receives the RST segment then notifies its application process that the connection has been terminated.

example:

an RST segment is sent when a connection request arrives for an application process that is not listening on the given port.
an application in host A may crash. The TCP module in host B, unaware of the crash, may then send a segment.

4.2.5 State Transition Diagram

TCP state machine.

4.3 TCP Congestion Control

The advertised window does not prevent the buffers in the intermediate routers from overflowing, hence the congestion situation.

Because the IP layer does not implement a mechanism to control congestion, it is up to the higher layer to detect congestion and take proper actions.

TCP conjestion control.

The objective of conjestion flow have each sender transmit just the right amount of data to keep the network resources utilized but not overloaded.

Solution: the TCP protocol defines another window, called the congestion window, which specifies the maximum number of bytes that a TCP sender is allowed to transmit with the assumption that congestion will not be triggered with the given amount of data.

The TCP congestion control algorithm at the sender dynamically adjusts the congestion window according to the current condition of the network.

The operation of the TCP congestion control algorithm may be divided into three phases:

(1) Slow Start

Run when the algorithm begins a data transfer or when the algorithm restarts after recovery from segment loss.

Slow start is accomplished by first setting the congestion window to a small value, 1 MSS.

Each time the sender receives an ACK from the receiver, the sender increases the congestion window by one segment (1 MSS).

the congestion window size grows exponentially during the slow-start phase.

The reason for the exponential increase is that slow start needs to fill the pipe as quickly as possible to utilize network resources maximally.

(2) Congestion Avoidance

This phase assumes that the pipe is approaching full utilization. It is thus wise for the algorithm to reduce the rate of increase so that it will not overshoot the pipe capacity. Specifically, the algorithm increases the congestion window linearly rather than exponentially during congestion avoidance.

The congestion window stops increasing when TCP detects that the network is congested.

(3) Conjestion Threshold

TCP sets the congestion threshold to one-half the current congestion window for probing the pipe next time.

Then the congestion window is set to one maximum-sized segment, and the algorithm restarts, using the slow start technique.

When the TCP sender receives three duplicate ACKs, it performs fast retransmit by immediately retransmitting the lost segment. The sender then performs fast recovery by first setting the congestion threshold as explained earlier.

Then the congestion window is set to the congestion threshold plus three MSSs to account for the additional three segments that have caused the duplicate ACKs.
The TCP sender then continues in the congestion avoidance phase.

One basic assumption underlying TCP congestion control is that segment loss is due to congestion rather than errors. it should be noted that this assumption might not be valid in a wireless network where transmission errors can be relatively high.

5. DHCP and NAT

5.1 Dynamic Host Configuration Protocol

The Dynamic Host Configuration Protocol (DHCP) automatically configures hosts that connect to a TCP/IP network.

Process:

When a host wishes to obtain an IP address, the host broadcasts a DHCP Discover message in its physical network.
The server in the network may respond with a DHCP Offer message that provides an IP address and other configuration information. Several servers may reply to the host, so the host selects one of the offers and broadcasts a DHCP Request message that includes the ID of the server.
The selected server then allocates the given IP address to the host and sends a DHCP ACK message assigning the IP address to the host for some period of lease time T. The message also includes two time thresholds T1 (usually = .5T) and T2 (usually .875T).
- When time T1 expires the host attempts to extend the lease period by sending a DHCP Request to the original server. If the host gets the corresponding ACK message, it will also receive new values for T, T1, and T2.
- If the host does not receive an ACK by time T2, then it broadcasts a DHCP Request to any server on the network.
- If the host does not receive an ACK by time T, then it must relinquish the IP address and begin the DHCP process from scratch.

5.2 Network Address Translation

Network address translation (NAT) refers to a method for mapping packets generated by machines in a private network into packets that can traverse the global Internet.

NAT also transfers packets arriving from the global Internet to the appropriate destination machine in the private network.
In particular, NAT allows a number of machines in a private network to share a limited number of registered IP addresses.

There are small networks, such as home networks, that uses Intranet addresses. The Intranet addresses are not recognized in the Internet. Such a small network may be accessing to the Internet through a single router that performs address translation, referred to as NAT router.

Let us say the Intranet host would like to download a web page from the Internet. The Intranet host sends a request to the NAT router with its Intranet IP address (10.0.0.1) and port number (3345). NAT router replaces host address with its IP address (138.76.29.7) and assigns a new port number (5001) and this information is stored in NAT translation table.

When the server returns the web page, the NAT router performs the reverse operation so that response is received by the Intranet host.

6. The Internet Routing Protocols

The objective of Routing Protocols is making routing scalable.

(1) scale: billions of destinations:

can’t store all destinations in routing tables!
routing table exchange would swamp links!

(2) administrative autonomy:

Internet: a network of networks
each network admin may want to control routing in its own network

Internet approach to scalable routing is aggregate routers into regions known as “autonomous systems” (AS) (a.k.a. “domains”).

(1) Intra-AS (intra-AS)

Routing among routers within same AS(“network”).

all routers in AS must run same intradomain protocol
routers in different AS can run different intra-domain routing protocols
gateway router: at “edge” of its own AS, has link(s) to router(s) in other AS’es

(2) inter-AS (aka “inter-domain”)

routing among AS’es：

gateways perform inter-domain routing (as well as intra-domain routing)

(3) Interconnected ASes

forwarding table configured by intraand inter-AS routing algorithms

intra-AS routing determine entries for destinations within AS
inter-AS & intra-AS determine entries for external destinations

Interconnected ASes.

Question: suppose router in AS1 receives datagram destined outside of AS1, router should forward packet to gateway router in AS1, but which one?

AS1 inter-domain routing must:

learn which destinations reachable through AS2, which through AS3: Inter-AS routing
propagate this reachability info to all routers in AS1: Intra-AS routing

Inter-AS routing.

6.1 Intra-AS Routing

Intra-AS routing: most common intra-AS routing protocols:

RIP: Routing Information Protocol
- classic DV: DVs exchanged every 30 secs
- no longer widely used
EIGRP: Enhanced Interior Gateway Routing Protocol
- DV based
- formerly Cisco-proprietary for decades
OSPF: Open Shortest Path First
- link-state routing
- IS-IS protocol (ISO standard, not RFC standard) essentially same as OSPF

6.1.1 OSPF: Open Shortest Path First Routing

(1) “open”: publicly available

(2) classic link-state

each router floods OSPF link-state advertisements (directly over IP rather than using TCP/UDP) to all other routers in entire AS
multiple link costs metrics possible: bandwidth, delay
each router has full topology, uses Dijkstra’s algorithm to compute

(3) forwarding table

security: all OSPF messages authenticated (to prevent malicious intrusion)

6.1.2 Hierarchical OSPF

two-level hierarchy: local area, backbone.

link-state advertisements flooded only in area, or backbone
each node has detailed area topology; only knows direction to reach other destinations

Hierarchical OSPF.

6.2 Inter-AS Routing

routing among AS’es.

6.2.1 BGP

BGP (Border Gateway Protocol): the de facto inter-domain routing protocol.

“glue that holds the Internet together”

Allows subnet to advertise its existence, and the destinations it can reach, to rest of Internet: “I am here, here is who I can reach, and how”

BGP provides each AS a means to:

obtain destination network reachability info from neighboring ASes(eBGP)
determine routes to other networks based on reachability information and policy
propagate reachability information to all AS-internal routers (iBGP)
advertise (to neighboring networks) destination reachability info

The BGP Connection.

BGP session: two BGP routers (“peers”) exchange BGP messages over semi-permanent TCP connection:

advertising paths to different destination network prefixes (BGP is a “path vector” protocol)

when AS3 gateway 3a advertises path AS3,X to AS2 gateway 2c:

AS3 promises to AS2 it will forward datagrams towards X

Protocol Messages:

OPEN: opens TCP connection to remote BGP peer and authenticates sending BGP peer.
UPDATE: advertises new path (or withdraws old).
KEEPALIVE: keeps connection alive in absence of UPDATES; also ACKs OPEN request.
NOTIFICATION: reports errors in previous msg; also used to close connection.

BGP advertised route: prefix + attributes

prefix: destination being advertised
two important attributes:
- AS-PATH: list of ASes through which prefix advertisement has passed
- NEXT-HOP: indicates specific internal-AS router to next-hop AS

6.2.2 BGP Path Advertisement Example

Single Path

(1) AS2 router 2c receives path advertisement AS3,X (via eBGP) from AS3 router 3a

(2) based on AS2 policy, AS2 router 2c accepts path AS3,X, propagates (via iBGP) to all AS2 routers

(3) based on AS2 policy, AS2 router 2a advertises (via eBGP) path AS2, AS3, X to AS1 router 1c

The BGP Path Advertisement.

Multi Paths

gateway router may learn about multiple paths to destination:

AS1 gateway router 1c learns path AS2,AS3,X from 2a
AS1 gateway router 1c learns path AS3,X from 3a
based on policy, AS1 gateway router 1c chooses path AS3,X and advertises path within AS1 via iBGP

The BGP Path Advertisement multiple paths.

The populating forwarding tables is shown as follows:

The forwarding tables in 1d.

Populating forwarding tables.

The forwarding tables in 1a.

Populating forwarding tables.

2d learns (via iBGP) it can route to X via 2a or 2c

hot potato routing: choose local gateway that has least intra-domain cost (e.g., 2d chooses 2a, even though more AS hops to X): don’t worry about inter-domain cost!

Hot potato routing.

6.2.3 Achieving Policy via Advertisements

ISP only wants to route traffic to/from its customer networks (does not want to carry transit traffic between other ISPs – a typical “real world” policy)

(1) A advertises path Aw to B and to C

(2) B chooses not to advertise BAw to C!

B gets no “revenue” for routing CBAw, since none of C, A, w are B’s customers
C does not learn about CBAw path

(3) C will route CAw (not using B) to get to w

achieving policy via advertisements.

(4) A,B,C are provider networks

(5) x,w,y are customer (of provider networks)

(6) x is dual-homed: attached to two networks

policy to enforce: x does not want to route from B to C via x .. so x will not advertise to B a route to C

achieving policy via advertisements.

6.2.4 BGP Route Selection

Router may learn about more than one route to destination AS, selects route based on:

local preference value attribute: policy decision
shortest AS-PATH
closest NEXT-HOP router: hot potato routing
additional criteria

6.3 Differences between Intra- Inter- AS routing

In Policy:

inter-AS: admin wants control over how its traffic routed, who routes through its network
intra-AS: single admin, so policy less of an issue

In Scale:

hierarchical routing saves table size, reduced update traffic

performance:

intra-AS: can focus on performance
inter-AS: policy dominates over performance

Network