TCP vs UDP: Demystifying the Reliability vs. Efficiency Debate

Today, we'll start with TCP. In the earlier chapter on layering, we noted an important point: the network layer and the layers below it deal with host-to-host connectivity, which means your computer needs to know where another computer is in order to reach it. However, communication over a network usually takes place between processes rather than between machines. The transport layer therefore introduces the concept of a port. By default, a port is bound by only one process at a time, which allows data to be delivered directly to a specific application process running on a given host.
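
To make the "one port, one process" idea concrete, here is a minimal sketch using Python's standard socket module. It assumes default socket options (no SO_REUSEADDR or SO_REUSEPORT), and the helper name port_is_exclusive is ours, chosen only for illustration.

```python
import socket


def port_is_exclusive() -> bool:
    """Return True if a second socket cannot bind to a port already in use."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the operating system to pick any free port.
        first.bind(("127.0.0.1", 0))
        first.listen()
        port = first.getsockname()[1]
        try:
            # Assumption: default options, so the same address and port
            # cannot be bound a second time.
            second.bind(("127.0.0.1", port))
        except OSError:
            return True  # typically reported as "address already in use"
        return False
    finally:
        first.close()
        second.close()
```

On a typical system the second bind fails, which is how the operating system keeps each port tied to a single listening process.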

The task of the transport layer is to provide direct communication services between application processes running on different hosts, which is why its protocols are also known as end-to-end protocols. The transport layer hides the details of the underlying network core, so that the application processes appear to be connected by a logical end-to-end communication channel between the two transport layer entities.

TCP stands for Transmission Control Protocol and is known as a connection-oriented protocol. This means that before one application can start sending data to the other, the two processes must perform a handshake. The handshake is the process of establishing a logical connection, and it lays the groundwork for reliable transmission and in-order reception of data. During the handshake, the source and destination hosts exchange a series of control packets and agree on the parameters and rules needed for successful data transmission.
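
The exchange of control packets can be sketched as a toy model of the usual three-way handshake. This is not a real TCP implementation: the Segment class and the three_way_handshake function are hypothetical names, and in practice the initial sequence numbers are chosen by each host rather than passed in.

```python
from dataclasses import dataclass
from typing import List, Optional

SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three control segments that open a connection."""
    # 1. The client proposes its initial sequence number (ISN).
    syn = Segment("SYN", seq=client_isn)
    # 2. The server answers with its own ISN and acknowledges the SYN.
    syn_ack = Segment("SYN+ACK", seq=server_isn, ack=(syn.seq + 1) % SEQ_SPACE)
    # 3. The client acknowledges the server's SYN.
    ack = Segment("ACK", seq=(syn.seq + 1) % SEQ_SPACE,
                  ack=(syn_ack.seq + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]
```

Each acknowledgement number is the peer's sequence number plus one, because a SYN consumes one sequence number even though it carries no data.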

What is TCP?
TCP (Transmission Control Protocol) is a connection-oriented, reliable, byte-stream-based transport layer communication protocol.

Connection-oriented: TCP communication is one-to-one, that is, point-to-point, end-to-end communication. Unlike UDP, which can send a message to multiple hosts at the same time, TCP cannot provide one-to-many communication.
Reliable: TCP delivers data reliably to the receiver regardless of changes in the network path, which is why the TCP segment format is more complex than that of UDP.
Byte-stream-based: TCP treats application data as a continuous stream of bytes, so messages of any size can be transmitted and their order is guaranteed. If an earlier part of the stream has not yet been fully received, TCP will not deliver later bytes to the application layer even if they have already arrived. It also discards duplicate segments automatically.

Once host A and host B have established a connection, the application only needs to send and receive data over this virtual communication line. TCP takes care of tasks such as establishing, maintaining, and closing the connection. Note that the "virtual line" simply means that a connection has been established: a TCP connection only indicates that the two sides can start exchanging data and that TCP will keep that data reliable. Routing and forwarding through intermediate nodes are handled by network devices; TCP itself is not concerned with these details.
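
The byte-stream behaviour can be observed on the loopback interface. The following sketch assumes the chunks are small enough to fit in the socket buffers, so a single thread can send everything before reading; stream_roundtrip is a name we made up for the example.

```python
import socket


def stream_roundtrip(chunks):
    """Send several chunks over a loopback TCP connection and return
    everything the receiving side reads until end of stream."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()
    try:
        for chunk in chunks:
            client.sendall(chunk)
        client.shutdown(socket.SHUT_WR)  # tell the peer the stream has ended
        received = bytearray()
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the sender closed its side
                break
            received += data
        return bytes(received)
    finally:
        conn.close()
        client.close()
        server.close()
```

Calling stream_roundtrip([b"hello, ", b"world"]) returns the bytes in the order they were sent, but the boundaries between the two sendall calls are not preserved: the receiver simply sees one stream.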

A TCP connection provides a full-duplex service: host A and host B can send data to each other at the same time over a single connection.

TCP temporarily stores outgoing data in the connection's send buffer, one of the buffers set up during the three-way handshake. TCP then sends the data from the send buffer to the receive buffer of the destination host at an appropriate time. In practice, each peer has both a send buffer and a receive buffer, as shown here:

Figure: send and receive buffers at each end of a TCP connection

The send buffer is an area of memory maintained by the TCP implementation on the sending side to temporarily hold data waiting to be sent. It is set up when the three-way handshake establishes the connection, and it is adjusted dynamically according to network congestion and feedback from the receiver.

The receive buffer is an area of memory maintained by the TCP implementation on the receiving side to temporarily hold received data. TCP stores incoming data in the receive buffer until the application above reads it.

Note that the send and receive buffers are limited in size. When a buffer fills up, TCP relies on mechanisms such as flow control and congestion control to keep data transmission reliable and the network stable.
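
To see that these buffers really are finite, we can ask the operating system for their sizes. This is a small sketch using the standard SO_SNDBUF and SO_RCVBUF socket options; the values returned depend on the operating system and its configuration, and the function name buffer_sizes is only illustrative.

```python
import socket


def buffer_sizes():
    """Return (send_buffer_bytes, receive_buffer_bytes) for a new TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        send_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        recv_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return send_size, recv_size


if __name__ == "__main__":
    send_size, recv_size = buffer_sizes()
    print(f"send buffer: {send_size} bytes, receive buffer: {recv_size} bytes")
```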

In computer networks, TCP transmits data between hosts in units called segments. So what is a segment?

TCP creates a TCP segment by splitting the incoming byte stream into chunks and adding a TCP header to each chunk. The data carried in each segment is limited in size and cannot exceed the Maximum Segment Size (MSS). On its way down the stack, a segment passes through the link layer, which has a Maximum Transmission Unit (MTU): the largest packet that can pass through the data link layer. The MTU usually depends on the communication interface.

So what is the difference between MSS and MTU?

In computer networks, the layered architecture matters here because each layer uses a different name for its unit of data: at the transport layer the unit is called a segment, and at the network layer it is called an IP packet. The Maximum Transmission Unit (MTU) can therefore be thought of as the maximum IP packet size that the network layer can hand down for transmission, while the Maximum Segment Size (MSS) is a transport layer concept that refers to the maximum amount of data a single TCP segment can carry, not counting the TCP header.

Note that if a TCP segment, together with its TCP and IP headers, is larger than the MTU, the network layer performs IP fragmentation; TCP itself does not split the data again to fit the MTU. For this reason, the MSS is normally chosen so that a full segment fits within the MTU. A later section on the network layer covers the IP layer in detail.
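
The relationship between the two limits is simple arithmetic. The sketch below assumes IPv4 and TCP headers without options (20 bytes each); the function names are ours, and a real stack negotiates the MSS during the handshake rather than computing it this way.

```python
IP_HEADER_BYTES = 20   # assumption: IPv4 header without options
TCP_HEADER_BYTES = 20  # assumption: TCP header without options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one IP packet of size mtu."""
    return mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES


def split_into_segments(data: bytes, mss: int) -> list:
    """Split a byte stream into payload chunks no larger than mss."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [data[i:i + mss] for i in range(0, len(data), mss)]
```

For a typical Ethernet MTU of 1500 bytes, mss_for_mtu(1500) gives 1460, so 4000 bytes of application data would be carried in three segments of 1460, 1460, and 1080 bytes.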

TCP packet segment structure
Let's explore the format and contents of TCP headers.

Figure: TCP segment header format

Sequence number: When a TCP connection is established, each side generates a random number as its initial sequence number and sends it to the peer in a SYN segment. During data transmission, the sender increases the sequence number by the number of bytes sent. The receiver uses the sequence numbers of the segments it receives to determine the order of the data and, if segments arrive out of order, puts them back in order.

Acknowledgement number: This field is used to acknowledge the receipt of data. It holds the sequence number of the next byte that the receiver expects to receive. The receiver determines which data has arrived successfully from the sequence numbers of the segments it has received, and sends the sender an ACK segment carrying the acknowledgement number. On receiving the ACK, the sender knows that all data before the acknowledgement number has been received successfully.
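
How sequence and acknowledgement numbers work together can be sketched with a toy receiver. This is a simplified model, not real TCP: it assumes segments never overlap partially and ignores 32-bit wraparound, and the Receiver class is a hypothetical name.

```python
class Receiver:
    """Toy receiver that reorders segments and returns cumulative ACKs."""

    def __init__(self, initial_seq: int):
        self.next_expected = initial_seq  # sequence number of the next byte wanted
        self.pending = {}                 # out-of-order segments, keyed by seq
        self.delivered = bytearray()      # bytes handed to the application

    def on_segment(self, seq: int, payload: bytes) -> int:
        """Process one segment and return the acknowledgement number."""
        if seq >= self.next_expected:
            self.pending[seq] = payload   # keep new data (duplicates overwrite)
        # Segments that start before next_expected are duplicates: ignored.
        # Deliver every segment that is now contiguous with what we have.
        while self.next_expected in self.pending:
            data = self.pending.pop(self.next_expected)
            self.delivered += data
            self.next_expected += len(data)
        return self.next_expected
```

If the segment starting at 1005 arrives before the one starting at 1000, the receiver keeps acknowledging 1000 until the gap is filled, and only then hands the bytes to the application in order.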

The control bits of a TCP segment include the following:

ACK bit: When this bit is 1, the acknowledgement number field is valid. TCP specifies that this bit must be set to 1 in every segment except the initial SYN segment that opens the connection.
RST bit: When this bit is 1, an exception has occurred in the TCP connection and the connection must be forcibly closed.
SYN bit: When this bit is 1, the sender wishes to establish a connection, and the sequence number field carries its initial sequence number.
FIN bit: When this bit is 1, the sender has no more data to send and wishes to close the connection.

The various functions and characteristics of TCP are reflected in the structure of the TCP segment.
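
As an illustration of how these fields sit in the header, here is a sketch that decodes the fixed 20-byte part of a TCP header with Python's struct module. The bit values follow the standard header layout; the function name and the dictionary keys are our own, and options beyond the first 20 bytes are ignored.

```python
import struct

# Control bits in the flags byte of the TCP header.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(header: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options ignored)."""
    if len(header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    (src_port, dst_port, seq, ack,
     offset_byte, flag_byte, window, checksum, urgent) = struct.unpack(
        "!HHIIBBHHH", header[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (offset_byte >> 4) * 4,  # data offset is in 32-bit words
        "flags": {name for name, bit in FLAG_BITS.items() if flag_byte & bit},
        "window": window,
    }
```

A header built with struct.pack("!HHIIBBHHH", 12345, 80, 1000, 0, 5 << 4, 0x02, 65535, 0, 0) decodes to a 20-byte header whose only control bit is SYN, which is what the first segment of a handshake looks like.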

What is UDP?
User Datagram Protocol (UDP) is a connectionless communication protocol. Compared with TCP, UDP provides no complex control mechanisms. It lets applications send data encapsulated in IP packets without first establishing a connection. When a developer chooses UDP instead of TCP, the application talks almost directly to IP.
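
The lack of a connection is easy to see in code. In this loopback sketch the sender never calls connect or accept; it simply addresses a datagram to the receiver's port. The function name udp_roundtrip and the 2-second timeout are assumptions made for the example, and on a real network the datagram could be lost.

```python
import socket


def udp_roundtrip(message: bytes, timeout: float = 2.0) -> bytes:
    """Send one datagram over loopback without any connection setup."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        receiver.settimeout(timeout)     # UDP gives no delivery guarantee
        # No handshake: the datagram is sent straight to the destination.
        sender.sendto(message, receiver.getsockname())
        data, _addr = receiver.recvfrom(65535)
        return data
    finally:
        sender.close()
        receiver.close()
```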

The UDP header is only eight bytes (64 bits) long, which makes it very compact. Its format is as follows:

Figure: UDP header format

Source and destination ports: They identify the sending process and the process to which UDP should deliver the packet.
Length: This field holds the size of the UDP header plus the size of the data, in bytes.
Checksum: This field is used to detect whether the UDP header or data has been corrupted in transit, which protects the integrity of the data. It only detects errors; it does not make delivery reliable.
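
The header fields above can be sketched with Python's struct module. Note the simplifications: the checksum function below is the generic 16-bit ones' complement Internet checksum applied to whatever bytes it is given, whereas real UDP also covers a pseudo-header with the IP addresses, which we leave out. All function names are illustrative.

```python
import struct

UDP_HEADER_BYTES = 8


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over data."""
    if len(data) % 2:
        data += b"\x00"                    # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                     # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_udp_header(src_port: int, dst_port: int, payload: bytes,
                     checksum: int = 0) -> bytes:
    """Pack the four 16-bit header fields; length covers header plus data."""
    length = UDP_HEADER_BYTES + len(payload)
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


def parse_udp_header(datagram: bytes) -> dict:
    """Unpack the first eight bytes of a UDP datagram."""
    src_port, dst_port, length, checksum = struct.unpack(
        "!HHHH", datagram[:UDP_HEADER_BYTES])
    return {"src_port": src_port, "dst_port": dst_port,
            "length": length, "checksum": checksum}
```

A useful property of this checksum is that recomputing it over the data with the checksum appended yields zero, which is how a receiver can detect corruption.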

Differences between TCP and UDP
TCP and UDP are different in the following aspects:

Figure: comparison of TCP and UDP

Connection: TCP is a connection-oriented transport protocol that requires a connection to be established before data can be transferred. UDP, on the other hand, does not require a connection and can transfer data immediately.

Service Object: TCP is a one-to-one, point-to-point service: a connection has exactly two endpoints that communicate with each other. UDP, however, supports one-to-one, one-to-many, and many-to-many communication, so it can communicate with multiple hosts at the same time.

Reliability: TCP provides reliable delivery, ensuring that data arrives error-free, without loss, without duplication, and in order. UDP, on the other hand, is a best-effort service and does not guarantee reliable delivery, so data may be lost in transit.

Congestion control, flow control: TCP has congestion control and flow control mechanisms, which adjust the data transmission rate according to network conditions to keep data transmission stable. UDP has neither mechanism: even if the network is very congested, it does not adjust its sending rate.

Header overhead: TCP has a longer header, at least 20 bytes, and more when option fields are used. UDP, on the other hand, has a fixed header of only 8 bytes, so its header overhead is lower.
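
The difference in overhead is easy to quantify. This small sketch counts only the transport header (a TCP header without options versus the UDP header) and ignores the IP and link-layer headers; the function name is ours.

```python
TCP_HEADER_MIN_BYTES = 20  # TCP header without options
UDP_HEADER_BYTES = 8       # UDP header is always 8 bytes


def header_overhead(payload_len: int, header_len: int) -> float:
    """Fraction of the transport-layer unit that is taken up by the header."""
    return header_len / (header_len + payload_len)


if __name__ == "__main__":
    for payload in (12, 100, 1400):
        tcp = header_overhead(payload, TCP_HEADER_MIN_BYTES)
        udp = header_overhead(payload, UDP_HEADER_BYTES)
        print(f"{payload:>5} byte payload: TCP {tcp:.1%}, UDP {udp:.1%}")
```

The gap matters most for small payloads: with 12 bytes of data, the UDP header is 40% of the datagram, while a minimal TCP header would be over 60% of the segment.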

TCP and UDP Application Scenarios:
TCP and UDP are two different transport layer protocols, and they have some differences in application scenarios.

Since TCP is a connection-oriented protocol, it is primarily used in scenarios where reliable data delivery is required. Some common use cases include:

FTP file transfer: TCP ensures that files are not lost or corrupted during transfer.
HTTP/HTTPS: TCP ensures the integrity and correctness of web content.

Because UDP is a connectionless protocol, it provides no reliability guarantee, but it is efficient and well suited to real-time use. UDP is suitable for the following scenarios:

Small-packet traffic, such as DNS (Domain Name System): DNS queries are usually short packets, and UDP can complete them faster.
Multimedia communication such as video and audio: For multimedia transmission with high real-time requirements, UDP can provide lower latency to ensure that data can be transmitted in a timely manner.
Broadcast communication: UDP supports one-to-many and many-to-many communication and can be used for the transmission of broadcast messages.

Summary
Today we learned about TCP and UDP. TCP is a connection-oriented, reliable, byte-stream-based transport layer communication protocol. It ensures reliable transmission and in-order reception of data through connection establishment, handshaking, and acknowledgement. TCP uses ports to enable communication between processes, providing direct communication services for application processes running on different hosts. TCP connections are full-duplex, allowing data to flow in both directions at the same time. In contrast, UDP is a connectionless communication protocol that provides no reliability guarantees and is suitable for scenarios with high real-time requirements. TCP and UDP differ in connection mode, service object, reliability, congestion control, flow control, and header overhead, and their application scenarios differ accordingly.


Post time: Dec-03-2024