Solving “Micro Burst” in Bypass Network Traffic Capture Scenarios

In a typical NPB (Network Packet Broker) deployment, the most troublesome problem for administrators is packet loss caused by congestion of mirrored traffic inside the NPB network. Packet loss in the NPB can cause the following typical symptoms in back-end analysis tools:

- The APM (application performance monitoring) tool raises alarms because its service performance indicators and transaction success rate appear to drop

- The NPM (network performance monitoring) tool raises alarms for abnormal network performance indicators

- The security monitoring system fails to detect network attacks because events are missing

- The service audit system loses service behavior audit events

... ...

As the centralized capture and distribution system for bypass monitoring, the NPB is clearly important. At the same time, it handles packet traffic quite differently from a traditional production network switch, and many of the congestion control techniques used on production networks do not apply to it. To solve NPB packet loss, let's start with a root cause analysis.

NPB/TAP Packet Loss Congestion Root Cause Analysis

First, we analyze the actual traffic path and the mapping between inputs and outputs in a single-level or multi-level NPB network. Whatever topology the NPB network forms, as a collection system it has a many-to-many relationship between the traffic entering at the "access" side and the traffic leaving at the "output" side of the whole system.

Micro Burst 1

Then we look at the business model of NPB from the perspective of ASIC chips on a single device:

Micro Burst 2

Feature 1: The traffic and the physical interface rates of the input and output interfaces are asymmetric, so a large number of micro-bursts is inevitable. In typical many-to-one or many-to-many aggregation scenarios, the physical rate of the output interface is usually lower than the total physical rate of the input interfaces, for example 10 channels of 10G collection feeding 1 channel of 10G output. In a multi-level deployment, all of the NPBs can be viewed as a whole.

Feature 2: ASIC chip cache resources are very limited. Among commonly used ASIC chips, a chip with 640 Gbps switching capacity has a cache of 3-10 MB, and a chip with 3.2 Tbps capacity has a cache of 20-50 MB. This holds for ASIC chips from Broadcom, Barefoot, CTC, Marvell and other manufacturers.

Feature 3: The conventional PFC-based flow control mechanism is not applicable to NPB services. The core of this mechanism is to feed traffic suppression back toward the sender, ultimately reducing the rate at which the protocol stack of the communication endpoint sends packets in order to relieve congestion. However, the packets handled by NPB services are mirrored copies whose source cannot be throttled, so the only congestion handling options are to discard them or to cache them.
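Features 1 and 2 together determine how long a burst can be absorbed before packets are dropped. The following is a rough sketch of that arithmetic. The function name and the example figures (ten 10G inputs bursting at line rate into one 10G output, with a 10 MB cache fully available to that output) are illustrative assumptions, not measurements of any particular chip.

```python
def burst_absorb_time_ms(buffer_bytes, ingress_bps, egress_bps):
    """Milliseconds until the buffer fills while ingress exceeds egress."""
    excess_bps = ingress_bps - egress_bps
    if excess_bps <= 0:
        return float("inf")  # the output keeps up, so the buffer never fills
    return buffer_bytes * 8 / excess_bps * 1000


MB = 10**6
GBPS = 10**9

# Assumed scenario: 10 x 10G inputs at line rate, 1 x 10G output, 10 MB cache.
t = burst_absorb_time_ms(10 * MB, 10 * 10 * GBPS, 10 * GBPS)
print(f"{t:.2f} ms")  # prints "0.89 ms"
```

Under these assumptions the cache fills in under a millisecond. In practice the cache is shared among many ports, so the tolerance of a single output is usually lower still.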

The following is the appearance of a typical micro-burst on the flow curve:

Micro Burst 3

Taking a 10G interface as an example, in the traffic trend chart with one-second granularity the traffic rate stays at about 3 Gbps for a long time. On the chart with millisecond granularity, however, the traffic spikes (micro-bursts) far exceed the 10G physical rate of the interface.
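The gap between the two views can be reproduced with a small sketch that bins the traffic offered to one output port into 1 ms windows. The function name, the integer microsecond timestamps, and the synthetic packet list are all assumptions made for illustration; real input would come from capture timestamps.

```python
from collections import Counter


def find_microbursts(packets, line_rate_bps, window_us=1000):
    """Return (window_start_us, rate_bps) for windows above the line rate.

    packets: iterable of (timestamp_us, size_bytes) offered to one output.
    """
    bits_per_window = Counter()
    for ts_us, size_bytes in packets:
        bits_per_window[ts_us // window_us] += size_bytes * 8
    bursts = []
    for idx in sorted(bits_per_window):
        rate_bps = bits_per_window[idx] * 1_000_000 / window_us
        if rate_bps > line_rate_bps:
            bursts.append((idx * window_us, rate_bps))
    return bursts


# Synthetic one-second trace: a steady 3 Gbps of 1500-byte packets ...
packets = [(ms * 1000 + j * 4, 1500) for ms in range(1000) for j in range(250)]
# ... plus 2000 extra packets squeezed into the millisecond starting at 5 ms.
packets += [(5000 + i // 2, 1500) for i in range(2000)]

avg_bps = sum(size * 8 for _, size in packets)  # bits over exactly one second
bursts = find_microbursts(packets, line_rate_bps=10e9)

print(f"1 s average: {avg_bps / 1e9:.3f} Gbps")  # 3.024 Gbps
for start_us, rate_bps in bursts:
    print(f"burst at {start_us} us: {rate_bps / 1e9:.1f} Gbps")  # 27.0 Gbps
```

The one-second average barely moves, while the single 1 ms window carries nearly three times what a 10G output can send.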

Key Techniques for Mitigating NPB Microburst

Reduce the impact of mismatched physical interface rates - When designing the network, keep input and output physical interface rates as symmetric as possible. A typical method is to use a higher-rate uplink, and to avoid mixing asymmetric physical interface rates (for example, copying 1 Gbit/s and 10 Gbit/s traffic at the same time).

Optimize the cache management policy of the NPB service - The common cache management policy designed for switching services does not suit the forwarding behavior of NPB services. A "static guarantee + dynamic sharing" cache management policy should be implemented based on the characteristics of the NPB service, in order to minimize the impact of NPB micro-bursts within the limits of current chip hardware.

Implement classified traffic engineering management - Classify traffic and manage it by priority. Guarantee the service quality of the different priority queues through per-class queue bandwidths, so that packets of the user's sensitive services can be forwarded without loss.

Enhance packet caching and traffic shaping through a sound system design - Combine various technical means to expand the packet caching capability beyond that of the ASIC chip, and shape the flow at different locations, so that after shaping a micro-burst becomes a smooth, uniform flow curve.
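The "static guarantee + dynamic sharing" cache policy named above can be illustrated with a minimal sketch. It is not the algorithm of any particular chip or product. It assumes a fixed per-queue reservation plus a shared pool governed by a dynamic threshold, where a queue may hold at most `alpha` times the currently free shared space. The class name, parameters, and byte counts are all hypothetical.

```python
class SharedBuffer:
    """Per-queue static reservation plus a dynamically shared pool."""

    def __init__(self, total_bytes, num_queues, reserved_bytes, alpha=1.0):
        if num_queues * reserved_bytes > total_bytes:
            raise ValueError("reservations exceed total buffer")
        self.reserved = reserved_bytes
        self.shared_total = total_bytes - num_queues * reserved_bytes
        self.alpha = alpha
        self.depth = [0] * num_queues  # bytes currently held per queue

    def _shared_use(self, depth):
        # Bytes of a queue that spill past its static reservation.
        return max(0, depth - self.reserved)

    def shared_free(self):
        return self.shared_total - sum(self._shared_use(d) for d in self.depth)

    def admit(self, queue, size):
        """Accept the packet if the reservation or the shared pool has room."""
        new_depth = self.depth[queue] + size
        new_shared = self._shared_use(new_depth)
        extra = new_shared - self._shared_use(self.depth[queue])
        if extra > 0:
            free = self.shared_free()
            # Dynamic threshold: the more the pool is used, the less one
            # queue may take, so a single burst cannot starve the others.
            if extra > free or new_shared > self.alpha * free:
                return False
        self.depth[queue] = new_depth
        return True

    def release(self, queue, size):
        self.depth[queue] = max(0, self.depth[queue] - size)


buf = SharedBuffer(total_bytes=1000, num_queues=4, reserved_bytes=100)
while buf.admit(0, 50):  # queue 0 bursts until it is refused
    pass
print(buf.depth[0])        # 400: its 100 reserved bytes plus 300 shared
print(buf.admit(1, 100))   # True: queue 1 still has its static guarantee
```

The static part keeps every output from being starved, while the shared part lets a bursting output borrow idle space. Choosing `alpha` trades burst tolerance for fairness between queues.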

Mylinking™ Micro Burst Traffic Management Solution

Solution 1 - Network-wide optimized cache management strategy + network-wide classified service quality priority management

Cache management strategy optimized for the whole network

Based on an in-depth understanding of NPB service characteristics and the practical business scenarios of a large number of customers, Mylinking™ traffic collection products implement a network-wide "static guarantee + dynamic sharing" NPB cache management strategy, which manages traffic caching effectively when there are many asymmetric input and output interfaces. With the ASIC chip cache size fixed, it maximizes micro-burst tolerance.

Microburst Processing Technology - Management based on business priorities

Micro Burst 4

When the traffic capture unit is deployed independently, traffic can also be prioritized according to the importance of the back-end analysis tool or of the service data itself. For example, among the many analysis tools, APM/BPC has a higher priority than security analysis/security monitoring tools because it monitors and analyzes the indicator data of important business systems. In this scenario, the data required by APM/BPC can be defined as high priority, the data required by security monitoring/security analysis tools as medium priority, and the data required by other analysis tools as low priority. When captured packets enter the input port, they are assigned a priority according to their importance. High-priority packets are forwarded first, and packets of other priorities are forwarded only after the high-priority packets have been forwarded. If high-priority packets keep arriving, they continue to be forwarded first. If the input data exceeds the forwarding capability of the output port for a long period of time, the excess data is stored in the cache of the device. If the cache is full, the device discards lower-priority packets first. This prioritized management mechanism ensures that key analysis tools can efficiently obtain, in real time, the original traffic data they need.
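The forwarding and discard rules described above can be sketched as a strict-priority output port. This is a simplified model rather than the product's implementation. The class name, the packet-count capacity, and the choice to evict the newest packet of the lowest non-empty class are assumptions made for illustration.

```python
from collections import deque

HIGH, MEDIUM, LOW = 0, 1, 2  # smaller number means higher priority


class PriorityPort:
    """Strict-priority output port that drops lower priorities first."""

    def __init__(self, capacity_pkts):
        self.capacity = capacity_pkts
        self.queues = {HIGH: deque(), MEDIUM: deque(), LOW: deque()}
        self.dropped = {HIGH: 0, MEDIUM: 0, LOW: 0}

    def _size(self):
        return sum(len(q) for q in self.queues.values())

    def enqueue(self, prio, pkt):
        if self._size() >= self.capacity:
            # Cache full: evict from the lowest class below the newcomer.
            for victim in (LOW, MEDIUM):
                if victim > prio and self.queues[victim]:
                    self.queues[victim].pop()
                    self.dropped[victim] += 1
                    break
            else:
                # Nothing of lower priority is queued: drop the newcomer.
                self.dropped[prio] += 1
                return False
        self.queues[prio].append(pkt)
        return True

    def dequeue(self):
        # Always serve the highest non-empty priority first.
        for prio in (HIGH, MEDIUM, LOW):
            if self.queues[prio]:
                return prio, self.queues[prio].popleft()
        return None


port = PriorityPort(capacity_pkts=4)
for prio, pkt in [(LOW, "a"), (LOW, "b"), (MEDIUM, "c"), (LOW, "d")]:
    port.enqueue(prio, pkt)
port.enqueue(HIGH, "e")  # cache is full: low-priority "d" is evicted
port.enqueue(LOW, "f")   # cache is full: "f" itself is dropped

order = []
while (item := port.dequeue()) is not None:
    order.append(item)
print(order)  # [(0, 'e'), (1, 'c'), (2, 'a'), (2, 'b')]
```

In this run the high-priority packet is never lost, and both drops fall on the low-priority class.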

Microburst Processing Technology - classification guarantee mechanism of the whole network service quality

Micro Burst 5

As shown in the figure above, traffic classification is used on all devices at the access layer, aggregation/core layer, and output layer to distinguish different services, and the priorities of captured packets are re-marked. The SDN controller delivers the traffic priority policy centrally and applies it to the forwarding devices. Every device in the network maps packets to different priority queues according to the priorities they carry. In this way, low-volume, high-priority packets can achieve zero packet loss, which effectively solves the packet loss problem for APM monitoring and special service audit bypass traffic.

Solution 2 - GB-level Expanded System Cache + Traffic Shaping

GB-level Extended System Cache

When a traffic acquisition unit has advanced functional processing capabilities, it can set aside part of its memory (RAM) as a device-wide global buffer, which greatly increases the buffer capacity of the device. A single acquisition device can provide at least gigabyte-level capacity as cache space. This makes the buffer capacity of our traffic acquisition unit hundreds of times larger than that of a traditional acquisition device. At the same forwarding rate, the maximum micro-burst duration our traffic acquisition unit can tolerate becomes correspondingly longer: the millisecond level supported by traditional acquisition equipment is raised to the second level, and the micro-burst duration that can be withstood increases by thousands of times.

Multi-queue Traffic Shaping Capability

Microburst Processing Technology - a solution based on large Buffer Caching + Traffic Shaping

Micro Burst 6

With a very large buffer, the traffic generated by a micro-burst is cached, and traffic shaping on the outgoing interface delivers packets to the analysis tool as a smooth stream. Applying this technology fundamentally solves the packet loss caused by micro-bursts.
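The combined effect of a large buffer and output shaping can be seen in a small discrete-time sketch. It is a toy model with assumed units: each tick is one time slot, the output may send at most `drain_per_tick` units per slot, and the arrival pattern is invented for illustration.

```python
def shape(arrivals, drain_per_tick, buffer_capacity):
    """Queue arrivals, send at most drain_per_tick per tick, drop overflow.

    Returns (output_per_tick, dropped_units).
    """
    backlog = 0
    dropped = 0
    output = []
    for arriving in arrivals:
        backlog += arriving
        sent = min(backlog, drain_per_tick)
        backlog -= sent
        if backlog > buffer_capacity:
            # Whatever does not fit in the buffer is lost.
            dropped += backlog - buffer_capacity
            backlog = buffer_capacity
        output.append(sent)
    while backlog > 0:  # keep draining after the arrivals stop
        sent = min(backlog, drain_per_tick)
        backlog -= sent
        output.append(sent)
    return output, dropped


arrivals = [3, 3, 30, 3, 3, 3]  # one burst of 30 units against a rate of 10

small = shape(arrivals, drain_per_tick=10, buffer_capacity=5)
large = shape(arrivals, drain_per_tick=10, buffer_capacity=1000)
print(small)  # ([3, 3, 10, 8, 3, 3], 15)
print(large)  # ([3, 3, 10, 10, 10, 9], 0)
```

With the small buffer, 15 units of the burst are lost. With the large buffer nothing is lost, and the burst leaves as a flat run at the output rate, at the cost of a few ticks of extra delay.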


Post time: Feb-27-2024