Overcoming microbursts in a Cisco router or on the transmitting Linux machine

networking cisco switch multicast linux packet-loss
2021-08-01 09:14:18

I have Linux machines that multicast UDP streams from a Java application. The multicast passes through a Cisco router. I am seeing the receivers drop a lot of packets, and checking the port on the router shows its overrun counter increasing. Searching the internet suggests these overruns are caused by microbursts from the TX machine. I suspect the same, since the drops grow over time: the Java application may become jittery as it runs and start bursting packets in error. Is there perhaps something I can change on the router?
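Since the suspicion falls on the sender, one host-side mitigation (independent of anything on the router) is to pace the application's sends so the NIC never emits a line-rate burst. Below is a minimal sketch, not the asker's actual code: the group address, port, payload size, and target rate are placeholder assumptions.

    import java.net.DatagramPacket;
    import java.net.InetAddress;
    import java.net.MulticastSocket;

    // Paces outgoing multicast datagrams to a fixed bit rate so the sender
    // never emits a line-rate microburst. All constants are hypothetical.
    public class PacedMulticastSender {
        public static void main(String[] args) throws Exception {
            InetAddress group = InetAddress.getByName("239.1.1.1"); // placeholder group
            int port = 5000;                                        // placeholder port
            long targetBitsPerSec = 80_000_000L; // ~80 Mbit/s, near the observed 5-minute average

            try (MulticastSocket socket = new MulticastSocket()) {
                socket.setTimeToLive(8); // routed multicast needs TTL > 1
                byte[] payload = new byte[1316];
                long nanosPerPacket = payload.length * 8L * 1_000_000_000L / targetBitsPerSec;
                long next = System.nanoTime();

                while (true) {
                    socket.send(new DatagramPacket(payload, payload.length, group, port));
                    next += nanosPerPacket;
                    long sleep = next - System.nanoTime();
                    if (sleep > 0) {
                        Thread.sleep(sleep / 1_000_000, (int) (sleep % 1_000_000));
                    } else {
                        next = System.nanoTime(); // fell behind; do not catch up in a burst
                    }
                }
            }
        }
    }

Even coarse pacing like this bounds any burst to a single packet's worth of data, which may be enough to keep per-port ingress buffers from overrunning.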

Edit

Here is the Cisco hardware:

Cisco WS-C6504-E 4-slot 6500 Enhanced chassis with FAN-MOD-4HS × 1

Cisco PWR-2700-AC/4 2700W AC power supply (for 7604 / 6504-E) × 2

Cisco Catalyst 6500/7600 Supervisor 720 module, WS-SUP720-3BXL with WS-F6K-PFC3BXL × 1

Cisco WS-X6548-GE-TX 48-port 1G copper Ethernet module

Here is the CPU utilization:

RHE-001#show fabric utilization all
 slot    channel      speed    Ingress %     Egress %
    1          0        20G            0            0
    2          0         8G           12            2
    3          0         8G            9           14
    4          0         8G            0           13

RHE-001#

Here are the interface statistics:

RHE-001#show int GigabitEthernet 2/4
GigabitEthernet2/4 is up, line protocol is up (connected)
  Hardware is C6k 1000Mb 802.3, address is 0023.04dd.0d00 (bia 0023.04dd.0d00)
  Internet address is 10.0.1.13/30
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 7/255, rxload 21/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is 10/100/1000BaseT
  input flow-control is off, output flow-control is off
  Clock mode is auto
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:27, output 00:00:05, output hang never
  Last clearing of "show interface" counters 5d08h
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 83389000 bits/sec, 7648 packets/sec
  5 minute output rate 30360000 bits/sec, 2786 packets/sec
  L2 Switched: ucast: 32 pkt, 2048 bytes - mcast: 0 pkt, 0 bytes
  L3 in Switched: ucast: 0 pkt, 0 bytes - mcast: 3542591539 pkt, 4825009676118 bytes mcast
  L3 out Switched: ucast: 0 pkt, 0 bytes mcast: 2879819193 pkt, 3922313740866 bytes
     3542548642 packets input, 4830700273704 bytes, 0 no buffer
     Received 3542548610 broadcasts (3542458124 IP multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 4243199 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     0 input packets with dribble condition detected
     1276819687 packets output, 1738995021346 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out
RHE-001#

As we can see, the overrun counter is increasing.

Buffers:

RHE-001#show buffers
Buffer elements:
     499 in free list (500 max allowed)
     623919821 hits, 0 misses, 0 created

Public buffer pools:
Small buffers, 104 bytes (total 1024, permanent 1024):
     1021 in free list (128 min, 2048 max allowed)
     183749828 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Medium buffers, 256 bytes (total 3000, permanent 3000):
     2999 in free list (64 min, 3000 max allowed)
     22779465 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Middle buffers, 600 bytes (total 512, permanent 512):
     510 in free list (64 min, 1024 max allowed)
     5814462 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Big buffers, 1536 bytes (total 1000, permanent 1000):
     999 in free list (64 min, 1000 max allowed)
     2009529750 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
VeryBig buffers, 4520 bytes (total 10, permanent 10):
     10 in free list (0 min, 100 max allowed)
     363 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Large buffers, 9240 bytes (total 8, permanent 8):
     8 in free list (0 min, 10 max allowed)
     57 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Huge buffers, 18024 bytes (total 2, permanent 2):
     2 in free list (0 min, 4 max allowed)
     41 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)

Interface buffer pools:
Syslog ED Pool buffers, 600 bytes (total 150, permanent 150):
     118 in free list (150 min, 150 max allowed)
     10421 hits, 10168 misses
LI Middle buffers, 600 bytes (total 512, permanent 256, peak 512 @ 7w0d):
     256 in free list (256 min, 768 max allowed)
     171 hits, 85 fallbacks, 0 trims, 256 created
     0 failures (0 no memory)
     256 max cache size, 256 in cache
     0 hits in cache, 0 misses in cache
EOBC0/0 buffers, 1524 bytes (total 2400, permanent 2400):
     1200 in free list (0 min, 2400 max allowed)
     1200 hits, 0 fallbacks
     1200 max cache size, 680 in cache
     2369496864 hits in cache, 0 misses in cache
LI Big buffers, 1536 bytes (total 512, permanent 256, peak 512 @ 7w0d):
     256 in free list (256 min, 768 max allowed)
     171 hits, 85 fallbacks, 0 trims, 256 created
     0 failures (0 no memory)
     256 max cache size, 256 in cache
     0 hits in cache, 0 misses in cache
IPC buffers, 4096 bytes (total 2352, permanent 2352):
     2242 in free list (784 min, 7840 max allowed)
     333747144 hits, 0 fallbacks, 0 trims, 0 created
     0 failures (0 no memory)
LI Very Big buffers, 4520 bytes (total 257, permanent 128, peak 257 @ 7w0d):
     129 in free list (128 min, 384 max allowed)
     85 hits, 43 fallbacks, 4101 trims, 4230 created
     0 failures (0 no memory)
     128 max cache size, 128 in cache
     0 hits in cache, 0 misses in cache
Private Huge IPC buffers, 18024 bytes (total 2, permanent 2):
     2 in free list (1 min, 4 max allowed)
     0 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Private Huge buffers, 65280 bytes (total 2, permanent 2):
     2 in free list (1 min, 4 max allowed)
     787 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)

Header pools:


RHE-001#

Edit 2

There is no switch. The multicast sender and the receivers are connected directly to the Cisco router.

1 Answer

So first off, show fabric utilization all shows fabric utilization, not CPU utilization. The fabric has no CPU component; you can drive fabric utilization all the way to 100% without the negative effects a CPU exhibits as it approaches full load.

Next, the WS-X6548-GE-TX is an 8Gbit/s card, i.e. an "older" fabric-attached line card with an 8Gbit/s channel to the fabric. Internally it shares buffers across each set of 8 ports on the card. Given that you are seeing "overrun" errors, which generally point to the card failing to accept traffic in time and hand it off, the first thing I would do is isolate the incoming port into its own 8-port group. In other words, if a specific port (or port group) receives the bulk of the multicast traffic, I would move everything else out of its group on the card; keep in mind that each consecutive set of 8 ports forms one "group" (a quick sketch of the mapping follows the link below):

http://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/hardware/Module_Installation/Mod_Install_Guide/6500-emig/02ethern.html#wp1043307
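If it helps to reason about which ports share a buffer group, the mapping is simple integer arithmetic over consecutive port numbers; a throwaway sketch (the group numbering here is only for illustration):

    // Maps a WS-X6548-GE-TX port number (1-48) to its 8-port buffer group.
    public class PortGroup {
        static int groupOf(int port) {
            return (port - 1) / 8 + 1; // groups: 1-8, 9-16, 17-24, 25-32, 33-40, 41-48
        }

        public static void main(String[] args) {
            // GigabitEthernet2/4 is port 4 on the module, i.e. ports 1-8 share its buffers:
            System.out.println("Gi2/4 is in group " + groupOf(4));
        }
    }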

This means, among other things, that the card's 8Gbit/s connection to the fabric is statically divided into six groups of 8 ports each, and each group tops out at 1Gbit/s. So if the ports in any given group (8 × 10/100/1000 ports) receive more than 1Gbit/s of traffic in aggregate, you get exactly the problem you are seeing. That is why my recommendation is to move every other port out of the 8-port group containing the interface that receives the bulk of the multicast traffic (which appears to be GigabitEthernet 2/4 in your case). You can find this stated verbatim in the release notes:

http://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/15-1SY/release_notes.html#pgfId-4909956

"The aggregate bandwidth of each set of 8 ports (1–8, 9–16, 17–24, 25–32, 33–40, and 41–48) is 1 Gbps."

For better use of the physical ports, I would suggest looking at the WS-X6748-GE-TX card. It also has 48 10/100/1000 ports, but it has two 20Gbit/s fabric connections, split between ports 1-24 and 25-48. You would still be oversubscribed, but only 24Gbit/s onto a 20Gbit/s channel rather than 8Gbit/s onto 1Gbit/s as on the 6548 (effectively 1.2:1 oversubscription on the 6748 versus 8:1 on the 6548). That should give you the headroom to absorb bursts on the link from the sending station and spread them through the system.
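To make the comparison concrete, here is the oversubscription arithmetic from the paragraph above spelled out:

    // Back-of-the-envelope oversubscription math for the two line cards.
    public class Oversubscription {
        public static void main(String[] args) {
            double x6548 = (8 * 1.0) / 1.0;   // 8 ports x 1 Gbit/s onto a shared 1 Gbit/s group  -> 8:1
            double x6748 = (24 * 1.0) / 20.0; // 24 ports x 1 Gbit/s onto a 20 Gbit/s channel     -> 1.2:1
            System.out.printf("WS-X6548-GE-TX per-group oversubscription:   %.1f:1%n", x6548);
            System.out.printf("WS-X6748-GE-TX per-channel oversubscription: %.1f:1%n", x6748);
        }
    }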