I've been trying to debug some odd performance behavior on my new NAS, and I'm curious whether anyone has more ideas I can try before I write off the NIC as a bad part. In short, when this system receives a lot of incoming traffic, the bandwidth appears to be throttled. When it sends a lot of outbound traffic, it does not. I've been using iperf3 to test server-to-server performance as a way to troubleshoot.
Some notes
- All three systems are running Ubuntu Server 24.04.2 LTS
- All three are cabled to the same 10Gb switch using DAC cables into SFP+ ports
- All three are using a version of the Mellanox ConnectX-3 NIC
- All three are on a flat network with no routing involved
System B
System C
When I run iperf3 to send from System B to System A, I see performance drop after a few seconds.
Code:
iperf3 -c 192.168.1.231 -p 3000 -t 20
Connecting to host 192.168.1.231, port 3000
[ 5] local 192.168.1.226 port 58434 connected to 192.168.1.231 port 3000
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 823 MBytes 6.90 Gbits/sec 0 1.58 MBytes
[ 5] 1.00-2.00 sec 484 MBytes 4.06 Gbits/sec 0 1.58 MBytes
[ 5] 2.00-3.00 sec 484 MBytes 4.06 Gbits/sec 0 1.58 MBytes
[ 5] 3.00-4.00 sec 482 MBytes 4.05 Gbits/sec 0 1.58 MBytes
[ 5] 4.00-5.00 sec 482 MBytes 4.05 Gbits/sec 0 1.58 MBytes
[ 5] 5.00-6.00 sec 485 MBytes 4.07 Gbits/sec 0 1.58 MBytes
[ 5] 6.00-7.00 sec 564 MBytes 4.73 Gbits/sec 9 1.20 MBytes
[ 5] 7.00-8.00 sec 546 MBytes 4.58 Gbits/sec 0 1.23 MBytes
[ 5] 8.00-9.00 sec 469 MBytes 3.93 Gbits/sec 0 1.25 MBytes
However, when I run iperf3 to send from System A to System B, things look much better (but there are still retransmits I don't like seeing).
Code:
iperf3 -c 192.168.1.226 -p 3000 -t 20
Connecting to host 192.168.1.226, port 3000
[ 5] local 192.168.1.231 port 46358 connected to 192.168.1.226 port 3000
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.10 GBytes 9.42 Gbits/sec 0 1.40 MBytes
[ 5] 1.00-2.00 sec 1.09 GBytes 9.39 Gbits/sec 0 1.47 MBytes
[ 5] 2.00-3.00 sec 1.09 GBytes 9.39 Gbits/sec 0 1.47 MBytes
[ 5] 3.00-4.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.68 MBytes
[ 5] 4.00-5.00 sec 1.09 GBytes 9.40 Gbits/sec 0 1.76 MBytes
[ 5] 5.00-6.00 sec 1.09 GBytes 9.39 Gbits/sec 0 1.76 MBytes
[ 5] 6.00-7.00 sec 1.09 GBytes 9.40 Gbits/sec 0 1.76 MBytes
[ 5] 7.00-8.00 sec 1.09 GBytes 9.35 Gbits/sec 254 1.28 MBytes
[ 5] 8.00-9.00 sec 1.09 GBytes 9.36 Gbits/sec 0 1.44 MBytes
[ 5] 9.00-10.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.44 MBytes
I did a similar test from System C to System A and got very similar results. Most of the performance drop shows up when another system is sending traffic to the problem System A.
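To put numbers on runs like the two above, here is a minimal Python sketch that summarizes pasted iperf3 output. It assumes the sender-side interval layout shown in this post (interval, transfer, bitrate, Retr, Cwnd). The names `INTERVAL_RE` and `summarize` are made up for illustration and are not part of iperf3.

```python
import re

# Matches iperf3 sender-side interval lines in the layout shown in this post, e.g.
# [ 5] 0.00-1.00 sec 823 MBytes 6.90 Gbits/sec 0 1.58 MBytes
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+[\d.]+-[\d.]+\s+sec\s+"        # stream id and interval
    r"[\d.]+\s+\w+\s+"                            # transfer amount and unit
    r"(?P<rate>[\d.]+)\s+(?P<unit>[KMG])bits/sec\s+"
    r"(?P<retr>\d+)\s"                            # Retr column
)

# Scale factors to express every bitrate in Gbits/sec.
UNIT_TO_GBIT = {"K": 1e-6, "M": 1e-3, "G": 1.0}


def summarize(text):
    """Return interval count, mean/min bitrate (Gbits/sec), and total Retr."""
    rates = []
    retransmits = 0
    for line in text.splitlines():
        match = INTERVAL_RE.search(line)
        if match is None:
            continue  # skip the command, header, and "connected to" lines
        rates.append(float(match["rate"]) * UNIT_TO_GBIT[match["unit"]])
        retransmits += int(match["retr"])
    if not rates:
        raise ValueError("no iperf3 interval lines found")
    return {
        "intervals": len(rates),
        "mean_gbps": sum(rates) / len(rates),
        "min_gbps": min(rates),
        "retransmits": retransmits,
    }
```

Calling `summarize()` on each pasted run gives one row per direction, which makes the inbound and outbound results easier to compare side by side than scanning the raw intervals.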
Followup troubleshooting:
- I changed to a new SFP+ switch port for System A - no change in behavior
- I swapped the DAC cable for a new one for System A - no change in behavior
- I painstakingly updated the firmware on the Mellanox ConnectX-3 (MCX311A-XCAT) in System A to the latest version - no change in behavior
- I made sure to power cycle after the firmware update so the new firmware was loaded
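A further check in the same spirit as the list above, not yet tried here, is to see whether System A's receive counters climb during a slow inbound run. This is only a sketch: it assumes the standard Linux counter files under /sys/class/net/<interface>/statistics/, and the helper names `read_rx_counters` and `counter_deltas` are made up for illustration.

```python
from pathlib import Path

# Receive-side counter files the Linux kernel exposes per interface.
RX_COUNTERS = (
    "rx_packets",
    "rx_errors",
    "rx_dropped",
    "rx_missed_errors",
    "rx_fifo_errors",
)


def read_rx_counters(iface, root="/sys/class/net"):
    """Read whichever of the RX_COUNTERS files exist for the interface."""
    stats_dir = Path(root) / iface / "statistics"
    counters = {}
    for name in RX_COUNTERS:
        path = stats_dir / name
        if path.is_file():
            counters[name] = int(path.read_text().strip())
    return counters


def counter_deltas(before, after):
    """Return after - before for every counter present in both snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}


# Intended use on System A (the interface name is a placeholder):
#   before = read_rx_counters("enp1s0")
#   ... run the iperf3 test from System B ...
#   after = read_rx_counters("enp1s0")
#   print(counter_deltas(before, after))
```

If the drop or error counters grow noticeably during the inbound test, the loss is happening on System A's receive path. If they stay flat, the slowdown is more likely elsewhere.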
I paid $25 for the NIC, so I may just replace it and see if that solves my issue, but I wanted to ask whether there are other things I haven't considered.