live well without stp
I don’t have a bare-metal Catalyst 6500 at hand to play with VSS just yet, sorry, so we’re going Open Source for this one.
so here we have multi-chassis LACP (MLAG), which is much better than raw balancing. however, we only validated the HA acceptance test, not the throughput acceptance test.
(log in as cumulus/cumulus and set a new password when prompted)
on node1
nv set system hostname cumulus1
on node2
nv set system hostname cumulus2
then follow the steps as described there
and finally disable STP altogether
on both nodes
nv set bridge domain br_default stp state down
nv config apply
nv show interface bond-members
nv show system forwarding lag-hash
cat /proc/net/bonding/bond1 | grep Hash
cat /proc/net/bonding/bond2 | grep Hash
cat /proc/net/bonding/bond0
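The grep for the hash policy can be wrapped in a small helper so both bonds are checked the same way. This is only a sketch: it assumes the usual Linux bonding driver status format (a `Transmit Hash Policy:` line), and the function name `bond_hash_policy` is made up for illustration.

```shell
# bond_hash_policy: read bonding status text on stdin and print the
# transmit hash policy name. Hypothetical helper; expects a line such as
#   Transmit Hash Policy: layer3+4 (1)
bond_hash_policy() {
    awk -F': ' '/^Transmit Hash Policy:/ { split($2, a, " "); print a[1] }'
}

# Usage on a switch (bond names as in the commands above):
#   for b in bond1 bond2; do
#       printf '%s: ' "$b"
#       bond_hash_policy < "/proc/net/bonding/$b"
#   done
```

A policy that includes layer 4 ports is what lets two TCP connections between the same pair of hosts end up on different member links.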
sniff the link between the two switches. there should be no storm unless you add a second link between the switches.
# from node2
ping -c1 10.5.5.1
ping -b -c1 10.5.5.255
since we enabled LACP (even though it’s multi-chassis) there should be no broadcast duplicates on the end nodes, as there are no multiple paths.
# node1
tcpdump -i any
# from node2
ping -c1 10.5.5.1
ping -b -c1 10.5.5.255
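On the sending side, duplicates can also be counted from the ping output itself. A minimal sketch, assuming iputils ping, which tags duplicate replies with `(DUP!)`; the function name `count_dups` is hypothetical.

```shell
# count_dups: read ping output on stdin and print how many replies
# were flagged as duplicates. grep -c exits non-zero when the count
# is 0, so "|| true" keeps the function's exit status clean.
count_dups() {
    grep -c '(DUP!)' || true
}

# Usage from node2 (unicast target from the commands above):
#   ping -c5 10.5.5.1 | count_dups
```

This is mostly useful for the unicast ping. With a broadcast ping, extra replies from other hosts on the segment are flagged the same way, so a non-zero count there does not by itself prove duplicated frames.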
cut and restore one link, then the other. interruptions should last less than a second.
# from node2
ping 10.5.5.1
# on node1
( iperf3 -s & ); iperf3 -s -p 5202
# on node2 - single pipe
iperf3 -c 10.5.5.1
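To put a number on the interruption, the gaps in `icmp_seq` can be counted from a saved ping log. A rough sketch, assuming iputils-style reply lines (`... bytes from ...: icmp_seq=N ...`); the function name `longest_gap` and the log file name are made up for illustration.

```shell
# longest_gap: read ping output on stdin and print the largest number
# of consecutive missing icmp_seq values between two received replies.
# Only reply lines ("bytes from") are considered, so error lines such
# as "Destination Host Unreachable" do not hide a gap.
longest_gap() {
    awk '
        /bytes from/ && match($0, /icmp_seq=[0-9]+/) {
            # "icmp_seq=" is 9 characters long; the digits follow it
            seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
            if (seen && seq - prev - 1 > gap) gap = seq - prev - 1
            prev = seq; seen = 1
        }
        END { print gap + 0 }
    '
}

# Usage from node2 while cutting and restoring links:
#   ping -i 0.2 10.5.5.1 | tee ping.log
#   longest_gap < ping.log
```

The outage is roughly the gap multiplied by the ping interval, so at `-i 0.2` a result of 3 means about 0.6 seconds without replies.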
==> e.g. 100 Mbit/s
# on node2 - multiple tcp/ports
( iperf3 -c 10.5.5.1 & ); iperf3 -c 10.5.5.1 -p 5202
==> the combined rate should be about double the previous figure (two pipes within the LACP aggregate), provided the two flows hash onto different member links
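The comparison between the single-pipe and the two-pipe run is simple arithmetic. As a sketch, with rates typed in by hand from the iperf3 summaries (the function name `scaling_ratio` and the sample figures are hypothetical, not measured results):

```shell
# scaling_ratio SINGLE A B: print (A + B) / SINGLE with two decimals.
# SINGLE is the single-pipe rate, A and B are the two parallel rates,
# all in the same unit (e.g. Mbit/s).
scaling_ratio() {
    awk -v s="$1" -v a="$2" -v b="$3" 'BEGIN { printf "%.2f\n", (a + b) / s }'
}

# Example with made-up numbers:
#   scaling_ratio 100 95 93
```

A ratio close to 2 means both member links carried traffic. A ratio close to 1 suggests both flows were hashed onto the same link.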
https://en.wikipedia.org/wiki/Multi-chassis_link_aggregation_group
https://en.wikipedia.org/wiki/Cisco_Catalyst
https://developer.nvidia.com/blog/maximizing-hpc-cluster-ethernet-fabric-performance-with-mlag/