A bit of network assistance plz!

Karandras · Jan 14, 2022

Please check out this setup.
The 10.2.1.x network works with no problems.
However, the 10.10.1.x network is causing me issues.

This is not a routed network, just a 10G switch with the 10G cards plugged in and no gateway.
All the networks are /24.
10.10.1.81-84 can all ping each other with no problem.
10.10.1.99 can ping 10.10.1.81 and vice versa.
10.10.1.82-84 cannot ping 10.10.1.99.

ProxCPU1-4 are nodes on a C6220 with a dual 10G mezzanine card.
ProxBackup1 is a PE2950 with a Mellanox ConnectX-3 card.

Any ideas?

Thanks!

Eulogy · Jan 15, 2022

That diagram isn't really helping a whole lot, though I think I've mostly parsed it out by now.

What troubleshooting have you done so far? Can you post relevant switch configs?

MrGuvernment · Jan 15, 2022

First question: why the need for 3 subnets?

Are they all /24 subnets? (Though they should all be able to ping anyway, even if not on the same subnet.)

Do you have any static routes in any of the switches?

JavaLava · Jan 15, 2022

Has to be a layer 2 configuration issue.

On the machines themselves, make sure you didn't fat-finger a subnet or IP address octet (I've personally done this many times...typed /25 instead of /24, etc).

On the switch, make sure the VLAN access ports are set up correctly (even if it's just the default VLAN).
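
For anyone wanting to sanity-check masks like this, here is a minimal sketch using Python's standard `ipaddress` module. The /24 addresses are the ones from this thread; the /25, /28 and 10.10.2.99 variants are hypothetical typos made up purely for illustration, and `same_subnet` is just a helper name chosen here.

```python
import ipaddress


def same_subnet(a: str, b: str) -> bool:
    """Return True if each host sees the other as on-link.

    Both arguments are "address/prefix" strings as configured on each NIC.
    """
    ia = ipaddress.ip_interface(a)
    ib = ipaddress.ip_interface(b)
    return ib.ip in ia.network and ia.ip in ib.network


# As described in the thread: everything is /24.
print(same_subnet("10.10.1.99/24", "10.10.1.82/24"))  # True

# Hypothetical /25 typo: 10.10.1.0/25 still covers .82 and .99,
# so this particular mistake would go unnoticed here.
print(same_subnet("10.10.1.99/25", "10.10.1.82/24"))  # True

# Hypothetical /28 typo: 10.10.1.96/28 does not cover .82.
print(same_subnet("10.10.1.99/28", "10.10.1.82/24"))  # False

# Hypothetical wrong octet.
print(same_subnet("10.10.2.99/24", "10.10.1.82/24"))  # False
```

Note that a mask typo only breaks things when it pushes the peer outside the computed network, which is why it can hide for a long time.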

Karandras · Jan 16, 2022

Sorry for the late reply.
Three subnets are what the software asked for...sort of.

10.2.x.x/16 - Management/network traffic, dual GigE
10.10.1.x/24 and 10.10.2.x/24 - High availability and cluster management, 10G
There is one more not listed because it works fine and doesn't need to go to the backup: 192.168.1.x/24 - dual 40G for Ceph RBD

The 10.2.x.x/16 works just fine, no problems. Even the backup node can ping prox1-4.
10.10.1.x/24 is my main issue. Only Layer 2, no VLANs either.
The switches for 10.10.1.x and 10.10.2.x are both just Layer 2 switches, no routes or any VLANs.
I have no configs to post as they are just defaulted to VLAN 1 throughout. That being said, I think I'll try a different port on the switch to make sure there isn't something funky with that port.

I double-checked that the machines are all /24 on the 10.10.1.x network (the backup machine only has one 10G NIC).

So on Monday I'm going to try the following:
- Different port on the 10.10.1.x network, maybe a couple of different ports.
- Move the Backup1 machine to the 10.10.2.x network. See if that switch has better results.

Since this is all layer 2, I'm confused as to why this isn't working. Beyond what I've put as my to do, any further ideas?

Thanks again!

Nicklebon · Jan 16, 2022

I would check the output of netstat -r and arp -a on each machine to make sure it matches expectations. I would also disconnect two of the networks and test one network at a time.

Karandras · Jan 17, 2022

Well, arp -a is interesting. Going to have to investigate this some more:

root@pve1-backup1:~# arp -a
? (10.10.1.81) at 00:8c:fa:5a:b3:a0 [ether] on enp8s0
? (10.10.1.82) at <incomplete> on enp8s0
? (10.10.1.83) at <incomplete> on enp8s0
? (10.2.0.2) at 00:50:56:8f:2b:ff [ether] on vmbr0
? (10.2.2.82) at 8a:0c:35:65:9a:92 [ether] on vmbr0
? (10.2.0.1) at 00:00:5e:00:01:03 [ether] on vmbr0
? (10.10.1.84) at <incomplete> on enp8s0
root@pve1-backup1:~#

Being incomplete means backup1 is not getting an L2 (ARP) reply from the device at that IP.
Going to try some other stuff.
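
If you need to run this check on several machines, a rough sketch like the one below pulls the unresolved entries out of `arp -a` output. It assumes the Linux net-tools output format shown above; the regular expression and the function name `unresolved_neighbors` are my own choices, not part of any tool, and the sample text is the output pasted above.

```python
import re
from typing import List, Tuple

# Matches lines such as:
#   ? (10.10.1.82) at <incomplete> on enp8s0
#   ? (10.10.1.81) at 00:8c:fa:5a:b3:a0 [ether] on enp8s0
ARP_LINE = re.compile(
    r"\((?P<ip>\d+\.\d+\.\d+\.\d+)\) at (?P<mac>\S+)(?: \[\w+\])? on (?P<iface>\S+)"
)


def unresolved_neighbors(arp_output: str) -> List[Tuple[str, str]]:
    """Return (ip, interface) pairs whose MAC address is still <incomplete>."""
    pending = []
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match and match.group("mac") == "<incomplete>":
            pending.append((match.group("ip"), match.group("iface")))
    return pending


SAMPLE = """\
? (10.10.1.81) at 00:8c:fa:5a:b3:a0 [ether] on enp8s0
? (10.10.1.82) at <incomplete> on enp8s0
? (10.10.1.83) at <incomplete> on enp8s0
? (10.2.0.2) at 00:50:56:8f:2b:ff [ether] on vmbr0
? (10.2.2.82) at 8a:0c:35:65:9a:92 [ether] on vmbr0
? (10.2.0.1) at 00:00:5e:00:01:03 [ether] on vmbr0
? (10.10.1.84) at <incomplete> on enp8s0
"""

print(unresolved_neighbors(SAMPLE))
# [('10.10.1.82', 'enp8s0'), ('10.10.1.83', 'enp8s0'), ('10.10.1.84', 'enp8s0')]
```

All three unresolved entries sit on enp8s0, which matches the ping results: only 10.10.1.81 answers ARP on the 10G side.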

Karandras · Jan 17, 2022

Well, looks like there must be some sort of port separation, vlanning or something.
The switch was laid out like this:

Port 1-4, 10.10.1.81-84
Port 23, backup1

backup1 can ping 10.10.1.81
backup1 cannot ping 10.10.1.82-84

Moved backup1 to port 5
now it's different

backup1 cannot ping 10.10.1.81
backup1 can ping 10.10.1.82-84

So, have to figure out what's up with that...

Eulogy · Jan 17, 2022

That's why I asked for the configs.

MrGuvernment · Jan 17, 2022

Reset all the switches to factory and start clean!

Karandras · Jan 18, 2022

Alright, I reset the switches and I'm still having the same problem... 10.10.1.81-84 cannot ping 10.10.1.99 (backup server).
Here is the base config (just admin pass and IP set):

Code:

(FASTPATH Routing) #show run
!Current Configuration:
!
!System Description "Quanta LB6M, 1.2.0.18, Linux 2.6.21.7"
!System Software Version "1.2.0.18"
!System Up Time          "0 days 1 hrs 2 mins 53 secs"
!Additional Packages     FASTPATH QOS
!Current SNTP Synchronized Time: Not Synchronized
!
serviceport protocol none
serviceport ip 10.2.1.71 255.255.0.0 10.2.0.1
vlan database
exit
configure
username "admin" password <REMOVED> level 15 encrypted
aaa authentication enable "enableList" enable
line console
exit
line telnet
exit
line ssh
exit
spanning-tree configuration name "04-7D-7B-46-FF-D8"
!
router rip
exit
router ospf
exit
exit

(FASTPATH Routing) #

However, I think there might be something wrong down at the NIC.
I plugged the backup server into one of the ports that the other servers were on and it's still not pinging through.
I switched to the other LB6M (on the 10.10.2.x network), changed the IP on the backup server, and everything can ping everything.
The thing that is a bit weird: everything shows that it's connected at 10G but is transferring at under 1G. Going to try another NIC and repeat the process.
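
To compare what the link negotiated with what is actually moving, a small sketch like this can help. It assumes a Linux host, where the kernel exposes the negotiated speed in Mbit/s at `/sys/class/net/<iface>/speed`. The interface name `enp8s0` is taken from the arp output earlier in the thread, the 900 Mbit/s figure is a made-up placeholder for whatever your transfer test reports, and both function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def negotiated_speed_mbps(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Read the negotiated link speed in Mbit/s, or None if it is unavailable."""
    try:
        value = int((Path(sysfs_root) / iface / "speed").read_text().strip())
    except (OSError, ValueError):
        # Interface missing, link down, or not running on Linux.
        return None
    # Some drivers report -1 when the speed is unknown.
    return value if value > 0 else None


def link_utilization(measured_mbps: float, link_mbps: int) -> float:
    """Fraction of the negotiated link speed that a measured transfer achieved."""
    return measured_mbps / link_mbps


speed = negotiated_speed_mbps("enp8s0")  # interface name from the arp output
measured = 900.0  # placeholder: substitute your own measured throughput
if speed is None:
    print("link speed not available on this host")
else:
    print(f"negotiated {speed} Mbit/s, using {link_utilization(measured, speed):.0%}")
```

A link that reports 10000 but only carries a small fraction of that points at something other than negotiation, such as the NIC, its slot, or the cabling.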

Eulogy said:
That's why I asked for the configs.

I actually didn't know these were managed switches. Ha. The admin that had these before never mentioned anything (and I didn't look it up)... Whoops...

SamirD · Jan 18, 2022

Karandras said:
I actually didn't know these were managed switches. Ha. The admin that had these before never mentioned anything (and I didn't look it up)... Whoops...

I think we found the root cause.

I was looking at that and thinking that if those were unmanaged switches, everything should work fine.

Karandras · Jan 24, 2022

Well, I still don't know what the problem was. I was working from home last week and the beginning of this week.
I think there is something weird with that switch, a weird routing database or something that wasn't cleared and isn't easily found.
My workaround: move the cable to the other switch and use the 10.10.2.x/24 network. Worked great, but it doesn't answer my question, whyyyyyyyyyyyy :-/ .

Nicklebon · Jan 24, 2022

It's been a while since I've used my Cisco-fu, but I assume you started with "wr mem erase", yes? If not, I would recommend a do-over starting with that.

edit: Actually, you'll need to delete the vlan.dat file after you erase the config, because in their infinite wisdom Cisco decided that erasing the config should not also erase the VLAN config.

Nobu · Jan 24, 2022

Ah, gotta love compiled configuration files.

Karandras · Jan 25, 2022

These aren't Cisco switches. The 10G switches (one of which I'm having problems with) are Quanta LB6M switches. The OS is quasi-Cisco, which is quite annoying to use.

Nicklebon · Jan 25, 2022

Either way, you should wipe and reset to default before using. Seeing as you did not know they were managed, my guess is you did not. I would suggest starting over.

SamirD · Jan 25, 2022

Nicklebon said:
Either way you should wipe and reset to default before using. Seeing as you did not know they were managed my guess is you did not. I would suggest starting over.

This is generally a good idea for managed switches. However, since Quanta products are typically for the datacenter, I would be careful about how you do this reset, because wiping the switch completely clean would probably remove its OS, a.k.a. brick it.

Karandras · Jan 25, 2022

SamirD said:
This is generally a good idea for managed switches. However, since Quanta products are typically for the datacenter, I would be careful about how you do this reset, because wiping the switch completely clean would probably remove its OS, a.k.a. brick it.

We aren't going to talk about this.... All I'm going to say is I'm glad for this site https://brokeaid.com/revert/#booting-quanta
