A bit of network assistance plz!

Karandras

prox-net-issue01.jpg


Please check out this setup.
The 10.2.1.x network works no problem
However the 10.10.1.x network is causing me issues.

This is not a routed network, just a 10g switch with the 10g cards plugged in and no gateway.
All the networks are /24
10.10.1.81-84 can all ping each other no problem
10.10.1.99 can ping 10.10.1.81 and vice versa
10.10.1.82-84 cannot ping 10.10.1.99
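
For reference, the addressing described above can be sanity-checked with Python's standard `ipaddress` module. This is only a sketch: it assumes the /24 mask stated in the post, uses the five 10.10.1.x addresses mentioned, and the helper name `same_subnet` is made up for illustration.

```python
import ipaddress

def same_subnet(addresses, prefix_len):
    """Return True if every address falls in the same network for the given prefix."""
    networks = {
        ipaddress.ip_interface(f"{addr}/{prefix_len}").network
        for addr in addresses
    }
    return len(networks) == 1

# Addresses taken from the post; /24 is the mask the poster says is configured.
hosts = ["10.10.1.81", "10.10.1.82", "10.10.1.83", "10.10.1.84", "10.10.1.99"]
print(same_subnet(hosts, 24))
```

If the masks really are /24 everywhere, all five hosts sit in 10.10.1.0/24, so no gateway is involved and the fault should be at layer 2 or below.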

ProxCPU1-4 are nodes on a C6220 with a dual 10g mezzanine card.
ProxBackup1 is a PE2950 with a Mellanox ConnectX-3 card.

Any ideas?

Thanks!
 
That diagram isn't really helping a whole lot, though I think I've mostly parsed it out by now.

What troubleshooting have you done so far? Can you post relevant switch configs?
 
First question, why the need for 3 subnets?

Are they all /24 subnets? (Though they should all be able to ping anyway, even if not on the same subnet.)

Do you have any static routes in any of the switches?
 
Has to be a layer 2 configuration issue.

On the machines themselves, make sure you didn't fat finger a subnet or IP address octet (I've personally done this many times...typed /25 instead of /24, etc).

On the switch, make sure the VLAN access ports are set up correctly (even if it's just the default VLAN).
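
On the mistyped-mask possibility, here is a rough sketch of how a wrong prefix length changes which peers a host treats as on-link. The `on_link` helper and the /28 typo are hypothetical, picked only to illustrate the effect; nothing in the thread says this is what happened.

```python
import ipaddress

def on_link(local_cidr, remote_ip):
    """True if remote_ip is inside the subnet implied by the local interface config."""
    iface = ipaddress.ip_interface(local_cidr)
    return ipaddress.ip_address(remote_ip) in iface.network

# Intended config from the thread: the backup server at 10.10.1.99/24.
print(on_link("10.10.1.99/24", "10.10.1.82"))

# Hypothetical typo: /28 puts .99 in 10.10.1.96/28, which excludes .82.
print(on_link("10.10.1.99/28", "10.10.1.82"))

# A /25 typo would not separate these hosts: .81-.99 all sit in 10.10.1.0/25.
print(on_link("10.10.1.99/25", "10.10.1.82"))
```

With no gateway on this network, a peer that falls outside the local subnet has no route at all, so a bad mask can look a lot like a switch problem.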
 
Sorry for the late reply.
Three subnets are what the software asked for...sort of.

10.2.x.x/16 - Management/Network traffic, dual GigE
10.10.1.x/24 and 10.10.2.x/24 are for high availability and cluster management, 10g
There is one more not listed because it works fine and doesn't need to go to the backup: 192.168.1.x/24, dual 40g for Ceph RBD

The 10.2.x.x/16 works just fine, no problems. Even the backup node can ping prox1-4
10.10.1.x/24 is my main issue. Only Layer 2, no vlans either.
The switches for 10.10.1.x and 10.10.2.x are both just layer 2 switches, no routes or any vlanning.
I have no configs to post as they are just defaulted to vlan1 throughout. That being said I think I'll try a different port on the switch to make sure there isn't something funky with that port.

I double checked the machines are all /24 for the 10.10.1.x network (as the backup machine only has one 10g nic)

So on Monday I'm going to try the following.
-Different port on the 10.10.1.x network, maybe a couple of different ports.
-Move the Backup1 machine to 10.10.2.x network. See if that switch has better results.

Since this is all layer 2, I'm confused as to why this isn't working. Beyond what I've put as my to do, any further ideas?

Thanks again!
 
I would check the output of netstat -r and arp -a on each machine to make sure it matches expectations. I would also disconnect two of the networks and test one network at a time.
 
Well, arp -a is interesting. Going to have to investigate this some more:

root@pve1-backup1:~# arp -a
? (10.10.1.81) at 00:8c:fa:5a:b3:a0 [ether] on enp8s0
? (10.10.1.82) at <incomplete> on enp8s0
? (10.10.1.83) at <incomplete> on enp8s0
? (10.2.0.2) at 00:50:56:8f:2b:ff [ether] on vmbr0
? (10.2.2.82) at 8a:0c:35:65:9a:92 [ether] on vmbr0
? (10.2.0.1) at 00:00:5e:00:01:03 [ether] on vmbr0
? (10.10.1.84) at <incomplete> on enp8s0
root@pve1-backup1:~#

Being incomplete means it's not getting an L2 (ARP) reply from the device at that IP.
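
If you want to pull the unresolved entries out of `arp -a` on several machines, something like the following could work. It is a sketch that assumes the Linux net-tools output format shown above; the regex and the `unresolved_neighbors` name are illustrative, and other platforms print this differently.

```python
import re

# Matches lines like "? (10.10.1.82) at <incomplete> on enp8s0"
# and "? (10.10.1.81) at 00:8c:fa:5a:b3:a0 [ether] on enp8s0".
ARP_LINE = re.compile(
    r"^\S+ \((?P<ip>[\d.]+)\) at (?P<mac><incomplete>|[0-9a-fA-F:]{17})"
    r"(?: \[\w+\])? on (?P<iface>\S+)$"
)

def unresolved_neighbors(arp_output):
    """Return (ip, iface) pairs whose ARP entry is <incomplete>."""
    result = []
    for line in arp_output.splitlines():
        m = ARP_LINE.match(line.strip())
        if m and m.group("mac") == "<incomplete>":
            result.append((m.group("ip"), m.group("iface")))
    return result

# Sample copied from the output posted above.
sample = """\
? (10.10.1.81) at 00:8c:fa:5a:b3:a0 [ether] on enp8s0
? (10.10.1.82) at <incomplete> on enp8s0
? (10.10.1.83) at <incomplete> on enp8s0
? (10.10.1.84) at <incomplete> on enp8s0
"""

for ip, iface in unresolved_neighbors(sample):
    print(f"no ARP reply from {ip} on {iface}")
```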
Going to try other stuffs.
 
Well, looks like there must be some sort of port separation, vlanning or something.
The switch was laid out like this:

Port 1-4, 10.10.1.81-84
Port 23, backup1

backup1 can ping 10.10.1.81
backup1 cannot ping 10.10.1.82-84

Moved backup1 to port 5
now it's different

backup1 cannot ping 10.10.1.81
backup1 can ping 10.10.1.82-84

So, have to figure out what's up with that..
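
To make the port-by-port comparison repeatable, a small sweep helper along these lines might help. It is a sketch that assumes the Linux iputils `ping` (`-c` for count, `-W` for the reply timeout in seconds); the function names are made up, and the two result sets are just the outcomes reported above, entered by hand.

```python
import subprocess

def ping_once(host, timeout_s=1):
    """Send one ICMP echo with the system ping; True if a reply came back."""
    proc = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode == 0

def sweep(hosts, probe=ping_once):
    """Map each host to True/False using the given probe function."""
    return {host: probe(host) for host in hosts}

def diff_sweeps(before, after):
    """Return the hosts whose reachability changed between two sweeps."""
    return sorted(h for h in before if before[h] != after.get(h))

# Results reported in the post with backup1 on port 23, then on port 5.
port23 = {"10.10.1.81": True, "10.10.1.82": False,
          "10.10.1.83": False, "10.10.1.84": False}
port5 = {"10.10.1.81": False, "10.10.1.82": True,
         "10.10.1.83": True, "10.10.1.84": True}

print(diff_sweeps(port23, port5))
```

On the real box you would call `sweep([...])` after each cable move instead of typing the results in. Every host flipping at once, as it did here, points at how the switch groups its ports rather than at any single node.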
 
Alright, I reset the switches and still having the same problem... 10.10.1.81-84 cannot ping 10.10.1.99(backup server)
Here is the base config (just admin pass and ip set):
Code:
(FASTPATH Routing) #show run
!Current Configuration:
!
!System Description "Quanta LB6M, 1.2.0.18, Linux 2.6.21.7"
!System Software Version "1.2.0.18"
!System Up Time          "0 days 1 hrs 2 mins 53 secs"
!Additional Packages     FASTPATH QOS
!Current SNTP Synchronized Time: Not Synchronized
!
serviceport protocol none
serviceport ip 10.2.1.71 255.255.0.0 10.2.0.1
vlan database
exit
configure
username "admin" password <REMOVED> level 15 encrypted
aaa authentication enable "enableList" enable
line console
exit
line telnet
exit
line ssh
exit
spanning-tree configuration name "04-7D-7B-46-FF-D8"
!
router rip
exit
router ospf
exit
exit

(FASTPATH Routing) #

However I think there might be something wrong down to the NIC.
I plugged the backup server into one of the ports that the other servers were on and still not pinging through.
I switched to the other LB6M (on the 10.10.2.x network), changed the IP on the backup server, and everything can ping everything.
The thing that is a bit weird: everything shows that it's connected at 10g but is transferring at under 1g. Going to try another NIC and repeat the process.


That's why I asked for the configs ;).
I actually didn't know these were managed switches. Ha. The admin that had these before never mentioned anything (and I didn't look it up)... Whoops...
 
I think we found the root cause. :D I was looking at that and thinking that if those were unmanaged switches everything should work fine.
 
Well, I still don't know what the problem was. Working from home last week and the beginning of this week.
I think there is something weird with that switch, a weird routing database or something that wasn't cleared and isn't easily found.
My workaround: move the cable to the other switch and use the 10.2.2.x/24 network. Worked great but doesn't answer my question, whyyyyyyyyyyyy :-/ .
 
It's been a while since I've used my cisco-fu but I assume you started with "wr mem erase" yes? If not I would recommend a do over starting with that.

edit: Actually you'll need to delete the vlan.dat file after you erase the config because, in their infinite wisdom, Cisco decided that erasing the config should not also erase the VLAN config.
 
These aren't Cisco switches; the 10g switches (one of which I'm having problems with) are Quanta LB6M switches. The OS is quasi-Cisco, which is quite annoying to use.
 
Either way you should wipe and reset to default before using. Seeing as you did not know they were managed my guess is you did not. I would suggest starting over.
 
This is generally a good idea for managed switches. However, since Quanta products are typically for the datacenter, I would be careful about how you do this reset, because wiping the switch completely clean would probably remove its OS, aka brick it.
 
:whistle: We aren't going to talk about this.... All I'm going to say is I'm glad for this site https://brokeaid.com/revert/#booting-quanta
 