Help with routing

Smoblikat

Hello all, I'm trying to get something set up and I think I'm having a brain fart. The setup is as follows (actual VLAN IDs and IP addresses are different):

VLAN 5 = Production - 5.5.5.0/24
VLAN 10 = Printers - 10.10.10.0/24
VLAN 15 = Uplink to firewall from switch - 15.15.15.0/30
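
For anyone following along, the addressing plan above can be sanity-checked with Python's standard `ipaddress` module. This is only a sketch using the placeholder subnets from this post (the real IDs and addresses differ); the names `vlans`, `uplink_hosts` and `overlaps` are made up for illustration.

```python
import ipaddress

# Placeholder subnets from the post; the real VLAN IDs and addresses differ.
vlans = {
    5: ipaddress.ip_network("5.5.5.0/24"),      # Production
    10: ipaddress.ip_network("10.10.10.0/24"),  # Printers
    15: ipaddress.ip_network("15.15.15.0/30"),  # Uplink to firewall
}

# A /30 leaves exactly two usable host addresses: one per end of the link.
uplink_hosts = [str(h) for h in vlans[15].hosts()]
print(uplink_hosts)

# None of the subnets should overlap, or routing between them gets ambiguous.
nets = list(vlans.values())
overlaps = [(a, b) for i, a in enumerate(nets) for b in nets[i + 1:] if a.overlaps(b)]
print(overlaps)
```

The two uplink hosts correspond to the firewall port and the L3 switch described below.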

I have several VLANs that all have their gateways on a layer 3 switch, with an uplink to the firewall for WAN access. Let's say VLAN 5 is the regular production network. I have the routing working so that this VLAN can talk to all other "layer 3" VLANs that also reside on the switch, and the switch has a static route of 0.0.0.0/0 pointing to the firewall's LAN uplink (this connection is VLAN 15, a /30 subnet with the firewall port being 15.15.15.1 and the L3 switch 15.15.15.2, with an IP policy sending traffic from .1 to the WAN port on the firewall).

The issue I am having is that VLAN 10 (printers) has its gateway set as a port on the firewall (virtual port, VLAN 10) and I can't for the life of me figure out how to get it talking back to the other VLANs that reside on the L3 switch. I have 2 IP policies defined on the firewall that send traffic between the L3 switch uplink port and the printer port (both ports are on the firewall), yet a ping from a host on the printer subnet to a host on the production subnet results in the packet getting confused on the L3 switch (Reply from 15.15.15.2: Destination host unreachable).

I have another VLAN dedicated to servers, which hosts DHCP for all of the other VLANs and also resides on the L3 switch. The printer subnet is able to pull addresses from the server (DHCP relay on the printer port), so I know the packets are at least getting through, and I can ping the gateway address of the printer subnet from the production VLAN (and vice versa), so I would assume the VLAN is tagged properly.

What am I missing here? I thought that maybe since the packet was getting routed through the firewall and coming out the uplink port, it would pick up the tag of that VLAN (15), so I added tags to the ports I'm testing on, but still nothing. Adding an IP address to the printer VLAN on the L3 switch results in the same destination host unreachable error, but this time it's coming from whatever IP I set VLAN 10 to on the L3 switch.

Hopefully that makes sense. I'm exhausted, which is probably why I'm in this mess in the first place :D

Any help would be greatly appreciated.
 
Do you have rules in your ACLs to allow traffic back the other direction? That's caught me up a couple times, where traffic is getting there but can't get back. Either through the router or the switch. You need an allow established/connected, reflexive, something. If your switch is like mine it isn't stateful and only evaluates rules as they enter the switch so that requires some thought.

Also might make sure you have some sort of catch all "allow any" on the switch itself. When I set up my switch's ACLs and had a default deny all it wouldn't move a lot of traffic, but permit any made it work. I read elsewhere to do that as well (for my switch). The ACLs are properly blocking unwanted traffic so I think that works. :D
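
A rough illustration of the return-traffic point, as a toy model in Python rather than any real switch syntax. It assumes a stateless, first-match ACL with an implicit deny at the end; the `Rule` type, the `evaluate` function and the sample host addresses are all hypothetical, and the subnets are the placeholders from the first post.

```python
import ipaddress
from typing import NamedTuple


class Rule(NamedTuple):
    action: str  # "permit" or "deny"
    src: str     # source prefix
    dst: str     # destination prefix


def evaluate(acl, src_ip, dst_ip):
    """First matching rule wins; anything unmatched hits the implicit deny."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in acl:
        if src in ipaddress.ip_network(rule.src) and dst in ipaddress.ip_network(rule.dst):
            return rule.action
    return "deny"


# One-way ACL: production may reach printers, nothing covers the replies.
one_way = [Rule("permit", "5.5.5.0/24", "10.10.10.0/24")]
request = evaluate(one_way, "5.5.5.20", "10.10.10.50")
reply = evaluate(one_way, "10.10.10.50", "5.5.5.20")

# Adding the mirror rule lets the return traffic through.
two_way = one_way + [Rule("permit", "10.10.10.0/24", "5.5.5.0/24")]
reply_fixed = evaluate(two_way, "10.10.10.50", "5.5.5.20")

print(request, reply, reply_fixed)
```

A stateful device would track the original flow and allow the reply on its own; a stateless one needs the second rule (or an established/reflexive equivalent) spelled out.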
 
It sounds like an ACL issue, or even a VLAN tag being stripped somewhere. Normally if I have an issue like this that is bugging me, I'll mock it up in Packet Tracer, switch to simulation mode, and watch the packet die; then I know which device to actually begin troubleshooting on. I assume the uplink between your switch and firewall is in trunk mode and allows all the VLANs?

Another thing is this: "IP policies defined on the firewall that sends traffic between the L3 switch uplink port and the printer port". Are you using a form of VRF to do this? Are you allowing established packets in the reverse direction, printer to uplink, if that's the way you want it to go back?
 
Sounds like ACLs. What brand of FW? Since your switch is acting as layer 3, devices on the different VLANs do not need a router to access one another.
 
If I'm understanding your design properly, you need a static route on the firewall for return traffic. The firewall doesn't know where the other subnets exist, so it drops the packet due to no valid route to the destination. The same is true if VLAN 15 is a separate physical interface. If that's the case, don't use a VLAN between switch and firewall. Just make that a routed interface to simplify it.

Basically, your printer gateway exists on the firewall, but the firewall doesn't have a route to send traffic to the other VLANs that are directly connected to the switch. In this scenario, you need a static route. This could also be an ACL issue. By default all inter-VLAN routing is allowed on switches this way, but if you create ACLs of any kind you need to create a specific allow rule for any L3 protocol or service like ICMP, otherwise it will hit the implicit deny.
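
To make the missing-route point concrete, here is a small longest-prefix-match sketch in Python. It is a simplified model, not FortiOS behavior or syntax: the `lookup` helper and the next-hop labels are hypothetical, and the prefixes are the placeholder subnets from the first post.

```python
import ipaddress


def lookup(routes, dst_ip):
    """Return the next hop of the most specific matching route, or None."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if dst in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The longest prefix (most specific route) wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Firewall with only its directly connected networks.
fw_routes = [
    ("10.10.10.0/24", "connected: printer port"),
    ("15.15.15.0/30", "connected: uplink port"),
]
before = lookup(fw_routes, "5.5.5.20")

# Static route for the production subnet, pointing at the L3 switch.
fw_routes.append(("5.5.5.0/24", "via 15.15.15.2"))
after = lookup(fw_routes, "5.5.5.20")

print(before, after)
```

If the firewall also carries a default route toward the WAN, the lookup would fall back to that instead of returning nothing, which sends the packet the wrong way rather than dropping it; either way the production subnet is unreachable until the more specific route exists.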

My recommendation to simplify this, increase visibility and security is to tag all VLANs to your firewall and create a subinterface on the firewall for each VLAN and then let the firewall handle all the routing and access rules between VLANs. This reduces blind spots on your network and your firewall gets visibility into every single MAC address on the network. However, your firewall will need to be up to the task of processing packets at your desired speeds.

 
Sorry for the late replies everyone, I've been at work... it's been nonstop with all the projects I'm doing trying to get the schools ready for the new year (sole netadmin for between 4 and 6 buildings depending on how you look at it, across 3 districts... thanks, COVID...)

I'm going to read all the replies now.
 
Do you have rules in your ACLs to allow traffic back the other direction? That's caught me up a couple times, where traffic is getting there but can't get back. Either through the router or the switch. You need an allow established/connected, reflexive, something. If your switch is like mine it isn't stateful and only evaluates rules as they enter the switch so that requires some thought.

Also might make sure you have some sort of catch all "allow any" on the switch itself. When I set up my switch's ACLs and had a default deny all it wouldn't move a lot of traffic, but permit any made it work. I read elsewhere to do that as well (for my switch). The ACLs are properly blocking unwanted traffic so I think that works. :D

Currently, as I'm still getting everything set up, I don't have any ACLs on the switch, and all VLANs are routing between each other (the ones with their gateways defined on the L3 switch itself at least). Eventually I will implement ACLs that only allow traffic to talk to the subnets it needs to, but for now it "should" be open season as far as inter-VLAN routing is concerned. The only actual "rule" on the switch is just a basic 0.0.0.0/0 15.15.15.1 to get everything pointed to the uplink port on the firewall. Would adding a deny 10.10.10.0/24 15.15.15.0/30 be appropriate, or am I misunderstanding this?

It sounds like an ACL issue, or even a VLAN tag being stripped somewhere. Normally if I have an issue like this that is bugging me, I'll mock it up in Packet Tracer, switch to simulation mode, and watch the packet die; then I know which device to actually begin troubleshooting on. I assume the uplink between your switch and firewall is in trunk mode and allows all the VLANs?

Another thing is this: "IP policies defined on the firewall that sends traffic between the L3 switch uplink port and the printer port". Are you using a form of VRF to do this? Are you allowing established packets in the reverse direction, printer to uplink, if that's the way you want it to go back?

Really good idea. I have an old copy of Packet Tracer from when I used it in high school that I could mock things up in (apparently you can't "just" download it now, at least not direct from Cisco). Usually I just mock things up on my test network/server, but Packet Tracer sounds like much less effort :D

So I'm not going to lie here: I've been in IT for about 10 years, but most of that has been in desktop support roles with some system administration on the side. I've only been doing networking for a few years and I definitely didn't know VRF was a thing. The uplink port is configured to accept all VLAN tags. I think you might be on to something with the VLAN tag getting stripped. I zoned out and accidentally moved my secondary test network onto the production VLAN when I reconfigured it, but it too has its gateway as a port on the firewall rather than the L3 switch, and it worked perfectly. The only mechanical difference between the 2 configurations is the VLAN ID. I'm going to get something mocked up tomorrow if I can find the time. I do have IP policies that specifically allow printer -> uplink and uplink -> printer.

Just a random question that I wanted to test but haven't had a chance to: can I leave a firewall port with an IP of 0.0.0.0/0 and just use policies to route whatever traffic comes its way? I've always defined an address on a port, but the VPN tunnel interfaces are all 0.0.0.0/0, so it got me thinking...

Sounds like ACLs. What brand of FW? Since your switch is acting as layer 3, devices on the different VLANs do not need a router to access one another.

FortiGate firewall, Aruba switches. I might have misspoken: the L3 switch is what I would consider to be the "router" in the sense that it handles the inter-VLAN stuff, but obviously it isn't a full-on router in the traditional sense. According to Aruba's documentation, any VLAN without an IP address defined on the switch itself would not participate in inter-VLAN routing, which is why I'm trying to use IP policies on the firewall to handle that.

If I'm understanding your design properly, you need a static route on the firewall for return traffic. The firewall doesn't know where the other subnets exist, so it drops the packet due to no valid route to the destination. The same is true if VLAN 15 is a separate physical interface. If that's the case, don't use a VLAN between switch and firewall. Just make that a routed interface to simplify it.

Basically, your printer gateway exists on the firewall, but the firewall doesn't have a route to send traffic to the other VLANs that are directly connected to the switch. In this scenario, you need a static route. This could also be an ACL issue. By default all inter-VLAN routing is allowed on switches this way, but if you create ACLs of any kind you need to create a specific allow rule for any L3 protocol or service like ICMP, otherwise it will hit the implicit deny.

My recommendation to simplify this, increase visibility and security is to tag all VLANs to your firewall and create a subinterface on the firewall for each VLAN and then let the firewall handle all the routing and access rules between VLANs. This reduces blind spots on your network and your firewall gets visibility into every single MAC address on the network. However, your firewall will need to be up to the task of processing packets at your desired speeds.


Good point about simplifying the traffic flow, this is something I will look into. The reason I don't create subinterfaces on the main fiber uplink (bonded 1G fiber because the GBICs were like $8 each :D) is that I'm trying to physically isolate this data onto cables other than the main fiber uplink. I am also getting a VoIP network running and I wanted that network to have its own isolated cable for ease of configuration, and from what I understand, the packets would get transmitted at wire speed if left at layer 2 (at least for internal-to-internal calls; I'm building a multi-site 2 site fiber connection as opposed to the 4 independent connections + VPNs we had before). So I figured that I would at least need to know how to make this work before we needed the phones cut over. I assumed printers and guest were a pretty safe bet to start "learning" on :D

I do have static routes that point all of the respective production subnets back out the uplink port, and I had added a static route that points the printer network out of the uplink port too, but that didn't make sense to me so I ditched it. Was that the right move?

Sorry if I'm a little all over the place. It's been a really long couple of weeks and I just got the good news that one of our vendors messed up all the new cable runs for one of the 2 wifi upgrade projects I'm running. Hopefully the Chromebooks don't need any wireless to work :D (sidenote... they do...)

The good news is I got Comcast to expedite our fiber runs (saw the guys on the poles today!) AND the VoIP cutover went perfectly smoothly at the main site where the L3 switch/firewall is going to be physically located, so there's that I guess...
 
If this is a FortiGate then you definitely need to embrace the FortiGate way with device detection and tagging all your VLANs up to the Gate. The visibility is so awesome.

Which model FortiGate is it? Chances are if these are 1G copper/fiber links then it can do normal L3/4 rules at 1G line rate. Their performance is really solid. Where you have to be careful is doing inspection in your inter-VLAN policies. Also if you want to isolate the VLANs then you can still do 1:1 VLAN to physical interface on the FortiGate from the switch.
 
UPDATE - I "solved" it!

I've been picking away at this issue as I have time away from all of the other projects I'm doing. It got to the point where I even called an outside engineer to look at my config, at which point it was determined that the config is absolutely correct and there's clearly some bug or issue with the firmware of the FW (or even the switches). I updated every piece of networking equipment in the entire site to the latest firmware (minus the FGT, which I had to back down to 6.4.0 from 6.4.2 as I had a legitimate bug with the web UI), which solved absolutely nothing. I got to thinking this was an issue with a stale entry in the ARP cache that for whatever reason wasn't clearing on a reboot of the device, so I manually cleared the ARP cache of all the equipment... still nothing.

So I gave up. I just said "screw it" and put all of the printers into production anyway, with absolutely no changes to the config... and I can communicate with everything on the printer network except for the one specific host I was using to test with, which is just a PC in my office plugged into an untagged printer VLAN port. This is after multiple clears of all the ARP caches, outside engineers verifying my config, tickets to Fortinet, physically moving ports, checking and rechecking configs, updating ALL the firmware... I tried everything and to this day I still have literally no idea why my test PC doesn't work and everything else does, but I am so past the point of caring. I wish I had more time to dig into this, as it seems like an incredibly interesting problem to have, but I can pretty safely say the issue is resolved for absolutely no reason. Nothing has changed in the config... and yet here we are, pinging actual printers!

Thanks for the help everyone. I used to think "testing in production" was the sign of a sub-par environment, but this one specific instance proved that testing in the test environment is sometimes significantly worse than just putting it directly into production with no regard to testing it first... what a nightmare lol
 
FYI, FortiOS 6.2.5 dropped today and the bugfix list is quite extensive. It's probably worth reviewing the resolved issues and known issues and making a judgment call on whether to update to this code.
 
Do your ACLs allow traffic on ephemeral ports?
I'll assume you use Wireshark and Nagios to help trace.

If you map out your traffic are you seeing any issues with transitive routing?
 
FYI, FortiOS 6.2.5 dropped today and the bugfix list is quite extensive. It's probably worth reviewing the resolved issues and known issues and make a judgment call to update to this code.

Good to know. I'm going back onsite tomorrow to wrap up the basic L2 configs at my other sites; I'll take a look at the release notes and see if I want to give the new firmware a shot, though so far I really like how stable the release I'm on is, and it has several quality-of-life features over the release I was on prior (5.6... don't judge me :p )

DO your ACLS allow traffic on ephemeral ports?
I'll assume you use Wireshark and nagios to help trace.

If you map out your traffic are you seeing any issues with transitive routing?

I'm no expert with Wireshark, but I have used it and I try to use it when I can. But I've got one better for you...

The entire ****ing time it was the ****ing windows firewall!!! ****!!!


I had an engineer out to one of my sites today to help me with some issues I was having getting new APs to play nicely with the existing infrastructure, and I just happened to mention the weirdness with this whole printer routing scenario. Instantly he suggested disabling the software firewall, as he had run into similar strange unexplained issues before, and it turned out to be a scenario where Windows decided that the network was a public network and not a private/domain network (even though it is listed as a "work" network in Network and Sharing Center on my machine). Why I didn't think of this sooner, I will never know, but I thought just enabling ICMP echo in/out in the firewall was good enough. APPARENTLY NOT FOR WINDOWS! He said there are some commands to run that will make winblows see it as the appropriately private network that it is, but at this point I'm going to nuke the install (with extreme prejudice) and put Kubuntu/Snow Leopard/Chromium/ANYTHING but Windows on it :D
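
One plausible mechanism behind this, sketched as a toy model in Python. The assumption here (not confirmed anywhere in this thread, so check the actual rule scope on the machine) is that on the public/private profiles the echo-request rule only accepts sources on the PC's own subnet, while the domain profile accepts any source. The function name `echo_allowed` and the sample addresses are hypothetical; the subnets are the placeholders from the first post.

```python
import ipaddress


def echo_allowed(profile, src_ip, local_subnet):
    """Toy model: the domain profile accepts any source,
    other profiles only accept sources on the local subnet (assumed scope)."""
    if profile == "domain":
        return True
    return ipaddress.ip_address(src_ip) in ipaddress.ip_network(local_subnet)


test_pc_subnet = "10.10.10.0/24"  # test PC sits on the printer VLAN

# Ping from the printer gateway (same subnet) vs. a production host (routed).
from_gateway = echo_allowed("public", "10.10.10.1", test_pc_subnet)
from_production = echo_allowed("public", "5.5.5.20", test_pc_subnet)
from_production_domain = echo_allowed("domain", "5.5.5.20", test_pc_subnet)

print(from_gateway, from_production, from_production_domain)
```

Under that assumption, same-subnet pings succeed while routed ones silently fail, which looks exactly like a routing problem from the outside even though the network is fine.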

Honestly, if this was 5 years ago my step 1 would have been to disable that stupid firewall and THEN run the tests, but now I'm boring and care about nonsense like "best practices" and "security", and making things work "the correct way", which typically doesn't account for (very typical) bugs in Windows. I've got half a mind to plug an XP box with no service packs directly into a modem and put an external IP on it as static just for the rush I'd get. So done with everything right now... but at least the mystery is solved :D

Hopefully if anyone in the future is chasing down mystery problems like this, they find this thread and disable the stupid firewall BEFORE wasting days spinning their wheels on nothing. What a relief...
 
Yeah, next thing I was going to suggest was local config/cache issue, but you figured it out.
 
It was 25 years ago, but I had a similar problem with HP departmental printer software having its own notions of networking and security and we had to disable source IP on it.

There's all sorts of implementations throughout the chain that's always fun to run up and down flights of stairs for a couple days trying to figure out.

I remember some weird series of MacOS releases that broke NFS mounts and had us ripping apart a studio trying to figure it out while on the clock for a delivery to Universal.

Glad you got it handled.
 