Wireshark: how to determine which "Switch" is the culprit?

oROEchimaru

From my PC to the file share and back there is 0 packet loss, and ping times are 1-5ms on the LAN. However, file transfers for folders that have many small files go at 1-5mbs, while a large file transfers at 100mbs+. There are a ton of TCP dup ack errors. I would like to narrow down whether a specific switch is at fault. Any tips on what to look for in the capture logs to tell which switch could be causing the issue?
 
how big are the "small files?"
Or what range of sizes are we talking?

It is completely normal for small file transfers to have substantially slower transfer rates due to TCP overhead.
 
Small files take longer because the system acknowledges that each file made it to the destination after it is moved/copied, which makes a batch of small files transfer slower overall. This is what I read a few years back.
I use a different file transfer program on my main machine, TeraCopy, which seems to make transfers faster.
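
To put rough numbers on the per-file overhead described above, here is a minimal back-of-the-envelope sketch. It assumes every file costs a fixed number of request/response round trips (open, read, close and so on) on top of the time spent moving data. The link speed, round-trip times and the figure of 10 round trips per file are illustrative assumptions, not measurements from this thread.

```python
def effective_throughput(file_bytes, link_bytes_per_s, rtt_s, round_trips_per_file):
    """Average bytes/s when every file pays a fixed latency cost."""
    transfer_time = file_bytes / link_bytes_per_s   # time spent moving data
    overhead_time = rtt_s * round_trips_per_file    # time spent waiting on replies
    return file_bytes / (transfer_time + overhead_time)


LINK = 100e6  # assumed usable link rate, bytes per second

for size in (50e3, 300e3, 1e6, 100e6):      # file sizes in bytes
    for rtt in (0.001, 0.005):              # 1 ms and 5 ms round trips
        rate = effective_throughput(size, LINK, rtt, round_trips_per_file=10)
        print(f"{size / 1e3:>9.0f} kB  rtt={rtt * 1e3:.0f} ms  ->  {rate / 1e6:6.1f} MB/s")
```

Under these assumptions the large file stays close to the link rate while the small files drop sharply as the round-trip time grows, so latency rather than raw bandwidth dominates the small-file case.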
 
I would look for an abnormal amount of CRC errors on the switch port in question. Possible port duplex mismatch, 10half or something similar.
 
While there could be a duplex issue or something of that nature it is much more likely that OP is experiencing normal behavior. There is a MUCH larger overhead when transferring many small files.

When transferring a large file you're dealing with sequential reads and writes to/from the HD, while small files become more random read/write. This could be mitigated quite a bit by using SSDs.

With a large file you're also dealing with larger TCP windows and fewer acknowledgement packets. There's not a lot you can do here.

You could make sure that RSS is turned on. If this is Windows and the performance hit is seriously hurting your bottom line, you could try switching to Server 2012+ and Windows 8+, where you can take advantage of the SMB3 performance enhancements, though to what degree this will help I'm not certain.

EDIT:

I just noticed that OP said "There are a ton of tcp dup ack errors". That does sound like a potential issue, so it could very well be a combination. If your goal is to determine which switch is causing the issue, it boils down to what switches you have. If they are managed switches, you can look for interface errors/drops and speed/duplex issues. If they are not managed, you will just have to test a lot while getting closer and closer to the source (for example, plugging a laptop into each upstream switch in turn).
 
Every dozen or so lines during the transfer are followed by 1-10 TCP dup ack errors. The files are 300kb-1mb in size. I'm trying to transfer an extracted version of Emsisoft's portable antimalware toolkit:
https://www.emsisoft.com/en/software/eek/

This behavior also occurs if you try to execute the application from the file share. The bandwidth is so low that it goes to a not responding state. If you copy the app over it runs fine. Ideally I'd like to address this so the file can execute quickly.

I'm not advanced enough in Wireshark to determine:
a. which switch/hop is possibly the culprit between my PC and the file share (any PC in this location)
b. whether there is a switch that's limiting bandwidth, given that my connection is 1gbps


example:
4 0.000070000 10.x.x.x. 10.x.x.x. TCP 66 [TCP Dup ACK 1#3] 55860→445 [ACK] Seq=1 Ack=1 Win=255 Len=0 SLE=1429 SRE=7141
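
For anyone reading a line like the one above, here is a small sketch that pulls the fields out of the summary text and does the arithmetic on them. It assumes the usual Wireshark conventions: sequence numbers are relative, SLE/SRE are the left and right edges of a SACK block, and in `Dup ACK 1#3` the first number is the frame being duplicated and the second is the duplicate count. The function and key names are made up for illustration.

```python
import re

SUMMARY = ("4 0.000070000 10.x.x.x. 10.x.x.x. TCP 66 [TCP Dup ACK 1#3] "
           "55860→445 [ACK] Seq=1 Ack=1 Win=255 Len=0 SLE=1429 SRE=7141")


def parse_dup_ack(line):
    """Extract the numeric fields from one Dup ACK summary line, or None."""
    m = re.search(r"\[TCP Dup ACK (\d+)#(\d+)\]\s+(\d+)\s*→\s*(\d+)", line)
    if m is None:
        return None
    fields = {k: int(v)
              for k, v in re.findall(r"\b(Seq|Ack|Win|Len|SLE|SRE)=(\d+)", line)}
    fields.update(
        ref_frame=int(m.group(1)),   # assumed: frame whose ACK is being repeated
        dup_count=int(m.group(2)),   # assumed: which duplicate this one is
        src_port=int(m.group(3)),
        dst_port=int(m.group(4)),
    )
    return fields


info = parse_dup_ack(SUMMARY)
missing = info["SLE"] - info["Ack"]    # bytes the sender of this ACK still waits for
buffered = info["SRE"] - info["SLE"]   # bytes it already holds beyond the gap
print(f"port {info['src_port']} -> {info['dst_port']}: "
      f"waiting for {missing} bytes, holding {buffered} bytes past the gap")
```

Read this way, the host on port 55860 (the client side of the SMB connection to port 445) is telling the server that data starting at relative byte 1 has not arrived, while later data has. That points at a lost or reordered segment travelling from the file share toward the PC, but not at where on the path it happened.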
 

You're not going to see the switch as a hop since it doesn't touch the frame, just forwards it.
 

I too believe it may be down to just normal behavior with small files.

I see this same behavior even when scanning my office's allocated network shares. I can scan my personal (i.e. home) network share of about 8.5 terabytes of files in a little over 9 hours, while it takes 31 hours to scan my office's network drive containing only 442 gigabytes of files.
 
I've run the utility on other file shares without issues (previous employment). Running portable utilities saves a lot of time for techs and helps reduce conflicts with malware-infected devices. The application hangs when running from the file share.

Any other ideas/insight on wireshark?
 
due to the 3-4 switches it travels through between me and the file share.

If people can remove my scenario out of the equation and just give me wireshark coaching tips that would be preferred.
 
Unless you're doing layer 3 switching, you will not be able to find the switch causing the issue using Wireshark, if it is in fact a switch.

Are you also seeing TCP retransmits or TCP fast retransmits?
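
One way to check for these is with Wireshark's TCP analysis display filters, either typed into the filter bar or passed to `tshark`. The sketch below only builds the `tshark` command lines so they can be inspected before running; the capture file name `capture.pcapng` is a placeholder, and the set of output fields is just one reasonable choice.

```python
import shlex

# Wireshark display filters for the TCP analysis flags discussed here.
FILTERS = {
    "dup_ack": "tcp.analysis.duplicate_ack",
    "retransmission": "tcp.analysis.retransmission",
    "fast_retransmission": "tcp.analysis.fast_retransmission",
}


def tshark_command(capture_path, display_filter):
    """Build a tshark invocation listing frame number, source and destination."""
    return [
        "tshark", "-r", capture_path,      # read from a saved capture
        "-Y", display_filter,              # apply a display filter
        "-T", "fields",                    # print selected fields only
        "-e", "frame.number", "-e", "ip.src", "-e", "ip.dst",
    ]


for name, flt in FILTERS.items():
    print(f"{name}: {shlex.join(tshark_command('capture.pcapng', flt))}")
```

If the dup ack filter returns plenty of frames but the two retransmission filters return none, it is worth checking whether the capture was taken on the receiving side only, since retransmissions that were dropped before the capture point will not appear there.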
 
Thanks for the feedback. Looks like I am not seeing those messages, just the errors:

4 0.000070000 10.x.x.x. 10.x.x.x. TCP 66 [TCP Dup ACK 1#3] 55860→445 [ACK] Seq=1 Ack=1 Win=255 Len=0 SLE=1429 SRE=7141
 
As others said, it's probably just an IOPS issue. A single large file almost always copies faster than a bunch of random files because the disk doesn't have to spin back and forth. AKA "Dancing head".

The best way to test would be to plug a machine into the same switch that the file server resides on. If you still have the slower speeds, then it is probably not a networking issue.
 
Hello, it is fine if the files are copied to a server located in the same data center/building.
Not an I/O issue.

Within the data center, it transfers at 90mbs from the storage location to a server.
 
I already told you how to determine what switch is to blame if it is a switch. Assuming they are not managed switches or you don't have access to look for config/interface errors:

Start following the switches upstream. Plug a laptop into the same switch you're having issues on, check the performance and look for errors. Then go one switch up and try again, compare results, and repeat until you no longer observe the issue.
 
I recommend using a throughput testing tool to see what your sustainable bandwidth is across the switch. Take a look at iperf/jperf. This will help you tell IOPS problems apart from switch problems.
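
To show the idea behind such a tool, here is a minimal sketch of a memory-to-memory TCP throughput test, so no disks are involved. It is not iperf and is no substitute for it. For brevity it runs both ends in one process over loopback; on a real network the receiving half would run on a machine next to the file server and the sending half on the PC. All names and sizes are illustrative.

```python
import socket
import threading
import time


def _sink(server_sock, result):
    """Accept one connection and count bytes until the sender closes."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            total += len(chunk)
    result["bytes"] = total


def measure_throughput(host="127.0.0.1", total_bytes=16 * 1024 * 1024):
    """Send total_bytes over TCP and return (bytes_received, bytes_per_second)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))            # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=_sink, args=(server, result))
    receiver.start()

    payload = b"\0" * 65536
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            n = min(len(payload), total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()                   # wait until every byte has been counted
    elapsed = time.perf_counter() - start
    server.close()
    return result["bytes"], result["bytes"] / elapsed


if __name__ == "__main__":
    received, rate = measure_throughput()
    print(f"received {received} bytes at {rate / 1e6:.1f} MB/s")
```

Because the payload comes from memory and is discarded on arrival, a slow result from a test like this cannot be blamed on disk IOPS on either end.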
 
For the sake of argument I'll point out that your post, #16 in the thread, does not eliminate the most likely source of problems: your own PC. There is absolutely zero reason to suspect a network problem given your description of the issue.

I will point out that if you feel you must eliminate the network then do what the poster above suggested and use a proper tool as CIFS file copy is perhaps the single worst possible way to measure network speed.
 
Thanks, everyone, for your feedback. Nicklebon, I've already tried test labs and multiple PCs, and they each have the same issue. However, servers in the same data center as the file share do not.

The vendor also believes it may be due to the small file sizes of 50-500kb. However, it is odd that a copy/paste from the file share to the server is blazing fast, while transfers to PCs in other locations are dramatically slower for small files, yet reach the same 100mbs for 100mb files.
 
I'm still stuck on this issue. It is not a "small files = slow transfer" issue, since the slowdown only occurs between the data center and external locations like our IT office. Onsite communications or other areas on the campus are fine with the transfers (copy down to data center). It is also not a speed issue with some PC to PC transfers, but it is for SAN to PC transfers at other locations.

Since the VM servers most likely use the same SAN, this probably explains the faster transfers from the file share copy to a VM server.
 
Anyone else have ideas? I cannot do a trace at each switch, but I may try to find devices on different switches/locations to test.
 
I don't think this is a switch issue if you're using layer 2 switches, as they don't touch the packet or interact with TCP. Somebody, transmitter or receiver, is generating those TCP dup acks... I'd assume it's the file receiver, but you should be able to see who is generating that message in Wireshark by its source IP. I'm a bit hazy on the protocol's basic functionality, but there is a window size that sender/receiver should agree on and, once it is reached, the sender expects to get an acknowledgement on sequence number every so many packets. I guess it could be possible there is a window size mismatch if the servers have theirs set to a specific size. Again, my knowledge of the protocol isn't as sharp as it used to be, but I'd focus on layer 3 being the culprit. I'm sure you're already making sweet sweet love to The Google; I'd suggest a deeper penetration into the TCP protocol for an answer.
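
Checking who generates the dup acks by source IP can be scripted once the matching frames are exported. The sketch below assumes one row per frame in tab-separated form (frame number, source, destination), which is what `tshark -T fields` prints by default. The addresses in the sample are invented for illustration and are not from the capture in this thread.

```python
from collections import Counter


def count_by_source(rows):
    """Count rows per source address; each row is 'frame<TAB>src<TAB>dst'."""
    counts = Counter()
    for row in rows:
        parts = row.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue                  # skip blank or malformed rows
        _frame, src, _dst = parts[:3]
        counts[src] += 1
    return counts


# Made-up sample rows standing in for exported dup ack frames.
SAMPLE = [
    "4\t10.0.0.5\t10.0.1.20",
    "5\t10.0.0.5\t10.0.1.20",
    "6\t10.0.0.5\t10.0.1.20",
    "91\t10.0.1.20\t10.0.0.5",
]

for src, n in count_by_source(SAMPLE).most_common():
    print(f"{src}: {n} dup acks")
```

The host that sends most of the dup acks is the one receiving data with gaps in it, so the loss or reordering is happening somewhere on the path toward that host.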

As far as troubleshooting goes, as was mentioned earlier, connect your laptop into each switch in the stream and try to replicate the issue to narrow it down somewhere. I'd assume, based on my thoughts above, that you could connect into the switch that directly connects to the server and see the same issue.


If you find a resolution, let us know...I'd like to know what it ends up being.
 