Packets disappearing

unhappy_mage

I've got a very simple network that is nonetheless losing packets somewhere. On one end is a Solaris 10 machine with a bge interface, and on the other end is a Linux 2.6.21.3 machine with a Realtek 8169. There's no switch in the middle; they're directly connected via crossover cable. Firewalls are turned off on both ends. SSH works just fine between the two machines (as far as I can tell, anyways), and X forwarding does too. But when I try scp'ing a few files from Solaris to Linux, I get this:
(output of snoop -d bge1 | grep -v 51401, filtered that way because 51401 is the port ssh is running from)
Code:
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Push Ack=1769018628 Seq=3632270790 Len=64 Win=55680
  10.1.1.122 -> 10.1.1.123   TCP D=38496 S=22 Push Ack=3632270854 Seq=1769018628 Len=48 Win=251
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Push Ack=1769018676 Seq=3632270854 Len=128 Win=55680
  10.1.1.122 -> 10.1.1.123   TCP D=38496 S=22 Push Ack=3632270982 Seq=1769018676 Len=48 Win=268
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018724 Seq=3632270982 Len=0 Win=55680
  10.1.1.122 -> 10.1.1.123   TCP D=38496 S=22 Push Ack=3632270982 Seq=1769018724 Len=48 Win=268
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Push Ack=1769018772 Seq=3632270982 Len=96 Win=55680
  10.1.1.122 -> 10.1.1.123   TCP D=38496 S=22 Push Ack=3632271078 Seq=1769018772 Len=48 Win=268
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018820 Seq=3632271078 Len=6960 Win=55680
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018820 Seq=3632278038 Len=6960 Win=55680
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Push Ack=1769018820 Seq=3632284998 Len=2512 Win=55680
  10.1.1.122 -> 10.1.1.123   TCP D=38496 S=22 Ack=3632271078 Seq=1769018820 Len=0 Win=268 Options=<nop,nop,sack 3632284998-3632287510>
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018820 Seq=3632271078 Len=6960 Win=55680
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018820 Seq=3632271078 Len=6960 Win=55680
  10.1.1.123 -> (broadcast)  ARP C Who is 10.1.1.122, 10.1.1.122 ?
  10.1.1.122 -> 10.1.1.123   ARP R 10.1.1.122, 10.1.1.122 is 0:1:8:0:82:2d
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018820 Seq=3632271078 Len=6960 Win=55680
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=43075 Ack=1363309704 Seq=3532640949 Len=6960 Win=55680
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=49863 Ack=1697702876 Seq=3614496709 Len=6960 Win=55680
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018820 Seq=3632271078 Len=6960 Win=55680
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018820 Seq=3632271078 Len=6960 Win=55680
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=49863 Ack=1697702876 Seq=3614496709 Len=6960 Win=55680
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018820 Seq=3632271078 Len=6960 Win=55680
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=49863 Ack=1697702876 Seq=3614496709 Len=6960 Win=55680
  10.1.1.122 -> 10.1.1.255   NBT Datagram Service Type=17 Source=FS[0]
  10.1.1.122 -> 10.1.1.255   NBT Datagram Service Type=17 Source=FS[0]
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018820 Seq=3632271078 Len=6960 Win=55680
  10.1.1.123 -> (broadcast)  ARP C Who is 10.1.1.123, 10.1.1.123 ?
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=49863 Ack=1697702876 Seq=3614496709 Len=6960 Win=55680
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018820 Seq=3632271078 Len=6960 Win=55680
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=49863 Ack=1697702876 Seq=3614496709 Len=6960 Win=55680
And on the other end, running Ethereal with the filter "not port 51401":
[Attached screenshot: grab.png]

As you can see, the dump from 'snoop' shows the Solaris box pushing 6960-byte segments onto the network. This is good - its MTU is set to 9000 and the Linux box's is set to 7000, so the Solaris box notices the smaller MTU and sends smaller packets than it otherwise could. But the Ethereal capture shows none of these packets arriving.
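
For what it's worth, the 6960 figure is consistent with a 7000-byte MTU. Here is a minimal sketch of the arithmetic, assuming plain IPv4 and TCP headers with no options on the data segments (20 bytes each); the function name is just for illustration:

```python
# Assumption: IPv4 and TCP headers on data segments carry no options,
# so each header is its minimum size of 20 bytes.
IPV4_HEADER_BYTES = 20
TCP_HEADER_BYTES = 20


def tcp_payload_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in a single IP packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


for mtu in (1500, 7000, 9000):
    print(mtu, tcp_payload_for_mtu(mtu))
# prints:
# 1500 1460
# 7000 6960
# 9000 8960
```

So the Len=6960 segments in the snoop dump are what a sender limited by the 7000-byte MTU would be expected to produce.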

Any ideas what's wrong? What tests could I run to eliminate problems? I have another gigabit-capable (but not jumbo-capable) machine running Linux that I can use to debug.
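
One check that needs no extra hardware is to count how often the same data segment shows up in the snoop dump, since a segment sent over and over is one that is never being acknowledged. The sketch below assumes the snoop line layout shown in the dump above; the regular expression and the function names are my own and only handle that layout:

```python
import re
from collections import Counter

# Matches snoop summary lines such as:
#   10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=... Seq=... Len=6960 Win=55680
# Assumption: this layout is taken from the dump in this post, not from a spec.
SEGMENT = re.compile(
    r"^\s*(?P<src>\S+) -> (?P<dst>\S+)\s+TCP D=(?P<dport>\d+) S=(?P<sport>\d+) "
    r".*?Seq=(?P<seq>\d+) Len=(?P<len>\d+)"
)


def retransmitted(lines):
    """Return {(src, sport, dst, dport, seq): count} for data segments seen more than once."""
    seen = Counter()
    for line in lines:
        m = SEGMENT.match(line)
        if m is None or int(m.group("len")) == 0:
            continue  # skip non-TCP lines (ARP, NBT) and pure ACKs
        key = (m.group("src"), int(m.group("sport")),
               m.group("dst"), int(m.group("dport")), int(m.group("seq")))
        seen[key] += 1
    return {key: count for key, count in seen.items() if count > 1}


# A few lines copied from the dump above.
SAMPLE = """\
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018820 Seq=3632271078 Len=6960 Win=55680
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018820 Seq=3632278038 Len=6960 Win=55680
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Push Ack=1769018820 Seq=3632284998 Len=2512 Win=55680
  10.1.1.122 -> 10.1.1.123   TCP D=38496 S=22 Ack=3632271078 Seq=1769018820 Len=0 Win=268 Options=<nop,nop,sack 3632284998-3632287510>
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018820 Seq=3632271078 Len=6960 Win=55680
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018820 Seq=3632271078 Len=6960 Win=55680
  10.1.1.123 -> (broadcast)  ARP C Who is 10.1.1.122, 10.1.1.122 ?
  10.1.1.123 -> 10.1.1.122   TCP D=22 S=38496 Ack=1769018820 Seq=3632271078 Len=6960 Win=55680
"""

for key, count in retransmitted(SAMPLE.splitlines()).items():
    print(key, count)
```

On these sample lines, the only repeated segment is the 6960-byte one starting at Seq=3632271078, which matches what the Linux side acknowledges (Ack=3632271078) while reporting via SACK that the later 2512-byte segment did arrive.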

The MTU is almost definitely the issue; setting the MTU to 1500 on the Linux end makes scp work just fine. ifconfig won't let me set the MTU to 9000 on that end. In fact, values up to 1558 work okay, but I haven't found a single value above that which results in a successful transfer.
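
Rather than trying MTU values by hand, the cutoff could be narrowed down with a binary search over packet sizes. This is only a sketch: `largest_passing` and `ping_probe` are hypothetical helper names, the search assumes every size up to some threshold passes and every size above it fails, and the probe assumes the Linux iputils `ping` (its `-M do` option forbids fragmentation) run from the Linux box with its MTU set high enough to send the probes.

```python
import subprocess
from typing import Callable


def largest_passing(probe: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest size in [lo, hi] for which probe(size) is True.

    Assumes probe is True up to some threshold and False above it.
    """
    if not probe(lo):
        raise ValueError("smallest size already fails")
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def ping_probe(host: str) -> Callable[[int], bool]:
    """Build a probe that sends one unfragmented ICMP echo of a given IP packet size."""
    def probe(ip_packet_size: int) -> bool:
        payload = ip_packet_size - 28  # 20-byte IPv4 header + 8-byte ICMP header
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "1", "-M", "do", "-s", str(payload), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
    return probe


# Stand-in probe that mimics the behaviour described above (sizes up to 1558 pass).
print(largest_passing(lambda size: size <= 1558, 1500, 7000))  # prints 1558

# Against the real link it would look something like:
#   largest_passing(ping_probe("10.1.1.123"), 1500, 7000)
```

Keeping the search separate from the probe means the same function works with scp, ping, or anything else that answers yes or no for a given size. A ping failure only says that a packet of that size was lost in one direction or the other, not which end dropped it.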
 