Help With Network Problem- Intermittent No Response

rosco

We have a small LAN with about 25 stations on a peer-to-peer network. Lately, we've been having problems with slowness, printing to shared network printers, etc. After some troubleshooting, it seems that stations are periodically unable to even ping other stations for a while; then they can ping again. One time they can resolve a station by its NetBIOS name; the next time they get "host not found."

Any ideas?

All machines have antivirus software. I've tried changing ports on the hubs to see if there was a problem there.

Any troubleshooting advice? The thing is, it's so intermittent. It seems to happen very often, at least once every 10 minutes or so.
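One way to pin down "every 10 minutes or so" is to log every ping with a timestamp and look at the pattern afterwards. Below is a minimal Python sketch, assuming Python is available on one workstation and that the system `ping` takes `-n` (Windows) or `-c` (Unix) for the count. The probe and clock are parameters so the loop can be tested without a network; nothing here comes from the original posts.

```python
import platform
import subprocess
import time
from datetime import datetime


def system_ping(host):
    """Send one echo request with the OS ping command; True if it exits 0.

    Assumption: Windows ping takes -n for the count, Unix ping takes -c.
    Windows ping can exit 0 on some "Destination host unreachable" replies,
    so treat the exit code as a rough signal only.
    """
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def monitor(hosts, rounds, interval=1.0, probe=system_ping, clock=datetime.now):
    """Probe every host once per round; return (timestamp, host, ok) rows."""
    log = []
    for i in range(rounds):
        for host in hosts:
            log.append((clock(), host, probe(host)))
        if i < rounds - 1:
            time.sleep(interval)  # no sleep after the last round
    return log
```

For example, `monitor(["192.168.0.20", "192.168.0.1"], rounds=1200)` would probe two placeholder addresses for at least 20 minutes, long enough to catch a few of the dropouts.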
 
Try Ethereal. Move everything (and I mean everything) onto an old hub, and find out what's sending packets where and why it isn't getting a response.

Offhand, it sounds like you've got a problem with Samba (if applicable) or with whatever machine is your browse master, or whatever they call it.
 
I've just started using Ethereal, and I'm now in the process of figuring out what it all means.

I just have it loaded on one of the workstations; I haven't moved everything over to a hub yet.

So far, I'm seeing mainly UDP packets: NetBIOS name queries and several browser election requests. I haven't let it run for very long yet, though.

Anything in particular I should be looking for? It's just so weird how it's this intermittent. One time I can ping a hostname just fine, and then other times not at all. It's also happened where the first ping goes through but the next three fail.

We don't have a Samba server. No servers really; it's just peer-to-peer.

Also, in Ethereal, if something isn't getting a response, how can I tell why it isn't getting a response? I've never used Ethereal, in case you can't tell. :)

EDIT: I was running Ethereal on this workstation while trying to ping, and the ping did fail. So I've stopped Ethereal and I'm looking at the packets, but I don't see anything that tells me what happened. All I see is that a ping request was sent and, since it failed, no reply came back. Am I missing vital info because everything isn't plugged into a hub?
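When reading a capture like this, it helps to pair each ICMP echo request (type 8) with its echo reply (type 0) by ICMP identifier and sequence number; whatever is left unpaired is exactly what got no answer. The sketch below works on hypothetical `(time, type, src, dst, ident, seq)` tuples that you would copy out of the capture by hand or from an export; that record format is an assumption, not something Ethereal produces directly. It only shows *which* pings went unanswered; the surrounding traffic still has to explain why.

```python
from typing import Iterable, List, Tuple

# Hypothetical record: (time_s, icmp_type, src_ip, dst_ip, ident, seq)
Record = Tuple[float, int, str, str, int, int]

ECHO_REQUEST = 8  # ICMP type for echo request
ECHO_REPLY = 0    # ICMP type for echo reply


def unanswered_pings(records: Iterable[Record]) -> List[Record]:
    """Return echo requests that never got a matching echo reply."""
    pending = {}
    for rec in records:
        _, icmp_type, src, dst, ident, seq = rec
        if icmp_type == ECHO_REQUEST:
            pending[(src, dst, ident, seq)] = rec
        elif icmp_type == ECHO_REPLY:
            # A reply travels back, so swap src/dst to find its request.
            pending.pop((dst, src, ident, seq), None)
    return sorted(pending.values())
```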
 
Well, if it's happening to ALL workstations, I'd be looking at your switch. But I need more info on your infrastructure.
 
It seems like it does happen to all of them. It's just that some of them don't get used as much as others, so unless they tell me it's a problem, it's hard to know. If I go over there, it might be working at the time. But later.....

I had been thinking switch or hub also. Currently, there's 1 switch and 3 hubs. All of them are old, and the switch and one hub are 10BASE-T. There are three stations that work off the same server, which hosts a database for them. So I put that group of stations on one switch (a different 10BASE-T switch that was lying around), and they still have the intermittent problem.

I talked the boss into purchasing a couple of Dell 24-port unmanaged switches, as that needs to be done anyway. However, I'm not convinced that will fix the problem, especially since those 4 stations on a separate switch (uplinked to the other 10BASE-T switch) still had problems.

Any tips on what to filter for in Ethereal? I am seeing some problems, such as retransmissions and some failed DNS lookups, but I'm not sure what is causing our problem and what is just normal failure.
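One way to separate real trouble from background noise is to see which conversations the retransmissions cluster on. Below is a rough Python sketch, assuming TCP segment fields have been exported into hypothetical `(src, sport, dst, dport, seq, payload_len)` tuples; it treats a data segment whose flow and sequence number were already seen as a retransmission. This is a crude heuristic of my own, not how Ethereal's TCP analysis works, and Ethereal's built-in analysis is more thorough.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical record: (src_ip, src_port, dst_ip, dst_port, seq, payload_len)
Segment = Tuple[str, int, str, int, int, int]


def count_retransmissions(segments: Iterable[Segment]) -> Counter:
    """Count data segments whose (flow, seq) was already seen, per flow.

    Segments without payload (pure ACKs, and SYN/FIN here) are skipped,
    so SYN retries are not counted; keep-alive probes may be miscounted.
    """
    seen = set()
    retrans = Counter()
    for src, sport, dst, dport, seq, length in segments:
        if length == 0:
            continue  # nothing to retransmit in an empty segment
        flow = (src, sport, dst, dport)
        if (flow, seq) in seen:
            retrans[flow] += 1
        else:
            seen.add((flow, seq))
    return retrans
```

If the counts pile up on flows to one particular station, that station or its link is a better suspect than the network as a whole.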
 
First off, do you have one switch or two? You contradict yourself, so it's hard to get an idea of your topology. Try to isolate the devices one by one if possible. Second, I hope you bought some high-end Dell unmanaged switches and not the cheap stuff that's always on sale. If you're in a business environment, you need business-grade equipment. I don't care for PowerConnect switches; I'm partial to HP ProCurves for their performance vs. cost.
 
rosco said:
Any tips on what to filter for in ethereal? I am seeing some problems such as retransmission and some failed dns lookups. I'm not sure though what is the cause of our problem and what is just normal failure.
I'd start with a capture filter like this:
Code:
port 139 or port 445 or port 137 or port 138
and see what shows up. Particularly interesting will be browser elections: who wins them? Basically, this filter captures NetBIOS/SMB-type traffic.
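In libpcap filter syntax, `port N` matches when either the source or the destination port is N, on TCP or UDP, so that filter keeps a packet if any of the four ports appears on either side. The Python sketch below only mirrors that logic to show which service each port carries; the port descriptions are general knowledge of the NetBIOS/SMB ports rather than anything stated in this thread, and the function name is made up for illustration.

```python
from typing import Optional

# Ports named in the capture filter "port 139 or port 445 or port 137 or port 138"
NETBIOS_SMB_PORTS = {
    137: "NetBIOS name service (name queries/registrations)",
    138: "NetBIOS datagram service (browser announcements, elections)",
    139: "NetBIOS session service (SMB over NetBIOS)",
    445: "SMB over TCP (Windows 2000 and later)",
}


def classify(src_port: int, dst_port: int) -> Optional[str]:
    """Mimic the filter: label a packet if either port is one of the four."""
    for port in (dst_port, src_port):
        if port in NETBIOS_SMB_PORTS:
            return NETBIOS_SMB_PORTS[port]
    return None  # the capture filter would have dropped this packet
```

The name queries and browser elections already seen in the capture ride on 137 and 138, which is why those two ports tend to dominate on a peer-to-peer Windows network.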

What OS is your server running? Linux/windows/what?
 
I mentioned that there is currently one switch and 3 hubs. I had added an additional switch, put 4 stations on it, and uplinked that switch to the others, hoping to isolate the problem. My thinking was that if a lot of broadcast traffic was slowing things down to the point that a host could only respond intermittently, putting those 4 on a switch would help isolate it, or at least get them away from the hell that is hubs.
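Whether broadcasts are really the problem can be measured from the capture instead of guessed. Keep in mind that a switch, like a hub, forwards broadcast frames out of every port, so moving stations onto a switch cuts collisions but not broadcast load. Below is a Python sketch that computes the overall broadcast share and the busiest broadcast senders. It assumes hypothetical `(src_mac, dst_mac)` pairs exported from a capture; that format is an illustration, not an Ethereal export format.

```python
from collections import Counter
from typing import Iterable, Tuple

BROADCAST = "ff:ff:ff:ff:ff:ff"


def broadcast_report(frames: Iterable[Tuple[str, str]], top: int = 3):
    """Return (broadcast fraction, top broadcast senders) for (src, dst) MAC pairs."""
    total = 0
    senders = Counter()
    for src, dst in frames:
        total += 1
        if dst.lower() == BROADCAST:
            senders[src] += 1
    fraction = sum(senders.values()) / total if total else 0.0
    return fraction, senders.most_common(top)
```

A single station responsible for most of the broadcasts would be worth a closer look.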

The only server we have is the Windows 2000 server that has shared files and a database that the 3 other stations use. It doesn't carry much traffic and has been working fine in this setup for years.

I did see some browser election requests while scanning through the logs but I didn't notice which machine was winning.
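To see who actually wins, look at which machine sends the Local Master Announcement frames after an election; if the sender keeps changing, the browse master is flapping between machines. Below is a Python sketch over hypothetical `(time_s, sender_name, kind)` tuples. The labels `"local_master_announcement"` and `"election"` are assumptions made for illustration, not field values from Ethereal.

```python
from typing import Iterable, List, Tuple

# Hypothetical record: (time_s, sender_name, kind)
Frame = Tuple[float, str, str]


def master_changes(frames: Iterable[Frame]) -> List[Tuple[float, str]]:
    """List (time, new_master) each time a different machine announces itself."""
    changes = []
    current = None
    for t, sender, kind in sorted(frames):
        if kind != "local_master_announcement":
            continue  # elections and other browser frames are skipped here
        if sender != current:
            changes.append((t, sender))
            current = sender
    return changes
```

A long list of changes within a short capture would suggest the election is not settling on one machine.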

The other thing that was done is that we took out a Linux router/firewall/proxy machine, as it was having problems, and replaced it with a Xincom dual-WAN router. That has been working well, and our Internet traffic hasn't seen any problems. It is not our default gateway; that is a Cisco router, which routes traffic across a T1 to the other location and passes Internet traffic on to the Xincom.

Any more info needed? Keep the ideas coming. I'll look over more of the ethereal logs tomorrow.
 
In most cases the server should be winning the elections as browse master. If it's not, there are a few techniques you can employ to ensure that it does. While you're at it, you can also install the WINS server service on that server and have all the clients register with it (by setting the WINS server property in the network's DHCP scope and/or manually entering the WINS server address in the IP configuration of static clients). WINS may be old technology, but it's a lot less random than simple network browse lists, and it should completely eliminate what you are describing, which seems to be a name resolution issue. With the proper node type set, Windows clients will query WINS before broadcasting for most things. The WINS server service is free to install and very lightweight in terms of utilization. Of course, it's best to have a second, redundant WINS server, but one may be better than none.
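For reference, the node type decides the order in which a client tries a NetBIOS name server (WINS) and broadcasts. DHCP can push it as option 46, with the WINS server addresses in option 44; RFC 2132 gives the values 0x1 (b-node), 0x2 (p-node), 0x4 (m-node) and 0x8 (h-node). The Python sketch below just encodes that ordering and does not touch the network. It is simplified: Windows also consults its name cache, LMHOSTS and DNS, which are left out here.

```python
# NetBIOS node types as carried in DHCP option 46 (RFC 2132 values).
NODE_TYPES = {
    0x1: "b-node",
    0x2: "p-node",
    0x4: "m-node",
    0x8: "h-node",
}

# Order in which each node type tries name resolution methods.
RESOLUTION_ORDER = {
    "b-node": ["broadcast"],
    "p-node": ["wins"],
    "m-node": ["broadcast", "wins"],
    "h-node": ["wins", "broadcast"],
}


def resolution_order(option46_value: int) -> list:
    """Map a DHCP option 46 value to the NetBIOS lookup order it implies."""
    try:
        return RESOLUTION_ORDER[NODE_TYPES[option46_value]]
    except KeyError:
        raise ValueError(f"unknown node type {option46_value:#x}") from None
```

The "proper node type" in the advice above would be h-node (0x8), which asks WINS first and only falls back to broadcasts if that fails.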
 
I'm sorry; it was late and I worded that wrong. I meant a Windows 2000 Professional machine that acts as a "server" only in that it hosts the shared files and the database for those three other workstations. It's 2000 Professional, not Server edition. Sorry about that.

The Linux server we took out wasn't acting as a WINS server (at least, none of the computers had its IP address listed in their TCP/IP properties), and things had been working fine like that. I had thought of installing SME Server on the box we removed and having it act as a WINS server. I just don't think that will solve the problem at this point: we're having problems with pinging not even working consistently, so I don't think a WINS server is the main solution here. It might be something that also needs to be done, though.

First priority is figuring out why pinging isn't consistently working. Like I said, the first of four ping attempts could fail and the last three go through. Or, the first attempt is successful and the last three get no response. Something weird is going on.
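The pattern described here (the first of four pings failing, or only the first succeeding) is easier to read as runs of consecutive failures per host, with start and end times. Below is a Python sketch that takes `(timestamp, host, ok)` rows, such as a simple ping-logging loop would produce, and reports each streak of failures as `(start, end, count)`. The row format is an assumption for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[float, str, bool]  # (timestamp, host, reply_received)


def failure_streaks(rows: Iterable[Row]) -> Dict[str, List[Tuple[float, float, int]]]:
    """Group consecutive failed pings per host into (start, end, count) streaks."""
    streaks = defaultdict(list)
    open_streak = {}  # host -> [start, end, count] for an unfinished streak
    for ts, host, ok in sorted(rows):
        if not ok:
            if host in open_streak:
                s = open_streak[host]
                s[1], s[2] = ts, s[2] + 1
            else:
                open_streak[host] = [ts, ts, 1]
        elif host in open_streak:
            streaks[host].append(tuple(open_streak.pop(host)))
    for host, s in open_streak.items():  # streaks still running at the end
        streaks[host].append(tuple(s))
    return dict(streaks)
```

Comparing streak times across hosts shows whether everything drops out at once or only certain stations do.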
 
Well, I replaced the hubs with two 10/100 unmanaged switches, and the problems are still the same. I don't understand. Pinging is so low-level, and I can't even get that to work consistently.

I just tried pinging from one station to the station that hosts the database: no response. I tried pinging the firewall; that worked fine. I pinged the router; that worked fine.

I have no idea what's going on. I thought for sure those switches would fix the problem. How can pinging fail like that? I know it's not a firewall on the PC, because I've checked, and pinging does work properly sometimes.

Help.........please :(
 