IT Pros, rally to me!

oldpablo
Supreme [H]ardness · Joined May 31, 2003 · Messages 6,352
Okay guys, I just got handed a fun one. I don't have all the details yet, but I wanted to get a head start, since neither the cause nor a path to finding it has been established. Telnet sessions are timing out, seemingly at random, between client stations (XP Pro) and the telnet server at the main office. The timeouts happen when the sessions are sitting idle, not being used. When it works, it works just fine, so I am led to believe (nothing ruled out yet) that it's something about the networking, not the software. Here are the facts (variables?), and I'll post more as they come in:

1. The old XP image didn't do this; the new image does. I didn't create the image (it was made before I started here), and going back to trace everything that differs between the two would honestly be less useful than just recreating the image. That probably won't happen, though, unless it is the last, last, last resort. I only bring up this point to establish that this process has worked within the existing network structure in the past with no software changes. Well, maybe a version has changed; I don't know and will find out.

2. Setting stations to a static IP seems to make the timeouts go away. Setting short DHCP lease times makes it happen more frequently, and setting longer times makes it happen less often. I'm thinking maybe the timeout happens when an IP lease expires; I don't know, and I have yet to get my hands on a system to leave on next to me so I can catch it. That's the first step for me.

3. We are a Novell 6.5 network using ZenWorks 4 for management and XP Pro for clients. We are using TCP/IP, not IPX/SPX. I was hired to be the MS guy (they have very few MS servers) and honestly don't have much familiarity with Novell or ZenWorks (yet). Should have paid attention in high school when the netadmin was giving me the Novell certification courses for free to keep me from finding his network soft spots and installing Wolf3D and Wing Commander... ;)

4. Uninstalling and reinstalling the latest Novell client (which is what we use) seems to fix it, but I was handed that info by someone else, and I always treat others' troubleshooting as "not a guarantee." I'll assume it's true for now, though.

Now, I realize there's a lot more information that's needed to properly analyze this, and I'm going to be getting it. What I'm looking for are leads or similar situations that might point me in various directions. I used to be an escalation engineer for the 95/98 project for MS, so I understand that even something that sounds similar yet doesn't seem to be related can be the key to tracking down the "final answer." One more thing: serious IT Pros only need reply. I don't mean to be offensive, but I'd rather not get 10 posts like "make sure you get your Windows updates" and "check your antivirus." Don't be afraid to call me on something if you think I may have missed it, but let's not pursue the obvious too far... Thanks, I'll be editing this post with new data as it comes in...
 
2. Setting stations to a static IP seems to make the timeouts go away. Setting short DHCP lease times makes it happen more frequently, and setting longer times makes it happen less often. I'm thinking maybe the timeout happens when an IP lease expires; I don't know, and I have yet to get my hands on a system to leave on next to me so I can catch it. That's the first step for me.
That gets my vote by a mile. I've run into so many problems when running connections, of almost any kind, over a DHCP-connected system. I have learned to HATE DHCP for anything more than a system that just browses the internet. I have corrected tons of friends' problems with games, messenger programs, telnet, SSH sessions, VNC sessions, you name it, just by helping them set up static IPs on their LAN. I totally disabled the DHCP server in my home setup long ago and only run static IP because I had similar troubles as well. These issues get worse when you have a DHCP server that is a bit flaky, by the way. *remembers her jewett roofing network setup woes*

Just run static IP if you at all can, and put the system IPs into a master list and also on the cases with labels. It's better in the long run. A good way to find rogue systems when doing this is to keep a DHCP server going with just a few IPs available to it; then, as you see clients show up with DHCP leases, go after them with your static IP table and nail each one to a static IP.

I have grown to hate DHCP, and I don't think it has much of a use anymore because of the 'break connection' thing; today's apps rely on a steady, constant connection to the LAN, if not also the internet.
 
Are they timing out on connect, or do they drop out a while after a connection is established?

Everything comes down to something not set right on the new XP images. Find out what changed or remake the image yourself and set it up on one test box to see if it stays connected.

First thing I would try:

Make sure dynamic updates are enabled on the DNS server. Then make sure the workstations are set to "Register this connection's addresses in DNS."

Also check the link speed on the NIC. The new image might be set to auto-detect. Try setting them to the actual port speed.
 
Download PuTTY and set it to send keepalive packets to see if that works.
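
For reference, the PuTTY setting is under Connection ("Seconds between keepalives"). The socket-level cousin of that idea is TCP keepalive, and a rough Python sketch of switching it on looks like the block below. The function name is made up for illustration, and OS default keepalive intervals are usually far longer than a typical idle timeout, so treat this as a picture of the mechanism rather than a fix.

```python
import socket


def make_keepalive_socket() -> socket.socket:
    """Create an unconnected TCP socket with SO_KEEPALIVE switched on.

    Hypothetical helper for illustration only. With keepalive enabled the OS
    sends probe segments on an otherwise idle connection, which keeps idle
    timers on devices in the path from seeing a silent session.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


if __name__ == "__main__":
    s = make_keepalive_socket()
    enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
    print("SO_KEEPALIVE enabled:", enabled)
    s.close()
```

If sessions stop dropping with keepalives on, that points at an idle timeout somewhere in the path rather than at the application.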
 
Yoblad said:
Download PuTTY and set it to send keepalive packets to see if that works.

That's the hard part: it's seemingly random when it happens. I have a laptop next to me that's stayed connected for a few hours now. I'm waiting for it to dump so that I can check event logs, etc., to see what might have happened at the same time. Unless I'm missing something, if I use PuTTY and keep the session alive, how will I get to the bottom of the issue if it's no longer recreated? I guess technically that would help me find out whether or not it's at the TCP/IP layer. As for the DHCP post above, I'm pretty sure they will want all 38 locations to stick with DHCP. I hate workarounds; it's hard for me to not take an issue through to the end... :mad:
 
Have you put a sniffer on it? I understand it's hard to catch, but it would tell you some things.

As for the DHCP removal post: hogwash. Our network has over 20K clients, all DHCP. If it's set up right, with the right hardware and software, DHCP is a no-brainer.
 
Interesting bit of info: I just found out that our IP lease times are 2 hours. That's 2 hours for all 38 locations. To me that seems like a very bad idea, generating far more network traffic than necessary. It was two minutes a few months ago. Holy ipconfig, Batman! Now, this issue wasn't around with the old images, remember, but the image maker dude and the DHCP dude aren't the same dude and they work independently of each other, so maybe this is one of those cases where several things together "equal this issue." Does anyone have a standard, industry-recommended lease time? I mean, a week off the top of my head seems fine to me...
 
86400 seconds is about standard; that's a day. 2 minutes is just ridiculous. Is there one DHCP server or 38? It should be easy to change...
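
To put rough numbers on the lease times mentioned in this thread, here is a back-of-the-envelope Python sketch. It assumes each client renews at half the lease time (the RFC 2131 default for T1) and that every renewal succeeds on the first try; the function name and the `clients` parameter are made up for illustration, and the real client count here is unknown.

```python
SECONDS_PER_DAY = 86_400


def renewals_per_day(lease_seconds: float, clients: int = 1) -> float:
    """Approximate DHCP renewal requests per day.

    Assumption: each client renews at T1 = 50% of the lease and succeeds
    immediately, so one renewal happens every lease_seconds / 2.
    """
    if lease_seconds <= 0:
        raise ValueError("lease_seconds must be positive")
    renew_interval = lease_seconds / 2
    return clients * SECONDS_PER_DAY / renew_interval


if __name__ == "__main__":
    for label, lease in [("2 minutes", 120), ("2 hours", 7_200), ("1 day", 86_400)]:
        count = renewals_per_day(lease)
        print(f"{label:>9}: {count:g} renewals per client per day")
```

Under those assumptions a 2-minute lease means a renewal every minute per client, a 2-hour lease means one per hour, and a 1-day lease means two per day.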
 
One server. Does anyone know if network connectivity stops when an IP gets renewed, even for a split second? Enough to drop a telnet session with no built-in network recovery, maybe? :)
 
OldPueblo said:
One server. Does anyone know if network connectivity stops when an IP gets renewed, even for a split second? Enough to drop a telnet session with no built-in network recovery, maybe? :)
That's very possible. Set it to 3 days. A week would be fine as long as you don't move PCs around too much or install new ones a lot. As long as the connections register in DNS, NetBIOS stuff will work well even with a shortened DHCP lease time.

Question: You say the DHCP lease time wasn't 2 hours before, but now it is?

Is it possible that, in tandem with the DHCP lease changes and the image changes, someone changed the inactivity timeout on the telnet server? ...maybe something to check.
 
ktwebb said:
. . . . Our network has over 20K clients, all DHCP. If it's set up right, with the right hardware and software, DHCP is a no-brainer.
Of course it is, if it's been set up with, what? Quality equipment. And what have I stated my experience with? Home networks, and a small business or three. Hardly the place where quality equipment is in surplus; usually, if not always, it's the cheap crap that is bought in these two situations: Linksys, Accton, etc.
 
OldPueblo said:
One server. Does anyone know if network connectivity stops when an IP gets renewed, even for a split second? Enough to drop a telnet session with no built-in network recovery, maybe? :)
Yeppers, it does, without question. It dumps the IP when it gets the new one. Try it... run telnet, and do a renew on the DHCP connection. Oops! You're bumped offline on anything you were running. That's why I suggested static IP; it stops the problem entirely. By the way, I did not know the scale you were running at. Anyway, there is a chance that cheap or squirrelly DHCP servers are running. I've seen one actually disconnect everyone for a second whenever any machine was assigned a new IP or a lease was up for renewal. Another one I've seen assigned the same IP to a second system and made the system that had it renew to a different one, bumping it off for a second too. DHCP, if done right with good equipment, is good, but in my own personal experience, it's all too often not. If I ever did DHCP it would have to be with some high-end hardware and a lease time of a month or so. Too much crap can happen otherwise.
 
OldPueblo said:
. . . . Does anyone have a standard, industry-recommended lease time? I mean, a week off the top of my head seems fine to me...
A week is good. If nearly all your IPs are in use, you might want to take it down to 5 days. I personally prefer setting it a lot longer, say 2 weeks to a month, if I ever do run DHCP.
 
Of course it is, if it's been set up with, what? Quality equipment. And what have I stated my experience with? Home networks, and a small business or three. Hardly the place where quality equipment is in surplus; usually, if not always, it's the cheap crap that is bought in these two situations: Linksys, Accton, etc.

That has nothing to do with this thread. We're not talking about home networks here, or even small office installations with budget 50-dollar routers with integrated DHCP servers. This is a ZenWorks, Novell shop with Novell servers. He hasn't intimated a large enterprise network, but you don't use ZenWorks in a small office. I wouldn't, anyway. You can manage 5 or 6 PCs without thousands of dollars of management software. So yeah, if for some strange reason this fella is using a Linksys router to dole out IPs, then static addressing is a pretty good idea. For some odd reason I think that isn't applicable here. Could be wrong, but I highly doubt it.
 
What version of client32 are you running? I would confirm that you have the latest version, as bugs are always being fixed in new builds...

There are several Novell TIDs on node dropouts... but they usually fix the problem with a new client32 build.

You could run netmon and filter the results while manually forcing a workstation to renew its DHCP address, to see what kind of errors are occurring, if any.
 
jaqie said:
Yeppers, it does, without question. It dumps the IP when it gets the new one. Try it... run telnet, and do a renew on the DHCP connection. Oops! You're bumped offline on anything you were running. That's why I suggested static IP; it stops the problem entirely.

BS.... Unless you're doing a *release and renew*, the client system will request a renewal of its existing IP address. The client will only dump the existing address when the DHCP server has, for whatever reason, decided to deny the renewal; in such a case the client would be given a new IP.

Simply doing an ipconfig /renew does not dump the existing address. Yes, I have tested it according to your description.

Running a sniffer is going to be about useless unless someone knowledgeable is able to configure it and read what's going on.

Oldpueblo, is this occurring just at remote sites, or is it also occurring in the same office as the telnet server?
 
'Scuse me, you need to look at my experience and what I've run stuff on before you say that. Most equipment that I've had experience with DOES do that. I am constrained by my experience here, and it's not BS with what I've dealt with. Maybe that high-end...*calms herself down* business stuff you get to play with behaves the way you describe, but a lot of the stuff I am forced to work with, because I am on the consumer end of things, usually doesn't.

I've been trying to help as best my experience and knowledge allow, and you decide to walk all over me because you know more. Men. *is reminded why she hates men so much* I've tried to be nice. All I get is my face pushed into the dirt.
 
jaqie said:
'Scuse me, you need to look at my experience and what I've run stuff on before you say that. Most equipment that I've had experience with DOES do that. I am constrained by my experience here, and it's not BS with what I've dealt with. Maybe that high-end...*calms herself down* business stuff you get to play with behaves the way you describe, but a lot of the stuff I am forced to work with, because I am on the consumer end of things, usually doesn't.

I've been trying to help as best my experience and knowledge allow, and you decide to walk all over me because you know more. Men. *is reminded why she hates men so much* I've tried to be nice. All I get is my face pushed into the dirt.

All I am saying is you're making a blatantly false claim by saying:

"Yeppers, it does, without question. It dumps the IP when it gets the new one. Try it... run telnet, and do a renew on the DHCP connection. Oops! You're bumped offline on anything you were running."

Even cheap 4 or 8 port Linksys routers don't exhibit those problems. All I can say is the equipment you've been dealing with must be pretty damn faulty to be causing that problem in the networks you're dealing with.

Just because I work as a consultant doesn't mean I get to play with expensive high-end equipment all the time, as my projects are constrained by my clients' budgets.

I'd also appreciate it if you don't play the gender card, it's really not appreciated since I did not know you were female, and quite frankly don't care as it isn't going to change how I respond to anyone spreading misinformation.
 
ooh ooh my turn. BS.
bye now.

And you didn't know I was female? Hm, you must not read sigs.
 
I'll have to check on whether or not it's happened at the home office; the previous tshooter has that data. I have to correct myself. I didn't think about this earlier, but when I asked about network connectivity dropping with an IP renewal, my test laptop had already sat without dropping through two automatic renewal periods (two hours each). So I guess we can throw that out the window. We are using the latest client; I verified that. Whether or not we were using it when this issue started months ago, I don't know. Too many variables right now. The previous tshooter uses the old shotgun method, so all the info he is giving me is inconclusive in many ways. I always try to isolate down till I nail it.

The DHCP server is a Linux box; that's all I know so far. Inactivity timeout on the server? Possibly, though if no one has checked that I'm gonna line up some IT guys and smack the hell out of them. The guy who runs the telnet server (it's a non-Windows box, Sun maybe?) is on "his own" as well. I'm currently monitoring the connection with TDImon from Sysinternals, since that's all I had handy at the time.

This place drives me nuts to a certain extent (only been here a month or so); it's very different from where I've worked in the past. Novell network with Linux/Mac/MS servers sprinkled everywhere, all doing different things yet all working together to serve Mac/MS clients. Because of the variety there are several IT guys: the Mac guy, the Linux guy, the MS guy (me), the Router guy, etc. One of the reasons I was happy to get this job was to expose myself to stuff I've been wanting to mess with yet haven't had time to on the side. But now it seems to me that we would be FAR more efficient moving to one platform across the board. MS of course... ;)
 
jaqie said:
And you didn't know I was female? Hm, you must not read sigs.

Nice way to digress from your posting of misinformation to which you probably won't admit you were wrong. FYI I don't read sigs because most of them contain information that I don't care about, hence I tune them out which allows me to put forth more effort for problem solving and correction of statements like yours.

OldPueblo said:
I'll have to check on whether or not its happened at the home office, the previous tshooter has that data. [rest snipped]

It would help a lot to narrow down whether this problem occurs only remotely or not. As for the rest, it sounds like you need a director to keep things in "tune". :p
 
Yeah, let's say I have big plans for the place... :D Tomorrow will bring more data. Thanks for the help, everyone.
 
jaqie said:
ooh ooh my turn. BS.
bye now.

And you didn't know I was female? Hm, you must not read sigs.
I don't read sigs, and I even have one of my own. Most of them are either lists of hardware specs or just uncreative quotes from other users. Some are just a billboard for the person to tell the world how cool they think they are. Anything longer than a line is generally a waste of space, and when sigs get longer than some posts, they go beyond ridiculous. Don't expect everyone to read your sig, don't expect everyone to look at your profile, and don't expect the world to care about your daily blog (if you have one).

As for SJC's claims, he is correct. If the device is renewing its lease on the same address, there is usually no loss of connectivity. If there is, you are dealing with faulty equipment. Otherwise, people all over the place—in home networks, on school campuses, and in the enterprise—would experience massive network-wide hiccups while transferring data or simply connecting through a terminal session to a server.

This has not been a problem since back in the days of networked dumb terminals (or thin clients).

It's easy to test: just take any home router, set the lease time as low as it can be set (some will be an hour, some a minute), and go ahead and start pinging the router or connecting to another computer's files (another Windows machine with RDP is an optimal environment). Wait until the lease time is up, and see if you drop packets or lose the sustained connection. I can guarantee you that you won't, or you'll need to RMA the equipment that is failing.
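
If you'd rather not babysit a ping window, the test above can be roughed out in Python like this. It is only a sketch: the function names, interval, and duration are made up, and it opens a fresh TCP connection on every probe, so it measures reachability across the renewal rather than the survival of one long-lived session.

```python
import socket
import time
from typing import List, Tuple


def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_outages(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse (timestamp, ok) samples into (first_failure, recovery) pairs."""
    outages = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # first failed probe of an outage
        elif ok and start is not None:
            outages.append((start, ts))     # first good probe ends the outage
            start = None
    if start is not None:
        outages.append((start, samples[-1][0]))  # still down at the last sample
    return outages


def monitor(host: str, port: int, interval: float = 5.0,
            duration: float = 3 * 3600) -> List[Tuple[float, float]]:
    """Probe every `interval` seconds for `duration` seconds, then summarize."""
    samples = []
    end = time.time() + duration
    while time.time() < end:
        samples.append((time.time(), probe(host, port)))
        time.sleep(interval)
    return find_outages(samples)
```

Run `monitor()` against the server's address and port for longer than one lease period, then compare any outage timestamps with the lease renewal times shown by `ipconfig /all`.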

OldPueblo, you may want to go into the NIC settings on the Windows boxes and check to make sure they aren't set to power down when Windows goes idle. If the sessions only drop after the machine has been left idle for a bit, then this could be the culprit.

Go into the NIC properties (connections >> NIC >> properties) and then go to "configure." It will probably ask you if you're sure you want to make changes, and if it doesn't, don't be alarmed. You should get a window with a handful of tabs, and you want the "Power Management" tab. Un-check the "Allow the computer to turn off this device to save power" box, and then click "OK."

Then you're done. Test and see if the problem persists.
 
Ah, I'm aware of that option but hadn't considered it. That might explain the random nature of it. Thanks, I'll check it out.
 
In the Windows world, the DHCP client sends a request to renew its lease halfway through the lease. If it can't find a server on the first try, it will try again 3 times. If it's unsuccessful, it keeps its assigned address, and when 87.5% of its lease time has expired it will try again. In any case, this should not cause disruption in network traffic unless the IP address assigned to the client changes. Nothing is happening here except a change in a timer that tells the DHCP client how long to wait before it tries to renew its lease again. It's not as if the TCP/IP configuration is being reapplied to the NIC.
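
For concrete numbers: RFC 2131 sets the default renewal time (T1) at 50% of the lease and the rebinding time (T2) at 87.5%. A small Python sketch of that arithmetic follows; the class and function names are made up for illustration, and a DHCP server can override both timers, so these are defaults rather than guarantees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaseTimers:
    """Seconds after the lease was granted at which each event is due."""
    t1_renew: float    # client asks the original server to extend the lease
    t2_rebind: float   # client falls back to asking any server
    expiry: float      # lease ends; the address must be given up


def lease_timers(lease_seconds: float) -> LeaseTimers:
    """Compute T1/T2 using the RFC 2131 defaults (50% and 87.5%)."""
    if lease_seconds <= 0:
        raise ValueError("lease_seconds must be positive")
    return LeaseTimers(
        t1_renew=0.5 * lease_seconds,
        t2_rebind=0.875 * lease_seconds,
        expiry=float(lease_seconds),
    )


if __name__ == "__main__":
    timers = lease_timers(2 * 3600)  # the 2-hour lease mentioned in this thread
    print(f"renew at {timers.t1_renew / 60:g} min, "
          f"rebind at {timers.t2_rebind / 60:g} min, "
          f"expire at {timers.expiry / 60:g} min")
```

With a 2-hour lease that works out to a renewal attempt at 60 minutes, rebinding at 105 minutes, and expiry at 120 minutes.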

There's probably an article on this on Microsoft's TechNet which could explain it better than I can.

We use DHCP all day long, and for PCs that are better off with a fixed IP, we just make a static DHCP reservation, which binds an IP to a particular MAC. That way, if DNS servers change, default gateways change, etc., the client will still get the update through DHCP without us having to touch the box. The only time I've ever seen issues with DHCP is when it wasn't configured correctly. For example, if you set your lease time too low, you're going to have a ton of traffic on your network from clients trying to renew leases.
 
Update. Turns out it doesn't happen at the home office at all, and the test machine I am using (with the same image, and it didn't drop all night) does have the power management box checked. So if it doesn't happen with a static IP and it doesn't happen at the home office, that would mean it's some kind of bandwidth issue?!? I'm starting to think whoever came up with "it doesn't drop with a static IP" maybe just didn't wait long enough...
 
One thing to check is the client binding order.
I've seen similar oddness with the Novell client first in the order.
 