Severe problems on loading Windows user account.

FelixC

I'm kinda torn here on whether this might be an OS problem or a hardware one, failing motherboard, but I'm guessing the right way to trace it would be to go through here since the faults are manifesting themselves with Windows' user accounts.

A couple of days ago, out of the blue, I got a BSOD reporting a problem with ATAPORT.SYS. The system crashed, and on restart it seemed that it wouldn't complete the Windows load anymore. After a couple of restarts I let it stew on the Please Wait screen for a long time, literally minutes, and eventually it did present me with my single user account and logged onto it. Subsequent reboots I did for testing worked perfectly. So I looked around the net and found two suggestions, both relating to the MBR: one that I might have a rootkit that dug itself in there, another that an auto-defrag process might've botched something with it. Either way, I did a sweep with AVG, then downloaded and ran Malwarebytes, and didn't find anything. For good measure I used my install disc to rewrite the MBR to play it safe, with bootrec /fixmbr and bootrec /fixboot.

Today problems came back but even worse. I got to the Please Wait screen on loading Windows, then it briefly flashed to black and displayed the logon screen. When I attempted to enter my password, however, it said "Can't connect to RPC server." I rebooted, and then I got the same as days before, spent ages on the Please Wait screen. But this time, once it did finally allow me to log into my user account, the Aero interface was disabled (though it came on by itself after a little while) and I got an error message saying something to the effect of "Windows failed to connect to service System Event Notification Service" and all my user account info wasn't there. Firefox would load with stock settings, my desktop calendar was empty, and the Libraries category in Explorer redirected to nothing. I went into the Users folder on C:\ and my user account data was there, but just not loaded into the running account.

Windows 7 Professional 64bit, installed on an OCZ Vertex II SSD. Used PassMark's Disk Checkup utility to have a look at it, SMART info reported no errors. Any sort of testing on the SSD with DiskCheckup just hangs, though, seems to go on forever. Works on the HDDs, but might just be an incompatibility of the program. My motherboard is an Asus P8Z68-V with a known fault of causing multiple boots when certain overclocking features are enabled, but I'm running all stock settings and the system's been rock-solid so far. 8GB RAM, and an i7-2600K.

Edit: Keeps happening. It's like playing roulette, sometimes boots fine but apparently more often than not it's an utter screwup. Sometimes it takes forever only to eventually log into an empty account. Most recently it actually didn't even load the Please Wait message, I was left staring at the background screen. The last time it loaded properly it threw me a wobbly about a runtime error with an nVidia service.

And just now on restart, at POST, it gave me a No Keyboard Found beep. This is starting to stink of motherboard failure. I also noticed a delay in prepping the audio device too last time it loaded up properly before this session, along with the AVG service being disabled. Looks like more trouble with starting up services or something, not sure. Running fine at the time of writing this last bit now.

Since I wrote the first contents of this post I've done another rewrite of the MBR. Something's a little off here: the OS is installed on C:\, and the installation disc sees that as F:\. Fair enough. The first time I rewrote the MBR I did it on C:\ and it said it completed successfully; the second time I did it on F:\ and it said the same thing. But when I tried bootrec /rebuildbcd it said it found 0 Windows installations across all drives. Don't know what to make of this; at this point I'm kind of grasping at straws.

Anyway, does anyone have a clue of what might be going on? A problem with the OS, or more likely that the hardware's going? Could the mainboard cause issues like these?
 
Not sure if you have done this but what is in the event log, especially the system one? This is the first (and second) place I would look. I've also used the performance logging in Windows 7 to look at performance issues. Not quite the same as what you require but check this out, see if it tells you where things are dropping http://www.techrepublic.com/blog/window-on-windows/use-windows-7-event-viewer-to-track-down-issues-that-cause-slower-boot-times/3253

Sounds to me like you might have power / SATA port issues. Is there anything in the event log saying that a port was reset or anything like that?
 
No, I hadn't used the Event Log before. I've just built up views for Boot Time and Degradation, as per the article, but having had a quick skim over the details of today's boots I haven't seen any mention of a port reset. In fact, weirder still, I've restarted the system over a dozen times today but only four boots show up with today's date. Are these logs stored in user account data? 'Cause if they are, that might explain why they don't show up in the history. No degradation issues are listed today at all.
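
One rough way to cross-check the boot count against something other than the Boot Time view is to count Event Log service starts per day in the System log. The sketch below assumes the events have already been exported as (timestamp, event ID) pairs and that event ID 6005 ("The Event log service was started") is logged once per cold boot; the function name and the sample rows are made up for illustration, and a boot where the logging service itself fails may well go uncounted.

```python
from collections import Counter
from datetime import datetime

# Assumption: 6005 is the System-log event written when the Event Log service starts.
EVENTLOG_STARTED = 6005

def boots_per_day(events):
    """Count Event Log service starts per calendar day.

    `events` is an iterable of (iso_timestamp, event_id) pairs.
    """
    counts = Counter()
    for stamp, event_id in events:
        if event_id == EVENTLOG_STARTED:
            day = datetime.fromisoformat(stamp).date().isoformat()
            counts[day] += 1
    return dict(counts)

# Hypothetical sample rows, not taken from the machine in this thread.
sample = [
    ("2000-01-01T08:01:12", 6005),
    ("2000-01-01T08:01:15", 7036),  # some unrelated event, ignored
    ("2000-01-01T09:30:40", 6005),
    ("2000-01-02T07:55:03", 6005),
]
print(boots_per_day(sample))
```

If the per-day count here comes out lower than the number of restarts you remember doing, that would suggest some boots never got as far as starting the logging service.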

If it's a power issue, do you suppose it could be coming from the PSU? Or is the motherboard alone the one that could be causing trouble with this? The thing is, once I'm into my proper loaded account, the system seems to work perfectly.

Another thing about the motherboard, this is the way it's worked from day one - quick flash of the POST screen with American Megatrends, black screen then a quick flash of a message of not finding some JMicron controller I believe (it's really hard to read, quick one), the POST screen comes on again and the OS boots. But there isn't an actual halt for the system, that only happens if I enable EPU/TPU - boot, full halt, boot again.
 
I think that haileris might be onto something with one of the SATA ports being the issue. Have you tried a different SATA port? Perhaps one that's on a different controller chip (I believe that your particular motherboard splits the SATA ports between 2 or 3 different controllers).

I think that the fact that the initial BSOD pointed to ATAPORT.SYS suggests that it's a problem with either the hard drive or the SATA port that you're using (or possibly the SATA controller, which would mean an RMA of the motherboard if you want it to work properly).

The error that you got saying it couldn't connect to the RPC server is a bit odd. To my knowledge, at least, it shouldn't be connecting to any RPC server on boot-up unless you've specifically set it up to connect to a remote desktop when it boots; it should only be trying that if you're connecting to a desktop on another machine. That, to me, seems to say that you might, indeed, have a rootkit. Unfortunately neither AVG nor Malwarebytes is likely to detect it - rootkits have access to the kernel of the OS and those programs can't scan that deeply. To rule out a rootkit I would recommend running either RootkitHookAnalyzer or RemoveAny. They do a pretty good job at detecting rootkits and it would certainly be nice to rule them out.
 
Okay, here's what I've got so far. Long story short, I did a bunch of tests, over twenty reboots, with both the original SATA port I had it in, SATA III IDE Port 0, and one of the other ports, SATA II Port 2. I ended up going back to the Event Log and this time I looked in the right place, the System log.

Basically, what I described as the larger failures, "user data not showing up" appears to be the result of massive failures for various services on starting up. What I didn't realise is that even when the computer seemed to be booting up fine, in fact it wasn't. Those little flashes or pauses on the logon screen were also indicative of a problem, which appears to identify the main two failures. These two errors happen every time the PC boots into Windows from scratch:

- one, generated by MS Windows Eventlog: "The event logging service encountered an error (res=1117) while initializing logging resources for channel Microsoft-Windows-GroupPolicy/Operational."

- the second, and likely the root of all this mess, generated by atapi: "The driver detected a controller error on \Device\Ide\IdePortX." - where X is either 0 or 2, depending on which port I have the OS SSD plugged into.
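
For anyone wanting to pull just these controller errors out of the System log without clicking through Event Viewer, something like the following could work. It is only a sketch: it assumes the stock `wevtutil` tool that ships with Windows 7 and the provider name `atapi` as it appears in the event source above, and the helper names `build_atapi_query` and `run_query` are made up for illustration.

```python
import subprocess
import sys

def build_atapi_query(provider="atapi", max_events=20):
    """Build a wevtutil command listing the newest System-log events from one provider."""
    xpath = "*[System[Provider[@Name='{}']]]".format(provider)
    return [
        "wevtutil", "qe", "System",  # qe = query events, here from the System log
        "/q:" + xpath,               # XPath filter on the event provider
        "/c:" + str(max_events),     # cap the number of events returned
        "/rd:true",                  # reverse direction, newest first
        "/f:text",                   # plain-text output instead of XML
    ]

def run_query(cmd):
    """Run the command and return its output; only meaningful on Windows."""
    if not sys.platform.startswith("win"):
        raise RuntimeError("wevtutil is only available on Windows")
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    # Show the command rather than running it, so this is safe on any OS.
    print(" ".join(build_atapi_query()))
```

Running the printed command from an elevated prompt after each boot should show whether the atapi error was logged again on that boot.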

The full details if of any interest:


Now, like I said, these two errors happen every time I boot into Windows. And every now and then, more rarely, they're accompanied by that other cascade of errors which makes it look as if the OS hasn't read all the user info. Or once it actually caused a full PC reset at the login screen. But these errors don't occur when I put the PC in and out of Hibernation. That works without a hitch, no problems visible nor logged in the Event Viewer.

This all puts me in a bit of a bind, because I'm in a situation where I really need my PC for the next two weeks. Can't RMA for now, nor can I spend a lot of time on reinstalls, I've got some work that needs to get crunched out. So I'd like to boil this down to three questions for the time being:

1. Is it at all possible that it's the SSD that's causing the problems, not the motherboard SATA controller? The error in the System log would make me think that the fault lies with the mainboard, would it be possible for this not to be the case? I know, one way to check would be to take a clean HDD and make a new install on that and see, but like I said, I'm scarce on time and if anyone knows for sure one way or the other I wouldn't have to go through with it if there's no point.

2. Any reasonable estimates? Given these circumstances, is it likely that the system will completely crap out on me within the next two weeks? I scrounged through the System log, and the first incidence of a fault with the IDE controller dates back to just September 1st, and it's gotten this bad just this last week. For reference, I've had this PC since the end of June.

3. Why would Windows have said "Can't connect to RPC server"? I'm really kinda worried about this. Is there any reasonably likely scenario where these failures with the controller and service startups could cause that? I've looked for rootkits with Kaspersky's TDSSKiller and SanityCheck (RootkitHookAnalyzer), and they didn't turn up anything except Daemon Tools' SPTD service (false positive).

So, yeah, that's where I'm at. If anyone's in the know, I'm all ears, and by the way thanks for the input guys, it's much appreciated.
 
I think that one way to see if you can reproduce the problem would be to use VMware Converter to make a virtual machine out of your computer and test it on another machine. I've never done it myself, but I have a friend from Australia who swears by this troubleshooting technique. If anything, it might generate some useful error messages to give you a better idea what's wrong, if anything, with the drive.

I don't have a whole lot of experience with SSD hard drives (just what I've read about them) so this might not be that helpful: but you could try a utility like SpinRite or HDD Regenerator to see if you've got bad sectors on the SSD. That being said, because I don't have any experience with SSDs I'm not sure if a sector-by-sector scan of the hard drive would even be worth it. But perhaps there's a similar scanning technique that can be used for an SSD. A more reasonable test would be to see if you can clone the drive using Clonezilla, put the image on a new drive (or a known good drive) and see if the problem reproduces itself.

Are you running the latest drivers for your motherboard? It could be that one of the drivers somehow got corrupted during the BSOD and led to your instability.

Unfortunately I'm in a bit of a rush so I can't sit down and really think about your problem so these are just a few things that I can think of off the top of my head. I'll sit down on the problem when I get back home and see if there aren't more effective things that you could be trying.

As for the possibility of a rootkit, the error concerning the RPC server has me a bit worried. It's out of left field and doesn't really match the other symptoms your computer is presenting us with. It could very well be a rootkit and the detection software doesn't do a good job of finding it. In my experience rootkit detection software is about 60-70% effective, as the nature of rootkits makes them extremely hard to detect. So you could very well have a rootkit at ring 0 that's causing both the RPC server error as well as the SATA controller error (or SSD error as it were). Unfortunately the only way that I know of getting rid of a rootkit that's that deeply buried into the system is to wipe the hard drive (and I'd say use a DoD wipe from DBAN) and reinstall the OS...the suckers are that hard to get rid of.

That being said - rootkits are pretty rare, at least the ones that are that serious are pretty rare.

I would say that, if you really need to get some work done on your machine, I'd image the drive using Clonezilla and put that image on a hard drive that you know works. Ideally it would be the same drive, but I don't know anyone who keeps a second drive of the same make and model lying around just in case they need to create an exact image. I don't know 100% if you can really just put an old image on a new drive (and I'd test it if I didn't have to leave in a few minutes), but it's worth a shot. If the problem still shows up on the known-good drive (barring any technical limitation from the fix I'm suggesting) then you know that it's probably either the SATA controller or the driver that handles the SATA controller. In which case either update to the newest driver or revert back to a driver that you know works (in case an upgrade to the latest driver is what killed your machine in the first place).

Good luck.
 
Just update your OCZ firmware, do a secure erase with Parted Magic, and reinstall Windows.
 
I would start with doing a chkdsk.
"The driver detected a controller error on \Device\Ide\IdePort2"
That can refer to either a controller issue or possibly a drive issue.

Since you've had the same issues with 2 different ports (on 2 different controllers) it would point to either a disk issue or even the SATA cable.
 
Right, sorry for sticking my head in the sand, been all tied up this week. So it looks like I found the source of the problem. Well, hopefully not counting my chickens before they're hatched, etc.

"I don't have a whole lot of experience with SSD hard drives (just what I've read about them) so this might not be that helpful: but you could try a utility like SpinRite or HDD Regenerator to see if you've got bad sectors on the SSD."
"I would start with doing a chkdsk"

This was it. A bad cluster, by the looks of it. One. Though I gotta say it strikes me as rather odd, since the file affected just doesn't look like an essential process at first glance. Hence why I'm rather reluctant to pop out the bubbly.

Here's the relevant bit of the chkdsk log:
Code:
CHKDSK is verifying file data (stage 4 of 5)...
Read failure with status 0xc0000185 at offset 0x58421a000 for 0x10000 bytes.
Read failure with status 0xc0000185 at offset 0x58421a000 for 0x1000 bytes.
Windows replaced bad clusters in file 57562
of name \Windows\System32\winevt\Logs\MIE38D~1.EVT.
  168176 files processed.
File data verification completed.
CHKDSK is verifying free space (stage 5 of 5)...
  15710992 free clusters processed.
Free space verification is complete.
Adding 1 bad clusters to the Bad Clusters File.
CHKDSK discovered free space marked as allocated in the
master file table (MFT) bitmap.
Correcting errors in the Volume Bitmap.
Windows has made corrections to the file system.

 124930047 KB total disk space.
  61728636 KB in 137457 files.
     84420 KB in 28634 indexes.
         4 KB in bad sectors.
    273019 KB in use by the system.
     65536 KB occupied by the log file.
  62843968 KB available on disk.

      4096 bytes in each allocation unit.
  31232511 total allocation units on disk.
  15710992 allocation units available on disk.
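
As a sanity check on those numbers, the read-failure lines can be turned into a cluster index and compared against the summary. This is only a sketch under a couple of assumptions: that the offset chkdsk prints is a byte offset from the start of the volume, and that the 4096-byte allocation unit from the summary applies. The names `failed_reads` and `CLUSTER_BYTES` are made up for illustration.

```python
import re

CLUSTER_BYTES = 4096  # "4096 bytes in each allocation unit" from the summary

# The two read-failure lines copied from the chkdsk log above.
LOG = """\
Read failure with status 0xc0000185 at offset 0x58421a000 for 0x10000 bytes.
Read failure with status 0xc0000185 at offset 0x58421a000 for 0x1000 bytes.
"""

READ_FAILURE = re.compile(
    r"Read failure with status (0x[0-9a-fA-F]+) "
    r"at offset (0x[0-9a-fA-F]+) for (0x[0-9a-fA-F]+) bytes"
)

def failed_reads(log_text):
    """Return (status, offset, length) integer tuples, one per read-failure line."""
    return [
        tuple(int(group, 16) for group in match.groups())
        for match in READ_FAILURE.finditer(log_text)
    ]

reads = failed_reads(LOG)
status, offset, length = reads[-1]         # the retry, narrowed to the smaller read
cluster_index = offset // CLUSTER_BYTES    # assumes a volume-relative byte offset
clusters_touched = length // CLUSTER_BYTES

print(hex(status), cluster_index, clusters_touched)  # 0xc0000185 5784090 1

# Cross-check against the summary: free allocation units converted to KB.
free_units = 15710992
print(free_units * CLUSTER_BYTES // 1024)  # 62843968, matching "KB available on disk"
```

The first failed read covers 0x10000 bytes (16 clusters) and the retry narrows it to 0x1000 bytes, which is exactly one 4 KB cluster and lines up with "4 KB in bad sectors". Status 0xc0000185 is the NTSTATUS code STATUS_IO_DEVICE_ERROR, and the res=1117 from the Eventlog error earlier in the thread is the Win32 code ERROR_IO_DEVICE, so both messages report a failed device read.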

After this process ran, I've done over a dozen reboots on both IDE ports and my System Log in the Event Viewer no longer stars the hit band Atapi Error and Friends. Boot and login times have also returned to normal. So I'm hoping that this was the root of the problem, and there isn't any further damage or a rootkit burrowed in for the winter.

Though I gotta say, is it normal for a three-month-old drive to spring a bad cluster? Or maybe poop just happens.

"As for the possibility of a rootkit, the error concerning the RPC server has me a bit worried. It's out of left field and doesn't really match the other symptoms your computer is presenting us with. It could very well be a rootkit and the detection software doesn't do a good job of finding it. In my experience rootkit detection software is about 60-70% effective, as the nature of rootkits makes them extremely hard to detect. So you could very well have a rootkit at ring 0 that's causing both the RPC server error as well as the SATA controller error (or SSD error as it were). Unfortunately the only way that I know of getting rid of a rootkit that's that deeply buried into the system is to wipe the hard drive (and I'd say use a DoD wipe from DBAN) and reinstall the OS...the suckers are that hard to get rid of.

"That being said - rootkits are pretty rare, at least the ones that are that serious are pretty rare."
I'll be back looking some more into the rootkit issue in a couple of weekends, though hopefully it will prove a wild goose chase. I've got AVG and Windows Firewall up and running, and I tried three different solutions to look for a rootkit. I don't have a lot of sensitive data on my PC, some emails and logins, but no passwords of any sort and at the end of the day nothing irreplaceable.

"Just update your OCZ firmware, do a secure erase with Parted Magic, and reinstall Windows."
Ordinarily, I might've done that by the third day the system was acting up, but I need the PC at this time and can't really do without. Don't have the better part of a day to set up again and reinstall each and every application I commonly need.

"Btw, funny that it's ATAPORT.SYS - doesn't that indicate that you are using IDE drivers? What is your BIOS set up like?"
BIOS has SATA Mode set to AHCI, is that what you were after?


Anyway, with a bit of luck this will have proved to be the full extent of the problem. So I'd just like to say thanks to everybody for your time and advice, been of great help! At least for now, things seem fine.
 