Correctable Memory ECC @ DIMMA2(CPU1) - Asserted

rufik

Any advice about this alarm from the IPMI event log?

Code:
Correctable Memory ECC @ DIMMA2(CPU1) - Asserted

Is the RAM broken?
 
It depends. How many times do you see it?

Many times: yes, the DIMM is starting to go bad.
One-time offense: let's see if it does it again. If not, it could have been just a fluke!
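
A minimal sketch of that "how many times" check, assuming the IPMI event log has been exported to plain text with one event per line. Only the event text comes from the alarm quoted above. The export format, the function names, and the threshold of 3 are assumptions for illustration, not a vendor rule.

```python
import re
from collections import Counter

# Matches event text such as "Correctable Memory ECC @ DIMMA2(CPU1) - Asserted".
ECC_EVENT = re.compile(r"Correctable Memory ECC @ (DIMM\w+)\((CPU\d+)\) - Asserted")


def count_ecc_events(lines):
    """Count asserted correctable-ECC events per (DIMM, CPU) slot."""
    counts = Counter()
    for line in lines:
        match = ECC_EVENT.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def suspect_dimms(counts, threshold=3):
    """Return slots seen at least `threshold` times (arbitrary, assumed cutoff)."""
    return sorted(slot for slot, seen in counts.items() if seen >= threshold)


if __name__ == "__main__":
    # Hypothetical exported log lines; the prefixes are made up.
    sample = [
        "1 | Correctable Memory ECC @ DIMMA2(CPU1) - Asserted",
        "2 | Correctable Memory ECC @ DIMMA2(CPU1) - Asserted",
        "3 | Correctable Memory ECC @ DIMMA2(CPU1) - Asserted",
    ]
    print(suspect_dimms(count_ecc_events(sample)))
```

A single hit stays below the cutoff and is treated as a possible fluke; repeated hits on the same slot point at that DIMM.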
 
A few times I got a purple screen of death (PSOD) on my ESXi host.

Was it caused by that DIMM?
 
Yeah. Flip some random bits that are not corrected, and who knows what happens. If it's an instruction, you can go off to the moon; if it's a pointer, you get a fault, etc.
 
Most likely that's the cause. It depends on what info is in the PSOD, but the most likely culprit is the DIMM. There are other reasons for PSODs, but badly behaving RAM can exacerbate the underlying issue.
 
If the PSOD showed that the dump was stored on the host, check your /var/core directory.
 
Hello

~ # vm-support -p
17:39:32: Creating /var/tmp/esx-esxi.olivia.net-2014-01-02--17.39.tgz
17:40:25: Gathering output from /usr/lib/vmware/vm-support/bin/dump-vmdk-rdm-info.sh /vmfs/volumes/52964d59-67ac7
17:40:25: Gathering output from /usr/lib/vmware/vm-support/bin/dump-vmdk-rdm-info.sh /vmfs/volumes/52964d59-67ac7
17:40:25: Gathering output from /usr/lib/vmware/vm-support/bin/dump-vmdk-rdm-info.sh /vmfs/volumes/52964d59-67ac7
17:40:26: Gathering output from /usr/lib/vmware/vm-support/bin/dump-vmdk-rdm-info.sh /vmfs/volumes/52964d59-67ac7
17:40:26: Gathering output from /usr/lib/vmware/vm-support/bin/dump-vmdk-rdm-info.sh /vmfs/volumes/52964d59-67ac7
17:40:26: Gathering output from /usr/lib/vmware/vm-support/bin/dump-vmdk-rdm-info.sh /vmfs/volumes/52964d59-67ac7
17:40:26: Gathering output from /usr/lib/vmware/vm-support/bin/dump-vmdk-rdm-info.sh /vmfs/volumes/52964d59-67ac7
17:40:26: Gathering output from /usr/lib/vmware/vm-support/bin/dump-vmdk-rdm-info.sh /vmfs/volumes/52964d59-67ac7
17:40:26: Gathering output from /usr/lib/vmware/vm-support/bin/dump-vmdk-rdm-info.sh /vmfs/volumes/52964d59-67ac7
17:40:26: Gathering output from /usr/lib/vmware/vm-support/bin/dump-vmdk-rdm-info.sh /vmfs/volumes/52964d59-67ac7
17:40:26: Gathering output from vmkfstools -P -v 10 /bootbank
17:40:34: Gathering output from /usr/sbin/localcli --formatter=json storage core claimrule list --claimrule-class
17:40:34: Gathering output from /usr/sbin/localcli storage nmp device list
17:41:37: Adding /vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/vGW-Vyatta-01a/vGW-Vyatta-01a-000001-delta.vmd
17:41:37: Adding /vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/vGW-Vyatta-01a/vGW-Vyatta-01a-000001.vmdk
17:41:37: Adding /vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/Windows 2003 Server/Windows 2003 Server-flat.v
17:41:37: Adding /vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/Windows 2003 Server/Windows 2003 Server.vmdk
17:41:37: Adding /vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/W2k12r2-Term-SRV/W2k12r2-Term-SRV-000001-delta
17:41:37: Adding /vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/W2k12r2-Term-SRV/W2k12r2-Term-SRV-000001.vmdk
17:41:54: Gathering output from /bin/ls -la /vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/W2k3-Terminal-SRV
17:41:54: Gathering output from /bin/ls -la /vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/vyatta_virt
17:41:54: Gathering output from /bin/ls -la /vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/Windows 2003 Server
17:41:54: Gathering output from /bin/ls -la /vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/W2k12r2-SRV
17:41:54: Gathering output from /bin/ls -la /vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/W2k12r2-Term-SRV
17:41:54: Gathering output from /usr/sbin/localcli vm process list
17:47:08: Done.
Please attach this file when submitting an incident report.
To file a support incident, go to http://www.vmware.com/support/sr/sr_login.jsp
To see the files collected, run: tar -tzf '/var/tmp/esx-esxi.olivia.net-2014-01-02--17.39.tgz'
~ #

OK, I ran the command below in the ESXi CLI:

vm-support -p

The whole tgz file is almost 500 MB.

Which file do you want? I will host it and provide the URL here.

Regards, and I'm waiting for a quick reply.
 
I'm looking for a file with the name
vmkernel-zdump

It should be in /var/core of that folder.
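
A rough sketch for checking a vm-support bundle for that file without unpacking all 500 MB, assuming the dump is stored in the .tgz under a name starting with vmkernel-zdump. The function name and the command-line usage are made up for illustration; only the Python standard library `tarfile` module is used.

```python
import posixpath
import sys
import tarfile


def find_core_dumps(bundle_path, prefix="vmkernel-zdump"):
    """Return member paths in a .tgz bundle whose file name starts with `prefix`."""
    with tarfile.open(bundle_path, "r:gz") as bundle:
        return [
            name
            for name in bundle.getnames()
            if posixpath.basename(name).startswith(prefix)
        ]


if __name__ == "__main__":
    # Usage (hypothetical): python find_dumps.py <bundle.tgz> [<bundle.tgz> ...]
    for path in sys.argv[1:]:
        for member in find_core_dumps(path):
            print(member)
```

If nothing is listed, the dump may only exist on the host itself, or in the bundle as fragments under a different name.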
 
I can look at it. I honestly like having the entire file so I can analyze deeper, and I also have to rebuild the core from the fragments. Any chance you can get me that file through Dropbox or a file downloader, so I can extract the log and analyze the files?
 
OK, thank you very much. In about 20 minutes I will finish uploading the zip file to the server; after that I will send you the URL by private message.
 
I'm still waiting for results from Dasaint :)

In the meantime I tried to do some investigation by myself.

I was looking for the vmkernel-zdump files.

/vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/scratch/core # ls -la
total 149520
drwxr-xr-x 1 root root 840 Jan 4 13:48 .
drwxr-xr-x 1 root root 980 Dec 18 17:47 ..
-rw-r--r-- 1 root root 83123846 Jan 2 16:47 vmkernel-zdump.1
-rw-r--r-- 1 root root 66725097 Jan 4 12:17 vmkernel-zdump.2
/vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/scratch/core #


I decided to extract the logs from them:

/vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/scratch/core # vmkdump_extract -l vmkernel-zdump.1

/vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/scratch/core # vmkdump_extract -l vmkernel-zdump.2

/vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/scratch/core # ls -la
total 149520
drwxr-xr-x 1 root root 840 Jan 4 13:48 .
drwxr-xr-x 1 root root 980 Dec 18 17:47 ..
-rw------- 1 root root 970149 Jan 4 13:48 vmkernel-log.1
-rw------- 1 root root 310366 Jan 4 13:48 vmkernel-log.2

-rw-r--r-- 1 root root 83123846 Jan 2 16:47 vmkernel-zdump.1
-rw-r--r-- 1 root root 66725097 Jan 4 12:17 vmkernel-zdump.2
/vmfs/volumes/52964d59-67ac7e62-3fcd-002590d5501a/scratch/core #


Using WinSCP I downloaded the logs vmkernel-log.1 and vmkernel-log.2 to my computer,

and this is what I found

in vmkernel-log.1:

2014-01-02T02:33:51.694Z cpu5:32784)r13=0x1 r14=0x41238041d3e0 r15=0x41238041d434
2014-01-02T02:33:51.695Z cpu5:32784)pcpu:0 world:270890 name:"vmm0:W2k12r2-SRV" (V)
2014-01-02T02:33:51.695Z cpu5:32784)pcpu:1 world:93478 name:"vmm2:Lync_Server" (V)
2014-01-02T02:33:51.695Z cpu5:32784)pcpu:2 world:93477 name:"vmm1:Lync_Server" (V)
2014-01-02T02:33:51.695Z cpu5:32784)pcpu:3 world:261823 name:"vmm0:W2k3-Terminal-SRV" (V)
2014-01-02T02:33:51.695Z cpu5:32784)pcpu:4 world:93479 name:"vmm3:Lync_Server" (V)
2014-01-02T02:33:51.695Z cpu5:32784)pcpu:5 world:32784 name:"netCoalesce2World" (S)
2014-01-02T02:33:51.695Z cpu5:32784)pcpu:6 world:920562 name:"vmm1:WXP-P2P" (V)
2014-01-02T02:33:51.695Z cpu5:32784)pcpu:7 world:93475 name:"vmm0:Lync_Server" (V)
2014-01-02T02:33:51.695Z cpu5:32784)@BlueScreen: #PF Exception 14 in world 32784:netCoalesce2 IP 0x418016210c57 addr 0x0
PTEs:0x161988023;0x161989023;0x16198a023;0x0;
2014-01-02T02:33:51.695Z cpu5:32784)Code start: 0x418016000000 VMK uptime: 14:08:36:01.268
2014-01-02T02:33:51.695Z cpu5:32784)0x41238041d4d0:[0x418016210c57]E1000PollRxRing@vmkernel#nover+0xb73 stack: 0x41238041d500
2014-01-02T02:33:51.695Z cpu5:32784)0x41238041d540:[0x418016213bb5]E1000DevRx@vmkernel#nover+0x3a9 stack: 0x1
2014-01-02T02:33:51.696Z cpu5:32784)0x41238041d5e0:[0x418016192164]IOChain_Resume@vmkernel#nover+0x174 stack: 0x41087941c540
2014-01-02T02:33:51.696Z cpu5:32784)0x41238041d630:[0x418016179e22]PortOutput@vmkernel#nover+0x136 stack: 0x41087938e780
2014-01-02T02:33:51.696Z cpu5:32784)0x41238041d690:[0x41801672cf58]EtherswitchForwardLeafPortsQuick@<None>#<None>+0x4c stack: 0x4e78
2014-01-02T02:33:51.697Z cpu5:32784)0x41238041d8b0:[0x41801672df51]EtherswitchPortDispatch@<None>#<None>+0xe25 stack: 0x418000000014
2014-01-02T02:33:51.697Z cpu5:32784)0x41238041d920:[0x41801617a7d2]Port_InputResume@vmkernel#nover+0x192 stack: 0x412e84a32780
2014-01-02T02:33:51.697Z cpu5:32784)0x41238041d970:[0x41801617ba39]Port_Input_Committed@vmkernel#nover+0x25 stack: 0x410842b91980
2014-01-02T02:33:51.698Z cpu5:32784)0x41238041d9e0:[0x41801621763a]E1000DevAsyncTx@vmkernel#nover+0x112 stack: 0x41238041db60
2014-01-02T02:33:51.698Z cpu5:32784)0x41238041da50:[0x4180161add70]NetWorldletPerVMCB@vmkernel#nover+0x218 stack: 0x9
2014-01-02T02:33:51.698Z cpu5:32784)0x41238041dbb0:[0x4180160eae77]WorldletProcessQueue@vmkernel#nover+0xcf stack: 0x1
2014-01-02T02:33:51.699Z

What does it mean that the BlueScreen is reported in world:32784 name:"netCoalesce2World" (S)?


in vmkernel-log.2


2014-01-04T11:50:18.946Z cpu3:35137)rax=0x0 rbx=0x83 rcx=0x3
2014-01-04T11:50:18.946Z cpu3:35137)rdx=0x2 rbp=0x4123a505d328 rsi=0x0
2014-01-04T11:50:18.946Z cpu3:35137)rdi=0x410867b45a40 r8=0x0 r9=0x0
2014-01-04T11:50:18.946Z cpu3:35137)r10=0x4110e7e84080 r11=0x4 r12=0x41000ca1cef4
2014-01-04T11:50:18.946Z cpu3:35137)r13=0x1 r14=0x4123a505d238 r15=0x4123a505d28c
2014-01-04T11:50:18.946Z cpu3:35137)pcpu:0 world:35209 name:"vmm3:Lync_Server" (V)
2014-01-04T11:50:18.947Z cpu3:35137)pcpu:1 world:35067 name:"vmm0:W2k3-Terminal-SRV" (V)
2014-01-04T11:50:18.947Z cpu3:35137)pcpu:2 world:35205 name:"vmm0:Lync_Server" (V)
2014-01-04T11:50:18.947Z cpu3:35137)pcpu:3 world:35137 name:"vmx" (U)
2014-01-04T11:50:18.947Z cpu3:35137)pcpu:4 world:35207 name:"vmm1:Lync_Server" (V)
2014-01-04T11:50:18.947Z cpu3:35137)pcpu:5 world:51251 name:"vmm0:WXP-P2P" (V)
2014-01-04T11:50:18.947Z cpu3:35137)pcpu:6 world:35139 name:"vmm0:W2k12r2-SRV" (V)
2014-01-04T11:50:18.947Z cpu3:35137)pcpu:7 world:35208 name:"vmm2:Lync_Server" (V)
2014-01-04T11:50:18.947Z cpu3:35137)@BlueScreen: #PF Exception 14 in world 35137:vmx IP 0x418019410c57 addr 0x0
PTEs:0x124710027;0x122926027;0x0;
2014-01-04T11:50:18.947Z cpu3:35137)Code start: 0x418019200000 VMK uptime: 1:01:41:14.520
2014-01-04T11:50:18.947Z cpu3:35137)0x4123a505d328:[0x418019410c57]E1000PollRxRing@vmkernel#nover+0xb73 stack: 0x4123a505d358
2014-01-04T11:50:18.947Z cpu3:35137)0x4123a505d398:[0x418019413bb5]E1000DevRx@vmkernel#nover+0x3a9 stack: 0x1
2014-01-04T11:50:18.948Z cpu3:35137)0x4123a505d438:[0x418019392164]IOChain_Resume@vmkernel#nover+0x174 stack: 0x0
2014-01-04T11:50:18.948Z cpu3:35137)0x4123a505d488:[0x418019379e22]PortOutput@vmkernel#nover+0x136 stack: 0x41087938e780
2014-01-04T11:50:18.948Z cpu3:35137)0x4123a505d4e8:[0x41801992cf58]EtherswitchForwardLeafPortsQuick@<None>#<None>+0x4c stack: 0xab93
2014-01-04T11:50:18.949Z cpu3:35137)0x4123a505d708:[0x41801992df51]EtherswitchPortDispatch@<None>#<None>+0xe25 stack: 0x14
2014-01-04T11:50:18.949Z cpu3:35137)0x4123a505d778:[0x41801937a7d2]Port_InputResume@vmkernel#nover+0x192 stack: 0x412e853d86c0
2014-01-04T11:50:18.949Z cpu3:35137)0x4123a505d7c8:[0x41801937ba39]Port_Input_Committed@vmkernel#nover+0x25 stack: 0x410842b1f0d0
2014-01-04T11:50:18.950Z cpu3:35137)0x4123a505d838:[0x41801941763a]E1000DevAsyncTx@vmkernel#nover+0x112 stack: 0x4123a505d9b8
2014-01-04T11:50:18.950Z cpu3:35137)0x4123a505d8a8:[0x4180193add70]NetWorldletPerVMCB@vmkernel#nover+0x218 stack: 0x417fd92ee0e0
2014-01-04T11:50:18.950Z cpu3:35137)0x4123a505da08:[0x4180192eae77]WorldletProcessQueue@vmkernel#nover+0xcf stack: 0x1
2014-01-04T11:50:18.951Z cpu3:35137)0x4123a505da48:[0x4180192eb93c]WorldletBHHandler@vmkernel#nover+0x54 stack: 0x125b0022d8092
2014-01-04T11:50:18.951Z cpu3:35137)0x4123a505dab8:[0x41801922e5b9]BH_Check@vmkernel#nover+0xc9 stack: 0x0
2014-01-04T11:50:18.951Z cpu3:35137)0x4123a505db28:[0x41801944e72d]CpuSchedIdleLoopInt@vmkernel#nover+0x391 stack: 0x2
2014-01-04T11:50:18.951Z cpu3:35137)0x4123a505dc88:[0x418019454930]CpuSchedDispatch@vmkernel#nover+0x1630 stack: 0x41110138d060
2014-01-04T11:50:18.952Z cpu3:35137)0x4123a505dcf8:[0x418019455c65]CpuSchedWait@vmkernel#nover+0x245 stack: 0x1
2014-01-04T11:50:18.952Z cpu3:35137)0x4123a505dd98:[0x4180192dc46e]WorldWaitInt@vmkernel#nover+0x2c6 stack: 0x410800000200
2014-01-04T11:50:18.952Z

What does it mean that the BlueScreen is reported in world:35137 name:"vmx" (U)?

What is the root cause?

Regards
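
One way to compare the two dumps side by side is to reduce each log to its exception, the faulting world, the instruction pointer relative to "Code start", and the top backtrace frame. The sketch below does that for excerpts copied from the two logs above. The regular expressions are assumptions fitted to these quoted lines, not a documented log format, and the function and constant names are made up.

```python
import re

# Patterns fitted to the log lines quoted in this thread (an assumption).
BLUESCREEN = re.compile(
    r"@BlueScreen: (.+?) in world (\d+):(\S+) IP (0x[0-9a-f]+) addr (0x[0-9a-f]+)"
)
CODE_START = re.compile(r"Code start: (0x[0-9a-f]+)")
FRAME = re.compile(r"\)0x[0-9a-f]+:\[0x[0-9a-f]+\](\w+)@\S+?\+(0x[0-9a-f]+) stack:")

LOG_1 = """\
2014-01-02T02:33:51.695Z cpu5:32784)@BlueScreen: #PF Exception 14 in world 32784:netCoalesce2 IP 0x418016210c57 addr 0x0
2014-01-02T02:33:51.695Z cpu5:32784)Code start: 0x418016000000 VMK uptime: 14:08:36:01.268
2014-01-02T02:33:51.695Z cpu5:32784)0x41238041d4d0:[0x418016210c57]E1000PollRxRing@vmkernel#nover+0xb73 stack: 0x41238041d500
"""

LOG_2 = """\
2014-01-04T11:50:18.947Z cpu3:35137)@BlueScreen: #PF Exception 14 in world 35137:vmx IP 0x418019410c57 addr 0x0
2014-01-04T11:50:18.947Z cpu3:35137)Code start: 0x418019200000 VMK uptime: 1:01:41:14.520
2014-01-04T11:50:18.947Z cpu3:35137)0x4123a505d328:[0x418019410c57]E1000PollRxRing@vmkernel#nover+0xb73 stack: 0x4123a505d358
"""


def summarize_crash(log_text):
    """Reduce one extracted vmkernel log to the fields worth comparing."""
    panic = BLUESCREEN.search(log_text)
    start = CODE_START.search(log_text)
    if panic is None or start is None:
        return None
    frames = [(m.group(1), m.group(2)) for m in FRAME.finditer(log_text)]
    ip_offset = int(panic.group(4), 16) - int(start.group(1), 16)
    return {
        "exception": panic.group(1),
        "world": panic.group(3),
        "fault_addr": panic.group(5),
        "ip_offset": hex(ip_offset),  # position of the fault inside the kernel image
        "top_frame": frames[0] if frames else None,
    }


if __name__ == "__main__":
    for name, text in (("vmkernel-log.1", LOG_1), ("vmkernel-log.2", LOG_2)):
        print(name, summarize_crash(text))
```

For these excerpts both crashes reduce to the same picture: a #PF Exception 14 on address 0x0 at offset 0x210c57 from the code start, with E1000PollRxRing+0xb73 as the top frame. Only the world that happened to be running (netCoalesce2 versus vmx) differs.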
 
Get me those two log files; that's what I was really going to pull out anyway.

The link you sent me didn't work.

Are any of your VMs running E1000 or E1000E NICs? I have seen something like this before. The way to fix it is to use VMXNET3 everywhere. I saw you are running Server 2012, so it has to be E1000E or VMXNET3. Based on what I see in your logs, this looks like a match (it affects all versions of 5.x). In this case your faulty DIMM might not be the cause of the PSODs. This one has popped up a lot internally and should be fixed in the next major update (an update, not a patch).

http://kb.vmware.com/selfservice/mi...nguage=en_US&cmd=displayKC&externalId=2059053
 
Based on the logs it looks like you are hitting that issue. Change all your VMs over to VMXNET3 and see if your issue stops. If it continues, I would be surprised to see the PSOD still matching this issue.

To resolve this issue, all VMs need to be using VMXNET3 (no E1000 or E1000E on the hosts at all). That should get you all repaired up!
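
A small sketch for auditing which adapters still need changing, assuming each VM's .vmx file records the adapter type in an `ethernetN.virtualDev = "..."` line. Adapters without such a line (left at the guest default) are not reported, which is a limitation of this sketch. The sample text and the function name are hypothetical.

```python
import re

# Assumed .vmx syntax: ethernet0.virtualDev = "e1000"
ADAPTER_TYPE = re.compile(
    r'^\s*(ethernet\d+)\.virtualDev\s*=\s*"([^"]*)"',
    re.MULTILINE | re.IGNORECASE,
)


def non_vmxnet3_adapters(vmx_text):
    """Return {adapter: type} for adapters whose virtualDev is not vmxnet3."""
    return {
        match.group(1): match.group(2)
        for match in ADAPTER_TYPE.finditer(vmx_text)
        if match.group(2).lower() != "vmxnet3"
    }


if __name__ == "__main__":
    # Hypothetical .vmx fragment, not taken from any VM in this thread.
    sample = (
        'ethernet0.virtualDev = "e1000"\n'
        'ethernet1.virtualDev = "vmxnet3"\n'
        'ethernet2.virtualDev = "e1000e"\n'
    )
    print(non_vmxnet3_adapters(sample))
```

An empty result for every VM on the host would mean no explicitly configured E1000 or E1000E adapters are left.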
 
Hey

Yes, I have E1000 on ALL VMs.

Could you tell me something more about the differences between VMXNET3 and E1000?
 
VMXNET3 is VMware's implementation of a paravirtual NIC. It can go up to 10 Gb speeds, so VMs on the same host can talk to each other at 10 Gb speeds, because the traffic flows within the same virtual switch backplane (unless VLANs and routing are involved and you don't have a virtual router).

E1000 emulates an Intel NIC, so the guest uses the stock Intel driver that has been in use since the dawn of whenever :p Usually it's a tried and true NIC; sadly, in this case a big bad bug got it! E1000 is also limited in what it can do from within the VM, unless you are using non-stock drivers to handle advanced features such as guest VLAN tagging and other functions.

In the past it kind of went Flexible -> E1000 -> VMXNET3, and in VMware's world VMXNET3 is the king of the drivers. It is continually updated via VMware Tools, so we put a high standard on delivering good drivers in most cases.

See this KB, which explains it all:
http://kb.vmware.com/selfservice/mi...nguage=en_US&cmd=displayKC&externalId=1001805
 
I just checked. I have:

1 x "Flexible" network adapter on Windows XP VM
9 x E1000 on Vyatta virtual router VM
1 x E1000E on Windows Server 2012 R2 VM
2 x VMXNET on CentOS VM
4 x E1000E on Windows Server 2012 R2 VM
1 x E1000 on Windows Server 2003 R2 VM
 
From all I see there, the one I am concerned about is the Vyatta router, depending on the version. The old releases used to have issues, but I think they fixed that in Core 6.4 with the VMXNET3 support.

XP with Flexible (very old school) should be on VMXNET anyway, and the same goes for Server 2k3. The adapter might show up at a different speed in the guest; don't worry about that, as 10 Gb interfaces didn't really exist in the XP days, but it still works great! [ http://kb.vmware.com/selfservice/mi..._1_1&dialogID=136506672&stateId=0 0 136516516 ]

If you were running 5.0 or 5.1, you would have hit the E1000E bug with Server 2012 (http://kb.vmware.com/selfservice/mi...nguage=en_US&cmd=displayKC&externalId=2058692)

VMXNET is awesome, with native support in CentOS.
 
You should be good then to go all VMXNET3. You can also do what they recommended and disable RSS in Windows, but I'm a bigger fan of using VMXNET3 over that.
 
I just decided to redo the whole network with VMXNET3. I like a flexible, scalable solution and I don't want any more PSODs :)

Dasaint, thanks for the huge help :) You are the man :)

Cheers
 