How to read vmkdumps?


So the other day a hypervisor PSOD’d on me. The infamous purple screen of death!

A reboot made me believe it was fixed, but it really wasn’t: the PSOD happened again!
So I rebooted the hypervisor for the third time and did the following to get the vmkdumps, analyze them, and make sure the issue was fixed. In my experience, PSODs are usually caused by hardware issues.
1. Reboot ESX 4.1 Host
2. Once it was up, I tried downloading the dump from vCenter, but that did not work; it was annoyingly slow. So command line to the rescue.
3. A PSOD generates an ESX kernel dump, which is stored in the /root directory. It is named ‘vmkernel-zdump-<reversed date>.#.#.#’.
4. Remember, vmkdump is no longer used in ESX 4, so use esxcfg-dumppart --log DUMPFILE or esxcfg-dumppart -L DUMPFILE.
5. This will create a vmkernel-log.1 file.
6. In that log you will see:

@BlueScreen: Hardware (Machine) Error: Internal Unclassified Error. PCPU47 in world 4143:idle47
0:01:25:06.951 cpu47:4143)Code start: 0x41800fa00000 VMK uptime: 0:01:25:06.951
0:01:25:06.952 cpu47:4143)0x417f8017fd38:[0x41800fc9ed96]Power_HaltPCPU@vmkernel:nover+0x27d stack: 0x417f8017fde8
0:01:25:06.952 cpu47:4143)0x417f8017fe48:[0x41800fbce7fe]CpuSchedIdleLoopInt@vmkernel:nover+0x985 stack: 0x417f8017fe88
0:01:25:06.953 cpu47:4143)0x417f8017fe58:[0x41800fbd3fce]CpuSched_IdleLoop@vmkernel:nover+0x15 stack: 0x2f
0:01:25:06.953 cpu47:4143)0x417f8017fe88:[0x41800fa32c57]Init_SlaveIdle@vmkernel:nover+0x11e stack: 0x0
0:01:25:06.954 cpu47:4143)0x417f8017ffe8:[0x41800fca5668]SMPSlaveIdle@vmkernel:nover+0x45f stack: 0x0
0:01:25:06.964 cpu47:4143)FSbase:0x0 GSbase:0x41804bc00000 kernelGSbase:0x0
0:01:25:06.965 cpu47:4143)MC:PCPU17 B:5 S:0xfa00000000400405 M:0x180 A:0x0 5
7. Googling says that the last line, 0:01:25:06.965 cpu47:4143)MC:PCPU17 B:5 S:0xfa00000000400405 M:0x180 A:0x0 5, means the CPU raised a machine check exception (MCE). The Machine Check Architecture is the mechanism the CPU uses to detect and report hardware issues. VMware has a KB article about MCE exceptions that goes into more detail.

8. So we replaced the CPU and all was good.
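
If you want to look at the MC line yourself instead of just Googling it, here is a rough Python sketch that pulls the fields out of that line and decodes the status value. It makes a few assumptions that are not from the log itself: that B, S, M and A stand for bank, status, misc and address, and that S is the raw IA32_MCi_STATUS register laid out as in the Intel SDM (flag bits 63 to 57, MCA error code in bits 15:0). The names parse_mc_line and decode_status are made up for this example, and AMD CPUs use a different layout, so treat the result as a hint, not a diagnosis.

```python
import re

# Shape of the "MC:" line as seen in vmkernel-log.1 above.
# Assumption: B = bank, S = status, M = misc, A = address.
MC_LINE = re.compile(
    r"MC:PCPU(?P<pcpu>\d+)\s+B:(?P<bank>\d+)"
    r"\s+S:(?P<status>0x[0-9a-fA-F]+)"
    r"\s+M:(?P<misc>0x[0-9a-fA-F]+)"
    r"\s+A:(?P<addr>0x[0-9a-fA-F]+)"
)

# Flag bits of IA32_MCi_STATUS (assumed Intel layout).
STATUS_FLAGS = {
    "VAL": 63,    # register contains valid error information
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # the misc (M:) value is valid
    "ADDRV": 58,  # the address (A:) value is valid
    "PCC": 57,    # processor context may be corrupt
}


def parse_mc_line(line):
    """Return the MC fields of a log line as integers, or None if absent."""
    match = MC_LINE.search(line)
    if match is None:
        return None
    return {
        "pcpu": int(match.group("pcpu")),
        "bank": int(match.group("bank")),
        "status": int(match.group("status"), 16),
        "misc": int(match.group("misc"), 16),
        "addr": int(match.group("addr"), 16),
    }


def decode_status(status):
    """Split a 64-bit status value into flag booleans and error codes."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    decoded["mca_error_code"] = status & 0xFFFF
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF
    # Intel encodes "internal unclassified" errors as 0000 01xx xxxx xxxx.
    decoded["internal_unclassified"] = (status & 0xFC00) == 0x0400
    return decoded
```

Feeding the MC line from step 7 through parse_mc_line and then decode_status gives PCPU 17, bank 5, with VAL, OVER, UC, EN, MISCV and PCC set, ADDRV clear, and an MCA error code of 0x0405. That falls in the internal unclassified range, which lines up with the "Internal Unclassified Error" text on the PSOD, and an uncorrected error with possibly corrupt processor context fits a hardware fault.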

Now you know

Comment if you have any questions or want to debate!
