2011-06-12 03:10:25

by John Z. Bohach

Subject: Kernel bug in root=nfsroot codepath

There is a problem with S5 and nfsroot.

The kernel will NOT enter S5 if and only if root=nfsroot.

With the same rootfs burned to disk, it works fine. I've tracked this
down to the

__raw_notifier_call_chain()

function. The call sequence is:

kernel_power_off()
...
disable_nonboot_cpus()
_cpu_down()
...
cpu_die()
cpu_notify_nofail()
cpu_notify()
...
__cpu_notify()
__raw_notifier_call_chain()
notifier_call_chain()
?? strange second call from unknown location to:
__raw_notifier_call_chain()
then it just stops...
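For anyone less familiar with the code in the trace: a notifier chain is
essentially a linked list of callbacks that are called one after another.
The sketch below is a simplified userspace model written only for
illustration, not the kernel's implementation. `model_call_chain` and the
`cb_*` callbacks are hypothetical names, and only the `NOTIFY_*` values are
meant to mirror include/linux/notifier.h. It shows that the walk is
synchronous. If any callback reached from __cpu_notify() never returns,
no later notifier runs and _cpu_down() never returns either, which would
fit the trace simply stopping.

```c
#include <stddef.h>
#include <string.h>

/* Simplified userspace model of a raw notifier chain walk.
 * The constant values mirror include/linux/notifier.h, but this is
 * an illustration, NOT the kernel implementation. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

struct notifier_block {
    int (*notifier_call)(struct notifier_block *nb,
                         unsigned long action, void *data);
    struct notifier_block *next;
};

/* Call each block in list order. The walk is synchronous: a callback
 * that never returns stalls every notifier after it. */
static int model_call_chain(struct notifier_block *head,
                            unsigned long action, void *data,
                            int *nr_calls)
{
    int ret = NOTIFY_DONE;
    struct notifier_block *nb = head;

    while (nb != NULL) {
        /* Save the successor first, in case the callback unlinks itself. */
        struct notifier_block *next = nb->next;

        ret = nb->notifier_call(nb, action, data);
        if (nr_calls != NULL)
            (*nr_calls)++;
        /* Stop early if the callback set the stop bit. */
        if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
            break;
        nb = next;
    }
    return ret;
}

/* Hypothetical callbacks: each appends its name to the string in data. */
static int cb_a(struct notifier_block *nb, unsigned long action, void *data)
{
    (void)nb; (void)action;
    strcat((char *)data, "A");
    return NOTIFY_OK;
}

static int cb_b(struct notifier_block *nb, unsigned long action, void *data)
{
    (void)nb; (void)action;
    strcat((char *)data, "B");
    return NOTIFY_OK;
}

static int cb_c(struct notifier_block *nb, unsigned long action, void *data)
{
    (void)nb; (void)action;
    strcat((char *)data, "C");
    return NOTIFY_OK;
}

/* Asks the walk to stop after it, so later blocks are skipped. */
static int cb_stop(struct notifier_block *nb, unsigned long action, void *data)
{
    (void)nb; (void)action;
    strcat((char *)data, "S");
    return NOTIFY_OK | NOTIFY_STOP_MASK;
}
```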

The machine is not completely hung: after a few minutes I can see NFS
timeout messages (probably from interrupt context). However, no further
printk's appear, and the system stays unresponsive in this state until
it is physically reset.

This is the third time I'm posting this... any ideas? If I simply skip
the disable_nonboot_cpus() call, it powers down fine. This only
happens with root=nfsroot. It happens with multiple kernels, goes back
to at least 2.6.16, and is still present in today's latest kernel.

To try to duplicate this, boot an nfsroot-ed machine into run-level 1,
and run 'halt -n -d -f -p'.

Thanks,
John