2007-01-05 14:20:40

by Tomasz Kvarsin

Subject: [BUG] 2.6.20-rc3-mm1: can not mount root

I cannot boot my machine with 2.6.20-rc3-mm1 or 2.6.20-rc2-mm1.
I did a binary search; the patch below causes this bug:
$quilt top
patches/sched-improve-sched_clock-on-i686.patch

Backtrace, which I got by connecting gdb to the machine:

_raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:108
108 for (i = 0; i < loops; i++) {
(gdb) bt
#0 _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:108
#1 0xc056ac42 in _spin_lock (lock=0xc06c0c60) at kernel/spinlock.c:182
#2 0xc011c3bb in vprintk (fmt=0xc0649c00 "<0>BUG: spinlock %s on CPU#%d, %s/%d\n", args=0xc1167a84 "") at kernel/printk.c:534
#3 0xc011c6c7 in printk (fmt=0xc0649c00 "<0>BUG: spinlock %s on CPU#%d, %s/%d\n") at kernel/printk.c:508
#4 0xc027be42 in spin_bug (lock=0xc06c0c60, msg=0xc065fc00 "recursion") at lib/spinlock_debug.c:61
#5 0xc027c178 in _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:79
#6 0xc056ac42 in _spin_lock (lock=0xc06c0c60) at kernel/spinlock.c:182
#7 0xc011c3bb in vprintk (fmt=0xc0626ed0 "<1>BUG: unable to handle kernel paging request", args=0xc1167b8c "") at kernel/printk.c:534
#8 0xc011c6c7 in printk (fmt=0xc0626ed0 "<1>BUG: unable to handle kernel paging request") at kernel/printk.c:508
#9 0xc0116de4 in do_page_fault (regs=0xc1167bcc, error_code=0) at arch/i386/mm/fault.c:555
#10 0xc056b11c in page_fault ()
#11 0xc0808160 in ?? ()
#12 0xc0626ed0 in kallsyms_token_index ()
#13 0xc1167cac in ?? ()
#14 0x00000001 in ?? ()
#15 0xc0808163 in printk_buf.19225 ()
#16 0xc1167c0c in ?? ()
#17 0x00000000 in ?? ()


2007-01-05 17:43:05

by Tomasz Kvarsin

Subject: Re: [BUG] 2.6.20-rc3-mm1: can not mount root

Some details:
the system hangs after it prints:
VFS: Mounted root (ext3 filesystem) readonly.
If I connect gdb to the machine, I see the trace that I included in my
previous mail.
My kernel config is attached.

If you need more info, please ask.

On 1/5/07, Tomasz Kvarsin <[email protected]> wrote:
> I can not boot machine with 2.6.20-rc3-mm1 and 2.6.20-rc2-mm1.
> I made binary search, patch bellow cause this bug:
> $quilt top
> patches/sched-improve-sched_clock-on-i686.patch
>
> backtrace which I got by connecting "gdb" to machine:
>
> _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:108
> 108 for (i = 0; i < loops; i++) {
> (gdb) bt
> #0 _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:108
> #1 0xc056ac42 in _spin_lock (lock=0xc06c0c60) at kernel/spinlock.c:182
> #2 0xc011c3bb in vprintk (fmt=0xc0649c00 "<0>BUG: spinlock %s on
> CPU#%d, %s/%d\n",
> args=0xc1167a84 "") at kernel/printk.c:534
> #3 0xc011c6c7 in printk (fmt=0xc0649c00 "<0>BUG: spinlock %s on
> CPU#%d, %s/%d\n")
> at kernel/printk.c:508
> #4 0xc027be42 in spin_bug (lock=0xc06c0c60, msg=0xc065fc00
> "recursion") at lib/spinlock_debug.c:61
> #5 0xc027c178 in _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:79
> #6 0xc056ac42 in _spin_lock (lock=0xc06c0c60) at kernel/spinlock.c:182
> #7 0xc011c3bb in vprintk (fmt=0xc0626ed0 "<1>BUG: unable to handle
> kernel paging request",
> args=0xc1167b8c "") at kernel/printk.c:534
> #8 0xc011c6c7 in printk (fmt=0xc0626ed0 "<1>BUG: unable to handle
> kernel paging request")
> at kernel/printk.c:508
> #9 0xc0116de4 in do_page_fault (regs=0xc1167bcc, error_code=0) at
> arch/i386/mm/fault.c:555
> #10 0xc056b11c in page_fault ()
> #11 0xc0808160 in ?? ()
> #12 0xc0626ed0 in kallsyms_token_index ()
> #13 0xc1167cac in ?? ()
> #14 0x00000001 in ?? ()
> #15 0xc0808163 in printk_buf.19225 ()
> #16 0xc1167c0c in ?? ()
> #17 0x00000000 in ?? ()
>


Attachments:
.config (42.12 kB)

2007-01-05 18:50:55

by Andrew Morton

Subject: Re: [BUG] 2.6.20-rc3-mm1: can not mount root

On Fri, 5 Jan 2007 17:20:38 +0300
"Tomasz Kvarsin" <[email protected]> wrote:

> I can not boot machine with 2.6.20-rc3-mm1 and 2.6.20-rc2-mm1.
> I made binary search, patch bellow cause this bug:
> $quilt top
> patches/sched-improve-sched_clock-on-i686.patch
>
> backtrace which I got by connecting "gdb" to machine:

How did you manage to use gdb on an i386 kernel? Using qemu or something?

> _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:108
> 108 for (i = 0; i < loops; i++) {
> (gdb) bt
> #0 _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:108
> #1 0xc056ac42 in _spin_lock (lock=0xc06c0c60) at kernel/spinlock.c:182
> #2 0xc011c3bb in vprintk (fmt=0xc0649c00 "<0>BUG: spinlock %s on
> CPU#%d, %s/%d\n",
> args=0xc1167a84 "") at kernel/printk.c:534
> #3 0xc011c6c7 in printk (fmt=0xc0649c00 "<0>BUG: spinlock %s on
> CPU#%d, %s/%d\n")
> at kernel/printk.c:508
> #4 0xc027be42 in spin_bug (lock=0xc06c0c60, msg=0xc065fc00
> "recursion") at lib/spinlock_debug.c:61
> #5 0xc027c178 in _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:79
> #6 0xc056ac42 in _spin_lock (lock=0xc06c0c60) at kernel/spinlock.c:182
> #7 0xc011c3bb in vprintk (fmt=0xc0626ed0 "<1>BUG: unable to handle
> kernel paging request",
> args=0xc1167b8c "") at kernel/printk.c:534
> #8 0xc011c6c7 in printk (fmt=0xc0626ed0 "<1>BUG: unable to handle
> kernel paging request")
> at kernel/printk.c:508
> #9 0xc0116de4 in do_page_fault (regs=0xc1167bcc, error_code=0) at
> arch/i386/mm/fault.c:555
> #10 0xc056b11c in page_fault ()
> #11 0xc0808160 in ?? ()
> #12 0xc0626ed0 in kallsyms_token_index ()
> #13 0xc1167cac in ?? ()
> #14 0x00000001 in ?? ()
> #15 0xc0808163 in printk_buf.19225 ()
> #16 0xc1167c0c in ?? ()
> #17 0x00000000 in ?? ()

It looks like the machine was trying to oops, only it gets stuck on
logbuf_lock. Perhaps it hit an oops while running printk_clock() inside
vprintk() then tried to go recursive.

oopses while holding logbuf_lock are rare, and appear to be fatal. Perhaps
we should ignore logbuf_lock if oops_in_progress, but the chances are we'll
just hit the same oops again..

Do you have "time" on the kernel boot command line? If so, does removing
that option make the hang go away?
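
The recursion described above can be sketched in user space. This is only a toy model under stated assumptions, not kernel code: `toy_lock`, `toy_printk`, `toy_logbuf_lock` and `toy_oops_in_progress` are hypothetical names standing in for the printk path, logbuf_lock and oops_in_progress, and the lock reports failure where a real spinlock would spin forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct toy_lock {
    bool held;
};

static struct toy_lock toy_logbuf_lock = { false };
static bool toy_oops_in_progress;

/* Non-recursive lock: returns false instead of spinning, so the
 * model can report the hang rather than reproduce it. */
static bool toy_trylock(struct toy_lock *lock)
{
    if (lock->held)
        return false;
    lock->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *lock)
{
    assert(lock->held);
    lock->held = false;
}

/* Returns false where the real code would spin forever on the lock.
 * fault_inside models a fault taken while the lock is held;
 * bypass_on_oops models "ignore the lock if an oops is in progress". */
static bool toy_printk(int cpu, const char *msg,
                       bool fault_inside, bool bypass_on_oops)
{
    bool own_lock = toy_trylock(&toy_logbuf_lock);
    bool ok = true;

    if (!own_lock && !(bypass_on_oops && toy_oops_in_progress))
        return false; /* recursion on a lock we already hold */

    if (fault_inside) {
        /* The fault handler reports the fault through the same path. */
        toy_oops_in_progress = true;
        ok = toy_printk(cpu, "BUG: unable to handle kernel paging request",
                        false, bypass_on_oops);
        toy_oops_in_progress = false;
    }
    if (ok)
        printf("cpu%d: %s\n", cpu, msg);
    if (own_lock)
        toy_unlock(&toy_logbuf_lock);
    return ok;
}
```

Calling `toy_printk(0, "hello", true, false)` from a `main()` of your own models the reported hang and returns false; passing `true` for `bypass_on_oops` lets the nested report through. The model does not capture the caveat raised above: if the fault itself recurs on every attempt, bypassing the lock only repeats the same oops.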

2007-01-05 19:41:28

by Tomasz Kvarsin

Subject: Re: [BUG] 2.6.20-rc3-mm1: can not mount root

On 1/5/07, Andrew Morton <[email protected]> wrote:
> On Fri, 5 Jan 2007 17:20:38 +0300
> "Tomasz Kvarsin" <[email protected]> wrote:
>
> > I can not boot machine with 2.6.20-rc3-mm1 and 2.6.20-rc2-mm1.
> > I made binary search, patch bellow cause this bug:
> > $quilt top
> > patches/sched-improve-sched_clock-on-i686.patch
> >
> > backtrace which I got by connecting "gdb" to machine:
>
> How did you manage to use gdb on an i386 kernel? Using qemu or something?
>

Yes, I use qemu.

> > _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:108
> > 108 for (i = 0; i < loops; i++) {
> > (gdb) bt
> > #0 _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:108
> > #1 0xc056ac42 in _spin_lock (lock=0xc06c0c60) at kernel/spinlock.c:182
> > #2 0xc011c3bb in vprintk (fmt=0xc0649c00 "<0>BUG: spinlock %s on
> > CPU#%d, %s/%d\n",
> > args=0xc1167a84 "") at kernel/printk.c:534
> > #3 0xc011c6c7 in printk (fmt=0xc0649c00 "<0>BUG: spinlock %s on
> > CPU#%d, %s/%d\n")
> > at kernel/printk.c:508
> > #4 0xc027be42 in spin_bug (lock=0xc06c0c60, msg=0xc065fc00
> > "recursion") at lib/spinlock_debug.c:61
> > #5 0xc027c178 in _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:79
> > #6 0xc056ac42 in _spin_lock (lock=0xc06c0c60) at kernel/spinlock.c:182
> > #7 0xc011c3bb in vprintk (fmt=0xc0626ed0 "<1>BUG: unable to handle
> > kernel paging request",
> > args=0xc1167b8c "") at kernel/printk.c:534
> > #8 0xc011c6c7 in printk (fmt=0xc0626ed0 "<1>BUG: unable to handle
> > kernel paging request")
> > at kernel/printk.c:508
> > #9 0xc0116de4 in do_page_fault (regs=0xc1167bcc, error_code=0) at
> > arch/i386/mm/fault.c:555
> > #10 0xc056b11c in page_fault ()
> > #11 0xc0808160 in ?? ()
> > #12 0xc0626ed0 in kallsyms_token_index ()
> > #13 0xc1167cac in ?? ()
> > #14 0x00000001 in ?? ()
> > #15 0xc0808163 in printk_buf.19225 ()
> > #16 0xc1167c0c in ?? ()
> > #17 0x00000000 in ?? ()
>
> It looks like the machine was trying to oops, only it gets stuck on
> logbuf_lock. Perhaps it hit an oops while running printk_clock() inside
> vprintk() then tried to go recursive.
>
> oopses while holding logbuf_lock are rare, and appear to be fatal. Perhaps
> we should ignore logbuf_lock if oops_in_progress, but the chances are we'll
> just hit the same oops again..
>
> Do you have "time" on the kernel boot command line? If so, does removing
> that option make the hang go away?
>

No, I have no "time" option.
I use grub; the config looks like:
kernel /boot/kernel root=/dev/hda1 init=/linuxrc

2007-01-30 08:12:24

by Andrew Morton

Subject: Re: [BUG] 2.6.20-rc3-mm1: can not mount root

On Fri, 5 Jan 2007 17:20:38 +0300
"Tomasz Kvarsin" <[email protected]> wrote:

> I can not boot machine with 2.6.20-rc3-mm1 and 2.6.20-rc2-mm1.
> I made binary search, patch bellow cause this bug:
> $quilt top
> patches/sched-improve-sched_clock-on-i686.patch
>
> backtrace which I got by connecting "gdb" to machine:
>
> _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:108
> 108 for (i = 0; i < loops; i++) {
> (gdb) bt
> #0 _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:108
> #1 0xc056ac42 in _spin_lock (lock=0xc06c0c60) at kernel/spinlock.c:182
> #2 0xc011c3bb in vprintk (fmt=0xc0649c00 "<0>BUG: spinlock %s on
> CPU#%d, %s/%d\n",
> args=0xc1167a84 "") at kernel/printk.c:534
> #3 0xc011c6c7 in printk (fmt=0xc0649c00 "<0>BUG: spinlock %s on
> CPU#%d, %s/%d\n")
> at kernel/printk.c:508
> #4 0xc027be42 in spin_bug (lock=0xc06c0c60, msg=0xc065fc00
> "recursion") at lib/spinlock_debug.c:61
> #5 0xc027c178 in _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:79
> #6 0xc056ac42 in _spin_lock (lock=0xc06c0c60) at kernel/spinlock.c:182
> #7 0xc011c3bb in vprintk (fmt=0xc0626ed0 "<1>BUG: unable to handle
> kernel paging request",
> args=0xc1167b8c "") at kernel/printk.c:534
> #8 0xc011c6c7 in printk (fmt=0xc0626ed0 "<1>BUG: unable to handle
> kernel paging request")
> at kernel/printk.c:508
> #9 0xc0116de4 in do_page_fault (regs=0xc1167bcc, error_code=0) at
> arch/i386/mm/fault.c:555
> #10 0xc056b11c in page_fault ()
> #11 0xc0808160 in ?? ()
> #12 0xc0626ed0 in kallsyms_token_index ()
> #13 0xc1167cac in ?? ()
> #14 0x00000001 in ?? ()
> #15 0xc0808163 in printk_buf.19225 ()
> #16 0xc1167c0c in ?? ()
> #17 0x00000000 in ?? ()

Tomasz, is this still happening in 2.6.20-rc6-mm3?

err. We merged that patch. So perhaps 2.6.20-rc6 now crashes in the same
manner?

2007-01-30 08:24:31

by Ingo Molnar

Subject: Re: [BUG] 2.6.20-rc3-mm1: can not mount root


* Andrew Morton <[email protected]> wrote:

> Tomasz, is this still happening in 2.6.20-rc6-mm3?
>
> err. We merged that patch. So perhaps 2.6.20-rc6 now crashes in the
> same manner?

no, we haven't merged that patch yet, but it's:

x86_64-mm-improve-sched_clock-on-i686.patch

I bet this is due to Qemu simulating a CPU that does not truly exist.
I'll try to reproduce this.

Ingo

2007-01-30 08:32:05

by Ingo Molnar

Subject: Re: [BUG] 2.6.20-rc3-mm1: can not mount root


* Ingo Molnar <[email protected]> wrote:

> I'll try to reproduce this.

cannot see the crash in qemu - i suspect it's .config dependent. Tomasz,
could you send me the .config you used?

Ingo

2007-01-30 09:04:43

by Tomasz Kvarsin

Subject: Re: [BUG] 2.6.20-rc3-mm1: can not mount root

On 1/30/07, Andrew Morton <[email protected]> wrote:
> On Fri, 5 Jan 2007 17:20:38 +0300
> "Tomasz Kvarsin" <[email protected]> wrote:
>
> > I can not boot machine with 2.6.20-rc3-mm1 and 2.6.20-rc2-mm1.
> > I made binary search, patch bellow cause this bug:
> > $quilt top
> > patches/sched-improve-sched_clock-on-i686.patch
> >
> > backtrace which I got by connecting "gdb" to machine:
> >
> > _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:108
> > 108 for (i = 0; i < loops; i++) {
> > (gdb) bt
> > #0 _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:108
> > #1 0xc056ac42 in _spin_lock (lock=0xc06c0c60) at kernel/spinlock.c:182
> > #2 0xc011c3bb in vprintk (fmt=0xc0649c00 "<0>BUG: spinlock %s on
> > CPU#%d, %s/%d\n",
> > args=0xc1167a84 "") at kernel/printk.c:534
> > #3 0xc011c6c7 in printk (fmt=0xc0649c00 "<0>BUG: spinlock %s on
> > CPU#%d, %s/%d\n")
> > at kernel/printk.c:508
> > #4 0xc027be42 in spin_bug (lock=0xc06c0c60, msg=0xc065fc00
> > "recursion") at lib/spinlock_debug.c:61
> > #5 0xc027c178 in _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:79
> > #6 0xc056ac42 in _spin_lock (lock=0xc06c0c60) at kernel/spinlock.c:182
> > #7 0xc011c3bb in vprintk (fmt=0xc0626ed0 "<1>BUG: unable to handle
> > kernel paging request",
> > args=0xc1167b8c "") at kernel/printk.c:534
> > #8 0xc011c6c7 in printk (fmt=0xc0626ed0 "<1>BUG: unable to handle
> > kernel paging request")
> > at kernel/printk.c:508
> > #9 0xc0116de4 in do_page_fault (regs=0xc1167bcc, error_code=0) at
> > arch/i386/mm/fault.c:555
> > #10 0xc056b11c in page_fault ()
> > #11 0xc0808160 in ?? ()
> > #12 0xc0626ed0 in kallsyms_token_index ()
> > #13 0xc1167cac in ?? ()
> > #14 0x00000001 in ?? ()
> > #15 0xc0808163 in printk_buf.19225 ()
> > #16 0xc1167c0c in ?? ()
> > #17 0x00000000 in ?? ()
>
> Tomasz, is this still happening in 2.6.20-rc6-mm3?
>

I have no idea. I tried to test 2.6.20-rc6-mm2, but
the build failed; I will try -mm3.

> err. We merged that patch. So perhaps 2.6.20-rc6 now crashes in the same
> manner?
>
>

Actually, there is a fix from Vivek Goyal:
>
> -int tsc_disable __cpuinitdata = 0;
> +int tsc_disable = 0;

If you merge his patch as well, all should be fine.
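
One plausible reading of why dropping `__cpuinitdata` helps, which this thread does not confirm: without CPU hotplug support the annotation places the variable in init data, which the kernel releases late in boot, so a later read of `tsc_disable` can fault. The sketch below models that in user space on a POSIX system. `toy_init_flag`, `toy_normal_flag`, `toy_boot`, `toy_free_initmem` and `toy_read_faults` are hypothetical names, and `mprotect(PROT_NONE)` merely stands in for the release of init memory.

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile int *toy_init_flag; /* models a flag tagged __cpuinitdata */
static volatile int toy_normal_flag; /* models plain `int tsc_disable = 0;` */
static size_t toy_page_size;
static sigjmp_buf toy_fault_jmp;

static void toy_on_fault(int sig)
{
    (void)sig;
    siglongjmp(toy_fault_jmp, 1);
}

/* Place the "init" flag on its own page, as if it lived in init data. */
static bool toy_boot(void)
{
    void *page;

    toy_page_size = (size_t)sysconf(_SC_PAGESIZE);
    page = mmap(NULL, toy_page_size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return false;
    toy_init_flag = page;
    *toy_init_flag = 0;
    return true;
}

/* Stand-in for releasing init memory: the page becomes inaccessible. */
static bool toy_free_initmem(void)
{
    return mprotect((void *)toy_init_flag, toy_page_size, PROT_NONE) == 0;
}

/* Returns true if reading *p raises SIGSEGV or SIGBUS. */
static bool toy_read_faults(volatile int *p)
{
    struct sigaction sa, old_segv, old_bus;
    bool faulted = false;

    sa.sa_handler = toy_on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    if (sigsetjmp(toy_fault_jmp, 1) == 0) {
        int value = *p; /* volatile read: faults if the page is gone */
        (void)value;
    } else {
        faulted = true;
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

In this model, a read of `toy_init_flag` succeeds after `toy_boot()` and faults after `toy_free_initmem()`, while `toy_normal_flag` stays readable throughout. That timing would be consistent with a hang shortly after the root filesystem is mounted, but it remains an assumption here.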

2007-01-30 14:45:46

by Tomasz Kvarsin

Subject: Re: [BUG] 2.6.20-rc3-mm1: can not mount root

On 1/30/07, Tomasz Kvarsin <[email protected]> wrote:
> On 1/30/07, Andrew Morton <[email protected]> wrote:
> > On Fri, 5 Jan 2007 17:20:38 +0300
> > "Tomasz Kvarsin" <[email protected]> wrote:
> >
> > > I can not boot machine with 2.6.20-rc3-mm1 and 2.6.20-rc2-mm1.
> > > I made binary search, patch bellow cause this bug:
> > > $quilt top
> > > patches/sched-improve-sched_clock-on-i686.patch
> > >
> > > backtrace which I got by connecting "gdb" to machine:
> > >
> > > _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:108
> > > 108 for (i = 0; i < loops; i++) {
> > > (gdb) bt
> > > #0 _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:108
> > > #1 0xc056ac42 in _spin_lock (lock=0xc06c0c60) at kernel/spinlock.c:182
> > > #2 0xc011c3bb in vprintk (fmt=0xc0649c00 "<0>BUG: spinlock %s on
> > > CPU#%d, %s/%d\n",
> > > args=0xc1167a84 "") at kernel/printk.c:534
> > > #3 0xc011c6c7 in printk (fmt=0xc0649c00 "<0>BUG: spinlock %s on
> > > CPU#%d, %s/%d\n")
> > > at kernel/printk.c:508
> > > #4 0xc027be42 in spin_bug (lock=0xc06c0c60, msg=0xc065fc00
> > > "recursion") at lib/spinlock_debug.c:61
> > > #5 0xc027c178 in _raw_spin_lock (lock=0xc06c0c60) at lib/spinlock_debug.c:79
> > > #6 0xc056ac42 in _spin_lock (lock=0xc06c0c60) at kernel/spinlock.c:182
> > > #7 0xc011c3bb in vprintk (fmt=0xc0626ed0 "<1>BUG: unable to handle
> > > kernel paging request",
> > > args=0xc1167b8c "") at kernel/printk.c:534
> > > #8 0xc011c6c7 in printk (fmt=0xc0626ed0 "<1>BUG: unable to handle
> > > kernel paging request")
> > > at kernel/printk.c:508
> > > #9 0xc0116de4 in do_page_fault (regs=0xc1167bcc, error_code=0) at
> > > arch/i386/mm/fault.c:555
> > > #10 0xc056b11c in page_fault ()
> > > #11 0xc0808160 in ?? ()
> > > #12 0xc0626ed0 in kallsyms_token_index ()
> > > #13 0xc1167cac in ?? ()
> > > #14 0x00000001 in ?? ()
> > > #15 0xc0808163 in printk_buf.19225 ()
> > > #16 0xc1167c0c in ?? ()
> > > #17 0x00000000 in ?? ()
> >
> > Tomasz, is this still happening in 2.6.20-rc6-mm3?
> >
>
> Have no idea, I tryied to test 2.6.20-rc6-mm2, but
> build failed, I will try -mm3.
>

2.6.20-rc6-mm3 seems to work ok.

2007-01-30 14:48:52

by Tomasz Kvarsin

Subject: Re: [BUG] 2.6.20-rc3-mm1: can not mount root

On 1/30/07, Ingo Molnar <[email protected]> wrote:
>
> * Andrew Morton <[email protected]> wrote:
>
> > Tomasz, is this still happening in 2.6.20-rc6-mm3?
> >
> > err. We merged that patch. So perhaps 2.6.20-rc6 now crashes in the
> > same manner?
>
> no, we havent merged that patch yet, but it's:
>
> x86_64-mm-improve-sched_clock-on-i686.patch
>
> I bet this is due to Qemu simulating a CPU that does not truly exist.

I use qemu+kqemu, so as I understand it, almost all code executes on my real CPU.

> i suspect it's .config dependent. Tomasz,
> could you send me the .config you used?

I thought you agreed that
this causes the problem:
> -int tsc_disable __cpuinitdata = 0;
                   ^^^^^^^^^^^^^