2008-08-24 22:51:30

by John Kacur

Subject: 2.6.26.3-rt3 bug report

I haven't seen this one before.

Bad page state in process 'firefox-bin'
page:ffffe20001011f00 flags:0x0100000000000000
mapping:ffffe20001011f18 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Pid: 4180, comm: firefox-bin Tainted: G W 2.6.26.3-rt3 #3

Call Trace:
[<ffffffff80287a72>] bad_page+0x6f/0x9d
[<ffffffff8029184c>] ? inc_zone_page_state+0x5f/0x6b
[<ffffffff80289046>] free_hot_cold_page+0x88/0x1e6
[<ffffffff8028920b>] free_hot_page+0x10/0x12
[<ffffffff8028922a>] __free_pages+0x1d/0x26
[<ffffffff80294b95>] __pte_alloc+0x8e/0x99
[<ffffffff80294cff>] handle_mm_fault+0x15f/0x766
[<ffffffff8045f2d6>] ? rt_mutex_down_read_trylock+0x1ee/0x1f9
[<ffffffff804629d9>] do_page_fault+0x51d/0x92c
[<ffffffff8021251c>] ? native_sched_clock+0x2a/0x72
[<ffffffff8027c930>] ? ftrace_now+0x9/0xb
[<ffffffff80281548>] ? tracing_hist_preempt_start+0xf1/0x10d
[<ffffffff8021251c>] ? native_sched_clock+0x2a/0x72
[<ffffffff8027c930>] ? ftrace_now+0x9/0xb
[<ffffffff804602f9>] error_exit+0x0/0x56

---------------------------
| preempt count: 00000000 ]
| 0-level deep critical section nesting:
----------------------------------------


2008-08-25 23:15:20

by John Kacur

Subject: Re: 2.6.26.3-rt3 bug report

On Mon, Aug 25, 2008 at 12:51 AM, John Kacur <[email protected]> wrote:
> I haven't seen this one before.
>
> Bad page state in process 'firefox-bin'
> [..]
>

Slightly different form today.

Bad page state in process 'firefox-bin'
page:ffffe20000daa360 flags:0x0100000000000000
mapping:ffffe20000daa378 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Pid: 16955, comm: firefox-bin Not tainted 2.6.26.3-rt3 #3

Call Trace:
[<ffffffff80287a72>] bad_page+0x6f/0x9d
[<ffffffff8029184c>] ? inc_zone_page_state+0x5f/0x6b
[<ffffffff80289046>] free_hot_cold_page+0x88/0x1e6
[<ffffffff8028920b>] free_hot_page+0x10/0x12
[<ffffffff8028922a>] __free_pages+0x1d/0x26
[<ffffffff80294b95>] __pte_alloc+0x8e/0x99
[<ffffffff80294cff>] handle_mm_fault+0x15f/0x766
[<ffffffff8045f2d6>] ? rt_mutex_down_read_trylock+0x1ee/0x1f9
[<ffffffff804629d9>] do_page_fault+0x51d/0x92c
[<ffffffff8027f57a>] ? trace_hardirqs_off+0x11d/0x136
[<ffffffff8045feeb>] ? __spin_unlock_irqrestore+0x29/0x4d
[<ffffffff8045feeb>] ? __spin_unlock_irqrestore+0x29/0x4d
[<ffffffff8023168b>] ? hrtick_set+0x8f/0x100
[<ffffffff8021251c>] ? native_sched_clock+0x2a/0x72
[<ffffffff8027c930>] ? ftrace_now+0x9/0xb
[<ffffffff80281a97>] ? tracing_hist_preempt_stop+0x2cb/0x2f5
[<ffffffff8020c94d>] ? retint_swapgs+0xe/0x13
[<ffffffff8045f52c>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[<ffffffff804602f9>] error_exit+0x0/0x56

---------------------------
| preempt count: 00000000 ]
| 0-level deep critical section nesting:
----------------------------------------

2008-08-26 04:15:29

by Jon Masters

Subject: Re: 2.6.26.3-rt3 bug report

On Tue, 2008-08-26 at 01:15 +0200, John Kacur wrote:

> Slightly different form today.
>
> Bad page state in process 'firefox-bin'
> page:ffffe20000daa360 flags:0x0100000000000000
> mapping:ffffe20000daa378 mapcount:0 count:0
> Trying to fix it up, but a reboot is needed
> Backtrace:
> Pid: 16955, comm: firefox-bin Not tainted 2.6.26.3-rt3 #3

Er, does firefox run reliably on a non-RT 2.6.26.3 kernel? This many
random calls to bad_page suggests more of a RAM problem.

Jon.

2008-08-26 05:51:12

by Carsten Emde

Subject: Re: 2.6.26.3-rt3 bug report

Jon Masters wrote:
> On Tue, 2008-08-26 at 01:15 +0200, John Kacur wrote:
>> Slightly different form today.
>> Bad page state in process 'firefox-bin'
>> page:ffffe20000daa360 flags:0x0100000000000000
>> mapping:ffffe20000daa378 mapcount:0 count:0
>> Trying to fix it up, but a reboot is needed
>> Backtrace:
>> Pid: 16955, comm: firefox-bin Not tainted 2.6.26.3-rt3 #3
> Er, does firefox run reliably on a non-RT 2.6.26.3 kernel? This many
> random calls to bad_page suggests more of a RAM problem.
Don't think so. Same problem here:
Bad page state in process 'bonobo-activati'
page:c18f66fc flags:0x40000000 mapping:c18f670c mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Pid: 4169, comm: bonobo-activati Not tainted 2.6.26.3-rt3 #2
[<c0438e52>] ? printk+0x14/0x1a
[<c026e8ef>] bad_page+0x4e/0x79
[<c026f3d3>] free_hot_cold_page+0x5b/0x1dc
[<c026f5a1>] free_hot_page+0xf/0x11
[<c026f5c3>] __free_pages+0x20/0x2b
[<c02789a0>] __pte_alloc+0x6f/0x77
[<c0278a4d>] handle_mm_fault+0xa5/0x56c
[<c0243a1c>] ? rt_mutex_down_read+0x15c/0x164
[<c0218759>] do_page_fault+0x2c1/0x674
[<c021b21d>] ? enqueue_task+0x5a/0x66
[<c043bb66>] ? __spin_unlock_irqrestore+0x24/0x42
[<c0241e4f>] ? rt_mutex_adjust_prio+0x1a/0x31
[<c043bc92>] ? __spin_lock_irqsave+0x1e/0x38
[<c021f37d>] ? try_to_wake_up+0x176/0x180
[<c043bb5a>] ? __spin_unlock_irqrestore+0x18/0x42
[<c0241e4f>] ? rt_mutex_adjust_prio+0x1a/0x31
[<c0241e61>] ? rt_mutex_adjust_prio+0x2c/0x31
[<c0241e61>] ? rt_mutex_adjust_prio+0x2c/0x31
[<c043a8f7>] ? rt_write_slowunlock+0x1cd/0x1d5
[<c027ccb0>] ? mprotect_fixup+0x238/0x282
[<c0259384>] ? audit_syscall_exit+0x2b6/0x2d1
[<c0204a4e>] ? resume_userspace+0x6/0x1c
[<c0218498>] ? do_page_fault+0x0/0x674
[<c043bf02>] error_code+0x72/0x78

--cbe

2008-08-29 14:29:22

by John Kacur

Subject: Re: 2.6.26.3-rt3 bug report

On Tue, Aug 26, 2008 at 7:49 AM, Carsten Emde <[email protected]> wrote:
> Jon Masters wrote:
>> On Tue, 2008-08-26 at 01:15 +0200, John Kacur wrote:
>>> Slightly different form today.
>>> Bad page state in process 'firefox-bin'
>>> page:ffffe20000daa360 flags:0x0100000000000000
>>> mapping:ffffe20000daa378 mapcount:0 count:0
>>> Trying to fix it up, but a reboot is needed
>>> Backtrace:
>>> Pid: 16955, comm: firefox-bin Not tainted 2.6.26.3-rt3 #3
>> Er, does firefox run reliably on a non-RT 2.6.26.3 kernel? This many
>> random calls to bad_page suggests more of a RAM problem.
> Don't think so. Same problem here:
> Bad page state in process 'bonobo-activati'
> page:c18f66fc flags:0x40000000 mapping:c18f670c mapcount:0 count:0
> [..]
>

Hi Carsten

Any progress or ideas here?

Thank you

John

2008-08-30 14:31:29

by Carsten Emde

Subject: Re: 2.6.26.3-rt3 bug report

Hi John,

> Carsten Emde wrote:
>> Jon Masters wrote:
>>> John Kacur wrote:
>>>> Slightly different form today.
>>>> Bad page state in process 'firefox-bin'
>>>> page:ffffe20000daa360 flags:0x0100000000000000
>>>> mapping:ffffe20000daa378 mapcount:0 count:0
>>>> Trying to fix it up, but a reboot is needed
>>>> Backtrace:
>>>> Pid: 16955, comm: firefox-bin Not tainted 2.6.26.3-rt3 #3
>>> Er, does firefox run reliably on a non-RT 2.6.26.3 kernel? This many
>>> random calls to bad_page suggests more of a RAM problem.
>> Don't think so. Same problem here:
>> Bad page state in process 'bonobo-activati'
>> page:c18f66fc flags:0x40000000 mapping:c18f670c mapcount:0 count:0
>> [..]
>
> Hi Carsten
> Any progress or ideas here?
No. Admittedly, we have not yet started to work on the 2.6.26 RT
tree. Finishing the 2.6.24 RT tree was quite difficult and took somewhat
longer than earlier trees, so we now really need to use it and to
integrate it into the various industrial projects that have waited long
enough for it. This will certainly keep us busy for the next few weeks.

Unfortunately, the "Bad page state" is not the only regression.
Currently, we know of the following problems in 2.6.26.3-rt3:
- Bad page state
- Tasks blocked for more than 120 seconds
- Various kernel OOPSes
- Increased worst-case latencies as compared to 2.6.24.7-rt17
The problem with these regressions is that they occur only very rarely,
and only under specific high-load conditions. So we will probably
first need to work on test conditions that make them happen more
frequently. The "Bad page state" problem appears especially difficult to
fix, since it is probably due to memory corruption.

Nevertheless, we will keep checking this and future releases of the
2.6.26 RT tree and work on them as time permits, but we do not expect
2.6.26.X-rtY to be ready for production before October. Meanwhile, if you
find a way to reproduce the "Bad page state" message more reliably, we will
certainly appreciate it and use it here.


Thanks,

Carsten.