2002-09-16 09:16:43

by Shawn Starr

Subject: BUG(): sched.c: Line 944 - 2.5.35


Machine: Athlon MP 2000+ 512MB DDR Registered
Motherboard: A7M266-D

Kernel 2.5.35:

code resides in main schedule() function:

	if (unlikely(in_atomic()))
		BUG();

:(

--
Shawn Starr, sh0n.net, <[email protected]>
Maintainer: -shawn kernel patches: http://xfs.sh0n.net/2.4/



2002-09-16 16:32:31

by Adam J. Richter

Subject: Re: BUG(): sched.c: Line 944 - 2.5.35

Shawn Starr wrote:

>Kernel 2.5.35:
>
>code resides in main schedule() function:
>
>if (unlikely(in_atomic()))
> BUG();


That line previously checked in_interrupt (in 2.5.34) instead of
in_atomic. If you have CONFIG_PREEMPT defined, the definition of in_atomic
in linux-2.5.35/include/asm-i386/hardirq.h is:

# define in_atomic() (preempt_count() != kernel_locked())

When I see this problem at boot, preempt_count() returns 0x4000000
(PREEMPT_ACTIVE) and kernel_locked() returns 0.
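
As a rough illustration of why the check fires with those values, here is a
minimal user-space sketch (not kernel code) of the in_atomic() definition
quoted above. struct task_model and model_in_atomic() are hypothetical
stand-ins for the kernel's per-task state and macro; only the comparison
itself and the PREEMPT_ACTIVE value come from this thread.

```c
#include <stdbool.h>

/* Value reported in this thread for i386 on 2.5.35. */
#define PREEMPT_ACTIVE 0x4000000u

/* Hypothetical stand-in for the state the kernel macros read. */
struct task_model {
	unsigned int preempt_count;	/* what preempt_count() would return */
	unsigned int kernel_locked;	/* what kernel_locked() would return */
};

/* Mirrors: # define in_atomic() (preempt_count() != kernel_locked()) */
static bool model_in_atomic(const struct task_model *t)
{
	return t->preempt_count != t->kernel_locked;
}
```

With preempt_count set to PREEMPT_ACTIVE and kernel_locked set to 0, the two
sides differ, so the model reports "atomic" and the BUG() in schedule() would
be reached.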

I don't understand the semantics of PREEMPT_ACTIVE well enough to know
whether to

(1) change the test back to using in_interrupt instead of in_atomic, or
(2) change the definition of in_atomic(), or
(3) look for a bug somewhere else.

However, I know experimentally that changing the test back to
using in_interrupt() results in a possibly unrelated BUG() at line 279
of rmap.c:

void page_remove_rmap(struct page * page, pte_t * ptep)
{
	pte_addr_t pte_paddr = ptep_to_paddr(ptep);
	struct pte_chain *pc;

	if (!page || !ptep)
		BUG();
	if (!pfn_valid(page_to_pfn(page)) || PageReserved(page))
		return;

	pte_chain_lock(page);

	BUG_ON(page->pte.direct == 0);



Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-09-16 18:37:14

by Robert Love

Subject: Re: BUG(): sched.c: Line 944 - 2.5.35

On Mon, 2002-09-16 at 12:36, Adam J. Richter wrote:

> When I see this problem at boot, preempt_count() returns 0x4000000
> (PREEMPT_ACTIVE) and kernel_locked() returns 0.
>
> I don't understand the semantics of PREEMPT_ACTIVE to know
> whether to
>
> (1) change the test back to using in_interrupt instead of in_atomic, or
> (2) change the definition of in_atomic(), or
> (3) look for a bug somewhere else.

There are two problems: First, PREEMPT_ACTIVE is indeed set on entry to
schedule() from preempt_schedule() so we need to check for that too.
Second, the BUG() is catching a number of issues... you want something
like:

-	if (unlikely(in_atomic()))
-		BUG();
+	if (unlikely(in_atomic() && preempt_count() != PREEMPT_ACTIVE)) {
+		printk(KERN_ERR "schedule() called while non-atomic!\n");
+		show_stack(NULL);
+	}

I will send a patch to Linus.
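
The effect of the extra PREEMPT_ACTIVE test can be sketched in user space as
follows. This is only a model under stated assumptions: model_should_warn() is
a hypothetical helper, the in_atomic() comparison is the CONFIG_PREEMPT
definition quoted earlier in the thread, and the PREEMPT_ACTIVE value is the
one reported there.

```c
#include <stdbool.h>

/* Value reported earlier in this thread for i386 on 2.5.35. */
#define PREEMPT_ACTIVE 0x4000000u

/*
 * Hypothetical helper: returns true when the patched condition
 *   in_atomic() && preempt_count() != PREEMPT_ACTIVE
 * would print the warning and dump the stack.
 */
static bool model_should_warn(unsigned int preempt_count,
			      unsigned int kernel_locked)
{
	/* Same comparison as the quoted in_atomic() definition. */
	bool in_atomic = preempt_count != kernel_locked;

	/* Entry from preempt_schedule() has exactly PREEMPT_ACTIVE set. */
	return in_atomic && preempt_count != PREEMPT_ACTIVE;
}
```

In this model, (PREEMPT_ACTIVE, 0) no longer triggers the warning, while a
nonzero count such as (1, 0) with no matching kernel lock still does.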

> However, I know experimentally that changing the test back to
> using in_interrupt() results in a possibly unrelated BUG() at line 279
> of rmap.c:

This is unrelated.

Robert Love

2002-09-16 22:43:50

by Luigi Genoni

Subject: Re: BUG(): sched.c: Line 944 - 2.5.35

OK, I will try this tomorrow morning

Luigi

On 16 Sep 2002, Robert Love wrote:

> Date: 16 Sep 2002 14:42:13 -0400
> From: Robert Love <[email protected]>
> To: Adam J. Richter <[email protected]>
> Cc: [email protected], [email protected], [email protected]
> Subject: Re: BUG(): sched.c: Line 944 - 2.5.35
>
> On Mon, 2002-09-16 at 12:36, Adam J. Richter wrote:
>
> > When I see this problem at boot, preempt_count() returns 0x4000000
> > (PREEMPT_ACTIVE) and kernel_locked() returns 0.
> >
> > I don't understand the semantics of PREEMPT_ACTIVE to know
> > whether to
> >
> > (1) change the test back to using in_interrupt instead of in_atomic, or
> > (2) change the definition of in_atomic(), or
> > (3) look for a bug somewhere else.
>
> There are two problems: First, PREEMPT_ACTIVE is indeed set on entry to
> schedule() from preempt_schedule() so we need to check for that too.
> Second, the BUG() is catching a bit of issues... you want something
> like:
>
> - if (unlikely(in_atomic()))
> - BUG();
> + if (unlikely(in_atomic() && preempt_count() != PREEMPT_ACTIVE)) {
> + printk(KERN_ERR "schedule() called while non-atomic!\n");
> + show_stack(NULL);
> + }
>
> I will send a patch to Linus.
>
> > However, I know experimentally that changing the test back to
> > using in_interrupt() results in a possibly unrelated BUG() at line 279
> > of rmap.c:
>
> This is unrelated.
>
> Robert Love
>