2009-01-27 23:10:51

by Roger Larsson

[permalink] [raw]
Subject: PROBLEM: in_atomic() misuse all over the place

(resent, non html)

[1.] One line summary of the problem:
in_atomic() is used to select path of execution, most likely to suppress
warning logs when running a preemptive kernel.

[2.] Full description of the problem/report:

The problem is that in_atomic() only works on preemptive kernels...

Result: preemptive kernel warns about a potential deadlock in all kernels,
developer silence with the help of in_atomic()
bug remains in SMP/UP kernels...

[3.] Keywords (i.e., modules, networking, kernel):
in_atomic(), preemptive, sleeping

[4.] Kernel version (from /proc/version):
2.6.28.2 (review)

[5.] Output of Oops..

"sleeping function called from invalid context"

[6.] A small shell script or example program which triggers the
problem (if possible)

Small example of the strange code I found is this

file: include/net/sock.h

static inline gfp_t gfp_any(void)
{
return in_atomic() ? GFP_ATOMIC : GFP_KERNEL;
}

// I can find no comment on why this strange code would be OK.
// An it is in its turn used in several drivers...

If the processor can get to this function while atomic in a preemptive kernel
it will likely be able to get there in other kernels as well, especially SMP.

It returns GFP_ATOMIC when being atomic in a preemptive kernel, allocating
with that flag will not sleep. But calling this with another kernel will
ALWAYS return GFP_KERNEL, and it may cause a sleeping allocate...

Other places that I think misuse of in_atomic is Arch code, USB code, ...

There are 77 uses of in_atomic directly (grep | wc)
36 in ./arch
23 in ./drivers
2 in ./sound/core/seq/seq_virmidi.c
1 in ./include/net/sock.h
The remaining 15 are probably OK (kernel, mm/filemap)

But the sock.h gfp_any() spreads further...
1 in ./Documentation
2 in ./drivers
9 in ./net


Suggestion: should in_atomic() that by default return the safer 1 instead of
the unsafe 0. And add an in_atomic_warn() with default of 0 to use where it is
OK - like when deciding to log a warning.

/RogerL


2009-01-28 00:12:19

by Robert Hancock

[permalink] [raw]
Subject: Re: PROBLEM: in_atomic() misuse all over the place

Roger Larsson wrote:
> (resent, non html)
>
> [1.] One line summary of the problem:
> in_atomic() is used to select path of execution, most likely to suppress
> warning logs when running a preemptive kernel.
>
> [2.] Full description of the problem/report:
>
> The problem is that in_atomic() only works on preemptive kernels...
>
> Result: preemptive kernel warns about a potential deadlock in all kernels,
> developer silence with the help of in_atomic()
> bug remains in SMP/UP kernels...
>
> [3.] Keywords (i.e., modules, networking, kernel):
> in_atomic(), preemptive, sleeping
>
> [4.] Kernel version (from /proc/version):
> 2.6.28.2 (review)
>
> [5.] Output of Oops..
>
> "sleeping function called from invalid context"
>
> [6.] A small shell script or example program which triggers the
> problem (if possible)
>
> Small example of the strange code I found is this
>
> file: include/net/sock.h
>
> static inline gfp_t gfp_any(void)
> {
> return in_atomic() ? GFP_ATOMIC : GFP_KERNEL;
> }
>
> // I can find no comment on why this strange code would be OK.
> // An it is in its turn used in several drivers...
>
> If the processor can get to this function while atomic in a preemptive kernel
> it will likely be able to get there in other kernels as well, especially SMP.
>
> It returns GFP_ATOMIC when being atomic in a preemptive kernel, allocating
> with that flag will not sleep. But calling this with another kernel will
> ALWAYS return GFP_KERNEL, and it may cause a sleeping allocate...
>
> Other places that I think misuse of in_atomic is Arch code, USB code, ...
>
> There are 77 uses of in_atomic directly (grep | wc)
> 36 in ./arch
> 23 in ./drivers
> 2 in ./sound/core/seq/seq_virmidi.c
> 1 in ./include/net/sock.h
> The remaining 15 are probably OK (kernel, mm/filemap)
>
> But the sock.h gfp_any() spreads further...
> 1 in ./Documentation
> 2 in ./drivers
> 9 in ./net
>
>
> Suggestion: should in_atomic() that by default return the safer 1 instead of
> the unsafe 0. And add an in_atomic_warn() with default of 0 to use where it is
> OK - like when deciding to log a warning.

I would think that the code that's using it for this purpose should be
changed to do things differently, such as by changing the functions
using it to make their caller pass in the proper GFP mask. I don't think
it was ever intended to be used to select allocation behavior like this,
only for debug warning checks and such.. Getting rid of in_atomic() and
creating a in_atomic_warn() that just raises a warning if called
atomically, might be the best long-term solution.

2009-01-28 12:19:25

by Andi Kleen

[permalink] [raw]
Subject: Re: PROBLEM: in_atomic() misuse all over the place

Roger Larsson <[email protected]> writes:

[cc netdev since network code is discussed]

> [2.] Full description of the problem/report:
>
> The problem is that in_atomic() only works on preemptive kernels...

That's not fully correct. It works in tasklets/softirqs/spin_lock_bh
on all kernels.

> file: include/net/sock.h
>
> static inline gfp_t gfp_any(void)
> {
> return in_atomic() ? GFP_ATOMIC : GFP_KERNEL;
> }

That's typically for softirq vs non softirq, which is important
for the network stack.

That said its undoubtedly possible that some users get
it wrong and assume it applies to spinlocks (without _bh) too.
But at least the classical uses in the network stack should be ok.

I personally would consider anyone using it inside a normal
spinlock dubious anyways, because GFP_ATOMIC is the wrong
thing to use here. The correct way is to either switch to a mutex/sem
or move the allocation outside the lock.

-Andi

--
[email protected] -- Speaking for myself only.

2009-01-31 00:04:06

by Andrew Morton

[permalink] [raw]
Subject: Re: PROBLEM: in_atomic() misuse all over the place

On Wed, 28 Jan 2009 13:18:50 +0100
Andi Kleen <[email protected]> wrote:

> > file: include/net/sock.h
> >
> > static inline gfp_t gfp_any(void)
> > {
> > return in_atomic() ? GFP_ATOMIC : GFP_KERNEL;
> > }
>
> That's typically for softirq vs non softirq, which is important
> for the network stack.
>

There's a bit of a problem here. If someone accidentally uses
gfp_any() inside a spinlock, it will do a sleeping allocation on
non-preempt kernels and will do an atomic allocation on preemptible
kernels, so we won't get to see the warning which would allow us to fix
the bug.

Would using irq_count() work? If so, that would fix this up.

2009-01-31 05:39:21

by Andi Kleen

[permalink] [raw]
Subject: Re: PROBLEM: in_atomic() misuse all over the place

> There's a bit of a problem here. If someone accidentally uses
> gfp_any() inside a spinlock, it will do a sleeping allocation on
> non-preempt kernels and will do an atomic allocation on preemptible
> kernels, so we won't get to see the warning which would allow us to fix
> the bug.

Yes exporting the function to drivers is dangerous I agree because
it's easy to abuse.

> Would using irq_count() work? If so, that would fix this up.

There's nothing that works reliably to detect spinlocks on non
preempt kernels.

-Andi

--
[email protected] -- Speaking for myself only.

2009-01-31 05:50:51

by Andrew Morton

[permalink] [raw]
Subject: Re: PROBLEM: in_atomic() misuse all over the place

On Sat, 31 Jan 2009 06:55:08 +0100 Andi Kleen <[email protected]> wrote:

> > There's a bit of a problem here. If someone accidentally uses
> > gfp_any() inside a spinlock, it will do a sleeping allocation on
> > non-preempt kernels and will do an atomic allocation on preemptible
> > kernels, so we won't get to see the warning which would allow us to fix
> > the bug.
>
> Yes exporting the function to drivers is dangerous I agree because
> it's easy to abuse.
>
> > Would using irq_count() work? If so, that would fix this up.
>
> There's nothing that works reliably to detect spinlocks on non
> preempt kernels.

Hang on. You said

That's typically for softirq vs non softirq, which is important for
the network stack.

that's what in_softirq() does.

Now, if networking is indeed using in_atomic() to detect
are-we-inside-a-spinlock then networking is buggy.

If networking is _not_ doing that then we can safely switch it to
in_sortirq() or in_interrupt(). And this would reenable the bug
detection which networking's use of in_atomic() accidentally
suppressed.

2009-01-31 08:48:59

by David Miller

[permalink] [raw]
Subject: Re: PROBLEM: in_atomic() misuse all over the place

From: Andrew Morton <[email protected]>
Date: Fri, 30 Jan 2009 21:49:33 -0800

> Hang on. You said
>
> That's typically for softirq vs non softirq, which is important for
> the network stack.
>
> that's what in_softirq() does.
>
> Now, if networking is indeed using in_atomic() to detect
> are-we-inside-a-spinlock then networking is buggy.
>
> If networking is _not_ doing that then we can safely switch it to
> in_sortirq() or in_interrupt(). And this would reenable the bug
> detection which networking's use of in_atomic() accidentally
> suppressed.

I think this is a reasonable conclusion, looking at the
gfp_any() users.

Feel free to change it to use in_softirq() and see what
explodes in -mm. Report to me your findings :-)

2009-01-31 08:59:26

by Andrew Morton

[permalink] [raw]
Subject: Re: PROBLEM: in_atomic() misuse all over the place

On Sat, 31 Jan 2009 00:48:43 -0800 (PST) David Miller <[email protected]> wrote:

> From: Andrew Morton <[email protected]>
> Date: Fri, 30 Jan 2009 21:49:33 -0800
>
> > Hang on. You said
> >
> > That's typically for softirq vs non softirq, which is important for
> > the network stack.
> >
> > that's what in_softirq() does.
> >
> > Now, if networking is indeed using in_atomic() to detect
> > are-we-inside-a-spinlock then networking is buggy.
> >
> > If networking is _not_ doing that then we can safely switch it to
> > in_sortirq() or in_interrupt(). And this would reenable the bug
> > detection which networking's use of in_atomic() accidentally
> > suppressed.
>
> I think this is a reasonable conclusion, looking at the
> gfp_any() users.
>
> Feel free to change it to use in_softirq() and see what
> explodes in -mm. Report to me your findings :-)

I don't get much network coverage in my testing...

I went for in_interrupt(), which is in_softirq()||in_hardirq(). I
guess that was a bit of a cop-out if the design decision is that this
is purely for are-we-in-softirq decision making.

I'll set it to in_softirq() and shall see what happens..

2009-01-31 11:45:31

by Joerg Roedel

[permalink] [raw]
Subject: Re: PROBLEM: in_atomic() misuse all over the place

On Tue, Jan 27, 2009 at 06:12:00PM -0600, Robert Hancock wrote:
> I would think that the code that's using it for this purpose should be
> changed to do things differently, such as by changing the functions
> using it to make their caller pass in the proper GFP mask. I don't think
> it was ever intended to be used to select allocation behavior like this,
> only for debug warning checks and such.. Getting rid of in_atomic() and
> creating a in_atomic_warn() that just raises a warning if called
> atomically, might be the best long-term solution.

I also made the mistake of using in_atomic() wrong in one of my last
patch sets. In my case this was pointed out by the reviewers. Is there
some documentation in the kernel tree how and when in_atomic() is used
right? If not I think its worth writing a little file that explains the
important details about the correct use of in_atomic() :-)

Joerg