2006-12-24 21:26:07

by Adam J. Richter

[permalink] [raw]
Subject: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]

Under 2.6.20-rc1 and 2.6.20-rc2, I get the following complaint
for several network programs running on my system:

[ 156.381868] BUG: sleeping function called from invalid context at net/core/sock.c:1523
[ 156.381876] in_atomic():1, irqs_disabled():0
[ 156.381881] no locks held by kio_http/9693.
[ 156.381886] [<c01057a2>] show_trace_log_lvl+0x1a/0x2f
[ 156.381900] [<c0105dab>] show_trace+0x12/0x14
[ 156.381908] [<c0105e48>] dump_stack+0x16/0x18
[ 156.381917] [<c011e30f>] __might_sleep+0xe5/0xeb
[ 156.381926] [<c025942a>] lock_sock_nested+0x1d/0xc4
[ 156.381937] [<c01cc570>] selinux_netlbl_inode_permission+0x5a/0x8e
[ 156.381946] [<c01c2505>] selinux_file_permission+0x96/0x9b
[ 156.381954] [<c0175a0a>] vfs_write+0x8d/0x167
[ 156.381962] [<c017605a>] sys_write+0x3f/0x63
[ 156.381971] [<c01040c0>] syscall_call+0x7/0xb
[ 156.381980] =======================

I have 35 of these messages is my console log at the moment.
The only difference that I've noticed between the messages is
that they are for variety of processes: most for tor, xntpd, sendmail,
procmail. The processes get to this point by sys_write, sys_send, or
sys_sendto (procmail was doing a sys_sendto, so it was also doing something
related to networking, even though it is not a program one normally
would think of as doing any networking system calls).

My system seems to work OK even with these warning messages.
I can debug it futher. I just figure I should report it now, because
I may have done everyone a disservice by putting off reporting it in
rc1 in the hopes of finding time to debug it.

Adam Richter


2006-12-25 00:25:31

by Andrew Morton

[permalink] [raw]
Subject: Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]

On Mon, 25 Dec 2006 05:21:24 +0800
"Adam J. Richter" <[email protected]> wrote:

> Under 2.6.20-rc1 and 2.6.20-rc2, I get the following complaint
> for several network programs running on my system:
>
> [ 156.381868] BUG: sleeping function called from invalid context at net/core/sock.c:1523
> [ 156.381876] in_atomic():1, irqs_disabled():0
> [ 156.381881] no locks held by kio_http/9693.
> [ 156.381886] [<c01057a2>] show_trace_log_lvl+0x1a/0x2f
> [ 156.381900] [<c0105dab>] show_trace+0x12/0x14
> [ 156.381908] [<c0105e48>] dump_stack+0x16/0x18
> [ 156.381917] [<c011e30f>] __might_sleep+0xe5/0xeb
> [ 156.381926] [<c025942a>] lock_sock_nested+0x1d/0xc4
> [ 156.381937] [<c01cc570>] selinux_netlbl_inode_permission+0x5a/0x8e
> [ 156.381946] [<c01c2505>] selinux_file_permission+0x96/0x9b
> [ 156.381954] [<c0175a0a>] vfs_write+0x8d/0x167
> [ 156.381962] [<c017605a>] sys_write+0x3f/0x63
> [ 156.381971] [<c01040c0>] syscall_call+0x7/0xb
> [ 156.381980] =======================
>

There's a glaring bug in selinux_netlbl_inode_permission() - taking
lock_sock() inside rcu_read_lock().

I would again draw attention to Documentation/SubmitChecklist. In
particular please always always always enable all kernel debugging options
when developing and testing new kernel code. And everything else in that
file, too.

<guesses that this was tested on ia64>

2006-12-25 00:27:07

by Parag Warudkar

[permalink] [raw]
Subject: Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]

On Mon, 25 Dec 2006, Adam J. Richter wrote:

> Under 2.6.20-rc1 and 2.6.20-rc2, I get the following complaint
> for several network programs running on my system:
>
> [ 156.381868] BUG: sleeping function called from invalid context at net/core/sock.c:1523
> [ 156.381876] in_atomic():1, irqs_disabled():0
> [ 156.381881] no locks held by kio_http/9693.
> [ 156.381886] [<c01057a2>] show_trace_log_lvl+0x1a/0x2f
> [ 156.381900] [<c0105dab>] show_trace+0x12/0x14
> [ 156.381908] [<c0105e48>] dump_stack+0x16/0x18
> [ 156.381917] [<c011e30f>] __might_sleep+0xe5/0xeb
> [ 156.381926] [<c025942a>] lock_sock_nested+0x1d/0xc4
> [ 156.381937] [<c01cc570>] selinux_netlbl_inode_permission+0x5a/0x8e
> [ 156.381946] [<c01c2505>] selinux_file_permission+0x96/0x9b
> [ 156.381954] [<c0175a0a>] vfs_write+0x8d/0x167
> [ 156.381962] [<c017605a>] sys_write+0x3f/0x63
> [ 156.381971] [<c01040c0>] syscall_call+0x7/0xb
> [ 156.381980] =======================
>

lock_sock_nested can sleep, its BH counterpart doesn't.
selinux_netlbl_inode_permission() probably needs to use the BH counterpart
unconditionally. But I am not sure if that function is always called from an atomic context. Assuming it is, the
attached patch should fix this.

Compile tested.

Signed-off-by: Parag Warudkar <[email protected]>

Parag


Attachments:
services.c.patch (443.00 B)
selinux - use proper locking interfaces

2007-01-02 09:09:34

by Adam J. Richter

[permalink] [raw]
Subject: Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]

On Sun, Dec 24, 2006 at 04:25:11PM -0800, Andrew Morton wrote:
> On Mon, 25 Dec 2006 05:21:24 +0800
> "Adam J. Richter" <[email protected]> wrote:
>
>> Under 2.6.20-rc1 and 2.6.20-rc2, I get the following complaint
>> for several network programs running on my system:
>>
>> [ 156.381868] BUG: sleeping function called from invalid context at net/core/sock.c:1523
[...]
> There's a glaring bug in selinux_netlbl_inode_permission() - taking
> lock_sock() inside rcu_read_lock().
>
> I would again draw attention to Documentation/SubmitChecklist. In
> particular please always always always enable all kernel debugging options
> when developing and testing new kernel code. And everything else in that
> file, too.
>
> <guesses that this was tested on ia64>

I have not yet performed the 21 steps of
linux-2.6.20-rc3/Documentation/SubmitChecklist, which I think is a
great objectives list for future automation or some kind of community
web site. I hope to find time to make progress through that
checklist, but, in the meantime, I think the world may nevertheless be
infinitesmally better off if I post the patch that I'm currently
using that seems to fix the problem, seeing as how rc3 has passed
with no fix incorporated.

I think the intent of the offending code was to avoid doing
a lock_sock() in a presumably common case where there was no need to
take the lock. So, I have kept the presumably fast test to exit
early.

When it turns out to be necessary to take lock_sock(), RCU is
unlocked, then lock_sock is taken, the RCU is locked again, and
the test is repeated.

If I am wrong about lock_sock being expensive, I can
delete the lines that do the early return.

By the way, in a change not included in this patch,
I also tried consolidating the RCU locking in this file into a macro
IF_NLBL_REQUIRE(sksec, action), where "action" is the code
fragment to be executed with rcu_read_lock() held, although this
required splitting a couple of functions in half.

Anyhow, here is my current patch as MIME attachment.
Comments and labor in getting it through SubmitChecklist would
both be welcome.

Adam Richter


Attachments:
(No filename) (2.08 kB)
selinux.diff (621.00 B)
Download all attachments

2007-01-02 20:13:13

by Ingo Molnar

[permalink] [raw]
Subject: [patch] selinux: fix selinux_netlbl_inode_permission() locking


* Andrew Morton <[email protected]> wrote:

> There's a glaring bug in selinux_netlbl_inode_permission() - taking
> lock_sock() inside rcu_read_lock().

Note that the bug is still in -rc3, and is easily triggerable via a
default FC6 bootup. It's fixed by the (slightly modified) patch from
Parag Warudkar below that i have in the -rt tree.

Note that this bug became visible due to the recent __resched_legal()
fix, which bug made most of our atomicity debugging checks ineffective.
About half a dozen separate atomicity bugs triggered in -rt when i fixed
the __resched_legal() bug, so i'd expect some more to trigger upstream
too.

Ingo

------------------------>
Subject: [patch] selinux: fix selinux_netlbl_inode_permission() locking
From: Parag Warudkar <[email protected]>

do not call a sleeping lock API in an RCU read section.
lock_sock_nested can sleep, its BH counterpart doesn't.
selinux_netlbl_inode_permission() needs to use the BH counterpart
unconditionally.

Compile tested.

From: Ingo Molnar <[email protected]>

added BH disabling, because this function can be called from non-atomic
contexts too, so a naked bh_lock_sock() would be deadlock-prone.

Boot-tested the resulting kernel.

Signed-off-by: Parag Warudkar <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
security/selinux/ss/services.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux/security/selinux/ss/services.c
===================================================================
--- linux.orig/security/selinux/ss/services.c
+++ linux/security/selinux/ss/services.c
@@ -2660,9 +2660,11 @@ int selinux_netlbl_inode_permission(stru
rcu_read_unlock();
return 0;
}
- lock_sock(sock->sk);
+ local_bh_disable();
+ bh_lock_sock_nested(sock->sk);
rc = selinux_netlbl_socket_setsid(sock, sksec->sid);
- release_sock(sock->sk);
+ bh_unlock_sock(sock->sk);
+ local_bh_enable();
rcu_read_unlock();

return rc;

2007-01-02 21:25:37

by Paul Moore

[permalink] [raw]
Subject: Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]

On Tuesday, January 2 2007 2:58 am, Adam J. Richter wrote:
> I have not yet performed the 21 steps of
> linux-2.6.20-rc3/Documentation/SubmitChecklist, which I think is a
> great objectives list for future automation or some kind of community
> web site. I hope to find time to make progress through that
> checklist, but, in the meantime, I think the world may nevertheless be
> infinitesmally better off if I post the patch that I'm currently
> using that seems to fix the problem, seeing as how rc3 has passed
> with no fix incorporated.
>
> I think the intent of the offending code was to avoid doing
> a lock_sock() in a presumably common case where there was no need to
> take the lock. So, I have kept the presumably fast test to exit
> early.
>
> When it turns out to be necessary to take lock_sock(), RCU is
> unlocked, then lock_sock is taken, the RCU is locked again, and
> the test is repeated.

Hi Adam,

I'm sorry I just saw this mail (mail not sent directly to me get shuffled off
to a folder). I agree with your patch, I think dropping and then re-taking
the RCU lock is the best way to go, although I'm curious to see what others
have to say.

The only real comment I have with the patch is that there is some extra
whitespace which could probably be removed, but that is more of a style nit
than anything substantial.

> By the way, in a change not included in this patch,
> I also tried consolidating the RCU locking in this file into a macro
> IF_NLBL_REQUIRE(sksec, action), where "action" is the code
> fragment to be executed with rcu_read_lock() held, although this
> required splitting a couple of functions in half.

>From your description above I'm not sure I like that approach so much,
however, I could be misunderstanding something. Do you have a small example
you could send?

--
paul moore
linux security @ hp

2007-01-02 21:35:13

by Paul Moore

[permalink] [raw]
Subject: Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]

On Sunday, December 24 2006 7:25 pm, Andrew Morton wrote:
> On Mon, 25 Dec 2006 05:21:24 +0800
>
> "Adam J. Richter" <[email protected]> wrote:
> > Under 2.6.20-rc1 and 2.6.20-rc2, I get the following complaint
> > for several network programs running on my system:
> >
> > [ 156.381868] BUG: sleeping function called from invalid context at
> > net/core/sock.c:1523 [ 156.381876] in_atomic():1, irqs_disabled():0
> > [ 156.381881] no locks held by kio_http/9693.
> > [ 156.381886] [<c01057a2>] show_trace_log_lvl+0x1a/0x2f
> > [ 156.381900] [<c0105dab>] show_trace+0x12/0x14
> > [ 156.381908] [<c0105e48>] dump_stack+0x16/0x18
> > [ 156.381917] [<c011e30f>] __might_sleep+0xe5/0xeb
> > [ 156.381926] [<c025942a>] lock_sock_nested+0x1d/0xc4
> > [ 156.381937] [<c01cc570>] selinux_netlbl_inode_permission+0x5a/0x8e
> > [ 156.381946] [<c01c2505>] selinux_file_permission+0x96/0x9b
> > [ 156.381954] [<c0175a0a>] vfs_write+0x8d/0x167
> > [ 156.381962] [<c017605a>] sys_write+0x3f/0x63
> > [ 156.381971] [<c01040c0>] syscall_call+0x7/0xb
> > [ 156.381980] =======================
>
> There's a glaring bug in selinux_netlbl_inode_permission() - taking
> lock_sock() inside rcu_read_lock().

Sorry for the delay, I'm finally back at a machine where I can look at the
code.

I've been thinking about Parag Warudkar's and Ingo Molnar's patches as well as
what the selinux_netlbl_inode_permission() function actually needs to do; I
think the best answer isn't so much to change the socket locking calls, but
to restructure the function a bit.

Currently the function does the following (in order):

1. do some quick sanity checks (is the inode a socket, etc)
2. rcu_read_lock()
3. check the nlbl_state is set to NLBL_REQUIRE (otherwise return)
4. lock_sock()
5. netlabel magic
6. release_sock()
7. rcu_read_unlock()

I propose changing it to the following (in order):

1. do some quick sanity checks (is the inode a socket, etc)
2. rcu_read_lock()
3. check the nlbl_state is set to NLBL_REQUIRE (otherwise return)
4. rcu_read_unlock()
5. lock_sock()
6. rcu_read_lock()
7. verify that nlbl_state is still set to NLBL_REQUIRE (otherwise return)
8. netlabel magic
9. rcu_read_unlock()
10. release_sock()

This way we no longer need to worry about any special socket locking. I
realize this adds a bit of duplicated work but it is my understanding that
RCU lock/unlock operations are *very* fast so the extra RCU lock operations
shouldn't be too bad and the extra nlbl_state check should be of minimal
cost.

However, I'm not the expert here, just a guy learning as he goes so any
comments/feedback on the above proposal are welcome. If it turns out this
approach has some merit I'll put together a patch and send it out.

Once again, sorry for the regression.

--
paul moore
linux security @ hp

2007-01-02 23:37:56

by David Miller

[permalink] [raw]
Subject: Re: selinux networking: sleeping functin called from invalid context in 2.6.20-rc[12]

From: Paul Moore <[email protected]>
Date: Tue, 2 Jan 2007 16:25:24 -0500

> I'm sorry I just saw this mail (mail not sent directly to me get
> shuffled off to a folder). I agree with your patch, I think
> dropping and then re-taking the RCU lock is the best way to go,
> although I'm curious to see what others have to say.

I think this is fine too.