2008-01-13 07:35:49

by Valdis Klētnieks

Subject: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...

I'm seeing problems with Sendmail on 24-rc6-mm1, where the main Sendmail is
listening on ::1/25, and Fetchmail connects to 127.0.0.1:25 to inject mail it
has just fetched from an outside server via IMAP - it will often just hang and
not make any further progress. Looking at netstat shows something interesting:

% netstat -n -a -A inet | grep 25
tcp 0 5108 127.0.0.1:59355 127.0.0.1:25 ESTABLISHED
% netstat -n -a -A inet6 | grep 25
tcp 0 0 :::25 :::* LISTEN
tcp 0 0 ::ffff:127.0.0.1:25 ::ffff:127.0.0.1:59355 ESTABLISHED
% netstat -n -a -A inet | grep 25
tcp 0 5108 127.0.0.1:59355 127.0.0.1:25 ESTABLISHED
% netstat -n -a -A inet6 | grep 25
tcp 0 0 :::25 :::* LISTEN
tcp 0 0 ::ffff:127.0.0.1:25 ::ffff:127.0.0.1:59355 ESTABLISHED
% netstat -n -a -A inet | grep 25
tcp 0 5108 127.0.0.1:59355 127.0.0.1:25 ESTABLISHED
% netstat -n -a -A inet6 | grep 25
tcp 0 0 :::25 :::* LISTEN
tcp 0 0 ::ffff:127.0.0.1:25 ::ffff:127.0.0.1:59355 ESTABLISHED

On the IPv4 side, it thinks it's got 5108 bytes in the send queue - but on
the IPv6 side of that same connection, it's showing 0 in the receive queue,
and we're stuck there.
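
One way to watch for this condition without eyeballing netstat is to read the queue counters straight from /proc/net/tcp. What follows is only a minimal Python sketch, not part of the original report: the helper names and the SAMPLE rows are made up for illustration (modeled on the netstat output above), the address decoding assumes the table was written by a little-endian host, and it looks at the IPv4 table only (the v4-mapped side lives in /proc/net/tcp6, which this does not parse).

```python
import socket
import struct

TCP_ESTABLISHED = 0x01  # state code used in /proc/net/tcp

# Illustrative rows only, shaped like /proc/net/tcp on a little-endian host.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:E7DB 0100007F:0019 01 000013F4:00000000 01:00000014 00000000   500        0 12345
   1: 00000000:0019 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 6789
"""


def _decode_endpoint(field):
    """Turn 'HEXADDR:HEXPORT' into (dotted_quad, port); little-endian assumed."""
    addr_hex, port_hex = field.split(":")
    addr = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
    return addr, int(port_hex, 16)


def parse_proc_net_tcp(text):
    """Parse /proc/net/tcp-style text into a list of dicts, one per socket."""
    rows = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue
        tx_hex, rx_hex = fields[4].split(":")
        rows.append({
            "local": _decode_endpoint(fields[1]),
            "remote": _decode_endpoint(fields[2]),
            "state": int(fields[3], 16),
            "tx_queue": int(tx_hex, 16),
            "rx_queue": int(rx_hex, 16),
        })
    return rows


def loopback_senders_with_backlog(rows):
    """Established 127.0.0.1 <-> 127.0.0.1 sockets that still have unsent data."""
    return [
        r for r in rows
        if r["state"] == TCP_ESTABLISHED
        and r["local"][0] == "127.0.0.1"
        and r["remote"][0] == "127.0.0.1"
        and r["tx_queue"] > 0
    ]
```

On a live system the input would be `open("/proc/net/tcp").read()`. A single snapshot with a non-zero send queue proves nothing by itself; it is the same value showing up across repeated snapshots that suggests a wedge.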

It's not consistent - sometimes Fetchmail will wedge on the very first mail,
and do so several times in a row. Other times, it will do well for a while -
at the moment, it's gone through 471 of the 1,470 currently queued mails just
fine, only to get wedged again on number 472.

For what it's worth, here's what 'echo w > /proc/sysrq-trigger' got, although I
don't see anything that looks odd to me given the netstat output above -
fetchmail has sent data, and is waiting for a response back, and sendmail is
waiting for data to arrive:

fetchmail S ffffffff8053c520 5360 17612 9902
ffff81007d37bb08 0000000000000086 0000000000000000 000200d000000000
ffff81006bf826c0 ffffffff80687360 ffff81006bf82918 0000000000000001
0000000000000003 ffff81007d37bb88 0000000000000000 0000000000000000
Call Trace:
[<ffffffff80522682>] schedule_timeout+0x22/0xb4
[<ffffffff80523bd3>] _spin_lock_bh+0x11/0x38
[<ffffffff80523ac4>] _spin_unlock_bh+0x1e/0x20
[<ffffffff8047cec6>] release_sock+0xa3/0xac
[<ffffffff8047d98f>] sk_wait_data+0x8a/0xcf
[<ffffffff80249b99>] autoremove_wake_function+0x0/0x38
[<ffffffff804abdca>] tcp_recvmsg+0x35a/0x86b
[<ffffffff8047c7be>] sock_common_recvmsg+0x32/0x47
[<ffffffff803288be>] selinux_socket_recvmsg+0x1d/0x1f
[<ffffffff8047af38>] sock_recvmsg+0x10e/0x12f
[<ffffffff80249b99>] autoremove_wake_function+0x0/0x38
[<ffffffff8032425d>] avc_has_perm+0x4c/0x5e
[<ffffffff803ac952>] pty_write+0x3a/0x44
[<ffffffff80249dd8>] remove_wait_queue+0x2f/0x3b
[<ffffffff8047c06b>] sys_recvfrom+0xa4/0xf5
[<ffffffff8024c850>] hrtimer_start+0x11f/0x131
[<ffffffff8023aa6e>] do_setitimer+0x184/0x326
[<ffffffff8020c03b>] system_call_after_swapgs+0x7b/0x80

sendmail S ffff81007d30a400 5360 17613 16992
ffff81006bc419e8 0000000000000086 ffff81006bc41998 ffffffff8023f6a5
ffff81007d30a400 ffff81007d24f200 ffff81007d30a658 0000000100000286
ffff81006bc419e8 ffffffff8023f851 000000004789b768 ffff81000100eb20
Call Trace:
[<ffffffff8023f6a5>] lock_timer_base+0x26/0x4a
[<ffffffff8023f851>] __mod_timer+0xc4/0xd6
[<ffffffff805226ed>] schedule_timeout+0x8d/0xb4
[<ffffffff8023f37c>] process_timeout+0x0/0xb
[<ffffffff805226e8>] schedule_timeout+0x88/0xb4
[<ffffffff8029cd26>] do_select+0x4a9/0x50b
[<ffffffff8029d22d>] __pollwait+0x0/0xdf
[<ffffffff8022d7b9>] default_wake_function+0x0/0xf
[<ffffffff80523bd3>] _spin_lock_bh+0x11/0x38
[<ffffffff8047cf74>] lock_sock_nested+0xa5/0xb2
[<ffffffff80523bd3>] _spin_lock_bh+0x11/0x38
[<ffffffff80523ac4>] _spin_unlock_bh+0x1e/0x20
[<ffffffff8047cec6>] release_sock+0xa3/0xac
[<ffffffff804ac1c9>] tcp_recvmsg+0x759/0x86b
[<ffffffff8047c7be>] sock_common_recvmsg+0x32/0x47
[<ffffffff803288be>] selinux_socket_recvmsg+0x1d/0x1f
[<ffffffff8047a924>] sock_aio_read+0x121/0x139
[<ffffffff8032425d>] avc_has_perm+0x4c/0x5e
[<ffffffff8029cf7a>] core_sys_select+0x1f2/0x2a0
[<ffffffff80282f50>] page_add_new_anon_rmap+0x20/0x22
[<ffffffff803251f5>] file_has_perm+0xa5/0xb4
[<ffffffff80249b99>] autoremove_wake_function+0x0/0x38
[<ffffffff8029d45c>] sys_select+0x150/0x17b
[<ffffffff8020c03b>] system_call_after_swapgs+0x7b/0x80

Any ideas?



2008-01-13 21:46:59

by Herbert Xu

Subject: Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...

[email protected] wrote:
>
> Any ideas?

Please provide a packet dump on both sides (or at least the sender
side).

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2008-01-14 16:15:55

by Valdis Klētnieks

Subject: Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...

On Sun, 13 Jan 2008 02:35:33 EST, [email protected] said:

> I'm seeing problems with Sendmail on 24-rc6-mm1, where the main Sendmail is
> listening on ::1/25, and Fetchmail connects to 127.0.0.1:25 to inject mail it
> has just fetched from an outside server via IMAP - it will often just hang and
> not make any further progress. Looking at netstat shows something interesting:
>
> % netstat -n -a -A inet | grep 25
> tcp 0 5108 127.0.0.1:59355 127.0.0.1:25 ESTABLISHED

The IPv6 is apparently a red herring - this morning I'm seeing the same problem
with another totally separate pair of programs that are IPv4-only, hanging
on loopback.
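
For anyone wanting to poke at this without Sendmail or Fetchmail in the picture, a plain IPv4 loopback exchange with a timeout is enough to tell "data keeps flowing" from "wedged". This is a hedged Python sketch and not the programs actually involved: the function name, the 5108-byte default (borrowed from the send queue figure in the first mail) and the one-byte ack protocol are all assumptions for illustration.

```python
import socket
import threading


def loopback_transfer(payload_size=5108, rounds=50, timeout=5.0):
    """Push `rounds` payloads over 127.0.0.1, waiting for a 1-byte ack each time.

    Returns the number of rounds that completed before an error or timeout,
    so a healthy loopback path returns `rounds`.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)
    server.settimeout(timeout)
    port = server.getsockname()[1]

    def serve():
        try:
            conn, _ = server.accept()
        except OSError:
            return  # nobody connected, or the listener was closed
        with conn:
            conn.settimeout(timeout)
            try:
                for _ in range(rounds):
                    remaining = payload_size
                    while remaining:
                        chunk = conn.recv(min(remaining, 65536))
                        if not chunk:
                            return  # peer went away
                        remaining -= len(chunk)
                    conn.sendall(b"+")  # ack one full payload
            except OSError:
                return  # timeout or reset: stop serving

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()

    completed = 0
    payload = b"x" * payload_size
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
            for _ in range(rounds):
                client.sendall(payload)
                if client.recv(1) != b"+":
                    break
                completed += 1
    except OSError:
        pass  # includes socket timeouts; report how far we got
    finally:
        worker.join(timeout)
        server.close()
    return completed
```

If the return value is less than `rounds`, the connection stalled or broke partway; running it in a loop would show whether the failure is as intermittent as described here.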



2008-01-14 16:48:41

by Paul Moore

Subject: Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...

On Monday 14 January 2008 11:15:38 am [email protected] wrote:
> On Sun, 13 Jan 2008 02:35:33 EST, [email protected] said:
> > I'm seeing problems with Sendmail on 24-rc6-mm1, where the main Sendmail
> > is listening on ::1/25, and Fetchmail connects to 127.0.0.1:25 to inject
> > mail it has just fetched from an outside server via IMAP - it will often
> > just hang and not make any further progress. Looking at netstat shows
> > something interesting:
> >
> > % netstat -n -a -A inet | grep 25
> > tcp 0 5108 127.0.0.1:59355 127.0.0.1:25
> > ESTABLISHED
>
> The IPv6 is apparently a red herring - this morning I'm seeing the same
> problem with another totally separate pair of programs that are IPv4-only,
> hanging on loopback.

Are you still only seeing these problems on loopback? I can't help but wonder
if this is the skb_clone() problem where it wasn't copying skb->iif causing
SELinux to silently drop the packets. Then again, I'm not sure if there is a
clone operation in the code path you are going down. From what I can remember I
only saw clones on some of the multicast stuff but I'm still learning some of
the darker corners of the stack.
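
The failure mode being described (a clone that leaves one field behind, followed by a check that quietly rejects anything without that field) can be shown with a toy model. This is Python and emphatically not the kernel code: `Packet`, `ingress_check` and the other names are hypothetical stand-ins, and the real check and the real fix live in the SELinux/networking code discussed in this thread.

```python
from dataclasses import dataclass, replace


@dataclass
class Packet:
    payload: bytes
    iif: int = 0  # input interface index; 0 stands for "not recorded"


def clone_missing_iif(pkt):
    """Buggy clone: copies the payload but forgets the iif field."""
    return Packet(payload=pkt.payload)


def clone_with_iif(pkt):
    """Fixed clone: every field, including iif, is carried over."""
    return replace(pkt)


def ingress_check(pkt, known_ifindexes):
    """Hypothetical policy hook: accept only packets with a known iif."""
    return pkt.iif in known_ifindexes


def deliver(packets, clone, known_ifindexes):
    """Clone each packet, run the check, and return what gets through."""
    delivered = []
    for pkt in packets:
        copy = clone(pkt)
        if ingress_check(copy, known_ifindexes):
            delivered.append(copy)
        # A rejected packet is simply skipped: no error, no log line.
    return delivered
```

With packets that arrive carrying `iif=1`, the buggy clone delivers nothing and reports nothing, while the fixed clone delivers all of them. From the sender's point of view the first case looks like data sitting in the send queue forever.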

If you've got some spare cycles, the kernel below should both have the
clone/iif fix (it's in Linus' tree now) as well as some printks when errors
occur so packets are no longer silently dropped by SELinux.

* git://git.infradead.org/users/pcmoore/lblnet-2.6_testing

--
paul moore
linux security @ hp

2008-01-14 18:06:17

by Valdis Klētnieks

Subject: Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...

On Mon, 14 Jan 2008 11:36:40 EST, Paul Moore said:

> Are you still only seeing these problems on loopback? I can't help but wonder
> if this is the skb_clone() problem where it wasn't copying skb->iif causing
> SELinux to silently drop the packets.

Yes, I've only spotted it on loopback. The odd part is that I had reverted the
one commit 9c6ad8f6895db7a517c04c2147cb5e7ffb83a315 "Convert the netif code to
use ifindex values" - so either I managed to get the revert terribly wrong,
or there's something else odd going on. The first time around, I was seeing
hangs during a TCP 3-packet handshake - this time data flows for some number
of packets before hanging.

I'm pulling git://git.infradead.org/users/pcmoore/lblnet-2.6_testing at the
moment, and seeing if there's already a fix in there for this.



2008-01-14 18:23:17

by Valdis Klētnieks

Subject: Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...

On Mon, 14 Jan 2008 13:05:48 EST, [email protected] said:

> I'm pulling git://git.infradead.org/users/pcmoore/lblnet-2.6_testing at the
> moment, and seeing if there's already a fix in there for this.

Apparently the only new commit in there since the tree that was in
24-rc6-mm1 is 5d95575903fd3865b884952bd93c339d48725c33 adding some warning
printk's. Would it be more productive to test against the full tree, or
to leave out the one commit I already reverted?



2008-01-14 18:50:59

by Valdis Klētnieks

Subject: Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...

On Mon, 14 Jan 2008 13:22:10 EST, [email protected] said:
> Apparently the only new commit in there since the tree that was in
> 24-rc6-mm1 is 5d95575903fd3865b884952bd93c339d48725c33 adding some warning
> printk's. Would it be more productive to test against the full tree, or
> leaving out the one commit I already reverted?

<voice=Emily Litella> Nevermind... </voice> :)

The new commit won't apply with the other one reverted - it patches
security/selinux/netnode.c which was created by the problematic commit...



2008-01-14 19:08:11

by Paul Moore

Subject: Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...

On Monday 14 January 2008 1:50:39 pm [email protected] wrote:
> On Mon, 14 Jan 2008 13:22:10 EST, [email protected] said:
> > Apparently the only new commit in there since the tree that was in
> > 24-rc6-mm1 is 5d95575903fd3865b884952bd93c339d48725c33 adding some
> > warning printk's. Would it be more productive to test against the full
> > tree, or leaving out the one commit I already reverted?
>
> <voice=Emily Litella> Nevermind... </voice> :)
>
> The new commit won't apply with the other one reverted - it patches
> security/selinux/netnode.c which was created by the problematic commit...

There have been quite a few changes in lblnet-2.6_testing since 2.6.24-rc6-mm1
so I would recommend taking the whole tree. I'm also not quite sure if
simply reverting the "Convert the netif code to use ifindex values" patch
would solve the problem as there are other patches in the rc6-mm1 tree that
rely on skb->iif being valid (new code, not converted code). If you want to
stick with a _relatively_ vanilla rc6-mm1 tree I would leave everything in
and simply apply the following patch which solved the skb_clone()/iif
problem:

http://git.infradead.org/?p=users/pcmoore/lblnet-2.6_testing;a=commitdiff;h=02f1c89d6e36507476f78108a3dcc78538be460b

--
paul moore
linux security @ hp

2008-01-14 19:37:54

by Valdis Klētnieks

Subject: Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...

On Mon, 14 Jan 2008 14:07:46 EST, Paul Moore said:
> There have been quite a few changes in lblnet-2.6_testing since 2.6.24-rc6-mm1
> so I would recommend taking the whole tree. I'm also not quite sure if

Weird. I did a 'git clone git://git.infradead.org/users/pcmoore/lblnet-2.6_testing'
into a new directory this morning, and doing a 'git log' against that only
showed the one added commit:

commit 5d95575903fd3865b884952bd93c339d48725c33
Author: Paul Moore <[email protected]>
Date: Wed Jan 9 15:30:23 2008 -0500

SELinux: Add warning messages on network denial due to error

Currently network traffic can be sliently dropped due to non-avc errors which
can lead to much confusion when trying to debug the problem. This patch adds
warning messages so that when these events occur there is a user visible
notification.

Signed-off-by: Paul Moore <[email protected]>

commit 9259ca5fd8b9fbdd2c3edade593dead905d8391e
Author: Paul Moore <[email protected]>
Date: Wed Jan 9 15:30:23 2008 -0500

SELinux: Add network ingress and egress control permission checks
(already in 24-rc6-mm1).

Somebody please tell me it's my git-idiocy..

> simply reverting the "Convert the netif code to use ifindex values" patch
> would solve the problem as there are other patches in the rc6-mm1 tree that
> rely on skb->iif being valid (new code, not converted code).

That would explain why I'm still seeing issues..

> If you want to
> stick with a _relatively_ vanilla rc6-mm1 tree I would leave everything in
> and simply apply the following patch which solved the skb_clone()/iif
> problem:
>
> http://git.infradead.org/?p=users/pcmoore/lblnet-2.6_testing;a=commitdiff;h=02f1c89d6e36507476f78108a3dcc78538be460b

OK, I'll go look at that..



2008-01-14 20:02:33

by Paul Moore

Subject: Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...

On Monday 14 January 2008 2:37:02 pm [email protected] wrote:
> On Mon, 14 Jan 2008 14:07:46 EST, Paul Moore said:
> > There have been quite a few changes in lblnet-2.6_testing since
> > 2.6.24-rc6-mm1 so I would recommend taking the whole tree. I'm also not
> > quite sure if
>
> Weird. I did a 'git clone
> git://git.infradead.org/users/pcmoore/lblnet-2.6_testing' into a new
> directory this morning, and doing a 'git log' against that only showed the
> one added commit:
>
> commit 5d95575903fd3865b884952bd93c339d48725c33
> Author: Paul Moore <[email protected]>
> Date: Wed Jan 9 15:30:23 2008 -0500
>
> SELinux: Add warning messages on network denial due to error
>
> Currently network traffic can be sliently dropped due to non-avc errors
> which can lead to much confusion when trying to debug the problem. This
> patch adds warning messages so that when these events occur there is a user
> visible notification.
>
> Signed-off-by: Paul Moore <[email protected]>
>
> commit 9259ca5fd8b9fbdd2c3edade593dead905d8391e
> Author: Paul Moore <[email protected]>
> Date: Wed Jan 9 15:30:23 2008 -0500
>
> SELinux: Add network ingress and egress control permission checks
> (already in 24-rc6-mm1).
>
> Somebody please tell me it's my git-idiocy..

It might be something on my end with managing the lblnet-2.6_testing git tree;
I'm still pretty clueless when it comes to git.

I've got a git tree on my dev machine which is backed against Linus' tree and
managed via stacked-git. I update the patches in this tree, refresh them
against new bits from Linus, etc and when something significant changes I
update the git tree on infradead.org and post a new patchset to the related
lists. The process of updating the git tree on infradead.org usually
involves deleting the entire tree located there, re-creating it, and then
doing a git-push from my dev machine. I have no idea if this is "correct" or
not, and I've often wondered if this is the "right" way to do it ...

--
paul moore
linux security @ hp

2008-01-14 23:04:51

by Valdis Klētnieks

Subject: Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...

On Mon, 14 Jan 2008 14:07:46 EST, Paul Moore said:
>
> http://git.infradead.org/?p=users/pcmoore/lblnet-2.6_testing;a=commitdiff;h=02f1c89d6e36507476f78108a3dcc78538be460b

Initial testing indicates that 2.6.24-rc6-mm1 plus this one commit is
behaving itself correctly - my Tcl test case that reliably demonstrated wedges
during SYN handling is definitively fixed, and the current issue with hangs with
data pending seems to be gone as well (after admittedly light testing).

Thanks for finding the commit that fixed it...



2008-01-14 23:19:53

by Paul Moore

Subject: Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...

On Monday 14 January 2008 6:04:28 pm [email protected] wrote:
> On Mon, 14 Jan 2008 14:07:46 EST, Paul Moore said:
> > http://git.infradead.org/?p=users/pcmoore/lblnet-2.6_testing;a=commitdiff
> >;h=02f1c89d6e36507476f78108a3dcc78538be460b
>
> Initial testing indicates that 2.6.24-rc6-mm1 plus this one commit is
> behaving itself correctly - my Tcl test case that reliably demonstrated
> wedges during SYN handling is definitively fixed, and the current issue
> with hangs with data pending seems to be gone as well (after admittedly
> light testing).
>
> Thanks for finding the commit that fixed it...

No problem, glad to hear that fixed the problem. It's already in Linus' tree
so any future -mm kernels as well as 2.6.24 should be problem-free, at least
with respect to this ;)

--
paul moore
linux security @ hp