LinuxLists.cc - Squid hang up on 2.6.34

2010-07-08 19:03:22

by Felipe W Damasio

[permalink] [raw]

Subject: Squid hang up on 2.6.34

Hi All,

I've been using squid for a few months now.

It worked great until recently, when we upgraded to kernel 2.6.34.

We're using squid on bridge scenario with TProxy.

Squid simply hung up (connections with squidclient didn't work), and
the process didn't respond to kill. I had to use "kill -9".

The dmesg output was:

kernel: general protection fault: 0000 [#1] SMP
kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1f.3/i2c-0/name
kernel: CPU 1
kernel: Modules linked in:

kernel: Pid: 18351, comm: squid Not tainted 2.6.34 #1 DX58SO/
kernel: RIP: 0010:[<ffffffff81360c2a>] ?[<ffffffff81360c2a>]
sock_rfree+0x26/0x37
kernel: RSP: 0018:ffff88041c28fc20 ?EFLAGS: 00010206
kernel: RAX: dce8dce85d415d41 RBX: ffff88038f098c00 RCX: 0000000000000720
kernel: RDX: ffff8804053b2e00 RSI: ffff88032564ee0c RDI: ffff88038f098c00
kernel: RBP: ffff8804051b2e00 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000020860 R11: ffff8804051b2e00 R12: 00000000000005a8
kernel: R13: 00000000000005a8 R14: 0000000000003d21 R15: 0000000000000000
kernel: FS: ?00007f214fa8c710(0000) GS:ffff880001840000(0000)
knlGS:0000000000000000
kernel: CS: ?0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000000000b388000 CR3: 000000041c4c4000 CR4: 00000000000006e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
kernel: Process squid (pid: 18351, threadinfo ffff88041c28e000, task
ffff88042e0fcec0)
kernel: Stack:
kernel: ffffffff81365dda ffff88038f098c00 ffffffff81365b8c ffff88038f098c00
kernel: <0> ffffffff813a222a 00000000000000d0 ffffffff81366af9 000000002e0fcec0
kernel: <0> ffff88042e0fcec0 ffff88042e0fcec0 ffff88042e0fcec0 0000000014d31cc0
kernel: Call Trace:
kernel: [<ffffffff81365dda>] ? skb_release_head_state+0x6d/0xb7
kernel: [<ffffffff81365b8c>] ? __kfree_skb+0x9/0x7d
kernel: [<ffffffff813a222a>] ? tcp_recvmsg+0x6a3/0x89a
kernel: [<ffffffff81366af9>] ? __alloc_skb+0x5e/0x14e
kernel: [<ffffffff81360ede>] ? sock_common_recvmsg+0x30/0x45
kernel: [<ffffffff8135ec0f>] ? sock_aio_read+0xdd/0xf1
kernel: [<ffffffff810ac500>] ? do_sync_read+0xb0/0xf2
kernel: [<ffffffff8142a25e>] ? _raw_spin_lock_bh+0x9/0x1f
kernel: [<ffffffff810acf32>] ? vfs_read+0xb9/0xff
kernel: [<ffffffff810ad034>] ? sys_read+0x45/0x6e
kernel: [<ffffffff8100292b>] ? system_call_fastpath+0x16/0x1b
kernel: Code: ff ff ff ff c3 48 8b 57 18 8b 87 d8 00 00 00 48 8d 8a ac
00 00 00 f0 29 82 ac 00 00 00 48 8b 57 18 8b 8f d8 00 00 00 48 8b 42
38 <48> 83 b8 b0 00 00 00 00 74 06 01 8a f4 00 00 00 c3 41 57 41 89
kernel: RIP ?[<ffffffff81360c2a>] sock_rfree+0x26/0x37
kernel: RSP <ffff88041c28fc20>
kernel: ---[ end trace bcd320fe508cc071 ]---

Can anybody help me?

What information can I provide you to track down this issue?

Cheers,

Felipe Damasio

2010-07-08 21:30:43

by Felipe W Damasio

[permalink] [raw]

Subject: Re: Squid hang up on 2.6.34

Hi again,

Just FYI: After squid hung up, I started it again.

A few minutes later, the machine frooze...after a reboot, syslog
didn't show any messages.

So the message below is the only tip of what happened.

Cheers,

Felipe Damasio

2010/7/8 Felipe W Damasio <[email protected]>
>
> Hi All,
>
> I've been using squid for a few months now.
>
> It worked great until recently, when we upgraded to kernel 2.6.34.
>
> We're using squid on bridge scenario with TProxy.
>
> Squid simply hung up (connections with squidclient didn't work), and
> the process didn't respond to kill. I had to use "kill -9".
>
> The dmesg output was:
>
> kernel: general protection fault: 0000 [#1] SMP
> kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1f.3/i2c-0/name
> kernel: CPU 1
> kernel: Modules linked in:
>
> kernel: Pid: 18351, comm: squid Not tainted 2.6.34 #1 DX58SO/
> kernel: RIP: 0010:[<ffffffff81360c2a>] ?[<ffffffff81360c2a>]
> sock_rfree+0x26/0x37
> kernel: RSP: 0018:ffff88041c28fc20 ?EFLAGS: 00010206
> kernel: RAX: dce8dce85d415d41 RBX: ffff88038f098c00 RCX: 0000000000000720
> kernel: RDX: ffff8804053b2e00 RSI: ffff88032564ee0c RDI: ffff88038f098c00
> kernel: RBP: ffff8804051b2e00 R08: 0000000000000000 R09: 0000000000000000
> kernel: R10: 0000000000020860 R11: ffff8804051b2e00 R12: 00000000000005a8
> kernel: R13: 00000000000005a8 R14: 0000000000003d21 R15: 0000000000000000
> kernel: FS: ?00007f214fa8c710(0000) GS:ffff880001840000(0000)
> knlGS:0000000000000000
> kernel: CS: ?0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 000000000b388000 CR3: 000000041c4c4000 CR4: 00000000000006e0
> kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> kernel: Process squid (pid: 18351, threadinfo ffff88041c28e000, task
> ffff88042e0fcec0)
> kernel: Stack:
> kernel: ffffffff81365dda ffff88038f098c00 ffffffff81365b8c ffff88038f098c00
> kernel: <0> ffffffff813a222a 00000000000000d0 ffffffff81366af9 000000002e0fcec0
> kernel: <0> ffff88042e0fcec0 ffff88042e0fcec0 ffff88042e0fcec0 0000000014d31cc0
> kernel: Call Trace:
> kernel: [<ffffffff81365dda>] ? skb_release_head_state+0x6d/0xb7
> kernel: [<ffffffff81365b8c>] ? __kfree_skb+0x9/0x7d
> kernel: [<ffffffff813a222a>] ? tcp_recvmsg+0x6a3/0x89a
> kernel: [<ffffffff81366af9>] ? __alloc_skb+0x5e/0x14e
> kernel: [<ffffffff81360ede>] ? sock_common_recvmsg+0x30/0x45
> kernel: [<ffffffff8135ec0f>] ? sock_aio_read+0xdd/0xf1
> kernel: [<ffffffff810ac500>] ? do_sync_read+0xb0/0xf2
> kernel: [<ffffffff8142a25e>] ? _raw_spin_lock_bh+0x9/0x1f
> kernel: [<ffffffff810acf32>] ? vfs_read+0xb9/0xff
> kernel: [<ffffffff810ad034>] ? sys_read+0x45/0x6e
> kernel: [<ffffffff8100292b>] ? system_call_fastpath+0x16/0x1b
> kernel: Code: ff ff ff ff c3 48 8b 57 18 8b 87 d8 00 00 00 48 8d 8a ac
> 00 00 00 f0 29 82 ac 00 00 00 48 8b 57 18 8b 8f d8 00 00 00 48 8b 42
> 38 <48> 83 b8 b0 00 00 00 00 74 06 01 8a f4 00 00 00 c3 41 57 41 89
> kernel: RIP ?[<ffffffff81360c2a>] sock_rfree+0x26/0x37
> kernel: RSP <ffff88041c28fc20>
> kernel: ---[ end trace bcd320fe508cc071 ]---
>
> Can anybody help me?
>
> What information can I provide you to track down this issue?
>
> Cheers,
>
> Felipe Damasio

2010-07-08 22:06:32

by Eric Dumazet

[permalink] [raw]

Subject: Re: Squid hang up on 2.6.34

Le jeudi 08 juillet 2010 à 16:03 -0300, Felipe W Damasio a écrit :
> Hi All,
>

CC netdev

> I've been using squid for a few months now.
>
> It worked great until recently, when we upgraded to kernel 2.6.34.
>
> We're using squid on bridge scenario with TProxy.
>
> Squid simply hung up (connections with squidclient didn't work), and
> the process didn't respond to kill. I had to use "kill -9".
>
> The dmesg output was:
>
> kernel: general protection fault: 0000 [#1] SMP
> kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1f.3/i2c-0/name
> kernel: CPU 1
> kernel: Modules linked in:
>
> kernel: Pid: 18351, comm: squid Not tainted 2.6.34 #1 DX58SO/
> kernel: RIP: 0010:[<ffffffff81360c2a>] [<ffffffff81360c2a>]
> sock_rfree+0x26/0x37
> kernel: RSP: 0018:ffff88041c28fc20 EFLAGS: 00010206
> kernel: RAX: dce8dce85d415d41 RBX: ffff88038f098c00 RCX: 0000000000000720

dereferencing RAX, and RAX contains garbage (ascii chars :
"A ] A ]" ...)

At this point, RAX is supposed to contain a pointer to sk->sk_prot

static inline int sk_has_account(struct sock *sk)
{
/* return true if protocol supports memory accounting */
return !!sk->sk_prot->memory_allocated;
}

> kernel: RDX: ffff8804053b2e00 RSI: ffff88032564ee0c RDI: ffff88038f098c00
> kernel: RBP: ffff8804051b2e00 R08: 0000000000000000 R09: 0000000000000000
> kernel: R10: 0000000000020860 R11: ffff8804051b2e00 R12: 00000000000005a8
> kernel: R13: 00000000000005a8 R14: 0000000000003d21 R15: 0000000000000000
> kernel: FS: 00007f214fa8c710(0000) GS:ffff880001840000(0000)
> knlGS:0000000000000000
> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 000000000b388000 CR3: 000000041c4c4000 CR4: 00000000000006e0
> kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> kernel: Process squid (pid: 18351, threadinfo ffff88041c28e000, task
> ffff88042e0fcec0)
> kernel: Stack:
> kernel: ffffffff81365dda ffff88038f098c00 ffffffff81365b8c ffff88038f098c00
> kernel: <0> ffffffff813a222a 00000000000000d0 ffffffff81366af9 000000002e0fcec0
> kernel: <0> ffff88042e0fcec0 ffff88042e0fcec0 ffff88042e0fcec0 0000000014d31cc0
> kernel: Call Trace:
> kernel: [<ffffffff81365dda>] ? skb_release_head_state+0x6d/0xb7
> kernel: [<ffffffff81365b8c>] ? __kfree_skb+0x9/0x7d
> kernel: [<ffffffff813a222a>] ? tcp_recvmsg+0x6a3/0x89a
> kernel: [<ffffffff81366af9>] ? __alloc_skb+0x5e/0x14e
> kernel: [<ffffffff81360ede>] ? sock_common_recvmsg+0x30/0x45
> kernel: [<ffffffff8135ec0f>] ? sock_aio_read+0xdd/0xf1
> kernel: [<ffffffff810ac500>] ? do_sync_read+0xb0/0xf2
> kernel: [<ffffffff8142a25e>] ? _raw_spin_lock_bh+0x9/0x1f
> kernel: [<ffffffff810acf32>] ? vfs_read+0xb9/0xff
> kernel: [<ffffffff810ad034>] ? sys_read+0x45/0x6e
> kernel: [<ffffffff8100292b>] ? system_call_fastpath+0x16/0x1b
> kernel: Code: ff ff ff ff c3 48 8b 57 18 8b 87 d8 00 00 00 48 8d 8a ac
> 00 00 00 f0 29 82 ac 00 00 00 48 8b 57 18 8b 8f d8 00 00 00 48 8b 42
> 38 <48> 83 b8 b0 00 00 00 00 74 06 01 8a f4 00 00 00 c3 41 57 41 89
> kernel: RIP [<ffffffff81360c2a>] sock_rfree+0x26/0x37
> kernel: RSP <ffff88041c28fc20>
> kernel: ---[ end trace bcd320fe508cc071 ]---
>
> Can anybody help me?
>
> What information can I provide you to track down this issue?
>
> Cheers,
>
> Felipe Damasio
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2010-07-08 22:15:28

by Eric Dumazet

[permalink] [raw]

Subject: Re: Squid hang up on 2.6.34

Le jeudi 08 juillet 2010 à 18:30 -0300, Felipe W Damasio a écrit :
> Hi again,
>
> Just FYI: After squid hung up, I started it again.
>
> A few minutes later, the machine frooze...after a reboot, syslog
> didn't show any messages.
>
> So the message below is the only tip of what happened.
>
> Cheers,

Please try to reproduce a new report.

It looks like a memory corruption, and it would be good to see if a
common pattern is occurring.

2010-07-09 15:04:00

by Felipe W Damasio

[permalink] [raw]

Subject: Re: Squid hang up on 2.6.34

Hi,

2010/7/8 Eric Dumazet <[email protected]>:
> Please try to reproduce a new report.
>
> It looks like a memory corruption, and it would be good to see if a
> common pattern is occurring.

I'm trying..the thing is the freeze occured on the machine that sits
on a 200Mbps ISP in bridge-mode. Since the machine frooze, and the
whole ISP went down for a few minutes, I'm not allowed to run any
tests on it.

I've setup the same scenario on a lab, but since last night been
unable to reproduce the bug. Maybe there's a clue on the this crash
below that can help me write some program to trigger the problem?

Cheers,

Felipe Damasio

kernel: general protection fault: 0000 [#1] SMP
kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1f.3/i2c-0/name
kernel: CPU 1
kernel: Modules linked in:

kernel: Pid: 18351, comm: squid Not tainted 2.6.34 #1 DX58SO/
kernel: RIP: 0010:[<ffffffff81360c2a>] [<ffffffff81360c2a>]
sock_rfree+0x26/0x37
kernel: RSP: 0018:ffff88041c28fc20 EFLAGS: 00010206
kernel: RAX: dce8dce85d415d41 RBX: ffff88038f098c00 RCX: 0000000000000720
kernel: RDX: ffff8804053b2e00 RSI: ffff88032564ee0c RDI: ffff88038f098c00
kernel: RBP: ffff8804051b2e00 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000020860 R11: ffff8804051b2e00 R12: 00000000000005a8
kernel: R13: 00000000000005a8 R14: 0000000000003d21 R15: 0000000000000000
kernel: FS: 00007f214fa8c710(0000) GS:ffff880001840000(0000)
knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000000000b388000 CR3: 000000041c4c4000 CR4: 00000000000006e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
kernel: Process squid (pid: 18351, threadinfo ffff88041c28e000, task
ffff88042e0fcec0)
kernel: Stack:
kernel: ffffffff81365dda ffff88038f098c00 ffffffff81365b8c ffff88038f098c00
kernel: <0> ffffffff813a222a 00000000000000d0 ffffffff81366af9 000000002e0fcec0
kernel: <0> ffff88042e0fcec0 ffff88042e0fcec0 ffff88042e0fcec0 0000000014d31cc0
kernel: Call Trace:
kernel: [<ffffffff81365dda>] ? skb_release_head_state+0x6d/0xb7
kernel: [<ffffffff81365b8c>] ? __kfree_skb+0x9/0x7d
kernel: [<ffffffff813a222a>] ? tcp_recvmsg+0x6a3/0x89a
kernel: [<ffffffff81366af9>] ? __alloc_skb+0x5e/0x14e
kernel: [<ffffffff81360ede>] ? sock_common_recvmsg+0x30/0x45
kernel: [<ffffffff8135ec0f>] ? sock_aio_read+0xdd/0xf1
kernel: [<ffffffff810ac500>] ? do_sync_read+0xb0/0xf2
kernel: [<ffffffff8142a25e>] ? _raw_spin_lock_bh+0x9/0x1f
kernel: [<ffffffff810acf32>] ? vfs_read+0xb9/0xff
kernel: [<ffffffff810ad034>] ? sys_read+0x45/0x6e
kernel: [<ffffffff8100292b>] ? system_call_fastpath+0x16/0x1b
kernel: Code: ff ff ff ff c3 48 8b 57 18 8b 87 d8 00 00 00 48 8d 8a ac
00 00 00 f0 29 82 ac 00 00 00 48 8b 57 18 8b 8f d8 00 00 00 48 8b 42
38 <48> 83 b8 b0 00 00 00 00 74 06 01 8a f4 00 00 00 c3 41 57 41 89
kernel: RIP [<ffffffff81360c2a>] sock_rfree+0x26/0x37
kernel: RSP <ffff88041c28fc20>
kernel: ---[ end trace bcd320fe508cc071 ]---

2010-07-09 16:03:18

by Felipe W Damasio

[permalink] [raw]

Subject: Re: Squid hang up on 2.6.34

Hi again,

2010/7/9 Felipe W Damasio <[email protected]>:
> I'm trying..the thing is the freeze occured on the machine that sits
> on a 200Mbps ISP in bridge-mode. Since the machine frooze, and the
> whole ISP went down for a few minutes, I'm not allowed to run any
> tests on it.

The only thing I could track down is going through squid's cache.log,
I found these 2 entries a second before the general protection fault:

2010/07/08 14:51:10| httpReadReply: Excess data from "POST
http://bps.uol.com.br/send.html?ro=2VxogIeFwqQdX.ymjRSChUT67HabcLKfYsPrWlpnBtukZ5v8MANz10GJ4i9O3ED-7xl4ng7XW7RvS8AZ.9OP7WbPHhNq6tyh.0eDB5jl2W56wOu.El0KGg8i-bezIAunlQmJ9tFLERUth9-skZOlSnIUeKQeMqaG18kG8z9.tmkxvWMQtTq.fpUiv3mg5.oqN9ZtNuWqtu61GQGOCCQBKjcwTRMlkBCUoJzrOxMgIENaCwuoqHrK29WcpruyeYyDzv3Y2WFh92H-akWXJAFaPyiP-ZIILxBsSZCSmhY3wC-6lS4t.J6z4ek.J6u71vC8nEsYEhLPQBwHVICEpdqpBsW50pa2ooD32sTtUswlcUOU4iEnz8nX1ZRJLF.jOKH7ZPzCIHkAFF1ZAP87xjztOGTncc0X.7d5lwkdITonWzz1El7KLHmz8hB5sluq0Dus-RLbsCNFd0K4URoZLx6bKrypT.xcxL0ampRb.j.8Cais-IdyQDH43n3Z5TVoq5qjNVgPVIY4zA7omN8Wm5hoYIUUVzLUFhFV8hWc4PtPc7hjJK1audQf7jLB4mK5FaFR6VI9OxNTASehc0iZ8Nhee2YbAUxYLPbz.A3qb5iymjZ@&nout=1"
2010/07/08 14:51:10| clientTryParseRequest: FD 6088
(187.16.240.122:2035) Invalid Request

I suppose the last "Invalid request" triggered the bug. But like I
said, I don't know what I can do to help and fix this.

Cheers,

Felipe Damasio

2010-07-09 17:13:09

by Eric Dumazet

[permalink] [raw]

Subject: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Le vendredi 09 juillet 2010 à 12:03 -0300, Felipe W Damasio a écrit :
> Hi,
>
> 2010/7/8 Eric Dumazet <[email protected]>:
> > Please try to reproduce a new report.
> >
> > It looks like a memory corruption, and it would be good to see if a
> > common pattern is occurring.
>
> I'm trying..the thing is the freeze occured on the machine that sits
> on a 200Mbps ISP in bridge-mode. Since the machine frooze, and the
> whole ISP went down for a few minutes, I'm not allowed to run any
> tests on it.
>
> I've setup the same scenario on a lab, but since last night been
> unable to reproduce the bug. Maybe there's a clue on the this crash
> below that can help me write some program to trigger the problem?
>

Reviewing tproxy stuff I spotted a problem in nf_tproxy_assign_sock()
but I could not see how it could explain your crash.

We can read uninitialized memory and trigger a fault in
nf_tproxy_assign_sock(), not later in tcp_recvmsg()...

David, Patrick, what do you think ?

Thanks

[PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

transparent field of a socket is either inet_twsk(sk)->tw_transparent
for timewait sockets, or inet_sk(sk)->transparent for other sockets
(TCP/UDP).

Signed-off-by: Eric Dumazet <[email protected]>
---
diff --git a/net/netfilter/nf_tproxy_core.c b/net/netfilter/nf_tproxy_core.c
index 5490fc3..daab8c4 100644
--- a/net/netfilter/nf_tproxy_core.c
+++ b/net/netfilter/nf_tproxy_core.c
@@ -70,7 +70,11 @@ nf_tproxy_destructor(struct sk_buff *skb)
int
nf_tproxy_assign_sock(struct sk_buff *skb, struct sock *sk)
{
- if (inet_sk(sk)->transparent) {
+ bool transparent = (sk->sk_state == TCP_TIME_WAIT) ?
+ inet_twsk(sk)->tw_transparent :
+ inet_sk(sk)->transparent;
+
+ if (transparent) {
skb_orphan(skb);
skb->sk = sk;
skb->destructor = nf_tproxy_destructor;

2010-07-09 17:53:18

by David Miller

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

From: Eric Dumazet <[email protected]>
Date: Fri, 09 Jul 2010 19:13:00 +0200

> Reviewing tproxy stuff I spotted a problem in nf_tproxy_assign_sock()
> but I could not see how it could explain your crash.
>
> We can read uninitialized memory and trigger a fault in
> nf_tproxy_assign_sock(), not later in tcp_recvmsg()...
>
> David, Patrick, what do you think ?
>
> Thanks
>
> [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets
>
> transparent field of a socket is either inet_twsk(sk)->tw_transparent
> for timewait sockets, or inet_sk(sk)->transparent for other sockets
> (TCP/UDP).
>
> Signed-off-by: Eric Dumazet <[email protected]>

Looks fine to me:

Acked-by: David S. Miller <[email protected]>

2010-07-09 18:16:21

by Felipe W Damasio

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Hi,

2010/7/9 Eric Dumazet <[email protected]>:
> Reviewing tproxy stuff I spotted a problem in nf_tproxy_assign_sock()
> but I could not see how it could explain your crash.
>
> We can read uninitialized memory and trigger a fault in
> nf_tproxy_assign_sock(), not later in tcp_recvmsg()...

Well, since I can't reproduce the bug, if you think this patch solves
my problem, I'll tell my bosses that we can put back the production
machine online.

But anyway, if there's test you think I can run, I have same the same
hardware on a lab using the same setup as the production
environment...

Thanks,

Felipe Damasio

2010-07-10 03:18:30

by Felipe W Damasio

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Hi Mr. Dumazet,

2010/7/9 Eric Dumazet <[email protected]>:
> Reviewing tproxy stuff I spotted a problem in nf_tproxy_assign_sock()
> but I could not see how it could explain your crash.
>
> We can read uninitialized memory and trigger a fault in
> nf_tproxy_assign_sock(), not later in tcp_recvmsg()...
>
> David, Patrick, what do you think ?

But do you think that the bug that squid triggered was caused by the
TProxy code?

Or is related to the network-stack in some other point.

I don't know if this helps, but I'm using ebtables to remove the
packets from the bridge, and iptables to redirect the traffic to
squid.

ebtables rules are:

-p IPv4 -i eth0 --ip-proto tcp --ip-dport 80 -j redirect --redirect-target DROP
-p IPv4 -i eth1 --ip-proto tcp --ip-sport 80 -j redirect --redirect-target DROP

iptables -t mangle -L -n is:

iptables -t mangle -L -n
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
DIVERT tcp -- 0.0.0.0/0 0.0.0.0/0 socket
extrachain tcp -- 0.0.0.0/0 0.0.0.0/0 tcp
dpt:80 ctstate NEW
TPROXY tcp -- 0.0.0.0/0 !201.40.162.5 tcp
dpt:80 connmark match 0x0 TPROXY redirect 127.0.0.1:3127 mark 0x1/0x1
TPROXY tcp -- 0.0.0.0/0 !201.40.162.5 tcp
dpt:80 connmark match 0x1 TPROXY redirect 127.0.0.1:3128 mark 0x1/0x1
TPROXY tcp -- 0.0.0.0/0 !201.40.162.5 tcp
dpt:80 connmark match 0x2 TPROXY redirect 127.0.0.1:3129 mark 0x1/0x1

Chain DIVERT (1 references)
target prot opt source destination
MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK xset
0x1/0xffffffff
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0

Chain extrachain (1 references)
target prot opt source destination
CONNMARK all -- 0.0.0.0/0 0.0.0.0/0 statistic
mode nth every 35 CONNMARK and 0x0
CONNMARK all -- 0.0.0.0/0 0.0.0.0/0 statistic
mode nth every 35 packet 1 CONNMARK xset 0x1/0xffffffff
CONNMARK all -- 0.0.0.0/0 0.0.0.0/0 statistic
mode nth every 35 packet 2 CONNMARK xset 0x2/0xffffffff

Don't know if the code on these can be traced back to tcp_recvmsg()
accessing some wrong memory address...

Cheers,

Felipe Damasio

2010-07-10 06:17:35

by Eric Dumazet

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Le samedi 10 juillet 2010 à 00:18 -0300, Felipe W Damasio a écrit :
> Hi Mr. Dumazet,
>
> 2010/7/9 Eric Dumazet <[email protected]>:
> > Reviewing tproxy stuff I spotted a problem in nf_tproxy_assign_sock()
> > but I could not see how it could explain your crash.
> >
> > We can read uninitialized memory and trigger a fault in
> > nf_tproxy_assign_sock(), not later in tcp_recvmsg()...
> >
> > David, Patrick, what do you think ?
>
> But do you think that the bug that squid triggered was caused by the
> TProxy code?
>

I dont think so, but I was asking David or Patrick another point of
view.

Strange thing with your crash report is CR2 value, with unexpected value
of 000000000b388000 while RAX value is dce8dce85d415d41

Faulting instruction is :

48 83 b8 b0 00 00 00 00 cmpq $0x0,0xb0(%rax)

So I would have expected CR2 being RAX+0xb0, but its not.

> Or is related to the network-stack in some other point.
>
> I don't know if this helps, but I'm using ebtables to remove the
> packets from the bridge, and iptables to redirect the traffic to
> squid.
>
> ebtables rules are:
>
> -p IPv4 -i eth0 --ip-proto tcp --ip-dport 80 -j redirect --redirect-target DROP
> -p IPv4 -i eth1 --ip-proto tcp --ip-sport 80 -j redirect --redirect-target DROP
>
>
> iptables -t mangle -L -n is:
>
> iptables -t mangle -L -n
> Chain PREROUTING (policy ACCEPT)
> target prot opt source destination
> DIVERT tcp -- 0.0.0.0/0 0.0.0.0/0 socket
> extrachain tcp -- 0.0.0.0/0 0.0.0.0/0 tcp
> dpt:80 ctstate NEW
> TPROXY tcp -- 0.0.0.0/0 !201.40.162.5 tcp
> dpt:80 connmark match 0x0 TPROXY redirect 127.0.0.1:3127 mark 0x1/0x1
> TPROXY tcp -- 0.0.0.0/0 !201.40.162.5 tcp
> dpt:80 connmark match 0x1 TPROXY redirect 127.0.0.1:3128 mark 0x1/0x1
> TPROXY tcp -- 0.0.0.0/0 !201.40.162.5 tcp
> dpt:80 connmark match 0x2 TPROXY redirect 127.0.0.1:3129 mark 0x1/0x1
>
> Chain DIVERT (1 references)
> target prot opt source destination
> MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK xset
> 0x1/0xffffffff
> ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
>
> Chain extrachain (1 references)
> target prot opt source destination
> CONNMARK all -- 0.0.0.0/0 0.0.0.0/0 statistic
> mode nth every 35 CONNMARK and 0x0
> CONNMARK all -- 0.0.0.0/0 0.0.0.0/0 statistic
> mode nth every 35 packet 1 CONNMARK xset 0x1/0xffffffff
> CONNMARK all -- 0.0.0.0/0 0.0.0.0/0 statistic
> mode nth every 35 packet 2 CONNMARK xset 0x2/0xffffffff
>
> Don't know if the code on these can be traced back to tcp_recvmsg()
> accessing some wrong memory address...
>
> Cheers,
>
> Felipe Damasio

2010-07-10 19:29:58

by David Miller

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

From: Eric Dumazet <[email protected]>
Date: Sat, 10 Jul 2010 08:17:29 +0200

> Strange thing with your crash report is CR2 value, with unexpected value
> of 000000000b388000 while RAX value is dce8dce85d415d41
>
> Faulting instruction is :
>
> 48 83 b8 b0 00 00 00 00 cmpq $0x0,0xb0(%rax)
>
> So I would have expected CR2 being RAX+0xb0, but its not.

It could be corruption from elsewhere. Those last four hex
digits (0x5d415d41) are "]A]A" in ascii, but that could just
be coincidence.

2010-07-11 03:11:22

by Felipe W Damasio

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Hi Mr. Miller,

2010/7/10 David Miller <[email protected]>:
> It could be corruption from elsewhere. ?Those last four hex
> digits (0x5d415d41) are "]A]A" in ascii, but that could just
> be coincidence.

What do you mean "from elsewhere"? You mean elsewhere on the network code?

Since the function that had the problem was tcp_recvmsg and we're
talking about a squid process, we're either talking about a typical
webserver-objet response, or about about an incorrect/faulty http
request from the user.

Like I told Mr. Dumazet, since on the squid logs I got a:

2010/07/08 14:51:10| clientTryParseRequest: FD 6088
(187.16.240.122:2035) Invalid Request

Only a second before the bug entry on syslog, I suppose that this
invalid request caused the problem (more like a guess, really).

If you think there's a way I can help reproduce/trigger and fix this
bug, please let me know, since the production machine is down until I
can ensure my bosses that this particular crash won't happen again.

Thanks,

Felipe Damasio

2010-07-11 05:19:32

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

On 07/10/2010 09:17 AM, Eric Dumazet wrote:
>
> Strange thing with your crash report is CR2 value, with unexpected value
> of 000000000b388000 while RAX value is dce8dce85d415d41
>
> Faulting instruction is :
>
> 48 83 b8 b0 00 00 00 00 cmpq $0x0,0xb0(%rax)
>
> So I would have expected CR2 being RAX+0xb0, but its not.
>

Nothing strange about it. You only get page faults and valid cr2 for
canonical addresses (17 high order bits all equal). In this case
rax+0xb0 is not a canonical address, so you got a general protection
fault instead, with cr2 unchanged.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2010-07-11 07:11:49

by Felipe W Damasio

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Hi,

2010/7/11 Felipe W Damasio <[email protected]>:
> What do you mean "from elsewhere"? You mean elsewhere on the network code?
>
> Since the function that had the problem was tcp_recvmsg and we're
> talking about a squid process, we're either talking about a typical
> webserver-objet response, or about about an incorrect/faulty http
> request from the user.

I'm trying to understand the network code to see if I can help track
this down...

So I looked at the code of tcp_recvmsg, and I saw that the function
that calls __kfree_skb is sk_eat_skb, which is called in tcp_recvmsg:

if (tcp_hdr(skb)->fin)
goto found_fin_ok;
if (!(flags & MSG_PEEK)) {
sk_eat_skb(sk, skb, copied_early);
copied_early = 0;
}
continue;

found_fin_ok:
/* Process the FIN. */
++*seq;
if (!(flags & MSG_PEEK)) {
sk_eat_skb(sk, skb, copied_early);
copied_early = 0;
}
break;

Now, I'm no kernel programmer, but I saw the "Process the FIN" thing
on the code. I'm tuning these proc parameters....maybe some of these
are triggering the bug?

The production machine has 8GB of RAM:

echo 1 > /proc/sys/net/ipv4/ip_forward
echo 0 > /proc/sys/net/ipv4/conf/lo/rp_filter
echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind
echo 1 > /proc/sys/net/ipv4/tcp_low_latency

echo 0 > /proc/sys/net/ipv4/conf/eth2/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/eth1/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/eth0/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/br0/rp_filter

echo 1 > /proc/sys/net/ipv4/conf/all/forwarding
echo 1 > /proc/sys/net/ipv4/conf/all/send_redirects
echo 1 > /proc/sys/net/ipv4/conf/eth0/send_redirects

echo 16384 > /proc/sys/net/ipv4/neigh/default/gc_thresh1
echo 32768 > /proc/sys/net/ipv4/neigh/default/gc_thresh2
echo 65535 > /proc/sys/net/ipv4/neigh/default/gc_thresh3

echo 1024 65535 > /proc/sys/net/ipv4/ip_local_port_range

echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
echo 0 > /proc/sys/net/ipv4/tcp_ecn
echo 1 > /proc/sys/net/ipv4/tcp_low_latency
echo 100000 > /proc/sys/net/core/netdev_max_backlog
echo 409600 > /proc/sys/net/ipv4/tcp_max_syn_backlog

echo 3 > /proc/sys/net/ipv4/tcp_fin_timeout
echo 15 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes
echo 65536 > /proc/sys/vm/min_free_kbytes
echo "262144 1024000 4194304" > /proc/sys/net/ipv4/tcp_rmem
echo "262144 1024000 4194304" > /proc/sys/net/ipv4/tcp_wmem
echo "1024000" > /proc/sys/net/core/rmem_max
echo "1024000" > /proc/sys/net/core/wmem_max
echo "512000" > /proc/sys/net/core/rmem_default
echo "512000" > /proc/sys/net/core/wmem_default
echo "524288" > /proc/sys/net/ipv4/netfilter/ip_conntrack_max

echo "2" > /proc/sys/net/ipv4/tcp_synack_retries
echo "2" > /proc/sys/net/ipv4/tcp_syn_retries
echo "262144" > /proc/sys/net/ipv4/tcp_max_orphans
echo "262144" > /proc/sys/net/core/somaxconn

Maybe these can help?

Cheers,

Felipe Damasio

2010-07-11 07:13:28

by Felipe W Damasio

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

2010/7/11 Felipe W Damasio <[email protected]>:
> ? The production machine has 8GB of RAM:

I'm sorry, this is not right. The production machine has 16GB of RAM.

Don't know if that matters regarding those proc parameters, though.

Cheers,

Felipe Damasio

2010-07-11 08:02:20

by Eric Dumazet

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Le dimanche 11 juillet 2010 à 08:19 +0300, Avi Kivity a écrit :
> On 07/10/2010 09:17 AM, Eric Dumazet wrote:
> >
> > Strange thing with your crash report is CR2 value, with unexpected value
> > of 000000000b388000 while RAX value is dce8dce85d415d41
> >
> > Faulting instruction is :
> >
> > 48 83 b8 b0 00 00 00 00 cmpq $0x0,0xb0(%rax)
> >
> > So I would have expected CR2 being RAX+0xb0, but its not.
> >
>
> Nothing strange about it. You only get page faults and valid cr2 for
> canonical addresses (17 high order bits all equal). In this case
> rax+0xb0 is not a canonical address, so you got a general protection
> fault instead, with cr2 unchanged.
>

OK, thanks Avi for this information, as I was not aware of this.

So something overwrote sk->sk_prot pointer (or skb->sk pointer) with
some data.

tcp sockets are allocated from a dedicated kmem_cache (because of
SLAB_DESTROY_RCU attribute). Their sk->sk_prot should never change in
normal operation, since underlying memory cannot be reused by another
object type in kernel. It should be NULL or &tcp_prot

Felipe, please describe your configuration as much as possible.
It might be a driver bug with with special kind of network frames.

lsmod
lspci -v
ethtool -k eth0
ethtool -k eth1 (if applicable)

2010-07-11 08:37:07

by Eric Dumazet

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Le samedi 10 juillet 2010 à 12:30 -0700, David Miller a écrit :
> From: Eric Dumazet <[email protected]>
> Date: Sat, 10 Jul 2010 08:17:29 +0200
>
> > Strange thing with your crash report is CR2 value, with unexpected value
> > of 000000000b388000 while RAX value is dce8dce85d415d41
> >
> > Faulting instruction is :
> >
> > 48 83 b8 b0 00 00 00 00 cmpq $0x0,0xb0(%rax)
> >
> > So I would have expected CR2 being RAX+0xb0, but its not.
>
> It could be corruption from elsewhere. Those last four hex
> digits (0x5d415d41) are "]A]A" in ascii, but that could just
> be coincidence.
>

x86 being litle endian, string is "A]A]" followed by another "XYXY"
pattern (non ASCII chars : 0xE8, 0xDC, 0xE8, 0xDC, "èÜèÜ" in ISO8859)

2010-07-12 00:52:10

by Felipe W Damasio

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Hi Mr. Dumazet,

2010/7/11 Eric Dumazet <[email protected]>:
> Felipe, please describe your configuration as much as possible.
> It might be a driver bug with with special kind of network frames.
>
> lsmod
> lspci -v
> ethtool -k eth0
> ethtool -k eth1 (if applicable)

Sure.

lsmod is empty. Though .config is attached.

We're using eth1 facing the user and eth2 facing the internet.

ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on

ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on

Dmesg output from a normal boot is also attached.

Cheers,

Felipe Damasio

Attachments:

dmesg.txt (52.25 kB)
dotconfig (57.24 kB)
lspci.txt (11.28 kB)
Download all attachments

2010-07-12 18:49:44

by Felipe W Damasio

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Hi Mr. Dumazet,

2010/7/11 Felipe W Damasio <[email protected]>:
> We're using eth1 facing the user and eth2 facing the internet.

Here's the result using ethtool-2.6.34:

./ethtool -k eth1

Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off
ntuple-filters: off
receive-hashing: off

./ethtool -k eth2

Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off
ntuple-filters: off
receive-hashing: off

2010-07-13 14:31:40

by Felipe W Damasio

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Hi Mr. Dumazet,

2010/7/12 Felipe W Damasio <[email protected]>:
> Here's the result using ethtool-2.6.34:
>
> ./ethtool -k eth1
>
> Offload parameters for eth1:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp-segmentation-offload: on
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: off
> large-receive-offload: off
> ntuple-filters: off
> receive-hashing: off
>
>
> ./ethtool -k eth2
>
> Offload parameters for eth2:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp-segmentation-offload: on
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: off
> large-receive-offload: off
> ntuple-filters: off
> receive-hashing: off

Did these help you track down the issue?

Sorry to insist, it's just that my bosses are kind of pressuring me to
solve the problem and put the squid machine back online :-)

Is there a test I can run to try and trigger the issue?

I have the same scenario (hardware and network setup) on my lab...

Cheers,

Felipe Damasio

2010-07-13 14:40:32

by Eric Dumazet

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Le mardi 13 juillet 2010 à 11:24 -0300, Felipe W Damasio a écrit :
> Hi Mr. Dumazet,
>
> 2010/7/12 Felipe W Damasio <[email protected]>:
> > Here's the result using ethtool-2.6.34:
> >
> > ./ethtool -k eth1
> >
> > Offload parameters for eth1:
> > rx-checksumming: on
> > tx-checksumming: on
> > scatter-gather: on
> > tcp-segmentation-offload: on
> > udp-fragmentation-offload: off
> > generic-segmentation-offload: on
> > generic-receive-offload: off
> > large-receive-offload: off
> > ntuple-filters: off
> > receive-hashing: off
> >
> >
> > ./ethtool -k eth2
> >
> > Offload parameters for eth2:
> > rx-checksumming: on
> > tx-checksumming: on
> > scatter-gather: on
> > tcp-segmentation-offload: on
> > udp-fragmentation-offload: off
> > generic-segmentation-offload: on
> > generic-receive-offload: off
> > large-receive-offload: off
> > ntuple-filters: off
> > receive-hashing: off
>
> Did these help you track down the issue?
>
> Sorry to insist, it's just that my bosses are kind of pressuring me to
> solve the problem and put the squid machine back online :-)
>
> Is there a test I can run to try and trigger the issue?
>
> I have the same scenario (hardware and network setup) on my lab...
>

I currently have no fresh ideas. If you want this problem to be solved,
its important to setup in your lab a workload to trigger again and again
the bug, in order to provide us more crash information.

After code review doesnt spot obvious bugs, this is time for brute force
hunting, using git bisection for example...

2010-07-13 14:49:43

by Felipe W Damasio

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Hi Mr. Dumazet,

2010/7/13 Eric Dumazet <[email protected]>:
> I currently have no fresh ideas. If you want this problem to be solved,
> its important to setup in your lab a workload to trigger again and again
> the bug, in order to provide us more crash information.

Right. I've been running non-stop since the first bug happened, but
so far the problem hasn't surfaced again :-(

I've been using the kernel with the patch that you provided me
(nf_tproxy.c). Is there a chance that patch fixed the problem?

Cheers,

Felipe Damasio

2010-07-13 15:49:59

by Eric Dumazet

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Le mardi 13 juillet 2010 à 11:49 -0300, Felipe W Damasio a écrit :
> Hi Mr. Dumazet,
>
> 2010/7/13 Eric Dumazet <[email protected]>:
> > I currently have no fresh ideas. If you want this problem to be solved,
> > its important to setup in your lab a workload to trigger again and again
> > the bug, in order to provide us more crash information.
>
> Right. I've been running non-stop since the first bug happened, but
> so far the problem hasn't surfaced again :-(
>
> I've been using the kernel with the patch that you provided me
> (nf_tproxy.c). Is there a chance that patch fixed the problem?

This is a real bug, but I dont think it can fix your problem.

Looking again at your crash, we see RCX=0x720, decimal 1824

As its skb->len, we are freeing an skb that was collapsed or something
like that, since 1824 > 1460 (the normal MSS on ethernet)

GRO is off on your machine.

But coincidently 0x0720 is also a blank char for VGA screen...
(0x20 : ASCII space, 0x07 : default attribute)

So maybe you hit a corruption outside of network stack.

2010-07-13 20:55:42

by Felipe W Damasio

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Hi Mr. Dumazet,

I used the patched kernel on the production machine and squid frooze again.

This is the dmesg message:

general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.3/i2c-0/name
CPU 1
Modules linked in:

Pid: 5533, comm: squid Not tainted 2.6.34 #6 DX58SO/
RIP: 0010:[<ffffffff81369b2a>] [<ffffffff81369b2a>] sock_rfree+0x26/0x37
RSP: 0018:ffff88042287fc20 EFLAGS: 00010206
RAX: 66c86f938964c696 RBX: ffff88034e8f9a00 RCX: 0000000000000720
RDX: ffff8803f0ce05c0 RSI: ffff8803d441960c RDI: ffff88034e8f9a00
RBP: ffff8803f0ee05c0 R08: ffffea000dcb9998 R09: 0000000000000000
R10: 000000000003d830 R11: ffff8803f0ee05c0 R12: 00000000000005a8
R13: 00000000000005a8 R14: 0000000000004378 R15: 0000000000000000
FS: 00007f4cf33ee710(0000) GS:ffff880001840000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000021d5fd0 CR3: 0000000422872000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process squid (pid: 5533, threadinfo ffff88042287e000, task ffff88042eb61a40)
Stack:
ffffffff8136ecda ffff88034e8f9a00 ffffffff8136ea8c ffff88034e8f9a00
<0> ffffffff813ab142 00000000000000d0 ffffffff8136f9f9 000000000eec60e2
<0> ffff88042eb61a40 ffff88042eb61a40 ffff88042eb61a40 00000000edca7300
Call Trace:
[<ffffffff8136ecda>] ? skb_release_head_state+0x6d/0xb7
[<ffffffff8136ea8c>] ? __kfree_skb+0x9/0x7d
[<ffffffff813ab142>] ? tcp_recvmsg+0x6a3/0x89a
[<ffffffff8136f9f9>] ? __alloc_skb+0x5e/0x14e
[<ffffffff81369dde>] ? sock_common_recvmsg+0x30/0x45
[<ffffffff81367b0f>] ? sock_aio_read+0xdd/0xf1
[<ffffffff813b6c97>] ? tcp_write_xmit+0x93e/0x96c
[<ffffffff810ac500>] ? do_sync_read+0xb0/0xf2
[<ffffffff810acf32>] ? vfs_read+0xb9/0xff
[<ffffffff810ad034>] ? sys_read+0x45/0x6e
[<ffffffff8100292b>] ? system_call_fastpath+0x16/0x1b
Code: ff ff ff ff c3 48 8b 57 18 8b 87 d8 00 00 00 48 8d 8a ac 00 00
00 f0 29 82 ac 00 00 00 48 8b 57 18 8b 8f d8 00 00 00 48 8b 42 38 <48>
83 b8 b0 00 00 00 00 74 06 01 8a f4 00 00 00 c3 41 57 41 89
RIP [<ffffffff81369b2a>] sock_rfree+0x26/0x37
RSP <ffff88042287fc20>
---[ end trace 22e6ca9ef825c0e6 ]---

Seems to be the same issue, right?

Cheers,

Felipe Damasio

2010/7/13 Eric Dumazet <[email protected]>:
> Le mardi 13 juillet 2010 ? 11:49 -0300, Felipe W Damasio a ?crit :
>> Hi Mr. Dumazet,
>>
>> 2010/7/13 Eric Dumazet <[email protected]>:
>> > I currently have no fresh ideas. If you want this problem to be solved,
>> > its important to setup in your lab a workload to trigger again and again
>> > the bug, in order to provide us more crash information.
>>
>> ?Right. I've been running non-stop since the first bug happened, but
>> so far the problem hasn't surfaced again :-(
>>
>> ?I've been using the kernel with the patch that you provided me
>> (nf_tproxy.c). Is there a chance that patch fixed the problem?
>
> This is a real bug, but I dont think it can fix your problem.
>
> Looking again at your crash, we see RCX=0x720, decimal 1824
>
> As its skb->len, we are freeing an skb that was collapsed or something
> like that, since 1824 > 1460 (the normal MSS on ethernet)
>
> GRO is off on your machine.
>
> But coincidently 0x0720 is also a blank char for VGA screen...
> (0x20 : ASCII space, 0x07 : default attribute)
>
> So maybe you hit a corruption outside of network stack.
>
>
>
>

2010-07-13 21:06:57

by Felipe W Damasio

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Hi,

2010/7/13 Felipe W Damasio <[email protected]>:
> This is the dmesg message:

This error happened at 17:19:59.

A few seconds earlier, I have these on squid's logs:

2010/07/13 17:19:50| clientTryParseRequest: FD 1690
(189.113.65.55:52681) Invalid Request
2010/07/13 17:19:50| clientTryParseRequest: FD 5056
(189.113.74.101:2420) Invalid Request
2010/07/13 17:19:52| clientTryParseRequest: FD 26923
(189.113.74.101:2419) Invalid Request
2010/07/13 17:19:57| parseHttpRequest: Unsupported method 'PASS'
2010/07/13 17:19:57| clientTryParseRequest: FD 17786
(189.113.69.194:2422) Invalid Request
2010/07/13 17:19:57| parseHttpRequest: Unsupported method '<D2>Yk'
2010/07/13 17:19:57| clientTryParseRequest: FD 22554
(189.113.79.151:4225) Invalid Request
2010/07/13 17:19:59| httpReadReply: Excess data from "GET
http://webcs.msg.yahoo.com/crossdomain.xml"

Maybe with these two errors we can create some kind of trigger
program to help us duplicate this on the lab setup?

Cheers,

Felipe Damasio

2010-07-14 03:21:26

by Eric Dumazet

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Le mardi 13 juillet 2010 à 17:55 -0300, Felipe W Damasio a écrit :
> Hi Mr. Dumazet,
>
> I used the patched kernel on the production machine and squid frooze again.
>
> This is the dmesg message:
>
>
> general protection fault: 0000 [#1] SMP
> last sysfs file: /sys/devices/pci0000:00/0000:00:1f.3/i2c-0/name
> CPU 1
> Modules linked in:
>
> Pid: 5533, comm: squid Not tainted 2.6.34 #6 DX58SO/
> RIP: 0010:[<ffffffff81369b2a>] [<ffffffff81369b2a>] sock_rfree+0x26/0x37
> RSP: 0018:ffff88042287fc20 EFLAGS: 00010206
> RAX: 66c86f938964c696 RBX: ffff88034e8f9a00 RCX: 0000000000000720
> RDX: ffff8803f0ce05c0 RSI: ffff8803d441960c RDI: ffff88034e8f9a00
> RBP: ffff8803f0ee05c0 R08: ffffea000dcb9998 R09: 0000000000000000
> R10: 000000000003d830 R11: ffff8803f0ee05c0 R12: 00000000000005a8
> R13: 00000000000005a8 R14: 0000000000004378 R15: 0000000000000000
> FS: 00007f4cf33ee710(0000) GS:ffff880001840000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000021d5fd0 CR3: 0000000422872000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process squid (pid: 5533, threadinfo ffff88042287e000, task ffff88042eb61a40)
> Stack:
> ffffffff8136ecda ffff88034e8f9a00 ffffffff8136ea8c ffff88034e8f9a00
> <0> ffffffff813ab142 00000000000000d0 ffffffff8136f9f9 000000000eec60e2
> <0> ffff88042eb61a40 ffff88042eb61a40 ffff88042eb61a40 00000000edca7300
> Call Trace:
> [<ffffffff8136ecda>] ? skb_release_head_state+0x6d/0xb7
> [<ffffffff8136ea8c>] ? __kfree_skb+0x9/0x7d
> [<ffffffff813ab142>] ? tcp_recvmsg+0x6a3/0x89a
> [<ffffffff8136f9f9>] ? __alloc_skb+0x5e/0x14e
> [<ffffffff81369dde>] ? sock_common_recvmsg+0x30/0x45
> [<ffffffff81367b0f>] ? sock_aio_read+0xdd/0xf1
> [<ffffffff813b6c97>] ? tcp_write_xmit+0x93e/0x96c
> [<ffffffff810ac500>] ? do_sync_read+0xb0/0xf2
> [<ffffffff810acf32>] ? vfs_read+0xb9/0xff
> [<ffffffff810ad034>] ? sys_read+0x45/0x6e
> [<ffffffff8100292b>] ? system_call_fastpath+0x16/0x1b
> Code: ff ff ff ff c3 48 8b 57 18 8b 87 d8 00 00 00 48 8d 8a ac 00 00
> 00 f0 29 82 ac 00 00 00 48 8b 57 18 8b 8f d8 00 00 00 48 8b 42 38 <48>
> 83 b8 b0 00 00 00 00 74 06 01 8a f4 00 00 00 c3 41 57 41 89
> RIP [<ffffffff81369b2a>] sock_rfree+0x26/0x37
> RSP <ffff88042287fc20>
> ---[ end trace 22e6ca9ef825c0e6 ]---
>
>
> Seems to be the same issue, right?
>

Exactly the same. Only RAX value is different, its another chain.

BTW, 0x720 is not skb->len like I said earlier, but skb->truesize, and
0x720 is OK on a 64 bit machine for a regular packet.

48 8b 57 18 mov 0x18(%rdi),%rdx skb->sk
8b 87 d8 00 00 00 mov 0xd8(%rdi),%eax skb->truesize
48 8d 8a ac 00 00 00 lea 0xac(%rdx),%rcx
f0 29 82 ac 00 00 00 lock sub %eax,0xac(%rdx)
48 8b 57 18 mov 0x18(%rdi),%rdx skb->sk
8b 8f d8 00 00 00 mov 0xd8(%rdi),%ecx skb->truesize
48 8b 42 38 mov 0x38(%rdx),%rax sk->sk_prot
<48> 83 b8 b0 00 00 00 00 cmpq $0x0,0xb0(%rax)
74 06 je .+6
01 8a fa 00 00 00 add %ecx,0xfa(%rdx)

One thing to notice are the RDX and RBP values:

RDX: ffff8803f0ce05c0
RBP: ffff8803f0ee05c0

RDX being the sk pointer (and sk+0x38 contains the corrupted "sk_prot" value)
, we notice RBP contains same "sk" value + 0x200000 (2 Mbytes).

(same remark on your initial bug report)

Could you enable CONFIG_FRAME_POINTER=y in your config ?

2010-07-14 03:27:58

by Felipe W Damasio

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Hi Mr. Dumazet,

2010/7/14 Eric Dumazet <[email protected]>:
> RDX being the sk pointer (and sk+0x38 contains the corrupted "sk_prot" value)
> , we notice RBP contains same "sk" value + 0x200000 ?(2 Mbytes).
>
> (same remark on your initial bug report)
>
> Could you enable CONFIG_FRAME_POINTER=y in your config ?

I can, but my bosses will kick my ass if I bring down the ISP again :)

If you think it's the only way to find the problem I'll tell them that
I need to do it. In this case, please tell me what other config
options/tools I can use to get as much info as possible...since I'll
probably be able to test this only once more on the production
environment for debugging purposes.

Cheers,

Felipe Damasio

2010-07-14 03:43:13

by Eric Dumazet

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Le mercredi 14 juillet 2010 à 00:27 -0300, Felipe W Damasio a écrit :
> Hi Mr. Dumazet,
>
> 2010/7/14 Eric Dumazet <[email protected]>:
> > RDX being the sk pointer (and sk+0x38 contains the corrupted "sk_prot" value)
> > , we notice RBP contains same "sk" value + 0x200000 (2 Mbytes).
> >
> > (same remark on your initial bug report)
> >
> > Could you enable CONFIG_FRAME_POINTER=y in your config ?
>
> I can, but my bosses will kick my ass if I bring down the ISP again :)
>

I have no guarantee at all, even if we find the bug.

> If you think it's the only way to find the problem I'll tell them that
> I need to do it. In this case, please tell me what other config
> options/tools I can use to get as much info as possible...since I'll
> probably be able to test this only once more on the production
> environment for debugging purposes.
>

You really should try to setup a lab to trigger the bug, and not doing
experiments on production :)

2010-07-14 03:51:36

by Felipe W Damasio

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Hi,

2010/7/14 Eric Dumazet <[email protected]>:
>> I can, but my bosses will kick my ass if I bring down the ISP again :)
>
> I have no guarantee at all, even if we find the bug.

Ok :-)

>> If you think it's the only way to find the problem I'll tell them that
>> I need to do it. In this case, please tell me what other config
>> options/tools I can use to get as much info as possible...since I'll
>> probably be able to test this only once more on the production
>> environment for debugging purposes.
>
> You really should try to setup a lab to trigger the bug, and not doing
> experiments on production :)

Right, I'm trying.

The thing is: The ISP is a 200Mbps network with 10,000 users. The
first time it took around 2 minutes to trigger the bug. The second
time it took around 17 minutes.

So I *think* it's some TCP flag with some weird content...but I can't
find out what it is so I can trigger it on the lab.

So my only guess is to enable every possible debug flag I can think of
to track the bug down on the production environment. Any hints here
would be appreciated :)

Cheers,

Felipe Damasio

2010-07-14 06:56:25

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

On Wed, 14 Jul 2010, Felipe W Damasio wrote:

> Hi,
>
> 2010/7/14 Eric Dumazet <[email protected]>:
> >> I can, but my bosses will kick my ass if I bring down the ISP again :)
> >
> > I have no guarantee at all, even if we find the bug.
>
> Ok :-)
>
> >> If you think it's the only way to find the problem I'll tell them that
> >> I need to do it. In this case, please tell me what other config
> >> options/tools I can use to get as much info as possible...since I'll
> >> probably be able to test this only once more on the production
> >> environment for debugging purposes.
> >
> > You really should try to setup a lab to trigger the bug, and not doing
> > experiments on production :)
>
> Right, I'm trying.
>
> The thing is: The ISP is a 200Mbps network with 10,000 users. The
> first time it took around 2 minutes to trigger the bug. The second
> time it took around 17 minutes.
>
> So I *think* it's some TCP flag with some weird content...but I can't
> find out what it is so I can trigger it on the lab.
>
> So my only guess is to enable every possible debug flag I can think of
> to track the bug down on the production environment. Any hints here
> would be appreciated :)

Is it possible for you to mirror the production traffic to another
port, and then do a tcpdump capture to a series of files, so that
you might possibly be able to correlate the kernel crash to the
actual packets on the wire (and the Invalid Request squid errors)?
Just a suggestion.

-Bill

2010-07-14 11:41:27

by Patrick McHardy

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

On 09.07.2010 19:13, Eric Dumazet wrote:
> Le vendredi 09 juillet 2010 ? 12:03 -0300, Felipe W Damasio a ?crit :
>> Hi,
>>
>> 2010/7/8 Eric Dumazet <[email protected]>:
>>> Please try to reproduce a new report.
>>>
>>> It looks like a memory corruption, and it would be good to see if a
>>> common pattern is occurring.
>>
>> I'm trying..the thing is the freeze occured on the machine that sits
>> on a 200Mbps ISP in bridge-mode. Since the machine frooze, and the
>> whole ISP went down for a few minutes, I'm not allowed to run any
>> tests on it.
>>
>> I've setup the same scenario on a lab, but since last night been
>> unable to reproduce the bug. Maybe there's a clue on the this crash
>> below that can help me write some program to trigger the problem?
>>
>
> Reviewing tproxy stuff I spotted a problem in nf_tproxy_assign_sock()
> but I could not see how it could explain your crash.
>
> We can read uninitialized memory and trigger a fault in
> nf_tproxy_assign_sock(), not later in tcp_recvmsg()...
>
> David, Patrick, what do you think ?
>
> Thanks
>
> [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets
>
> transparent field of a socket is either inet_twsk(sk)->tw_transparent
> for timewait sockets, or inet_sk(sk)->transparent for other sockets
> (TCP/UDP).

I don't see anything preventing use of timewait sockets, so the
patch looks correct to me.

Applied to nf-2.6.git, thanks Eric.

2010-07-16 15:41:58

by Felipe W Damasio

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Hi All,

2010/7/14 Felipe W Damasio <[email protected]>:
> Hi Mr. Dumazet,
>
> 2010/7/14 Eric Dumazet <[email protected]>:
>> RDX being the sk pointer (and sk+0x38 contains the corrupted "sk_prot" value)
>> , we notice RBP contains same "sk" value + 0x200000 ?(2 Mbytes).
>>
>> (same remark on your initial bug report)
>>
>> Could you enable CONFIG_FRAME_POINTER=y in your config ?

I did, this is the new bug:

general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.3/i2c-0/name
CPU 2
Modules linked in: e1000e

Pid: 4209, comm: squid Not tainted 2.6.34 #4 DX58SO/
RIP: 0010:[<ffffffff8137a887>] [<ffffffff8137a887>] sock_rfree+0x2a/0x3c
RSP: 0018:ffff88042d781ba8 EFLAGS: 00010282
RAX: 9a7e7f4602400d48 RBX: ffff88034c918e00 RCX: 0000000000000720
RDX: ffff880413a82e00 RSI: ffff8804161e5e2a RDI: ffff88034c918e00
RBP: ffff88042d781ba8 R08: ffff88042d781b98 R09: 0000000000000000
R10: 0000000000040570 R11: 0000000000000000 R12: ffff880413882e00
R13: 00000000000005a8 R14: 00000000000005a8 R15: 000000000000a84d
FS: 00007f9aa0007710(0000) GS:ffff880001a80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f11f831f020 CR3: 000000042d5f8000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process squid (pid: 4209, threadinfo ffff88042d780000, task ffff88042e325620)
Stack:
ffff88042d781bc8 ffffffff8137fda0 ffff880413882e00 ffff88034c918e00
<0> ffff88042d781be8 ffffffff8137fb3b ffff88034c918e00 ffff88034c918e00
<0> ffff88042d781cd8 ffffffff813be69b ffff88042d781c38 ffffffff813c76e4
Call Trace:
[<ffffffff8137fda0>] skb_release_head_state+0x75/0xc0
[<ffffffff8137fb3b>] __kfree_skb+0x11/0x86
[<ffffffff813be69b>] tcp_recvmsg+0x6b9/0x8be
[<ffffffff813c76e4>] ? tcp_current_mss+0x46/0x65
[<ffffffff8137ab89>] sock_common_recvmsg+0x32/0x47
[<ffffffff811bafb6>] ? selinux_socket_recvmsg+0x1d/0x1f
[<ffffffff81378752>] __sock_recvmsg+0x6a/0x76
[<ffffffff81378847>] sock_aio_read+0xe9/0x102
[<ffffffff811b9d68>] ? avc_has_perm+0x4e/0x60
[<ffffffff810b63f6>] do_sync_read+0xc7/0x10d
[<ffffffff811bdb17>] ? selinux_file_permission+0xa5/0xb2
[<ffffffff811b7917>] ? security_file_permission+0x11/0x13
[<ffffffff810b6e7b>] vfs_read+0xbb/0x102
[<ffffffff810b6f86>] sys_read+0x47/0x70
[<ffffffff810029eb>] system_call_fastpath+0x16/0x1b
Code: c3 48 8b 57 18 55 8b 87 d8 00 00 00 48 89 e5 48 8d 8a ac 00 00
00 f0 29 82 ac 00 00 00 48 8b 57 18 8b 8f d8 00 00 00 48 8b 42 38 <48>
83 b8 b0 00 00 00 00 74 06 01 8a f4 00 00 00 c9 c3 55 48 89
RIP [<ffffffff8137a887>] sock_rfree+0x2a/0x3c
RSP <ffff88042d781ba8>
---[ end trace 8932efc1ba58ce6e ]---

Does this tell you anything?

Cheers,

Felipe Damasio

2010-07-16 15:52:41

by Eric Dumazet

[permalink] [raw]

Subject: Re: [PATCH] tproxy: nf_tproxy_assign_sock() can handle tw sockets

Le vendredi 16 juillet 2010 à 12:41 -0300, Felipe W Damasio a écrit :
> Hi All,
>
> 2010/7/14 Felipe W Damasio <[email protected]>:
> > Hi Mr. Dumazet,
> >
> > 2010/7/14 Eric Dumazet <[email protected]>:
> >> RDX being the sk pointer (and sk+0x38 contains the corrupted "sk_prot" value)
> >> , we notice RBP contains same "sk" value + 0x200000 (2 Mbytes).
> >>
> >> (same remark on your initial bug report)
> >>
> >> Could you enable CONFIG_FRAME_POINTER=y in your config ?
>
> I did, this is the new bug:
>
> general protection fault: 0000 [#1] SMP
> last sysfs file: /sys/devices/pci0000:00/0000:00:1f.3/i2c-0/name
> CPU 2
> Modules linked in: e1000e
>
> Pid: 4209, comm: squid Not tainted 2.6.34 #4 DX58SO/
> RIP: 0010:[<ffffffff8137a887>] [<ffffffff8137a887>] sock_rfree+0x2a/0x3c
> RSP: 0018:ffff88042d781ba8 EFLAGS: 00010282
> RAX: 9a7e7f4602400d48 RBX: ffff88034c918e00 RCX: 0000000000000720
> RDX: ffff880413a82e00 RSI: ffff8804161e5e2a RDI: ffff88034c918e00
> RBP: ffff88042d781ba8 R08: ffff88042d781b98 R09: 0000000000000000
> R10: 0000000000040570 R11: 0000000000000000 R12: ffff880413882e00
> R13: 00000000000005a8 R14: 00000000000005a8 R15: 000000000000a84d
> FS: 00007f9aa0007710(0000) GS:ffff880001a80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f11f831f020 CR3: 000000042d5f8000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process squid (pid: 4209, threadinfo ffff88042d780000, task ffff88042e325620)
> Stack:
> ffff88042d781bc8 ffffffff8137fda0 ffff880413882e00 ffff88034c918e00
> <0> ffff88042d781be8 ffffffff8137fb3b ffff88034c918e00 ffff88034c918e00
> <0> ffff88042d781cd8 ffffffff813be69b ffff88042d781c38 ffffffff813c76e4
> Call Trace:
> [<ffffffff8137fda0>] skb_release_head_state+0x75/0xc0
> [<ffffffff8137fb3b>] __kfree_skb+0x11/0x86
> [<ffffffff813be69b>] tcp_recvmsg+0x6b9/0x8be
> [<ffffffff813c76e4>] ? tcp_current_mss+0x46/0x65
> [<ffffffff8137ab89>] sock_common_recvmsg+0x32/0x47
> [<ffffffff811bafb6>] ? selinux_socket_recvmsg+0x1d/0x1f
> [<ffffffff81378752>] __sock_recvmsg+0x6a/0x76
> [<ffffffff81378847>] sock_aio_read+0xe9/0x102
> [<ffffffff811b9d68>] ? avc_has_perm+0x4e/0x60
> [<ffffffff810b63f6>] do_sync_read+0xc7/0x10d
> [<ffffffff811bdb17>] ? selinux_file_permission+0xa5/0xb2
> [<ffffffff811b7917>] ? security_file_permission+0x11/0x13
> [<ffffffff810b6e7b>] vfs_read+0xbb/0x102
> [<ffffffff810b6f86>] sys_read+0x47/0x70
> [<ffffffff810029eb>] system_call_fastpath+0x16/0x1b
> Code: c3 48 8b 57 18 55 8b 87 d8 00 00 00 48 89 e5 48 8d 8a ac 00 00
> 00 f0 29 82 ac 00 00 00 48 8b 57 18 8b 8f d8 00 00 00 48 8b 42 38 <48>
> 83 b8 b0 00 00 00 00 74 06 01 8a f4 00 00 00 c9 c3 55 48 89
> RIP [<ffffffff8137a887>] sock_rfree+0x2a/0x3c
> RSP <ffff88042d781ba8>
> ---[ end trace 8932efc1ba58ce6e ]---
>
> Does this tell you anything?
>

Could you privatly send me the vmlinux file ?