2014-04-14 15:25:47

by James Chapman

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

Please send the complete oops message.

Is this a regression? If so, do you know what the last kernel version
was that worked?

Thanks
James

On 14/04/14 14:33, Zhan Jianyu wrote:
> When I tried to connect my VPN, I got a panic, saying
> a NULL poiter dereference at 0x00000000000002c0
>
> I came across this bug twice today, after updateing to
> Linux-3.15-rc1.
>
> Below are some panic message(hand copy,not complete)
> =====
>
> Kernel panic - not syncing: Fatal exception in interupt
>
> RIP ip_queue_xmit+0x20/0x3e0
> Call Trace:
> l2tp_xmit_skb+0x335/0x6c0 [l2tp_core]
> ? skb_free_head+0x1e/0x80
> pppol2tp_xmit+0x141/0x210 [l2tp_ppp]
> ppp_channel_push+0x50/0xd0 [ppp_generic]
> ppp_write+0xa3/0xec [ppp_generic]
> vfs_write
> Sys_wirte
> ? __audit_syscall_exit
> system_call_fastpath
>
> =====
>
> I've tried to figure it out.
> I disassembled ip_queue_xmit, found that the null
> dereference is caused by the first argument of
> ip_queue_xmit(), which is sk_buff pointer became
> NULL.
>
> This seems some async skb freeing is in progress?
>
> Regards,
> Jianyu Zhan
>


2014-04-14 17:02:11

by Jianyu Zhan

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

On Mon, Apr 14, 2014 at 11:19 PM, James Chapman <[email protected]> wrote:
> Please send the complete oops message.

Hi, complete oops message is :


[ 100.243737] BUG: unable to handle kernel NULL pointer dereference
at 00000000000002c0
[ 100.244985] IP: [<ffffffff815d4cb0>] ip_queue_xmit+0x20/0x3e0
[ 100.262266] PGD 0
[ 100.288395] Oops: 0000 [#1] SMP
[ 100.325955] Modules linked in: l2tp-ppp l2tp-netlink l2tp_core
pppoe vmblock vsock vmmemctl vmhgfs acpiphp snd_ens1371 gameport
snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm
snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event
snd_seq snd_timer snd_seq_device ppdev psmouse serio_raw fbcon
tileblit font bitblit softcursor snd parport_pc soundcore
snd_page_alloc vmci i2c_piix4 vga16fb vgastate intel_agp agpgart
shpchp lp parport floppy pcnet32 mii mptspi mptscsih mptbase
scsi_transport_spi vmxnet
[ 1100.472178] CPU: 0 PID: 1849 Comm: pppd Tainted: G IOE
3.15.0-rc1+ #13
[ 1100.540018] Hardware name: LENOVO 20216/v2QY041, BIOS
74CN34WW(v2.0k) 07/04/2013
[ 1100.562844] task: ffff88003ee48000 ti: ffff880257a4e000 task.ti:
ffff880257a4e000
[ 1100.584351] RIP: 0010: [<ffffffff815d4cb0>] [<ffffffff815d4cb0>]
ip_queue_xmit+0x20/0x3e0
[ 1100.584351] RSP: 0018: ffff880257a4fdd8 EFLAGS: 00010293
[ 1100.609358] RAX: ffff88003fe48e00 RBX:ffff88003f5ad100 RCX:0000000000000014
[ 1100.631467] RDX: ffffffff81ccab40 RSI: ffff88003fe49110
RDI:ffff88003f5ad100
[ 1100.657664] RBP: ffff880257a4fe00 R08: ffff88807ff06476
R09:0000000000000022
[ 1100.657694] R10: ffffe90009726600 R11: ffff88003fe49110
R12:0000000000000000
[ 1100.657694] R13: ffff88025a087480 R14: ffff880025b561400
R15:ffff88003fe49110
[ 1100.688994] FS: 00007f8857f8f840(0000)
GS:ffff88026f200000(0000) knlGS:000000000000
[ 1100.697694] CS: 0010 DS: 0000 ES: 0000 CR0:0000000080050033
[ 1100.697694] CR2:0000000000002c0 CR3: 0000000260190e00 CR4: 00000000001407f0
[ 1100.706083] Stack:
[ 1100.731783] ffff88025cadbe00 ffff88003fe48e00 0000000000000022
ffff88025b561400
[ 1100.759324] ffff88003f5ad100 ffff880257a4fe68 ffffffffa081a6e5
ffff88003f5ad100
[ 1100.811396] ffff880257a4fe28 ffffffff8158686e ffff880257a4fe68
ffffffff000000008
[ 1100.891922] Call Trace:
[ 1100.916257] [<ffffffffa081a6e5>] l2tp_xmit_skb+0x335/0x6c0 [l2tp_core]
[ 1100.943670] [<ffffffff8158686e>] ? skb_free_head+0x1e/0x80
[ 1100.970905] [<ffffffffa0830da1>] pppol2tp_xmit+0x141/0x210 [l2tp_ppp]
[ 1100.995542] [<ffffffffa0805230>] ppp_channel_push+0x50/0xd0 [ppp_generic]
[ 1101.024087] [<ffffffffa0805523>] ppp_write+0xa3/0xec [ppp_generic]
[ 1101.133447] [<ffffffff811c7bfa>] vfs_write+0xba/0x1e0
[ 1101.228987] [<ffffffff811c8746>] Sys_wirte+0x46/0xb0
[ 1101.428494] [<ffffffff8110aca6>] ? __audit_syscall_exit+0x1f6/0x2a0
[ 1101.898345] [<ffffffff816a90e9>] system_call_fastpath+0x16/0x1b
[ 1102.024087] Code: e9 31 fe ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00
55 48 89 e5 41 57 49 89 f7 41 55 41 54 53 48 89 fb 4c 8b 67 18 4c 8b
67 18 4c 8b 6f 58 <4d> 8b b4 24 c0 02 00 00 49 83 e5 fe 0f 84 3e 02 00
00 4d 85 f6
[ 1102.879592] RIP: [<ffffffff815d4cb0>] ip_queue_xmit+0x20/0x3e0
[ 1103.179592] RsP<FFFF880257A4FDD8>
[ 1103.834682] CR2: 00000000000002C0
[ 1104.043682] kernel panic - not syncing: Fatal exception in interrupt
[ 1104.789539] Offset: 0x0 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffff9fffffff)

>Is this a regression? If so, do you know what the last kernel version
>was that worked?

I think this is a regression. 3.14 is the good working version. Today,
after I updated to 3.15-rc1, evertime I tried connect my VPN, it
always panics.

2014-04-14 17:12:34

by Eric Dumazet

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

On Tue, 2014-04-15 at 01:01 +0800, Zhan Jianyu wrote:
> On Mon, Apr 14, 2014 at 11:19 PM, James Chapman <[email protected]> wrote:
> > Please send the complete oops message.
>
> Hi, complete oops message is :
>
>
> [ 100.243737] BUG: unable to handle kernel NULL pointer dereference
> at 00000000000002c0
> [ 100.244985] IP: [<ffffffff815d4cb0>] ip_queue_xmit+0x20/0x3e0
> [ 100.262266] PGD 0
> [ 100.288395] Oops: 0000 [#1] SMP
> [ 100.325955] Modules linked in: l2tp-ppp l2tp-netlink l2tp_core
> pppoe vmblock vsock vmmemctl vmhgfs acpiphp snd_ens1371 gameport
> snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm
> snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event
> snd_seq snd_timer snd_seq_device ppdev psmouse serio_raw fbcon
> tileblit font bitblit softcursor snd parport_pc soundcore
> snd_page_alloc vmci i2c_piix4 vga16fb vgastate intel_agp agpgart
> shpchp lp parport floppy pcnet32 mii mptspi mptscsih mptbase
> scsi_transport_spi vmxnet
> [ 1100.472178] CPU: 0 PID: 1849 Comm: pppd Tainted: G IOE
> 3.15.0-rc1+ #13
> [ 1100.540018] Hardware name: LENOVO 20216/v2QY041, BIOS
> 74CN34WW(v2.0k) 07/04/2013
> [ 1100.562844] task: ffff88003ee48000 ti: ffff880257a4e000 task.ti:
> ffff880257a4e000
> [ 1100.584351] RIP: 0010: [<ffffffff815d4cb0>] [<ffffffff815d4cb0>]
> ip_queue_xmit+0x20/0x3e0
> [ 1100.584351] RSP: 0018: ffff880257a4fdd8 EFLAGS: 00010293
> [ 1100.609358] RAX: ffff88003fe48e00 RBX:ffff88003f5ad100 RCX:0000000000000014
> [ 1100.631467] RDX: ffffffff81ccab40 RSI: ffff88003fe49110
> RDI:ffff88003f5ad100
> [ 1100.657664] RBP: ffff880257a4fe00 R08: ffff88807ff06476
> R09:0000000000000022
> [ 1100.657694] R10: ffffe90009726600 R11: ffff88003fe49110
> R12:0000000000000000
> [ 1100.657694] R13: ffff88025a087480 R14: ffff880025b561400
> R15:ffff88003fe49110
> [ 1100.688994] FS: 00007f8857f8f840(0000)
> GS:ffff88026f200000(0000) knlGS:000000000000
> [ 1100.697694] CS: 0010 DS: 0000 ES: 0000 CR0:0000000080050033
> [ 1100.697694] CR2:0000000000002c0 CR3: 0000000260190e00 CR4: 00000000001407f0
> [ 1100.706083] Stack:
> [ 1100.731783] ffff88025cadbe00 ffff88003fe48e00 0000000000000022
> ffff88025b561400
> [ 1100.759324] ffff88003f5ad100 ffff880257a4fe68 ffffffffa081a6e5
> ffff88003f5ad100
> [ 1100.811396] ffff880257a4fe28 ffffffff8158686e ffff880257a4fe68
> ffffffff000000008
> [ 1100.891922] Call Trace:
> [ 1100.916257] [<ffffffffa081a6e5>] l2tp_xmit_skb+0x335/0x6c0 [l2tp_core]
> [ 1100.943670] [<ffffffff8158686e>] ? skb_free_head+0x1e/0x80
> [ 1100.970905] [<ffffffffa0830da1>] pppol2tp_xmit+0x141/0x210 [l2tp_ppp]
> [ 1100.995542] [<ffffffffa0805230>] ppp_channel_push+0x50/0xd0 [ppp_generic]
> [ 1101.024087] [<ffffffffa0805523>] ppp_write+0xa3/0xec [ppp_generic]
> [ 1101.133447] [<ffffffff811c7bfa>] vfs_write+0xba/0x1e0
> [ 1101.228987] [<ffffffff811c8746>] Sys_wirte+0x46/0xb0
> [ 1101.428494] [<ffffffff8110aca6>] ? __audit_syscall_exit+0x1f6/0x2a0
> [ 1101.898345] [<ffffffff816a90e9>] system_call_fastpath+0x16/0x1b
> [ 1102.024087] Code: e9 31 fe ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00
> 55 48 89 e5 41 57 49 89 f7 41 55 41 54 53 48 89 fb 4c 8b 67 18 4c 8b
> 67 18 4c 8b 6f 58 <4d> 8b b4 24 c0 02 00 00 49 83 e5 fe 0f 84 3e 02 00
> 00 4d 85 f6
> [ 1102.879592] RIP: [<ffffffff815d4cb0>] ip_queue_xmit+0x20/0x3e0
> [ 1103.179592] RsP<FFFF880257A4FDD8>
> [ 1103.834682] CR2: 00000000000002C0
> [ 1104.043682] kernel panic - not syncing: Fatal exception in interrupt
> [ 1104.789539] Offset: 0x0 from 0xffffffff81000000 (relocation range:
> 0xffffffff80000000-0xffffffff9fffffff)
>
> >Is this a regression? If so, do you know what the last kernel version
> >was that worked?
>
> I think this is a regression. 3.14 is the good working version. Today,
> after I updated to 3.15-rc1, evertime I tried connect my VPN, it
> always panics.
> --

Hmm, it seems commit 31c70d5956fc l2tp: keep original skb ownership
is the problem.

ip_queue_xmit() assumes the socket attached to skb is an inet socket.


2014-04-14 17:34:20

by David Miller

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

From: Eric Dumazet <[email protected]>
Date: Mon, 14 Apr 2014 10:12:29 -0700

> Hmm, it seems commit 31c70d5956fc l2tp: keep original skb ownership
> is the problem.
>
> ip_queue_xmit() assumes the socket attached to skb is an inet socket.

This is similar to the "send over AF_PACKET" issue we were discussing
the other week.

It seems we need a real resolution to this issue.

To recap:

1) We want to charge memory to the top-level socket.

2) However during encapsulations etc. we can end up in IP stack
which expects only IP sockets to be attached to skb->sk

I suspect that in the short term we may have to bit the bullet and
compromise #2, and do flow control via the tunnel's socket.

2014-04-14 17:40:16

by Eric Dumazet

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

On Mon, 2014-04-14 at 13:34 -0400, David Miller wrote:
> From: Eric Dumazet <[email protected]>
> Date: Mon, 14 Apr 2014 10:12:29 -0700
>
> > Hmm, it seems commit 31c70d5956fc l2tp: keep original skb ownership
> > is the problem.
> >
> > ip_queue_xmit() assumes the socket attached to skb is an inet socket.
>
> This is similar to the "send over AF_PACKET" issue we were discussing
> the other week.
>
> It seems we need a real resolution to this issue.
>
> To recap:
>
> 1) We want to charge memory to the top-level socket.
>
> 2) However during encapsulations etc. we can end up in IP stack
> which expects only IP sockets to be attached to skb->sk
>
> I suspect that in the short term we may have to bit the bullet and
> compromise #2, and do flow control via the tunnel's socket.

Or add a sk parameter to ip_queue_xmit(), to break the assumption "sk =
skb->sk", which happened to be generally true, but is ambiguous for IP
tunnels.


2014-04-14 17:49:10

by David Miller

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

From: Eric Dumazet <[email protected]>
Date: Mon, 14 Apr 2014 10:40:04 -0700

> On Mon, 2014-04-14 at 13:34 -0400, David Miller wrote:
>> From: Eric Dumazet <[email protected]>
>> Date: Mon, 14 Apr 2014 10:12:29 -0700
>>
>> > Hmm, it seems commit 31c70d5956fc l2tp: keep original skb ownership
>> > is the problem.
>> >
>> > ip_queue_xmit() assumes the socket attached to skb is an inet socket.
>>
>> This is similar to the "send over AF_PACKET" issue we were discussing
>> the other week.
>>
>> It seems we need a real resolution to this issue.
>>
>> To recap:
>>
>> 1) We want to charge memory to the top-level socket.
>>
>> 2) However during encapsulations etc. we can end up in IP stack
>> which expects only IP sockets to be attached to skb->sk
>>
>> I suspect that in the short term we may have to bit the bullet and
>> compromise #2, and do flow control via the tunnel's socket.
>
> Or add a sk parameter to ip_queue_xmit(), to break the assumption "sk =
> skb->sk", which happened to be generally true, but is ambiguous for IP
> tunnels.

Yep that would handle the l2tp cases, but vxlan needs something different
since it goes through iptunnel_xmit().

So perhaps as a quick fix we can add an 'sk' arg to both
ip_queue_xmit() and ip_local_out().

2014-04-14 17:54:45

by Eric Dumazet

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

On Mon, 2014-04-14 at 13:49 -0400, David Miller wrote:

> Yep that would handle the l2tp cases, but vxlan needs something different
> since it goes through iptunnel_xmit().
>
> So perhaps as a quick fix we can add an 'sk' arg to both
> ip_queue_xmit() and ip_local_out().

Right, I am working on this right now.

2014-04-14 18:02:57

by David Miller

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

From: Eric Dumazet <[email protected]>
Date: Mon, 14 Apr 2014 10:54:42 -0700

> On Mon, 2014-04-14 at 13:49 -0400, David Miller wrote:
>
>> Yep that would handle the l2tp cases, but vxlan needs something different
>> since it goes through iptunnel_xmit().
>>
>> So perhaps as a quick fix we can add an 'sk' arg to both
>> ip_queue_xmit() and ip_local_out().
>
> Right, I am working on this right now.

Thanks a lot Eric.

BTW, it occurs to me that there may be some other spots in the output path
that expect that if SKB is ETH_P_IP then skb->sk is IP socket. For example
somewhere in netfilter or packet classifier paths.

Just FYI... that was one of the things I was going to audit.

2014-04-14 18:19:30

by Eric Dumazet

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

On Mon, 2014-04-14 at 14:02 -0400, David Miller wrote:

> BTW, it occurs to me that there may be some other spots in the output path
> that expect that if SKB is ETH_P_IP then skb->sk is IP socket. For example
> somewhere in netfilter or packet classifier paths.
>
> Just FYI... that was one of the things I was going to audit.

packet classifiers cannot have such assumptions for sure.

About iptunnel_xmit(), I do not think there is an issue, as we do not
enter ip_queue_xmit() on this path, but ip_local_out().

ip_local_out() doesn't use skb->sk , unless some netfilter module
uses this. I am not sure how the previous behavior could be useful in
this case, as all packets were sharing same socket ownership.

net/netfilter/xt_owner.c for example has better coverage if it can
really have a pointer to the user socket, not the internal socket used
by l2tp or vxlan tunnel.



2014-04-14 18:48:48

by David Miller

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

From: Eric Dumazet <[email protected]>
Date: Mon, 14 Apr 2014 11:19:26 -0700

> ip_local_out() doesn't use skb->sk

It does Eric.

We had just such a report with this in the backtrace, when AF_PACKET
sends over vxlan devices.

The problem is ip_mc_output().

2014-04-14 19:06:42

by Eric Dumazet

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

On Mon, 2014-04-14 at 14:48 -0400, David Miller wrote:
> From: Eric Dumazet <[email protected]>
> Date: Mon, 14 Apr 2014 11:19:26 -0700
>
> > ip_local_out() doesn't use skb->sk
>
> It does Eric.
>

Hmmm, right...

> We had just such a report with this in the backtrace, when AF_PACKET
> sends over vxlan devices.
>
> The problem is ip_mc_output().

So this means that : User socket wanted sk_mc_loop(sk), but because
vxlan changed skb->sk to internal socket, we were doing something else
anyway.

There are a lot of undocumented features here... like

skb->priority = sk->sk_priority;
skb->mark = sk->sk_mark;

which socket do we want here (in ip_queue_xmit()) ?

Its interesting to see ip6_xmit() already has a 'struct sock *sk'
parameter...

This was the preliminary patch I tested :

include/net/inet6_connection_sock.h | 2 +-
include/net/inet_connection_sock.h | 2 +-
include/net/ip.h | 2 +-
net/dccp/output.c | 2 +-
net/ipv4/ip_output.c | 5 +++--
net/ipv4/tcp_output.c | 2 +-
net/ipv6/inet6_connection_sock.c | 3 +--
net/l2tp/l2tp_core.c | 4 ++--
net/l2tp/l2tp_ip.c | 2 +-
net/sctp/protocol.c | 2 +-
10 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/include/net/inet6_connection_sock.h b/include/net/inet6_connection_sock.h
index f981ba7adeed..74af137304be 100644
--- a/include/net/inet6_connection_sock.h
+++ b/include/net/inet6_connection_sock.h
@@ -40,7 +40,7 @@ void inet6_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,

void inet6_csk_addr2sockaddr(struct sock *sk, struct sockaddr *uaddr);

-int inet6_csk_xmit(struct sk_buff *skb, struct flowi *fl);
+int inet6_csk_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl);

struct dst_entry *inet6_csk_update_pmtu(struct sock *sk, u32 mtu);
#endif /* _INET6_CONNECTION_SOCK_H */
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index c55aeed41ace..7a4313887568 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -36,7 +36,7 @@ struct tcp_congestion_ops;
* (i.e. things that depend on the address family)
*/
struct inet_connection_sock_af_ops {
- int (*queue_xmit)(struct sk_buff *skb, struct flowi *fl);
+ int (*queue_xmit)(struct sock *sk, struct sk_buff *skb, struct flowi *fl);
void (*send_check)(struct sock *sk, struct sk_buff *skb);
int (*rebuild_header)(struct sock *sk);
void (*sk_rx_dst_set)(struct sock *sk, const struct sk_buff *skb);
diff --git a/include/net/ip.h b/include/net/ip.h
index 25064c28e059..77e73d293e09 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -111,7 +111,7 @@ int ip_do_nat(struct sk_buff *skb);
void ip_send_check(struct iphdr *ip);
int __ip_local_out(struct sk_buff *skb);
int ip_local_out(struct sk_buff *skb);
-int ip_queue_xmit(struct sk_buff *skb, struct flowi *fl);
+int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl);
void ip_init(void);
int ip_append_data(struct sock *sk, struct flowi4 *fl4,
int getfrag(void *from, char *to, int offset, int len,
diff --git a/net/dccp/output.c b/net/dccp/output.c
index 8876078859da..0248e8a3460c 100644
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -138,7 +138,7 @@ static int dccp_transmit_skb(struct sock *sk, struct sk_buff *skb)

DCCP_INC_STATS(DCCP_MIB_OUTSEGS);

- err = icsk->icsk_af_ops->queue_xmit(skb, &inet->cork.fl);
+ err = icsk->icsk_af_ops->queue_xmit(sk, skb, &inet->cork.fl);
return net_xmit_eval(err);
}
return -ENOBUFS;
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 1a0755fea491..7ad68b860935 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -315,9 +315,9 @@ static void ip_copy_addrs(struct iphdr *iph, const struct flowi4 *fl4)
sizeof(fl4->saddr) + sizeof(fl4->daddr));
}

-int ip_queue_xmit(struct sk_buff *skb, struct flowi *fl)
+/* Note: skb->sk can be different from sk, in case of tunnels */
+int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
{
- struct sock *sk = skb->sk;
struct inet_sock *inet = inet_sk(sk);
struct ip_options_rcu *inet_opt;
struct flowi4 *fl4;
@@ -389,6 +389,7 @@ packet_routed:
ip_select_ident_more(skb, &rt->dst, sk,
(skb_shinfo(skb)->gso_segs ?: 1) - 1);

+ /* TODO : should we use skb->sk here instead of sk ? */
skb->priority = sk->sk_priority;
skb->mark = sk->sk_mark;

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 699fb102e971..025e25093984 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -981,7 +981,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
TCP_ADD_STATS(sock_net(sk), TCP_MIB_OUTSEGS,
tcp_skb_pcount(skb));

- err = icsk->icsk_af_ops->queue_xmit(skb, &inet->cork.fl);
+ err = icsk->icsk_af_ops->queue_xmit(sk, skb, &inet->cork.fl);
if (likely(err <= 0))
return err;

diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index c9138189415a..d4ade34ab375 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -224,9 +224,8 @@ static struct dst_entry *inet6_csk_route_socket(struct sock *sk,
return dst;
}

-int inet6_csk_xmit(struct sk_buff *skb, struct flowi *fl_unused)
+int inet6_csk_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl_unused)
{
- struct sock *sk = skb->sk;
struct ipv6_pinfo *np = inet6_sk(sk);
struct flowi6 fl6;
struct dst_entry *dst;
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 47f7a5490555..a4e37d7158dc 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1131,10 +1131,10 @@ static int l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb,
skb->local_df = 1;
#if IS_ENABLED(CONFIG_IPV6)
if (tunnel->sock->sk_family == PF_INET6 && !tunnel->v4mapped)
- error = inet6_csk_xmit(skb, NULL);
+ error = inet6_csk_xmit(tunnel->sock, skb, NULL);
else
#endif
- error = ip_queue_xmit(skb, fl);
+ error = ip_queue_xmit(tunnel->sock, skb, fl);

/* Update stats */
if (error >= 0) {
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 0b44d855269c..3397fe6897c0 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -487,7 +487,7 @@ static int l2tp_ip_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *m

xmit:
/* Queue the packet to IP for output */
- rc = ip_queue_xmit(skb, &inet->cork.fl);
+ rc = ip_queue_xmit(sk, skb, &inet->cork.fl);
rcu_read_unlock();

error:
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 4e1d0fcb028e..c09757fbf803 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -957,7 +957,7 @@ static inline int sctp_v4_xmit(struct sk_buff *skb,

SCTP_INC_STATS(sock_net(&inet->sk), SCTP_MIB_OUTSCTPPACKS);

- return ip_queue_xmit(skb, &transport->fl);
+ return ip_queue_xmit(&inet->sk, skb, &transport->fl);
}

static struct sctp_af sctp_af_inet;

2014-04-14 19:22:29

by David Miller

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

From: Eric Dumazet <[email protected]>
Date: Mon, 14 Apr 2014 12:06:36 -0700

> On Mon, 2014-04-14 at 14:48 -0400, David Miller wrote:
>> From: Eric Dumazet <[email protected]>
>> Date: Mon, 14 Apr 2014 11:19:26 -0700
>>
>> > ip_local_out() doesn't use skb->sk
>>
>> It does Eric.
>>
>
> Hmmm, right...
>
>> We had just such a report with this in the backtrace, when AF_PACKET
>> sends over vxlan devices.
>>
>> The problem is ip_mc_output().
>
> So this means that : User socket wanted sk_mc_loop(sk), but because
> vxlan changed skb->sk to internal socket, we were doing something else
> anyway.

Actually the exact opposite is happening. vxlan does not override skb->sk,
and leaves it as AF_PACKET's socket. Then we crash in ip_mc_output() because
it only expects IP sockets, not AF_PACKET ones, attached to skb->sk.

> There are a lot of undocumented features here... like
>
> skb->priority = sk->sk_priority;
> skb->mark = sk->sk_mark;
>
> which socket do we want here (in ip_queue_xmit()) ?

Probably that of tunnel.

> Its interesting to see ip6_xmit() already has a 'struct sock *sk'
> parameter...
>
> This was the preliminary patch I tested :

You still need to make similar transformation to ip_local_out() as mentioned
above.

2014-04-14 21:23:31

by Eric Dumazet

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

On Mon, 2014-04-14 at 15:22 -0400, David Miller wrote:


>
> You still need to make similar transformation to ip_local_out() as mentioned
> above.

I was trying to cook a not too invasive patch.

Changing ip_local_out() means changing dst_output() / nf_hook() and
hundred of call sites.

Unless I misunderstood you ?

2014-04-14 22:51:59

by David Miller

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

From: Eric Dumazet <[email protected]>
Date: Mon, 14 Apr 2014 14:23:24 -0700

> I was trying to cook a not too invasive patch.
>
> Changing ip_local_out() means changing dst_output() / nf_hook() and
> hundred of call sites.
>
> Unless I misunderstood you ?

You could make ip_local_out(skb) be __ip_local_out(skb, skb->sk) and
make tunnel output use __ip_local_out().

2014-04-15 00:14:05

by Eric Dumazet

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

On Mon, 2014-04-14 at 18:51 -0400, David Miller wrote:
> From: Eric Dumazet <[email protected]>
> Date: Mon, 14 Apr 2014 14:23:24 -0700
>
> > I was trying to cook a not too invasive patch.
> >
> > Changing ip_local_out() means changing dst_output() / nf_hook() and
> > hundred of call sites.
> >
> > Unless I misunderstood you ?
>
> You could make ip_local_out(skb) be __ip_local_out(skb, skb->sk) and
> make tunnel output use __ip_local_out().

OK lets try ;)

Zhan, could you try following patch, thanks !

drivers/net/vxlan.c | 4 ++--
include/net/dst.h | 14 +++++++++++---
include/net/inet6_connection_sock.h | 2 +-
include/net/inet_connection_sock.h | 2 +-
include/net/ip.h | 13 +++++++++----
include/net/ip_tunnels.h | 2 +-
net/core/dst.c | 4 ++--
net/dccp/output.c | 2 +-
net/ipv4/ip_output.c | 16 ++++++++--------
net/ipv4/ip_tunnel.c | 2 +-
net/ipv4/ip_tunnel_core.c | 4 ++--
net/ipv4/route.c | 4 ++--
net/ipv4/tcp_output.c | 2 +-
net/ipv6/inet6_connection_sock.c | 3 +--
net/ipv6/sit.c | 5 +++--
net/l2tp/l2tp_core.c | 4 ++--
net/l2tp/l2tp_ip.c | 2 +-
net/openvswitch/vport-gre.c | 2 +-
net/sctp/protocol.c | 2 +-
19 files changed, 51 insertions(+), 38 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index c55e316373a1..82355d5d155a 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1755,8 +1755,8 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
if (err)
return err;

- return iptunnel_xmit(rt, skb, src, dst, IPPROTO_UDP, tos, ttl, df,
- false);
+ return iptunnel_xmit(vs->sock->sk, rt, skb, src, dst, IPPROTO_UDP,
+ tos, ttl, df, false);
}
EXPORT_SYMBOL_GPL(vxlan_xmit_skb);

diff --git a/include/net/dst.h b/include/net/dst.h
index 46ed958e0c6e..71c60f42be48 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -45,7 +45,7 @@ struct dst_entry {
void *__pad1;
#endif
int (*input)(struct sk_buff *);
- int (*output)(struct sk_buff *);
+ int (*output)(struct sock *sk, struct sk_buff *skb);

unsigned short flags;
#define DST_HOST 0x0001
@@ -367,7 +367,11 @@ static inline struct dst_entry *skb_dst_pop(struct sk_buff *skb)
return child;
}

-int dst_discard(struct sk_buff *skb);
+int dst_discard_sk(struct sock *sk, struct sk_buff *skb);
+static inline int dst_discard(struct sk_buff *skb)
+{
+ return dst_discard_sk(skb->sk, skb);
+}
void *dst_alloc(struct dst_ops *ops, struct net_device *dev, int initial_ref,
int initial_obsolete, unsigned short flags);
void __dst_free(struct dst_entry *dst);
@@ -449,9 +453,13 @@ static inline void dst_set_expires(struct dst_entry *dst, int timeout)
}

/* Output packet to network from transport. */
+static inline int dst_output_sk(struct sock *sk, struct sk_buff *skb)
+{
+ return skb_dst(skb)->output(sk, skb);
+}
static inline int dst_output(struct sk_buff *skb)
{
- return skb_dst(skb)->output(skb);
+ return dst_output_sk(skb->sk, skb);
}

/* Input packet from network to transport. */
diff --git a/include/net/inet6_connection_sock.h b/include/net/inet6_connection_sock.h
index f981ba7adeed..74af137304be 100644
--- a/include/net/inet6_connection_sock.h
+++ b/include/net/inet6_connection_sock.h
@@ -40,7 +40,7 @@ void inet6_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,

void inet6_csk_addr2sockaddr(struct sock *sk, struct sockaddr *uaddr);

-int inet6_csk_xmit(struct sk_buff *skb, struct flowi *fl);
+int inet6_csk_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl);

struct dst_entry *inet6_csk_update_pmtu(struct sock *sk, u32 mtu);
#endif /* _INET6_CONNECTION_SOCK_H */
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index c55aeed41ace..7a4313887568 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -36,7 +36,7 @@ struct tcp_congestion_ops;
* (i.e. things that depend on the address family)
*/
struct inet_connection_sock_af_ops {
- int (*queue_xmit)(struct sk_buff *skb, struct flowi *fl);
+ int (*queue_xmit)(struct sock *sk, struct sk_buff *skb, struct flowi *fl);
void (*send_check)(struct sock *sk, struct sk_buff *skb);
int (*rebuild_header)(struct sock *sk);
void (*sk_rx_dst_set)(struct sock *sk, const struct sk_buff *skb);
diff --git a/include/net/ip.h b/include/net/ip.h
index 25064c28e059..3ec2b0fb9d83 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -104,14 +104,19 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
struct net_device *orig_dev);
int ip_local_deliver(struct sk_buff *skb);
int ip_mr_input(struct sk_buff *skb);
-int ip_output(struct sk_buff *skb);
-int ip_mc_output(struct sk_buff *skb);
+int ip_output(struct sock *sk, struct sk_buff *skb);
+int ip_mc_output(struct sock *sk, struct sk_buff *skb);
int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *));
int ip_do_nat(struct sk_buff *skb);
void ip_send_check(struct iphdr *ip);
int __ip_local_out(struct sk_buff *skb);
-int ip_local_out(struct sk_buff *skb);
-int ip_queue_xmit(struct sk_buff *skb, struct flowi *fl);
+int ip_local_out_sk(struct sock *sk, struct sk_buff *skb);
+static inline int ip_local_out(struct sk_buff *skb)
+{
+ return ip_local_out_sk(skb->sk, skb);
+}
+
+int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl);
void ip_init(void);
int ip_append_data(struct sock *sk, struct flowi4 *fl4,
int getfrag(void *from, char *to, int offset, int len,
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index e77c10405d51..a4daf9eb8562 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -153,7 +153,7 @@ static inline u8 ip_tunnel_ecn_encap(u8 tos, const struct iphdr *iph,
}

int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto);
-int iptunnel_xmit(struct rtable *rt, struct sk_buff *skb,
+int iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
__be32 src, __be32 dst, __u8 proto,
__u8 tos, __u8 ttl, __be16 df, bool xnet);

diff --git a/net/core/dst.c b/net/core/dst.c
index ca4231ec7347..7443c2725c9c 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -142,12 +142,12 @@ loop:
mutex_unlock(&dst_gc_mutex);
}

-int dst_discard(struct sk_buff *skb)
+int dst_discard_sk(struct sock *sk, struct sk_buff *skb)
{
kfree_skb(skb);
return 0;
}
-EXPORT_SYMBOL(dst_discard);
+EXPORT_SYMBOL(dst_discard_sk);

const u32 dst_default_metrics[RTAX_MAX + 1] = {
/* This initializer is needed to force linker to place this variable
diff --git a/net/dccp/output.c b/net/dccp/output.c
index 8876078859da..0248e8a3460c 100644
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -138,7 +138,7 @@ static int dccp_transmit_skb(struct sock *sk, struct sk_buff *skb)

DCCP_INC_STATS(DCCP_MIB_OUTSEGS);

- err = icsk->icsk_af_ops->queue_xmit(skb, &inet->cork.fl);
+ err = icsk->icsk_af_ops->queue_xmit(sk, skb, &inet->cork.fl);
return net_xmit_eval(err);
}
return -ENOBUFS;
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 1a0755fea491..1cbeba5edff9 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -101,17 +101,17 @@ int __ip_local_out(struct sk_buff *skb)
skb_dst(skb)->dev, dst_output);
}

-int ip_local_out(struct sk_buff *skb)
+int ip_local_out_sk(struct sock *sk, struct sk_buff *skb)
{
int err;

err = __ip_local_out(skb);
if (likely(err == 1))
- err = dst_output(skb);
+ err = dst_output_sk(sk, skb);

return err;
}
-EXPORT_SYMBOL_GPL(ip_local_out);
+EXPORT_SYMBOL_GPL(ip_local_out_sk);

static inline int ip_select_ttl(struct inet_sock *inet, struct dst_entry *dst)
{
@@ -226,9 +226,8 @@ static int ip_finish_output(struct sk_buff *skb)
return ip_finish_output2(skb);
}

-int ip_mc_output(struct sk_buff *skb)
+int ip_mc_output(struct sock *sk, struct sk_buff *skb)
{
- struct sock *sk = skb->sk;
struct rtable *rt = skb_rtable(skb);
struct net_device *dev = rt->dst.dev;

@@ -287,7 +286,7 @@ int ip_mc_output(struct sk_buff *skb)
!(IPCB(skb)->flags & IPSKB_REROUTED));
}

-int ip_output(struct sk_buff *skb)
+int ip_output(struct sock *sk, struct sk_buff *skb)
{
struct net_device *dev = skb_dst(skb)->dev;

@@ -315,9 +314,9 @@ static void ip_copy_addrs(struct iphdr *iph, const struct flowi4 *fl4)
sizeof(fl4->saddr) + sizeof(fl4->daddr));
}

-int ip_queue_xmit(struct sk_buff *skb, struct flowi *fl)
+/* Note: skb->sk can be different from sk, in case of tunnels */
+int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
{
- struct sock *sk = skb->sk;
struct inet_sock *inet = inet_sk(sk);
struct ip_options_rcu *inet_opt;
struct flowi4 *fl4;
@@ -389,6 +388,7 @@ packet_routed:
ip_select_ident_more(skb, &rt->dst, sk,
(skb_shinfo(skb)->gso_segs ?: 1) - 1);

+ /* TODO : should we use skb->sk here instead of sk ? */
skb->priority = sk->sk_priority;
skb->mark = sk->sk_mark;

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index e77381d1df9a..484d0ce27ef7 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -670,7 +670,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
return;
}

- err = iptunnel_xmit(rt, skb, fl4.saddr, fl4.daddr, protocol,
+ err = iptunnel_xmit(skb->sk, rt, skb, fl4.saddr, fl4.daddr, protocol,
tos, ttl, df, !net_eq(tunnel->net, dev_net(dev)));
iptunnel_xmit_stats(err, &dev->stats, dev->tstats);

diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index e0c2b1d2ea4e..bcf206c79005 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -46,7 +46,7 @@
#include <net/netns/generic.h>
#include <net/rtnetlink.h>

-int iptunnel_xmit(struct rtable *rt, struct sk_buff *skb,
+int iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
__be32 src, __be32 dst, __u8 proto,
__u8 tos, __u8 ttl, __be16 df, bool xnet)
{
@@ -76,7 +76,7 @@ int iptunnel_xmit(struct rtable *rt, struct sk_buff *skb,
iph->ttl = ttl;
__ip_select_ident(iph, &rt->dst, (skb_shinfo(skb)->gso_segs ?: 1) - 1);

- err = ip_local_out(skb);
+ err = ip_local_out_sk(sk, skb);
if (unlikely(net_xmit_eval(err)))
pkt_len = 0;
return pkt_len;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 34d094cadb11..f2279d4470c4 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1129,7 +1129,7 @@ static void ipv4_link_failure(struct sk_buff *skb)
dst_set_expires(&rt->dst, 0);
}

-static int ip_rt_bug(struct sk_buff *skb)
+static int ip_rt_bug(struct sock *sk, struct sk_buff *skb)
{
pr_debug("%s: %pI4 -> %pI4, %s\n",
__func__, &ip_hdr(skb)->saddr, &ip_hdr(skb)->daddr,
@@ -2218,7 +2218,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_or

new->__use = 1;
new->input = dst_discard;
- new->output = dst_discard;
+ new->output = dst_discard_sk;

new->dev = ort->dst.dev;
if (new->dev)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 699fb102e971..025e25093984 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -981,7 +981,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
TCP_ADD_STATS(sock_net(sk), TCP_MIB_OUTSEGS,
tcp_skb_pcount(skb));

- err = icsk->icsk_af_ops->queue_xmit(skb, &inet->cork.fl);
+ err = icsk->icsk_af_ops->queue_xmit(sk, skb, &inet->cork.fl);
if (likely(err <= 0))
return err;

diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index c9138189415a..d4ade34ab375 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -224,9 +224,8 @@ static struct dst_entry *inet6_csk_route_socket(struct sock *sk,
return dst;
}

-int inet6_csk_xmit(struct sk_buff *skb, struct flowi *fl_unused)
+int inet6_csk_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl_unused)
{
- struct sock *sk = skb->sk;
struct ipv6_pinfo *np = inet6_sk(sk);
struct flowi6 fl6;
struct dst_entry *dst;
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 1693c8d885f0..8da8268d65f8 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -974,8 +974,9 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
goto out;
}

- err = iptunnel_xmit(rt, skb, fl4.saddr, fl4.daddr, IPPROTO_IPV6, tos,
- ttl, df, !net_eq(tunnel->net, dev_net(dev)));
+ err = iptunnel_xmit(skb->sk, rt, skb, fl4.saddr, fl4.daddr,
+ IPPROTO_IPV6, tos, ttl, df,
+ !net_eq(tunnel->net, dev_net(dev)));
iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
return NETDEV_TX_OK;

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 47f7a5490555..a4e37d7158dc 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1131,10 +1131,10 @@ static int l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb,
skb->local_df = 1;
#if IS_ENABLED(CONFIG_IPV6)
if (tunnel->sock->sk_family == PF_INET6 && !tunnel->v4mapped)
- error = inet6_csk_xmit(skb, NULL);
+ error = inet6_csk_xmit(tunnel->sock, skb, NULL);
else
#endif
- error = ip_queue_xmit(skb, fl);
+ error = ip_queue_xmit(tunnel->sock, skb, fl);

/* Update stats */
if (error >= 0) {
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 0b44d855269c..3397fe6897c0 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -487,7 +487,7 @@ static int l2tp_ip_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *m

xmit:
/* Queue the packet to IP for output */
- rc = ip_queue_xmit(skb, &inet->cork.fl);
+ rc = ip_queue_xmit(sk, skb, &inet->cork.fl);
rcu_read_unlock();

error:
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index a3d6951602db..ebb6e2442554 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -174,7 +174,7 @@ static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)

skb->local_df = 1;

- return iptunnel_xmit(rt, skb, fl.saddr,
+ return iptunnel_xmit(skb->sk, rt, skb, fl.saddr,
OVS_CB(skb)->tun_key->ipv4_dst, IPPROTO_GRE,
OVS_CB(skb)->tun_key->ipv4_tos,
OVS_CB(skb)->tun_key->ipv4_ttl, df, false);
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 4e1d0fcb028e..c09757fbf803 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -957,7 +957,7 @@ static inline int sctp_v4_xmit(struct sk_buff *skb,

SCTP_INC_STATS(sock_net(&inet->sk), SCTP_MIB_OUTSCTPPACKS);

- return ip_queue_xmit(skb, &transport->fl);
+ return ip_queue_xmit(&inet->sk, skb, &transport->fl);
}

static struct sctp_af sctp_af_inet;

2014-04-15 06:33:16

by Jianyu Zhan

[permalink] [raw]
Subject: Re: [BUG] A panic caused by null pointer dereference aftering updating to

On Tue, Apr 15, 2014 at 8:14 AM, Eric Dumazet <[email protected]> wrote:
>
> Zhan, could you try following patch, thanks !
>
> drivers/net/vxlan.c | 4 ++--
> include/net/dst.h | 14 +++++++++++---
> include/net/inet6_connection_sock.h | 2 +-
> include/net/inet_connection_sock.h | 2 +-
> include/net/ip.h | 13 +++++++++----
> include/net/ip_tunnels.h | 2 +-
> net/core/dst.c | 4 ++--
> net/dccp/output.c | 2 +-
> net/ipv4/ip_output.c | 16 ++++++++--------
> net/ipv4/ip_tunnel.c | 2 +-
> net/ipv4/ip_tunnel_core.c | 4 ++--
> net/ipv4/route.c | 4 ++--
> net/ipv4/tcp_output.c | 2 +-
> net/ipv6/inet6_connection_sock.c | 3 +--
> net/ipv6/sit.c | 5 +++--
> net/l2tp/l2tp_core.c | 4 ++--
> net/l2tp/l2tp_ip.c | 2 +-
> net/openvswitch/vport-gre.c | 2 +-
> net/sctp/protocol.c | 2 +-
> 19 files changed, 51 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index c55e316373a1..82355d5d155a 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1755,8 +1755,8 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
> if (err)
> return err;
>
> - return iptunnel_xmit(rt, skb, src, dst, IPPROTO_UDP, tos, ttl, df,
> - false);
> + return iptunnel_xmit(vs->sock->sk, rt, skb, src, dst, IPPROTO_UDP,
> + tos, ttl, df, false);
> }
> EXPORT_SYMBOL_GPL(vxlan_xmit_skb);
>
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 46ed958e0c6e..71c60f42be48 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -45,7 +45,7 @@ struct dst_entry {
> void *__pad1;
> #endif
> int (*input)(struct sk_buff *);
> - int (*output)(struct sk_buff *);
> + int (*output)(struct sock *sk, struct sk_buff *skb);
>
> unsigned short flags;
> #define DST_HOST 0x0001
> @@ -367,7 +367,11 @@ static inline struct dst_entry *skb_dst_pop(struct sk_buff *skb)
> return child;
> }
>
> -int dst_discard(struct sk_buff *skb);
> +int dst_discard_sk(struct sock *sk, struct sk_buff *skb);
> +static inline int dst_discard(struct sk_buff *skb)
> +{
> + return dst_discard_sk(skb->sk, skb);
> +}
> void *dst_alloc(struct dst_ops *ops, struct net_device *dev, int initial_ref,
> int initial_obsolete, unsigned short flags);
> void __dst_free(struct dst_entry *dst);
> @@ -449,9 +453,13 @@ static inline void dst_set_expires(struct dst_entry *dst, int timeout)
> }
>
> /* Output packet to network from transport. */
> +static inline int dst_output_sk(struct sock *sk, struct sk_buff *skb)
> +{
> + return skb_dst(skb)->output(sk, skb);
> +}
> static inline int dst_output(struct sk_buff *skb)
> {
> - return skb_dst(skb)->output(skb);
> + return dst_output_sk(skb->sk, skb);
> }
>
> /* Input packet from network to transport. */
> diff --git a/include/net/inet6_connection_sock.h b/include/net/inet6_connection_sock.h
> index f981ba7adeed..74af137304be 100644
> --- a/include/net/inet6_connection_sock.h
> +++ b/include/net/inet6_connection_sock.h
> @@ -40,7 +40,7 @@ void inet6_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
>
> void inet6_csk_addr2sockaddr(struct sock *sk, struct sockaddr *uaddr);
>
> -int inet6_csk_xmit(struct sk_buff *skb, struct flowi *fl);
> +int inet6_csk_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl);
>
> struct dst_entry *inet6_csk_update_pmtu(struct sock *sk, u32 mtu);
> #endif /* _INET6_CONNECTION_SOCK_H */
> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> index c55aeed41ace..7a4313887568 100644
> --- a/include/net/inet_connection_sock.h
> +++ b/include/net/inet_connection_sock.h
> @@ -36,7 +36,7 @@ struct tcp_congestion_ops;
> * (i.e. things that depend on the address family)
> */
> struct inet_connection_sock_af_ops {
> - int (*queue_xmit)(struct sk_buff *skb, struct flowi *fl);
> + int (*queue_xmit)(struct sock *sk, struct sk_buff *skb, struct flowi *fl);
> void (*send_check)(struct sock *sk, struct sk_buff *skb);
> int (*rebuild_header)(struct sock *sk);
> void (*sk_rx_dst_set)(struct sock *sk, const struct sk_buff *skb);
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 25064c28e059..3ec2b0fb9d83 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -104,14 +104,19 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
> struct net_device *orig_dev);
> int ip_local_deliver(struct sk_buff *skb);
> int ip_mr_input(struct sk_buff *skb);
> -int ip_output(struct sk_buff *skb);
> -int ip_mc_output(struct sk_buff *skb);
> +int ip_output(struct sock *sk, struct sk_buff *skb);
> +int ip_mc_output(struct sock *sk, struct sk_buff *skb);
> int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *));
> int ip_do_nat(struct sk_buff *skb);
> void ip_send_check(struct iphdr *ip);
> int __ip_local_out(struct sk_buff *skb);
> -int ip_local_out(struct sk_buff *skb);
> -int ip_queue_xmit(struct sk_buff *skb, struct flowi *fl);
> +int ip_local_out_sk(struct sock *sk, struct sk_buff *skb);
> +static inline int ip_local_out(struct sk_buff *skb)
> +{
> + return ip_local_out_sk(skb->sk, skb);
> +}
> +
> +int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl);
> void ip_init(void);
> int ip_append_data(struct sock *sk, struct flowi4 *fl4,
> int getfrag(void *from, char *to, int offset, int len,
> diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
> index e77c10405d51..a4daf9eb8562 100644
> --- a/include/net/ip_tunnels.h
> +++ b/include/net/ip_tunnels.h
> @@ -153,7 +153,7 @@ static inline u8 ip_tunnel_ecn_encap(u8 tos, const struct iphdr *iph,
> }
>
> int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto);
> -int iptunnel_xmit(struct rtable *rt, struct sk_buff *skb,
> +int iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
> __be32 src, __be32 dst, __u8 proto,
> __u8 tos, __u8 ttl, __be16 df, bool xnet);
>
> diff --git a/net/core/dst.c b/net/core/dst.c
> index ca4231ec7347..7443c2725c9c 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -142,12 +142,12 @@ loop:
> mutex_unlock(&dst_gc_mutex);
> }
>
> -int dst_discard(struct sk_buff *skb)
> +int dst_discard_sk(struct sock *sk, struct sk_buff *skb)
> {
> kfree_skb(skb);
> return 0;
> }
> -EXPORT_SYMBOL(dst_discard);
> +EXPORT_SYMBOL(dst_discard_sk);
>
> const u32 dst_default_metrics[RTAX_MAX + 1] = {
> /* This initializer is needed to force linker to place this variable
> diff --git a/net/dccp/output.c b/net/dccp/output.c
> index 8876078859da..0248e8a3460c 100644
> --- a/net/dccp/output.c
> +++ b/net/dccp/output.c
> @@ -138,7 +138,7 @@ static int dccp_transmit_skb(struct sock *sk, struct sk_buff *skb)
>
> DCCP_INC_STATS(DCCP_MIB_OUTSEGS);
>
> - err = icsk->icsk_af_ops->queue_xmit(skb, &inet->cork.fl);
> + err = icsk->icsk_af_ops->queue_xmit(sk, skb, &inet->cork.fl);
> return net_xmit_eval(err);
> }
> return -ENOBUFS;
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 1a0755fea491..1cbeba5edff9 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -101,17 +101,17 @@ int __ip_local_out(struct sk_buff *skb)
> skb_dst(skb)->dev, dst_output);
> }
>
> -int ip_local_out(struct sk_buff *skb)
> +int ip_local_out_sk(struct sock *sk, struct sk_buff *skb)
> {
> int err;
>
> err = __ip_local_out(skb);
> if (likely(err == 1))
> - err = dst_output(skb);
> + err = dst_output_sk(sk, skb);
>
> return err;
> }
> -EXPORT_SYMBOL_GPL(ip_local_out);
> +EXPORT_SYMBOL_GPL(ip_local_out_sk);
>
> static inline int ip_select_ttl(struct inet_sock *inet, struct dst_entry *dst)
> {
> @@ -226,9 +226,8 @@ static int ip_finish_output(struct sk_buff *skb)
> return ip_finish_output2(skb);
> }
>
> -int ip_mc_output(struct sk_buff *skb)
> +int ip_mc_output(struct sock *sk, struct sk_buff *skb)
> {
> - struct sock *sk = skb->sk;
> struct rtable *rt = skb_rtable(skb);
> struct net_device *dev = rt->dst.dev;
>
> @@ -287,7 +286,7 @@ int ip_mc_output(struct sk_buff *skb)
> !(IPCB(skb)->flags & IPSKB_REROUTED));
> }
>
> -int ip_output(struct sk_buff *skb)
> +int ip_output(struct sock *sk, struct sk_buff *skb)
> {
> struct net_device *dev = skb_dst(skb)->dev;
>
> @@ -315,9 +314,9 @@ static void ip_copy_addrs(struct iphdr *iph, const struct flowi4 *fl4)
> sizeof(fl4->saddr) + sizeof(fl4->daddr));
> }
>
> -int ip_queue_xmit(struct sk_buff *skb, struct flowi *fl)
> +/* Note: skb->sk can be different from sk, in case of tunnels */
> +int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
> {
> - struct sock *sk = skb->sk;
> struct inet_sock *inet = inet_sk(sk);
> struct ip_options_rcu *inet_opt;
> struct flowi4 *fl4;
> @@ -389,6 +388,7 @@ packet_routed:
> ip_select_ident_more(skb, &rt->dst, sk,
> (skb_shinfo(skb)->gso_segs ?: 1) - 1);
>
> + /* TODO : should we use skb->sk here instead of sk ? */
> skb->priority = sk->sk_priority;
> skb->mark = sk->sk_mark;
>
> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> index e77381d1df9a..484d0ce27ef7 100644
> --- a/net/ipv4/ip_tunnel.c
> +++ b/net/ipv4/ip_tunnel.c
> @@ -670,7 +670,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
> return;
> }
>
> - err = iptunnel_xmit(rt, skb, fl4.saddr, fl4.daddr, protocol,
> + err = iptunnel_xmit(skb->sk, rt, skb, fl4.saddr, fl4.daddr, protocol,
> tos, ttl, df, !net_eq(tunnel->net, dev_net(dev)));
> iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
>
> diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
> index e0c2b1d2ea4e..bcf206c79005 100644
> --- a/net/ipv4/ip_tunnel_core.c
> +++ b/net/ipv4/ip_tunnel_core.c
> @@ -46,7 +46,7 @@
> #include <net/netns/generic.h>
> #include <net/rtnetlink.h>
>
> -int iptunnel_xmit(struct rtable *rt, struct sk_buff *skb,
> +int iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
> __be32 src, __be32 dst, __u8 proto,
> __u8 tos, __u8 ttl, __be16 df, bool xnet)
> {
> @@ -76,7 +76,7 @@ int iptunnel_xmit(struct rtable *rt, struct sk_buff *skb,
> iph->ttl = ttl;
> __ip_select_ident(iph, &rt->dst, (skb_shinfo(skb)->gso_segs ?: 1) - 1);
>
> - err = ip_local_out(skb);
> + err = ip_local_out_sk(sk, skb);
> if (unlikely(net_xmit_eval(err)))
> pkt_len = 0;
> return pkt_len;
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 34d094cadb11..f2279d4470c4 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1129,7 +1129,7 @@ static void ipv4_link_failure(struct sk_buff *skb)
> dst_set_expires(&rt->dst, 0);
> }
>
> -static int ip_rt_bug(struct sk_buff *skb)
> +static int ip_rt_bug(struct sock *sk, struct sk_buff *skb)
> {
> pr_debug("%s: %pI4 -> %pI4, %s\n",
> __func__, &ip_hdr(skb)->saddr, &ip_hdr(skb)->daddr,
> @@ -2218,7 +2218,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_or
>
> new->__use = 1;
> new->input = dst_discard;
> - new->output = dst_discard;
> + new->output = dst_discard_sk;
>
> new->dev = ort->dst.dev;
> if (new->dev)
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 699fb102e971..025e25093984 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -981,7 +981,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
> TCP_ADD_STATS(sock_net(sk), TCP_MIB_OUTSEGS,
> tcp_skb_pcount(skb));
>
> - err = icsk->icsk_af_ops->queue_xmit(skb, &inet->cork.fl);
> + err = icsk->icsk_af_ops->queue_xmit(sk, skb, &inet->cork.fl);
> if (likely(err <= 0))
> return err;
>
> diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
> index c9138189415a..d4ade34ab375 100644
> --- a/net/ipv6/inet6_connection_sock.c
> +++ b/net/ipv6/inet6_connection_sock.c
> @@ -224,9 +224,8 @@ static struct dst_entry *inet6_csk_route_socket(struct sock *sk,
> return dst;
> }
>
> -int inet6_csk_xmit(struct sk_buff *skb, struct flowi *fl_unused)
> +int inet6_csk_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl_unused)
> {
> - struct sock *sk = skb->sk;
> struct ipv6_pinfo *np = inet6_sk(sk);
> struct flowi6 fl6;
> struct dst_entry *dst;
> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index 1693c8d885f0..8da8268d65f8 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -974,8 +974,9 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
> goto out;
> }
>
> - err = iptunnel_xmit(rt, skb, fl4.saddr, fl4.daddr, IPPROTO_IPV6, tos,
> - ttl, df, !net_eq(tunnel->net, dev_net(dev)));
> + err = iptunnel_xmit(skb->sk, rt, skb, fl4.saddr, fl4.daddr,
> + IPPROTO_IPV6, tos, ttl, df,
> + !net_eq(tunnel->net, dev_net(dev)));
> iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
> return NETDEV_TX_OK;
>
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index 47f7a5490555..a4e37d7158dc 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c
> @@ -1131,10 +1131,10 @@ static int l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb,
> skb->local_df = 1;
> #if IS_ENABLED(CONFIG_IPV6)
> if (tunnel->sock->sk_family == PF_INET6 && !tunnel->v4mapped)
> - error = inet6_csk_xmit(skb, NULL);
> + error = inet6_csk_xmit(tunnel->sock, skb, NULL);
> else
> #endif
> - error = ip_queue_xmit(skb, fl);
> + error = ip_queue_xmit(tunnel->sock, skb, fl);
>
> /* Update stats */
> if (error >= 0) {
> diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
> index 0b44d855269c..3397fe6897c0 100644
> --- a/net/l2tp/l2tp_ip.c
> +++ b/net/l2tp/l2tp_ip.c
> @@ -487,7 +487,7 @@ static int l2tp_ip_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *m
>
> xmit:
> /* Queue the packet to IP for output */
> - rc = ip_queue_xmit(skb, &inet->cork.fl);
> + rc = ip_queue_xmit(sk, skb, &inet->cork.fl);
> rcu_read_unlock();
>
> error:
> diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
> index a3d6951602db..ebb6e2442554 100644
> --- a/net/openvswitch/vport-gre.c
> +++ b/net/openvswitch/vport-gre.c
> @@ -174,7 +174,7 @@ static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
>
> skb->local_df = 1;
>
> - return iptunnel_xmit(rt, skb, fl.saddr,
> + return iptunnel_xmit(skb->sk, rt, skb, fl.saddr,
> OVS_CB(skb)->tun_key->ipv4_dst, IPPROTO_GRE,
> OVS_CB(skb)->tun_key->ipv4_tos,
> OVS_CB(skb)->tun_key->ipv4_ttl, df, false);

Hi, Eric,

I've applied the patch, it seems now works for me, at least no panic any more:-)
Thanks for all you guys.

Tested-by: Jianyu Zhan <[email protected]>


Regards,
Jianyu Zhan