2021-02-17 22:27:02

by Jason A. Donenfeld

[permalink] [raw]
Subject: possible stack corruption in icmp_send (__stack_chk_fail)

Hi Netdev & Willem,

I've received a report of stack corruption -- via the stack protector
check -- in icmp_send. I was sent a vmcore, and was able to extract
the OOPS from there. However, I've been unable to produce the bug and
I don't see where it'd be in the code. That might point to a more
sinister problem, or I'm simply just not seeing it. Apparently the
reporter reproduces it every 40 or so minutes, and has seen it happen
since at least ~5.10. Willem - I'm emailing you because it seems like
you were making a lot of changes to the icmp code around then, and
perhaps you have an intuition. For example, some of the error handling
code takes a pointer to a stack buffer (_objh and such), and maybe
that's problematic? I'm not quite sure. The vmcore, along with the
various kernel binaries I hunted down are here:
https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz
. The extracted dmesg follows below, in case you or anyone has a
pointer. I've been staring at this for a while and don't see it.

Jason

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
in: __icmp_send+0x5bd/0x5c0
CPU: 0 PID: 959 Comm: kworker/0:2 Kdump: loaded Not tainted
5.11.0-051100-lowlatency #202102142330
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
Workqueue: wg-crypt-wg0 wg_packet_decrypt_worker [wireguard]
Call Trace:
<IRQ>
show_stack+0x52/0x58
dump_stack+0x70/0x8b
panic+0x108/0x2ea
? ip_push_pending_frames+0x42/0x90
? __icmp_send+0x5bd/0x5c0
__stack_chk_fail+0x14/0x20
__icmp_send+0x5bd/0x5c0
icmp_ndo_send+0x148/0x160
wg_xmit+0x359/0x450 [wireguard]
? harmonize_features+0x19/0x80
xmit_one.constprop.0+0x9f/0x190
dev_hard_start_xmit+0x43/0x90
sch_direct_xmit+0x11d/0x340
__qdisc_run+0x66/0xc0
__dev_xmit_skb+0xd5/0x340
__dev_queue_xmit+0x32b/0x4d0
? nf_conntrack_double_lock.constprop.0+0x97/0x140 [nf_conntrack]
dev_queue_xmit+0x10/0x20
neigh_connected_output+0xcb/0xf0
ip_finish_output2+0x17f/0x470
__ip_finish_output+0x9b/0x140
? ipv4_confirm+0x4a/0x80 [nf_conntrack]
ip_finish_output+0x2d/0xb0
ip_output+0x78/0x110
? __ip_finish_output+0x140/0x140
ip_forward_finish+0x58/0x90
ip_forward+0x40a/0x4d0
? ip4_key_hashfn+0xb0/0xb0
ip_sublist_rcv_finish+0x3d/0x50
ip_list_rcv_finish.constprop.0+0x163/0x190
ip_sublist_rcv+0x37/0xb0
? ip_rcv_finish_core.constprop.0+0x310/0x310
ip_list_rcv+0xf5/0x120
__netif_receive_skb_list_core+0x228/0x250
__netif_receive_skb_list+0x102/0x170
? dev_gro_receive+0x1b5/0x370
netif_receive_skb_list_internal+0xca/0x190
napi_complete_done+0x7a/0x1a0
wg_packet_rx_poll+0x384/0x400 [wireguard]
napi_poll+0x92/0x200
net_rx_action+0xb8/0x1c0
__do_softirq+0xce/0x2b3
asm_call_irq_on_stack+0x12/0x20
</IRQ>
do_softirq_own_stack+0x3d/0x50
do_softirq+0x66/0x80
__local_bh_enable_ip+0x62/0x70
_raw_spin_unlock_bh+0x1e/0x20
wg_packet_decrypt_worker+0xf6/0x190 [wireguard]
process_one_work+0x217/0x3e0
worker_thread+0x4d/0x350
? rescuer_thread+0x390/0x390
kthread+0x145/0x170
? __kthread_bind_mask+0x70/0x70
ret_from_fork+0x22/0x30
Kernel Offset: 0x2000000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)


2021-02-17 22:39:07

by Willem de Bruijn

[permalink] [raw]
Subject: Re: possible stack corruption in icmp_send (__stack_chk_fail)

On Wed, Feb 17, 2021 at 1:12 PM Jason A. Donenfeld <[email protected]> wrote:
>
> Hi Netdev & Willem,
>
> I've received a report of stack corruption -- via the stack protector
> check -- in icmp_send. I was sent a vmcore, and was able to extract
> the OOPS from there. However, I've been unable to produce the bug and
> I don't see where it'd be in the code. That might point to a more
> sinister problem, or I'm simply just not seeing it. Apparently the
> reporter reproduces it every 40 or so minutes, and has seen it happen
> since at least ~5.10. Willem - I'm emailing you because it seems like
> you were making a lot of changes to the icmp code around then, and
> perhaps you have an intuition. For example, some of the error handling
> code takes a pointer to a stack buffer (_objh and such), and maybe
> that's problematic? I'm not quite sure. The vmcore, along with the
> various kernel binaries I hunted down are here:
> https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz
> . The extracted dmesg follows below, in case you or anyone has a
> pointer. I've been staring at this for a while and don't see it.
>
> Jason

Sorry, I also don't immediately see a cause.

The _objh is a fairly standard approach to accessing skb data with
skb_header_pointer. More importantly, that codepath is in the icmp
receive path and then guarded by a socket option
(inet_sk(sk)->recverr_rfc4884), so unlikely to be exercised here.

This is an icmp send in response to a forwarded packet (assuming
__qdisc_run dequeued the packet that triggered it). The icmp send code
is quite robust against, e.g., undersized packets. But could it be
that the forwarded packet is not sensible IPv4? The skb->protocol is
inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol.

As for on-stack variable overflow, __ip_options_echo parses the
(untrusted) input to write into stack allocated icmp_param. But that
is fairly well tested, rarely touched, code by now. Perhaps relevant,
though, the opt passed is in skb->cb[], which at should probably not
be interpreted as inet_skb_parm (IPCB).

static inline void icmp_send(struct sk_buff *skb_in, int type, int
code, __be32 info)
{
__icmp_send(skb_in, type, code, info, &IPCB(skb_in)->opt);
}


A vmlinux image might help. I couldn't find one for this kernel.

Or if the kernel can be modified and this path is rarely taken,
logging the packet, e.g., with skb_dump.

2021-02-17 23:00:02

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: possible stack corruption in icmp_send (__stack_chk_fail)

Hi Willem,

On Wed, Feb 17, 2021 at 11:27 PM Willem de Bruijn
<[email protected]> wrote:
> A vmlinux image might help. I couldn't find one for this kernel.

https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz
has .debs with vmlinuz in there, which you can extract to vmlinux, as
well as my own vmlinux elf construction with the symbols added back in
by extracting them from kallsyms. That's the best I've been able to
do, as all of this is coming from somebody random emailing me.

> But could it be
> that the forwarded packet is not sensible IPv4? The skb->protocol is
> inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol.

The wg calls to icmp_ndo_send are gated by checking skb->protocol:

if (skb->protocol == htons(ETH_P_IP))
icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
else if (skb->protocol == htons(ETH_P_IPV6))
icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH,
ICMPV6_ADDR_UNREACH, 0);

On the other hand, that code is hit on an error path when
wg_check_packet_protocol returns false:

static inline bool wg_check_packet_protocol(struct sk_buff *skb)
{
__be16 real_protocol = ip_tunnel_parse_protocol(skb);
return real_protocol && skb->protocol == real_protocol;
}

So that means, at least in theory, icmp_ndo_send could be called with
skb->protocol != ip_tunnel_parse_protocol(skb). I guess I can address
that. But... is it actually a problem?

Jason

2021-02-17 23:09:29

by Willem de Bruijn

[permalink] [raw]
Subject: Re: possible stack corruption in icmp_send (__stack_chk_fail)

On Wed, Feb 17, 2021 at 5:56 PM Jason A. Donenfeld <[email protected]> wrote:
>
> Hi Willem,
>
> On Wed, Feb 17, 2021 at 11:27 PM Willem de Bruijn
> <[email protected]> wrote:
> > A vmlinux image might help. I couldn't find one for this kernel.
>
> https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz
> has .debs with vmlinuz in there, which you can extract to vmlinux, as
> well as my own vmlinux elf construction with the symbols added back in
> by extracting them from kallsyms. That's the best I've been able to
> do, as all of this is coming from somebody random emailing me.
>
> > But could it be
> > that the forwarded packet is not sensible IPv4? The skb->protocol is
> > inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol.
>
> The wg calls to icmp_ndo_send are gated by checking skb->protocol:
>
> if (skb->protocol == htons(ETH_P_IP))
> icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
> else if (skb->protocol == htons(ETH_P_IPV6))
> icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH,
> ICMPV6_ADDR_UNREACH, 0);
>
> On the other hand, that code is hit on an error path when
> wg_check_packet_protocol returns false:
>
> static inline bool wg_check_packet_protocol(struct sk_buff *skb)
> {
> __be16 real_protocol = ip_tunnel_parse_protocol(skb);
> return real_protocol && skb->protocol == real_protocol;
> }
>
> So that means, at least in theory, icmp_ndo_send could be called with
> skb->protocol != ip_tunnel_parse_protocol(skb). I guess I can address
> that. But... is it actually a problem?

For this forwarded packet that arrived on a wireguard tunnel,
skb->protocol was originally also set by ip_tunnel_parse_protocol.
So likely not.

The other issue seems more like a real bug. wg_xmit calling
icmp_ndo_send without clearing IPCB first.

2021-02-18 00:04:15

by Willem de Bruijn

[permalink] [raw]
Subject: Re: possible stack corruption in icmp_send (__stack_chk_fail)

On Wed, Feb 17, 2021 at 6:18 PM Jason A. Donenfeld <[email protected]> wrote:
>
> On 2/18/21, Willem de Bruijn <[email protected]> wrote:
> > On Wed, Feb 17, 2021 at 5:56 PM Jason A. Donenfeld <[email protected]> wrote:
> >>
> >> Hi Willem,
> >>
> >> On Wed, Feb 17, 2021 at 11:27 PM Willem de Bruijn
> >> <[email protected]> wrote:
> >> > A vmlinux image might help. I couldn't find one for this kernel.
> >>
> >> https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz
> >> has .debs with vmlinuz in there, which you can extract to vmlinux, as
> >> well as my own vmlinux elf construction with the symbols added back in
> >> by extracting them from kallsyms. That's the best I've been able to
> >> do, as all of this is coming from somebody random emailing me.
> >>
> >> > But could it be
> >> > that the forwarded packet is not sensible IPv4? The skb->protocol is
> >> > inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol.
> >>
> >> The wg calls to icmp_ndo_send are gated by checking skb->protocol:
> >>
> >> if (skb->protocol == htons(ETH_P_IP))
> >> icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH,
> >> 0);
> >> else if (skb->protocol == htons(ETH_P_IPV6))
> >> icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH,
> >> ICMPV6_ADDR_UNREACH, 0);
> >>
> >> On the other hand, that code is hit on an error path when
> >> wg_check_packet_protocol returns false:
> >>
> >> static inline bool wg_check_packet_protocol(struct sk_buff *skb)
> >> {
> >> __be16 real_protocol = ip_tunnel_parse_protocol(skb);
> >> return real_protocol && skb->protocol == real_protocol;
> >> }
> >>
> >> So that means, at least in theory, icmp_ndo_send could be called with
> >> skb->protocol != ip_tunnel_parse_protocol(skb). I guess I can address
> >> that. But... is it actually a problem?
> >
> > For this forwarded packet that arrived on a wireguard tunnel,
> > skb->protocol was originally also set by ip_tunnel_parse_protocol.
> > So likely not.
> >
> > The other issue seems more like a real bug. wg_xmit calling
> > icmp_ndo_send without clearing IPCB first.
> >
>
> Bingo! Nice eye! I confirmed the crash by just memsetting 0x41 to cb
> before the call. Clearly this should be zeroed by icmp_ndo_xmit. Will
> send a patch for icmp_ndo_xmit momentarily and will CC you.

Great, let's hope that's it.

gtp_build_skb_ip4 zeroes before calling. The fix will be most
obviously correct if wg_xmit does the same.

But it is quite likely that the other callers, xfrmi_xmit2 and
sunvnet_start_xmit_common should zero, too. If so, then icmp_ndo_xmit
is the more robust location to do this. Then the Fixes tag will likely
go quite a bit farther back, too.

Whichever variant of the patch you prefer.

2021-02-18 00:55:30

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: possible stack corruption in icmp_send (__stack_chk_fail)

On 2/18/21, Willem de Bruijn <[email protected]> wrote:
> On Wed, Feb 17, 2021 at 5:56 PM Jason A. Donenfeld <[email protected]> wrote:
>>
>> Hi Willem,
>>
>> On Wed, Feb 17, 2021 at 11:27 PM Willem de Bruijn
>> <[email protected]> wrote:
>> > A vmlinux image might help. I couldn't find one for this kernel.
>>
>> https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz
>> has .debs with vmlinuz in there, which you can extract to vmlinux, as
>> well as my own vmlinux elf construction with the symbols added back in
>> by extracting them from kallsyms. That's the best I've been able to
>> do, as all of this is coming from somebody random emailing me.
>>
>> > But could it be
>> > that the forwarded packet is not sensible IPv4? The skb->protocol is
>> > inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol.
>>
>> The wg calls to icmp_ndo_send are gated by checking skb->protocol:
>>
>> if (skb->protocol == htons(ETH_P_IP))
>> icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH,
>> 0);
>> else if (skb->protocol == htons(ETH_P_IPV6))
>> icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH,
>> ICMPV6_ADDR_UNREACH, 0);
>>
>> On the other hand, that code is hit on an error path when
>> wg_check_packet_protocol returns false:
>>
>> static inline bool wg_check_packet_protocol(struct sk_buff *skb)
>> {
>> __be16 real_protocol = ip_tunnel_parse_protocol(skb);
>> return real_protocol && skb->protocol == real_protocol;
>> }
>>
>> So that means, at least in theory, icmp_ndo_send could be called with
>> skb->protocol != ip_tunnel_parse_protocol(skb). I guess I can address
>> that. But... is it actually a problem?
>
> For this forwarded packet that arrived on a wireguard tunnel,
> skb->protocol was originally also set by ip_tunnel_parse_protocol.
> So likely not.
>
> The other issue seems more like a real bug. wg_xmit calling
> icmp_ndo_send without clearing IPCB first.
>

Bingo! Nice eye! I confirmed the crash by just memsetting 0x41 to cb
before the call. Clearly this should be zeroed by icmp_ndo_xmit. Will
send a patch for icmp_ndo_xmit momentarily and will CC you.

Jason