2020-05-27 03:54:00

by John Fastabend

[permalink] [raw]
Subject: RE: [PATCH] bpf/sockmap: fix kernel panic at __tcp_bpf_recvmsg

dihu wrote:
> From 865a45747de6b68fd02a0ff128a69a5c8feb73c3 Mon Sep 17 00:00:00 2001
> From: dihu <[email protected]>
> Date: Mon, 25 May 2020 17:23:16 +0800
> Subject: [PATCH] bpf/sockmap: fix kernel panic at __tcp_bpf_recvmsg
>
> When user application calls read() with MSG_PEEK flag to read data
> of bpf sockmap socket, kernel panic happens at
> __tcp_bpf_recvmsg+0x12c/0x350. sk_msg is not removed from ingress_msg
> queue after read out under MSG_PEEK flag is set. Because it's not
> judged whether sk_msg is the last msg of ingress_msg queue, the next
> sk_msg may be the head of ingress_msg queue, whose memory address of
> sg page is invalid. So it's necessary to add check codes to prevent
> this problem.
>
> [20759.125457] BUG: kernel NULL pointer dereference, address:
> 0000000000000008
> [20759.132118] CPU: 53 PID: 51378 Comm: envoy Tainted: G E
> 5.4.32 #1
> [20759.140890] Hardware name: Inspur SA5212M4/YZMB-00370-109, BIOS
> 4.1.12 06/18/2017
> [20759.149734] RIP: 0010:copy_page_to_iter+0xad/0x300
> [20759.270877] __tcp_bpf_recvmsg+0x12c/0x350
> [20759.276099] tcp_bpf_recvmsg+0x113/0x370
> [20759.281137] inet_recvmsg+0x55/0xc0
> [20759.285734] __sys_recvfrom+0xc8/0x130
> [20759.290566] ? __audit_syscall_entry+0x103/0x130
> [20759.296227] ? syscall_trace_enter+0x1d2/0x2d0
> [20759.301700] ? __audit_syscall_exit+0x1e4/0x290
> [20759.307235] __x64_sys_recvfrom+0x24/0x30
> [20759.312226] do_syscall_64+0x55/0x1b0
> [20759.316852] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: dihu <[email protected]>
> ---
> net/ipv4/tcp_bpf.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 5a05327..c0d4624 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -64,6 +64,9 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
> } while (i != msg_rx->sg.end);
>
> if (unlikely(peek)) {
> + if (msg_rx == list_last_entry(&psock->ingress_msg,
> + struct sk_msg, list))
> + break;


Thanks. Change looks good but spacing is a bit off . Can we
turn those spaces into tabs? Otherwise adding fixes tag and
my ack would be great.

Fixes: 02c558b2d5d67 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress")
Acked-by: John Fastabend <[email protected]>


2020-05-29 09:09:37

by dihu

[permalink] [raw]
Subject: Re: [PATCH] bpf/sockmap: fix kernel panic at __tcp_bpf_recvmsg


On 2020/5/27 5:10, John Fastabend wrote:
> dihu wrote:
>> From 865a45747de6b68fd02a0ff128a69a5c8feb73c3 Mon Sep 17 00:00:00 2001
>> From: dihu <[email protected]>
>> Date: Mon, 25 May 2020 17:23:16 +0800
>> Subject: [PATCH] bpf/sockmap: fix kernel panic at __tcp_bpf_recvmsg
>>
>> When user application calls read() with MSG_PEEK flag to read data
>> of bpf sockmap socket, kernel panic happens at
>> __tcp_bpf_recvmsg+0x12c/0x350. sk_msg is not removed from ingress_msg
>> queue after read out under MSG_PEEK flag is set. Because it's not
>> judged whether sk_msg is the last msg of ingress_msg queue, the next
>> sk_msg may be the head of ingress_msg queue, whose memory address of
>> sg page is invalid. So it's necessary to add check codes to prevent
>> this problem.
>>
>> [20759.125457] BUG: kernel NULL pointer dereference, address:
>> 0000000000000008
>> [20759.132118] CPU: 53 PID: 51378 Comm: envoy Tainted: G E
>> 5.4.32 #1
>> [20759.140890] Hardware name: Inspur SA5212M4/YZMB-00370-109, BIOS
>> 4.1.12 06/18/2017
>> [20759.149734] RIP: 0010:copy_page_to_iter+0xad/0x300
>> [20759.270877] __tcp_bpf_recvmsg+0x12c/0x350
>> [20759.276099] tcp_bpf_recvmsg+0x113/0x370
>> [20759.281137] inet_recvmsg+0x55/0xc0
>> [20759.285734] __sys_recvfrom+0xc8/0x130
>> [20759.290566] ? __audit_syscall_entry+0x103/0x130
>> [20759.296227] ? syscall_trace_enter+0x1d2/0x2d0
>> [20759.301700] ? __audit_syscall_exit+0x1e4/0x290
>> [20759.307235] __x64_sys_recvfrom+0x24/0x30
>> [20759.312226] do_syscall_64+0x55/0x1b0
>> [20759.316852] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> Signed-off-by: dihu <[email protected]>
>> ---
>> net/ipv4/tcp_bpf.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
>> index 5a05327..c0d4624 100644
>> --- a/net/ipv4/tcp_bpf.c
>> +++ b/net/ipv4/tcp_bpf.c
>> @@ -64,6 +64,9 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
>> } while (i != msg_rx->sg.end);
>>
>> if (unlikely(peek)) {
>> + if (msg_rx == list_last_entry(&psock->ingress_msg,
>> + struct sk_msg, list))
>> + break;
>
> Thanks. Change looks good but spacing is a bit off . Can we
> turn those spaces into tabs? Otherwise adding fixes tag and
> my ack would be great.
>
> Fixes: 02c558b2d5d67 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress")
> Acked-by: John Fastabend <[email protected]>


From 127a334fa5e5d029353ceb1a0414886c527f4be5 Mon Sep 17 00:00:00 2001
From: dihu <[email protected]>
Date: Fri, 29 May 2020 16:38:50 +0800
Subject: [PATCH] bpf/sockmap: fix kernel panic at __tcp_bpf_recvmsg

When user application calls read() with MSG_PEEK flag to read data
of bpf sockmap socket, kernel panic happens at
__tcp_bpf_recvmsg+0x12c/0x350. sk_msg is not removed from ingress_msg
queue after read out under MSG_PEEK flag is set. Because it's not
judged whether sk_msg is the last msg of ingress_msg queue, the next
sk_msg may be the head of ingress_msg queue, whose memory address of
sg page is invalid. So it's necessary to add check codes to prevent
this problem.

[20759.125457] BUG: kernel NULL pointer dereference, address:
0000000000000008
[20759.132118] CPU: 53 PID: 51378 Comm: envoy Tainted: G            E
5.4.32 #1
[20759.140890] Hardware name: Inspur SA5212M4/YZMB-00370-109, BIOS
4.1.12 06/18/2017
[20759.149734] RIP: 0010:copy_page_to_iter+0xad/0x300
[20759.270877] __tcp_bpf_recvmsg+0x12c/0x350
[20759.276099] tcp_bpf_recvmsg+0x113/0x370
[20759.281137] inet_recvmsg+0x55/0xc0
[20759.285734] __sys_recvfrom+0xc8/0x130
[20759.290566] ? __audit_syscall_entry+0x103/0x130
[20759.296227] ? syscall_trace_enter+0x1d2/0x2d0
[20759.301700] ? __audit_syscall_exit+0x1e4/0x290
[20759.307235] __x64_sys_recvfrom+0x24/0x30
[20759.312226] do_syscall_64+0x55/0x1b0
[20759.316852] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: dihu <[email protected]>
---
 net/ipv4/tcp_bpf.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 5a05327..b82e4c3 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -64,6 +64,9 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock
*psock,
         } while (i != msg_rx->sg.end);

         if (unlikely(peek)) {
+            if (msg_rx == list_last_entry(&psock->ingress_msg,
+                              struct sk_msg, list))
+                break;
             msg_rx = list_next_entry(msg_rx, list);
             continue;
         }
--
1.8.3.1

2020-06-02 09:06:27

by Jakub Sitnicki

[permalink] [raw]
Subject: Re: [PATCH] bpf/sockmap: fix kernel panic at __tcp_bpf_recvmsg

On Fri, May 29, 2020 at 11:05 AM CEST, dihu wrote:
> On 2020/5/27 5:10, John Fastabend wrote:
>> dihu wrote:
>>> From 865a45747de6b68fd02a0ff128a69a5c8feb73c3 Mon Sep 17 00:00:00 2001
>>> From: dihu <[email protected]>
>>> Date: Mon, 25 May 2020 17:23:16 +0800
>>> Subject: [PATCH] bpf/sockmap: fix kernel panic at __tcp_bpf_recvmsg
>>>
>>> When user application calls read() with MSG_PEEK flag to read data
>>> of bpf sockmap socket, kernel panic happens at
>>> __tcp_bpf_recvmsg+0x12c/0x350. sk_msg is not removed from ingress_msg
>>> queue after read out under MSG_PEEK flag is set. Because it's not
>>> judged whether sk_msg is the last msg of ingress_msg queue, the next
>>> sk_msg may be the head of ingress_msg queue, whose memory address of
>>> sg page is invalid. So it's necessary to add check codes to prevent
>>> this problem.
>>>
>>> [20759.125457] BUG: kernel NULL pointer dereference, address:
>>> 0000000000000008
>>> [20759.132118] CPU: 53 PID: 51378 Comm: envoy Tainted: G E
>>> 5.4.32 #1
>>> [20759.140890] Hardware name: Inspur SA5212M4/YZMB-00370-109, BIOS
>>> 4.1.12 06/18/2017
>>> [20759.149734] RIP: 0010:copy_page_to_iter+0xad/0x300
>>> [20759.270877] __tcp_bpf_recvmsg+0x12c/0x350
>>> [20759.276099] tcp_bpf_recvmsg+0x113/0x370
>>> [20759.281137] inet_recvmsg+0x55/0xc0
>>> [20759.285734] __sys_recvfrom+0xc8/0x130
>>> [20759.290566] ? __audit_syscall_entry+0x103/0x130
>>> [20759.296227] ? syscall_trace_enter+0x1d2/0x2d0
>>> [20759.301700] ? __audit_syscall_exit+0x1e4/0x290
>>> [20759.307235] __x64_sys_recvfrom+0x24/0x30
>>> [20759.312226] do_syscall_64+0x55/0x1b0
>>> [20759.316852] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>
>>> Signed-off-by: dihu <[email protected]>
>>> ---
>>> net/ipv4/tcp_bpf.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
>>> index 5a05327..c0d4624 100644
>>> --- a/net/ipv4/tcp_bpf.c
>>> +++ b/net/ipv4/tcp_bpf.c
>>> @@ -64,6 +64,9 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
>>> } while (i != msg_rx->sg.end);
>>>
>>> if (unlikely(peek)) {
>>> + if (msg_rx == list_last_entry(&psock->ingress_msg,
>>> + struct sk_msg, list))
>>> + break;
>>
>> Thanks. Change looks good but spacing is a bit off . Can we
>> turn those spaces into tabs? Otherwise adding fixes tag and
>> my ack would be great.
>>
>> Fixes: 02c558b2d5d67 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress")
>> Acked-by: John Fastabend <[email protected]>
>
>
> From 127a334fa5e5d029353ceb1a0414886c527f4be5 Mon Sep 17 00:00:00 2001
> From: dihu <[email protected]>
> Date: Fri, 29 May 2020 16:38:50 +0800
> Subject: [PATCH] bpf/sockmap: fix kernel panic at __tcp_bpf_recvmsg
>
> When user application calls read() with MSG_PEEK flag to read data
> of bpf sockmap socket, kernel panic happens at
> __tcp_bpf_recvmsg+0x12c/0x350. sk_msg is not removed from ingress_msg
> queue after read out under MSG_PEEK flag is set. Because it's not
> judged whether sk_msg is the last msg of ingress_msg queue, the next
> sk_msg may be the head of ingress_msg queue, whose memory address of
> sg page is invalid. So it's necessary to add check codes to prevent
> this problem.
>
> [20759.125457] BUG: kernel NULL pointer dereference, address:
> 0000000000000008
> [20759.132118] CPU: 53 PID: 51378 Comm: envoy Tainted: G E
> 5.4.32 #1
> [20759.140890] Hardware name: Inspur SA5212M4/YZMB-00370-109, BIOS
> 4.1.12 06/18/2017
> [20759.149734] RIP: 0010:copy_page_to_iter+0xad/0x300
> [20759.270877] __tcp_bpf_recvmsg+0x12c/0x350
> [20759.276099] tcp_bpf_recvmsg+0x113/0x370
> [20759.281137] inet_recvmsg+0x55/0xc0
> [20759.285734] __sys_recvfrom+0xc8/0x130
> [20759.290566] ? __audit_syscall_entry+0x103/0x130
> [20759.296227] ? syscall_trace_enter+0x1d2/0x2d0
> [20759.301700] ? __audit_syscall_exit+0x1e4/0x290
> [20759.307235] __x64_sys_recvfrom+0x24/0x30
> [20759.312226] do_syscall_64+0x55/0x1b0
> [20759.316852] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: dihu <[email protected]>
> ---
> net/ipv4/tcp_bpf.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 5a05327..b82e4c3 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -64,6 +64,9 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
> } while (i != msg_rx->sg.end);
>
> if (unlikely(peek)) {
> + if (msg_rx == list_last_entry(&psock->ingress_msg,
> + struct sk_msg, list))
> + break;
> msg_rx = list_next_entry(msg_rx, list);
> continue;
> }

Looks like the patch is garbled. I suspect due to copy-paste to an
e-mail client. Context line got wrapped and there are non-breaking
spaces instead of tabs in the body.

Crash fix is important so could you resend it with `git send-email`?

git send-email --to [email protected] --cc [email protected] file.patch

You might find the documentation below helpful:

https://www.kernel.org/doc/html/latest/process/email-clients.html