2019-08-28 06:39:51

by syzbot

[permalink] [raw]
Subject: general protection fault in tls_sk_proto_close (2)

Hello,

syzbot found the following crash on:

HEAD commit: a55aa89a Linux 5.3-rc6
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16c26ebc600000
kernel config: https://syzkaller.appspot.com/x/.config?x=2a6a2b9826fdadf9
dashboard link: https://syzkaller.appspot.com/bug?extid=7a6ee4d0078eac6bf782
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1112a4de600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 10290 Comm: syz-executor.0 Not tainted 5.3.0-rc6 #120
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:tls_sk_proto_close+0xe5/0x990 net/tls/tls_main.c:298
Code: 0f 85 3f 08 00 00 49 8b 84 24 c0 02 00 00 4d 8d 75 14 4c 89 f2 48 c1
ea 03 48 89 85 50 ff ff ff 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 02 4c
89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85 2e 06 00 00
RSP: 0018:ffff88809b23fb90 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: ffffffff862cb8db
RDX: 0000000000000002 RSI: ffffffff862cb639 RDI: ffff8880a155ef00
RBP: ffff88809b23fc48 R08: ffff888094344640 R09: ffffed10142abd9a
R10: ffffed10142abd99 R11: ffff8880a155eccb R12: ffff8880a155ec40
R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000001
FS: 00005555556a8940(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f353458e000 CR3: 00000000a9174000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tls_sk_proto_close+0x35b/0x990 net/tls/tls_main.c:321
tcp_bpf_close+0x17c/0x390 net/ipv4/tcp_bpf.c:582
inet_release+0xed/0x200 net/ipv4/af_inet.c:427
inet6_release+0x53/0x80 net/ipv6/af_inet6.c:470
__sock_release+0xce/0x280 net/socket.c:590
sock_close+0x1e/0x30 net/socket.c:1268
__fput+0x2ff/0x890 fs/file_table.c:280
____fput+0x16/0x20 fs/file_table.c:313
task_work_run+0x145/0x1c0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413540
Code: 01 f0 ff ff 0f 83 30 1b 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00 00 83 3d 4d 2d 66 00 00 75 14 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff
ff 0f 83 04 1b 00 00 c3 48 83 ec 08 e8 0a fc ff ff
RSP: 002b:00007fff5d481778 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000413540
RDX: 0000001b2e520000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffffffffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000075bf20
R13: 0000000000000003 R14: 0000000000761220 R15: ffffffffffffffff
Modules linked in:
---[ end trace bdfd4385a0f1f76d ]---
RIP: 0010:tls_sk_proto_close+0xe5/0x990 net/tls/tls_main.c:298
Code: 0f 85 3f 08 00 00 49 8b 84 24 c0 02 00 00 4d 8d 75 14 4c 89 f2 48 c1
ea 03 48 89 85 50 ff ff ff 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 02 4c
89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85 2e 06 00 00
RSP: 0018:ffff88809b23fb90 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: ffffffff862cb8db
RDX: 0000000000000002 RSI: ffffffff862cb639 RDI: ffff8880a155ef00
RBP: ffff88809b23fc48 R08: ffff888094344640 R09: ffffed10142abd9a
R10: ffffed10142abd99 R11: ffff8880a155eccb R12: ffff8880a155ec40
R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000001
FS: 00005555556a8940(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f353458e000 CR3: 00000000a9174000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


2019-08-28 22:28:13

by Jakub Kicinski

[permalink] [raw]
Subject: Re: general protection fault in tls_sk_proto_close (2)

On Tue, 27 Aug 2019 23:38:07 -0700, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: a55aa89a Linux 5.3-rc6
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16c26ebc600000
> kernel config: https://syzkaller.appspot.com/x/.config?x=2a6a2b9826fdadf9
> dashboard link: https://syzkaller.appspot.com/bug?extid=7a6ee4d0078eac6bf782
> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1112a4de600000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: [email protected]

Hi John!

This is a loop where TLS calls it's own close function recursively.
It seems we must have gotten BPF installed on top of TLS, and then
it handed TLS TLS'es own sk_proto via tcp_update_ulp().

Can BPF on top of TLS be prevented somehow?

Quick fix should probably be something like:

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 43252a801c3f..3f4962756fa4 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -816,6 +816,9 @@ static void tls_update(struct sock *sk, struct proto *p)

ctx = tls_get_ctx(sk);
if (likely(ctx)) {
+ if (p->setsockopt == tls_setsockopt)
+ return;
+
ctx->sk_proto_close = p->close;
ctx->sk_proto = p;
} else {

> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] PREEMPT SMP KASAN
> CPU: 1 PID: 10290 Comm: syz-executor.0 Not tainted 5.3.0-rc6 #120
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:tls_sk_proto_close+0xe5/0x990 net/tls/tls_main.c:298
> Code: 0f 85 3f 08 00 00 49 8b 84 24 c0 02 00 00 4d 8d 75 14 4c 89 f2 48 c1
> ea 03 48 89 85 50 ff ff ff 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 02 4c
> 89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85 2e 06 00 00
> RSP: 0018:ffff88809b23fb90 EFLAGS: 00010203
> RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: ffffffff862cb8db
> RDX: 0000000000000002 RSI: ffffffff862cb639 RDI: ffff8880a155ef00
> RBP: ffff88809b23fc48 R08: ffff888094344640 R09: ffffed10142abd9a
> R10: ffffed10142abd99 R11: ffff8880a155eccb R12: ffff8880a155ec40
> R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000001
> FS: 00005555556a8940(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f353458e000 CR3: 00000000a9174000 CR4: 00000000001406e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> tls_sk_proto_close+0x35b/0x990 net/tls/tls_main.c:321
> tcp_bpf_close+0x17c/0x390 net/ipv4/tcp_bpf.c:582
> inet_release+0xed/0x200 net/ipv4/af_inet.c:427
> inet6_release+0x53/0x80 net/ipv6/af_inet6.c:470
> __sock_release+0xce/0x280 net/socket.c:590
> sock_close+0x1e/0x30 net/socket.c:1268
> __fput+0x2ff/0x890 fs/file_table.c:280
> ____fput+0x16/0x20 fs/file_table.c:313
> task_work_run+0x145/0x1c0 kernel/task_work.c:113
> tracehook_notify_resume include/linux/tracehook.h:188 [inline]
> exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
> prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
> syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
> do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413540
> Code: 01 f0 ff ff 0f 83 30 1b 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> 44 00 00 83 3d 4d 2d 66 00 00 75 14 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff
> ff 0f 83 04 1b 00 00 c3 48 83 ec 08 e8 0a fc ff ff
> RSP: 002b:00007fff5d481778 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000413540
> RDX: 0000001b2e520000 RSI: 0000000000000000 RDI: 0000000000000005
> RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffffffffffff
> R10: 0000000000000000 R11: 0000000000000246 R12: 000000000075bf20
> R13: 0000000000000003 R14: 0000000000761220 R15: ffffffffffffffff
> Modules linked in:
> ---[ end trace bdfd4385a0f1f76d ]---
> RIP: 0010:tls_sk_proto_close+0xe5/0x990 net/tls/tls_main.c:298
> Code: 0f 85 3f 08 00 00 49 8b 84 24 c0 02 00 00 4d 8d 75 14 4c 89 f2 48 c1
> ea 03 48 89 85 50 ff ff ff 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 02 4c
> 89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85 2e 06 00 00
> RSP: 0018:ffff88809b23fb90 EFLAGS: 00010203
> RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: ffffffff862cb8db
> RDX: 0000000000000002 RSI: ffffffff862cb639 RDI: ffff8880a155ef00
> RBP: ffff88809b23fc48 R08: ffff888094344640 R09: ffffed10142abd9a
> R10: ffffed10142abd99 R11: ffff8880a155eccb R12: ffff8880a155ec40
> R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000001
> FS: 00005555556a8940(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f353458e000 CR3: 00000000a9174000 CR4: 00000000001406e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches

2019-08-29 16:45:30

by Jakub Kicinski

[permalink] [raw]
Subject: Re: general protection fault in tls_sk_proto_close (2)

On Thu, 29 Aug 2019 11:52:00 +0800, Hillf Danton wrote:
> Alternatively work is done if sock is closed again. Anyway ctx is reset
> under sock's callback lock in write mode.
>
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -295,6 +295,8 @@ static void tls_sk_proto_close(struct so
> long timeo = sock_sndtimeo(sk, 0);
> bool free_ctx;
>
> + if (!ctx)
> + return;
> if (ctx->tx_conf == TLS_SW)
> tls_sw_cancel_work_tx(ctx);

That's no bueno, the real socket's close will never get called.

2019-08-29 18:49:55

by John Fastabend

[permalink] [raw]
Subject: Re: general protection fault in tls_sk_proto_close (2)

Jakub Kicinski wrote:
> On Thu, 29 Aug 2019 11:52:00 +0800, Hillf Danton wrote:
> > Alternatively work is done if sock is closed again. Anyway ctx is reset
> > under sock's callback lock in write mode.
> >
> > --- a/net/tls/tls_main.c
> > +++ b/net/tls/tls_main.c
> > @@ -295,6 +295,8 @@ static void tls_sk_proto_close(struct so
> > long timeo = sock_sndtimeo(sk, 0);
> > bool free_ctx;
> >
> > + if (!ctx)
> > + return;
> > if (ctx->tx_conf == TLS_SW)
> > tls_sw_cancel_work_tx(ctx);
>
> That's no bueno, the real socket's close will never get called.

Seems when we refactored BPF side we dropped the check for ULP on one
path so I'll add that back now. It would be nice and seems we are
getting closer now that tls side is a bit more dynamic if the ordering
didn't matter.

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 1330a7442e5b..30d11558740e 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -666,6 +666,8 @@ static int sock_hash_update_common(struct bpf_map *map, void *key,
WARN_ON_ONCE(!rcu_read_lock_held());
if (unlikely(flags > BPF_EXIST))
return -EINVAL;
+ if (unlikely(icsk->icsk_ulp_data))
+ return -EINVAL;

link = sk_psock_init_link();
if (!link)

2019-08-29 18:54:47

by Jakub Kicinski

[permalink] [raw]
Subject: Re: general protection fault in tls_sk_proto_close (2)

On Thu, 29 Aug 2019 11:48:32 -0700, John Fastabend wrote:
> Jakub Kicinski wrote:
> > On Thu, 29 Aug 2019 11:52:00 +0800, Hillf Danton wrote:
> > > Alternatively work is done if sock is closed again. Anyway ctx is reset
> > > under sock's callback lock in write mode.
> > >
> > > --- a/net/tls/tls_main.c
> > > +++ b/net/tls/tls_main.c
> > > @@ -295,6 +295,8 @@ static void tls_sk_proto_close(struct so
> > > long timeo = sock_sndtimeo(sk, 0);
> > > bool free_ctx;
> > >
> > > + if (!ctx)
> > > + return;
> > > if (ctx->tx_conf == TLS_SW)
> > > tls_sw_cancel_work_tx(ctx);
> >
> > That's no bueno, the real socket's close will never get called.
>
> Seems when we refactored BPF side we dropped the check for ULP on one
> path so I'll add that back now. It would be nice and seems we are
> getting closer now that tls side is a bit more dynamic if the ordering
> didn't matter.

We'd probably need some more generic way of communicating the changes
in sk_proto stack, e.g. by moving the update into one of sk_proto
callbacks? but yes.

> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 1330a7442e5b..30d11558740e 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -666,6 +666,8 @@ static int sock_hash_update_common(struct bpf_map *map, void *key,
> WARN_ON_ONCE(!rcu_read_lock_held());
> if (unlikely(flags > BPF_EXIST))
> return -EINVAL;
> + if (unlikely(icsk->icsk_ulp_data))
> + return -EINVAL;
>
> link = sk_psock_init_link();
> if (!link)

Thanks! That looks good, if you feel like submitting officially feel
free to add my Reviewed-by!

2019-08-29 22:20:02

by John Fastabend

[permalink] [raw]
Subject: Re: general protection fault in tls_sk_proto_close (2)

Jakub Kicinski wrote:
> On Thu, 29 Aug 2019 11:48:32 -0700, John Fastabend wrote:
> > Jakub Kicinski wrote:
> > > On Thu, 29 Aug 2019 11:52:00 +0800, Hillf Danton wrote:
> > > > Alternatively work is done if sock is closed again. Anyway ctx is reset
> > > > under sock's callback lock in write mode.
> > > >
> > > > --- a/net/tls/tls_main.c
> > > > +++ b/net/tls/tls_main.c
> > > > @@ -295,6 +295,8 @@ static void tls_sk_proto_close(struct so
> > > > long timeo = sock_sndtimeo(sk, 0);
> > > > bool free_ctx;
> > > >
> > > > + if (!ctx)
> > > > + return;
> > > > if (ctx->tx_conf == TLS_SW)
> > > > tls_sw_cancel_work_tx(ctx);
> > >
> > > That's no bueno, the real socket's close will never get called.
> >
> > Seems when we refactored BPF side we dropped the check for ULP on one
> > path so I'll add that back now. It would be nice and seems we are
> > getting closer now that tls side is a bit more dynamic if the ordering
> > didn't matter.
>
> We'd probably need some more generic way of communicating the changes
> in sk_proto stack, e.g. by moving the update into one of sk_proto
> callbacks? but yes.
>
> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> > index 1330a7442e5b..30d11558740e 100644
> > --- a/net/core/sock_map.c
> > +++ b/net/core/sock_map.c
> > @@ -666,6 +666,8 @@ static int sock_hash_update_common(struct bpf_map *map, void *key,
> > WARN_ON_ONCE(!rcu_read_lock_held());
> > if (unlikely(flags > BPF_EXIST))
> > return -EINVAL;
> > + if (unlikely(icsk->icsk_ulp_data))
> > + return -EINVAL;
> >
> > link = sk_psock_init_link();
> > if (!link)
>
> Thanks! That looks good, if you feel like submitting officially feel
> free to add my Reviewed-by!

I'll send it out this evening after running the selftests.

.John