2019-08-20 01:19:44

by syzbot

[permalink] [raw]
Subject: general protection fault in xsk_poll

Hello,

syzbot found the following crash on:

HEAD commit: da657043 Add linux-next specific files for 20190819
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=16af124c600000
kernel config: https://syzkaller.appspot.com/x/.config?x=739a9b3ab3d8c770
dashboard link: https://syzkaller.appspot.com/bug?extid=c82697e3043781e08802
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=109e1922600000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1445bf02600000

The bug was bisected to:

commit 77cd0d7b3f257fd0e3096b4fdcff1a7d38e99e10
Author: Magnus Karlsson <[email protected]>
Date: Wed Aug 14 07:27:17 2019 +0000

xsk: add support for need_wakeup flag in AF_XDP rings

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15e1ea4c600000
final crash: https://syzkaller.appspot.com/x/report.txt?x=17e1ea4c600000
console output: https://syzkaller.appspot.com/x/log.txt?x=13e1ea4c600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]
Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP
rings")

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 7959 Comm: syz-executor611 Not tainted 5.3.0-rc5-next-20190819
#68
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:xsk_poll+0x95/0x540 net/xdp/xsk.c:386
Code: 80 3c 02 00 0f 85 70 04 00 00 4c 8b a3 88 04 00 00 48 b8 00 00 00 00
00 fc ff df 49 8d bc 24 96 00 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48
89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 bf 03 00 00
RSP: 0018:ffff8880926f7850 EFLAGS: 00010207
RAX: dffffc0000000000 RBX: ffff88809a141700 RCX: ffffffff859b07aa
RDX: 0000000000000012 RSI: ffffffff859b07c4 RDI: 0000000000000096
RBP: ffff8880926f7880 R08: ffff88809698a580 R09: ffffed1013428329
R10: ffffed1013428328 R11: ffff88809a141947 R12: 0000000000000000
R13: 0000000000000304 R14: ffff888095d4d840 R15: ffff888092bdd020
FS: 0000555557529880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000280 CR3: 0000000098281000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
sock_poll+0x15e/0x480 net/socket.c:1256
vfs_poll include/linux/poll.h:90 [inline]
do_pollfd fs/select.c:859 [inline]
do_poll fs/select.c:907 [inline]
do_sys_poll+0x7c2/0xde0 fs/select.c:1001
__do_sys_ppoll fs/select.c:1101 [inline]
__se_sys_ppoll fs/select.c:1081 [inline]
__x64_sys_ppoll+0x259/0x310 fs/select.c:1081
do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x440159
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffd9fbd16e8 EFLAGS: 00000246 ORIG_RAX: 000000000000010f
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440159
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000020000280
RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004019e0
R13: 0000000000401a70 R14: 0000000000000000 R15: 0000000000000000
Modules linked in:
---[ end trace da907175426b4065 ]---
RIP: 0010:xsk_poll+0x95/0x540 net/xdp/xsk.c:386
Code: 80 3c 02 00 0f 85 70 04 00 00 4c 8b a3 88 04 00 00 48 b8 00 00 00 00
00 fc ff df 49 8d bc 24 96 00 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48
89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 bf 03 00 00
RSP: 0018:ffff8880926f7850 EFLAGS: 00010207
RAX: dffffc0000000000 RBX: ffff88809a141700 RCX: ffffffff859b07aa
RDX: 0000000000000012 RSI: ffffffff859b07c4 RDI: 0000000000000096
RBP: ffff8880926f7880 R08: ffff88809698a580 R09: ffffed1013428329
R10: ffffed1013428328 R11: ffff88809a141947 R12: 0000000000000000
R13: 0000000000000304 R14: ffff888095d4d840 R15: ffff888092bdd020
FS: 0000555557529880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000280 CR3: 0000000098281000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


2019-08-20 09:18:59

by Björn Töpel

[permalink] [raw]
Subject: Re: general protection fault in xsk_poll

On 2019-08-20 05:31, Hillf Danton wrote:
>
> On Mon, 19 Aug 2019 18:18:06 -0700
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit: da657043 Add linux-next specific files for 20190819
>> git tree: linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=16af124c600000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=739a9b3ab3d8c770
>> dashboard link: https://syzkaller.appspot.com/bug?extid=c82697e3043781e08802
>> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=109e1922600000
>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1445bf02600000
>>
>> The bug was bisected to:
>>
>> commit 77cd0d7b3f257fd0e3096b4fdcff1a7d38e99e10
>> Author: Magnus Karlsson <[email protected]>
>> Date: Wed Aug 14 07:27:17 2019 +0000
>>
>> xsk: add support for need_wakeup flag in AF_XDP rings
>>
>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15e1ea4c600000
>> final crash: https://syzkaller.appspot.com/x/report.txt?x=17e1ea4c600000
>> console output: https://syzkaller.appspot.com/x/log.txt?x=13e1ea4c600000
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: [email protected]
>> Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
>>
>> kasan: CONFIG_KASAN_INLINE enabled
>> kasan: GPF could be caused by NULL-ptr deref or user memory access
>> general protection fault: 0000 [#1] PREEMPT SMP KASAN
>> CPU: 1 PID: 7959 Comm: syz-executor611 Not tainted 5.3.0-rc5-next-20190819 #68
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> RIP: 0010:xsk_poll+0x95/0x540 net/xdp/xsk.c:386
>> Code: 80 3c 02 00 0f 85 70 04 00 00 4c 8b a3 88 04 00 00 48 b8 00 00 00 00
>> 00 fc ff df 49 8d bc 24 96 00 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48
>> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 bf 03 00 00
>> RSP: 0018:ffff8880926f7850 EFLAGS: 00010207
>> RAX: dffffc0000000000 RBX: ffff88809a141700 RCX: ffffffff859b07aa
>> RDX: 0000000000000012 RSI: ffffffff859b07c4 RDI: 0000000000000096
>> RBP: ffff8880926f7880 R08: ffff88809698a580 R09: ffffed1013428329
>> R10: ffffed1013428328 R11: ffff88809a141947 R12: 0000000000000000
>> R13: 0000000000000304 R14: ffff888095d4d840 R15: ffff888092bdd020
>> FS: 0000555557529880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000020000280 CR3: 0000000098281000 CR4: 00000000001406e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>> sock_poll+0x15e/0x480 net/socket.c:1256
>> vfs_poll include/linux/poll.h:90 [inline]
>> do_pollfd fs/select.c:859 [inline]
>> do_poll fs/select.c:907 [inline]
>> do_sys_poll+0x7c2/0xde0 fs/select.c:1001
>> __do_sys_ppoll fs/select.c:1101 [inline]
>> __se_sys_ppoll fs/select.c:1081 [inline]
>> __x64_sys_ppoll+0x259/0x310 fs/select.c:1081
>> do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> RIP: 0033:0x440159
>> Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
>> 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
>> ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
>> RSP: 002b:00007ffd9fbd16e8 EFLAGS: 00000246 ORIG_RAX: 000000000000010f
>> RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440159
>> RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000020000280
>> RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004019e0
>> R13: 0000000000401a70 R14: 0000000000000000 R15: 0000000000000000
>> Modules linked in:
>> ---[ end trace da907175426b4065 ]---
>
> Add umem check.
>
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -381,9 +381,9 @@ static unsigned int xsk_poll(struct file
> struct sock *sk = sock->sk;
> struct xdp_sock *xs = xdp_sk(sk);
> struct net_device *dev = xs->dev;
> - struct xdp_umem *umem = xs->umem;
> + struct xdp_umem *umem = READ_ONCE(xs->umem);
>
> - if (umem->need_wakeup)
> + if (umem && umem->need_wakeup)
> dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> umem->need_wakeup);
>
> --
>

Thanks!

What do you think about making it a bit more generic, like:

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ee4428a892fa..08bed5e92af4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -356,13 +356,20 @@ static int xsk_generic_xmit(struct sock *sk,
struct msghdr *m,
return err;
}

+static bool xsk_is_bound(struct xdp_sock *xs)
+{
+ struct net_device *dev = READ_ONCE(xs->dev);
+
+ return dev && xs->state == XSK_BOUND;
+}
+
static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t
total_len)
{
bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
struct sock *sk = sock->sk;
struct xdp_sock *xs = xdp_sk(sk);

- if (unlikely(!xs->dev))
+ if (unlikely(!xsk_is_bound(xs)))
return -ENXIO;
if (unlikely(!(xs->dev->flags & IFF_UP)))
return -ENETDOWN;
@@ -383,6 +390,9 @@ static unsigned int xsk_poll(struct file *file,
struct socket *sock,
struct net_device *dev = xs->dev;
struct xdp_umem *umem = xs->umem;

+ if (unlikely(!xsk_is_bound(xs)))
+ return mask;
+
if (umem->need_wakeup)
dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
umem->need_wakeup);
@@ -417,7 +427,7 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
{
struct net_device *dev = xs->dev;

- if (!dev || xs->state != XSK_BOUND)
+ if (!xsk_is_bound(xs))
return;

xs->state = XSK_UNBOUND;

2019-08-20 10:07:03

by Björn Töpel

[permalink] [raw]
Subject: [PATCH bpf-next] xsk: proper socket state check in xsk_poll

From: Björn Töpel <[email protected]>

The poll() implementation for AF_XDP sockets did not perform the
proper state checks, prior accessing the socket umem. This patch fixes
that by performing a xsk_is_bound() check.

Suggested-by: Hillf Danton <[email protected]>
Reported-by: [email protected]
Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
Signed-off-by: Björn Töpel <[email protected]>
---
net/xdp/xsk.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ee4428a892fa..08bed5e92af4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -356,13 +356,20 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
return err;
}

+static bool xsk_is_bound(struct xdp_sock *xs)
+{
+ struct net_device *dev = READ_ONCE(xs->dev);
+
+ return dev && xs->state == XSK_BOUND;
+}
+
static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
{
bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
struct sock *sk = sock->sk;
struct xdp_sock *xs = xdp_sk(sk);

- if (unlikely(!xs->dev))
+ if (unlikely(!xsk_is_bound(xs)))
return -ENXIO;
if (unlikely(!(xs->dev->flags & IFF_UP)))
return -ENETDOWN;
@@ -383,6 +390,9 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
struct net_device *dev = xs->dev;
struct xdp_umem *umem = xs->umem;

+ if (unlikely(!xsk_is_bound(xs)))
+ return mask;
+
if (umem->need_wakeup)
dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
umem->need_wakeup);
@@ -417,7 +427,7 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
{
struct net_device *dev = xs->dev;

- if (!dev || xs->state != XSK_BOUND)
+ if (!xsk_is_bound(xs))
return;

xs->state = XSK_UNBOUND;
--
2.20.1

2019-08-20 14:33:22

by Daniel Borkmann

[permalink] [raw]
Subject: Re: [PATCH bpf-next] xsk: proper socket state check in xsk_poll

On 8/20/19 12:04 PM, Björn Töpel wrote:
> From: Björn Töpel <[email protected]>
>
> The poll() implementation for AF_XDP sockets did not perform the
> proper state checks, prior accessing the socket umem. This patch fixes
> that by performing a xsk_is_bound() check.
>
> Suggested-by: Hillf Danton <[email protected]>
> Reported-by: [email protected]
> Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
> Signed-off-by: Björn Töpel <[email protected]>
> ---
> net/xdp/xsk.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index ee4428a892fa..08bed5e92af4 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -356,13 +356,20 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
> return err;
> }
>
> +static bool xsk_is_bound(struct xdp_sock *xs)
> +{
> + struct net_device *dev = READ_ONCE(xs->dev);
> +
> + return dev && xs->state == XSK_BOUND;
> +}
> +
> static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> {
> bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
> struct sock *sk = sock->sk;
> struct xdp_sock *xs = xdp_sk(sk);
>
> - if (unlikely(!xs->dev))
> + if (unlikely(!xsk_is_bound(xs)))
> return -ENXIO;
> if (unlikely(!(xs->dev->flags & IFF_UP)))
> return -ENETDOWN;
> @@ -383,6 +390,9 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
> struct net_device *dev = xs->dev;
> struct xdp_umem *umem = xs->umem;
>
> + if (unlikely(!xsk_is_bound(xs)))
> + return mask;
> +
> if (umem->need_wakeup)
> dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> umem->need_wakeup);
> @@ -417,7 +427,7 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
> {
> struct net_device *dev = xs->dev;
>
> - if (!dev || xs->state != XSK_BOUND)
> + if (!xsk_is_bound(xs))
> return;

I think I'm a bit confused by your READ_ONCE() usage. ;-/ I can see why you're
using it in xsk_is_bound() above, but then at the same time all the other callbacks
like xsk_poll() or xsk_unbind_dev() above have a struct net_device *dev = xs->dev
right before the test. Could you elaborate?

Thanks,
Daniel

2019-08-20 15:31:29

by Björn Töpel

[permalink] [raw]
Subject: Re: [PATCH bpf-next] xsk: proper socket state check in xsk_poll

On Tue, 20 Aug 2019 at 16:30, Daniel Borkmann <[email protected]> wrote:
>
> On 8/20/19 12:04 PM, Björn Töpel wrote:
> > From: Björn Töpel <[email protected]>
> >
> > The poll() implementation for AF_XDP sockets did not perform the
> > proper state checks, prior accessing the socket umem. This patch fixes
> > that by performing a xsk_is_bound() check.
> >
> > Suggested-by: Hillf Danton <[email protected]>
> > Reported-by: [email protected]
> > Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
> > Signed-off-by: Björn Töpel <[email protected]>
> > ---
> > net/xdp/xsk.c | 14 ++++++++++++--
> > 1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index ee4428a892fa..08bed5e92af4 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -356,13 +356,20 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
> > return err;
> > }
> >
> > +static bool xsk_is_bound(struct xdp_sock *xs)
> > +{
> > + struct net_device *dev = READ_ONCE(xs->dev);
> > +
> > + return dev && xs->state == XSK_BOUND;
> > +}
> > +
> > static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> > {
> > bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
> > struct sock *sk = sock->sk;
> > struct xdp_sock *xs = xdp_sk(sk);
> >
> > - if (unlikely(!xs->dev))
> > + if (unlikely(!xsk_is_bound(xs)))
> > return -ENXIO;
> > if (unlikely(!(xs->dev->flags & IFF_UP)))
> > return -ENETDOWN;
> > @@ -383,6 +390,9 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
> > struct net_device *dev = xs->dev;
> > struct xdp_umem *umem = xs->umem;
> >
> > + if (unlikely(!xsk_is_bound(xs)))
> > + return mask;
> > +
> > if (umem->need_wakeup)
> > dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
> > umem->need_wakeup);
> > @@ -417,7 +427,7 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
> > {
> > struct net_device *dev = xs->dev;
> >
> > - if (!dev || xs->state != XSK_BOUND)
> > + if (!xsk_is_bound(xs))
> > return;
>
> I think I'm a bit confused by your READ_ONCE() usage. ;-/ I can see why you're
> using it in xsk_is_bound() above, but then at the same time all the other callbacks
> like xsk_poll() or xsk_unbind_dev() above have a struct net_device *dev = xs->dev
> right before the test. Could you elaborate?
>

Yes, now I'm confused as well! Digging deeper... I believe there are a
couple of places in xsk.c that do not have
READ_ONCE/WRITE_ONCE-correctness. Various xdp_sock members are read
lock-less outside the control plane mutex (mutex member of struct
xdp_sock). This needs some re-work. I'll look into using the newly
introduced state member (with corresponding read/write barriers) for
this.

I'll cook some patch(es) that address this, but first it sounds like I
need to reread [1] two, or three times. At least. ;-)


Thanks,
Björn


[1] https://lwn.net/Articles/793253/


> Thanks,
> Daniel

2019-08-20 21:25:37

by Daniel Borkmann

[permalink] [raw]
Subject: Re: [PATCH bpf-next] xsk: proper socket state check in xsk_poll

On 8/20/19 5:29 PM, Björn Töpel wrote:
> On Tue, 20 Aug 2019 at 16:30, Daniel Borkmann <[email protected]> wrote:
>> On 8/20/19 12:04 PM, Björn Töpel wrote:
>>> From: Björn Töpel <[email protected]>
>>>
>>> The poll() implementation for AF_XDP sockets did not perform the
>>> proper state checks, prior accessing the socket umem. This patch fixes
>>> that by performing a xsk_is_bound() check.
>>>
>>> Suggested-by: Hillf Danton <[email protected]>
>>> Reported-by: [email protected]
>>> Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
>>> Signed-off-by: Björn Töpel <[email protected]>
>>> ---
>>> net/xdp/xsk.c | 14 ++++++++++++--
>>> 1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
>>> index ee4428a892fa..08bed5e92af4 100644
>>> --- a/net/xdp/xsk.c
>>> +++ b/net/xdp/xsk.c
>>> @@ -356,13 +356,20 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
>>> return err;
>>> }
>>>
>>> +static bool xsk_is_bound(struct xdp_sock *xs)
>>> +{
>>> + struct net_device *dev = READ_ONCE(xs->dev);
>>> +
>>> + return dev && xs->state == XSK_BOUND;
>>> +}
>>> +
>>> static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>>> {
>>> bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
>>> struct sock *sk = sock->sk;
>>> struct xdp_sock *xs = xdp_sk(sk);
>>>
>>> - if (unlikely(!xs->dev))
>>> + if (unlikely(!xsk_is_bound(xs)))
>>> return -ENXIO;
>>> if (unlikely(!(xs->dev->flags & IFF_UP)))
>>> return -ENETDOWN;
>>> @@ -383,6 +390,9 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
>>> struct net_device *dev = xs->dev;
>>> struct xdp_umem *umem = xs->umem;
>>>
>>> + if (unlikely(!xsk_is_bound(xs)))
>>> + return mask;
>>> +
>>> if (umem->need_wakeup)
>>> dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
>>> umem->need_wakeup);
>>> @@ -417,7 +427,7 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
>>> {
>>> struct net_device *dev = xs->dev;
>>>
>>> - if (!dev || xs->state != XSK_BOUND)
>>> + if (!xsk_is_bound(xs))
>>> return;
>>
>> I think I'm a bit confused by your READ_ONCE() usage. ;-/ I can see why you're
>> using it in xsk_is_bound() above, but then at the same time all the other callbacks
>> like xsk_poll() or xsk_unbind_dev() above have a struct net_device *dev = xs->dev
>> right before the test. Could you elaborate?
>
> Yes, now I'm confused as well! Digging deeper... I believe there are a
> couple of places in xsk.c that do not have
> READ_ONCE/WRITE_ONCE-correctness. Various xdp_sock members are read
> lock-less outside the control plane mutex (mutex member of struct
> xdp_sock). This needs some re-work. I'll look into using the newly

Right, so even in above two cases, the compiler could have refetched, e.g.
dev variable could have first been NULL, but xsk_is_bound() later returns
true.

> introduced state member (with corresponding read/write barriers) for
> this.
>
> I'll cook some patch(es) that address this, but first it sounds like I
> need to reread [1] two, or three times. At least. ;-)
>
>
> Thanks,
> Björn
>
>
> [1] https://lwn.net/Articles/793253/
>
>
>> Thanks,
>> Daniel