From: "Jonathan Lemon"
To: "Ilya Maximets"
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    bpf@vger.kernel.org, xdp-newbies@vger.kernel.org,
    "David S. Miller", "Björn Töpel", "Magnus Karlsson",
    "Jakub Kicinski"
Subject: Re: [PATCH bpf v3] xdp: fix hang while unregistering device bound to xdp socket
Date: Mon, 10 Jun 2019 13:47:54 -0700
X-Mailer: MailMate (1.12.5r5635)
Message-ID: <06C99519-64B9-4A91-96B9-0F99731E3857@gmail.com>
In-Reply-To: <20190610161546.30569-1-i.maximets@samsung.com>
References: <20190610161546.30569-1-i.maximets@samsung.com>
MIME-Version: 1.0
Content-Type: text/plain; format=flowed
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On 10 Jun 2019, at 9:15, Ilya Maximets wrote:

> Device that bound to XDP socket will not have zero refcount until the
> userspace application will not close it. This leads to hang inside
> 'netdev_wait_allrefs()' if device unregistering requested:
>
>     # ip link del p1
>     < hang on recvmsg on netlink socket >
>
>     # ps -x | grep ip
>     5126 pts/0 D+ 0:00 ip link del p1
>
>     # journalctl -b
>
>     Jun 05 07:19:16 kernel:
>     unregister_netdevice: waiting for p1 to become free. Usage count = 1
>
>     Jun 05 07:19:27 kernel:
>     unregister_netdevice: waiting for p1 to become free. Usage count = 1
>     ...
>
> Fix that by implementing NETDEV_UNREGISTER event notification handler
> to properly clean up all the resources and unref device.
>
> This should also allow socket killing via ss(8) utility.
>
> Fixes: 965a99098443 ("xsk: add support for bind for Rx")
> Signed-off-by: Ilya Maximets
> ---
>
> Version 3:
>
>     * Declaration lines ordered from longest to shortest.
>     * Checking of event type moved to the top to avoid unnecessary
>       locking.
>
> Version 2:
>
>     * Completely re-implemented using netdev event handler.
>
>  net/xdp/xsk.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 64 insertions(+), 1 deletion(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index a14e8864e4fa..273a419a8c4d 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -693,6 +693,57 @@ static int xsk_mmap(struct file *file, struct socket *sock,
>  			       size, vma->vm_page_prot);
>  }
>
> +static int xsk_notifier(struct notifier_block *this,
> +			unsigned long msg, void *ptr)
> +{
> +	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> +	struct net *net = dev_net(dev);
> +	int i, unregister_count = 0;
> +	struct sock *sk;
> +
> +	switch (msg) {
> +	case NETDEV_UNREGISTER:
> +		mutex_lock(&net->xdp.lock);

The call is under the rtnl lock, and we're not modifying
the list, so this mutex shouldn't be needed.

> +		sk_for_each(sk, &net->xdp.list) {
> +			struct xdp_sock *xs = xdp_sk(sk);
> +
> +			mutex_lock(&xs->mutex);
> +			if (dev != xs->dev) {
> +				mutex_unlock(&xs->mutex);
> +				continue;
> +			}
> +
> +			sk->sk_err = ENETDOWN;
> +			if (!sock_flag(sk, SOCK_DEAD))
> +				sk->sk_error_report(sk);
> +
> +			/* Wait for driver to stop using the xdp socket. */
> +			xdp_del_sk_umem(xs->umem, xs);
> +			xs->dev = NULL;
> +			synchronize_net();

Isn't this handled by the unregister_count case below?

> +
> +			/* Clear device references in umem. */
> +			xdp_put_umem(xs->umem);
> +			xs->umem = NULL;

This makes me uneasy. We need to unregister the umem from
the device (xdp_umem_clear_dev()) but this can remove the umem
pages out from underneath the xsk. Perhaps what's needed here is
the equivalent of an unbind() call that just detaches the umem/sk
from the device, but does not otherwise tear them down.

> +			mutex_unlock(&xs->mutex);
> +			unregister_count++;
> +		}
> +		mutex_unlock(&net->xdp.lock);
> +
> +		if (unregister_count) {
> +			/* Wait for umem clearing completion. */
> +			synchronize_net();
> +			for (i = 0; i < unregister_count; i++)
> +				dev_put(dev);
> +		}
> +
> +		break;
> +	}
> +
> +	return NOTIFY_DONE;
> +}
> +
>  static struct proto xsk_proto = {
>  	.name = "XDP",
>  	.owner = THIS_MODULE,
> @@ -727,7 +778,8 @@ static void xsk_destruct(struct sock *sk)
>  	if (!sock_flag(sk, SOCK_DEAD))
>  		return;
>
> -	xdp_put_umem(xs->umem);
> +	if (xs->umem)
> +		xdp_put_umem(xs->umem);

Not needed - xdp_put_umem() already does a null check.
-- 
Jonathan

>
>  	sk_refcnt_debug_dec(sk);
>  }
> @@ -784,6 +836,10 @@ static const struct net_proto_family xsk_family_ops = {
>  	.owner = THIS_MODULE,
>  };
>
> +static struct notifier_block xsk_netdev_notifier = {
> +	.notifier_call = xsk_notifier,
> +};
> +
>  static int __net_init xsk_net_init(struct net *net)
>  {
>  	mutex_init(&net->xdp.lock);
> @@ -816,8 +872,15 @@ static int __init xsk_init(void)
>  	err = register_pernet_subsys(&xsk_net_ops);
>  	if (err)
>  		goto out_sk;
> +
> +	err = register_netdevice_notifier(&xsk_netdev_notifier);
> +	if (err)
> +		goto out_pernet;
> +
>  	return 0;
>
> +out_pernet:
> +	unregister_pernet_subsys(&xsk_net_ops);
>  out_sk:
>  	sock_unregister(PF_XDP);
>  out_proto:
> --
> 2.17.1
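
To illustrate the "unbind" idea from the review comments, here is a rough userspace model, not kernel code. Every name in it (model_dev, model_umem, model_xsk, model_bind, model_unbind, model_put_umem, MODEL_ENETDOWN) is hypothetical and exists only for this sketch; it assumes a simple counter stands in for the device and umem references. The point it tries to show: on device unregister, the socket reports an error and drops its device reference, while the umem pages stay mapped until the socket itself is destroyed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's structures. */
struct model_dev  { int refcnt; };
struct model_umem { int users; struct model_dev *dev; int pages_mapped; };
struct model_xsk  { struct model_dev *dev; struct model_umem *umem; int err; };

enum { MODEL_ENETDOWN = 100 };	/* placeholder value, not the real errno */

/* Bind: the socket takes one reference on the umem and one on the device. */
static void model_bind(struct model_xsk *xs, struct model_umem *umem,
		       struct model_dev *dev)
{
	umem->users++;
	umem->dev = dev;
	umem->pages_mapped = 1;
	dev->refcnt++;
	xs->dev = dev;
	xs->umem = umem;
	xs->err = 0;
}

/*
 * Unbind: detach the socket and its umem from the device and drop the
 * device reference, but leave the umem (and its pages) alone.
 * Returns 1 if the socket was bound to dev, 0 otherwise.
 */
static int model_unbind(struct model_xsk *xs, struct model_dev *dev)
{
	if (xs->dev != dev)
		return 0;

	xs->err = MODEL_ENETDOWN;	/* what the application would see */
	xs->umem->dev = NULL;		/* umem no longer points at the device */
	xs->dev = NULL;
	dev->refcnt--;
	return 1;
}

/* Put: tolerates NULL, so callers need no check of their own. */
static void model_put_umem(struct model_umem *umem)
{
	if (!umem)
		return;
	if (--umem->users == 0)
		umem->pages_mapped = 0;	/* teardown happens only here */
}
```

In this model the notifier path would call only model_unbind(), and the destruct path would call model_put_umem() unconditionally. Whether the real umem lifetime can be split this cleanly is exactly the open question above.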