Received: by 2002:a17:90a:bc8d:0:0:0:0 with SMTP id x13csp2414462pjr; Tue, 19 May 2020 14:48:18 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxdkvzV9vnAmTbGKelkX53Y5YjjY10IT9RGd/8zL5LOXRirA2A3PkKLJX8YoEsyi3fW4MBn X-Received: by 2002:a50:f057:: with SMTP id u23mr745752edl.238.1589924897790; Tue, 19 May 2020 14:48:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1589924897; cv=none; d=google.com; s=arc-20160816; b=vLeXMs1AQGRJAO+yAP2LCq6XlOIi1RHeoHV8l4OO1hGNqXrlymUZ6baGPSFL+ZHiLv hrR+nQpb1j/DSv1HcHtTTpA2zkUTe+y1xVDWqHYsnSxrn1h3DBrVGBKeQYma4cZYm60v wITsMdS5an7JckFqniaElwawa/aBHe7znDnngjzW7MrPYKUldxupK0pVPezzk+hRvkt2 Rb1bNc6gmf3vpJs8zmlnLhqGAt3IL9zVe/09C6ym0b1bdjrgRtkoSwawcvWTBRi5rXD2 B+taWgVrmkwshG+WJGShivkjSDVUyPo3iG+YWYXXll0EhU0d0m1AcZLDwzbCNQGQGyA1 KPVA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=p2vPSqk9ktgGKhCrm4FZp8Z5PppczSwvycmZdmrjc/0=; b=T+RDVbaPooUSn2ncZF9UUMY4MckrL2zhrdZncSx7l9/RkLq+Mooup0AV+XsW/Dt2gK omD8ASUkF/ZpFnhO28AQ+czTm7titJfeMax9hqCpiztJTEtRzckLf9Ds1+EqhG67hpQK 8zi9H5Y3sCAQbSQ3fl7RBRI90qFz1Xmmkzo+QoR5ufMA3wccmiTQCv0g2Rpbya9mWKUg G1M010d2JjFzmIVQTVs0QnU82DvOKLfGknElMGy+YK9TXlfQyjSDH5BwLyHFfAa7iD3H esa5/jHxAoPNielntE7IV1Ppu7JfI40zNykF70agdXLNNsTKfSKTMYaNyNUgP9rSZ42u +R3Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g24si600180ejx.432.2020.05.19.14.47.54; Tue, 19 May 2020 14:48:17 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728015AbgESVqX (ORCPT + 99 others); Tue, 19 May 2020 17:46:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40742 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726030AbgESVqX (ORCPT ); Tue, 19 May 2020 17:46:23 -0400 Received: from Galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E8C07C08C5C0; Tue, 19 May 2020 14:46:22 -0700 (PDT) Received: from [5.158.153.53] (helo=debian-buster-darwi.lab.linutronix.de.) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.80) (envelope-from ) id 1jbA3e-0002Yy-7I; Tue, 19 May 2020 23:45:54 +0200 From: "Ahmed S. Darwish" To: Peter Zijlstra , Ingo Molnar , Will Deacon Cc: Thomas Gleixner , "Paul E. McKenney" , "Sebastian A. Siewior" , Steven Rostedt , LKML , "Ahmed S. Darwish" , "David S. Miller" , Jakub Kicinski , netdev@vger.kernel.org Subject: [PATCH v1 01/25] net: core: device_rename: Use rwsem instead of a seqcount Date: Tue, 19 May 2020 23:45:23 +0200 Message-Id: <20200519214547.352050-2-a.darwish@linutronix.de> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200519214547.352050-1-a.darwish@linutronix.de> References: <20200519214547.352050-1-a.darwish@linutronix.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Sequence counters write paths are critical sections that must never be preempted, and blocking, even for CONFIG_PREEMPTION=n, is not allowed. Commit 5dbe7c178d3f ("net: fix kernel deadlock with interface rename and netdev name retrieval.") handled a deadlock, observed with CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was infinitely spinning: it got scheduled after the seqcount write side blocked inside its own critical section. To fix that deadlock, among other issues, the commit added a cond_resched() inside the read side section. While this will get the non-preemptible kernel eventually unstuck, the seqcount reader is fully exhausting its slice just spinning -- until TIF_NEED_RESCHED is set. The fix is also still broken: if the seqcount reader belongs to a real-time scheduling policy, it can spin forever and the kernel will livelock. Disabling preemption over the seqcount write side critical section will not work: inside it are a number of GFP_KERNEL allocations and mutex locking through the drivers/base/ :: device_rename() call chain. From all the above, replace the seqcount with a rwsem. Fixes: 5dbe7c178d3f (net: fix kernel deadlock with interface rename and netdev name retrieval.) Fixes: 30e6c9fa93cf (net: devnet_rename_seq should be a seqcount) Fixes: c91f6df2db49 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name) Cc: Signed-off-by: Ahmed S. Darwish Reviewed-by: Sebastian Andrzej Siewior --- net/core/dev.c | 30 ++++++++++++------------------ 1 file changed, 12 insertions(+), 18 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 522288177bbd..e18a4c23df0e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -79,6 +79,7 @@ #include #include #include +#include #include #include #include @@ -194,7 +195,7 @@ static DEFINE_SPINLOCK(napi_hash_lock); static unsigned int napi_gen_id = NR_CPUS; static DEFINE_READ_MOSTLY_HASHTABLE(napi_hash, 8); -static seqcount_t devnet_rename_seq; +static DECLARE_RWSEM(devnet_rename_sem); static inline void dev_base_seq_inc(struct net *net) { @@ -930,18 +931,13 @@ EXPORT_SYMBOL(dev_get_by_napi_id); * @net: network namespace * @name: a pointer to the buffer where the name will be stored. * @ifindex: the ifindex of the interface to get the name from. - * - * The use of raw_seqcount_begin() and cond_resched() before - * retrying is required as we want to give the writers a chance - * to complete when CONFIG_PREEMPTION is not set. */ int netdev_get_name(struct net *net, char *name, int ifindex) { struct net_device *dev; - unsigned int seq; -retry: - seq = raw_seqcount_begin(&devnet_rename_seq); + down_read(&devnet_rename_sem); + rcu_read_lock(); dev = dev_get_by_index_rcu(net, ifindex); if (!dev) { @@ -951,10 +947,8 @@ int netdev_get_name(struct net *net, char *name, int ifindex) strcpy(name, dev->name); rcu_read_unlock(); - if (read_seqcount_retry(&devnet_rename_seq, seq)) { - cond_resched(); - goto retry; - } + + up_read(&devnet_rename_sem); return 0; } @@ -1228,10 +1222,10 @@ int dev_change_name(struct net_device *dev, const char *newname) likely(!(dev->priv_flags & IFF_LIVE_RENAME_OK))) return -EBUSY; - write_seqcount_begin(&devnet_rename_seq); + down_write(&devnet_rename_sem); if (strncmp(newname, dev->name, IFNAMSIZ) == 0) { - write_seqcount_end(&devnet_rename_seq); + up_write(&devnet_rename_sem); return 0; } @@ -1239,7 +1233,7 @@ int dev_change_name(struct net_device *dev, const char *newname) err = dev_get_valid_name(net, dev, newname); if (err < 0) { - write_seqcount_end(&devnet_rename_seq); + up_write(&devnet_rename_sem); return err; } @@ -1254,11 +1248,11 @@ int dev_change_name(struct net_device *dev, const char *newname) if (ret) { memcpy(dev->name, oldname, IFNAMSIZ); dev->name_assign_type = old_assign_type; - write_seqcount_end(&devnet_rename_seq); + up_write(&devnet_rename_sem); return ret; } - write_seqcount_end(&devnet_rename_seq); + up_write(&devnet_rename_sem); netdev_adjacent_rename_links(dev, oldname); @@ -1279,7 +1273,7 @@ int dev_change_name(struct net_device *dev, const char *newname) /* err >= 0 after dev_alloc_name() or stores the first errno */ if (err >= 0) { err = ret; - write_seqcount_begin(&devnet_rename_seq); + down_write(&devnet_rename_sem); memcpy(dev->name, oldname, IFNAMSIZ); memcpy(oldname, newname, IFNAMSIZ); dev->name_assign_type = old_assign_type; -- 2.20.1