Received: by 2002:a17:90a:1609:0:0:0:0 with SMTP id n9csp2143324pja; Thu, 26 Mar 2020 10:20:57 -0700 (PDT) X-Google-Smtp-Source: ADFU+vthvoftZPYCJbk6nX2ZfPAsOuPGun/5AuzxZD3iGJqn+fSbG/gFrmovdoytoJGH5xSbtDA7 X-Received: by 2002:a9d:6241:: with SMTP id i1mr7033635otk.12.1585243257216; Thu, 26 Mar 2020 10:20:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1585243257; cv=none; d=google.com; s=arc-20160816; b=Ix0owBBDnnP+SQLVvLOzhiTB+caHpIT1ceIq6EjaT7kjNL98i5A2K1tRub1+dhCqDF xTud/T0cXYdtrnop8gfSc0BB4h2PESdlcehLlvlwTh7dVI9/9UvfwVSz4bO6UuqDLLFR 6Co6wL+iteVwyJjfCURNX9L42baL0uA5GiiRNJ5MBTqTbZdbJvie1HJEDAGQ9Pa6bagP 2gkskBWWAllc9wcvwhYwtf22VNJw7uQfgEHUxlPAXgp1dn/JgTjnrcO1mE59gM0CLot+ B7Of0ZToWMdmQ8nRG1DHOk1E9UFycV9rnz47Scd+oNehnZL680BK/dOlynUzFNz3i4M9 HJ9w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:to:references:message-id :content-transfer-encoding:cc:date:in-reply-to:from:subject :mime-version; bh=XIbb3r9XzGbzKbx3YQ7JftkiOo3Mu3GCiWAshl+XWjo=; b=kcuO45BM5UFtvFwzfHBi0797G4iGDvU6yFF2dDyhQNlfsGgnIXCY4Vf8/ziJaD1Cai C+iDfJWWw3gsQdXeM1FoeXUwLmh5qTM6/dQZwZQYeW8CRPU2ykRhw4T+bK+DjMb+H3Dd WvD5DvYD2TVxjbvhpRU0RB6pLDlDDR4xPyka4qDd/VEflButKkv1ytiQTQ+G71bIve0N tdWTB5r8NtEAoVmhfwXFXcDyd2nWwxHtaHBL9ohZs7Lo1vJQJZEdG49TZ3CDXb8T5SZ0 AoHLlxBiYtzn56UP++lGPuCydIp9X76GUvKTwrsto6V9p/hwinYL4cvQNSlGKCX5u9eI DaTQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=canonical.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l63si1337608oif.161.2020.03.26.10.20.26; Thu, 26 Mar 2020 10:20:57 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=canonical.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728026AbgCZRQt convert rfc822-to-8bit (ORCPT + 99 others); Thu, 26 Mar 2020 13:16:49 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:49902 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727920AbgCZRQr (ORCPT ); Thu, 26 Mar 2020 13:16:47 -0400 Received: from mail-pj1-f69.google.com ([209.85.216.69]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jHW7Y-0002zN-Ry for linux-kernel@vger.kernel.org; Thu, 26 Mar 2020 17:16:45 +0000 Received: by mail-pj1-f69.google.com with SMTP id z5so5030429pjq.9 for ; Thu, 26 Mar 2020 10:16:44 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=dqfMGNuGJlDejAz9mJaTfol96HG948oTLjPcdkSSNds=; b=sPauk8MshEMUA6IPjkUjOCcvuk+NQP8tEOTot+PUDyOE9dZrkr4BLaafuJlWhUKqtk bBjRoh5wzPiVzmfiPK+YZ0JLe7tVBAHCOnNUFhIrCYehmf0Ae8YSLE+MLeFKhq5xO/SP 1CTt0lLLph9uOgTE9oo43CF/Hrhs2nUrExiK+PSd8Dc57Zo0nV3k90o/3HD6iyQl/eaB dlegXMcA44gjJMnUR3NEMepVcLrejgxonhWK0CTrmpiesXux6aA3B0LV+oD40cvEgpq0 M9M7SoU4e3hgFxSyram+mlVW5tQb+AmHA6ZxX7pnE6mBarb+10oOcxfOUzUeAk9TmHXd znPw== X-Gm-Message-State: ANhLgQ1Ng7XQhNMdq2wFDtqJwJZ3Ia5WnHUY7V4EbW9sEi223GpiUCe7 l820GL2fUlU80SKPzz/qU/ivqND82erDpapxBjm3ZUpxKoUA5Z8BMZwPDusf9YWcVI/GGbEtreN eI5cDaVM6QZhzX7z9Y6yX0U3sL2v8NO1AtKzCm8dXWg== X-Received: by 2002:a63:8948:: with SMTP id v69mr9362895pgd.318.1585243003071; Thu, 26 Mar 2020 10:16:43 -0700 (PDT) X-Received: by 2002:a63:8948:: with SMTP id v69mr9362862pgd.318.1585243002685; Thu, 26 Mar 2020 10:16:42 -0700 (PDT) Received: from [192.168.1.208] (220-133-187-190.HINET-IP.hinet.net. [220.133.187.190]) by smtp.gmail.com with ESMTPSA id f69sm2157271pfa.124.2020.03.26.10.16.40 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 Mar 2020 10:16:42 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3608.60.0.2.5\)) Subject: Re: [Intel-wired-lan] [PATCH] igb: Use a sperate mutex insead of rtnl_lock() From: Kai-Heng Feng In-Reply-To: Date: Fri, 27 Mar 2020 01:16:39 +0800 Cc: Jeff Kirsher , "open list:NETWORKING DRIVERS" , open list , "moderated list:INTEL ETHERNET DRIVERS" , "David S. Miller" Content-Transfer-Encoding: 8BIT Message-Id: <98E86E5A-4EE9-4CB5-81CF-49C3E74C3AE6@canonical.com> References: <20200326103926.20888-1-kai.heng.feng@canonical.com> To: Alexander Duyck X-Mailer: Apple Mail (2.3608.60.0.2.5) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Alexander, > On Mar 27, 2020, at 00:27, Alexander Duyck wrote: > > On Thu, Mar 26, 2020 at 3:39 AM Kai-Heng Feng > > wrote: >> >> Commit 9474933caf21 ("igb: close/suspend race in netif_device_detach") >> fixed race condition between close and power management ops by using >> rtnl_lock(). >> >> This fix is a preparation for next patch, to prevent a dead lock under >> rtnl_lock() when calling runtime resume routine. >> >> However, we can't use device_lock() in igb_close() because when module >> is getting removed, the lock is already held for igb_remove(), and >> igb_close() gets called during unregistering the netdev, hence causing a >> deadlock. So let's introduce a new mutex so we don't cause a deadlock >> with driver core or netdev core. >> >> Signed-off-by: Kai-Heng Feng > > So this description doesn't make much sense to me. You describe the > use of the device_lock() in igb_close() but it isn't used there. Sorry I forgot to add a revision number. It was used by previous version and Aaron found a regression when device_lock() is used. > In addition it seems like you are arbitrarily moving code that was > wrapped in the rtnl_lock out of it. I'm not entirely sure that is safe > since there are calls within many of these functions that assume the > rtnl_lock is held and changing that is likely going to introduce more > issues. The reason why rtnl lock needs to be removed is because of the following patch: https://lore.kernel.org/lkml/20200207101005.4454-2-kai.heng.feng@canonical.com/ Ethtools helpers already held rtnl_lock, so to prevent a deadlock, my idea is to use another lock to solve what "igb: close/suspend race in netif_device_detach" originally tried to fix. > > > >> --- >> drivers/net/ethernet/intel/igb/igb_main.c | 19 +++++++++++++------ >> 1 file changed, 13 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c >> index b46bff8fe056..dc7ed5dd216b 100644 >> --- a/drivers/net/ethernet/intel/igb/igb_main.c >> +++ b/drivers/net/ethernet/intel/igb/igb_main.c >> @@ -288,6 +288,8 @@ static const struct igb_reg_info igb_reg_info_tbl[] = { >> {} >> }; >> >> +static DEFINE_MUTEX(igb_mutex); >> + >> /* igb_regdump - register printout routine */ >> static void igb_regdump(struct e1000_hw *hw, struct igb_reg_info *reginfo) >> { >> @@ -4026,9 +4028,14 @@ static int __igb_close(struct net_device *netdev, bool suspending) >> >> int igb_close(struct net_device *netdev) >> { >> + int err = 0; >> + >> + mutex_lock(&igb_mutex); >> if (netif_device_present(netdev) || netdev->dismantle) >> - return __igb_close(netdev, false); >> - return 0; >> + err = __igb_close(netdev, false); >> + mutex_unlock(&igb_mutex); >> + >> + return err; >> } >> > > Okay, so I am guessing the problem has something to do with the > addition of the netdev->dismantle test here and the fact that it is > bypassing the present check for the hotplug remove case? Please see the rationale above. > > So it looks like nobody ever really reviewed commit 888f22931478 > ("igb: Free IRQs when device is hotplugged"). What I would recommend > is reverting it and instead we fix the remaining pieces that need to > be addressed in igb so it more closely matches what we have in e1000e > after commit a7023819404a ("e1000e: Use rtnl_lock to prevent race > conditions between net and pci/pm"). From what I can tell the only > pieces that are really missing is to update igb_io_error_detected so > that in addition to igb_down it will call igb_free_irq, and then in > addition we should be wrapping most of the code in that function with > an rtnl_lock since it is detaching a device and making modifications > to it. In addition to that, igb_shutdown() indirectly calls igb_close() when netdev unregistering the device. My "only scratch the surface" approach is because I don't have a reproducer for commit "igb: close/suspend race in netif_device_detach", and I am afraid of breaking it. Kai-Heng