Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp4071690pxu; Wed, 9 Dec 2020 07:42:14 -0800 (PST) X-Google-Smtp-Source: ABdhPJyjd67R0InopywElzfWXcVlbPjL4kIw2T6aEKUP8kSCZ2xwDwyeGttdg4LVlzblRJh3Eoyt X-Received: by 2002:a17:906:b309:: with SMTP id n9mr2556040ejz.365.1607528533977; Wed, 09 Dec 2020 07:42:13 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1607528533; cv=none; d=google.com; s=arc-20160816; b=b4iIrcwIEH2Hh93o0By+0WSnz2hdEdDmtQCuSFpJEWMj2HZX4mVizvmdv1/ZiQXSzH eHHUQ+GSVY1jUT48osA1x+5a9+hCS3KTVp1J5XuM/6MuPSY9D/Yr0AIZFYhW3ytgYyJ5 nOJkUvzTbGytghBrXCO0ujwTJUf/Kx3fH6/CgA6fH0jkSl5/AFbwvSujy7/I16oWkOCk co7jEyhZQdVEhfZHKUiz9k+8gXBQVhw8OLfMR1ia0YYyWsgJjM+AgGYi6awD/aWQj+wZ 1usMVtcmeOyc/5u3dZ5TGLmifbXRyz0FAjk+7gAgNQ6iDUH1xVUBuqWcPYy9TIWEzMqJ CRAA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=TwZsNeR+aocPo/aT6u4plPoYBOWpeq4qK4h3Enahzxk=; b=QjXvpAIBulltJ17dFTSB15ztynw5qlV2GSOcnFIvMPjK/ESEIhsbvrbc3yywK3cvxr XI3HvUNbzyFUpIQZlinowY8rxhb0SUAv4o/sJA9NaJ1tTgq084AJxwY0jGO878tq+IjA am2z8IBP8sU3ifKIs8/BMG8f8h9TtCCYtW82udAOYCWkN9aoTdDrSE5qhveDFOfXV8Md wEKMDswcBHJZPG3DZz1nVOC4UGmj8zqK0LeOEey1U0NZJDcVb2PSb3qlLNMzwfvIvzen Yyu+/Q9z/LAFtZ7aAveUJditgVJEhzr7KR42UVt4ScpA1KDR+e7YRxBZH2SBQuZq/Xdi QZoA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NsFG7GUm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id z15si858584eji.718.2020.12.09.07.41.42; Wed, 09 Dec 2020 07:42:13 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NsFG7GUm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730166AbgLIP2Q (ORCPT + 99 others); Wed, 9 Dec 2020 10:28:16 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:45511 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730095AbgLIP2Q (ORCPT ); Wed, 9 Dec 2020 10:28:16 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1607527609; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=TwZsNeR+aocPo/aT6u4plPoYBOWpeq4qK4h3Enahzxk=; b=NsFG7GUmiCDwYwvqbkF4oonReHajye45BtJtUgYRfJI9SS273e0B60BOn21GCCie6sexOM bONK8R3WeuZDF/aQBbmPoGAUxT5SLiALsy0FpATEES+pMuBvE+fv8uKacewuDvuzXa3eG0 wUjMQIdYeaj0skN4y5Nhh8RaE72K0Uk= Received: from mail-ot1-f69.google.com (mail-ot1-f69.google.com [209.85.210.69]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-434-PSeZt9MrPieqXZJ5InZqtA-1; Wed, 09 Dec 2020 10:26:47 -0500 X-MC-Unique: PSeZt9MrPieqXZJ5InZqtA-1 Received: by mail-ot1-f69.google.com with SMTP id 92so957593otx.7 for ; Wed, 09 Dec 2020 07:26:47 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=TwZsNeR+aocPo/aT6u4plPoYBOWpeq4qK4h3Enahzxk=; b=iU6M77QSH1Tww+n4cIIuB2o61iC0hBVDZVamGojZC+S54Qm94itEGRnx6J8z9rP7Xy 6YsMoZdhM6Pi5LqO983NCCITKglv6711pXCCVYqPWFqSEHfWcOluCVndJPz7GgHSCVb/ KlJmAFvqucn9I1JUfAmo8efMqICWLFAo7aGWS5Uy6Hj2URB+Deu1XJaYwOX/Hr2vUwGG uHL5LlHv9ecU7vHC3bYN2zCRNC1u1xerlAQBhQuHbdpejoFCCyv3m39sZFn/1WUmcZbu 8GyO9wBXLV6JqGdK6VcE7Ofwa1mfAQfzjh/3WsXI+lbVl3Q2SJFgQQjRbMamku6YK/Aq LI3A== X-Gm-Message-State: AOAM5322+vnjAWEzvPEUiJ7NEVlBEdjg90M2ir6djXu6ToNw8FOsYa5a L1JAgTjUtPMP/vP2FX0Oy+ry3LHPIgySX5ePh4XQtZHtXRhiVCfDqxVQ/Dc9M6+YyUYI0Bwf00O Qm/yTASMWxNI2H1JT3LHWs5PVXf16zwKLdoWtmQpY X-Received: by 2002:a05:6830:1308:: with SMTP id p8mr2146327otq.330.1607527607023; Wed, 09 Dec 2020 07:26:47 -0800 (PST) X-Received: by 2002:a05:6830:1308:: with SMTP id p8mr2146312otq.330.1607527606795; Wed, 09 Dec 2020 07:26:46 -0800 (PST) MIME-Version: 1.0 References: <20201205234354.1710-1-jarod@redhat.com> <11900.1607459690@famine> In-Reply-To: <11900.1607459690@famine> From: Jarod Wilson Date: Wed, 9 Dec 2020 10:26:38 -0500 Message-ID: Subject: Re: [PATCH net] bonding: reduce rtnl lock contention in mii monitor thread To: Jay Vosburgh Cc: LKML , Mahesh Bandewar , Veaceslav Falico , Andy Gospodarek , "David S. Miller" , Jakub Kicinski , Thomas Davis , Netdev Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Dec 8, 2020 at 3:35 PM Jay Vosburgh wrote: > > Jarod Wilson wrote: > > >I'm seeing a system get stuck unable to bring a downed interface back up > >when it's got an updelay value set, behavior which ceased when logging > >spew was removed from bond_miimon_inspect(). I'm monitoring logs on this > >system over another network connection, and it seems that the act of > >spewing logs at all there increases rtnl lock contention, because > >instrumented code showed bond_mii_monitor() never able to succeed in it's > >attempts to call rtnl_trylock() to actually commit link state changes, > >leaving the downed link stuck in BOND_LINK_DOWN. The system in question > >appears to be fine with the log spew being moved to > >bond_commit_link_state(), which is called after the successful > >rtnl_trylock(). I'm actually wondering if perhaps we ultimately need/want > >some bond-specific lock here to prevent racing with bond_close() instead > >of using rtnl, but this shift of the output appears to work. I believe > >this started happening when de77ecd4ef02 ("bonding: improve link-status > >update in mii-monitoring") went in, but I'm not 100% on that. > > We use RTNL not to avoid deadlock with bonding itself, but > because the "commit" side undertakes actions which require RTNL, e.g., > various events will eventually call netdev_lower_state_changed. > > However, the RTNL acquisition is a trylock to avoid the deadlock > with bond_close. Moving that out of line here (e.g., putting the commit > into another work queue event or the like) has the same problem, in that > bond_close needs to wait for all of the work queue events to finish, and > it holds RTNL. Ah, okay, it wasn't clear to me that we actually do need RTNL here, I'd thought it was just for the deadlock avoidance with bond_close, based on the comments in the source. > Also, a dim memory says that the various notification messages > were mostly placed in the "inspect" phase and not the "commit" phase to > avoid doing printk-like activities with RTNL held. As a general > principle, I don't think we want to add more verbiage under RTNL. Yeah, that's fair. > >The addition of a case BOND_LINK_BACK in bond_miimon_inspect() is somewhat > >separate from the fix for the actual hang, but it eliminates a constant > >"invalid new link 3 on slave" message seen related to this issue, and it's > >not actually an invalid state here, so we shouldn't be reporting it as an > >error. > > Do you mean bond_miimon_commit here and not bond_miimon_inspect > (which already has a case for BOND_LINK_BACK)? Whoops, yes. > In principle, bond_miimon_commit should not see _BACK or _FAIL > state as a new link state, because those states should be managed at the > bond_miimon_inspect level (as they are the result of updelay and > downdelay). These states should not be "committed" in the sense of > causing notifications or doing actions that require RTNL. > > My recollection is that the "invalid new link" messages were the > result of a bug in de77ecd4ef02, which was fixed in 1899bb325149 > ("bonding: fix state transition issue in link monitoring"), but maybe > the RTNL problem here induces that in some other fashion. > > Either way, I believe this message is correct as-is. Hm, okay, definitely seeing this message pop up regularly when a link recovers, using a fairly simple reproducer: slave1=p6p1 slave2=p6p2 modprobe -rv bonding modprobe -v bonding mode=2 miimon=100 updelay=200 ip link set bond0 up ifenslave bond0 $slave1 $slave2 sleep 10 while : do ip link set $slave1 down sleep 1 ip link set $slave1 up sleep 1 done I wasn't actually seeing the problem until I was running a 'watch -n 1 "dmesg | tail -n 50"' or similar in a remote ssh session on the host. I should add the caveat that this was also initially seen on an older kernel, but with a fairly up-to-date bonding driver, which does include both de77ecd4ef02 and 1899bb325149. I'm going to keep prodding w/a more recent upstreamier kernel, and see if I can get a better idea of what's actually going on. -- Jarod Wilson jarod@redhat.com