Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 65D21C433EF for ; Wed, 10 Nov 2021 19:24:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4CCCD611AD for ; Wed, 10 Nov 2021 19:24:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232951AbhKJT1p (ORCPT ); Wed, 10 Nov 2021 14:27:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53020 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232718AbhKJT1m (ORCPT ); Wed, 10 Nov 2021 14:27:42 -0500 Received: from mail-wr1-x430.google.com (mail-wr1-x430.google.com [IPv6:2a00:1450:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5B205C061764 for ; Wed, 10 Nov 2021 11:24:54 -0800 (PST) Received: by mail-wr1-x430.google.com with SMTP id d3so5828625wrh.8 for ; Wed, 10 Nov 2021 11:24:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=hgxxmvGeWBqoVVWm+aT8y1AIFogI9R6xV+DgrB/wQzg=; b=iybUjt6NruXZ+ansxh33A5/V/Kjub9Ad/KI+A8G1/mu2HhjLxBemhQ/PqkuNtDN3Qo mWIeyHoxFSDcSgRYjZ1ST+cVXoQKdv0Pzqfm+a/6/HcfyjrRlQzTVrenX7pFKQR7bmR0 bdcIg5ZGPsGgRxo/cRYuseMzYyUPXpBAuseJRLwyZrX1JR9qxC+o8WK19GM3Kvl6fg+L ix+6IsKFXrrJE18EnuwzRbZcc1Trt1KELfyTTHARMAFQWLAgwiURgi8Y3X2wHm9CCghO ZhpL9sZhrWfR5U2gb8cqLb9CiJgGEUhTbwFvzR6Z0hXr1eGR091KNwJgPkoyZFQtQafa +ysQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=hgxxmvGeWBqoVVWm+aT8y1AIFogI9R6xV+DgrB/wQzg=; b=NRD3zpVPXfROvYPr5WOmrSru2BsP9+h4Ox0EVNCnGlVpcb21GZEBPPloqBF3Urkf96 XcCowx630LLSNb2TlcPXGjil0YoCTxISWCAOs5+H82yKbHJstROnEUzrgUxbscy7IFfU y0JG42JlMsn9tbVQtVVyNCE78hznnuHdUszLKrt3vzvOuDPSHue7rXDKYkSeNlVb9O5n jty5LBOA41a/z+G7bogDnAM1BwmAnaUwJN+Yrg6tCMlAeSeioWm94lb2KISH0ePe4aw3 nP7bPwMjngUKaSpj/urymnL8udov7lew6oOPOGNxR8Ky3AMdXtQ5Qjlrl0sSmLACNfuz QncQ== X-Gm-Message-State: AOAM530bhRijDk6ws9AxG7pY6z/1VHNO1bTQAlttouoEEAKiO19dUfHD ibUh7zuCGNG2dK/Jf/WV12e0aM+CMbGscSNl/xSZ3Q== X-Google-Smtp-Source: ABdhPJyu7rmRJdHQ2mxXsdvx5G3j1bwvAoiFZVmhi+qE5EZUPDk/1xRxjliTsllaG3LdOf8V3w4b2AJtPUipHLS9ja0= X-Received: by 2002:a05:6000:1548:: with SMTP id 8mr1798212wry.279.1636572292501; Wed, 10 Nov 2021 11:24:52 -0800 (PST) MIME-Version: 1.0 References: <20211110191126.1214-1-alexandr.lobakin@intel.com> In-Reply-To: <20211110191126.1214-1-alexandr.lobakin@intel.com> From: Eric Dumazet Date: Wed, 10 Nov 2021 11:24:39 -0800 Message-ID: Subject: Re: [PATCH net] net: fix premature exit from NAPI state polling in napi_disable() To: Alexander Lobakin Cc: "David S. Miller" , Jakub Kicinski , Jesse Brandeburg , Maciej Fijalkowski , Michal Swiatkowski , Xuan Zhuo , Antoine Tenart , Wei Wang , =?UTF-8?B?QmrDtnJuIFTDtnBlbA==?= , netdev@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Nov 10, 2021 at 11:11 AM Alexander Lobakin wrote: > > Commit 719c57197010 ("net: make napi_disable() symmetric with > enable") accidentally introduced a bug sometimes leading to a kernel > BUG when bringing an iface up/down under heavy traffic load. > > Prior to this commit, napi_disable() was polling n->state until > none of (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC) is set and then > always flip them. Now there's a possibility to get away with the > NAPIF_STATE_SCHE unset as 'continue' drops us to the cmpxchg() > call with an unitialized variable, rather than straight to > another round of the state check. > > Error path looks like: > > napi_disable(): > unsigned long val, new; /* new is uninitialized */ > > do { > val = READ_ONCE(n->state); /* NAPIF_STATE_NPSVC and/or > NAPIF_STATE_SCHED is set */ > if (val & (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC)) { /* true */ > usleep_range(20, 200); > continue; /* go straight to the condition check */ > } > new = val | <...> > } while (cmpxchg(&n->state, val, new) != val); /* state == val, cmpxchg() > writes garbage */ > > napi_enable(): > do { > val = READ_ONCE(n->state); > BUG_ON(!test_bit(NAPI_STATE_SCHED, &val)); /* 50/50 boom */ > <...> > > while the typical BUG splat is like: > > [ > Fix this by replacing this 'continue' with a goto to the beginning > of the loop body to restore the original behaviour. > This could be written without a goto, but would look uglier and > require one more indent level. > > Fixes: 719c57197010 ("net: make napi_disable() symmetric with enable") > Signed-off-by: Alexander Lobakin > Reviewed-by: Jesse Brandeburg > --- > net/core/dev.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index edeb811c454e..5e101c53b9de 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -6929,10 +6929,11 @@ void napi_disable(struct napi_struct *n) > set_bit(NAPI_STATE_DISABLE, &n->state); > > do { > +retry: > val = READ_ONCE(n->state); > if (val & (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC)) { > usleep_range(20, 200); > - continue; > + goto retry; > } > > new = val | NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC; > -- > 2.33.1 > Good catch ! What about replacing the error prone do {...} while (cmpxchg(..)) by something less confusing ? This way, no need for a label. diff --git a/net/core/dev.c b/net/core/dev.c index 5e37d6809317fb3c54686188a908bfcb0bfccdab..9327141892cdaaf0282e082e0c6746abae0f12a7 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6264,7 +6264,7 @@ void napi_disable(struct napi_struct *n) might_sleep(); set_bit(NAPI_STATE_DISABLE, &n->state); - do { + for (;;) { val = READ_ONCE(n->state); if (val & (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC)) { usleep_range(20, 200); @@ -6273,7 +6273,9 @@ void napi_disable(struct napi_struct *n) new = val | NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC; new &= ~(NAPIF_STATE_THREADED | NAPIF_STATE_PREFER_BUSY_POLL); - } while (cmpxchg(&n->state, val, new) != val); + if (cmpxchg(&n->state, val, new) == val) + break; + } hrtimer_cancel(&n->timer);