Date: Mon, 19 Sep 2022 16:01:39 -0700 (PDT)
From: Hugh Dickins
To: Keith Busch
cc: Hugh Dickins, Jens Axboe, Yu Kuai, Jan Kara, Liu Song,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH next] sbitmap: fix lockup while swapping

On Mon, 19 Sep 2022, Keith Busch wrote:
> On Sun, Sep 18, 2022 at 02:10:51PM -0700, Hugh Dickins wrote:
> > I have almost no grasp of all the possible sbitmap races, and their
> > consequences: but using the same !waitqueue_active() check as used
> > elsewhere, fixes the lockup and shows no adverse consequence for me.
> >
> > Fixes: 4acb83417cad ("sbitmap: fix batched wait_cnt accounting")
> > Signed-off-by: Hugh Dickins
> > ---
> >
> >  lib/sbitmap.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > --- a/lib/sbitmap.c
> > +++ b/lib/sbitmap.c
> > @@ -620,7 +620,7 @@ static bool __sbq_wake_up(struct sbitmap
> >  	 * function again to wakeup a new batch on a different 'ws'.
> >  	 */
> >  	if (cur == 0)
> > -		return true;
> > +		return !waitqueue_active(&ws->wait);
>
> If it's 0, that is supposed to mean another thread is about to make it
> not zero as well as increment the wakestate index. That should be
> happening after patch 48c033314f37 was included, at least.

I believe that the thread about to make wait_cnt not zero (and increment
the wakestate index) is precisely this interrupted thread: the backtrace
shows that it had just done its wakeups, so has not yet reached making
wait_cnt not zero; and I suppose that either its wakeups did not empty
the waitqueue completely, or another waiter got added as soon as it
dropped the spinlock.

> Prior to 4acb83417cad, the code would also return 'true' if the count
> was already zero, and this batched code wasn't supposed to behave any
> different in that condition.

In principle yes, but in practice no. Prior to 4acb83417cad, the swapping
load would run okayish for a few minutes, before freezing up mysteriously
(presumably due to missed wakeups). The "ish" in "okayish" because the
system time was abnormally high, and occasionally there was an odd
message from systemd about killing its journal or something -
4acb83417cad saved me from having to investigate that further.
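Since the cur == 0 branch is the crux of the disagreement, here is a tiny
single-threaded mock of the two retry decisions, for readers following
along. Everything below is a simplified illustration, not kernel code:
struct mock_ws and its fields are invented stand-ins for ws->wait_cnt,
for waitqueue_active(&ws->wait), and for the return value that tells the
caller of __sbq_wake_up() whether to try again.

```c
#include <stdbool.h>

/* Invented stand-in for the relevant bits of sbitmap's wait state. */
struct mock_ws {
	int wait_cnt;		/* 0 means another thread owes the reset */
	bool queue_active;	/* stands for waitqueue_active(&ws->wait) */
};

/*
 * Before the fix: cur == 0 always asks the caller to retry.  If the
 * thread that zeroed wait_cnt is interrupted before resetting it, and
 * waiters remain on this ws, the caller spins here forever.
 */
static bool wake_decision_old(const struct mock_ws *ws)
{
	if (ws->wait_cnt == 0)
		return true;
	return false;
}

/*
 * After the fix: only retry when no waiter is left on this ws,
 * trusting the interrupted thread to finish resetting wait_cnt for a
 * still-active queue.
 */
static bool wake_decision_new(const struct mock_ws *ws)
{
	if (ws->wait_cnt == 0)
		return !ws->queue_active;
	return false;
}
```

With the interrupted-thread scenario described above (wait_cnt already 0,
waiters still queued), the old decision keeps returning true and the
caller loops; the new one returns false and backs off. When the queue
really has been drained, both still return true, so the retry onto the
next 'ws' is preserved.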
Prior to 4acb83417cad, it never locked up looping on wait_cnt < 0; after
4acb83417cad, it would lock up on wait_cnt 0 in a few seconds. But in
writing that, and remembering the abnormally high system time, I begin
to suspect that it might have often been in a tight loop on
wait_cnt < 0, but the batch accounting sufficiently wrong that it always
got rescued by an unrelated wakeup (shifting the wakestate index),
before any lockup ever got observed and reported. Or something like
that. (And I'm trying to avoid making a fool of myself with the
arithmetic: how quickly would wait_cnt 0 have got decremented to
positive before?) I won't mind Jens deleting that "Fixes: 4acb83417cad"
if it's unfair.

> Anyway, I don't think the waitqueue_active criteria of the current
> waitstate is correct either. The next waitstate may have waiters too,
> so we still may need to account for this batch's count in order to
> wake them.

I cannot usefully comment on that, it's all rather too subtle for me.
But I did wonder if each of those !waitqueue_active()s would be better
replaced just by "false"s: we only get that far into __sbq_wake_up() if
waitqueue_active(), so the !waitqueue_active() case reflects a race: a
possible race, yes, but a race that wants precise accounting at a few
imprecise locations?

Hugh