Subject: Re: [PATCH] xfs: Wake CIL push waiters more reliably
To: Brian Foster
Cc: Dave Chinner, linux-xfs@vger.kernel.org, Linux Kernel Mailing List, it+linux-xfs@molgen.mpg.de
References: <1705b481-16db-391e-48a8-a932d1f137e7@molgen.mpg.de>
 <20201229235627.33289-1-buczek@molgen.mpg.de>
 <20201230221611.GC164134@dread.disaster.area>
 <20210104162353.GA254939@bfoster>
 <20210107215444.GG331610@dread.disaster.area>
 <20210108165657.GC893097@bfoster>
 <20210111163848.GC1091932@bfoster>
 <20210113215348.GI331610@dread.disaster.area>
 <8416da5f-e8e5-8ec6-df3e-5ca89339359c@molgen.mpg.de>
 <20210216111820.GA534175@bfoster>
From: Donald Buczek
Date: Tue, 16 Feb 2021 13:40:35 +0100
In-Reply-To: <20210216111820.GA534175@bfoster>
X-Mailing-List: linux-kernel@vger.kernel.org

On 16.02.21 12:18, Brian Foster wrote:
> On Mon, Feb 15, 2021 at 02:36:38PM +0100, Donald Buczek wrote:
>> On 13.01.21 22:53, Dave Chinner wrote:
>>> [...]
>>> I agree that a throttling fix is needed, but I'm trying to
>>> understand the scope and breadth of the problem first instead of
>>> jumping the gun and making the wrong fix for the wrong reasons that
>>> just papers over the underlying problems that the throttling bug has
>>> made us aware of...
>>
>> Are you still working on this?
>>
>> If it takes more time to understand the potential underlying problem, the fix for the problem at hand should be applied.
>>
>> This is a real world problem, accidentally found in the wild. It appears very rarely, but it freezes a filesystem or the whole system. It exists in 5.7, 5.8, 5.9, 5.10 and 5.11 and is caused by c7f87f3984cf ("xfs: fix use-after-free on CIL context on shutdown") which silently added a condition to the wakeup. The condition is based on a wrong assumption.
>>
>> Why is this "papering over"? If a reminder was needed, there were better ways than randomly hanging the system.
>>
>> Why is
>>
>>     if (ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log))
>>         wake_up_all(&cil->xc_push_wait);
>>
>> , which doesn't work reliably, preferable to
>>
>>     if (waitqueue_active(&cil->xc_push_wait))
>>         wake_up_all(&cil->xc_push_wait);
>>
>> which does?
>>
>
> JFYI, Dave followed up with a patch a couple weeks or so ago:
>
> https://lore.kernel.org/linux-xfs/20210128044154.806715-5-david@fromorbit.com/

Oh, great. I apologize for the unneeded reminder.

Best

  Donald

>
> Brian
>
>> Best
>> Donald
>>
>>> Cheers,
>>>
>>> Dave
>>
>