Subject: Re: [PATCH] xfs: Wake CIL push waiters more reliably
From: Donald Buczek
To: Dave Chinner, Brian Foster
Cc: linux-xfs@vger.kernel.org, Linux Kernel Mailing List, it+linux-xfs@molgen.mpg.de
Date: Mon, 15 Feb 2021 14:36:38 +0100
Message-ID: <8416da5f-e8e5-8ec6-df3e-5ca89339359c@molgen.mpg.de>
In-Reply-To: <20210113215348.GI331610@dread.disaster.area>

On 13.01.21 22:53, Dave Chinner wrote:
> [...]
> I agree that a throttling fix is needed, but I'm trying to
> understand the scope and breadth of the problem first instead of
> jumping the gun and making the wrong fix for the wrong reasons that
> just papers over the underlying problems that the throttling bug has
> made us aware of...

Are you still working on this?

If it takes more time to understand the potential underlying problem,
the fix for the problem at hand should be applied.

This is a real world problem, accidentally found in the wild. It appears
very rarely, but it freezes a filesystem or the whole system. It exists
in 5.7, 5.8, 5.9, 5.10 and 5.11 and is caused by c7f87f3984cf ("xfs: fix
use-after-free on CIL context on shutdown") which silently added a
condition to the wakeup. The condition is based on a wrong assumption.

Why is this "papering over"? If a reminder was needed, there were better
ways than randomly hanging the system.

Why is

    if (ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log))
        wake_up_all(&cil->xc_push_wait);

, which doesn't work reliably, preferable to

    if (waitqueue_active(&cil->xc_push_wait))
        wake_up_all(&cil->xc_push_wait);

which does?

Best
  Donald

> Cheers,
>
> Dave
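
To make the difference between the two checks concrete, here is a rough user-space sketch in plain C. It is not kernel code, and every name in it (toy_cil, toy_commit, toy_push, toy_stuck_waiters) is hypothetical. It also rests on an assumption that this message does not spell out: that the value compared against the limit can shrink between the moment a committer goes to sleep and the moment the push runs. Under that assumption, a wakeup that re-tests the limit can skip a sleeper, while a wakeup that asks whether anyone is waiting cannot.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the CIL state; all names here are hypothetical. */
struct toy_cil {
    int space_used; /* space accounted to the current context */
    int limit;      /* stand-in for XLOG_CIL_BLOCKING_SPACE_LIMIT(log) */
    int waiters;    /* stand-in for sleepers on xc_push_wait */
};

enum wake_policy {
    WAKE_IF_OVER_LIMIT, /* models: if (ctx->space_used >= LIMIT) wake_up_all() */
    WAKE_IF_WAITERS     /* models: if (waitqueue_active(...)) wake_up_all() */
};

/* A commit adjusts the accounted space; the committer sleeps if the
 * result is at or over the limit. */
static void toy_commit(struct toy_cil *cil, int delta)
{
    cil->space_used += delta;
    if (cil->space_used >= cil->limit)
        cil->waiters++;
}

/* The push wakes all sleepers only if the chosen policy says so,
 * then starts a fresh, empty context. */
static void toy_push(struct toy_cil *cil, enum wake_policy policy)
{
    bool wake = (policy == WAKE_IF_OVER_LIMIT)
        ? cil->space_used >= cil->limit
        : cil->waiters > 0;

    if (wake)
        cil->waiters = 0;
    cil->space_used = 0;
}

/* Runs one fixed scenario and returns how many sleepers are left
 * behind after the push. */
static int toy_stuck_waiters(enum wake_policy policy)
{
    struct toy_cil cil = { .space_used = 0, .limit = 100, .waiters = 0 };

    toy_commit(&cil, 100); /* reaches the limit: one committer sleeps */
    toy_commit(&cil, -10); /* ASSUMPTION: accounted space can shrink */
    assert(cil.waiters == 1);

    toy_push(&cil, policy);
    return cil.waiters;
}
```

In this toy scenario, toy_stuck_waiters(WAKE_IF_OVER_LIMIT) leaves the sleeper behind, because the value is back under the limit when the push checks it, whereas toy_stuck_waiters(WAKE_IF_WAITERS) wakes it. The sketch is single-threaded and ignores locking and memory ordering, which the real code has to get right.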