Received: by 2002:a05:6a11:4021:0:0:0:0 with SMTP id ky33csp873827pxb; Wed, 15 Sep 2021 15:39:34 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwvA3jJncsOFbhmdz0RxSUN7khrKsl9JVGszFsJjC+DjtvefcvjAxG6aeimVgZKnT1YlvMK X-Received: by 2002:a05:6402:54:: with SMTP id f20mr2760066edu.382.1631745574396; Wed, 15 Sep 2021 15:39:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1631745574; cv=none; d=google.com; s=arc-20160816; b=iJ5Cmak+4LQAln5X1eSA4K8HkD1yAP3sqKOEbkBopJ3FSxqmHo4OswM6haM5gOHDps 3PvRyOY5Bzjy83ZKz12zsswMDF3Ue6WeYeXGX05g/MiCh3gkq2l1C2wKmpTiH6Sy7VwZ UQT7Vp7Mf3SPRVRbVacSJjJLo1tLvOVURAG/lSM+7aK/n17NXMd1OIo+iU15wtuxfY8X +N+3tU1+tQvzErDWA1qZglxbHxJdetNd6QP8vYn/OCzCetLMxkCvkEo8OzbqjWSdhsvD MJU2+wpjTwr98RDrbWTDpqK7Dd9iUu+fCEj9eul7N8u4fMJPhpVr6zjfhZdx0jX/ghCK tlHQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=U3/IeGCU56hqKJjN1N+JjoNROsf4toiw7CvgY7FU4aQ=; b=a6khqxtZVzSix/ftqk6yRf3HfcC3neAN6fW4iagJPiVHfvu9wOFvkpNvX88xbpdnmj w2PneYlgXQiA9Br7PjRsfwsgu2xnQRmtcVdfDmm0tVY+ta4zN6lh9IiRnbHK9fWFy7qL 2Ag1XoXBTblWJisTlUCAaU2oknh8IjnCMT5zhl2lU4Uc19JUHMOG9rGaHHRBdYbfweSD a+p9cSVw2yLmDIdUGxsEV23SOEg/bjbyaDu8AKv36inR1Ng2Cu8/zU1CmbpJybtuJ53I IuppyuhPAsjF3mgOtB8jiO8CeLwXNFutzgCHAEjCSzJCupvHh3swgntYWN77CnJKGBI9 3DFw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id hz15si1465113ejc.649.2021.09.15.15.39.10; Wed, 15 Sep 2021 15:39:34 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232733AbhIOWk0 (ORCPT + 99 others); Wed, 15 Sep 2021 18:40:26 -0400 Received: from mail107.syd.optusnet.com.au ([211.29.132.53]:37072 "EHLO mail107.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229538AbhIOWk0 (ORCPT ); Wed, 15 Sep 2021 18:40:26 -0400 Received: from dread.disaster.area (pa49-195-238-16.pa.nsw.optusnet.com.au [49.195.238.16]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 4357499FB; Thu, 16 Sep 2021 08:39:00 +1000 (AEST) Received: from dave by dread.disaster.area with local (Exim 4.92.3) (envelope-from ) id 1mQdYQ-00CvLU-9f; Thu, 16 Sep 2021 08:38:58 +1000 Date: Thu, 16 Sep 2021 08:38:58 +1000 From: Dave Chinner To: Mel Gorman Cc: NeilBrown , Andrew Morton , Theodore Ts'o , Andreas Dilger , "Darrick J. Wong" , Jan Kara , Michal Hocko , Matthew Wilcox , linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 3/6] EXT4: Remove ENOMEM/congestion_wait() loops. Message-ID: <20210915223858.GM2361455@dread.disaster.area> References: <163157808321.13293.486682642188075090.stgit@noble.brown> <163157838437.13293.14244628630141187199.stgit@noble.brown> <20210914163432.GR3828@suse.com> <20210914235535.GL2361455@dread.disaster.area> <20210915085904.GU3828@suse.com> <20210915143510.GE3959@techsingularity.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210915143510.GE3959@techsingularity.net> X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.3 cv=YKPhNiOx c=1 sm=1 tr=0 a=DzKKRZjfViQTE5W6EVc0VA==:117 a=DzKKRZjfViQTE5W6EVc0VA==:17 a=kj9zAlcOel0A:10 a=7QKq2e-ADPsA:10 a=7-415B0cAAAA:8 a=rs-OYbL5b6b9dbSePnwA:9 a=CjuIK1q_8ugA:10 a=biEYGPWJfzWAr4FL6Ov7:22 Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Wed, Sep 15, 2021 at 03:35:10PM +0100, Mel Gorman wrote: > On Wed, Sep 15, 2021 at 09:59:04AM +0100, Mel Gorman wrote: > > > Yup, that's what we need, but I don't see why it needs to be exposed > > > outside the allocation code at all. > > > > > > > Probably not. At least some of it could be contained within reclaim > > itself to block when reclaim is not making progress as opposed to anything > > congestion related. That might still livelock if no progress can be made > > but that's not new, the OOM hammer should eventually kick in. > > > > There are two sides to the reclaim-related throttling > > 1. throttling because zero progress is being made > 2. throttling because there are too many dirty pages or pages under > writeback cycling through the LRU too quickly. > > The dirty page aspects (and the removal of wait_iff_congested which is > almost completely broken) could be done with something like the following > (completly untested). The downside is that end_page_writeback() takes an > atomic penalty if reclaim is throttled but at that point the system is > struggling anyway so I doubt it matters. The atomics are pretty nasty, as is directly accessing the pgdat on every call to end_page_writeback(). Those will be performance limiting factors. Indeed, we don't use atomics for dirty page throttling, which does dirty page accounting via percpu counters on the BDI and doesn't require wakeups. Also, we've already got per-node and per-zone counters there for dirty/write pending stats, so do we actually need new counters and wakeups here? i.e. balance_dirty_pages() does not have an explicit wakeup - it bases it's sleep time on the (memcg aware) measured writeback rate on the BDI the page belongs to and the amount of outstanding dirty data on that BDI. i.e. it estimates fairly accurately what the wait time for this task should be given the dirty page demand and current writeback progress being made is and just sleeps for that length of time. Ideally, that's what should be happening here - we should be able to calculate a page cleaning rate estimation and then base the sleep time on that. No wakeups needed - when we've waited for the estimated time, we try to reclaim again... In fact, why can't this "too many dirty pages" case just use the balance_dirty_pages() infrastructure to do the "wait for writeback" reclaim backoff? Why do we even need to re-invent the wheel here? > diff --git a/mm/filemap.c b/mm/filemap.c > index dae481293b5d..b9be9afa4308 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -1606,6 +1606,8 @@ void end_page_writeback(struct page *page) > smp_mb__after_atomic(); > wake_up_page(page, PG_writeback); > put_page(page); > + > + acct_reclaim_writeback(page); UAF - that would need to be before the put_page() call... Cheers, Dave. -- Dave Chinner david@fromorbit.com