Subject: Re: [PATCH 2/2] mm/readahead: limit sync readahead while too many active refault
To: Alexander Viro, Christian Brauner, Jan Kara, Matthew Wilcox, Andrew Morton
References: <20240201100835.1626685-1-liushixin2@huawei.com> <20240201100835.1626685-3-liushixin2@huawei.com>
From: Liu Shixin
Message-ID: <09e871aa-bbe6-47a8-4aea-e2a1674366a1@huawei.com>
Date: Tue, 5 Mar 2024 15:07:30 +0800
In-Reply-To: <20240201100835.1626685-3-liushixin2@huawei.com>

Hi Jan, all,

Please take another look at this patch. Although this may not be a graceful approach, I can't think of any other way to fix the problem except by using the workingset information.

Thanks,

On 2024/2/1 18:08, Liu Shixin wrote:
> When a page fault is not for write and the refault distance is short,
> the page is activated directly. If there are too many such pages in a
> file, it means those pages are likely to be reclaimed again soon after
> being read in. In that situation read-ahead has no positive effect and
> only wastes IO. So count the number of such pages, and when the count
> grows too large, stop bothering with read-ahead for a while until the
> count decays automatically.
>
> 'Too large' is defined as 10000 empirically, which solves the problem
> and is not affected by occasional active refaults.
>
> Signed-off-by: Liu Shixin
> ---
>  include/linux/fs.h      |  2 ++
>  include/linux/pagemap.h |  1 +
>  mm/filemap.c            | 16 ++++++++++++++++
>  mm/readahead.c          |  4 ++++
>  4 files changed, 23 insertions(+)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index ed5966a704951..f2a1825442f5a 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -960,6 +960,7 @@ struct fown_struct {
>   * the first of these pages is accessed.
>   * @ra_pages: Maximum size of a readahead request, copied from the bdi.
>   * @mmap_miss: How many mmap accesses missed in the page cache.
> + * @active_refault: Number of active page refault.
>   * @prev_pos: The last byte in the most recent read request.
>   *
>   * When this structure is passed to ->readahead(), the "most recent"
> @@ -971,6 +972,7 @@ struct file_ra_state {
>  	unsigned int async_size;
>  	unsigned int ra_pages;
>  	unsigned int mmap_miss;
> +	unsigned int active_refault;
>  	loff_t prev_pos;
>  };
>  
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 2df35e65557d2..da9eaf985dec4 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -1256,6 +1256,7 @@ struct readahead_control {
>  	pgoff_t _index;
>  	unsigned int _nr_pages;
>  	unsigned int _batch_count;
> +	unsigned int _active_refault;
>  	bool _workingset;
>  	unsigned long _pflags;
>  };
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 750e779c23db7..4de80592ab270 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3037,6 +3037,7 @@ loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
>  
>  #ifdef CONFIG_MMU
>  #define MMAP_LOTSAMISS  (100)
> +#define ACTIVE_REFAULT_LIMIT	(10000)
>  /*
>   * lock_folio_maybe_drop_mmap - lock the page, possibly dropping the mmap_lock
>   * @vmf - the vm_fault for this fault.
> @@ -3142,6 +3143,18 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>  	if (mmap_miss > MMAP_LOTSAMISS)
>  		return fpin;
>  
> +	ractl._active_refault = READ_ONCE(ra->active_refault);
> +	if (ractl._active_refault)
> +		WRITE_ONCE(ra->active_refault, --ractl._active_refault);
> +
> +	/*
> +	 * If there are a lot of refaults of active pages in this file,
> +	 * that means memory reclaim is ongoing. Stop bothering with
> +	 * read-ahead since it will only waste IO.
> +	 */
> +	if (ractl._active_refault >= ACTIVE_REFAULT_LIMIT)
> +		return fpin;
> +
>  	/*
>  	 * mmap read-around
>  	 */
> @@ -3151,6 +3164,9 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>  	ra->async_size = ra->ra_pages / 4;
>  	ractl._index = ra->start;
>  	page_cache_ra_order(&ractl, ra, 0);
> +
> +	WRITE_ONCE(ra->active_refault, ractl._active_refault);
> +
>  	return fpin;
>  }
>  
> diff --git a/mm/readahead.c b/mm/readahead.c
> index cc4abb67eb223..d79bb70a232c4 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -263,6 +263,10 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>  		folio_set_readahead(folio);
>  		ractl->_workingset |= folio_test_workingset(folio);
>  		ractl->_nr_pages++;
> +		if (unlikely(folio_test_workingset(folio)))
> +			ractl->_active_refault++;
> +		else if (unlikely(ractl->_active_refault))
> +			ractl->_active_refault--;
>  	}
>  
>  /*