Date: Mon, 29 Jan 2024 12:47:41 +1100
From: Dave Chinner
To: Mike Snitzer
Cc: Matthew Wilcox, Ming Lei, Andrew Morton, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Don Dutile,
    Raghavendra K T, Alexander Viro, Christian Brauner
Subject: Re: [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range
References: <20240128142522.1524741-1-ming.lei@redhat.com>

On Sun, Jan 28, 2024 at 07:39:49PM -0500, Mike Snitzer wrote:
> On Sun, Jan 28, 2024 at 7:22 PM Matthew Wilcox wrote:
> >
> > On Sun, Jan 28, 2024 at 06:12:29PM -0500, Mike Snitzer wrote:
> > > On Sun, Jan 28 2024 at 5:02P -0500,
> > > Matthew Wilcox wrote:
> > Understood. But ... the application is asking for as much readahead as
> > possible, and the sysadmin has said "Don't readahead more than 64kB at
> > a time". So why will we not get a bug report in 1-15 years time saying
> > "I put a limit on readahead and the kernel is ignoring it"? I think
> > typically we allow the sysadmin to override application requests,
> > don't we?
>
> The application isn't knowingly asking for readahead. It is asking to
> mmap the file (and reporter wants it done as quickly as possible..
> like occurred before).

.. which we do within the constraints of the given configuration.

> This fix is comparable to Jens' commit 9491ae4aade6 ("mm: don't cap
> request size based on read-ahead setting") -- same logic, just applied
> to callchain that ends up using madvise(MADV_WILLNEED).

Not really. There is a difference between performing a synchronous
read IO here that we must complete, compared to optimistic
asynchronous read-ahead which we can fail or toss away without the
user ever seeing the data the IO returned.

We want required IO to be done in as few, larger IOs as possible,
and not be limited by constraints placed on background optimistic
IOs.

madvise(WILLNEED) is optimistic IO - there is no requirement that it
complete the data reads successfully. If the data is actually
required, we'll guarantee completion when the user accesses it, not
when madvise() is called. IOWs, madvise is async readahead, and so
really should be constrained by readahead bounds and not user IO
bounds.

We could change this behaviour for madvise of large ranges that we
force into the page cache by ignoring device readahead bounds, but
I'm not sure we want to do this in general.

Perhaps fadvise/madvise(willneed) can fiddle the file f_ra.ra_pages
value in this situation to override the device limit for large
ranges (for some definition of large - say 10x bdi->ra_pages) and
restore it once the readahead operation is done. This would make it
behave less like readahead and more like a user read from an IO
perspective...

-Dave.
-- 
Dave Chinner
david@fromorbit.com
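
For illustration only, the f_ra.ra_pages override suggested in the last
paragraph of the mail could be modelled roughly as below. This is a
userspace sketch, not kernel code and not a patch: struct bdi_model,
struct file_ra_model, issue_readahead() and willneed_readahead() are
hypothetical stand-ins, and widening the window to the whole range is an
assumption (the mail only says the device limit would be overridden).
Only the "10x bdi->ra_pages" threshold and the save/override/restore
order come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins, NOT the kernel structures of similar name. */
struct bdi_model {
	unsigned long ra_pages;		/* device readahead limit, in pages */
};

struct file_ra_model {
	unsigned long ra_pages;		/* per-file window, cf. f_ra.ra_pages */
};

/* "for some definition of large - say 10x bdi->ra_pages" */
#define LARGE_RANGE_FACTOR 10UL

/*
 * Hypothetical placeholder for issuing the readahead. It only reports
 * how many IOs are needed to cover nr_pages at the current window size.
 */
unsigned long issue_readahead(const struct file_ra_model *ra,
			      unsigned long nr_pages)
{
	assert(ra != NULL && ra->ra_pages > 0);
	return (nr_pages + ra->ra_pages - 1) / ra->ra_pages;
}

/*
 * Model of the suggestion: for a large willneed range, temporarily
 * raise the per-file window, issue the readahead, then restore the
 * old value so later readahead is bounded as before.
 */
unsigned long willneed_readahead(struct file_ra_model *ra,
				 const struct bdi_model *bdi,
				 unsigned long nr_pages)
{
	unsigned long saved = ra->ra_pages;
	unsigned long nr_ios;

	/* Assumption: a "large" range gets a window covering all of it. */
	if (nr_pages >= LARGE_RANGE_FACTOR * bdi->ra_pages)
		ra->ra_pages = nr_pages;

	nr_ios = issue_readahead(ra, nr_pages);

	ra->ra_pages = saved;		/* restore once readahead is done */
	return nr_ios;
}
```

With a 16-page (64kB at 4kB pages) limit, a 1024-page range passes the
10x threshold and is covered in a single large IO in this model, while a
64-page range stays below it and is still split into 16-page chunks.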