Date: Mon, 29 Jan 2024 11:57:45 +0800
From: Ming Lei
To: Dave Chinner
Cc: Mike Snitzer, Matthew Wilcox, Andrew Morton,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Don Dutile, Raghavendra K T,
 Alexander Viro, Christian Brauner, linux-block@vger.kernel.org,
 ming.lei@redhat.com
Subject: Re: [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range
References: <20240128142522.1524741-1-ming.lei@redhat.com>

On Mon, Jan 29, 2024 at 12:47:41PM +1100, Dave Chinner wrote:
> On Sun, Jan 28, 2024 at 07:39:49PM -0500, Mike Snitzer wrote:
> > On Sun, Jan 28, 2024 at 7:22 PM Matthew Wilcox wrote:
> > >
> > > On Sun, Jan 28, 2024 at 06:12:29PM -0500, Mike Snitzer wrote:
> > > > On Sun, Jan 28 2024 at 5:02P -0500,
> > > > Matthew Wilcox wrote:
> > > Understood. But ...
> > > the application is asking for as much readahead as
> > > possible, and the sysadmin has said "Don't readahead more than 64kB at
> > > a time". So why will we not get a bug report in 1-15 years time saying
> > > "I put a limit on readahead and the kernel is ignoring it"? I think
> > > typically we allow the sysadmin to override application requests,
> > > don't we?
> >
> > The application isn't knowingly asking for readahead. It is asking to
> > mmap the file (and reporter wants it done as quickly as possible..
> > like occurred before).
>
> ... which we do within the constraints of the given configuration.
>
> > This fix is comparable to Jens' commit 9491ae4aade6 ("mm: don't cap
> > request size based on read-ahead setting") -- same logic, just applied
> > to callchain that ends up using madvise(MADV_WILLNEED).
>
> Not really. There is a difference between performing a synchronous
> read IO here that we must complete, compared to optimistic
> asynchronous read-ahead which we can fail or toss away without the
> user ever seeing the data the IO returned.

Yeah, the big readahead in this patch happens when the user starts to
read over the mmaped buffer, rather than in madvise().

>
> We want required IO to be done in as few, larger IOs as possible,
> and not be limited by constraints placed on background optimistic
> IOs.
>
> madvise(WILLNEED) is optimistic IO - there is no requirement that it
> complete the data reads successfully. If the data is actually
> required, we'll guarantee completion when the user accesses it, not
> when madvise() is called. IOWs, madvise is async readahead, and so
> really should be constrained by readahead bounds and not user IO
> bounds.
>
> We could change this behaviour for madvise of large ranges that we
> force into the page cache by ignoring device readahead bounds, but
> I'm not sure we want to do this in general.
>
> Perhaps fadvise/madvise(willneed) can fiddle the file f_ra.ra_pages
> value in this situation to override the device limit for large
> ranges (for some definition of large - say 10x bdi->ra_pages) and
> restore it once the readahead operation is done. This would make it
> behave less like readahead and more like a user read from an IO
> perspective...

->ra_pages is just a hint, 128KB by default, and either the device or
userspace can override it.

fadvise/madvise(willneed) already reads ahead in units of bdi->io_pages,
which is the max device sector size (often 10X of ->ra_pages); please see
force_page_cache_ra().

The current report is as follows:

1) userspace calls madvise(willneed, 1G)

2) only the 1st part (its size comes from bdi->io_pages, suppose it is 2MB)
is read ahead in madvise(willneed, 1G) since commit 6d2be915e589

3) the other parts (2M ~ 1G) are read ahead in units of bdi->ra_pages, which
is set to 64KB by userspace, when userspace reads the mmaped buffer, so
the whole application becomes slower.

This patch changes 3) to use bdi->io_pages as the readahead unit.

Thanks,
Ming
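
P.S. To put rough numbers on steps 1) to 3) above, here is a minimal
back-of-the-envelope sketch. It assumes the sizes used in this mail (a 1G
range, a 2MB bdi->io_pages window, a 64KB bdi->ra_pages setting) and, as a
simplification that is not how the kernel accounts for IO, treats every
readahead window as one request. The helper name windows_needed() is
hypothetical and exists only for this illustration.

```python
# Rough count of readahead windows for the report described above.
# Sizes come from the example in this mail; "one window == one request"
# is a simplifying assumption, not actual kernel behaviour.

KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def windows_needed(range_bytes, first_window_bytes, unit_bytes):
    """Windows needed to cover range_bytes: one initial window issued by
    madvise(willneed), then fixed-size units as the buffer is read."""
    remaining = max(range_bytes - first_window_bytes, 0)
    # Ceiling division so a partial trailing unit still counts.
    return 1 + (remaining + unit_bytes - 1) // unit_bytes


# Current behaviour: the rest is read ahead in bdi->ra_pages units (64KB).
current = windows_needed(1 * GB, 2 * MB, 64 * KB)
# With the patch: the rest is read ahead in bdi->io_pages units (2MB).
patched = windows_needed(1 * GB, 2 * MB, 2 * MB)

print(current, patched)  # 16353 512
```

Under these assumptions the same 1G range takes about 32 times fewer
readahead windows once step 3) uses the larger unit.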