From: Mike Snitzer
Date: Sun, 28 Jan 2024 19:39:49 -0500
Subject: Re: [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range
To: Matthew Wilcox
Cc: Mike Snitzer, Ming Lei, Andrew Morton, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Don Dutile, Raghavendra K T, Alexander Viro, Christian Brauner
References: <20240128142522.1524741-1-ming.lei@redhat.com>

On Sun, Jan 28, 2024 at 7:22 PM Matthew Wilcox wrote:
>
> On Sun, Jan 28, 2024 at 06:12:29PM -0500, Mike Snitzer wrote:
> > On Sun, Jan 28 2024 at 5:02P -0500,
> > Matthew Wilcox wrote:
> >
> > > On Sun, Jan 28, 2024 at 10:25:22PM +0800, Ming Lei wrote:
> > > > Since commit 6d2be915e589 ("mm/readahead.c: fix readahead failure for
> > > > memoryless NUMA nodes and limit readahead max_pages"), ADV_WILLNEED
> > > > only tries to readahead 512 pages, and the remaining part of the advised
> > > > range falls back to normal readahead.
> > >
> > > Does the MAINTAINERS file mean nothing any more?
> >
> > "Ming, please use scripts/get_maintainer.pl when submitting patches."
>
> That's an appropriate response to a new contributor, sure. Ming has
> been submitting patches since, what, 2008? Surely they know how to
> submit patches by now.
>
> > I agree this patch's header could've worked harder to establish the
> > problem that it fixes. But I'll now take a crack at backfilling the
> > regression report that motivated this patch:
>
> Thank you.
>
> > Linux 3.14 was the last kernel where madvise(MADV_WILLNEED)
> > allowed mmap'ing a file more optimally if read_ahead_kb < max_sectors_kb.
> >
> > This regressed with commit 6d2be915e589 (so Linux 3.15) such that
> > mounting XFS on a device with read_ahead_kb=64 and max_sectors_kb=1024
> > and running this reproducer against a 2G file will take ~5x longer
> > (depending on the system's capabilities). mmap_load_test.java follows:
> >
> > import java.io.File;
> > import java.io.IOException;
> > import java.io.RandomAccessFile;
> > import java.nio.MappedByteBuffer;
> > import java.nio.channels.FileChannel;
> >
> > public class mmap_load_test {
> >
> >     public static void main(String[] args) throws IOException {
> >         if (args.length == 0) {
> >             System.out.println("Please provide a file");
> >             System.exit(1);
> >         }
> >         // Open read-only: the mapping below is READ_ONLY, so "rw" is not needed.
> >         try (RandomAccessFile raf = new RandomAccessFile(new File(args[0]), "r");
> >              FileChannel fc = raf.getChannel()) {
> >             // FileChannel.map() is limited to Integer.MAX_VALUE bytes,
> >             // so the 2000 MiB test file fits in a single mapping.
> >             MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
> >
> >             System.out.println("Loading the file");
> >
> >             long startTime = System.currentTimeMillis();
> >             // load() makes a best effort to make the whole mapping resident.
> >             mem.load();
> >             long endTime = System.currentTimeMillis();
> >             System.out.println("Done! Loading took " + (endTime - startTime) + " ms");
> >         }
> >     }
> > }
>
> It's good to have the original reproducer. The unfortunate part is
> that being at such a high level, it doesn't really show what syscalls
> the library makes on behalf of the application. I'll take your word
> for it that it calls madvise(MADV_WILLNEED). An strace might not go
> amiss.
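Matthew's point about not seeing the syscalls can be addressed with a lower-level reproducer. The following is a minimal C sketch, not part of the original report: it maps a file read-only, issues a single madvise(MADV_WILLNEED) over the whole mapping, then reads one byte per page. That this approximates what MappedByteBuffer.load() does inside the JVM is an assumption, not something verified here, and the helper name load_file() is hypothetical.

```c
#define _DEFAULT_SOURCE /* expose madvise() and MADV_WILLNEED in glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `path` read-only, call madvise(MADV_WILLNEED)
 * on the whole mapping, then read one byte from every page so each page
 * is faulted in. Returns the number of pages touched, or -1 on error.
 * If elapsed_ms is non-NULL it receives the time spent in madvise()
 * plus the page-touching loop.
 */
long load_file(const char *path, double *elapsed_ms)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping holds its own reference to the file */
    if (p == MAP_FAILED)
        return -1;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* The call whose readahead behaviour this thread is about. */
    if (madvise(p, len, MADV_WILLNEED) < 0) {
        munmap(p, len);
        return -1;
    }

    /* Touch one byte per page; the volatile sink keeps the loads. */
    volatile unsigned char sink = 0;
    long pages = 0;
    for (size_t off = 0; off < len; off += page) {
        sink ^= p[off];
        pages++;
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (elapsed_ms)
        *elapsed_ms = (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
                      (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;

    munmap(p, len);
    return pages;
}
```

Calling load_file("/mnt/test/2G_file", &ms) from a small main() after remounting /mnt/test, under `strace -e trace=madvise`, shows the single madvise() call, and `iostat -x` reports rareq-sz for comparison with the figures in the report.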
>
> > reproduce with:
> >
> > javac mmap_load_test.java
> > echo 64 > /sys/block/sda/queue/read_ahead_kb
> > echo 1024 > /sys/block/sda/queue/max_sectors_kb
> > mkfs.xfs /dev/sda
> > mount /dev/sda /mnt/test
> > dd if=/dev/zero of=/mnt/test/2G_file bs=1024k count=2000
> >
> > echo 3 > /proc/sys/vm/drop_caches
>
> (I prefer to unmount/mount /mnt/test; it drops the cache for
> /mnt/test/2G_file without affecting the rest of the system)
>
> > java mmap_load_test /mnt/test/2G_file
> >
> > Without a fix, like the patch Ming provided, iostat will show rareq-sz
> > is 64 rather than ~1024.
>
> Understood. But ... the application is asking for as much readahead as
> possible, and the sysadmin has said "Don't readahead more than 64kB at
> a time". So why will we not get a bug report in 1-15 years' time saying
> "I put a limit on readahead and the kernel is ignoring it"? I think
> typically we allow the sysadmin to override application requests,
> don't we?

The application isn't knowingly asking for readahead. It is asking to
mmap the file (and the reporter wants that done as quickly as possible,
as it was before).

This fix is comparable to Jens' commit 9491ae4aade6 ("mm: don't cap
request size based on read-ahead setting") -- the same logic, just
applied to the call chain that ends up using madvise(MADV_WILLNEED).

> > > > @@ -972,6 +974,7 @@ struct file_ra_state {
> > > >         unsigned int ra_pages;
> > > >         unsigned int mmap_miss;
> > > >         loff_t prev_pos;
> > > > +       struct maple_tree *need_mt;
> > >
> > > No. Embed the struct maple tree. Don't allocate it.
> >
> > Constructive feedback, thanks.
> >
> > > What made you think this was the right approach?
> >
> > But then you closed with an attack, rather than inform Ming and/or
> > others why you feel so strongly, e.g.: Best to keep memory used for
> > file_ra_state contiguous.
>
> That's not an attack, it's a genuine question. Is there somewhere else
> doing it wrong that Ming copied from? Does the documentation need to
> be clearer?
> I can't fix what I don't know.

OK
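For scale, the iostat figures in the report translate into request counts as follows. This is only arithmetic on numbers quoted in the thread: the dd command writes 2000 MiB, and rareq-sz is 64 KiB without a fix versus roughly 1024 KiB with one. The sketch assumes every request is exactly the average size, and converts the 512-page figure from Ming's description to bytes assuming 4 KiB pages; the helper names are hypothetical.

```c
#include <stdint.h>

#define KIB 1024ULL
#define MIB (1024ULL * KIB)

/* Requests needed to read file_bytes if each request carries req_kib KiB
 * (the last, partial request is rounded up). Assumes a uniform size. */
static uint64_t read_requests(uint64_t file_bytes, uint64_t req_kib)
{
    uint64_t req_bytes = req_kib * KIB;
    return (file_bytes + req_bytes - 1) / req_bytes;
}

/* Bytes covered by a readahead window of nr_pages pages of page_bytes each. */
static uint64_t window_bytes(uint64_t nr_pages, uint64_t page_bytes)
{
    return nr_pages * page_bytes;
}
```

Under those assumptions the device sees 32000 reads instead of 2000 for the same file (16 times as many), and if the 512-page figure holds, the 2 MiB WILLNEED window covers only 1/1000 of the 2000 MiB file.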