Date: Mon, 29 Jan 2024 11:20:24 +0800
From: Ming Lei
To: Matthew Wilcox
Cc: Mike Snitzer, Andrew Morton, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Don Dutile,
    Raghavendra K T, Alexander Viro, Christian Brauner, ming.lei@redhat.com
Subject: Re: [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range
References: <20240128142522.1524741-1-ming.lei@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jan 29, 2024 at 12:21:16AM +0000, Matthew Wilcox wrote:
> On Sun, Jan 28, 2024 at 06:12:29PM -0500, Mike Snitzer wrote:
> > On Sun, Jan 28 2024 at 5:02P -0500,
> > Matthew Wilcox wrote:
> >
> > > On Sun, Jan 28, 2024 at 10:25:22PM +0800, Ming Lei wrote:
> > > > Since commit 6d2be915e589 ("mm/readahead.c: fix readahead failure for
> > > > memoryless NUMA nodes and limit readahead max_pages"), ADV_WILLNEED
> > > > only tries to readahead 512 pages, and the remaining part of the
> > > > advised range falls back on normal readahead.
> > >
> > > Does the MAINTAINERS file mean nothing any more?
> >
> > "Ming, please use scripts/get_maintainer.pl when submitting patches."
>
> That's an appropriate response to a new contributor, sure. Ming has
> been submitting patches since, what, 2008? Surely they know how to
> submit patches by now.
>
> > I agree this patch's header could've worked harder to establish the
> > problem that it fixes. But I'll now take a crack at backfilling the
> > regression report that motivated this patch:
>
> Thank you.
>
> > Linux 3.14 was the last kernel in which madvise(MADV_WILLNEED)
> > allowed mmap'ing a file more optimally if read_ahead_kb < max_sectors_kb.
> >
> > This regressed with commit 6d2be915e589 (so Linux 3.15) such that
> > mounting XFS on a device with read_ahead_kb=64 and max_sectors_kb=1024
> > and running this reproducer against a 2G file will take ~5x longer
> > (depending on the system's capabilities), mmap_load_test.java follows:
> >
> > import java.nio.ByteBuffer;
> > import java.nio.ByteOrder;
> > import java.io.RandomAccessFile;
> > import java.nio.MappedByteBuffer;
> > import java.nio.channels.FileChannel;
> > import java.io.File;
> > import java.io.FileNotFoundException;
> > import java.io.IOException;
> >
> > public class mmap_load_test {
> >
> >         public static void main(String[] args) throws FileNotFoundException, IOException, InterruptedException {
> >                 if (args.length == 0) {
> >                         System.out.println("Please provide a file");
> >                         System.exit(0);
> >                 }
> >                 FileChannel fc = new RandomAccessFile(new File(args[0]), "rw").getChannel();
> >                 MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
> >
> >                 System.out.println("Loading the file");
> >
> >                 long startTime = System.currentTimeMillis();
> >                 mem.load();
> >                 long endTime = System.currentTimeMillis();
> >                 System.out.println("Done! Loading took " + (endTime-startTime) + " ms");
> >
> >         }
> > }
>
> It's good to have the original reproducer.
> The unfortunate part is that being at such a high level, it doesn't
> really show what syscalls the library makes on behalf of the
> application. I'll take your word for it that it calls
> madvise(MADV_WILLNEED). An strace might not go amiss.

Yeah, it can be the fadvise(WILLNEED) or readahead syscall too.

>
> > reproduce with:
> >
> > javac mmap_load_test.java
> > echo 64 > /sys/block/sda/queue/read_ahead_kb
> > echo 1024 > /sys/block/sda/queue/max_sectors_kb
> > mkfs.xfs /dev/sda
> > mount /dev/sda /mnt/test
> > dd if=/dev/zero of=/mnt/test/2G_file bs=1024k count=2000
> >
> > echo 3 > /proc/sys/vm/drop_caches
>
> (I prefer to unmount/mount /mnt/test; it drops the cache for
> /mnt/test/2G_file without affecting the rest of the system)
>
> > java mmap_load_test /mnt/test/2G_file
> >
> > Without a fix, like the patch Ming provided, iostat will show rareq-sz
> > is 64 rather than ~1024.
>
> Understood. But ... the application is asking for as much readahead as
> possible, and the sysadmin has said "Don't readahead more than 64kB at
> a time". So why will we not get a bug report in 1-15 years time saying
> "I put a limit on readahead and the kernel is ignoring it"? I think
> typically we allow the sysadmin to override application requests,
> don't we?

ra_pages is just one hint for readahead; in reality, the sysadmin can't
know how many bytes is ideal for readahead. But the application often
knows how many bytes it will need, so I think the application's request
should have higher priority here, especially when the application
doesn't want the kernel to read ahead blindly.

And the madvise/fadvise(WILLNEED) syscall already reads bdi->io_pages
first, which is bigger than ra_pages.

Thanks,
Ming