Date: Mon, 5 Feb 2024 17:53:45 +0800
From: Ming Lei
To: Dave Chinner
Cc: Andrew Morton, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, David Hildenbrand, Matthew Wilcox,
    Alexander Viro, Christian Brauner, Don Dutile, Rafael Aquini,
    Mike Snitzer
Subject: Re: [PATCH] mm/madvise: set ra_pages as device max request size during ADV_POPULATE_READ
References: <20240202022029.1903629-1-ming.lei@redhat.com>

On Mon, Feb 05, 2024 at 10:34:47AM +1100, Dave Chinner wrote:
> On Fri, Feb 02, 2024 at 10:20:29AM +0800, Ming Lei wrote:
> > madvise(MADV_POPULATE_READ) tries to populate all page tables in the
> > specific range, so it is usually sequential IO if VMA is backed by
> > file.
> >
> > Set ra_pages as device max request size for the involved readahead in
> > the ADV_POPULATE_READ, this way reduces latency of madvise(MADV_POPULATE_READ)
> > to 1/10 when running madvise(MADV_POPULATE_READ) over one 1GB file with
> > usual(default) 128KB of read_ahead_kb.
> >
> > Cc: David Hildenbrand
> > Cc: Matthew Wilcox
> > Cc: Alexander Viro
> > Cc: Christian Brauner
> > Cc: Don Dutile
> > Cc: Rafael Aquini
> > Cc: Dave Chinner
> > Cc: Mike Snitzer
> > Cc: Andrew Morton
> > Signed-off-by: Ming Lei
> > ---
> >  mm/madvise.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 51 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 912155a94ed5..db5452c8abdd 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -900,6 +900,37 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
> >  	return -EINVAL;
> >  }
> >
> > +static void madvise_restore_ra_win(struct file **file, unsigned int ra_pages)
> > +{
> > +	if (*file) {
> > +		struct file *f = *file;
> > +
> > +		f->f_ra.ra_pages = ra_pages;
> > +		fput(f);
> > +		*file = NULL;
> > +	}
> > +}
> > +
> > +static struct file *madvise_override_ra_win(struct file *f,
> > +		unsigned long start, unsigned long end,
> > +		unsigned int *old_ra_pages)
> > +{
> > +	unsigned int io_pages;
> > +
> > +	if (!f || !f->f_mapping || !f->f_mapping->host)
> > +		return NULL;
> > +
> > +	io_pages = inode_to_bdi(f->f_mapping->host)->io_pages;
> > +	if (((end - start) >> PAGE_SHIFT) < io_pages)
> > +		return NULL;
> > +
> > +	f = get_file(f);
> > +	*old_ra_pages = f->f_ra.ra_pages;
> > +	f->f_ra.ra_pages = io_pages;
> > +
> > +	return f;
> > +}
>
> This won't do what you think if the file has been marked
> FMODE_RANDOM before this populate call.

Yeah.
But madvise(MADV_POPULATE_READ) is a one-shot action, so userspace can
call posix_fadvise(POSIX_FADV_NORMAL) or posix_fadvise(POSIX_FADV_SEQUENTIAL)
before madvise(MADV_POPULATE_READ), and restore the POSIX_FADV_RANDOM
advice after madvise(MADV_POPULATE_READ) returns, so this doesn't look
like a big issue in practice.

>
> IOWs, I don't think madvise should be digging in the struct file
> readahead stuff here. It should call vfs_fadvise(FADV_SEQUENTIAL) to
> do the set the readahead mode, rather that try to duplicate
> FADV_SEQUENTIAL (badly). We already do this for WILLNEED to make it
> do the right thing, we should be doing the same thing here.

FADV_SEQUENTIAL only doubles the default readahead window, which is far
from enough for top performance: for example, the latency with the
doubled (default) ra window is still 2X that with the ra window set to
bdi->io_pages. If the application sets a small 'bdi/read_ahead_kb', as
in this report, the gap can be very big.

Alternatively, could we add an API/helper in the fs code to set a file's
readahead ra_pages for this use case?

>
> Also, AFAICT, there is no need for get_file()/fput() here - the vma
> already has a reference to the struct file, and the vma should not
> be going away whilst the madvise() operation is in progress.

You are right; get_file() is only needed if the mm lock is dropped.

Thanks,
Ming
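For reference, the userspace sequence suggested above (lift the random
hint, populate, restore the hint) can be sketched in C roughly as
follows. This is an illustrative sketch, not code from the patch or from
this thread: the helper name populate_read_with_readahead() is
hypothetical, and it assumes Linux 5.14 or later (for
MADV_POPULATE_READ), a page-aligned file mapping of fd, and an
application that had previously set POSIX_FADV_RANDOM on fd. It uses
POSIX_FADV_SEQUENTIAL; POSIX_FADV_NORMAL would clear the random hint
just as well. Since FADV_SEQUENTIAL only doubles the default readahead
window, this does not raise the window to bdi->io_pages the way the
patch does.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>

/* Linux UAPI value (kernel >= 5.14); older libc headers may lack it. */
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22
#endif

/*
 * Hypothetical helper sketching the workaround discussed in this thread:
 * switch the file out of random-access mode for the duration of
 * MADV_POPULATE_READ, then put the random hint back.
 *
 * Assumes addr/len describe a page-aligned file mapping of fd and that
 * the application had set POSIX_FADV_RANDOM on fd beforehand.
 * Returns 0 on success, otherwise an errno value.
 */
static int populate_read_with_readahead(int fd, void *addr, size_t len)
{
	/*
	 * On Linux, NORMAL/SEQUENTIAL/RANDOM apply to the whole open file
	 * description, so offset and len are passed as 0. generic_fadvise()
	 * also sets the file's window to twice the bdi default here.
	 */
	int err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
	if (err)
		return err;

	/* Fault in the whole range; capture errno before other calls. */
	int ret = 0;
	if (madvise(addr, len, MADV_POPULATE_READ) != 0)
		ret = errno;

	/* Restore the random-access hint even if populating failed. */
	err = posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
	return ret ? ret : err;
}
```

A caller would mmap() the file, call the helper on the whole mapping,
and then access the already-populated pages. On kernels without
MADV_POPULATE_READ, madvise() fails with EINVAL, which the helper
returns after restoring the advice.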