Date: Mon, 22 Jun 2020 20:18:57 +0100
From: Matthew Wilcox
To: Dave Chinner
Cc:
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, agruenba@redhat.com,
 linux-kernel@vger.kernel.org
Subject: Re: [RFC] Bypass filesystems for reading cached pages
Message-ID: <20200622191857.GB21350@casper.infradead.org>
References: <20200619155036.GZ8681@bombadil.infradead.org>
 <20200622003215.GC2040@dread.disaster.area>
In-Reply-To: <20200622003215.GC2040@dread.disaster.area>

On Mon, Jun 22, 2020 at 10:32:15AM +1000, Dave Chinner wrote:
> On Fri, Jun 19, 2020 at 08:50:36AM -0700, Matthew Wilcox wrote:
> >
> > This patch lifts the IOCB_CACHED idea expressed by Andreas to the VFS.
> > The advantage of this patch is that we can avoid taking any filesystem
> > lock, as long as the pages being accessed are in the cache (and we don't
> > need to readahead any pages into the cache).  We also avoid an indirect
> > function call in these cases.
>
> What does this micro-optimisation actually gain us except for more
> complexity in the IO path?
>
> i.e. if a filesystem lock has such massive overhead that it slows
> down the cached readahead path in production workloads, then that's
> something the filesystem needs to address, not unconditionally
> bypass the filesystem before the IO gets anywhere near it.

You've been talking about adding a range lock to XFS for a while now.
I remain quite sceptical that range locks are a good idea; they have
not worked out well as a replacement for the mmap_sem, although the
workload for the mmap_sem is quite different, and they may yet show
promise for the XFS iolock.

There are production workloads that do not work well on top of a
single file on an XFS filesystem.  For example, using an XFS file in
a host as the backing store for a guest block device.  People tend to
work around that kind of performance bug rather than report it.
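[Editorial note: the pattern under debate — serve a read without the filesystem lock when every requested byte is already cached, otherwise fall back to the locked path — can be modelled in user space.  This is a toy sketch only; every name in it (cache_valid, locked_read, cached_read) is hypothetical and stands in for page-cache state, the i_rwsem-protected slow path, and the proposed VFS fast path respectively.]

	#include <stdbool.h>
	#include <stdio.h>
	#include <string.h>

	/* Toy model: one "page" of cache and a validity flag standing in
	 * for "all requested pages are present and up to date". */
	#define CACHE_SIZE 4096
	static char cache[CACHE_SIZE];
	static bool cache_valid;

	/* Slow path: in the real discussion this is where the filesystem
	 * takes its lock and may kick off readahead. */
	static int locked_read(char *buf, size_t len)
	{
		memset(buf, 'x', len);
		memcpy(cache, buf, len < CACHE_SIZE ? len : CACHE_SIZE);
		cache_valid = true;
		return 1;		/* 1 = went through the locked path */
	}

	/* Fast path: copy straight out of the cache, no lock taken;
	 * fall back to the locked path on a miss. */
	static int cached_read(char *buf, size_t len)
	{
		if (cache_valid && len <= CACHE_SIZE) {
			memcpy(buf, cache, len);
			return 0;	/* 0 = lock-free fast path */
		}
		return locked_read(buf, len);
	}

	int main(void)
	{
		char buf[8];

		printf("first read took path %d\n", cached_read(buf, sizeof buf));
		printf("second read took path %d\n", cached_read(buf, sizeof buf));
		return 0;
	}

The first read misses and takes the locked path; the second is served from the cache without the lock, which is the whole point of the contested fast path.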
Do you agree that the guarantees that XFS currently supplies regarding
locked operation will be maintained if the I/O is contained within a
single page and the mutex is not taken?  ie add this check to the
original patch:

	if (iocb->ki_pos / PAGE_SIZE !=
	    (iocb->ki_pos + iov_iter_count(iter) - 1) / PAGE_SIZE)
		goto uncached;

I think that gets me almost everything I want.  Small I/Os are going
to notice the pain of the mutex more than large I/Os.
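[Editorial note: the arithmetic in that check — the I/O crosses a page boundary exactly when its first and last bytes land in different pages — can be exercised in user space.  A minimal sketch, assuming 4096-byte pages; crosses_page() is a hypothetical stand-in for the proposed test.]

	#include <assert.h>
	#include <stdbool.h>
	#include <stdio.h>

	#define PAGE_SIZE 4096UL

	/* Mirrors the proposed check: compare the page index of the first
	 * byte (pos) with that of the last byte (pos + count - 1). */
	static bool crosses_page(unsigned long pos, unsigned long count)
	{
		return pos / PAGE_SIZE != (pos + count - 1) / PAGE_SIZE;
	}

	int main(void)
	{
		assert(!crosses_page(0, 4096));		/* exactly one page */
		assert(crosses_page(0, 4097));		/* spills into page 1 */
		assert(crosses_page(4095, 2));		/* 2 bytes straddling */
		assert(!crosses_page(8192, 512));	/* small I/O, one page */
		puts("all page-boundary cases behave as expected");
		return 0;
	}

Note the "- 1" matters: without it, an I/O ending exactly on a page boundary (pos 0, count 4096) would be misclassified as crossing into the next page and needlessly sent down the locked path.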