Date: Thu, 7 Feb 2019 10:43:28 +1100
From: Dave Chinner
To: Nix
Cc: linux-bcache@vger.kernel.org, linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?
Message-ID: <20190206234328.GH14116@dastard>
References: <87h8dgefee.fsf@esperi.org.uk>
In-Reply-To: <87h8dgefee.fsf@esperi.org.uk>

On Wed, Feb 06, 2019 at 10:11:21PM +0000, Nix wrote:
> So I just upgraded to 4.20 and revived my long-turned-off bcache now
> that the metadata corruption leading to mount failure on dirty close
> may have been identified (applying Tang Junhui's patch to do so)... and
> I spotted something a bit disturbing. It appears that XFS directory and
> metadata I/O is going more or less entirely uncached.
>
> Here's some bcache stats before and after a git status of a *huge*
> uncached tree (Chromium) on my no-writeback readaround cache. It takes
> many minutes and pounds the disk with massively seeky metadata I/O in
> the process:
>
> Before:
>
> stats_total/bypassed: 48.3G
> stats_total/cache_bypass_hits: 7942
> stats_total/cache_bypass_misses: 861045
> stats_total/cache_hit_ratio: 3
> stats_total/cache_hits: 16286
> stats_total/cache_miss_collisions: 25
> stats_total/cache_misses: 411575
> stats_total/cache_readaheads: 0
>
> After:
>
> stats_total/bypassed: 49.3G
> stats_total/cache_bypass_hits: 7942
> stats_total/cache_bypass_misses: 1154887
> stats_total/cache_hit_ratio: 3
> stats_total/cache_hits: 16291
> stats_total/cache_miss_collisions: 25
> stats_total/cache_misses: 411625
> stats_total/cache_readaheads: 0
>
> Huge increase in bypassed reads, essentially no new cached reads. This
> is... basically the optimum case for bcache, and it's not caching it!
>
> From my reading of xfs_dir2_leaf_readbuf(), it looks like essentially
> all directory reads in XFS appear to bcache as a single non-readahead
> read followed by a pile of readahead I/O: bcache bypasses readahead
> bios, so all directory reads (or perhaps all directory reads larger
> than a single block) are going to be bypassed out of hand.

That's a bcache problem, not an XFS problem. XFS does extensive
amounts of metadata readahead (btree traversals, directory access,
etc), and always has. If bcache considers readahead as "not worth
caching" then that has nothing to do with XFS.

> This seems... suboptimal, but so does filling up the cache with
> read-ahead blocks (particularly for non-metadata) that are never used.

Which is not the case for XFS. We do readahead when we know we are
going to need a block in the near future. It is rarely unnecessary;
it's a mechanism to reduce access latency when we do need to access
the metadata.

> Anyone got any ideas, 'cos I'm currently at a loss: XFS doesn't appear
> to let us distinguish between "read-ahead just in case but almost
> certain to be accessed" (like directory blocks) and "read ahead on the
> offchance because someone did a single-block file read and what the
> hell let's suck in a bunch more".

File data readahead:  REQ_RAHEAD
Metadata readahead:   REQ_META | REQ_RAHEAD

drivers/md/bcache/request.c::check_should_bypass():

	/*
	 * Flag for bypass if the IO is for read-ahead or background,
	 * unless the read-ahead request is for metadata (eg, for gfs2).
	 */
	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
	    !(bio->bi_opf & REQ_PRIO))
		goto skip;

bcache needs fixing - it thinks REQ_PRIO means metadata IO. That's
wrong - REQ_META means it's metadata IO, and so this is a bcache bug.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
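
[A minimal sketch of the kind of change described above - assuming the
intent is simply to honour REQ_META alongside REQ_PRIO in
check_should_bypass(), not a tested patch - would modify the quoted
test in drivers/md/bcache/request.c like so:]

	/*
	 * Sketch only: bypass read-ahead and background IO unless the
	 * bio is flagged as metadata (REQ_META) or high priority
	 * (REQ_PRIO), so filesystem metadata readahead (e.g. from XFS)
	 * is still considered for caching.
	 */
	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
	    !(bio->bi_opf & (REQ_META|REQ_PRIO)))
		goto skip;

[With that change, XFS metadata readahead (REQ_META | REQ_RAHEAD) would
no longer be skipped out of hand, while plain file-data readahead
(REQ_RAHEAD alone) would still bypass the cache.]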