2019-02-06 23:28:45

by Nix

[permalink] [raw]
Subject: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?

So I just upgraded to 4.20 and revived my long-turned-off bcache now
that the metadata corruption leading to mount failure on dirty close may
have been identified (applying Tang Junhui's patch to do so)... and I
spotted something a bit disturbing. It appears that XFS directory and
metadata I/O is going more or less entirely uncached.

Here's some bcache stats before and after a git status of a *huge*
uncached tree (Chromium) on my no-writeback readaround cache. It takes
many minutes and pounds the disk with massively seeky metadata I/O in
the process:

Before:

stats_total/bypassed: 48.3G
stats_total/cache_bypass_hits: 7942
stats_total/cache_bypass_misses: 861045
stats_total/cache_hit_ratio: 3
stats_total/cache_hits: 16286
stats_total/cache_miss_collisions: 25
stats_total/cache_misses: 411575
stats_total/cache_readaheads: 0

After:
stats_total/bypassed: 49.3G
stats_total/cache_bypass_hits: 7942
stats_total/cache_bypass_misses: 1154887
stats_total/cache_hit_ratio: 3
stats_total/cache_hits: 16291
stats_total/cache_miss_collisions: 25
stats_total/cache_misses: 411625
stats_total/cache_readaheads: 0

Huge increase in bypassed reads, essentially no new cached reads. This
is... basically the optimum case for bcache, and it's not caching it!

From my reading of xfs_dir2_leaf_readbuf(), it looks like essentially
all directory reads in XFS appear to bcache as a single non-readahead read
followed by a pile of readahead I/O: bcache bypasses readahead bios, so
all directory reads (or perhaps all directory reads larger than a single
block) are going to be bypassed out of hand.
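
Schematically (my reading of the flag behaviour, not actual XFS code),
a multi-block directory read would reach bcache looking something like:

	/* the block wanted right now: a plain metadata read, cacheable */
	bio->bi_opf = REQ_OP_READ | REQ_META;

	/* the following blocks: metadata readahead, bypassed by bcache */
	bio->bi_opf = REQ_OP_READ | REQ_META | REQ_RAHEAD;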

This seems... suboptimal, but so does filling up the cache with
read-ahead blocks (particularly for non-metadata) that are never used.
Anyone got any ideas, 'cos I'm currently at a loss: XFS doesn't appear
to let us distinguish between "read-ahead just in case but almost
certain to be accessed" (like directory blocks) and "read ahead on the
offchance because someone did a single-block file read and what the hell
let's suck in a bunch more".

As it is, this seems to render bcache more or less useless with XFS,
since bcache's primary raison d'etre is precisely to cache seeky stuff
like metadata. :(


2019-02-06 23:46:11

by Dave Chinner

[permalink] [raw]
Subject: Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?

On Wed, Feb 06, 2019 at 10:11:21PM +0000, Nix wrote:
> So I just upgraded to 4.20 and revived my long-turned-off bcache now
> that the metadata corruption leading to mount failure on dirty close may
> have been identified (applying Tang Junhui's patch to do so)... and I
> spotted something a bit disturbing. It appears that XFS directory and
> metadata I/O is going more or less entirely uncached.
>
> Here's some bcache stats before and after a git status of a *huge*
> uncached tree (Chromium) on my no-writeback readaround cache. It takes
> many minutes and pounds the disk with massively seeky metadata I/O in
> the process:
>
> Before:
>
> stats_total/bypassed: 48.3G
> stats_total/cache_bypass_hits: 7942
> stats_total/cache_bypass_misses: 861045
> stats_total/cache_hit_ratio: 3
> stats_total/cache_hits: 16286
> stats_total/cache_miss_collisions: 25
> stats_total/cache_misses: 411575
> stats_total/cache_readaheads: 0
>
> After:
> stats_total/bypassed: 49.3G
> stats_total/cache_bypass_hits: 7942
> stats_total/cache_bypass_misses: 1154887
> stats_total/cache_hit_ratio: 3
> stats_total/cache_hits: 16291
> stats_total/cache_miss_collisions: 25
> stats_total/cache_misses: 411625
> stats_total/cache_readaheads: 0
>
> Huge increase in bypassed reads, essentially no new cached reads. This
> is... basically the optimum case for bcache, and it's not caching it!
>
> From my reading of xfs_dir2_leaf_readbuf(), it looks like essentially
> all directory reads in XFS appear to bcache as a single non-readahead read
> followed by a pile of readahead I/O: bcache bypasses readahead bios, so
> all directory reads (or perhaps all directory reads larger than a single
> block) are going to be bypassed out of hand.

That's a bcache problem, not an XFS problem. XFS does extensive
amounts of metadata readahead (btree traversals, directory access,
etc), and always has.

If bcache considers readahead as "not worth caching" then that has
nothing to do with XFS.

>
> This seems... suboptimal, but so does filling up the cache with
> read-ahead blocks (particularly for non-metadata) that are never used.

Which is not the case for XFS. We do readahead when we know we are
going to need a block in the near future. It is rarely unnecessary;
it's a mechanism to reduce access latency when we do need to access
the metadata.

> Anyone got any ideas, 'cos I'm currently at a loss: XFS doesn't appear
> to let us distinguish between "read-ahead just in case but almost
> certain to be accessed" (like directory blocks) and "read ahead on the
> offchance because someone did a single-block file read and what the hell
> let's suck in a bunch more".

File data readahead: REQ_RAHEAD
Metadata readahead: REQ_META | REQ_RAHEAD

drivers/md/bcache/request.c::check_should_bypass():

	/*
	 * Flag for bypass if the IO is for read-ahead or background,
	 * unless the read-ahead request is for metadata (eg, for gfs2).
	 */
	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
	    !(bio->bi_opf & REQ_PRIO))
		goto skip;

bcache needs fixing - it thinks REQ_PRIO means metadata IO. That's
wrong - REQ_META means it's metadata IO, and so this is a bcache
bug.
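
As a concrete illustration (a sketch only, not code from either
project; the helper is invented for this example), the condition above
classifies read bios like so:

	#include <linux/blk_types.h>

	/* Mirrors the bypass condition quoted above. */
	static bool bypasses_cache(unsigned int opf)
	{
		return (opf & (REQ_RAHEAD | REQ_BACKGROUND)) &&
		       !(opf & REQ_PRIO);
	}

	/*
	 * bypasses_cache(REQ_RAHEAD)                       -> true
	 *	(file data readahead: bypass is the intended behaviour)
	 * bypasses_cache(REQ_META | REQ_RAHEAD)            -> true
	 *	(XFS metadata readahead: bypassed too, which is the bug)
	 * bypasses_cache(REQ_META | REQ_PRIO | REQ_RAHEAD) -> false
	 *	(only filesystems that also set REQ_PRIO, e.g. gfs2, get cached)
	 */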

Cheers,

Dave.
--
Dave Chinner
[email protected]

2019-02-07 00:54:51

by Andre Noll

[permalink] [raw]
Subject: Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?

On Thu, Feb 07, 10:43, Dave Chinner wrote
> File data readahead: REQ_RAHEAD
> Metadata readahead: REQ_META | REQ_RAHEAD
>
> drivers/md/bcache/request.c::check_should_bypass():
>
> 	/*
> 	 * Flag for bypass if the IO is for read-ahead or background,
> 	 * unless the read-ahead request is for metadata (eg, for gfs2).
> 	 */
> 	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
> 	    !(bio->bi_opf & REQ_PRIO))
> 		goto skip;
>
> bcache needs fixing - it thinks REQ_PRIO means metadata IO. That's
> wrong - REQ_META means it's metadata IO, and so this is a bcache
> bug.

Do you think 752f66a75abad is bad (ha!) and should be reverted?

Thanks
Andre
--
Max Planck Institute for Developmental Biology
Max-Planck-Ring 5, 72076 Tübingen, Germany. Phone: (+49) 7071 601 829
http://people.tuebingen.mpg.de/maan/



2019-02-07 02:29:25

by Dave Chinner

[permalink] [raw]
Subject: Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?

On Thu, Feb 07, 2019 at 01:24:25AM +0100, Andre Noll wrote:
> On Thu, Feb 07, 10:43, Dave Chinner wrote
> > File data readahead: REQ_RAHEAD
> > Metadata readahead: REQ_META | REQ_RAHEAD
> >
> > drivers/md/bcache/request.c::check_should_bypass():
> >
> > 	/*
> > 	 * Flag for bypass if the IO is for read-ahead or background,
> > 	 * unless the read-ahead request is for metadata (eg, for gfs2).
> > 	 */
> > 	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
> > 	    !(bio->bi_opf & REQ_PRIO))
> > 		goto skip;
> >
> > bcache needs fixing - it thinks REQ_PRIO means metadata IO. That's
> > wrong - REQ_META means it's metadata IO, and so this is a bcache
> > bug.
>
> Do you think 752f66a75abad is bad (ha!) and should be reverted?

Yes, that change is just broken. From include/linux/blk_types.h:

__REQ_META, /* metadata io request */
__REQ_PRIO, /* boost priority in cfq */


i.e. REQ_META means that it is a metadata request, REQ_PRIO means it
is a "high priority" request. Two completely different things, often
combined, but not interchangeable.

So, yeah, that needs to be reverted if you want bcache to function
properly for metadata caching.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2019-02-07 02:29:49

by Coly Li

[permalink] [raw]
Subject: Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?

On 2019/2/7 8:24 AM, Andre Noll wrote:
> On Thu, Feb 07, 10:43, Dave Chinner wrote
>> File data readahead: REQ_RAHEAD
>> Metadata readahead: REQ_META | REQ_RAHEAD
>>
>> drivers/md/bcache/request.c::check_should_bypass():
>>
>> 	/*
>> 	 * Flag for bypass if the IO is for read-ahead or background,
>> 	 * unless the read-ahead request is for metadata (eg, for gfs2).
>> 	 */
>> 	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
>> 	    !(bio->bi_opf & REQ_PRIO))
>> 		goto skip;
>>
>> bcache needs fixing - it thinks REQ_PRIO means metadata IO.
>> That's wrong - REQ_META means it's metadata IO, and so this is a
>> bcache bug.
>
> Do you think 752f66a75abad is bad (ha!) and should be reverted?

Hi Dave and Andre,

Correct me if I am wrong: REQ_META is used by blktrace to tag
metadata IO, and REQ_PRIO is used by the block layer to handle metadata IO.

I discussed this topic with Christoph Hellwig quite a long time
ago, and reached the above conclusion.

If different file systems handle metadata flags in a unified way, it is
OK with me to change the code to: !(bio->bi_opf & (REQ_META | REQ_PRIO)).
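
In context, the corrected check in check_should_bypass() would then
read something like this (a sketch of the idea above, not a final
patch):

	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
	    !(bio->bi_opf & (REQ_META|REQ_PRIO)))
		goto skip;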

Thanks in advance.

--

Coly Li

2019-02-07 02:41:21

by Coly Li

[permalink] [raw]
Subject: Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?

On 2019/2/7 10:26 AM, Dave Chinner wrote:
> On Thu, Feb 07, 2019 at 01:24:25AM +0100, Andre Noll wrote:
>> On Thu, Feb 07, 10:43, Dave Chinner wrote
>>> File data readahead: REQ_RAHEAD
>>> Metadata readahead: REQ_META | REQ_RAHEAD
>>>
>>> drivers/md/bcache/request.c::check_should_bypass():
>>>
>>> 	/*
>>> 	 * Flag for bypass if the IO is for read-ahead or background,
>>> 	 * unless the read-ahead request is for metadata (eg, for gfs2).
>>> 	 */
>>> 	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
>>> 	    !(bio->bi_opf & REQ_PRIO))
>>> 		goto skip;
>>>
>>> bcache needs fixing - it thinks REQ_PRIO means metadata IO. That's
>>> wrong - REQ_META means it's metadata IO, and so this is a bcache
>>> bug.
>>
>> Do you think 752f66a75abad is bad (ha!) and should be reverted?
>
> Yes, that change is just broken. From include/linux/blk_types.h:
>
> __REQ_META, /* metadata io request */
> __REQ_PRIO, /* boost priority in cfq */
>
>

Hi Dave,

> i.e. REQ_META means that it is a metadata request, REQ_PRIO means it
> is a "high priority" request. Two completely different things, often
> combined, but not interchangeable.

I found that in file system metadata IO, most of the time REQ_META and
REQ_PRIO are tagged together on the bio, but XFS seems not to use REQ_PRIO.

Is there any basic principle for when these tags should be used or
not? E.g. if REQ_META is enough for metadata I/O, why is REQ_PRIO used
too? And if REQ_PRIO is necessary, why is it not used in the fs/xfs/ code?

>
> So, yeah, that needs to be reverted if you want bcache to function
> properly for metadata caching.

Sure, I will fix this once it is clear to me.

Thanks for the hint.

--

Coly Li

2019-02-07 03:12:27

by Dave Chinner

[permalink] [raw]
Subject: Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?

On Thu, Feb 07, 2019 at 10:38:58AM +0800, Coly Li wrote:
> On 2019/2/7 10:26 AM, Dave Chinner wrote:
> > On Thu, Feb 07, 2019 at 01:24:25AM +0100, Andre Noll wrote:
> >> On Thu, Feb 07, 10:43, Dave Chinner wrote
> >>> File data readahead: REQ_RAHEAD
> >>> Metadata readahead: REQ_META | REQ_RAHEAD
> >>>
> >>> drivers/md/bcache/request.c::check_should_bypass():
> >>>
> >>> 	/*
> >>> 	 * Flag for bypass if the IO is for read-ahead or background,
> >>> 	 * unless the read-ahead request is for metadata (eg, for gfs2).
> >>> 	 */
> >>> 	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
> >>> 	    !(bio->bi_opf & REQ_PRIO))
> >>> 		goto skip;
> >>>
> >>> bcache needs fixing - it thinks REQ_PRIO means metadata IO. That's
> >>> wrong - REQ_META means it's metadata IO, and so this is a bcache
> >>> bug.
> >>
> >> Do you think 752f66a75abad is bad (ha!) and should be reverted?
> >
> > Yes, that change is just broken. From include/linux/blk_types.h:
> >
> > __REQ_META, /* metadata io request */
> > __REQ_PRIO, /* boost priority in cfq */
> >
> >
>
> Hi Dave,
>
> > i.e. REQ_META means that it is a metadata request, REQ_PRIO means it
> > is a "high priority" request. Two completely different things, often
> > combined, but not interchangeable.
>
> I found that in file system metadata IO, most of the time REQ_META and
> REQ_PRIO are tagged together on the bio, but XFS seems not to use REQ_PRIO.

Yes, that's correct, because we don't specifically prioritise
metadata IO over data IO.

> Is there any basic principle for when these tags should be used or
> not?

Yes.

> E.g. if REQ_META is enough for metadata I/O, why is REQ_PRIO used
> too?

REQ_META is used for metadata. REQ_PRIO is used to communicate to
the lower layers that the submitter considers this IO to be more
important than non-REQ_PRIO IO and so dispatch should be expedited.

IOWs, if the filesystem considers metadata IO to be more important
than user data IO, then it will use REQ_PRIO | REQ_META rather than
just REQ_META.

Historically speaking, REQ_PRIO was a hack for CFQ to get it to
dispatch journal IO from a different thread without waiting for a
time slice to expire. In the XFS world, we just said "don't use CFQ,
it's fundamentally broken for highly concurrent applications" and
didn't bother trying to hack around the limitations of CFQ.

These days REQ_PRIO is only used by the block layer writeback
throttle, but unlike bcache it considers both REQ_META and REQ_PRIO
to mean the same thing.

REQ_META, OTOH, is used by BFQ and blk-cgroup to detect metadata
IO; they don't look at REQ_PRIO at all. So, really, REQ_META is for
metadata, not REQ_PRIO. REQ_PRIO looks like it should just go away.

> And if REQ_PRIO is necessary, why is it not used in the fs/xfs/ code?

It's not necessary; it's just an /optimisation/ that some
filesystems make and some IO schedulers used to pay attention to. It
looks completely redundant now.
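
To put that distinction into code, a filesystem submitting a read bio
might combine the flags like this (an illustrative sketch, not taken
from any particular filesystem):

	/* plain metadata read */
	bio->bi_opf = REQ_OP_READ | REQ_META;

	/* speculative metadata readahead, e.g. directory or btree blocks */
	bio->bi_opf = REQ_OP_READ | REQ_META | REQ_RAHEAD;

	/*
	 * metadata read whose dispatch should be expedited; XFS omits
	 * REQ_PRIO, while most other users combine it with REQ_META
	 */
	bio->bi_opf = REQ_OP_READ | REQ_META | REQ_PRIO;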

Cheers,

Dave.
--
Dave Chinner
[email protected]

2019-02-07 08:17:46

by Coly Li

[permalink] [raw]
Subject: Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?

On 2019/2/7 6:11 AM, Nix wrote:
> So I just upgraded to 4.20 and revived my long-turned-off bcache now
> that the metadata corruption leading to mount failure on dirty close may
> have been identified (applying Tang Junhui's patch to do so)... and I
> spotted something a bit disturbing. It appears that XFS directory and
> metadata I/O is going more or less entirely uncached.
>
> Here's some bcache stats before and after a git status of a *huge*
> uncached tree (Chromium) on my no-writeback readaround cache. It takes
> many minutes and pounds the disk with massively seeky metadata I/O in
> the process:
>
> Before:
>
> stats_total/bypassed: 48.3G
> stats_total/cache_bypass_hits: 7942
> stats_total/cache_bypass_misses: 861045
> stats_total/cache_hit_ratio: 3
> stats_total/cache_hits: 16286
> stats_total/cache_miss_collisions: 25
> stats_total/cache_misses: 411575
> stats_total/cache_readaheads: 0
>
> After:
> stats_total/bypassed: 49.3G
> stats_total/cache_bypass_hits: 7942
> stats_total/cache_bypass_misses: 1154887
> stats_total/cache_hit_ratio: 3
> stats_total/cache_hits: 16291
> stats_total/cache_miss_collisions: 25
> stats_total/cache_misses: 411625
> stats_total/cache_readaheads: 0
>
> Huge increase in bypassed reads, essentially no new cached reads. This
> is... basically the optimum case for bcache, and it's not caching it!
>
> From my reading of xfs_dir2_leaf_readbuf(), it looks like essentially
> all directory reads in XFS appear to bcache as a single non-readahead read
> followed by a pile of readahead I/O: bcache bypasses readahead bios, so
> all directory reads (or perhaps all directory reads larger than a single
> block) are going to be bypassed out of hand.
>
> This seems... suboptimal, but so does filling up the cache with
> read-ahead blocks (particularly for non-metadata) that are never used.
> Anyone got any ideas, 'cos I'm currently at a loss: XFS doesn't appear
> to let us distinguish between "read-ahead just in case but almost
> certain to be accessed" (like directory blocks) and "read ahead on the
> offchance because someone did a single-block file read and what the hell
> let's suck in a bunch more".
>
> As it is, this seems to render bcache more or less useless with XFS,
> since bcache's primary raison d'etre is precisely to cache seeky stuff
> like metadata. :(
>
Hi Nix,

Could you please try whether the attached patch makes things better?

Thanks in advance for your help.

--

Coly Li


Attachments:
0001-bcache-use-REQ_META-REQ_PRIO-to-indicate-bio-for-met.patch (2.64 kB)

2019-02-07 08:19:07

by Coly Li

[permalink] [raw]
Subject: Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?

On 2019/2/7 11:10 AM, Dave Chinner wrote:
> On Thu, Feb 07, 2019 at 10:38:58AM +0800, Coly Li wrote:
>> On 2019/2/7 10:26 AM, Dave Chinner wrote:
>>> On Thu, Feb 07, 2019 at 01:24:25AM +0100, Andre Noll wrote:
>>>> On Thu, Feb 07, 10:43, Dave Chinner wrote
>>>>> File data readahead: REQ_RAHEAD
>>>>> Metadata readahead: REQ_META | REQ_RAHEAD
>>>>>
>>>>> drivers/md/bcache/request.c::check_should_bypass():
>>>>>
>>>>> 	/*
>>>>> 	 * Flag for bypass if the IO is for read-ahead or background,
>>>>> 	 * unless the read-ahead request is for metadata (eg, for gfs2).
>>>>> 	 */
>>>>> 	if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
>>>>> 	    !(bio->bi_opf & REQ_PRIO))
>>>>> 		goto skip;
>>>>>
>>>>> bcache needs fixing - it thinks REQ_PRIO means metadata IO. That's
>>>>> wrong - REQ_META means it's metadata IO, and so this is a bcache
>>>>> bug.
>>>>
>>>> Do you think 752f66a75abad is bad (ha!) and should be reverted?
>>>
>>> Yes, that change is just broken. From include/linux/blk_types.h:
>>>
>>> __REQ_META, /* metadata io request */
>>> __REQ_PRIO, /* boost priority in cfq */
>>>
>>>
>>
>> Hi Dave,
>>
>>> i.e. REQ_META means that it is a metadata request, REQ_PRIO means it
>>> is a "high priority" request. Two completely different things, often
>>> combined, but not interchangeable.
>>
>> I found that in file system metadata IO, most of the time REQ_META and
>> REQ_PRIO are tagged together on the bio, but XFS seems not to use REQ_PRIO.
>
> Yes, that's correct, because we don't specifically prioritise
> metadata IO over data IO.
>
>> Is there any basic principle for when these tags should be used or
>> not?
>
> Yes.
>
>> E.g. if REQ_META is enough for metadata I/O, why is REQ_PRIO used
>> too?
>
> REQ_META is used for metadata. REQ_PRIO is used to communicate to
> the lower layers that the submitter considers this IO to be more
> important than non-REQ_PRIO IO and so dispatch should be expedited.
>
> IOWs, if the filesystem considers metadata IO to be more important
> than user data IO, then it will use REQ_PRIO | REQ_META rather than
> just REQ_META.
>
> Historically speaking, REQ_PRIO was a hack for CFQ to get it to
> dispatch journal IO from a different thread without waiting for a
> time slice to expire. In the XFS world, we just said "don't use CFQ,
> it's fundamentally broken for highly concurrent applications" and
> didn't bother trying to hack around the limitations of CFQ.
>
> These days REQ_PRIO is only used by the block layer writeback
> throttle, but unlike bcache it considers both REQ_META and REQ_PRIO
> to mean the same thing.
>
> REQ_META, OTOH, is used by BFQ and blk-cgroup to detect metadata
> IO; they don't look at REQ_PRIO at all. So, really, REQ_META is for
> metadata, not REQ_PRIO. REQ_PRIO looks like it should just go away.
>
>> And if REQ_PRIO is necessary, why is it not used in the fs/xfs/ code?
>
> It's not necessary; it's just an /optimisation/ that some
> filesystems make and some IO schedulers used to pay attention to. It
> looks completely redundant now.

Hi Dave,

Thanks for your detailed explanation. This is a great hint from the
view of a file system developer :-)

I just composed a fix; I hope it makes things better.

--

Coly Li

2019-02-07 09:28:51

by Andre Noll

[permalink] [raw]
Subject: Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?

On Thu, Feb 07, 10:27, Coly Li wrote

> If different file systems handle metadata flags in a unified way, it is
> OK with me to change the code to: !(bio->bi_opf & (REQ_META | REQ_PRIO)).

Yes, that's the smallest fix that should also go into 4.19-stable.

In the long run, we should try to get rid of the 45 instances of
REQ_PRIO. Most users specify REQ_META | REQ_PRIO anyway, which leaves
only a few other instances to look at.

I think the one in submit_bh_wbc() of fs/buffer.c can just be removed,
while block/cfq-iosched.c does not use REQ_META at all, so the simple
s/REQ_PRIO/REQ_META/ substitution should be OK. drivers/staging/erofs/data.c
is also easy to fix.

Best
Andre
--
Max Planck Institute for Developmental Biology
Max-Planck-Ring 5, 72076 Tübingen, Germany. Phone: (+49) 7071 601 829
http://people.tuebingen.mpg.de/maan/



2019-02-07 09:43:29

by Andre Noll

[permalink] [raw]
Subject: Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?

On Thu, Feb 07, 16:16, Coly Li wrote
> From: Coly Li <[email protected]>
> Date: Thu, 7 Feb 2019 15:54:24 +0800
> Subject: [PATCH] bcache: use (REQ_META|REQ_PRIO) to indicate bio for metadata
>
> In 'commit 752f66a75aba ("bcache: use REQ_PRIO to indicate bio for
> metadata")' REQ_META is replaced by REQ_PRIO to indicate metadata bio.
> This assumption is not always correct, e.g. XFS uses REQ_META to mark
> metadata bio other than REQ_PRIO. This is why Nix reports a regression
> that bcache does not cache metadata for XFS after the above commit.

maybe s/reports a regression/noticed

> Thanks to Dave Chinner, he explains the difference between REQ_META and
> REQ_PRIO from view of file system developer. Here I quote part of his
> explanation from mailing list,
> REQ_META is used for metadata. REQ_PRIO is used to communicate to
> the lower layers that the submitter considers this IO to be more
> important than non-REQ_PRIO IO and so dispatch should be expedited.
>
> IOWs, if the filesystem considers metadata IO to be more important
> than user data IO, then it will use REQ_PRIO | REQ_META rather than
> just REQ_META.
>
> Then it seems bios with REQ_META or REQ_PRIO should both be cached for
> performance optimation, because they are all probably low I/O latency
> demand by upper layer (e.g. file system).
>
> So in this patch, when we want to check whether a bio is metadata
> related, REQ_META and REQ_PRIO are both checked. Then both metadata and
> high priority I/O requests will be handled properly.

s/check whether a bio is metadata related/decide whether to bypass the cache

Apart from these two nitpicks, feel free to add my Reviewed-by.

Thanks
Andre
--
Max Planck Institute for Developmental Biology
Max-Planck-Ring 5, 72076 Tübingen, Germany. Phone: (+49) 7071 601 829
http://people.tuebingen.mpg.de/maan/



2019-02-07 10:24:13

by Coly Li

[permalink] [raw]
Subject: Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?


On 2019/2/7 5:41 PM, Andre Noll wrote:
> On Thu, Feb 07, 16:16, Coly Li wrote
>> From: Coly Li <[email protected]>
>> Date: Thu, 7 Feb 2019 15:54:24 +0800
>> Subject: [PATCH] bcache: use (REQ_META|REQ_PRIO) to indicate bio for metadata
>>
>> In 'commit 752f66a75aba ("bcache: use REQ_PRIO to indicate bio
>> for metadata")' REQ_META is replaced by REQ_PRIO to indicate
>> metadata bio. This assumption is not always correct, e.g. XFS
>> uses REQ_META to mark metadata bio other than REQ_PRIO. This is
>> why Nix reports a regression that bcache does not cache metadata
>> for XFS after the above commit.
>
> maybe s/reports a regression/noticed
>
>> Thanks to Dave Chinner, he explains the difference between REQ_META
>> and REQ_PRIO from view of file system developer. Here I quote part
>> of his explanation from mailing list,
>> REQ_META is used for metadata. REQ_PRIO is used to communicate to
>> the lower layers that the submitter considers this IO to be more
>> important than non-REQ_PRIO IO and so dispatch should be expedited.
>>
>> IOWs, if the filesystem considers metadata IO to be more important
>> than user data IO, then it will use REQ_PRIO | REQ_META rather than
>> just REQ_META.
>>
>> Then it seems bios with REQ_META or REQ_PRIO should both be cached for
>> performance optimation, because they are all probably low I/O latency
>> demand by upper layer (e.g. file system).
>>
>> So in this patch, when we want to check whether a bio is metadata
>> related, REQ_META and REQ_PRIO are both checked. Then both metadata and
>> high priority I/O requests will be handled properly.
>
> s/check whether a bio is metadata related/decide whether to bypass
> the cache
>
> Apart from these two nitpicks, feel free to add my Reviewed-by.

Hi Andre,

Thanks for your review :-)

--

Coly Li

2019-02-07 13:12:31

by Nix

[permalink] [raw]
Subject: Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?

On 7 Feb 2019, Coly Li stated:
> On 2019/2/7 10:26 AM, Dave Chinner wrote:
>> So, yeah, that needs to be reverted if you want bcache to function
>> properly for metadata caching.
>
> Sure, I will fix this once it is clear to me.

I'll give it a test :)

The meaning of these flags was somewhat opaque to me, too (mostly due to
novelty: I've never really looked at anything in the block layer
before).

--
NULL && (void)

2019-02-07 20:54:14

by Nix

[permalink] [raw]
Subject: Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?

On 7 Feb 2019, Coly Li uttered the following:
On 2019/2/7 6:11 AM, Nix wrote:
>> As it is, this seems to render bcache more or less useless with XFS,
>> since bcache's primary raison d'etre is precisely to cache seeky stuff
>> like metadata. :(
>
> Hi Nix,
>
> Could you please try whether the attached patch makes things better?

Looks good! Before huge tree cp -al:

loom:~# bcache-stats
stats_total/bypassed: 1.0G
stats_total/cache_bypass_hits: 16
stats_total/cache_bypass_misses: 26436
stats_total/cache_hit_ratio: 46
stats_total/cache_hits: 24349
stats_total/cache_miss_collisions: 8
stats_total/cache_misses: 27898
stats_total/cache_readaheads: 0

After:

stats_total/bypassed: 1.1G
stats_total/cache_bypass_hits: 16
stats_total/cache_bypass_misses: 27176
stats_total/cache_hit_ratio: 43
stats_total/cache_hits: 24443
stats_total/cache_miss_collisions: 9
stats_total/cache_misses: 32152 <----
stats_total/cache_readaheads: 0

So loads of new misses. (A bunch of bypassed misses too. Not sure where
those came from, maybe some larger sequential reads somewhere, but a lot
is getting cached now, and every bit of metadata that gets cached means
things get a bit faster.)


btw I have ported ewheeler's ioprio-based cache hinting patch to 4.20;
I/O below the ioprio threshold bypasses everything, even metadata and
REQ_PRIO stuff. It was trivial, but I was able to spot and fix a tiny
bypass accounting bug in the patch in the process): see
http://www.esperi.org.uk/~nix/bundles/bcache-ioprio.bundle. (I figured
you didn't want almost exactly the same patch series as before posted to
the list, but I can do that if you prefer.)


Put this all together and it seems to work very well: my test massive
compile triggered 500MiB of metadata writes at the start and then the
actual compile (being entirely sequential reads) hardly wrote anything
out and was almost entirely bypassed; meanwhile a huge git push I ran at
idle priority didn't pollute the cache at all. Excellent!

(I'm also keeping write volumes down by storing transient things like
objdirs that just sit in the page cache and then get deleted on a
separate non-bcached, non-journalled ext4 fs at the start of the
spinning rust disk, with subdirs of this fs bind-mounted into various
places as needed. I should make the scripts that do that public because
they seem likely to be useful to bcache users...)


Semi-unrelated side note: after my most recent reboot, which involved a
bcache journal replay even though my shutdown was clean, the stats_total
reset; the cache device's bcache/written and
bcache/set/cache_available_percent also flipped to 0 and 100%. I
suspect this is merely a stats bug of some sort, because the boot was
notably faster than before and cache_hits was about 6000 by the time it
was done. bcache/priority_stats *does* say that the cache is "only" 98%
unused, like it did before. Maybe cache_available_percent doesn't mean
what I thought it did.

--
NULL && (void)