2020-12-17 09:18:10

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH 18/25] btrfs: Use readahead_batch_length

On 12/16/20 10:23 AM, Matthew Wilcox (Oracle) wrote:
> Implement readahead_batch_length() to determine the number of bytes in
> the current batch of readahead pages and use it in btrfs.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> fs/btrfs/extent_io.c | 6 ++----
> include/linux/pagemap.h | 9 +++++++++
> 2 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 6e3b72e63e42..42936a83a91b 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4436,10 +4436,8 @@ void extent_readahead(struct readahead_control *rac)
> int nr;
>
> while ((nr = readahead_page_batch(rac, pagepool))) {
> - u64 contig_start = page_offset(pagepool[0]);
> - u64 contig_end = page_offset(pagepool[nr - 1]) + PAGE_SIZE - 1;
> -
> - ASSERT(contig_start + nr * PAGE_SIZE - 1 == contig_end);
> + u64 contig_start = readahead_pos(rac);
> + u64 contig_end = contig_start + readahead_batch_length(rac);

Something in this tiny change is breaking btrfs: it hangs my Fedora 33 test
system (which changed over to btrfs) on boot. I haven't quite figured out
what's really wrong, but git bisect lands here, *and* turning the whole
extent_readahead() function into a no-op (on top of the whole series)
allows everything to work once again.

Sorry for not actually solving the root cause, but I figured you'd be able
to jump straight to the answer, with the above information, so I'm sending
it out early.


thanks,
--
John Hubbard
NVIDIA


2020-12-17 12:14:27

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 18/25] btrfs: Use readahead_batch_length

On Thu, Dec 17, 2020 at 01:15:10AM -0800, John Hubbard wrote:
> On 12/16/20 10:23 AM, Matthew Wilcox (Oracle) wrote:
> > Implement readahead_batch_length() to determine the number of bytes in
> > the current batch of readahead pages and use it in btrfs.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> > ---
> > fs/btrfs/extent_io.c | 6 ++----
> > include/linux/pagemap.h | 9 +++++++++
> > 2 files changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index 6e3b72e63e42..42936a83a91b 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -4436,10 +4436,8 @@ void extent_readahead(struct readahead_control *rac)
> > int nr;
> > while ((nr = readahead_page_batch(rac, pagepool))) {
> > - u64 contig_start = page_offset(pagepool[0]);
> > - u64 contig_end = page_offset(pagepool[nr - 1]) + PAGE_SIZE - 1;
> > -
> > - ASSERT(contig_start + nr * PAGE_SIZE - 1 == contig_end);
> > + u64 contig_start = readahead_pos(rac);
> > + u64 contig_end = contig_start + readahead_batch_length(rac);
>
> Something in this tiny change is breaking btrfs: it hangs my Fedora 33 test
> system (which changed over to btrfs) on boot. I haven't quite figured out
> what's really wrong, but git bisect lands here, *and* turning the whole
> extent_readahead() function into a no-op (on top of the whole series)
> allows everything to work once again.
>
> Sorry for not actually solving the root cause, but I figured you'd be able
> to jump straight to the answer, with the above information, so I'm sending
> it out early.

ehh ... probably an off-by-one. Does subtracting 1 from contig_end fix it?
I'll spool up a test VM shortly and try it out.

2020-12-17 13:47:47

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 18/25] btrfs: Use readahead_batch_length

On Thu, Dec 17, 2020 at 12:12:46PM +0000, Matthew Wilcox wrote:
> ehh ... probably an off-by-one. Does subtracting 1 from contig_end fix it?
> I'll spool up a test VM shortly and try it out.

Yes, this fixed it:

- u64 contig_end = contig_start + readahead_batch_length(rac);
+ u64 contig_end = contig_start + readahead_batch_length(rac) - 1;

2020-12-17 19:39:42

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH 18/25] btrfs: Use readahead_batch_length

On 12/17/20 5:42 AM, Matthew Wilcox wrote:
> On Thu, Dec 17, 2020 at 12:12:46PM +0000, Matthew Wilcox wrote:
>> ehh ... probably an off-by-one. Does subtracting 1 from contig_end fix it?
>> I'll spool up a test VM shortly and try it out.
>
> Yes, this fixed it:
>
> - u64 contig_end = contig_start + readahead_batch_length(rac);
> + u64 contig_end = contig_start + readahead_batch_length(rac) - 1;
>

Yes, confirmed on my end, too.

thanks,
--
John Hubbard
NVIDIA