2022-06-01 19:51:00

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()

On Tue, May 31, 2022 at 04:53:32PM +0200, Fabio M. De Francesco wrote:
> This is the first series of patches aimed towards the conversion of Btrfs
> filesystem from the use of kmap() to kmap_local_page().

We've already had patches converting kmaps and you're changing the last
ones, so this is could be the last series, with two exceptions.

1) You've changed lzo.c and zlib. but the same kmap/kunmap pattern is
used in zstd.c.

2) kmap_atomic in inode.c, so that's technically not kmap but it's said
to be deprecated and also can be replaced by kmap_local_page. The
context in check_compressed_csum is atomic (in end io) so the kmap
hasn't been used there.

> tweed32:~ # btrfs check -p ~zoek/dev/btrfs.file

That won't verify if the kmap conversion is OK, it's a runtime thing
while 'check' verifies the data on device. Have you run any kind of
stress test with enabled compression before running the check?

> Opening filesystem to check...
> Checking filesystem on /home/zoek/dev/btrfs.file
> UUID: 897d65c5-1167-45b4-b811-2bfe74a320ca
> [1/7] checking root items (0:00:00 elapsed, 1774 items checked)
> [2/7] checking extents (0:00:00 elapsed, 135 items checked)
> [3/7] checking free space tree (0:00:00 elapsed, 4 items checked)
> [4/7] checking fs roots (0:00:00 elapsed, 104 items checked)
> [5/7] checking csums (without verifying data) (0:00:00 elapsed, 205 items checked)
> [6/7] checking root refs (0:00:00 elapsed, 3 items checked)
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 47394816 bytes used, no error found
> total csum bytes: 44268
> total tree bytes: 2064384
> total fs tree bytes: 1720320
> total extent tree bytes: 180224
> btree space waste bytes: 465350
> file data blocks allocated: 45330432
> referenced 45330432
>
> Fabio M. De Francesco (3):
> btrfs: Replace kmap() with kmap_local_page() in inode.c
> btrfs: Replace kmap() with kmap_local_page() in lzo.c
> btrfs: Replace kmap() with kmap_local_page() in zlib.c

Please send patches converting zstd.c and the remaining kmap_atomic
usage in inode.c, otherwise the 3 patches are now in misc-next, thanks.


2022-06-03 00:46:24

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()

On mercoledì 1 giugno 2022 15:25:45 CEST David Sterba wrote:
> On Tue, May 31, 2022 at 04:53:32PM +0200, Fabio M. De Francesco wrote:
> > This is the first series of patches aimed towards the conversion of
Btrfs
> > filesystem from the use of kmap() to kmap_local_page().
>
> We've already had patches converting kmaps and you're changing the last
> ones, so this is could be the last series, with two exceptions.
>
> 1) You've changed lzo.c and zlib.

and inode.c.

> but the same kmap/kunmap pattern is
> used in zstd.c.

I thought that these mappings I had worked on were safe to convert.

Instead I wasn't sure that the others I left untouched in zstd.c could so
easily and mechanically be converted without prior code inspection and
proper tests.

I did see those in zstd.c, but I decided to postpone those conversions
because I'm not yet sure if and how the virtual addresses we get currently
from kmap() are re-used.

I saw assignments like "workspace->in_buf.src = kmap(in_page);". Is
"in_buf" re-used across different contexts? (I see that Btrfs uses a dozen
workqueues).

I also see that kunmap() is called in the same functions that assign
virtual addresses to "in_buf" and this makes me think that those addresses
are not handled across contexts, otherwise you already have bugs. But may
very well be that somewhere in the calls chain the code flushes workqueues
before returning to the callers (it would mean that when kunmap() is called
we can be sure that those workqueues are done with using those addresses).

Furthermore, what can you say about the tens of page_address() spread
across the whole fs/btrfs?

If those page_address() take pages from HIGHMEM which were mapped using
kmap_local_page(), the filesystem will oops the kernel...

About this issue, please see a bug fix ("[PATCH v2] fs/ext2: Avoid
page_address on pages returned by ext2_get_page") at
https://lore.kernel.org/lkml/
[email protected]/#r

Do they only use physical memory from ZONE_NORMAL?

Can you please confirm that it is safe to convert those left kmap() to
kmap_local_page() and that those page_address() are safe?

If so, I have no problems to convert what I had left for later. Otherwise
I'll need to carefully inspect the code.

> 2) kmap_atomic in inode.c, so that's technically not kmap but it's said
> to be deprecated and also can be replaced by kmap_local_page. The
> context in check_compressed_csum is atomic (in end io) so the kmap
> hasn't been used there.

I was not 100% sure about the preemption requirements for those call sites
so I had not converted them yet. Are you saying that there is no need for
preempt_disable() at the following sites?

# git grep kmap_atomic fs/btrfs
fs/btrfs/compression.c: kaddr = kmap_atomic(page);
fs/btrfs/inode.c: kaddr = kmap_atomic(cpage);
fs/btrfs/inode.c: kaddr = kmap_atomic(page);
fs/btrfs/inode.c: kaddr = kmap_atomic(page);

> > tweed32:~ # btrfs check -p ~zoek/dev/btrfs.file
>
> That won't verify if the kmap conversion is OK, it's a runtime thing
> while 'check' verifies the data on device. Have you run any kind of
> stress test with enabled compression before running the check?

Ah, thanks. I didn't know this thing.

I installed (x)fstests a couple of days ago. I think it helps to test this
and other conversions to local mappings, but I haven't yet had time to
learn how to use it.

Does (x)fstests cover the compression code? Are there any specific tests I
should target?

> Please send patches converting zstd.c and the remaining kmap_atomic
> usage in inode.c, otherwise the 3 patches are now in misc-next, thanks.

New version is required in any case because LKP reported four uninitialized
variables in patch 3/3.

I'm just reading the reports that both you and Christoph hit. At first
sight they seem to be due to page_address() calls (but I may be wrong, just
had few minutes to reply so late) :(

I was wrong in thinking that these call sites could be converted safely.
I'll do the tests before posting v2.

Thanks,

Fabio

P.S.: I've just read a message from Ira Weiny about something he saw in the
unmapping order in zstd_compress_pages(). We must think how to address
mapping /unmapping order properly.





2022-06-03 17:38:18

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()

On Wed, Jun 01, 2022 at 03:25:45PM +0200, David Sterba wrote:
> On Tue, May 31, 2022 at 04:53:32PM +0200, Fabio M. De Francesco wrote:
> > This is the first series of patches aimed towards the conversion of Btrfs
> > filesystem from the use of kmap() to kmap_local_page().
>
> We've already had patches converting kmaps and you're changing the last
> ones, so this is could be the last series, with two exceptions.
>
> 1) You've changed lzo.c and zlib. but the same kmap/kunmap pattern is
> used in zstd.c.

I checked out zstd.c and one of the issues there is the way that the input
workspace is mapped page by page while iterating those pages.

This got me thinking about what Willy said at LSFmm concerning something like
kmap_local_range(). Mapping more than 1 page at a time could save some
unmap/remap of output pages required for kmap_local_page() ordering.

Unfortunately, I think the length of the input is probably to long in many
cases. And some remapping may still be required.

Cc: Willy

As an aside, Willy, I'm thinking that a kmap_local_range() should return a
folio in some way. Would you agree?

Ira

2022-06-06 04:33:23

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 0/3] btrfs: Replace kmap() with kmap_local_page()

On Thu, Jun 02, 2022 at 09:22:15AM -0700, Ira Weiny wrote:
> On Wed, Jun 01, 2022 at 03:25:45PM +0200, David Sterba wrote:
> > On Tue, May 31, 2022 at 04:53:32PM +0200, Fabio M. De Francesco wrote:
> > > This is the first series of patches aimed towards the conversion of Btrfs
> > > filesystem from the use of kmap() to kmap_local_page().
> >
> > We've already had patches converting kmaps and you're changing the last
> > ones, so this is could be the last series, with two exceptions.
> >
> > 1) You've changed lzo.c and zlib. but the same kmap/kunmap pattern is
> > used in zstd.c.
>
> I checked out zstd.c and one of the issues there is the way that the input
> workspace is mapped page by page while iterating those pages.
>
> This got me thinking about what Willy said at LSFmm concerning something like
> kmap_local_range(). Mapping more than 1 page at a time could save some
> unmap/remap of output pages required for kmap_local_page() ordering.

Umm ... Not entirely sure what I said, but it'd be really hard to kmap
multiple pages with the current PAE implementation. I've steered away
from doing that for now, and kmap_local_folio() just guarantees the
page that the offset lands in is mapped.

I don't think the right answer is having a kmap_folio() that will map
the entire folio. I'd be more tempted to add vmap_folio() for that.
My understanding is that PAE systems have more address space available
for vmap than they do for kmap.

> Unfortunately, I think the length of the input is probably to long in many
> cases. And some remapping may still be required.
>
> Cc: Willy
>
> As an aside, Willy, I'm thinking that a kmap_local_range() should return a
> folio in some way. Would you agree?

I imagine it taking a folio to describe the range that's being accessed.
But maybe it should be a phys_addr_t? I tend to prefer phys_addr_t over
pfn + offset as it's more compact on 64-bit systems and the same on
32-bit systems.