2023-12-13 23:35:13

by Yosry Ahmed

Subject: Re: [PATCH 2/5] mm/zswap: change dstmem size to one page

On Tue, Dec 12, 2023 at 8:18 PM Chengming Zhou
<[email protected]> wrote:
>
> Change the dstmem size from 2 * PAGE_SIZE to only one page since
> we only need at most one page when compress, and the "dlen" is also
> PAGE_SIZE in acomp_request_set_params(). If the output size > PAGE_SIZE
> we don't wanna store the output in zswap anyway.
>
> So change it to one page, and delete the stale comment.

I couldn't find the history of why we needed 2 * PAGE_SIZE, it would
be nice if someone has the context, perhaps one of the maintainers.

One potential reason is that we used to store a zswap header
containing the swap entry in the compressed page for writeback
purposes, but we don't do that anymore. Maybe we wanted to be able to
handle the case where an incompressible page would exceed PAGE_SIZE
because of that?


2023-12-14 00:18:51

by Nhat Pham

Subject: Re: [PATCH 2/5] mm/zswap: change dstmem size to one page

On Wed, Dec 13, 2023 at 3:34 PM Yosry Ahmed <[email protected]> wrote:
>
> On Tue, Dec 12, 2023 at 8:18 PM Chengming Zhou
> <[email protected]> wrote:
> >
> > Change the dstmem size from 2 * PAGE_SIZE to only one page since
> > we only need at most one page when compress, and the "dlen" is also
> > PAGE_SIZE in acomp_request_set_params(). If the output size > PAGE_SIZE
> > we don't wanna store the output in zswap anyway.
> >
> > So change it to one page, and delete the stale comment.
>
> I couldn't find the history of why we needed 2 * PAGE_SIZE, it would
> be nice if someone has the context, perhaps one of the maintainers.

It'd be very nice indeed.

>
> One potential reason is that we used to store a zswap header
> containing the swap entry in the compressed page for writeback
> purposes, but we don't do that anymore. Maybe we wanted to be able to
> handle the case where an incompressible page would exceed PAGE_SIZE
> because of that?

It could be hmm. I didn't study the old zswap architecture too much,
but it has been 2 * PAGE_SIZE since the time zswap was first merged
last I checked.
I'm not 100% comfortable ACK-ing the undoing of something that looks
so intentional, but FTR, AFAICT, this looks correct to me.

2023-12-14 13:34:14

by Chengming Zhou

Subject: Re: [PATCH 2/5] mm/zswap: change dstmem size to one page

On 2023/12/14 08:18, Nhat Pham wrote:
> On Wed, Dec 13, 2023 at 3:34 PM Yosry Ahmed <[email protected]> wrote:
>>
>> On Tue, Dec 12, 2023 at 8:18 PM Chengming Zhou
>> <[email protected]> wrote:
>>>
>>> Change the dstmem size from 2 * PAGE_SIZE to only one page since
>>> we only need at most one page when compress, and the "dlen" is also
>>> PAGE_SIZE in acomp_request_set_params(). If the output size > PAGE_SIZE
>>> we don't wanna store the output in zswap anyway.
>>>
>>> So change it to one page, and delete the stale comment.
>>
>> I couldn't find the history of why we needed 2 * PAGE_SIZE, it would
>> be nice if someone has the context, perhaps one of the maintainers.
>
> It'd be very nice indeed.
>
>>
>> One potential reason is that we used to store a zswap header
>> containing the swap entry in the compressed page for writeback
>> purposes, but we don't do that anymore. Maybe we wanted to be able to
>> handle the case where an incompressible page would exceed PAGE_SIZE
>> because of that?
>
> It could be hmm. I didn't study the old zswap architecture too much,
> but it has been 2 * PAGE_SIZE since the time zswap was first merged
> last I checked.
> I'm not 100% comfortable ACK-ing the undoing of something that looks
> so intentional, but FTR, AFAICT, this looks correct to me.

Right, there is no history explaining why we needed 2 pages.
But from the current code, only one page is clearly needed, and no
problems were found in the kernel build stress testing.

Thanks!

2023-12-14 13:38:05

by Yosry Ahmed

Subject: Re: [PATCH 2/5] mm/zswap: change dstmem size to one page

On Thu, Dec 14, 2023 at 5:33 AM Chengming Zhou
<[email protected]> wrote:
>
> On 2023/12/14 08:18, Nhat Pham wrote:
> > On Wed, Dec 13, 2023 at 3:34 PM Yosry Ahmed <[email protected]> wrote:
> >>
> >> On Tue, Dec 12, 2023 at 8:18 PM Chengming Zhou
> >> <[email protected]> wrote:
> >>>
> >>> Change the dstmem size from 2 * PAGE_SIZE to only one page since
> >>> we only need at most one page when compress, and the "dlen" is also
> >>> PAGE_SIZE in acomp_request_set_params(). If the output size > PAGE_SIZE
> >>> we don't wanna store the output in zswap anyway.
> >>>
> >>> So change it to one page, and delete the stale comment.
> >>
> >> I couldn't find the history of why we needed 2 * PAGE_SIZE, it would
> >> be nice if someone has the context, perhaps one of the maintainers.
> >
> > It'd be very nice indeed.
> >
> >>
> >> One potential reason is that we used to store a zswap header
> >> containing the swap entry in the compressed page for writeback
> >> purposes, but we don't do that anymore. Maybe we wanted to be able to
> >> handle the case where an incompressible page would exceed PAGE_SIZE
> >> because of that?
> >
> > It could be hmm. I didn't study the old zswap architecture too much,
> > but it has been 2 * PAGE_SIZE since the time zswap was first merged
> > last I checked.
> > I'm not 100% comfortable ACK-ing the undoing of something that looks
> > so intentional, but FTR, AFAICT, this looks correct to me.
>
> Right, there is no history explaining why we needed 2 pages.
> But from the current code, only one page is clearly needed, and no
> problems were found in the kernel build stress testing.

Could you try manually stressing the compression with data that
doesn't compress at all (i.e. dlen == PAGE_SIZE)? I want to make sure
that this case is specifically handled. I think using data from
/dev/random will do that but please double check that dlen ==
PAGE_SIZE.
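
To illustrate why random data exercises this path, here is a minimal
userspace sketch in Python. It is not kernel code: zlib is used purely
as a stand-in for the kernel compressor (an assumption; zswap goes
through the crypto acomp API), and PAGE_SIZE = 4096 is likewise
assumed. It only shows that a page of random bytes does not shrink,
while a zero-filled page does.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size; the real value is architecture-dependent


def compressed_len(page: bytes) -> int:
    """Return the zlib-compressed length of one page (stand-in compressor)."""
    assert len(page) == PAGE_SIZE
    return len(zlib.compress(page))


random_page = os.urandom(PAGE_SIZE)  # effectively incompressible
zero_page = bytes(PAGE_SIZE)         # highly compressible

print("random page ->", compressed_len(random_page), "bytes")
print("zero page   ->", compressed_len(zero_page), "bytes")
```

In the kernel the equivalent check would still have to be made on the
dlen actually reported by the compressor in use.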

2023-12-14 13:57:34

by Chengming Zhou

Subject: Re: [PATCH 2/5] mm/zswap: change dstmem size to one page

On 2023/12/14 21:37, Yosry Ahmed wrote:
> On Thu, Dec 14, 2023 at 5:33 AM Chengming Zhou
> <[email protected]> wrote:
>>
>> On 2023/12/14 08:18, Nhat Pham wrote:
>>> On Wed, Dec 13, 2023 at 3:34 PM Yosry Ahmed <[email protected]> wrote:
>>>>
>>>> On Tue, Dec 12, 2023 at 8:18 PM Chengming Zhou
>>>> <[email protected]> wrote:
>>>>>
>>>>> Change the dstmem size from 2 * PAGE_SIZE to only one page since
>>>>> we only need at most one page when compress, and the "dlen" is also
>>>>> PAGE_SIZE in acomp_request_set_params(). If the output size > PAGE_SIZE
>>>>> we don't wanna store the output in zswap anyway.
>>>>>
>>>>> So change it to one page, and delete the stale comment.
>>>>
>>>> I couldn't find the history of why we needed 2 * PAGE_SIZE, it would
>>>> be nice if someone has the context, perhaps one of the maintainers.
>>>
>>> It'd be very nice indeed.
>>>
>>>>
>>>> One potential reason is that we used to store a zswap header
>>>> containing the swap entry in the compressed page for writeback
>>>> purposes, but we don't do that anymore. Maybe we wanted to be able to
>>>> handle the case where an incompressible page would exceed PAGE_SIZE
>>>> because of that?
>>>
>>> It could be hmm. I didn't study the old zswap architecture too much,
>>> but it has been 2 * PAGE_SIZE since the time zswap was first merged
>>> last I checked.
>>> I'm not 100% comfortable ACK-ing the undoing of something that looks
>>> so intentional, but FTR, AFAICT, this looks correct to me.
>>
>> Right, there is no history explaining why we needed 2 pages.
>> But from the current code, only one page is clearly needed, and no
>> problems were found in the kernel build stress testing.
>
> Could you try manually stressing the compression with data that
> doesn't compress at all (i.e. dlen == PAGE_SIZE)? I want to make sure
> that this case is specifically handled. I think using data from
> /dev/random will do that but please double check that dlen ==
> PAGE_SIZE.

I just did the same kernel build testing, indeed there are a few cases
that output dlen == PAGE_SIZE.

bpftrace -e 'k:zpool_malloc {@[(uint32)arg1==4096]=count()}'

@[1]: 2
@[0]: 12011430

2023-12-14 15:04:09

by Chengming Zhou

Subject: Re: [PATCH 2/5] mm/zswap: change dstmem size to one page

On 2023/12/14 21:57, Chengming Zhou wrote:
> On 2023/12/14 21:37, Yosry Ahmed wrote:
>> On Thu, Dec 14, 2023 at 5:33 AM Chengming Zhou
>> <[email protected]> wrote:
>>>
>>> On 2023/12/14 08:18, Nhat Pham wrote:
>>>> On Wed, Dec 13, 2023 at 3:34 PM Yosry Ahmed <[email protected]> wrote:
>>>>>
>>>>> On Tue, Dec 12, 2023 at 8:18 PM Chengming Zhou
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Change the dstmem size from 2 * PAGE_SIZE to only one page since
>>>>>> we only need at most one page when compress, and the "dlen" is also
>>>>>> PAGE_SIZE in acomp_request_set_params(). If the output size > PAGE_SIZE
>>>>>> we don't wanna store the output in zswap anyway.
>>>>>>
>>>>>> So change it to one page, and delete the stale comment.
>>>>>
>>>>> I couldn't find the history of why we needed 2 * PAGE_SIZE, it would
>>>>> be nice if someone has the context, perhaps one of the maintainers.
>>>>
>>>> It'd be very nice indeed.
>>>>
>>>>>
>>>>> One potential reason is that we used to store a zswap header
>>>>> containing the swap entry in the compressed page for writeback
>>>>> purposes, but we don't do that anymore. Maybe we wanted to be able to
>>>>> handle the case where an incompressible page would exceed PAGE_SIZE
>>>>> because of that?
>>>>
>>>> It could be hmm. I didn't study the old zswap architecture too much,
>>>> but it has been 2 * PAGE_SIZE since the time zswap was first merged
>>>> last I checked.
>>>> I'm not 100% comfortable ACK-ing the undoing of something that looks
>>>> so intentional, but FTR, AFAICT, this looks correct to me.
>>>
>>> Right, there is no history explaining why we needed 2 pages.
>>> But from the current code, only one page is clearly needed, and no
>>> problems were found in the kernel build stress testing.
>>
>> Could you try manually stressing the compression with data that
>> doesn't compress at all (i.e. dlen == PAGE_SIZE)? I want to make sure
>> that this case is specifically handled. I think using data from
>> /dev/random will do that but please double check that dlen ==
>> PAGE_SIZE.
>
> I just did the same kernel build testing, indeed there are a few cases
> that output dlen == PAGE_SIZE.
>
> bpftrace -e 'k:zpool_malloc {@[(uint32)arg1==4096]=count()}'
>
> @[1]: 2
> @[0]: 12011430

I think we shouldn't put such poorly compressed output into zswap.
Maybe it's better to return early in these cases, when the compression
ratio is below a threshold ratio that can be tuned by the user?

e.g. in the same kernel build testing:

bpftrace -e 'k:zpool_malloc {@[(uint32)arg1>2048]=count()}'

@[1]: 1597706
@[0]: 10886138
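
As a rough illustration of the idea (not a proposed implementation),
the Python sketch below models an early-return check against a tunable
threshold and works out what share of the stores in the run above
exceeded half a page. The function names and the percentage-style
tunable are hypothetical; PAGE_SIZE = 4096 is assumed, matching the
4096 and 2048 literals in the bpftrace commands.

```python
PAGE_SIZE = 4096  # assumed; matches the literals in the bpftrace one-liners


def accept_compressed(dlen: int, max_percent: int) -> bool:
    """Hypothetical policy: keep a page only if dlen <= max_percent of a page."""
    return dlen * 100 <= PAGE_SIZE * max_percent


def share(hits: int, misses: int) -> float:
    """Fraction of samples that matched the bpftrace predicate."""
    return hits / (hits + misses)


# Counts copied from the bpftrace output above (arg1 > 2048).
over_half_page = share(1597706, 10886138)
print(f"stores larger than half a page: {over_half_page:.1%}")

# With a 50% threshold, a 2048-byte result is kept and a 2049-byte one is not.
print(accept_compressed(2048, 50), accept_compressed(2049, 50))
```

On these counts, roughly 13% of the stores would have been rejected
with a 50% threshold.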

2023-12-14 18:31:23

by Yosry Ahmed

Subject: Re: [PATCH 2/5] mm/zswap: change dstmem size to one page

On Thu, Dec 14, 2023 at 5:57 AM Chengming Zhou
<[email protected]> wrote:
>
> On 2023/12/14 21:37, Yosry Ahmed wrote:
> > On Thu, Dec 14, 2023 at 5:33 AM Chengming Zhou
> > <[email protected]> wrote:
> >>
> >> On 2023/12/14 08:18, Nhat Pham wrote:
> >>> On Wed, Dec 13, 2023 at 3:34 PM Yosry Ahmed <[email protected]> wrote:
> >>>>
> >>>> On Tue, Dec 12, 2023 at 8:18 PM Chengming Zhou
> >>>> <[email protected]> wrote:
> >>>>>
> >>>>> Change the dstmem size from 2 * PAGE_SIZE to only one page since
> >>>>> we only need at most one page when compress, and the "dlen" is also
> >>>>> PAGE_SIZE in acomp_request_set_params(). If the output size > PAGE_SIZE
> >>>>> we don't wanna store the output in zswap anyway.
> >>>>>
> >>>>> So change it to one page, and delete the stale comment.
> >>>>
> >>>> I couldn't find the history of why we needed 2 * PAGE_SIZE, it would
> >>>> be nice if someone has the context, perhaps one of the maintainers.
> >>>
> >>> It'd be very nice indeed.
> >>>
> >>>>
> >>>> One potential reason is that we used to store a zswap header
> >>>> containing the swap entry in the compressed page for writeback
> >>>> purposes, but we don't do that anymore. Maybe we wanted to be able to
> >>>> handle the case where an incompressible page would exceed PAGE_SIZE
> >>>> because of that?
> >>>
> >>> It could be hmm. I didn't study the old zswap architecture too much,
> >>> but it has been 2 * PAGE_SIZE since the time zswap was first merged
> >>> last I checked.
> >>> I'm not 100% comfortable ACK-ing the undoing of something that looks
> >>> so intentional, but FTR, AFAICT, this looks correct to me.
> >>
> >> Right, there is no history explaining why we needed 2 pages.
> >> But from the current code, only one page is clearly needed, and no
> >> problems were found in the kernel build stress testing.
> >
> > Could you try manually stressing the compression with data that
> > doesn't compress at all (i.e. dlen == PAGE_SIZE)? I want to make sure
> > that this case is specifically handled. I think using data from
> > /dev/random will do that but please double check that dlen ==
> > PAGE_SIZE.
>
> I just did the same kernel build testing, indeed there are a few cases
> that output dlen == PAGE_SIZE.
>
> bpftrace -e 'k:zpool_malloc {@[(uint32)arg1==4096]=count()}'
>
> @[1]: 2
> @[0]: 12011430

That's very useful information, thanks for testing that. Please
include this in the commit log. Please also include the fact that we
used to store a zswap header with the compressed page but don't do
that anymore, which *may* be the reason why this was needed back then.

I still want someone who knows the history to Ack this, but FWIW it
looks correct to me, so low-key:
Reviewed-by: Yosry Ahmed <[email protected]>

2023-12-14 18:34:53

by Yosry Ahmed

Subject: Re: [PATCH 2/5] mm/zswap: change dstmem size to one page

[..]
>
> I think we shouldn't put such poorly compressed output into zswap.
> Maybe it's better to return early in these cases, when the compression
> ratio is below a threshold ratio that can be tuned by the user?

We have something similar at Google, but because we use zswap without
a backing swapfile, we make those pages unevictable. For the upstream
code, the pages will go to a backing swapfile, which arguably violates
the LRU ordering, but may be the correct thing to do. There was a
recent upstream attempt to solidify storing those incompressible pages
in zswap in their uncompressed form to retain the LRU ordering.

If you want, feel free to start a discussion about this separately,
it's out of context for this patch series.

Thanks!

2023-12-14 20:29:47

by Nhat Pham

Subject: Re: [PATCH 2/5] mm/zswap: change dstmem size to one page

On Thu, Dec 14, 2023 at 10:30 AM Yosry Ahmed <[email protected]> wrote:
>
> On Thu, Dec 14, 2023 at 5:57 AM Chengming Zhou
> <[email protected]> wrote:
> >
> > On 2023/12/14 21:37, Yosry Ahmed wrote:
> > > On Thu, Dec 14, 2023 at 5:33 AM Chengming Zhou
> > > <[email protected]> wrote:
> > >>
> > >> On 2023/12/14 08:18, Nhat Pham wrote:
> > >>> On Wed, Dec 13, 2023 at 3:34 PM Yosry Ahmed <[email protected]> wrote:
> > >>>>
> > >>>> On Tue, Dec 12, 2023 at 8:18 PM Chengming Zhou
> > >>>> <[email protected]> wrote:
> > >>>>>
> > >>>>> Change the dstmem size from 2 * PAGE_SIZE to only one page since
> > >>>>> we only need at most one page when compress, and the "dlen" is also
> > >>>>> PAGE_SIZE in acomp_request_set_params(). If the output size > PAGE_SIZE
> > >>>>> we don't wanna store the output in zswap anyway.
> > >>>>>
> > >>>>> So change it to one page, and delete the stale comment.
> > >>>>
> > >>>> I couldn't find the history of why we needed 2 * PAGE_SIZE, it would
> > >>>> be nice if someone has the context, perhaps one of the maintainers.
> > >>>
> > >>> It'd be very nice indeed.
> > >>>
> > >>>>
> > >>>> One potential reason is that we used to store a zswap header
> > >>>> containing the swap entry in the compressed page for writeback
> > >>>> purposes, but we don't do that anymore. Maybe we wanted to be able to
> > >>>> handle the case where an incompressible page would exceed PAGE_SIZE
> > >>>> because of that?
> > >>>
> > >>> It could be hmm. I didn't study the old zswap architecture too much,
> > >>> but it has been 2 * PAGE_SIZE since the time zswap was first merged
> > >>> last I checked.
> > >>> I'm not 100% comfortable ACK-ing the undoing of something that looks
> > >>> so intentional, but FTR, AFAICT, this looks correct to me.
> > >>
> > >> Right, there is no history explaining why we needed 2 pages.
> > >> But from the current code, only one page is clearly needed, and no
> > >> problems were found in the kernel build stress testing.
> > >
> > > Could you try manually stressing the compression with data that
> > > doesn't compress at all (i.e. dlen == PAGE_SIZE)? I want to make sure
> > > that this case is specifically handled. I think using data from
> > > /dev/random will do that but please double check that dlen ==
> > > PAGE_SIZE.

FWIW, zsmalloc supports the storing of pages that are PAGE_SIZE in
length, so a use case is probably there (although it could be for
ZRAM). We tested it during the storing-uncompressed-pages patch.
Architecturally, it seems that zswap just lets the backend allocator
handle the rejection of compressed objects that are too large, and the
compressor to reject pages that are too poorly compressed.
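
A toy model of that layering, for illustration only: the names below
(compress_into, ToyAllocator, store_page) are hypothetical and do not
correspond to kernel functions, zlib stands in for the real compressor,
and PAGE_SIZE = 4096 is assumed. The compressor is given a one-page
destination and fails if the output does not fit; the allocator
separately decides whether it accepts an object of the resulting size.

```python
import os
import zlib
from typing import Optional

PAGE_SIZE = 4096  # assumed page size


def compress_into(page: bytes, capacity: int) -> Optional[bytes]:
    """Stand-in compressor: fail (None) when output exceeds the destination."""
    out = zlib.compress(page)
    return out if len(out) <= capacity else None


class ToyAllocator:
    """Stand-in backend that rejects objects above its own size limit."""

    def __init__(self, max_object: int) -> None:
        self.max_object = max_object
        self.objects = []

    def malloc(self, data: bytes) -> bool:
        if len(data) > self.max_object:
            return False
        self.objects.append(data)
        return True


def store_page(page: bytes, pool: ToyAllocator) -> str:
    """Return which layer, if any, rejected the page."""
    out = compress_into(page, PAGE_SIZE)  # one-page destination buffer
    if out is None:
        return "rejected by compressor"
    if not pool.malloc(out):
        return "rejected by allocator"
    return "stored"


pool = ToyAllocator(max_object=PAGE_SIZE)
print(store_page(bytes(PAGE_SIZE), pool))       # compressible page
print(store_page(os.urandom(PAGE_SIZE), pool))  # incompressible page
```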

> >
> > I just did the same kernel build testing, indeed there are a few cases
> > that output dlen == PAGE_SIZE.
> >
> > bpftrace -e 'k:zpool_malloc {@[(uint32)arg1==4096]=count()}'
> >
> > @[1]: 2
> > @[0]: 12011430
>
> That's very useful information, thanks for testing that. Please
> include this in the commit log. Please also include the fact that we
> used to store a zswap header with the compressed page but don't do
> that anymore, which *may* be the reason why this was needed back then.
>
> I still want someone who knows the history to Ack this, but FWIW it
> looks correct to me, so low-key:
> Reviewed-by: Yosry Ahmed <[email protected]>

Anyway:
Reviewed-by: Nhat Pham <[email protected]>