2024-03-01 19:29:02

by Barry Song

[permalink] [raw]
Subject: [PATCH v7] crypto: scompress: remove memcpy if sg_nents is 1 and pages are lowmem

From: Barry Song <[email protected]>

while sg_nents is 1, which is always true for the current kernel
as the only user - zswap is this case, we might have a chance to
remove memcpy, thus improve the performance.
Though sg_nents is 1, its buffer might cross two pages. If those
pages are highmem, we have no cheap way to map them to contiguous
virtual address because kmap doesn't support more than one page
(kmap single higmem page could be still expensive for tlb) and
vmap is expensive.
So we also test and enure page is not highmem in order to safely
use page_to_virt before removing the memcpy. The good news is
that in the most majority of cases, we are lowmem, and we are
always lowmem in those modern and popular hardware.

Cc: Johannes Weiner <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Barry Song <[email protected]>
Tested-by: Chengming Zhou <[email protected]>
---
-v7:
* fix the problem pointed out by Herbert - flush all pages if dst
is longer than one page.

crypto/scompress.c | 36 +++++++++++++++++++++++++++++-------
1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/crypto/scompress.c b/crypto/scompress.c
index b108a30a7600..60bbb7ea4060 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
struct crypto_scomp *scomp = *tfm_ctx;
void **ctx = acomp_request_ctx(req);
struct scomp_scratch *scratch;
+ void *src, *dst;
unsigned int dlen;
int ret;

@@ -134,13 +135,25 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
scratch = raw_cpu_ptr(&scomp_scratch);
spin_lock(&scratch->lock);

- scatterwalk_map_and_copy(scratch->src, req->src, 0, req->slen, 0);
+ if (sg_nents(req->src) == 1 && !PageHighMem(sg_page(req->src))) {
+ src = page_to_virt(sg_page(req->src)) + req->src->offset;
+ } else {
+ scatterwalk_map_and_copy(scratch->src, req->src, 0,
+ req->slen, 0);
+ src = scratch->src;
+ }
+
+ if (req->dst && sg_nents(req->dst) == 1 && !PageHighMem(sg_page(req->dst)))
+ dst = page_to_virt(sg_page(req->dst)) + req->dst->offset;
+ else
+ dst = scratch->dst;
+
if (dir)
- ret = crypto_scomp_compress(scomp, scratch->src, req->slen,
- scratch->dst, &req->dlen, *ctx);
+ ret = crypto_scomp_compress(scomp, src, req->slen,
+ dst, &req->dlen, *ctx);
else
- ret = crypto_scomp_decompress(scomp, scratch->src, req->slen,
- scratch->dst, &req->dlen, *ctx);
+ ret = crypto_scomp_decompress(scomp, src, req->slen,
+ dst, &req->dlen, *ctx);
if (!ret) {
if (!req->dst) {
req->dst = sgl_alloc(req->dlen, GFP_ATOMIC, NULL);
@@ -152,8 +165,17 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
ret = -ENOSPC;
goto out;
}
- scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
- 1);
+ if (dst == scratch->dst) {
+ scatterwalk_map_and_copy(scratch->dst, req->dst, 0,
+ req->dlen, 1);
+ } else {
+ int nr_pages = DIV_ROUND_UP(req->dst->offset + req->dlen, PAGE_SIZE);
+ int i;
+ struct page *dst_page = sg_page(req->dst);
+
+ for (i = 0; i < nr_pages; i++)
+ flush_dcache_page(dst_page + i);
+ }
}
out:
spin_unlock(&scratch->lock);
--
2.34.1



2024-03-08 11:26:33

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v7] crypto: scompress: remove memcpy if sg_nents is 1 and pages are lowmem

On Sat, Mar 02, 2024 at 08:27:45AM +1300, Barry Song wrote:
> From: Barry Song <[email protected]>
>
> while sg_nents is 1, which is always true for the current kernel
> as the only user - zswap is this case, we might have a chance to
> remove memcpy, thus improve the performance.
> Though sg_nents is 1, its buffer might cross two pages. If those
> pages are highmem, we have no cheap way to map them to contiguous
> virtual address because kmap doesn't support more than one page
> (kmap single higmem page could be still expensive for tlb) and
> vmap is expensive.
> So we also test and enure page is not highmem in order to safely
> use page_to_virt before removing the memcpy. The good news is
> that in the most majority of cases, we are lowmem, and we are
> always lowmem in those modern and popular hardware.
>
> Cc: Johannes Weiner <[email protected]>
> Cc: Nhat Pham <[email protected]>
> Cc: Yosry Ahmed <[email protected]>
> Signed-off-by: Barry Song <[email protected]>
> Tested-by: Chengming Zhou <[email protected]>
> ---
> -v7:
> * fix the problem pointed out by Herbert - flush all pages if dst
> is longer than one page.
>
> crypto/scompress.c | 36 +++++++++++++++++++++++++++++-------
> 1 file changed, 29 insertions(+), 7 deletions(-)

Patch applied. Thanks.
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2024-03-18 00:14:10

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v7] crypto: scompress: remove memcpy if sg_nents is 1 and pages are lowmem

On 3/17/24 16:48, Barry Song wrote:
> On Mon, Mar 18, 2024 at 7:13 AM Guenter Roeck <[email protected]> wrote:
>>
>> Hi,
>>
>> On Sat, Mar 02, 2024 at 08:27:45AM +1300, Barry Song wrote:
>> [ ... ]
>>> @@ -152,8 +165,17 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
>>> ret = -ENOSPC;
>>> goto out;
>>> }
>>> - scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
>>> - 1);
>>> + if (dst == scratch->dst) {
>>> + scatterwalk_map_and_copy(scratch->dst, req->dst, 0,
>>> + req->dlen, 1);
>>> + } else {
>>> + int nr_pages = DIV_ROUND_UP(req->dst->offset + req->dlen, PAGE_SIZE);
>>> + int i;
>>> + struct page *dst_page = sg_page(req->dst);
>>> +
>>> + for (i = 0; i < nr_pages; i++)
>>> + flush_dcache_page(dst_page + i);
>>
>> flush_dcache_page() is an empty macro on some architectures
>> such as xtensa. This results in
>
> Hi Guenter,
>
> this is a bug of xtensa, could you test the patch:

Thanks for the update.

> https://lore.kernel.org/all/[email protected]/
>

That doesn't build for me.

CC arch/xtensa/kernel/asm-offsets.s
In file included from ./include/linux/highmem.h:8,
from ./include/linux/bvec.h:10,
from ./include/linux/blk_types.h:10,
from ./include/linux/writeback.h:13,
from ./include/linux/memcontrol.h:23,
from ./include/linux/swap.h:9,
from ./include/linux/suspend.h:5,
from arch/xtensa/kernel/asm-offsets.c:24:
./include/linux/cacheflush.h:9:5: error: "ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE" is not defined, evaluates to 0 [-Werror=undef]
9 | #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE

A similar works for loongarch, so I don't know what is wrong.
Maybe some context patch is missing.

Guenter


2024-03-18 00:21:29

by Barry Song

[permalink] [raw]
Subject: Re: [PATCH v7] crypto: scompress: remove memcpy if sg_nents is 1 and pages are lowmem

On Mon, Mar 18, 2024 at 8:14 AM Guenter Roeck <[email protected]> wrote:
>
> On 3/17/24 16:48, Barry Song wrote:
> > On Mon, Mar 18, 2024 at 7:13 AM Guenter Roeck <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> On Sat, Mar 02, 2024 at 08:27:45AM +1300, Barry Song wrote:
> >> [ ... ]
> >>> @@ -152,8 +165,17 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >>> ret = -ENOSPC;
> >>> goto out;
> >>> }
> >>> - scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
> >>> - 1);
> >>> + if (dst == scratch->dst) {
> >>> + scatterwalk_map_and_copy(scratch->dst, req->dst, 0,
> >>> + req->dlen, 1);
> >>> + } else {
> >>> + int nr_pages = DIV_ROUND_UP(req->dst->offset + req->dlen, PAGE_SIZE);
> >>> + int i;
> >>> + struct page *dst_page = sg_page(req->dst);
> >>> +
> >>> + for (i = 0; i < nr_pages; i++)
> >>> + flush_dcache_page(dst_page + i);
> >>
> >> flush_dcache_page() is an empty macro on some architectures
> >> such as xtensa. This results in
> >
> > Hi Guenter,
> >
> > this is a bug of xtensa, could you test the patch:
>
> Thanks for the update.
>
> > https://lore.kernel.org/all/[email protected]/
> >
>
> That doesn't build for me.
>
> CC arch/xtensa/kernel/asm-offsets.s
> In file included from ./include/linux/highmem.h:8,
> from ./include/linux/bvec.h:10,
> from ./include/linux/blk_types.h:10,
> from ./include/linux/writeback.h:13,
> from ./include/linux/memcontrol.h:23,
> from ./include/linux/swap.h:9,
> from ./include/linux/suspend.h:5,
> from arch/xtensa/kernel/asm-offsets.c:24:
> ./include/linux/cacheflush.h:9:5: error: "ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE" is not defined, evaluates to 0 [-Werror=undef]
> 9 | #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
>
> A similar works for loongarch, so I don't know what is wrong.
> Maybe some context patch is missing.

this is weird as include/asm-generic/cacheflush.h will define it to 0

#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
static inline void flush_dcache_page(struct page *page)
{
}

#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 0
#endif

Maybe somewhere else also needs to be fixed.
Can you tell me your toolchain version and toolchain name? and defconfig name?

>
> Guenter

Thanks
Barry

2024-03-18 00:33:59

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v7] crypto: scompress: remove memcpy if sg_nents is 1 and pages are lowmem

On 3/17/24 17:21, Barry Song wrote:
> On Mon, Mar 18, 2024 at 8:14 AM Guenter Roeck <[email protected]> wrote:
>>
>> On 3/17/24 16:48, Barry Song wrote:
>>> On Mon, Mar 18, 2024 at 7:13 AM Guenter Roeck <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On Sat, Mar 02, 2024 at 08:27:45AM +1300, Barry Song wrote:
>>>> [ ... ]
>>>>> @@ -152,8 +165,17 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
>>>>> ret = -ENOSPC;
>>>>> goto out;
>>>>> }
>>>>> - scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
>>>>> - 1);
>>>>> + if (dst == scratch->dst) {
>>>>> + scatterwalk_map_and_copy(scratch->dst, req->dst, 0,
>>>>> + req->dlen, 1);
>>>>> + } else {
>>>>> + int nr_pages = DIV_ROUND_UP(req->dst->offset + req->dlen, PAGE_SIZE);
>>>>> + int i;
>>>>> + struct page *dst_page = sg_page(req->dst);
>>>>> +
>>>>> + for (i = 0; i < nr_pages; i++)
>>>>> + flush_dcache_page(dst_page + i);
>>>>
>>>> flush_dcache_page() is an empty macro on some architectures
>>>> such as xtensa. This results in
>>>
>>> Hi Guenter,
>>>
>>> this is a bug of xtensa, could you test the patch:
>>
>> Thanks for the update.
>>
>>> https://lore.kernel.org/all/[email protected]/
>>>
>>
>> That doesn't build for me.
>>
>> CC arch/xtensa/kernel/asm-offsets.s
>> In file included from ./include/linux/highmem.h:8,
>> from ./include/linux/bvec.h:10,
>> from ./include/linux/blk_types.h:10,
>> from ./include/linux/writeback.h:13,
>> from ./include/linux/memcontrol.h:23,
>> from ./include/linux/swap.h:9,
>> from ./include/linux/suspend.h:5,
>> from arch/xtensa/kernel/asm-offsets.c:24:
>> ./include/linux/cacheflush.h:9:5: error: "ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE" is not defined, evaluates to 0 [-Werror=undef]
>> 9 | #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
>>
>> A similar works for loongarch, so I don't know what is wrong.
>> Maybe some context patch is missing.
>
> this is weird as include/asm-generic/cacheflush.h will define it to 0
>
> #ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
> static inline void flush_dcache_page(struct page *page)
> {
> }
>
> #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 0
> #endif
>
> Maybe somewhere else also needs to be fixed.
> Can you tell me your toolchain version and toolchain name? and defconfig name?
>

config is allmodconfig - GCC_PLUGINS.

Toolchains are self-built from gcc source using the buildall system. I tried
gcc 11.4, 12.3, and 13.2. I don't think that matters, though, since
"asm-generic/cacheflush.h" isn't included from arch/xtensa. The diff below
seems to fix the problem, but I have not fully tested it.

Guenter

---
diff --git a/arch/xtensa/include/asm/cacheflush.h b/arch/xtensa/include/asm/cacheflush.h
index 11efdc8eb262..62662919cbbc 100644
--- a/arch/xtensa/include/asm/cacheflush.h
+++ b/arch/xtensa/include/asm/cacheflush.h
@@ -183,4 +183,6 @@ extern void copy_from_user_page(struct vm_area_struct*, struct page*,

#endif

+#include <asm-generic/cacheflush.h>
+
#endif /* _XTENSA_CACHEFLUSH_H */