2019-06-03 09:07:41

by Maninder Singh

[permalink] [raw]
Subject: [PATCH 0/4] zstd: reduce stack usage

This patch set reduces stack usage for zstd code, because target like ARM has
limited 8KB kernel stack, which is getting overflowed due to hight stack usage
of zstd code with call flow like:

....
....
(FSE_compress_usingCTable) from (HUF_compressWeights_wksp+0x140/0x200)
(HUF_compressWeights_wksp) from (HUF_writeCTable_wksp+0xdc/0x1c8)
(HUF_writeCTable_wksp) from (HUF_compress4X_repeat+0x214/0x450)
(HUF_compress4X_repeat) from (ZSTD_compressBlock_internal+0x228/0x135c)
(ZSTD_compressBlock_internal) from (ZSTD_compressContinue_internal+0x1f8/0x3c8)
(ZSTD_compressContinue_internal) from (ZSTD_compressCCtx+0xc0/0x1cc)
(ZSTD_compressCCtx) from (zstd_compress+0x90/0xa8)
(zstd_compress) from (crypto_compress+0x2c/0x34)
(crypto_compress) from (zcomp_compress+0x3c/0x44)
(zcomp_compress) from (zram_bvec_rw+0x2f8/0xa7c)
(zram_bvec_rw) from [<c03e1e58>] (zram_rw_page+0x104/0x170)
(zram_rw_page) from [<c01d3f94>] (bdev_write_page+0x80/0xb4)
(bdev_write_page) from [<c017dc9c>] (__swap_writepage+0x160/0x29c)
(__swap_writepage) from [<c017de14>] (swap_writepage+0x3c/0x58)
(swap_writepage) from [<c014f88c>] (shrink_page_list+0x788/0xae0)
(shrink_page_list) from [<c01502e0>] (shrink_inactive_list+0x210/0x4a8)
(shrink_inactive_list) from [<c0150dd4>] (shrink_zone+0x53c/0x7c0)
(shrink_zone) from [<c0151354>] (try_to_free_pages+0x2fc/0x7cc)
(try_to_free_pages) from [<c0144c60>] (__alloc_pages_nodemask+0x534/0x91c)
(__alloc_pages_nodemask) from [<c01393ac>] (pagecache_get_page+0xe0/0x1d8)
....
....

Maninder Singh, Vaneet Narang (4):
zstd: pass pointer rathen than structure to functions
zstd: use U16 data type for rankPos
zstd: move params structure to global variable to reduce stack
usage
zstd: change structure variable from int to char

crypto/zstd.c | 11 ++++---
include/linux/zstd.h | 16 +++++-----
lib/zstd/compress.c | 85 +++++++++++++++++++++++++------------------------
lib/zstd/decompress.c | 2 +-
lib/zstd/huf_compress.c | 4 +--
5 files changed, 62 insertions(+), 56 deletions(-)

--
2.7.4


2019-06-03 09:07:41

by Maninder Singh

[permalink] [raw]
Subject: [PATCH 1/4] zstd: pass pointer rathen than structure to functions

currently params structure is passed in all functions, which increases
stack usage in all the function and lead to stack overflow on target like
ARM with kernel stack size of 8 KB so better to pass pointer.

Checked for ARM:

Original Patched
Call FLow Size: 1264 1040
....
(HUF_sort) -> 296
(HUF_buildCTable_wksp) -> 144
(HUF_compress4X_repeat) -> 88
(ZSTD_compressBlock_internal) -> 200
(ZSTD_compressContinue_internal)-> 136 -> 88
(ZSTD_compressCCtx) -> 192 -> 64
(zstd_compress) -> 144 -> 96
(crypto_compress) -> 32
(zcomp_compress) -> 32
....

Signed-off-by: Maninder Singh <[email protected]>
Signed-off-by: Vaneet Narang <[email protected]>

Fixing, Line 211: Using & instead of && makes this somewhat difficult to read.
It's hard to believe this is a performance optimization.
Signed-off-by: Joe Perches <[email protected]>
---
https://lkml.org/lkml/2019/5/10/539 <[email protected]>

crypto/zstd.c | 2 +-
include/linux/zstd.h | 10 +++----
lib/zstd/compress.c | 85 +++++++++++++++++++++++++++-------------------------
3 files changed, 50 insertions(+), 47 deletions(-)

diff --git a/crypto/zstd.c b/crypto/zstd.c
index 2c04055..4e9ff22 100644
--- a/crypto/zstd.c
+++ b/crypto/zstd.c
@@ -162,7 +162,7 @@ static int __zstd_compress(const u8 *src, unsigned int slen,
struct zstd_ctx *zctx = ctx;
const ZSTD_parameters params = zstd_params();

- out_len = ZSTD_compressCCtx(zctx->cctx, dst, *dlen, src, slen, params);
+ out_len = ZSTD_compressCCtx(zctx->cctx, dst, *dlen, src, slen, &params);
if (ZSTD_isError(out_len))
return -EINVAL;
*dlen = out_len;
diff --git a/include/linux/zstd.h b/include/linux/zstd.h
index 249575e..5103efa 100644
--- a/include/linux/zstd.h
+++ b/include/linux/zstd.h
@@ -254,7 +254,7 @@ ZSTD_CCtx *ZSTD_initCCtx(void *workspace, size_t workspaceSize);
* ZSTD_isError().
*/
size_t ZSTD_compressCCtx(ZSTD_CCtx *ctx, void *dst, size_t dstCapacity,
- const void *src, size_t srcSize, ZSTD_parameters params);
+ const void *src, size_t srcSize, const ZSTD_parameters *params);

/**
* ZSTD_DCtxWorkspaceBound() - amount of memory needed to initialize a ZSTD_DCtx
@@ -324,7 +324,7 @@ size_t ZSTD_decompressDCtx(ZSTD_DCtx *ctx, void *dst, size_t dstCapacity,
*/
size_t ZSTD_compress_usingDict(ZSTD_CCtx *ctx, void *dst, size_t dstCapacity,
const void *src, size_t srcSize, const void *dict, size_t dictSize,
- ZSTD_parameters params);
+ const ZSTD_parameters *params);

/**
* ZSTD_decompress_usingDict() - decompress src into dst using a dictionary
@@ -381,7 +381,7 @@ typedef struct ZSTD_CDict_s ZSTD_CDict;
* Return: The digested dictionary emplaced into workspace.
*/
ZSTD_CDict *ZSTD_initCDict(const void *dictBuffer, size_t dictSize,
- ZSTD_parameters params, void *workspace, size_t workspaceSize);
+ const ZSTD_parameters *params, void *workspace, size_t workspaceSize);

/**
* ZSTD_compress_usingCDict() - compress src into dst using a ZSTD_CDict
@@ -552,7 +552,7 @@ typedef struct ZSTD_CStream_s ZSTD_CStream;
*
* Return: The zstd streaming compression context.
*/
-ZSTD_CStream *ZSTD_initCStream(ZSTD_parameters params,
+ZSTD_CStream *ZSTD_initCStream(const ZSTD_parameters *params,
unsigned long long pledgedSrcSize, void *workspace,
size_t workspaceSize);

@@ -1006,7 +1006,7 @@ size_t ZSTD_compressBegin(ZSTD_CCtx *cctx, int compressionLevel);
size_t ZSTD_compressBegin_usingDict(ZSTD_CCtx *cctx, const void *dict,
size_t dictSize, int compressionLevel);
size_t ZSTD_compressBegin_advanced(ZSTD_CCtx *cctx, const void *dict,
- size_t dictSize, ZSTD_parameters params,
+ size_t dictSize, const ZSTD_parameters *params,
unsigned long long pledgedSrcSize);
size_t ZSTD_copyCCtx(ZSTD_CCtx *cctx, const ZSTD_CCtx *preparedCCtx,
unsigned long long pledgedSrcSize);
diff --git a/lib/zstd/compress.c b/lib/zstd/compress.c
index 5e0b670..306e31b 100644
--- a/lib/zstd/compress.c
+++ b/lib/zstd/compress.c
@@ -206,18 +206,21 @@ ZSTD_compressionParameters ZSTD_adjustCParams(ZSTD_compressionParameters cPar, u
return cPar;
}

-static U32 ZSTD_equivalentParams(ZSTD_parameters param1, ZSTD_parameters param2)
+static U32 ZSTD_equivalentParams(const ZSTD_parameters *param1, const ZSTD_parameters *param2)
{
- return (param1.cParams.hashLog == param2.cParams.hashLog) & (param1.cParams.chainLog == param2.cParams.chainLog) &
- (param1.cParams.strategy == param2.cParams.strategy) & ((param1.cParams.searchLength == 3) == (param2.cParams.searchLength == 3));
+ return (param1->cParams.hashLog == param2->cParams.hashLog) &&
+ (param1->cParams.chainLog == param2->cParams.chainLog) &&
+ (param1->cParams.strategy == param2->cParams.strategy) &&
+ (param1->cParams.searchLength == 3) &&
+ (param1->cParams.searchLength == param2->cParams.searchLength);
}

/*! ZSTD_continueCCtx() :
reuse CCtx without reset (note : requires no dictionary) */
-static size_t ZSTD_continueCCtx(ZSTD_CCtx *cctx, ZSTD_parameters params, U64 frameContentSize)
+static size_t ZSTD_continueCCtx(ZSTD_CCtx *cctx, const ZSTD_parameters *params, U64 frameContentSize)
{
U32 const end = (U32)(cctx->nextSrc - cctx->base);
- cctx->params = params;
+ cctx->params = *params;
cctx->frameContentSize = frameContentSize;
cctx->lowLimit = end;
cctx->dictLimit = end;
@@ -239,23 +242,23 @@ typedef enum { ZSTDcrp_continue, ZSTDcrp_noMemset, ZSTDcrp_fullReset } ZSTD_comp

/*! ZSTD_resetCCtx_advanced() :
note : `params` must be validated */
-static size_t ZSTD_resetCCtx_advanced(ZSTD_CCtx *zc, ZSTD_parameters params, U64 frameContentSize, ZSTD_compResetPolicy_e const crp)
+static size_t ZSTD_resetCCtx_advanced(ZSTD_CCtx *zc, const ZSTD_parameters *params, U64 frameContentSize, ZSTD_compResetPolicy_e const crp)
{
if (crp == ZSTDcrp_continue)
- if (ZSTD_equivalentParams(params, zc->params)) {
+ if (ZSTD_equivalentParams(params, &zc->params)) {
zc->flagStaticTables = 0;
zc->flagStaticHufTable = HUF_repeat_none;
return ZSTD_continueCCtx(zc, params, frameContentSize);
}

{
- size_t const blockSize = MIN(ZSTD_BLOCKSIZE_ABSOLUTEMAX, (size_t)1 << params.cParams.windowLog);
- U32 const divider = (params.cParams.searchLength == 3) ? 3 : 4;
+ size_t const blockSize = MIN(ZSTD_BLOCKSIZE_ABSOLUTEMAX, (size_t)1 << params->cParams.windowLog);
+ U32 const divider = (params->cParams.searchLength == 3) ? 3 : 4;
size_t const maxNbSeq = blockSize / divider;
size_t const tokenSpace = blockSize + 11 * maxNbSeq;
- size_t const chainSize = (params.cParams.strategy == ZSTD_fast) ? 0 : (1 << params.cParams.chainLog);
- size_t const hSize = ((size_t)1) << params.cParams.hashLog;
- U32 const hashLog3 = (params.cParams.searchLength > 3) ? 0 : MIN(ZSTD_HASHLOG3_MAX, params.cParams.windowLog);
+ size_t const chainSize = (params->cParams.strategy == ZSTD_fast) ? 0 : (1 << params->cParams.chainLog);
+ size_t const hSize = ((size_t)1) << params->cParams.hashLog;
+ U32 const hashLog3 = (params->cParams.searchLength > 3) ? 0 : MIN(ZSTD_HASHLOG3_MAX, params->cParams.windowLog);
size_t const h3Size = ((size_t)1) << hashLog3;
size_t const tableSpace = (chainSize + hSize + h3Size) * sizeof(U32);
void *ptr;
@@ -265,7 +268,7 @@ static size_t ZSTD_resetCCtx_advanced(ZSTD_CCtx *zc, ZSTD_parameters params, U64
size_t const optSpace = ((MaxML + 1) + (MaxLL + 1) + (MaxOff + 1) + (1 << Litbits)) * sizeof(U32) +
(ZSTD_OPT_NUM + 1) * (sizeof(ZSTD_match_t) + sizeof(ZSTD_optimal_t));
size_t const neededSpace = tableSpace + (256 * sizeof(U32)) /* huffTable */ + tokenSpace +
- (((params.cParams.strategy == ZSTD_btopt) || (params.cParams.strategy == ZSTD_btopt2)) ? optSpace : 0);
+ (((params->cParams.strategy == ZSTD_btopt) || (params->cParams.strategy == ZSTD_btopt2)) ? optSpace : 0);
if (zc->workSpaceSize < neededSpace) {
ZSTD_free(zc->workSpace, zc->customMem);
zc->workSpace = ZSTD_malloc(neededSpace, zc->customMem);
@@ -294,7 +297,7 @@ static size_t ZSTD_resetCCtx_advanced(ZSTD_CCtx *zc, ZSTD_parameters params, U64
zc->dictBase = NULL;
zc->dictLimit = 0;
zc->lowLimit = 0;
- zc->params = params;
+ zc->params = *params;
zc->blockSize = blockSize;
zc->frameContentSize = frameContentSize;
{
@@ -303,7 +306,7 @@ static size_t ZSTD_resetCCtx_advanced(ZSTD_CCtx *zc, ZSTD_parameters params, U64
zc->rep[i] = repStartValue[i];
}

- if ((params.cParams.strategy == ZSTD_btopt) || (params.cParams.strategy == ZSTD_btopt2)) {
+ if ((params->cParams.strategy == ZSTD_btopt) || (params->cParams.strategy == ZSTD_btopt2)) {
zc->seqStore.litFreq = (U32 *)ptr;
zc->seqStore.litLengthFreq = zc->seqStore.litFreq + (1 << Litbits);
zc->seqStore.matchLengthFreq = zc->seqStore.litLengthFreq + (MaxLL + 1);
@@ -354,7 +357,7 @@ size_t ZSTD_copyCCtx(ZSTD_CCtx *dstCCtx, const ZSTD_CCtx *srcCCtx, unsigned long
{
ZSTD_parameters params = srcCCtx->params;
params.fParams.contentSizeFlag = (pledgedSrcSize > 0);
- ZSTD_resetCCtx_advanced(dstCCtx, params, pledgedSrcSize, ZSTDcrp_noMemset);
+ ZSTD_resetCCtx_advanced(dstCCtx, &params, pledgedSrcSize, ZSTDcrp_noMemset);
}

/* copy tables */
@@ -2428,16 +2431,16 @@ static size_t ZSTD_compress_generic(ZSTD_CCtx *cctx, void *dst, size_t dstCapaci
return op - ostart;
}

-static size_t ZSTD_writeFrameHeader(void *dst, size_t dstCapacity, ZSTD_parameters params, U64 pledgedSrcSize, U32 dictID)
+static size_t ZSTD_writeFrameHeader(void *dst, size_t dstCapacity, ZSTD_parameters *params, U64 pledgedSrcSize, U32 dictID)
{
BYTE *const op = (BYTE *)dst;
U32 const dictIDSizeCode = (dictID > 0) + (dictID >= 256) + (dictID >= 65536); /* 0-3 */
- U32 const checksumFlag = params.fParams.checksumFlag > 0;
- U32 const windowSize = 1U << params.cParams.windowLog;
- U32 const singleSegment = params.fParams.contentSizeFlag && (windowSize >= pledgedSrcSize);
- BYTE const windowLogByte = (BYTE)((params.cParams.windowLog - ZSTD_WINDOWLOG_ABSOLUTEMIN) << 3);
+ U32 const checksumFlag = params->fParams.checksumFlag > 0;
+ U32 const windowSize = 1U << params->cParams.windowLog;
+ U32 const singleSegment = params->fParams.contentSizeFlag && (windowSize >= pledgedSrcSize);
+ BYTE const windowLogByte = (BYTE)((params->cParams.windowLog - ZSTD_WINDOWLOG_ABSOLUTEMIN) << 3);
U32 const fcsCode =
- params.fParams.contentSizeFlag ? (pledgedSrcSize >= 256) + (pledgedSrcSize >= 65536 + 256) + (pledgedSrcSize >= 0xFFFFFFFFU) : 0; /* 0-3 */
+ params->fParams.contentSizeFlag ? (pledgedSrcSize >= 256) + (pledgedSrcSize >= 65536 + 256) + (pledgedSrcSize >= 0xFFFFFFFFU) : 0; /* 0-3 */
BYTE const frameHeaderDecriptionByte = (BYTE)(dictIDSizeCode + (checksumFlag << 2) + (singleSegment << 5) + (fcsCode << 6));
size_t pos;

@@ -2496,7 +2499,7 @@ static size_t ZSTD_compressContinue_internal(ZSTD_CCtx *cctx, void *dst, size_t
return ERROR(stage_wrong); /* missing init (ZSTD_compressBegin) */

if (frame && (cctx->stage == ZSTDcs_init)) {
- fhSize = ZSTD_writeFrameHeader(dst, dstCapacity, cctx->params, cctx->frameContentSize, cctx->dictID);
+ fhSize = ZSTD_writeFrameHeader(dst, dstCapacity, &cctx->params, cctx->frameContentSize, cctx->dictID);
if (ZSTD_isError(fhSize))
return fhSize;
dstCapacity -= fhSize;
@@ -2735,7 +2738,7 @@ static size_t ZSTD_compress_insertDictionary(ZSTD_CCtx *cctx, const void *dict,

/*! ZSTD_compressBegin_internal() :
* @return : 0, or an error code */
-static size_t ZSTD_compressBegin_internal(ZSTD_CCtx *cctx, const void *dict, size_t dictSize, ZSTD_parameters params, U64 pledgedSrcSize)
+static size_t ZSTD_compressBegin_internal(ZSTD_CCtx *cctx, const void *dict, size_t dictSize, const ZSTD_parameters *params, U64 pledgedSrcSize)
{
ZSTD_compResetPolicy_e const crp = dictSize ? ZSTDcrp_fullReset : ZSTDcrp_continue;
CHECK_F(ZSTD_resetCCtx_advanced(cctx, params, pledgedSrcSize, crp));
@@ -2744,17 +2747,17 @@ static size_t ZSTD_compressBegin_internal(ZSTD_CCtx *cctx, const void *dict, siz

/*! ZSTD_compressBegin_advanced() :
* @return : 0, or an error code */
-size_t ZSTD_compressBegin_advanced(ZSTD_CCtx *cctx, const void *dict, size_t dictSize, ZSTD_parameters params, unsigned long long pledgedSrcSize)
+size_t ZSTD_compressBegin_advanced(ZSTD_CCtx *cctx, const void *dict, size_t dictSize, const ZSTD_parameters *params, unsigned long long pledgedSrcSize)
{
/* compression parameters verification and optimization */
- CHECK_F(ZSTD_checkCParams(params.cParams));
+ CHECK_F(ZSTD_checkCParams(params->cParams));
return ZSTD_compressBegin_internal(cctx, dict, dictSize, params, pledgedSrcSize);
}

size_t ZSTD_compressBegin_usingDict(ZSTD_CCtx *cctx, const void *dict, size_t dictSize, int compressionLevel)
{
ZSTD_parameters const params = ZSTD_getParams(compressionLevel, 0, dictSize);
- return ZSTD_compressBegin_internal(cctx, dict, dictSize, params, 0);
+ return ZSTD_compressBegin_internal(cctx, dict, dictSize, &params, 0);
}

size_t ZSTD_compressBegin(ZSTD_CCtx *cctx, int compressionLevel) { return ZSTD_compressBegin_usingDict(cctx, NULL, 0, compressionLevel); }
@@ -2773,7 +2776,7 @@ static size_t ZSTD_writeEpilogue(ZSTD_CCtx *cctx, void *dst, size_t dstCapacity)

/* special case : empty frame */
if (cctx->stage == ZSTDcs_init) {
- fhSize = ZSTD_writeFrameHeader(dst, dstCapacity, cctx->params, 0, 0);
+ fhSize = ZSTD_writeFrameHeader(dst, dstCapacity, &cctx->params, 0, 0);
if (ZSTD_isError(fhSize))
return fhSize;
dstCapacity -= fhSize;
@@ -2816,19 +2819,19 @@ size_t ZSTD_compressEnd(ZSTD_CCtx *cctx, void *dst, size_t dstCapacity, const vo
}

static size_t ZSTD_compress_internal(ZSTD_CCtx *cctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize, const void *dict, size_t dictSize,
- ZSTD_parameters params)
+ const ZSTD_parameters *params)
{
CHECK_F(ZSTD_compressBegin_internal(cctx, dict, dictSize, params, srcSize));
return ZSTD_compressEnd(cctx, dst, dstCapacity, src, srcSize);
}

size_t ZSTD_compress_usingDict(ZSTD_CCtx *ctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize, const void *dict, size_t dictSize,
- ZSTD_parameters params)
+ const ZSTD_parameters *params)
{
return ZSTD_compress_internal(ctx, dst, dstCapacity, src, srcSize, dict, dictSize, params);
}

-size_t ZSTD_compressCCtx(ZSTD_CCtx *ctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize, ZSTD_parameters params)
+size_t ZSTD_compressCCtx(ZSTD_CCtx *ctx, void *dst, size_t dstCapacity, const void *src, size_t srcSize, const ZSTD_parameters *params)
{
return ZSTD_compress_internal(ctx, dst, dstCapacity, src, srcSize, NULL, 0, params);
}
@@ -2844,7 +2847,7 @@ struct ZSTD_CDict_s {

size_t ZSTD_CDictWorkspaceBound(ZSTD_compressionParameters cParams) { return ZSTD_CCtxWorkspaceBound(cParams) + ZSTD_ALIGN(sizeof(ZSTD_CDict)); }

-static ZSTD_CDict *ZSTD_createCDict_advanced(const void *dictBuffer, size_t dictSize, unsigned byReference, ZSTD_parameters params, ZSTD_customMem customMem)
+static ZSTD_CDict *ZSTD_createCDict_advanced(const void *dictBuffer, size_t dictSize, unsigned byReference, const ZSTD_parameters *params, ZSTD_customMem customMem)
{
if (!customMem.customAlloc || !customMem.customFree)
return NULL;
@@ -2890,7 +2893,7 @@ static ZSTD_CDict *ZSTD_createCDict_advanced(const void *dictBuffer, size_t dict
}
}

-ZSTD_CDict *ZSTD_initCDict(const void *dict, size_t dictSize, ZSTD_parameters params, void *workspace, size_t workspaceSize)
+ZSTD_CDict *ZSTD_initCDict(const void *dict, size_t dictSize, const ZSTD_parameters *params, void *workspace, size_t workspaceSize)
{
ZSTD_customMem const stackMem = ZSTD_initStack(workspace, workspaceSize);
return ZSTD_createCDict_advanced(dict, dictSize, 1, params, stackMem);
@@ -2918,7 +2921,7 @@ size_t ZSTD_compressBegin_usingCDict(ZSTD_CCtx *cctx, const ZSTD_CDict *cdict, u
else {
ZSTD_parameters params = cdict->refContext->params;
params.fParams.contentSizeFlag = (pledgedSrcSize > 0);
- CHECK_F(ZSTD_compressBegin_advanced(cctx, NULL, 0, params, pledgedSrcSize));
+ CHECK_F(ZSTD_compressBegin_advanced(cctx, NULL, 0, &params, pledgedSrcSize));
}
return 0;
}
@@ -3031,7 +3034,7 @@ static size_t ZSTD_resetCStream_internal(ZSTD_CStream *zcs, unsigned long long p
if (zcs->cdict)
CHECK_F(ZSTD_compressBegin_usingCDict(zcs->cctx, zcs->cdict, pledgedSrcSize))
else
- CHECK_F(ZSTD_compressBegin_advanced(zcs->cctx, NULL, 0, zcs->params, pledgedSrcSize));
+ CHECK_F(ZSTD_compressBegin_advanced(zcs->cctx, NULL, 0, &zcs->params, pledgedSrcSize));

zcs->inToCompress = 0;
zcs->inBuffPos = 0;
@@ -3052,11 +3055,11 @@ size_t ZSTD_resetCStream(ZSTD_CStream *zcs, unsigned long long pledgedSrcSize)
return ZSTD_resetCStream_internal(zcs, pledgedSrcSize);
}

-static size_t ZSTD_initCStream_advanced(ZSTD_CStream *zcs, const void *dict, size_t dictSize, ZSTD_parameters params, unsigned long long pledgedSrcSize)
+static size_t ZSTD_initCStream_advanced(ZSTD_CStream *zcs, const void *dict, size_t dictSize, const ZSTD_parameters *params, unsigned long long pledgedSrcSize)
{
/* allocate buffers */
{
- size_t const neededInBuffSize = (size_t)1 << params.cParams.windowLog;
+ size_t const neededInBuffSize = (size_t)1 << params->cParams.windowLog;
if (zcs->inBuffSize < neededInBuffSize) {
zcs->inBuffSize = neededInBuffSize;
ZSTD_free(zcs->inBuff, zcs->customMem);
@@ -3083,13 +3086,13 @@ static size_t ZSTD_initCStream_advanced(ZSTD_CStream *zcs, const void *dict, siz
} else
zcs->cdict = NULL;

- zcs->checksum = params.fParams.checksumFlag > 0;
- zcs->params = params;
+ zcs->checksum = params->fParams.checksumFlag > 0;
+ zcs->params = *params;

return ZSTD_resetCStream_internal(zcs, pledgedSrcSize);
}

-ZSTD_CStream *ZSTD_initCStream(ZSTD_parameters params, unsigned long long pledgedSrcSize, void *workspace, size_t workspaceSize)
+ZSTD_CStream *ZSTD_initCStream(const ZSTD_parameters *params, unsigned long long pledgedSrcSize, void *workspace, size_t workspaceSize)
{
ZSTD_customMem const stackMem = ZSTD_initStack(workspace, workspaceSize);
ZSTD_CStream *const zcs = ZSTD_createCStream_advanced(stackMem);
@@ -3105,7 +3108,7 @@ ZSTD_CStream *ZSTD_initCStream(ZSTD_parameters params, unsigned long long pledge
ZSTD_CStream *ZSTD_initCStream_usingCDict(const ZSTD_CDict *cdict, unsigned long long pledgedSrcSize, void *workspace, size_t workspaceSize)
{
ZSTD_parameters const params = ZSTD_getParamsFromCDict(cdict);
- ZSTD_CStream *const zcs = ZSTD_initCStream(params, pledgedSrcSize, workspace, workspaceSize);
+ ZSTD_CStream *const zcs = ZSTD_initCStream(&params, pledgedSrcSize, workspace, workspaceSize);
if (zcs) {
zcs->cdict = cdict;
if (ZSTD_isError(ZSTD_resetCStream_internal(zcs, pledgedSrcSize))) {
--
2.7.4

2019-06-03 09:07:53

by Maninder Singh

[permalink] [raw]
Subject: [PATCH 2/4] zstd: use U16 data type for rankPos

rankPos structure variables value can not be more than 512.
So it can easily be declared as U16 rather than U32.

It will reduce stack usage of HUF_sort from 256 bytes to 128 bytes

original:
e92ddbf0 push {r4, r5, r6, r7, r8, r9, fp, ip, lr, pc}
e24cb004 sub fp, ip, #4
e24ddc01 sub sp, sp, #256 ; 0x100

changed:
e92ddbf0 push {r4, r5, r6, r7, r8, r9, fp, ip, lr, pc}
e24cb004 sub fp, ip, #4
e24dd080 sub sp, sp, #128 ; 0x80


Signed-off-by: Maninder Singh <[email protected]>
Signed-off-by: Vaneet Narang <[email protected]>
---
lib/zstd/huf_compress.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/zstd/huf_compress.c b/lib/zstd/huf_compress.c
index e727812..2203124 100644
--- a/lib/zstd/huf_compress.c
+++ b/lib/zstd/huf_compress.c
@@ -382,8 +382,8 @@ static U32 HUF_setMaxHeight(nodeElt *huffNode, U32 lastNonNull, U32 maxNbBits)
}

typedef struct {
- U32 base;
- U32 curr;
+ U16 base;
+ U16 curr;
} rankPos;

static void HUF_sort(nodeElt *huffNode, const U32 *count, U32 maxSymbolValue)
--
2.7.4

2019-06-03 09:08:16

by Maninder Singh

[permalink] [raw]
Subject: [PATCH 3/4] zstd: move params structure to global variable to reduce stack usage

As params structure remains same for lifetime, just initialise it
at init time and make it global variable.

Signed-off-by: Maninder Singh <[email protected]>
Signed-off-by: Vaneet Narang <[email protected]>
---
crypto/zstd.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/crypto/zstd.c b/crypto/zstd.c
index 4e9ff22..80a9e5c 100644
--- a/crypto/zstd.c
+++ b/crypto/zstd.c
@@ -32,6 +32,8 @@ struct zstd_ctx {
void *dwksp;
};

+static ZSTD_parameters params;
+
static ZSTD_parameters zstd_params(void)
{
return ZSTD_getParams(ZSTD_DEF_LEVEL, 0, 0);
@@ -40,8 +42,10 @@ static ZSTD_parameters zstd_params(void)
static int zstd_comp_init(struct zstd_ctx *ctx)
{
int ret = 0;
- const ZSTD_parameters params = zstd_params();
- const size_t wksp_size = ZSTD_CCtxWorkspaceBound(params.cParams);
+ size_t wksp_size;
+
+ params = zstd_params();
+ wksp_size = ZSTD_CCtxWorkspaceBound(params.cParams);

ctx->cwksp = vzalloc(wksp_size);
if (!ctx->cwksp) {
@@ -160,7 +164,6 @@ static int __zstd_compress(const u8 *src, unsigned int slen,
{
size_t out_len;
struct zstd_ctx *zctx = ctx;
- const ZSTD_parameters params = zstd_params();

out_len = ZSTD_compressCCtx(zctx->cctx, dst, *dlen, src, slen, &params);
if (ZSTD_isError(out_len))
--
2.7.4

2019-06-03 09:08:58

by Maninder Singh

[permalink] [raw]
Subject: [PATCH 4/4] zstd: change structure variable from int to char

Elements of ZSTD_frameParameters structure are used as flag, so
just declare them as char. ZSTD_frameParameters structure is used by
ZSTD_parameters.

Before:
======
sizeof(ZSTD_parameters)
$1 = 40

After:
=====
sizeof(ZSTD_parameters)
$1 = 32

Signed-off-by: Maninder Singh <[email protected]>
Signed-off-by: Vaneet Narang <[email protected]>
---
include/linux/zstd.h | 6 +++---
lib/zstd/compress.c | 4 ++--
lib/zstd/decompress.c | 2 +-
3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/zstd.h b/include/linux/zstd.h
index 5103efa..a1e3483 100644
--- a/include/linux/zstd.h
+++ b/include/linux/zstd.h
@@ -164,9 +164,9 @@ typedef struct {
* The default value is all fields set to 0.
*/
typedef struct {
- unsigned int contentSizeFlag;
- unsigned int checksumFlag;
- unsigned int noDictIDFlag;
+ unsigned char contentSizeFlag;
+ unsigned char checksumFlag;
+ unsigned char noDictIDFlag;
} ZSTD_frameParameters;

/**
diff --git a/lib/zstd/compress.c b/lib/zstd/compress.c
index 306e31b..7e0cea2 100644
--- a/lib/zstd/compress.c
+++ b/lib/zstd/compress.c
@@ -2435,9 +2435,9 @@ static size_t ZSTD_writeFrameHeader(void *dst, size_t dstCapacity, ZSTD_paramete
{
BYTE *const op = (BYTE *)dst;
U32 const dictIDSizeCode = (dictID > 0) + (dictID >= 256) + (dictID >= 65536); /* 0-3 */
- U32 const checksumFlag = params->fParams.checksumFlag > 0;
+ BYTE const checksumFlag = params->fParams.checksumFlag > 0;
U32 const windowSize = 1U << params->cParams.windowLog;
- U32 const singleSegment = params->fParams.contentSizeFlag && (windowSize >= pledgedSrcSize);
+ BYTE const singleSegment = params->fParams.contentSizeFlag && (windowSize >= pledgedSrcSize);
BYTE const windowLogByte = (BYTE)((params->cParams.windowLog - ZSTD_WINDOWLOG_ABSOLUTEMIN) << 3);
U32 const fcsCode =
params->fParams.contentSizeFlag ? (pledgedSrcSize >= 256) + (pledgedSrcSize >= 65536 + 256) + (pledgedSrcSize >= 0xFFFFFFFFU) : 0; /* 0-3 */
diff --git a/lib/zstd/decompress.c b/lib/zstd/decompress.c
index 269ee9a..7828264 100644
--- a/lib/zstd/decompress.c
+++ b/lib/zstd/decompress.c
@@ -233,7 +233,7 @@ size_t ZSTD_getFrameParams(ZSTD_frameParams *fparamsPtr, const void *src, size_t
BYTE const fhdByte = ip[4];
size_t pos = 5;
U32 const dictIDSizeCode = fhdByte & 3;
- U32 const checksumFlag = (fhdByte >> 2) & 1;
+ BYTE const checksumFlag = (fhdByte >> 2) & 1;
U32 const singleSegment = (fhdByte >> 5) & 1;
U32 const fcsID = fhdByte >> 6;
U32 const windowSizeMax = 1U << ZSTD_WINDOWLOG_MAX;
--
2.7.4

2019-06-03 21:48:04

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 3/4] zstd: move params structure to global variable to reduce stack usage

On Mon, 3 Jun 2019 14:32:05 +0530 Maninder Singh <[email protected]> wrote:

>
> Subject: [PATCH 3/4] zstd: move params structure to global variable to reduce stack usage

If this affected lib/zstd/ I'd be alarmed. But it's a client of
lib/zstd that is choosing to have a single kernel-wide copy. I'll
rewrite this patch title to "crypto/zstd.c: ...".


2019-06-03 21:48:35

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/4] zstd: pass pointer rathen than structure to functions

On Mon, 3 Jun 2019 14:32:03 +0530 Maninder Singh <[email protected]> wrote:

> currently params structure is passed in all functions, which increases
> stack usage in all the function and lead to stack overflow on target like
> ARM with kernel stack size of 8 KB so better to pass pointer.
>
> Checked for ARM:
>
> Original Patched
> Call FLow Size: 1264 1040
> ....
> (HUF_sort) -> 296
> (HUF_buildCTable_wksp) -> 144
> (HUF_compress4X_repeat) -> 88
> (ZSTD_compressBlock_internal) -> 200
> (ZSTD_compressContinue_internal)-> 136 -> 88
> (ZSTD_compressCCtx) -> 192 -> 64
> (zstd_compress) -> 144 -> 96
> (crypto_compress) -> 32
> (zcomp_compress) -> 32
> ....
>
> ...
>
> --- a/crypto/zstd.c
> +++ b/crypto/zstd.c
> @@ -162,7 +162,7 @@ static int __zstd_compress(const u8 *src, unsigned int slen,
> struct zstd_ctx *zctx = ctx;
> const ZSTD_parameters params = zstd_params();
>
> - out_len = ZSTD_compressCCtx(zctx->cctx, dst, *dlen, src, slen, params);
> + out_len = ZSTD_compressCCtx(zctx->cctx, dst, *dlen, src, slen, &params);
> if (ZSTD_isError(out_len))
> return -EINVAL;
> *dlen = out_len;
> diff --git a/include/linux/zstd.h b/include/linux/zstd.h
> index 249575e..5103efa 100644
> --- a/include/linux/zstd.h
> +++ b/include/linux/zstd.h
> @@ -254,7 +254,7 @@ ZSTD_CCtx *ZSTD_initCCtx(void *workspace, size_t workspaceSize);
> * ZSTD_isError().
> */
> size_t ZSTD_compressCCtx(ZSTD_CCtx *ctx, void *dst, size_t dstCapacity,
> - const void *src, size_t srcSize, ZSTD_parameters params);
> + const void *src, size_t srcSize, const ZSTD_parameters *params);

Making the params poniter const made review a lot easier ;)

But struct ZSTD_CCtx_s.params is still a copied structure. Could we
make it `const ZSTD_parameters *params'? Probably not, due to lifetime
issues?


2019-06-03 21:49:28

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/4] zstd: reduce stack usage

On Mon, 3 Jun 2019 14:32:02 +0530 Maninder Singh <[email protected]> wrote:

> This patch set reduces stack usage for zstd code, because target like ARM has
> limited 8KB kernel stack, which is getting overflowed due to hight stack usage
> of zstd code with call flow like:

That's rather bad behaviour. I assume the patchset actually fixes this?

I think I'll schedule the patchset for 5.2-rcX so that zstd is actually
usable on arm in 5.2. Does that sound OK?

2019-06-04 12:31:22

by Vaneet Narang

[permalink] [raw]
Subject: RE:(2) [PATCH 0/4] zstd: reduce stack usage

Hi Andrew,

>> This patch set reduces stack usage for zstd code, because target like ARM has
>> limited 8KB kernel stack, which is getting overflowed due to hight stack usage
>> of zstd code with call flow like:

>That's rather bad behaviour. I assume the patchset actually fixes this?

Yes, patchset tries to reduce around 300 bytes of stack usage of zstd compression path.
We faced high stack usage issue on switching compression algo from LZO/LZ4 to zstd algo.
zstd compression uses around 1200 bytes of stack which is huge as compared
to LZO/LZ4 which uses negligible stack (< 200 bytes).


>I think I'll schedule the patchset for 5.2-rcX so that zstd is actually
>usable on arm in 5.2. Does that sound OK? 
OK

Regards,
Vaneet Narang

2019-06-04 13:33:56

by Vaneet Narang

[permalink] [raw]
Subject: RE:(2) [PATCH 1/4] zstd: pass pointer rathen than structure to functions

Hi Andrew,
 
>But struct ZSTD_CCtx_s.params is still a copied structure.  Could we
>make it `const ZSTD_parameters *params'?  Probably not, due to lifetime
>issues?

ZSTD maintains its own ctxt. Yes we can avoid storing pointers of other modules
because of lifetime issues. We should keep ZSTD_CCtx_s.params as structure instead of pointer.


Thanks & Regards,
Vaneet Narang

 
 

2019-06-04 22:43:55

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/4] zstd: pass pointer rathen than structure to functions

On Mon, 3 Jun 2019 14:32:03 +0530 Maninder Singh <[email protected]> wrote:

> currently params structure is passed in all functions, which increases
> stack usage in all the function and lead to stack overflow on target like
> ARM with kernel stack size of 8 KB so better to pass pointer.
>
> Checked for ARM:
>
> Original Patched
> Call FLow Size: 1264 1040
> ....
> (HUF_sort) -> 296
> (HUF_buildCTable_wksp) -> 144
> (HUF_compress4X_repeat) -> 88
> (ZSTD_compressBlock_internal) -> 200
> (ZSTD_compressContinue_internal)-> 136 -> 88
> (ZSTD_compressCCtx) -> 192 -> 64
> (zstd_compress) -> 144 -> 96
> (crypto_compress) -> 32
> (zcomp_compress) -> 32
> ....
>
> Signed-off-by: Maninder Singh <[email protected]>
> Signed-off-by: Vaneet Narang <[email protected]>
>

You missed btrfs. This needs review, please - particularly the
kernel-wide static ZSTD_parameters in zstd_get_btrfs_parameters().

The base patch is here:

http://lkml.kernel.org/r/[email protected]

--- a/fs/btrfs/zstd.c~zstd-pass-pointer-rathen-than-structure-to-functions-fix
+++ a/fs/btrfs/zstd.c
@@ -27,15 +27,17 @@
/* 307s to avoid pathologically clashing with transaction commit */
#define ZSTD_BTRFS_RECLAIM_JIFFIES (307 * HZ)

-static ZSTD_parameters zstd_get_btrfs_parameters(unsigned int level,
+static ZSTD_parameters *zstd_get_btrfs_parameters(unsigned int level,
size_t src_len)
{
- ZSTD_parameters params = ZSTD_getParams(level, src_len, 0);
+ static ZSTD_parameters params;
+
+ params = ZSTD_getParams(level, src_len, 0);

if (params.cParams.windowLog > ZSTD_BTRFS_MAX_WINDOWLOG)
params.cParams.windowLog = ZSTD_BTRFS_MAX_WINDOWLOG;
WARN_ON(src_len > ZSTD_BTRFS_MAX_INPUT);
- return params;
+ return &params;
}

struct workspace {
@@ -155,11 +157,12 @@ static void zstd_calc_ws_mem_sizes(void)
unsigned int level;

for (level = 1; level <= ZSTD_BTRFS_MAX_LEVEL; level++) {
- ZSTD_parameters params =
- zstd_get_btrfs_parameters(level, ZSTD_BTRFS_MAX_INPUT);
- size_t level_size =
- max_t(size_t,
- ZSTD_CStreamWorkspaceBound(params.cParams),
+ ZSTD_parameters *params;
+ size_t level_size;
+
+ params = zstd_get_btrfs_parameters(level, ZSTD_BTRFS_MAX_INPUT);
+ level_size = max_t(size_t,
+ ZSTD_CStreamWorkspaceBound(params->cParams),
ZSTD_DStreamWorkspaceBound(ZSTD_BTRFS_MAX_INPUT));

max_size = max_t(size_t, max_size, level_size);
@@ -385,8 +388,9 @@ static int zstd_compress_pages(struct li
unsigned long len = *total_out;
const unsigned long nr_dest_pages = *out_pages;
unsigned long max_out = nr_dest_pages * PAGE_SIZE;
- ZSTD_parameters params = zstd_get_btrfs_parameters(workspace->req_level,
- len);
+ ZSTD_parameters *params;
+
+ params = zstd_get_btrfs_parameters(workspace->req_level, len);

*out_pages = 0;
*total_out = 0;
_

2019-06-05 11:56:45

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH 1/4] zstd: pass pointer rathen than structure to functions

On Tue, Jun 04, 2019 at 03:43:26PM -0700, Andrew Morton wrote:
> On Mon, 3 Jun 2019 14:32:03 +0530 Maninder Singh <[email protected]> wrote:
>
> > currently params structure is passed in all functions, which increases
> > stack usage in all the function and lead to stack overflow on target like
> > ARM with kernel stack size of 8 KB so better to pass pointer.
> >
> > Checked for ARM:
> >
> > Original Patched
> > Call FLow Size: 1264 1040
> > ....
> > (HUF_sort) -> 296
> > (HUF_buildCTable_wksp) -> 144
> > (HUF_compress4X_repeat) -> 88
> > (ZSTD_compressBlock_internal) -> 200
> > (ZSTD_compressContinue_internal)-> 136 -> 88
> > (ZSTD_compressCCtx) -> 192 -> 64
> > (zstd_compress) -> 144 -> 96
> > (crypto_compress) -> 32
> > (zcomp_compress) -> 32
> > ....
> >
> > Signed-off-by: Maninder Singh <[email protected]>
> > Signed-off-by: Vaneet Narang <[email protected]>
> >
>
> You missed btrfs. This needs review, please - particularly the
> kernel-wide static ZSTD_parameters in zstd_get_btrfs_parameters().

>
> The base patch is here:
>
> http://lkml.kernel.org/r/[email protected]
>
> --- a/fs/btrfs/zstd.c~zstd-pass-pointer-rathen-than-structure-to-functions-fix
> +++ a/fs/btrfs/zstd.c
> @@ -27,15 +27,17 @@
> /* 307s to avoid pathologically clashing with transaction commit */
> #define ZSTD_BTRFS_RECLAIM_JIFFIES (307 * HZ)
>
> -static ZSTD_parameters zstd_get_btrfs_parameters(unsigned int level,
> +static ZSTD_parameters *zstd_get_btrfs_parameters(unsigned int level,
> size_t src_len)
> {
> - ZSTD_parameters params = ZSTD_getParams(level, src_len, 0);
> + static ZSTD_parameters params;

> +
> + params = ZSTD_getParams(level, src_len, 0);

No thats' broken, the params can't be static as it depends on level and
src_len. What happens if there are several requests in parallel with
eg. different levels?

Would be really great if the mailinglist is CCed when the code is
changed in a non-trivial way.

2019-06-05 12:34:31

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH 1/4] zstd: pass pointer rathen than structure to functions

On Wed, Jun 05, 2019 at 01:57:03PM +0200, David Sterba wrote:
> On Tue, Jun 04, 2019 at 03:43:26PM -0700, Andrew Morton wrote:
> > On Mon, 3 Jun 2019 14:32:03 +0530 Maninder Singh <[email protected]> wrote:
> >
> > > currently params structure is passed in all functions, which increases
> > > stack usage in all the function and lead to stack overflow on target like
> > > ARM with kernel stack size of 8 KB so better to pass pointer.
> > >
> > > Checked for ARM:
> > >
> > > Original Patched
> > > Call FLow Size: 1264 1040
> > > ....
> > > (HUF_sort) -> 296
> > > (HUF_buildCTable_wksp) -> 144
> > > (HUF_compress4X_repeat) -> 88
> > > (ZSTD_compressBlock_internal) -> 200
> > > (ZSTD_compressContinue_internal)-> 136 -> 88
> > > (ZSTD_compressCCtx) -> 192 -> 64
> > > (zstd_compress) -> 144 -> 96
> > > (crypto_compress) -> 32
> > > (zcomp_compress) -> 32
> > > ....
> > >
> > > Signed-off-by: Maninder Singh <[email protected]>
> > > Signed-off-by: Vaneet Narang <[email protected]>
> > >
> >
> > You missed btrfs. This needs review, please - particularly the
> > kernel-wide static ZSTD_parameters in zstd_get_btrfs_parameters().
>
> >
> > The base patch is here:
> >
> > http://lkml.kernel.org/r/[email protected]
> >
> > --- a/fs/btrfs/zstd.c~zstd-pass-pointer-rathen-than-structure-to-functions-fix
> > +++ a/fs/btrfs/zstd.c
> > @@ -27,15 +27,17 @@
> > /* 307s to avoid pathologically clashing with transaction commit */
> > #define ZSTD_BTRFS_RECLAIM_JIFFIES (307 * HZ)
> >
> > -static ZSTD_parameters zstd_get_btrfs_parameters(unsigned int level,
> > +static ZSTD_parameters *zstd_get_btrfs_parameters(unsigned int level,
> > size_t src_len)
> > {
> > - ZSTD_parameters params = ZSTD_getParams(level, src_len, 0);
> > + static ZSTD_parameters params;
>
> > +
> > + params = ZSTD_getParams(level, src_len, 0);
>
> No thats' broken, the params can't be static as it depends on level and
> src_len. What happens if there are several requests in parallel with
> eg. different levels?
>
> Would be really great if the mailinglist is CCed when the code is
> changed in a non-trivial way.

So this does not compile fs/btrfs/zstd.o which Andrew probably found
too, otherwise btrfs is the only in-tree user of the function outside of
lib/ and crypto/.

I think that Nick Terrell should have been CCed too, as he ported zstd
to linux.

2019-06-05 21:32:47

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/4] zstd: pass pointer rathen than structure to functions

On Wed, 5 Jun 2019 14:32:53 +0200 David Sterba <[email protected]> wrote:

> > >
> > > -static ZSTD_parameters zstd_get_btrfs_parameters(unsigned int level,
> > > +static ZSTD_parameters *zstd_get_btrfs_parameters(unsigned int level,
> > > size_t src_len)
> > > {
> > > - ZSTD_parameters params = ZSTD_getParams(level, src_len, 0);
> > > + static ZSTD_parameters params;
> >
> > > +
> > > + params = ZSTD_getParams(level, src_len, 0);
> >
> > No thats' broken, the params can't be static as it depends on level and
> > src_len. What happens if there are several requests in parallel with
> > eg. different levels?

I wondered. I'll drop the patch series as some more serious thinking
is needed.

> > Would be really great if the mailinglist is CCed when the code is
> > changed in a non-trivial way.

Well we didn't actually change btrfs until I discovered that Maninder
had missed it.

> So this does not compile fs/btrfs/zstd.o which Andrew probably found
> too, otherwise btrfs is the only in-tree user of the function outside of
> lib/ and crypto/.

Worked for me - I might have sent the wrong version.

> I think that Nick Terrell should have been CCed too, as he ported zstd
> to linux.

2019-06-06 14:23:42

by Vaneet Narang

[permalink] [raw]
Subject: RE:(2) [PATCH 1/4] zstd: pass pointer rathen than structure to functions

Hi Andrew / David,

 
>> > > -        ZSTD_parameters params = ZSTD_getParams(level, src_len, 0);
>> > > +        static ZSTD_parameters params;
>> > 
>> > > +
>> > > +        params = ZSTD_getParams(level, src_len, 0);
>> > 
>> > No thats' broken, the params can't be static as it depends on level and
>> > src_len. What happens if there are several requests in parallel with
>> > eg. different levels?

There is no need to make static for btrfs. We can keep it as a stack variable.
This patch set focussed on reducing stack usage of zstd compression when triggered
through zram. ZRAM internally uses crypto and currently crpto uses fixed level and also
not dependent upon source length.

crypto/zstd.c:
static ZSTD_parameters zstd_params(void)
{
return ZSTD_getParams(ZSTD_DEF_LEVEL, 0, 0);
}


Actually high stack usage problem with zstd compression patch gets exploited more incase of
shrink path which gets triggered randomly from any call flow in case of low memory and adds overhead
of more than 2000 byte of stack and results in stack overflow.

Stack usage of alloc_page in case of low memory

72 HUF_compressWeights_wksp+0x140/0x200
64 HUF_writeCTable_wksp+0xdc/0x1c8
88 HUF_compress4X_repeat+0x214/0x450
208 ZSTD_compressBlock_internal+0x224/0x137c
136 ZSTD_compressContinue_internal+0x210/0x3b0
192 ZSTD_compressCCtx+0x6c/0x144
144 zstd_compress+0x40/0x58
32 crypto_compress+0x2c/0x34
32 zcomp_compress+0x3c/0x44
80 zram_bvec_rw+0x2f8/0xa7c
64 zram_rw_page+0x104/0x170
48 bdev_write_page+0x80/0xb4
112 __swap_writepage+0x160/0x29c
24 swap_writepage+0x3c/0x58
160 shrink_page_list+0x788/0xae0
128 shrink_inactive_list+0x210/0x4a8
184 shrink_zone+0x53c/0x7c0
160 try_to_free_pages+0x2fc/0x7cc
80 __alloc_pages_nodemask+0x534/0x91c

Thanks & Regards,
Vaneet Narang 
 

2019-06-06 20:20:02

by Nick Terrell

[permalink] [raw]
Subject: Re: (2) [PATCH 1/4] zstd: pass pointer rathen than structure to functions



> On Jun 6, 2019, at 7:10 AM, Vaneet Narang <[email protected]> wrote:
>
> Hi Andrew / David,
>
>
>>> > > - ZSTD_parameters params = ZSTD_getParams(level, src_len, 0);
>>> > > + static ZSTD_parameters params;
>>> >
>>> > > +
>>> > > + params = ZSTD_getParams(level, src_len, 0);
>>> >
>>> > No thats' broken, the params can't be static as it depends on level and
>>> > src_len. What happens if there are several requests in parallel with
>>> > eg. different levels?
>
> There is no need to make static for btrfs. We can keep it as a stack variable.
> This patch set focussed on reducing stack usage of zstd compression when triggered
> through zram. ZRAM internally uses crypto and currently crpto uses fixed level and also
> not dependent upon source length.

Can we measure the performance of these patches on btrfs and/or zram? See the benchmarks
I ran on my original patch to btrfs for reference https://lore.kernel.org/patchwork/patch/802866/.

I don't expect a speed difference, but I think it is worth measuring.

> crypto/zstd.c:
> static ZSTD_parameters zstd_params(void)
> {
> return ZSTD_getParams(ZSTD_DEF_LEVEL, 0, 0);
> }
>
>
> Actually high stack usage problem with zstd compression patch gets exploited more incase of
> shrink path which gets triggered randomly from any call flow in case of low memory and adds overhead
> of more than 2000 byte of stack and results in stack overflow.
>
> Stack usage of alloc_page in case of low memory
>
> 72 HUF_compressWeights_wksp+0x140/0x200
> 64 HUF_writeCTable_wksp+0xdc/0x1c8
> 88 HUF_compress4X_repeat+0x214/0x450
> 208 ZSTD_compressBlock_internal+0x224/0x137c
> 136 ZSTD_compressContinue_internal+0x210/0x3b0
> 192 ZSTD_compressCCtx+0x6c/0x144
> 144 zstd_compress+0x40/0x58
> 32 crypto_compress+0x2c/0x34
> 32 zcomp_compress+0x3c/0x44
> 80 zram_bvec_rw+0x2f8/0xa7c
> 64 zram_rw_page+0x104/0x170
> 48 bdev_write_page+0x80/0xb4
> 112 __swap_writepage+0x160/0x29c
> 24 swap_writepage+0x3c/0x58
> 160 shrink_page_list+0x788/0xae0
> 128 shrink_inactive_list+0x210/0x4a8
> 184 shrink_zone+0x53c/0x7c0
> 160 try_to_free_pages+0x2fc/0x7cc
> 80 __alloc_pages_nodemask+0x534/0x91c
>
> Thanks & Regards,
> Vaneet Narang
>