2020-05-08 06:44:31

by Chao Yu

Subject: Re: [f2fs-dev] [PATCH] f2fs: compress: fix zstd data corruption

On 2020/5/8 11:30, Daeho Jeong wrote:
> I am a little bit confused.
>
> In compress_log=2 (4 pages),
>
> Every compression algorithm will set the cc->nr_cpages to 5 pages like below.
>
>         max_len = COMPRESS_HEADER_SIZE + cc->clen;
>         cc->nr_cpages = DIV_ROUND_UP(max_len, PAGE_SIZE);
>
>         cc->cpages = f2fs_kzalloc(sbi, sizeof(struct page *) *
>                                         cc->nr_cpages, GFP_NOFS);
>
> Then we call cops->compress_pages(cc), and the returned length of the compressed data is set to cc->clen in every case.
> And if cc->clen is larger than max_len, we give up compression.
>
>         ret = cops->compress_pages(cc);
>         if (ret)
>                 goto out_vunmap_cbuf;
>
>         max_len = PAGE_SIZE * (cc->cluster_size - 1) - COMPRESS_HEADER_SIZE;
>
>         if (cc->clen > max_len) {
>                 ret = -EAGAIN;
>                 goto out_vunmap_cbuf;
>         }
>
> So, with your patch, we will just use 3 pages for ZSTD but still 5 pages for LZO and LZ4 now.
> My question was whether it is also possible to decrease the compression buffer size for LZO and LZ4 to 3 pages, like the ZSTD case.
> I was just curious about that. :)
I guess we can change LZ4 as we did for the ZSTD case, since it supports partial
compression:

- lz4_compress_pages
  - LZ4_compress_default
    - LZ4_compress_fast
      - LZ4_compress_fast_extState
        if (maxOutputSize < LZ4_COMPRESSBOUND(inputSize))
        - LZ4_compress_generic(..., limitedOutput, ...)
          - if (outputLimited && boundary_check_condition) return 0;
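As a rough illustration of what the limitedOutput contract in that call chain buys, the sketch below uses zlib as a stand-in for LZ4 (an assumption for illustration only, as is the 4096-byte page size): the compressor is given a deliberately smaller destination budget, and when the result does not fit, the caller falls back to writing the raw cluster, mirroring f2fs's -EAGAIN path.

```python
import os
import zlib

def compress_limited(data: bytes, max_output_size: int):
    """Sketch of a limited-output compressor: return the compressed
    bytes if they fit in the caller's buffer, otherwise None (analogous
    to LZ4_compress_generic returning 0 under limitedOutput)."""
    out = zlib.compress(data)
    if len(out) > max_output_size:
        return None  # f2fs would return -EAGAIN and write the raw cluster
    return out

# A 16 KiB cluster, with only a 3-page (12 KiB) output budget:
budget = 3 * 4096
assert compress_limited(b"a" * 16384, budget) is not None   # compressible: fits
assert compress_limited(os.urandom(16384), budget) is None  # incompressible: give up
```

The point is that a limited-output codec never needs a worst-case-sized buffer: it can be handed the 3-page target directly and simply report failure.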

And for the LZO case, it looks like we have to keep allocating 5 pages for the worst
compression case, as it doesn't support partial compression as far as I checked.
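For concreteness, the buffer math for compress_log_size=2 works out as below. PAGE_SIZE = 4096 and COMPRESS_HEADER_SIZE = 24 are assumptions for illustration (the header is sizeof(struct compress_data) in f2fs); the bound formulas are the macros quoted earlier in the thread.

```python
PAGE_SIZE = 4096
CLUSTER_PAGES = 4                  # compress_log_size=2
COMPRESS_HEADER_SIZE = 24          # assumed; sizeof(struct compress_data)

isize = PAGE_SIZE * CLUSTER_PAGES  # 16384 bytes of raw input

# Worst-case bounds from the macros quoted earlier:
lz4_bound = isize + isize // 255 + 16            # LZ4_COMPRESSBOUND -> 16464
lzo_worst = isize + isize // 16 + 64 + 3 + 2     # lzo1x_worst_compress -> 17477

def pages(n):
    return -(-n // PAGE_SIZE)                    # DIV_ROUND_UP

# Sizing the buffer from the worst case needs 5 pages either way:
assert pages(COMPRESS_HEADER_SIZE + lz4_bound) == 5
assert pages(COMPRESS_HEADER_SIZE + lzo_worst) == 5

# But the data only counts as compressible if it fits in 3 pages:
max_len = PAGE_SIZE * (CLUSTER_PAGES - 1) - COMPRESS_HEADER_SIZE  # 12264
assert pages(COMPRESS_HEADER_SIZE + max_len) == 3
```

So capping LZ4's destination at 3 pages (as the ZSTD change does) would save 2 pages per cluster, while LZO alone would keep the 5-page worst-case buffer.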

Thanks,

>
>
> On Fri, May 8, 2020 at 11:48 AM, Chao Yu <[email protected]> wrote:
>
> Hi Daeho,
>
> On 2020/5/8 9:28, Daeho Jeong wrote:
> > Hi Chao,
> >
> > IIUC, you are trying not to use ZSTD_compressBound() in order to save
> > memory. Am I right?
> >
> > Then, how about LZ4_compressBound() for LZ4 and lzo1x_worst_compress() for
> > LZO?
>
> Oops, it looks like those bounds were used wrongly...
>
> #define LZ4_COMPRESSBOUND(isize)        (\
>         (unsigned int)(isize) > (unsigned int)LZ4_MAX_INPUT_SIZE \
>         ? 0 \
>         : (isize) + ((isize)/255) + 16)
>
> #define lzo1x_worst_compress(x) ((x) + ((x) / 16) + 64 + 3 + 2)
>
> The newly calculated bound is larger than the target buffer size.
>
> However, the comment on LZ4_compress_default() says:
>
> ...
>  * @maxOutputSize: full or partial size of buffer 'dest'
>  *      which must be already allocated
> ...
> int LZ4_compress_default(const char *source, char *dest, int inputSize,
>         int maxOutputSize, void *wrkmem);
>
> And @out_len in lzo1x_1_compress() is an output parameter carrying the
> length of the data the compressor wrote into the @out buffer.
>
> Let me know if I missed something.
>
> Thanks,
>
> > Could we save more memory space for these two cases like ZSTD?
> > As you know, we are using 5 pages compression buffer for LZ4 and LZO in
> > compress_log_size=2,
> > and if the compressed data doesn't fit in 3 pages, it returns -EAGAIN to
> > give up compressing that one.
> >
> > Thanks,
> >
> > On Fri, May 8, 2020 at 10:17 AM, Chao Yu <[email protected]> wrote:
> >
> >> During zstd compression, ZSTD_endStream() may return a non-zero value
> >> because the destination buffer is full while there is still compressed data
> >> remaining in the intermediate buffer. This means the zstd algorithm could not
> >> save at least one block of space, so let's just write back the raw data instead
> >> of the compressed data. This fixes data corruption when decompressing
> >> incompletely stored compressed data.
> >>
> >> Signed-off-by: Daeho Jeong <[email protected]>
> >> Signed-off-by: Chao Yu <[email protected]>
> >> ---
> >>  fs/f2fs/compress.c | 7 +++++++
> >>  1 file changed, 7 insertions(+)
> >>
> >> diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
> >> index c22cc0d37369..5e4947250262 100644
> >> --- a/fs/f2fs/compress.c
> >> +++ b/fs/f2fs/compress.c
> >> @@ -358,6 +358,13 @@ static int zstd_compress_pages(struct compress_ctx
> >> *cc)
> >>                 return -EIO;
> >>         }
> >>
> >> +       /*
> >> +        * compressed data remains in the intermediate buffer due to
> >> +        * no more space in cbuf.cdata
> >> +        */
> >> +       if (ret)
> >> +               return -EAGAIN;
> >> +
> >>         cc->clen = outbuf.pos;
> >>         return 0;
> >>  }
> >> --
> >> 2.18.0.rc1
> >>
> >>
> >>
> >> _______________________________________________
> >> Linux-f2fs-devel mailing list
> >> [email protected]
> >> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> >>
> >
>


2020-05-08 06:56:05

by Daeho Jeong

Subject: Re: [f2fs-dev] [PATCH] f2fs: compress: fix zstd data corruption

Great, thanks for checking~! :)

