2018-03-21 04:40:41

by Maninder Singh

Subject: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

(Cover letter added to avoid putting too much text in the patch description.)

The LZ4 specification defines a 2-byte offset length for 64 KB of data.
But in the case of ZRAM we compress data per page, and on most
architectures PAGE_SIZE is 4 KB. So we can decide the offset length based
on the actual offset value. For this we reserve 1 bit to encode the offset
length (1 byte or 2 bytes). 2 bytes are required only if the offset is
greater than 127; otherwise 1 byte is enough.

With this new implementation, the offset value can be at most 32 KB.

Thus we can save more memory for compressed data.
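The 1-bit offset-length scheme described above can be sketched as a small stand-alone C example. The helper names are illustrative, not part of the patch; the encoding matches the patch's layout (low bit of the first byte is DYN_BIT, value stored in the remaining bits, little-endian for the 2-byte form):

```c
#include <stddef.h>
#include <stdint.h>

/* Encode 'offset' (1..32767) into buf; returns bytes written.
 * Offsets > 127 need 2 bytes (DYN_BIT set); smaller ones need 1 byte. */
static size_t encode_dyn_offset(uint8_t *buf, uint16_t offset)
{
	if (offset > 127) {
		uint16_t v = (uint16_t)((offset << 1) | 1); /* DYN_BIT set */
		buf[0] = (uint8_t)(v & 0xff);               /* little-endian */
		buf[1] = (uint8_t)(v >> 8);
		return 2;
	}
	buf[0] = (uint8_t)(offset << 1);                    /* DYN_BIT clear */
	return 1;
}

/* Decode one offset from buf; returns bytes consumed. */
static size_t decode_dyn_offset(const uint8_t *buf, uint16_t *offset)
{
	if (buf[0] & 1) {                                   /* 2-byte form */
		*offset = (uint16_t)((buf[0] | (buf[1] << 8)) >> 1);
		return 2;
	}
	*offset = buf[0] >> 1;                              /* 1-byte form */
	return 1;
}
```

With 15 usable bits the maximum representable offset is 32767, which is where the 32 KB limit above comes from.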

Results with the new implementation:

Compressed size for the same input source
(LZ4_DYN < LZO < LZ4)

LZO
=======
orig_data_size: 78917632
compr_data_size: 15894668
mem_used_total: 17117184

LZ4
========
orig_data_size: 78917632
compr_data_size: 16310717
mem_used_total: 17592320

LZ4_DYN
=======
orig_data_size: 78917632
compr_data_size: 15520506
mem_used_total: 16748544

Performance was checked with the tool below:
https://github.com/sergey-senozhatsky/zram-perf-test
# ./fio-perf-o-meter.sh /tmp/test-fio-zram-lz4 /tmp/test-fio-zram-lz4_dyn
Processing /tmp/test-fio-zram-lz4
Processing /tmp/test-fio-zram-lz4_dyn
#jobs1
WRITE: 1101.7MB/s 1197.7MB/s
WRITE: 799829KB/s 900838KB/s
READ: 2670.2MB/s 2649.5MB/s
READ: 2027.8MB/s 2039.9MB/s
READ: 603703KB/s 597855KB/s
WRITE: 602943KB/s 597103KB/s
READ: 680438KB/s 707986KB/s
WRITE: 679582KB/s 707095KB/s
#jobs2
WRITE: 1993.2MB/s 2121.2MB/s
WRITE: 1654.1MB/s 1700.2MB/s
READ: 5038.2MB/s 4970.9MB/s
READ: 3930.1MB/s 3908.5MB/s
READ: 1113.2MB/s 1117.4MB/s
WRITE: 1111.8MB/s 1115.2MB/s
READ: 1255.8MB/s 1286.5MB/s
WRITE: 1254.2MB/s 1284.9MB/s
#jobs3
WRITE: 2875.6MB/s 3010.3MB/s
WRITE: 2394.4MB/s 2363.2MB/s
READ: 7384.7MB/s 7314.3MB/s
READ: 5389.5MB/s 5427.6MB/s
READ: 1570.8MB/s 1557.3MB/s
WRITE: 1568.8MB/s 1555.3MB/s
READ: 1848.5MB/s 1854.0MB/s
WRITE: 1846.2MB/s 1851.7MB/s
#jobs4
WRITE: 3720.3MB/s 3077.4MB/s
WRITE: 3027.4MB/s 3072.8MB/s
READ: 9694.7MB/s 9822.6MB/s
READ: 6606.5MB/s 6617.2MB/s
READ: 1941.6MB/s 1966.8MB/s
WRITE: 1939.1MB/s 1964.3MB/s
READ: 2405.3MB/s 2347.5MB/s
WRITE: 2402.3MB/s 2344.5MB/s
#jobs5
WRITE: 3335.6MB/s 3360.7MB/s
WRITE: 2670.2MB/s 2677.9MB/s
READ: 9455.3MB/s 8782.2MB/s
READ: 6534.8MB/s 6501.7MB/s
READ: 1848.9MB/s 1858.3MB/s
WRITE: 1846.6MB/s 1855.1MB/s
READ: 2232.4MB/s 2223.7MB/s
WRITE: 2229.6MB/s 2220.9MB/s
#jobs6
WRITE: 3896.5MB/s 3772.9MB/s
WRITE: 3171.1MB/s 3109.4MB/s
READ: 11060MB/s 11120MB/s
READ: 7375.8MB/s 7384.7MB/s
READ: 2132.5MB/s 2133.1MB/s
WRITE: 2129.8MB/s 2131.3MB/s
READ: 2608.4MB/s 2627.3MB/s
WRITE: 2605.7MB/s 2623.2MB/s
#jobs7
WRITE: 4129.4MB/s 4083.2MB/s
WRITE: 3364.5MB/s 3384.4MB/s
READ: 12088MB/s 11062MB/s
READ: 7868.3MB/s 7851.5MB/s
READ: 2277.8MB/s 2291.6MB/s
WRITE: 2274.9MB/s 2288.7MB/s
READ: 2798.5MB/s 2890.1MB/s
WRITE: 2794.1MB/s 2887.4MB/s
#jobs8
WRITE: 4623.3MB/s 4794.9MB/s
WRITE: 3749.3MB/s 3676.9MB/s
READ: 12337MB/s 14076MB/s
READ: 8320.1MB/s 8229.4MB/s
READ: 2496.9MB/s 2486.3MB/s
WRITE: 2493.8MB/s 2483.2MB/s
READ: 3340.4MB/s 3370.6MB/s
WRITE: 3336.2MB/s 3366.4MB/s
#jobs9
WRITE: 4427.6MB/s 4341.3MB/s
WRITE: 3542.6MB/s 3597.2MB/s
READ: 10094MB/s 9888.5MB/s
READ: 7863.5MB/s 8119.9MB/s
READ: 2357.1MB/s 2382.1MB/s
WRITE: 2354.1MB/s 2379.1MB/s
READ: 2828.8MB/s 2826.2MB/s
WRITE: 2825.3MB/s 2822.7MB/s
#jobs10
WRITE: 4463.9MB/s 4327.7MB/s
WRITE: 3637.7MB/s 3592.4MB/s
READ: 10020MB/s 11118MB/s
READ: 7837.8MB/s 8098.7MB/s
READ: 2459.6MB/s 2406.5MB/s
WRITE: 2456.5MB/s 2403.4MB/s
READ: 2804.2MB/s 2829.8MB/s
WRITE: 2800.7MB/s 2826.2MB/s
jobs1 perfstat
stalled-cycles-frontend 20,23,52,25,317 ( 54.32%) 19,29,10,49,608 ( 54.50%)
instructions 44,62,30,88,401 ( 1.20) 42,50,67,71,907 ( 1.20)
branches 7,12,44,77,233 ( 738.975) 6,64,52,15,491 ( 725.584)
branch-misses 2,38,66,520 ( 0.33%) 2,04,33,819 ( 0.31%)
jobs2 perfstat
stalled-cycles-frontend 42,82,90,69,149 ( 56.63%) 41,58,70,01,387 ( 56.01%)
instructions 85,33,18,31,411 ( 1.13) 85,32,92,28,973 ( 1.15)
branches 13,35,34,99,713 ( 677.499) 13,34,97,00,453 ( 693.104)
branch-misses 4,50,17,075 ( 0.34%) 4,47,28,378 ( 0.34%)
jobs3 perfstat
stalled-cycles-frontend 66,01,57,23,062 ( 57.10%) 65,86,74,97,814 ( 57.30%)
instructions 1,28,18,27,80,041 ( 1.11) 1,28,04,92,91,306 ( 1.11)
branches 20,06,14,16,000 ( 651.453) 20,02,85,32,864 ( 652.536)
branch-misses 7,10,66,773 ( 0.35%) 7,12,75,728 ( 0.36%)
jobs4 perfstat
stalled-cycles-frontend 91,98,71,83,315 ( 58.09%) 93,70,91,50,920 ( 58.66%)
instructions 1,70,82,79,66,403 ( 1.08) 1,71,18,67,74,366 ( 1.07)
branches 26,73,53,03,398 ( 621.532) 26,80,89,38,054 ( 618.718)
branch-misses 9,82,07,177 ( 0.37%) 9,81,64,098 ( 0.37%)
jobs5 perfstat
stalled-cycles-frontend 1,47,29,71,29,605 ( 63.59%) 1,47,91,01,92,835 ( 63.86%)
instructions 2,18,90,41,63,988 ( 0.95) 2,18,55,73,09,594 ( 0.94)
branches 34,64,46,32,880 ( 553.209) 34,55,08,02,781 ( 551.953)
branch-misses 14,16,79,279 ( 0.41%) 13,84,85,054 ( 0.40%)
jobs6 perfstat
stalled-cycles-frontend 2,02,92,92,98,242 ( 66.70%) 2,05,33,49,39,627 ( 67.01%)
instructions 2,65,13,90,22,217 ( 0.87) 2,64,84,45,49,149 ( 0.86)
branches 42,11,54,07,400 ( 510.085) 42,03,58,57,789 ( 505.746)
branch-misses 17,71,33,628 ( 0.42%) 17,74,31,942 ( 0.42%)
jobs7 perfstat
stalled-cycles-frontend 2,79,22,74,37,283 ( 70.23%) 2,80,02,50,89,154 ( 70.48%)
instructions 3,11,90,38,02,741 ( 0.78) 3,09,20,69,87,835 ( 0.78)
branches 49,71,39,90,321 ( 460.940) 49,10,44,23,983 ( 455.686)
branch-misses 22,43,84,102 ( 0.45%) 21,96,67,440 ( 0.45%)
jobs8 perfstat
stalled-cycles-frontend 3,59,62,09,66,766 ( 73.38%) 3,58,04,85,16,351 ( 73.37%)
instructions 3,43,83,05,02,841 ( 0.70) 3,43,33,76,84,985 ( 0.70)
branches 54,02,15,25,784 ( 406.256) 53,91,13,38,774 ( 407.265)
branch-misses 25,20,35,507 ( 0.47%) 25,05,71,030 ( 0.46%)
jobs9 perfstat
stalled-cycles-frontend 4,15,33,64,48,628 ( 73.76%) 4,22,88,52,47,923 ( 74.16%)
instructions 3,90,79,09,16,552 ( 0.69) 3,91,12,92,41,516 ( 0.69)
branches 61,66,87,76,271 ( 403.896) 61,73,58,17,174 ( 399.363)
branch-misses 28,46,21,136 ( 0.46%) 28,45,74,774 ( 0.46%)
jobs10 perfstat
stalled-cycles-frontend 4,74,43,71,32,846 ( 74.30%) 4,66,34,70,59,452 ( 73.82%)
instructions 4,35,23,51,39,076 ( 0.68) 4,38,48,78,54,987 ( 0.69)
branches 68,72,17,08,212 ( 396.945) 69,48,52,50,280 ( 405.847)
branch-misses 31,73,62,053 ( 0.46%) 32,34,76,102 ( 0.47%)
seconds elapsed 11.470858891 10.862984653
seconds elapsed 11.802220972 11.348959061
seconds elapsed 11.847204652 11.850297919
seconds elapsed 12.352068602 12.853222188
seconds elapsed 16.162715423 16.355883496
seconds elapsed 16.605502317 16.855938732
seconds elapsed 18.108333660 18.108347866
seconds elapsed 18.621296174 18.354183020
seconds elapsed 22.366502860 22.357632546
seconds elapsed 24.362417439 24.363003009

Maninder Singh, Vaneet Narang (1):
lz4: Implement lz4 with dynamic offset (lz4_dyn).

crypto/lz4.c | 64 ++++++++++++++++++++++++++++++++-
drivers/block/zram/zcomp.c | 4 ++
fs/pstore/platform.c | 2 +-
include/linux/lz4.h | 15 ++++++--
lib/decompress_unlz4.c | 2 +-
lib/lz4/lz4_compress.c | 84 +++++++++++++++++++++++++++++++++++--------
lib/lz4/lz4_decompress.c | 56 ++++++++++++++++++++---------
lib/lz4/lz4defs.h | 11 ++++++
8 files changed, 197 insertions(+), 41 deletions(-)


2018-03-21 04:40:42

by Maninder Singh

Subject: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

The LZ4 specification defines a 2-byte offset length for 64 KB of data.
But in the case of ZRAM we compress data per page, and on most
architectures PAGE_SIZE is 4 KB. So we can decide the offset length based
on the actual offset value. For this we reserve 1 bit to encode the offset
length (1 byte or 2 bytes). 2 bytes are required only if the offset is
greater than 127; otherwise 1 byte is enough.

With this new implementation, the offset value can be at most 32 KB.

Thus we can save more memory for compressed data.

Results with the new implementation:
LZO
=======
orig_data_size: 78917632
compr_data_size: 15894668
mem_used_total: 17117184

LZ4
========
orig_data_size: 78917632
compr_data_size: 16310717
mem_used_total: 17592320

LZ4_DYN
=======
orig_data_size: 78917632
compr_data_size: 15520506
mem_used_total: 16748544

Signed-off-by: Maninder Singh <[email protected]>
Signed-off-by: Vaneet Narang <[email protected]>
---
crypto/lz4.c | 64 ++++++++++++++++++++++++++++++++-
drivers/block/zram/zcomp.c | 4 ++
fs/pstore/platform.c | 2 +-
include/linux/lz4.h | 15 ++++++--
lib/decompress_unlz4.c | 2 +-
lib/lz4/lz4_compress.c | 84 +++++++++++++++++++++++++++++++++++--------
lib/lz4/lz4_decompress.c | 56 ++++++++++++++++++++---------
lib/lz4/lz4defs.h | 11 ++++++
8 files changed, 197 insertions(+), 41 deletions(-)

diff --git a/crypto/lz4.c b/crypto/lz4.c
index 2ce2660..f1a8a20 100644
--- a/crypto/lz4.c
+++ b/crypto/lz4.c
@@ -67,7 +67,20 @@ static int __lz4_compress_crypto(const u8 *src, unsigned int slen,
u8 *dst, unsigned int *dlen, void *ctx)
{
int out_len = LZ4_compress_default(src, dst,
- slen, *dlen, ctx);
+ slen, *dlen, ctx, false);
+
+ if (!out_len)
+ return -EINVAL;
+
+ *dlen = out_len;
+ return 0;
+}
+
+static int __lz4_compress_crypto_dynamic(const u8 *src, unsigned int slen,
+ u8 *dst, unsigned int *dlen, void *ctx)
+{
+ int out_len = LZ4_compress_default(src, dst,
+ slen, *dlen, ctx, true);

if (!out_len)
return -EINVAL;
@@ -91,10 +104,30 @@ static int lz4_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
return __lz4_compress_crypto(src, slen, dst, dlen, ctx->lz4_comp_mem);
}

+static int lz4_compress_crypto_dynamic(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+ struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ return __lz4_compress_crypto_dynamic(src, slen, dst, dlen, ctx->lz4_comp_mem);
+}
+
static int __lz4_decompress_crypto(const u8 *src, unsigned int slen,
u8 *dst, unsigned int *dlen, void *ctx)
{
- int out_len = LZ4_decompress_safe(src, dst, slen, *dlen);
+ int out_len = LZ4_decompress_safe(src, dst, slen, *dlen, false);
+
+ if (out_len < 0)
+ return -EINVAL;
+
+ *dlen = out_len;
+ return 0;
+}
+
+static int __lz4_decompress_crypto_dynamic(const u8 *src, unsigned int slen,
+ u8 *dst, unsigned int *dlen, void *ctx)
+{
+ int out_len = LZ4_decompress_safe(src, dst, slen, *dlen, true);

if (out_len < 0)
return -EINVAL;
@@ -117,6 +150,13 @@ static int lz4_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
return __lz4_decompress_crypto(src, slen, dst, dlen, NULL);
}

+static int lz4_decompress_crypto_dynamic(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst,
+ unsigned int *dlen)
+{
+ return __lz4_decompress_crypto_dynamic(src, slen, dst, dlen, NULL);
+}
+
static struct crypto_alg alg_lz4 = {
.cra_name = "lz4",
.cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
@@ -130,6 +170,19 @@ static int lz4_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
.coa_decompress = lz4_decompress_crypto } }
};

+static struct crypto_alg alg_lz4_dyn = {
+ .cra_name = "lz4_dyn",
+ .cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
+ .cra_ctxsize = sizeof(struct lz4_ctx),
+ .cra_module = THIS_MODULE,
+ .cra_list = LIST_HEAD_INIT(alg_lz4_dyn.cra_list),
+ .cra_init = lz4_init,
+ .cra_exit = lz4_exit,
+ .cra_u = { .compress = {
+ .coa_compress = lz4_compress_crypto_dynamic,
+ .coa_decompress = lz4_decompress_crypto_dynamic } }
+};
+
static struct scomp_alg scomp = {
.alloc_ctx = lz4_alloc_ctx,
.free_ctx = lz4_free_ctx,
@@ -150,9 +203,16 @@ static int __init lz4_mod_init(void)
if (ret)
return ret;

+ ret = crypto_register_alg(&alg_lz4_dyn);
+ if (ret) {
+ crypto_unregister_alg(&alg_lz4);
+ return ret;
+ }
+
ret = crypto_register_scomp(&scomp);
if (ret) {
crypto_unregister_alg(&alg_lz4);
+ crypto_unregister_alg(&alg_lz4_dyn);
return ret;
}

diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 4ed0a78..5bc5aab 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -17,11 +17,15 @@
#include <linux/crypto.h>

#include "zcomp.h"
+#define KB (1 << 10)

static const char * const backends[] = {
"lzo",
#if IS_ENABLED(CONFIG_CRYPTO_LZ4)
"lz4",
+#if (PAGE_SIZE < (32 * KB))
+ "lz4_dyn",
+#endif
#endif
#if IS_ENABLED(CONFIG_CRYPTO_LZ4HC)
"lz4hc",
diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index 6910321..2b03449 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -342,7 +342,7 @@ static int compress_lz4(const void *in, void *out, size_t inlen, size_t outlen)
{
int ret;

- ret = LZ4_compress_default(in, out, inlen, outlen, workspace);
+ ret = LZ4_compress_default(in, out, inlen, outlen, workspace, false);
if (!ret) {
pr_err("LZ4_compress_default error; compression failed!\n");
return -EIO;
diff --git a/include/linux/lz4.h b/include/linux/lz4.h
index 394e3d9..08bb95d 100644
--- a/include/linux/lz4.h
+++ b/include/linux/lz4.h
@@ -181,6 +181,9 @@ static inline int LZ4_compressBound(size_t isize)
* which must be already allocated
* @wrkmem: address of the working memory.
* This requires 'workmem' of LZ4_MEM_COMPRESS.
+ * @dynOffset: true or false.
+ * true selects dynamic offsets (1 or 2 bytes, based on the offset value),
+ * false selects normal offsets (2 bytes for every offset value).
*
* Compresses 'sourceSize' bytes from buffer 'source'
* into already allocated 'dest' buffer of size 'maxOutputSize'.
@@ -195,7 +198,7 @@ static inline int LZ4_compressBound(size_t isize)
* (necessarily <= maxOutputSize) or 0 if compression fails
*/
int LZ4_compress_default(const char *source, char *dest, int inputSize,
- int maxOutputSize, void *wrkmem);
+ int maxOutputSize, void *wrkmem, bool dynOffset);

/**
* LZ4_compress_fast() - As LZ4_compress_default providing an acceleration param
@@ -207,6 +210,9 @@ int LZ4_compress_default(const char *source, char *dest, int inputSize,
* @acceleration: acceleration factor
* @wrkmem: address of the working memory.
* This requires 'workmem' of LZ4_MEM_COMPRESS.
+ * @dynOffset: true or false.
+ * true selects dynamic offsets (1 or 2 bytes, based on the offset value),
+ * false selects normal offsets (2 bytes for every offset value).
*
* Same as LZ4_compress_default(), but allows to select an "acceleration"
* factor. The larger the acceleration value, the faster the algorithm,
@@ -219,7 +225,7 @@ int LZ4_compress_default(const char *source, char *dest, int inputSize,
* (necessarily <= maxOutputSize) or 0 if compression fails
*/
int LZ4_compress_fast(const char *source, char *dest, int inputSize,
- int maxOutputSize, int acceleration, void *wrkmem);
+ int maxOutputSize, int acceleration, void *wrkmem, bool dynOffset);

/**
* LZ4_compress_destSize() - Compress as much data as possible
@@ -277,6 +283,9 @@ int LZ4_compress_destSize(const char *source, char *dest, int *sourceSizePtr,
* which must be already allocated
* @compressedSize: is the precise full size of the compressed block
* @maxDecompressedSize: is the size of 'dest' buffer
+ * @dynOffset: true or false.
+ * true selects dynamic offsets (1 or 2 bytes, based on the offset value),
+ * false selects normal offsets (2 bytes for every offset value).
*
* Decompresses data from 'source' into 'dest'.
* If the source stream is detected malformed, the function will
@@ -290,7 +299,7 @@ int LZ4_compress_destSize(const char *source, char *dest, int *sourceSizePtr,
* or a negative result in case of error
*/
int LZ4_decompress_safe(const char *source, char *dest, int compressedSize,
- int maxDecompressedSize);
+ int maxDecompressedSize, bool dynOffset);

/**
* LZ4_decompress_safe_partial() - Decompress a block of size 'compressedSize'
diff --git a/lib/decompress_unlz4.c b/lib/decompress_unlz4.c
index 1b0baf3..8be2faa 100644
--- a/lib/decompress_unlz4.c
+++ b/lib/decompress_unlz4.c
@@ -158,7 +158,7 @@ STATIC inline int INIT unlz4(u8 *input, long in_len,
#else
dest_len = uncomp_chunksize;

- ret = LZ4_decompress_safe(inp, outp, chunksize, dest_len);
+ ret = LZ4_decompress_safe(inp, outp, chunksize, dest_len, false);
dest_len = ret;
#endif
if (ret < 0) {
diff --git a/lib/lz4/lz4_compress.c b/lib/lz4/lz4_compress.c
index cc7b6d4..185c358 100644
--- a/lib/lz4/lz4_compress.c
+++ b/lib/lz4/lz4_compress.c
@@ -183,7 +183,8 @@ static FORCE_INLINE int LZ4_compress_generic(
const tableType_t tableType,
const dict_directive dict,
const dictIssue_directive dictIssue,
- const U32 acceleration)
+ const U32 acceleration,
+ const Dynamic_Offset dynOffset)
{
const BYTE *ip = (const BYTE *) source;
const BYTE *base;
@@ -199,6 +200,7 @@ static FORCE_INLINE int LZ4_compress_generic(

BYTE *op = (BYTE *) dest;
BYTE * const olimit = op + maxOutputSize;
+ int max_distance = dynOffset ? MAX_DISTANCE_DYN : MAX_DISTANCE;

U32 forwardH;
size_t refDelta = 0;
@@ -245,6 +247,7 @@ static FORCE_INLINE int LZ4_compress_generic(
for ( ; ; ) {
const BYTE *match;
BYTE *token;
+ int curr_offset;

/* Find a match */
{
@@ -285,7 +288,7 @@ static FORCE_INLINE int LZ4_compress_generic(
: 0)
|| ((tableType == byU16)
? 0
- : (match + MAX_DISTANCE < ip))
+ : (match + max_distance < ip))
|| (LZ4_read32(match + refDelta)
!= LZ4_read32(ip)));
}
@@ -328,8 +331,26 @@ static FORCE_INLINE int LZ4_compress_generic(

_next_match:
/* Encode Offset */
- LZ4_writeLE16(op, (U16)(ip - match));
- op += 2;
+ if (dynOffset) {
+ curr_offset = (U16)(ip - match);
+
+ /*
+ * If the offset is greater than 127, we need 2 bytes
+ * to store it. Otherwise 1 byte is enough.
+ */
+ if (curr_offset > 127) {
+ curr_offset = (curr_offset << 1) | DYN_BIT;
+ LZ4_writeLE16(op, (U16)curr_offset);
+ op += 2;
+ } else {
+ curr_offset = curr_offset << 1;
+ *op = (BYTE)curr_offset;
+ op++;
+ }
+ } else {
+ LZ4_writeLE16(op, (U16)(ip - match));
+ op += 2;
+ }

/* Encode MatchLength */
{
@@ -480,39 +501,70 @@ static int LZ4_compress_fast_extState(
return LZ4_compress_generic(ctx, source,
dest, inputSize, 0,
noLimit, byU16, noDict,
- noDictIssue, acceleration);
+ noDictIssue, acceleration, NoDynOffset);
else
return LZ4_compress_generic(ctx, source,
dest, inputSize, 0,
noLimit, tableType, noDict,
- noDictIssue, acceleration);
+ noDictIssue, acceleration, NoDynOffset);
} else {
if (inputSize < LZ4_64Klimit)
return LZ4_compress_generic(ctx, source,
dest, inputSize,
maxOutputSize, limitedOutput, byU16, noDict,
- noDictIssue, acceleration);
+ noDictIssue, acceleration, NoDynOffset);
else
return LZ4_compress_generic(ctx, source,
dest, inputSize,
maxOutputSize, limitedOutput, tableType, noDict,
- noDictIssue, acceleration);
+ noDictIssue, acceleration, NoDynOffset);
}
}

+static int LZ4_compress_fast_extState_dynamic(
+ void *state,
+ const char *source,
+ char *dest,
+ int inputSize,
+ int maxOutputSize,
+ int acceleration)
+{
+ LZ4_stream_t_internal *ctx = &((LZ4_stream_t *)state)->internal_donotuse;
+
+ LZ4_resetStream((LZ4_stream_t *)state);
+
+ if (acceleration < 1)
+ acceleration = LZ4_ACCELERATION_DEFAULT;
+
+ if (maxOutputSize >= LZ4_COMPRESSBOUND(inputSize))
+ return LZ4_compress_generic(ctx, source,
+ dest, inputSize, 0,
+ noLimit, byU16, noDict,
+ noDictIssue, acceleration, DynOffset);
+ else
+ return LZ4_compress_generic(ctx, source,
+ dest, inputSize,
+ maxOutputSize, limitedOutput, byU16, noDict,
+ noDictIssue, acceleration, DynOffset);
+}
+
int LZ4_compress_fast(const char *source, char *dest, int inputSize,
- int maxOutputSize, int acceleration, void *wrkmem)
+ int maxOutputSize, int acceleration, void *wrkmem, bool dynOffset)
{
- return LZ4_compress_fast_extState(wrkmem, source, dest, inputSize,
+ if (!dynOffset)
+ return LZ4_compress_fast_extState(wrkmem, source, dest, inputSize,
+ maxOutputSize, acceleration);
+
+ return LZ4_compress_fast_extState_dynamic(wrkmem, source, dest, inputSize,
maxOutputSize, acceleration);
}
EXPORT_SYMBOL(LZ4_compress_fast);

int LZ4_compress_default(const char *source, char *dest, int inputSize,
- int maxOutputSize, void *wrkmem)
+ int maxOutputSize, void *wrkmem, bool dynOffset)
{
return LZ4_compress_fast(source, dest, inputSize,
- maxOutputSize, LZ4_ACCELERATION_DEFAULT, wrkmem);
+ maxOutputSize, LZ4_ACCELERATION_DEFAULT, wrkmem, dynOffset);
}
EXPORT_SYMBOL(LZ4_compress_default);

@@ -900,12 +952,12 @@ int LZ4_compress_fast_continue(LZ4_stream_t *LZ4_stream, const char *source,
result = LZ4_compress_generic(
streamPtr, source, dest, inputSize,
maxOutputSize, limitedOutput, byU32,
- withPrefix64k, dictSmall, acceleration);
+ withPrefix64k, dictSmall, acceleration, NoDynOffset);
} else {
result = LZ4_compress_generic(
streamPtr, source, dest, inputSize,
maxOutputSize, limitedOutput, byU32,
- withPrefix64k, noDictIssue, acceleration);
+ withPrefix64k, noDictIssue, acceleration, NoDynOffset);
}
streamPtr->dictSize += (U32)inputSize;
streamPtr->currentOffset += (U32)inputSize;
@@ -921,12 +973,12 @@ int LZ4_compress_fast_continue(LZ4_stream_t *LZ4_stream, const char *source,
result = LZ4_compress_generic(
streamPtr, source, dest, inputSize,
maxOutputSize, limitedOutput, byU32,
- usingExtDict, dictSmall, acceleration);
+ usingExtDict, dictSmall, acceleration, NoDynOffset);
} else {
result = LZ4_compress_generic(
streamPtr, source, dest, inputSize,
maxOutputSize, limitedOutput, byU32,
- usingExtDict, noDictIssue, acceleration);
+ usingExtDict, noDictIssue, acceleration, NoDynOffset);
}
streamPtr->dictionary = (const BYTE *)source;
streamPtr->dictSize = (U32)inputSize;
diff --git a/lib/lz4/lz4_decompress.c b/lib/lz4/lz4_decompress.c
index 141734d..337a828 100644
--- a/lib/lz4/lz4_decompress.c
+++ b/lib/lz4/lz4_decompress.c
@@ -71,7 +71,9 @@ static FORCE_INLINE int LZ4_decompress_generic(
/* only if dict == usingExtDict */
const BYTE * const dictStart,
/* note : = 0 if noDict */
- const size_t dictSize
+ const size_t dictSize,
+ /* dynOffset == 1 : use dynamic offset encoding */
+ const Dynamic_Offset dynOffset
)
{
/* Local Variables */
@@ -141,8 +143,8 @@ static FORCE_INLINE int LZ4_decompress_generic(
/* copy literals */
cpy = op + length;
if (((endOnInput) && ((cpy > (partialDecoding ? oexit : oend - MFLIMIT))
- || (ip + length > iend - (2 + 1 + LASTLITERALS))))
- || ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH))) {
+ || (ip + length > iend - (2 + LASTLITERALS))))
+ || ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH - 1))) {
if (partialDecoding) {
if (cpy > oend) {
/*
@@ -188,13 +190,31 @@ static FORCE_INLINE int LZ4_decompress_generic(
break;
}

- LZ4_wildCopy(op, ip, cpy);
+ if (dynOffset && length < 4)
+ LZ4_copy4(op, ip);
+ else
+ LZ4_wildCopy(op, ip, cpy);
+
ip += length;
op = cpy;

/* get offset */
- offset = LZ4_readLE16(ip);
- ip += 2;
+ if (dynOffset) {
+ /*
+ * If DYN_BIT is set, the offset is 2 bytes;
+ * otherwise it is 1 byte.
+ */
+ if (*ip & DYN_BIT) {
+ offset = LZ4_readLE16(ip) >> 1;
+ ip += 2;
+ } else {
+ offset = *ip >> 1;
+ ip += 1;
+ }
+ } else {
+ offset = LZ4_readLE16(ip);
+ ip += 2;
+ }
match = op - offset;

if ((checkOffset) && (unlikely(match < lowLimit))) {
@@ -335,11 +355,11 @@ static FORCE_INLINE int LZ4_decompress_generic(
}

int LZ4_decompress_safe(const char *source, char *dest,
- int compressedSize, int maxDecompressedSize)
+ int compressedSize, int maxDecompressedSize, bool dynOffset)
{
return LZ4_decompress_generic(source, dest, compressedSize,
maxDecompressedSize, endOnInputSize, full, 0,
- noDict, (BYTE *)dest, NULL, 0);
+ noDict, (BYTE *)dest, NULL, 0, dynOffset);
}

int LZ4_decompress_safe_partial(const char *source, char *dest,
@@ -347,14 +367,14 @@ int LZ4_decompress_safe_partial(const char *source, char *dest,
{
return LZ4_decompress_generic(source, dest, compressedSize,
maxDecompressedSize, endOnInputSize, partial,
- targetOutputSize, noDict, (BYTE *)dest, NULL, 0);
+ targetOutputSize, noDict, (BYTE *)dest, NULL, 0, NoDynOffset);
}

int LZ4_decompress_fast(const char *source, char *dest, int originalSize)
{
return LZ4_decompress_generic(source, dest, 0, originalSize,
endOnOutputSize, full, 0, withPrefix64k,
- (BYTE *)(dest - 64 * KB), NULL, 64 * KB);
+ (BYTE *)(dest - 64 * KB), NULL, 64 * KB, NoDynOffset);
}

int LZ4_setStreamDecode(LZ4_streamDecode_t *LZ4_streamDecode,
@@ -392,7 +412,7 @@ int LZ4_decompress_safe_continue(LZ4_streamDecode_t *LZ4_streamDecode,
endOnInputSize, full, 0,
usingExtDict, lz4sd->prefixEnd - lz4sd->prefixSize,
lz4sd->externalDict,
- lz4sd->extDictSize);
+ lz4sd->extDictSize, NoDynOffset);

if (result <= 0)
return result;
@@ -406,7 +426,7 @@ int LZ4_decompress_safe_continue(LZ4_streamDecode_t *LZ4_streamDecode,
compressedSize, maxOutputSize,
endOnInputSize, full, 0,
usingExtDict, (BYTE *)dest,
- lz4sd->externalDict, lz4sd->extDictSize);
+ lz4sd->externalDict, lz4sd->extDictSize, NoDynOffset);
if (result <= 0)
return result;
lz4sd->prefixSize = result;
@@ -427,7 +447,7 @@ int LZ4_decompress_fast_continue(LZ4_streamDecode_t *LZ4_streamDecode,
endOnOutputSize, full, 0,
usingExtDict,
lz4sd->prefixEnd - lz4sd->prefixSize,
- lz4sd->externalDict, lz4sd->extDictSize);
+ lz4sd->externalDict, lz4sd->extDictSize, NoDynOffset);

if (result <= 0)
return result;
@@ -440,7 +460,7 @@ int LZ4_decompress_fast_continue(LZ4_streamDecode_t *LZ4_streamDecode,
result = LZ4_decompress_generic(source, dest, 0, originalSize,
endOnOutputSize, full, 0,
usingExtDict, (BYTE *)dest,
- lz4sd->externalDict, lz4sd->extDictSize);
+ lz4sd->externalDict, lz4sd->extDictSize, NoDynOffset);
if (result <= 0)
return result;
lz4sd->prefixSize = originalSize;
@@ -463,19 +483,19 @@ static FORCE_INLINE int LZ4_decompress_usingDict_generic(const char *source,
if (dictSize == 0)
return LZ4_decompress_generic(source, dest,
compressedSize, maxOutputSize, safe, full, 0,
- noDict, (BYTE *)dest, NULL, 0);
+ noDict, (BYTE *)dest, NULL, 0, NoDynOffset);
if (dictStart + dictSize == dest) {
if (dictSize >= (int)(64 * KB - 1))
return LZ4_decompress_generic(source, dest,
compressedSize, maxOutputSize, safe, full, 0,
- withPrefix64k, (BYTE *)dest - 64 * KB, NULL, 0);
+ withPrefix64k, (BYTE *)dest - 64 * KB, NULL, 0, NoDynOffset);
return LZ4_decompress_generic(source, dest, compressedSize,
maxOutputSize, safe, full, 0, noDict,
- (BYTE *)dest - dictSize, NULL, 0);
+ (BYTE *)dest - dictSize, NULL, 0, NoDynOffset);
}
return LZ4_decompress_generic(source, dest, compressedSize,
maxOutputSize, safe, full, 0, usingExtDict,
- (BYTE *)dest, (const BYTE *)dictStart, dictSize);
+ (BYTE *)dest, (const BYTE *)dictStart, dictSize, NoDynOffset);
}

int LZ4_decompress_safe_usingDict(const char *source, char *dest,
diff --git a/lib/lz4/lz4defs.h b/lib/lz4/lz4defs.h
index 00a0b58..9451a73 100644
--- a/lib/lz4/lz4defs.h
+++ b/lib/lz4/lz4defs.h
@@ -75,6 +75,7 @@
#define WILDCOPYLENGTH 8
#define LASTLITERALS 5
#define MFLIMIT (WILDCOPYLENGTH + MINMATCH)
+#define DYN_BIT 0x1

/* Increase this value ==> compression run slower on incompressible data */
#define LZ4_SKIPTRIGGER 6
@@ -87,6 +88,7 @@

#define MAXD_LOG 16
#define MAX_DISTANCE ((1 << MAXD_LOG) - 1)
+#define MAX_DISTANCE_DYN ((1 << (MAXD_LOG - 1)) - 1)
#define STEPSIZE sizeof(size_t)

#define ML_BITS 4
@@ -147,6 +149,13 @@ static FORCE_INLINE void LZ4_copy8(void *dst, const void *src)
#endif
}

+static FORCE_INLINE void LZ4_copy4(void *dst, const void *src)
+{
+ U32 a = get_unaligned((const U32 *)src);
+
+ put_unaligned(a, (U32 *)dst);
+}
+
/*
* customized variant of memcpy,
* which can overwrite up to 7 bytes beyond dstEnd
@@ -224,4 +233,6 @@ static FORCE_INLINE unsigned int LZ4_count(
typedef enum { endOnOutputSize = 0, endOnInputSize = 1 } endCondition_directive;
typedef enum { full = 0, partial = 1 } earlyEnd_directive;

+typedef enum { NoDynOffset = 0, DynOffset = 1 } Dynamic_Offset;
+
#endif
--
1.7.1

2018-03-21 06:41:28

by Sergey Senozhatsky

Subject: Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

On (03/21/18 10:10), Maninder Singh wrote:
> (Cover letter added to avoid putting too much text in the patch description.)
>
> The LZ4 specification defines a 2-byte offset length for 64 KB of data.
> But in the case of ZRAM we compress data per page, and on most
> architectures PAGE_SIZE is 4 KB. So we can decide the offset length based
> on the actual offset value. For this we reserve 1 bit to encode the offset
> length (1 byte or 2 bytes). 2 bytes are required only if the offset is
> greater than 127; otherwise 1 byte is enough.

So what happens if I compress the data on a system with no dyn
offset and then send it over the network to a machine which has
dyn offset? Or, say, I have a USB stick with a compression enabled
FS, store files on a dyn offset enabled PC and then mount that USB
stick on a machine with no dyn offset support. And vice versa.

-ss

2018-03-21 07:49:48

by Sergey Senozhatsky

Subject: Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

On (03/21/18 10:10), Maninder Singh wrote:
[..]
> +static struct crypto_alg alg_lz4_dyn = {
> + .cra_name = "lz4_dyn",
> + .cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
> + .cra_ctxsize = sizeof(struct lz4_ctx),
> + .cra_module = THIS_MODULE,
> + .cra_list = LIST_HEAD_INIT(alg_lz4_dyn.cra_list),
> + .cra_init = lz4_init,
> + .cra_exit = lz4_exit,
> + .cra_u = { .compress = {
> + .coa_compress = lz4_compress_crypto_dynamic,
> + .coa_decompress = lz4_decompress_crypto_dynamic } }
> +};

[..]

> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index 4ed0a78..5bc5aab 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -17,11 +17,15 @@
> #include <linux/crypto.h>
>
> #include "zcomp.h"
> +#define KB (1 << 10)
>
> static const char * const backends[] = {
> "lzo",
> #if IS_ENABLED(CONFIG_CRYPTO_LZ4)
> "lz4",
> +#if (PAGE_SIZE < (32 * KB))
> + "lz4_dyn",
> +#endif

This is not the list of supported algorithms. It's the list of
recommended algorithms. You can configure zram to use any of
available and known to Crypto API algorithms. Including lz4_dyn
on PAGE_SIZE > 32K systems.

-ss

2018-03-21 19:56:10

by Nick Terrell

Subject: Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

On (03/21/18 10:10), Maninder Singh wrote:
> The LZ4 specification defines a 2-byte offset length for 64 KB of data.
> But in the case of ZRAM we compress data per page, and on most
> architectures PAGE_SIZE is 4 KB. So we can decide the offset length based
> on the actual offset value. For this we reserve 1 bit to encode the offset
> length (1 byte or 2 bytes). 2 bytes are required only if the offset is
> greater than 127; otherwise 1 byte is enough.
>
> With this new implementation, the offset value can be at most 32 KB.
>
> Thus we can save more memory for compressed data.
>
> Results with the new implementation:
>
> Compressed size for the same input source
> (LZ4_DYN < LZO < LZ4)
>
> LZO
> =======
> orig_data_size: 78917632
> compr_data_size: 15894668
> mem_used_total: 17117184
>
> LZ4
> ========
> orig_data_size: 78917632
> compr_data_size: 16310717
> mem_used_total: 17592320
>
> LZ4_DYN
> =======
> orig_data_size: 78917632
> compr_data_size: 15520506
> mem_used_total: 16748544

This seems like a reasonable extension to the algorithm, and it looks like
LZ4_DYN is about a 5% improvement in compression ratio on your benchmark.
The biggest question I have is whether it is worthwhile to maintain a
separate, incompatible variant of LZ4 in the kernel, without any upstream,
for a 5% gain. If we do want to go forward with this, we should perform
more benchmarks.

I commented in the patch, but because the `dynOffset` variable isn't a
compile time static in LZ4_decompress_generic(), I suspect that the patch
causes a regression in decompression speed for both LZ4 and LZ4_DYN. You'll
need to re-run the benchmarks to first show that LZ4 before the patch
performs the same as LZ4 after the patch. Then re-run the LZ4 vs LZ4_DYN
benchmarks.

I would also like to see a benchmark in user-space (with the code), so we
can see the performance of LZ4 before and after the patch, as well as LZ4
vs LZ4_DYN without anything else going on. I expect the extra branches in
the decoding loop to have an impact on speed, and I would like to see how
big the impact is without noise.
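The specialization concern can be made concrete with a small user-space sketch. The usual fix, already used for the other directives in the lz4 code, is to pass the mode as a compile-time constant through an always-inline generic function so that each outer wrapper gets its own copy with the branch folded away. Function names here are illustrative, not the patch's:

```c
/* Generic decoder: 'dynOffset' is a constant at each call site, so the
 * compiler can specialize each wrapper and eliminate the outer branch. */
static inline __attribute__((always_inline))
int decode_offset_generic(const unsigned char *ip, unsigned *offset,
			  const int dynOffset)
{
	if (dynOffset) {			/* folded away per wrapper */
		if (*ip & 1) {			/* DYN_BIT set: 2-byte form */
			*offset = (unsigned)((ip[0] | (ip[1] << 8)) >> 1);
			return 2;
		}
		*offset = ip[0] >> 1;		/* 1-byte form */
		return 1;
	}
	*offset = (unsigned)(ip[0] | (ip[1] << 8));	/* plain LE16 */
	return 2;
}

/* Each wrapper becomes a distinct, branch-specialized function. */
int decode_offset_classic(const unsigned char *ip, unsigned *offset)
{
	return decode_offset_generic(ip, offset, 0);
}

int decode_offset_dyn(const unsigned char *ip, unsigned *offset)
{
	return decode_offset_generic(ip, offset, 1);
}
```

If LZ4_decompress_generic() were wrapped this way, the classic-format path should compile to the same code as before the patch, which is what the benchmark comparison above is meant to confirm.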

CC-ing Yann Collet, the author of LZ4

2018-03-21 19:59:06

by Nick Terrell

Subject: Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

On (03/21/18 10:10), Maninder Singh wrote:
> diff --git a/lib/lz4/lz4_compress.c b/lib/lz4/lz4_compress.c
> index cc7b6d4..185c358 100644
> --- a/lib/lz4/lz4_compress.c
> +++ b/lib/lz4/lz4_compress.c
> @@ -183,7 +183,8 @@ static FORCE_INLINE int LZ4_compress_generic(
> const tableType_t tableType,
> const dict_directive dict,
> const dictIssue_directive dictIssue,
> - const U32 acceleration)
> + const U32 acceleration,
> + const Dynamic_Offset dynOffset)
> {
> const BYTE *ip = (const BYTE *) source;
> const BYTE *base;
> @@ -199,6 +200,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>
> BYTE *op = (BYTE *) dest;
> BYTE * const olimit = op + maxOutputSize;
> + int max_distance = dynOffset ? MAX_DISTANCE_DYN : MAX_DISTANCE;

Let's mark this variable `const`. If the compiler doesn't realize that this
variable and `dynOffset` are compile-time constants, I expect the speed to
be impacted.

>
> U32 forwardH;
> size_t refDelta = 0;
> @@ -245,6 +247,7 @@ static FORCE_INLINE int LZ4_compress_generic(
> for ( ; ; ) {
> const BYTE *match;
> BYTE *token;
> + int curr_offset;
>
> /* Find a match */
> {
> @@ -285,7 +288,7 @@ static FORCE_INLINE int LZ4_compress_generic(
> : 0)
> || ((tableType == byU16)
> ? 0
> - : (match + MAX_DISTANCE < ip))
> + : (match + max_distance < ip))
> || (LZ4_read32(match + refDelta)
> != LZ4_read32(ip)));
> }
> @@ -328,8 +331,26 @@ static FORCE_INLINE int LZ4_compress_generic(
>
> _next_match:
> /* Encode Offset */
> - LZ4_writeLE16(op, (U16)(ip - match));
> - op += 2;
> + if (dynOffset) {
> + curr_offset = (U16)(ip - match);
> +
> + /*
> + * If the offset is greater than 127, we need 2 bytes
> + * to store it. Otherwise 1 byte is enough.
> + */
> + if (curr_offset > 127) {
> + curr_offset = (curr_offset << 1) | DYN_BIT;
> + LZ4_writeLE16(op, (U16)curr_offset);
> + op += 2;
> + } else {
> + curr_offset = curr_offset << 1;
> + *op = (BYTE)curr_offset;
> + op++;
> + }

The standard way to do variable-sized integers is to use the high-bit as
the control bit, not the low-bit. Do you have benchmarks showing that using
the low-bit is faster than the high-bit? If so, let's note that in a comment
in the code; if not, let's use the high-bit.
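
For illustration, a minimal user-space sketch of the high-bit convention
(hypothetical helper, not code from the patch): the top bit of the first
byte flags whether a second byte follows, so a short offset is stored
verbatim with no shift.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode an offset (< 32 KB) with the high bit of the first byte as
 * the "two bytes follow" flag. Offsets <= 127 are stored as-is in one
 * byte; larger offsets use the remaining 15 bits across two bytes.
 * Returns the number of bytes written (1 or 2).
 */
static int put_offset_hibit(uint8_t *op, uint16_t offset)
{
	if (offset <= 127) {
		*op = (uint8_t)offset;	/* high bit clear: 1-byte form */
		return 1;
	}
	op[0] = (uint8_t)(0x80 | (offset >> 8));	/* high bit set */
	op[1] = (uint8_t)(offset & 0xFF);
	return 2;
}
```

Unlike the low-bit scheme in the patch, the 1-byte form here stores the
offset unshifted.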

> + } else {
> + LZ4_writeLE16(op, (U16)(ip - match));
> + op += 2;
> + }
>
> /* Encode MatchLength */
> {
> @@ -480,39 +501,70 @@ static int LZ4_compress_fast_extState(
> return LZ4_compress_generic(ctx, source,
> dest, inputSize, 0,
> noLimit, byU16, noDict,
> - noDictIssue, acceleration);
> + noDictIssue, acceleration, NoDynOffset);
> else
> return LZ4_compress_generic(ctx, source,
> dest, inputSize, 0,
> noLimit, tableType, noDict,
> - noDictIssue, acceleration);
> + noDictIssue, acceleration, NoDynOffset);
> } else {
> if (inputSize < LZ4_64Klimit)
> return LZ4_compress_generic(ctx, source,
> dest, inputSize,
> maxOutputSize, limitedOutput, byU16, noDict,
> - noDictIssue, acceleration);
> + noDictIssue, acceleration, NoDynOffset);
> else
> return LZ4_compress_generic(ctx, source,
> dest, inputSize,
> maxOutputSize, limitedOutput, tableType, noDict,
> - noDictIssue, acceleration);
> + noDictIssue, acceleration, NoDynOffset);
> }
> }
>
> +static int LZ4_compress_fast_extState_dynamic(
> + void *state,
> + const char *source,
> + char *dest,
> + int inputSize,
> + int maxOutputSize,
> + int acceleration)
> +{
> + LZ4_stream_t_internal *ctx = &((LZ4_stream_t *)state)->internal_donotuse;
> +
> + LZ4_resetStream((LZ4_stream_t *)state);
> +
> + if (acceleration < 1)
> + acceleration = LZ4_ACCELERATION_DEFAULT;
> +
> + if (maxOutputSize >= LZ4_COMPRESSBOUND(inputSize))
> + return LZ4_compress_generic(ctx, source,
> + dest, inputSize, 0,
> + noLimit, byU16, noDict,
> + noDictIssue, acceleration, DynOffset);
> + else
> + return LZ4_compress_generic(ctx, source,
> + dest, inputSize,
> + maxOutputSize, limitedOutput, byU16, noDict,
> + noDictIssue, acceleration, DynOffset);
> +}
> +
> int LZ4_compress_fast(const char *source, char *dest, int inputSize,
> - int maxOutputSize, int acceleration, void *wrkmem)
> + int maxOutputSize, int acceleration, void *wrkmem, bool dynOffset)
> {
> - return LZ4_compress_fast_extState(wrkmem, source, dest, inputSize,
> + if (!dynOffset)
> + return LZ4_compress_fast_extState(wrkmem, source, dest, inputSize,
> + maxOutputSize, acceleration);
> +
> + return LZ4_compress_fast_extState_dynamic(wrkmem, source, dest, inputSize,
> maxOutputSize, acceleration);
> }
> EXPORT_SYMBOL(LZ4_compress_fast);
>
> int LZ4_compress_default(const char *source, char *dest, int inputSize,
> - int maxOutputSize, void *wrkmem)
> + int maxOutputSize, void *wrkmem, bool dynOffset)
> {
> return LZ4_compress_fast(source, dest, inputSize,
> - maxOutputSize, LZ4_ACCELERATION_DEFAULT, wrkmem);
> + maxOutputSize, LZ4_ACCELERATION_DEFAULT, wrkmem, dynOffset);
> }
> EXPORT_SYMBOL(LZ4_compress_default);
>
> @@ -900,12 +952,12 @@ int LZ4_compress_fast_continue(LZ4_stream_t *LZ4_stream, const char *source,
> result = LZ4_compress_generic(
> streamPtr, source, dest, inputSize,
> maxOutputSize, limitedOutput, byU32,
> - withPrefix64k, dictSmall, acceleration);
> + withPrefix64k, dictSmall, acceleration, NoDynOffset);
> } else {
> result = LZ4_compress_generic(
> streamPtr, source, dest, inputSize,
> maxOutputSize, limitedOutput, byU32,
> - withPrefix64k, noDictIssue, acceleration);
> + withPrefix64k, noDictIssue, acceleration, NoDynOffset);
> }
> streamPtr->dictSize += (U32)inputSize;
> streamPtr->currentOffset += (U32)inputSize;
> @@ -921,12 +973,12 @@ int LZ4_compress_fast_continue(LZ4_stream_t *LZ4_stream, const char *source,
> result = LZ4_compress_generic(
> streamPtr, source, dest, inputSize,
> maxOutputSize, limitedOutput, byU32,
> - usingExtDict, dictSmall, acceleration);
> + usingExtDict, dictSmall, acceleration, NoDynOffset);
> } else {
> result = LZ4_compress_generic(
> streamPtr, source, dest, inputSize,
> maxOutputSize, limitedOutput, byU32,
> - usingExtDict, noDictIssue, acceleration);
> + usingExtDict, noDictIssue, acceleration, NoDynOffset);
> }
> streamPtr->dictionary = (const BYTE *)source;
> streamPtr->dictSize = (U32)inputSize;
> diff --git a/lib/lz4/lz4_decompress.c b/lib/lz4/lz4_decompress.c
> index 141734d..337a828 100644
> --- a/lib/lz4/lz4_decompress.c
> +++ b/lib/lz4/lz4_decompress.c
> @@ -71,7 +71,9 @@ static FORCE_INLINE int LZ4_decompress_generic(
> /* only if dict == usingExtDict */
> const BYTE * const dictStart,
> /* note : = 0 if noDict */
> - const size_t dictSize
> + const size_t dictSize,
> + /* offset == 1; dynamic offset */
> + const Dynamic_Offset dynOffset
> )
> {
> /* Local Variables */
> @@ -141,8 +143,8 @@ static FORCE_INLINE int LZ4_decompress_generic(
> /* copy literals */
> cpy = op + length;
> if (((endOnInput) && ((cpy > (partialDecoding ? oexit : oend - MFLIMIT))
> - || (ip + length > iend - (2 + 1 + LASTLITERALS))))
> - || ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH))) {
> + || (ip + length > iend - (2 + LASTLITERALS))))
> + || ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH - 1))) {
> if (partialDecoding) {
> if (cpy > oend) {
> /*
> @@ -188,13 +190,31 @@ static FORCE_INLINE int LZ4_decompress_generic(
> break;
> }
>
> - LZ4_wildCopy(op, ip, cpy);
> + if (dynOffset && length < 4)
> + LZ4_copy4(op, ip);
> + else
> + LZ4_wildCopy(op, ip, cpy);
> +

The LZ4 format enforces that the last 5 bytes are literals so that
LZ4_wildCopy() can be used here. I suspect that having this extra branch
here for `dynOffset` mode hurts decompression speed.
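
For context, LZ4_wildCopy() copies in 8-byte chunks and may write up to
7 bytes past the requested end; the trailing-literals rule is what makes
that overrun safe. A simplified user-space sketch (not the kernel's exact
code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WILDCOPYLENGTH 8

/*
 * Copy from src to dst until dst reaches dstEnd, in 8-byte steps.
 * May overwrite up to WILDCOPYLENGTH - 1 bytes beyond dstEnd, so the
 * caller must guarantee that much slack after the output position,
 * which is what the last-5-literals rule provides in LZ4.
 */
static void wild_copy(uint8_t *dst, const uint8_t *src, uint8_t *dstEnd)
{
	do {
		memcpy(dst, src, WILDCOPYLENGTH);
		dst += WILDCOPYLENGTH;
		src += WILDCOPYLENGTH;
	} while (dst < dstEnd);
}
```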

> ip += length;
> op = cpy;
>
> /* get offset */
> - offset = LZ4_readLE16(ip);
> - ip += 2;
> + if (dynOffset) {
> + /*
> + * Check if DYN_BIT is set, means 2 Byte Offset,
> + * else 1 Byte Offset.
> + */
> + if (*ip & DYN_BIT) {
> + offset = LZ4_readLE16(ip) >> 1;
> + ip += 2;
> + } else {
> + offset = *ip >> 1;
> + ip += 1;

If we use the high-bit as the control bit, this branch simply becomes
`offset = *ip`, though the long offset branch becomes a bit longer.
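
A matching user-space sketch of the decode side under the high-bit
convention (hypothetical helper, not code from the patch): the short
branch reads the byte directly, with no shift.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode an offset written with the high bit of the first byte as the
 * "two bytes follow" flag. Returns the number of input bytes consumed.
 */
static int get_offset_hibit(const uint8_t *ip, uint16_t *offset)
{
	if (!(ip[0] & 0x80)) {
		*offset = ip[0];	/* short branch: no shift needed */
		return 1;
	}
	*offset = (uint16_t)(((ip[0] & 0x7F) << 8) | ip[1]);
	return 2;
}
```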

> + }
> + } else {
> + offset = LZ4_readLE16(ip);
> + ip += 2;
> + }
> match = op - offset;
>
> if ((checkOffset) && (unlikely(match < lowLimit))) {
> @@ -335,11 +355,11 @@ static FORCE_INLINE int LZ4_decompress_generic(
> }
>
> int LZ4_decompress_safe(const char *source, char *dest,
> - int compressedSize, int maxDecompressedSize)
> + int compressedSize, int maxDecompressedSize, bool dynOffset)
> {
> return LZ4_decompress_generic(source, dest, compressedSize,
> maxDecompressedSize, endOnInputSize, full, 0,
> - noDict, (BYTE *)dest, NULL, 0);
> + noDict, (BYTE *)dest, NULL, 0, dynOffset);

You'll need to use the same trick that LZ4_compress_fast() uses: hard-code
`dynOffset`. We want the compiler to generate two versions of
LZ4_decompress_generic(), one with `dynOffset` and one with `!dynOffset`.
That way the tight loop won't contain the branches that check `dynOffset`.

if (dynOffset)
return LZ4_decompress_generic(..., true);
else
return LZ4_decompress_generic(..., false);

Without this trick, I expect that this patch causes a regression to both
LZ4 and LZ4_DYN decompression speed.

> }
>
> int LZ4_decompress_safe_partial(const char *source, char *dest,
> @@ -347,14 +367,14 @@ int LZ4_decompress_safe_partial(const char *source, char *dest,
> {
> return LZ4_decompress_generic(source, dest, compressedSize,
> maxDecompressedSize, endOnInputSize, partial,
> - targetOutputSize, noDict, (BYTE *)dest, NULL, 0);
> + targetOutputSize, noDict, (BYTE *)dest, NULL, 0, NoDynOffset);
> }
>
> int LZ4_decompress_fast(const char *source, char *dest, int originalSize)
> {
> return LZ4_decompress_generic(source, dest, 0, originalSize,
> endOnOutputSize, full, 0, withPrefix64k,
> - (BYTE *)(dest - 64 * KB), NULL, 64 * KB);
> + (BYTE *)(dest - 64 * KB), NULL, 64 * KB, NoDynOffset);
> }
>
> int LZ4_setStreamDecode(LZ4_streamDecode_t *LZ4_streamDecode,
> @@ -392,7 +412,7 @@ int LZ4_decompress_safe_continue(LZ4_streamDecode_t *LZ4_streamDecode,
> endOnInputSize, full, 0,
> usingExtDict, lz4sd->prefixEnd - lz4sd->prefixSize,
> lz4sd->externalDict,
> - lz4sd->extDictSize);
> + lz4sd->extDictSize, NoDynOffset);
>
> if (result <= 0)
> return result;
> @@ -406,7 +426,7 @@ int LZ4_decompress_safe_continue(LZ4_streamDecode_t *LZ4_streamDecode,
> compressedSize, maxOutputSize,
> endOnInputSize, full, 0,
> usingExtDict, (BYTE *)dest,
> - lz4sd->externalDict, lz4sd->extDictSize);
> + lz4sd->externalDict, lz4sd->extDictSize, NoDynOffset);
> if (result <= 0)
> return result;
> lz4sd->prefixSize = result;
> @@ -427,7 +447,7 @@ int LZ4_decompress_fast_continue(LZ4_streamDecode_t *LZ4_streamDecode,
> endOnOutputSize, full, 0,
> usingExtDict,
> lz4sd->prefixEnd - lz4sd->prefixSize,
> - lz4sd->externalDict, lz4sd->extDictSize);
> + lz4sd->externalDict, lz4sd->extDictSize, NoDynOffset);
>
> if (result <= 0)
> return result;
> @@ -440,7 +460,7 @@ int LZ4_decompress_fast_continue(LZ4_streamDecode_t *LZ4_streamDecode,
> result = LZ4_decompress_generic(source, dest, 0, originalSize,
> endOnOutputSize, full, 0,
> usingExtDict, (BYTE *)dest,
> - lz4sd->externalDict, lz4sd->extDictSize);
> + lz4sd->externalDict, lz4sd->extDictSize, NoDynOffset);
> if (result <= 0)
> return result;
> lz4sd->prefixSize = originalSize;
> @@ -463,19 +483,19 @@ static FORCE_INLINE int LZ4_decompress_usingDict_generic(const char *source,
> if (dictSize == 0)
> return LZ4_decompress_generic(source, dest,
> compressedSize, maxOutputSize, safe, full, 0,
> - noDict, (BYTE *)dest, NULL, 0);
> + noDict, (BYTE *)dest, NULL, 0, NoDynOffset);
> if (dictStart + dictSize == dest) {
> if (dictSize >= (int)(64 * KB - 1))
> return LZ4_decompress_generic(source, dest,
> compressedSize, maxOutputSize, safe, full, 0,
> - withPrefix64k, (BYTE *)dest - 64 * KB, NULL, 0);
> + withPrefix64k, (BYTE *)dest - 64 * KB, NULL, 0, NoDynOffset);
> return LZ4_decompress_generic(source, dest, compressedSize,
> maxOutputSize, safe, full, 0, noDict,
> - (BYTE *)dest - dictSize, NULL, 0);
> + (BYTE *)dest - dictSize, NULL, 0, NoDynOffset);
> }
> return LZ4_decompress_generic(source, dest, compressedSize,
> maxOutputSize, safe, full, 0, usingExtDict,
> - (BYTE *)dest, (const BYTE *)dictStart, dictSize);
> + (BYTE *)dest, (const BYTE *)dictStart, dictSize, NoDynOffset);
> }
>
> int LZ4_decompress_safe_usingDict(const char *source, char *dest,
> diff --git a/lib/lz4/lz4defs.h b/lib/lz4/lz4defs.h
> index 00a0b58..9451a73 100644
> --- a/lib/lz4/lz4defs.h
> +++ b/lib/lz4/lz4defs.h
> @@ -75,6 +75,7 @@
> #define WILDCOPYLENGTH 8
> #define LASTLITERALS 5
> #define MFLIMIT (WILDCOPYLENGTH + MINMATCH)
> +#define DYN_BIT 0x1
>
> /* Increase this value ==> compression run slower on incompressible data */
> #define LZ4_SKIPTRIGGER 6
> @@ -87,6 +88,7 @@
>
> #define MAXD_LOG 16
> #define MAX_DISTANCE ((1 << MAXD_LOG) - 1)
> +#define MAX_DISTANCE_DYN ((1 << (MAXD_LOG - 1)) - 1)
> #define STEPSIZE sizeof(size_t)
>
> #define ML_BITS 4
> @@ -147,6 +149,13 @@ static FORCE_INLINE void LZ4_copy8(void *dst, const void *src)
> #endif
> }
>
> +static FORCE_INLINE void LZ4_copy4(void *dst, const void *src)
> +{
> + U32 a = get_unaligned((const U32 *)src);
> +
> + put_unaligned(a, (U32 *)dst);
> +}
> +
> /*
> * customized variant of memcpy,
> * which can overwrite up to 7 bytes beyond dstEnd
> @@ -224,4 +233,6 @@ static FORCE_INLINE unsigned int LZ4_count(
> typedef enum { endOnOutputSize = 0, endOnInputSize = 1 } endCondition_directive;
> typedef enum { full = 0, partial = 1 } earlyEnd_directive;
>
> +typedef enum { NoDynOffset = 0, DynOffset = 1 } Dynamic_Offset;
> +
> #endif
> --
> 1.7.1

2018-03-22 02:43:26

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

On (03/21/18 19:56), Nick Terrell wrote:
[..]
> This seems like a reasonable extension to the algorithm, and it looks like
> LZ4_DYN is about a 5% improvement to compression ratio on your benchmark.
> The biggest question I have is whether it is worthwhile to maintain a separate,
> incompatible variant of LZ4 in the kernel, with no upstream, for a 5%
> gain. If we do want to go forward with this, we should perform more
> benchmarks.
>
> I commented in the patch, but because the `dynOffset` variable isn't a
> compile time static in LZ4_decompress_generic(), I suspect that the patch
> causes a regression in decompression speed for both LZ4 and LZ4_DYN. You'll
> need to re-run the benchmarks to first show that LZ4 before the patch
> performs the same as LZ4 after the patch. Then re-run the LZ4 vs LZ4_DYN
> benchmarks.
>
> I would also like to see a benchmark in user-space (with the code), so we
> can see the performance of LZ4 before and after the patch, as well as LZ4
> vs LZ4_DYN without anything else going on. I expect the extra branches in
> the decoding loop to have an impact on speed, and I would like to see how
> big the impact is without noise.

Yes, I've been thinking about this. There are more branches now
("to dyn or not to dyn") in the compression/decompression hot path,
but we see fewer instructions and branches in the perf output at the end.
And my guess is that we have a lot of noise from zram and zsmalloc.
The data is XXX bytes shorter with dyn enabled, so we use YYY fewer
moves and ZZZ fewer branches while we copy the data to and from
zsmalloc, and maybe that's the root cause of the "performance
gain" that we see in the zram-fio tests. So maybe we need to run
benchmarks against lz4, not zram+lz4.

> CC-ing Yann Collet, the author of LZ4

Great, thanks.

-ss

2018-03-22 04:28:21

by Maninder Singh

[permalink] [raw]
Subject: Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

CC: Vaneet Narang.

 
[..]
>
> +typedef enum { NoDynOffset = 0, DynOffset = 1 } Dynamic_Offset;
> +
>  #endif
> --
> 1.7.1
 
 

2018-03-22 23:09:49

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

Hi Maninder,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc6]
[cannot apply to next-20180322]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Maninder-Singh/cover-letter-lz4-Implement-lz4-with-dynamic-offset-length/20180323-064137
config: i386-randconfig-x073-201811 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All errors (new ones prefixed by >>):

fs/pstore/platform.c: In function 'decompress_lz4':
>> fs/pstore/platform.c:357:8: error: too few arguments to function 'LZ4_decompress_safe'
ret = LZ4_decompress_safe(in, out, inlen, outlen);
^~~~~~~~~~~~~~~~~~~
In file included from fs/pstore/platform.c:38:0:
include/linux/lz4.h:301:5: note: declared here
int LZ4_decompress_safe(const char *source, char *dest, int compressedSize,
^~~~~~~~~~~~~~~~~~~

vim +/LZ4_decompress_safe +357 fs/pstore/platform.c

8cfc8ddc Geliang Tang 2016-02-18 352
8cfc8ddc Geliang Tang 2016-02-18 353 static int decompress_lz4(void *in, void *out, size_t inlen, size_t outlen)
8cfc8ddc Geliang Tang 2016-02-18 354 {
8cfc8ddc Geliang Tang 2016-02-18 355 int ret;
8cfc8ddc Geliang Tang 2016-02-18 356
d21b5ff1 Sven Schmidt 2017-02-24 @357 ret = LZ4_decompress_safe(in, out, inlen, outlen);
d21b5ff1 Sven Schmidt 2017-02-24 358 if (ret < 0) {
d21b5ff1 Sven Schmidt 2017-02-24 359 /*
d21b5ff1 Sven Schmidt 2017-02-24 360 * LZ4_decompress_safe will return an error code
d21b5ff1 Sven Schmidt 2017-02-24 361 * (< 0) if decompression failed
d21b5ff1 Sven Schmidt 2017-02-24 362 */
d21b5ff1 Sven Schmidt 2017-02-24 363 pr_err("LZ4_decompress_safe error, ret = %d!\n", ret);
8cfc8ddc Geliang Tang 2016-02-18 364 return -EIO;
8cfc8ddc Geliang Tang 2016-02-18 365 }
8cfc8ddc Geliang Tang 2016-02-18 366
d21b5ff1 Sven Schmidt 2017-02-24 367 return ret;
8cfc8ddc Geliang Tang 2016-02-18 368 }
8cfc8ddc Geliang Tang 2016-02-18 369

:::::: The code at line 357 was first introduced by commit
:::::: d21b5ff12df45a65bb220c7e8103a5f0f5609377 fs/pstore: fs/squashfs: change usage of LZ4 to work with new LZ4 version

:::::: TO: Sven Schmidt <[email protected]>
:::::: CC: Linus Torvalds <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2018-03-22 23:32:32

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

Hi Maninder,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc6]
[cannot apply to next-20180322]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Maninder-Singh/cover-letter-lz4-Implement-lz4-with-dynamic-offset-length/20180323-064137
config: i386-randconfig-s1-03221113 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All errors (new ones prefixed by >>):

fs/squashfs/lz4_wrapper.c: In function 'lz4_uncompress':
>> fs/squashfs/lz4_wrapper.c:110:8: error: too few arguments to function 'LZ4_decompress_safe'
res = LZ4_decompress_safe(stream->input, stream->output,
^~~~~~~~~~~~~~~~~~~
In file included from fs/squashfs/lz4_wrapper.c:13:0:
include/linux/lz4.h:301:5: note: declared here
int LZ4_decompress_safe(const char *source, char *dest, int compressedSize,
^~~~~~~~~~~~~~~~~~~

vim +/LZ4_decompress_safe +110 fs/squashfs/lz4_wrapper.c

9c06a46f Phillip Lougher 2014-11-27 91
9c06a46f Phillip Lougher 2014-11-27 92
9c06a46f Phillip Lougher 2014-11-27 93 static int lz4_uncompress(struct squashfs_sb_info *msblk, void *strm,
9c06a46f Phillip Lougher 2014-11-27 94 struct buffer_head **bh, int b, int offset, int length,
9c06a46f Phillip Lougher 2014-11-27 95 struct squashfs_page_actor *output)
9c06a46f Phillip Lougher 2014-11-27 96 {
9c06a46f Phillip Lougher 2014-11-27 97 struct squashfs_lz4 *stream = strm;
9c06a46f Phillip Lougher 2014-11-27 98 void *buff = stream->input, *data;
9c06a46f Phillip Lougher 2014-11-27 99 int avail, i, bytes = length, res;
9c06a46f Phillip Lougher 2014-11-27 100
9c06a46f Phillip Lougher 2014-11-27 101 for (i = 0; i < b; i++) {
9c06a46f Phillip Lougher 2014-11-27 102 avail = min(bytes, msblk->devblksize - offset);
9c06a46f Phillip Lougher 2014-11-27 103 memcpy(buff, bh[i]->b_data + offset, avail);
9c06a46f Phillip Lougher 2014-11-27 104 buff += avail;
9c06a46f Phillip Lougher 2014-11-27 105 bytes -= avail;
9c06a46f Phillip Lougher 2014-11-27 106 offset = 0;
9c06a46f Phillip Lougher 2014-11-27 107 put_bh(bh[i]);
9c06a46f Phillip Lougher 2014-11-27 108 }
9c06a46f Phillip Lougher 2014-11-27 109
d21b5ff1 Sven Schmidt 2017-02-24 @110 res = LZ4_decompress_safe(stream->input, stream->output,
d21b5ff1 Sven Schmidt 2017-02-24 111 length, output->length);
d21b5ff1 Sven Schmidt 2017-02-24 112
d21b5ff1 Sven Schmidt 2017-02-24 113 if (res < 0)
9c06a46f Phillip Lougher 2014-11-27 114 return -EIO;
9c06a46f Phillip Lougher 2014-11-27 115
d21b5ff1 Sven Schmidt 2017-02-24 116 bytes = res;
9c06a46f Phillip Lougher 2014-11-27 117 data = squashfs_first_page(output);
9c06a46f Phillip Lougher 2014-11-27 118 buff = stream->output;
9c06a46f Phillip Lougher 2014-11-27 119 while (data) {
09cbfeaf Kirill A. Shutemov 2016-04-01 120 if (bytes <= PAGE_SIZE) {
9c06a46f Phillip Lougher 2014-11-27 121 memcpy(data, buff, bytes);
9c06a46f Phillip Lougher 2014-11-27 122 break;
9c06a46f Phillip Lougher 2014-11-27 123 }
09cbfeaf Kirill A. Shutemov 2016-04-01 124 memcpy(data, buff, PAGE_SIZE);
09cbfeaf Kirill A. Shutemov 2016-04-01 125 buff += PAGE_SIZE;
09cbfeaf Kirill A. Shutemov 2016-04-01 126 bytes -= PAGE_SIZE;
9c06a46f Phillip Lougher 2014-11-27 127 data = squashfs_next_page(output);
9c06a46f Phillip Lougher 2014-11-27 128 }
9c06a46f Phillip Lougher 2014-11-27 129 squashfs_finish_page(output);
9c06a46f Phillip Lougher 2014-11-27 130
d21b5ff1 Sven Schmidt 2017-02-24 131 return res;
9c06a46f Phillip Lougher 2014-11-27 132 }
9c06a46f Phillip Lougher 2014-11-27 133

:::::: The code at line 110 was first introduced by commit
:::::: d21b5ff12df45a65bb220c7e8103a5f0f5609377 fs/pstore: fs/squashfs: change usage of LZ4 to work with new LZ4 version

:::::: TO: Sven Schmidt <[email protected]>
:::::: CC: Linus Torvalds <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2018-03-23 13:21:19

by Vaneet Narang

[permalink] [raw]
Subject: Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.


Hi Nick,

Thanks for your comments. Please check my replies to a few of them below;
I will share benchmarking figures separately.

>
>> + if (curr_offset > 127) {
>> + curr_offset = (curr_offset << 1) | DYN_BIT;
>> + LZ4_writeLE16(op, (U16)curr_offset);
>> + op += 2;
>> + } else {
>> + curr_offset = curr_offset << 1;
>> + *op = (BYTE)curr_offset;
>> + op++;
>> + }
>
>The standard way to do variable sized integers is to use the high-bit as
>the control bit, not the low-bit. Do you have benchmarks to show that using
>the low-bit is faster than using the high-bit? If so, lets comment in the
>code, if not lets use the high-bit.
>
We are not sure about the performance difference between using the low or the high bit, but since
the LZ4 specification requires the offset to be stored in little-endian format, keeping the low bit
as the control bit makes it easier to retrieve the offset when it is spread across two bytes.

offset = LZ4_readLE16(ip);
if (offset & 0x1) {
        offset = offset >> 1;          /* just one right shift */
        ip += 2;
} else {
        offset = (offset & 0xff) >> 1; /* only two operations for a one-byte offset */
        ip += 1;
}

So we are not sure keeping the high bit would make much difference, as we would need
the same number of operations or more to retrieve the offset in that case.
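To make the scheme concrete, here is a standalone sketch of the 1-bit marker encoding described above. The function names are illustrative, not the ones in the patch, and the explicit byte stores stand in for the kernel's LZ4_writeLE16/LZ4_readLE16 helpers:

```c
#include <stdint.h>
#include <stddef.h>

/* Encode a match offset using the low bit as the length marker:
 * bit 0 set   -> two-byte little-endian encoding, offset up to 32767
 * bit 0 clear -> one-byte encoding, offset up to 127
 * Returns the number of bytes written.
 */
static size_t put_dyn_offset(uint8_t *op, uint16_t offset)
{
	if (offset > 127) {
		uint16_t v = (uint16_t)((offset << 1) | 1);

		op[0] = (uint8_t)(v & 0xff); /* little-endian low byte */
		op[1] = (uint8_t)(v >> 8);
		return 2;
	}
	op[0] = (uint8_t)(offset << 1);      /* marker bit stays 0 */
	return 1;
}

/* Decode an offset; returns the number of input bytes consumed. */
static size_t get_dyn_offset(const uint8_t *ip, uint16_t *offset)
{
	uint16_t v = (uint16_t)(ip[0] | (ip[1] << 8)); /* LZ4_readLE16 */

	if (v & 1) {
		*offset = v >> 1;          /* two-byte form: one right shift */
		return 2;
	}
	*offset = (v & 0xff) >> 1;         /* one-byte form: mask then shift */
	return 1;
}
```

Note that, as discussed below, the decoder always reads two bytes even for the one-byte form, which is exactly why the patch has to guard the end of the input buffer.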

>> /* copy literals */
>> cpy = op + length;
>> if (((endOnInput) && ((cpy > (partialDecoding ? oexit : oend - MFLIMIT))
>> - || (ip + length > iend - (2 + 1 + LASTLITERALS))))
>> - || ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH))) {
>> + || (ip + length > iend - (2 + LASTLITERALS))))
>> + || ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH - 1))) {
>> if (partialDecoding) {
>> if (cpy > oend) {
>> /*
>> @@ -188,13 +190,31 @@ static FORCE_INLINE int LZ4_decompress_generic(
>> break;
>> }
>>
>> - LZ4_wildCopy(op, ip, cpy);
>> + if (dynOffset && length < 4)
>> + LZ4_copy4(op, ip);
>> + else
>> + LZ4_wildCopy(op, ip, cpy);
>> +
>
>The LZ4 format enforces that the last 5 bytes are literals so that
>LZ4_wildCopy() can be used here. I suspect that having this extra branch
>here for `dynOffset` mode hurts decompression speed.
>
This check handles a one-byte read overflow when decompressing the last frame; LZ4_wildCopy() blindly copies 8 bytes at a time.

Issue case:
a zero-length literal followed by a one-byte offset, with the stream ending in a one-byte token and the 5-byte last literals.

With this combination only 7 bytes (1-byte offset + 1-byte token + 5-byte literals) remain at the
end of the input buffer, so reading 8 bytes from it results in a one-byte overflow.

Since a one-byte offset does not exist in original LZ4, this issue cannot occur there. To avoid the
overhead of this check, we plan to use 6 bytes as the last-literal length:

-#define LASTLITERALS 5
+#define LASTLITERALS 6
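The arithmetic behind that choice can be checked with a toy sketch (WILDCOPYLENGTH as in lz4defs.h; the helper is illustrative, not kernel code):

```c
#define WILDCOPYLENGTH 8 /* LZ4_wildCopy() copies 8 bytes blindly */

/* Smallest possible tail of the input buffer in the problem scenario
 * above: a one-byte dynamic offset, a one-byte token, then the last
 * literals. With LASTLITERALS = 5 that is 7 bytes, one short of a
 * blind 8-byte wildcopy; with LASTLITERALS = 6 the tail is 8 bytes
 * and the copy stays in bounds.
 */
static int min_tail_bytes(int lastliterals)
{
	return 1 /* offset */ + 1 /* token */ + lastliterals;
}
```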

>>
>> int LZ4_decompress_safe(const char *source, char *dest,
>> - int compressedSize, int maxDecompressedSize)
>> + int compressedSize, int maxDecompressedSize, bool dynOffset)
>> {
>> return LZ4_decompress_generic(source, dest, compressedSize,
>> maxDecompressedSize, endOnInputSize, full, 0,
>> - noDict, (BYTE *)dest, NULL, 0);
>> + noDict, (BYTE *)dest, NULL, 0, dynOffset);
>
>You'll need to use the same trick that LZ4_compress_fast() uses, by hard
>coding `dynOffset`. We want the compiler to generate two versions of
>LZ4_decompress_generic(), one with `dynOffset` and one with `!dynOffset`.
>That way the tight loop won't have the branches that check `dynOffset`.
>
> if (dynOffset)
> return LZ4_decompress_generic(..., true);
> else
> return LZ4_decompress_generic(..., false);
>
>Without this trick, I expect that this patch causes a regression to both
>LZ4 and LZ4_DYN decompression speed.

Since our solution is not backward compatible with original LZ4, we are a bit confused
whether we should make it a separate API, to avoid the overhead of dynOffset checks
everywhere in the code and to avoid changing the prototypes of the exported functions
LZ4_decompress/LZ4_compress, or keep these checks to avoid redundant code.
Kindly suggest.
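For reference, the specialization pattern Nick describes can be sketched standalone (illustrative names and a toy decode routine, not the real LZ4 code): a forced-inline generic function takes the flag as a parameter, and two thin wrappers pass compile-time constants, so the compiler emits two copies with the flag's branches folded away.

```c
#include <stdint.h>
#include <stddef.h>

#define FORCE_INLINE static inline __attribute__((always_inline))

/* Toy stand-in for LZ4_decompress_generic(): dyn_offset is a
 * compile-time constant at each call site, so the branch on it
 * disappears in each specialized copy.
 */
FORCE_INLINE size_t decode_offset_generic(const uint8_t *ip,
					  uint16_t *offset, int dyn_offset)
{
	uint16_t v = (uint16_t)(ip[0] | (ip[1] << 8));

	if (dyn_offset) {
		if (v & 1) {
			*offset = v >> 1;
			return 2;
		}
		*offset = (v & 0xff) >> 1;
		return 1;
	}
	*offset = v;          /* classic LZ4: always a 2-byte offset */
	return 2;
}

/* Two thin wrappers: each instantiates one specialization. */
size_t decode_offset_lz4(const uint8_t *ip, uint16_t *offset)
{
	return decode_offset_generic(ip, offset, 0);
}

size_t decode_offset_lz4_dyn(const uint8_t *ip, uint16_t *offset)
{
	return decode_offset_generic(ip, offset, 1);
}
```

This keeps a single exported prototype per variant while avoiding per-iteration flag checks, which is the trade-off being asked about above.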

Thanks & Regards,
Vaneet Narang

2018-03-23 13:43:19

by Vaneet Narang

[permalink] [raw]
Subject: Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

Hi Nick / Sergey,


We have compared LZ4 dyn with original LZ4 using some samples of real-time application data (4 KB)
compressed/decompressed by ZRAM. For comparison we used lzbench (https://github.com/inikep/lzbench);
we implemented a dedicated LZ4 dyn API and kept the last-literal length at 6 to avoid the overhead
of extra checks. On average there is a saving of 3~4% in compression ratio, with almost the same
compression speed and a minor loss in decompression speed (~50 MB/s) compared with LZ4.

We also compared LZ4 dyn with LZO1x, as LZO1x is the default compressor of ZRAM.

Original LZ4:
sh-3.2# ./lzbench -r -elz4 data/
lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski
Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                   2205 MB/s  2217 MB/s        4096 100.00 data//data_1
lz4 1.8.0                 216 MB/s   761 MB/s        2433  59.40 data//data_1
lz4 1.8.0                 269 MB/s   877 MB/s        1873  45.73 data//data_2
lz4 1.8.0                 238 MB/s   575 MB/s        2060  50.29 data//data_3
lz4 1.8.0                 321 MB/s  1015 MB/s        1464  35.74 data//data_4
lz4 1.8.0                 464 MB/s  1090 MB/s         713  17.41 data//data_5
lz4 1.8.0                 296 MB/s   956 MB/s        1597  38.99 data//data_6
lz4 1.8.0                 338 MB/s   994 MB/s        2238  54.64 data//data_7
lz4 1.8.0                 705 MB/s  1172 MB/s         193   4.71 data//data_8
lz4 1.8.0                 404 MB/s  1150 MB/s        1097  26.78 data//data_9
lz4 1.8.0                 216 MB/s   921 MB/s        3183  77.71 data//data_10
lz4 1.8.0                 456 MB/s  1101 MB/s        1011  24.68 data//data_11
lz4 1.8.0                 867 MB/s  1202 MB/s          37   0.90 data//data_12


LZ4 Dynamic Offset:
sh-3.2# ./lzbench -r -elz4_dyn data/
lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski
Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                   2203 MB/s  2218 MB/s        4096 100.00 data//data_1
lz4 1.8.0                 218 MB/s   693 MB/s        2228  54.39 data//data_1
lz4 1.8.0                 273 MB/s   851 MB/s        1739  42.46 data//data_2
lz4 1.8.0                 230 MB/s   526 MB/s        1800  43.95 data//data_3
lz4 1.8.0                 321 MB/s   952 MB/s        1357  33.13 data//data_4
lz4 1.8.0                 470 MB/s  1075 MB/s         664  16.21 data//data_5
lz4 1.8.0                 303 MB/s   964 MB/s        1455  35.52 data//data_6
lz4 1.8.0                 345 MB/s   951 MB/s        2126  51.90 data//data_7
lz4 1.8.0                 744 MB/s  1163 MB/s         177   4.32 data//data_8
lz4 1.8.0                 409 MB/s  1257 MB/s        1033  25.22 data//data_9
lz4 1.8.0                 220 MB/s   857 MB/s        3049  74.44 data//data_10
lz4 1.8.0                 464 MB/s  1105 MB/s         934  22.80 data//data_11
lz4 1.8.0                 874 MB/s  1194 MB/s          36   0.88 data//data_12


LZ4 Dynamic Offset with 32K data:
sh-3.2# ./lzbench -elz4_dyn data/data32k
lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski
Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                   5285 MB/s  5283 MB/s       32768 100.00 data/data32k
lz4 1.8.0                 274 MB/s   995 MB/s       13435  41.00 data/data32k
done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=1706MB cSpeed=0MB)

Original LZ4 with 32K data:
sh-3.2# ./lzbench_orig -elz4 data/data32k
lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski
Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                   4918 MB/s  5108 MB/s       32768 100.00 data/data32k
lz4 1.8.0                 276 MB/s  1045 MB/s       14492  44.23 data/data32k

LZO1x with 32K data (Default Compressor for ZRAM):
sh-3.2# ./lzbench -elzo1x,1 data/data32k
lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski
Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                   5273 MB/s  5320 MB/s       32768 100.00 data/data32k
lzo1x 2.09 -1             283 MB/s   465 MB/s       14292  43.62 data/data32k

Regards,
Vaneet Narang

2018-03-29 10:26:13

by Maninder Singh

[permalink] [raw]
Subject: Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.


Hello Nick/Sergey,
 
Any suggestions or comments, so that we can change the code and resend the patch?
 
> Hi Nick / Sergey,


> We have compared LZ4 Dyn with Original LZ4 using some samples of realtime application data(4Kb)
> compressed/decompressed by ZRAM. For comparison we have used lzbench (https://github.com/inikep/lzbench)
> we have implemented dedicated LZ4 Dyn API & kept last literal length as 6 to avoid overhead 
> of checks. It seems in average case there is a saving of 3~4% in compression ratio with almost same compression
> speed and minor loss in decompression speed (~50MB/s) when compared with LZ4.

> Comparison of Lz4 Dyn with LZO1x is also done as LZO1x is default compressor of ZRAM.

> [benchmark tables snipped; quoted from the previous mail]

> Regards,
> Vaneet Narang
 
Thanks.
 

2018-03-30 05:41:56

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

On (03/29/18 15:56), Maninder Singh wrote:
> Hello Nick/Sergey,
>
> Any suggestion or comments, so that we can change code and resend the patch?

Well... there were no replies to
https://marc.info/?l=linux-kernel&m=152161450026771&w=2
and
https://marc.info/?l=linux-kernel&m=152161860627974&w=2

-ss

2018-04-02 05:51:52

by Maninder Singh

[permalink] [raw]
Subject: Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

Hi,

>> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
>> index 4ed0a78..5bc5aab 100644
>> --- a/drivers/block/zram/zcomp.c
>> +++ b/drivers/block/zram/zcomp.c
>> @@ -17,11 +17,15 @@
>> #include <linux/crypto.h>
>>
>> #include "zcomp.h"
>> +#define KB (1 << 10)
>>
>> static const char * const backends[] = {
>> "lzo",
>> #if IS_ENABLED(CONFIG_CRYPTO_LZ4)
>> "lz4",
>> +#if (PAGE_SIZE < (32 * KB))
>> + "lz4_dyn",
>> +#endif
>
>This is not the list of supported algorithms. It's the list of
>recommended algorithms. You can configure zram to use any of
>available and known to Crypto API algorithms. Including lz4_dyn
>on PAGE_SIZE > 32K systems.
>
> -ss

Yes, we want to integrate the new compression (lz4_dyn) for ZRAM
only if PAGE_SIZE is less than 32 KB, to get the maximum benefit,
so we added lz4_dyn to the list of available ZRAM compression algorithms.

Thanks,
Maninder Singh

2018-04-02 06:03:54

by Maninder Singh

[permalink] [raw]
Subject: Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.


 Hello,

>> (Added cover letter to avoid much text in patch description)
>>
>> LZ4 specification defines 2 byte offset length for 64 KB data.
>> But in case of ZRAM we compress data per page and in most of
>> architecture PAGE_SIZE is 4KB. So we can decide offset length based
>> on actual offset value. For this we can reserve 1 bit to decide offset
>> length (1 byte or 2 byte). 2 bytes are required only if the offset is greater than 127,
>> else 1 byte is enough.
>
>So what happens if I compress the data on a system with no dyn
>offset and then send it over the network to a machine which has
>dyn offset? Or, say, I have a USB stick with a compression enabled
>FS, store files on a dyn offset enabled PC and then mount that USB
>stick on a machine with no dyn offset support. And vice versa.


lz4_dyn is not an extension of LZ4, so there is no backward compatibility.
Consider it a different algorithm, adapted from LZ4 for a better compression ratio.

Thanks
Maninder Singh

2018-04-03 12:26:42

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

On (04/02/18 11:21), Maninder Singh wrote:
[..]
> >> static const char * const backends[] = {
> >> "lzo",
> >> #if IS_ENABLED(CONFIG_CRYPTO_LZ4)
> >> "lz4",
> >> +#if (PAGE_SIZE < (32 * KB))
> >> + "lz4_dyn",
> >> +#endif
> >
> >This is not the list of supported algorithms. It's the list of
> >recommended algorithms. You can configure zram to use any of
> >available and known to Crypto API algorithms. Including lz4_dyn
> >on PAGE_SIZE > 32K systems.
> >
> Yes, we want to integrate new compression(lz4_dyn) for ZRAM
> only if PAGE_SIZE is less than 32KB to get maximum benefit.
> so we added lz4_dyn to the available list of ZRAM compression algorithms.

Which is not what I was talking about.

You shrink a 2 bytes offset down to a 1 byte offset, thus you enforce that
'page should be less than 32KB', which I'm sure will be confusing. And you
rely on lz4_dyn users to do the right thing - namely, to use that 'nice'
`#if (PAGE_SIZE < (32 * KB))'. Apart from that, lz4_dyn supports only data
in up to page_size chunks. Suppose my system has page_size of less than 32K,
so I legitimately can enable lz4_dyn, but suppose that I will use it
somewhere where I don't work with page_size-d chunks. Will I be able to just
do tfm->compress(src, sz) on random buffers? The whole thing looks to be
quite fragile.

-ss

2018-04-03 13:43:47

by Vaneet Narang

[permalink] [raw]
Subject: RE: Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

Hi Sergey,

>You shrink a 2 bytes offset down to a 1 byte offset, thus you enforce that
The 2-byte offset is not shrunk to 1 byte; only 1 bit out of the
16 offset bits is reserved, so 15 bits can be used to store the offset value.

>'page should be less than 32KB', which I'm sure will be confusing.
lz4_dyn will work on bigger data lengths (> 32K), but in that case the compression
ratio may not be better than LZ4's. This is the same as LZ4 compressing data larger
than 64K (16 bits): LZ4 can't store an offset larger than 64K, and similarly
LZ4 dyn can't store an offset larger than 32K.

There is handling for this in the LZ4 code, and similar handling is added for LZ4 dyn.

Handling in LZ4 dyn: max_distance is 32K for lz4_dyn and 64K for LZ4:
int max_distance = dynOffset ? MAX_DISTANCE_DYN : MAX_DISTANCE;

>And you
>rely on lz4_dyn users to do the right thing - namely, to use that 'nice'
>`#if (PAGE_SIZE < (32 * KB))'.
They don't need to add this code; they just need to choose the right compression algorithm
for their requirement. If the source length is less than 32K, then lz4_dyn
gives a better compression ratio than LZ4.

Considering ZRAM as a user of LZ4 dyn, we added this check for PAGE_SIZE, which
is the source length. This code adds lz4_dyn to the preferred list of compression algorithms
when the page size is less than 32K.

>Apart from that, lz4_dyn supports only data
>in up to page_size chunks. Suppose my system has page_size of less than 32K,
>so I legitimately can enable lz4_dyn, but suppose that I will use it
>somewhere where I don't work with page_size-d chunks. Will I able to just
>do tfm->compress(src, sz) on random buffers? The whole thing looks to be
>quite fragile.
No, that's not true: lz4_dyn can work on random buffers, and they need not be
page-sized chunks. There is no difference between how LZ4 and LZ4 dyn work.

The only difference is that LZ4 dyn doesn't use a fixed offset size. This concept is
already used in LZO, which sizes its metadata dynamically based on match length and
match offset, using different markers that define the metadata length.

lzodefs.h:

#define M1_MAX_OFFSET 0x0400
#define M2_MAX_OFFSET 0x0800
#define M3_MAX_OFFSET 0x4000
#define M4_MAX_OFFSET 0xbfff

#define M1_MIN_LEN 2
#define M1_MAX_LEN 2
#define M2_MIN_LEN 3
#define M2_MAX_LEN 8
#define M3_MIN_LEN 3
#define M3_MAX_LEN 33
#define M4_MIN_LEN 3
#define M4_MAX_LEN 9

#define M1_MARKER 0
#define M2_MARKER 64
#define M3_MARKER 32
#define M4_MARKER 16

Similarly for LZ4 Dyn, we have used 1 bit as a marker to determine offset length.

Thanks & Regards,
Vaneet Narang



2018-04-04 01:40:23

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

On (04/03/18 19:13), Vaneet Narang wrote:
> Hi Sergey,
>
> >You shrink a 2 bytes offset down to a 1 byte offset, thus you enforce that
> 2 Byte offset is not shrinked to 1 byte, Its only 1 bit is reserved out of
> 16 bits of offset. So only 15 Bits can be used to store offset value.

Yes, you are right. My bad, was thinking about something else.

> >'page should be less than 32KB', which I'm sure will be confusing.
> lz4_dyn will work on bigger data length(> 32k) but in that case compression
> ratio may not be better than LZ4. This is same as LZ4 compressing data more
> than 64K (16Bits). LZ4 can't store offset more than 64K similarly
> LZ4 dyn can't store offset more than 32K.

Then drop that `if PAGE_SIZE' thing. I'd rather do that stuff internally
in lz4... if it is needed at all.

> >And you
> >rely on lz4_dyn users to do the right thing - namely, to use that 'nice'
> >`#if (PAGE_SIZE < (32 * KB))'.
> They don't need to add this code

Then drop it.

> >Apart from that, lz4_dyn supports only data
> >in up to page_size chunks. Suppose my system has page_size of less than 32K,
> >so I legitimately can enable lz4_dyn, but suppose that I will use it
> >somewhere where I don't work with page_size-d chunks. Will I able to just
> >do tfm->compress(src, sz) on random buffers? The whole thing looks to be
> >quite fragile.
> No thats not true, lz4_dyn can work for random buffers and it need not be
> of page size chunks. There is no difference in Lz4 and Lz4 dyn working.

You are right.

-ss

2018-04-16 10:21:38

by Maninder Singh

[permalink] [raw]
Subject: Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.


 Hello Nick/ Yann,

Any inputs regarding the LZ4 dyn results and approach?

>Hello Nick/Sergey,
>
>Any suggestion or comments, so that we can change code and resend the patch?
>
>> Hi Nick / Sergey,
>>
>>
>> We have compared LZ4 Dyn with Original LZ4 using some samples of realtime application data(4Kb)
>> compressed/decompressed by ZRAM. For comparison we have used lzbench (https://github.com/inikep/lzbench)
>> we have implemented dedicated LZ4 Dyn API & kept last literal length as 6 to avoid overhead
>> of checks. It seems in average case there is a saving of 3~4% in compression ratio with almost same compression
>> speed and minor loss in decompression speed (~50MB/s) when compared with LZ4.
>>
>> Comparison of Lz4 Dyn with LZO1x is also done as LZO1x is default compressor of ZRAM.
>>
>> Original LZ4:
>> sh-3.2# ./lzbench -r -elz4 data/
>> lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski
>> Compressor name Compress. Decompress. Compr. size Ratio Filename
>> memcpy 2205 MB/s 2217 MB/s 4096 100.00 data//data_1
>> lz4 1.8.0 216 MB/s 761 MB/s 2433 59.40 data//data_1
>> lz4 1.8.0 269 MB/s 877 MB/s 1873 45.73 data//data_2
>> lz4 1.8.0 238 MB/s 575 MB/s 2060 50.29 data//data_3
>> lz4 1.8.0 321 MB/s 1015 MB/s 1464 35.74 data//data_4
>> lz4 1.8.0 464 MB/s 1090 MB/s 713 17.41 data//data_5
>> lz4 1.8.0 296 MB/s 956 MB/s 1597 38.99 data//data_6
>> lz4 1.8.0 338 MB/s 994 MB/s 2238 54.64 data//data_7
>> lz4 1.8.0 705 MB/s 1172 MB/s 193 4.71 data//data_8
>> lz4 1.8.0 404 MB/s 1150 MB/s 1097 26.78 data//data_9
>> lz4 1.8.0 216 MB/s 921 MB/s 3183 77.71 data//data_10
>> lz4 1.8.0 456 MB/s 1101 MB/s 1011 24.68 data//data_11
>> lz4 1.8.0 867 MB/s 1202 MB/s 37 0.90 data//data_12
>>
>>
>> LZ4 Dynamic Offet:
>> sh-3.2# ./lzbench -r -elz4_dyn data/
>> lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski
>> Compressor name Compress. Decompress. Compr. size Ratio Filename
>> memcpy 2203 MB/s 2218 MB/s 4096 100.00 data//data_1
>> lz4 1.8.0 218 MB/s 693 MB/s 2228 54.39 data//data_1
>> lz4 1.8.0 273 MB/s 851 MB/s 1739 42.46 data//data_2
>> lz4 1.8.0 230 MB/s 526 MB/s 1800 43.95 data//data_3
>> lz4 1.8.0 321 MB/s 952 MB/s 1357 33.13 data//data_4
>> lz4 1.8.0 470 MB/s 1075 MB/s 664 16.21 data//data_5
>> lz4 1.8.0 303 MB/s 964 MB/s 1455 35.52 data//data_6
>> lz4 1.8.0 345 MB/s 951 MB/s 2126 51.90 data//data_7
>> lz4 1.8.0 744 MB/s 1163 MB/s 177 4.32 data//data_8
>> lz4 1.8.0 409 MB/s 1257 MB/s 1033 25.22 data//data_9
>> lz4 1.8.0 220 MB/s 857 MB/s 3049 74.44 data//data_10
>> lz4 1.8.0 464 MB/s 1105 MB/s 934 22.80 data//data_11
>> lz4 1.8.0 874 MB/s 1194 MB/s 36 0.88 data//data_12
>>
>>
>> LZ4 Dynamic Offset with 32K data:
>> sh-3.2# ./lzbench -elz4_dyn data/data32k
>> lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski
>> Compressor name Compress. Decompress. Compr. size Ratio Filename
>> memcpy 5285 MB/s 5283 MB/s 32768 100.00 data/data32k
>> lz4 1.8.0 274 MB/s 995 MB/s 13435 41.00 data/data32k
>> done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=1706MB cSpeed=0MB)
>>
>> Original LZ4 with 32K data:
>> sh-3.2# ./lzbench_orig -elz4 data/data32k
>> lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski
>> Compressor name Compress. Decompress. Compr. size Ratio Filename
>> memcpy 4918 MB/s 5108 MB/s 32768 100.00 data/data32k
>> lz4 1.8.0 276 MB/s 1045 MB/s 14492 44.23 data/data32k
>>
>> LZO1x with 32K data (Default Compressor for ZRAM):
>> sh-3.2# ./lzbench -elzo1x,1 data/data32k
>> lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski
>> Compressor name Compress. Decompress. Compr. size Ratio Filename
>> memcpy 5273 MB/s 5320 MB/s 32768 100.00 data/data32k
>> lzo1x 2.09 -1 283 MB/s 465 MB/s 14292 43.62 data/data32k
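As a quick sanity check of the 32 KB figures above (an editorial snippet; the sizes are copied from the lzbench output):

```python
# Ratio delta for the 32 KB sample, using the compressed sizes reported
# by lzbench above: 14492 bytes for plain LZ4, 13435 bytes for LZ4 dyn.
orig_size = 32768
lz4_size = 14492      # reported ratio 44.23
lz4_dyn_size = 13435  # reported ratio 41.00

delta_pp = 100.0 * (lz4_size - lz4_dyn_size) / orig_size
print(f"LZ4 dyn saves {delta_pp:.2f} percentage points of ratio")
```

The result is about 3.23 percentage points, consistent with the 3~4% saving reported earlier in the thread.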


Thanks,
Maninder Singh
 

2018-04-16 19:34:29

by Yann Collet

Subject: Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

Hi Singh

I don't have any strong opinion on this topic.

You made your case clear:
your variant trades a little bit of speed for a little bit more compression ratio.
In the context of zram, it makes sense, and I would expect it to work, as advertised in your benchmark results.
(disclaimer: I haven't reproduced these results; they just look reasonable to me, and I have no reason to doubt them).

So, the issue is less about performance than about code complexity.

As mentioned, this is an incompatible variant.
So it requires its own entry point, and preferably its own code path:
even if that code is heavily duplicated, mixing it with the regular lz4
source code, as proposed in the patch, would be bad for maintenance and
could negatively impact regular lz4 usage outside of zram.

So that's basically the "cost" of adding this option.

Is it worth it?
Well, this is completely outside of my responsibility area, so I really can't tell.
You'll have to convince people in charge that the gains are worth their complexity,
since _they_ will inherit the duty to keep the system working through its future evolutions.
At a minimum, you are targeting maintainers of zram and the crypto interface.
For this topic, they are the right people to talk to.


On 4/16/18, 04:09, "Maninder Singh" <[email protected]> wrote:


Hello Nick/ Yann,

Any inputs regarding the LZ4 dyn results and approach?

>Hello Nick/Sergey,
>
>Any suggestions or comments, so that we can update the code and resend the patch?
>
>> Hi Nick / Sergey,
>>
>>
>> We have compared LZ4 Dyn with the original LZ4 using samples of real-time application data (4 KB)
>> compressed/decompressed by ZRAM. For the comparison we used lzbench (https://github.com/inikep/lzbench).
>> We implemented a dedicated LZ4 Dyn API and kept the last literal length at 6 to avoid the overhead
>> of extra checks. On average there is a saving of 3~4% in compression ratio, with almost the same
>> compression speed and a minor loss in decompression speed (~50 MB/s) compared with LZ4.
>>
>> LZ4 Dyn is also compared with LZO1x, since LZO1x is the default compressor for ZRAM.
>>
>> [benchmark tables snipped -- identical to those in the previous message]


Thanks,
Maninder Singh



2018-04-16 20:01:18

by Eric Biggers

Subject: Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

On Mon, Apr 16, 2018 at 07:34:29PM +0000, Yann Collet wrote:
> Hi Singh
>
> I don't have any strong opinion on this topic.
>
> You made your case clear:
> your variant trades a little bit of speed for a little bit more compression ratio.
> In the context of zram, it makes sense, and I would expect it to work, as advertised in your benchmark results.
> (disclaimer: I haven't reproduced these results; they just look reasonable to me, and I have no reason to doubt them).
>
> So, the issue is less about performance than about code complexity.
>
> As mentioned, this is an incompatible variant.
> So it requires its own entry point, and preferably its own code path:
> even if that code is heavily duplicated, mixing it with the regular lz4
> source code, as proposed in the patch, would be bad for maintenance and
> could negatively impact regular lz4 usage outside of zram.
>
> So that's basically the "cost" of adding this option.
>
> Is it worth it?
> Well, this is completely outside of my responsibility area, so I really can't tell.
> You'll have to convince people in charge that the gains are worth their complexity,
> since _they_ will inherit the duty to keep the system working through its future evolutions.
> At a minimum, you are targeting maintainers of zram and the crypto interface.
> For this topic, they are the right people to talk to.
>
>
> On 4/16/18, 04:09, "Maninder Singh" <[email protected]> wrote:
>
>
> Hello Nick/ Yann,
>
> Any inputs regarding the LZ4 dyn results and approach?
>
> >Hello Nick/Sergey,
> >
> >Any suggestions or comments, so that we can update the code and resend the patch?
> >
> >> Hi Nick / Sergey,
> >>
> >>
> >> We have compared LZ4 Dyn with the original LZ4 using samples of real-time application data (4 KB)
> >> compressed/decompressed by ZRAM. For the comparison we used lzbench (https://github.com/inikep/lzbench).
> >> We implemented a dedicated LZ4 Dyn API and kept the last literal length at 6 to avoid the overhead
> >> of extra checks. On average there is a saving of 3~4% in compression ratio, with almost the same
> >> compression speed and a minor loss in decompression speed (~50 MB/s) compared with LZ4.
> >>
> >> LZ4 Dyn is also compared with LZO1x, since LZO1x is the default compressor for ZRAM.
> >>

Unfortunately the track record of maintaining compression code in the Linux
kernel is not great. zlib, for example, was forked from v1.2.3, which was
released in 2005, and hasn't been updated since, apart from some random
drive-by patches that have made it diverge even further from upstream. There
have even been bugs assigned CVE numbers in upstream zlib, and I don't think
anyone has checked whether the Linux kernel version has those bugs or not.

The story with LZ4 is a bit better, as someone updated it to v1.7.3 last year.
But it took many rounds of review, in which I had to point out some subtle
regressions, like the hash table size being accidentally changed and things
that should be inlined not being inlined. And of course that version is
already outdated.

We also have LZO and Zstandard in the kernel to maintain too, as well as XZ
decompression.

And very problematically, *none* of these compression algorithms have a
maintainer listed in the MAINTAINERS file.

So in my opinion, as a prerequisite for this change, someone would need to
volunteer to actually maintain LZ4 in the kernel.

Thanks,

Eric