This patch series adds a new compression algorithm to the kernel and to
the crypto API.
Changes since v6:
- Fixed 'git apply' error caused by other recently applied patches
Changes since v5:
- Fixed compile error caused by variable definitions inside #ifdef CONFIG_ZRAM_WRITEBACK
Changes since v4:
- Fixed mismatched function prototypes
- Fixed mismatched license declarations
- Made global variables static
- Added ULL suffix to long constants
Changes since v3:
- Split the patch into a patchset
- Added Zstd (Zstandard) to the list of benchmarked algorithms
- Added configurable compression levels to the crypto API
- Added multiple compression levels to the benchmarks below
- Added unsafe decompressor functions to the crypto API
- Added a flag to mark unstable algorithms in the crypto API
- Fuzzed the code with afl-fuzz and fixed the issues found
- Added 2 new benchmark datasets
- Fixed checkpatch.pl warnings
Changes since v2:
- Added the linux-kernel mailing list
Changes since v1:
- Improved documentation
- Improved code style
- Replaced numerous casts with get_unaligned*
- Added tests in crypto/testmgr.h/c
- Added zBeWalgo to the list of algorithms shown by
/sys/block/zram0/comp_algorithm
Currently ZRAM uses compression algorithms from the crypto API. ZRAM
compresses each page individually. As a result, the compression algorithm is
forced to use a very small sliding window. None of the available compression
algorithms is designed to achieve high compression ratios with such small inputs.
This patchset adds a new compression algorithm 'zBeWalgo' to the crypto API.
The algorithm focuses on increasing the capacity of the compressed
block device created by ZRAM. The choice of compression algorithm is always
a tradeoff between speed and compression ratio.
If a faster algorithm like 'lz4' is chosen, the compression ratio is often
lower than that of zBeWalgo, as shown in the following benchmarks. Due to
the lower compression ratio, ZRAM has to fall back to its backing device
more often. Whenever the backing device is needed, the effective speed of ZRAM
is a weighted average of the de/compression time and the time spent reading
from or writing to the backing device. This should be kept in mind when
comparing the speeds in the benchmarks.
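The weighting described above can be modelled as a weighted harmonic mean:
per-byte times add up, so the effective throughput is the reciprocal of the
weighted sum of per-byte times. A minimal sketch - the 20% fallback fraction
and the speeds are illustrative assumptions, not numbers from the benchmarks
below:

```python
def effective_speed(comp_speed, backing_speed, fallback):
    """Effective ZRAM throughput (MBit/s) when a fraction 'fallback' of
    the data has to go to the backing device. Per-byte times add up, so
    the two speeds combine as a weighted harmonic mean."""
    t = (1.0 - fallback) / comp_speed + fallback / backing_speed
    return 1.0 / t

# Illustrative: a 1300 MBit/s compressor with 20% of the data falling
# back to a 135 MBit/s HDD drops to roughly 477 MBit/s overall.
print(round(effective_speed(1300.0, 135.0, 0.20), 1))
```

This is why a slower algorithm with a higher ratio can win overall: a lower
fallback fraction outweighs a slower compressor.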
There are different kinds of backing devices, each with its own drawbacks.
1. HDDs: This kind of backing device is very slow. If the compression ratio
of an algorithm is much lower than that of zBeWalgo, it might be faster
to use zBeWalgo instead.
2. SSDs: I tested a swap partition on my NVMe SSD. The speed is even higher
than zram with lz4, but after about five minutes the SSD blocked all
read/write requests due to overheating. This is definitely not an option.
Benchmarks:
To obtain reproducible benchmarks, each dataset is first loaded into a
userspace program. Then the data is written directly to a clean
zram partition without any filesystem. Between writing and reading, 'sync'
and 'echo 3 > /proc/sys/vm/drop_caches' are called. All time measurements are
wall-clock times, and the benchmarks use only one CPU core at a time.
The new algorithm is compared to all compression algorithms available in
the crypto API.
Before loading the datasets into user space, deduplication is applied, since
none of the algorithms implements deduplication itself. Duplicated pages are
removed to prevent an algorithm from obtaining an overly high or low ratio
just because a single page compresses very well - or very badly.
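The deduplication step can be sketched like this. This is a hedged
illustration of the preprocessing idea, not the actual benchmark tool; the
use of SHA-256 page hashes is my assumption:

```python
import hashlib

PAGE = 4096  # ZRAM compresses each page individually

def dedup_pages(data):
    """Keep only the first occurrence of each distinct page."""
    seen = set()
    out = bytearray()
    for off in range(0, len(data), PAGE):
        page = data[off:off + PAGE]
        digest = hashlib.sha256(page).digest()
        if digest not in seen:
            seen.add(digest)
            out += page
    return bytes(out)

# Three zero pages and one distinct page leave two unique pages.
sample = bytes(PAGE) * 3 + b'\x01' * PAGE
print(len(dedup_pages(sample)) // PAGE)  # → 2
```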
All algorithms marked with '*' use unsafe decompression.
All read and write speeds are given in MBit/s.
zbewalgo' uses different specialized combinations per dataset. These can be
specified at runtime via /sys/kernel/zbewalgo/combinations.
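For reference, the throughput figures in the tables below are derived from
byte counts and wall-clock times; assuming decimal megabits (my assumption,
1 MBit = 10^6 bits), the conversion is:

```python
def mbit_per_s(nbytes, wall_seconds):
    """Convert a byte count and a wall-clock time to MBit/s
    (assuming decimal megabits, 1 MBit = 10^6 bits)."""
    return nbytes * 8 / 1e6 / wall_seconds

# e.g. moving a 1 GiB dataset in 4.2 s of wall time:
print(round(mbit_per_s(1 << 30, 4.2), 2))  # → 2045.22
```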
- '/dev/zero' This dataset is used to measure the speed limits of
ZRAM. ZRAM filters zero-filled pages internally and does not even call the
specified compression algorithm.
Algorithm write read
--zram-- 2724.08 2828.87
- 'ecoham' This dataset is one of the input files of the scientific
application ECOHAM, which runs an ocean simulation. The dataset contains a
lot of zeros - even after deduplication. Where the data is not zero, it holds
arrays of floating point values in which adjacent values are likely to be
similar to each other, allowing for high compression ratios.
zBeWalgo reaches very high compression ratios and is much faster than other
algorithms with similar compression ratios.
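Why such float arrays compress well can be shown in a few lines: XOR-ing
each 64-bit word with its predecessor turns near-equal neighbours into words
that are mostly zero bytes. This is only an illustration of the general
idea, not code from zBeWalgo:

```python
import struct

N = 512
values = [1.0 + i * 1e-7 for i in range(N)]        # slowly varying doubles
raw = struct.pack('<%dd' % N, *values)

words = struct.unpack('<%dQ' % N, raw)
# XOR each word with its predecessor; near-equal neighbours share their
# sign, exponent and upper mantissa bits, so most result bytes are zero.
deltas = (words[0],) + tuple(a ^ b for a, b in zip(words[1:], words[:-1]))
xored = struct.pack('<%dQ' % N, *deltas)

print(raw.count(0), xored.count(0))  # the XOR stream has far more zero bytes
```

The transform is trivially invertible, so a decompressor can undo it after
the byte-level coder has run.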
Algorithm ratio write read
--hdd-- 1.00 134.70 156.62
lz4*_10 6.73 1303.12 1547.17
lz4_10 6.73 1303.12 1574.51
lzo 6.88 1205.98 1468.09
lz4*_05 7.00 1291.81 1642.41
lz4_05 7.00 1291.81 1682.81
lz4_07 7.13 1250.29 1593.89
lz4*_07 7.13 1250.29 1677.08
lz4_06 7.16 1307.62 1666.66
lz4*_06 7.16 1307.62 1669.42
lz4_03 7.21 1250.87 1449.48
lz4*_03 7.21 1250.87 1621.97
lz4*_04 7.23 1281.62 1645.56
lz4_04 7.23 1281.62 1666.81
lz4_02 7.33 1267.54 1523.11
lz4*_02 7.33 1267.54 1576.54
lz4_09 7.36 1140.55 1510.01
lz4*_09 7.36 1140.55 1692.38
lz4*_01 7.36 1215.40 1575.38
lz4_01 7.36 1215.40 1676.65
lz4_08 7.36 1242.73 1544.07
lz4*_08 7.36 1242.73 1692.92
lz4hc_01 7.51 235.85 1545.61
lz4hc*_01 7.51 235.85 1678.00
lz4hc_02 7.62 226.30 1697.42
lz4hc*_02 7.62 226.30 1738.79
lz4hc*_03 7.71 194.64 1711.58
lz4hc_03 7.71 194.64 1713.59
lz4hc*_04 7.76 177.17 1642.39
lz4hc_04 7.76 177.17 1698.36
deflate_1 7.80 84.71 584.89
lz4hc*_05 7.81 149.11 1558.43
lz4hc_05 7.81 149.11 1686.71
deflate_2 7.82 82.83 599.38
deflate_3 7.86 84.27 616.05
lz4hc_06 7.88 106.61 1680.52
lz4hc*_06 7.88 106.61 1739.78
zstd_07 7.92 230.34 1016.91
zstd_05 7.92 252.71 1070.46
zstd_06 7.93 237.84 1062.11
lz4hc*_07 7.94 75.22 1751.91
lz4hc_07 7.94 75.22 1768.98
zstd_04 7.94 403.21 1080.62
zstd_03 7.94 411.91 1077.26
zstd_01 7.94 455.89 1082.54
zstd_09 7.94 456.81 1079.22
zstd_08 7.94 459.54 1082.07
zstd_02 7.94 465.82 1056.67
zstd_11 7.95 150.15 1070.31
zstd_10 7.95 169.95 1107.86
lz4hc_08 7.98 49.53 1611.61
lz4hc*_08 7.98 49.53 1793.68
lz4hc_09 7.98 49.62 1629.63
lz4hc*_09 7.98 49.62 1639.83
lz4hc*_10 7.99 37.96 1742.65
lz4hc_10 7.99 37.96 1790.08
zbewalgo 8.02 38.58 237.92
zbewalgo* 8.02 38.58 239.10
842 8.05 169.90 597.01
zstd_13 8.06 129.78 1131.66
zstd_12 8.06 135.50 1126.59
deflate_4 8.16 71.14 546.52
deflate_5 8.17 70.86 537.05
zstd_17 8.19 61.46 1061.45
zstd_14 8.20 124.43 1133.68
zstd_18 8.21 56.82 1151.25
zstd_19 8.22 51.51 1161.83
zstd_20 8.24 44.26 1108.36
zstd_16 8.25 76.26 1042.82
zstd_15 8.25 86.65 1181.98
deflate_6 8.28 66.45 619.62
deflate_7 8.30 63.83 631.13
zstd_21 8.41 6.73 1177.38
zstd_22 8.46 2.23 1188.39
deflate_9 8.47 44.16 678.43
deflate_8 8.47 48.00 677.50
zbewalgo' 8.80 634.68 1247.56
zbewalgo'* 8.80 634.68 1429.42
- 'source-code' This dataset is a tarball of the Linux kernel source code.
zBeWalgo performs poorly on text-based datasets.
Algorithm ratio write read
--hdd-- 1.00 134.70 156.62
lz4_10 1.49 584.41 1200.01
lz4*_10 1.49 584.41 1251.79
lz4*_07 1.64 559.05 1160.75
lz4_07 1.64 559.05 1160.97
842 1.65 63.66 158.53
lz4_06 1.71 513.03 1068.18
lz4*_06 1.71 513.03 1162.68
lz4_05 1.78 526.31 1136.51
lz4*_05 1.78 526.31 1144.81
lz4*_04 1.87 506.63 1106.31
lz4_04 1.87 506.63 1132.96
zbewalgo 1.89 27.56 35.04
zbewalgo* 1.89 27.56 36.20
zbewalgo' 1.89 46.62 34.75
zbewalgo'* 1.89 46.62 36.34
lz4_03 1.98 485.91 984.92
lz4*_03 1.98 485.91 1125.68
lz4_02 2.07 454.96 1061.05
lz4*_02 2.07 454.96 1133.42
lz4_01 2.17 441.11 1141.52
lz4*_01 2.17 441.11 1146.26
lz4*_08 2.17 446.45 1103.61
lz4_08 2.17 446.45 1163.91
lz4*_09 2.17 453.21 1071.91
lz4_09 2.17 453.21 1155.43
lzo 2.27 430.27 871.87
lz4hc*_01 2.35 137.71 1089.94
lz4hc_01 2.35 137.71 1200.45
lz4hc_02 2.38 139.18 1117.44
lz4hc*_02 2.38 139.18 1210.58
lz4hc_03 2.39 127.09 1097.90
lz4hc*_03 2.39 127.09 1214.22
lz4hc_10 2.40 96.26 1203.89
lz4hc*_10 2.40 96.26 1221.94
lz4hc*_08 2.40 98.80 1191.79
lz4hc_08 2.40 98.80 1226.59
lz4hc*_09 2.40 102.36 1213.34
lz4hc_09 2.40 102.36 1225.45
lz4hc*_07 2.40 113.81 1217.63
lz4hc_07 2.40 113.81 1218.49
lz4hc*_06 2.40 117.32 1214.13
lz4hc_06 2.40 117.32 1224.51
lz4hc_05 2.40 122.12 1108.34
lz4hc*_05 2.40 122.12 1214.97
lz4hc*_04 2.40 124.91 1093.58
lz4hc_04 2.40 124.91 1222.05
zstd_01 2.93 200.01 401.15
zstd_08 2.93 200.01 414.52
zstd_09 2.93 200.26 394.83
zstd_02 3.00 201.12 405.73
deflate_1 3.01 53.83 240.64
deflate_2 3.05 52.58 243.31
deflate_3 3.08 52.07 244.84
zstd_04 3.10 158.80 365.06
zstd_03 3.10 169.56 405.92
zstd_05 3.18 125.00 410.23
zstd_06 3.20 106.50 404.81
zstd_07 3.21 99.02 404.23
zstd_15 3.22 24.95 376.58
zstd_16 3.22 26.88 416.44
deflate_4 3.22 45.26 225.56
zstd_13 3.22 62.53 388.33
zstd_14 3.22 64.15 391.81
zstd_12 3.22 66.24 417.67
zstd_11 3.22 66.44 404.31
zstd_10 3.22 73.13 401.98
zstd_17 3.24 14.66 412.00
zstd_18 3.25 13.37 408.46
deflate_5 3.26 43.54 252.18
deflate_7 3.27 39.37 245.63
deflate_6 3.27 42.51 251.33
deflate_9 3.28 40.02 253.99
deflate_8 3.28 40.10 253.98
zstd_19 3.34 10.36 399.85
zstd_22 3.35 4.88 353.63
zstd_21 3.35 6.02 323.33
zstd_20 3.35 8.34 339.81
- 'hpcg' This dataset is a (partial) memory snapshot of the
running hpcg benchmark. At the time of the snapshot, the application
performed a sparse matrix-vector multiplication.
The compression ratio of zBeWalgo on this dataset is nearly three times the
ratio of any other algorithm, regardless of the compression level
specified.
Algorithm ratio write read
--hdd-- 1.00 134.70 156.62
lz4*_10 1.00 1130.73 2131.82
lz4_10 1.00 1130.73 2181.60
lz4_06 1.34 625.48 1145.74
lz4*_06 1.34 625.48 1145.90
lz4_07 1.57 515.39 895.42
lz4*_07 1.57 515.39 1062.53
lz4*_05 1.72 539.40 1030.76
lz4_05 1.72 539.40 1038.86
lzo 1.76 475.20 805.41
lz4_08 1.76 480.35 939.16
lz4*_08 1.76 480.35 1015.04
lz4*_03 1.76 488.05 893.13
lz4_03 1.76 488.05 1013.65
lz4*_09 1.76 501.49 1032.69
lz4_09 1.76 501.49 1105.47
lz4*_01 1.76 501.54 1040.72
lz4_01 1.76 501.54 1102.22
lz4*_02 1.76 510.79 1014.78
lz4_02 1.76 510.79 1080.69
lz4_04 1.76 516.18 1047.06
lz4*_04 1.76 516.18 1049.55
842 2.35 109.68 192.50
lz4hc_07 2.36 152.57 1265.77
lz4hc*_07 2.36 152.57 1331.01
lz4hc*_06 2.36 155.78 1313.85
lz4hc_06 2.36 155.78 1346.52
lz4hc*_08 2.36 158.80 1297.16
lz4hc_08 2.36 158.80 1382.54
lz4hc*_10 2.36 159.84 1317.81
lz4hc_10 2.36 159.84 1346.85
lz4hc*_03 2.36 160.01 1162.91
lz4hc_03 2.36 160.01 1377.09
lz4hc*_09 2.36 161.02 1320.87
lz4hc_09 2.36 161.02 1374.39
lz4hc*_05 2.36 164.67 1324.40
lz4hc_05 2.36 164.67 1341.64
lz4hc*_04 2.36 168.11 1323.19
lz4hc_04 2.36 168.11 1377.56
lz4hc_01 2.36 168.40 1231.55
lz4hc*_01 2.36 168.40 1329.72
lz4hc*_02 2.36 170.74 1316.54
lz4hc_02 2.36 170.74 1337.42
deflate_3 3.52 46.51 336.67
deflate_2 3.52 62.05 343.03
deflate_1 3.52 65.68 359.96
deflate_4 4.01 61.01 432.66
deflate_8 4.61 41.51 408.29
deflate_5 4.61 44.09 434.79
deflate_9 4.61 45.14 417.18
deflate_7 4.61 45.22 440.27
deflate_6 4.61 46.01 440.39
zstd_09 5.95 277.11 542.93
zstd_08 5.95 277.40 541.27
zstd_01 5.95 277.41 540.61
zstd_16 5.97 32.05 465.03
zstd_15 5.97 39.12 515.07
zstd_13 5.97 70.90 511.94
zstd_14 5.97 72.20 522.68
zstd_11 5.97 74.14 512.18
zstd_12 5.97 74.27 497.95
zstd_10 5.97 86.98 519.78
zstd_07 5.97 135.16 504.07
zstd_06 5.97 145.49 505.10
zstd_05 6.02 177.86 510.08
zstd_04 6.02 205.13 516.29
zstd_03 6.02 217.82 515.50
zstd_02 6.02 260.97 484.64
zstd_18 6.27 12.10 490.72
zstd_17 6.27 12.33 462.65
zstd_21 6.70 9.25 391.16
zstd_20 6.70 9.50 395.38
zstd_22 6.70 9.74 390.99
zstd_19 6.70 9.99 450.42
zbewalgo 16.33 47.17 430.06
zbewalgo* 16.33 47.17 436.92
zbewalgo' 16.33 188.86 427.78
zbewalgo'* 16.33 188.86 437.43
- 'partdiff' (8 GiB) Array of double values. Adjacent doubles are similar, but
not equal. This array is produced by a partial differential equation solver
using a Jacobi implementation.
zBeWalgo achieves higher compression ratios than all other algorithms.
Some algorithms are even slower than an HDD without any compression at all.
Algorithm ratio write read
zstd_18 1.00 13.77 2080.06
zstd_17 1.00 13.80 2075.23
zstd_16 1.00 28.04 2138.99
zstd_15 1.00 45.04 2143.32
zstd_13 1.00 55.72 2128.27
zstd_14 1.00 56.09 2123.54
zstd_11 1.00 57.31 2095.04
zstd_12 1.00 57.53 2134.61
842 1.00 61.61 2267.89
zstd_10 1.00 80.40 2081.35
zstd_07 1.00 120.66 2119.09
zstd_06 1.00 128.80 2134.02
zstd_05 1.00 131.25 2133.01
--hdd-- 1.00 134.70 156.62
lz4hc*_03 1.00 152.82 1982.94
lz4hc_03 1.00 152.82 2261.55
lz4hc*_07 1.00 159.43 1990.03
lz4hc_07 1.00 159.43 2269.05
lz4hc_10 1.00 166.33 2243.78
lz4hc*_10 1.00 166.33 2260.63
lz4hc_09 1.00 167.03 2244.20
lz4hc*_09 1.00 167.03 2264.72
lz4hc*_06 1.00 167.17 2245.15
lz4hc_06 1.00 167.17 2271.88
lz4hc_08 1.00 167.49 2237.79
lz4hc*_08 1.00 167.49 2283.98
lz4hc_02 1.00 167.51 2275.36
lz4hc*_02 1.00 167.51 2279.72
lz4hc*_05 1.00 167.52 2248.92
lz4hc_05 1.00 167.52 2273.99
lz4hc*_04 1.00 167.71 2268.23
lz4hc_04 1.00 167.71 2268.78
lz4hc*_01 1.00 167.91 2268.76
lz4hc_01 1.00 167.91 2269.16
zstd_04 1.00 175.84 2241.60
zstd_03 1.00 176.35 2285.13
zstd_02 1.00 195.41 2269.51
zstd_09 1.00 199.47 2271.91
zstd_01 1.00 199.74 2287.15
zstd_08 1.00 199.87 2286.27
lz4_01 1.00 1160.95 2257.78
lz4*_01 1.00 1160.95 2275.42
lz4_08 1.00 1164.37 2280.06
lz4*_08 1.00 1164.37 2280.43
lz4*_09 1.00 1166.30 2263.05
lz4_09 1.00 1166.30 2280.54
lz4*_03 1.00 1174.00 2074.96
lz4_03 1.00 1174.00 2257.37
lz4_02 1.00 1212.18 2273.60
lz4*_02 1.00 1212.18 2285.66
lz4*_04 1.00 1253.55 2259.60
lz4_04 1.00 1253.55 2287.15
lz4_05 1.00 1279.88 2282.47
lz4*_05 1.00 1279.88 2287.05
lz4_06 1.00 1292.22 2277.95
lz4*_06 1.00 1292.22 2284.84
lz4*_07 1.00 1303.58 2276.10
lz4_07 1.00 1303.58 2276.99
lz4*_10 1.00 1304.80 2183.30
lz4_10 1.00 1304.80 2285.25
lzo 1.00 1360.88 2281.19
deflate_7 1.07 33.51 463.73
deflate_2 1.07 33.99 473.07
deflate_9 1.07 34.05 473.57
deflate_6 1.07 34.06 473.69
deflate_8 1.07 34.12 472.86
deflate_5 1.07 34.22 468.03
deflate_4 1.07 34.32 447.33
deflate_1 1.07 35.45 431.95
deflate_3 1.07 35.63 472.56
zstd_22 1.11 9.81 668.64
zstd_21 1.11 10.71 734.52
zstd_20 1.11 10.78 714.86
zstd_19 1.11 12.02 790.71
zbewalgo 1.29 25.93 225.07
zbewalgo* 1.29 25.93 226.72
zbewalgo'* 1.31 23.54 84.29
zbewalgo' 1.31 23.54 86.08
- 'Isabella CLOUDf01'
This dataset is an array of floating point values between 0.00000 and 0.00332.
Detailed information about this dataset is available online at
http://www.vets.ucar.edu/vg/isabeldata/readme.html
All algorithms obtain similar compression ratios. The compression ratio of
zBeWalgo is slightly higher, and its speed is higher too.
Algorithm ratio write read
--hdd-- 1.00 134.70 156.62
lzo 2.06 1022.09 916.22
lz4*_10 2.09 1126.03 1533.35
lz4_10 2.09 1126.03 1569.06
lz4*_07 2.09 1135.89 1444.21
lz4_07 2.09 1135.89 1581.96
lz4*_01 2.10 972.22 1405.21
lz4_01 2.10 972.22 1579.78
lz4*_09 2.10 982.39 1429.17
lz4_09 2.10 982.39 1490.27
lz4_08 2.10 1006.56 1491.14
lz4*_08 2.10 1006.56 1558.66
lz4_02 2.10 1019.82 1366.16
lz4*_02 2.10 1019.82 1578.79
lz4_03 2.10 1129.74 1417.33
lz4*_03 2.10 1129.74 1456.68
lz4_04 2.10 1131.28 1478.27
lz4*_04 2.10 1131.28 1517.84
lz4_06 2.10 1147.78 1424.90
lz4*_06 2.10 1147.78 1462.47
lz4*_05 2.10 1172.44 1434.86
lz4_05 2.10 1172.44 1578.80
lz4hc*_10 2.11 29.01 1498.01
lz4hc_10 2.11 29.01 1580.23
lz4hc*_09 2.11 56.30 1510.26
lz4hc_09 2.11 56.30 1583.11
lz4hc_08 2.11 56.39 1426.43
lz4hc*_08 2.11 56.39 1565.12
lz4hc_07 2.11 129.27 1540.38
lz4hc*_07 2.11 129.27 1578.35
lz4hc*_06 2.11 162.72 1456.27
lz4hc_06 2.11 162.72 1581.69
lz4hc*_05 2.11 183.78 1487.71
lz4hc_05 2.11 183.78 1589.10
lz4hc*_04 2.11 187.41 1431.35
lz4hc_04 2.11 187.41 1566.24
lz4hc*_03 2.11 190.21 1531.98
lz4hc_03 2.11 190.21 1580.81
lz4hc*_02 2.11 199.69 1432.00
lz4hc_02 2.11 199.69 1565.10
lz4hc_01 2.11 205.87 1540.33
lz4hc*_01 2.11 205.87 1567.68
842 2.15 89.89 414.49
deflate_1 2.29 48.84 352.09
deflate_2 2.29 49.47 353.77
deflate_3 2.30 50.00 345.88
zstd_22 2.31 5.59 658.59
zstd_21 2.31 14.34 664.02
zstd_20 2.31 21.22 665.77
zstd_19 2.31 24.26 587.99
zstd_17 2.31 26.24 670.14
zstd_18 2.31 26.47 668.64
deflate_9 2.31 33.79 345.81
deflate_8 2.31 34.67 347.96
deflate_4 2.31 41.46 326.50
deflate_7 2.31 42.56 346.99
deflate_6 2.31 43.51 343.56
deflate_5 2.31 45.83 343.86
zstd_05 2.31 126.01 571.70
zstd_04 2.31 178.39 597.26
zstd_03 2.31 192.04 644.24
zstd_01 2.31 206.31 563.68
zstd_08 2.31 207.39 669.05
zstd_02 2.31 216.98 600.77
zstd_09 2.31 236.92 667.64
zstd_16 2.32 41.47 660.06
zstd_15 2.32 60.37 584.45
zstd_14 2.32 74.60 673.10
zstd_12 2.32 75.16 661.96
zstd_13 2.32 75.22 676.12
zstd_11 2.32 75.58 636.75
zstd_10 2.32 95.05 645.07
zstd_07 2.32 139.52 672.88
zstd_06 2.32 145.40 670.45
zbewalgo'* 2.37 337.07 463.32
zbewalgo' 2.37 337.07 468.96
zbewalgo* 2.60 101.17 578.35
zbewalgo 2.60 101.17 586.88
- 'Isabella TCf01'
This dataset is an array of floating point values between -83.00402 and 31.51576.
Detailed information about this dataset is available online at
http://www.vets.ucar.edu/vg/isabeldata/readme.html
zBeWalgo is the only algorithm that can compress this dataset with a
noticeable compression ratio.
Algorithm ratio write read
842 1.00 60.09 1956.26
--hdd-- 1.00 134.70 156.62
lz4hc_01 1.00 154.81 1839.37
lz4hc*_01 1.00 154.81 2105.53
lz4hc_10 1.00 157.33 2078.69
lz4hc*_10 1.00 157.33 2113.14
lz4hc_09 1.00 158.50 2018.51
lz4hc*_09 1.00 158.50 2093.65
lz4hc*_02 1.00 159.54 2104.91
lz4hc_02 1.00 159.54 2117.34
lz4hc_03 1.00 161.26 2070.76
lz4hc*_03 1.00 161.26 2107.27
lz4hc*_08 1.00 161.34 2100.74
lz4hc_08 1.00 161.34 2105.26
lz4hc*_04 1.00 161.95 2080.96
lz4hc_04 1.00 161.95 2104.00
lz4hc_05 1.00 162.17 2044.43
lz4hc*_05 1.00 162.17 2101.74
lz4hc*_06 1.00 163.61 2087.19
lz4hc_06 1.00 163.61 2104.61
lz4hc_07 1.00 164.51 2094.78
lz4hc*_07 1.00 164.51 2105.53
lz4_01 1.00 1134.89 2109.70
lz4*_01 1.00 1134.89 2118.71
lz4*_08 1.00 1141.96 2104.87
lz4_08 1.00 1141.96 2118.97
lz4_09 1.00 1145.55 2087.76
lz4*_09 1.00 1145.55 2118.85
lz4_02 1.00 1157.28 2094.33
lz4*_02 1.00 1157.28 2124.67
lz4*_03 1.00 1194.18 2106.36
lz4_03 1.00 1194.18 2119.89
lz4_04 1.00 1195.09 2117.03
lz4*_04 1.00 1195.09 2120.23
lz4*_05 1.00 1225.56 2109.04
lz4_05 1.00 1225.56 2120.52
lz4*_06 1.00 1261.67 2109.14
lz4_06 1.00 1261.67 2121.13
lz4*_07 1.00 1270.86 1844.63
lz4_07 1.00 1270.86 2041.08
lz4_10 1.00 1305.36 2109.22
lz4*_10 1.00 1305.36 2120.65
lzo 1.00 1338.61 2109.66
zstd_17 1.03 13.93 1138.94
zstd_18 1.03 14.01 1170.78
zstd_16 1.03 27.12 1073.75
zstd_15 1.03 43.52 1061.97
zstd_14 1.03 49.60 1082.98
zstd_12 1.03 55.03 1042.43
zstd_13 1.03 55.14 1173.50
zstd_11 1.03 55.24 1178.05
zstd_10 1.03 70.01 1173.05
zstd_07 1.03 118.10 1041.92
zstd_06 1.03 123.00 1171.59
zstd_05 1.03 124.61 1165.74
zstd_01 1.03 166.80 1005.29
zstd_04 1.03 170.25 1127.75
zstd_03 1.03 171.40 1172.34
zstd_02 1.03 174.08 1017.34
zstd_09 1.03 195.30 1176.82
zstd_08 1.03 195.98 1175.09
deflate_9 1.05 30.15 483.55
deflate_8 1.05 30.45 466.67
deflate_5 1.05 31.25 480.92
deflate_4 1.05 31.84 472.81
deflate_7 1.05 31.84 484.18
deflate_6 1.05 31.94 481.37
deflate_2 1.05 33.07 484.09
deflate_3 1.05 33.11 463.57
deflate_1 1.05 33.19 469.71
zstd_22 1.06 8.89 647.75
zstd_21 1.06 10.70 700.11
zstd_20 1.06 10.80 723.42
zstd_19 1.06 12.41 764.24
zbewalgo* 1.51 146.45 581.43
zbewalgo 1.51 146.45 592.86
zbewalgo'* 1.54 38.14 120.96
zbewalgo' 1.54 38.14 125.81
Signed-off-by: Benjamin Warnke <[email protected]>
Benjamin Warnke (5):
add compression algorithm zBeWalgo
crypto: add zBeWalgo to crypto-api
crypto: add unsafe decompression to api
crypto: configurable compression level
crypto: add flag for unstable encoding
crypto/842.c | 3 +-
crypto/Kconfig | 12 +
crypto/Makefile | 1 +
crypto/api.c | 76 ++++
crypto/compress.c | 10 +
crypto/crypto_null.c | 3 +-
crypto/deflate.c | 19 +-
crypto/lz4.c | 39 +-
crypto/lz4hc.c | 36 +-
crypto/lzo.c | 3 +-
crypto/testmgr.c | 39 +-
crypto/testmgr.h | 134 +++++++
crypto/zbewalgo.c | 191 ++++++++++
drivers/block/zram/zcomp.c | 13 +-
drivers/block/zram/zcomp.h | 3 +-
drivers/block/zram/zram_drv.c | 56 ++-
drivers/block/zram/zram_drv.h | 2 +
drivers/crypto/cavium/zip/zip_main.c | 6 +-
drivers/crypto/nx/nx-842-powernv.c | 3 +-
drivers/crypto/nx/nx-842-pseries.c | 3 +-
fs/ubifs/compress.c | 2 +-
include/linux/crypto.h | 31 +-
include/linux/zbewalgo.h | 50 +++
lib/Kconfig | 3 +
lib/Makefile | 1 +
lib/zbewalgo/BWT.c | 120 ++++++
lib/zbewalgo/BWT.h | 21 ++
lib/zbewalgo/JBE.c | 204 ++++++++++
lib/zbewalgo/JBE.h | 13 +
lib/zbewalgo/JBE2.c | 221 +++++++++++
lib/zbewalgo/JBE2.h | 13 +
lib/zbewalgo/MTF.c | 122 ++++++
lib/zbewalgo/MTF.h | 13 +
lib/zbewalgo/Makefile | 4 +
lib/zbewalgo/RLE.c | 137 +++++++
lib/zbewalgo/RLE.h | 13 +
lib/zbewalgo/bewalgo.c | 401 ++++++++++++++++++++
lib/zbewalgo/bewalgo.h | 13 +
lib/zbewalgo/bewalgo2.c | 407 ++++++++++++++++++++
lib/zbewalgo/bewalgo2.h | 13 +
lib/zbewalgo/bitshuffle.c | 93 +++++
lib/zbewalgo/bitshuffle.h | 13 +
lib/zbewalgo/huffman.c | 262 +++++++++++++
lib/zbewalgo/huffman.h | 13 +
lib/zbewalgo/include.h | 94 +++++
lib/zbewalgo/zbewalgo.c | 713 +++++++++++++++++++++++++++++++++++
mm/zswap.c | 2 +-
net/xfrm/xfrm_ipcomp.c | 3 +-
48 files changed, 3605 insertions(+), 42 deletions(-)
create mode 100644 crypto/zbewalgo.c
create mode 100644 include/linux/zbewalgo.h
create mode 100644 lib/zbewalgo/BWT.c
create mode 100644 lib/zbewalgo/BWT.h
create mode 100644 lib/zbewalgo/JBE.c
create mode 100644 lib/zbewalgo/JBE.h
create mode 100644 lib/zbewalgo/JBE2.c
create mode 100644 lib/zbewalgo/JBE2.h
create mode 100644 lib/zbewalgo/MTF.c
create mode 100644 lib/zbewalgo/MTF.h
create mode 100644 lib/zbewalgo/Makefile
create mode 100644 lib/zbewalgo/RLE.c
create mode 100644 lib/zbewalgo/RLE.h
create mode 100644 lib/zbewalgo/bewalgo.c
create mode 100644 lib/zbewalgo/bewalgo.h
create mode 100644 lib/zbewalgo/bewalgo2.c
create mode 100644 lib/zbewalgo/bewalgo2.h
create mode 100644 lib/zbewalgo/bitshuffle.c
create mode 100644 lib/zbewalgo/bitshuffle.h
create mode 100644 lib/zbewalgo/huffman.c
create mode 100644 lib/zbewalgo/huffman.h
create mode 100644 lib/zbewalgo/include.h
create mode 100644 lib/zbewalgo/zbewalgo.c
--
2.14.1
Most compression algorithms exposed through the crypto API support
multiple compression levels. The crypto API currently just calls these
algorithms with their default compression level.
This patch enables the caller to specify the compression level.
Signed-off-by: Benjamin Warnke <[email protected]>
---
crypto/api.c | 76 +++++++++++++++++++++++++++++++++++++++++++
crypto/deflate.c | 16 +++++----
crypto/lz4.c | 16 +++++----
crypto/lz4hc.c | 13 +++++---
crypto/testmgr.c | 2 +-
drivers/block/zram/zcomp.c | 10 +++---
drivers/block/zram/zcomp.h | 3 +-
drivers/block/zram/zram_drv.c | 24 ++++++++++++--
drivers/block/zram/zram_drv.h | 1 +
fs/pstore/platform.c | 2 +-
fs/ubifs/compress.c | 2 +-
include/linux/crypto.h | 9 +++--
mm/zswap.c | 2 +-
net/xfrm/xfrm_ipcomp.c | 3 +-
14 files changed, 147 insertions(+), 32 deletions(-)
diff --git a/crypto/api.c b/crypto/api.c
index 1d5290c6..81c3d416 100644
--- a/crypto/api.c
+++ b/crypto/api.c
@@ -388,6 +388,47 @@ struct crypto_tfm *__crypto_alloc_tfm(struct crypto_alg *alg, u32 type,
}
EXPORT_SYMBOL_GPL(__crypto_alloc_tfm);
+struct crypto_tfm *__crypto_alloc_tfm_compress(struct crypto_alg *alg,
+ u32 type, u32 mask, int level)
+{
+ struct crypto_tfm *tfm = NULL;
+ unsigned int tfm_size;
+ int err = -ENOMEM;
+
+ tfm_size = sizeof(*tfm) + crypto_ctxsize(alg, type, mask);
+ tfm = kzalloc(tfm_size, GFP_KERNEL);
+ if (!tfm)
+ goto out_err;
+
+ tfm->__crt_alg = alg;
+ if (alg->cra_flags & CRYPTO_ALG_TYPE_COMPRESS)
+ tfm->crt_compress.cot_level = level;
+
+ err = crypto_init_ops(tfm, type, mask);
+ if (err)
+ goto out_free_tfm;
+
+ if (!tfm->exit && alg->cra_init) {
+ err = alg->cra_init(tfm);
+ if (err)
+ goto cra_init_failed;
+ }
+
+ goto out;
+
+cra_init_failed:
+ crypto_exit_ops(tfm);
+out_free_tfm:
+ if (err == -EAGAIN)
+ crypto_shoot_alg(alg);
+ kfree(tfm);
+out_err:
+ tfm = ERR_PTR(err);
+out:
+ return tfm;
+}
+EXPORT_SYMBOL_GPL(__crypto_alloc_tfm_compress);
+
/*
* crypto_alloc_base - Locate algorithm and allocate transform
* @alg_name: Name of algorithm
@@ -444,6 +485,41 @@ struct crypto_tfm *crypto_alloc_base(const char *alg_name, u32 type, u32 mask)
}
EXPORT_SYMBOL_GPL(crypto_alloc_base);
+struct crypto_tfm *crypto_alloc_base_compress(const char *alg_name, u32 type,
+ u32 mask, int level)
+{
+ struct crypto_tfm *tfm;
+ int err;
+
+ for (;;) {
+ struct crypto_alg *alg;
+
+ alg = crypto_alg_mod_lookup(alg_name, type, mask);
+ if (IS_ERR(alg)) {
+ err = PTR_ERR(alg);
+ goto err;
+ }
+
+ tfm = __crypto_alloc_tfm_compress(alg, type, mask, level);
+ if (!IS_ERR(tfm))
+ return tfm;
+
+ crypto_mod_put(alg);
+ err = PTR_ERR(tfm);
+
+err:
+ if (err != -EAGAIN)
+ break;
+ if (fatal_signal_pending(current)) {
+ err = -EINTR;
+ break;
+ }
+ }
+
+ return ERR_PTR(err);
+}
+EXPORT_SYMBOL_GPL(crypto_alloc_base_compress);
+
void *crypto_create_tfm(struct crypto_alg *alg,
const struct crypto_type *frontend)
{
diff --git a/crypto/deflate.c b/crypto/deflate.c
index 4b681a37..54a2ff21 100644
--- a/crypto/deflate.c
+++ b/crypto/deflate.c
@@ -24,6 +24,7 @@
* it is not needed for IPCOMP and keeps the code simpler. It can be
* implemented if someone wants it.
*/
+
#include <linux/init.h>
#include <linux/module.h>
#include <linux/crypto.h>
@@ -43,7 +44,7 @@ struct deflate_ctx {
struct z_stream_s decomp_stream;
};
-static int deflate_comp_init(struct deflate_ctx *ctx, int format)
+static int deflate_comp_init(struct deflate_ctx *ctx, int format, int level)
{
int ret = 0;
struct z_stream_s *stream = &ctx->comp_stream;
@@ -55,9 +56,9 @@ static int deflate_comp_init(struct deflate_ctx *ctx, int format)
goto out;
}
if (format)
- ret = zlib_deflateInit(stream, 3);
+ ret = zlib_deflateInit(stream, level);
else
- ret = zlib_deflateInit2(stream, DEFLATE_DEF_LEVEL, Z_DEFLATED,
+ ret = zlib_deflateInit2(stream, level, Z_DEFLATED,
-DEFLATE_DEF_WINBITS,
DEFLATE_DEF_MEMLEVEL,
Z_DEFAULT_STRATEGY);
@@ -109,11 +110,11 @@ static void deflate_decomp_exit(struct deflate_ctx *ctx)
vfree(ctx->decomp_stream.workspace);
}
-static int __deflate_init(void *ctx, int format)
+static int __deflate_init(void *ctx, int format, int level)
{
int ret;
- ret = deflate_comp_init(ctx, format);
+ ret = deflate_comp_init(ctx, format, level);
if (ret)
goto out;
ret = deflate_decomp_init(ctx, format);
@@ -132,7 +133,7 @@ static void *gen_deflate_alloc_ctx(struct crypto_scomp *tfm, int format)
if (!ctx)
return ERR_PTR(-ENOMEM);
- ret = __deflate_init(ctx, format);
+ ret = __deflate_init(ctx, format, DEFLATE_DEF_LEVEL);
if (ret) {
kfree(ctx);
return ERR_PTR(ret);
@@ -154,8 +155,9 @@ static void *zlib_deflate_alloc_ctx(struct crypto_scomp *tfm)
static int deflate_init(struct crypto_tfm *tfm)
{
struct deflate_ctx *ctx = crypto_tfm_ctx(tfm);
+ const int level = tfm->crt_compress.cot_level;
- return __deflate_init(ctx, 0);
+ return __deflate_init(ctx, 0, level);
}
static void __deflate_exit(void *ctx)
diff --git a/crypto/lz4.c b/crypto/lz4.c
index 60a1914b..8486188e 100644
--- a/crypto/lz4.c
+++ b/crypto/lz4.c
@@ -63,11 +63,11 @@ static void lz4_exit(struct crypto_tfm *tfm)
lz4_free_ctx(NULL, ctx->lz4_comp_mem);
}
-static int __lz4_compress_crypto(const u8 *src, unsigned int slen,
- u8 *dst, unsigned int *dlen, void *ctx)
+static int __lz4_compress_crypto(const u8 *src, unsigned int slen, u8 *dst,
+ unsigned int *dlen, void *ctx, int level)
{
- int out_len = LZ4_compress_default(src, dst,
- slen, *dlen, ctx);
+ int out_len = LZ4_compress_fast(src, dst,
+ slen, *dlen, level, ctx);
if (!out_len)
return -EINVAL;
@@ -80,15 +80,19 @@ static int lz4_scompress(struct crypto_scomp *tfm, const u8 *src,
unsigned int slen, u8 *dst, unsigned int *dlen,
void *ctx)
{
- return __lz4_compress_crypto(src, slen, dst, dlen, ctx);
+ return __lz4_compress_crypto(src, slen, dst, dlen, ctx,
+ LZ4_ACCELERATION_DEFAULT);
}
static int lz4_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
unsigned int slen, u8 *dst, unsigned int *dlen)
{
struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
+ const int level = tfm->crt_compress.cot_level;
- return __lz4_compress_crypto(src, slen, dst, dlen, ctx->lz4_comp_mem);
+ return __lz4_compress_crypto(src, slen, dst, dlen, ctx->lz4_comp_mem,
+ level != 0 ? level
+ : LZ4_ACCELERATION_DEFAULT);
}
static int __lz4_decompress_crypto(const u8 *src, unsigned int slen,
diff --git a/crypto/lz4hc.c b/crypto/lz4hc.c
index 9ecb4e18..96de5227 100644
--- a/crypto/lz4hc.c
+++ b/crypto/lz4hc.c
@@ -63,10 +63,12 @@ static void lz4hc_exit(struct crypto_tfm *tfm)
}
static int __lz4hc_compress_crypto(const u8 *src, unsigned int slen,
- u8 *dst, unsigned int *dlen, void *ctx)
+ u8 *dst, unsigned int *dlen, void *ctx,
+ int level)
{
int out_len = LZ4_compress_HC(src, dst, slen,
- *dlen, LZ4HC_DEFAULT_CLEVEL, ctx);
+ *dlen, level != 0 ? level
+ : LZ4HC_DEFAULT_CLEVEL, ctx);
if (!out_len)
return -EINVAL;
@@ -79,7 +81,8 @@ static int lz4hc_scompress(struct crypto_scomp *tfm, const u8 *src,
unsigned int slen, u8 *dst, unsigned int *dlen,
void *ctx)
{
- return __lz4hc_compress_crypto(src, slen, dst, dlen, ctx);
+ return __lz4hc_compress_crypto(src, slen, dst, dlen, ctx,
+ LZ4HC_DEFAULT_CLEVEL);
}
static int lz4hc_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
@@ -87,9 +90,9 @@ static int lz4hc_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
unsigned int *dlen)
{
struct lz4hc_ctx *ctx = crypto_tfm_ctx(tfm);
-
+	const int level = tfm->crt_compress.cot_level;
return __lz4hc_compress_crypto(src, slen, dst, dlen,
- ctx->lz4hc_comp_mem);
+ ctx->lz4hc_comp_mem, level);
}
static int __lz4hc_decompress_crypto(const u8 *src, unsigned int slen,
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index a41ef290..2dde03a9 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1782,7 +1782,7 @@ static int alg_test_comp(const struct alg_test_desc *desc, const char *driver,
desc->suite.comp.decomp.count);
crypto_free_acomp(acomp);
} else {
- comp = crypto_alloc_comp(driver, type, mask);
+ comp = crypto_alloc_comp(driver, type, mask, 0);
if (IS_ERR(comp)) {
pr_err("alg: comp: Failed to load transform for %s: %ld\n",
driver, PTR_ERR(comp));
diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 15b3a016..5806a06b 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -50,13 +50,13 @@ static void zcomp_strm_free(struct zcomp_strm *zstrm)
* allocate new zcomp_strm structure with ->tfm initialized by
* backend, return NULL on error
*/
-static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
+static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp, int level)
{
struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), GFP_KERNEL);
if (!zstrm)
return NULL;
- zstrm->tfm = crypto_alloc_comp(comp->name, 0, 0);
+ zstrm->tfm = crypto_alloc_comp(comp->name, 0, 0, level);
/*
* allocate 2 pages. 1 for compressed data, plus 1 extra for the
* case when compressed size is larger than the original one
@@ -165,11 +165,12 @@ int zcomp_cpu_up_prepare(unsigned int cpu, struct hlist_node *node)
{
struct zcomp *comp = hlist_entry(node, struct zcomp, node);
struct zcomp_strm *zstrm;
+ int level = comp->level;
if (WARN_ON(*per_cpu_ptr(comp->stream, cpu)))
return 0;
- zstrm = zcomp_strm_alloc(comp);
+ zstrm = zcomp_strm_alloc(comp, level);
if (IS_ERR_OR_NULL(zstrm)) {
pr_err("Can't allocate a compression stream\n");
return -ENOMEM;
@@ -223,7 +224,7 @@ void zcomp_destroy(struct zcomp *comp)
* case of allocation error, or any other error potentially
* returned by zcomp_init().
*/
-struct zcomp *zcomp_create(const char *compress)
+struct zcomp *zcomp_create(const char *compress, int level)
{
struct zcomp *comp;
int error;
@@ -236,6 +237,7 @@ struct zcomp *zcomp_create(const char *compress)
return ERR_PTR(-ENOMEM);
comp->name = compress;
+ comp->level = level;
error = zcomp_init(comp);
if (error) {
kfree(comp);
diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
index 41c1002a..e1b3023e 100644
--- a/drivers/block/zram/zcomp.h
+++ b/drivers/block/zram/zcomp.h
@@ -21,6 +21,7 @@ struct zcomp {
struct zcomp_strm * __percpu *stream;
const char *name;
struct hlist_node node;
+ int level;
};
int zcomp_cpu_up_prepare(unsigned int cpu, struct hlist_node *node);
@@ -28,7 +29,7 @@ int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node);
ssize_t zcomp_available_show(const char *comp, char *buf);
bool zcomp_available_algorithm(const char *comp);
-struct zcomp *zcomp_create(const char *comp);
+struct zcomp *zcomp_create(const char *comp, int level);
void zcomp_destroy(struct zcomp *comp);
struct zcomp_strm *zcomp_stream_get(struct zcomp *comp);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index c46f21ce..8fb0c429 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -631,6 +631,23 @@ static ssize_t max_comp_streams_store(struct device *dev,
return len;
}
+static ssize_t comp_level_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct zram *zram = dev_to_zram(dev);
+
+ return scnprintf(buf, PAGE_SIZE, "%d\n", zram->comp_level);
+}
+
+static ssize_t comp_level_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct zram *zram = dev_to_zram(dev);
+
+ zram->comp_level = memparse(buf, NULL);
+ return len;
+}
static ssize_t comp_algorithm_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -1332,6 +1349,8 @@ static void zram_reset_device(struct zram *zram)
down_write(&zram->init_lock);
zram->limit_pages = 0;
+ zram->unsafe_decompression = 1;
+ zram->comp_level = 0;
if (!init_done(zram)) {
up_write(&zram->init_lock);
@@ -1351,7 +1370,6 @@ static void zram_reset_device(struct zram *zram)
memset(&zram->stats, 0, sizeof(zram->stats));
zcomp_destroy(comp);
reset_bdev(zram);
- zram->unsafe_decompression = 1;
}
static ssize_t unsafe_decompression_store(struct device *dev,
@@ -1391,7 +1409,7 @@ static ssize_t disksize_store(struct device *dev,
goto out_unlock;
}
- comp = zcomp_create(zram->compressor);
+ comp = zcomp_create(zram->compressor, zram->comp_level);
if (IS_ERR(comp)) {
pr_err("Cannot initialise %s compressing backend\n",
zram->compressor);
@@ -1491,6 +1509,7 @@ static DEVICE_ATTR_WO(mem_limit);
static DEVICE_ATTR_WO(mem_used_max);
static DEVICE_ATTR_RW(max_comp_streams);
static DEVICE_ATTR_RW(comp_algorithm);
+static DEVICE_ATTR_RW(comp_level);
#ifdef CONFIG_ZRAM_WRITEBACK
static DEVICE_ATTR_RW(backing_dev);
#endif
@@ -1505,6 +1524,7 @@ static struct attribute *zram_disk_attrs[] = {
&dev_attr_mem_used_max.attr,
&dev_attr_max_comp_streams.attr,
&dev_attr_comp_algorithm.attr,
+ &dev_attr_comp_level.attr,
#ifdef CONFIG_ZRAM_WRITEBACK
&dev_attr_backing_dev.attr,
#endif
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 3448316c..5809df24 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -100,6 +100,7 @@ struct zram {
*/
bool claim; /* Protected by bdev->bd_mutex */
unsigned char unsafe_decompression;
+ int comp_level;
#ifdef CONFIG_ZRAM_WRITEBACK
struct file *backing_dev;
struct block_device *bdev;
diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index dc720573..e6507cef 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -276,7 +276,7 @@ static void allocate_buf_for_compression(void)
return;
}
- tfm = crypto_alloc_comp(zbackend->name, 0, 0);
+ tfm = crypto_alloc_comp(zbackend->name, 0, 0, 0);
if (IS_ERR_OR_NULL(tfm)) {
kfree(big_oops_buf);
big_oops_buf = NULL;
diff --git a/fs/ubifs/compress.c b/fs/ubifs/compress.c
index 565cb56d..b5bf7c12 100644
--- a/fs/ubifs/compress.c
+++ b/fs/ubifs/compress.c
@@ -191,7 +191,7 @@ int ubifs_decompress(const struct ubifs_info *c, const void *in_buf,
static int __init compr_init(struct ubifs_compressor *compr)
{
if (compr->capi_name) {
- compr->cc = crypto_alloc_comp(compr->capi_name, 0, 0);
+ compr->cc = crypto_alloc_comp(compr->capi_name, 0, 0, 0);
if (IS_ERR(compr->cc)) {
pr_err("UBIFS error (pid %d): cannot initialize compressor %s, error %ld",
current->pid, compr->name, PTR_ERR(compr->cc));
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index bb1fada2..6bfb1aea 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -584,6 +584,7 @@ struct compress_tfm {
int (*cot_decompress_unsafe)(struct crypto_tfm *tfm,
const u8 *src, unsigned int slen,
u8 *dst, unsigned int *dlen);
+ int cot_level;
};
#define crt_ablkcipher crt_u.ablkcipher
@@ -656,6 +657,8 @@ struct crypto_attr_u32 {
*/
struct crypto_tfm *crypto_alloc_base(const char *alg_name, u32 type, u32 mask);
+struct crypto_tfm *crypto_alloc_base_compress(const char *alg_name, u32 type,
+ u32 mask, int level);
void crypto_destroy_tfm(void *mem, struct crypto_tfm *tfm);
static inline void crypto_free_tfm(struct crypto_tfm *tfm)
@@ -1612,13 +1615,15 @@ static inline struct crypto_comp *crypto_comp_cast(struct crypto_tfm *tfm)
}
static inline struct crypto_comp *crypto_alloc_comp(const char *alg_name,
- u32 type, u32 mask)
+ u32 type, u32 mask,
+ int level)
{
type &= ~CRYPTO_ALG_TYPE_MASK;
type |= CRYPTO_ALG_TYPE_COMPRESS;
mask |= CRYPTO_ALG_TYPE_MASK;
- return __crypto_comp_cast(crypto_alloc_base(alg_name, type, mask));
+ return __crypto_comp_cast(crypto_alloc_base_compress(alg_name, type,
+ mask, level));
}
static inline struct crypto_tfm *crypto_comp_tfm(struct crypto_comp *tfm)
diff --git a/mm/zswap.c b/mm/zswap.c
index 61a5c419..98b756ad 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -412,7 +412,7 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
if (WARN_ON(*per_cpu_ptr(pool->tfm, cpu)))
return 0;
- tfm = crypto_alloc_comp(pool->tfm_name, 0, 0);
+ tfm = crypto_alloc_comp(pool->tfm_name, 0, 0, 0);
if (IS_ERR_OR_NULL(tfm)) {
pr_err("could not alloc crypto comp %s : %ld\n",
pool->tfm_name, PTR_ERR(tfm));
diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
index a00ec715..0602ed4b 100644
--- a/net/xfrm/xfrm_ipcomp.c
+++ b/net/xfrm/xfrm_ipcomp.c
@@ -305,7 +305,8 @@ static struct crypto_comp * __percpu *ipcomp_alloc_tfms(const char *alg_name)
for_each_possible_cpu(cpu) {
struct crypto_comp *tfm = crypto_alloc_comp(alg_name, 0,
- CRYPTO_ALG_ASYNC);
+ CRYPTO_ALG_ASYNC,
+ 0);
if (IS_ERR(tfm))
goto error;
*per_cpu_ptr(tfms, cpu) = tfm;
--
2.14.1
Up to version 3 of this patch the decompressor of zBeWalgo did not verify
that the output buffer cannot overflow. zBeWalgo now includes a safe
decompressor which does check for buffer overflows and heap errors.
zBeWalgo and other algorithms like lz4 include an unsafe decompressor
version, which is a bit faster but does no error checking. These unsafe
decompressors can be used when the data source and the whole data path are
trusted. This patch exposes these existing functions through the crypto API.
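The tradeoff between the safe and the unsafe decompressor can be
illustrated with a small user-space sketch (a toy run-length decoder, not
the kernel code; all names here are invented for illustration). The safe
variant validates the output bound on every run, the unsafe variant skips
that branch:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy run-length decoder: the input is a sequence of (count, value)
 * byte pairs. With safe != 0 the output bound is checked before every
 * run; with safe == 0 the check is skipped, analogous to the unsafe
 * decompressors exposed by this patch.
 */
static int toy_rle_decompress(const uint8_t *src, size_t slen,
			      uint8_t *dst, size_t dcap, int safe)
{
	size_t si = 0, di = 0;

	while (si + 2 <= slen) {
		uint8_t count = src[si];
		uint8_t value = src[si + 1];

		if (safe && di + count > dcap)
			return -1; /* would overflow the output buffer */
		while (count--)
			dst[di++] = value;
		si += 2;
	}
	return (int)di;
}

/* decode {3 x 'a', 2 x 'b'} into a buffer limited to dcap bytes */
static int toy_rle_demo(int safe, size_t dcap)
{
	const uint8_t src[] = { 3, 'a', 2, 'b' };
	uint8_t dst[16];

	if (dcap > sizeof(dst))
		dcap = sizeof(dst);
	return toy_rle_decompress(src, sizeof(src), dst, dcap, safe);
}
```

Only the safe variant can reject corrupted input; the unsafe variant
trades that check for speed, which is why it must only be used on trusted
data paths.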
Signed-off-by: Benjamin Warnke <[email protected]>
---
crypto/842.c | 3 ++-
crypto/compress.c | 10 ++++++++++
crypto/crypto_null.c | 3 ++-
crypto/deflate.c | 3 ++-
crypto/lz4.c | 23 ++++++++++++++++++++++-
crypto/lz4hc.c | 23 ++++++++++++++++++++++-
crypto/lzo.c | 3 ++-
crypto/testmgr.c | 27 ++++++++++++++++++++++++++-
crypto/zbewalgo.c | 29 ++++++++++++++++++++++++++++-
drivers/block/zram/zram_drv.c | 34 +++++++++++++++++++++++++++++++++-
drivers/block/zram/zram_drv.h | 1 +
drivers/crypto/cavium/zip/zip_main.c | 6 ++++--
drivers/crypto/nx/nx-842-powernv.c | 3 ++-
drivers/crypto/nx/nx-842-pseries.c | 3 ++-
include/linux/crypto.h | 16 ++++++++++++++++
15 files changed, 174 insertions(+), 13 deletions(-)
diff --git a/crypto/842.c b/crypto/842.c
index bc26dc94..7e74ea26 100644
--- a/crypto/842.c
+++ b/crypto/842.c
@@ -112,7 +112,8 @@ static struct crypto_alg alg = {
.cra_exit = crypto842_exit,
.cra_u = { .compress = {
.coa_compress = crypto842_compress,
- .coa_decompress = crypto842_decompress } }
+ .coa_decompress = crypto842_decompress,
+ .coa_decompress_unsafe = crypto842_decompress } }
};
static struct scomp_alg scomp = {
diff --git a/crypto/compress.c b/crypto/compress.c
index f2d52292..bec79624 100644
--- a/crypto/compress.c
+++ b/crypto/compress.c
@@ -33,12 +33,22 @@ static int crypto_decompress(struct crypto_tfm *tfm,
dlen);
}
+static int crypto_decompress_unsafe(struct crypto_tfm *tfm,
+ const u8 *src, unsigned int slen,
+ u8 *dst, unsigned int *dlen)
+{
+ return tfm->__crt_alg->cra_compress.coa_decompress_unsafe(tfm, src,
+ slen, dst,
+ dlen);
+}
+
int crypto_init_compress_ops(struct crypto_tfm *tfm)
{
struct compress_tfm *ops = &tfm->crt_compress;
ops->cot_compress = crypto_compress;
ops->cot_decompress = crypto_decompress;
+ ops->cot_decompress_unsafe = crypto_decompress_unsafe;
return 0;
}
diff --git a/crypto/crypto_null.c b/crypto/crypto_null.c
index 20ff2c74..6e15e8c0 100644
--- a/crypto/crypto_null.c
+++ b/crypto/crypto_null.c
@@ -146,7 +146,8 @@ static struct crypto_alg null_algs[3] = { {
.cra_module = THIS_MODULE,
.cra_u = { .compress = {
.coa_compress = null_compress,
- .coa_decompress = null_compress } }
+ .coa_decompress = null_compress,
+ .coa_decompress_unsafe = null_compress } }
} };
MODULE_ALIAS_CRYPTO("compress_null");
diff --git a/crypto/deflate.c b/crypto/deflate.c
index 94ec3b36..4b681a37 100644
--- a/crypto/deflate.c
+++ b/crypto/deflate.c
@@ -286,7 +286,8 @@ static struct crypto_alg alg = {
.cra_exit = deflate_exit,
.cra_u = { .compress = {
.coa_compress = deflate_compress,
- .coa_decompress = deflate_decompress } }
+ .coa_decompress = deflate_decompress,
+ .coa_decompress_unsafe = deflate_decompress } }
};
static struct scomp_alg scomp[] = { {
diff --git a/crypto/lz4.c b/crypto/lz4.c
index 2ce2660d..60a1914b 100644
--- a/crypto/lz4.c
+++ b/crypto/lz4.c
@@ -103,6 +103,19 @@ static int __lz4_decompress_crypto(const u8 *src, unsigned int slen,
return 0;
}
+static int __lz4_decompress_crypto_unsafe(const u8 *src, unsigned int slen,
+ u8 *dst, unsigned int *dlen,
+ void *ctx)
+{
+ int out_len = LZ4_decompress_fast(src, dst, *dlen);
+
+ if (out_len < 0)
+ return -EINVAL;
+
+ *dlen = out_len;
+ return 0;
+}
+
static int lz4_sdecompress(struct crypto_scomp *tfm, const u8 *src,
unsigned int slen, u8 *dst, unsigned int *dlen,
void *ctx)
@@ -117,6 +130,13 @@ static int lz4_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
return __lz4_decompress_crypto(src, slen, dst, dlen, NULL);
}
+static int lz4_decompress_crypto_unsafe(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst,
+ unsigned int *dlen)
+{
+ return __lz4_decompress_crypto_unsafe(src, slen, dst, dlen, NULL);
+}
+
static struct crypto_alg alg_lz4 = {
.cra_name = "lz4",
.cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
@@ -127,7 +147,8 @@ static struct crypto_alg alg_lz4 = {
.cra_exit = lz4_exit,
.cra_u = { .compress = {
.coa_compress = lz4_compress_crypto,
- .coa_decompress = lz4_decompress_crypto } }
+ .coa_decompress = lz4_decompress_crypto,
+ .coa_decompress_unsafe = lz4_decompress_crypto_unsafe } }
};
static struct scomp_alg scomp = {
diff --git a/crypto/lz4hc.c b/crypto/lz4hc.c
index 2be14f05..9ecb4e18 100644
--- a/crypto/lz4hc.c
+++ b/crypto/lz4hc.c
@@ -104,6 +104,19 @@ static int __lz4hc_decompress_crypto(const u8 *src, unsigned int slen,
return 0;
}
+static int __lz4hc_decompress_crypto_unsafe(const u8 *src, unsigned int slen,
+ u8 *dst, unsigned int *dlen,
+ void *ctx)
+{
+ int out_len = LZ4_decompress_fast(src, dst, *dlen);
+
+ if (out_len < 0)
+ return -EINVAL;
+
+ *dlen = out_len;
+ return 0;
+}
+
static int lz4hc_sdecompress(struct crypto_scomp *tfm, const u8 *src,
unsigned int slen, u8 *dst, unsigned int *dlen,
void *ctx)
@@ -118,6 +131,13 @@ static int lz4hc_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
return __lz4hc_decompress_crypto(src, slen, dst, dlen, NULL);
}
+static int lz4hc_decompress_crypto_unsafe(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst,
+ unsigned int *dlen)
+{
+ return __lz4hc_decompress_crypto_unsafe(src, slen, dst, dlen, NULL);
+}
+
static struct crypto_alg alg_lz4hc = {
.cra_name = "lz4hc",
.cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
@@ -128,7 +148,8 @@ static struct crypto_alg alg_lz4hc = {
.cra_exit = lz4hc_exit,
.cra_u = { .compress = {
.coa_compress = lz4hc_compress_crypto,
- .coa_decompress = lz4hc_decompress_crypto } }
+ .coa_decompress = lz4hc_decompress_crypto,
+ .coa_decompress_unsafe = lz4hc_decompress_crypto_unsafe } }
};
static struct scomp_alg scomp = {
diff --git a/crypto/lzo.c b/crypto/lzo.c
index 218567d7..81fb2d63 100644
--- a/crypto/lzo.c
+++ b/crypto/lzo.c
@@ -129,7 +129,8 @@ static struct crypto_alg alg = {
.cra_exit = lzo_exit,
.cra_u = { .compress = {
.coa_compress = lzo_compress,
- .coa_decompress = lzo_decompress } }
+ .coa_decompress = lzo_decompress,
+ .coa_decompress_unsafe = lzo_decompress } }
};
static struct scomp_alg scomp = {
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 53fd43d1..a41ef290 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1383,9 +1383,9 @@ static int test_comp(struct crypto_comp *tfm,
int ilen;
unsigned int dlen = COMP_BUF_SIZE;
- memset(result, 0, sizeof (result));
ilen = dtemplate[i].inlen;
+ memset(result, 0, sizeof(result));
ret = crypto_comp_decompress(tfm, dtemplate[i].input,
ilen, result, &dlen);
if (ret) {
@@ -1410,6 +1410,31 @@ static int test_comp(struct crypto_comp *tfm,
ret = -EINVAL;
goto out;
}
+ memset(result, 0, sizeof(result));
+ ret = crypto_comp_decompress_unsafe(tfm, dtemplate[i].input,
+ ilen, result, &dlen);
+ if (ret) {
+ printk(KERN_ERR "alg: comp: unsafe decompression failed on test %d for %s: ret=%d\n",
+ i + 1, algo,
+ -ret);
+ goto out;
+ }
+
+ if (dlen != dtemplate[i].outlen) {
+ printk(KERN_ERR "alg: comp: unsafe Decompression test %d failed for %s: output len = %d\n",
+ i + 1, algo,
+ dlen);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (memcmp(result, dtemplate[i].output, dlen)) {
+ printk(KERN_ERR "alg: comp: unsafe Decompression test %d failed for %s\n",
+ i + 1, algo);
+ hexdump(result, dlen);
+ ret = -EINVAL;
+ goto out;
+ }
}
ret = 0;
diff --git a/crypto/zbewalgo.c b/crypto/zbewalgo.c
index 044b8811..9db0d43b 100644
--- a/crypto/zbewalgo.c
+++ b/crypto/zbewalgo.c
@@ -90,6 +90,20 @@ static int __zbewalgo_decompress_crypto(const u8 *src, unsigned int slen,
return 0;
}
+static int __zbewalgo_decompress_crypto_unsafe(const u8 *src,
+ unsigned int slen,
+ u8 *dst, unsigned int *dlen,
+ void *ctx)
+{
+ int out_len;
+
+ out_len = zbewalgo_decompress_fast(src, dst, ctx, slen);
+ if (out_len < 0)
+ return -EINVAL;
+ *dlen = out_len;
+ return 0;
+}
+
static int zbewalgo_sdecompress(struct crypto_scomp *tfm, const u8 *src,
unsigned int slen, u8 *dst, unsigned int *dlen,
void *ctx)
@@ -107,6 +121,17 @@ static int zbewalgo_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
ctx->zbewalgo_comp_mem);
}
+static int zbewalgo_decompress_crypto_unsafe(struct crypto_tfm *tfm,
+ const u8 *src,
+ unsigned int slen, u8 *dst,
+ unsigned int *dlen)
+{
+ struct zbewalgo_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ return __zbewalgo_decompress_crypto_unsafe(src, slen, dst, dlen,
+ ctx->zbewalgo_comp_mem);
+}
+
static struct crypto_alg crypto_alg_zbewalgo = {
.cra_name = "zbewalgo",
.cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
@@ -117,7 +142,9 @@ static struct crypto_alg crypto_alg_zbewalgo = {
.cra_u = {
.compress = {
.coa_compress = zbewalgo_compress_crypto,
- .coa_decompress = zbewalgo_decompress_crypto
+ .coa_decompress = zbewalgo_decompress_crypto,
+ .coa_decompress_unsafe =
+ zbewalgo_decompress_crypto_unsafe
}
}
};
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0f3fadd7..c46f21ce 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -219,6 +219,15 @@ static ssize_t disksize_show(struct device *dev,
return scnprintf(buf, PAGE_SIZE, "%llu\n", zram->disksize);
}
+static ssize_t unsafe_decompression_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct zram *zram = dev_to_zram(dev);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n", zram->unsafe_decompression);
+}
+
static ssize_t mem_limit_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t len)
{
@@ -886,9 +895,17 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
ret = 0;
} else {
struct zcomp_strm *zstrm = zcomp_stream_get(zram->comp);
+ unsigned int dst_len = PAGE_SIZE;
dst = kmap_atomic(page);
- ret = zcomp_decompress(zstrm, src, size, dst);
+ if (zram->unsafe_decompression)
+ ret = crypto_comp_decompress_unsafe(zstrm->tfm,
+ src, size,
+ dst, &dst_len);
+ else
+ ret = crypto_comp_decompress(zstrm->tfm,
+ src, size,
+ dst, &dst_len);
kunmap_atomic(dst);
zcomp_stream_put(zram->comp);
}
@@ -1334,6 +1351,19 @@ static void zram_reset_device(struct zram *zram)
memset(&zram->stats, 0, sizeof(zram->stats));
zcomp_destroy(comp);
reset_bdev(zram);
+ zram->unsafe_decompression = 1;
+}
+
+static ssize_t unsafe_decompression_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ unsigned char flag;
+ struct zram *zram = dev_to_zram(dev);
+
+ flag = memparse(buf, NULL);
+ zram->unsafe_decompression = flag ? 1 : 0;
+ return len;
}
static ssize_t disksize_store(struct device *dev,
@@ -1454,6 +1484,7 @@ static const struct block_device_operations zram_devops = {
static DEVICE_ATTR_WO(compact);
static DEVICE_ATTR_RW(disksize);
+static DEVICE_ATTR_RW(unsafe_decompression);
static DEVICE_ATTR_RO(initstate);
static DEVICE_ATTR_WO(reset);
static DEVICE_ATTR_WO(mem_limit);
@@ -1466,6 +1497,7 @@ static DEVICE_ATTR_RW(backing_dev);
static struct attribute *zram_disk_attrs[] = {
&dev_attr_disksize.attr,
+ &dev_attr_unsafe_decompression.attr,
&dev_attr_initstate.attr,
&dev_attr_reset.attr,
&dev_attr_compact.attr,
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 00886122..3448316c 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -99,6 +99,7 @@ struct zram {
* zram is claimed so open request will be failed
*/
bool claim; /* Protected by bdev->bd_mutex */
+ unsigned char unsafe_decompression;
#ifdef CONFIG_ZRAM_WRITEBACK
struct file *backing_dev;
struct block_device *bdev;
diff --git a/drivers/crypto/cavium/zip/zip_main.c b/drivers/crypto/cavium/zip/zip_main.c
index 1cd8aa48..bd5d9f64 100644
--- a/drivers/crypto/cavium/zip/zip_main.c
+++ b/drivers/crypto/cavium/zip/zip_main.c
@@ -359,7 +359,8 @@ static struct crypto_alg zip_comp_deflate = {
.cra_exit = zip_free_comp_ctx,
.cra_u = { .compress = {
.coa_compress = zip_comp_compress,
- .coa_decompress = zip_comp_decompress
+ .coa_decompress = zip_comp_decompress,
+ .coa_decompress_unsafe = zip_comp_decompress
} }
};
@@ -373,7 +374,8 @@ static struct crypto_alg zip_comp_lzs = {
.cra_exit = zip_free_comp_ctx,
.cra_u = { .compress = {
.coa_compress = zip_comp_compress,
- .coa_decompress = zip_comp_decompress
+ .coa_decompress = zip_comp_decompress,
+ .coa_decompress_unsafe = zip_comp_decompress
} }
};
diff --git a/drivers/crypto/nx/nx-842-powernv.c b/drivers/crypto/nx/nx-842-powernv.c
index 1e87637c..f00326b4 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -981,7 +981,8 @@ static struct crypto_alg nx842_powernv_alg = {
.cra_exit = nx842_crypto_exit,
.cra_u = { .compress = {
.coa_compress = nx842_crypto_compress,
- .coa_decompress = nx842_crypto_decompress } }
+ .coa_decompress = nx842_crypto_decompress,
+ .coa_decompress_unsafe = nx842_crypto_decompress } }
};
static __init int nx842_powernv_init(void)
diff --git a/drivers/crypto/nx/nx-842-pseries.c b/drivers/crypto/nx/nx-842-pseries.c
index 66869976..5badd418 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -982,7 +982,8 @@ static struct crypto_alg nx842_pseries_alg = {
.cra_exit = nx842_crypto_exit,
.cra_u = { .compress = {
.coa_compress = nx842_crypto_compress,
- .coa_decompress = nx842_crypto_decompress } }
+ .coa_decompress = nx842_crypto_decompress,
+ .coa_decompress_unsafe = nx842_crypto_decompress } }
};
static int nx842_probe(struct vio_dev *viodev,
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index 6eb06101..bb1fada2 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -362,6 +362,9 @@ struct compress_alg {
unsigned int slen, u8 *dst, unsigned int *dlen);
int (*coa_decompress)(struct crypto_tfm *tfm, const u8 *src,
unsigned int slen, u8 *dst, unsigned int *dlen);
+ int (*coa_decompress_unsafe)(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst,
+ unsigned int *dlen);
};
@@ -578,6 +581,9 @@ struct compress_tfm {
int (*cot_decompress)(struct crypto_tfm *tfm,
const u8 *src, unsigned int slen,
u8 *dst, unsigned int *dlen);
+ int (*cot_decompress_unsafe)(struct crypto_tfm *tfm,
+ const u8 *src, unsigned int slen,
+ u8 *dst, unsigned int *dlen);
};
#define crt_ablkcipher crt_u.ablkcipher
@@ -1660,5 +1666,15 @@ static inline int crypto_comp_decompress(struct crypto_comp *tfm,
src, slen, dst, dlen);
}
+static inline int crypto_comp_decompress_unsafe(struct crypto_comp *tfm,
+ const u8 *src,
+ unsigned int slen,
+ u8 *dst, unsigned int *dlen)
+{
+ return crypto_comp_crt(tfm)->cot_decompress_unsafe(crypto_comp_tfm(tfm),
+ src, slen, dst,
+ dlen);
+}
+
#endif /* _LINUX_CRYPTO_H */
--
2.14.1
zBeWalgo is a completely new algorithm. It has not been published anywhere
else yet, so searching for it online will not turn up any results. The
following section describes how the algorithm works.
zBeWalgo itself is a container compression algorithm, which can execute
multiple different compression and transformation algorithms one after
another. The execution of different compression algorithms in sequence is
called a 'combination' in this description and in the code. In addition to
executing combinations of algorithms, zBeWalgo can try different
combinations on the same input. This allows high compression ratios on
completely different datasets, which would otherwise each require their own
algorithm. Executing all known combinations on each input page would be
very slow. Therefore the data is first compressed with the combination
which was already successful on the last input page. If the compressed size
of the current page is similar to that of the last page, the compressed
data is returned immediately without even trying the other combinations.
Even though there is no guarantee that consecutive calls to the algorithm
belong together, the speed improvement is obvious.
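The caching of the last successful combination can be sketched in user
space as follows (simplified: a combination either succeeds or fails here,
while the real algorithm also compares compressed sizes; all names are
invented for illustration):

```c
#include <stddef.h>

/* a combination returns the compressed size, or -1 on failure */
typedef int (*combination_fn)(const unsigned char *src, size_t slen,
			      unsigned char *dst);

static int compress_with_cache(combination_fn *combs, int ncombs,
			       int *last_ok,
			       const unsigned char *src, size_t slen,
			       unsigned char *dst)
{
	int i, ret;

	/* try the combination that worked for the previous page first */
	ret = combs[*last_ok](src, slen, dst);
	if (ret >= 0)
		return ret;
	for (i = 0; i < ncombs; i++) {
		if (i == *last_ok)
			continue;
		ret = combs[i](src, slen, dst);
		if (ret >= 0) {
			*last_ok = i; /* remember for the next page */
			return ret;
		}
	}
	return -1; /* no combination could reduce the data */
}

/* stub combinations for demonstration only */
static int demo_comb_a(const unsigned char *src, size_t slen,
		       unsigned char *dst)
{
	(void)dst;
	return src[0] == 'x' ? -1 : (int)(slen / 2);
}

static int demo_comb_b(const unsigned char *src, size_t slen,
		       unsigned char *dst)
{
	(void)src;
	(void)dst;
	return (int)(slen / 3);
}

/* after a failure of the cached combination, the cache moves on */
static int demo_last_ok(void)
{
	combination_fn combs[2] = { demo_comb_a, demo_comb_b };
	unsigned char dst[8];
	int last = 0;

	compress_with_cache(combs, 2, &last,
			    (const unsigned char *)"xxxx", 4, dst);
	return last;
}

/* when the cached combination succeeds, no other one is tried */
static int demo_ret(void)
{
	combination_fn combs[2] = { demo_comb_a, demo_comb_b };
	unsigned char dst[8];
	int last = 0;

	return compress_with_cache(combs, 2, &last,
				   (const unsigned char *)"abcd", 4, dst);
}
```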
ZRAM uses zsmalloc to manage the compressed pages. The largest size class
in zsmalloc is 3264 bytes. If the compressed data is larger than that
threshold, ZRAM ignores the compression and stores the uncompressed page
instead. As a consequence it is useless to continue compressing once the
algorithm detects that the data cannot be compressed using the current
combination. The threshold for aborting compression can be changed via
sysfs at any time, even while the algorithm is in use. If a combination
fails to compress the data, zBeWalgo tries the next combination. If no
combination is able to reduce the data in size, zBeWalgo returns a
negative value.
Each combination consists of up to 7 compression and transformation steps.
Combinations can be added and removed at any time via sysfs. Data that has
already been compressed can always be decompressed, even if the combination
used to produce it no longer exists. Technically the user could add up to
256 combinations concurrently, but that would be very time consuming if the
data cannot be compressed.
To make it possible to build combinations and call different algorithms,
all of these algorithms implement the same interface. This enables the user
to specify additional combinations while ZRAM is running.
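A user-space sketch of such a shared interface, modeled on the 'struct
zbewalgo_alg' added later in this patch (simplified, with a trivial
pass-through step; the names here are invented for illustration):

```c
#include <stdint.h>
#include <string.h>

/* every step provides compress plus safe and fast decompress hooks */
struct toy_alg {
	const char *name;
	int (*compress)(const uint8_t *src, uint8_t *dst,
			uint16_t *wrkmem, uint16_t slen);
	int (*decompress_safe)(const uint8_t *src, uint8_t *dst,
			       uint16_t *wrkmem, uint16_t slen);
	int (*decompress_fast)(const uint8_t *src, uint8_t *dst,
			       uint16_t *wrkmem, uint16_t slen);
};

/* trivial pass-through "algorithm" showing how a step plugs in */
static int toy_copy(const uint8_t *src, uint8_t *dst,
		    uint16_t *wrkmem, uint16_t slen)
{
	(void)wrkmem;
	memcpy(dst, src, slen);
	return slen;
}

static const struct toy_alg alg_copy = {
	.name = "copy",
	.compress = toy_copy,
	.decompress_safe = toy_copy,
	.decompress_fast = toy_copy,
};

/* round-trip one buffer through the interface */
static int toy_roundtrip(void)
{
	uint8_t in[4] = { 1, 2, 3, 4 }, mid[4], out[4];
	uint16_t wrk[8];
	int n = alg_copy.compress(in, mid, wrk, 4);

	alg_copy.decompress_safe(mid, out, wrk, (uint16_t)n);
	return memcmp(in, out, 4) == 0;
}
```

Because every step exposes the same three hooks, a combination is just an
ordered list of such structs executed back to back.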
Many different algorithms can be used within the combinations. Some of
these algorithms have been published. This patch adds the following
algorithms for use within the combinations:
- bwt: The Burrows-Wheeler-Transformation was published by 'M. Burrows' and
'D. J. Wheeler' in 1994. This implementation uses counting sort for
sorting the data. Their original paper is available online at:
http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-124.pdf
- mtf: The Move-To-Front algorithm as described by 'M. Burrows' and
'D. J. Wheeler' in the same paper as bwt.
- jbe: j-bit-encoding as proposed by 'I Made Agus Dwi Suarjaya' in 2012.
https://arxiv.org/pdf/1209.1045.pdf
- jbe2: A minor modification of jbe. Swapping groups of 4 bits in
  consecutive bytes can increase the compression ratio if, for example,
  the first 4 bits of each byte are zero. If jbe2 is called after mtf,
  this happens often.
- rle: Run Length Encoding
- huffman: Huffman encoding
- bewalgo: I invented this algorithm for my bachelor's thesis
  'Page-Based compression in the Linux Kernel'. This algorithm is
  mainly inspired by lz4 and focuses on increasing the speed even
  further with the help of page-aligned read and write access. To
  achieve the page alignment, the input and output data is accessed
  only in blocks of 8 bytes; therefore the encoding of the compressed
  data is changed.
  https://wr.informatik.uni-hamburg.de/_media/research:theses:benjamin_warnke_page_based_compression_in_the_linux_kernel.pdf
- bewalgo2: At the beginning of my work to improve ZRAM this was the
  whole algorithm. The input is read in blocks of 8 bytes. These
  blocks are added to an AVL tree. The AVL tree is mapped directly to
  an array. The encoding is a variation of run-length encoding using
  the indices in the AVL tree as data. The reason for using the tree
  with indices is that the indices can be encoded in less than 8 bytes
  each.
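As an illustration of one of the transformation steps listed above, a
user-space sketch of the Move-To-Front transform (not the kernel
implementation; helper names are invented):

```c
#include <stdint.h>
#include <string.h>

/* emit each symbol's current table index, then move it to the front */
static void mtf_encode(const uint8_t *src, uint8_t *dst, size_t len)
{
	uint8_t table[256];
	size_t i;
	int j;

	for (j = 0; j < 256; j++)
		table[j] = (uint8_t)j;
	for (i = 0; i < len; i++) {
		uint8_t c = src[i];

		for (j = 0; table[j] != c; j++)
			;
		dst[i] = (uint8_t)j;
		/* move the symbol to the front of the table */
		memmove(table + 1, table, (size_t)j);
		table[0] = c;
	}
}

/* reverse the transformation with the same table discipline */
static void mtf_decode(const uint8_t *src, uint8_t *dst, size_t len)
{
	uint8_t table[256];
	size_t i;
	int j;

	for (j = 0; j < 256; j++)
		table[j] = (uint8_t)j;
	for (i = 0; i < len; i++) {
		uint8_t idx = src[i];
		uint8_t c = table[idx];

		memmove(table + 1, table, (size_t)idx);
		table[0] = c;
		dst[i] = c;
	}
}

/* repeated symbols encode as 0 and the round trip is lossless */
static int demo_mtf(void)
{
	const uint8_t in[3] = { 'a', 'a', 'b' };
	uint8_t enc[3], dec[3];

	mtf_encode(in, enc, 3);
	mtf_decode(enc, dec, 3);
	return enc[1] == 0 && memcmp(in, dec, 3) == 0;
}
```

After bwt, equal symbols cluster together, so mtf turns them into runs of
small values, which is what makes the later entropy-coding steps effective.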
Signed-off-by: Benjamin Warnke <[email protected]>
---
include/linux/zbewalgo.h | 50 ++++
lib/Kconfig | 3 +
lib/Makefile | 1 +
lib/zbewalgo/BWT.c | 120 ++++++++
lib/zbewalgo/BWT.h | 21 ++
lib/zbewalgo/JBE.c | 204 +++++++++++++
lib/zbewalgo/JBE.h | 13 +
lib/zbewalgo/JBE2.c | 221 ++++++++++++++
lib/zbewalgo/JBE2.h | 13 +
lib/zbewalgo/MTF.c | 122 ++++++++
lib/zbewalgo/MTF.h | 13 +
lib/zbewalgo/Makefile | 4 +
lib/zbewalgo/RLE.c | 137 +++++++++
lib/zbewalgo/RLE.h | 13 +
lib/zbewalgo/bewalgo.c | 401 ++++++++++++++++++++++++++
lib/zbewalgo/bewalgo.h | 13 +
lib/zbewalgo/bewalgo2.c | 407 ++++++++++++++++++++++++++
lib/zbewalgo/bewalgo2.h | 13 +
lib/zbewalgo/bitshuffle.c | 93 ++++++
lib/zbewalgo/bitshuffle.h | 13 +
lib/zbewalgo/huffman.c | 262 +++++++++++++++++
lib/zbewalgo/huffman.h | 13 +
lib/zbewalgo/include.h | 94 ++++++
lib/zbewalgo/zbewalgo.c | 713 ++++++++++++++++++++++++++++++++++++++++++++++
24 files changed, 2957 insertions(+)
create mode 100644 include/linux/zbewalgo.h
create mode 100644 lib/zbewalgo/BWT.c
create mode 100644 lib/zbewalgo/BWT.h
create mode 100644 lib/zbewalgo/JBE.c
create mode 100644 lib/zbewalgo/JBE.h
create mode 100644 lib/zbewalgo/JBE2.c
create mode 100644 lib/zbewalgo/JBE2.h
create mode 100644 lib/zbewalgo/MTF.c
create mode 100644 lib/zbewalgo/MTF.h
create mode 100644 lib/zbewalgo/Makefile
create mode 100644 lib/zbewalgo/RLE.c
create mode 100644 lib/zbewalgo/RLE.h
create mode 100644 lib/zbewalgo/bewalgo.c
create mode 100644 lib/zbewalgo/bewalgo.h
create mode 100644 lib/zbewalgo/bewalgo2.c
create mode 100644 lib/zbewalgo/bewalgo2.h
create mode 100644 lib/zbewalgo/bitshuffle.c
create mode 100644 lib/zbewalgo/bitshuffle.h
create mode 100644 lib/zbewalgo/huffman.c
create mode 100644 lib/zbewalgo/huffman.h
create mode 100644 lib/zbewalgo/include.h
create mode 100644 lib/zbewalgo/zbewalgo.c
diff --git a/include/linux/zbewalgo.h b/include/linux/zbewalgo.h
new file mode 100644
index 00000000..2958867a
--- /dev/null
+++ b/include/linux/zbewalgo.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ */
+
+#ifndef ZBEWALGO_INCLUDE_H
+#define ZBEWALGO_INCLUDE_H
+
+/*
+ * zbewalgo_get_wrkmem_size must be used to determine the size which is
+ * required for allocating the working memory for the compression and
+ * decompression algorithm
+ */
+int zbewalgo_get_wrkmem_size(void);
+
+/*
+ * this function compresses the data given by 'source' into the
+ * preallocated memory given by 'dest'.
+ * The maximum allowed source_length is 4096 Bytes. If larger values are
+ * given, the algorithm returns an error.
+ * If the data is not compressible the function returns -1. Otherwise the
+ * size of the compressed data is returned.
+ * wrkmem must point to a memory region of at least the size returned by
+ * 'zbewalgo_get_wrkmem_size'.
+ */
+int zbewalgo_compress(const u8 * const source, u8 * const dest,
+ u16 * const wrkmem, const u16 source_length);
+
+/*
+ * this function decompresses data which was already successfully compressed
+ * with 'zbewalgo_compress'.
+ * the function decompresses the data given by 'source' into the preallocated
+ * buffer 'dest'.
+ * wrkmem must point to a memory region of at least the size returned by
+ * 'zbewalgo_get_wrkmem_size'.
+ * the wrkmem for compression and decompression does not need to be the same
+ * the function 'zbewalgo_decompress_safe' detects any errors in the given
+ * compressed data and decompresses it safely.
+ */
+int zbewalgo_decompress_safe(const u8 * const source, u8 * const dest,
+ u16 * const wrkmem, const u16 source_length);
+
+/*
+ * like 'zbewalgo_decompress_safe', but all safeness branches are disabled at
+ * compiletime which leads to a much higher decompression speed.
+ */
+int zbewalgo_decompress_fast(const u8 * const source, u8 * const dest,
+ u16 * const wrkmem, const u16 source_length);
+
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index 5fe57767..1770f508 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -255,6 +255,9 @@ config LZO_DECOMPRESS
config LZ4_COMPRESS
tristate
+config ZBEWALGO_COMPRESS
+ tristate
+
config LZ4HC_COMPRESS
tristate
diff --git a/lib/Makefile b/lib/Makefile
index ce20696d..d4ef5d9f 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_BCH) += bch.o
obj-$(CONFIG_LZO_COMPRESS) += lzo/
obj-$(CONFIG_LZO_DECOMPRESS) += lzo/
obj-$(CONFIG_LZ4_COMPRESS) += lz4/
+obj-$(CONFIG_ZBEWALGO_COMPRESS) += zbewalgo/
obj-$(CONFIG_LZ4HC_COMPRESS) += lz4/
obj-$(CONFIG_LZ4_DECOMPRESS) += lz4/
obj-$(CONFIG_ZSTD_COMPRESS) += zstd/
diff --git a/lib/zbewalgo/BWT.c b/lib/zbewalgo/BWT.c
new file mode 100644
index 00000000..ae51258c
--- /dev/null
+++ b/lib/zbewalgo/BWT.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ *
+ * The Burrows-Wheeler-Transformation was published by 'M. Burrows' and
+ * 'D. J. Wheeler' in 1994. This implementation uses counting sort for
+ * sorting the data. Their original paper is available online at:
+ * http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-124.pdf
+ */
+
+#include "BWT.h"
+
+unsigned long zbewalgo_bwt_max_alphabet = 90;
+
+/*
+ * implementation of the Burrows-Wheeler transformation. Optimized for speed
+ * and small input sizes
+ */
+static __always_inline int compress_bwt(const u8 * const source,
+ u8 * const dest, u16 * const wrkmem,
+ const u16 source_length)
+{
+ u16 * const C = wrkmem;
+ u16 i;
+ u16 alphabet;
+ u8 * const op = dest + 1;
+
+ *dest = source[source_length - 1];
+ memset(C, 0, 512);
+ for (i = 0; i < source_length; i++)
+ C[source[i]]++;
+ alphabet = (C[0] > 0);
+ for (i = 1; i < 256; i++) {
+ alphabet += (C[i] > 0);
+ C[i] += C[i - 1];
+ }
+ if (alphabet > zbewalgo_bwt_max_alphabet) {
+ /* not compressible */
+ return -EINVAL;
+ }
+ i = source_length - 1;
+ while (i > 0) {
+ C[source[i]]--;
+ op[C[source[i]]] = source[i - 1];
+ i--;
+ }
+ C[source[0]]--;
+ op[C[source[0]]] = source[source_length - 1];
+ return source_length + 1;
+}
+
+/*
+ * reverses the transformation
+ */
+static __always_inline int decompress_bwt(const u8 * const source,
+ u8 * const dest, u16 * const wrkmem,
+ const u16 source_length,
+ const int safe_mode)
+{
+ const u16 dest_length = source_length - 1;
+ u16 * const C = wrkmem;
+ u8 * const L = (u8 *)(wrkmem + 256);
+ u8 key = *source;
+ u8 *dest_end = dest + dest_length;
+ const u8 *ip = source + 1;
+ int i, j;
+
+ if (safe_mode && source_length == 0)
+ return -EINVAL;
+
+ memset(C, 0, 512);
+ for (i = 0; i < dest_length; i++)
+ C[ip[i]]++;
+ for (i = 1; i < 256; i++)
+ C[i] += C[i - 1];
+ j = 0;
+ for (i = 0; i < 256; i++)
+ while (j < C[i])
+ L[j++] = i;
+ do {
+ C[key]--;
+ *--dest_end = L[C[key]];
+ key = ip[C[key]];
+ } while (dest < dest_end);
+ return dest_length;
+}
+
+static __always_inline int decompress_bwt_safe(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length)
+{
+ return decompress_bwt(source, dest, wrkmem, source_length, 1);
+}
+
+static __always_inline int decompress_bwt_fast(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length)
+{
+ return decompress_bwt(source, dest, wrkmem, source_length, 0);
+}
+
+static int init_bwt(void)
+{
+ return 0;
+}
+
+static void exit_bwt(void)
+{
+}
+
+struct zbewalgo_alg alg_bwt = {
+ .name = "bwt",
+ .flags = ZBEWALGO_ALG_FLAG_TRANSFORM,
+ .wrkmem_size = PAGE_SIZE << 1,
+ .init = init_bwt,
+ .exit = exit_bwt,
+ .compress = compress_bwt,
+ .decompress_safe = decompress_bwt_safe,
+ .decompress_fast = decompress_bwt_fast
+};
diff --git a/lib/zbewalgo/BWT.h b/lib/zbewalgo/BWT.h
new file mode 100644
index 00000000..18756627
--- /dev/null
+++ b/lib/zbewalgo/BWT.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ */
+
+#ifndef ZBEWALGO_BWT_H
+#define ZBEWALGO_BWT_H
+
+#include "include.h"
+
+extern struct zbewalgo_alg alg_bwt;
+
+/*
+ * If the input contains more distinct byte values than this limit, it
+ * is unlikely that the compression will achieve a high ratio. As a
+ * consequence the Burrows-Wheeler Transformation aborts if the number
+ * of different bytes exceeds this value.
+ */
+extern unsigned long zbewalgo_bwt_max_alphabet;
+
+#endif
diff --git a/lib/zbewalgo/JBE.c b/lib/zbewalgo/JBE.c
new file mode 100644
index 00000000..625cedd8
--- /dev/null
+++ b/lib/zbewalgo/JBE.c
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ *
+ * j-bit-encoding as proposed by 'I Made Agus Dwi Suarjaya' in 2012.
+ * https://arxiv.org/pdf/1209.1045.pdf
+ */
+
+#include "JBE.h"
+
+static __always_inline int compress_jbe(const u8 * const source,
+ u8 * const dest, u16 * const wrkmem,
+ const u16 source_length)
+{
+ u64 source_tmp;
+ u8 tmp;
+ const u8 *source_end = source + (source_length & ~0x7);
+ const u8 *ip = source;
+ u8 *data1 = dest + 2;
+ u8 *data2 = data1 + (source_length >> 3);
+ u8 * const source_tmp_ptr = (u8 *)(&source_tmp);
+
+ put_unaligned_le16(source_length, dest);
+ do {
+ source_tmp = get_unaligned_le64(ip);
+ ip += 8;
+ tmp = source_tmp_ptr[0] > 0;
+ *data2 = source_tmp_ptr[0];
+ *data1 = tmp << 7;
+ data2 += tmp;
+ tmp = source_tmp_ptr[1] > 0;
+ *data2 = source_tmp_ptr[1];
+ *data1 |= tmp << 6;
+ data2 += tmp;
+ tmp = source_tmp_ptr[2] > 0;
+ *data2 = source_tmp_ptr[2];
+ *data1 |= tmp << 5;
+ data2 += tmp;
+ tmp = source_tmp_ptr[3] > 0;
+ *data2 = source_tmp_ptr[3];
+ *data1 |= tmp << 4;
+ data2 += tmp;
+ tmp = source_tmp_ptr[4] > 0;
+ *data2 = source_tmp_ptr[4];
+ *data1 |= tmp << 3;
+ data2 += tmp;
+ tmp = source_tmp_ptr[5] > 0;
+ *data2 = source_tmp_ptr[5];
+ *data1 |= tmp << 2;
+ data2 += tmp;
+ tmp = source_tmp_ptr[6] > 0;
+ *data2 = source_tmp_ptr[6];
+ *data1 |= tmp << 1;
+ data2 += tmp;
+ tmp = source_tmp_ptr[7] > 0;
+ *data2 = source_tmp_ptr[7];
+ *data1 |= tmp;
+ data2 += tmp;
+ data1++;
+ } while (ip < source_end);
+ memcpy(data2, ip, source_length & 0x7);
+ data2 += source_length & 0x7;
+ return data2 - dest;
+}
+
+static __always_inline int decompress_jbe(const u8 * const source,
+ u8 * const dest, u16 * const wrkmem,
+ const u16 source_length,
+ const int safe_mode)
+{
+ u64 dest_tmp;
+ const u16 dest_length = get_unaligned_le16(source);
+ const u8 *data1 = source + 2;
+ const u8 *data2 = data1 + (dest_length >> 3);
+ const u8 *dest_end = dest + (dest_length & ~0x7);
+ u8 * const dest_tmp_ptr = (u8 *)(&dest_tmp);
+ u8 *op = dest;
+ const u8 * const source_end = source + source_length;
+ const u8 * const source_end_8 = source_end - 8;
+
+ if (safe_mode && unlikely(dest_length > ZBEWALGO_BUFFER_SIZE))
+ return -EINVAL;
+ if (safe_mode && unlikely(dest_length > source_length << 3))
+ return -EINVAL;
+ do {
+ if (data2 >= source_end_8)
+ goto _last_8;
+ dest_tmp_ptr[0] = ((*data1 & 0x80) ? *data2 : 0);
+ data2 += (*data1 & 0x80) > 0;
+ dest_tmp_ptr[1] = ((*data1 & 0x40) ? *data2 : 0);
+ data2 += (*data1 & 0x40) > 0;
+ dest_tmp_ptr[2] = ((*data1 & 0x20) ? *data2 : 0);
+ data2 += (*data1 & 0x20) > 0;
+ dest_tmp_ptr[3] = ((*data1 & 0x10) ? *data2 : 0);
+ data2 += (*data1 & 0x10) > 0;
+ dest_tmp_ptr[4] = ((*data1 & 0x08) ? *data2 : 0);
+ data2 += (*data1 & 0x08) > 0;
+ dest_tmp_ptr[5] = ((*data1 & 0x04) ? *data2 : 0);
+ data2 += (*data1 & 0x04) > 0;
+ dest_tmp_ptr[6] = ((*data1 & 0x02) ? *data2 : 0);
+ data2 += (*data1 & 0x02) > 0;
+ dest_tmp_ptr[7] = ((*data1 & 0x01) ? *data2 : 0);
+ data2 += (*data1 & 0x01) > 0;
+ data1++;
+ put_unaligned_le64(dest_tmp, op);
+ op += 8;
+ } while (op < dest_end);
+goto _finish;
+_last_8:
+ /*
+ * Reading the last 8 bytes from data2. This may produce a lot of
+ * output if data1 indicates NOT to read from - and increment - data2.
+ */
+ do {
+ dest_tmp = 0;
+ if (safe_mode && unlikely(data1 >= source_end))
+ return -EINVAL;
+ if (*data1 & 0x80) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[0] = *data2++;
+ }
+ if (*data1 & 0x40) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[1] = *data2++;
+ }
+ if (*data1 & 0x20) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[2] = *data2++;
+ }
+ if (*data1 & 0x10) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[3] = *data2++;
+ }
+ if (*data1 & 0x08) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[4] = *data2++;
+ }
+ if (*data1 & 0x04) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[5] = *data2++;
+ }
+ if (*data1 & 0x02) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[6] = *data2++;
+ }
+ if (*data1 & 0x01) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[7] = *data2++;
+ }
+ data1++;
+ put_unaligned_le64(dest_tmp, op);
+ op += 8;
+ } while (op < dest_end);
+_finish:
+ memcpy(op, data2, dest_length & 0x7);
+ op += dest_length & 0x7;
+ if (safe_mode && (dest_length != (op - dest)))
+ return -EINVAL;
+ return dest_length;
+}
+
+static __always_inline int decompress_jbe_safe(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length)
+{
+ return decompress_jbe(source, dest, wrkmem, source_length, 1);
+}
+
+static __always_inline int decompress_jbe_fast(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length)
+{
+ return decompress_jbe(source, dest, wrkmem, source_length, 0);
+}
+
+static int init_jbe(void)
+{
+ return 0;
+}
+
+static void exit_jbe(void)
+{
+}
+
+struct zbewalgo_alg alg_jbe = {
+ .name = "jbe",
+ .flags = ZBEWALGO_ALG_FLAG_COMPRESS,
+ .wrkmem_size = 0,
+ .init = init_jbe,
+ .exit = exit_jbe,
+ .compress = compress_jbe,
+ .decompress_safe = decompress_jbe_safe,
+ .decompress_fast = decompress_jbe_fast
+};
diff --git a/lib/zbewalgo/JBE.h b/lib/zbewalgo/JBE.h
new file mode 100644
index 00000000..99c8d90a
--- /dev/null
+++ b/lib/zbewalgo/JBE.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ */
+
+#ifndef ZBEWALGO_JBE_H
+#define ZBEWALGO_JBE_H
+
+#include "include.h"
+
+extern struct zbewalgo_alg alg_jbe;
+
+#endif
diff --git a/lib/zbewalgo/JBE2.c b/lib/zbewalgo/JBE2.c
new file mode 100644
index 00000000..9706529b
--- /dev/null
+++ b/lib/zbewalgo/JBE2.c
@@ -0,0 +1,221 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ *
+ * j-bit-encoding was published by 'I Made Agus Dwi Suarjaya' in 2012.
+ * https://arxiv.org/pdf/1209.1045.pdf
+ *
+ * jbe2 is a minor modification of jbe. Swapping 4-bit groups between
+ * bytes can increase the compression ratio, for example if the upper
+ * 4 bits of each byte are zero. If jbe2 is called after mtf, this
+ * happens often.
+ */
+
+#include "JBE2.h"
+
+/*
+ * This implementation is modified to swap groups of 4 bits before processing
+ */
+static __always_inline int compress_jbe2(const u8 * const source,
+ u8 * const dest, u16 * const wrkmem,
+ const u16 source_length)
+{
+ u64 source_tmp;
+ u8 tmp;
+ const u8 *source_end = source + (source_length & ~0x7);
+ const u8 *ip = source;
+ u8 *data1 = dest + 2;
+ u8 *data2 = data1 + (source_length >> 3);
+ u8 * const source_tmp_ptr = (u8 *)(&source_tmp);
+
+ put_unaligned_le16(source_length, dest);
+ do {
+ source_tmp = get_unaligned_le64(ip);
+ ip += 8;
+ source_tmp = (source_tmp & 0xF0F0F0F00F0F0F0FULL)
+ | ((source_tmp & 0x0F0F0F0F00000000ULL) >> 28)
+ | ((source_tmp & 0x00000000F0F0F0F0ULL) << 28);
+ tmp = source_tmp_ptr[0] > 0;
+ *data2 = source_tmp_ptr[0];
+ *data1 = tmp << 7;
+ data2 += tmp;
+ tmp = source_tmp_ptr[1] > 0;
+ *data2 = source_tmp_ptr[1];
+ *data1 |= tmp << 6;
+ data2 += tmp;
+ tmp = source_tmp_ptr[2] > 0;
+ *data2 = source_tmp_ptr[2];
+ *data1 |= tmp << 5;
+ data2 += tmp;
+ tmp = source_tmp_ptr[3] > 0;
+ *data2 = source_tmp_ptr[3];
+ *data1 |= tmp << 4;
+ data2 += tmp;
+ tmp = source_tmp_ptr[4] > 0;
+ *data2 = source_tmp_ptr[4];
+ *data1 |= tmp << 3;
+ data2 += tmp;
+ tmp = source_tmp_ptr[5] > 0;
+ *data2 = source_tmp_ptr[5];
+ *data1 |= tmp << 2;
+ data2 += tmp;
+ tmp = source_tmp_ptr[6] > 0;
+ *data2 = source_tmp_ptr[6];
+ *data1 |= tmp << 1;
+ data2 += tmp;
+ tmp = source_tmp_ptr[7] > 0;
+ *data2 = source_tmp_ptr[7];
+ *data1 |= tmp;
+ data2 += tmp;
+ data1++;
+ } while (ip < source_end);
+ memcpy(data2, ip, source_length & 0x7);
+ data2 += source_length & 0x7;
+ return data2 - dest;
+}
+
+static __always_inline int decompress_jbe2(const u8 * const source,
+ u8 * const dest, u16 * const wrkmem,
+ const u16 source_length,
+ const int safe_mode)
+{
+ u64 dest_tmp;
+ const u16 dest_length = get_unaligned_le16(source);
+ const u8 *data1 = source + 2;
+ const u8 *data2 = data1 + (dest_length >> 3);
+ const u8 *dest_end = dest + (dest_length & ~0x7);
+ u8 * const dest_tmp_ptr = (u8 *)(&dest_tmp);
+ u8 *op = dest;
+ const u8 * const source_end = source + source_length;
+ const u8 * const source_end_8 = source_end - 8;
+
+ if (safe_mode && unlikely(dest_length > ZBEWALGO_BUFFER_SIZE))
+ return -EINVAL;
+ if (safe_mode && unlikely(dest_length > source_length << 3))
+ return -EINVAL;
+ do {
+ if (data2 >= source_end_8)
+ goto _last_8;
+ dest_tmp_ptr[0] = ((*data1 & 0x80) ? *data2 : 0);
+ data2 += (*data1 & 0x80) > 0;
+ dest_tmp_ptr[1] = ((*data1 & 0x40) ? *data2 : 0);
+ data2 += (*data1 & 0x40) > 0;
+ dest_tmp_ptr[2] = ((*data1 & 0x20) ? *data2 : 0);
+ data2 += (*data1 & 0x20) > 0;
+ dest_tmp_ptr[3] = ((*data1 & 0x10) ? *data2 : 0);
+ data2 += (*data1 & 0x10) > 0;
+ dest_tmp_ptr[4] = ((*data1 & 0x08) ? *data2 : 0);
+ data2 += (*data1 & 0x08) > 0;
+ dest_tmp_ptr[5] = ((*data1 & 0x04) ? *data2 : 0);
+ data2 += (*data1 & 0x04) > 0;
+ dest_tmp_ptr[6] = ((*data1 & 0x02) ? *data2 : 0);
+ data2 += (*data1 & 0x02) > 0;
+ dest_tmp_ptr[7] = ((*data1 & 0x01) ? *data2 : 0);
+ data2 += (*data1 & 0x01) > 0;
+ data1++;
+ dest_tmp = (dest_tmp & 0xF0F0F0F00F0F0F0FULL)
+ | ((dest_tmp & 0x0F0F0F0F00000000ULL) >> 28)
+ | ((dest_tmp & 0x00000000F0F0F0F0ULL) << 28);
+ put_unaligned_le64(dest_tmp, op);
+ op += 8;
+ } while (op < dest_end);
+ goto _finish;
+_last_8:
+ /*
+ * Reading the last 8 bytes from data2. This may produce a lot of
+ * output if data1 indicates NOT to read from - and increment - data2.
+ */
+ do {
+ dest_tmp = 0;
+ if (safe_mode && unlikely(data1 >= source_end))
+ return -EINVAL;
+ if (*data1 & 0x80) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[0] = *data2++;
+ }
+ if (*data1 & 0x40) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[1] = *data2++;
+ }
+ if (*data1 & 0x20) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[2] = *data2++;
+ }
+ if (*data1 & 0x10) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[3] = *data2++;
+ }
+ if (*data1 & 0x08) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[4] = *data2++;
+ }
+ if (*data1 & 0x04) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[5] = *data2++;
+ }
+ if (*data1 & 0x02) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[6] = *data2++;
+ }
+ if (*data1 & 0x01) {
+ if (safe_mode && unlikely(data2 >= source_end))
+ return -EINVAL;
+ dest_tmp_ptr[7] = *data2++;
+ }
+ data1++;
+ dest_tmp = (dest_tmp & 0xF0F0F0F00F0F0F0FULL)
+ | ((dest_tmp & 0x0F0F0F0F00000000ULL) >> 28)
+ | ((dest_tmp & 0x00000000F0F0F0F0ULL) << 28);
+ put_unaligned_le64(dest_tmp, op);
+ op += 8;
+ } while (op < dest_end);
+_finish:
+ memcpy(op, data2, dest_length & 0x7);
+ op += dest_length & 0x7;
+ if (safe_mode && (dest_length != (op - dest)))
+ return -EINVAL;
+ return dest_length;
+}
+
+static __always_inline int decompress_jbe2_safe(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length)
+{
+ return decompress_jbe2(source, dest, wrkmem, source_length, 1);
+}
+
+static __always_inline int decompress_jbe2_fast(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length)
+{
+ return decompress_jbe2(source, dest, wrkmem, source_length, 0);
+}
+
+static int init_jbe2(void)
+{
+ return 0;
+}
+
+static void exit_jbe2(void)
+{
+}
+
+struct zbewalgo_alg alg_jbe2 = {
+ .name = "jbe2",
+ .flags = ZBEWALGO_ALG_FLAG_COMPRESS,
+ .wrkmem_size = 0,
+ .init = init_jbe2,
+ .exit = exit_jbe2,
+ .compress = compress_jbe2,
+ .decompress_safe = decompress_jbe2_safe,
+ .decompress_fast = decompress_jbe2_fast
+};
diff --git a/lib/zbewalgo/JBE2.h b/lib/zbewalgo/JBE2.h
new file mode 100644
index 00000000..78e6a52e
--- /dev/null
+++ b/lib/zbewalgo/JBE2.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ */
+
+#ifndef ZBEWALGO_JBE2_H
+#define ZBEWALGO_JBE2_H
+
+#include "include.h"
+
+extern struct zbewalgo_alg alg_jbe2;
+
+#endif
diff --git a/lib/zbewalgo/MTF.c b/lib/zbewalgo/MTF.c
new file mode 100644
index 00000000..acc621d7
--- /dev/null
+++ b/lib/zbewalgo/MTF.c
@@ -0,0 +1,122 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ *
+ * The Move-To-Front algorithm as described by 'M. Burrows' and
+ * 'D. J. Wheeler' in the same paper as bwt.
+ * Their original paper is available online at:
+ * http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-124.pdf
+ */
+
+#include "MTF.h"
+
+static u8 initial_data[256];
+
+static __always_inline int compress_mtf(const u8 * const source,
+ u8 * const dest, u16 * const wrkmem,
+ const u16 source_length)
+{
+ u8 * const wrk = (u8 *)wrkmem;
+ const u8 *source_end = source + source_length;
+ const u8 *ip = source;
+ u8 *op = dest;
+ u16 i;
+ u8 tmp;
+ u64 tmp64;
+ u32 tmp32;
+ u16 tmp16;
+
+ memcpy(wrk, &initial_data[0], 256);
+ do {
+ i = 0;
+ while (wrk[i] != *ip)
+ i++;
+ ip++;
+ *op++ = i;
+ tmp = wrk[i];
+ while (i >= 8) {
+ tmp64 = get_unaligned_le64(&wrk[i - 8]);
+ put_unaligned_le64(tmp64, &wrk[i - 7]);
+ i -= 8;
+ }
+ if (i & 0x4) {
+ tmp32 = get_unaligned_le32(&wrk[i - 4]);
+ put_unaligned_le32(tmp32, &wrk[i - 3]);
+ i -= 4;
+ }
+ if (i & 0x2) {
+ tmp16 = get_unaligned_le16(&wrk[i - 2]);
+ put_unaligned_le16(tmp16, &wrk[i - 1]);
+ i -= 2;
+ }
+ if (i & 0x1)
+ wrk[1] = wrk[0];
+ wrk[0] = tmp;
+ } while (ip < source_end);
+ return source_length;
+}
+
+static __always_inline int decompress_mtf(const u8 * const source,
+ u8 * const dest, u16 * const wrkmem,
+ const u16 source_length)
+{
+ u8 * const wrk = (u8 *)wrkmem;
+ u8 *dest_end = dest + source_length;
+ u16 i;
+ u8 tmp;
+ const u8 *ip = source;
+ u8 *op = dest;
+ u64 tmp64;
+ u32 tmp32;
+ u16 tmp16;
+
+ memcpy(wrk, &initial_data[0], 256);
+ do {
+ i = *ip++;
+ *op++ = wrk[i];
+ tmp = wrk[i];
+ while (i >= 8) {
+ tmp64 = get_unaligned_le64(&wrk[i - 8]);
+ put_unaligned_le64(tmp64, &wrk[i - 7]);
+ i -= 8;
+ }
+ if (i & 0x4) {
+ tmp32 = get_unaligned_le32(&wrk[i - 4]);
+ put_unaligned_le32(tmp32, &wrk[i - 3]);
+ i -= 4;
+ }
+ if (i & 0x2) {
+ tmp16 = get_unaligned_le16(&wrk[i - 2]);
+ put_unaligned_le16(tmp16, &wrk[i - 1]);
+ i -= 2;
+ }
+ if (i & 0x1)
+ wrk[1] = wrk[0];
+ wrk[0] = tmp;
+ } while (op < dest_end);
+ return source_length;
+}
+
+static int init_mtf(void)
+{
+ int i;
+
+ for (i = 0; i < 256; i++)
+ initial_data[i] = i;
+ return 0;
+}
+
+static void exit_mtf(void)
+{
+}
+
+struct zbewalgo_alg alg_mtf = {
+ .name = "mtf",
+ .flags = ZBEWALGO_ALG_FLAG_TRANSFORM,
+ .wrkmem_size = 256,
+ .init = init_mtf,
+ .exit = exit_mtf,
+ .compress = compress_mtf,
+ .decompress_safe = decompress_mtf,
+ .decompress_fast = decompress_mtf
+};
diff --git a/lib/zbewalgo/MTF.h b/lib/zbewalgo/MTF.h
new file mode 100644
index 00000000..3fa9efa6
--- /dev/null
+++ b/lib/zbewalgo/MTF.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ */
+
+#ifndef ZBEWALGO_MTF_H
+#define ZBEWALGO_MTF_H
+
+#include "include.h"
+
+extern struct zbewalgo_alg alg_mtf;
+
+#endif
diff --git a/lib/zbewalgo/Makefile b/lib/zbewalgo/Makefile
new file mode 100644
index 00000000..f8cf7bfc
--- /dev/null
+++ b/lib/zbewalgo/Makefile
@@ -0,0 +1,4 @@
+ccflags-y += -O3 -fno-tree-vrp
+
+obj-$(CONFIG_ZBEWALGO_COMPRESS) += zbewalgo_compress.o
+zbewalgo_compress-objs := zbewalgo.o bewalgo.o bewalgo2.o bitshuffle.o BWT.o huffman.o JBE.o JBE2.o MTF.o RLE.o
diff --git a/lib/zbewalgo/RLE.c b/lib/zbewalgo/RLE.c
new file mode 100644
index 00000000..3169cea9
--- /dev/null
+++ b/lib/zbewalgo/RLE.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ */
+
+#include "RLE.h"
+
+#define RLE_REPEAT 0x80
+#define RLE_SIMPLE 0x00
+#define RLE_MAX_LENGTH ((1 << 7) - 1)
+
+/*
+ * Run Length Encoder
+ */
+static __always_inline int compress_rle(const u8 * const source,
+ u8 * const dest, u16 * const wrkmem,
+ const u16 source_length)
+{
+ const u8 *anchor = source;
+ const u8 *source_end = source + source_length;
+ unsigned int count;
+ u8 lastval;
+ u8 *op = dest;
+ const u8 *ip = source;
+
+ do {
+ /* RLE_SIMPLE */
+ do {
+ lastval = *ip++;
+ } while ((ip < source_end) && (lastval != *ip));
+ count = ip - anchor - (ip < source_end);
+ if (count > 0) {
+ while (count > RLE_MAX_LENGTH) {
+ *op++ = RLE_SIMPLE | RLE_MAX_LENGTH;
+ memcpy(op, anchor, RLE_MAX_LENGTH + 1);
+ anchor += RLE_MAX_LENGTH + 1;
+ op += RLE_MAX_LENGTH + 1;
+ count -= RLE_MAX_LENGTH + 1;
+ }
+ if (count > 0) {
+ *op++ = RLE_SIMPLE | (count - 1);
+ memcpy(op, anchor, count);
+ anchor += count;
+ op += count;
+ }
+ }
+ if (ip == source_end)
+ return op - dest;
+ /* RLE_REPEAT */
+ do {
+ lastval = *ip++;
+ } while ((ip < source_end) && (lastval == *ip));
+ count = ip - anchor;
+ if (count > 0) {
+ anchor += count;
+ while (count > RLE_MAX_LENGTH) {
+ *op++ = RLE_REPEAT | RLE_MAX_LENGTH;
+ *op++ = lastval;
+ count -= RLE_MAX_LENGTH + 1;
+ }
+ if (count > 0) {
+ *op++ = RLE_REPEAT | (count - 1);
+ *op++ = lastval;
+ }
+ }
+ } while (ip != source_end);
+ return op - dest;
+}
+
+static __always_inline int decompress_rle(const u8 * const source,
+ u8 * const dest, u16 * const wrkmem,
+ const u16 source_length,
+ const int safe_mode)
+{
+ unsigned int length;
+ const u8 *ip = source;
+ u8 *op = dest;
+ const u8 *const source_end = source + source_length;
+
+ while (ip + 1 < source_end) {
+ if ((*ip & RLE_REPEAT) == RLE_REPEAT) {
+ length = *ip++ - RLE_REPEAT + 1;
+ if (safe_mode
+ && unlikely(op + length
+ > dest + ZBEWALGO_BUFFER_SIZE))
+ return -EINVAL;
+ memset(op, *ip++, length);
+ op += length;
+ } else {
+ length = *ip++ - RLE_SIMPLE + 1;
+ if (safe_mode && unlikely(ip + length > source_end))
+ return -EINVAL;
+ if (safe_mode
+ && unlikely(op + length
+ > dest + ZBEWALGO_BUFFER_SIZE))
+ return -EINVAL;
+ memcpy(op, ip, length);
+ ip += length;
+ op += length;
+ }
+ }
+ return op - dest;
+}
+
+static __always_inline int decompress_rle_safe(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length)
+{
+ return decompress_rle(source, dest, wrkmem, source_length, 1);
+}
+
+static __always_inline int decompress_rle_fast(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length)
+{
+ return decompress_rle(source, dest, wrkmem, source_length, 0);
+}
+
+static int init_rle(void)
+{
+ return 0;
+}
+
+static void exit_rle(void)
+{
+}
+
+struct zbewalgo_alg alg_rle = {
+ .name = "rle",
+ .flags = ZBEWALGO_ALG_FLAG_COMPRESS,
+ .wrkmem_size = 0,
+ .init = init_rle,
+ .exit = exit_rle,
+ .compress = compress_rle,
+ .decompress_safe = decompress_rle_safe,
+ .decompress_fast = decompress_rle_fast
+};
diff --git a/lib/zbewalgo/RLE.h b/lib/zbewalgo/RLE.h
new file mode 100644
index 00000000..c1c06264
--- /dev/null
+++ b/lib/zbewalgo/RLE.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ */
+
+#ifndef ZBEWALGO_RLE_H
+#define ZBEWALGO_RLE_H
+
+#include "include.h"
+
+extern struct zbewalgo_alg alg_rle;
+
+#endif
diff --git a/lib/zbewalgo/bewalgo.c b/lib/zbewalgo/bewalgo.c
new file mode 100644
index 00000000..88681933
--- /dev/null
+++ b/lib/zbewalgo/bewalgo.c
@@ -0,0 +1,401 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ *
+ * Benjamin Warnke invented this algorithm for his bachelor's thesis
+ * 'Page-Based compression in the Linux Kernel'. This algorithm is
+ * mainly inspired by lz4, focusing on increasing the speed even more,
+ * with the help of page-aligned read and write access. To achieve the
+ * page alignment, the input and output data is accessed only in
+ * blocks of 8 bytes, therefore the encoding of the compressed data is
+ * changed.
+ *
+ * His thesis is available at:
+ * https://wr.informatik.uni-hamburg.de/_media/research:theses:benjamin_warnke_page_based_compression_in_the_linux_kernel.pdf
+ */
+
+#include "bewalgo.h"
+
+#define BEWALGO_ACCELERATION_DEFAULT 1
+
+#define BEWALGO_MEMORY_USAGE 14
+
+#define BEWALGO_SKIPTRIGGER 6
+
+#define BEWALGO_LENGTH_BITS 8
+#define BEWALGO_LENGTH_MAX ((1 << BEWALGO_LENGTH_BITS) - 1)
+#define BEWALGO_OFFSET_BITS 16
+#define BEWALGO_OFFSET_MAX ((1 << BEWALGO_OFFSET_BITS) - 1)
+
+#define BEWALGO_HASHLOG (BEWALGO_MEMORY_USAGE - 2)
+
+/*
+ * this macro is faster than memcpy if mostly short byte sequences are
+ * copied
+ */
+#define bewalgo_copy_helper_src(d, s, t) \
+do { \
+ while (s + 32 <= t) { \
+ put_unaligned_le64(get_unaligned_le64(s), d);\
+ put_unaligned_le64(get_unaligned_le64(s + 8), d + 8);\
+ put_unaligned_le64(get_unaligned_le64(s + 16), d + 16);\
+ put_unaligned_le64(get_unaligned_le64(s + 24), d + 24);\
+ d += 32; \
+ s += 32; \
+ } \
+ if (s + 16 <= t) { \
+ put_unaligned_le64(get_unaligned_le64(s), d);\
+ put_unaligned_le64(get_unaligned_le64(s + 8), d + 8);\
+ d += 16; \
+ s += 16; \
+ } \
+ if (s < t) {\
+ put_unaligned_le64(get_unaligned_le64(s), d);\
+ d += 8; \
+ s += 8; \
+ } \
+} while (0)
+
+static __always_inline int decompress_bewalgo(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length,
+ const int safe_mode)
+{
+ const u16 dest_length = get_unaligned_le16(source);
+ const u8 * const source_end = source + ((source_length - 2) & ~0x7);
+ const u8 * const dest_end = dest + (dest_length & ~0x7);
+ const u8 *ip = source + 2;
+ u8 *match = dest;
+ u8 *op = dest;
+ const u8 *control_block_ptr;
+ const u8 *target;
+ u16 length[4];
+
+ if (safe_mode
+ && unlikely(((source_length - 2) & 0x7) != (dest_length & 0x7)))
+ return -EINVAL;
+ if (safe_mode
+ && unlikely(ip + 8 >= source_end))
+ return -EINVAL;
+ if (safe_mode
+ && unlikely(dest_length > ZBEWALGO_BUFFER_SIZE))
+ return -EINVAL;
+ do {
+ control_block_ptr = ip;
+ length[0] = control_block_ptr[0] << 3;
+ length[1] = control_block_ptr[1] << 3;
+ length[2] = control_block_ptr[4] << 3;
+ length[3] = control_block_ptr[5] << 3;
+ if (safe_mode
+ && unlikely(ip + length[0] + length[2] > source_end))
+ return -EINVAL;
+ if (safe_mode
+ && unlikely(op + length[0] + length[1] + length[2]
+ + length[3] > dest_end))
+ return -EINVAL;
+ ip += 8;
+ target = ip + length[0];
+ bewalgo_copy_helper_src(op, ip, target);
+ match = op - (get_unaligned_le16(control_block_ptr + 2) << 3);
+ if (safe_mode && unlikely(match < dest))
+ return -EINVAL;
+ target = match + (control_block_ptr[1] << 3);
+ bewalgo_copy_helper_src(op, match, target);
+ target = ip + (control_block_ptr[4] << 3);
+ bewalgo_copy_helper_src(op, ip, target);
+ match = op - (get_unaligned_le16(control_block_ptr + 6) << 3);
+ if (safe_mode && unlikely(match < dest))
+ return -EINVAL;
+ target = match + (control_block_ptr[5] << 3);
+ bewalgo_copy_helper_src(op, match, target);
+ } while (likely(ip < source_end));
+ memcpy(op, ip, (source_length - 2) & 0x7);
+ op += (source_length - 2) & 0x7;
+ if (safe_mode && (dest_length != (op - dest)))
+ return -EINVAL;
+ return dest_length;
+}
+
+static __always_inline int decompress_bewalgo_safe(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length)
+{
+ return decompress_bewalgo(source, dest, wrkmem, source_length, 1);
+}
+
+static __always_inline int decompress_bewalgo_fast(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length)
+{
+ return decompress_bewalgo(source, dest, wrkmem, source_length, 0);
+}
+
+/*
+ * The hashtable 'wrkmem' allows indices in the range
+ * [0 .. ((1 << BEWALGO_HASHLOG) - 1)].
+ * Each input sequence is hashed and mapped to a fixed index in that
+ * array. The final shift '>> (64 - BEWALGO_HASHLOG)' guarantees that
+ * the index is valid. The hashtable is used to find known sequences in
+ * the input array.
+ */
+static __always_inline u32 bewalgo_compress_hash(u64 sequence)
+{
+ return ((sequence >> 24) * 11400714785074694791ULL)
+ >> (64 - BEWALGO_HASHLOG);
+}
+
+static __always_inline int
+compress_bewalgo_(u16 *wrkmem,
+ const u8 * const source, u8 * const dest,
+ const u16 source_length, u8 acceleration)
+{
+ u32 * const table = (u32 *)wrkmem;
+ const int acceleration_start =
+ (acceleration < 4 ? 32 >> acceleration : 4);
+ const u8 * const dest_end_ptr = dest
+ + ((zbewalgo_max_output_size + 0x7) & ~0x7) + 2;
+ const u8 * const source_end_ptr = source
+ + (source_length & ~0x7);
+ const u8 * const source_end_ptr_1 = source_end_ptr - 8;
+ const u8 *match = source;
+ const u8 *anchor = source;
+ const u8 *ip = source;
+ u8 *op = dest + 2;
+ u8 *op_control = NULL;
+ u32 op_control_available = 0;
+ const u8 *target;
+ int length;
+ u16 offset;
+ u32 h;
+ int j;
+ int tmp_literal_length;
+ int match_nodd;
+ int match_nzero_nodd;
+
+ put_unaligned_le16(source_length, dest);
+ memset(wrkmem, 0, 1 << BEWALGO_MEMORY_USAGE);
+ do {
+ j = acceleration_start;
+ length = (source_end_ptr_1 - ip) >> 3;
+ j = j < length ? j : length;
+ for (length = 1; length <= j; length++) {
+ ip += 8;
+ h = bewalgo_compress_hash(get_unaligned_le64(ip));
+ match = source + table[h];
+ table[h] = ip - source;
+ if (get_unaligned_le64(match)
+ == get_unaligned_le64(ip))
+ goto _find_match_left;
+ }
+ length = acceleration_start
+ + (acceleration << BEWALGO_SKIPTRIGGER);
+ ip += 8;
+ do {
+ if (unlikely(ip >= source_end_ptr))
+ goto _encode_last_literal;
+ h = bewalgo_compress_hash(get_unaligned_le64(ip));
+ match = source + table[h];
+ table[h] = ip - source;
+ if (get_unaligned_le64(match)
+ == get_unaligned_le64(ip))
+ goto _find_match_left;
+ ip += (length++ >> BEWALGO_SKIPTRIGGER) << 3;
+ } while (1);
+_find_match_left:
+ while ((match != source)
+ && (get_unaligned_le64(match - 8)
+ == get_unaligned_le64(ip - 8))) {
+ ip -= 8;
+ match -= 8;
+ if (ip == anchor)
+ goto _find_match_right;
+ }
+ length = (ip - anchor) >> 3;
+ tmp_literal_length = length
+ - (op_control_available ? BEWALGO_LENGTH_MAX : 0);
+ if (unlikely(op
+ + (((tmp_literal_length / (BEWALGO_LENGTH_MAX * 2))
+ + ((tmp_literal_length % (BEWALGO_LENGTH_MAX * 2))
+ > 0)
+ + length) << 3) > dest_end_ptr)) {
+ /* not compressible */
+ return -EINVAL;
+ }
+ while (length > BEWALGO_LENGTH_MAX) {
+ if (op_control_available == 0) {
+ op_control = op;
+ put_unaligned_le64(0, op);
+ op += 8;
+ }
+ op_control_available = !op_control_available;
+ *op_control = BEWALGO_LENGTH_MAX;
+ op_control += 4;
+ target = anchor + (BEWALGO_LENGTH_MAX << 3);
+ bewalgo_copy_helper_src(op, anchor, target);
+ length -= BEWALGO_LENGTH_MAX;
+ }
+ if (likely(length > 0)) {
+ if (op_control_available == 0) {
+ op_control = op;
+ put_unaligned_le64(0, op);
+ op += 8;
+ }
+ op_control_available = !op_control_available;
+ *op_control = length;
+ op_control += 4;
+ bewalgo_copy_helper_src(op, anchor, ip);
+ }
+_find_match_right:
+ do {
+ ip += 8;
+ match += 8;
+ } while ((ip < source_end_ptr)
+ && (get_unaligned_le64(match)
+ == get_unaligned_le64(ip)));
+ length = (ip - anchor) >> 3;
+ offset = (ip - match) >> 3;
+ anchor = ip;
+ if (length > BEWALGO_LENGTH_MAX) {
+ u32 val =
+ (BEWALGO_LENGTH_MAX << 8) | (offset << 16);
+ size_t match_length_div_255 = length / 255;
+ size_t match_length_mod_255 = length % 255;
+ u32 match_zero = match_length_mod_255 == 0;
+ u32 match_nzero = !match_zero;
+ int control_blocks_needed = match_length_div_255
+ + match_nzero
+ - op_control_available;
+
+ if (unlikely(op + (((control_blocks_needed >> 1)
+ + (control_blocks_needed & 1))
+ << 3) > dest_end_ptr)) {
+ /* not compressible */
+ return -EINVAL;
+ }
+ op_control = op_control_available > 0
+ ? op_control
+ : op;
+ put_unaligned_le32(val, op_control);
+ match_length_div_255 -= op_control_available > 0;
+ match_nodd = !(match_length_div_255 & 1);
+ match_nzero_nodd = match_nzero && match_nodd;
+ if (match_length_div_255 > 0) {
+ u64 val_l =
+ val
+ | (((u64)val)
+ << 32);
+ target = op + (((match_length_div_255 >> 1)
+ + (match_length_div_255 & 1))
+ << 3);
+ do {
+ put_unaligned_le64(val_l, op);
+ op += 8;
+ } while (op < target);
+ }
+ op_control = op - 4;
+ put_unaligned_le32(0, op_control
+ + (match_nzero_nodd << 3));
+ put_unaligned_le32(0, op_control
+ + (match_nzero_nodd << 2));
+ *(op_control + (match_nzero_nodd << 2) + 1) =
+ (match_zero & match_nodd)
+ ? BEWALGO_LENGTH_MAX
+ : match_length_mod_255;
+ put_unaligned_le16(offset, op_control
+ + (match_nzero_nodd << 2) + 2);
+ op_control += match_nzero_nodd << 3;
+ op += match_nzero_nodd << 3;
+ op_control_available =
+ (match_length_div_255 & 1) == match_zero;
+ } else {
+ if (unlikely(op_control_available == 0
+ && op >= dest_end_ptr
+ && op_control[-3] != 0))
+ return -EINVAL;
+ if (op_control[-3] != 0) {
+ if (op_control_available == 0) {
+ op_control = op;
+ put_unaligned_le64(0, op);
+ op += 8;
+ }
+ op_control_available = !op_control_available;
+ op_control += 4;
+ }
+ op_control[-3] = length;
+ put_unaligned_le16(offset, op_control - 2);
+ }
+ if (unlikely(ip == source_end_ptr))
+ goto _finish;
+ h = bewalgo_compress_hash(get_unaligned_le64(ip));
+ match = source + table[h];
+ table[h] = ip - source;
+ if (get_unaligned_le64(match) == get_unaligned_le64(ip))
+ goto _find_match_right;
+ } while (1);
+_encode_last_literal:
+ length = (source_end_ptr - anchor) >> 3;
+ tmp_literal_length = length
+ - (op_control_available ? BEWALGO_LENGTH_MAX : 0);
+ if (op + ((((tmp_literal_length / (BEWALGO_LENGTH_MAX * 2)))
+ + ((tmp_literal_length % (BEWALGO_LENGTH_MAX * 2)) > 0)
+ + length) << 3) > dest_end_ptr)
+ return -EINVAL;
+ while (length > BEWALGO_LENGTH_MAX) {
+ if (op_control_available == 0) {
+ op_control = op;
+ put_unaligned_le64(0, op);
+ op += 8;
+ }
+ op_control_available = !op_control_available;
+ *op_control = BEWALGO_LENGTH_MAX;
+ op_control += 4;
+ target = anchor + (BEWALGO_LENGTH_MAX << 3);
+ bewalgo_copy_helper_src(op, anchor, target);
+ length -= BEWALGO_LENGTH_MAX;
+ }
+ if (length > 0) {
+ if (op_control_available == 0) {
+ op_control = op;
+ put_unaligned_le64(0, op);
+ op += 8;
+ }
+ op_control_available = !op_control_available;
+ *op_control = length;
+ op_control += 4;
+ bewalgo_copy_helper_src(op, anchor, source_end_ptr);
+ }
+_finish:
+ memcpy(op, anchor, source_length & 0x7);
+ op += source_length & 0x7;
+ return op - dest;
+}
+
+static __always_inline int compress_bewalgo(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length)
+{
+ return compress_bewalgo_(wrkmem,
+ source, dest,
+ source_length, 1);
+}
+
+static int init_bewalgo(void)
+{
+ return 0;
+}
+
+static void exit_bewalgo(void)
+{
+}
+
+struct zbewalgo_alg alg_bewalgo = {
+ .name = "bewalgo",
+ .flags = ZBEWALGO_ALG_FLAG_COMPRESS,
+ .wrkmem_size = 1 << BEWALGO_MEMORY_USAGE,
+ .init = init_bewalgo,
+ .exit = exit_bewalgo,
+ .compress = compress_bewalgo,
+ .decompress_safe = decompress_bewalgo_safe,
+ .decompress_fast = decompress_bewalgo_fast
+};
diff --git a/lib/zbewalgo/bewalgo.h b/lib/zbewalgo/bewalgo.h
new file mode 100644
index 00000000..81df2375
--- /dev/null
+++ b/lib/zbewalgo/bewalgo.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ */
+
+#ifndef ZBEWALGO_BEWALGO_H
+#define ZBEWALGO_BEWALGO_H
+
+#include "include.h"
+
+extern struct zbewalgo_alg alg_bewalgo;
+
+#endif
diff --git a/lib/zbewalgo/bewalgo2.c b/lib/zbewalgo/bewalgo2.c
new file mode 100644
index 00000000..d0677e08
--- /dev/null
+++ b/lib/zbewalgo/bewalgo2.c
@@ -0,0 +1,407 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ *
+ * BeWalgo2 reads its input in blocks of 8 bytes. These blocks
+ * are added to an AVL tree. The AVL tree is mapped directly to an
+ * array. The encoding is a variation of run-length encoding using the
+ * indices in the AVL tree as data. The reason for using the tree
+ * with indices is that each index can be encoded in less than
+ * 8 bytes.
+ */
+
+#include "bewalgo2.h"
+
+#define MAX_LITERALS ((zbewalgo_max_output_size >> 3) - 4)
+
+#define OFFSET_BITS 9
+#define OFFSET_SHIFT (16 - OFFSET_BITS)
+#define MATCH_BIT_SHIFT 6
+#define MATCH_BIT_MASK BIT(MATCH_BIT_SHIFT)
+#define LENGTH_BITS 6
+#define LENGTH_MASK ((1 << LENGTH_BITS) - 1)
+#define LEFT 0
+#define RIGHT 1
+#define NEITHER 2
+#define TREE_NODE_NULL 0xffff
+
+/*
+ * insert the first node into an empty AVL tree
+ * the function returns the index of the node in the preallocated array
+ */
+static __always_inline int avl_insert_first(u64 *wrk_literal, u16 *wrk_tree,
+ u16 *treep, u64 target,
+ u16 *treesize)
+{
+ u16 g_a_tree, g_o_tree2;
+
+ g_a_tree = *treesize;
+ g_o_tree2 = g_a_tree << 2;
+ *treesize = *treesize + 1;
+ wrk_tree[g_o_tree2 + 0] = TREE_NODE_NULL;
+ wrk_tree[g_o_tree2 + 1] = TREE_NODE_NULL;
+ wrk_tree[g_o_tree2 + 2] = NEITHER;
+ wrk_literal[g_a_tree] = target;
+ *treep = g_a_tree;
+ return g_a_tree;
+}
+
+/*
+ * insert a node into a non-empty AVL tree
+ * for speed, the nodes are saved in preallocated arrays
+ * the function returns the index of the node in the preallocated array
+ */
+static __always_inline int avl_insert(u64 *wrk_literal, u16 *wrk_tree,
+ u16 *treep, u64 target,
+ u16 *treesize)
+{
+ u16 *path_top[2];
+ u16 g_a_tree, g_o_tree2, g_o_next_step;
+ u16 r_1_arr[10];
+ u16 path, path2, first[2], second;
+
+ if (unlikely(target == wrk_literal[*treep]))
+ return *treep;
+ path_top[0] = treep;
+ g_o_next_step = target > wrk_literal[*treep];
+ g_o_tree2 = *treep << 2;
+ path_top[wrk_tree[g_o_tree2 + 2] == NEITHER] = treep;
+ treep = &wrk_tree[g_o_tree2 + g_o_next_step];
+ if (unlikely(*treep == TREE_NODE_NULL))
+ goto _insert_new_node;
+_search_node:
+ if (target == wrk_literal[*treep])
+ return *treep;
+ g_o_next_step = target > wrk_literal[*treep];
+ g_o_tree2 = *treep << 2;
+ path_top[wrk_tree[g_o_tree2 + 2] == NEITHER] = treep;
+ treep = &wrk_tree[g_o_tree2 + g_o_next_step];
+ if (likely(*treep != TREE_NODE_NULL))
+ goto _search_node;
+_insert_new_node:
+ g_a_tree = *treesize;
+ g_o_tree2 = g_a_tree << 2;
+ *treesize = *treesize + 1;
+ wrk_tree[g_o_tree2 + 0] = TREE_NODE_NULL;
+ wrk_tree[g_o_tree2 + 1] = TREE_NODE_NULL;
+ wrk_tree[g_o_tree2 + 2] = NEITHER;
+ wrk_literal[g_a_tree] = target;
+ *treep = g_a_tree;
+ path = *path_top[0];
+ path2 = path << 2;
+ if (wrk_tree[path2 + 2] == NEITHER) {
+ do {
+ r_1_arr[0] = target > wrk_literal[path];
+ wrk_tree[path2 + 2] = r_1_arr[0];
+ path = wrk_tree[path2 + r_1_arr[0]];
+ path2 = path << 2;
+ } while (target != wrk_literal[path]);
+ return g_a_tree;
+ }
+ first[0] = target > wrk_literal[path];
+ if (wrk_tree[path2 + 2] != first[0]) {
+ wrk_tree[path2 + 2] = NEITHER;
+ r_1_arr[2] = wrk_tree[path2 + first[0]];
+ if (target == wrk_literal[r_1_arr[2]])
+ return g_a_tree;
+ do {
+ r_1_arr[0] = target > wrk_literal[r_1_arr[2]];
+ r_1_arr[1] = r_1_arr[2] << 2;
+ r_1_arr[2] = wrk_tree[r_1_arr[1] + r_1_arr[0]];
+ wrk_tree[r_1_arr[1] + 2] = r_1_arr[0];
+ } while (target != wrk_literal[r_1_arr[2]]);
+ return g_a_tree;
+ }
+ second = target > wrk_literal[wrk_tree[path2 + first[0]]];
+ first[1] = 1 - first[0];
+ if (first[0] == second) {
+ r_1_arr[2] = *path_top[0];
+ r_1_arr[3] = r_1_arr[2] << 2;
+ r_1_arr[4] = wrk_tree[r_1_arr[3] + first[0]];
+ r_1_arr[5] = r_1_arr[4] << 2;
+ wrk_tree[r_1_arr[3] + first[0]] =
+ wrk_tree[r_1_arr[5] + first[1]];
+ path = wrk_tree[r_1_arr[5] + first[0]];
+ *path_top[0] = r_1_arr[4];
+ wrk_tree[r_1_arr[5] + first[1]] = r_1_arr[2];
+ wrk_tree[r_1_arr[3] + 2] = NEITHER;
+ wrk_tree[r_1_arr[5] + 2] = NEITHER;
+ if (target == wrk_literal[path])
+ return g_a_tree;
+ do {
+ r_1_arr[0] = target > wrk_literal[path];
+ r_1_arr[1] = path << 2;
+ wrk_tree[r_1_arr[1] + 2] = r_1_arr[0];
+ path = wrk_tree[r_1_arr[1] + r_1_arr[0]];
+ } while (target != wrk_literal[path]);
+ return g_a_tree;
+ }
+ path = wrk_tree[(wrk_tree[path2 + first[0]] << 2) + second];
+ r_1_arr[5] = *path_top[0];
+ r_1_arr[1] = r_1_arr[5] << 2;
+ r_1_arr[8] = wrk_tree[r_1_arr[1] + first[0]];
+ r_1_arr[0] = r_1_arr[8] << 2;
+ r_1_arr[6] = wrk_tree[r_1_arr[0] + first[1]];
+ r_1_arr[7] = r_1_arr[6] << 2;
+ r_1_arr[2] = wrk_tree[r_1_arr[7] + first[1]];
+ r_1_arr[3] = wrk_tree[r_1_arr[7] + first[0]];
+ *path_top[0] = r_1_arr[6];
+ wrk_tree[r_1_arr[7] + first[1]] = r_1_arr[5];
+ wrk_tree[r_1_arr[7] + first[0]] = r_1_arr[8];
+ wrk_tree[r_1_arr[1] + first[0]] = r_1_arr[2];
+ wrk_tree[r_1_arr[0] + first[1]] = r_1_arr[3];
+ wrk_tree[r_1_arr[7] + 2] = NEITHER;
+ wrk_tree[r_1_arr[1] + 2] = NEITHER;
+ wrk_tree[r_1_arr[0] + 2] = NEITHER;
+ if (target == wrk_literal[path])
+ return g_a_tree;
+ r_1_arr[9] = (target > wrk_literal[path]) == first[0];
+ wrk_tree[r_1_arr[r_1_arr[9]] + 2] = first[r_1_arr[9]];
+ path = r_1_arr[r_1_arr[9] + 2];
+ if (target == wrk_literal[path])
+ return g_a_tree;
+ do {
+ path2 = path << 2;
+ r_1_arr[4] = target > wrk_literal[path];
+ wrk_tree[path2 + 2] = r_1_arr[4];
+ path = wrk_tree[path2 + r_1_arr[4]];
+ } while (target != wrk_literal[path]);
+ return g_a_tree;
+}
+
+/*
+ * compress the given data using a binary tree holding all the previously
+ * seen 64-bit values
+ * returns the length of the compressed data
+ * wrkmem should be aligned to at least 8 by the caller
+ */
+static __always_inline int compress_bewalgo2(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length)
+{
+ const u8 * const source_begin = source;
+ u64 *wrk_literal = (u64 *)wrkmem;
+ u16 *wrk_tree = wrkmem + 2048;
+ u8 *op_match = dest + 4;
+ int count;
+ u16 i;
+ u16 j;
+ u16 tree_root = TREE_NODE_NULL;
+ u16 tree_size = 0;
+ const u8 *ip = source + (source_length & ~0x7) - 8;
+
+ /*
+ * add first value into the tree
+ */
+ i = avl_insert_first(wrk_literal, wrk_tree, &tree_root,
+ get_unaligned_le64(ip), &tree_size);
+ /*
+	 * to gain performance, abort early if the data does not seem to be
+ * compressible
+ */
+ if (source_length > 512) {
+ /*
+		 * verify that there are at most 5 distinct values
+		 * among the first 10 positions
+ */
+ for (i = 2; i < 11; i++)
+ avl_insert(wrk_literal, wrk_tree, &tree_root,
+ get_unaligned_le64(ip - (i << 3)),
+ &tree_size);
+ if (tree_size < 6)
+ goto _start;
+ /*
+		 * verify that there are at most 12 distinct values
+		 * among the first 10 and the last 10 positions
+ */
+ for (i = 0; i < 10; i++)
+ avl_insert(wrk_literal, wrk_tree, &tree_root,
+ get_unaligned_le64(source_begin + (i << 3)),
+ &tree_size);
+ if (tree_size < 13)
+ goto _start;
+ /*
+		 * if the previous conditions do not pass, check some bytes
+		 * in the middle for matches as well.
+		 * if not enough matches are found, abort
+ */
+ for (i = 0; i < 10; i++)
+ avl_insert(wrk_literal, wrk_tree, &tree_root,
+ get_unaligned_le64(source_begin
+ + (256 + (i << 3))),
+ &tree_size);
+ if (tree_size >= 21) {
+ /* not compressible */
+ return -EINVAL;
+ }
+ }
+_start:
+ i = 0;
+_loop1_after_insert:
+ count = 0;
+ do {
+ ip -= 8;
+ count++;
+ } while (likely(ip > source_begin)
+ && (get_unaligned_le64(ip) == wrk_literal[i]));
+ if (count == 1) {
+ count = 0;
+ ip += 8;
+ j = i + count;
+ do {
+ ip -= 8;
+ count++;
+ j++;
+ } while (likely(ip > source_begin)
+ && (j < tree_size)
+ && (get_unaligned_le64(ip) == wrk_literal[j]));
+ ip += 8;
+ count--;
+ if (likely(ip > source_begin)) {
+ do {
+ ip -= 8;
+ count++;
+ j = avl_insert(wrk_literal, wrk_tree,
+ &tree_root,
+ get_unaligned_le64(ip),
+ &tree_size);
+ if (unlikely(tree_size > MAX_LITERALS)) {
+ /* not compressible */
+ return -EINVAL;
+ }
+ } while ((j == i + count)
+ && likely(ip > source_begin));
+ }
+ while (unlikely(count > LENGTH_MASK)) {
+ put_unaligned_le16((i << OFFSET_SHIFT)
+ + MATCH_BIT_MASK + LENGTH_MASK,
+ op_match);
+ op_match += 2;
+ count -= LENGTH_MASK;
+ i += LENGTH_MASK;
+ }
+ put_unaligned_le16((i << OFFSET_SHIFT) + MATCH_BIT_MASK
+ + count, op_match);
+ op_match += 2;
+ if (unlikely(ip <= source_begin))
+ goto _finish;
+ i = j;
+ goto _loop1_after_insert;
+ } else {
+ while (unlikely(count > LENGTH_MASK)) {
+ put_unaligned_le16((i << OFFSET_SHIFT) + LENGTH_MASK,
+ op_match);
+ op_match += 2;
+ count -= LENGTH_MASK;
+ }
+ put_unaligned_le16((i << OFFSET_SHIFT) + count, op_match);
+ op_match += 2;
+ }
+ if (unlikely(ip <= source_begin))
+ goto _finish;
+ i = avl_insert(wrk_literal, wrk_tree, &tree_root,
+ get_unaligned_le64(ip), &tree_size);
+ goto _loop1_after_insert;
+_finish:
+ j = avl_insert(wrk_literal, wrk_tree, &tree_root,
+ get_unaligned_le64(ip), &tree_size);
+ put_unaligned_le16((j << OFFSET_SHIFT) + 1, op_match);
+ op_match += 2;
+ put_unaligned_le16(op_match - dest, dest);
+ put_unaligned_le16(source_length, dest + 2);
+ count = sizeof(u64) * (tree_size);
+ memcpy(op_match, wrk_literal, count);
+ op_match += count;
+ memcpy(op_match, source + (source_length & ~0x7), source_length & 0x7);
+ op_match += source_length & 0x7;
+ return op_match - dest;
+}
+
+static __always_inline int decompress_bewalgo2(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length,
+ const int safe_mode)
+{
+ const u16 dest_length = get_unaligned_le16(source + 2);
+ const u8 * const source_end1 = source + source_length
+ - (dest_length & 0x7);
+ const u8 *ip_match1 = source + 4;
+ const u8 *ip_match_end1 = source + get_unaligned_le16(source);
+ const u8 *ip_literal1 = ip_match_end1;
+ u16 i;
+ u16 count;
+ u64 tmp;
+ u8 *op1 = dest + (dest_length & ~0x7) - 8;
+
+ if (safe_mode && unlikely(op1 - dest > ZBEWALGO_BUFFER_SIZE))
+ return -EINVAL;
+ while (ip_match1 < ip_match_end1) {
+ i = (get_unaligned_le16(ip_match1) >> OFFSET_SHIFT) << 3;
+ count = get_unaligned_le16(ip_match1) & LENGTH_MASK;
+ if (get_unaligned_le16(ip_match1) & MATCH_BIT_MASK) {
+ if (safe_mode && unlikely(ip_literal1 + i
+ + (count << 3) > source_end1))
+ return -EINVAL;
+ if (safe_mode
+ && unlikely(op1 - ((count - 1) << 3) < dest))
+ return -EINVAL;
+ while (count-- > 0) {
+ tmp = get_unaligned_le64(ip_literal1 + i);
+ i += 8;
+ put_unaligned_le64(tmp, op1);
+ op1 -= 8;
+ }
+		} else {
+ if (safe_mode
+ && unlikely(ip_literal1 + i > source_end1))
+ return -EINVAL;
+ if (safe_mode
+ && unlikely(op1 - ((count - 1) << 3) < dest))
+ return -EINVAL;
+ while (count-- > 0) {
+ tmp = get_unaligned_le64(ip_literal1 + i);
+ put_unaligned_le64(tmp, op1);
+ op1 -= 8;
+ }
+ }
+ ip_match1 += 2;
+ }
+ memcpy(dest + (dest_length & ~0x7), source_end1, dest_length & 0x7);
+ return dest_length;
+}
+
+static __always_inline int decompress_bewalgo2_safe(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+						 const u16 source_length)
+{
+ return decompress_bewalgo2(source, dest, wrkmem, source_length, 1);
+}
+
+static __always_inline int decompress_bewalgo2_fast(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+						 const u16 source_length)
+{
+ return decompress_bewalgo2(source, dest, wrkmem, source_length, 0);
+}
+
+static int init_bewalgo2(void)
+{
+ return 0;
+}
+
+static void exit_bewalgo2(void)
+{
+}
+
+struct zbewalgo_alg alg_bewalgo2 = {
+ .name = "bewalgo2",
+ .flags = ZBEWALGO_ALG_FLAG_COMPRESS,
+ .wrkmem_size = 8192,
+ .init = init_bewalgo2,
+ .exit = exit_bewalgo2,
+ .compress = compress_bewalgo2,
+ .decompress_safe = decompress_bewalgo2_safe,
+ .decompress_fast = decompress_bewalgo2_fast
+};
diff --git a/lib/zbewalgo/bewalgo2.h b/lib/zbewalgo/bewalgo2.h
new file mode 100644
index 00000000..b2963f20
--- /dev/null
+++ b/lib/zbewalgo/bewalgo2.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ */
+
+#ifndef ZBEWALGO_BEWALGO2_H
+#define ZBEWALGO_BEWALGO2_H
+
+#include "include.h"
+
+extern struct zbewalgo_alg alg_bewalgo2;
+
+#endif
diff --git a/lib/zbewalgo/bitshuffle.c b/lib/zbewalgo/bitshuffle.c
new file mode 100644
index 00000000..c8ed2a12
--- /dev/null
+++ b/lib/zbewalgo/bitshuffle.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ */
+
+#include "bitshuffle.h"
+
+/*
+ * performs a static transformation on the data. Moves every eighth byte into
+ * a consecutive range
+ */
+static __always_inline int compress_bitshuffle(const u8 *const source,
+ u8 *const dest,
+ u16 * const wrkmem,
+ const u16 source_length)
+{
+ u16 i;
+ const u16 source_length2 = source_length & ~0x7;
+ u8 *op = dest;
+
+ for (i = 0; i < source_length2; i += 8)
+ *op++ = source[i];
+ for (i = 1; i < source_length2; i += 8)
+ *op++ = source[i];
+ for (i = 2; i < source_length2; i += 8)
+ *op++ = source[i];
+ for (i = 3; i < source_length2; i += 8)
+ *op++ = source[i];
+ for (i = 4; i < source_length2; i += 8)
+ *op++ = source[i];
+ for (i = 5; i < source_length2; i += 8)
+ *op++ = source[i];
+ for (i = 6; i < source_length2; i += 8)
+ *op++ = source[i];
+ for (i = 7; i < source_length2; i += 8)
+ *op++ = source[i];
+ memcpy(dest + source_length2, source + source_length2,
+ source_length & 0x7);
+ return source_length;
+}
+
+/*
+ * reverses the transformation step
+ */
+static __always_inline int decompress_bitshuffle(const u8 *const source,
+ u8 *const dest,
+ u16 * const wrkmem,
+ const u16 source_length)
+{
+ u16 i;
+ const u16 source_length2 = source_length & ~0x7;
+ const u8 *ip = source;
+
+ for (i = 0; i < source_length2; i += 8)
+ dest[i] = *ip++;
+ for (i = 1; i < source_length2; i += 8)
+ dest[i] = *ip++;
+ for (i = 2; i < source_length2; i += 8)
+ dest[i] = *ip++;
+ for (i = 3; i < source_length2; i += 8)
+ dest[i] = *ip++;
+ for (i = 4; i < source_length2; i += 8)
+ dest[i] = *ip++;
+ for (i = 5; i < source_length2; i += 8)
+ dest[i] = *ip++;
+ for (i = 6; i < source_length2; i += 8)
+ dest[i] = *ip++;
+ for (i = 7; i < source_length2; i += 8)
+ dest[i] = *ip++;
+ memcpy(dest + source_length2, source + source_length2,
+ source_length & 0x7);
+ return source_length;
+}
+
+static int init_bitshuffle(void)
+{
+ return 0;
+}
+
+static void exit_bitshuffle(void)
+{
+}
+
+struct zbewalgo_alg alg_bitshuffle = {
+ .name = "bitshuffle",
+ .flags = ZBEWALGO_ALG_FLAG_TRANSFORM,
+ .wrkmem_size = 0,
+ .init = init_bitshuffle,
+ .exit = exit_bitshuffle,
+ .compress = compress_bitshuffle,
+ .decompress_safe = decompress_bitshuffle,
+ .decompress_fast = decompress_bitshuffle
+};
diff --git a/lib/zbewalgo/bitshuffle.h b/lib/zbewalgo/bitshuffle.h
new file mode 100644
index 00000000..2088e48f
--- /dev/null
+++ b/lib/zbewalgo/bitshuffle.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ */
+
+#ifndef ZBEWALGO_BITSHUFFLE_H
+#define ZBEWALGO_BITSHUFFLE_H
+
+#include "include.h"
+
+extern struct zbewalgo_alg alg_bitshuffle;
+
+#endif
diff --git a/lib/zbewalgo/huffman.c b/lib/zbewalgo/huffman.c
new file mode 100644
index 00000000..210dc4e3
--- /dev/null
+++ b/lib/zbewalgo/huffman.c
@@ -0,0 +1,262 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ */
+
+#include "huffman.h"
+
+/*
+ * compression using the bare Huffman encoding, optimized for speed and
+ * small input buffers.
+ * wrkmem should be aligned by the caller to at least 4
+ */
+static __always_inline int compress_huffman(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length)
+{
+ const u8 * const source_end = source + source_length;
+ const u8 * const dest_end = dest + zbewalgo_max_output_size;
+ short * const nodes_index = (short *)(wrkmem);
+ u16 * const leaf_parent_index = wrkmem + 768;
+ u16 * const nodes_weight = wrkmem + 1024;
+ u16 * const frequency = wrkmem + 1536;
+ u16 * const code_lengths = wrkmem + 1792;
+ u32 * const code_words = (u32 *)(wrkmem + 2048);
+ short i;
+ u16 node_index;
+ int i2;
+ short max_codeword_length;
+ u32 weight2;
+ short num_nodes;
+ u32 bits_in_buffer;
+ u32 bits_in_stack;
+ u16 free_index;
+ const u8 *ip;
+ u8 *op;
+ u32 stack;
+ u8 * const stack_ptr = (u8 *)&stack;
+
+ memset(dest, 0, zbewalgo_max_output_size);
+ memset(wrkmem, 0, 5120);
+ op = dest;
+ ip = source;
+ put_unaligned_le16(source_length, dest);
+ do {
+ frequency[*ip++]++;
+ } while (ip < source_end);
+ ip = source;
+ num_nodes = 0;
+ for (i = 0; i < 256; i++) {
+ if (frequency[i] > 0) {
+ i2 = num_nodes++;
+ while ((i2 > 0) && (nodes_weight[i2] > frequency[i])) {
+ nodes_weight[i2 + 1] = nodes_weight[i2];
+ nodes_index[i2 + 1] = nodes_index[i2];
+ leaf_parent_index[nodes_index[i2]]++;
+ i2--;
+ }
+ i2++;
+ nodes_index[i2] = -(i + 1);
+ leaf_parent_index[-(i + 1)] = i2;
+ nodes_weight[i2] = frequency[i];
+ }
+ }
+ dest[2] = num_nodes - 1;
+ op = dest + 3;
+ for (i = 1; i <= num_nodes; ++i) {
+ *op++ = -(nodes_index[i] + 1);
+ put_unaligned_le16(nodes_weight[i], op);
+ op += 2;
+ }
+ free_index = 2;
+ while (free_index <= num_nodes) {
+ weight2 = nodes_weight[free_index - 1]
+ + nodes_weight[free_index];
+ i2 = num_nodes++;
+ while ((i2 > 0) && (nodes_weight[i2] > weight2)) {
+ nodes_weight[i2 + 1] = nodes_weight[i2];
+ nodes_index[i2 + 1] = nodes_index[i2];
+ leaf_parent_index[nodes_index[i2]]++;
+ i2--;
+ }
+ i2++;
+ nodes_index[i2] = free_index >> 1;
+ leaf_parent_index[free_index >> 1] = i2;
+ nodes_weight[i2] = weight2;
+ free_index += 2;
+ }
+ if (num_nodes > 400) {
+ /* not compressible */
+ return -EINVAL;
+ }
+ for (i = 0; i < 256; i++) {
+ if (frequency[i] == 0)
+ continue;
+ bits_in_stack = 0;
+ stack = 0;
+ node_index = leaf_parent_index[-(i + 1)];
+ while (node_index < num_nodes) {
+ stack |= (node_index & 0x1) << bits_in_stack;
+ bits_in_stack++;
+ node_index = leaf_parent_index[(node_index + 1) >> 1];
+ }
+ code_lengths[i] = bits_in_stack;
+ code_words[i] = stack;
+ }
+ max_codeword_length = 0;
+ node_index = leaf_parent_index[nodes_index[1]];
+ while (node_index < num_nodes) {
+ node_index = leaf_parent_index[(node_index + 1) >> 1];
+ max_codeword_length++;
+ }
+ if (max_codeword_length > 24) {
+		/* not encodable with these optimizations */
+ return -EINVAL;
+ }
+ bits_in_buffer = 0;
+ do {
+ bits_in_stack = code_lengths[*ip];
+ stack = code_words[*ip];
+ ip++;
+ bits_in_buffer += bits_in_stack;
+ stack = stack << (32 - bits_in_buffer);
+#ifdef __LITTLE_ENDIAN
+ op[0] |= stack_ptr[3];
+ op[1] |= stack_ptr[2];
+ op[2] |= stack_ptr[1];
+ op[3] |= stack_ptr[0];
+#else
+ op[0] |= stack_ptr[0];
+ op[1] |= stack_ptr[1];
+ op[2] |= stack_ptr[2];
+ op[3] |= stack_ptr[3];
+#endif
+ op += bits_in_buffer >> 3;
+ bits_in_buffer = bits_in_buffer & 0x7;
+ if (unlikely(op >= dest_end)) {
+ /* not compressible */
+ return -EINVAL;
+ }
+ } while (likely(ip < source_end));
+ return op - dest + (bits_in_buffer > 0);
+}
+
+/*
+ * reverses the Huffman compression
+ */
+static __always_inline int decompress_huffman(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length,
+ const int safe_mode)
+{
+ const u8 * const source_end = source + source_length;
+ const u32 dest_length = get_unaligned_le16(source);
+ const u8 * const dest_end = dest + dest_length;
+ short * const nodes_index = (short *)(wrkmem);
+ u16 * const nodes_weight = wrkmem + 512;
+ u32 i;
+ int node_index;
+ u32 i2;
+ u32 weight2;
+ u32 num_nodes;
+ u32 current_bit;
+ u32 free_index;
+ const u8 *ip;
+ u8 *op;
+
+ if (safe_mode && unlikely(dest_length > ZBEWALGO_BUFFER_SIZE))
+ return -EINVAL;
+
+ memset(wrkmem, 0, 2048);
+ num_nodes = source[2] + 1;
+ ip = source + 3;
+ op = dest;
+ if (safe_mode && unlikely(ip + 3 * num_nodes > source_end)) {
+ /* source_length too small */
+ return -EINVAL;
+ }
+ if (safe_mode && unlikely(num_nodes == 0))
+ return -EINVAL;
+ for (i = 1; i <= num_nodes; ++i) {
+ nodes_index[i] = -(*ip++ + 1);
+ nodes_weight[i] = get_unaligned_le16(ip);
+ ip += 2;
+ }
+ free_index = 2;
+ while (free_index <= num_nodes) {
+ weight2 = nodes_weight[free_index - 1]
+ + nodes_weight[free_index];
+ i2 = num_nodes++;
+ while (i2 > 0 && nodes_weight[i2] > weight2) {
+ nodes_weight[i2 + 1] = nodes_weight[i2];
+ nodes_index[i2 + 1] = nodes_index[i2];
+ i2--;
+ }
+ i2++;
+ nodes_index[i2] = free_index >> 1;
+ nodes_weight[i2] = weight2;
+ free_index += 2;
+ }
+ current_bit = 0;
+ do {
+ node_index = nodes_index[num_nodes];
+ while (node_index > 0) {
+ if (safe_mode) {
+				/* evaluated at compile time */
+ if (current_bit == 8) {
+ ip++;
+ if (ip >= source_end) {
+ /* source_length too small */
+ return -EINVAL;
+ }
+ }
+			} else {
+ ip += current_bit == 8;
+ }
+ current_bit = ((current_bit) & 0x7) + 1;
+ node_index = (node_index << 1)
+ - ((*ip >> (8 - current_bit)) & 0x1);
+ if (safe_mode && node_index >= num_nodes)
+ return -EINVAL;
+ node_index = nodes_index[node_index];
+ }
+ *op++ = -(node_index + 1);
+ } while (op < dest_end);
+ return dest_length;
+}
+
+static __always_inline int decompress_huffman_safe(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+						const u16 source_length)
+{
+ return decompress_huffman(source, dest, wrkmem, source_length, 1);
+}
+
+static __always_inline int decompress_huffman_fast(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+						const u16 source_length)
+{
+ return decompress_huffman(source, dest, wrkmem, source_length, 0);
+}
+
+static int init_huffman(void)
+{
+ return 0;
+}
+
+static void exit_huffman(void)
+{
+}
+
+struct zbewalgo_alg alg_huffman = {
+ .name = "huffman",
+ .flags = ZBEWALGO_ALG_FLAG_COMPRESS,
+ .wrkmem_size = 5120,
+ .init = init_huffman,
+ .exit = exit_huffman,
+ .compress = compress_huffman,
+ .decompress_safe = decompress_huffman_safe,
+ .decompress_fast = decompress_huffman_fast
+};
diff --git a/lib/zbewalgo/huffman.h b/lib/zbewalgo/huffman.h
new file mode 100644
index 00000000..db148591
--- /dev/null
+++ b/lib/zbewalgo/huffman.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ */
+
+#ifndef ZBEWALGO_HUFFMAN_H
+#define ZBEWALGO_HUFFMAN_H
+
+#include "include.h"
+
+extern struct zbewalgo_alg alg_huffman;
+
+#endif
diff --git a/lib/zbewalgo/include.h b/lib/zbewalgo/include.h
new file mode 100644
index 00000000..49da9b7a
--- /dev/null
+++ b/lib/zbewalgo/include.h
@@ -0,0 +1,94 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ */
+
+#ifndef LIB_ZBEWALGO_INCLUDE_H
+#define LIB_ZBEWALGO_INCLUDE_H
+
+#include <linux/module.h>
+#include <asm/unaligned.h>
+
+#define ZBEWALGO_ALG_MAX_NAME 128
+#define ZBEWALGO_ALG_FLAG_COMPRESS 1
+#define ZBEWALGO_ALG_FLAG_TRANSFORM 2
+#define ZBEWALGO_COMBINATION_MAX_IDS 7
+#define ZBEWALGO_MAX_BASE_ALGORITHMS 16
+#define ZBEWALGO_COMBINATION_MAX_ACTIVE 256
+#define ZBEWALGO_BUFFER_SIZE 8192
+
+/*
+ * __KERNEL_DIV_ROUND_UP(..) is not used since shifting is faster than dividing
+ * if the divisor is a power of 2.
+ */
+#define DIV_BY_8_ROUND_UP(val) (((val) + 0x7) >> 3)
+
+struct zbewalgo_alg {
+ char name[ZBEWALGO_ALG_MAX_NAME];
+	/* flag whether this algorithm compresses the data or
+	 * only transforms it
+	 */
+ u8 flags;
+ /* the wrkmem required for this algorithm */
+ u32 wrkmem_size;
+ int (*init)(void);
+ void (*exit)(void);
+ /* the compression function must store the size of
+ * input/output data itself
+ */
+ int (*compress)(const u8 * const source, u8 * const dest,
+ u16 * const wrkmem, const u16 source_length);
+ int (*decompress_safe)(const u8 * const source, u8 * const dest,
+ u16 * const wrkmem, const u16 source_length);
+ int (*decompress_fast)(const u8 * const source, u8 * const dest,
+ u16 * const wrkmem, const u16 source_length);
+ u8 id;
+};
+
+/*
+ * to gain speed, the compression starts with the combination that worked
+ * well for the last compressed page.
+ */
+struct zbewalgo_combination {
+ u8 count;
+ u8 ids[ZBEWALGO_COMBINATION_MAX_IDS];
+};
+
+struct zbewalgo_main_data {
+ /*
+ * best_id contains the id of the best combination for the last page
+ */
+ u8 best_id;
+
+ /*
+	 * if zero, search again for best_id - must be unsigned since
+	 * underflow of this value is intended
+ */
+ u8 counter_search;
+
+ /*
+	 * a size slightly larger than the last compressed size that is
+	 * still accepted immediately
+ */
+ u16 best_id_accepted_size;
+};
+
+/*
+ * compression aborts automatically if the compressed data is too large.
+ */
+extern unsigned long zbewalgo_max_output_size;
+
+/*
+ * add a combination to the current active ones. All combinations are the
+ * same for all instances of this algorithm
+ */
+int zbewalgo_add_combination(const char * const string,
+ const int string_length);
+
+/*
+ * replace ALL combinations with the one specified.
+ */
+int zbewalgo_set_combination(const char * const string,
+ const int string_length);
+
+#endif
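The DIV_BY_8_ROUND_UP helper in include.h computes ceil(val / 8) with a shift. A minimal userspace sketch of the same computation (the argument is parenthesized defensively here, which is a suggestion rather than the patch's exact text):

```c
#include <assert.h>

/*
 * illustration only, not part of the patch: round val up to the next
 * multiple of 8, then divide by 8 via a shift
 */
#define DIV_BY_8_ROUND_UP(val) (((val) + 0x7) >> 3)
```

For a divisor that is a power of two the shift form and `DIV_ROUND_UP(val, 8)` produce identical results; compilers typically emit the same code for both when the divisor is a constant.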
diff --git a/lib/zbewalgo/zbewalgo.c b/lib/zbewalgo/zbewalgo.c
new file mode 100644
index 00000000..97ed4eaa
--- /dev/null
+++ b/lib/zbewalgo/zbewalgo.c
@@ -0,0 +1,713 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ *
+ * zBeWalgo is a container compression algorithm which can execute
+ * multiple different compression and transformation algorithms one after
+ * another. Such a sequence of algorithms is called a 'combination' in
+ * this description and in the code. Besides executing combinations of
+ * algorithms, zBeWalgo can try different combinations on the same input.
+ * This allows high compression ratios on completely different datasets,
+ * each of which would otherwise require a dedicated algorithm. Executing
+ * all known combinations on each input page would be very slow. Therefore
+ * the data is compressed first with the combination which was already
+ * successful on the last input page. If the compressed size of the
+ * current page is similar to that of the last page, the compressed data
+ * is returned immediately without even trying the other combinations.
+ * Even though there is no guarantee that consecutive calls to the
+ * algorithm belong to each other, the speed improvement is obvious.
+ *
+ * ZRAM uses zsmalloc for the management of the compressed pages. The largest
+ * size-class in zsmalloc is 3264 Bytes. If the compressed data is larger than
+ * that threshold, ZRAM ignores the compression and writes the uncompressed
+ * page instead. As a consequence it is useless to continue compressing once
+ * the algorithm detects that the data cannot be compressed using the current
+ * combination. The threshold for aborting compression can be changed via
+ * sysfs at any time, even if the algorithm is currently in use. If a
+ * combination fails to compress the data, zBeWalgo tries the next
+ * combination. If no combination is able to reduce the data in size,
+ * zBeWalgo returns a negative value.
+ *
+ * Each combination consists of up to 7 compression and transformation steps.
+ * Combinations can be added and removed at any time via sysfs. Already
+ * compressed data can always be decompressed, even if the combination used
+ * to produce it does not exist anymore. Technically the user could add up to
+ * 256 combinations concurrently, but that would be very time consuming if
+ * the data cannot be compressed.
+ *
+ * To be able to build combinations and call different algorithms, all those
+ * algorithms are implementing the same interface. This enables the user to
+ * specify additional combinations while ZRAM is running.
+ */
+
+#include <linux/init.h>
+#include <linux/zbewalgo.h>
+
+#include "include.h"
+
+#include "BWT.h"
+#include "JBE.h"
+#include "JBE2.h"
+#include "MTF.h"
+#include "RLE.h"
+#include "bewalgo.h"
+#include "bewalgo2.h"
+#include "bitshuffle.h"
+#include "huffman.h"
+
+static atomic64_t zbewalgo_stat_combination[256];
+static atomic64_t zbewalgo_stat_count[256];
+
+unsigned long zbewalgo_max_output_size;
+
+/*
+ * all currently available combination sequences of algorithms
+ */
+static struct zbewalgo_combination
+ zbewalgo_combinations[ZBEWALGO_COMBINATION_MAX_ACTIVE];
+static u8 zbewalgo_combinations_count;
+
+/*
+ * maximum required wrkmem for compression and decompression for each instance
+ * of the algorithm
+ */
+static u32 zbewalgo_wrkmem_size;
+
+/*
+ * compression can be aborted if the data is smaller than this threshold to
+ * speed up the algorithm.
+ */
+static u16 zbewalgo_early_abort_size;
+
+/*
+ * each cpu has its own independent compression history to avoid locks
+ */
+static struct zbewalgo_main_data __percpu *zbewalgo_main_data_ptr;
+
+/*
+ * all available algorithms
+ */
+static struct zbewalgo_alg
+ zbewalgo_base_algorithms[ZBEWALGO_MAX_BASE_ALGORITHMS];
+static u8 zbewalgo_base_algorithms_count;
+
+/*
+ * returns the required size of wrkmem for compression and decompression
+ */
+int zbewalgo_get_wrkmem_size(void)
+{
+ return zbewalgo_wrkmem_size;
+}
+EXPORT_SYMBOL(zbewalgo_get_wrkmem_size);
+
+/*
+ * this function adds a combination to the set of enabled combinations or
+ * if 'flag' is set, replaces all combinations with the newly supplied one.
+ * this function is called from the sysfs context, and therefore accepts
+ * a string as input.
+ */
+static __always_inline int add_set_combination(const char * const string,
+ const int string_length,
+ int flag)
+{
+ /* points behind the string for fast looping over the entire string */
+ const char * const string_end = string + string_length;
+ /* the string up to 'anchor' is parsed */
+ const char *anchor = string;
+ const char *pos = string;
+ struct zbewalgo_combination combination;
+ u8 i;
+
+ memset(&combination, 0, sizeof(struct zbewalgo_combination));
+ /* loop over entire string */
+ while ((pos < string_end) && (*pos != 0)) {
+ while ((pos < string_end)
+ && (*pos != 0)
+ && (*pos != '-')
+ && (*pos != '\n'))
+ pos++;
+ if (pos - anchor <= 0) {
+			/* skip leading or consecutive '-' chars */
+ pos++;
+ anchor = pos;
+ continue;
+ }
+ /* find the algorithm by name in the list of all algorithms */
+ for (i = 0; i < zbewalgo_base_algorithms_count; i++) {
+ if (((unsigned int)(pos - anchor)
+ == strlen(zbewalgo_base_algorithms[i].name))
+ && (memcmp(anchor,
+ zbewalgo_base_algorithms[i].name,
+ pos - anchor)
+ == 0)) {
+ /* found algorithm */
+ combination.ids[combination.count++] =
+ zbewalgo_base_algorithms[i].id;
+ break;
+ }
+ }
+ /*
+ * abort parsing if maximum of algorithms reached or
+ * if string is parsed completely
+ */
+ if (combination.count >= ZBEWALGO_COMBINATION_MAX_IDS
+ || (*pos != '-'))
+ goto _finalize;
+ if (i == zbewalgo_base_algorithms_count)
+			/* mistyped arguments */
+ return -EINVAL;
+ pos++;
+ anchor = pos;
+ }
+_finalize:
+ if (combination.count) {
+ /* if combination has at least 1 algorithm */
+ if (flag == 1)
+ zbewalgo_combinations_count = 0;
+		/* don't store the same combination twice */
+		for (i = 0; i < zbewalgo_combinations_count; i++) {
+			if (memcmp(&zbewalgo_combinations[i], &combination,
+				   sizeof(struct zbewalgo_combination)) == 0)
+				return 0;
+		}
+ /* store the new combination */
+ memcpy(&zbewalgo_combinations[zbewalgo_combinations_count],
+ &combination, sizeof(struct zbewalgo_combination));
+ zbewalgo_combinations_count++;
+ return 0;
+ }
+ /* empty algorithm is not allowed */
+ return -EINVAL;
+}
+
+int zbewalgo_add_combination(const char * const string,
+ const int string_length)
+{
+ return add_set_combination(string, string_length, 0);
+}
+EXPORT_SYMBOL(zbewalgo_add_combination);
+
+int zbewalgo_set_combination(const char * const string,
+ const int string_length)
+{
+ return add_set_combination(string, string_length, 1);
+}
+EXPORT_SYMBOL(zbewalgo_set_combination);
+
+int zbewalgo_compress(const u8 * const source, u8 * const dest,
+ u16 * const wrkmem, const u16 source_length)
+{
+ struct zbewalgo_main_data * const main_data =
+ get_cpu_ptr(zbewalgo_main_data_ptr);
+ u16 * const wrkmem1 = PTR_ALIGN(wrkmem, 8);
+ u8 *dest_best = (u8 *)wrkmem1;
+ u8 *dest1 = (u8 *)(wrkmem1 + 4096);
+ u8 *dest2 = (u8 *)(wrkmem1 + 4096 * 2);
+ u16 * const wrk = wrkmem1 + 4096 * 3;
+ u8 *dest_tmp;
+ const u8 *current_source;
+ u8 i, j;
+ u16 dest_best_size = ZBEWALGO_BUFFER_SIZE;
+ int dest_current_size;
+ u8 dest_best_id = 0;
+ u8 i_from = main_data->best_id
+ + (main_data->counter_search-- == 0);
+ u8 i_to = zbewalgo_combinations_count;
+ u8 looped = 0;
+ u16 local_abort_size = max_t(u16,
+ main_data->best_id_accepted_size,
+ zbewalgo_early_abort_size);
+ u16 counter = 0;
+ struct zbewalgo_alg *alg;
+ struct zbewalgo_combination * const dest_best_combination =
+ (struct zbewalgo_combination *)dest;
+ u8 k;
+
+ if (source_length > 4096) {
+ /*
+		 * This algorithm is optimized for small data buffers
+		 * and cannot handle larger inputs.
+ */
+ return -EINVAL;
+ }
+_begin:
+ /*
+ * loop over zbewalgo_combinations_count starting with the last
+ * successful combination
+ */
+ i = i_from;
+ while (i < i_to) {
+ current_source = source;
+ dest_current_size = source_length;
+ counter++;
+ for (j = 0; j < zbewalgo_combinations[i].count; j++) {
+ k = zbewalgo_combinations[i].ids[j];
+ alg = &zbewalgo_base_algorithms[k];
+ dest_current_size = alg->compress(current_source,
+ dest2, wrk,
+ dest_current_size);
+ if (dest_current_size <= 0)
+ goto _next_algorithm;
+ current_source = dest2;
+ dest_tmp = dest2;
+ dest2 = dest1;
+ dest1 = dest_tmp;
+ if (dest_current_size < dest_best_size) {
+ /*
+ * if a higher compression ratio is found,
+ * update to the best
+ */
+ dest_best_id = i;
+ dest_best_size = dest_current_size;
+ dest_tmp = dest_best;
+ dest_best = dest1;
+ dest1 = dest_tmp;
+ memcpy(dest,
+ &zbewalgo_combinations[i],
+ sizeof(struct zbewalgo_combination));
+ /*
+ * partial combination is allowed, if its
+ * compression ratio is high
+ */
+ dest_best_combination->count = j;
+ }
+ }
+_next_algorithm:
+ if (dest_best_size < local_abort_size)
+ goto _early_abort;
+ local_abort_size = zbewalgo_early_abort_size;
+ i++;
+ }
+ if (!(looped++) && i_from > 0) {
+ i_to = min_t(u8, i_from, zbewalgo_combinations_count);
+ i_from = 0;
+ goto _begin;
+ }
+ if (dest_best_size > zbewalgo_max_output_size) {
+ /* not compressible */
+ return -EINVAL;
+ }
+_early_abort:
+ atomic64_inc(&zbewalgo_stat_combination[dest_best_id]);
+ atomic64_inc(&zbewalgo_stat_count[counter]);
+ main_data->best_id = dest_best_id;
+ main_data->best_id_accepted_size =
+ max_t(u16,
+ dest_best_size + (dest_best_size >> 3),
+ zbewalgo_early_abort_size);
+ main_data->best_id_accepted_size =
+ min_t(u16,
+ main_data->best_id_accepted_size,
+ zbewalgo_max_output_size);
+ memcpy(dest + sizeof(struct zbewalgo_combination),
+ dest_best, dest_best_size);
+ return sizeof(struct zbewalgo_combination) + dest_best_size;
+}
+EXPORT_SYMBOL(zbewalgo_compress);
+
+static __always_inline int zbewalgo_decompress(const u8 * const source,
+ u8 * const dest,
+ u16 * const wrkmem,
+ const u16 source_length,
+ const int safe_mode)
+{
+ const u16 s_length = source_length
+ - sizeof(struct zbewalgo_combination);
+ u16 * const wrkmem1 = PTR_ALIGN(wrkmem, 8);
+ u8 *dest1 = (u8 *)wrkmem1;
+ u8 *dest2 = (u8 *)(wrkmem1 + 4096);
+ u16 *wrk = wrkmem1 + 4096 * 2;
+ const struct zbewalgo_combination * const combination =
+ (struct zbewalgo_combination *)source;
+ u8 j;
+ short res;
+ struct zbewalgo_alg *alg;
+ int (*decompress)(const u8 * const source, u8 * const dest,
+ u16 * const wrkmem, const u16 source_length);
+ u8 count = combination->count + 1;
+
+ if (safe_mode && unlikely(s_length > 4096))
+ return -EINVAL;
+ if (safe_mode && unlikely(count > 7))
+ return -EINVAL;
+
+ if (count == 1) {
+ if (safe_mode
+ && unlikely(combination->ids[0]
+ >= zbewalgo_base_algorithms_count))
+ return -EINVAL;
+ alg = &zbewalgo_base_algorithms[combination->ids[0]];
+ if (safe_mode)
+ decompress = alg->decompress_safe;
+ else
+ decompress = alg->decompress_fast;
+ res = decompress(source + sizeof(struct zbewalgo_combination),
+ dest, wrk, s_length);
+ return res;
+ }
+ if (safe_mode && unlikely(combination->ids[count - 1]
+ >= zbewalgo_base_algorithms_count))
+ return -EINVAL;
+ alg = &zbewalgo_base_algorithms[combination->ids[count - 1]];
+ if (safe_mode)
+ decompress = alg->decompress_safe;
+ else
+ decompress = alg->decompress_fast;
+ res = decompress(source + sizeof(struct zbewalgo_combination),
+ dest1, wrk, s_length);
+ if (res < 0)
+ return res;
+ for (j = count - 2; j > 1; j -= 2) {
+ if (safe_mode
+ && unlikely(combination->ids[j]
+ >= zbewalgo_base_algorithms_count))
+ return -EINVAL;
+ alg = &zbewalgo_base_algorithms[combination->ids[j]];
+ if (safe_mode)
+ decompress = alg->decompress_safe;
+ else
+ decompress = alg->decompress_fast;
+ res = decompress(dest1, dest2, wrk, res);
+ if (res < 0)
+ return res;
+ if (safe_mode
+ && unlikely(combination->ids[j - 1]
+ >= zbewalgo_base_algorithms_count))
+ return -EINVAL;
+ alg = &zbewalgo_base_algorithms[combination->ids[j - 1]];
+ if (safe_mode)
+ decompress = alg->decompress_safe;
+ else
+ decompress = alg->decompress_fast;
+ res = decompress(dest2, dest1, wrk, res);
+ if (res < 0)
+ return res;
+ }
+ if (j == 1) {
+ if (safe_mode
+ && unlikely(combination->ids[1]
+ >= zbewalgo_base_algorithms_count))
+ return -EINVAL;
+ alg = &zbewalgo_base_algorithms[combination->ids[1]];
+ if (safe_mode)
+ decompress = alg->decompress_safe;
+ else
+ decompress = alg->decompress_fast;
+ res = decompress(dest1, dest2, wrk, res);
+ if (res < 0)
+ return res;
+ if (safe_mode
+ && unlikely(combination->ids[0]
+ >= zbewalgo_base_algorithms_count))
+ return -EINVAL;
+ alg = &zbewalgo_base_algorithms[combination->ids[0]];
+ if (safe_mode)
+ decompress = alg->decompress_safe;
+ else
+ decompress = alg->decompress_fast;
+ res = decompress(dest2, dest, wrk, res);
+ return res;
+ }
+ if (safe_mode
+ && unlikely(combination->ids[0]
+ >= zbewalgo_base_algorithms_count))
+ return -EINVAL;
+ alg = &zbewalgo_base_algorithms[combination->ids[0]];
+ if (safe_mode)
+ decompress = alg->decompress_safe;
+ else
+ decompress = alg->decompress_fast;
+ res = decompress(dest1, dest, wrk, res);
+ return res;
+}
+
+int zbewalgo_decompress_safe(const u8 * const source, u8 * const dest,
+ u16 * const wrkmem, const u16 source_length)
+{
+ return zbewalgo_decompress(source, dest, wrkmem, source_length, 1);
+}
+EXPORT_SYMBOL(zbewalgo_decompress_safe);
+
+int zbewalgo_decompress_fast(const u8 * const source, u8 * const dest,
+ u16 * const wrkmem, const u16 source_length)
+{
+ return zbewalgo_decompress(source, dest, wrkmem, source_length, 0);
+}
+EXPORT_SYMBOL(zbewalgo_decompress_fast);
+
+#define add_combination_compile_time(name) \
+ zbewalgo_add_combination(name, sizeof(name))
+
+static ssize_t zbewalgo_combinations_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ ssize_t res = 0;
+ ssize_t tmp;
+ u8 i, j;
+ struct zbewalgo_combination *com;
+
+ tmp = scnprintf(buf, PAGE_SIZE - res, "combinations={\n");
+ res = tmp;
+ buf += tmp;
+ for (i = 0; i < zbewalgo_combinations_count; i++) {
+ com = &zbewalgo_combinations[i];
+ res += tmp = scnprintf(buf, PAGE_SIZE - res,
+ "\tcombination[%d]=", i);
+ buf += tmp;
+ for (j = 0; j < com->count - 1; j++) {
+ res += tmp = scnprintf(buf, PAGE_SIZE - res, "%s-",
+ zbewalgo_base_algorithms[com->ids[j]].name);
+ buf += tmp;
+ }
+ res += tmp = scnprintf(buf, PAGE_SIZE - res, "%s\n",
+ zbewalgo_base_algorithms[com->ids[j]].name);
+ buf += tmp;
+ }
+ res += tmp = scnprintf(buf, PAGE_SIZE - res, "}\n");
+ return res;
+}
+
+static __always_inline void zbewalgo_combinations_reset(void)
+{
+ zbewalgo_combinations_count = 0;
+ add_combination_compile_time("bwt-mtf-huffman-jbe-rle");
+ add_combination_compile_time("bitshuffle-rle-bitshuffle-rle");
+ add_combination_compile_time("bewalgo2-bitshuffle-rle");
+ add_combination_compile_time("bitshuffle-jbe-mtf-huffman-jbe");
+ add_combination_compile_time("bitshuffle-bewalgo2-mtf-bewalgo-jbe2");
+ add_combination_compile_time("mtf-bewalgo-huffman-jbe-rle");
+ add_combination_compile_time("jbe-rle-bitshuffle-rle");
+ add_combination_compile_time("mtf-mtf-jbe-jbe-rle");
+ add_combination_compile_time("jbe2-bitshuffle-rle");
+ add_combination_compile_time("jbe-mtf-jbe-rle");
+// add_combination_compile_time("bewalgo2-bitshuffle-jbe-rle");
+// add_combination_compile_time("bwt-mtf-bewalgo-huffman");
+// add_combination_compile_time("bitshuffle-bewalgo2-mtf-bewalgo-jbe");
+// add_combination_compile_time("bitshuffle-rle-bitshuffle-rle");
+}
+
+static ssize_t zbewalgo_combinations_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ ssize_t res;
+
+ if (count < 5)
+ return -EINVAL;
+ if (memcmp(buf, "add ", 4) == 0) {
+ res = zbewalgo_add_combination(buf + 4, count - 4);
+ return res < 0 ? res : count;
+ }
+ if (memcmp(buf, "set ", 4) == 0) {
+ res = zbewalgo_set_combination(buf + 4, count - 4);
+ return res < 0 ? res : count;
+ }
+ if (memcmp(buf, "reset", 5) == 0) {
+ zbewalgo_combinations_reset();
+ return count;
+ }
+ return -EINVAL;
+}
+
+static ssize_t zbewalgo_max_output_size_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%lu\n", zbewalgo_max_output_size);
+}
+
+static ssize_t zbewalgo_max_output_size_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ char localbuf[10];
+ ssize_t res;
+ unsigned long tmp;
+
+	memcpy(&localbuf[0], buf, count < 10 ? count : 10);
+	localbuf[count < 9 ? count : 9] = 0;
+	res = kstrtoul(localbuf, 10, &tmp);
+	if (res)
+		return res;
+	if (tmp > 4096 - sizeof(struct zbewalgo_combination))
+		return -EINVAL;
+	zbewalgo_max_output_size = tmp;
+	if (zbewalgo_max_output_size < zbewalgo_early_abort_size)
+		zbewalgo_early_abort_size = zbewalgo_max_output_size >> 1;
+	return count;
+}
+
+static ssize_t zbewalgo_early_abort_size_show(struct kobject *kobj,
+					      struct kobj_attribute *attr,
+					      char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%u\n", zbewalgo_early_abort_size);
+}
+
+static ssize_t zbewalgo_early_abort_size_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ char localbuf[10];
+ ssize_t res;
+ unsigned long tmp;
+
+	memcpy(&localbuf[0], buf, count < 10 ? count : 10);
+	localbuf[count < 9 ? count : 9] = 0;
+	res = kstrtoul(localbuf, 10, &tmp);
+	if (res)
+		return res;
+	if (tmp > zbewalgo_max_output_size)
+		return -EINVAL;
+	zbewalgo_early_abort_size = tmp;
+	return count;
+}
+
+static ssize_t zbewalgo_bwt_max_alphabet_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%lu\n", zbewalgo_bwt_max_alphabet);
+}
+
+static ssize_t zbewalgo_bwt_max_alphabet_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ char localbuf[10];
+ ssize_t res;
+
+ memcpy(&localbuf[0], buf, count < 10 ? count : 10);
+ localbuf[count < 9 ? count : 9] = 0;
+ res = kstrtoul(localbuf, 10, &zbewalgo_bwt_max_alphabet);
+ return res < 0 ? res : count;
+}
+
+static struct kobj_attribute zbewalgo_combinations_attribute =
+ __ATTR(combinations, 0664, zbewalgo_combinations_show,
+ zbewalgo_combinations_store);
+static struct kobj_attribute zbewalgo_max_output_size_attribute =
+ __ATTR(max_output_size, 0664, zbewalgo_max_output_size_show,
+ zbewalgo_max_output_size_store);
+static struct kobj_attribute zbewalgo_early_abort_size_attribute =
+ __ATTR(early_abort_size, 0664, zbewalgo_early_abort_size_show,
+ zbewalgo_early_abort_size_store);
+static struct kobj_attribute zbewalgo_bwt_max_alphabet_attribute =
+ __ATTR(bwt_max_alphabet, 0664, zbewalgo_bwt_max_alphabet_show,
+ zbewalgo_bwt_max_alphabet_store);
+static struct attribute *attrs[] = {
+ &zbewalgo_combinations_attribute.attr,
+ &zbewalgo_max_output_size_attribute.attr,
+ &zbewalgo_early_abort_size_attribute.attr,
+ &zbewalgo_bwt_max_alphabet_attribute.attr,
+ NULL,
+};
+
+static struct attribute_group attr_group = {
+ .attrs = attrs,
+};
+
+static struct kobject *zbewalgo_kobj;
+
+static int __init zbewalgo_mod_init(void)
+{
+ u16 i;
+ int j;
+ int res = 0;
+
+ zbewalgo_early_abort_size = 400;
+ /*
+ * this algorithm is intended to be used for zram with zsmalloc.
+ * Therefore zbewalgo_max_output_size equals zsmalloc max size class
+ */
+ i = 0;
+ zbewalgo_max_output_size =
+ 3264 /* largest reported size_class by zsmalloc */
+ - (sizeof(unsigned long)) /* zsmalloc internal overhead */
+ - sizeof(struct zbewalgo_combination);/* zbewalgo overhead */
+ zbewalgo_base_algorithms[i++] = alg_bewalgo;
+ zbewalgo_base_algorithms[i++] = alg_bewalgo2;
+ zbewalgo_base_algorithms[i++] = alg_bitshuffle;
+ zbewalgo_base_algorithms[i++] = alg_bwt;
+ zbewalgo_base_algorithms[i++] = alg_jbe;
+ zbewalgo_base_algorithms[i++] = alg_jbe2;
+ zbewalgo_base_algorithms[i++] = alg_mtf;
+ zbewalgo_base_algorithms[i++] = alg_rle;
+ zbewalgo_base_algorithms[i++] = alg_huffman;
+ zbewalgo_base_algorithms_count = i;
+ /*
+ * the wrkmem size is the largest wrkmem required by any callable
+ * algorithm
+ */
+ zbewalgo_wrkmem_size = 0;
+ for (i = 0; i < zbewalgo_base_algorithms_count; i++) {
+ res = zbewalgo_base_algorithms[i].init();
+		if (res < 0) {
+			/* unwind the algorithms initialized so far */
+			while (i-- > 0)
+				zbewalgo_base_algorithms[i].exit();
+			return res;
+		}
+ zbewalgo_base_algorithms[i].id = i;
+ zbewalgo_wrkmem_size = max_t(u32,
+ zbewalgo_wrkmem_size,
+ zbewalgo_base_algorithms[i].wrkmem_size);
+ }
+ /* adding some pages for temporary compression results */
+ zbewalgo_wrkmem_size += 4096 * 6;
+ zbewalgo_wrkmem_size += 8;/* for alignment */
+ zbewalgo_main_data_ptr = alloc_percpu(struct zbewalgo_main_data);
+	if (!zbewalgo_main_data_ptr) {
+		for (i = 0; i < zbewalgo_base_algorithms_count; i++)
+			zbewalgo_base_algorithms[i].exit();
+		return -ENOMEM;
+	}
+ for_each_possible_cpu(j) {
+ memset(per_cpu_ptr(zbewalgo_main_data_ptr, j),
+ 0, sizeof(struct zbewalgo_main_data));
+ }
+ for (i = 0; i < 256; i++) {
+ atomic64_set(&zbewalgo_stat_combination[i], 0);
+ atomic64_set(&zbewalgo_stat_count[i], 0);
+ }
+
+ zbewalgo_kobj = kobject_create_and_add("zbewalgo", kernel_kobj);
+ if (!zbewalgo_kobj) {
+ /* allocation error */
+ return -ENOMEM;
+ }
+ res = sysfs_create_group(zbewalgo_kobj, &attr_group);
+ if (res)
+ kobject_put(zbewalgo_kobj);
+ zbewalgo_combinations_reset();
+ return res;
+}
+
+static void __exit zbewalgo_mod_fini(void)
+{
+ u16 i;
+ u64 tmp;
+
+ kobject_put(zbewalgo_kobj);
+ for (i = 0; i < zbewalgo_base_algorithms_count; i++)
+ zbewalgo_base_algorithms[i].exit();
+ free_percpu(zbewalgo_main_data_ptr);
+ /* log statistics via printk for debugging purpose */
+ for (i = 0; i < 256; i++) {
+ tmp = atomic64_read(&zbewalgo_stat_combination[i]);
+ if (tmp > 0)
+ printk(KERN_INFO "%s %4d -> %llu combination\n",
+ __func__, i, tmp);
+ }
+ for (i = 0; i < 256; i++) {
+ tmp = atomic64_read(&zbewalgo_stat_count[i]);
+ if (tmp > 0)
+ printk(KERN_INFO "%s %4d -> %llu counter\n",
+ __func__, i, tmp);
+ }
+}
+
+module_init(zbewalgo_mod_init);
+module_exit(zbewalgo_mod_fini);
+
+MODULE_AUTHOR("Benjamin Warnke <[email protected]>");
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("zBeWalgo Compression Algorithm");
--
2.14.1
This patch adds zBeWalgo to the crypto api so that zBeWalgo can be used by
zram.
Signed-off-by: Benjamin Warnke <[email protected]>
---
crypto/Kconfig | 12 ++++
crypto/Makefile | 1 +
crypto/testmgr.c | 10 +++
crypto/testmgr.h | 134 ++++++++++++++++++++++++++++++++++++
crypto/zbewalgo.c | 164 +++++++++++++++++++++++++++++++++++++++++++++
drivers/block/zram/zcomp.c | 3 +
6 files changed, 324 insertions(+)
create mode 100644 crypto/zbewalgo.c
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 76e8c88c..749457a6 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1686,6 +1686,18 @@ config CRYPTO_LZ4
help
This is the LZ4 algorithm.
+config CRYPTO_ZBEWALGO
+ tristate "zBeWalgo compression algorithm"
+ select CRYPTO_ALGAPI
+ select CRYPTO_ACOMP2
+ select ZBEWALGO_COMPRESS
+ help
+	  This is the zBeWalgo compression algorithm. It only accepts
+	  inputs of at most one page (4096 bytes) at a time. To achieve
+	  high compression ratios, zBeWalgo can run multiple
+	  transformation and compression algorithms in a row to
+	  minimize the compressed size.
+
config CRYPTO_LZ4HC
tristate "LZ4HC compression algorithm"
select CRYPTO_ALGAPI
diff --git a/crypto/Makefile b/crypto/Makefile
index 4fc69fe9..493cbd65 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_CRYPTO_CRCT10DIF) += crct10dif_common.o crct10dif_generic.o
obj-$(CONFIG_CRYPTO_AUTHENC) += authenc.o authencesn.o
obj-$(CONFIG_CRYPTO_LZO) += lzo.o
obj-$(CONFIG_CRYPTO_LZ4) += lz4.o
+obj-$(CONFIG_CRYPTO_ZBEWALGO) += zbewalgo.o
obj-$(CONFIG_CRYPTO_LZ4HC) += lz4hc.o
obj-$(CONFIG_CRYPTO_842) += 842.o
obj-$(CONFIG_CRYPTO_RNG2) += rng.o
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index af4a01c5..53fd43d1 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3611,6 +3611,16 @@ static const struct alg_test_desc alg_test_descs[] = {
.dec = __VECS(tf_xts_dec_tv_template)
}
}
+ }, {
+ .alg = "zbewalgo",
+ .test = alg_test_comp,
+ .fips_allowed = 1,
+ .suite = {
+ .comp = {
+ .comp = __VECS(zbewalgo_comp_tv_template),
+ .decomp = __VECS(zbewalgo_decomp_tv_template)
+ }
+ }
}, {
.alg = "zlib-deflate",
.test = alg_test_comp,
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 004c0a0f..960bfbcf 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -37009,6 +37009,140 @@ static const struct hash_testvec bfin_crc_tv_template[] = {
};
+static const struct comp_testvec zbewalgo_comp_tv_template[] = {
+ {
+ .inlen = 512,
+ .outlen = 402,
+ .input =
+ "\x8a\x3a\xf3\xbe\x33\xf9\xab\x3d\xa1\x51\x9f\x7f\xad\xf6\xab\x3d"
+ "\xad\x29\x8f\x3c\x27\xf4\xab\x3d\x06\x19\xc3\xf5\xa0\xf1\xab\x3d"
+ "\xfb\x75\x3b\xab\x1a\xef\xab\x3d\xe3\x96\xf8\x5c\x94\xec\xab\x3d"
+ "\x13\xd2\xfa\x0a\x0e\xea\xab\x3d\xe0\x7d\x42\xb5\x87\xe7\xab\x3d"
+ "\xa1\xf0\xcf\x5b\x01\xe5\xab\x3d\xad\x80\xa3\xfe\x7a\xe2\xab\x3d"
+ "\x59\x84\xbd\x9d\xf4\xdf\xab\x3d\xff\x51\x1e\x39\x6e\xdd\xab\x3d"
+ "\xf5\x3f\xc6\xd0\xe7\xda\xab\x3d\x96\xa4\xb5\x64\x61\xd8\xab\x3d"
+ "\x3b\xd6\xec\xf4\xda\xd5\xab\x3d\x3b\x2b\x6c\x81\x54\xd3\xab\x3d"
+ "\xf2\xf9\x33\x0a\xce\xd0\xab\x3d\xbb\x98\x44\x8f\x47\xce\xab\x3d"
+ "\xed\x5d\x9e\x10\xc1\xcb\xab\x3d\xe7\x9f\x41\x8e\x3a\xc9\xab\x3d"
+ "\x07\xb5\x2e\x08\xb4\xc6\xab\x3d\xa9\xf3\x65\x7e\x2d\xc4\xab\x3d"
+ "\x28\xb2\xe7\xf0\xa6\xc1\xab\x3d\xe3\x46\xb4\x5f\x20\xbf\xab\x3d"
+ "\x38\x08\xcc\xca\x99\xbc\xab\x3d\x85\x4c\x2f\x32\x13\xba\xab\x3d"
+ "\x2a\x6a\xde\x95\x8c\xb7\xab\x3d\x85\xb7\xd9\xf5\x05\xb5\xab\x3d"
+ "\xf7\x8a\x21\x52\x7f\xb2\xab\x3d\xe2\x3a\xb6\xaa\xf8\xaf\xab\x3d"
+ "\xa5\x1d\x98\xff\x71\xad\xab\x3d\xa3\x89\xc7\x50\xeb\xaa\xab\x3d"
+ "\x3d\xd5\x44\x9e\x64\xa8\xab\x3d\xd6\x56\x10\xe8\xdd\xa5\xab\x3d"
+ "\xce\x64\x2a\x2e\x57\xa3\xab\x3d\x8d\x55\x93\x70\xd0\xa0\xab\x3d"
+ "\x76\x7f\x4b\xaf\x49\x9e\xab\x3d\xeb\x38\x53\xea\xc2\x9b\xab\x3d"
+ "\x53\xd8\xaa\x21\x3c\x99\xab\x3d\x13\xb4\x52\x55\xb5\x96\xab\x3d"
+ "\x92\x22\x4b\x85\x2e\x94\xab\x3d\x35\x7a\x94\xb1\xa7\x91\xab\x3d"
+ "\x65\x11\x2f\xda\x20\x8f\xab\x3d\x86\x3e\x1b\xff\x99\x8c\xab\x3d"
+ "\x02\x58\x59\x20\x13\x8a\xab\x3d\x41\xb4\xe9\x3d\x8c\x87\xab\x3d"
+ "\xaf\xa9\xcc\x57\x05\x85\xab\x3d\xb5\x8e\x02\x6e\x7e\x82\xab\x3d"
+ "\xb9\xb9\x8b\x80\xf7\x7f\xab\x3d\x27\x81\x68\x8f\x70\x7d\xab\x3d"
+ "\x6a\x3b\x99\x9a\xe9\x7a\xab\x3d\xef\x3e\x1e\xa2\x62\x78\xab\x3d"
+ "\x20\xe2\xf7\xa5\xdb\x75\xab\x3d\x6a\x7b\x26\xa6\x54\x73\xab\x3d"
+ "\x3b\x61\xaa\xa2\xcd\x70\xab\x3d\xfe\xe9\x83\x9b\x46\x6e\xab\x3d"
+ "\x22\x6c\xb3\x90\xbf\x6b\xab\x3d\x16\x3e\x39\x82\x38\x69\xab\x3d"
+ "\x48\xb6\x15\x70\xb1\x66\xab\x3d\x26\x2b\x49\x5a\x2a\x64\xab\x3d"
+ "\x23\xf3\xd3\x40\xa3\x61\xab\x3d\xae\x64\xb6\x23\x1c\x5f\xab\x3d"
+ "\x37\xd6\xf0\x02\x95\x5c\xab\x3d\x30\x9e\x83\xde\x0d\x5a\xab\x3d",
+ .output =
+ "\x02\x02\x07\x02\x07\x00\x00\x00\x0f\x00\x02\x8a\xa1\xad\x06\xfb"
+ "\xe3\x13\xe0\xa1\xad\x59\xff\xf5\x96\x81\x3b\x7f\xf2\xbb\xed\xe7"
+ "\x07\xa9\x28\xe3\x38\x85\x2a\x85\xf7\xe2\xa5\xa3\x3d\xd6\xce\x8d"
+ "\x76\xeb\x53\x13\x92\x35\x65\x86\x02\x41\xaf\xb5\xb9\x27\x6a\xef"
+ "\x20\x6a\x3b\xfe\x22\x16\x48\x26\x23\xae\x37\x30\x3a\x51\x29\x19"
+ "\x75\x96\xd2\x7d\xf0\x80\x84\x51\x3f\xa4\xd6\x2b\xf9\x98\x5d\x9f"
+ "\xb5\xf3\xb2\x46\x08\x4c\x6a\xb7\x8a\x3a\x1d\x89\xd5\x56\x64\x55"
+ "\x7f\x38\xd8\xb4\x22\x7a\x11\x3e\x58\xb4\xa9\x8e\xb9\x81\x3b\x3e"
+ "\xe2\x7b\x61\xe9\x6c\x3e\xb6\x2b\xf3\x64\xd6\x9e\xf3\x9f\x8f\xc3"
+ "\x3b\xf8\xfa\x42\xcf\xa3\xbd\x1e\xc6\xb5\xec\x6c\x7f\x33\x44\x9e"
+ "\x41\x2e\x65\xe7\xb4\xcc\x2f\xde\xd9\x21\xb6\x98\xc7\x44\x10\x2a"
+ "\x93\x4b\x53\xaa\x52\x4b\x94\x2f\x1b\x59\xe9\xcc\x02\x8b\x68\x99"
+ "\x1e\xf7\x26\xaa\x83\xb3\x39\x15\x49\xd3\xb6\xf0\x83\xbe\x7f\x3c"
+ "\xf5\xab\x5c\x0a\xb5\x5b\xfe\x9d\x39\xd0\x64\xf4\x81\x0a\x8f\x10"
+ "\x8e\x08\x7e\xf0\x5f\xca\x32\x95\xf5\x52\xaa\xff\x50\x9e\xe8\x2e"
+ "\x70\xaf\xea\x21\x55\x85\xb1\xda\xff\x20\x3d\x57\x6e\x80\x8f\x9a"
+ "\xa2\xa5\xa6\xa2\x9b\x90\x82\x70\x5a\x40\x23\x02\xde\x33\xad\x27"
+ "\xa0\x1a\x94\x0e\x87\x01\x7a\xf4\x6e\xe7\x61\xda\x54\x6f\xce\x47"
+ "\xc1\x3a\xb4\x2d\xa6\x20\x99\x13\x8c\x05\x7f\xf8\x71\xeb\x64\xdd"
+ "\x57\xd0\x49\xc2\x3c\xb5\x2e\xa7\x20\x99\x13\x8c\x05\x7e\xf7\x70"
+ "\xe9\x62\xdb\x54\xcd\x46\xbf\x38\xb1\x2a\xa3\x1c\x95\x0d\xf9\xf6"
+ "\xf4\xf1\xef\xec\xea\xe7\xe5\xe2\xdf\xdd\xda\xd8\xd5\xd3\xd0\xce"
+ "\xcb\xc9\xc6\xc4\xc1\xbf\xbc\xba\xb7\xb5\xb2\xaf\xad\xaa\xa8\xa5"
+ "\xa3\xa0\x9e\x9b\x99\x96\x94\x91\x8f\x8c\x8a\x87\x85\x82\x7f\x7d"
+ "\x7a\x78\x75\x73\x70\x6e\x6b\x69\x66\x64\x61\x5f\x5c\x5a\xbf\xab"
+ "\xbf\x3d",
+ },
+};
+
+static const struct comp_testvec zbewalgo_decomp_tv_template[] = {
+ {
+ .inlen = 402,
+ .outlen = 512,
+ .input =
+ "\x02\x02\x07\x02\x07\x00\x00\x00\x0f\x00\x02\x8a\xa1\xad\x06\xfb"
+ "\xe3\x13\xe0\xa1\xad\x59\xff\xf5\x96\x81\x3b\x7f\xf2\xbb\xed\xe7"
+ "\x07\xa9\x28\xe3\x38\x85\x2a\x85\xf7\xe2\xa5\xa3\x3d\xd6\xce\x8d"
+ "\x76\xeb\x53\x13\x92\x35\x65\x86\x02\x41\xaf\xb5\xb9\x27\x6a\xef"
+ "\x20\x6a\x3b\xfe\x22\x16\x48\x26\x23\xae\x37\x30\x3a\x51\x29\x19"
+ "\x75\x96\xd2\x7d\xf0\x80\x84\x51\x3f\xa4\xd6\x2b\xf9\x98\x5d\x9f"
+ "\xb5\xf3\xb2\x46\x08\x4c\x6a\xb7\x8a\x3a\x1d\x89\xd5\x56\x64\x55"
+ "\x7f\x38\xd8\xb4\x22\x7a\x11\x3e\x58\xb4\xa9\x8e\xb9\x81\x3b\x3e"
+ "\xe2\x7b\x61\xe9\x6c\x3e\xb6\x2b\xf3\x64\xd6\x9e\xf3\x9f\x8f\xc3"
+ "\x3b\xf8\xfa\x42\xcf\xa3\xbd\x1e\xc6\xb5\xec\x6c\x7f\x33\x44\x9e"
+ "\x41\x2e\x65\xe7\xb4\xcc\x2f\xde\xd9\x21\xb6\x98\xc7\x44\x10\x2a"
+ "\x93\x4b\x53\xaa\x52\x4b\x94\x2f\x1b\x59\xe9\xcc\x02\x8b\x68\x99"
+ "\x1e\xf7\x26\xaa\x83\xb3\x39\x15\x49\xd3\xb6\xf0\x83\xbe\x7f\x3c"
+ "\xf5\xab\x5c\x0a\xb5\x5b\xfe\x9d\x39\xd0\x64\xf4\x81\x0a\x8f\x10"
+ "\x8e\x08\x7e\xf0\x5f\xca\x32\x95\xf5\x52\xaa\xff\x50\x9e\xe8\x2e"
+ "\x70\xaf\xea\x21\x55\x85\xb1\xda\xff\x20\x3d\x57\x6e\x80\x8f\x9a"
+ "\xa2\xa5\xa6\xa2\x9b\x90\x82\x70\x5a\x40\x23\x02\xde\x33\xad\x27"
+ "\xa0\x1a\x94\x0e\x87\x01\x7a\xf4\x6e\xe7\x61\xda\x54\x6f\xce\x47"
+ "\xc1\x3a\xb4\x2d\xa6\x20\x99\x13\x8c\x05\x7f\xf8\x71\xeb\x64\xdd"
+ "\x57\xd0\x49\xc2\x3c\xb5\x2e\xa7\x20\x99\x13\x8c\x05\x7e\xf7\x70"
+ "\xe9\x62\xdb\x54\xcd\x46\xbf\x38\xb1\x2a\xa3\x1c\x95\x0d\xf9\xf6"
+ "\xf4\xf1\xef\xec\xea\xe7\xe5\xe2\xdf\xdd\xda\xd8\xd5\xd3\xd0\xce"
+ "\xcb\xc9\xc6\xc4\xc1\xbf\xbc\xba\xb7\xb5\xb2\xaf\xad\xaa\xa8\xa5"
+ "\xa3\xa0\x9e\x9b\x99\x96\x94\x91\x8f\x8c\x8a\x87\x85\x82\x7f\x7d"
+ "\x7a\x78\x75\x73\x70\x6e\x6b\x69\x66\x64\x61\x5f\x5c\x5a\xbf\xab"
+ "\xbf\x3d",
+ .output =
+ "\x8a\x3a\xf3\xbe\x33\xf9\xab\x3d\xa1\x51\x9f\x7f\xad\xf6\xab\x3d"
+ "\xad\x29\x8f\x3c\x27\xf4\xab\x3d\x06\x19\xc3\xf5\xa0\xf1\xab\x3d"
+ "\xfb\x75\x3b\xab\x1a\xef\xab\x3d\xe3\x96\xf8\x5c\x94\xec\xab\x3d"
+ "\x13\xd2\xfa\x0a\x0e\xea\xab\x3d\xe0\x7d\x42\xb5\x87\xe7\xab\x3d"
+ "\xa1\xf0\xcf\x5b\x01\xe5\xab\x3d\xad\x80\xa3\xfe\x7a\xe2\xab\x3d"
+ "\x59\x84\xbd\x9d\xf4\xdf\xab\x3d\xff\x51\x1e\x39\x6e\xdd\xab\x3d"
+ "\xf5\x3f\xc6\xd0\xe7\xda\xab\x3d\x96\xa4\xb5\x64\x61\xd8\xab\x3d"
+ "\x3b\xd6\xec\xf4\xda\xd5\xab\x3d\x3b\x2b\x6c\x81\x54\xd3\xab\x3d"
+ "\xf2\xf9\x33\x0a\xce\xd0\xab\x3d\xbb\x98\x44\x8f\x47\xce\xab\x3d"
+ "\xed\x5d\x9e\x10\xc1\xcb\xab\x3d\xe7\x9f\x41\x8e\x3a\xc9\xab\x3d"
+ "\x07\xb5\x2e\x08\xb4\xc6\xab\x3d\xa9\xf3\x65\x7e\x2d\xc4\xab\x3d"
+ "\x28\xb2\xe7\xf0\xa6\xc1\xab\x3d\xe3\x46\xb4\x5f\x20\xbf\xab\x3d"
+ "\x38\x08\xcc\xca\x99\xbc\xab\x3d\x85\x4c\x2f\x32\x13\xba\xab\x3d"
+ "\x2a\x6a\xde\x95\x8c\xb7\xab\x3d\x85\xb7\xd9\xf5\x05\xb5\xab\x3d"
+ "\xf7\x8a\x21\x52\x7f\xb2\xab\x3d\xe2\x3a\xb6\xaa\xf8\xaf\xab\x3d"
+ "\xa5\x1d\x98\xff\x71\xad\xab\x3d\xa3\x89\xc7\x50\xeb\xaa\xab\x3d"
+ "\x3d\xd5\x44\x9e\x64\xa8\xab\x3d\xd6\x56\x10\xe8\xdd\xa5\xab\x3d"
+ "\xce\x64\x2a\x2e\x57\xa3\xab\x3d\x8d\x55\x93\x70\xd0\xa0\xab\x3d"
+ "\x76\x7f\x4b\xaf\x49\x9e\xab\x3d\xeb\x38\x53\xea\xc2\x9b\xab\x3d"
+ "\x53\xd8\xaa\x21\x3c\x99\xab\x3d\x13\xb4\x52\x55\xb5\x96\xab\x3d"
+ "\x92\x22\x4b\x85\x2e\x94\xab\x3d\x35\x7a\x94\xb1\xa7\x91\xab\x3d"
+ "\x65\x11\x2f\xda\x20\x8f\xab\x3d\x86\x3e\x1b\xff\x99\x8c\xab\x3d"
+ "\x02\x58\x59\x20\x13\x8a\xab\x3d\x41\xb4\xe9\x3d\x8c\x87\xab\x3d"
+ "\xaf\xa9\xcc\x57\x05\x85\xab\x3d\xb5\x8e\x02\x6e\x7e\x82\xab\x3d"
+ "\xb9\xb9\x8b\x80\xf7\x7f\xab\x3d\x27\x81\x68\x8f\x70\x7d\xab\x3d"
+ "\x6a\x3b\x99\x9a\xe9\x7a\xab\x3d\xef\x3e\x1e\xa2\x62\x78\xab\x3d"
+ "\x20\xe2\xf7\xa5\xdb\x75\xab\x3d\x6a\x7b\x26\xa6\x54\x73\xab\x3d"
+ "\x3b\x61\xaa\xa2\xcd\x70\xab\x3d\xfe\xe9\x83\x9b\x46\x6e\xab\x3d"
+ "\x22\x6c\xb3\x90\xbf\x6b\xab\x3d\x16\x3e\x39\x82\x38\x69\xab\x3d"
+ "\x48\xb6\x15\x70\xb1\x66\xab\x3d\x26\x2b\x49\x5a\x2a\x64\xab\x3d"
+ "\x23\xf3\xd3\x40\xa3\x61\xab\x3d\xae\x64\xb6\x23\x1c\x5f\xab\x3d"
+ "\x37\xd6\xf0\x02\x95\x5c\xab\x3d\x30\x9e\x83\xde\x0d\x5a\xab\x3d",
+ },
+};
+
static const struct comp_testvec lz4_comp_tv_template[] = {
{
.inlen = 255,
diff --git a/crypto/zbewalgo.c b/crypto/zbewalgo.c
new file mode 100644
index 00000000..044b8811
--- /dev/null
+++ b/crypto/zbewalgo.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018 Benjamin Warnke <[email protected]>
+ */
+
+#include <crypto/internal/scompress.h>
+#include <linux/crypto.h>
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/zbewalgo.h>
+
+struct zbewalgo_ctx {
+ void *zbewalgo_comp_mem;
+};
+
+static void *zbewalgo_alloc_ctx(struct crypto_scomp *tfm)
+{
+ void *ctx;
+
+ ctx = vmalloc(zbewalgo_get_wrkmem_size());
+ if (!ctx)
+ return ERR_PTR(-ENOMEM);
+ return ctx;
+}
+
+static int zbewalgo_init(struct crypto_tfm *tfm)
+{
+ struct zbewalgo_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ ctx->zbewalgo_comp_mem = zbewalgo_alloc_ctx(NULL);
+ if (IS_ERR(ctx->zbewalgo_comp_mem))
+ return -ENOMEM;
+ return 0;
+}
+
+static void zbewalgo_free_ctx(struct crypto_scomp *tfm, void *ctx)
+{
+ vfree(ctx);
+}
+
+static void zbewalgo_exit(struct crypto_tfm *tfm)
+{
+ struct zbewalgo_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ zbewalgo_free_ctx(NULL, ctx->zbewalgo_comp_mem);
+}
+
+static int __zbewalgo_compress_crypto(const u8 *src, unsigned int slen,
+ u8 *dst, unsigned int *dlen, void *ctx)
+{
+ int out_len;
+
+ if (slen > 4096)
+ return -EINVAL;
+	out_len = zbewalgo_compress(src, dst, ctx, slen);
+	if (out_len < 0)
+		return out_len;
+ *dlen = out_len;
+ return 0;
+}
+
+static int zbewalgo_scompress(struct crypto_scomp *tfm, const u8 *src,
+ unsigned int slen, u8 *dst, unsigned int *dlen,
+ void *ctx)
+{
+ return __zbewalgo_compress_crypto(src, slen, dst, dlen, ctx);
+}
+
+static int zbewalgo_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst,
+ unsigned int *dlen)
+{
+ struct zbewalgo_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ return __zbewalgo_compress_crypto(src, slen, dst, dlen,
+ ctx->zbewalgo_comp_mem);
+}
+
+static int __zbewalgo_decompress_crypto(const u8 *src, unsigned int slen,
+ u8 *dst, unsigned int *dlen, void *ctx)
+{
+ int out_len;
+
+ out_len = zbewalgo_decompress_safe(src, dst, ctx, slen);
+ if (out_len < 0)
+ return -EINVAL;
+ *dlen = out_len;
+ return 0;
+}
+
+static int zbewalgo_sdecompress(struct crypto_scomp *tfm, const u8 *src,
+ unsigned int slen, u8 *dst, unsigned int *dlen,
+ void *ctx)
+{
+ return __zbewalgo_decompress_crypto(src, slen, dst, dlen, ctx);
+}
+
+static int zbewalgo_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst,
+ unsigned int *dlen)
+{
+ struct zbewalgo_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ return __zbewalgo_decompress_crypto(src, slen, dst, dlen,
+ ctx->zbewalgo_comp_mem);
+}
+
+static struct crypto_alg crypto_alg_zbewalgo = {
+ .cra_name = "zbewalgo",
+ .cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
+ .cra_ctxsize = sizeof(struct zbewalgo_ctx),
+ .cra_module = THIS_MODULE,
+ .cra_init = zbewalgo_init,
+ .cra_exit = zbewalgo_exit,
+ .cra_u = {
+ .compress = {
+ .coa_compress = zbewalgo_compress_crypto,
+ .coa_decompress = zbewalgo_decompress_crypto
+ }
+ }
+};
+
+static struct scomp_alg scomp = {
+ .alloc_ctx = zbewalgo_alloc_ctx,
+ .free_ctx = zbewalgo_free_ctx,
+ .compress = zbewalgo_scompress,
+ .decompress = zbewalgo_sdecompress,
+ .base = {
+ .cra_name = "zbewalgo",
+ .cra_driver_name = "zbewalgo-scomp",
+ .cra_module = THIS_MODULE,
+ }
+};
+
+static int __init zbewalgo_mod_init(void)
+{
+ int ret;
+
+ ret = crypto_register_alg(&crypto_alg_zbewalgo);
+ if (ret)
+ return ret;
+ ret = crypto_register_scomp(&scomp);
+ if (ret) {
+ crypto_unregister_alg(&crypto_alg_zbewalgo);
+ return ret;
+ }
+ return ret;
+}
+
+static void __exit zbewalgo_mod_fini(void)
+{
+ crypto_unregister_alg(&crypto_alg_zbewalgo);
+ crypto_unregister_scomp(&scomp);
+}
+
+module_init(zbewalgo_mod_init);
+module_exit(zbewalgo_mod_fini);
+
+MODULE_AUTHOR("Benjamin Warnke <[email protected]>");
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("zBeWalgo Compression Algorithm");
+MODULE_ALIAS_CRYPTO("zbewalgo");
diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 4ed0a78f..15b3a016 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -23,6 +23,9 @@ static const char * const backends[] = {
#if IS_ENABLED(CONFIG_CRYPTO_LZ4)
"lz4",
#endif
+#if IS_ENABLED(CONFIG_CRYPTO_ZBEWALGO)
+ "zbewalgo",
+#endif
#if IS_ENABLED(CONFIG_CRYPTO_LZ4HC)
"lz4hc",
#endif
--
2.14.1
The data-format of zBeWalgo, and some other algorithms is unstable. To
identify such unstable algorithms this patch adds a new flag to the
crypto-api.
Signed-off-by: Benjamin Warnke <[email protected]>
---
crypto/zbewalgo.c | 2 +-
include/linux/crypto.h | 6 ++++++
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/crypto/zbewalgo.c b/crypto/zbewalgo.c
index 9db0d43b..e57b5ced 100644
--- a/crypto/zbewalgo.c
+++ b/crypto/zbewalgo.c
@@ -134,7 +134,7 @@ static int zbewalgo_decompress_crypto_unsafe(struct crypto_tfm *tfm,
static struct crypto_alg crypto_alg_zbewalgo = {
.cra_name = "zbewalgo",
- .cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
+ .cra_flags = CRYPTO_ALG_TYPE_COMPRESS | CRYPTO_ALG_UNSTABLE_ENCODING,
.cra_ctxsize = sizeof(struct zbewalgo_ctx),
.cra_module = THIS_MODULE,
.cra_init = zbewalgo_init,
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index 6bfb1aea..2228dd08 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -112,6 +112,12 @@
*/
#define CRYPTO_ALG_OPTIONAL_KEY 0x00004000
+/*
+ * Set if the algorithm is new and its encoding is likely to
+ * change in the near future
+ */
+#define CRYPTO_ALG_UNSTABLE_ENCODING 0x00008000
+
/*
* Transform masks and values (for crt_flags).
*/
--
2.14.1
Hi Benjamin,
Today I tried your new patchset, but I couldn't get further due to the
problem below. Unfortunately, I don't have the time to look into it.
Could you check on it?
Thanks.
[ 169.597064] zram0: detected capacity change from 1073741824 to 0
[ 177.523268] zram0: detected capacity change from 0 to 1073741824
[ 181.312545] BUG: sleeping function called from invalid context at mm/page-writeback.c:2274
[ 181.315578] in_atomic(): 1, irqs_disabled(): 0, pid: 2051, name: dd
[ 181.317804] 1 lock held by dd/2051:
[ 181.318973] #0: 00000000d83cd3cb (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x41/0x1f0
[ 181.321590] CPU: 5 PID: 2051 Comm: dd Not tainted 4.16.0-mm1+ #202
[ 181.323599] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 181.326295] Call Trace:
[ 181.327117] dump_stack+0x67/0x9b
[ 181.328246] ___might_sleep+0x149/0x230
[ 181.329475] write_cache_pages+0x31d/0x620
[ 181.330726] ? tag_pages_for_writeback+0x140/0x140
[ 181.332201] ? __lock_acquire+0x2b5/0x1300
[ 181.333466] generic_writepages+0x5f/0x90
[ 181.334695] ? do_writepages+0x4b/0xf0
[ 181.335840] ? blkdev_readpages+0x20/0x20
[ 181.337077] do_writepages+0x4b/0xf0
[ 181.338174] ? __filemap_fdatawrite_range+0xb4/0x100
[ 181.339672] ? __blkdev_put+0x41/0x1f0
[ 181.340826] ? __filemap_fdatawrite_range+0xc1/0x100
[ 181.342251] __filemap_fdatawrite_range+0xc1/0x100
[ 181.343610] filemap_write_and_wait+0x2c/0x70
[ 181.344867] __blkdev_put+0x71/0x1f0
[ 181.345891] blkdev_close+0x21/0x30
[ 181.346889] __fput+0xeb/0x220
[ 181.347769] task_work_run+0x93/0xc0
[ 181.348803] exit_to_usermode_loop+0x8d/0x90
[ 181.350009] do_syscall_64+0x16b/0x1b0
[ 181.351080] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 181.352498] RIP: 0033:0x7f5e88e028f0
[ 181.353512] RSP: 002b:00007fff448399d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ 181.355501] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00007f5e88e028f0
[ 181.357382] RDX: 0000000000001000 RSI: 0000000000000000 RDI: 0000000000000001
[ 181.359254] RBP: 00007f5e892e2698 R08: 000000000117e000 R09: 00007fff448f2080
[ 181.361134] R10: 000000000000086f R11: 0000000000000246 R12: 0000000000000000
[ 181.362995] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 181.365448] show_signal_msg: 12 callbacks suppressed
[ 181.365452] dd[2051]: segfault at 7f5e88d78d70 ip 00007f5e88d78d70 sp 00007fff44839548 error 14 in libc-2.23.so[7f5e88d0b000+1c0000]
[ 181.369877] BUG: scheduling while atomic: dd/2051/0x00000002
[ 181.371734] no locks held by dd/2051.
[ 181.372658] Modules linked in:
[ 181.373503] CPU: 5 PID: 2051 Comm: dd Tainted: G W 4.16.0-mm1+ #202
[ 181.375379] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 181.377454] Call Trace:
[ 181.378055] dump_stack+0x67/0x9b
[ 181.378854] __schedule_bug+0x5d/0x80
[ 181.379731] __schedule+0x7b5/0xbd0
[ 181.380569] ? find_held_lock+0x2d/0x90
[ 181.381503] ? try_to_wake_up+0x56/0x510
[ 181.382437] ? wait_for_completion+0x112/0x1a0
[ 181.383486] schedule+0x2f/0x90
[ 181.384237] schedule_timeout+0x22b/0x550
[ 181.385198] ? find_held_lock+0x2d/0x90
[ 181.386105] ? wait_for_completion+0x132/0x1a0
[ 181.387158] ? wait_for_completion+0x112/0x1a0
[ 181.388221] wait_for_completion+0x13a/0x1a0
[ 181.389236] ? wake_up_q+0x70/0x70
[ 181.390008] call_usermodehelper_exec+0x13b/0x170
[ 181.391067] do_coredump+0xaed/0x1040
[ 181.391893] ? try_to_wake_up+0x56/0x510
[ 181.392815] ? __lock_is_held+0x55/0x90
[ 181.393694] get_signal+0x32f/0x8e0
[ 181.394485] ? page_fault+0x2f/0x50
[ 181.395271] do_signal+0x36/0x6f0
[ 181.396021] ? force_sig_info_fault+0x97/0xf0
[ 181.397018] ? __bad_area_nosemaphore+0x19e/0x1b0
[ 181.398074] ? __do_page_fault+0xde/0x4b0
[ 181.398977] ? page_fault+0x2f/0x50
[ 181.399780] exit_to_usermode_loop+0x62/0x90
[ 181.400770] prepare_exit_to_usermode+0xbf/0xd0
[ 181.401734] retint_user+0x8/0x18
[ 181.402446] RIP: 0033:0x7f5e88d78d70
[ 181.403213] RSP: 002b:00007fff44839548 EFLAGS: 00010246
[ 181.404319] RAX: 00007fff4483956f RBX: 00007fff44839550 RCX: 007361696c612e65
[ 181.405827] RDX: 0000000000000000 RSI: 00007f5e88e97733 RDI: 00007fff44839550
[ 181.407245] RBP: 00007fff44839790 R08: 656c61636f6c2f65 R09: feff7e5cff372c00
[ 181.408920] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000117df30
[ 181.410817] R13: 00007fff44839860 R14: 00007fff44839880 R15: 0000000000000000
On Fri, Apr 13, 2018 at 05:48:35PM +0200, Benjamin Warnke wrote:
> This patch series adds a new compression algorithm to the kernel and to
> the crypto api.
>
> Changes since v6:
> - Fixed git apply error due to other recently applied patches
>
> Changes since v5:
> - Fixed compile-error due to variable definitions inside #ifdef CONFIG_ZRAM_WRITEBACK
>
> Changes since v4:
> - Fix mismatching function-prototypes
> - Fix mismatching License errors
> - Add static to global vars
> - Add ULL to long constants
>
> Changes since v3:
> - Split patch into patchset
> - Add Zstd = Zstandard to the list of benchmarked algorithms
> - Added configurable compression levels to crypto-api
> - Added multiple compression levels to the benchmarks below
> - Added unsafe decompressor functions to crypto-api
> - Added flag to mark unstable algorithms to crypto-api
> - Test the code using afl-fuzz -> and fix the code
> - Added 2 new Benchmark datasets
> - checkpatch.pl fixes
>
> Changes since v2:
> - added linux-kernel Mailinglist
>
> Changes since v1:
> - improved documentation
> - improved code style
> - replaced numerous casts with get_unaligned*
> - added tests in crypto/testmgr.h/c
> - added zBeWalgo to the list of algorithms shown by
> /sys/block/zram0/comp_algorithm
>
>
> Currently ZRAM uses compression-algorithms from the crypto-api. ZRAM
> compresses each page individually. As a result the compression algorithm is
> forced to use a very small sliding window. None of the available compression
> algorithms is designed to achieve high compression ratios with small inputs.
>
> This patch-set adds a new compression algorithm 'zBeWalgo' to the crypto api.
> This algorithm focusses on increasing the capacity of the compressed
> block-device created by ZRAM. The choice of compression algorithms is always
> a tradeoff between speed and compression ratio.
>
> If faster algorithms like 'lz4' are chosen the compression ratio is often
> lower than the ratio of zBeWalgo as shown in the following benchmarks. Due to
> the lower compression ratio, ZRAM needs to fall back to backing_devices
> mode often. If backing_devices are required, the effective speed of ZRAM is a
> weighted average of de/compression time and writing/reading from the
> backing_device. This should be considered when comparing the speeds in the
> benchmarks.
>
> There are different kinds of backing_devices, each with its own drawbacks.
> 1. HDDs: This kind of backing device is very slow. If the compression ratio
> of an algorithm is much lower than the ratio of zBeWalgo, it might be faster
> to use zBeWalgo instead.
> 2. SSDs: I tested a swap partition on my NVMe SSD. The speed is even higher
> than zram with lz4, but after about 5 minutes the SSD is blocking all
> read/write requests due to overheating. This is definitely not an option.
>
>
> Benchmarks:
>
>
> To obtain reproducible benchmarks, the datasets were first loaded into a
> userspace program. Then the data is written directly to a clean
> zram-partition without any filesystem. Between writing and reading 'sync'
> and 'echo 3 > /proc/sys/vm/drop_caches' is called. All time measurements are
> wall clock times, and the benchmarks are using only one cpu-core at a time.
> The new algorithm is compared to all available compression algorithms from
> the crypto-api.
>
> Before loading the datasets into user-space, deduplication is applied, since
> none of the algorithms performs deduplication itself. Duplicated pages are
> removed to prevent an algorithm from obtaining high/low ratios just because a
> single page can be compressed very well - or not.
>
> All algorithms marked with '*' use unsafe decompression.
>
> All read and write speed measurements are given in MBit/s.
>
> zbewalgo' uses different combinations specialized per dataset. These can be
> specified at runtime via /sys/kernel/zbewalgo/combinations.
>
>
> - '/dev/zero' This dataset is used to measure the speed limitations
> for ZRAM. ZRAM filters zero-data internally and does not even call the
> specified compression algorithm.
>
> Algorithm write read
> --zram-- 2724.08 2828.87
>
>
> - 'ecoham' This dataset is one of the input files for the scientific
> application ECOHAM, which runs an ocean simulation. This dataset contains a
> lot of zeros - even after deduplication. Where the data is not zero there are
> arrays of floating point values, adjacent float values are likely to be
> similar to each other, allowing for high compression ratios.
>
> zbewalgo reaches very high compression ratios and is a lot faster than other
> algorithms with similar compression ratios.
>
> Algorithm ratio write read
> --hdd-- 1.00 134.70 156.62
> lz4*_10 6.73 1303.12 1547.17
> lz4_10 6.73 1303.12 1574.51
> lzo 6.88 1205.98 1468.09
> lz4*_05 7.00 1291.81 1642.41
> lz4_05 7.00 1291.81 1682.81
> lz4_07 7.13 1250.29 1593.89
> lz4*_07 7.13 1250.29 1677.08
> lz4_06 7.16 1307.62 1666.66
> lz4*_06 7.16 1307.62 1669.42
> lz4_03 7.21 1250.87 1449.48
> lz4*_03 7.21 1250.87 1621.97
> lz4*_04 7.23 1281.62 1645.56
> lz4_04 7.23 1281.62 1666.81
> lz4_02 7.33 1267.54 1523.11
> lz4*_02 7.33 1267.54 1576.54
> lz4_09 7.36 1140.55 1510.01
> lz4*_09 7.36 1140.55 1692.38
> lz4*_01 7.36 1215.40 1575.38
> lz4_01 7.36 1215.40 1676.65
> lz4_08 7.36 1242.73 1544.07
> lz4*_08 7.36 1242.73 1692.92
> lz4hc_01 7.51 235.85 1545.61
> lz4hc*_01 7.51 235.85 1678.00
> lz4hc_02 7.62 226.30 1697.42
> lz4hc*_02 7.62 226.30 1738.79
> lz4hc*_03 7.71 194.64 1711.58
> lz4hc_03 7.71 194.64 1713.59
> lz4hc*_04 7.76 177.17 1642.39
> lz4hc_04 7.76 177.17 1698.36
> deflate_1 7.80 84.71 584.89
> lz4hc*_05 7.81 149.11 1558.43
> lz4hc_05 7.81 149.11 1686.71
> deflate_2 7.82 82.83 599.38
> deflate_3 7.86 84.27 616.05
> lz4hc_06 7.88 106.61 1680.52
> lz4hc*_06 7.88 106.61 1739.78
> zstd_07 7.92 230.34 1016.91
> zstd_05 7.92 252.71 1070.46
> zstd_06 7.93 237.84 1062.11
> lz4hc*_07 7.94 75.22 1751.91
> lz4hc_07 7.94 75.22 1768.98
> zstd_04 7.94 403.21 1080.62
> zstd_03 7.94 411.91 1077.26
> zstd_01 7.94 455.89 1082.54
> zstd_09 7.94 456.81 1079.22
> zstd_08 7.94 459.54 1082.07
> zstd_02 7.94 465.82 1056.67
> zstd_11 7.95 150.15 1070.31
> zstd_10 7.95 169.95 1107.86
> lz4hc_08 7.98 49.53 1611.61
> lz4hc*_08 7.98 49.53 1793.68
> lz4hc_09 7.98 49.62 1629.63
> lz4hc*_09 7.98 49.62 1639.83
> lz4hc*_10 7.99 37.96 1742.65
> lz4hc_10 7.99 37.96 1790.08
> zbewalgo 8.02 38.58 237.92
> zbewalgo* 8.02 38.58 239.10
> 842 8.05 169.90 597.01
> zstd_13 8.06 129.78 1131.66
> zstd_12 8.06 135.50 1126.59
> deflate_4 8.16 71.14 546.52
> deflate_5 8.17 70.86 537.05
> zstd_17 8.19 61.46 1061.45
> zstd_14 8.20 124.43 1133.68
> zstd_18 8.21 56.82 1151.25
> zstd_19 8.22 51.51 1161.83
> zstd_20 8.24 44.26 1108.36
> zstd_16 8.25 76.26 1042.82
> zstd_15 8.25 86.65 1181.98
> deflate_6 8.28 66.45 619.62
> deflate_7 8.30 63.83 631.13
> zstd_21 8.41 6.73 1177.38
> zstd_22 8.46 2.23 1188.39
> deflate_9 8.47 44.16 678.43
> deflate_8 8.47 48.00 677.50
> zbewalgo' 8.80 634.68 1247.56
> zbewalgo'* 8.80 634.68 1429.42
>
>
> - 'source-code' This dataset is a tarball of the source code of a
> Linux kernel.
>
> zBeWalgo is very bad at compressing text-based datasets.
>
>
> Algorithm ratio write read
> --hdd-- 1.00 134.70 156.62
> lz4_10 1.49 584.41 1200.01
> lz4*_10 1.49 584.41 1251.79
> lz4*_07 1.64 559.05 1160.75
> lz4_07 1.64 559.05 1160.97
> 842 1.65 63.66 158.53
> lz4_06 1.71 513.03 1068.18
> lz4*_06 1.71 513.03 1162.68
> lz4_05 1.78 526.31 1136.51
> lz4*_05 1.78 526.31 1144.81
> lz4*_04 1.87 506.63 1106.31
> lz4_04 1.87 506.63 1132.96
> zbewalgo 1.89 27.56 35.04
> zbewalgo* 1.89 27.56 36.20
> zbewalgo' 1.89 46.62 34.75
> zbewalgo'* 1.89 46.62 36.34
> lz4_03 1.98 485.91 984.92
> lz4*_03 1.98 485.91 1125.68
> lz4_02 2.07 454.96 1061.05
> lz4*_02 2.07 454.96 1133.42
> lz4_01 2.17 441.11 1141.52
> lz4*_01 2.17 441.11 1146.26
> lz4*_08 2.17 446.45 1103.61
> lz4_08 2.17 446.45 1163.91
> lz4*_09 2.17 453.21 1071.91
> lz4_09 2.17 453.21 1155.43
> lzo 2.27 430.27 871.87
> lz4hc*_01 2.35 137.71 1089.94
> lz4hc_01 2.35 137.71 1200.45
> lz4hc_02 2.38 139.18 1117.44
> lz4hc*_02 2.38 139.18 1210.58
> lz4hc_03 2.39 127.09 1097.90
> lz4hc*_03 2.39 127.09 1214.22
> lz4hc_10 2.40 96.26 1203.89
> lz4hc*_10 2.40 96.26 1221.94
> lz4hc*_08 2.40 98.80 1191.79
> lz4hc_08 2.40 98.80 1226.59
> lz4hc*_09 2.40 102.36 1213.34
> lz4hc_09 2.40 102.36 1225.45
> lz4hc*_07 2.40 113.81 1217.63
> lz4hc_07 2.40 113.81 1218.49
> lz4hc*_06 2.40 117.32 1214.13
> lz4hc_06 2.40 117.32 1224.51
> lz4hc_05 2.40 122.12 1108.34
> lz4hc*_05 2.40 122.12 1214.97
> lz4hc*_04 2.40 124.91 1093.58
> lz4hc_04 2.40 124.91 1222.05
> zstd_01 2.93 200.01 401.15
> zstd_08 2.93 200.01 414.52
> zstd_09 2.93 200.26 394.83
> zstd_02 3.00 201.12 405.73
> deflate_1 3.01 53.83 240.64
> deflate_2 3.05 52.58 243.31
> deflate_3 3.08 52.07 244.84
> zstd_04 3.10 158.80 365.06
> zstd_03 3.10 169.56 405.92
> zstd_05 3.18 125.00 410.23
> zstd_06 3.20 106.50 404.81
> zstd_07 3.21 99.02 404.23
> zstd_15 3.22 24.95 376.58
> zstd_16 3.22 26.88 416.44
> deflate_4 3.22 45.26 225.56
> zstd_13 3.22 62.53 388.33
> zstd_14 3.22 64.15 391.81
> zstd_12 3.22 66.24 417.67
> zstd_11 3.22 66.44 404.31
> zstd_10 3.22 73.13 401.98
> zstd_17 3.24 14.66 412.00
> zstd_18 3.25 13.37 408.46
> deflate_5 3.26 43.54 252.18
> deflate_7 3.27 39.37 245.63
> deflate_6 3.27 42.51 251.33
> deflate_9 3.28 40.02 253.99
> deflate_8 3.28 40.10 253.98
> zstd_19 3.34 10.36 399.85
> zstd_22 3.35 4.88 353.63
> zstd_21 3.35 6.02 323.33
> zstd_20 3.35 8.34 339.81
>
>
> - 'hpcg' This dataset is a (partial) memory snapshot of the
> running HPCG benchmark. At the time of the snapshot, that application
> performed a sparse matrix-vector multiplication.
>
> The compression ratio of zBeWalgo on this dataset is nearly 3 times higher
> than the ratio of any other algorithm regardless of the compression-level
> specified.
>
> Algorithm ratio write read
> --hdd-- 1.00 134.70 156.62
> lz4*_10 1.00 1130.73 2131.82
> lz4_10 1.00 1130.73 2181.60
> lz4_06 1.34 625.48 1145.74
> lz4*_06 1.34 625.48 1145.90
> lz4_07 1.57 515.39 895.42
> lz4*_07 1.57 515.39 1062.53
> lz4*_05 1.72 539.40 1030.76
> lz4_05 1.72 539.40 1038.86
> lzo 1.76 475.20 805.41
> lz4_08 1.76 480.35 939.16
> lz4*_08 1.76 480.35 1015.04
> lz4*_03 1.76 488.05 893.13
> lz4_03 1.76 488.05 1013.65
> lz4*_09 1.76 501.49 1032.69
> lz4_09 1.76 501.49 1105.47
> lz4*_01 1.76 501.54 1040.72
> lz4_01 1.76 501.54 1102.22
> lz4*_02 1.76 510.79 1014.78
> lz4_02 1.76 510.79 1080.69
> lz4_04 1.76 516.18 1047.06
> lz4*_04 1.76 516.18 1049.55
> 842 2.35 109.68 192.50
> lz4hc_07 2.36 152.57 1265.77
> lz4hc*_07 2.36 152.57 1331.01
> lz4hc*_06 2.36 155.78 1313.85
> lz4hc_06 2.36 155.78 1346.52
> lz4hc*_08 2.36 158.80 1297.16
> lz4hc_08 2.36 158.80 1382.54
> lz4hc*_10 2.36 159.84 1317.81
> lz4hc_10 2.36 159.84 1346.85
> lz4hc*_03 2.36 160.01 1162.91
> lz4hc_03 2.36 160.01 1377.09
> lz4hc*_09 2.36 161.02 1320.87
> lz4hc_09 2.36 161.02 1374.39
> lz4hc*_05 2.36 164.67 1324.40
> lz4hc_05 2.36 164.67 1341.64
> lz4hc*_04 2.36 168.11 1323.19
> lz4hc_04 2.36 168.11 1377.56
> lz4hc_01 2.36 168.40 1231.55
> lz4hc*_01 2.36 168.40 1329.72
> lz4hc*_02 2.36 170.74 1316.54
> lz4hc_02 2.36 170.74 1337.42
> deflate_3 3.52 46.51 336.67
> deflate_2 3.52 62.05 343.03
> deflate_1 3.52 65.68 359.96
> deflate_4 4.01 61.01 432.66
> deflate_8 4.61 41.51 408.29
> deflate_5 4.61 44.09 434.79
> deflate_9 4.61 45.14 417.18
> deflate_7 4.61 45.22 440.27
> deflate_6 4.61 46.01 440.39
> zstd_09 5.95 277.11 542.93
> zstd_08 5.95 277.40 541.27
> zstd_01 5.95 277.41 540.61
> zstd_16 5.97 32.05 465.03
> zstd_15 5.97 39.12 515.07
> zstd_13 5.97 70.90 511.94
> zstd_14 5.97 72.20 522.68
> zstd_11 5.97 74.14 512.18
> zstd_12 5.97 74.27 497.95
> zstd_10 5.97 86.98 519.78
> zstd_07 5.97 135.16 504.07
> zstd_06 5.97 145.49 505.10
> zstd_05 6.02 177.86 510.08
> zstd_04 6.02 205.13 516.29
> zstd_03 6.02 217.82 515.50
> zstd_02 6.02 260.97 484.64
> zstd_18 6.27 12.10 490.72
> zstd_17 6.27 12.33 462.65
> zstd_21 6.70 9.25 391.16
> zstd_20 6.70 9.50 395.38
> zstd_22 6.70 9.74 390.99
> zstd_19 6.70 9.99 450.42
> zbewalgo 16.33 47.17 430.06
> zbewalgo* 16.33 47.17 436.92
> zbewalgo' 16.33 188.86 427.78
> zbewalgo'* 16.33 188.86 437.43
>
>
> - 'partdiff' (8 GiB) Array of double values. Adjacent doubles are similar, but
> not equal. This array is produced by a partial differential equation solver
> using a Jacobi implementation.
>
> zBeWalgo achieves higher compression ratios than all other algorithms.
> Some algorithms are even slower than an HDD without any compression at all.
>
> Algorithm ratio write read
> zstd_18 1.00 13.77 2080.06
> zstd_17 1.00 13.80 2075.23
> zstd_16 1.00 28.04 2138.99
> zstd_15 1.00 45.04 2143.32
> zstd_13 1.00 55.72 2128.27
> zstd_14 1.00 56.09 2123.54
> zstd_11 1.00 57.31 2095.04
> zstd_12 1.00 57.53 2134.61
> 842 1.00 61.61 2267.89
> zstd_10 1.00 80.40 2081.35
> zstd_07 1.00 120.66 2119.09
> zstd_06 1.00 128.80 2134.02
> zstd_05 1.00 131.25 2133.01
> --hdd-- 1.00 134.70 156.62
> lz4hc*_03 1.00 152.82 1982.94
> lz4hc_03 1.00 152.82 2261.55
> lz4hc*_07 1.00 159.43 1990.03
> lz4hc_07 1.00 159.43 2269.05
> lz4hc_10 1.00 166.33 2243.78
> lz4hc*_10 1.00 166.33 2260.63
> lz4hc_09 1.00 167.03 2244.20
> lz4hc*_09 1.00 167.03 2264.72
> lz4hc*_06 1.00 167.17 2245.15
> lz4hc_06 1.00 167.17 2271.88
> lz4hc_08 1.00 167.49 2237.79
> lz4hc*_08 1.00 167.49 2283.98
> lz4hc_02 1.00 167.51 2275.36
> lz4hc*_02 1.00 167.51 2279.72
> lz4hc*_05 1.00 167.52 2248.92
> lz4hc_05 1.00 167.52 2273.99
> lz4hc*_04 1.00 167.71 2268.23
> lz4hc_04 1.00 167.71 2268.78
> lz4hc*_01 1.00 167.91 2268.76
> lz4hc_01 1.00 167.91 2269.16
> zstd_04 1.00 175.84 2241.60
> zstd_03 1.00 176.35 2285.13
> zstd_02 1.00 195.41 2269.51
> zstd_09 1.00 199.47 2271.91
> zstd_01 1.00 199.74 2287.15
> zstd_08 1.00 199.87 2286.27
> lz4_01 1.00 1160.95 2257.78
> lz4*_01 1.00 1160.95 2275.42
> lz4_08 1.00 1164.37 2280.06
> lz4*_08 1.00 1164.37 2280.43
> lz4*_09 1.00 1166.30 2263.05
> lz4_09 1.00 1166.30 2280.54
> lz4*_03 1.00 1174.00 2074.96
> lz4_03 1.00 1174.00 2257.37
> lz4_02 1.00 1212.18 2273.60
> lz4*_02 1.00 1212.18 2285.66
> lz4*_04 1.00 1253.55 2259.60
> lz4_04 1.00 1253.55 2287.15
> lz4_05 1.00 1279.88 2282.47
> lz4*_05 1.00 1279.88 2287.05
> lz4_06 1.00 1292.22 2277.95
> lz4*_06 1.00 1292.22 2284.84
> lz4*_07 1.00 1303.58 2276.10
> lz4_07 1.00 1303.58 2276.99
> lz4*_10 1.00 1304.80 2183.30
> lz4_10 1.00 1304.80 2285.25
> lzo 1.00 1360.88 2281.19
> deflate_7 1.07 33.51 463.73
> deflate_2 1.07 33.99 473.07
> deflate_9 1.07 34.05 473.57
> deflate_6 1.07 34.06 473.69
> deflate_8 1.07 34.12 472.86
> deflate_5 1.07 34.22 468.03
> deflate_4 1.07 34.32 447.33
> deflate_1 1.07 35.45 431.95
> deflate_3 1.07 35.63 472.56
> zstd_22 1.11 9.81 668.64
> zstd_21 1.11 10.71 734.52
> zstd_20 1.11 10.78 714.86
> zstd_19 1.11 12.02 790.71
> zbewalgo 1.29 25.93 225.07
> zbewalgo* 1.29 25.93 226.72
> zbewalgo'* 1.31 23.54 84.29
> zbewalgo' 1.31 23.54 86.08
>
> - 'Isabella CLOUDf01'
> This dataset is an array of floating point values between 0.00000 and 0.00332.
> Detailed information about this dataset is available online at
> http://www.vets.ucar.edu/vg/isabeldata/readme.html
>
> All algorithms obtain similar compression ratios. The compression ratio of
> zBeWalgo is slightly higher, and the speed is higher too.
>
> Algorithm ratio write read
> --hdd-- 1.00 134.70 156.62
> lzo 2.06 1022.09 916.22
> lz4*_10 2.09 1126.03 1533.35
> lz4_10 2.09 1126.03 1569.06
> lz4*_07 2.09 1135.89 1444.21
> lz4_07 2.09 1135.89 1581.96
> lz4*_01 2.10 972.22 1405.21
> lz4_01 2.10 972.22 1579.78
> lz4*_09 2.10 982.39 1429.17
> lz4_09 2.10 982.39 1490.27
> lz4_08 2.10 1006.56 1491.14
> lz4*_08 2.10 1006.56 1558.66
> lz4_02 2.10 1019.82 1366.16
> lz4*_02 2.10 1019.82 1578.79
> lz4_03 2.10 1129.74 1417.33
> lz4*_03 2.10 1129.74 1456.68
> lz4_04 2.10 1131.28 1478.27
> lz4*_04 2.10 1131.28 1517.84
> lz4_06 2.10 1147.78 1424.90
> lz4*_06 2.10 1147.78 1462.47
> lz4*_05 2.10 1172.44 1434.86
> lz4_05 2.10 1172.44 1578.80
> lz4hc*_10 2.11 29.01 1498.01
> lz4hc_10 2.11 29.01 1580.23
> lz4hc*_09 2.11 56.30 1510.26
> lz4hc_09 2.11 56.30 1583.11
> lz4hc_08 2.11 56.39 1426.43
> lz4hc*_08 2.11 56.39 1565.12
> lz4hc_07 2.11 129.27 1540.38
> lz4hc*_07 2.11 129.27 1578.35
> lz4hc*_06 2.11 162.72 1456.27
> lz4hc_06 2.11 162.72 1581.69
> lz4hc*_05 2.11 183.78 1487.71
> lz4hc_05 2.11 183.78 1589.10
> lz4hc*_04 2.11 187.41 1431.35
> lz4hc_04 2.11 187.41 1566.24
> lz4hc*_03 2.11 190.21 1531.98
> lz4hc_03 2.11 190.21 1580.81
> lz4hc*_02 2.11 199.69 1432.00
> lz4hc_02 2.11 199.69 1565.10
> lz4hc_01 2.11 205.87 1540.33
> lz4hc*_01 2.11 205.87 1567.68
> 842 2.15 89.89 414.49
> deflate_1 2.29 48.84 352.09
> deflate_2 2.29 49.47 353.77
> deflate_3 2.30 50.00 345.88
> zstd_22 2.31 5.59 658.59
> zstd_21 2.31 14.34 664.02
> zstd_20 2.31 21.22 665.77
> zstd_19 2.31 24.26 587.99
> zstd_17 2.31 26.24 670.14
> zstd_18 2.31 26.47 668.64
> deflate_9 2.31 33.79 345.81
> deflate_8 2.31 34.67 347.96
> deflate_4 2.31 41.46 326.50
> deflate_7 2.31 42.56 346.99
> deflate_6 2.31 43.51 343.56
> deflate_5 2.31 45.83 343.86
> zstd_05 2.31 126.01 571.70
> zstd_04 2.31 178.39 597.26
> zstd_03 2.31 192.04 644.24
> zstd_01 2.31 206.31 563.68
> zstd_08 2.31 207.39 669.05
> zstd_02 2.31 216.98 600.77
> zstd_09 2.31 236.92 667.64
> zstd_16 2.32 41.47 660.06
> zstd_15 2.32 60.37 584.45
> zstd_14 2.32 74.60 673.10
> zstd_12 2.32 75.16 661.96
> zstd_13 2.32 75.22 676.12
> zstd_11 2.32 75.58 636.75
> zstd_10 2.32 95.05 645.07
> zstd_07 2.32 139.52 672.88
> zstd_06 2.32 145.40 670.45
> zbewalgo'* 2.37 337.07 463.32
> zbewalgo' 2.37 337.07 468.96
> zbewalgo* 2.60 101.17 578.35
> zbewalgo 2.60 101.17 586.88
>
>
> - 'Isabella TCf01'
> This dataset is an array of floating point values between -83.00402 and 31.51576.
> Detailed information about this dataset is available online at
> http://www.vets.ucar.edu/vg/isabeldata/readme.html
>
> zBeWalgo is the only algorithm that can compress this dataset with a
> noticeable compression ratio.
>
> Algorithm ratio write read
> 842 1.00 60.09 1956.26
> --hdd-- 1.00 134.70 156.62
> lz4hc_01 1.00 154.81 1839.37
> lz4hc*_01 1.00 154.81 2105.53
> lz4hc_10 1.00 157.33 2078.69
> lz4hc*_10 1.00 157.33 2113.14
> lz4hc_09 1.00 158.50 2018.51
> lz4hc*_09 1.00 158.50 2093.65
> lz4hc*_02 1.00 159.54 2104.91
> lz4hc_02 1.00 159.54 2117.34
> lz4hc_03 1.00 161.26 2070.76
> lz4hc*_03 1.00 161.26 2107.27
> lz4hc*_08 1.00 161.34 2100.74
> lz4hc_08 1.00 161.34 2105.26
> lz4hc*_04 1.00 161.95 2080.96
> lz4hc_04 1.00 161.95 2104.00
> lz4hc_05 1.00 162.17 2044.43
> lz4hc*_05 1.00 162.17 2101.74
> lz4hc*_06 1.00 163.61 2087.19
> lz4hc_06 1.00 163.61 2104.61
> lz4hc_07 1.00 164.51 2094.78
> lz4hc*_07 1.00 164.51 2105.53
> lz4_01 1.00 1134.89 2109.70
> lz4*_01 1.00 1134.89 2118.71
> lz4*_08 1.00 1141.96 2104.87
> lz4_08 1.00 1141.96 2118.97
> lz4_09 1.00 1145.55 2087.76
> lz4*_09 1.00 1145.55 2118.85
> lz4_02 1.00 1157.28 2094.33
> lz4*_02 1.00 1157.28 2124.67
> lz4*_03 1.00 1194.18 2106.36
> lz4_03 1.00 1194.18 2119.89
> lz4_04 1.00 1195.09 2117.03
> lz4*_04 1.00 1195.09 2120.23
> lz4*_05 1.00 1225.56 2109.04
> lz4_05 1.00 1225.56 2120.52
> lz4*_06 1.00 1261.67 2109.14
> lz4_06 1.00 1261.67 2121.13
> lz4*_07 1.00 1270.86 1844.63
> lz4_07 1.00 1270.86 2041.08
> lz4_10 1.00 1305.36 2109.22
> lz4*_10 1.00 1305.36 2120.65
> lzo 1.00 1338.61 2109.66
> zstd_17 1.03 13.93 1138.94
> zstd_18 1.03 14.01 1170.78
> zstd_16 1.03 27.12 1073.75
> zstd_15 1.03 43.52 1061.97
> zstd_14 1.03 49.60 1082.98
> zstd_12 1.03 55.03 1042.43
> zstd_13 1.03 55.14 1173.50
> zstd_11 1.03 55.24 1178.05
> zstd_10 1.03 70.01 1173.05
> zstd_07 1.03 118.10 1041.92
> zstd_06 1.03 123.00 1171.59
> zstd_05 1.03 124.61 1165.74
> zstd_01 1.03 166.80 1005.29
> zstd_04 1.03 170.25 1127.75
> zstd_03 1.03 171.40 1172.34
> zstd_02 1.03 174.08 1017.34
> zstd_09 1.03 195.30 1176.82
> zstd_08 1.03 195.98 1175.09
> deflate_9 1.05 30.15 483.55
> deflate_8 1.05 30.45 466.67
> deflate_5 1.05 31.25 480.92
> deflate_4 1.05 31.84 472.81
> deflate_7 1.05 31.84 484.18
> deflate_6 1.05 31.94 481.37
> deflate_2 1.05 33.07 484.09
> deflate_3 1.05 33.11 463.57
> deflate_1 1.05 33.19 469.71
> zstd_22 1.06 8.89 647.75
> zstd_21 1.06 10.70 700.11
> zstd_20 1.06 10.80 723.42
> zstd_19 1.06 12.41 764.24
> zbewalgo* 1.51 146.45 581.43
> zbewalgo 1.51 146.45 592.86
> zbewalgo'* 1.54 38.14 120.96
> zbewalgo' 1.54 38.14 125.81
>
>
> Signed-off-by: Benjamin Warnke <[email protected]>
>
> Benjamin Warnke (5):
> add compression algorithm zBeWalgo
> crypto: add zBeWalgo to crypto-api
> crypto: add unsafe decompression to api
> crypto: configurable compression level
> crypto: add flag for unstable encoding
>
> crypto/842.c | 3 +-
> crypto/Kconfig | 12 +
> crypto/Makefile | 1 +
> crypto/api.c | 76 ++++
> crypto/compress.c | 10 +
> crypto/crypto_null.c | 3 +-
> crypto/deflate.c | 19 +-
> crypto/lz4.c | 39 +-
> crypto/lz4hc.c | 36 +-
> crypto/lzo.c | 3 +-
> crypto/testmgr.c | 39 +-
> crypto/testmgr.h | 134 +++++++
> crypto/zbewalgo.c | 191 ++++++++++
> drivers/block/zram/zcomp.c | 13 +-
> drivers/block/zram/zcomp.h | 3 +-
> drivers/block/zram/zram_drv.c | 56 ++-
> drivers/block/zram/zram_drv.h | 2 +
> drivers/crypto/cavium/zip/zip_main.c | 6 +-
> drivers/crypto/nx/nx-842-powernv.c | 3 +-
> drivers/crypto/nx/nx-842-pseries.c | 3 +-
> fs/ubifs/compress.c | 2 +-
> include/linux/crypto.h | 31 +-
> include/linux/zbewalgo.h | 50 +++
> lib/Kconfig | 3 +
> lib/Makefile | 1 +
> lib/zbewalgo/BWT.c | 120 ++++++
> lib/zbewalgo/BWT.h | 21 ++
> lib/zbewalgo/JBE.c | 204 ++++++++++
> lib/zbewalgo/JBE.h | 13 +
> lib/zbewalgo/JBE2.c | 221 +++++++++++
> lib/zbewalgo/JBE2.h | 13 +
> lib/zbewalgo/MTF.c | 122 ++++++
> lib/zbewalgo/MTF.h | 13 +
> lib/zbewalgo/Makefile | 4 +
> lib/zbewalgo/RLE.c | 137 +++++++
> lib/zbewalgo/RLE.h | 13 +
> lib/zbewalgo/bewalgo.c | 401 ++++++++++++++++++++
> lib/zbewalgo/bewalgo.h | 13 +
> lib/zbewalgo/bewalgo2.c | 407 ++++++++++++++++++++
> lib/zbewalgo/bewalgo2.h | 13 +
> lib/zbewalgo/bitshuffle.c | 93 +++++
> lib/zbewalgo/bitshuffle.h | 13 +
> lib/zbewalgo/huffman.c | 262 +++++++++++++
> lib/zbewalgo/huffman.h | 13 +
> lib/zbewalgo/include.h | 94 +++++
> lib/zbewalgo/zbewalgo.c | 713 +++++++++++++++++++++++++++++++++++
> mm/zswap.c | 2 +-
> net/xfrm/xfrm_ipcomp.c | 3 +-
> 48 files changed, 3605 insertions(+), 42 deletions(-)
> create mode 100644 crypto/zbewalgo.c
> create mode 100644 include/linux/zbewalgo.h
> create mode 100644 lib/zbewalgo/BWT.c
> create mode 100644 lib/zbewalgo/BWT.h
> create mode 100644 lib/zbewalgo/JBE.c
> create mode 100644 lib/zbewalgo/JBE.h
> create mode 100644 lib/zbewalgo/JBE2.c
> create mode 100644 lib/zbewalgo/JBE2.h
> create mode 100644 lib/zbewalgo/MTF.c
> create mode 100644 lib/zbewalgo/MTF.h
> create mode 100644 lib/zbewalgo/Makefile
> create mode 100644 lib/zbewalgo/RLE.c
> create mode 100644 lib/zbewalgo/RLE.h
> create mode 100644 lib/zbewalgo/bewalgo.c
> create mode 100644 lib/zbewalgo/bewalgo.h
> create mode 100644 lib/zbewalgo/bewalgo2.c
> create mode 100644 lib/zbewalgo/bewalgo2.h
> create mode 100644 lib/zbewalgo/bitshuffle.c
> create mode 100644 lib/zbewalgo/bitshuffle.h
> create mode 100644 lib/zbewalgo/huffman.c
> create mode 100644 lib/zbewalgo/huffman.h
> create mode 100644 lib/zbewalgo/include.h
> create mode 100644 lib/zbewalgo/zbewalgo.c
>
> --
> 2.14.1
>