Hi,
This is a kernel port of the LZO1X-1 compressor and LZO1X decompressor
(safe version only).
* Changes since 'take 5' (Full Changelog after this):
- Added compressor and decompressor as separate and hidden config
options (default: n)
- Cleanups: removed LZO_CHECK_MPOS_NON_DET macro
- removed PTR* macros.
* Benchmarks: (system: P4 3GHz, 1G RAM)
(Using tester program from Daniel)
Following compares this kernel port ('take 6') vs original miniLZO code:
'TinyLZO' refers to this kernel port.
10000 run averages:
'Tiny LZO':
Combined: 61.2223 usec
Compression: 41.8412 usec
Decompression: 19.3811 usec
'miniLZO':
Combined: 66.0444 usec
Compression: 46.6323 usec
Decompression: 19.4121 usec
Result:
Overall: TinyLZO is 7.3% faster
Compressor: TinyLZO is 10.2% faster
Decompressor: TinyLZO is 0.15% faster
TODO: test 'take 6' port on archs other than x86(_32)
* Changelog vs. original LZO:
1) Used standard/kernel defined data types: (this eliminated _huge_
#ifdef chunks)
lzo_bytep -> unsigned char *
lzo_uint -> size_t
lzo_xint -> size_t
lzo_uint32p -> u32 *
lzo_uintptr_t -> unsigned long
2) Removed everything #ifdef'ed under COPY_DICT (this is not set for
LZO1X, so removed the corresponding parts).
3) Removed code #ifdef'ed for LZO1Y, LZO1Z, other variants.
4) Reformatted the code to match general kernel style.
5) The only code change: (as suggested by Andrey)
-#if defined(__LITTLE_ENDIAN)
- m_pos = op - 1;
- m_pos -= (*(const unsigned short *)ip) >> 2;
-#else
- m_pos = op - 1;
- m_pos -= (ip[0] >> 2) + (ip[1] << 6);
-#endif
+ m_pos = op - 1 - (cpu_to_le16(*(const u16 *)ip) >> 2);
(Andrey suggested le16_to_cpu for the above; that is the semantically correct
helper for reading a little-endian value, though for 16 bits cpu_to_le16
performs the same byte swap, so the result is identical either way.)
*** Need testing on big endian machine ***
Similarly:
-#if defined(__LITTLE_ENDIAN)
- m_pos -= (*(const unsigned short *)ip) >> 2;
-#else
- m_pos -= (ip[0] >> 2) + (ip[1] << 6);
-#endif
+ m_pos -= cpu_to_le16(*(const u16 *)ip) >> 2;
6) Get rid of LZO_CHECK_MPOS_NON_DET macro and PTR* macros.
I think it's now ready for mainline inclusion.
include/linux/lzo1x.h | 66 +++++++++++
lib/Kconfig | 6 +
lib/Makefile | 2 +
lib/lzo1x/Makefile | 2 +
lib/lzo1x/lzo1x_compress.c | 259 ++++++++++++++++++++++++++++++++++++++++++
lib/lzo1x/lzo1x_decompress.c | 238 ++++++++++++++++++++++++++++++++++++++
lib/lzo1x/lzo1x_int.h | 85 ++++++++++++++
7 files changed, 658 insertions(+), 0 deletions(-)
Signed-off-by: Nitin Gupta <nitingupta910 [at] gmail [dot] com>
---
diff --git a/include/linux/lzo1x.h b/include/linux/lzo1x.h
new file mode 100755
index 0000000..11a6f23
--- /dev/null
+++ b/include/linux/lzo1x.h
@@ -0,0 +1,66 @@
+/* lzo1x.h -- public interface of the LZO1X compression algorithm
+
+ This file is part of the LZO real-time data compression library.
+
+ Copyright (C) 1996-2005 Markus Franz Xaver Johannes Oberhumer
+ All Rights Reserved.
+
+ The LZO library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ The LZO library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with the LZO library; see the file COPYING.
+ If not, write to the Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+
+ Markus F.X.J. Oberhumer
+ <[email protected]>
+ http://www.oberhumer.com/opensource/lzo/
+
+
+ This file is modified version of lzo1x.h found in original LZO 2.02
+ code. Some additional changes have also been made to make it work
+ in kernel space.
+
+ Nitin Gupta
+ <[email protected]>
+ */
+
+#ifndef __LZO1X_H
+#define __LZO1X_H
+
+/* LZO return codes */
+#define LZO_E_OK 0
+#define LZO_E_ERROR (-1)
+#define LZO_E_OUT_OF_MEMORY (-2) /* [not used right now] */
+#define LZO_E_NOT_COMPRESSIBLE (-3) /* [not used right now] */
+#define LZO_E_INPUT_OVERRUN (-4)
+#define LZO_E_OUTPUT_OVERRUN (-5)
+#define LZO_E_LOOKBEHIND_OVERRUN (-6)
+#define LZO_E_EOF_NOT_FOUND (-7)
+#define LZO_E_INPUT_NOT_CONSUMED (-8)
+#define LZO_E_NOT_YET_IMPLEMENTED (-9) /* [not used right now] */
+
+/* Size of temp buffer (workmem) required by lzo1x_compress */
+#define LZO1X_WORKMEM_SIZE ((size_t) (16384L * sizeof(unsigned char *)))
+
+/*
+ * This requires 'workmem' of size LZO1X_WORKMEM_SIZE
+ */
+int lzo1x_compress(const unsigned char *src, size_t src_len,
+ unsigned char *dst, size_t *dst_len,
+ void *workmem);
+
+/*
+ * This decompressor will catch all compressed data violations and
+ * return an error code in this case.
+ */
+int lzo1x_decompress(const unsigned char *src, size_t src_len,
+ unsigned char *dst, size_t *dst_len);
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index 2e7ae6b..eb95eaa 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -64,6 +64,12 @@ config ZLIB_INFLATE
config ZLIB_DEFLATE
tristate
+config LZO1X_COMPRESS
+ tristate
+
+config LZO1X_DECOMPRESS
+ tristate
+
#
# Generic allocator support is selected if needed
#
diff --git a/lib/Makefile b/lib/Makefile
index c8c8e20..448ae37 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -49,6 +49,8 @@ obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
+obj-$(CONFIG_LZO1X_COMPRESS) += lzo1x/
+obj-$(CONFIG_LZO1X_DECOMPRESS) += lzo1x/
obj-$(CONFIG_TEXTSEARCH) += textsearch.o
obj-$(CONFIG_TEXTSEARCH_KMP) += ts_kmp.o
diff --git a/lib/lzo1x/Makefile b/lib/lzo1x/Makefile
new file mode 100644
index 0000000..7b56a4d
--- /dev/null
+++ b/lib/lzo1x/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_LZO1X_COMPRESS) += lzo1x_compress.o
+obj-$(CONFIG_LZO1X_DECOMPRESS) += lzo1x_decompress.o
diff --git a/lib/lzo1x/lzo1x_compress.c b/lib/lzo1x/lzo1x_compress.c
new file mode 100755
index 0000000..b525be6
--- /dev/null
+++ b/lib/lzo1x/lzo1x_compress.c
@@ -0,0 +1,259 @@
+/* lzo1x_compress.c -- LZO1X-1 compression
+
+ This file is part of the LZO real-time data compression library.
+
+ Copyright (C) 1996-2005 Markus Franz Xaver Johannes Oberhumer
+ All Rights Reserved.
+
+ The LZO library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ The LZO library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with the LZO library; see the file COPYING.
+ If not, write to the Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+
+ Markus F.X.J. Oberhumer
+ <[email protected]>
+ http://www.oberhumer.com/opensource/lzo/
+
+
+ This file is derived from lzo1x_1.c and lzo1x_c.ch found in original
+ LZO 2.02 code. Some additional changes have also been made to make
+ it work in kernel space.
+
+ Nitin Gupta
+ <[email protected]>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/compiler.h>
+#include <linux/lzo1x.h>
+
+#include "lzo1x_int.h"
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("LZO1X Compression");
+
+/* compress a block of data. */
+static noinline unsigned int
+lzo1x_compress_worker(const unsigned char *in, size_t in_len,
+ unsigned char *out, size_t *out_len,
+ void *workmem)
+{
+ register const unsigned char *ip;
+ unsigned char *op;
+ const unsigned char * const in_end = in + in_len;
+ const unsigned char * const ip_end = in + in_len - M2_MAX_LEN - 5;
+ const unsigned char *ii;
+ const unsigned char ** const dict = (const unsigned char **)workmem;
+
+ op = out;
+ ip = in;
+ ii = ip;
+
+ ip += 4;
+ for (;;) {
+ register const unsigned char *m_pos;
+ size_t m_off;
+ size_t m_len;
+ size_t dindex;
+
+ DINDEX1(dindex, ip);
+ m_pos = dict[dindex];
+
+ if ((m_pos < in) || (m_off = (size_t)(ip - m_pos)) <= 0
+ || m_off > M4_MAX_OFFSET)
+ goto literal;
+
+ if (m_off <= M2_MAX_OFFSET || m_pos[3] == ip[3])
+ goto try_match;
+
+ DINDEX2(dindex, ip);
+ m_pos = dict[dindex];
+
+ if ((m_pos < in) || (m_off = (size_t)(ip - m_pos)) <= 0
+ || m_off > M4_MAX_OFFSET)
+ goto literal;
+
+ if (m_off <= M2_MAX_OFFSET || m_pos[3] == ip[3])
+ goto try_match;
+
+ goto literal;
+
+try_match:
+ if (*(const unsigned short *)m_pos ==
+ *(const unsigned short *)ip) {
+ if (likely(m_pos[2] == ip[2]))
+ goto match;
+ }
+
+ /* a literal */
+literal:
+ dict[dindex] = ip;
+ ++ip;
+ if (unlikely(ip >= ip_end))
+ break;
+ continue;
+
+ /* a match */
+match:
+ dict[dindex] = ip;
+ /* store current literal run */
+ if ((size_t)(ip - ii) > 0) {
+ register size_t t = (size_t)(ip - ii);
+ if (t <= 3)
+ op[-2] |= (unsigned char)t;
+ else if (t <= 18)
+ *op++ = (unsigned char)(t - 3);
+ else {
+ register size_t tt = t - 18;
+ *op++ = 0;
+ while (tt > 255) {
+ tt -= 255;
+ *op++ = 0;
+ }
+ *op++ = (unsigned char)tt;
+ }
+ do
+ *op++ = *ii++;
+ while (--t > 0);
+ }
+
+ /* code the match */
+ ip += 3;
+ if (m_pos[3] != *ip++ || m_pos[4] != *ip++ ||
+ m_pos[5] != *ip++ || m_pos[6] != *ip++ ||
+ m_pos[7] != *ip++ || m_pos[8] != *ip++) {
+ --ip;
+ m_len = (size_t)(ip - ii);
+
+ if (m_off <= M2_MAX_OFFSET) {
+ m_off -= 1;
+ *op++ = (unsigned char)(((m_len - 1) << 5) |
+ ((m_off & 7) << 2));
+ *op++ = (unsigned char)(m_off >> 3);
+ }
+ else if (m_off <= M3_MAX_OFFSET) {
+ m_off -= 1;
+ *op++ = (unsigned char)(M3_MARKER |
+ (m_len - 2));
+ goto m3_m4_offset;
+ } else {
+ m_off -= 0x4000;
+ *op++ = (unsigned char)(M4_MARKER |
+ ((m_off & 0x4000) >> 11) |
+ (m_len - 2));
+ goto m3_m4_offset;
+ }
+ } else {
+ const unsigned char *end = in_end;
+ const unsigned char *m = m_pos + M2_MAX_LEN + 1;
+ while (ip < end && *m == *ip)
+ m++, ip++;
+ m_len = (size_t)(ip - ii);
+
+ if (m_off <= M3_MAX_OFFSET) {
+ m_off -= 1;
+ if (m_len <= 33)
+ *op++ = (unsigned char)(M3_MARKER |
+ (m_len - 2));
+ else {
+ m_len -= 33;
+ *op++ = M3_MARKER | 0;
+ goto m3_m4_len;
+ }
+ } else {
+ m_off -= 0x4000;
+ if (m_len <= M4_MAX_LEN)
+ *op++ = (unsigned char)(M4_MARKER |
+ ((m_off & 0x4000) >> 11) |
+ (m_len - 2));
+ else {
+ m_len -= M4_MAX_LEN;
+ *op++ = (unsigned char)(M4_MARKER |
+ ((m_off & 0x4000) >> 11));
+m3_m4_len:
+ while (m_len > 255) {
+ m_len -= 255;
+ *op++ = 0;
+ }
+ *op++ = (unsigned char)(m_len);
+ }
+ }
+
+m3_m4_offset:
+ *op++ = (unsigned char)((m_off & 63) << 2);
+ *op++ = (unsigned char)(m_off >> 6);
+ }
+
+ ii = ip;
+ if (unlikely(ip >= ip_end))
+ break;
+ }
+
+ *out_len = (size_t)(op - out);
+ return (size_t)(in_end - ii);
+}
+
+
+/*
+ * This requires buffer (workmem) of size LZO1X_WORKMEM_SIZE
+ * (exported by lzo1x.h).
+ */
+int
+lzo1x_compress(const unsigned char *in, size_t in_len,
+ unsigned char *out, size_t *out_len,
+ void *workmem)
+{
+ unsigned char *op = out;
+ size_t t;
+
+ if (!workmem)
+ return -EINVAL;
+
+ if (unlikely(in_len <= M2_MAX_LEN + 5))
+ t = in_len;
+ else {
+ t = lzo1x_compress_worker(in, in_len, op, out_len, workmem);
+ op += *out_len;
+ }
+
+ if (t > 0) {
+ const unsigned char *ii = in + in_len - t;
+
+ if (op == out && t <= 238)
+ *op++ = (unsigned char)(17 + t);
+ else if (t <= 3)
+ op[-2] |= (unsigned char)t;
+ else if (t <= 18)
+ *op++ = (unsigned char)(t - 3);
+ else {
+ size_t tt = t - 18;
+ *op++ = 0;
+ while (tt > 255) {
+ tt -= 255;
+ *op++ = 0;
+ }
+ *op++ = (unsigned char)tt;
+ }
+ do
+ *op++ = *ii++;
+ while (--t > 0);
+ }
+ *op++ = M4_MARKER | 1;
+ *op++ = 0;
+ *op++ = 0;
+
+ *out_len = (size_t)(op - out);
+ return LZO_E_OK;
+}
+
+EXPORT_SYMBOL(lzo1x_compress);
diff --git a/lib/lzo1x/lzo1x_decompress.c b/lib/lzo1x/lzo1x_decompress.c
new file mode 100755
index 0000000..75ce294
--- /dev/null
+++ b/lib/lzo1x/lzo1x_decompress.c
@@ -0,0 +1,238 @@
+/* lzo1x_decompress.c -- LZO1X decompression
+
+ This file is part of the LZO real-time data compression library.
+
+ Copyright (C) 1996-2005 Markus Franz Xaver Johannes Oberhumer
+ All Rights Reserved.
+
+ The LZO library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ The LZO library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with the LZO library; see the file COPYING.
+ If not, write to the Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+
+ Markus F.X.J. Oberhumer
+ <[email protected]>
+ http://www.oberhumer.com/opensource/lzo/
+
+
+ This file is derived from lzo1x_d1.c and lzo1x_d.ch found in original
+ LZO 2.02 code. Some additional changes have also been made to make
+ it work in kernel space.
+
+ Nitin Gupta
+ <[email protected]>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <asm/byteorder.h>
+#include <linux/lzo1x.h>
+
+#include "lzo1x_int.h"
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("LZO1X Decompression");
+
+int
+lzo1x_decompress(const unsigned char *in, size_t in_len,
+ unsigned char *out, size_t *out_len)
+{
+ register size_t t;
+ register unsigned char *op = out;
+ register const unsigned char *ip = in, *m_pos;
+ const unsigned char * const ip_end = in + in_len;
+ unsigned char * const op_end = out + *out_len;
+ *out_len = 0;
+
+ if (*ip > 17) {
+ t = *ip++ - 17;
+ if (t < 4)
+ goto match_next;
+ NEED_OP(t);
+ NEED_IP(t + 1);
+ do
+ *op++ = *ip++;
+ while (--t > 0);
+ goto first_literal_run;
+ }
+
+ while (TEST_IP) {
+ t = *ip++;
+ if (t >= 16)
+ goto match;
+ /* a literal run */
+ if (t == 0) {
+ NEED_IP(1);
+ while (*ip == 0) {
+ t += 255;
+ ip++;
+ NEED_IP(1);
+ }
+ t += 15 + *ip++;
+ }
+ /* copy literals */
+ NEED_OP(t + 3);
+ NEED_IP(t + 4);
+ COPY4(op, ip);
+ op += 4; ip += 4;
+ if (--t > 0) {
+ if (t >= 4) {
+ do {
+ COPY4(op, ip);
+ op += 4; ip += 4; t -= 4;
+ } while (t >= 4);
+ if (t > 0)
+ do
+ *op++ = *ip++;
+ while (--t > 0);
+ }
+ else
+ do
+ *op++ = *ip++;
+ while (--t > 0);
+ }
+
+first_literal_run:
+ t = *ip++;
+ if (t >= 16)
+ goto match;
+ m_pos = op - (1 + M2_MAX_OFFSET);
+ m_pos -= t >> 2;
+ m_pos -= *ip++ << 2;
+ TEST_LB(m_pos);
+ NEED_OP(3);
+ *op++ = *m_pos++;
+ *op++ = *m_pos++;
+ *op++ = *m_pos;
+ goto match_done;
+
+ /* handle matches */
+ do {
+match:
+ if (t >= 64) { /* a M2 match */
+ m_pos = op - 1;
+ m_pos -= (t >> 2) & 7;
+ m_pos -= *ip++ << 3;
+ t = (t >> 5) - 1;
+ TEST_LB(m_pos);
+ NEED_OP(t + 3 - 1);
+ goto copy_match;
+ } else if (t >= 32) { /* a M3 match */
+ t &= 31;
+ if (t == 0) {
+ NEED_IP(1);
+ while (*ip == 0) {
+ t += 255;
+ ip++;
+ NEED_IP(1);
+ }
+ t += 31 + *ip++;
+ }
+ m_pos = op - 1 - (cpu_to_le16(
+ *(const unsigned short *)ip) >> 2);
+ ip += 2;
+ } else if (t >= 16) { /* a M4 match */
+ m_pos = op;
+ m_pos -= (t & 8) << 11;
+ t &= 7;
+ if (t == 0) {
+ NEED_IP(1);
+ while (*ip == 0) {
+ t += 255;
+ ip++;
+ NEED_IP(1);
+ }
+ t += 7 + *ip++;
+ }
+ m_pos -= cpu_to_le16(
+ *(const unsigned short *)ip) >> 2;
+ ip += 2;
+ if (m_pos == op)
+ goto eof_found;
+ m_pos -= 0x4000;
+ } else { /* a M1 match */
+ m_pos = op - 1;
+ m_pos -= t >> 2;
+ m_pos -= *ip++ << 2;
+ TEST_LB(m_pos);
+ NEED_OP(2);
+ *op++ = *m_pos++;
+ *op++ = *m_pos;
+ goto match_done;
+ }
+
+ /* copy match */
+ TEST_LB(m_pos);
+ NEED_OP(t + 3 - 1);
+
+ if (t >= 2 * 4 - (3 - 1) && (op - m_pos) >= 4) {
+ COPY4(op, m_pos);
+ op += 4; m_pos += 4; t -= 4 - (3 - 1);
+ do {
+ COPY4(op, m_pos);
+ op += 4; m_pos += 4; t -= 4;
+ } while (t >= 4);
+ if (t > 0)
+ do *op++ = *m_pos++;
+ while (--t > 0);
+ } else {
+copy_match:
+ *op++ = *m_pos++;
+ *op++ = *m_pos++;
+ do
+ *op++ = *m_pos++;
+ while (--t > 0);
+ }
+
+match_done:
+ t = ip[-2] & 3;
+ if (t == 0)
+ break;
+
+ /* copy literals */
+match_next:
+ NEED_OP(t);
+ NEED_IP(t + 1);
+ *op++ = *ip++;
+ if (t > 1) {
+ *op++ = *ip++;
+ if (t > 2)
+ *op++ = *ip++;
+ }
+ t = *ip++;
+ } while (TEST_IP);
+ }
+
+ /* no EOF code was found */
+ *out_len = (size_t)(op - out);
+ return LZO_E_EOF_NOT_FOUND;
+
+eof_found:
+ *out_len = (size_t)(op - out);
+ return (ip == ip_end ? LZO_E_OK :
+ (ip < ip_end ? LZO_E_INPUT_NOT_CONSUMED :
+ LZO_E_INPUT_OVERRUN));
+
+input_overrun:
+ *out_len = (size_t)(op - out);
+ return LZO_E_INPUT_OVERRUN;
+
+output_overrun:
+ *out_len = (size_t)(op - out);
+ return LZO_E_OUTPUT_OVERRUN;
+
+lookbehind_overrun:
+ *out_len = (size_t)(op - out);
+ return LZO_E_LOOKBEHIND_OVERRUN;
+}
+
+EXPORT_SYMBOL(lzo1x_decompress);
diff --git a/lib/lzo1x/lzo1x_int.h b/lib/lzo1x/lzo1x_int.h
new file mode 100755
index 0000000..4f0fe8d
--- /dev/null
+++ b/lib/lzo1x/lzo1x_int.h
@@ -0,0 +1,85 @@
+/* lzo1x_int.h -- to be used internally by LZO de/compression algorithms
+
+ This file is part of the LZO real-time data compression library.
+
+ Copyright (C) 1996-2005 Markus Franz Xaver Johannes Oberhumer
+ All Rights Reserved.
+
+ The LZO library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ The LZO library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with the LZO library; see the file COPYING.
+ If not, write to the Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+
+ Markus F.X.J. Oberhumer
+ <[email protected]>
+ http://www.oberhumer.com/opensource/lzo/
+
+
+ This file was derived from several header files found in original
+ LZO 2.02 code. Some additional changes have also been made to make
+ it work in kernel space.
+
+ Nitin Gupta
+ <[email protected]>
+ */
+
+#ifndef __LZO1X_INT_H
+#define __LZO1X_INT_H
+
+#include <linux/types.h>
+
+#define D_BITS 14
+#define D_SIZE (1u << D_BITS)
+#define D_MASK (D_SIZE - 1)
+#define D_HIGH ((D_MASK >> 1) + 1)
+
+#define DX2(p,s1,s2) \
+ (((((size_t)((p)[2]) << (s2)) ^ (p)[1]) << (s1)) ^ (p)[0])
+#define DX3(p,s1,s2,s3) \
+ ((DX2((p) + 1, s2, s3) << (s1)) ^ (p)[0])
+#define DINDEX1(d,p) \
+ d = ((size_t)(0x21 * DX3(p, 5, 5, 6)) >> 5) & D_MASK
+#define DINDEX2(d,p) \
+ d = (d & (D_MASK & 0x7ff)) ^ (D_HIGH | 0x1f)
+
+#define COPY4(dst,src) *(u32 *)(dst) = *(u32 *)(src)
+
+/* LZO1X Specific constants */
+#define M1_MAX_OFFSET 0x0400
+#define M2_MAX_OFFSET 0x0800
+#define M3_MAX_OFFSET 0x4000
+#define M4_MAX_OFFSET 0xbfff
+
+#define M1_MIN_LEN 2
+#define M1_MAX_LEN 2
+#define M2_MIN_LEN 3
+#define M2_MAX_LEN 8
+#define M3_MIN_LEN 3
+#define M3_MAX_LEN 33
+#define M4_MIN_LEN 3
+#define M4_MAX_LEN 9
+
+#define M1_MARKER 0
+#define M2_MARKER 64
+#define M3_MARKER 32
+#define M4_MARKER 16
+
+/* Bounds checking */
+#define TEST_IP (ip < ip_end)
+#define NEED_IP(x) \
+ if ((size_t)(ip_end - ip) < (size_t)(x)) goto input_overrun
+#define NEED_OP(x) \
+ if ((size_t)(op_end - op) < (size_t)(x)) goto output_overrun
+#define TEST_LB(m_pos) \
+ if (m_pos < out || m_pos >= op) goto lookbehind_overrun
+
+#endif
Hi,
Attached is tester code used for testing.
(developed by Daniel Hazelton -- modified slightly to now use 'take 6'
version for 'TinyLZO')
Cheers,
Nitin
On 5/28/07, Nitin Gupta <[email protected]> wrote:
> (Using tester program from Daniel)
>
> Following compares this kernel port ('take 6') vs original miniLZO code:
>
> 'TinyLZO' refers to this kernel port.
>
> 10000 run averages:
> 'Tiny LZO':
> Combined: 61.2223 usec
> Compression: 41.8412 usec
> Decompression: 19.3811 usec
> 'miniLZO':
> Combined: 66.0444 usec
> Compression: 46.6323 usec
> Decompression: 19.4121 usec
>
> Result:
> Overall: TinyLZO is 7.3% faster
> Compressor: TinyLZO is 10.2% faster
> Decompressor: TinyLZO is 0.15% faster
>
On Monday 28 May 2007 10:40:31 Nitin Gupta wrote:
> Hi,
>
> Attached is tester code used for testing.
> (developed by Daniel Hazelton -- modified slightly to now use 'take 6'
> version for 'TinyLZO')
>
> Cheers,
> Nitin
>
<snip>
I haven't tested with version 6, but after removing the LZO_CHECK_MPOS_NON_DET
macro from the 'take 5' code and replacing the open-coded byte-for-byte
copies with calls to memcpy:
10000 run averages:
'Tiny LZO':
Combined: 57.4691 usec
Compression: 39.8837 usec
Decompression: 17.5854 usec
'miniLZO':
Combined: 64.0484 usec
Compression: 46.0604 usec
Decompression: 17.988 usec
which means:
Overall TinyLZO is 10.2% faster
TinyLZO compresses 13.4% faster
TinyLZO decompresses 2.23% faster
-Benchmark run on a Pentium-M 1.73GHz, 1GB RAM
With the speed-up seen from just the removal of LZO_CHECK_MPOS_NON_DET, I
wasn't sure that changing the open-coded copy to a call to memcpy() was going
to have a big impact on the code, but it does appear to have made several
percentage points of difference. (Though I am unsure about the speed increase
in decompression - I didn't touch that file when making this set of
changes. I'm going to make the memcpy() changes to 'take 6' and see if the
speed-up holds true.)
DRH
On 5/28/07, Daniel Hazelton <[email protected]> wrote:
> On Monday 28 May 2007 10:40:31 Nitin Gupta wrote:
> > Hi,
> >
> > Attached is tester code used for testing.
> > (developed by Daniel Hazelton -- modified slightly to now use 'take 6'
> > version for 'TinyLZO')
> >
> > Cheers,
> > Nitin
> >
>
> <snip>
> I haven't tested with version 6, but after removing the LZO_CHECK_MPOS_NON_DET
> macro from the 'take 5' code and replacing the open-coded byte-for-byte
> copies with calls to memcpy:
>
I did the memcpy() changes in some initial post ('take 2', I think). That
caused a _correctness_ issue in the de/compressor code -- Bret's test
could not succeed on a ppc machine. After going back to byte-by-byte
copying, his tests were successful.
So, it's better not to include such changes now, or to test on all
supported archs if you really want to do so :)
> 10000 run averages:
> 'Tiny LZO':
> Combined: 57.4691 usec
> Compression: 39.8837 usec
> Decompression: 17.5854 usec
> 'miniLZO':
> Combined: 64.0484 usec
> Compression: 46.0604 usec
> Decompression: 17.988 usec
>
> which means:
> Overall TinyLZO is 10.2% faster
> TinyLZO compresses 13.4% faster
> TinyLZO decompresses 2.23% faster
>
> -Benchmark run on a Pentium-M 1.73GHz, 1GB RAM
> With the speed-up seen from just the removal of LZO_CHECK_MPOS_NON_DET, I
> wasn't sure that changing the open-coded copy to a call to memcpy() was going
> to have a big impact on the code, but it does appear to have made several
> percentage points of difference.
Yes, the memcpy() changes have the potential of giving a significant perf gain,
but I am not too sure if memcpy() will be good when we want to copy just a
few bytes (which is the case in many places in the de/compressor). Also, at
some places, the memcpy() changes are not trivial and have actually caused
correctness issues, as mentioned above.
So, before this change, it will be good if it gets merged in mainline
and tested, at least for correctness, on all supported archs. All the
while, we will have a good feeling that there is still a good scope
for perf improvement :)
Cheers,
Nitin
On Mon, May 28, 2007 at 08:10:31PM +0530, Nitin Gupta wrote:
> Hi,
>
> Attached is tester code used for testing.
> (developed by Daniel Hazelton -- modified slightly to now use 'take 6'
> version for 'TinyLZO')
>
> Cheers,
> Nitin
>
> On 5/28/07, Nitin Gupta <[email protected]> wrote:
>> (Using tester program from Daniel)
>>
>> Following compares this kernel port ('take 6') vs original miniLZO code:
>>
>> 'TinyLZO' refers to this kernel port.
>>
>> 10000 run averages:
>> 'Tiny LZO':
>> Combined: 61.2223 usec
>> Compression: 41.8412 usec
>> Decompression: 19.3811 usec
>> 'miniLZO':
>> Combined: 66.0444 usec
>> Compression: 46.6323 usec
>> Decompression: 19.4121 usec
>>
>> Result:
>> Overall: TinyLZO is 7.3% faster
>> Compressor: TinyLZO is 10.2% faster
>> Decompressor: TinyLZO is 0.15% faster
So the compressor of your version runs 10.2% faster than the
original version.
That's a huge difference.
Why exactly is it that much faster?
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
On Mon, May 28, 2007 at 08:36:44PM +0530, Nitin Gupta wrote:
>...
> So, before this change, it will be good if it gets merged in mainline
> and tested, at least for correctness, on all supported archs. All the
> while, we will have a good feeling that there is still a good scope
> for perf improvement :)
The correct order is:
- create one version with all the optimizations you have in mind
- then ensure that it works correctly on all architectures and
document why your version is that much faster than the original
version and why you know your optimizations have no side effects
- then get it tested in -mm
After these steps, you can start considering getting it into mainline.
> Cheers,
> Nitin
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
On 5/28/07, Adrian Bunk <[email protected]> wrote:
> On Mon, May 28, 2007 at 08:10:31PM +0530, Nitin Gupta wrote:
> > Hi,
> >
> > Attached is tester code used for testing.
> > (developed by Daniel Hazelton -- modified slightly to now use 'take 6'
> > version for 'TinyLZO')
> >
> > Cheers,
> > Nitin
> >
> > On 5/28/07, Nitin Gupta <[email protected]> wrote:
> >> (Using tester program from Daniel)
> >>
> >> Following compares this kernel port ('take 6') vs original miniLZO code:
> >>
> >> 'TinyLZO' refers to this kernel port.
> >>
> >> 10000 run averages:
> >> 'Tiny LZO':
> >> Combined: 61.2223 usec
> >> Compression: 41.8412 usec
> >> Decompression: 19.3811 usec
> >> 'miniLZO':
> >> Combined: 66.0444 usec
> >> Compression: 46.6323 usec
> >> Decompression: 19.4121 usec
> >>
> >> Result:
> >> Overall: TinyLZO is 7.3% faster
> >> Compressor: TinyLZO is 10.2% faster
> >> Decompressor: TinyLZO is 0.15% faster
>
> So the compressor of your version runs 10.2% faster than the
> original version.
>
> That's a huge difference.
>
> Why exactly is it that much faster?
>
> cu
> Adrian
I am not sure how to account for this _big_ perf. gain, but from
benchmarks I see that whenever I remove unnecessary casting from tight
loops I get a perf. gain of 1-2%. For example, open-coding the
LZO_CHECK_MPOS_NON_DET macro with all unnecessary casting removed
gave a perf. gain of ~2%. Similarly, I found many other places where
such casting was unnecessary.
These changes have been tested on x86, amd64, ppc. Testing of the 'take 6'
version is yet to be done - this will confirm that I didn't
reintroduce any errors.
- Nitin
On Monday 28 May 2007 11:30:55 Adrian Bunk wrote:
> On Mon, May 28, 2007 at 08:10:31PM +0530, Nitin Gupta wrote:
> > Hi,
> >
> > Attached is tester code used for testing.
> > (developed by Daniel Hazelton -- modified slightly to now use 'take 6'
> > version for 'TinyLZO')
> >
> > Cheers,
> > Nitin
> >
> > On 5/28/07, Nitin Gupta <[email protected]> wrote:
> >> (Using tester program from Daniel)
> >>
> >> Following compares this kernel port ('take 6') vs original miniLZO code:
> >>
> >> 'TinyLZO' refers to this kernel port.
> >>
> >> 10000 run averages:
> >> 'Tiny LZO':
> >> Combined: 61.2223 usec
> >> Compression: 41.8412 usec
> >> Decompression: 19.3811 usec
> >> 'miniLZO':
> >> Combined: 66.0444 usec
> >> Compression: 46.6323 usec
> >> Decompression: 19.4121 usec
> >>
> >> Result:
> >> Overall: TinyLZO is 7.3% faster
> >> Compressor: TinyLZO is 10.2% faster
> >> Decompressor: TinyLZO is 0.15% faster
>
> So the compressor of your version runs 10.2% faster than the
> original version.
>
> That's a huge difference.
>
> Why exactly is it that much faster?
All I've done is write the code used for testing the compressor in userspace,
but I think it might have something to do with the NOP that cpu_to_le16() is
on LE systems (the original has an open-coded byte-swap path, and I don't
think it ever really gets exercised) and the removal of a lot of unneeded
casts. (Version 5 of the code re-introduced one of the macros that had a lot
of casts, and it showed a drastic slow-down.) Anyway, I'm going to get back
into my testbed and see about actually making functional a lot of the stuff
that is currently just a NOP. (I made exceedingly minor changes to the files
to get them working in userspace - one of the changes was to define the
kernel-specific macros likely(), unlikely(), noinline and cpu_to_le16() as
NOPs.) That set of changes may affect the performance and bring it closer to
the original, or it may not.
And the testbed code is open - I guess I should stick the GPL headers in it,
even though it isn't likely to see much use outside a very small circle. If
*anyone* is skeptical about the numbers, feel free to use the testbed
yourself. (It currently only works as intended on LE systems, however.) Just
untar the source into a directory, run make, and then run 'tester.pl' to
actually perform the test runs and gather the numbers. There is no overhead
included in the numbers - they come directly from running gettimeofday()
before and after the calls to the respective functions (well, a bit of magic
is done to make the numbers have meaning).
DRH
PS: the variable '$RUNS' in tester.pl determines how many times the script
loops each program to generate the numbers. I'm currently using 10000 runs to
generate the average timings, but if you feel that is too large or too small
a sample set, go ahead and change it.
On Monday 28 May 2007 11:47:55 Nitin Gupta wrote:
> On 5/28/07, Adrian Bunk <[email protected]> wrote:
> > On Mon, May 28, 2007 at 08:10:31PM +0530, Nitin Gupta wrote:
> > > Hi,
> > >
> > > Attached is tester code used for testing.
> > > (developed by Daniel Hazelton -- modified slightly to now use 'take 6'
> > > version for 'TinyLZO')
> > >
> > > Cheers,
> > > Nitin
> > >
> > > On 5/28/07, Nitin Gupta <[email protected]> wrote:
> > >> (Using tester program from Daniel)
> > >>
> > >> Following compares this kernel port ('take 6') vs original miniLZO
> > >> code:
> > >>
> > >> 'TinyLZO' refers to this kernel port.
> > >>
> > >> 10000 run averages:
> > >> 'Tiny LZO':
> > >> Combined: 61.2223 usec
> > >> Compression: 41.8412 usec
> > >> Decompression: 19.3811 usec
> > >> 'miniLZO':
> > >> Combined: 66.0444 usec
> > >> Compression: 46.6323 usec
> > >> Decompression: 19.4121 usec
> > >>
> > >> Result:
> > >> Overall: TinyLZO is 7.3% faster
> > >> Compressor: TinyLZO is 10.2% faster
> > >> Decompressor: TinyLZO is 0.15% faster
> >
> > So the compressor of your version runs 10.2% faster than the
> > original version.
> >
> > That's a huge difference.
> >
> > Why exactly is it that much faster?
> >
> > cu
> > Adrian
>
> I am not sure how to account for this _big_ perf. gain, but from
> benchmarks I see that whenever I remove unnecessary casting from tight
> loops I get a perf. gain of 1-2%. For example, open-coding the
> LZO_CHECK_MPOS_NON_DET macro with all unnecessary casting removed
> gave a perf. gain of ~2%. Similarly, I found many other places where
> such casting was unnecessary.
This is my guess as well. Though performance will likely drop when I make the
noinline macro mean something. (This may be offset by figuring out a way to
make likely() and unlikely() also have a meaningful effect in userspace).
This benchmark should be run on BE machines, but I'm still trying to figure
out a way to open-code a *fast* and *reliable* version of cpu_to_le16.
Suggestions for the above bits will always be appreciated.
> These changes have been tested on x86, amd64, ppc. Testing of the 'take 6'
> version is yet to be done - this will confirm that I didn't
> reintroduce any errors.
I don't see that 'take 6' has had too much changed from 'take 4', although I
haven't done a full comparison.
DRH
On 5/28/07, Adrian Bunk <[email protected]> wrote:
> On Mon, May 28, 2007 at 08:36:44PM +0530, Nitin Gupta wrote:
> >...
> > So, before this change, it will be good if it gets merged in mainline
> > and tested, at least for correctness, on all supported archs. All the
> > while, we will have a good feeling that there is still a good scope
> > for perf improvement :)
>
> The correct order is:
> - create one version with all the optimizations you have in mind
Already done. One more optimization concerns the use of memcpy() in
place of the COPY4() macro and open-coded byte-by-byte copying. There are
some places where it's very hard to get this correct without adding
additional checks on various values, which causes further overhead by
itself - even then I could not get them correct, so I decided not to go
ahead with this particular optimization.
> - then ensure that it works correctly on all architectures and
Already tested on x86, amd64, ppc (by Bret). I do not have machines
from other archs available. Bret tested the 'take 3' version, but no
changes were introduced in later revisions that could affect
correctness - still, it will be good to have this version tested
too. Only after inclusion in -mm and testing by a much wider user base
can it make it to mainline (I suppose nobody uses -mm for production
anyway).
> document why your version is that much faster than the original
> version and why you know your optimizations have no side effects
Replied in earlier mail regarding this.
> - then get it tested in -mm
>
This is what I am looking for :)
> After these steps, you can start considering getting it into mainline.
>
- Nitin
On Mon, May 28, 2007 at 11:55:14AM -0400, Daniel Hazelton wrote:
>...
> This is my guess as well. Though performance will likely drop when I make the
> noinline macro mean something. (This may be offset by figuring out a way to
> make likely() and unlikely() also have a meaningful effect in userspace).
>...
What is your problem?
The likely/unlikely macros aren't in any way depending on any kernel
infrastructure.
> DRH
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
On Mon, May 28, 2007 at 09:33:32PM +0530, Nitin Gupta wrote:
> On 5/28/07, Adrian Bunk <[email protected]> wrote:
>...
>> - then ensure that it works correctly on all architectures and
>
> Already tested on x86, amd64, ppc (by Bret). I do not have machines
> from other archs available. Bret tested 'take 3' version but no
> changes were introduced in further revisions that could affect
> correctness - but still it will be good to have this version tested
> too. Only with inclusion in -mm and testing by much wider user base
> can make it to mainline (I suppose nobody uses -mm for production use
> anyway).
>
>> document why your version is that much faster than the original
>> version and why you know your optimizations have no side effects
>
> Replied in earlier mail regarding this.
>...
I have not seen any explanations:
- Why did the upstream author write the code that way?
- Why are your changes correct?
- Why do your changes allow the compiler to produce faster code?
The upstream author of the code is available - and he might be able to
help you getting answers. No matter whether your changes are incorrect
or correct optimizations that should go upstream, in both cases
discussing these issues with upstream is the best solution.
And testing is nice, but if you broke some case that's outside of your
tests you'll never notice.
> - Nitin
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
On Monday 28 May 2007 13:01:09 Adrian Bunk wrote:
> On Mon, May 28, 2007 at 11:55:14AM -0400, Daniel Hazelton wrote:
> >...
> > This is my guess as well. Though performance will likely drop when I make
> > the noinline macro mean something. (This may be offset by figuring out a
> > way to make likely() and unlikely() also have a meaningful effect in
> > userspace). ...
>
> What is your problem?
>
> The likely/unlikely macros aren't in any way depending on any kernel
> infrastructure.
That I'm just plain lazy and haven't felt like pulling them out of the kernel
sources ?
Actually, that is the case - and I wasn't exactly sure that likely() and
unlikely() were completely decoupled from the kernel's infrastructure.
DRH
> > DRH
>
> cu
> Adrian
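For the record, pulling them out of the kernel sources is a one-liner each - a minimal userspace sketch (check_len is only an illustrative caller added here):

```c
/* Userspace versions of the kernel's branch-prediction hints; they
 * rely only on GCC's __builtin_expect, not on any kernel
 * infrastructure, and never change program semantics. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Example: annotate the error path as the cold branch. */
static int check_len(long len)
{
	if (unlikely(len < 0))
		return -1;
	return 0;
}
```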
On Monday 28 May 2007 13:11:15 Adrian Bunk wrote:
> On Mon, May 28, 2007 at 09:33:32PM +0530, Nitin Gupta wrote:
> > On 5/28/07, Adrian Bunk <[email protected]> wrote:
> >...
> >
> >> - then ensure that it works correctly on all architectures and
> >
> > Already tested on x86, amd64, ppc (by Bret). I do not have machines
> > from other archs available. Bret tested 'take 3' version but no
> > changes were introduced in further revisions that could affect
> > correctness - but still it will be good to have this version tested
> > too. Only with inclusion in -mm and testing by much wider user base
> > can make it to mainline (I suppose nobody uses -mm for production use
> > anyway).
> >
> >> document why your version is that much faster than the original
> >> version and why you know your optimizations have no side effects
> >
> > Replied in earlier mail regarding this.
> >...
>
> I have not seen any explanations:
> - Why did the upstream author write the code that way?
> - Why are your changes correct?
> - Why do your changes allow the compiler to produce faster code?
Would it help if I added instrumentation code to both the upstream code and
the stripped down code in question?
> The upstream author of the code is available - and he might be able to
> help you getting answers. No matter whether your changes are incorrect
> or correct optimizations that should go upstream, in both cases
> discussing these issues with upstream is the best solution.
>
> And testing is nice, but if you broke some case that's outside of your
> tests you'll never notice.
Any suggestions for tests I could add to the boilerplate I've been using to
benchmark the code? I'll add a few tests that use random input rather than
one of the source files - I'd add a test to see how it deals with NULL input,
but I ran into that in an early version of the testbed (where I had a typo
that stopped the code from working) and the compressor didn't like it. And I
suppose I could put in a test that reads in a chunk of /dev/mem or have it
get a chunk of data from HAL/DBus as well.
But if anyone has a suggestion of what tests I could toss at it to really
stress the code, let me know and I'll get to work extending this code so it
can stress test both the standard implementation of miniLZO *and* this
stripped down 'Tiny' version.
DRH
> > - Nitin
>
> cu
> Adrian
On Monday 28 May 2007 13:11:15 Adrian Bunk wrote:
> On Mon, May 28, 2007 at 09:33:32PM +0530, Nitin Gupta wrote:
> > On 5/28/07, Adrian Bunk <[email protected]> wrote:
> >...
> >
> >> - then ensure that it works correctly on all architectures and
> >
> > Already tested on x86, amd64, ppc (by Bret). I do not have machines
> > from other archs available. Bret tested 'take 3' version but no
> > changes were introduced in further revisions that could affect
> > correctness - but still it will be good to have this version tested
> > too. Only with inclusion in -mm and testing by much wider user base
> > can make it to mainline (I suppose nobody uses -mm for production use
> > anyway).
> >
> >> document why your version is that much faster than the original
> >> version and why you know your optimizations have no side effects
With likely(), unlikely() and noinline *not* defined as NOPs, performance
drops:
10000 run averages:
'Tiny LZO':
Combined: 84.9292 usec
Compression: 42.4646 usec
Decompression: 42.4646 usec
'miniLZO':
Combined: 61.3548 usec
Compression: 43.5648 usec
Decompression: 17.79 usec
However, I'm worried that my testbed code - likely the Perl script that
actually loops the test code and collects its output - is somehow faulting,
given that the Compression and Decompression runs report the exact same
value.
I'm going to toss some debugging output in the script and see if I can spot
the problem.
DRH
On Monday 28 May 2007 16:18:40 Daniel Hazelton wrote:
> On Monday 28 May 2007 13:11:15 Adrian Bunk wrote:
> > On Mon, May 28, 2007 at 09:33:32PM +0530, Nitin Gupta wrote:
> > > On 5/28/07, Adrian Bunk <[email protected]> wrote:
> > >...
> > >
> > >> - then ensure that it works correctly on all architectures and
> > >
> > > Already tested on x86, amd64, ppc (by Bret). I do not have machines
> > > from other archs available. Bret tested 'take 3' version but no
> > > changes were introduced in further revisions that could affect
> > > correctness - but still it will be good to have this version tested
> > > too. Only with inclusion in -mm and testing by much wider user base
> > > can make it to mainline (I suppose nobody uses -mm for production use
> > > anyway).
> > >
> > >> document why your version is that much faster than the original
> > >> version and why you know your optimizations have no side effects
>
> With likely(), unlikely() and noinline *not* defined as NOP's performance
> drops:
>
> 10000 run averages:
> 'Tiny LZO':
> Combined: 84.9292 usec
> Compression: 42.4646 usec
> Decompression: 42.4646 usec
> 'miniLZO':
> Combined: 61.3548 usec
> Compression: 43.5648 usec
> Decompression: 17.79 usec
>
> However, I'm worried that my testbed code - likely the Perl script that
> actually loops the test code and collects its output - is somehow faulting,
> as the way that the Compression and Decompression code have the exact same
> value.
>
> I'm going to toss some debugging output in the script and see if I can spot
> the problem.
>
Okay, checked my code and it was all a problem with cpu_to_le16 getting
defined fully instead of just being a NOP. With that problem corrected the
performance returns:
10000 run averages:
'Tiny LZO':
Combined: 60.1028 usec
Compression: 42.0652 usec
Decompression: 18.0376 usec
'miniLZO':
Combined: 61.0932 usec
Compression: 43.4382 usec
Decompression: 17.655 usec
Combined average shows 'Tiny' to be 1.6% faster
Compression in 'Tiny' is 3.2% faster
Decompression in 'Tiny' is 2.2% slower
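Those percentages can be sanity-checked from the raw averages; a small sketch (the function name is made up here) computing the gain of 'Tiny' relative to miniLZO:

```c
/* Gain of 'Tiny' over miniLZO as a percentage of the miniLZO time;
 * a negative result means 'Tiny' is slower. */
static double speedup_pct(double mini_usec, double tiny_usec)
{
	return (mini_usec - tiny_usec) / mini_usec * 100.0;
}

/* speedup_pct(61.0932, 60.1028) -> ~1.6  (combined)
 * speedup_pct(43.4382, 42.0652) -> ~3.2  (compression)
 * speedup_pct(17.6550, 18.0376) -> ~-2.2 (decompression) */
```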
All in all, the trade-off seems acceptable, with the overall performance of
the 'Tiny' code being faster than the stock miniLZO code. I'm fairly certain
that the decompressor could be sped up more, although I don't know of a
place in the kernel where less than half a microsecond would make a massive
impact. (The only place I can think of where this might have a negative
effect is in SLAB/SLOB/SLUB, and I don't think that a low-level memory
manager like those is a place for a compression algorithm anyway.)
Later today or tomorrow I'll start putting together another part to
this "test-bed" for stress-testing the algorithm. Currently I only have a few
ideas for tests:
(for benchmarking)
1) Random input to compressor
2) large input data
3) real-world data (mmap()'d chunk of /dev/mem as input)
(for stability checking)
4) deliberate corruption of compressed data
5) early finish (realloc() compressed data buffer to less than the full size
of the data)
6) late start (change pointer to start of compressed data so the decompressor
doesn't start at the beginning)
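Test (4) could be wired up along these lines - a sketch only, with corrupt() being a hypothetical helper (the real harness would then expect the '_safe' decompressor to return an error on the mangled buffer):

```c
#include <stdlib.h>

/* Flip a few random bits in the compressed buffer; the '_safe'
 * decompressor must then fail cleanly rather than overrun its
 * output buffer. */
static void corrupt(unsigned char *buf, size_t len, int nflips)
{
	while (nflips-- > 0)
		buf[(size_t)rand() % len] ^= (unsigned char)(1u << (rand() % 8));
}
```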
When I have a better idea of how the LZO algorithm works I might try creating
an input data-set for the decompressor that would deliberately overflow the
output buffer. This is supposed to be caught by bounds-checking in
the '_safe' version of the decompressor (the version used in 'Tiny' and
used in the benchmarking of miniLZO), however it might be possible to trick
the decompressor. (If I do manage this and it crashes both 'Tiny'
and 'miniLZO' I *will* report it to Herr Oberhumer as well as see if I can
come up with a quick patch for the problem).
As I said before, any suggestions for how to improve the benchmark code are
welcome, as are suggestions for tests I can add to it for stressing the
new 'TinyLZO' implementation. Latest version of the test-bed attached.
Significant changes include:
likely(), unlikely() and noinline are no longer NOPs
when the marked line in helpers.h is commented out, cpu_to_le16 functions as
expected and is no longer a NOP. (Yes, that means this code is now fully
capable of working correctly on a BE system).
DRH
On 5/28/07, Nitin Gupta <[email protected]> wrote:
> Hi,
>
> This is kernel port of LZO1X-1 compressor and LZO1X decompressor (safe
> version only).
>
> * Changes since 'take 5' (Full Changelog after this):
> - Added compressor and decompressor as separate and hidden config
> options (default: n)
> - Cleanups: removed LZO_CHECK_MPOS_NON_DET macro
> - removed PTR* macros.
>
> * Benchmarks: (system: P4 3GHz, 1G RAM)
> (Using tester program from Daniel)
>
> Following compares this kernel port ('take 6') vs original miniLZO code:
>
> 'TinyLZO' refers to this kernel port.
>
> 10000 run averages:
> 'Tiny LZO':
> Combined: 61.2223 usec
> Compression: 41.8412 usec
> Decompression: 19.3811 usec
> 'miniLZO':
> Combined: 66.0444 usec
> Compression: 46.6323 usec
> Decompression: 19.4121 usec
>
> Result:
> Overall: TinyLZO is 7.3% faster
> Compressor: TinyLZO is 10.2% faster
> Decompressor: TinyLZO is 0.15% faster
>
> TODO: test 'take 6' port on archs other than x86(_32)
>
> * Changelog vs. original LZO:
> 1) Used standard/kernel defined data types: (this eliminated _huge_
> #ifdef chunks)
> lzo_bytep -> unsigned char *
> lzo_uint -> size_t
> lzo_xint -> size_t
> lzo_uint32p -> u32 *
> lzo_uintptr_t -> unsigned long
> 2) Removed everything #ifdef'ed under COPY_DICT (this is not set for
> LZO1X, so removed corres. parts).
> 3) Removed code #ifdef'ed for LZO1Y, LZO1Z, other variants.
> 4) Reformatted the code to match general kernel style.
> 5) The only code change: (as suggested by Andrey)
> -#if defined(__LITTLE_ENDIAN)
> - m_pos = op - 1;
> - m_pos -= (*(const unsigned short *)ip) >> 2;
> -#else
> - m_pos = op - 1;
> - m_pos -= (ip[0] >> 2) + (ip[1] << 6);
> -#endif
>
> + m_pos = op - 1 - (cpu_to_le16(*(const u16 *)ip) >> 2);
>
> (Andrey suggested le16_to_cpu for above but I think it should be cpu_to_le16).
> *** Need testing on big endian machine ***
>
> Similarly:
> -#if defined(__LITTLE_ENDIAN)
> - m_pos -= (*(const unsigned short *)ip) >> 2;
> -#else
> - m_pos -= (ip[0] >> 2) + (ip[1] << 6);
> -#endif
>
> + m_pos -= cpu_to_le16(*(const u16 *)ip) >> 2;
>
> 6) Get rid of LZO_CHECK_MPOS_NON_DET macro and PTR* macros.
>
>
> I think it's now ready for mainline inclusion.
>
>
> include/linux/lzo1x.h | 66 +++++++++++
> lib/Kconfig | 6 +
> lib/Makefile | 2 +
> lib/lzo1x/Makefile | 2 +
> lib/lzo1x/lzo1x_compress.c | 259
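A note on item (5) of the changelog quoted above: for a 16-bit quantity, cpu_to_le16() and le16_to_cpu() compile to the same operation (a byte swap on big-endian, a no-op on little-endian), so both spellings behave identically here; since the stream stores the field little-endian and the CPU consumes it, le16_to_cpu is the conventional name. A sketch (helper names invented here) showing that the converted load matches the open-coded BE branch being removed:

```c
/* The open-coded big-endian arithmetic from the original source:
 * equal to (16-bit little-endian value at ip) >> 2. */
static unsigned int m_off_bytewise(const unsigned char *ip)
{
	return (unsigned int)(ip[0] >> 2) + ((unsigned int)ip[1] << 6);
}

/* The same quantity via an explicit little-endian load - the
 * operation the le16_to_cpu()/cpu_to_le16() call stands in for. */
static unsigned int m_off_le16(const unsigned char *ip)
{
	unsigned int v = (unsigned int)ip[0] | ((unsigned int)ip[1] << 8);
	return v >> 2;
}
```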
tested this on ppc and it's still good
is there any reason to bother with a test on amd64?
if there is I might be able to get to it tonight
On 5/28/07, Nitin Gupta <[email protected]> wrote:
> Hi,
>
> Attached is tester code used for testing.
> (developed by Daniel Hazelton -- modified slightly to now use 'take 6'
> version for 'TinyLZO')
>
> Cheers,
> Nitin
>
> On 5/28/07, Nitin Gupta <[email protected]> wrote:
> > (Using tester program from Daniel)
> >
> > Following compares this kernel port ('take 6') vs original miniLZO code:
> >
> > 'TinyLZO' refers to this kernel port.
> >
> > 10000 run averages:
> > 'Tiny LZO':
> > Combined: 61.2223 usec
> > Compression: 41.8412 usec
> > Decompression: 19.3811 usec
> > 'miniLZO':
> > Combined: 66.0444 usec
> > Compression: 46.6323 usec
> > Decompression: 19.4121 usec
> >
> > Result:
> > Overall: TinyLZO is 7.3% faster
> > Compressor: TinyLZO is 10.2% faster
> > Decompressor: TinyLZO is 0.15% faster
> >
>
>
10000 run averages:
'Tiny LZO':
Combined: 112.6642 usec
Compression: 56.3321 usec
Decompression: 56.3321 usec
'miniLZO':
Combined: 77.6642 usec
Compression: 56.3416 usec
Decompression: 21.3226 usec
Now the interesting bit, I thought, was the following:
root@mini:/usr/src/lzo1x-test-4# ./fulltest
[test_lzo.c::compress (93)] run took 42 microseconds
[test_lzo.c::decompress (127)] run took 20 microseconds
root@mini:/usr/src/lzo1x-test-4# ./tinytest
[test.c::compress (91)] run took 44 microseconds
[test.c::decompress (117)] BUG: lzo1x_decompress has failed ( t == -6 )
[test.c::main (149)] BUG: Decompression routine failure
On 5/28/07, Bret Towe <[email protected]> wrote:
> On 5/28/07, Nitin Gupta <[email protected]> wrote:
> > Hi,
> >
> > This is kernel port of LZO1X-1 compressor and LZO1X decompressor (safe
> > version only).
> >
> > * Changes since 'take 5' (Full Changelog after this):
> > - Added compressor and decompressor as separate and hidden config
> > options (default: n)
> > - Cleanups: removed LZO_CHECK_MPOS_NON_DET macro
> > - removed PTR* macros.
> >
> > * Benchmarks: (system: P4 3GHz, 1G RAM)
> > (Using tester program from Daniel)
> >
> > Following compares this kernel port ('take 6') vs original miniLZO code:
> >
> > 'TinyLZO' refers to this kernel port.
> >
> > 10000 run averages:
> > 'Tiny LZO':
> > Combined: 61.2223 usec
> > Compression: 41.8412 usec
> > Decompression: 19.3811 usec
> > 'miniLZO':
> > Combined: 66.0444 usec
> > Compression: 46.6323 usec
> > Decompression: 19.4121 usec
> >
> > Result:
> > Overall: TinyLZO is 7.3% faster
> > Compressor: TinyLZO is 10.2% faster
> > Decompressor: TinyLZO is 0.15% faster
> >
> > TODO: test 'take 6' port on archs other than x86(_32)
> >
> > * Changelog vs. original LZO:
> > 1) Used standard/kernel defined data types: (this eliminated _huge_
> > #ifdef chunks)
> > lzo_bytep -> unsigned char *
> > lzo_uint -> size_t
> > lzo_xint -> size_t
> > lzo_uint32p -> u32 *
> > lzo_uintptr_t -> unsigned long
> > 2) Removed everything #ifdef'ed under COPY_DICT (this is not set for
> > LZO1X, so removed corres. parts).
> > 3) Removed code #ifdef'ed for LZO1Y, LZO1Z, other variants.
> > 4) Reformatted the code to match general kernel style.
> > 5) The only code change: (as suggested by Andrey)
> > -#if defined(__LITTLE_ENDIAN)
> > - m_pos = op - 1;
> > - m_pos -= (*(const unsigned short *)ip) >> 2;
> > -#else
> > - m_pos = op - 1;
> > - m_pos -= (ip[0] >> 2) + (ip[1] << 6);
> > -#endif
> >
> > + m_pos = op - 1 - (cpu_to_le16(*(const u16 *)ip) >> 2);
> >
> > (Andrey suggested le16_to_cpu for above but I think it should be cpu_to_le16).
> > *** Need testing on big endian machine ***
> >
> > Similarly:
> > -#if defined(__LITTLE_ENDIAN)
> > - m_pos -= (*(const unsigned short *)ip) >> 2;
> > -#else
> > - m_pos -= (ip[0] >> 2) + (ip[1] << 6);
> > -#endif
> >
> > + m_pos -= cpu_to_le16(*(const u16 *)ip) >> 2;
> >
> > 6) Get rid of LZO_CHECK_MPOS_NON_DET macro and PTR* macros.
> >
> >
> > I think it's now ready for mainline inclusion.
> >
> >
> > include/linux/lzo1x.h | 66 +++++++++++
> > lib/Kconfig | 6 +
> > lib/Makefile | 2 +
> > lib/lzo1x/Makefile | 2 +
> > lib/lzo1x/lzo1x_compress.c | 259
>
> tested this on ppc and its still good
> is there any reason to bother with a test on amd64?
> if there is I might be able to get to it tonight
>
Forgot to mention that it would make life easier if you put the
version of the patch in the .bz2 files you attach, since you're popping
out new takes so quickly.
On 5/29/07, Bret Towe <[email protected]> wrote:
> On 5/28/07, Nitin Gupta <[email protected]> wrote:
> > Hi,
> >
> > Attached is tester code used for testing.
> > (developed by Daniel Hazelton -- modified slightly to now use 'take 6'
> > version for 'TinyLZO')
> >
> > Cheers,
> > Nitin
> >
> > On 5/28/07, Nitin Gupta <[email protected]> wrote:
> > > (Using tester program from Daniel)
> > >
> > > Following compares this kernel port ('take 6') vs original miniLZO code:
> > >
> > > 'TinyLZO' refers to this kernel port.
> > >
> > > 10000 run averages:
> > > 'Tiny LZO':
> > > Combined: 61.2223 usec
> > > Compression: 41.8412 usec
> > > Decompression: 19.3811 usec
> > > 'miniLZO':
> > > Combined: 66.0444 usec
> > > Compression: 46.6323 usec
> > > Decompression: 19.4121 usec
> > >
> > > Result:
> > > Overall: TinyLZO is 7.3% faster
> > > Compressor: TinyLZO is 10.2% faster
> > > Decompressor: TinyLZO is 0.15% faster
> > >
> >
> >
>
> 10000 run averages:
> 'Tiny LZO':
> Combined: 112.6642 usec
> Compression: 56.3321 usec
> Decompression: 56.3321 usec
> 'miniLZO':
> Combined: 77.6642 usec
> Compression: 56.3416 usec
> Decompression: 21.3226 usec
>
> now the interesting bit I thought was the following
> root@mini:/usr/src/lzo1x-test-4# ./fulltest
> [test_lzo.c::compress (93)] run took 42 microseconds
> [test_lzo.c::decompress (127)] run took 20 microseconds
> root@mini:/usr/src/lzo1x-test-4# ./tinytest
> [test.c::compress (91)] run took 44 microseconds
> [test.c::decompress (117)] BUG: lzo1x_decompress has failed ( t == -6 )
> [test.c::main (149)] BUG: Decompression routine failure
>
Did you use x86 for the above test? Maybe there is some problem with the
testing script? What data did you use for this test?
- Nitin
On 5/28/07, Adrian Bunk <[email protected]> wrote:
> On Mon, May 28, 2007 at 09:33:32PM +0530, Nitin Gupta wrote:
> > On 5/28/07, Adrian Bunk <[email protected]> wrote:
<snip>
>
> I have not seen any explanations:
> - Why did the upstream author write the code that way?
> - Why are your changes correct?
> - Why do your changes allow the compiler to produce faster code?
>
> The upstream author of the code is available - and he might be able to
> help you getting answers. No matter whether your changes are incorrect
> or correct optimizations that should go upstream, in both cases
> discussing these issues with upstream is the best solution.
The changelog I posted along with the patch mentions all the changes I
made. I thought we would find all the problems with this changelog in
hand, considering that it's just ~500 LOC. But still, OK, asking the
author himself will be good if he replies. I will mail him the detailed
changelog and seek his feedback on this. This should answer all of
your questions.
>
> And testing is nice, but if you broke some case that's outside of your
> tests you'll never notice.
>
Yes. We cannot come up with an exhaustive set of test cases to cover
all cases. But assuming that the _original_ version is right, and taking
the changelog, we should be able to verify that the porting is correct.
- Nitin
On 5/29/07, Bret Towe <[email protected]> wrote:
>
> tested this on ppc and its still good
> is there any reason to bother with a test on amd64?
> if there is I might be able to get to it tonight
>
Yes, this test is desired on 'take 6'.
(In future I will append the version to the patch .bz2.)
Thanks,
Nitin
On 28 May 2007, at 18:11, Adrian Bunk wrote:
> I have not seen any explanations:
> - Why did the upstream author write the code that way?
Apparently due to his requirement for extreme portability. The
original code was designed to work on everything from 16-bit DOS
through Cray supercomputers to Windows, Unices, and Linux.
The author has stated on the thread that it's a good idea to remove
unnecessary ifdefs when porting the code into the kernel, given that
the portability requirements are obviously no longer needed.
Michael-Luke
On 28 May 2007, at 15:34, Nitin Gupta wrote:
> diff --git a/lib/Kconfig b/lib/Kconfig
> index 2e7ae6b..eb95eaa 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -64,6 +64,12 @@ config ZLIB_INFLATE
> config ZLIB_DEFLATE
> tristate
>
> +config LZO1X_COMPRESS
> + tristate
> +
> +config LZO1X_DECOMPRESS
> + tristate
> +
> #
> # Generic allocator support is selected if needed
> #
> diff --git a/lib/Makefile b/lib/Makefile
> index c8c8e20..448ae37 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -49,6 +49,8 @@ obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
> obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
> obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
> obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
> +obj-$(CONFIG_LZO1X_COMPRESS) += lzo1x/
> +obj-$(CONFIG_LZO1X_DECOMPRESS) += lzo1x/
>
> obj-$(CONFIG_TEXTSEARCH) += textsearch.o
> obj-$(CONFIG_TEXTSEARCH_KMP) += ts_kmp.o
Hey there,
Is this syntax OK for Makefile use? I'm only asking out of personal
ignorance and because I'd use:
Kconfig:
config LZO1X_COMPRESS
select LZO1X
tristate
config LZO1X_DECOMPRESS
select LZO1X
tristate
config LZO1X
tristate (or boolean??)
Makefile:
obj-$(CONFIG_LZO1X) += lzo1x/
Thanks,
Michael-Luke
On 5/29/07, Michael-Luke Jones <[email protected]> wrote:
> On 28 May 2007, at 15:34, Nitin Gupta wrote:
>
> > diff --git a/lib/Kconfig b/lib/Kconfig
> > index 2e7ae6b..eb95eaa 100644
> > --- a/lib/Kconfig
> > +++ b/lib/Kconfig
> > @@ -64,6 +64,12 @@ config ZLIB_INFLATE
> > config ZLIB_DEFLATE
> > tristate
> >
> > +config LZO1X_COMPRESS
> > + tristate
> > +
> > +config LZO1X_DECOMPRESS
> > + tristate
> > +
> > #
> > # Generic allocator support is selected if needed
> > #
> > diff --git a/lib/Makefile b/lib/Makefile
> > index c8c8e20..448ae37 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -49,6 +49,8 @@ obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
> > obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
> > obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
> > obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
> > +obj-$(CONFIG_LZO1X_COMPRESS) += lzo1x/
> > +obj-$(CONFIG_LZO1X_DECOMPRESS) += lzo1x/
> >
> > obj-$(CONFIG_TEXTSEARCH) += textsearch.o
> > obj-$(CONFIG_TEXTSEARCH_KMP) += ts_kmp.o
>
> Hey there,
>
> Is this syntax OK for Makefile use? I'm only asking out of personal
> ignorance and because I'd use:
> > +obj-$(CONFIG_LZO1X_COMPRESS) += lzo1x/
> > +obj-$(CONFIG_LZO1X_DECOMPRESS) += lzo1x/
This is syntactically correct (and wouldn't produce any build errors),
but it's quite ... strange, still. Why would I even want to /build/ the
compress code if all I selected was the decompress option?
Also, when both the options are y, make would just make the first
target, then recurse back into that same directory again and skip
it because it just made it ... which, although harmless, is ... strange.
So I'm not sure if we really want both the options to resolve to the
same make target -- the kernel typically has more users of
_decompress_ than those of _compress_, so it does make some sense
to completely decouple the two into separate directories in lib/ and have
separate Kconfig options (as well as Makefile target subdirs) for them.
> Kconfig:
>
> config LZO1X_COMPRESS
> select LZO1X
> tristate
>
> config LZO1X_DECOMPRESS
> select LZO1X
> tristate
>
> config LZO1X
> tristate (or boolean??)
Hmmm ... if this is /really/ what we want to do, what's wrong with simply:
config LZO1x
tristate
?
> Makefile:
>
> obj-$(CONFIG_LZO1X) += lzo1x/
I find nothing wrong with the style being used for ZLIB presently. Agreed,
users of ZLIB do typically want to select only decompress *without*
compress whereas there's no such user for LZO currently, but I see
nothing wrong in just copying over ZLIB's style for LZO too (i.e. decoupling
the compress and decompress code into separate subdirs in lib/).
And if not, then why have two separate (dummy) Kconfig options either?
On 29 May 2007, at 11:41, Satyam Sharma wrote:
> This is syntactically correct (and wouldn't produce any build errors),
> but it's quite ... strange, still. Why would I even want to /build/
> the
> compress code if all I selected was the decompress option?
Apologies, you gave me the answer I was looking for (make is
'intelligent' enough not to build the same file twice in this
situation...) but I think you may have missed the makefile in the
lzo1x directory:
> diff --git a/lib/lzo1x/Makefile b/lib/lzo1x/Makefile
> new file mode 100644
> index 0000000..7b56a4d
> --- /dev/null
> +++ b/lib/lzo1x/Makefile
> @@ -0,0 +1,2 @@
> +obj-$(CONFIG_LZO1X_COMPRESS) += lzo1x_compress.o
> +obj-$(CONFIG_LZO1X_DECOMPRESS) += lzo1x_decompress.o
Thus, the Kconfig options for compress/decompress won't be simply
'dummy' options...
Given the shared private header between the compress and decompress
code, I don't think there is any need to separate the code into two
directories a la the zlib code.
Sorry for noise,
Michael-Luke
On 5/29/07, Michael-Luke Jones <[email protected]> wrote:
> On 29 May 2007, at 11:41, Satyam Sharma wrote:
>
> > This is syntactically correct (and wouldn't produce any build errors),
> > but it's quite ... strange, still. Why would I even want to /build/
> > the
> > compress code if all I selected was the decompress option?
>
> Apologies, you gave me the answer I was looking for (make is
> 'intelligent' enough not to build the same file twice in this
> situation...) but I think you may have missed the makefile in the
> lzo1x directory:
>
> > diff --git a/lib/lzo1x/Makefile b/lib/lzo1x/Makefile
> > new file mode 100644
> > index 0000000..7b56a4d
> > --- /dev/null
> > +++ b/lib/lzo1x/Makefile
> > @@ -0,0 +1,2 @@
> > +obj-$(CONFIG_LZO1X_COMPRESS) += lzo1x_compress.o
> > +obj-$(CONFIG_LZO1X_DECOMPRESS) += lzo1x_decompress.o
Ah, yes, I only saw the mail and didn't check lib/lzo1x/Makefile, thanks.
So effectively, we've done precisely what zlib does too ... only, in a
smarter way!
> Thus, the Kconfig options for compress/decompress won't be simply
> 'dummy' options...
>
> Given the shared private header between the compress and decompress
> code, I don't think there is any need to separate the code into two
> directories a la the zlib code.
Right - zlib could in fact be switched over to this style, because
zlib_deflate/ and zlib_inflate/ also share a private header,
zutil.h, which has unfortunately been stuck into include/linux/ with a big
/* WARNING: this file should *not* be used by applications. */ comment ...
Satyam
On Tue, May 29, 2007 at 09:08:27AM +0100, Michael-Luke Jones wrote:
> On 28 May 2007, at 18:11, Adrian Bunk wrote:
>
>> I have not seen any explanations:
>> - Why did the upstream author write the code that way?
>
> Apparently due to his requirement for extreme portability. The original
> code was designed to work on everything from 16-bit DOS through CRAY
> supercomputers through Windows, Unices and Linux.
Sure, this could be the reason in some or all cases.
The upstream author knows the code best, and discussing such issues with
him will in many cases be a win:
It could be that there was in some cases no good reason, and the
upstream code that gets used by many other projects could become faster.
Or there was a good reason that applies also to the in-kernel version
and a change breaks some corner case.
> The author has stated on the thread that it's a good idea to remove
> unnecessary ifdefs when porting the code into the kernel, given that the
> portability requirements are obviously no longer needed.
"remove unnecessary ifdefs" implies "generated code is identical".
That's quite different from "code is 10% faster".
> Michael-Luke
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
On 5/29/07, Adrian Bunk <[email protected]> wrote:
> On Tue, May 29, 2007 at 09:08:27AM +0100, Michael-Luke Jones wrote:
> > On 28 May 2007, at 18:11, Adrian Bunk wrote:
> >
> >> I have not seen any explanations:
> >> - Why did the upstream author write the code that way?
> >
> > Apparently due to his requirement for extreme portability. The original
> > code was designed to work on everything from 16-bit DOS through CRAY
> > supercomputers through Windows, Unices and Linux.
>
> Sure, this could be the reason in some or all cases.
>
> The upstream author knows the code best, and discussing such issues with
> him will in many cases be a win:
>
> It could be that there was in some cases no good reason, and the
> upstream code that gets used by many other projects could become faster.
>
> Or there was a good reason that applies also to the in-kernel version
> and a change breaks some corner case.
>
I have mailed the author with detailed changelog - waiting for reply.
> > The author has stated on the thread that it's a good idea to remove
> > unnecessary ifdefs when porting the code into the kernel, given that the
> > portability requirements are obviously no longer needed.
>
> "remove unnecessary ifdefs" implies "generated code is identical".
>
> That's quite different from "code is 10% faster".
>
Daniel made some changes to his testing code and now the perf gain is just 1.6%.
- Nitin
On Tuesday 29 May 2007 01:48:29 Nitin Gupta wrote:
> On 5/29/07, Bret Towe <[email protected]> wrote:
> > On 5/28/07, Nitin Gupta <[email protected]> wrote:
> > > Hi,
> > >
> > > Attached is tester code used for testing.
> > > (developed by Daniel Hazelton -- modified slightly to now use 'take 6'
> > > version for 'TinyLZO')
> > >
> > > Cheers,
> > > Nitin
> > >
> > > On 5/28/07, Nitin Gupta <[email protected]> wrote:
> > > > (Using tester program from Daniel)
> > > >
> > > > Following compares this kernel port ('take 6') vs original miniLZO
> > > > code:
> > > >
> > > > 'TinyLZO' refers to this kernel port.
> > > >
> > > > 10000 run averages:
> > > > 'Tiny LZO':
> > > > Combined: 61.2223 usec
> > > > Compression: 41.8412 usec
> > > > Decompression: 19.3811 usec
> > > > 'miniLZO':
> > > > Combined: 66.0444 usec
> > > > Compression: 46.6323 usec
> > > > Decompression: 19.4121 usec
> > > >
> > > > Result:
> > > > Overall: TinyLZO is 7.3% faster
> > > > Compressor: TinyLZO is 10.2% faster
> > > > Decompressor: TinyLZO is 0.15% faster
> >
> > 10000 run averages:
> > 'Tiny LZO':
> > Combined: 112.6642 usec
> > Compression: 56.3321 usec
> > Decompression: 56.3321 usec
> > 'miniLZO':
> > Combined: 77.6642 usec
> > Compression: 56.3416 usec
> > Decompression: 21.3226 usec
> >
> > now the interesting bit I thought was the following
> > root@mini:/usr/src/lzo1x-test-4# ./fulltest
> > [test_lzo.c::compress (93)] run took 42 microseconds
> > [test_lzo.c::decompress (127)] run took 20 microseconds
> > root@mini:/usr/src/lzo1x-test-4# ./tinytest
> > [test.c::compress (91)] run took 44 microseconds
> > [test.c::decompress (117)] BUG: lzo1x_decompress has failed ( t == -6 )
> > [test.c::main (149)] BUG: Decompression routine failure
>
> Did you use x86 for the above test? Maybe some problem with the testing
> script? What data did you use for this test?
>
>
> - Nitin
Using any version of the test code below 5 on a non-LE system would result in
this error. Those earlier versions couldn't handle BE systems because I was
too lazy to define cpu_to_le16. (And yes, Nitin, that does appear to be the
correct conversion to use)
DRH
On Tuesday 29 May 2007 08:03:55 Nitin Gupta wrote:
> On 5/29/07, Adrian Bunk <[email protected]> wrote:
> > On Tue, May 29, 2007 at 09:08:27AM +0100, Michael-Luke Jones wrote:
> > > On 28 May 2007, at 18:11, Adrian Bunk wrote:
> > >> I have not seen any explanations:
> > >> - Why did the upstream author write the code that way?
> > >
> > > Apparently due to his requirement for extreme portability. The original
> > > code was designed to work on everything from 16-bit DOS through CRAY
> > > supercomputers through Windows, Unices and Linux.
> >
> > Sure, this could be the reason in some or all cases.
> >
> > The upstream author knows the code best, and discussing such issues with
> > him will in many cases be a win:
> >
> > It could be that there was in some cases no good reason, and the
> > upstream code that gets used by many other projects could become faster.
> >
> > Or there was a good reason that applies also to the in-kernel version
> > and a change breaks some corner case.
>
> I have mailed the author with detailed changelog - waiting for reply.
>
> > > The author has stated on the thread that it's a good idea to remove
> > > unnecessary ifdefs when porting the code into the kernel, given that
> > > the portability requirements are obviously no longer needed.
> >
> > "remove unnecessary ifdefs" implies "generated code is identical".
> >
> > That's quite different from "code is 10% faster".
>
> Daniel made some changes to his testing code and now the perf gain is just
> 1.6%.
>
> - Nitin
The changes for version 5 are all in one file - helpers.h. All I did was
define the macros that had previously been NOPs. Most of the performance
loss, IMHO, is due to the compression and decompression functions being
marked noinline. The currently reported 1.6% overall performance boost is
likely related to the use of likely()/unlikely().
DRH
On 29 May 2007, at 12:27, Satyam Sharma wrote:
> Right, actually, zlib could be switched over to [using a common
> directory].
> Because zlib_deflate/ and zlib_inflate/ too share a private header
> zutil.h which has unfortunately been stuck into include/linux/ with
> a big
> /* WARNING: this file should *not* be used by applications. */
> comment ...
Well, unfortunately zutil.h is currently being used in-kernel by fs/
jffs2/compr_zlib.c despite the "WARNING: this file should *not* be
used by applications" notice.
So moving this header to a truly private location isn't possible
right now.
Michael-Luke Jones
[added David Whitehouse to cc]
On Tuesday 29 May 2007 09:33:51 Michael-Luke Jones wrote:
> On 29 May 2007, at 12:27, Satyam Sharma wrote:
> > Right, actually, zlib could be switched over to [using a common
> > directory].
> > Because zlib_deflate/ and zlib_inflate/ too share a private header
> > zutil.h which has unfortunately been stuck into include/linux/ with
> > a big
> > /* WARNING: this file should *not* be used by applications. */
> > comment ...
>
> Well, unfortunately zutil.h is currently being used in-kernel by fs/
> jffs2/compr_zlib.c despite the "WARNING: this file should *not* be
> used by applications" notice.
>
> So moving this header to a truly private location isn't possible
> right now, unfortunately,
>
> Michael-Luke Jones
> [added David Whitehouse to cc]
I've looked at that code and it seems to need that file for one constant.
Perhaps it'd be better for jffs2/compr_zlib.c to define that constant itself
(or use it as a "Magic Number") rather than include the zlib private header.
Another possibility would be to move that constant out of zutil.h and into
zconf.h or zlib.h - doing any of those would allow the zlib private header to
be moved such that zlib could be changed to use a common directory *and* have
said private header in that directory.
DRH
[ Trimmed Cc list; added original zlib authors. ]
> On Tuesday 29 May 2007 09:33:51 Michael-Luke Jones wrote:
> > On 29 May 2007, at 12:27, Satyam Sharma wrote:
> > > Right, actually, zlib could be switched over to [using a common
> > > directory].
> > > Because zlib_deflate/ and zlib_inflate/ too share a private header
> > > zutil.h which has unfortunately been stuck into include/linux/ with
> > > a big
> > > /* WARNING: this file should *not* be used by applications. */
> > > comment ...
> >
> > Well, unfortunately zutil.h is currently being used in-kernel by fs/
> > jffs2/compr_zlib.c despite the "WARNING: this file should *not* be
> > used by applications" notice.
Hmmm, either jffs2 thinks the zlib interfaces exposed through
include/linux/zlib.h are insufficient, or it isn't using zlib properly
(or at least the way it was supposed to be used :-)
Looking at fs/jffs2/compr_zlib.c, however, it just seems to be
an optimization -- skipping some checksum calculation if some
flag (PRESET_DICT) is absent from the input stream about to
be decompressed ...
This /looks/ like an optimization that might make sense for other
users of zlib too, in which case that entire check and
skip-the-checksum thing could actually be put into zlib proper. (?)
> > So moving this header to a truly private location isn't possible
> > right now, unfortunately,
> >
> > Michael-Luke Jones
> > [added David Whitehouse to cc]
On 5/29/07, Daniel Hazelton <[email protected]> wrote:
> I've looked at that code and it seems to need that file for one constant.
> Perhaps it'd be better for jffs2/compr_zlib.c to define that constant itself
> (or use it as a "Magic Number") rather than include the zlib private header.
> Another possibility would be to move that constant out of zutil.h and into
> zconf.h or zlib.h - doing any of those would allow the zlib private header to
> be moved such that zlib could be changed to use a common directory *and* have
> said private header in that directory.
Moving PRESET_DICT to zlib.h robs lib/zlib_deflate/deflate.c of the
ability to resolve that macro. FWIW, a dirty hack could be to
simply duplicate it in zlib.h. But then, implementation and interface
are decoupled for good reasons ...
Satyam
On Tuesday 29 May 2007 11:15:41 Satyam Sharma wrote:
> [ Trimmed Cc list; added original zlib authors. ]
>
> > On Tuesday 29 May 2007 09:33:51 Michael-Luke Jones wrote:
> > > On 29 May 2007, at 12:27, Satyam Sharma wrote:
> > > > Right, actually, zlib could be switched over to [using a common
> > > > directory].
> > > > Because zlib_deflate/ and zlib_inflate/ too share a private header
> > > > zutil.h which has unfortunately been stuck into include/linux/ with
> > > > a big
> > > > /* WARNING: this file should *not* be used by applications. */
> > > > comment ...
> > >
> > > Well, unfortunately zutil.h is currently being used in-kernel by fs/
> > > jffs2/compr_zlib.c despite the "WARNING: this file should *not* be
> > > used by applications" notice.
>
> Hmmm, either jffs2 thinks the zlib interfaces exposed through
> include/linux/zlib.h are insufficient, or it isn't using zlib properly
> (or at least the way it was supposed to be used :-)
>
> Looking at fs/jffs2/compr_zlib.c, however, it just seems to be
> an optimization -- skipping some checksum calculation if some
> flag (PRESET_DICT) is absent from the input stream about to
> be decompressed ...
>
> This /looks/ like an optimization that might make sense for other
> users of zlib too, in which case that entire check and
> skip-the-checksum thing could actually be put into zlib proper. (?)
This *is* a good idea. If someone doesn't beat me to it I'll make sure I've
got the latest git and do up a patch that does this.
DRH
> > > So moving this header to a truly private location isn't possible
> > > right now, unfortunately,
> > >
> > > Michael-Luke Jones
> > > [added David Whitehouse to cc]
>
> On 5/29/07, Daniel Hazelton <[email protected]> wrote:
> > I've looked at that code and it seems to need that file for one constant.
> > Perhaps it'd be better for jffs2/compr_zlib.c to define that constant
> > itself (or use it as a "Magic Number") rather than include the zlib
> > private header. Another possibility would be to move that constant out of
> > zutil.h and into zconf.h or zlib.h - doing any of those would allow the
> > zlib private header to be moved such that zlib could be changed to use a
> > common directory *and* have said private header in that directory.
>
> Moving PRESET_DICT to zlib.h robs lib/zlib_deflate/deflate.c of the
> ability to resolve that macro. FWIW, a dirty hack could be to
> simply duplicate it in zlib.h. But then, implementation and interface
> are decoupled for good reasons ...
>
> Satyam
On Tuesday 29 May 2007 01:58:43 Nitin Gupta wrote:
> On 5/29/07, Bret Towe <[email protected]> wrote:
> > tested this on ppc and its still good
> >
> > is there any reason to bother with a test on amd64?
> > if there is I might be able to get to it tonight
>
> Yes, this test is desired on 'take 6'
> (In future I will append version to patch bz2)
>
> Thanks,
> Nitin
I added a few functions to my test-bed to see how well the LZO code coped with
random input data. I'll extend my error checking so that it dumps all local
variables at the first error, but for the time being I'm staring at a problem
with what appears to be a double-free (in addition to lzo1x_decompress(_safe)
screaming about overruns when I pass it the buffer of compressed, totally
random data).
(I fill the buffer by doing a read from /dev/urandom, though I'll certainly
change it to use random() if anyone thinks that'll make a difference)
I'll post the version with the random-data tests when I get the double-free
solved. (I've looked and don't see how it's happening)
DRH
On Tuesday 29 May 2007 16:14:34 Daniel Hazelton wrote:
> On Tuesday 29 May 2007 01:58:43 Nitin Gupta wrote:
> > On 5/29/07, Bret Towe <[email protected]> wrote:
> > > tested this on ppc and its still good
> > >
> > > is there any reason to bother with a test on amd64?
> > > if there is I might be able to get to it tonight
> >
> > Yes, this test is desired on 'take 6'
> > (In future I will append version to patch bz2)
> >
> > Thanks,
> > Nitin
>
> I added a few functions to my test-bed to see how well the LZO code coped
> with random input data. I'll extend my error checking so that it dumps all
> local variables at the first error, but for the time being I'm staring at a
> problem with what appears to be a double-free (in addition to
> lzo1x_decompress(_safe) screaming about overruns when I pass it the buffer
> of compressed, totally random data).
>
> (I fill the buffer by doing a read from /dev/urandom, though I'll certainly
> change it to use random() if anyone thinks that'll make a difference)
>
> I'll post the version with the random-data tests when I get the double-free
> solved. (I've looked and don't see how it's happening)
>
> DRH
After some additional debugging, the problem appears to be that both
lzo1x_decompress_safe (from miniLZO) and lzo1x_decompress (from the 'Tiny'
version) are somehow corrupting memory.
*** glibc detected *** ./tinytest: double free or corruption (out): 0x08070e60
***
The buffer at 0x08070e60 is the one I allocated specifically for the new
random data tests. It has the same location when it is allocated as it is
when free()'d, so the corruption has to be coming from somewhere else. The problem
*might* be in my code - that is, I might have made a mistake in the read()
from /dev/urandom. I'm going to go test that now.
DRH
On May 28 2007 20:04, Nitin Gupta wrote:
>
> * Changelog vs. original LZO:
> 1) Used standard/kernel defined data types: (this eliminated _huge_
> #ifdef chunks)
> lzo_bytep -> unsigned char *
> lzo_uint -> size_t
> lzo_xint -> size_t
Is this safe (as far as compressed LZO stream is concerned) --
or is it even needed (could it be unsigned int)?
> - m_pos -= (*(const unsigned short *)ip) >> 2;
> -#else
> - m_pos = op - 1;
> - m_pos -= (ip[0] >> 2) + (ip[1] << 6);
> -#endif
>
> + m_pos = op - 1 - (cpu_to_le16(*(const u16 *)ip) >> 2);
>
> (Andrey suggested le16_to_cpu for above but I think it should be cpu_to_le16).
> *** Need testing on big endian machine ***
On i386, both cpu_to_le16 and le16_to_cpu do nothing.
On sparc for example, cpu_to_leXX and leXX_to_cpu do 'the same' ;-)
they swap 1234<->4321.
It is the bytestream (ip) that is reinterpreted as uint16_t.
And I really doubt that the LZO author has a big-endian machine,
given the days of ubiquitous x86. le16_to_cpu it is.
Jan
--
On May 28 2007 19:11, Adrian Bunk wrote:
>
>I have not seen any explanations:
>- Why did the upstream author write the code that way?
I guess it's along the lines of
- portability
(note how this contradicts itself). Really. I have yet to
figure out why everyone invents their own xxx32_t types,
like e.g. glib. Well, integer types one might understand, but when it
comes to gchar or gpointer, that's just plain Microsoft-style
(think LPCSTR and LPCVOID...)
Jan
--
On Tue, May 29, 2007 at 11:10:05PM +0200, Jan Engelhardt wrote:
>
> On May 28 2007 19:11, Adrian Bunk wrote:
> >
> >I have not seen any explanations:
> >- Why did the upstream author write the code that way?
>
> I guess it's along the lines of
> - portability
>
> (note how this contradicts itself). Really. I have yet to
> figure out why everyone invents their own xxx32_t types,
> like e.g. glib. well, integer types, one might understand, but when it
> comes to gchar or gpointer, that's just plain microsoft-style
> (think LPCSTR and LPCVOID...)
You completely miss the point of my question.
It's about the performance improvements of the modified code that were
mentioned.
What you are talking about shouldn't have any effect on the generated
code.
> Jan
cu
Adrian
--
All problems I was having with the test-bed code have been solved, and the
error I was running into was, as I suspected, in the code I used to fill the
buffer for the random-data test.
Results of running the new benchmark (version 6 of the benchmark, version 6
of 'tinyLZO'):
10000 run averages:
'Tiny LZO':
Combined: 104.1529 usec
Compression: 43.065 usec
Decompression: 18.4648 usec
Random Data Compression: 31.132 usec
Random Data Decompression: 11.4911 usec
'miniLZO':
Combined: 106.1363 usec
Compression: 44.6988 usec
Decompression: 18.3576 usec
Random Data Compression: 31.5772 usec
Random Data Decompression: 11.5027 usec
Percentages (calculated as: ((full - tiny)/full)*100):
Overall (combined totals): Tiny is 1.87 % faster
Compression: Tiny is 3.66 % faster
Decompression: Tiny is 0.58 % slower
Random Compression: Tiny is 1.41 % faster
Random Decompression: Tiny is 0.10 % faster
The results on my system are not consistent, except that 'TinyLZO' is
generally faster than miniLZO in the non-random data tests. Three of the
five times I ran the test, however, it performed (on average) about 1%
worse than miniLZO.
In order to sidestep any issues that a difference in the input data might have
caused, I'm going to rewrite the code so that all tests are run against the
same data. However, in the meantime, I've attached the latest version of my
test-code.
DRH
PS: the code is going to massively change as I work to include more data
sources for the benchmarking, as well as tests that will try to really stress
the stripped-down code.
I just noticed a bug in my testbed/benchmarking code. It's fixed, but I
decided to compare version 6 of the code against the *unsafe* decompressor
again. The results of the three runs I've put it through after changing it to
compare against the unsafe decompressor were startling. `Tiny's` compressor
is still faster - I've seen it be rated up to 3% faster. The decompressor,
OTOH, when compared to the unsafe version (which is the comparison that
started me on this binge of hacking), is more than 7% worse. About 11% slower
on the original test against a C source file, and about 6% slower for random
data. However, looking at the numbers involved, I can't see a reason to keep
the unsafe version around - the percentages look worse than they are - from 1
to 3 microseconds. (well, the compressed-cache people might want those extra
usecs - but the difference will never be noticeable anywhere outside the
kernel)
DRH
On 5/30/07, Daniel Hazelton <[email protected]> wrote:
> I just noticed a bug in my testbed/benchmarking code. It's fixed, but I
> decided to compare version 6 of the code against the *unsafe* decompressor
> again. The results of the three runs I've put it through after changing it to
> compare against the unsafe decompressor were startling. `Tiny's` compressor
> is still faster - I've seen it be rated up to 3% faster. The decompressor,
> OTOH, when compared to the unsafe version (which is the comparison that
> started me on this binge of hacking), is more than 7% worse. About 11% slower
> on the original test against a C source file, and about 6% slower for random
> data.
Unsafe vs. safe is within 10%. It's okay.
> However, looking at the numbers involved, I can't see a reason to keep
> the unsafe version around - the percentages look worse than they are - from 1
> to 3 microseconds.
Not just the numbers. Most applications cannot afford to use the unsafe
version anyway (like the fs people).
> (well, the compressed-cache people might want those extra
> usecs - but the difference will never be noticeable anywhere outside the
> kernel)
>
> DRH
>
The compressed cache people require every single percent of that
performance. For now, compressed caching is not ready for mainline (many
things need to be done), so till then I will keep the unsafe version out.
If compressed caching is ever on its way to mainline, _then_ I will try
to add the unsafe version back. But I see no other project that really
cares about the unsafe version, so it's okay to keep it out.
- Nitin
On 5/30/07, Adrian Bunk <[email protected]> wrote:
> On Tue, May 29, 2007 at 11:10:05PM +0200, Jan Engelhardt wrote:
> >
> > On May 28 2007 19:11, Adrian Bunk wrote:
> You completely miss the point of my question.
>
> It's about the performance improvements of the modified code that were
> mentioned.
>
> What you are talking about shouldn't have any effect on the generated
> code.
>
After Daniel refined his testing program, we can see that the perf gain
is < 2%, which can surely be attributed to cleanups like removing
unnecessary casts in tight loops.
Again, all the original code has been retained _as-is_. Whatever was
changed has been mentioned in the detailed changelog that I post
along with the patch.
If someone wants to review these 500 lines with this changelog in hand,
it should not take more than a couple of hours. If no one can see any
problem in the code by now, and considering that it's tested on x86(_32),
amd64 and ppc and gives somewhat better performance than the original,
then I believe it is unnecessarily hanging outside of the -mm tree.
I also contacted the author (Markus Oberhumer) regarding the above
changes, but he does not seem to be responding right now. Still, if it
gets into -mm and gets used by various related projects, then it should
be good enough for mainline as well.
Cheers,
Nitin
On 5/30/07, Jan Engelhardt <[email protected]> wrote:
>
> On May 28 2007 20:04, Nitin Gupta wrote:
> >
> > * Changelog vs. original LZO:
> > 1) Used standard/kernel defined data types: (this eliminated _huge_
> > #ifdef chunks)
> > lzo_bytep -> unsigned char *
> > lzo_uint -> size_t
> > lzo_xint -> size_t
>
> Is this safe (as far as compressed LZO stream is concerned) --
> or is it even needed (could it be unsigned int)?
>
> > - m_pos -= (*(const unsigned short *)ip) >> 2;
> > -#else
> > - m_pos = op - 1;
> > - m_pos -= (ip[0] >> 2) + (ip[1] << 6);
> > -#endif
> >
> > + m_pos = op - 1 - (cpu_to_le16(*(const u16 *)ip) >> 2);
> >
> > (Andrey suggested le16_to_cpu for above but I think it should be cpu_to_le16).
> > *** Need testing on big endian machine ***
>
> On i386, both cpu_to_le16 and le16_to_cpu do nothing.
> On sparc for example, cpu_to_leXX and leXX_to_cpu do 'the same' ;-)
> they swap 1234<->4321.
>
> It is the bytestream (ip) that is reinterpreted as uint16_t.
> And I really doubt that the LZO author has a big-endian machine,
> given the days of ubiquitous x86.
> le16_to_cpu it is.
>
I just missed the point that leXX_to_cpu() and cpu_to_leXX() are
identical on a BE machine anyway. But then why do you think it should be
le16_to_cpu() -- how will this make any difference?
For your ref (from big_endian.h):
#define __cpu_to_le16(x) ((__force __le16)__swab16((x)))
#define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
- Nitin
On May 29, 2007, at 8:15 AM, Satyam Sharma wrote:
> skipping some checksum calculation if some
> flag (PRESET_DICT) is absent from the input stream about to
> be decompressed ...
You don't need to dissect the header manually to look for that bit.
If you feed inflate() at least the first two bytes, it will return
immediately with the Z_NEED_DICT return code if a preset dictionary
is requested. You can force inflate() to return immediately after
decoding the two byte header even if a preset dictionary is not
requested by using the Z_BLOCK flush code.
Mark
On May 30 2007 11:24, Nitin Gupta wrote:
>>
>> It is the bytestream (ip) that is reinterpreted as uint16_t.
>> And I really doubt that the LZO author has a big-endian machine,
>> given the days of ubiquitous x86.
>
>> le16_to_cpu it is.
>
> But then why do you think it should be
> le16_to_cpu() -- how will this make any difference?
Like I said, we are reinterpreting the byte stream as a uint16_t
(= reading from the bytestream to the CPU), which is le16_to_cpu;
writing out a uint16_t as a bytestream is cpu_to_le16.
> For your ref (from big_endian.h):
> # define __cpu_to_le16(x) ((__force __le16)__swab16((x)))
> # define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
Jan
--
On 5/30/07, Jan Engelhardt <[email protected]> wrote:
>
> On May 30 2007 11:24, Nitin Gupta wrote:
> >>
> >> It is the bytestream (ip) that is reinterpreted as uint16_t.
> >> And I really doubt that the LZO author has a big-endian machine,
> >> given the days of ubiquitous x86.
> >
> >> le16_to_cpu it is.
> >
> > But then why do you think it should be
> > le16_to_cpu() -- how will this make any difference?
>
> Like I said, we are reinterpreting the byte stream as a uint16_t
> (= reading from bytestream to CPU). While writing out an uint16_t
> as a bytestream is cpu_to_le16.
>
I didn't properly understand this, but you seem to be correct - I have
now run a sparse check:
* using cpu_to_le16()
lib/lzo1x/lzo1x_decompress.c:141:35: error: incompatible types for
operation (>>)
lib/lzo1x/lzo1x_decompress.c:141:35: left side has type restricted
unsigned short [usertype] [force] <noident>
lib/lzo1x/lzo1x_decompress.c:141:35: right side has type int
lib/lzo1x/lzo1x_decompress.c:140:20: error: incompatible types for operation (-)
lib/lzo1x/lzo1x_decompress.c:140:20: left side has type unsigned
char *register [assigned] op
lib/lzo1x/lzo1x_decompress.c:140:20: right side has type bad type
lib/lzo1x/lzo1x_decompress.c:157:35: error: incompatible types for
operation (>>)
lib/lzo1x/lzo1x_decompress.c:157:35: left side has type restricted
unsigned short [usertype] [force] <noident>
lib/lzo1x/lzo1x_decompress.c:157:35: right side has type int
lib/lzo1x/lzo1x_decompress.c:156:11: error: incompatible types for
operation (-=)
* using le16_to_cpu()
lib/lzo1x/lzo1x_decompress.c:140:23: warning: cast to restricted type
lib/lzo1x/lzo1x_decompress.c:156:14: warning: cast to restricted type
But still, how do I get rid of these two warnings? Or do these
warnings suggest some real problem that still exists in the code?
- Nitin
> > For your ref (from big_endian.h):
> > # define __cpu_to_le16(x) ((__force __le16)__swab16((x)))
> > # define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
>
>
> Jan
> --
>
On Wednesday 30 May 2007 01:31:19 Mark Adler wrote:
> On May 29, 2007, at 8:15 AM, Satyam Sharma wrote:
> > skipping some checksum calculation if some
> > flag (PRESET_DICT) is absent from the input stream about to
> > be decompressed ...
>
> You don't need to dissect the header manually to look for that bit.
> If you feed inflate() at least the first two bytes, it will return
> immediately with the Z_NEED_DICT return code if a preset dictionary
> is requested. You can force inflate() to return immediately after
> decoding the two byte header even if a preset dictionary is not
> requested by using the Z_BLOCK flush code.
>
> Mark
I think that JFFS2 is doing the work of looking for PRESET_DICT itself not
only to skip the checksum calc, but also to avoid the function-call
overhead. Yes, the performance difference shouldn't be all that great, but
it might have made a difference in some test. When I'm done working on the
benchmark/testbed for the LZO code proposed for inclusion in the kernel I'll
do some testing and see how large the difference is in userspace. (for most
code the userspace differences hold true in the kernel, though usually with
less drastic numbers)
DRH
Hi Mark,
On 5/30/07, Mark Adler <[email protected]> wrote:
> On May 29, 2007, at 8:15 AM, Satyam Sharma wrote:
> > skipping some checksum calculation if some
> > flag (PRESET_DICT) is absent from the input stream about to
> > be decompressed ...
>
> You don't need to dissect the header manually to look for that bit.
> If you feed inflate() at least the first two bytes, it will return
> immediately with the Z_NEED_DICT return code if a preset dictionary
> is requested. You can force inflate() to return immediately after
> decoding the two byte header even if a preset dictionary is not
> requested by using the Z_BLOCK flush code.
Thanks for replying; unfortunately, I don't know either the zlib or jffs2
code deeply enough to actually understand what you meant here. Are you
saying that the if-else block in question [1] in
fs/jffs2/compr_zlib.c:jffs2_zlib_decompress() is unnecessary and can
be done away with?
Thanks,
Satyam
[1] For your reference, here is the user code in question:
inf_strm.next_in = data_in;
inf_strm.avail_in = srclen;
inf_strm.total_in = 0;
inf_strm.next_out = cpage_out;
inf_strm.avail_out = destlen;
inf_strm.total_out = 0;
int wbits = MAX_WBITS;
/* If it's deflate, and it's got no preset dictionary, then
we can tell zlib to skip the adler32 check. */
if (srclen > 2 && !(data_in[1] & PRESET_DICT) &&
((data_in[0] & 0x0f) == Z_DEFLATED) &&
!(((data_in[0]<<8) + data_in[1]) % 31)) {
D2(printk(KERN_DEBUG "inflate skipping adler32\n"));
wbits = -((data_in[0] >> 4) + 8);
inf_strm.next_in += 2;
inf_strm.avail_in -= 2;
} else {
/* Let this remain D1 for now -- it should never happen */
D1(printk(KERN_DEBUG "inflate not skipping adler32\n"));
}
if (zlib_inflateInit2(&inf_strm, wbits) != Z_OK) {
printk(KERN_WARNING "inflateInit failed\n");
return 1;
}
while((ret = zlib_inflate(&inf_strm, Z_FINISH)) == Z_OK)
;
if (ret != Z_STREAM_END)
printk(KERN_NOTICE "inflate returned %d\n", ret);
zlib_inflateEnd(&inf_strm);
Satyam Sharma wrote:
> Hmmm, either jffs2 thinks the zlib interfaces exposed through
> include/linux/zlib.h are insufficient, or it isn't using zlib properly
> (or at least the way it was supposed to be used :-)
>
I do not remember many details, but yes, JFFS2 uses zlib trickily.
Traditionally, zlib is used like this: you have an input buffer and a large
enough output buffer, and you compress the whole input buffer into the
output buffer.
JFFS2's needs are different: it has a _big_ input buffer and a _small_
output buffer, and it wants zlib to compress as much as possible from the
input buffer, making the output buffer full or nearly full of compressed
data.
This is why JFFS2 uses various hacks, but I do not remember the details
anymore. Otherwise it could just use the cryptoapi instead. In fact, I
tried this 2 years ago, but gave up: http://lkml.org/lkml/2005/3/25/104
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
On Wed, May 30, 2007, Nitin Gupta wrote:
>
> Again, all the original code has been retained _as-is_. Whatever was
> changed, has been mentioned in that detailed changelog that I post
> along with patch.
Just a general remark (I haven't been following this thread closely):
IMHO it would be _much_ better to merge the original code and
your changes as separate patches. Then someone who
wants to review it later doesn't have to jump through all
the hoops of finding the original code himself to diff it
and see your changes.
Additionally, you should also split stylistic/cleanup
changes like "Reformatted the code to match general kernel style"
from functional changes like "use cpu_to_le16()".
Ideally, each of the changes you mention in your
"Changelog vs. original LZO" should be a separate
patch; this would make review much easier.
Regards,
Johannes
On 5/30/07, Johannes Stezenbach <[email protected]> wrote:
> On Wed, May 30, 2007, Nitin Gupta wrote:
> >
> > Again, all the original code has been retained _as-is_. Whatever was
> > changed, has been mentioned in that detailed changelog that I post
> > along with patch.
>
> Just a general remark (I haven't been following this thread closely):
>
> IMHO it would be _much_ better to merge the original code and
> your changes as separate patches. Then someone who
> wants to review it later doesn't have to jump through all
> the hoops of finding the original code himself to diff it
> and see your changes.
>
> Additionally, you should also split stylistic/cleanup
> changes like "Reformatted the code to match general kernel style"
> from functional changes like "use cpu_to_le16()".
>
> Ideally each of the changes you mention in your
> "Changelog vs. original LZO" should be a seperate
> patch, this would make review much easier.
I violently agree with this method of going forward.
Artem Bityutskiy wrote:
> JFFS2 needs: it has _big_ input buffer, and _small_ output buffer, and
> it wants zlib to compress as much as possible from the input buffer,
> and make the output buffer full or nearly full of compressed data.
Err, an important note is that it also wants this compressed data to be
independently decompressible.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
Hi Artem,
On 5/30/07, Artem Bityutskiy <[email protected]> wrote:
> Err, an important note is that it also wants this compressed data to be
> independently decompressible.
Hmm, okay.
But what I meant was that if jffs2's needs are "standard" enough in
the sense that they could conceivably be required by other users too
(and this one mentioned by you does appear to be one of those),
then why not make such hacks (if they are indeed necessary and suitable;
let's wait for Mark's response) part of zlib itself?
Thanks,
Satyam
Satyam Sharma wrote:
> Hi Artem,
>
> On 5/30/07, Artem Bityutskiy <[email protected]> wrote:
>> Err, an important note is that it also wants this compressed data to be
>> independently decompressible.
>
> Hmm, okay.
>
> But what I meant was that if jffs2's needs are "standard" enough in
> the sense that they could conceivably be required by other users too
> (and this one mentioned by you does appear to be one of those),
> then why not make such hacks (if they are indeed necessary and suitable;
> let's wait for Mark's response) part of zlib itself?
I'm really not sure - I vaguely remember that zlib just cannot provide this
capability. And what JFFS2 does is not always correct. But again, it was
such a long time ago...
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
On May 30, 2007, at 6:30 AM, Satyam Sharma wrote:
> [1] For your reference, here is the user code in question:
...
> if (srclen > 2 && !(data_in[1] & PRESET_DICT) &&
> ((data_in[0] & 0x0f) == Z_DEFLATED) &&
> !(((data_in[0]<<8) + data_in[1]) % 31)) {
The funny thing here is that the author felt compelled to use a
#defined constant for the dictionary bit (PRESET_DICT), but had no
problem with a numeric constant to isolate the compression method
(0x0f), or for that matter extracting the window bits from the
header. The easy way to avoid the use of an internal zlib header
file here is to simply replace PRESET_DICT with 0x20. That constant
will never change -- it is part of the definition of the zlib header
in RFC 1950.
The slightly more involved patch to avoid the problem is to let
inflate() do all that work for you, including the integrity check on
the zlib header (% 31). Also this corrects an error in the original
code, which is that it continues to try to decompress after finding
that a dictionary is needed or that the zlib header is invalid. In
this version, a bad header simply returns an error:
/* provide input data and output space */
inf_strm.next_in = data_in;
inf_strm.avail_in = srclen;
inf_strm.next_out = cpage_out;
inf_strm.avail_out = destlen;
/* verify and skip zlib header (this updates next_in and avail_in) */
inf_strm.zalloc = Z_NULL;
inf_strm.zfree = Z_NULL;
inf_strm.opaque = Z_NULL;
if (zlib_inflateInit(&inf_strm) != Z_OK) {
printk(KERN_WARNING "inflateInit failed\n");
return 1;
}
ret = zlib_inflate(&inf_strm, Z_BLOCK);
if (ret != Z_OK || (inf_strm.data_type & 0x80) == 0) {
printk(KERN_WARNING "inflate failed on zlib header\n");
return 1;
}
zlib_inflateEnd(&inf_strm);
/* do raw inflate (no adler32) on deflate data after zlib header */
inf_strm.zalloc = Z_NULL;
inf_strm.zfree = Z_NULL;
inf_strm.opaque = Z_NULL;
if (zlib_inflateInit2(&inf_strm, -MAX_WBITS) != Z_OK) {
printk(KERN_WARNING "inflateInit failed\n");
return 1;
}
while ((ret = zlib_inflate ...
Mark
On Wednesday 30 May 2007 19:02:28 Mark Adler wrote:
> On May 30, 2007, at 6:30 AM, Satyam Sharma wrote:
> > [1] For your reference, here is the user code in question:
>
> ...
>
> > if (srclen > 2 && !(data_in[1] & PRESET_DICT) &&
> > ((data_in[0] & 0x0f) == Z_DEFLATED) &&
> > !(((data_in[0]<<8) + data_in[1]) % 31)) {
>
> The funny thing here is that the author felt compelled to use a
> #defined constant for the dictionary bit (PRESET_DICT), but had no
> problem with a numeric constant to isolate the compression method
> (0x0f), or for that matter extracting the window bits from the
> header. The easy way to avoid the use of an internal zlib header
> file here is to simply replace PRESET_DICT with 0x20. That constant
> will never change -- it is part of the definition of the zlib header
> in RFC 1950.
That was one of my original suggestions, though I personally rejected it
(and believe I also did so in the mail with the suggestions) because of the
general dislike of "magic numbers" in the kernel. However, I had only looked at a
small part of the JFFS2 code to see what it used zutil.h for - if it uses
such magic numbers in the rest of the code, the consistent way to fix this
would be to replace the PRESET_DICT macro with the constant it stands for.
I've gotten a bit caught up in extending the test-bed code I used to benchmark
Nitin's 'tinyLZO' implementation, but I will see about getting a patch
together to change JFFS2 in the manner Mark suggests. (Unless, of course,
someone gets to it before I do.)
DRH
>
> The slightly more involved patch to avoid the problem is to let
> inflate() do all that work for you, including the integrity check on
> the zlib header (% 31). Also this corrects an error in the original
> code, which is that it continues to try to decompress after finding
> that a dictionary is needed or that the zlib header is invalid. In
> this version, a bad header simply returns an error:
>
> /* provide input data and output space */
> inf_strm.next_in = data_in;
> inf_strm.avail_in = srclen;
> inf_strm.next_out = cpage_out;
> inf_strm.avail_out = destlen;
>
> /* verify and skip zlib header (this updates next_in and avail_in) */
> inf_strm.zalloc = Z_NULL;
> inf_strm.zfree = Z_NULL;
> inf_strm.opaque = Z_NULL;
> if (zlib_inflateInit(&inf_strm) != Z_OK) {
> printk(KERN_WARNING "inflateInit failed\n");
> return 1;
> }
> ret = zlib_inflate(&inf_strm, Z_BLOCK);
> if (ret != Z_OK || (inf_strm.data_type & 0x80) == 0) {
> printk(KERN_WARNING "inflate failed on zlib header\n");
> return 1;
> }
> zlib_inflateEnd(&inf_strm);
>
> /* do raw inflate (no adler32) on deflate data after zlib header */
> inf_strm.zalloc = Z_NULL;
> inf_strm.zfree = Z_NULL;
> inf_strm.opaque = Z_NULL;
> if (zlib_inflateInit2(&inf_strm, -MAX_WBITS) != Z_OK) {
> printk(KERN_WARNING "inflateInit failed\n");
> return 1;
> }
> while ((ret = zlib_inflate ...
>
> Mark
Hi,
FYI...
The author (Markus Oberhumer) of LZO provided these comments for this patch:
---
I've only briefly looked over it, but it's obvious that your version does not
work on architectures which do not allow unaligned access (arm, mips, ...).
As for further quality assurance, your version should generate byte-identical
object code to LZO 2.02 when using the same compiler & flags. So you could do
some semi-automatic testing by compiling with -ffunction-sections and use
"objcopy --only-section .text.XXX" to compare the md5sum of all generated
functions. This also works fine with crosscompilers.
Finally I'm not too happy that you renamed the functions and #defines like
LZO1X_WORKMEM_SIZE - please stay compatible with the official library version.
----
Still a lot to do...
- Nitin
On 5/28/07, Nitin Gupta <[email protected]> wrote:
> This is kernel port of LZO1X-1 compressor and LZO1X decompressor (safe
> version only).
>
> * Changes since 'take 5' (Full Changelog after this):
> - Added compressor and decompressor as separate and hidden config
> options (default: n)
> - Cleanups: removed LZO_CHECK_MPOS_NON_DET macro
> - removed PTR* macros.
>
> * Benchmarks: (system: P4 3GHz, 1G RAM)
> (Using tester program from Daniel)
>
> Following compares this kernel port ('take 6') vs original miniLZO code:
>
> 'TinyLZO' refers to this kernel port.
>
> 10000 run averages:
> 'Tiny LZO':
> Combined: 61.2223 usec
> Compression: 41.8412 usec
> Decompression: 19.3811 usec
> 'miniLZO':
> Combined: 66.0444 usec
> Compression: 46.6323 usec
> Decompression: 19.4121 usec
>
> Result:
> Overall: TinyLZO is 7.3% faster
> Compressor: TinyLZO is 10.2% faster
> Decompressor: TinyLZO is 0.15% faster
>
> TODO: test 'take 6' port on archs other than x86(_32)
>
> * Changelog vs. original LZO:
> 1) Used standard/kernel defined data types: (this eliminated _huge_
> #ifdef chunks)
> lzo_bytep -> unsigned char *
> lzo_uint -> size_t
> lzo_xint -> size_t
> lzo_uint32p -> u32 *
> lzo_uintptr_t -> unsigned long
> 2) Removed everything #ifdef'ed under COPY_DICT (this is not set for
> LZO1X, so removed corres. parts).
> 3) Removed code #ifdef'ed for LZO1Y, LZO1Z, other variants.
> 4) Reformatted the code to match general kernel style.
> 5) The only code change: (as suggested by Andrey)
> -#if defined(__LITTLE_ENDIAN)
> - m_pos = op - 1;
> - m_pos -= (*(const unsigned short *)ip) >> 2;
> -#else
> - m_pos = op - 1;
> - m_pos -= (ip[0] >> 2) + (ip[1] << 6);
> -#endif
>
> + m_pos = op - 1 - (cpu_to_le16(*(const u16 *)ip) >> 2);
>
> (Andrey suggested le16_to_cpu for above but I think it should be cpu_to_le16).
> *** Need testing on big endian machine ***
>
> Similarly:
> -#if defined(__LITTLE_ENDIAN)
> - m_pos -= (*(const unsigned short *)ip) >> 2;
> -#else
> - m_pos -= (ip[0] >> 2) + (ip[1] << 6);
> -#endif
>
> + m_pos -= cpu_to_le16(*(const u16 *)ip) >> 2;
>
> 6) Get rid of LZO_CHECK_MPOS_NON_DET macro and PTR* macros.
>
>
> I think it's now ready for mainline inclusion.
>
>
> include/linux/lzo1x.h | 66 +++++++++++
> lib/Kconfig | 6 +
> lib/Makefile | 2 +
> lib/lzo1x/Makefile | 2 +
> lib/lzo1x/lzo1x_compress.c | 259 ++++++++++++++++++++++++++++++++++++++++++
> lib/lzo1x/lzo1x_decompress.c | 238 ++++++++++++++++++++++++++++++++++++++
> lib/lzo1x/lzo1x_int.h | 85 ++++++++++++++
> 7 files changed, 658 insertions(+), 0 deletions(-)
>
> Signed-off-by: Nitin Gupta <nitingupta910 [at] gmail [dot] com>
> ---
> diff --git a/include/linux/lzo1x.h b/include/linux/lzo1x.h
> new file mode 100755
> index 0000000..11a6f23
> --- /dev/null
> +++ b/include/linux/lzo1x.h
> @@ -0,0 +1,66 @@
> +/* lzo1x.h -- public interface of the LZO1X compression algorithm
> +
> + This file is part of the LZO real-time data compression library.
> +
> + Copyright (C) 1996-2005 Markus Franz Xaver Johannes Oberhumer
> + All Rights Reserved.
> +
> + The LZO library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU General Public License,
> + version 2, as published by the Free Software Foundation.
> +
> + The LZO library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + GNU General Public License for more details.
> +
> + You should have received a copy of the GNU General Public License
> + along with the LZO library; see the file COPYING.
> + If not, write to the Free Software Foundation, Inc.,
> + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> +
> + Markus F.X.J. Oberhumer
> + <[email protected]>
> + http://www.oberhumer.com/opensource/lzo/
> +
> +
> + This file is modified version of lzo1x.h found in original LZO 2.02
> + code. Some additional changes have also been made to make it work
> + in kernel space.
> +
> + Nitin Gupta
> + <[email protected]>
> + */
> +
> +#ifndef __LZO1X_H
> +#define __LZO1X_H
> +
> +/* LZO return codes */
> +#define LZO_E_OK 0
> +#define LZO_E_ERROR (-1)
> +#define LZO_E_OUT_OF_MEMORY (-2) /* [not used right now] */
> +#define LZO_E_NOT_COMPRESSIBLE (-3) /* [not used right now] */
> +#define LZO_E_INPUT_OVERRUN (-4)
> +#define LZO_E_OUTPUT_OVERRUN (-5)
> +#define LZO_E_LOOKBEHIND_OVERRUN (-6)
> +#define LZO_E_EOF_NOT_FOUND (-7)
> +#define LZO_E_INPUT_NOT_CONSUMED (-8)
> +#define LZO_E_NOT_YET_IMPLEMENTED (-9) /* [not used right now] */
> +
> +/* Size of temp buffer (workmem) required by lzo1x_compress */
> +#define LZO1X_WORKMEM_SIZE ((size_t) (16384L * sizeof(unsigned char *)))
> +
> +/*
> + * This requires 'workmem' of size LZO1X_WORKMEM_SIZE
> + */
> +int lzo1x_compress(const unsigned char *src, size_t src_len,
> + unsigned char *dst, size_t *dst_len,
> + void *workmem);
> +
> +/*
> + * This decompressor will catch all compressed data violations and
> + * return an error code in this case.
> + */
> +int lzo1x_decompress(const unsigned char *src, size_t src_len,
> + unsigned char *dst, size_t *dst_len);
> +#endif
> diff --git a/lib/Kconfig b/lib/Kconfig
> index 2e7ae6b..eb95eaa 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -64,6 +64,12 @@ config ZLIB_INFLATE
> config ZLIB_DEFLATE
> tristate
>
> +config LZO1X_COMPRESS
> + tristate
> +
> +config LZO1X_DECOMPRESS
> + tristate
> +
> #
> # Generic allocator support is selected if needed
> #
> diff --git a/lib/Makefile b/lib/Makefile
> index c8c8e20..448ae37 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -49,6 +49,8 @@ obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
> obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
> obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
> obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
> +obj-$(CONFIG_LZO1X_COMPRESS) += lzo1x/
> +obj-$(CONFIG_LZO1X_DECOMPRESS) += lzo1x/
>
> obj-$(CONFIG_TEXTSEARCH) += textsearch.o
> obj-$(CONFIG_TEXTSEARCH_KMP) += ts_kmp.o
> diff --git a/lib/lzo1x/Makefile b/lib/lzo1x/Makefile
> new file mode 100644
> index 0000000..7b56a4d
> --- /dev/null
> +++ b/lib/lzo1x/Makefile
> @@ -0,0 +1,2 @@
> +obj-$(CONFIG_LZO1X_COMPRESS) += lzo1x_compress.o
> +obj-$(CONFIG_LZO1X_DECOMPRESS) += lzo1x_decompress.o
> diff --git a/lib/lzo1x/lzo1x_compress.c b/lib/lzo1x/lzo1x_compress.c
> new file mode 100755
> index 0000000..b525be6
> --- /dev/null
> +++ b/lib/lzo1x/lzo1x_compress.c
> @@ -0,0 +1,259 @@
> +/* lzo1x_compress.c -- LZO1X-1 compression
> +
> + This file is part of the LZO real-time data compression library.
> +
> + Copyright (C) 1996-2005 Markus Franz Xaver Johannes Oberhumer
> + All Rights Reserved.
> +
> + The LZO library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU General Public License,
> + version 2, as published by the Free Software Foundation.
> +
> + The LZO library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + GNU General Public License for more details.
> +
> + You should have received a copy of the GNU General Public License
> + along with the LZO library; see the file COPYING.
> + If not, write to the Free Software Foundation, Inc.,
> + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> +
> + Markus F.X.J. Oberhumer
> + <[email protected]>
> + http://www.oberhumer.com/opensource/lzo/
> +
> +
> + This file is derived from lzo1x_1.c and lzo1x_c.ch found in original
> + LZO 2.02 code. Some additional changes have also been made to make
> + it work in kernel space.
> +
> + Nitin Gupta
> + <[email protected]>
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/compiler.h>
> +#include <linux/lzo1x.h>
> +
> +#include "lzo1x_int.h"
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("LZO1X Compression");
> +
> +/* compress a block of data. */
> +static noinline unsigned int
> +lzo1x_compress_worker(const unsigned char *in, size_t in_len,
> + unsigned char *out, size_t *out_len,
> + void *workmem)
> +{
> + register const unsigned char *ip;
> + unsigned char *op;
> + const unsigned char * const in_end = in + in_len;
> + const unsigned char * const ip_end = in + in_len - M2_MAX_LEN - 5;
> + const unsigned char *ii;
> + const unsigned char ** const dict = (const unsigned char **)workmem;
> +
> + op = out;
> + ip = in;
> + ii = ip;
> +
> + ip += 4;
> + for (;;) {
> + register const unsigned char *m_pos;
> + size_t m_off;
> + size_t m_len;
> + size_t dindex;
> +
> + DINDEX1(dindex, ip);
> + m_pos = dict[dindex];
> +
> + if ((m_pos < in) || (m_off = (size_t)(ip - m_pos)) <= 0
> + || m_off > M4_MAX_OFFSET)
> + goto literal;
> +
> + if (m_off <= M2_MAX_OFFSET || m_pos[3] == ip[3])
> + goto try_match;
> +
> + DINDEX2(dindex, ip);
> + m_pos = dict[dindex];
> +
> + if ((m_pos < in) || (m_off = (size_t)(ip - m_pos)) <= 0
> + || m_off > M4_MAX_OFFSET)
> + goto literal;
> +
> + if (m_off <= M2_MAX_OFFSET || m_pos[3] == ip[3])
> + goto try_match;
> +
> + goto literal;
> +
> +try_match:
> + if (*(const unsigned short *)m_pos ==
> + *(const unsigned short *)ip) {
> + if (likely(m_pos[2] == ip[2]))
> + goto match;
> + }
> +
> + /* a literal */
> +literal:
> + dict[dindex] = ip;
> + ++ip;
> + if (unlikely(ip >= ip_end))
> + break;
> + continue;
> +
> + /* a match */
> +match:
> + dict[dindex] = ip;
> + /* store current literal run */
> + if ((size_t)(ip - ii) > 0) {
> + register size_t t = (size_t)(ip - ii);
> + if (t <= 3)
> + op[-2] |= (unsigned char)t;
> + else if (t <= 18)
> + *op++ = (unsigned char)(t - 3);
> + else {
> + register size_t tt = t - 18;
> + *op++ = 0;
> + while (tt > 255) {
> + tt -= 255;
> + *op++ = 0;
> + }
> + *op++ = (unsigned char)tt;
> + }
> + do
> + *op++ = *ii++;
> + while (--t > 0);
> + }
> +
> + /* code the match */
> + ip += 3;
> + if (m_pos[3] != *ip++ || m_pos[4] != *ip++ ||
> + m_pos[5] != *ip++ || m_pos[6] != *ip++ ||
> + m_pos[7] != *ip++ || m_pos[8] != *ip++) {
> + --ip;
> + m_len = (size_t)(ip - ii);
> +
> + if (m_off <= M2_MAX_OFFSET) {
> + m_off -= 1;
> + *op++ = (unsigned char)(((m_len - 1) << 5) |
> + ((m_off & 7) << 2));
> + *op++ = (unsigned char)(m_off >> 3);
> + }
> + else if (m_off <= M3_MAX_OFFSET) {
> + m_off -= 1;
> + *op++ = (unsigned char)(M3_MARKER |
> + (m_len - 2));
> + goto m3_m4_offset;
> + } else {
> + m_off -= 0x4000;
> + *op++ = (unsigned char)(M4_MARKER |
> + ((m_off & 0x4000) >> 11) |
> + (m_len - 2));
> + goto m3_m4_offset;
> + }
> + } else {
> + const unsigned char *end = in_end;
> + const unsigned char *m = m_pos + M2_MAX_LEN + 1;
> + while (ip < end && *m == *ip)
> + m++, ip++;
> + m_len = (size_t)(ip - ii);
> +
> + if (m_off <= M3_MAX_OFFSET) {
> + m_off -= 1;
> + if (m_len <= 33)
> + *op++ = (unsigned char)(M3_MARKER |
> + (m_len - 2));
> + else {
> + m_len -= 33;
> + *op++ = M3_MARKER | 0;
> + goto m3_m4_len;
> + }
> + } else {
> + m_off -= 0x4000;
> + if (m_len <= M4_MAX_LEN)
> + *op++ = (unsigned char)(M4_MARKER |
> + ((m_off & 0x4000) >> 11) |
> + (m_len - 2));
> + else {
> + m_len -= M4_MAX_LEN;
> + *op++ = (unsigned char)(M4_MARKER |
> + ((m_off & 0x4000) >> 11));
> +m3_m4_len:
> + while (m_len > 255) {
> + m_len -= 255;
> + *op++ = 0;
> + }
> + *op++ = (unsigned char)(m_len);
> + }
> + }
> +
> +m3_m4_offset:
> + *op++ = (unsigned char)((m_off & 63) << 2);
> + *op++ = (unsigned char)(m_off >> 6);
> + }
> +
> + ii = ip;
> + if (unlikely(ip >= ip_end))
> + break;
> + }
> +
> + *out_len = (size_t)(op - out);
> + return (size_t)(in_end - ii);
> +}
> +
> +
> +/*
> + * This requires buffer (workmem) of size LZO1X_WORKMEM_SIZE
> + * (exported by lzo1x.h).
> + */
> +int
> +lzo1x_compress(const unsigned char *in, size_t in_len,
> + unsigned char *out, size_t *out_len,
> + void *workmem)
> +{
> + unsigned char *op = out;
> + size_t t;
> +
> + if (!workmem)
> + return -EINVAL;
> +
> + if (unlikely(in_len <= M2_MAX_LEN + 5))
> + t = in_len;
> + else {
> + t = lzo1x_compress_worker(in, in_len, op, out_len, workmem);
> + op += *out_len;
> + }
> +
> + if (t > 0) {
> + const unsigned char *ii = in + in_len - t;
> +
> + if (op == out && t <= 238)
> + *op++ = (unsigned char)(17 + t);
> + else if (t <= 3)
> + op[-2] |= (unsigned char)t;
> + else if (t <= 18)
> + *op++ = (unsigned char)(t - 3);
> + else {
> + size_t tt = t - 18;
> + *op++ = 0;
> + while (tt > 255) {
> + tt -= 255;
> + *op++ = 0;
> + }
> + *op++ = (unsigned char)tt;
> + }
> + do
> + *op++ = *ii++;
> + while (--t > 0);
> + }
> + *op++ = M4_MARKER | 1;
> + *op++ = 0;
> + *op++ = 0;
> +
> + *out_len = (size_t)(op - out);
> + return LZO_E_OK;
> +}
> +
> +EXPORT_SYMBOL(lzo1x_compress);
> diff --git a/lib/lzo1x/lzo1x_decompress.c b/lib/lzo1x/lzo1x_decompress.c
> new file mode 100755
> index 0000000..75ce294
> --- /dev/null
> +++ b/lib/lzo1x/lzo1x_decompress.c
> @@ -0,0 +1,238 @@
> +/* lzo1x_decompress.c -- LZO1X decompression
> +
> + This file is part of the LZO real-time data compression library.
> +
> + Copyright (C) 1996-2005 Markus Franz Xaver Johannes Oberhumer
> + All Rights Reserved.
> +
> + The LZO library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU General Public License,
> + version 2, as published by the Free Software Foundation.
> +
> + The LZO library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + GNU General Public License for more details.
> +
> + You should have received a copy of the GNU General Public License
> + along with the LZO library; see the file COPYING.
> + If not, write to the Free Software Foundation, Inc.,
> + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> +
> + Markus F.X.J. Oberhumer
> + <[email protected]>
> + http://www.oberhumer.com/opensource/lzo/
> +
> +
> + This file is derived from lzo1x_d1.c and lzo1x_d.ch found in original
> + LZO 2.02 code. Some additional changes have also been made to make
> + it work in kernel space.
> +
> + Nitin Gupta
> + <[email protected]>
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <asm/byteorder.h>
> +#include <linux/lzo1x.h>
> +
> +#include "lzo1x_int.h"
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("LZO1X Decompression");
> +
> +int
> +lzo1x_decompress(const unsigned char *in, size_t in_len,
> + unsigned char *out, size_t *out_len)
> +{
> + register size_t t;
> + register unsigned char *op = out;
> + register const unsigned char *ip = in, *m_pos;
> + const unsigned char * const ip_end = in + in_len;
> + unsigned char * const op_end = out + *out_len;
> + *out_len = 0;
> +
> + if (*ip > 17) {
> + t = *ip++ - 17;
> + if (t < 4)
> + goto match_next;
> + NEED_OP(t);
> + NEED_IP(t + 1);
> + do
> + *op++ = *ip++;
> + while (--t > 0);
> + goto first_literal_run;
> + }
> +
> + while (TEST_IP) {
> + t = *ip++;
> + if (t >= 16)
> + goto match;
> + /* a literal run */
> + if (t == 0) {
> + NEED_IP(1);
> + while (*ip == 0) {
> + t += 255;
> + ip++;
> + NEED_IP(1);
> + }
> + t += 15 + *ip++;
> + }
> + /* copy literals */
> + NEED_OP(t + 3);
> + NEED_IP(t + 4);
> + COPY4(op, ip);
> + op += 4; ip += 4;
> + if (--t > 0) {
> + if (t >= 4) {
> + do {
> + COPY4(op, ip);
> + op += 4; ip += 4; t -= 4;
> + } while (t >= 4);
> + if (t > 0)
> + do
> + *op++ = *ip++;
> + while (--t > 0);
> + }
> + else
> + do
> + *op++ = *ip++;
> + while (--t > 0);
> + }
> +
> +first_literal_run:
> + t = *ip++;
> + if (t >= 16)
> + goto match;
> + m_pos = op - (1 + M2_MAX_OFFSET);
> + m_pos -= t >> 2;
> + m_pos -= *ip++ << 2;
> + TEST_LB(m_pos);
> + NEED_OP(3);
> + *op++ = *m_pos++;
> + *op++ = *m_pos++;
> + *op++ = *m_pos;
> + goto match_done;
> +
> + /* handle matches */
> + do {
> +match:
> + if (t >= 64) { /* a M2 match */
> + m_pos = op - 1;
> + m_pos -= (t >> 2) & 7;
> + m_pos -= *ip++ << 3;
> + t = (t >> 5) - 1;
> + TEST_LB(m_pos);
> + NEED_OP(t + 3 - 1);
> + goto copy_match;
> + } else if (t >= 32) { /* a M3 match */
> + t &= 31;
> + if (t == 0) {
> + NEED_IP(1);
> + while (*ip == 0) {
> + t += 255;
> + ip++;
> + NEED_IP(1);
> + }
> + t += 31 + *ip++;
> + }
> + m_pos = op - 1 - (cpu_to_le16(
> + *(const unsigned short *)ip) >> 2);
> + ip += 2;
> + } else if (t >= 16) { /* a M4 match */
> + m_pos = op;
> + m_pos -= (t & 8) << 11;
> + t &= 7;
> + if (t == 0) {
> + NEED_IP(1);
> + while (*ip == 0) {
> + t += 255;
> + ip++;
> + NEED_IP(1);
> + }
> + t += 7 + *ip++;
> + }
> + m_pos -= cpu_to_le16(
> + *(const unsigned short *)ip) >> 2;
> + ip += 2;
> + if (m_pos == op)
> + goto eof_found;
> + m_pos -= 0x4000;
> + } else { /* a M1 match */
> + m_pos = op - 1;
> + m_pos -= t >> 2;
> + m_pos -= *ip++ << 2;
> + TEST_LB(m_pos);
> + NEED_OP(2);
> + *op++ = *m_pos++;
> + *op++ = *m_pos;
> + goto match_done;
> + }
> +
> + /* copy match */
> + TEST_LB(m_pos);
> + NEED_OP(t + 3 - 1);
> +
> + if (t >= 2 * 4 - (3 - 1) && (op - m_pos) >= 4) {
> + COPY4(op, m_pos);
> + op += 4; m_pos += 4; t -= 4 - (3 - 1);
> + do {
> + COPY4(op, m_pos);
> + op += 4; m_pos += 4; t -= 4;
> + } while (t >= 4);
> + if (t > 0)
> + do *op++ = *m_pos++;
> + while (--t > 0);
> + } else {
> +copy_match:
> + *op++ = *m_pos++;
> + *op++ = *m_pos++;
> + do
> + *op++ = *m_pos++;
> + while (--t > 0);
> + }
> +
> +match_done:
> + t = ip[-2] & 3;
> + if (t == 0)
> + break;
> +
> + /* copy literals */
> +match_next:
> + NEED_OP(t);
> + NEED_IP(t + 1);
> + *op++ = *ip++;
> + if (t > 1) {
> + *op++ = *ip++;
> + if (t > 2)
> + *op++ = *ip++;
> + }
> + t = *ip++;
> + } while (TEST_IP);
> + }
> +
> + /* no EOF code was found */
> + *out_len = (size_t)(op - out);
> + return LZO_E_EOF_NOT_FOUND;
> +
> +eof_found:
> + *out_len = (size_t)(op - out);
> + return (ip == ip_end ? LZO_E_OK :
> + (ip < ip_end ? LZO_E_INPUT_NOT_CONSUMED :
> + LZO_E_INPUT_OVERRUN));
> +
> +input_overrun:
> + *out_len = (size_t)(op - out);
> + return LZO_E_INPUT_OVERRUN;
> +
> +output_overrun:
> + *out_len = (size_t)(op - out);
> + return LZO_E_OUTPUT_OVERRUN;
> +
> +lookbehind_overrun:
> + *out_len = (size_t)(op - out);
> + return LZO_E_LOOKBEHIND_OVERRUN;
> +}
> +
> +EXPORT_SYMBOL(lzo1x_decompress);
> diff --git a/lib/lzo1x/lzo1x_int.h b/lib/lzo1x/lzo1x_int.h
> new file mode 100755
> index 0000000..4f0fe8d
> --- /dev/null
> +++ b/lib/lzo1x/lzo1x_int.h
> @@ -0,0 +1,85 @@
> +/* lzo1x_int.h -- to be used internally by LZO de/compression algorithms
> +
> + This file is part of the LZO real-time data compression library.
> +
> + Copyright (C) 1996-2005 Markus Franz Xaver Johannes Oberhumer
> + All Rights Reserved.
> +
> + The LZO library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU General Public License,
> + version 2, as published by the Free Software Foundation.
> +
> + The LZO library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + GNU General Public License for more details.
> +
> + You should have received a copy of the GNU General Public License
> + along with the LZO library; see the file COPYING.
> + If not, write to the Free Software Foundation, Inc.,
> + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> +
> + Markus F.X.J. Oberhumer
> + <[email protected]>
> + http://www.oberhumer.com/opensource/lzo/
> +
> +
> + This file was derived from several header files found in original
> + LZO 2.02 code. Some additional changes have also been made to make
> + it work in kernel space.
> +
> + Nitin Gupta
> + <[email protected]>
> + */
> +
> +#ifndef __LZO1X_INT_H
> +#define __LZO1X_INT_H
> +
> +#include <linux/types.h>
> +
> +#define D_BITS 14
> +#define D_SIZE (1u << D_BITS)
> +#define D_MASK (D_SIZE - 1)
> +#define D_HIGH ((D_MASK >> 1) + 1)
> +
> +#define DX2(p,s1,s2) \
> + (((((size_t)((p)[2]) << (s2)) ^ (p)[1]) << (s1)) ^ (p)[0])
> +#define DX3(p,s1,s2,s3) \
> + ((DX2((p) + 1, s2, s3) << (s1)) ^ (p)[0])
> +#define DINDEX1(d,p) \
> + d = ((size_t)(0x21 * DX3(p, 5, 5, 6)) >> 5) & D_MASK
> +#define DINDEX2(d,p) \
> + d = (d & (D_MASK & 0x7ff)) ^ (D_HIGH | 0x1f)
> +
> +#define COPY4(dst,src) *(u32 *)(dst) = *(u32 *)(src)
> +
> +/* LZO1X Specific constants */
> +#define M1_MAX_OFFSET 0x0400
> +#define M2_MAX_OFFSET 0x0800
> +#define M3_MAX_OFFSET 0x4000
> +#define M4_MAX_OFFSET 0xbfff
> +
> +#define M1_MIN_LEN 2
> +#define M1_MAX_LEN 2
> +#define M2_MIN_LEN 3
> +#define M2_MAX_LEN 8
> +#define M3_MIN_LEN 3
> +#define M3_MAX_LEN 33
> +#define M4_MIN_LEN 3
> +#define M4_MAX_LEN 9
> +
> +#define M1_MARKER 0
> +#define M2_MARKER 64
> +#define M3_MARKER 32
> +#define M4_MARKER 16
> +
> +/* Bounds checking */
> +#define TEST_IP (ip < ip_end)
> +#define NEED_IP(x) \
> + if ((size_t)(ip_end - ip) < (size_t)(x)) goto input_overrun
> +#define NEED_OP(x) \
> + if ((size_t)(op_end - op) < (size_t)(x)) goto output_overrun
> +#define TEST_LB(m_pos) \
> + if (m_pos < out || m_pos >= op) goto lookbehind_overrun
> +
> +#endif
>
>
On 5/31/07, Nitin Gupta <[email protected]> wrote:
> [...]
> The author (Markus Oberhumer) of LZO provided these comments for this patch:
>
> ---
> I've only briefly looked over it, but it's obvious that your version does not
> work on architectures which do not allow unaligned access (arm, mips, ...).
>
> As for further quality assurance, your version should generate byte-identical
> object code to LZO 2.02 when using the same compiler & flags. So you could do
> some semi-automatic testing by compiling with -ffunction-sections and use
> "objcopy --only-section .text.XXX" to compare the md5sum of all generated
> functions. This also works fine with crosscompilers.
>
> Finally I'm not too happy that you renamed the functions and #defines like
> LZO1X_WORKMEM_SIZE - please stay compatible with the official library version.
As suggested by Johannes earlier, it'd be great if you could submit
the various changes (as per the changelog) as _individual patches_ on top of
the original userspace code. That would be easier for others to review,
and there's less chance of bugs / issues leaking in that way.
As for "byte-identical object code", I definitely do *not* think it is
necessarily a requirement / good idea. As long as all the changes
you make are reviewed individually / closely by people here on this
list, there's very low chances of any bugs creeping in. [ F.e. I see
nothing wrong in removing the usage of "register" -- that could clearly
lead to different object code, but with no bugs introduced. ]
On Wednesday 30 May 2007 19:02:28 Mark Adler wrote:
> On May 30, 2007, at 6:30 AM, Satyam Sharma wrote:
> > [1] For your reference, here is the user code in question:
>
> ...
>
> > if (srclen > 2 && !(data_in[1] & PRESET_DICT) &&
> > ((data_in[0] & 0x0f) == Z_DEFLATED) &&
> > !(((data_in[0]<<8) + data_in[1]) % 31)) {
>
> The funny thing here is that the author felt compelled to use a
> #defined constant for the dictionary bit (PRESET_DICT), but had no
> problem with a numeric constant to isolate the compression method
> (0x0f), or for that matter extracting the window bits from the
> header. The easy way to avoid the use of an internal zlib header
> file here is to simply replace PRESET_DICT with 0x20. That constant
> will never change -- it is part of the definition of the zlib header
> in RFC 1950.
If there is no objection, I'll put together a patch that changes the use in
JFFS2 into a "magic number", complete with documentation on it, and also
moves all of the zlib stuff into a single directory.
> The slightly more involved patch to avoid the problem is to let
> inflate() do all that work for you, including the integrity check on
> the zlib header (% 31). Also this corrects an error in the original
> code, which is that it continues to try to decompress after finding
> that a dictionary is needed or that the zlib header is invalid. In
> this version, a bad header simply returns an error:
>
Does anyone know if doing as Mark suggests would negatively impact the
performance of JFFS2 to such a degree that it could be considered a
regression? I, unfortunately, don't have the hardware to properly test the
code. And, at this point in time, I also don't have enough familiarity with
the JFFS2 code to make such a change myself.
DRH
Hi Daniel,
On 6/1/07, Daniel Hazelton <[email protected]> wrote:
> On Wednesday 30 May 2007 19:02:28 Mark Adler wrote:
> > On May 30, 2007, at 6:30 AM, Satyam Sharma wrote:
> > > [1] For your reference, here is the user code in question:
> >
> > ...
> >
> > > if (srclen > 2 && !(data_in[1] & PRESET_DICT) &&
> > > ((data_in[0] & 0x0f) == Z_DEFLATED) &&
> > > !(((data_in[0]<<8) + data_in[1]) % 31)) {
> >
> > The funny thing here is that the author felt compelled to use a
> > #defined constant for the dictionary bit (PRESET_DICT), but had no
> > problem with a numeric constant to isolate the compression method
> > (0x0f), or for that matter extracting the window bits from the
> > header. The easy way to avoid the use of an internal zlib header
> > file here is to simply replace PRESET_DICT with 0x20. That constant
> > will never change -- it is part of the definition of the zlib header
> > in RFC 1950.
>
> If there is no objection, I'll put together a patch that changes the use in
> JFFS2 into a "magic number", complete with documentation on it, and also
> moves all of the zlib stuff into a single directory.
Right, s/PRESET_DICT/0x20/ would have the least fall-out / unknown
side-effects.
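Concretely, the check with the literal substituted would read something
like this (illustrative sketch only -- the function name is mine; 0x20 is
the FDICT bit per RFC 1950, and Z_DEFLATED is zlib's public constant 8):

```c
#include <stddef.h>

/* JFFS2's zlib header probe with the zlib-internal PRESET_DICT macro
 * replaced by the literal 0x20 (FDICT bit, RFC 1950).  Z_DEFLATED (8)
 * is part of zlib's public interface and can stay symbolic. */
#define Z_DEFLATED 8

static int jffs2_zlib_header_ok(const unsigned char *data_in, size_t srclen)
{
	return srclen > 2 && !(data_in[1] & 0x20) &&
	       ((data_in[0] & 0x0f) == Z_DEFLATED) &&
	       !(((data_in[0] << 8) + data_in[1]) % 31);
}
```

The only textual change from the current code is PRESET_DICT -> 0x20, so
the generated object code should be identical and there is no perf
question to answer for this variant.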
> > The slightly more involved patch to avoid the problem is to let
> > inflate() do all that work for you, including the integrity check on
> > the zlib header (% 31). Also this corrects an error in the original
> > code, which is that it continues to try to decompress after finding
> > that a dictionary is needed or that the zlib header is invalid. In
> > this version, a bad header simply returns an error:
> >
>
> Does anyone know if doing as Mark suggests would negatively impact the
> performance of JFFS2 to such a degree that it could be considered a
> regression? I, unfortunately, don't have the hardware to properly test the
> code. And, at this point in time, I also don't have enough familiarity with
> the JFFS2 code to make such a change myself.
David would have to comment on that, but you could simultaneously
make and submit the patch as suggested by Mark, with both your
Signed-off-by's. That'll naturally need to go through David's tree, so we'll
know if he likes/accepts the suggested modifications ...