2007-05-25 11:45:53

by Nitin Gupta

[permalink] [raw]
Subject: [RFC] LZO de/compression support - take 4

Hi,

This is kernel port of LZO1X compressor and LZO1X decompressor (safe
version only).

* Changes since 'take 3' (Full Changelog after this):
1) Removed 'unsafe' decompressor - hence also do away with symlinks in
Makefiles.
2) Rolled back changes where I replaced COPY4 with memcpy() calls.
This seemed to be causing too much perf. loss as shown by Richard's
tests. Need perf. testing again to confirm that this patch has perf.
comparable to original LZO code/Richard's patch.
3) Some functions were inlined (DX2, DX3 etc.) - this also seemed to
be one of factors for perf. loss. Changed them back Macros.
4) Added back the 'register' keyword - again seemed to me one of
factors for perf. loss.

Once I pinpoint exact reason for bad perf., I will do above cleanups
again. But this should not be reason for non-inclusion in mainline.
These are only minor cleanups.

Richard, can you please provide perf. results for this patch also?
Also, can you please mail back latest version of your LZO patch? In
meantime, I will try to include benchmarking support to the
'compress-test' module.

* Changelog vs. original LZO:
1) Used standard/kernel defined data types: (this eliminated _huge_
#ifdef chunks
lzo_bytep -> unsigned char *
lzo_uint -> size_t
lzo_xint -> size_t
lzo_uint32p -> u32 *
2) Removed everything #ifdef'ed under COPY_DICT (this is not set for
LZO1X, so removed corres. parts).
3) Removed code #ifdef'ed for LZO1Y, LZO1Z, other variants.
4) Reformatted the code to match general kernel style.
5) The only code change: (as suggested by Andrey)
-#if defined(__LITTLE_ENDIAN)
- m_pos = op - 1;
- m_pos -= (*(const unsigned short *)ip) >> 2;
-#else
- m_pos = op - 1;
- m_pos -= (ip[0] >> 2) + (ip[1] << 6);
-#endif

+ m_pos = op - 1 - (cpu_to_le16(*(const u16 *)ip) >> 2);

(Andrey suggested le16_to_cpu for above but I think it should be cpu_to_le16).
*** Need testing on big endian machine ***

Similarly:
-#if defined(__LITTLE_ENDIAN)
- m_pos -= (*(const unsigned short *)ip) >> 2;
-#else
- m_pos -= (ip[0] >> 2) + (ip[1] << 6);
-#endif

+ m_pos -= cpu_to_le16(*(const u16 *)ip) >> 2;


Apart from above changes, original LZO 2.02 code is retained as-is.

include/linux/lzo1x.h | 66 +++++++++++
lib/Kconfig | 6 +
lib/Makefile | 1 +
lib/lzo1x/Makefile | 3 +
lib/lzo1x/lzo1x_compress.c | 256 ++++++++++++++++++++++++++++++++++++++++++
lib/lzo1x/lzo1x_decompress.c | 235 ++++++++++++++++++++++++++++++++++++++
lib/lzo1x/lzo1x_int.h | 85 ++++++++++++++
7 files changed, 652 insertions(+), 0 deletions(-)

Signed-off-by: Nitin Gupta <[email protected]>
---
diff --git a/include/linux/lzo1x.h b/include/linux/lzo1x.h
new file mode 100755
index 0000000..11a6f23
--- /dev/null
+++ b/include/linux/lzo1x.h
@@ -0,0 +1,66 @@
+/* lzo1x.h -- public interface of the LZO1X compression algorithm
+
+ This file is part of the LZO real-time data compression library.
+
+ Copyright (C) 1996-2005 Markus Franz Xaver Johannes Oberhumer
+ All Rights Reserved.
+
+ The LZO library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ The LZO library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with the LZO library; see the file COPYING.
+ If not, write to the Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+
+ Markus F.X.J. Oberhumer
+ <[email protected]>
+ http://www.oberhumer.com/opensource/lzo/
+
+
+ This file is modified version of lzo1x.h found in original LZO 2.02
+ code. Some additional changes have also been made to make it work
+ in kernel space.
+
+ Nitin Gupta
+ <[email protected]>
+ */
+
+#ifndef __LZO1X_H
+#define __LZO1X_H
+
+/* LZO return codes */
+#define LZO_E_OK 0
+#define LZO_E_ERROR (-1)
+#define LZO_E_OUT_OF_MEMORY (-2) /* [not used right now] */
+#define LZO_E_NOT_COMPRESSIBLE (-3) /* [not used right now] */
+#define LZO_E_INPUT_OVERRUN (-4)
+#define LZO_E_OUTPUT_OVERRUN (-5)
+#define LZO_E_LOOKBEHIND_OVERRUN (-6)
+#define LZO_E_EOF_NOT_FOUND (-7)
+#define LZO_E_INPUT_NOT_CONSUMED (-8)
+#define LZO_E_NOT_YET_IMPLEMENTED (-9) /* [not used right now] */
+
+/* Size of temp buffer (workmem) required by lzo1x_compress */
+#define LZO1X_WORKMEM_SIZE ((size_t) (16384L * sizeof(unsigned char *)))
+
+/*
+ * This requires 'workmem' of size LZO1X_WORKMEM_SIZE
+ */
+int lzo1x_compress(const unsigned char *src, size_t src_len,
+ unsigned char *dst, size_t *dst_len,
+ void *workmem);
+
+/*
+ * This decompressor will catch all compressed data violations and
+ * return an error code in this case.
+ */
+int lzo1x_decompress(const unsigned char *src, size_t src_len,
+ unsigned char *dst, size_t *dst_len);
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index 2e7ae6b..257f377 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -64,6 +64,12 @@ config ZLIB_INFLATE
config ZLIB_DEFLATE
tristate

+config LZO1X
+ tristate "LZO1X Compression/Decompression"
+ help
+ Compression: LZO1X-1
+ Decompression: LZO1X (safe)
+
#
# Generic allocator support is selected if needed
#
diff --git a/lib/Makefile b/lib/Makefile
index c8c8e20..4dad99d 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -49,6 +49,7 @@ obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
+obj-$(CONFIG_LZO1X) += lzo1x/

obj-$(CONFIG_TEXTSEARCH) += textsearch.o
obj-$(CONFIG_TEXTSEARCH_KMP) += ts_kmp.o
diff --git a/lib/lzo1x/Makefile b/lib/lzo1x/Makefile
new file mode 100644
index 0000000..fcd0d3e
--- /dev/null
+++ b/lib/lzo1x/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_LZO1X) += lzo1x.o
+lzo1x-objs := lzo1x_compress.o lzo1x_decompress.o
+
diff --git a/lib/lzo1x/lzo1x_compress.c b/lib/lzo1x/lzo1x_compress.c
new file mode 100755
index 0000000..dd3ee99
--- /dev/null
+++ b/lib/lzo1x/lzo1x_compress.c
@@ -0,0 +1,256 @@
+/* lzo1x_compress.c -- LZO1X-1 compression
+
+ This file is part of the LZO real-time data compression library.
+
+ Copyright (C) 1996-2005 Markus Franz Xaver Johannes Oberhumer
+ All Rights Reserved.
+
+ The LZO library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ The LZO library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with the LZO library; see the file COPYING.
+ If not, write to the Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+
+ Markus F.X.J. Oberhumer
+ <[email protected]>
+ http://www.oberhumer.com/opensource/lzo/
+
+
+ This file is derived from lzo1x_1.c and lzo1x_c.ch found in original
+ LZO 2.02 code. Some additional changes have also been made to make
+ it work in kernel space.
+
+ Nitin Gupta
+ <[email protected]>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/compiler.h>
+
+#include "lzo1x_int.h"
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("LZO1X Compression");
+
+/* compress a block of data. */
+static unsigned int
+lzo1x_compress_worker(const unsigned char *in, size_t in_len,
+ unsigned char *out, size_t *out_len,
+ void *workmem)
+{
+ register const unsigned char *ip;
+ unsigned char *op;
+ const unsigned char * const in_end = in + in_len;
+ const unsigned char * const ip_end = in + in_len - M2_MAX_LEN - 5;
+ const unsigned char *ii;
+ const unsigned char ** const dict = (const unsigned char **)workmem;
+
+ op = out;
+ ip = in;
+ ii = ip;
+
+ ip += 4;
+ for (;;) {
+ register const unsigned char *m_pos;
+ size_t m_off;
+ size_t m_len;
+ size_t dindex;
+
+ DINDEX1(dindex, ip);
+ m_pos = dict[dindex];
+
+ if ((m_pos < in) || (m_off = (size_t)(ip - m_pos)) <= 0
+ || m_off > M4_MAX_OFFSET)
+ goto literal;
+
+ if (m_off <= M2_MAX_OFFSET || m_pos[3] == ip[3])
+ goto try_match;
+
+ DINDEX2(dindex, ip);
+ m_pos = dict[dindex];
+
+ if ((m_pos < in) || (m_off = (size_t)(ip - m_pos)) <= 0
+ || m_off > M4_MAX_OFFSET)
+ goto literal;
+
+ if (m_off <= M2_MAX_OFFSET || m_pos[3] == ip[3])
+ goto try_match;
+
+ goto literal;
+
+try_match:
+ if (*(const unsigned short *)m_pos ==
+ *(const unsigned short *)ip) {
+ if (likely(m_pos[2] == ip[2]))
+ goto match;
+ }
+
+ /* a literal */
+literal:
+ dict[dindex] = ip;
+ ++ip;
+ if (unlikely(ip >= ip_end))
+ break;
+ continue;
+
+ /* a match */
+match:
+ dict[dindex] = ip;
+ /* store current literal run */
+ if ((size_t)(ip - ii) > 0) {
+ register size_t t = (size_t)(ip - ii);
+ if (t <= 3)
+ op[-2] |= (unsigned char)(t);
+ else if (t <= 18)
+ *op++ = (unsigned char)(t - 3);
+ else {
+ register size_t tt = t - 18;
+ *op++ = 0;
+ while (tt > 255) {
+ tt -= 255;
+ *op++ = 0;
+ }
+ *op++ = (unsigned char)tt;
+ }
+ do *op++ = *ii++; while (--t > 0);
+ }
+
+ /* code the match */
+ ip += 3;
+ if (m_pos[3] != *ip++ || m_pos[4] != *ip++ ||
+ m_pos[5] != *ip++ || m_pos[6] != *ip++ ||
+ m_pos[7] != *ip++ || m_pos[8] != *ip++) {
+ --ip;
+ m_len = (size_t)(ip - ii);
+
+ if (m_off <= M2_MAX_OFFSET) {
+ m_off -= 1;
+ *op++ = (unsigned char)(((m_len - 1) << 5) |
+ ((m_off & 7) << 2));
+ *op++ = (unsigned char)(m_off >> 3);
+ }
+ else if (m_off <= M3_MAX_OFFSET) {
+ m_off -= 1;
+ *op++ = (unsigned char)(M3_MARKER |
+ (m_len - 2));
+ goto m3_m4_offset;
+ } else {
+ m_off -= 0x4000;
+ *op++ = (unsigned char)(M4_MARKER |
+ ((m_off & 0x4000) >> 11) |
+ (m_len - 2));
+ goto m3_m4_offset;
+ }
+ } else {
+ const unsigned char *end = in_end;
+ const unsigned char *m = m_pos + M2_MAX_LEN + 1;
+ while (ip < end && *m == *ip)
+ m++, ip++;
+ m_len = (size_t)(ip - ii);
+
+ if (m_off <= M3_MAX_OFFSET) {
+ m_off -= 1;
+ if (m_len <= 33)
+ *op++ = (unsigned char)(M3_MARKER |
+ (m_len - 2));
+ else {
+ m_len -= 33;
+ *op++ = M3_MARKER | 0;
+ goto m3_m4_len;
+ }
+ } else {
+ m_off -= 0x4000;
+ if (m_len <= M4_MAX_LEN)
+ *op++ = (unsigned char)(M4_MARKER |
+ ((m_off & 0x4000) >> 11) |
+ (m_len - 2));
+ else {
+ m_len -= M4_MAX_LEN;
+ *op++ = (unsigned char)(M4_MARKER |
+ ((m_off & 0x4000) >> 11));
+m3_m4_len:
+ while (m_len > 255) {
+ m_len -= 255;
+ *op++ = 0;
+ }
+ *op++ = (unsigned char)(m_len);
+ }
+ }
+
+m3_m4_offset:
+ *op++ = (unsigned char)((m_off & 63) << 2);
+ *op++ = (unsigned char)(m_off >> 6);
+ }
+
+ ii = ip;
+ if (unlikely(ip >= ip_end))
+ break;
+ }
+
+ *out_len = (size_t)(op - out);
+ return (size_t)(in_end - ii);
+}
+
+
+/*
+ * This requires buffer (workmem) of size LZO1X_WORKMEM_SIZE
+ * (exported by lzo1x.h).
+ */
+int
+lzo1x_compress(const unsigned char *in, size_t in_len,
+ unsigned char *out, size_t *out_len,
+ void *workmem)
+{
+ unsigned char *op = out;
+ size_t t;
+
+ if (!workmem)
+ return -EINVAL;
+
+ if (unlikely(in_len <= M2_MAX_LEN + 5))
+ t = in_len;
+ else {
+ t = lzo1x_compress_worker(in, in_len, op, out_len, workmem);
+ op += *out_len;
+ }
+
+ if (t > 0) {
+ const unsigned char *ii = in + in_len - t;
+
+ if (op == out && t <= 238)
+ *op++ = (unsigned char)(17 + t);
+ else if (t <= 3)
+ op[-2] |= (unsigned char)(t);
+ else if (t <= 18)
+ *op++ = (unsigned char)(t - 3);
+ else {
+ size_t tt = t - 18;
+ *op++ = 0;
+ while (tt > 255) {
+ tt -= 255;
+ *op++ = 0;
+ }
+ *op++ = (unsigned char)tt;
+ }
+ do
+ *op++ = *ii++;
+ while (--t > 0);
+ }
+ *op++ = M4_MARKER | 1;
+ *op++ = 0;
+ *op++ = 0;
+
+ *out_len = (size_t)(op - out);
+ return 0;
+}
+
+EXPORT_SYMBOL(lzo1x_compress);
diff --git a/lib/lzo1x/lzo1x_decompress.c b/lib/lzo1x/lzo1x_decompress.c
new file mode 100755
index 0000000..cd8f634
--- /dev/null
+++ b/lib/lzo1x/lzo1x_decompress.c
@@ -0,0 +1,235 @@
+/* lzo1x_decompress.c -- LZO1X decompression
+
+ This file is part of the LZO real-time data compression library.
+
+ Copyright (C) 1996-2005 Markus Franz Xaver Johannes Oberhumer
+ All Rights Reserved.
+
+ The LZO library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ The LZO library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with the LZO library; see the file COPYING.
+ If not, write to the Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+
+ Markus F.X.J. Oberhumer
+ <[email protected]>
+ http://www.oberhumer.com/opensource/lzo/
+
+
+ This file is derived from lzo1x_d1.c and lzo1x_d.ch found in original
+ LZO 2.02 code. Some additional changes have also been made to make
+ it work in kernel space.
+
+ Nitin Gupta
+ <[email protected]>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <asm/byteorder.h>
+#include <linux/lzo1x.h>
+
+#include "lzo1x_int.h"
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("LZO1X Decompression");
+
+int
+lzo1x_decompress(const unsigned char *in, size_t in_len,
+ unsigned char *out, size_t *out_len)
+{
+ register size_t t;
+ register unsigned char *op = out;
+ register const unsigned char *ip = in, *m_pos;
+ const unsigned char * const ip_end = in + in_len;
+ unsigned char * const op_end = out + *out_len;
+ *out_len = 0;
+
+ if (*ip > 17) {
+ t = *ip++ - 17;
+ if (t < 4)
+ goto match_next;
+ NEED_OP(t);
+ NEED_IP(t + 1);
+ do
+ *op++ = *ip++;
+ while (--t > 0);
+ goto first_literal_run;
+ }
+
+ while (TEST_IP) {
+ t = *ip++;
+ if (t >= 16)
+ goto match;
+ /* a literal run */
+ if (t == 0) {
+ NEED_IP(1);
+ while (*ip == 0) {
+ t += 255;
+ ip++;
+ NEED_IP(1);
+ }
+ t += 15 + *ip++;
+ }
+ /* copy literals */
+ NEED_OP(t + 3);
+ NEED_IP(t + 4);
+ COPY4(op, ip);
+ op += 4; ip += 4;
+ if (--t > 0) {
+ if (t >= 4) {
+ do {
+ COPY4(op, ip);
+ op += 4; ip += 4; t -= 4;
+ } while (t >= 4);
+ if (t > 0) do *op++ = *ip++; while (--t > 0);
+ }
+ else
+ do
+ *op++ = *ip++;
+ while (--t > 0);
+ }
+
+first_literal_run:
+ t = *ip++;
+ if (t >= 16)
+ goto match;
+ m_pos = op - (1 + M2_MAX_OFFSET);
+ m_pos -= t >> 2;
+ m_pos -= *ip++ << 2;
+ TEST_LB(m_pos);
+ NEED_OP(3);
+ *op++ = *m_pos++;
+ *op++ = *m_pos++;
+ *op++ = *m_pos;
+ goto match_done;
+
+ /* handle matches */
+ do {
+match:
+ if (t >= 64) { /* a M2 match */
+ m_pos = op - 1;
+ m_pos -= (t >> 2) & 7;
+ m_pos -= *ip++ << 3;
+ t = (t >> 5) - 1;
+ TEST_LB(m_pos);
+ NEED_OP(t + 3 - 1);
+ goto copy_match;
+ } else if (t >= 32) { /* a M3 match */
+ t &= 31;
+ if (t == 0) {
+ NEED_IP(1);
+ while (*ip == 0) {
+ t += 255;
+ ip++;
+ NEED_IP(1);
+ }
+ t += 31 + *ip++;
+ }
+ m_pos = op - 1 - (cpu_to_le16(
+ *(const unsigned short *)ip) >> 2);
+ ip += 2;
+ } else if (t >= 16) { /* a M4 match */
+ m_pos = op;
+ m_pos -= (t & 8) << 11;
+ t &= 7;
+ if (t == 0) {
+ NEED_IP(1);
+ while (*ip == 0) {
+ t += 255;
+ ip++;
+ NEED_IP(1);
+ }
+ t += 7 + *ip++;
+ }
+ m_pos -= cpu_to_le16(
+ *(const unsigned short *)ip) >> 2;
+ ip += 2;
+ if (m_pos == op)
+ goto eof_found;
+ m_pos -= 0x4000;
+ } else { /* a M1 match */
+ m_pos = op - 1;
+ m_pos -= t >> 2;
+ m_pos -= *ip++ << 2;
+ TEST_LB(m_pos);
+ NEED_OP(2);
+ *op++ = *m_pos++;
+ *op++ = *m_pos;
+ goto match_done;
+ }
+
+ /* copy match */
+ TEST_LB(m_pos);
+ NEED_OP(t + 3 - 1);
+
+ if (t >= 2 * 4 - (3 - 1) && (op - m_pos) >= 4) {
+ COPY4(op, m_pos);
+ op += 4; m_pos += 4; t -= 4 - (3 - 1);
+ do {
+ COPY4(op, m_pos);
+ op += 4; m_pos += 4; t -= 4;
+ } while (t >= 4);
+ if (t > 0)
+ do *op++ = *m_pos++;
+ while (--t > 0);
+ } else {
+copy_match:
+ *op++ = *m_pos++;
+ *op++ = *m_pos++;
+ do
+ *op++ = *m_pos++;
+ while (--t > 0);
+ }
+
+match_done:
+ t = ip[-2] & 3;
+ if (t == 0)
+ break;
+
+ /* copy literals */
+match_next:
+ NEED_OP(t);
+ NEED_IP(t + 1);
+ *op++ = *ip++;
+ if (t > 1) {
+ *op++ = *ip++;
+ if (t > 2)
+ *op++ = *ip++;
+ }
+ t = *ip++;
+ } while (TEST_IP);
+ }
+
+ /* no EOF code was found */
+ *out_len = (size_t)(op - out);
+ return LZO_E_EOF_NOT_FOUND;
+
+eof_found:
+ *out_len = (size_t)(op - out);
+ return (ip == ip_end ? LZO_E_OK :
+ (ip < ip_end ? LZO_E_INPUT_NOT_CONSUMED :
+ LZO_E_INPUT_OVERRUN));
+
+input_overrun:
+ *out_len = (size_t)(op - out);
+ return LZO_E_INPUT_OVERRUN;
+
+output_overrun:
+ *out_len = (size_t)(op - out);
+ return LZO_E_OUTPUT_OVERRUN;
+
+lookbehind_overrun:
+ *out_len = (size_t)(op - out);
+ return LZO_E_LOOKBEHIND_OVERRUN;
+}
+
+EXPORT_SYMBOL(lzo1x_decompress);
diff --git a/lib/lzo1x/lzo1x_int.h b/lib/lzo1x/lzo1x_int.h
new file mode 100755
index 0000000..4f0fe8d
--- /dev/null
+++ b/lib/lzo1x/lzo1x_int.h
@@ -0,0 +1,85 @@
+/* lzo1x_int.h -- to be used internally by LZO de/compression algorithms
+
+ This file is part of the LZO real-time data compression library.
+
+ Copyright (C) 1996-2005 Markus Franz Xaver Johannes Oberhumer
+ All Rights Reserved.
+
+ The LZO library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU General Public License,
+ version 2, as published by the Free Software Foundation.
+
+ The LZO library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with the LZO library; see the file COPYING.
+ If not, write to the Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+
+ Markus F.X.J. Oberhumer
+ <[email protected]>
+ http://www.oberhumer.com/opensource/lzo/
+
+
+ This file was derived from several header files found in original
+ LZO 2.02 code. Some additional changes have also been made to make
+ it work in kernel space.
+
+ Nitin Gupta
+ <[email protected]>
+ */
+
+#ifndef __LZO1X_INT_H
+#define __LZO1X_INT_H
+
+#include <linux/types.h>
+
+#define D_BITS 14
+#define D_SIZE (1u << D_BITS)
+#define D_MASK (D_SIZE - 1)
+#define D_HIGH ((D_MASK >> 1) + 1)
+
+#define DX2(p,s1,s2) \
+ (((((size_t)((p)[2]) << (s2)) ^ (p)[1]) << (s1)) ^ (p)[0])
+#define DX3(p,s1,s2,s3) \
+ ((DX2((p) + 1, s2, s3) << (s1)) ^ (p)[0])
+#define DINDEX1(d,p) \
+ d = ((size_t)(0x21 * DX3(p, 5, 5, 6)) >> 5) & D_MASK
+#define DINDEX2(d,p) \
+ d = (d & (D_MASK & 0x7ff)) ^ (D_HIGH | 0x1f)
+
+#define COPY4(dst,src) *(u32 *)(dst) = *(u32 *)(src)
+
+/* LZO1X Specific constants */
+#define M1_MAX_OFFSET 0x0400
+#define M2_MAX_OFFSET 0x0800
+#define M3_MAX_OFFSET 0x4000
+#define M4_MAX_OFFSET 0xbfff
+
+#define M1_MIN_LEN 2
+#define M1_MAX_LEN 2
+#define M2_MIN_LEN 3
+#define M2_MAX_LEN 8
+#define M3_MIN_LEN 3
+#define M3_MAX_LEN 33
+#define M4_MIN_LEN 3
+#define M4_MAX_LEN 9
+
+#define M1_MARKER 0
+#define M2_MARKER 64
+#define M3_MARKER 32
+#define M4_MARKER 16
+
+/* Bounds checking */
+#define TEST_IP (ip < ip_end)
+#define NEED_IP(x) \
+ if ((size_t)(ip_end - ip) < (size_t)(x)) goto input_overrun
+#define NEED_OP(x) \
+ if ((size_t)(op_end - op) < (size_t)(x)) goto output_overrun
+#define TEST_LB(m_pos) \
+ if (m_pos < out || m_pos >= op) goto lookbehind_overrun
+
+#endif


Attachments:
(No filename) (20.29 kB)
patch_lzo_2.6.22-rc2.bz2 (4.64 kB)
Download all attachments

2007-05-25 12:00:59

by Michael-Luke Jones

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

On 25 May 2007, at 12:45, Nitin Gupta wrote:

> Hi,
>
> This is kernel port of LZO1X compressor and LZO1X decompressor (safe
> version only).

Hey there,

It's looking better now. Just wondering if you might want to separate
out the Kconfig options for lzo compress / decompress so that it's
orthogonal with zlib's inflate and deflate support. Read only
filesystems only have to decompress, for example.

M-L

2007-05-25 12:10:27

by Richard Purdie

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

On Fri, 2007-05-25 at 17:15 +0530, Nitin Gupta wrote:
> Richard, can you please provide perf. results for this patch also?
> Also, can you please mail back latest version of your LZO patch? In
> meantime, I will try to include benchmarking support to the
> 'compress-test' module.

This version is 15% slower at decompression and about equal on
compression.

I am however still strongly of the opinion that we should just use the
version in -mm (which is my latest version).

Regards,

Richard



2007-05-25 12:32:49

by Satyam Sharma

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

On 5/25/07, Nitin Gupta <[email protected]> wrote:
> Hi,
>
> This is kernel port of LZO1X compressor and LZO1X decompressor (safe
> version only).
>
> * Changes since 'take 3' (Full Changelog after this):
> 1) Removed 'unsafe' decompressor - hence also do away with symlinks in
> Makefiles.

Nice :-)

> 2) Rolled back changes where I replaced COPY4 with memcpy() calls.
> This seemed to be causing too much perf. loss as shown by Richard's
> tests. Need perf. testing again to confirm that this patch has perf.
> comparable to original LZO code/Richard's patch.

Hmmm, you've done away with a lot of cleanup in one big swoop under
suspicion of what exactly causes the performance loss ... so I'd rather
wait till you pinpoint the exact cause.

> 3) Some functions were inlined (DX2, DX3 etc.) - this also seemed to
> be one of factors for perf. loss. Changed them back Macros.

Why would an inline function be slower than a macro? If you're
worried that gcc may ignore to inline them (which is unlikely as it is
with -O2), you can use __always_inline ... inline functions do provide
type-checking.

> 4) Added back the 'register' keyword - again seemed to me one of
> factors for perf. loss.

Again, I wonder whether the programmer's opinion to keep something in
the registers is necessarily better than the compiler's. Especially when
gcc would quite likely ignore your register hint in any case, can't think
of too many reasons to use register these days ...

> Once I pinpoint exact reason for bad perf., I will do above cleanups
> again. But this should not be reason for non-inclusion in mainline.
> These are only minor cleanups.

Right, it'd be nice if you do a step-by-step cleanup to find what exactly
caused the performance hit.

> Richard, can you please provide perf. results for this patch also?
> Also, can you please mail back latest version of your LZO patch? In
> meantime, I will try to include benchmarking support to the
> 'compress-test' module.

I wonder if you could try and use crypto/tcrypt.c -- the debugfs test
module involves userspace, the VFS layer etc ... tcrypt otoh is fully
within the kernel so less likely to introduce spurious noise to the numbers
and better suited to performance benchmarking.

> 5) The only code change: (as suggested by Andrey)
> -#if defined(__LITTLE_ENDIAN)
> - m_pos = op - 1;
> - m_pos -= (*(const unsigned short *)ip) >> 2;
> -#else
> - m_pos = op - 1;
> - m_pos -= (ip[0] >> 2) + (ip[1] << 6);
> -#endif
>
> + m_pos = op - 1 - (cpu_to_le16(*(const u16 *)ip) >> 2);
>
> (Andrey suggested le16_to_cpu for above but I think it should be cpu_to_le16).
> *** Need testing on big endian machine ***
>
> Similarly:
> -#if defined(__LITTLE_ENDIAN)
> - m_pos -= (*(const unsigned short *)ip) >> 2;
> -#else
> - m_pos -= (ip[0] >> 2) + (ip[1] << 6);
> -#endif
>
> + m_pos -= cpu_to_le16(*(const u16 *)ip) >> 2;

You could try and rollback this change too. Also, try to get results
on various arch's yourself (if you can lay your hands on some x86-64
/ sparc64 / etc boxen).

Otherwise, if it's still slower (and we're talking >10% here), there's not
much reason why we shouldn't stay with Richard's patches itself. You
could try and do cleanups to that itself (if Richard's fine with it and as
long as you retain performance).

Satyam

2007-05-25 12:38:24

by Satyam Sharma

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

Hi Richard,

On 5/25/07, Richard Purdie <[email protected]> wrote:
> On Fri, 2007-05-25 at 17:15 +0530, Nitin Gupta wrote:
> > Richard, can you please provide perf. results for this patch also?
> > Also, can you please mail back latest version of your LZO patch? In
> > meantime, I will try to include benchmarking support to the
> > 'compress-test' module.
>
> This version is 15% slower at decompression and about equal on
> compression.

I hope you tested your _safe variant against this, Nitin has done away
with the _unsafe version in this patch. Also, are you using your crypto
lzo-support + tcrypt changes ( http://lkml.org/lkml/2007/5/1/303 ) to
benchmark these?

> I am however still strongly of the opinion that we should just use the
> version in -mm (which is my latest version).

Right, if the difference is anything >10%, code cleanup does lose
its attractiveness.

Satyam

2007-05-25 12:57:33

by Nitin Gupta

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

Hi Richard,

On 5/25/07, Richard Purdie <[email protected]> wrote:
> On Fri, 2007-05-25 at 17:15 +0530, Nitin Gupta wrote:
> > Richard, can you please provide perf. results for this patch also?
> > Also, can you please mail back latest version of your LZO patch? In
> > meantime, I will try to include benchmarking support to the
> > 'compress-test' module.
>
> This version is 15% slower at decompression and about equal on
> compression.
>

If you don't mind, can you please try patch attached now? I have now
also rolled back that cpu_to_le16() change as Satyam suggested. I see
no other reason for this perf. loss as I made no other change!

Also, can you please verify if you are comparing your _safe_ version
with this patch? This patch does not include unsafe version and the
safe one is simply called lzo1x_decompress().


> I am however still strongly of the opinion that we should just use the
> version in -mm (which is my latest version).
>

I love this cleaned-up patch just sooo much! :)
Anyhow, if we can spend just small time scanning through these ~500
LOC to find reasons for any perf. loss, we will gain a lot!

Thanks,
Nitin


Attachments:
(No filename) (1.12 kB)
patch_lzo_2.6.22-rc2_richard_test.bz2 (4.62 kB)
Download all attachments

2007-05-25 13:34:16

by Richard Purdie

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

Hi Nitin,

On Fri, 2007-05-25 at 18:27 +0530, Nitin Gupta wrote:
> On 5/25/07, Richard Purdie <[email protected]> wrote:
> > On Fri, 2007-05-25 at 17:15 +0530, Nitin Gupta wrote:
> > > Richard, can you please provide perf. results for this patch also?
> > > Also, can you please mail back latest version of your LZO patch? In
> > > meantime, I will try to include benchmarking support to the
> > > 'compress-test' module.
> >
> > This version is 15% slower at decompression and about equal on
> > compression.
>
> If you don't mind, can you please try patch attached now? I have now
> also rolled back that cpu_to_le16() change as Satyam suggested. I see
> no other reason for this perf. loss as I made no other change!

I tested it with no real change in the results. Since I'm doing the
tests on LE, cpu_to_le16() should a NOP anyway.

> Also, can you please verify if you are comparing your _safe_ version
> with this patch? This patch does not include unsafe version and the
> safe one is simply called lzo1x_decompress().

Yes, I am comparing with my safe version.

Cheers,

Richard

2007-05-25 13:38:53

by Richard Purdie

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

On Fri, 2007-05-25 at 18:07 +0530, Satyam Sharma wrote:
> On 5/25/07, Richard Purdie <[email protected]> wrote:
> > On Fri, 2007-05-25 at 17:15 +0530, Nitin Gupta wrote:
> > > Richard, can you please provide perf. results for this patch also?
> > > Also, can you please mail back latest version of your LZO patch? In
> > > meantime, I will try to include benchmarking support to the
> > > 'compress-test' module.
> >
> > This version is 15% slower at decompression and about equal on
> > compression.
>
> I hope you tested your _safe variant against this, Nitin has done away
> with the _unsafe version in this patch.

I am.

> Also, are you using your crypto
> lzo-support + tcrypt changes ( http://lkml.org/lkml/2007/5/1/303 ) to
> benchmark these?

No, I'm putting the code into userspace and testing it there. Its not
difficult since there is little support needed from the kernel for
either set of code.

The tester runs each one in turn under the same circumstances from the
same binary and gives me a speed for each. Multiple runs return
consistent values and I'm being careful to make sure its an otherwise
idle machine and processes enough data to make sure there is no nasty
cache effects.

> > I am however still strongly of the opinion that we should just use the
> > version in -mm (which is my latest version).
>
> Right, if the difference is anything >10%, code cleanup does lose
> its attractiveness.

Agreed, and I still have the security and maintainability concerns. Add
them all together and its more unattractive.

Cheers,

Richard

2007-05-25 16:55:37

by Daniel Hazelton

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

On Friday 25 May 2007 09:38:24 Richard Purdie wrote:
<snip>
> > > I am however still strongly of the opinion that we should just use the
> > > version in -mm (which is my latest version).
> >
> > Right, if the difference is anything >10%, code cleanup does lose
> > its attractiveness.
>
> Agreed, and I still have the security and maintainability concerns. Add
> them all together and its more unattractive.

I can understand the security concerns, but since none of the bounds checking
has been removed there shouldn't be any difference from a security
viewpoint. I have maintained the code to a MUD server at one point - I can
guarantee that it had a lot more code than the LZO code - and it was so
highly customized that no patches to the core code from anywhere *outside*
that games "coders" would apply. This means that every one of those patches
had to be done manually - sure, it was a massive PITA - but it was worth it.

In other words - yes, it will make maintaining the code harder, but the fact
that the code matches the kernels style and is "lightweight" compared to the
original userspace code *and* Richards "miniLZO" should mitigate this.

As to the performance - I can see absolutely no reason why the minimal version
shouldn't perform the same (or better). The kernel codes memset and memcpy
routines have been heavily tested *and* optimized over the years and moving
from macro's to inline functions shouldn't have impacted performance at all.
I will be testing the two code bases myself in a little bit - I'm more than a
little paranoid and don't like the idea of trusting anyone with a "competing
project" for all testing.

DRH

2007-05-25 18:45:32

by Daniel Hazelton

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

On Friday 25 May 2007 12:55:21 Daniel Hazelton wrote:
<snip>
> As to the performance - I can see absolutely no reason why the minimal
> version shouldn't perform the same (or better). The kernel codes memset and
> memcpy routines have been heavily tested *and* optimized over the years and
> moving from macro's to inline functions shouldn't have impacted performance
> at all. I will be testing the two code bases myself in a little bit - I'm
> more than a little paranoid and don't like the idea of trusting anyone with
> a "competing project" for all testing.


I'll have to better instrument my test code (a real quick (userspace) hack)
using the minimized LZO1X implementation (take 4 :) and the complete LZOv2
library (lzo1x_1_11_compress and the *unsafe* version of the decompressor
used) but preliminary testing using just "time ./test" - the differences I've
seen might be because I'm directly including one version of the code and the
other is in a shared library. But even if I discount the system and user
time - going *only* by the "real" time value I get results across 10 runs
that differ by less than 0.001s - the average across 10 runs of the stripped
down LZO code is about 0.00133s where the LZO library (liblzo2) returns about
even performance - average is 0.001s.

A total difference of *ONE* *THIRD* of *ONE* *THOUSANDTH* of a second. With
the better performance being in-kernel should bring, I can see no reason for
a "big" difference.

If anyone's interested in the code I used for the test, let me know and I'll
make it available.

DRH

2007-05-25 19:35:56

by Satyam Sharma

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

Hi Daniel,

On 5/26/07, Daniel Hazelton <[email protected]> wrote:
> On Friday 25 May 2007 12:55:21 Daniel Hazelton wrote:
> <snip>
> > As to the performance - I can see absolutely no reason why the minimal
> > version shouldn't perform the same (or better). The kernel codes memset and
> > memcpy routines have been heavily tested *and* optimized over the years and
> > moving from macro's to inline functions shouldn't have impacted performance
> > at all. I will be testing the two code bases myself in a little bit - I'm
> > more than a little paranoid and don't like the idea of trusting anyone with
> > a "competing project" for all testing.
>
>
> I'll have to better instrument my test code (a real quick (userspace) hack)
> using the minimized LZO1X implementation (take 4 :) and the complete LZOv2
> library (lzo1x_1_11_compress and the *unsafe* version of the decompressor
> used) but preliminary testing using just "time ./test" - the differences I've
> seen might be because I'm directly including one version of the code and the
> other is in a shared library. But even if I discount the system and user
> time - going *only* by the "real" time value I get results across 10 runs
> that differ by less than 0.001s - the average across 10 runs of the stripped
> down LZO code is about 0.00133s where the LZO library (liblzo2) returns about
> even performance - average is 0.001s.
>
> A total difference of *ONE* *THIRD* of *ONE* *THOUSANDTH* of a second. With
> the better performance being in-kernel should bring, I can see no reason for
> a "big" difference.
>
> If anyone's interested in the code I used for the test, let me know and I'll
> make it available.

Yes, please do. It'd be nice if numbers are compared across arch's (LE,
BE, 32-bit, 64-bit) by others who have access to such boxes too.

Thanks,
Satyam

2007-05-26 10:29:20

by Richard Purdie

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

Hi Nitin,

On Fri, 2007-05-25 at 18:27 +0530, Nitin Gupta wrote:
> On 5/25/07, Richard Purdie <[email protected]> wrote:
> > On Fri, 2007-05-25 at 17:15 +0530, Nitin Gupta wrote:
> > > Richard, can you please provide perf. results for this patch also?
> > > Also, can you please mail back latest version of your LZO patch? In
> > > meantime, I will try to include benchmarking support to the
> > > 'compress-test' module.
> >
> > This version is 15% slower at decompression and about equal on
> > compression.
> >
>
> If you don't mind, can you please try patch attached now? I have now
> also rolled back that cpu_to_le16() change as Satyam suggested. I see
> no other reason for this perf. loss as I made no other change!
>
> Also, can you please verify if you are comparing your _safe_ version
> with this patch? This patch does not include unsafe version and the
> safe one is simply called lzo1x_decompress().

I've been looking at my benchmark figures and I think I've found why the
figures for my version were different to yours. Its not your code which
is at fault, its the way it was hooked into the benchmarking program.
The compiler was inlining some parts which it shouldn't have been
allowed to do, sorry :-/.

With that issue corrected, decompression is the same speed however
compression is showing about a 9% performance loss compared to my kernel
patch.

I did some diffs of the assembler outputted by our two versions (mine
matches minilzo). For decompression the output is effectively identical.
For compression, there are significant differences. If I add a noinline
attribute to lzo1x_compress_worker, that removes a lot of them (and
boosts speed a bit) but there are still differences. Ideally, I'd like
to understand why.

Regards,

Richard


2007-05-26 11:21:25

by Nitin Gupta

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

Hi Richard,

On 5/26/07, Richard Purdie <[email protected]> wrote:
>
> I've been looking at my benchmark figures and I think I've found why the
> figures for my version were different to yours. Its not your code which
> is at fault, its the way it was hooked into the benchmarking program.
> The compiler was inlining some parts which it shouldn't have been
> allowed to do, sorry :-/.
>
> With that issue corrected, decompression is the same speed however
> compression is showing about a 9% performance loss compared to my kernel
> patch.
>
> I did some diffs of the assembler outputted by our two versions (mine
> matches minilzo). For decompression the output is effectively identical.
> For compression, there are significant differences. If I add a noinline
> attribute to lzo1x_compress_worker, that removes a lot of them (and
> boosts speed a bit) but there are still differences. Ideally, I'd like
> to understand why.
>

I will look more closely into compression code to see why you are
getting this perf. difference here.

Thanks for your tests.

- Nitin

2007-05-26 19:12:52

by Roland

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

btw - does in-kernel lzo scale on SMP systems ?

is it a matter of lzo builtin compression or a matter of the component using
in-kernel lzo compression ?

if i write/read data on reiser4 filesystem with lzo compression on - will
all CPUs being used ?

just curious here, because i remember reading about a scale problem with zfs
filesystem and gzip compression on solaris, so i'm just curious what to
expect on linux.

regards
roland

ps:
btw - there is a smp aware lzop utility at lemley.net/lzop_patches/lzop.html




>Hi,
>
>This is kernel port of LZO1X compressor and LZO1X decompressor (safe
>version only).
>
>* Changes since 'take 3' (Full Changelog after this):
>1) Removed 'unsafe' decompressor - hence also do away with symlinks in
>Makefiles.
>2) Rolled back changes where I replaced COPY4 with memcpy() calls.
>This seemed to be causing too much perf. loss as shown by Richard's
>tests. Need perf. testing again to confirm that this patch has perf.
>comparable to original LZO code/Richard's patch.
>3) Some functions were inlined (DX2, DX3 etc.) - this also seemed to
>be one of factors for perf. loss. Changed them back Macros.
>4) Added back the 'register' keyword - again seemed to me one of
>factors for perf. loss.
>
>Once I pinpoint exact reason for bad perf., I will do above cleanups
>again. But this should not be reason for non-inclusion in mainline.
>These are only minor cleanups.
>
>Richard, can you please provide perf. results for this patch also?
>Also, can you please mail back latest version of your LZO patch? In
>meantime, I will try to include benchmarking support to the
>'compress-test' module.

2007-05-28 08:18:26

by Daniel Hazelton

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

Test code for this version (take 4) of the minimized LZO1X (from the liblzo
v2) is complete.


I don't see a significant slow-down comparing the complete liblzo2 to this
minimized code on my system (Pentium M 1.73GHz, 1GB Ram, Kubuntu Feisty
(stock Kubuntu kernel)). Rather, I see the opposite. This *might* have been
caused by the dynamic linking (or similar) so rather than rely on simply
doing "time xxx" I actually put checks around the calls to the
compress/decompress functions themselves.

('Tiny LZO' is what I call Nitins extremely small implementation of
lzo1x_[de]compress)

Output of the provided "test" script:
10 run averages:
'Tiny LZO':
Combined: 113.2 usec
Compression: 77.4 usec
Decompression: 35.8 usec
'liblzo2':
Combined: 140.7 usec
Compression: 94 usec
Decompression: 46.7 usec

(The "Combined" average is the average time taken for a compress+decompress)

TODO:
-Implement userspace version of likely/unlikely
-Implement cpu_to_le16 so code functions on BE systems

DRH


Attachments:
(No filename) (1.03 kB)
lzo1x-test.tar.bz2 (6.13 kB)
Download all attachments

2007-05-28 08:44:42

by Nitin Gupta

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

On 5/28/07, Daniel Hazelton <[email protected]> wrote:
> Test code for this version (take 4) of the minimized LZO1X (from the liblzo
> v2) is complete.
>
>
> I don't see a significant slow-down comparing the complete liblzo2 to this
> minimized code on my system (Pentium M 1.73GHz, 1GB Ram, Kubuntu Feisty
> (stock Kubuntu kernel)). Rather, I see the opposite. This *might* have been
> caused by the dynamic linking (or similar) so rather than rely on simply
> doing "time xxx" I actually put checks around the calls to the
> compress/decompress functions themselves.
>
> ('Tiny LZO' is what I call Nitins extremely small implementation of
> lzo1x_[de]compress)
>
> Output of the provided "test" script:
> 10 run averages:
> 'Tiny LZO':
> Combined: 113.2 usec
> Compression: 77.4 usec
> Decompression: 35.8 usec
> 'liblzo2':
> Combined: 140.7 usec
> Compression: 94 usec
> Decompression: 46.7 usec
>
> (The "Combined" average is the average time taken for a compress+decompress)
>
> TODO:
> -Implement userspace version of likely/unlikely
> -Implement cpu_to_le16 so code functions on BE systems
>
> DRH
>
>

As you mentioned in your mail, you are using lzo1x_1_11_compress()
which is slower than what I ported (which is same as what is exported
by miniLZO). So, can you please test with the version ported - this
is found in lzo/src/lzo1x_1.c (or in minilzo.c).
Also, can you please use 'take 5' for your next testing?

Thanks,
Nitin

2007-05-28 08:45:45

by Daniel Hazelton

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

On Monday 28 May 2007 04:37:04 Nitin Gupta wrote:
> On 5/28/07, Daniel Hazelton <[email protected]> wrote:
> > Test code for this version (take 4) of the minimized LZO1X (from the
> > liblzo v2) is complete.
> >
> >
> > I don't see a significant slow-down comparing the complete liblzo2 to
> > this minimized code on my system (Pentium M 1.73GHz, 1GB Ram, Kubuntu
> > Feisty (stock Kubuntu kernel)). Rather, I see the opposite. This *might*
> > have been caused by the dynamic linking (or similar) so rather than rely
> > on simply doing "time xxx" I actually put checks around the calls to the
> > compress/decompress functions themselves.
> >
> > ('Tiny LZO' is what I call Nitins extremely small implementation of
> > lzo1x_[de]compress)
> >
> > Output of the provided "test" script:
> > 10 run averages:
> > 'Tiny LZO':
> > Combined: 113.2 usec
> > Compression: 77.4 usec
> > Decompression: 35.8 usec
> > 'liblzo2':
> > Combined: 140.7 usec
> > Compression: 94 usec
> > Decompression: 46.7 usec
> >
> > (The "Combined" average is the average time taken for a
> > compress+decompress)
> >
> > TODO:
> > -Implement userspace version of likely/unlikely
> > -Implement cpu_to_le16 so code functions on BE systems
> >
> > DRH
>
> As you mentioned in your mail, you are using lzo1x_1_11_compress()
> which is slower than what I ported (which is same as what is exported
> by miniLZO). So, can you please test with the version ported - this
> is found in lzo/src/lzo1x_1.c (or in minilzo.c).
> Also, can you please use 'take 5' for your next testing?
>
> Thanks,
> Nitin

Will do. (that's DBITS=15, correct?)

However, when I averaged it 100 times, lzo1x_1_11_compress() showed better
speed than your implementation - about 1.5% faster. The *unsafe*
decompressor, however, only shows about a 1.2% speed advantage over the safe
decompressor.

DRH

2007-05-28 09:09:06

by Nitin Gupta

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

On 5/28/07, Daniel Hazelton <[email protected]> wrote:
> On Monday 28 May 2007 04:37:04 Nitin Gupta wrote:
> > On 5/28/07, Daniel Hazelton <[email protected]> wrote:

<snip>

> >
> > As you mentioned in your mail, you are using lzo1x_1_11_compress()
> > which is slower than what I ported (which is same as what is exported
> > by miniLZO). So, can you please test with the version ported - this
> > is found in lzo/src/lzo1x_1.c (or in minilzo.c).
> > Also, can you please use 'take 5' for your next testing?
> >
> > Thanks,
> > Nitin
>
> Will do. (that's DBITS=15, correct?)
>

That's D_BITS=14

> However, when I averaged it 100 times, lzo1x_1_11_compress() showed better
> speed than your implementation - about 1.5% faster.

I don't yet have any explanation for this.

> The *unsafe*
> decompressor, however, only shows about a 1.2% speed advantage over the safe
> decompressor.
>
> DRH
>

2007-05-28 09:21:44

by Daniel Hazelton

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

On Monday 28 May 2007 05:08:54 Nitin Gupta wrote:
> On 5/28/07, Daniel Hazelton <[email protected]> wrote:
> > On Monday 28 May 2007 04:37:04 Nitin Gupta wrote:
> > > On 5/28/07, Daniel Hazelton <[email protected]> wrote:
>
> <snip>
>
> > > As you mentioned in your mail, you are using lzo1x_1_11_compress()
> > > which is slower than what I ported (which is same as what is exported
> > > by miniLZO). So, can you please test with the version ported - this
> > > is found in lzo/src/lzo1x_1.c (or in minilzo.c).
> > > Also, can you please use 'take 5' for your next testing?
> > >
> > > Thanks,
> > > Nitin
> >
> > Will do. (that's DBITS=15, correct?)
>
> That's D_BITS=14
>
> > However, when I averaged it 100 times, lzo1x_1_11_compress() showed
> > better speed than your implementation - about 1.5% faster.
>
> I don't yet have any explanation for this.
>
> > The *unsafe*
> > decompressor, however, only shows about a 1.2% speed advantage over the
> > safe decompressor.
> >
> > DRH


New testbed based on minilzo complete.
Results from run using 1000 runs to generate averages:
1000 run averages:
'Tiny LZO':
Combined: 55.196 usec
Compression: 37.132 usec
Decompression: 18.064 usec
'miniLZO':
Combined: 55.785 usec
Compression: 40.862 usec
Decompression: 14.923 usec

(using (tiny/full)/100 for percentages:
'Tiny' is 0.9% faster on average
Same for the Compression
'safe' decompressor from tiny is 1.2% slower than unsafe from minilzo)

Newer version attached - same TODO items as the previous.

DRH


Attachments:
(No filename) (1.53 kB)
lzo1x-test-2.tar.bz2 (26.08 kB)
Download all attachments

2007-05-28 09:47:18

by Nitin Gupta

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

On 5/28/07, Daniel Hazelton <[email protected]> wrote:
> On Monday 28 May 2007 05:08:54 Nitin Gupta wrote:
> > On 5/28/07, Daniel Hazelton <[email protected]> wrote:
> > > On Monday 28 May 2007 04:37:04 Nitin Gupta wrote:
> > > > On 5/28/07, Daniel Hazelton <[email protected]> wrote:
>
> New testbed based on minilzo complete.
> Results from run using 1000 runs to generate averages:
> 1000 run averages:
> 'Tiny LZO':
> Combined: 55.196 usec
> Compression: 37.132 usec
> Decompression: 18.064 usec
> 'miniLZO':
> Combined: 55.785 usec
> Compression: 40.862 usec
> Decompression: 14.923 usec
>

Great!

I believe its now ready for mainline. We can do further cleanups and
optimizations there - more users of this code will surely drive more
enhancements.

> (using (tiny/full)/100 for percentages:
> 'Tiny' is 0.9% faster on average


I think this is more appropriate: [(full-tiny)/full]*100
=> tiny is ~1% faster on average.


> Same for the Compression
> 'safe' decompressor from tiny is 1.2% slower than unsafe from minilzo)
>

This is expected :)


Cheers,
Nitin

2007-05-28 09:58:47

by Daniel Hazelton

[permalink] [raw]
Subject: Re: [RFC] LZO de/compression support - take 4

On Monday 28 May 2007 05:46:59 Nitin Gupta wrote:
> On 5/28/07, Daniel Hazelton <[email protected]> wrote:
> > On Monday 28 May 2007 05:08:54 Nitin Gupta wrote:
> > > On 5/28/07, Daniel Hazelton <[email protected]> wrote:
> > > > On Monday 28 May 2007 04:37:04 Nitin Gupta wrote:
> > > > > On 5/28/07, Daniel Hazelton <[email protected]> wrote:
> >
> > New testbed based on minilzo complete.
> > Results from run using 1000 runs to generate averages:
> > 1000 run averages:
> > 'Tiny LZO':
> > Combined: 55.196 usec
> > Compression: 37.132 usec
> > Decompression: 18.064 usec
> > 'miniLZO':
> > Combined: 55.785 usec
> > Compression: 40.862 usec
> > Decompression: 14.923 usec
>
> Great!
>
> I believe its now ready for mainline. We can do further cleanups and
> optimizations there - more users of this code will surely drive more
> enhancements.

Forgot to mention - this is still 'take 4' - I'm going to import the 'take 5'
code now and see if there is any appreciable difference
>
> > (using (tiny/full)/100 for percentages:
> > 'Tiny' is 0.9% faster on average
>
> I think this is more appropriate: [(full-tiny)/full]*100
> => tiny is ~1% faster on average.

Okay, I'll use that for all future calculation.

DRH

(Next version of the testbed will also test against the _safe variant in
miniLZO)