2012-10-07 15:08:22

by Markus F.X.J. Oberhumer

Subject: [PATCH 0/3] Update LZO compression

As requested by akpm I am sending my "lzo-update" branch at

git://github.com/markus-oberhumer/linux.git lzo-update

to lkml as a patch series created by "git format-patch -M v3.5..lzo-update".

You can also browse the branch at

https://github.com/markus-oberhumer/linux/compare/lzo-update

and review the three patches at

https://github.com/markus-oberhumer/linux/commit/7c979cebc0f93dc692b734c12665a6824d219c20
https://github.com/markus-oberhumer/linux/commit/10f6781c8591fe5fe4c8c733131915e5ae057826
https://github.com/markus-oberhumer/linux/commit/5f702781f158cb59075cfa97e5c21f52275057f1

Share and enjoy,
Markus


Markus F.X.J. Oberhumer (3):
lib/lzo: Rename lzo1x_decompress.c to lzo1x_decompress_safe.c
lib/lzo: Update LZO compression to current upstream version
lib/lzo: Optimize code for CPUs with inefficient unaligned access

include/linux/lzo.h | 15 +-
lib/lzo/Makefile | 2 +-
lib/lzo/lzo1x_compress.c | 309 +++++++++++++++++++++------------------
lib/lzo/lzo1x_decompress.c | 255 --------------------------------
lib/lzo/lzo1x_decompress_safe.c | 237 ++++++++++++++++++++++++++++++
lib/lzo/lzodefs.h | 34 ++++-
6 files changed, 441 insertions(+), 411 deletions(-)
delete mode 100644 lib/lzo/lzo1x_decompress.c
create mode 100644 lib/lzo/lzo1x_decompress_safe.c


2012-10-07 15:08:45

by Markus F.X.J. Oberhumer

Subject: [PATCH 1/3] lib/lzo: Rename lzo1x_decompress.c to lzo1x_decompress_safe.c

Rename the source file to match the function name. This also
makes room for a possible future, slightly faster "non-safe"
decompressor version.

Signed-off-by: Markus F.X.J. Oberhumer <[email protected]>
---
lib/lzo/Makefile | 2 +-
...{lzo1x_decompress.c => lzo1x_decompress_safe.c} | 0
2 files changed, 1 insertions(+), 1 deletions(-)
rename lib/lzo/{lzo1x_decompress.c => lzo1x_decompress_safe.c} (100%)

diff --git a/lib/lzo/Makefile b/lib/lzo/Makefile
index e764116..f0f7d7c 100644
--- a/lib/lzo/Makefile
+++ b/lib/lzo/Makefile
@@ -1,5 +1,5 @@
lzo_compress-objs := lzo1x_compress.o
-lzo_decompress-objs := lzo1x_decompress.o
+lzo_decompress-objs := lzo1x_decompress_safe.o

obj-$(CONFIG_LZO_COMPRESS) += lzo_compress.o
obj-$(CONFIG_LZO_DECOMPRESS) += lzo_decompress.o
diff --git a/lib/lzo/lzo1x_decompress.c b/lib/lzo/lzo1x_decompress_safe.c
similarity index 100%
rename from lib/lzo/lzo1x_decompress.c
rename to lib/lzo/lzo1x_decompress_safe.c
--
1.7.1

2012-10-07 15:09:03

by Markus F.X.J. Oberhumer

Subject: [PATCH 2/3] lib/lzo: Update LZO compression to current upstream version

This commit updates the kernel LZO code to the current upstream version,
which features a significant speed improvement: benchmarks on the Calgary
and Silesia test corpora typically show doubled performance in
both compression and decompression on modern i386/x86_64/powerpc machines.

Signed-off-by: Markus F.X.J. Oberhumer <[email protected]>
---
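
Note (illustration only, not part of the patch): one technique visible
in the new lzo1x_1_do_compress() is extending a match a whole word at a
time by XOR-ing the two positions and counting the zero bits up to the
first difference. Below is a minimal userspace sketch of that idea. It
assumes GCC/Clang builtins; load_u64() and match_len() are hypothetical
names, with memcpy() standing in for the kernel's get_unaligned().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load 8 bytes without assuming alignment (stand-in for get_unaligned). */
static uint64_t load_u64(const unsigned char *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Count how many leading bytes of a and b are equal, up to max_len.
 * Hypothetical helper for illustration only.
 */
static size_t match_len(const unsigned char *a, const unsigned char *b,
			size_t max_len)
{
	size_t len = 0;

	/* Compare 8 bytes per step while a full word is still in range. */
	while (len + 8 <= max_len) {
		uint64_t v = load_u64(a + len) ^ load_u64(b + len);

		if (v != 0) {
			/* The first non-zero byte of v marks the mismatch. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
			return len + (size_t)__builtin_clzll(v) / 8;
#else
			return len + (size_t)__builtin_ctzll(v) / 8;
#endif
		}
		len += 8;
	}
	/* Finish the tail one byte at a time. */
	while (len < max_len && a[len] == b[len])
		len++;
	return len;
}
```

Unlike this sketch, the patch bounds the loop with ip_end rather than
an explicit max_len, and it starts from m_len = 4 because the first
four bytes have already been compared via the hash lookup.
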
include/linux/lzo.h | 15 +-
lib/lzo/lzo1x_compress.c | 309 +++++++++++++++++++----------------
lib/lzo/lzo1x_decompress_safe.c | 341 ++++++++++++++++++---------------------
lib/lzo/lzodefs.h | 34 +++-
4 files changed, 360 insertions(+), 339 deletions(-)

diff --git a/include/linux/lzo.h b/include/linux/lzo.h
index d793497..a0848d9 100644
--- a/include/linux/lzo.h
+++ b/include/linux/lzo.h
@@ -4,28 +4,28 @@
* LZO Public Kernel Interface
* A mini subset of the LZO real-time data compression library
*
- * Copyright (C) 1996-2005 Markus F.X.J. Oberhumer <[email protected]>
+ * Copyright (C) 1996-2012 Markus F.X.J. Oberhumer <[email protected]>
*
* The full LZO package can be found at:
* http://www.oberhumer.com/opensource/lzo/
*
- * Changed for kernel use by:
+ * Changed for Linux kernel use by:
* Nitin Gupta <[email protected]>
* Richard Purdie <[email protected]>
*/

-#define LZO1X_MEM_COMPRESS (16384 * sizeof(unsigned char *))
-#define LZO1X_1_MEM_COMPRESS LZO1X_MEM_COMPRESS
+#define LZO1X_1_MEM_COMPRESS (8192 * sizeof(unsigned short))
+#define LZO1X_MEM_COMPRESS LZO1X_1_MEM_COMPRESS

#define lzo1x_worst_compress(x) ((x) + ((x) / 16) + 64 + 3)

-/* This requires 'workmem' of size LZO1X_1_MEM_COMPRESS */
+/* This requires 'wrkmem' of size LZO1X_1_MEM_COMPRESS */
int lzo1x_1_compress(const unsigned char *src, size_t src_len,
- unsigned char *dst, size_t *dst_len, void *wrkmem);
+ unsigned char *dst, size_t *dst_len, void *wrkmem);

/* safe decompression with overrun testing */
int lzo1x_decompress_safe(const unsigned char *src, size_t src_len,
- unsigned char *dst, size_t *dst_len);
+ unsigned char *dst, size_t *dst_len);

/*
* Return values (< 0 = Error)
@@ -40,5 +40,6 @@ int lzo1x_decompress_safe(const unsigned char *src, size_t src_len,
#define LZO_E_EOF_NOT_FOUND (-7)
#define LZO_E_INPUT_NOT_CONSUMED (-8)
#define LZO_E_NOT_YET_IMPLEMENTED (-9)
+#define LZO_E_INVALID_ARGUMENT (-10)

#endif
diff --git a/lib/lzo/lzo1x_compress.c b/lib/lzo/lzo1x_compress.c
index a604099..d42efe5 100644
--- a/lib/lzo/lzo1x_compress.c
+++ b/lib/lzo/lzo1x_compress.c
@@ -1,194 +1,217 @@
/*
- * LZO1X Compressor from MiniLZO
+ * LZO1X Compressor from LZO
*
- * Copyright (C) 1996-2005 Markus F.X.J. Oberhumer <[email protected]>
+ * Copyright (C) 1996-2012 Markus F.X.J. Oberhumer <[email protected]>
*
* The full LZO package can be found at:
* http://www.oberhumer.com/opensource/lzo/
*
- * Changed for kernel use by:
+ * Changed for Linux kernel use by:
* Nitin Gupta <[email protected]>
* Richard Purdie <[email protected]>
*/

#include <linux/module.h>
#include <linux/kernel.h>
-#include <linux/lzo.h>
#include <asm/unaligned.h>
+#include <linux/lzo.h>
#include "lzodefs.h"

static noinline size_t
-_lzo1x_1_do_compress(const unsigned char *in, size_t in_len,
- unsigned char *out, size_t *out_len, void *wrkmem)
+lzo1x_1_do_compress(const unsigned char *in, size_t in_len,
+ unsigned char *out, size_t *out_len,
+ size_t ti, void *wrkmem)
{
+ const unsigned char *ip;
+ unsigned char *op;
const unsigned char * const in_end = in + in_len;
- const unsigned char * const ip_end = in + in_len - M2_MAX_LEN - 5;
- const unsigned char ** const dict = wrkmem;
- const unsigned char *ip = in, *ii = ip;
- const unsigned char *end, *m, *m_pos;
- size_t m_off, m_len, dindex;
- unsigned char *op = out;
+ const unsigned char * const ip_end = in + in_len - 20;
+ const unsigned char *ii;
+ lzo_dict_t * const dict = (lzo_dict_t *) wrkmem;

- ip += 4;
+ op = out;
+ ip = in;
+ ii = ip;
+ ip += ti < 4 ? 4 - ti : 0;

for (;;) {
- dindex = ((size_t)(0x21 * DX3(ip, 5, 5, 6)) >> 5) & D_MASK;
- m_pos = dict[dindex];
-
- if (m_pos < in)
- goto literal;
-
- if (ip == m_pos || ((size_t)(ip - m_pos) > M4_MAX_OFFSET))
- goto literal;
-
- m_off = ip - m_pos;
- if (m_off <= M2_MAX_OFFSET || m_pos[3] == ip[3])
- goto try_match;
-
- dindex = (dindex & (D_MASK & 0x7ff)) ^ (D_HIGH | 0x1f);
- m_pos = dict[dindex];
-
- if (m_pos < in)
- goto literal;
-
- if (ip == m_pos || ((size_t)(ip - m_pos) > M4_MAX_OFFSET))
- goto literal;
-
- m_off = ip - m_pos;
- if (m_off <= M2_MAX_OFFSET || m_pos[3] == ip[3])
- goto try_match;
-
- goto literal;
-
-try_match:
- if (get_unaligned((const unsigned short *)m_pos)
- == get_unaligned((const unsigned short *)ip)) {
- if (likely(m_pos[2] == ip[2]))
- goto match;
- }
-
+ const unsigned char *m_pos;
+ size_t t, m_len, m_off;
+ u32 dv;
literal:
- dict[dindex] = ip;
- ++ip;
+ ip += 1 + ((ip - ii) >> 5);
+next:
if (unlikely(ip >= ip_end))
break;
- continue;
-
-match:
- dict[dindex] = ip;
- if (ip != ii) {
- size_t t = ip - ii;
+ dv = get_unaligned_le32(ip);
+ t = ((dv * 0x1824429d) >> (32 - D_BITS)) & D_MASK;
+ m_pos = in + dict[t];
+ dict[t] = (lzo_dict_t) (ip - in);
+ if (unlikely(dv != get_unaligned_le32(m_pos)))
+ goto literal;

+ ii -= ti;
+ ti = 0;
+ t = ip - ii;
+ if (t != 0) {
if (t <= 3) {
op[-2] |= t;
- } else if (t <= 18) {
+ COPY4(op, ii);
+ op += t;
+ } else if (t <= 16) {
*op++ = (t - 3);
+ COPY8(op, ii);
+ COPY8(op + 8, ii + 8);
+ op += t;
} else {
- size_t tt = t - 18;
-
- *op++ = 0;
- while (tt > 255) {
- tt -= 255;
+ if (t <= 18) {
+ *op++ = (t - 3);
+ } else {
+ size_t tt = t - 18;
*op++ = 0;
+ while (unlikely(tt > 255)) {
+ tt -= 255;
+ *op++ = 0;
+ }
+ *op++ = tt;
}
- *op++ = tt;
+ do {
+ COPY8(op, ii);
+ COPY8(op + 8, ii + 8);
+ op += 16;
+ ii += 16;
+ t -= 16;
+ } while (t >= 16);
+ if (t > 0) do {
+ *op++ = *ii++;
+ } while (--t > 0);
}
- do {
- *op++ = *ii++;
- } while (--t > 0);
}

- ip += 3;
- if (m_pos[3] != *ip++ || m_pos[4] != *ip++
- || m_pos[5] != *ip++ || m_pos[6] != *ip++
- || m_pos[7] != *ip++ || m_pos[8] != *ip++) {
- --ip;
- m_len = ip - ii;
+ m_len = 4;
+ {
+#if defined(LZO_USE_CTZ64)
+ u64 v;
+ v = get_unaligned((const u64 *) (ip + m_len)) ^
+ get_unaligned((const u64 *) (m_pos + m_len));
+ if (unlikely(v == 0)) {
+ do {
+ m_len += 8;
+ v = get_unaligned((const u64 *) (ip + m_len)) ^
+ get_unaligned((const u64 *) (m_pos + m_len));
+ if (unlikely(ip + m_len >= ip_end))
+ goto m_len_done;
+ } while (v == 0);
+ }
+# if defined(__LITTLE_ENDIAN)
+ m_len += (unsigned) __builtin_ctzll(v) / 8;
+# elif defined(__BIG_ENDIAN)
+ m_len += (unsigned) __builtin_clzll(v) / 8;
+# else
+# error "missing endian definition"
+# endif
+#elif defined(LZO_USE_CTZ32)
+ u32 v;
+ v = get_unaligned((const u32 *) (ip + m_len)) ^
+ get_unaligned((const u32 *) (m_pos + m_len));
+ if (unlikely(v == 0)) {
+ do {
+ m_len += 4;
+ v = get_unaligned((const u32 *) (ip + m_len)) ^
+ get_unaligned((const u32 *) (m_pos + m_len));
+ if (unlikely(ip + m_len >= ip_end))
+ goto m_len_done;
+ } while (v == 0);
+ }
+# if defined(__LITTLE_ENDIAN)
+ m_len += (unsigned) __builtin_ctz(v) / 8;
+# elif defined(__BIG_ENDIAN)
+ m_len += (unsigned) __builtin_clz(v) / 8;
+# else
+# error "missing endian definition"
+# endif
+#else
+ if (unlikely(ip[m_len] == m_pos[m_len])) {
+ do {
+ m_len += 1;
+ if (unlikely(ip + m_len >= ip_end))
+ goto m_len_done;
+ } while (ip[m_len] == m_pos[m_len]);
+ }
+#endif
+ }
+m_len_done:

- if (m_off <= M2_MAX_OFFSET) {
- m_off -= 1;
- *op++ = (((m_len - 1) << 5)
- | ((m_off & 7) << 2));
- *op++ = (m_off >> 3);
- } else if (m_off <= M3_MAX_OFFSET) {
- m_off -= 1;
+ m_off = ip - m_pos;
+ ip += m_len;
+ ii = ip;
+ if (m_len <= M2_MAX_LEN && m_off <= M2_MAX_OFFSET) {
+ m_off -= 1;
+ *op++ = (((m_len - 1) << 5) | ((m_off & 7) << 2));
+ *op++ = (m_off >> 3);
+ } else if (m_off <= M3_MAX_OFFSET) {
+ m_off -= 1;
+ if (m_len <= M3_MAX_LEN)
*op++ = (M3_MARKER | (m_len - 2));
- goto m3_m4_offset;
- } else {
- m_off -= 0x4000;
-
- *op++ = (M4_MARKER | ((m_off & 0x4000) >> 11)
- | (m_len - 2));
- goto m3_m4_offset;
+ else {
+ m_len -= M3_MAX_LEN;
+ *op++ = M3_MARKER | 0;
+ while (unlikely(m_len > 255)) {
+ m_len -= 255;
+ *op++ = 0;
+ }
+ *op++ = (m_len);
}
+ *op++ = (m_off << 2);
+ *op++ = (m_off >> 6);
} else {
- end = in_end;
- m = m_pos + M2_MAX_LEN + 1;
-
- while (ip < end && *m == *ip) {
- m++;
- ip++;
- }
- m_len = ip - ii;
-
- if (m_off <= M3_MAX_OFFSET) {
- m_off -= 1;
- if (m_len <= 33) {
- *op++ = (M3_MARKER | (m_len - 2));
- } else {
- m_len -= 33;
- *op++ = M3_MARKER | 0;
- goto m3_m4_len;
- }
- } else {
- m_off -= 0x4000;
- if (m_len <= M4_MAX_LEN) {
- *op++ = (M4_MARKER
- | ((m_off & 0x4000) >> 11)
+ m_off -= 0x4000;
+ if (m_len <= M4_MAX_LEN)
+ *op++ = (M4_MARKER | ((m_off >> 11) & 8)
| (m_len - 2));
- } else {
- m_len -= M4_MAX_LEN;
- *op++ = (M4_MARKER
- | ((m_off & 0x4000) >> 11));
-m3_m4_len:
- while (m_len > 255) {
- m_len -= 255;
- *op++ = 0;
- }
-
- *op++ = (m_len);
+ else {
+ m_len -= M4_MAX_LEN;
+ *op++ = (M4_MARKER | ((m_off >> 11) & 8));
+ while (unlikely(m_len > 255)) {
+ m_len -= 255;
+ *op++ = 0;
}
+ *op++ = (m_len);
}
-m3_m4_offset:
- *op++ = ((m_off & 63) << 2);
+ *op++ = (m_off << 2);
*op++ = (m_off >> 6);
}
-
- ii = ip;
- if (unlikely(ip >= ip_end))
- break;
+ goto next;
}
-
*out_len = op - out;
- return in_end - ii;
+ return in_end - (ii - ti);
}

-int lzo1x_1_compress(const unsigned char *in, size_t in_len, unsigned char *out,
- size_t *out_len, void *wrkmem)
+int lzo1x_1_compress(const unsigned char *in, size_t in_len,
+ unsigned char *out, size_t *out_len,
+ void *wrkmem)
{
- const unsigned char *ii;
+ const unsigned char *ip = in;
unsigned char *op = out;
- size_t t;
+ size_t l = in_len;
+ size_t t = 0;

- if (unlikely(in_len <= M2_MAX_LEN + 5)) {
- t = in_len;
- } else {
- t = _lzo1x_1_do_compress(in, in_len, op, out_len, wrkmem);
+ while (l > 20) {
+ size_t ll = l <= (M4_MAX_OFFSET + 1) ? l : (M4_MAX_OFFSET + 1);
+ uintptr_t ll_end = (uintptr_t) ip + ll;
+ if ((ll_end + ((t + ll) >> 5)) <= ll_end)
+ break;
+ BUILD_BUG_ON(D_SIZE * sizeof(lzo_dict_t) > LZO1X_1_MEM_COMPRESS);
+ memset(wrkmem, 0, D_SIZE * sizeof(lzo_dict_t));
+ t = lzo1x_1_do_compress(ip, ll, op, out_len, t, wrkmem);
+ ip += ll;
op += *out_len;
+ l -= ll;
}
+ t += l;

if (t > 0) {
- ii = in + in_len - t;
+ const unsigned char *ii = in + in_len - t;

if (op == out && t <= 238) {
*op++ = (17 + t);
@@ -198,16 +221,21 @@ int lzo1x_1_compress(const unsigned char *in, size_t in_len, unsigned char *out,
*op++ = (t - 3);
} else {
size_t tt = t - 18;
-
*op++ = 0;
while (tt > 255) {
tt -= 255;
*op++ = 0;
}
-
*op++ = tt;
}
- do {
+ if (t >= 16) do {
+ COPY8(op, ii);
+ COPY8(op + 8, ii + 8);
+ op += 16;
+ ii += 16;
+ t -= 16;
+ } while (t >= 16);
+ if (t > 0) do {
*op++ = *ii++;
} while (--t > 0);
}
@@ -223,4 +251,3 @@ EXPORT_SYMBOL_GPL(lzo1x_1_compress);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("LZO1X-1 Compressor");
-
diff --git a/lib/lzo/lzo1x_decompress_safe.c b/lib/lzo/lzo1x_decompress_safe.c
index f2fd098..0dba30c 100644
--- a/lib/lzo/lzo1x_decompress_safe.c
+++ b/lib/lzo/lzo1x_decompress_safe.c
@@ -1,12 +1,12 @@
/*
- * LZO1X Decompressor from MiniLZO
+ * LZO1X Decompressor from LZO
*
- * Copyright (C) 1996-2005 Markus F.X.J. Oberhumer <[email protected]>
+ * Copyright (C) 1996-2012 Markus F.X.J. Oberhumer <[email protected]>
*
* The full LZO package can be found at:
* http://www.oberhumer.com/opensource/lzo/
*
- * Changed for kernel use by:
+ * Changed for Linux kernel use by:
* Nitin Gupta <[email protected]>
* Richard Purdie <[email protected]>
*/
@@ -15,225 +15,198 @@
#include <linux/module.h>
#include <linux/kernel.h>
#endif
-
#include <asm/unaligned.h>
#include <linux/lzo.h>
#include "lzodefs.h"

-#define HAVE_IP(x, ip_end, ip) ((size_t)(ip_end - ip) < (x))
-#define HAVE_OP(x, op_end, op) ((size_t)(op_end - op) < (x))
-#define HAVE_LB(m_pos, out, op) (m_pos < out || m_pos >= op)
-
-#define COPY4(dst, src) \
- put_unaligned(get_unaligned((const u32 *)(src)), (u32 *)(dst))
+#define HAVE_IP(x) ((size_t)(ip_end - ip) >= (size_t)(x))
+#define HAVE_OP(x) ((size_t)(op_end - op) >= (size_t)(x))
+#define NEED_IP(x) if (!HAVE_IP(x)) goto input_overrun
+#define NEED_OP(x) if (!HAVE_OP(x)) goto output_overrun
+#define TEST_LB(m_pos) if ((m_pos) < out) goto lookbehind_overrun

int lzo1x_decompress_safe(const unsigned char *in, size_t in_len,
- unsigned char *out, size_t *out_len)
+ unsigned char *out, size_t *out_len)
{
+ unsigned char *op;
+ const unsigned char *ip;
+ size_t t, next;
+ size_t state = 0;
+ const unsigned char *m_pos;
const unsigned char * const ip_end = in + in_len;
unsigned char * const op_end = out + *out_len;
- const unsigned char *ip = in, *m_pos;
- unsigned char *op = out;
- size_t t;

- *out_len = 0;
+ op = out;
+ ip = in;

+ if (unlikely(in_len < 3))
+ goto input_overrun;
if (*ip > 17) {
t = *ip++ - 17;
- if (t < 4)
+ if (t < 4) {
+ next = t;
goto match_next;
- if (HAVE_OP(t, op_end, op))
- goto output_overrun;
- if (HAVE_IP(t + 1, ip_end, ip))
- goto input_overrun;
- do {
- *op++ = *ip++;
- } while (--t > 0);
- goto first_literal_run;
- }
-
- while ((ip < ip_end)) {
- t = *ip++;
- if (t >= 16)
- goto match;
- if (t == 0) {
- if (HAVE_IP(1, ip_end, ip))
- goto input_overrun;
- while (*ip == 0) {
- t += 255;
- ip++;
- if (HAVE_IP(1, ip_end, ip))
- goto input_overrun;
- }
- t += 15 + *ip++;
- }
- if (HAVE_OP(t + 3, op_end, op))
- goto output_overrun;
- if (HAVE_IP(t + 4, ip_end, ip))
- goto input_overrun;
-
- COPY4(op, ip);
- op += 4;
- ip += 4;
- if (--t > 0) {
- if (t >= 4) {
- do {
- COPY4(op, ip);
- op += 4;
- ip += 4;
- t -= 4;
- } while (t >= 4);
- if (t > 0) {
- do {
- *op++ = *ip++;
- } while (--t > 0);
- }
- } else {
- do {
- *op++ = *ip++;
- } while (--t > 0);
- }
}
+ goto copy_literal_run;
+ }

-first_literal_run:
+ for (;;) {
t = *ip++;
- if (t >= 16)
- goto match;
- m_pos = op - (1 + M2_MAX_OFFSET);
- m_pos -= t >> 2;
- m_pos -= *ip++ << 2;
-
- if (HAVE_LB(m_pos, out, op))
- goto lookbehind_overrun;
-
- if (HAVE_OP(3, op_end, op))
- goto output_overrun;
- *op++ = *m_pos++;
- *op++ = *m_pos++;
- *op++ = *m_pos;
-
- goto match_done;
-
- do {
-match:
- if (t >= 64) {
- m_pos = op - 1;
- m_pos -= (t >> 2) & 7;
- m_pos -= *ip++ << 3;
- t = (t >> 5) - 1;
- if (HAVE_LB(m_pos, out, op))
- goto lookbehind_overrun;
- if (HAVE_OP(t + 3 - 1, op_end, op))
- goto output_overrun;
- goto copy_match;
- } else if (t >= 32) {
- t &= 31;
- if (t == 0) {
- if (HAVE_IP(1, ip_end, ip))
- goto input_overrun;
- while (*ip == 0) {
+ if (t < 16) {
+ if (likely(state == 0)) {
+ if (unlikely(t == 0)) {
+ while (unlikely(*ip == 0)) {
t += 255;
ip++;
- if (HAVE_IP(1, ip_end, ip))
- goto input_overrun;
+ NEED_IP(1);
}
- t += 31 + *ip++;
+ t += 15 + *ip++;
}
- m_pos = op - 1;
- m_pos -= get_unaligned_le16(ip) >> 2;
- ip += 2;
- } else if (t >= 16) {
- m_pos = op;
- m_pos -= (t & 8) << 11;
-
- t &= 7;
- if (t == 0) {
- if (HAVE_IP(1, ip_end, ip))
- goto input_overrun;
- while (*ip == 0) {
- t += 255;
- ip++;
- if (HAVE_IP(1, ip_end, ip))
- goto input_overrun;
- }
- t += 7 + *ip++;
+ t += 3;
+copy_literal_run:
+ if (likely(HAVE_IP(t + 15) && HAVE_OP(t + 15))) {
+ const unsigned char *ie = ip + t;
+ unsigned char *oe = op + t;
+ do {
+ COPY8(op, ip);
+ op += 8;
+ ip += 8;
+ COPY8(op, ip);
+ op += 8;
+ ip += 8;
+ } while (ip < ie);
+ ip = ie;
+ op = oe;
+ } else {
+ NEED_OP(t);
+ NEED_IP(t + 3);
+ do {
+ *op++ = *ip++;
+ } while (--t > 0);
}
- m_pos -= get_unaligned_le16(ip) >> 2;
- ip += 2;
- if (m_pos == op)
- goto eof_found;
- m_pos -= 0x4000;
- } else {
+ state = 4;
+ continue;
+ } else if (state != 4) {
+ next = t & 3;
m_pos = op - 1;
m_pos -= t >> 2;
m_pos -= *ip++ << 2;
-
- if (HAVE_LB(m_pos, out, op))
- goto lookbehind_overrun;
- if (HAVE_OP(2, op_end, op))
- goto output_overrun;
-
- *op++ = *m_pos++;
- *op++ = *m_pos;
- goto match_done;
+ TEST_LB(m_pos);
+ NEED_OP(2);
+ op[0] = m_pos[0];
+ op[1] = m_pos[1];
+ op += 2;
+ goto match_next;
+ } else {
+ next = t & 3;
+ m_pos = op - (1 + M2_MAX_OFFSET);
+ m_pos -= t >> 2;
+ m_pos -= *ip++ << 2;
+ t = 3;
}
-
- if (HAVE_LB(m_pos, out, op))
- goto lookbehind_overrun;
- if (HAVE_OP(t + 3 - 1, op_end, op))
- goto output_overrun;
-
- if (t >= 2 * 4 - (3 - 1) && (op - m_pos) >= 4) {
- COPY4(op, m_pos);
- op += 4;
- m_pos += 4;
- t -= 4 - (3 - 1);
+ } else if (t >= 64) {
+ next = t & 3;
+ m_pos = op - 1;
+ m_pos -= (t >> 2) & 7;
+ m_pos -= *ip++ << 3;
+ t = (t >> 5) - 1 + (3 - 1);
+ } else if (t >= 32) {
+ t = (t & 31) + (3 - 1);
+ if (unlikely(t == 2)) {
+ while (unlikely(*ip == 0)) {
+ t += 255;
+ ip++;
+ NEED_IP(1);
+ }
+ t += 31 + *ip++;
+ NEED_IP(2);
+ }
+ m_pos = op - 1;
+ next = get_unaligned_le16(ip);
+ ip += 2;
+ m_pos -= next >> 2;
+ next &= 3;
+ } else {
+ m_pos = op;
+ m_pos -= (t & 8) << 11;
+ t = (t & 7) + (3 - 1);
+ if (unlikely(t == 2)) {
+ while (unlikely(*ip == 0)) {
+ t += 255;
+ ip++;
+ NEED_IP(1);
+ }
+ t += 7 + *ip++;
+ NEED_IP(2);
+ }
+ next = get_unaligned_le16(ip);
+ ip += 2;
+ m_pos -= next >> 2;
+ next &= 3;
+ if (m_pos == op)
+ goto eof_found;
+ m_pos -= 0x4000;
+ }
+ TEST_LB(m_pos);
+ if (op - m_pos >= 8) {
+ unsigned char *oe = op + t;
+ if (likely(HAVE_OP(t + 15))) {
do {
- COPY4(op, m_pos);
- op += 4;
- m_pos += 4;
- t -= 4;
- } while (t >= 4);
- if (t > 0)
- do {
- *op++ = *m_pos++;
- } while (--t > 0);
+ COPY8(op, m_pos);
+ op += 8;
+ m_pos += 8;
+ COPY8(op, m_pos);
+ op += 8;
+ m_pos += 8;
+ } while (op < oe);
+ op = oe;
+ if (HAVE_IP(6)) {
+ state = next;
+ COPY4(op, ip);
+ op += next;
+ ip += next;
+ continue;
+ }
} else {
-copy_match:
- *op++ = *m_pos++;
- *op++ = *m_pos++;
+ NEED_OP(t);
do {
*op++ = *m_pos++;
- } while (--t > 0);
+ } while (op < oe);
}
-match_done:
- t = ip[-2] & 3;
- if (t == 0)
- break;
+ } else {
+ unsigned char *oe = op + t;
+ NEED_OP(t);
+ op[0] = m_pos[0];
+ op[1] = m_pos[1];
+ op += 2;
+ m_pos += 2;
+ do {
+ *op++ = *m_pos++;
+ } while (op < oe);
+ }
match_next:
- if (HAVE_OP(t, op_end, op))
- goto output_overrun;
- if (HAVE_IP(t + 1, ip_end, ip))
- goto input_overrun;
-
- *op++ = *ip++;
- if (t > 1) {
+ state = next;
+ t = next;
+ if (likely(HAVE_IP(6) && HAVE_OP(4))) {
+ COPY4(op, ip);
+ op += t;
+ ip += t;
+ } else {
+ NEED_IP(t + 3);
+ NEED_OP(t);
+ while (t > 0) {
*op++ = *ip++;
- if (t > 2)
- *op++ = *ip++;
+ t--;
}
-
- t = *ip++;
- } while (ip < ip_end);
+ }
}

- *out_len = op - out;
- return LZO_E_EOF_NOT_FOUND;
-
eof_found:
*out_len = op - out;
- return (ip == ip_end ? LZO_E_OK :
- (ip < ip_end ? LZO_E_INPUT_NOT_CONSUMED : LZO_E_INPUT_OVERRUN));
+ return (t != 3 ? LZO_E_ERROR :
+ ip == ip_end ? LZO_E_OK :
+ ip < ip_end ? LZO_E_INPUT_NOT_CONSUMED : LZO_E_INPUT_OVERRUN);
+
input_overrun:
*out_len = op - out;
return LZO_E_INPUT_OVERRUN;
diff --git a/lib/lzo/lzodefs.h b/lib/lzo/lzodefs.h
index b6d482c..ddc8db5 100644
--- a/lib/lzo/lzodefs.h
+++ b/lib/lzo/lzodefs.h
@@ -1,19 +1,37 @@
/*
* lzodefs.h -- architecture, OS and compiler specific defines
*
- * Copyright (C) 1996-2005 Markus F.X.J. Oberhumer <[email protected]>
+ * Copyright (C) 1996-2012 Markus F.X.J. Oberhumer <[email protected]>
*
* The full LZO package can be found at:
* http://www.oberhumer.com/opensource/lzo/
*
- * Changed for kernel use by:
+ * Changed for Linux kernel use by:
* Nitin Gupta <[email protected]>
* Richard Purdie <[email protected]>
*/

-#define LZO_VERSION 0x2020
-#define LZO_VERSION_STRING "2.02"
-#define LZO_VERSION_DATE "Oct 17 2005"
+
+#define COPY4(dst, src) \
+ put_unaligned(get_unaligned((const u32 *)(src)), (u32 *)(dst))
+#if defined(__x86_64__)
+#define COPY8(dst, src) \
+ put_unaligned(get_unaligned((const u64 *)(src)), (u64 *)(dst))
+#else
+#define COPY8(dst, src) \
+ COPY4(dst, src); COPY4((dst) + 4, (src) + 4)
+#endif
+
+#if defined(__BIG_ENDIAN) && defined(__LITTLE_ENDIAN)
+#error "conflicting endian definitions"
+#elif defined(__x86_64__)
+#define LZO_USE_CTZ64 1
+#define LZO_USE_CTZ32 1
+#elif defined(__i386__) || defined(__powerpc__)
+#define LZO_USE_CTZ32 1
+#else
+#define LZO_USE_CTZ32 1
+#endif

#define M1_MAX_OFFSET 0x0400
#define M2_MAX_OFFSET 0x0800
@@ -34,8 +52,10 @@
#define M3_MARKER 32
#define M4_MARKER 16

-#define D_BITS 14
-#define D_MASK ((1u << D_BITS) - 1)
+#define lzo_dict_t unsigned short
+#define D_BITS 13
+#define D_SIZE (1u << D_BITS)
+#define D_MASK (D_SIZE - 1)
#define D_HIGH ((D_MASK >> 1) + 1)

#define DX2(p, s1, s2) (((((size_t)((p)[2]) << (s2)) ^ (p)[1]) \
--
1.7.1
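
Note (illustration only, not part of the patch): the lzo.h hunk above
changes the compressor's work memory from 16384 pointers to 8192
16-bit offsets. Assuming a 2-byte unsigned short and 8-byte pointers,
that is a drop from 128 KiB to 16 KiB. The sketch below reproduces the
arithmetic in userspace. The two macros are copied from the hunk, while
OLD_LZO1X_MEM_COMPRESS and lzo_out_buf_size() are hypothetical names
introduced here for comparison.

```c
#include <stddef.h>

/* Values copied from the include/linux/lzo.h hunk. */
#define LZO1X_1_MEM_COMPRESS	(8192 * sizeof(unsigned short))
#define lzo1x_worst_compress(x)	((x) + ((x) / 16) + 64 + 3)

/* Pre-patch definition, kept only for comparison (hypothetical name). */
#define OLD_LZO1X_MEM_COMPRESS	(16384 * sizeof(unsigned char *))

/* Worst-case output buffer size for one input block (hypothetical helper). */
static size_t lzo_out_buf_size(size_t in_len)
{
	return lzo1x_worst_compress(in_len);
}
```

For a 4096-byte page, lzo_out_buf_size() gives 4096 + 256 + 64 + 3 =
4419 bytes, which is the size a caller would need for the destination
buffer handed to lzo1x_1_compress().
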

2012-10-07 15:09:10

by Markus F.X.J. Oberhumer

Subject: [PATCH 3/3] lib/lzo: Optimize code for CPUs with inefficient unaligned access

Some code paths are only beneficial on machines with fast unaligned
loads, so only use these if CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
is defined.

Signed-off-by: Markus F.X.J. Oberhumer <[email protected]>
---
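
Note (illustration only, not part of the patch): the guarded code paths
follow one pattern. A fast path copies in wide steps and may overshoot,
so it first requires up to 15 bytes of slack in both buffers. Otherwise
a byte-wise path with exact bounds checks is used. Below is a minimal
userspace sketch of that pattern. FAST_UNALIGNED is a hypothetical
stand-in for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, copy_run() is a
hypothetical helper, and memcpy() replaces the COPY8() macro.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical switch standing in for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS. */
#ifndef FAST_UNALIGNED
#define FAST_UNALIGNED 1
#endif

/*
 * Copy t bytes from src to dst. src_avail and dst_avail are the bytes
 * remaining in each buffer. Returns 0 on success, -1 on overrun.
 * On the fast path up to 15 bytes past dst + t may be clobbered.
 */
static int copy_run(unsigned char *dst, size_t dst_avail,
		    const unsigned char *src, size_t src_avail, size_t t)
{
	if (t == 0)
		return 0;
#if FAST_UNALIGNED
	/* Fast path: 16-byte steps, so t + 15 bytes must be available. */
	if (src_avail >= 15 && dst_avail >= 15 &&
	    t <= src_avail - 15 && t <= dst_avail - 15) {
		size_t done = 0;

		do {
			memcpy(dst + done, src + done, 8);
			memcpy(dst + done + 8, src + done + 8, 8);
			done += 16;
		} while (done < t);
		return 0;
	}
#endif
	/* Slow path: exact bounds checks, one byte at a time. */
	if (src_avail < t || dst_avail < t)
		return -1;
	while (t-- > 0)
		*dst++ = *src++;
	return 0;
}
```

Building the sketch with -DFAST_UNALIGNED=0 leaves only the byte-wise
path, which mirrors what the #if blocks in this patch do when the
config option is not set.
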
lib/lzo/lzo1x_compress.c | 4 ++--
lib/lzo/lzo1x_decompress_safe.c | 15 ++++++++++++---
lib/lzo/lzodefs.h | 2 +-
3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/lib/lzo/lzo1x_compress.c b/lib/lzo/lzo1x_compress.c
index d42efe5..1593dba 100644
--- a/lib/lzo/lzo1x_compress.c
+++ b/lib/lzo/lzo1x_compress.c
@@ -90,7 +90,7 @@ next:

m_len = 4;
{
-#if defined(LZO_USE_CTZ64)
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && defined(LZO_USE_CTZ64)
u64 v;
v = get_unaligned((const u64 *) (ip + m_len)) ^
get_unaligned((const u64 *) (m_pos + m_len));
@@ -110,7 +110,7 @@ next:
# else
# error "missing endian definition"
# endif
-#elif defined(LZO_USE_CTZ32)
+#elif defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && defined(LZO_USE_CTZ32)
u32 v;
v = get_unaligned((const u32 *) (ip + m_len)) ^
get_unaligned((const u32 *) (m_pos + m_len));
diff --git a/lib/lzo/lzo1x_decompress_safe.c b/lib/lzo/lzo1x_decompress_safe.c
index 0dba30c..569985d 100644
--- a/lib/lzo/lzo1x_decompress_safe.c
+++ b/lib/lzo/lzo1x_decompress_safe.c
@@ -64,6 +64,7 @@ int lzo1x_decompress_safe(const unsigned char *in, size_t in_len,
}
t += 3;
copy_literal_run:
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
if (likely(HAVE_IP(t + 15) && HAVE_OP(t + 15))) {
const unsigned char *ie = ip + t;
unsigned char *oe = op + t;
@@ -77,7 +78,9 @@ copy_literal_run:
} while (ip < ie);
ip = ie;
op = oe;
- } else {
+ } else
+#endif
+ {
NEED_OP(t);
NEED_IP(t + 3);
do {
@@ -148,6 +151,7 @@ copy_literal_run:
m_pos -= 0x4000;
}
TEST_LB(m_pos);
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
if (op - m_pos >= 8) {
unsigned char *oe = op + t;
if (likely(HAVE_OP(t + 15))) {
@@ -173,7 +177,9 @@ copy_literal_run:
*op++ = *m_pos++;
} while (op < oe);
}
- } else {
+ } else
+#endif
+ {
unsigned char *oe = op + t;
NEED_OP(t);
op[0] = m_pos[0];
@@ -187,11 +193,14 @@ copy_literal_run:
match_next:
state = next;
t = next;
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
if (likely(HAVE_IP(6) && HAVE_OP(4))) {
COPY4(op, ip);
op += t;
ip += t;
- } else {
+ } else
+#endif
+ {
NEED_IP(t + 3);
NEED_OP(t);
while (t > 0) {
diff --git a/lib/lzo/lzodefs.h b/lib/lzo/lzodefs.h
index ddc8db5..5a4beb2 100644
--- a/lib/lzo/lzodefs.h
+++ b/lib/lzo/lzodefs.h
@@ -29,7 +29,7 @@
#define LZO_USE_CTZ32 1
#elif defined(__i386__) || defined(__powerpc__)
#define LZO_USE_CTZ32 1
-#else
+#elif defined(__arm__) && (__LINUX_ARM_ARCH__ >= 5)
#define LZO_USE_CTZ32 1
#endif

--
1.7.1

2012-10-09 19:26:18

by Andrew Morton

Subject: Re: [PATCH 0/3] Update LZO compression

On Sun, 7 Oct 2012 17:07:55 +0200
"Markus F.X.J. Oberhumer" <[email protected]> wrote:

> As requested by akpm I am sending my "lzo-update" branch at
>
> git://github.com/markus-oberhumer/linux.git lzo-update
>
> to lkml as a patch series created by "git format-patch -M v3.5..lzo-update".
>
> You can also browse the branch at
>
> https://github.com/markus-oberhumer/linux/compare/lzo-update
>
> and review the three patches at
>
> https://github.com/markus-oberhumer/linux/commit/7c979cebc0f93dc692b734c12665a6824d219c20
> https://github.com/markus-oberhumer/linux/commit/10f6781c8591fe5fe4c8c733131915e5ae057826
> https://github.com/markus-oberhumer/linux/commit/5f702781f158cb59075cfa97e5c21f52275057f1

The changes look OK to me. Please ask Stephen to include the tree in
linux-next, for a 3.7 merge.



The changelog for patch 2/3 says:

: This commit updates the kernel LZO code to the current upstream version
: which features a significant speed improvement - benchmarking the Calgary
: and Silesia test corpora typically shows a doubled performance in
: both compression and decompression on modern i386/x86_64/powerpc machines.


There are significant clients of the LZO library - crypto, btrfs,
jffs2, ubifs, squashfs and zcache. So let's give all those people a cc
and ask that they test the LZO changes once they land in linux-next.
For correctness and performance, please.

2012-10-09 19:55:18

by Markus F.X.J. Oberhumer

Subject: Re: [PATCH 0/3] Update LZO compression

Hi Stephen,

On 2012-10-09 21:26, Andrew Morton wrote:
> On Sun, 7 Oct 2012 17:07:55 +0200
> "Markus F.X.J. Oberhumer" <[email protected]> wrote:
>
>> As requested by akpm I am sending my "lzo-update" branch at
>>
>> git://github.com/markus-oberhumer/linux.git lzo-update
>>
>> to lkml as a patch series created by "git format-patch -M v3.5..lzo-update".
>>
>> You can also browse the branch at
>>
>> https://github.com/markus-oberhumer/linux/compare/lzo-update
>>
>> and review the three patches at
>>
>> https://github.com/markus-oberhumer/linux/commit/7c979cebc0f93dc692b734c12665a6824d219c20
>> https://github.com/markus-oberhumer/linux/commit/10f6781c8591fe5fe4c8c733131915e5ae057826
>> https://github.com/markus-oberhumer/linux/commit/5f702781f158cb59075cfa97e5c21f52275057f1
>
> The changes look OK to me. Please ask Stephen to include the tree in
> linux-next, for a 3.7 merge.

I'd ask you to include my "lzo-update" branch in linux-next:

git://github.com/markus-oberhumer/linux.git lzo-update

> The changelog for patch 2/3 says:
>
> : This commit updates the kernel LZO code to the current upstream version
> : which features a significant speed improvement - benchmarking the Calgary
> : and Silesia test corpora typically shows a doubled performance in
> : both compression and decompression on modern i386/x86_64/powerpc machines.
>
> There are significant clients of the LZO library - crypto, btrfs,
> jffs2, ubifs, squashfs and zcache. So let's give all those people a cc
> and ask that they test the LZO changes once they land in linux-next.
> For correctness and performance, please.

The core compression and decompression code has been thoroughly tested, so I
do not expect major problems.

Good testing after the merge and feedback about build or performance issues
(and improvements!) is highly appreciated.

Many thanks,
Markus

--
Markus Oberhumer, <[email protected]>, http://www.oberhumer.com/

2012-10-09 22:43:37

by Stephen Rothwell

Subject: Re: [PATCH 0/3] Update LZO compression

Hi Markus,

On Tue, 09 Oct 2012 21:54:59 +0200 "Markus F.X.J. Oberhumer" <[email protected]> wrote:
>
> On 2012-10-09 21:26, Andrew Morton wrote:
> > On Sun, 7 Oct 2012 17:07:55 +0200
> > "Markus F.X.J. Oberhumer" <[email protected]> wrote:
> >
> >> As requested by akpm I am sending my "lzo-update" branch at
> >>
> >> git://github.com/markus-oberhumer/linux.git lzo-update
> >>
> >> to lkml as a patch series created by "git format-patch -M v3.5..lzo-update".
> >>
> >> You can also browse the branch at
> >>
> >> https://github.com/markus-oberhumer/linux/compare/lzo-update
> >>
> >> and review the three patches at
> >>
> >> https://github.com/markus-oberhumer/linux/commit/7c979cebc0f93dc692b734c12665a6824d219c20
> >> https://github.com/markus-oberhumer/linux/commit/10f6781c8591fe5fe4c8c733131915e5ae057826
> >> https://github.com/markus-oberhumer/linux/commit/5f702781f158cb59075cfa97e5c21f52275057f1
> >
> > The changes look OK to me. Please ask Stephen to include the tree in
> > linux-next, for a 3.7 merge.
>
> I'd ask you to include my "lzo-update" branch in linux-next:
>
> git://github.com/markus-oberhumer/linux.git lzo-update

I have added this from today.

Thanks for adding your subsystem tree as a participant of linux-next. As
you may know, this is not a judgment of your code. The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window.

You will need to ensure that the patches/commits in your tree/series have
been:
* submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
* posted to the relevant mailing list,
* reviewed by you (or another maintainer of your subsystem tree),
* successfully unit tested, and
* destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch). It is allowed to be rebased if you deem it necessary.

--
Cheers,
Stephen Rothwell
[email protected]

Legal Stuff:
By participating in linux-next, your subsystem tree contributions are
public and will be included in the linux-next trees. You may be sent
e-mail messages indicating errors or other issues when the
patches/commits from your subsystem tree are merged and tested in
linux-next. These messages may also be cross-posted to the linux-next
mailing list, the linux-kernel mailing list, etc. The linux-next tree
project and IBM (my employer) make no warranties regarding the linux-next
project, the testing procedures, the results, the e-mails, etc. If you
don't agree to these ground rules, let me know and I'll remove your tree
from participation in linux-next.


2012-10-11 11:41:55

by Arnd Bergmann

Subject: Re: [PATCH 0/3] Update LZO compression

On Tuesday 09 October 2012, Markus F.X.J. Oberhumer wrote:
> >
> > : This commit updates the kernel LZO code to the current upstream version
> > : which features a significant speed improvement - benchmarking the Calgary
> > : and Silesia test corpora typically shows a doubled performance in
> > : both compression and decompression on modern i386/x86_64/powerpc machines.
> >
> > There are significant clients of the LZO library - crypto, btrfs,
> > jffs2, ubifs, squashfs and zcache. So let's give all those people a cc
> > and ask that they test the LZO changes once they land in linux-next.
> > For correctness and performance, please.
>
> The core compression and decompression code has been thoroughly tested, so I
> do not expect major problems.
>
> Good testing after the merge and feedback about build or performance issues
> (and improvements!) is highly appreciated.

The addition of the lzo tree to linux-next caused this problem for ARM
imx_v6_v7_defconfig:

In file included from /home/arnd/linux-arm/arch/arm/boot/compressed/decompress.c:40:0:
/home/arnd/linux-arm/arch/arm/boot/compressed/../../../../lib/decompress_unlzo.c:34:34: fatal error: lzo/lzo1x_decompress.c: No such file or directory

Since the file was renamed, anything including it needs to be updated to the
new file name.

Signed-off-by: Arnd Bergmann <[email protected]>

diff --git a/lib/decompress_unlzo.c b/lib/decompress_unlzo.c
index 4531294..960183d 100644
--- a/lib/decompress_unlzo.c
+++ b/lib/decompress_unlzo.c
@@ -31,7 +31,7 @@
*/

#ifdef STATIC
-#include "lzo/lzo1x_decompress.c"
+#include "lzo/lzo1x_decompress_safe.c"
#else
#include <linux/decompress/unlzo.h>
#endif

2012-10-11 16:29:40

by Markus F.X.J. Oberhumer

Subject: Re: [PATCH 0/3] Update LZO compression

Thanks Arnd,

On 2012-10-11 13:41, Arnd Bergmann wrote:
> On Tuesday 09 October 2012, Markus F.X.J. Oberhumer wrote:
>>>
>>> : This commit updates the kernel LZO code to the current upstream version
>>> : which features a significant speed improvement - benchmarking the Calgary
>>> : and Silesia test corpora typically shows a doubled performance in
>>> : both compression and decompression on modern i386/x86_64/powerpc machines.
>>>
>>> There are significant clients of the LZO library - crypto, btrfs,
>>> jffs2, ubifs, squashfs and zcache. So let's give all those people a cc
>>> and ask that they test the LZO changes once they land in linux-next.
>>> For correctness and performance, please.
>>
>> The core compression and decompression code has been thoroughly tested, so I
>> do not expect major problems.
>>
>> Good testing after the merge and feedback about build or performance issues
>> (and improvements!) is highly appreciated.
>
> The addition of the lzo tree to linux-next caused this problem for ARM
> imx_v6_v7_defconfig:
>
> In file included from /home/arnd/linux-arm/arch/arm/boot/compressed/decompress.c:40:0:
> /home/arnd/linux-arm/arch/arm/boot/compressed/../../../../lib/decompress_unlzo.c:34:34: fatal error: lzo/lzo1x_decompress.c: No such file or directory
>
> Since the file was renamed, anything including it needs to be updated to the
> new file name.

I will add that patch to my tree.

Cheers,
Markus

>
> Signed-off-by: Arnd Bergmann <[email protected]>
>
> diff --git a/lib/decompress_unlzo.c b/lib/decompress_unlzo.c
> index 4531294..960183d 100644
> --- a/lib/decompress_unlzo.c
> +++ b/lib/decompress_unlzo.c
> @@ -31,7 +31,7 @@
> */
>
> #ifdef STATIC
> -#include "lzo/lzo1x_decompress.c"
> +#include "lzo/lzo1x_decompress_safe.c"
> #else
> #include <linux/decompress/unlzo.h>
> #endif

--
Markus Oberhumer, <[email protected]>, http://www.oberhumer.com/

2012-10-15 19:19:33

by Seth Jennings

Subject: Re: [PATCH 0/3] Update LZO compression

> As requested by akpm I am sending my "lzo-update" branch at
>
> git://github.com/markus-oberhumer/linux.git lzo-update
>
> to lkml as a patch series created by "git format-patch -M v3.5..lzo-update".
>
> You can also browse the branch at
>
> https://github.com/markus-oberhumer/linux/compare/lzo-update
>
> and review the three patches at
>
> https://github.com/markus-oberhumer/linux/commit/7c979cebc0f93dc692b734c12665a6824d219c20
> https://github.com/markus-oberhumer/linux/commit/10f6781c8591fe5fe4c8c733131915e5ae057826
> https://github.com/markus-oberhumer/linux/commit/5f702781f158cb59075cfa97e5c21f52275057f1

As this relates to my work on zcache, I just tested these patches on PPC64 and
they cause the LZO crypto module to fail its self-test:

[ 0.521137] alg: comp: Compression test 1 failed for lzo-generic: output len = 62

I built the exact same kernel for x86_64 and all is fine. I suspect an
endianness-related bug, but I haven't looked at the code that closely yet.

Any ideas? I'd be happy to test any potential fixes.

Seth

2012-10-15 23:45:15

by Markus F.X.J. Oberhumer

Subject: Re: [PATCH 0/3] Update LZO compression

On 2012-10-15 21:19, Seth Jennings wrote:
>> As requested by akpm I am sending my "lzo-update" branch at
>>
>> git://github.com/markus-oberhumer/linux.git lzo-update
>>
>> to lkml as a patch series created by "git format-patch -M v3.5..lzo-update".
>>
>> You can also browse the branch at
>>
>> https://github.com/markus-oberhumer/linux/compare/lzo-update
>>
>> and review the three patches at
>>
>> https://github.com/markus-oberhumer/linux/commit/7c979cebc0f93dc692b734c12665a6824d219c20
>> https://github.com/markus-oberhumer/linux/commit/10f6781c8591fe5fe4c8c733131915e5ae057826
>> https://github.com/markus-oberhumer/linux/commit/5f702781f158cb59075cfa97e5c21f52275057f1
>
> As this relates to my work on zcache, I just tested these patches on PPC64 and
> they cause the LZO crypto module to fail its self-test:
>
> [ 0.521137] alg: comp: Compression test 1 failed for lzo-generic: output len = 62
>
> I built the exact same kernel for x86_64 and all is fine. I suspect an
> endianness-related bug, but I haven't looked at the code that closely yet.
>
> Any ideas? I'd be happy to test any potential fixes.

The crypto LZO test vectors had to be updated - this should land in linux-next
soon (or you can just pull from my branch).

BTW, this cannot have worked on x86_64 (or any other arch), so you probably
tested the wrong kernel.

Cheers,
Markus

> Seth

--
Markus Oberhumer, <[email protected]>, http://www.oberhumer.com/
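
For readers following along, here is a toy sketch of why a byte-exact
compression test vector can start failing after a compressor update even
though the data still round-trips. It is not LZO code and not the kernel's
crypto self-test: toy_compress(), toy_decompress(), the max_run knob and the
stored vector are all hypothetical names invented for illustration. It
assumes the self-test compares the compressor's output against a recorded
vector, which is what the "output len = 62" message suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy run-length encoder (NOT LZO): emits (count, byte) pairs.
 * max_run caps the count and stands in for "encoder generation".
 */
size_t toy_compress(const unsigned char *in, size_t in_len,
		    unsigned char *out, size_t max_run)
{
	size_t i = 0, o = 0;

	assert(max_run >= 1 && max_run <= 255);
	while (i < in_len) {
		size_t run = 1;

		while (i + run < in_len && in[i + run] == in[i] && run < max_run)
			run++;
		out[o++] = (unsigned char)run;
		out[o++] = in[i];
		i += run;
	}
	return o;
}

/* One decoder understands the output of both encoder generations. */
size_t toy_decompress(const unsigned char *in, size_t in_len,
		      unsigned char *out)
{
	size_t i, o = 0;

	for (i = 0; i + 1 < in_len; i += 2) {
		memset(out + o, in[i + 1], in[i]);
		o += in[i];
	}
	return o;
}

/* Hypothetical vector, recorded from the "old" encoder (max_run = 4). */
static const unsigned char plain[] = "aaaaaaaa";
static const unsigned char expected[] = { 4, 'a', 4, 'a' };

/* Returns 0 if the compressed bytes match the stored vector exactly. */
int vector_selftest(size_t max_run)
{
	unsigned char buf[32];
	size_t len = toy_compress(plain, 8, buf, max_run);

	if (len != sizeof(expected) || memcmp(buf, expected, len) != 0)
		return -1;
	return 0;
}

/* Returns 0 if compress followed by decompress reproduces the input. */
int roundtrip_selftest(size_t max_run)
{
	unsigned char comp[32], back[32];
	size_t clen = toy_compress(plain, 8, comp, max_run);
	size_t dlen = toy_decompress(comp, clen, back);

	return (dlen == 8 && memcmp(back, plain, 8) == 0) ? 0 : -1;
}
```

In this sketch vector_selftest(4) passes and vector_selftest(255) fails,
because the "new" encoder packs the same input into a shorter stream, while
roundtrip_selftest() passes for both. Under the assumption above, that is the
situation a recorded vector is in after an encoder change: the vector has to
be regenerated even if the compressor is correct.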

2012-10-16 17:02:44

by Seth Jennings

Subject: Re: [PATCH 0/3] Update LZO compression

On 10/15/2012 06:45 PM, Markus F.X.J. Oberhumer wrote:
> The crypto LZO test vectors had to be updated - this should land in linux-next
> soon (or you can just pull from my branch).

Thanks, I pulled from your branch and now it works fine.

I noticed that you made the test vectors longer. Any particular
reason? Is there a new minimum length now or something?

I guess I'm not understanding why a change to the test vectors was needed.

Seth