2011-10-20 23:39:07

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 1/3] Add the snappy-c compressor to lib

From: Andi Kleen <[email protected]>

This is a C port of the google snappy compressor. It has roughly
comparable compression to LZO, but is significantly faster on many file
types. For example it beats all other compressors on already
compressed data.

I ported the original C++ code over to C and did some changes
to make it better fit into the kernel. It preallocates the worst
case memory consumption now. While the code being larger
than lzo it is still reasonable (about 5K on x86).

Decompression needs very little memory, Compression
currently 192K on 64bit and 128K on 32bit. For comparison
LZO compression needs 128K on 64bit and 64K on 32bit.

[This could be lowered significantly by not preallocating
for most use cases, typically the footprint is much lower.
The original C++ version only allocated most of this
when (rarely) needed, but this is more problematic in the kernel]

There are some minor divergences from the Linux coding standards:
in particular I kept the C++/C99 style mixed statement/declarations.
This was mainly to not diverge too much from the reference C++
source, so that improvements from there can be easily ported.
There are some other left overs from the google style, but very
little now.

checkpatch.pl has some complaints, but they are either caused by the
above or checkpatch.pl bugs (it misparsed #define foo do {} while(0))

Performance:

The compressor performs best on 64bit-LE systems,
but is also quite good on 32bit. I haven't tested BE, but
I don't expect that to add a lot of overhead.

Here is some performance data (32bit, Nehalem):
c/b = cycles/byte; lower numbers are better.

x86-64 executable: (compression minimally slower than qlz, but
much better at decompression, lzo is left in the dust):

snappy: emacs-gtk: 11007968 b: ratio 0.38: comp 8.13 uncomp 2.65 c/b
lzo : emacs-gtk: 11007968 b: ratio 0.33: comp 12.74 uncomp 4.70 c/b
zlib1 : emacs-gtk: 11007968 b: ratio 0.27: comp 49.96 uncomp 13.14 c/b
zlib3 : emacs-gtk: 11007968 b: ratio 0.26: comp 64.17 uncomp 12.33 c/b
lzf : emacs-gtk: 11007968 b: ratio 0.37: comp 9.85 uncomp 4.33 c/b
qlz : emacs-gtk: 11007968 b: ratio 0.34: comp 7.51 uncomp 6.28 c/b
fastlz: emacs-gtk: 11007968 b: ratio 0.37: comp 10.73 uncomp 4.97 c/b

Compressed data (beats everything else):

snappy: udev-151.tar.gz: 634842 b: ratio 1.00: comp 0.99 uncomp 0.33 c/b
lzo : udev-151.tar.gz: 634842 b: ratio 1.00: comp 41.44 uncomp 0.66 c/b
zlib1 : udev-151.tar.gz: 634842 b: ratio 1.00: comp 116.99 uncomp 3.94 c/b
zlib3 : udev-151.tar.gz: 634842 b: ratio 1.00: comp 117.68 uncomp 3.94 c/b
lzf : udev-151.tar.gz: 634842 b: ratio 1.03: comp 16.32 uncomp 1.14 c/b
qlz : udev-151.tar.gz: 634842 b: ratio 1.00: comp 10.42 uncomp 0.42 c/b
fastlz: udev-151.tar.gz: 634842 b: ratio 1.03: comp 19.35 uncomp 2.07 c/b

Text file (compression somewhat slower than qlz, but decompression
much better, lzo is much worse):

snappy: manual.txt: 445343 b: ratio 0.47: comp 12.01 uncomp 3.12 c/b
lzo : manual.txt: 445343 b: ratio 0.44: comp 16.32 uncomp 7.53 c/b
zlib1 : manual.txt: 445343 b: ratio 0.35: comp 56.37 uncomp 15.59 c/b
zlib3 : manual.txt: 445343 b: ratio 0.31: comp 73.45 uncomp 13.99 c/b
lzf : manual.txt: 445343 b: ratio 0.46: comp 13.43 uncomp 5.47 c/b
qlz : manual.txt: 445343 b: ratio 0.44: comp 9.16 uncomp 8.19 c/b
fastlz: manual.txt: 445343 b: ratio 0.46: comp 14.22 uncomp 7.28 c/b

As you can see snappy is a good all-around compressor.

On 64bit the compression is even faster and beats everything else easily:

snappy: emacs-gtk: 11007968 b: ratio 0.38: comp 4.90 uncomp 2.65 c/b
lzo : emacs-gtk: 11007968 b: ratio 0.33: comp 11.24 uncomp 4.46 c/b
zlib1 : emacs-gtk: 11007968 b: ratio 0.27: comp 41.67 uncomp 11.13 c/b
zlib3 : emacs-gtk: 11007968 b: ratio 0.26: comp 51.80 uncomp 10.54 c/b
lzf : emacs-gtk: 11007968 b: ratio 0.37: comp 8.79 uncomp 4.05 c/b
qlz : emacs-gtk: 11007968 b: ratio 0.34: comp 5.44 uncomp 5.46 c/b
fastlz: emacs-gtk: 11007968 b: ratio 0.37: comp 9.91 uncomp 4.77 c/b

On 64bit it's now nearly as fast as qlz on the text file too:

snappy: manual.txt: 445343 b: ratio 0.47: comp 7.79 uncomp 3.47 c/b
lzo : manual.txt: 445343 b: ratio 0.44: comp 15.46 uncomp 7.27 c/b
zlib1 : manual.txt: 445343 b: ratio 0.35: comp 45.79 uncomp 12.78 c/b
zlib3 : manual.txt: 445343 b: ratio 0.31: comp 60.52 uncomp 11.72 c/b
lzf : manual.txt: 445343 b: ratio 0.46: comp 12.62 uncomp 5.30 c/b
qlz : manual.txt: 445343 b: ratio 0.44: comp 6.81 uncomp 7.65 c/b
fastlz: manual.txt: 445343 b: ratio 0.46: comp 13.75 uncomp 6.52 c/b

Overall it's a good alternative to lzo, with the only
drawback being the somewhat higher memory use. The memory use
can be fixed for most cases (e.g. some of the current
buffers are only used for SG), but isn't yet in this version.

Open: it's pretty easy to add scatter-gather support since
input/output is quite abstracted. This would benefit
some users who could avoid temporary buffers.

Signed-off-by: Andi Kleen <[email protected]>
---
include/linux/snappy.h | 26 +
lib/Kconfig | 8 +
lib/Makefile | 4 +
lib/snappy.c | 1294 ++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 1332 insertions(+), 0 deletions(-)
create mode 100644 include/linux/snappy.h
create mode 100644 lib/snappy.c

diff --git a/include/linux/snappy.h b/include/linux/snappy.h
new file mode 100644
index 0000000..4119803
--- /dev/null
+++ b/include/linux/snappy.h
@@ -0,0 +1,26 @@
+#ifndef _LINUX_SNAPPY_H
+#define _LINUX_SNAPPY_H 1
+
+#include <linux/types.h>
+
+/* Only needed for compression. This preallocates the worst case */
+struct snappy_env {
+ u16 *hash_table;
+ void *scratch;
+ void *scratch_output;
+};
+
+int snappy_init_env(struct snappy_env *env);
+void snappy_free_env(struct snappy_env *env);
+bool snappy_uncompress(const char *compressed, size_t n, char *uncompressed);
+int snappy_compress(struct snappy_env *env,
+ const char *input,
+ size_t input_length,
+ char *compressed,
+ size_t *compressed_length);
+bool snappy_uncompressed_length(const char *buf, size_t len, size_t *result);
+size_t snappy_max_compressed_length(size_t source_len);
+
+
+
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index 6c695ff..416dc08 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -279,4 +279,12 @@ config CORDIC
config LLIST
bool

+config SNAPPY
+ tristate "Snappy compressor"
+ help
+ Add the snappy compressor. This is a reasonable compressor that
+ compresses and decompresses extremly fast. In general it's a better
+ replacement for LZO.
+
+
endmenu
diff --git a/lib/Makefile b/lib/Makefile
index 3f5bc6d..e38e580 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -117,6 +117,10 @@ obj-$(CONFIG_CORDIC) += cordic.o

obj-$(CONFIG_LLIST) += llist.o

+CFLAGS_snappy.o += $(call cc-disable-warning, declaration-after-statement) \
+ -DNDEBUG=1
+obj-$(CONFIG_SNAPPY) += snappy.o
+
hostprogs-y := gen_crc32table
clean-files := crc32table.h

diff --git a/lib/snappy.c b/lib/snappy.c
new file mode 100644
index 0000000..af44d66
--- /dev/null
+++ b/lib/snappy.c
@@ -0,0 +1,1294 @@
+/*
+ * C port of the snappy compressor from Google.
+ * This is a very fast compressor with comparable compression to lzo.
+ * Works best on 64bit little-endian, but should be good on others too.
+ * Ported by Andi Kleen.
+ * Based on snappy 1.0.3 plus some selected changes from SVN.
+ */
+
+/*
+ * Copyright 2005 Google Inc. All Rights Reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following disclaimer
+ * in the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Google Inc. nor the names of its
+ * contributors may be used to endorse or promote products derived from
+ * this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/snappy.h>
+#include <asm/unaligned.h>
+
+#define CRASH_UNLESS(x) BUG_ON(!(x))
+#define CHECK(cond) CRASH_UNLESS(cond)
+#define CHECK_LE(a, b) CRASH_UNLESS((a) <= (b))
+#define CHECK_GE(a, b) CRASH_UNLESS((a) >= (b))
+#define CHECK_EQ(a, b) CRASH_UNLESS((a) == (b))
+#define CHECK_NE(a, b) CRASH_UNLESS((a) != (b))
+#define CHECK_LT(a, b) CRASH_UNLESS((a) < (b))
+#define CHECK_GT(a, b) CRASH_UNLESS((a) > (b))
+
+#define UNALIGNED_LOAD16(_p) get_unaligned((u16 *)(_p))
+#define UNALIGNED_LOAD32(_p) get_unaligned((u32 *)(_p))
+#define UNALIGNED_LOAD64(_p) get_unaligned((u64 *)(_p))
+
+#define UNALIGNED_STORE16(_p, _val) put_unaligned(_val, (u16 *)(_p))
+#define UNALIGNED_STORE32(_p, _val) put_unaligned(_val, (u32 *)(_p))
+#define UNALIGNED_STORE64(_p, _val) put_unaligned(_val, (u64 *)(_p))
+
+#ifdef NDEBUG
+
+#define DCHECK(cond) do {} while(0)
+#define DCHECK_LE(a, b) do {} while(0)
+#define DCHECK_GE(a, b) do {} while(0)
+#define DCHECK_EQ(a, b) do {} while(0)
+#define DCHECK_NE(a, b) do {} while(0)
+#define DCHECK_LT(a, b) do {} while(0)
+#define DCHECK_GT(a, b) do {} while(0)
+
+#else
+
+#define DCHECK(cond) CHECK(cond)
+#define DCHECK_LE(a, b) CHECK_LE(a, b)
+#define DCHECK_GE(a, b) CHECK_GE(a, b)
+#define DCHECK_EQ(a, b) CHECK_EQ(a, b)
+#define DCHECK_NE(a, b) CHECK_NE(a, b)
+#define DCHECK_LT(a, b) CHECK_LT(a, b)
+#define DCHECK_GT(a, b) CHECK_GT(a, b)
+
+#endif
+
+static inline bool is_little_endian(void)
+{
+#ifdef __LITTLE_ENDIAN__
+ return true;
+#endif
+ return false;
+}
+
+static inline int log2_floor(u32 n)
+{
+ return n == 0 ? -1 : 31 ^ __builtin_clz(n);
+}
+
+static inline int find_lsb_set_non_zero(u32 n)
+{
+ return __builtin_ctz(n);
+}
+
+static inline int find_lsb_set_non_zero64(u64 n)
+{
+ return __builtin_ctzll(n);
+}
+
+#define kmax32 5
+
+/*
+ * Attempts to parse a varint32 from a prefix of the bytes in [ptr,limit-1].
+ * Never reads a character at or beyond limit. If a valid/terminated varint32
+ * was found in the range, stores it in *OUTPUT and returns a pointer just
+ * past the last byte of the varint32. Else returns NULL. On success,
+ * "result <= limit".
+ */
+static inline const char *varint_parse32_with_limit(const char *p,
+ const char *l,
+ u32 *OUTPUT)
+{
+ const unsigned char *ptr = (const unsigned char *)(p);
+ const unsigned char *limit = (const unsigned char *)(l);
+ u32 b, result;
+
+ if (ptr >= limit)
+ return NULL;
+ b = *(ptr++);
+ result = b & 127;
+ if (b < 128)
+ goto done;
+ if (ptr >= limit)
+ return NULL;
+ b = *(ptr++);
+ result |= (b & 127) << 7;
+ if (b < 128)
+ goto done;
+ if (ptr >= limit)
+ return NULL;
+ b = *(ptr++);
+ result |= (b & 127) << 14;
+ if (b < 128)
+ goto done;
+ if (ptr >= limit)
+ return NULL;
+ b = *(ptr++);
+ result |= (b & 127) << 21;
+ if (b < 128)
+ goto done;
+ if (ptr >= limit)
+ return NULL;
+ b = *(ptr++);
+ result |= (b & 127) << 28;
+ if (b < 16)
+ goto done;
+ return NULL; /* Value is too long to be a varint32 */
+done:
+ *OUTPUT = result;
+ return (const char *)(ptr);
+}
+
+/*
+ * REQUIRES "ptr" points to a buffer of length sufficient to hold "v".
+ * EFFECTS Encodes "v" into "ptr" and returns a pointer to the
+ * byte just past the last encoded byte.
+ */
+static inline char *varint_encode32(char *sptr, u32 v)
+{
+ /* Operate on characters as unsigneds */
+ unsigned char *ptr = (unsigned char *)(sptr);
+ static const int B = 128;
+
+ if (v < (1 << 7)) {
+ *(ptr++) = v;
+ } else if (v < (1 << 14)) {
+ *(ptr++) = v | B;
+ *(ptr++) = v >> 7;
+ } else if (v < (1 << 21)) {
+ *(ptr++) = v | B;
+ *(ptr++) = (v >> 7) | B;
+ *(ptr++) = v >> 14;
+ } else if (v < (1 << 28)) {
+ *(ptr++) = v | B;
+ *(ptr++) = (v >> 7) | B;
+ *(ptr++) = (v >> 14) | B;
+ *(ptr++) = v >> 21;
+ } else {
+ *(ptr++) = v | B;
+ *(ptr++) = (v >> 7) | B;
+ *(ptr++) = (v >> 14) | B;
+ *(ptr++) = (v >> 21) | B;
+ *(ptr++) = v >> 28;
+ }
+ return (char *)(ptr);
+}
+
+struct source {
+ const char *ptr;
+ size_t left;
+};
+
+static inline int available(struct source *s)
+{
+ return s->left;
+}
+
+static inline const char *peek(struct source *s, size_t * len)
+{
+ *len = s->left;
+ return s->ptr;
+}
+
+static inline void skip(struct source *s, size_t n)
+{
+ s->left -= n;
+ s->ptr += n;
+}
+
+struct sink {
+ char *dest;
+};
+
+static inline void append(struct sink *s, const char *data, size_t n)
+{
+ if (data != s->dest)
+ memcpy(s->dest, data, n);
+ s->dest += n;
+}
+
+static inline void *sink_peek(struct sink *s, size_t n)
+{
+ return s->dest;
+}
+
+struct writer {
+ char *base;
+ char *op;
+ char *op_limit;
+};
+
+/* Called before decompression */
+static inline void writer_set_expected_length(struct writer *w, size_t len)
+{
+ w->op_limit = w->op + len;
+}
+
+/* Called after decompression */
+static inline bool writer_check_length(struct writer *w)
+{
+ return w->op == w->op_limit;
+}
+
+/*
+ * Copy "len" bytes from "src" to "op", one byte at a time. Used for
+ * handling COPY operations where the input and output regions may
+ * overlap. For example, suppose:
+ * src == "ab"
+ * op == src + 2
+ * len == 20
+ * After IncrementalCopy(src, op, len), the result will have
+ * eleven copies of "ab"
+ * ababababababababababab
+ * Note that this does not match the semantics of either memcpy()
+ * or memmove().
+ */
+static inline void incremental_copy(const char *src, char *op, int len)
+{
+ DCHECK_GT(len, 0);
+ do {
+ *op++ = *src++;
+ } while (--len > 0);
+}
+
+/*
+ * Equivalent to IncrementalCopy except that it can write up to ten extra
+ * bytes after the end of the copy, and that it is faster.
+ *
+ * The main part of this loop is a simple copy of eight bytes at a time until
+ * we've copied (at least) the requested amount of bytes. However, if op and
+ * src are less than eight bytes apart (indicating a repeating pattern of
+ * length < 8), we first need to expand the pattern in order to get the correct
+ * results. For instance, if the buffer looks like this, with the eight-byte
+ * <src> and <op> patterns marked as intervals:
+ *
+ * abxxxxxxxxxxxx
+ * [------] src
+ * [------] op
+ *
+ * a single eight-byte copy from <src> to <op> will repeat the pattern once,
+ * after which we can move <op> two bytes without moving <src>:
+ *
+ * ababxxxxxxxxxx
+ * [------] src
+ * [------] op
+ *
+ * and repeat the exercise until the two no longer overlap.
+ *
+ * This allows us to do very well in the special case of one single byte
+ * repeated many times, without taking a big hit for more general cases.
+ *
+ * The worst case of extra writing past the end of the match occurs when
+ * op - src == 1 and len == 1; the last copy will read from byte positions
+ * [0..7] and write to [4..11], whereas it was only supposed to write to
+ * position 1. Thus, ten excess bytes.
+ */
+
+#define kmax_increment_copy_overflow 10
+
+static inline void incremental_copy_fast_path(const char *src, char *op,
+ int len)
+{
+ while (op - src < 8) {
+ UNALIGNED_STORE64(op, UNALIGNED_LOAD64(src));
+ len -= op - src;
+ op += op - src;
+ }
+ while (len > 0) {
+ UNALIGNED_STORE64(op, UNALIGNED_LOAD64(src));
+ src += 8;
+ op += 8;
+ len -= 8;
+ }
+}
+
+static inline bool writer_append_from_self(struct writer *w, u32 offset,
+ u32 len)
+{
+ char *op = w->op;
+ const int space_left = w->op_limit - op;
+
+ if (op - w->base <= offset - 1u) /* -1u catches offset==0 */
+ return false;
+ if (len <= 16 && offset >= 8 && space_left >= 16) {
+ /* Fast path, used for the majority (70-80%) of dynamic
+ * invocations. */
+ UNALIGNED_STORE64(op, UNALIGNED_LOAD64(op - offset));
+ UNALIGNED_STORE64(op + 8, UNALIGNED_LOAD64(op - offset + 8));
+ } else {
+ if (space_left >= len + kmax_increment_copy_overflow) {
+ incremental_copy_fast_path(op - offset, op, len);
+ } else {
+ if (space_left < len)
+ return false;
+ incremental_copy(op - offset, op, len);
+ }
+ }
+
+ w->op = op + len;
+ return true;
+}
+
+static inline bool writer_append(struct writer *w, const char *ip, u32 len,
+ bool allow_fast_path)
+{
+ char *op = w->op;
+ const int space_left = w->op_limit - op;
+ if (allow_fast_path && len <= 16 && space_left >= 16) {
+ /* Fast path, used for the majority (about 90%) of dynamic
+ * invocations. */
+ UNALIGNED_STORE64(op, UNALIGNED_LOAD64(ip));
+ UNALIGNED_STORE64(op + 8, UNALIGNED_LOAD64(ip + 8));
+ } else {
+ if (space_left < len)
+ return false;
+ memcpy(op, ip, len);
+ }
+ w->op = op + len;
+ return true;
+}
+
+/*
+ * Any hash function will produce a valid compressed bitstream, but a good
+ * hash function reduces the number of collisions and thus yields better
+ * compression for compressible input, and more speed for incompressible
+ * input. Of course, it doesn't hurt if the hash function is reasonably fast
+ * either, as it gets called a lot.
+ */
+static inline u32 hash_bytes(u32 bytes, int shift)
+{
+ u32 kmul = 0x1e35a7bd;
+ return (bytes * kmul) >> shift;
+}
+
+static inline u32 hash(const char *p, int shift)
+{
+ return hash_bytes(UNALIGNED_LOAD32(p), shift);
+}
+
+/*
+ * Compressed data can be defined as:
+ * compressed := item* literal*
+ * item := literal* copy
+ *
+ * The trailing literal sequence has a space blowup of at most 62/60
+ * since a literal of length 60 needs one tag byte + one extra byte
+ * for length information.
+ *
+ * Item blowup is trickier to measure. Suppose the "copy" op copies
+ * 4 bytes of data. Because of a special check in the encoding code,
+ * we produce a 4-byte copy only if the offset is < 65536. Therefore
+ * the copy op takes 3 bytes to encode, and this type of item leads
+ * to at most the 62/60 blowup for representing literals.
+ *
+ * Suppose the "copy" op copies 5 bytes of data. If the offset is big
+ * enough, it will take 5 bytes to encode the copy op. Therefore the
+ * worst case here is a one-byte literal followed by a five-byte copy.
+ * I.e., 6 bytes of input turn into 7 bytes of "compressed" data.
+ *
+ * This last factor dominates the blowup, so the final estimate is:
+ */
+size_t snappy_max_compressed_length(size_t source_len)
+{
+ return 32 + source_len + source_len / 6;
+}
+EXPORT_SYMBOL(snappy_max_compressed_length);
+
+enum {
+ LITERAL = 0,
+ COPY_1_BYTE_OFFSET = 1, /* 3 bit length + 3 bits of offset in opcode */
+ COPY_2_BYTE_OFFSET = 2,
+ COPY_4_BYTE_OFFSET = 3
+};
+
+static inline char *emit_literal(char *op,
+ const char *literal,
+ int len, bool allow_fast_path)
+{
+ int n = len - 1; /* Zero-length literals are disallowed */
+
+ if (n < 60) {
+ /* Fits in tag byte */
+ *op++ = LITERAL | (n << 2);
+
+/*
+ * The vast majority of copies are below 16 bytes, for which a
+ * call to memcpy is overkill. This fast path can sometimes
+ * copy up to 15 bytes too much, but that is okay in the
+ * main loop, since we have a bit to go on for both sides:
+ *
+ * - The input will always have kInputMarginBytes = 15 extra
+ * available bytes, as long as we're in the main loop, and
+ * if not, allow_fast_path = false.
+ * - The output will always have 32 spare bytes (see
+ * MaxCompressedLength).
+ */
+ if (allow_fast_path && len <= 16) {
+ UNALIGNED_STORE64(op, UNALIGNED_LOAD64(literal));
+ UNALIGNED_STORE64(op + 8,
+ UNALIGNED_LOAD64(literal + 8));
+ return op + len;
+ }
+ } else {
+ /* Encode in upcoming bytes */
+ char *base = op;
+ int count = 0;
+ op++;
+ while (n > 0) {
+ *op++ = n & 0xff;
+ n >>= 8;
+ count++;
+ }
+ DCHECK(count >= 1);
+ DCHECK(count <= 4);
+ *base = LITERAL | ((59 + count) << 2);
+ }
+ memcpy(op, literal, len);
+ return op + len;
+}
+
+static inline char *emit_copy_less_than64(char *op, int offset, int len)
+{
+ DCHECK_LE(len, 64);
+ DCHECK_GE(len, 4);
+ DCHECK_LT(offset, 65536);
+
+ if ((len < 12) && (offset < 2048)) {
+ int len_minus_4 = len - 4;
+ DCHECK(len_minus_4 < 8); /* Must fit in 3 bits */
+ *op++ =
+ COPY_1_BYTE_OFFSET | ((len_minus_4) << 2) | ((offset >> 8)
+ << 5);
+ *op++ = offset & 0xff;
+ } else {
+ *op++ = COPY_2_BYTE_OFFSET | ((len - 1) << 2);
+ put_unaligned_le16(offset, op);
+ op += 2;
+ }
+ return op;
+}
+
+static inline char *emit_copy(char *op, int offset, int len)
+{
+ /*
+ * Emit 64 byte copies but make sure to keep at least four bytes
+ * reserved
+ */
+ while (len >= 68) {
+ op = emit_copy_less_than64(op, offset, 64);
+ len -= 64;
+ }
+
+ /*
+ * Emit an extra 60 byte copy if have too much data to fit in
+ * one copy
+ */
+ if (len > 64) {
+ op = emit_copy_less_than64(op, offset, 60);
+ len -= 60;
+ }
+
+ /* Emit remainder */
+ op = emit_copy_less_than64(op, offset, len);
+ return op;
+}
+
+/**
+ * snappy_uncompressed_length - return length of uncompressed output.
+ * @start: compressed buffer
+ * @n: length of compressed buffer.
+ * @result: Write the length of the uncompressed output here.
+ *
+ * Returns true when successfull, otherwise false.
+ */
+bool snappy_uncompressed_length(const char *start, size_t n, size_t * result)
+{
+ u32 v = 0;
+ const char *limit = start + n;
+ if (varint_parse32_with_limit(start, limit, &v) != NULL) {
+ *result = v;
+ return true;
+ } else {
+ return false;
+ }
+}
+EXPORT_SYMBOL(snappy_uncompressed_length);
+
+#define kblock_log 15
+#define kblock_size (1 << kblock_log)
+
+#define kmax_hash_table_bits 14
+#define kmax_hash_table_size (1 << kmax_hash_table_bits)
+
+/*
+ * Use smaller hash table when input.size() is smaller, since we
+ * fill the table, incurring O(hash table size) overhead for
+ * compression, and if the input is short, we won't need that
+ * many hash table entries anyway.
+ */
+static u16 *get_hash_table(struct snappy_env *env, size_t input_size,
+ int *table_size)
+{
+ int htsize = 256;
+
+ DCHECK(kmax_hash_table_size >= 256);
+ while (htsize < kmax_hash_table_size && htsize < input_size)
+ htsize <<= 1;
+ CHECK_EQ(0, htsize & (htsize - 1));
+ CHECK_LE(htsize, kmax_hash_table_size);
+
+ u16 *table;
+ table = env->hash_table;
+
+ *table_size = htsize;
+ memset(table, 0, htsize * sizeof(*table));
+ return table;
+}
+
+/*
+ * Return the largest n such that
+ *
+ * s1[0,n-1] == s2[0,n-1]
+ * and n <= (s2_limit - s2).
+ *
+ * Does not read *s2_limit or beyond.
+ * Does not read *(s1 + (s2_limit - s2)) or beyond.
+ * Requires that s2_limit >= s2.
+ *
+ * Separate implementation for x86_64, for speed. Uses the fact that
+ * x86_64 is little endian.
+ */
+#if defined(__LITTLE_ENDIAN__)
+static inline int find_match_length(const char *s1,
+ const char *s2, const char *s2_limit)
+{
+ int matched = 0;
+
+ DCHECK_GE(s2_limit, s2);
+ /*
+ * Find out how long the match is. We loop over the data 64 bits at a
+ * time until we find a 64-bit block that doesn't match; then we find
+ * the first non-matching bit and use that to calculate the total
+ * length of the match.
+ */
+ while (likely(s2 <= s2_limit - 8)) {
+ if (unlikely
+ (UNALIGNED_LOAD64(s2) == UNALIGNED_LOAD64(s1 + matched))) {
+ s2 += 8;
+ matched += 8;
+ } else {
+ /*
+ * On current (mid-2008) Opteron models there
+ * is a 3% more efficient code sequence to
+ * find the first non-matching byte. However,
+ * what follows is ~10% better on Intel Core 2
+ * and newer, and we expect AMD's bsf
+ * instruction to improve.
+ */
+ u64 x =
+ UNALIGNED_LOAD64(s2) ^ UNALIGNED_LOAD64(s1 +
+ matched);
+ int matching_bits = find_lsb_set_non_zero64(x);
+ matched += matching_bits >> 3;
+ return matched;
+ }
+ }
+ while (likely(s2 < s2_limit)) {
+ if (likely(s1[matched] == *s2)) {
+ ++s2;
+ ++matched;
+ } else {
+ return matched;
+ }
+ }
+ return matched;
+}
+#else
+static inline int find_match_length(const char *s1,
+ const char *s2, const char *s2_limit)
+{
+ /* Implementation based on the x86-64 version, above. */
+ DCHECK_GE(s2_limit, s2);
+ int matched = 0;
+
+ while (s2 <= s2_limit - 4 &&
+ UNALIGNED_LOAD32(s2) == UNALIGNED_LOAD32(s1 + matched)) {
+ s2 += 4;
+ matched += 4;
+ }
+ if (is_little_endian() && s2 <= s2_limit - 4) {
+ u32 x =
+ UNALIGNED_LOAD32(s2) ^ UNALIGNED_LOAD32(s1 + matched);
+ int matching_bits = find_lsb_set_non_zero(x);
+ matched += matching_bits >> 3;
+ } else {
+ while ((s2 < s2_limit) && (s1[matched] == *s2)) {
+ ++s2;
+ ++matched;
+ }
+ }
+ return matched;
+}
+#endif
+
+/*
+ * For 0 <= offset <= 4, GetU32AtOffset(UNALIGNED_LOAD64(p), offset) will
+ * equal UNALIGNED_LOAD32(p + offset). Motivation: On x86-64 hardware we have
+ * empirically found that overlapping loads such as
+ * UNALIGNED_LOAD32(p) ... UNALIGNED_LOAD32(p+1) ... UNALIGNED_LOAD32(p+2)
+ * are slower than UNALIGNED_LOAD64(p) followed by shifts and casts to u32.
+ */
+static inline u32 get_u32_at_offset(u64 v, int offset)
+{
+ DCHECK(0 <= offset && offset <= 4);
+ return v >> (is_little_endian() ? 8 * offset : 32 - 8 * offset);
+}
+
+/*
+ * Flat array compression that does not emit the "uncompressed length"
+ * prefix. Compresses "input" string to the "*op" buffer.
+ *
+ * REQUIRES: "input" is at most "kBlockSize" bytes long.
+ * REQUIRES: "op" points to an array of memory that is at least
+ * "MaxCompressedLength(input.size())" in size.
+ * REQUIRES: All elements in "table[0..table_size-1]" are initialized to zero.
+ * REQUIRES: "table_size" is a power of two
+ *
+ * Returns an "end" pointer into "op" buffer.
+ * "end - op" is the compressed size of "input".
+ */
+
+static char *compress_fragment(const char *const input,
+ const size_t input_size,
+ char *op, u16 * table, const int table_size)
+{
+ /* "ip" is the input pointer, and "op" is the output pointer. */
+ const char *ip = input;
+ CHECK_LE(input_size, kblock_size);
+ CHECK_EQ(table_size & (table_size - 1), 0);
+ const int shift = 32 - log2_floor(table_size);
+ DCHECK_EQ(UINT_MAX >> shift, table_size - 1);
+ const char *ip_end = input + input_size;
+ const char *baseip = ip;
+ /*
+ * Bytes in [next_emit, ip) will be emitted as literal bytes. Or
+ * [next_emit, ip_end) after the main loop.
+ */
+ const char *next_emit = ip;
+
+ const int kinput_margin_bytes = 15;
+
+ if (likely(input_size >= kinput_margin_bytes)) {
+ const char *ip_limit = input + input_size -
+ kinput_margin_bytes;
+
+ u32 next_hash;
+ for (next_hash = hash(++ip, shift);;) {
+ DCHECK_LT(next_emit, ip);
+/*
+ * The body of this loop calls EmitLiteral once and then EmitCopy one or
+ * more times. (The exception is that when we're close to exhausting
+ * the input we goto emit_remainder.)
+ *
+ * In the first iteration of this loop we're just starting, so
+ * there's nothing to copy, so calling EmitLiteral once is
+ * necessary. And we only start a new iteration when the
+ * current iteration has determined that a call to EmitLiteral will
+ * precede the next call to EmitCopy (if any).
+ *
+ * Step 1: Scan forward in the input looking for a 4-byte-long match.
+ * If we get close to exhausting the input then goto emit_remainder.
+ *
+ * Heuristic match skipping: If 32 bytes are scanned with no matches
+ * found, start looking only at every other byte. If 32 more bytes are
+ * scanned, look at every third byte, etc.. When a match is found,
+ * immediately go back to looking at every byte. This is a small loss
+ * (~5% performance, ~0.1% density) for lcompressible data due to more
+ * bookkeeping, but for non-compressible data (such as JPEG) it's a huge
+ * win since the compressor quickly "realizes" the data is incompressible
+ * and doesn't bother looking for matches everywhere.
+ *
+ * The "skip" variable keeps track of how many bytes there are since the
+ * last match; dividing it by 32 (ie. right-shifting by five) gives the
+ * number of bytes to move ahead for each iteration.
+ */
+ u32 skip = 32;
+
+ const char *next_ip = ip;
+ const char *candidate;
+ do {
+ ip = next_ip;
+ u32 hval = next_hash;
+ DCHECK_EQ(hval, hash(ip, shift));
+ u32 bytes_between_hash_lookups = skip++ >> 5;
+ next_ip = ip + bytes_between_hash_lookups;
+ if (unlikely(next_ip > ip_limit)) {
+ goto emit_remainder;
+ }
+ next_hash = hash(next_ip, shift);
+ candidate = baseip + table[hval];
+ DCHECK_GE(candidate, baseip);
+ DCHECK_LT(candidate, ip);
+
+ table[hval] = ip - baseip;
+ } while (likely(UNALIGNED_LOAD32(ip) !=
+ UNALIGNED_LOAD32(candidate)));
+
+/*
+ * Step 2: A 4-byte match has been found. We'll later see if more
+ * than 4 bytes match. But, prior to the match, input
+ * bytes [next_emit, ip) are unmatched. Emit them as "literal bytes."
+ */
+ DCHECK_LE(next_emit + 16, ip_end);
+ op = emit_literal(op, next_emit, ip - next_emit, true);
+
+/*
+ * Step 3: Call EmitCopy, and then see if another EmitCopy could
+ * be our next move. Repeat until we find no match for the
+ * input immediately after what was consumed by the last EmitCopy call.
+ *
+ * If we exit this loop normally then we need to call EmitLiteral next,
+ * though we don't yet know how big the literal will be. We handle that
+ * by proceeding to the next iteration of the main loop. We also can exit
+ * this loop via goto if we get close to exhausting the input.
+ */
+ u64 input_bytes = 0;
+ u32 candidate_bytes = 0;
+
+ do {
+/*
+ * We have a 4-byte match at ip, and no need to emit any
+ * "literal bytes" prior to ip.
+ */
+ const char *base = ip;
+ int matched = 4 +
+ find_match_length(candidate + 4, ip + 4,
+ ip_end);
+ ip += matched;
+ int offset = base - candidate;
+ DCHECK_EQ(0, memcmp(base, candidate, matched));
+ op = emit_copy(op, offset, matched);
+/*
+ * We could immediately start working at ip now, but to improve
+ * compression we first update table[Hash(ip - 1, ...)].
+ */
+ const char *insert_tail = ip - 1;
+ next_emit = ip;
+ if (unlikely(ip >= ip_limit)) {
+ goto emit_remainder;
+ }
+ input_bytes = UNALIGNED_LOAD64(insert_tail);
+ u32 prev_hash =
+ hash_bytes(get_u32_at_offset
+ (input_bytes, 0), shift);
+ table[prev_hash] = ip - baseip - 1;
+ u32 cur_hash =
+ hash_bytes(get_u32_at_offset
+ (input_bytes, 1), shift);
+ candidate = baseip + table[cur_hash];
+ candidate_bytes = UNALIGNED_LOAD32(candidate);
+ table[cur_hash] = ip - baseip;
+ } while (get_u32_at_offset(input_bytes, 1) ==
+ candidate_bytes);
+
+ next_hash =
+ hash_bytes(get_u32_at_offset(input_bytes, 2),
+ shift);
+ ++ip;
+ }
+ }
+
+emit_remainder:
+ /* Emit the remaining bytes as a literal */
+ if (next_emit < ip_end)
+ op = emit_literal(op, next_emit, ip_end - next_emit, false);
+
+ return op;
+}
+
+/*
+ * -----------------------------------------------------------------------
+ * Lookup table for decompression code. Generated by ComputeTable() below.
+ * -----------------------------------------------------------------------
+ */
+
+/* Mapping from i in range [0,4] to a mask to extract the bottom 8*i bits */
+static const u32 wordmask[] = {
+ 0u, 0xffu, 0xffffu, 0xffffffu, 0xffffffffu
+};
+
+/*
+ * Data stored per entry in lookup table:
+ * Range Bits-used Description
+ * ------------------------------------
+ * 1..64 0..7 Literal/copy length encoded in opcode byte
+ * 0..7 8..10 Copy offset encoded in opcode byte / 256
+ * 0..4 11..13 Extra bytes after opcode
+ *
+ * We use eight bits for the length even though 7 would have sufficed
+ * because of efficiency reasons:
+ * (1) Extracting a byte is faster than a bit-field
+ * (2) It properly aligns copy offset so we do not need a <<8
+ */
+static const u16 char_table[256] = {
+ 0x0001, 0x0804, 0x1001, 0x2001, 0x0002, 0x0805, 0x1002, 0x2002,
+ 0x0003, 0x0806, 0x1003, 0x2003, 0x0004, 0x0807, 0x1004, 0x2004,
+ 0x0005, 0x0808, 0x1005, 0x2005, 0x0006, 0x0809, 0x1006, 0x2006,
+ 0x0007, 0x080a, 0x1007, 0x2007, 0x0008, 0x080b, 0x1008, 0x2008,
+ 0x0009, 0x0904, 0x1009, 0x2009, 0x000a, 0x0905, 0x100a, 0x200a,
+ 0x000b, 0x0906, 0x100b, 0x200b, 0x000c, 0x0907, 0x100c, 0x200c,
+ 0x000d, 0x0908, 0x100d, 0x200d, 0x000e, 0x0909, 0x100e, 0x200e,
+ 0x000f, 0x090a, 0x100f, 0x200f, 0x0010, 0x090b, 0x1010, 0x2010,
+ 0x0011, 0x0a04, 0x1011, 0x2011, 0x0012, 0x0a05, 0x1012, 0x2012,
+ 0x0013, 0x0a06, 0x1013, 0x2013, 0x0014, 0x0a07, 0x1014, 0x2014,
+ 0x0015, 0x0a08, 0x1015, 0x2015, 0x0016, 0x0a09, 0x1016, 0x2016,
+ 0x0017, 0x0a0a, 0x1017, 0x2017, 0x0018, 0x0a0b, 0x1018, 0x2018,
+ 0x0019, 0x0b04, 0x1019, 0x2019, 0x001a, 0x0b05, 0x101a, 0x201a,
+ 0x001b, 0x0b06, 0x101b, 0x201b, 0x001c, 0x0b07, 0x101c, 0x201c,
+ 0x001d, 0x0b08, 0x101d, 0x201d, 0x001e, 0x0b09, 0x101e, 0x201e,
+ 0x001f, 0x0b0a, 0x101f, 0x201f, 0x0020, 0x0b0b, 0x1020, 0x2020,
+ 0x0021, 0x0c04, 0x1021, 0x2021, 0x0022, 0x0c05, 0x1022, 0x2022,
+ 0x0023, 0x0c06, 0x1023, 0x2023, 0x0024, 0x0c07, 0x1024, 0x2024,
+ 0x0025, 0x0c08, 0x1025, 0x2025, 0x0026, 0x0c09, 0x1026, 0x2026,
+ 0x0027, 0x0c0a, 0x1027, 0x2027, 0x0028, 0x0c0b, 0x1028, 0x2028,
+ 0x0029, 0x0d04, 0x1029, 0x2029, 0x002a, 0x0d05, 0x102a, 0x202a,
+ 0x002b, 0x0d06, 0x102b, 0x202b, 0x002c, 0x0d07, 0x102c, 0x202c,
+ 0x002d, 0x0d08, 0x102d, 0x202d, 0x002e, 0x0d09, 0x102e, 0x202e,
+ 0x002f, 0x0d0a, 0x102f, 0x202f, 0x0030, 0x0d0b, 0x1030, 0x2030,
+ 0x0031, 0x0e04, 0x1031, 0x2031, 0x0032, 0x0e05, 0x1032, 0x2032,
+ 0x0033, 0x0e06, 0x1033, 0x2033, 0x0034, 0x0e07, 0x1034, 0x2034,
+ 0x0035, 0x0e08, 0x1035, 0x2035, 0x0036, 0x0e09, 0x1036, 0x2036,
+ 0x0037, 0x0e0a, 0x1037, 0x2037, 0x0038, 0x0e0b, 0x1038, 0x2038,
+ 0x0039, 0x0f04, 0x1039, 0x2039, 0x003a, 0x0f05, 0x103a, 0x203a,
+ 0x003b, 0x0f06, 0x103b, 0x203b, 0x003c, 0x0f07, 0x103c, 0x203c,
+ 0x0801, 0x0f08, 0x103d, 0x203d, 0x1001, 0x0f09, 0x103e, 0x203e,
+ 0x1801, 0x0f0a, 0x103f, 0x203f, 0x2001, 0x0f0b, 0x1040, 0x2040
+};
+
+struct snappy_decompressor {
+ struct source *reader; /* Underlying source of bytes to decompress */
+ const char *ip; /* Points to next buffered byte */
+ const char *ip_limit; /* Points just past buffered bytes */
+ u32 peeked; /* Bytes peeked from reader (need to skip) */
+ bool eof; /* Hit end of input without an error? */
+ char scratch[5]; /* Temporary buffer for peekfast boundaries */
+};
+
+static void
+init_snappy_decompressor(struct snappy_decompressor *d, struct source *reader)
+{
+ d->reader = reader;
+ d->ip = NULL;
+ d->ip_limit = NULL;
+ d->peeked = 0;
+ d->eof = false;
+}
+
+static void exit_snappy_decompressor(struct snappy_decompressor *d)
+{
+ skip(d->reader, d->peeked);
+}
+
+/*
+ * Read the uncompressed length stored at the start of the compressed data.
+ * On succcess, stores the length in *result and returns true.
+ * On failure, returns false.
+ */
+static bool read_uncompressed_length(struct snappy_decompressor *d,
+ u32 *result)
+{
+ DCHECK(d->ip == NULL); /*
+ * Must not have read anything yet
+ * Length is encoded in 1..5 bytes
+ */
+ *result = 0;
+ u32 shift = 0;
+ while (true) {
+ if (shift >= 32)
+ return false;
+ size_t n;
+ const char *ip = peek(d->reader, &n);
+ if (n == 0)
+ return false;
+ const unsigned char c = *(const unsigned char *)(ip);
+ skip(d->reader, 1);
+ *result |= (u32) (c & 0x7f) << shift;
+ if (c < 128) {
+ break;
+ }
+ shift += 7;
+ }
+ return true;
+}
+
+static bool refill_tag(struct snappy_decompressor *d);
+
+/*
+ * Process the next item found in the input.
+ * Returns true if successful, false on error or end of input.
+ */
+static void decompress_all_tags(struct snappy_decompressor *d,
+ struct writer *writer)
+{
+ const char *ip = d->ip;
+
+ for (;;) {
+ if (d->ip_limit - ip < 5) {
+ d->ip = ip;
+ if (!refill_tag(d))
+ return;
+ ip = d->ip;
+ }
+
+ const unsigned char c = *(const unsigned char *)(ip++);
+
+ if ((c & 0x3) == LITERAL) {
+ u32 literal_length = c >> 2;
+ if (unlikely(literal_length >= 60)) {
+ /* Long literal */
+ const u32 literal_ll = literal_length - 59;
+ literal_length = get_unaligned_le32(ip) &
+ wordmask[literal_ll];
+ ip += literal_ll;
+ }
+ ++literal_length;
+
+ u32 avail = d->ip_limit - ip;
+ while (avail < literal_length) {
+ if (!writer_append(writer, ip, avail, false))
+ return;
+ literal_length -= avail;
+ skip(d->reader, d->peeked);
+ size_t n;
+ ip = peek(d->reader, &n);
+ avail = n;
+ d->peeked = avail;
+ if (avail == 0)
+ return; /* Premature end of input */
+ d->ip_limit = ip + avail;
+ }
+ bool allow_fast_path = (avail >= 16);
+ if (!writer_append(writer, ip, literal_length,
+ allow_fast_path))
+ return;
+ ip += literal_length;
+ } else {
+ const u32 entry = char_table[c];
+ const u32 trailer = get_unaligned_le32(ip) &
+ wordmask[entry >> 11];
+ const u32 length = entry & 0xff;
+ ip += entry >> 11;
+
+ /*
+ * copy_offset/256 is encoded in bits 8..10.
+ * By just fetching those bits, we get
+ * copy_offset (since the bit-field starts at
+ * bit 8).
+ */
+ const u32 copy_offset = entry & 0x700;
+ if (!writer_append_from_self(writer,
+ copy_offset + trailer,
+ length))
+ return;
+ }
+ }
+}
+
+static bool refill_tag(struct snappy_decompressor *d)
+{
+ const char *ip = d->ip;
+
+ if (ip == d->ip_limit) {
+ size_t n;
+ /* Fetch a new fragment from the reader */
+ skip(d->reader, d->peeked); /* All peeked bytes are used up */
+ ip = peek(d->reader, &n);
+ d->peeked = n;
+ if (n == 0) {
+ d->eof = true;
+ return false;
+ }
+ d->ip_limit = ip + n;
+ }
+
+ /* Read the tag character */
+ DCHECK_LT(ip, d->ip_limit);
+ const unsigned char c = *(const unsigned char *)(ip);
+ const u32 entry = char_table[c];
+ const u32 needed = (entry >> 11) + 1; /* +1 byte for 'c' */
+ DCHECK_LE(needed, sizeof(d->scratch));
+
+ /* Read more bytes from reader if needed */
+ u32 nbuf = d->ip_limit - ip;
+
+ if (nbuf < needed) {
+ /*
+ * Stitch together bytes from ip and reader to form the word
+ * contents. We store the needed bytes in "scratch". They
+ * will be consumed immediately by the caller since we do not
+ * read more than we need.
+ */
+ memmove(d->scratch, ip, nbuf);
+ skip(d->reader, d->peeked); /* All peeked bytes are used up */
+ d->peeked = 0;
+ while (nbuf < needed) {
+ size_t length;
+ const char *src = peek(d->reader, &length);
+ if (length == 0)
+ return false;
+ u32 to_add = min_t(u32, needed - nbuf, length);
+ memcpy(d->scratch + nbuf, src, to_add);
+ nbuf += to_add;
+ skip(d->reader, to_add);
+ }
+ DCHECK_EQ(nbuf, needed);
+ d->ip = d->scratch;
+ d->ip_limit = d->scratch + needed;
+ } else if (nbuf < 5) {
+ /*
+ * Have enough bytes, but move into scratch so that we do not
+ * read past end of input
+ */
+ memmove(d->scratch, ip, nbuf);
+ skip(d->reader, d->peeked); /* All peeked bytes are used up */
+ d->peeked = 0;
+ d->ip = d->scratch;
+ d->ip_limit = d->scratch + nbuf;
+ } else {
+ /* Pass pointer to buffer returned by reader. */
+ d->ip = ip;
+ }
+ return true;
+}
+
+static int internal_uncompress(struct source *r,
+ struct writer *writer, u32 max_len)
+{
+ struct snappy_decompressor decompressor;
+ u32 uncompressed_len = 0;
+
+ init_snappy_decompressor(&decompressor, r);
+
+ if (!read_uncompressed_length(&decompressor, &uncompressed_len))
+ return -EIO;
+ /* Protect against possible DoS attack */
+ if ((u64) (uncompressed_len) > max_len)
+ return -EIO;
+
+ writer_set_expected_length(writer, uncompressed_len);
+
+ /* Process the entire input */
+ decompress_all_tags(&decompressor, writer);
+
+ exit_snappy_decompressor(&decompressor);
+ return (decompressor.eof && writer_check_length(writer)) ? 0 : -EIO;
+}
+
+static inline int compress(struct snappy_env *env, struct source *reader,
+ struct sink *writer)
+{
+ int err;
+ size_t written = 0;
+ int N = available(reader);
+ char ulength[kmax32];
+ char *p = varint_encode32(ulength, N);
+
+ append(writer, ulength, p - ulength);
+ written += (p - ulength);
+
+ while (N > 0) {
+ /* Get next block to compress (without copying if possible) */
+ size_t fragment_size;
+ const char *fragment = peek(reader, &fragment_size);
+ if (fragment_size == 0) {
+ err = -EIO;
+ goto out;
+ }
+ const int num_to_read = min_t(int, N, kblock_size);
+ size_t bytes_read = fragment_size;
+
+ int pending_advance = 0;
+ if (bytes_read >= num_to_read) {
+ /* Buffer returned by reader is large enough */
+ pending_advance = num_to_read;
+ fragment_size = num_to_read;
+ }
+#ifdef SCATHER_GATHER
+ else {
+ memcpy(env->scratch, fragment, bytes_read);
+ skip(reader, bytes_read);
+
+ while (bytes_read < num_to_read) {
+ fragment = peek(reader, &fragment_size);
+ size_t n =
+ min_t(size_t, fragment_size,
+ num_to_read - bytes_read);
+ memcpy(env->scratch + bytes_read, fragment, n);
+ bytes_read += n;
+ skip(reader, n);
+ }
+ DCHECK_EQ(bytes_read, num_to_read);
+ fragment = env->scratch;
+ fragment_size = num_to_read;
+ }
+#endif
+ if (fragment_size < num_to_read)
+ return -EIO;
+
+ /* Get encoding table for compression */
+ int table_size;
+ u16 *table = get_hash_table(env, num_to_read, &table_size);
+
+ /* Compress input_fragment and append to dest */
+ const int max_output =
+ snappy_max_compressed_length(num_to_read);
+
+ char *dest;
+ dest = sink_peek(writer, max_output);
+#ifdef SCATHER_GATHER
+ if (!dest) {
+ /*
+ * Need a scratch buffer for the output,
+ * because the byte sink doesn't have enough
+ * in one piece.
+ */
+ dest = env->scratch_output;
+ }
+#endif
+ char *end = compress_fragment(fragment, fragment_size,
+ dest, table, table_size);
+ append(writer, dest, end - dest);
+ written += (end - dest);
+
+ N -= num_to_read;
+ skip(reader, pending_advance);
+ }
+
+ err = 0;
+out:
+ return err;
+}
+
+/**
+ * snappy_compress - Compress a buffer using the snappy compressor.
+ * @env: Preallocated environment
+ * @input: Input buffer
+ * @input_length: Length of input_buffer
+ * @compressed: Output buffer for compressed data
+ * @compressed_length: The real length of the output written here.
+ *
+ * Return 0 on success, otherwise an negative error code.
+ *
+ * The output buffer must be at least
+ * snappy_max_compressed_length(input_length) bytes long.
+ *
+ * Requires a preallocated environment from snappy_init_env.
+ * The environment does not keep state over individual calls
+ * of this function, just preallocates the memory.
+ */
+int snappy_compress(struct snappy_env *env,
+ const char *input,
+ size_t input_length,
+ char *compressed, size_t *compressed_length)
+{
+ struct source reader = {
+ .ptr = input,
+ .left = input_length
+ };
+ struct sink writer = {
+ .dest = compressed,
+ };
+ int err = compress(env, &reader, &writer);
+
+ /* Compute how many bytes were added */
+ *compressed_length = (writer.dest - compressed);
+ return err;
+}
+EXPORT_SYMBOL(snappy_compress);
+
+/**
+ * snappy_uncompress - Uncompress a snappy compressed buffer
+ * @compressed: Input buffer with compressed data
+ * @n: length of compressed buffer
+ * @uncompressed: buffer for uncompressed data
+ *
+ * The uncompressed data buffer must be at least
+ * snappy_uncompressed_length(compressed) bytes long.
+ *
+ * Returns true when successfull, otherwise false.
+ */
+bool snappy_uncompress(const char *compressed, size_t n, char *uncompressed)
+{
+ struct source reader = {
+ .ptr = compressed,
+ .left = n
+ };
+ struct writer output = {
+ .base = uncompressed,
+ .op = uncompressed
+ };
+ return internal_uncompress(&reader, &output, 0xffffffff);
+}
+EXPORT_SYMBOL(snappy_uncompress);
+
+/**
+ * snappy_init_env - Allocate snappy compression environment
+ * @env: Environment to preallocate
+ *
+ * Returns 0 on success, otherwise negative errno.
+ * Must run in process context.
+ */
+int snappy_init_env(struct snappy_env *env)
+{
+ env->hash_table = vmalloc(sizeof(u16) * kmax_hash_table_size);
+ if (!env->hash_table)
+ goto error;
+#ifdef SCATHER_GATHER
+ env->scratch = vmalloc(kblock_size);
+ env->scratch_output =
+ vmalloc(snappy_max_compressed_length(kblock_size));
+ if (!env->scratch || !env->scratch_output)
+ goto error;
+#endif
+ return 0;
+error:
+ snappy_free_env(env);
+ return -ENOMEM;
+}
+EXPORT_SYMBOL(snappy_init_env);
+
+/**
+ * snappy_free_env - Free an snappy compression environment
+ * @env: Environment to free.
+ *
+ * Must run in process context.
+ */
+void snappy_free_env(struct snappy_env *env)
+{
+ vfree(env->hash_table);
+#ifdef SCATHER_GATHER
+ vfree(env->scratch);
+ vfree(env->scratch_output);
+#endif
+ memset(env, 0, sizeof(struct snappy_env));
+}
+EXPORT_SYMBOL(snappy_free_env);
--
1.7.4.4


2011-10-20 23:39:00

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 2/3] BTRFS: Add snappy support

From: Andi Kleen <[email protected]>

Add support in btrfs for snappy compression.

This is based on the lzo code with minor modifications.
The btrfs glue code could be significantly improved over LZO
by exploiting some snappy features, but hasn't so far.

Open: implement scatter-gather support and get rid of the temporary
buffers.

Some performance numbers (thanks to Jacob Sowles for running them)

bonnie++, Core i7-64bit

block output rewrite block input random
K/sec K/sec K/sec K/sec
None 100% 100% 100% 100%
zlib +6.3% +5.4% +6.6% +11.6%
lzo +11.5% +4.6% +12.4% +6.7%
snappy +19.3% +28.1% +32.6% +9.3%

Snappy does extremly well on the 64bit architecture,
outperforming everything else, sometimes with a healthy
margin.

bonnie++, Atom-32bit

block output rewrite block input random

K/sec K/sec K/sec K/sec
None 100% 100% 100% 100%
zlib -43.1% -24.2% -19.0% +12.0%
lzo +0.8% +2.6% +6.8% +14.8%
snappy +19.5% +16.2% +24.0% +15.7%

zlib does very poorly on Atom, actually degrading
performance. snappy is generally faster or similar to LZO.
The difference is not as big as on the 64bit CPU though, but
still visible.

bonnie++, files, Core-i7-64bit

sequential create delete random create delete
files/sec
None 100% 100% 100% 100%
zlib +8.3% +10.5% +10.3% +1.4%
lzo +3.8% +3.3% +5.4% -3.89%
snappy +23.7% +37.2% +21.8% +23.8%

bonnie++, files, Atom-32bit

sequential create delete random create delete
files/sec
None 100% 100% 100% 100%
zlib +3.0% +7.9% +5.2% +5.1%
lzo +8.2% +5.9% +4.8% +4.6%
snappy +3.1% +8.5% +5.7% +1.3%

Creation/Deletion on Atom is a case where snappy loses to LZO,
however the loss is small. On 64bit Core it's a win.

I should add that these benchmarks mainly use 0 filled IO,
however FFSB was also quickly tested with more random data
and the differences were similar. See also the micro benchmarks
in the algorithm description for the behaviour with different
data types.

FFSB, 4 threads, stair case data pattern, Reads MB/s

Core i7-64bit Atom-32bit
MB/s MB/s
None 100% 100%
zlib +8.0% +4.2%
lzo +9.3% +4.8%
snappy +12.4% +7.9%

In general snappy is a better replacement for LZO, especially
on 64bit, but even on 32bit.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
fs/btrfs/Kconfig | 1 +
fs/btrfs/Makefile | 3 +-
fs/btrfs/compression.c | 1 +
fs/btrfs/compression.h | 1 +
fs/btrfs/ctree.h | 9 +-
fs/btrfs/disk-io.c | 2 +
fs/btrfs/ioctl.c | 4 +
fs/btrfs/snappy.c | 435 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/super.c | 9 +-
lib/Makefile | 1 +
10 files changed, 461 insertions(+), 5 deletions(-)
create mode 100644 fs/btrfs/snappy.c

diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index ecb9fd3..d55df9c 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -6,6 +6,7 @@ config BTRFS_FS
select ZLIB_DEFLATE
select LZO_COMPRESS
select LZO_DECOMPRESS
+ select SNAPPY
help
Btrfs is a new filesystem with extents, writable snapshotting,
support for multiple devices and many more features.
diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 40e6ac0..7cd86e7 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -7,6 +7,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
export.o tree-log.o free-space-cache.o zlib.o lzo.o \
- compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o
+ compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
+ snappy.o

btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 8ec5d86..b171858 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -729,6 +729,7 @@ static wait_queue_head_t comp_workspace_wait[BTRFS_COMPRESS_TYPES];
struct btrfs_compress_op *btrfs_compress_op[] = {
&btrfs_zlib_compress,
&btrfs_lzo_compress,
+ &btrfs_snappy_compress,
};

int __init btrfs_init_compress(void)
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index a12059f..971a425 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -79,5 +79,6 @@ struct btrfs_compress_op {

extern struct btrfs_compress_op btrfs_zlib_compress;
extern struct btrfs_compress_op btrfs_lzo_compress;
+extern struct btrfs_compress_op btrfs_snappy_compress;

#endif
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 03912c5..7ebdae3 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -415,6 +415,7 @@ struct btrfs_super_block {
#define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL (1ULL << 1)
#define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS (1ULL << 2)
#define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO (1ULL << 3)
+#define BTRFS_FEATURE_INCOMPAT_COMPRESS_SNAPPY (1ULL << 4)

#define BTRFS_FEATURE_COMPAT_SUPP 0ULL
#define BTRFS_FEATURE_COMPAT_RO_SUPP 0ULL
@@ -422,7 +423,8 @@ struct btrfs_super_block {
(BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF | \
BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL | \
BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS | \
- BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)
+ BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO | \
+ BTRFS_FEATURE_INCOMPAT_COMPRESS_SNAPPY)

/*
* A leaf is full of items. offset and size tell us where to find
@@ -578,8 +580,9 @@ enum btrfs_compression_type {
BTRFS_COMPRESS_NONE = 0,
BTRFS_COMPRESS_ZLIB = 1,
BTRFS_COMPRESS_LZO = 2,
- BTRFS_COMPRESS_TYPES = 2,
- BTRFS_COMPRESS_LAST = 3,
+ BTRFS_COMPRESS_SNAPPY = 3,
+ BTRFS_COMPRESS_TYPES = 3,
+ BTRFS_COMPRESS_LAST = 4,
};

struct btrfs_inode_item {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 07b3ac6..4c04228 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1808,6 +1808,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
features |= BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF;
if (tree_root->fs_info->compress_type & BTRFS_COMPRESS_LZO)
features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
+ if (tree_root->fs_info->compress_type & BTRFS_COMPRESS_SNAPPY)
+ features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_SNAPPY;
btrfs_set_super_incompat_flags(disk_super, features);

features = btrfs_super_compat_ro_flags(disk_super) &
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 538f65a..3f15175 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1133,6 +1133,10 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
btrfs_set_super_incompat_flags(disk_super, features);
}
+ if (range->compress_type == BTRFS_COMPRESS_SNAPPY) {
+ features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_SNAPPY;
+ btrfs_set_super_incompat_flags(disk_super, features);
+ }

if (!file)
kfree(ra);
diff --git a/fs/btrfs/snappy.c b/fs/btrfs/snappy.c
new file mode 100644
index 0000000..a715f9d
--- /dev/null
+++ b/fs/btrfs/snappy.c
@@ -0,0 +1,435 @@
+/*
+ * Copyright (C) 2011 Intel Corporation
+ * Author: Andi Kleen
+ * Copyright (C) 2008 Oracle
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+/* largely copy'n'pasted from lzo.c Unify? */
+
+/* XXX: could use snappy's fragments to avoid the working buffer?
+ * However it's difficult to kmap multiple buffers.
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/init.h>
+#include <linux/err.h>
+#include <linux/sched.h>
+#include <linux/pagemap.h>
+#include <linux/bio.h>
+#include <linux/snappy.h>
+#include "compression.h"
+
+#define SNAPPY_LEN 4
+
+struct workspace {
+ void *buf; /* where compressed data goes */
+ void *cbuf; /* where decompressed data goes */
+ struct snappy_env env;
+ struct list_head list;
+};
+
+static void snappy_free_workspace(struct list_head *ws)
+{
+ struct workspace *workspace = list_entry(ws, struct workspace, list);
+
+ snappy_free_env(&workspace->env);
+ vfree(workspace->buf);
+ vfree(workspace->cbuf);
+ kfree(workspace);
+}
+
+static struct list_head *snappy_alloc_workspace(void)
+{
+ struct workspace *workspace;
+
+ workspace = kzalloc(sizeof(*workspace), GFP_NOFS);
+ if (!workspace)
+ return ERR_PTR(-ENOMEM);
+
+ workspace->buf = vmalloc(PAGE_CACHE_SIZE);
+ workspace->cbuf = vmalloc(snappy_max_compressed_length(PAGE_CACHE_SIZE));
+ if (!workspace->buf || !workspace->cbuf)
+ goto fail;
+
+ if (snappy_init_env(&workspace->env) < 0)
+ goto fail;
+
+ INIT_LIST_HEAD(&workspace->list);
+
+ return &workspace->list;
+fail:
+ snappy_free_workspace(&workspace->list);
+ return ERR_PTR(-ENOMEM);
+}
+
+static inline void write_compress_length(char *buf, size_t len)
+{
+ __le32 dlen;
+
+ dlen = cpu_to_le32(len);
+ memcpy(buf, &dlen, SNAPPY_LEN);
+}
+
+static inline size_t read_compress_length(char *buf)
+{
+ __le32 dlen;
+
+ memcpy(&dlen, buf, SNAPPY_LEN);
+ return le32_to_cpu(dlen);
+}
+
+static int snappy_compress_pages(struct list_head *ws,
+ struct address_space *mapping,
+ u64 start, unsigned long len,
+ struct page **pages,
+ unsigned long nr_dest_pages,
+ unsigned long *out_pages,
+ unsigned long *total_in,
+ unsigned long *total_out,
+ unsigned long max_out)
+{
+ struct workspace *workspace = list_entry(ws, struct workspace, list);
+ int ret = 0;
+ char *data_in;
+ char *cpage_out;
+ int nr_pages = 0;
+ struct page *in_page = NULL;
+ struct page *out_page = NULL;
+ unsigned long bytes_left;
+
+ size_t in_len;
+ size_t out_len;
+ char *buf;
+ unsigned long tot_in = 0;
+ unsigned long tot_out = 0;
+ unsigned long pg_bytes_left;
+ unsigned long out_offset;
+ unsigned long bytes;
+
+ *out_pages = 0;
+ *total_out = 0;
+ *total_in = 0;
+
+ in_page = find_get_page(mapping, start >> PAGE_CACHE_SHIFT);
+ data_in = kmap(in_page);
+
+ /*
+ * store the size of all chunks of compressed data in
+ * the first 4 bytes
+ */
+ out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+ if (out_page == NULL) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ cpage_out = kmap(out_page);
+ out_offset = SNAPPY_LEN;
+ tot_out = SNAPPY_LEN;
+ pages[0] = out_page;
+ nr_pages = 1;
+ pg_bytes_left = PAGE_CACHE_SIZE - SNAPPY_LEN;
+
+ /* compress at most one page of data each time */
+ in_len = min(len, PAGE_CACHE_SIZE);
+ while (tot_in < len) {
+ ret = snappy_compress(&workspace->env,
+ data_in, in_len, workspace->cbuf,
+ &out_len);
+ if (ret != 0) {
+ printk(KERN_DEBUG "btrfs uncompression in loop returned %d\n",
+ ret);
+ ret = -1;
+ goto out;
+ }
+
+ /* store the size of this chunk of compressed data */
+ write_compress_length(cpage_out + out_offset, out_len);
+ tot_out += SNAPPY_LEN;
+ out_offset += SNAPPY_LEN;
+ pg_bytes_left -= SNAPPY_LEN;
+
+ tot_in += in_len;
+ tot_out += out_len;
+
+ /* copy bytes from the working buffer into the pages */
+ buf = workspace->cbuf;
+ while (out_len) {
+ bytes = min_t(unsigned long, pg_bytes_left, out_len);
+
+ memcpy(cpage_out + out_offset, buf, bytes);
+
+ out_len -= bytes;
+ pg_bytes_left -= bytes;
+ buf += bytes;
+ out_offset += bytes;
+
+ /*
+ * we need another page for writing out.
+ *
+ * Note if there's less than 4 bytes left, we just
+ * skip to a new page.
+ */
+ if ((out_len == 0 && pg_bytes_left < SNAPPY_LEN) ||
+ pg_bytes_left == 0) {
+ if (pg_bytes_left) {
+ memset(cpage_out + out_offset, 0,
+ pg_bytes_left);
+ tot_out += pg_bytes_left;
+ }
+
+ /* we're done, don't allocate new page */
+ if (out_len == 0 && tot_in >= len)
+ break;
+
+ kunmap(out_page);
+ if (nr_pages == nr_dest_pages) {
+ out_page = NULL;
+ ret = -1;
+ goto out;
+ }
+
+ out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+ if (out_page == NULL) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ cpage_out = kmap(out_page);
+ pages[nr_pages++] = out_page;
+
+ pg_bytes_left = PAGE_CACHE_SIZE;
+ out_offset = 0;
+ }
+ }
+
+ /* we're making it bigger, give up */
+ if (tot_in > 8192 && tot_in < tot_out)
+ goto out;
+
+ /* we're all done */
+ if (tot_in >= len)
+ break;
+
+ if (tot_out > max_out)
+ break;
+
+ bytes_left = len - tot_in;
+ kunmap(in_page);
+ page_cache_release(in_page);
+
+ start += PAGE_CACHE_SIZE;
+ in_page = find_get_page(mapping, start >> PAGE_CACHE_SHIFT);
+ data_in = kmap(in_page);
+ in_len = min(bytes_left, PAGE_CACHE_SIZE);
+ }
+
+ if (tot_out > tot_in)
+ goto out;
+
+ /* store the size of all chunks of compressed data */
+ cpage_out = kmap(pages[0]);
+ write_compress_length(cpage_out, tot_out);
+
+ kunmap(pages[0]);
+
+ ret = 0;
+ *total_out = tot_out;
+ *total_in = tot_in;
+out:
+ *out_pages = nr_pages;
+ if (out_page)
+ kunmap(out_page);
+
+ if (in_page) {
+ kunmap(in_page);
+ page_cache_release(in_page);
+ }
+
+ return ret;
+}
+
+static int snappy_decompress_biovec(struct list_head *ws,
+ struct page **pages_in,
+ u64 disk_start,
+ struct bio_vec *bvec,
+ int vcnt,
+ size_t srclen)
+{
+ struct workspace *workspace = list_entry(ws, struct workspace, list);
+ int ret = 0, ret2;
+ char *data_in;
+ unsigned long page_in_index = 0;
+ unsigned long page_out_index = 0;
+ unsigned long total_pages_in = (srclen + PAGE_CACHE_SIZE - 1) /
+ PAGE_CACHE_SIZE;
+ unsigned long buf_start;
+ unsigned long buf_offset = 0;
+ unsigned long bytes;
+ unsigned long working_bytes;
+ unsigned long pg_offset;
+
+ size_t in_len;
+ size_t out_len;
+ unsigned long in_offset;
+ unsigned long in_page_bytes_left;
+ unsigned long tot_in;
+ unsigned long tot_out;
+ unsigned long tot_len;
+ char *buf;
+ bool may_late_unmap, need_unmap;
+
+ data_in = kmap(pages_in[0]);
+ tot_len = read_compress_length(data_in);
+
+ tot_in = SNAPPY_LEN;
+ in_offset = SNAPPY_LEN;
+ tot_len = min_t(size_t, srclen, tot_len);
+ in_page_bytes_left = PAGE_CACHE_SIZE - SNAPPY_LEN;
+
+ tot_out = 0;
+ pg_offset = 0;
+
+ while (tot_in < tot_len) {
+ in_len = read_compress_length(data_in + in_offset);
+ in_page_bytes_left -= SNAPPY_LEN;
+ in_offset += SNAPPY_LEN;
+ tot_in += SNAPPY_LEN;
+
+ tot_in += in_len;
+ working_bytes = in_len;
+ may_late_unmap = need_unmap = false;
+
+ /* fast path: avoid using the working buffer */
+ if (in_page_bytes_left >= in_len) {
+ buf = data_in + in_offset;
+ bytes = in_len;
+ may_late_unmap = true;
+ goto cont;
+ }
+
+ /* copy bytes from the pages into the working buffer */
+ buf = workspace->cbuf;
+ buf_offset = 0;
+ while (working_bytes) {
+ bytes = min(working_bytes, in_page_bytes_left);
+
+ memcpy(buf + buf_offset, data_in + in_offset, bytes);
+ buf_offset += bytes;
+cont:
+ working_bytes -= bytes;
+ in_page_bytes_left -= bytes;
+ in_offset += bytes;
+
+ /* check if we need to pick another page */
+ if ((working_bytes == 0 &&
+ in_page_bytes_left < SNAPPY_LEN)
+ || in_page_bytes_left == 0) {
+ tot_in += in_page_bytes_left;
+
+ if (working_bytes == 0 && tot_in >= tot_len)
+ break;
+
+ if (page_in_index + 1 >= total_pages_in) {
+ ret = -1;
+ goto done;
+ }
+
+ if (may_late_unmap)
+ need_unmap = true;
+ else
+ kunmap(pages_in[page_in_index]);
+
+ data_in = kmap(pages_in[++page_in_index]);
+
+ in_page_bytes_left = PAGE_CACHE_SIZE;
+ in_offset = 0;
+ }
+ }
+
+ ret = -1;
+ if (snappy_uncompressed_length(buf, in_len, &out_len))
+ ret = snappy_uncompress(buf, in_len, workspace->buf);
+ if (need_unmap)
+ kunmap(pages_in[page_in_index - 1]);
+ if (ret != 0) {
+ printk(KERN_WARNING "btrfs decompress failed\n");
+ ret = -1;
+ break;
+ }
+
+ buf_start = tot_out;
+ tot_out += out_len;
+
+ ret2 = btrfs_decompress_buf2page(workspace->buf, buf_start,
+ tot_out, disk_start,
+ bvec, vcnt,
+ &page_out_index, &pg_offset);
+ if (ret2 == 0)
+ break;
+ }
+done:
+ kunmap(pages_in[page_in_index]);
+ return ret;
+}
+
+static int btrfs_snappy_decompress(struct list_head *ws, unsigned char *data_in,
+ struct page *dest_page,
+ unsigned long start_byte,
+ size_t srclen, size_t destlen)
+{
+ struct workspace *workspace = list_entry(ws, struct workspace, list);
+ size_t in_len;
+ size_t out_len;
+ int ret = 0;
+ char *kaddr;
+ unsigned long bytes;
+
+ BUG_ON(srclen < SNAPPY_LEN);
+
+ in_len = read_compress_length(data_in);
+ data_in += SNAPPY_LEN;
+
+ ret = -1;
+ if (snappy_uncompressed_length(data_in, in_len, &out_len))
+ ret = snappy_uncompress(data_in, in_len, workspace->buf);
+ if (ret != 0) {
+ printk(KERN_WARNING "btrfs decompress failed!\n");
+ ret = -1;
+ goto out;
+ }
+ if (out_len < start_byte) {
+ ret = -1;
+ goto out;
+ }
+
+ bytes = min_t(unsigned long, destlen, out_len - start_byte);
+
+ kaddr = kmap_atomic(dest_page, KM_USER0);
+ memcpy(kaddr, workspace->buf + start_byte, bytes);
+ kunmap_atomic(kaddr, KM_USER0);
+out:
+ return ret;
+}
+
+struct btrfs_compress_op btrfs_snappy_compress = {
+ .alloc_workspace = snappy_alloc_workspace,
+ .free_workspace = snappy_free_workspace,
+ .compress_pages = snappy_compress_pages,
+ .decompress_biovec = snappy_decompress_biovec,
+ .decompress = btrfs_snappy_decompress,
+};
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 15634d4..d3a2d10 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -267,6 +267,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
} else if (strcmp(args[0].from, "lzo") == 0) {
compress_type = "lzo";
info->compress_type = BTRFS_COMPRESS_LZO;
+ } else if (strcmp(args[0].from, "snappy") == 0) {
+ compress_type = "snappy";
+ info->compress_type = BTRFS_COMPRESS_SNAPPY;
} else {
ret = -EINVAL;
goto out;
@@ -696,8 +699,12 @@ static int btrfs_show_options(struct seq_file *seq, struct vfsmount *vfs)
if (btrfs_test_opt(root, COMPRESS)) {
if (info->compress_type == BTRFS_COMPRESS_ZLIB)
compress_type = "zlib";
- else
+ else if (info->compress_type == BTRFS_COMPRESS_SNAPPY)
+ compress_type = "snappy";
+ else if (info->compress_type == BTRFS_COMPRESS_LZO)
compress_type = "lzo";
+ else
+ compress_type = "?";
if (btrfs_test_opt(root, FORCE_COMPRESS))
seq_printf(seq, ",compress-force=%s", compress_type);
else
diff --git a/lib/Makefile b/lib/Makefile
index e38e580..2fad385 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -113,6 +113,7 @@ obj-$(CONFIG_AVERAGE) += average.o

obj-$(CONFIG_CPU_RMAP) += cpu_rmap.o

+CFLAGS_snappy.o += $(call cc-disable-warning, declaration-after-statement) -DNDEBUG=1
obj-$(CONFIG_CORDIC) += cordic.o

obj-$(CONFIG_LLIST) += llist.o
--
1.7.4.4

2011-10-20 23:39:01

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 3/3] Add snappy interface to crypto API

From: Andi Kleen <[email protected]>

Mainly so that ubifs can use it.

Snappy is a better compressor in the same niche as LZO.

Only lightly tested so far. Experiences welcome.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
crypto/Kconfig | 9 +++++
crypto/Makefile | 1 +
crypto/snappy.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 109 insertions(+), 0 deletions(-)
create mode 100644 crypto/snappy.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index ae27b753..2c85991 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -823,6 +823,15 @@ config CRYPTO_LZO
help
This is the LZO algorithm.

+config CRYPTO_SNAPPY
+ tristate "Snappy compression algorithm"
+ select CRYPTO_ALGAPI
+ select SNAPPY
+ help
+ snappy is a faster alternative to the lzo compression algorithm
+ with comparable compression. It is very fast on 64bit systems, but also
+ good on 32bit systems. It especially excels at already compressed data.
+
comment "Random Number Generation"

config CRYPTO_ANSI_CPRNG
diff --git a/crypto/Makefile b/crypto/Makefile
index ce5a813..cf92f6f 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -80,6 +80,7 @@ obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o
obj-$(CONFIG_CRYPTO_CRC32C) += crc32c.o
obj-$(CONFIG_CRYPTO_AUTHENC) += authenc.o authencesn.o
obj-$(CONFIG_CRYPTO_LZO) += lzo.o
+obj-$(CONFIG_CRYPTO_SNAPPY) += snappy.o
obj-$(CONFIG_CRYPTO_RNG2) += rng.o
obj-$(CONFIG_CRYPTO_RNG2) += krng.o
obj-$(CONFIG_CRYPTO_ANSI_CPRNG) += ansi_cprng.o
diff --git a/crypto/snappy.c b/crypto/snappy.c
new file mode 100644
index 0000000..2f44d30
--- /dev/null
+++ b/crypto/snappy.c
@@ -0,0 +1,99 @@
+/*
+ * Cryptographic API.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/crypto.h>
+#include <linux/vmalloc.h>
+#include <linux/snappy.h>
+
+struct snappy_ctx {
+ struct snappy_env env;
+};
+
+/* Only needed for compression actually */
+static int snappy_init(struct crypto_tfm *tfm)
+{
+ struct snappy_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ return snappy_init_env(&ctx->env);
+}
+
+static void snappy_exit(struct crypto_tfm *tfm)
+{
+ struct snappy_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ snappy_free_env(&ctx->env);
+}
+
+static int snp_compress(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+ struct snappy_ctx *ctx = crypto_tfm_ctx(tfm);
+ size_t olen;
+ int err;
+
+ /* XXXX very pessimistic. check in snappy? */
+ if (*dlen < snappy_max_compressed_length(*dlen))
+ return -EINVAL;
+ err = snappy_compress(&ctx->env, src, slen, dst, &olen);
+ *dlen = olen;
+ return err;
+}
+
+static int snp_decompress(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+ size_t ulen;
+
+ if (!snappy_uncompressed_length(src, slen, &ulen))
+ return -EIO;
+ if (*dlen < ulen)
+ return -EINVAL;
+ *dlen = ulen;
+ return snappy_uncompress(src, slen, dst) ? 0 : -EIO;
+}
+
+static struct crypto_alg alg = {
+ .cra_name = "snappy",
+ .cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
+ .cra_ctxsize = sizeof(struct snappy_ctx),
+ .cra_module = THIS_MODULE,
+ .cra_list = LIST_HEAD_INIT(alg.cra_list),
+ .cra_init = snappy_init,
+ .cra_exit = snappy_exit,
+ .cra_u = { .compress = {
+ .coa_compress = snp_compress,
+ .coa_decompress = snp_decompress } }
+};
+
+static int __init snappy_mod_init(void)
+{
+ return crypto_register_alg(&alg);
+}
+
+static void __exit snappy_mod_fini(void)
+{
+ crypto_unregister_alg(&alg);
+}
+
+module_init(snappy_mod_init);
+module_exit(snappy_mod_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Snappy Compression Algorithm");
--
1.7.4.4


2011-10-21 13:24:58

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 3/3] Add snappy interface to crypto API

Andi Kleen <[email protected]> wrote:
> From: Andi Kleen <[email protected]>
>
> Mainly so that ubifs can use it.
>
> Snappy is a better compressor in the same niche as LZO.
>
> Only lightly tested so far. Experiences welcome.
>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>

Acked-by: Herbert Xu <[email protected]>

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt