2011-12-01 20:13:41

by djwong

[permalink] [raw]
Subject: [PATCH v5.2 00/14] crc32c: Add faster algorithm and self-test code

Hi all,

This patchset (re)uses Bob Pearson's crc32 slice-by-8 code to stamp out a
software crc32c implementation. It removes the crc32c implementation in
crypto/ in favor of using the stamped-out one in lib/. There is also a change
to Kconfig so that the kernel builder can pick an implementation best suited
for the hardware.

The motivation for this patchset is that I am working on adding full metadata
checksumming to ext4. As far as performance impact of adding checksumming
goes, I see nearly no change with a standard mail server ffsb simulation. On a
test that involves only file creation and deletion and extent tree writes, I
see a drop of about 50 pcercent with the current kernel crc32c implementation;
this improves to a drop of about 20 percent with the enclosed crc32c code.

When metadata is usually a small fraction of total IO, this new implementation
doesn't help much because metadata is usually a small fraction of total IO.
However, when we are doing IO that is almost all metadata (such as rm -rf'ing a
tree), then this patch speeds up the operation substantially.

Incidentally, given that iscsi, sctp, and btrfs also use crc32c, this patchset
should improve their speed as well. I have not yet quantified that, however.
This latest submission combines Bob's patches from late August 2011 with mine
so that they can be one coherent patch set. Please excuse my inability to
combine some of the patches; I've been advised to leave Bob's patches alone and
build atop them instead. :/

Since the last posting, I've also collected some crc32c test results on a bunch
of different x86/powerpc/sparc platforms. The results can be viewed here:
http://goo.gl/sgt3i ; the "crc32-kern-le" and "crc32c" columns describe the
performance of the kernel's current crc32 and crc32c software implementations.
The "crc32c-by8-le" column shows crc32c performance with this patchset applied.
I expect crc32 performance to be roughly the same.

The two _boost columns at the right side of the spreadsheet shows how much
faster the new implementation is over the old one. As you can see, crc32 rises
substantially, and crc32c experiences a huge increase. I'm hoping this patch
set meets with everyone's approval and can go in soon. Herbert Xu didn't
appear to have any strong objections to last month's posting, so I'm wondering
if Andrew has an opinion?

v2: Use the crypto testmgr api for self-test.
v3: Get rid of the -be version, which had no users.
v4: Allow kernel builder a choice of speed vs. space optimization.
v5: Reuse lib/crc32 for crc32c as well, and make crypto/crc32c use lib/crc32.c.
v5.1: Include Bob Pearson's patches in submission request.
v5.2: Fix changelogs for Bob's patches per akpm request.

--D



2011-12-01 20:13:57

by djwong

[permalink] [raw]
Subject: [PATCH 02/14] crc32: Move long comment about crc32 fundamentals to Documentation/

Moved a long comment from lib/crc32.c to Documentation/crc32.txt
where it will more likely get read.
- Edited the resulting document to add an explanation of the slicing-by-n
algorithm.

From: Bob Pearson <[email protected]>
Signed-off-by: George Spelvin <[email protected]>
Signed-off-by: Bob Pearson <[email protected]>
[[email protected]: Minor changelog tweaks]
Signed-off-by: Darrick J. Wong <[email protected]>
---
Documentation/00-INDEX | 2 +
Documentation/crc32.txt | 183 +++++++++++++++++++++++++++++++++++++++++++++++
lib/crc32.c | 129 +--------------------------------
3 files changed, 187 insertions(+), 127 deletions(-)
create mode 100644 Documentation/crc32.txt


diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index 65bbd26..e7b38a0 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -104,6 +104,8 @@ cpuidle/
- info on CPU_IDLE, CPU idle state management subsystem.
cputopology.txt
- documentation on how CPU topology info is exported via sysfs.
+crc32.txt
+ - brief tutorial on CRC computation
cris/
- directory with info about Linux on CRIS architecture.
crypto/
diff --git a/Documentation/crc32.txt b/Documentation/crc32.txt
new file mode 100644
index 0000000..3d74ba4
--- /dev/null
+++ b/Documentation/crc32.txt
@@ -0,0 +1,183 @@
+A brief CRC tutorial.
+
+A CRC is a long-division remainder. You add the CRC to the message,
+and the whole thing (message+CRC) is a multiple of the given
+CRC polynomial. To check the CRC, you can either check that the
+CRC matches the recomputed value, *or* you can check that the
+remainder computed on the message+CRC is 0. This latter approach
+is used by a lot of hardware implementations, and is why so many
+protocols put the end-of-frame flag after the CRC.
+
+It's actually the same long division you learned in school, except that
+- We're working in binary, so the digits are only 0 and 1, and
+- When dividing polynomials, there are no carries. Rather than add and
+ subtract, we just xor. Thus, we tend to get a bit sloppy about
+ the difference between adding and subtracting.
+
+Like all division, the remainder is always smaller than the divisor.
+To produce a 32-bit CRC, the divisor is actually a 33-bit CRC polynomial.
+Since it's 33 bits long, bit 32 is always going to be set, so usually the
+CRC is written in hex with the most significant bit omitted. (If you're
+familiar with the IEEE 754 floating-point format, it's the same idea.)
+
+Note that a CRC is computed over a string of *bits*, so you have
+to decide on the endianness of the bits within each byte. To get
+the best error-detecting properties, this should correspond to the
+order they're actually sent. For example, standard RS-232 serial is
+little-endian; the most significant bit (sometimes used for parity)
+is sent last. And when appending a CRC word to a message, you should
+do it in the right order, matching the endianness.
+
+Just like with ordinary division, you proceed one digit (bit) at a time.
+Each step of the division, division, you take one more digit (bit) of the
+dividend and append it to the current remainder. Then you figure out the
+appropriate multiple of the divisor to subtract to being the remainder
+back into range. In binary, this is easy - it has to be either 0 or 1,
+and to make the XOR cancel, it's just a copy of bit 32 of the remainder.
+
+When computing a CRC, we don't care about the quotient, so we can
+throw the quotient bit away, but subtract the appropriate multiple of
+the polynomial from the remainder and we're back to where we started,
+ready to process the next bit.
+
+A big-endian CRC written this way would be coded like:
+for (i = 0; i < input_bits; i++) {
+ multiple = remainder & 0x80000000 ? CRCPOLY : 0;
+ remainder = (remainder << 1 | next_input_bit()) ^ multiple;
+}
+
+Notice how, to get at bit 32 of the shifted remainder, we look
+at bit 31 of the remainder *before* shifting it.
+
+But also notice how the next_input_bit() bits we're shifting into
+the remainder don't actually affect any decision-making until
+32 bits later. Thus, the first 32 cycles of this are pretty boring.
+Also, to add the CRC to a message, we need a 32-bit-long hole for it at
+the end, so we have to add 32 extra cycles shifting in zeros at the
+end of every message,
+
+These details lead to a standard trick: rearrange merging in the
+next_input_bit() until the moment it's needed. Then the first 32 cycles
+can be precomputed, and merging in the final 32 zero bits to make room
+for the CRC can be skipped entirely. This changes the code to:
+
+for (i = 0; i < input_bits; i++) {
+ remainder ^= next_input_bit() << 31;
+ multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
+ remainder = (remainder << 1) ^ multiple;
+}
+
+With this optimization, the little-endian code is particularly simple:
+for (i = 0; i < input_bits; i++) {
+ remainder ^= next_input_bit();
+ multiple = (remainder & 1) ? CRCPOLY : 0;
+ remainder = (remainder >> 1) ^ multiple;
+}
+
+The most significant coefficient of the remainder polynomial is stored
+in the least significant bit of the binary "remainder" variable.
+The other details of endianness have been hidden in CRCPOLY (which must
+be bit-reversed) and next_input_bit().
+
+As long as next_input_bit is returning the bits in a sensible order, we don't
+*have* to wait until the last possible moment to merge in additional bits.
+We can do it 8 bits at a time rather than 1 bit at a time:
+for (i = 0; i < input_bytes; i++) {
+ remainder ^= next_input_byte() << 24;
+ for (j = 0; j < 8; j++) {
+ multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
+ remainder = (remainder << 1) ^ multiple;
+ }
+}
+
+Or in little-endian:
+for (i = 0; i < input_bytes; i++) {
+ remainder ^= next_input_byte();
+ for (j = 0; j < 8; j++) {
+ multiple = (remainder & 1) ? CRCPOLY : 0;
+ remainder = (remainder >> 1) ^ multiple;
+ }
+}
+
+If the input is a multiple of 32 bits, you can even XOR in a 32-bit
+word at a time and increase the inner loop count to 32.
+
+You can also mix and match the two loop styles, for example doing the
+bulk of a message byte-at-a-time and adding bit-at-a-time processing
+for any fractional bytes at the end.
+
+To reduce the number of conditional branches, software commonly uses
+the byte-at-a-time table method, popularized by Dilip V. Sarwate,
+"Computation of Cyclic Redundancy Checks via Table Look-Up", Comm. ACM
+v.31 no.8 (August 1998) p. 1008-1013.
+
+Here, rather than just shifting one bit of the remainder to decide
+in the correct multiple to subtract, we can shift a byte at a time.
+This produces a 40-bit (rather than a 33-bit) intermediate remainder,
+and the correct multiple of the polynomial to subtract is found using
+a 256-entry lookup table indexed by the high 8 bits.
+
+(The table entries are simply the CRC-32 of the given one-byte messages.)
+
+When space is more constrained, smaller tables can be used, e.g. two
+4-bit shifts followed by a lookup in a 16-entry table.
+
+It is not practical to process much more than 8 bits at a time using this
+technique, because tables larger than 256 entries use too much memory and,
+more importantly, too much of the L1 cache.
+
+To get higher software performance, a "slicing" technique can be used.
+See "High Octane CRC Generation with the Intel Slicing-by-8 Algorithm",
+ftp://download.intel.com/technology/comms/perfnet/download/slicing-by-8.pdf
+
+This does not change the number of table lookups, but does increase
+the parallelism. With the classic Sarwate algorithm, each table lookup
+must be completed before the index of the next can be computed.
+
+A "slicing by 2" technique would shift the remainder 16 bits at a time,
+producing a 48-bit intermediate remainder. Rather than doing a single
+lookup in a 65536-entry table, the two high bytes are looked up in
+two different 256-entry tables. Each contains the remainder required
+to cancel out the corresponding byte. The tables are different because the
+polynomials to cancel are different. One has non-zero coefficients from
+x^32 to x^39, while the other goes from x^40 to x^47.
+
+Since modern processors can handle many parallel memory operations, this
+takes barely longer than a single table look-up and thus performs almost
+twice as fast as the basic Sarwate algorithm.
+
+This can be extended to "slicing by 4" using 4 256-entry tables.
+Each step, 32 bits of data is fetched, XORed with the CRC, and the result
+broken into bytes and looked up in the tables. Because the 32-bit shift
+leaves the low-order bits of the intermediate remainder zero, the
+final CRC is simply the XOR of the 4 table look-ups.
+
+But this still enforces sequential execution: a second group of table
+look-ups cannot begin until the previous groups 4 table look-ups have all
+been completed. Thus, the processor's load/store unit is sometimes idle.
+
+To make maximum use of the processor, "slicing by 8" performs 8 look-ups
+in parallel. Each step, the 32-bit CRC is shifted 64 bits and XORed
+with 64 bits of input data. What is important to note is that 4 of
+those 8 bytes are simply copies of the input data; they do not depend
+on the previous CRC at all. Thus, those 4 table look-ups may commence
+immediately, without waiting for the previous loop iteration.
+
+By always having 4 loads in flight, a modern superscalar processor can
+be kept busy and make full use of its L1 cache.
+
+Two more details about CRC implementation in the real world:
+
+Normally, appending zero bits to a message which is already a multiple
+of a polynomial produces a larger multiple of that polynomial. Thus,
+a basic CRC will not detect appended zero bits (or bytes). To enable
+a CRC to detect this condition, it's common to invert the CRC before
+appending it. This makes the remainder of the message+crc come out not
+as zero, but some fixed non-zero value. (The CRC of the inversion
+pattern, 0xffffffff.)
+
+The same problem applies to zero bits prepended to the message, and a
+similar solution is used. Instead of starting the CRC computation with
+a remainder of 0, an initial remainder of all ones is used. As long as
+you start the same way on decoding, it doesn't make a difference.
+
diff --git a/lib/crc32.c b/lib/crc32.c
index 23b08ba..7ac8b0d 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -20,6 +20,8 @@
* Version 2. See the file COPYING for more details.
*/

+/* see: Documentation/crc32.txt for a description of algorithms */
+
#include <linux/crc32.h>
#include <linux/kernel.h>
#include <linux/module.h>
@@ -208,133 +210,6 @@ u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len)
EXPORT_SYMBOL(crc32_le);
EXPORT_SYMBOL(crc32_be);

-/*
- * A brief CRC tutorial.
- *
- * A CRC is a long-division remainder. You add the CRC to the message,
- * and the whole thing (message+CRC) is a multiple of the given
- * CRC polynomial. To check the CRC, you can either check that the
- * CRC matches the recomputed value, *or* you can check that the
- * remainder computed on the message+CRC is 0. This latter approach
- * is used by a lot of hardware implementations, and is why so many
- * protocols put the end-of-frame flag after the CRC.
- *
- * It's actually the same long division you learned in school, except that
- * - We're working in binary, so the digits are only 0 and 1, and
- * - When dividing polynomials, there are no carries. Rather than add and
- * subtract, we just xor. Thus, we tend to get a bit sloppy about
- * the difference between adding and subtracting.
- *
- * A 32-bit CRC polynomial is actually 33 bits long. But since it's
- * 33 bits long, bit 32 is always going to be set, so usually the CRC
- * is written in hex with the most significant bit omitted. (If you're
- * familiar with the IEEE 754 floating-point format, it's the same idea.)
- *
- * Note that a CRC is computed over a string of *bits*, so you have
- * to decide on the endianness of the bits within each byte. To get
- * the best error-detecting properties, this should correspond to the
- * order they're actually sent. For example, standard RS-232 serial is
- * little-endian; the most significant bit (sometimes used for parity)
- * is sent last. And when appending a CRC word to a message, you should
- * do it in the right order, matching the endianness.
- *
- * Just like with ordinary division, the remainder is always smaller than
- * the divisor (the CRC polynomial) you're dividing by. Each step of the
- * division, you take one more digit (bit) of the dividend and append it
- * to the current remainder. Then you figure out the appropriate multiple
- * of the divisor to subtract to being the remainder back into range.
- * In binary, it's easy - it has to be either 0 or 1, and to make the
- * XOR cancel, it's just a copy of bit 32 of the remainder.
- *
- * When computing a CRC, we don't care about the quotient, so we can
- * throw the quotient bit away, but subtract the appropriate multiple of
- * the polynomial from the remainder and we're back to where we started,
- * ready to process the next bit.
- *
- * A big-endian CRC written this way would be coded like:
- * for (i = 0; i < input_bits; i++) {
- * multiple = remainder & 0x80000000 ? CRCPOLY : 0;
- * remainder = (remainder << 1 | next_input_bit()) ^ multiple;
- * }
- * Notice how, to get at bit 32 of the shifted remainder, we look
- * at bit 31 of the remainder *before* shifting it.
- *
- * But also notice how the next_input_bit() bits we're shifting into
- * the remainder don't actually affect any decision-making until
- * 32 bits later. Thus, the first 32 cycles of this are pretty boring.
- * Also, to add the CRC to a message, we need a 32-bit-long hole for it at
- * the end, so we have to add 32 extra cycles shifting in zeros at the
- * end of every message,
- *
- * So the standard trick is to rearrage merging in the next_input_bit()
- * until the moment it's needed. Then the first 32 cycles can be precomputed,
- * and merging in the final 32 zero bits to make room for the CRC can be
- * skipped entirely.
- * This changes the code to:
- * for (i = 0; i < input_bits; i++) {
- * remainder ^= next_input_bit() << 31;
- * multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
- * remainder = (remainder << 1) ^ multiple;
- * }
- * With this optimization, the little-endian code is simpler:
- * for (i = 0; i < input_bits; i++) {
- * remainder ^= next_input_bit();
- * multiple = (remainder & 1) ? CRCPOLY : 0;
- * remainder = (remainder >> 1) ^ multiple;
- * }
- *
- * Note that the other details of endianness have been hidden in CRCPOLY
- * (which must be bit-reversed) and next_input_bit().
- *
- * However, as long as next_input_bit is returning the bits in a sensible
- * order, we can actually do the merging 8 or more bits at a time rather
- * than one bit at a time:
- * for (i = 0; i < input_bytes; i++) {
- * remainder ^= next_input_byte() << 24;
- * for (j = 0; j < 8; j++) {
- * multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
- * remainder = (remainder << 1) ^ multiple;
- * }
- * }
- * Or in little-endian:
- * for (i = 0; i < input_bytes; i++) {
- * remainder ^= next_input_byte();
- * for (j = 0; j < 8; j++) {
- * multiple = (remainder & 1) ? CRCPOLY : 0;
- * remainder = (remainder << 1) ^ multiple;
- * }
- * }
- * If the input is a multiple of 32 bits, you can even XOR in a 32-bit
- * word at a time and increase the inner loop count to 32.
- *
- * You can also mix and match the two loop styles, for example doing the
- * bulk of a message byte-at-a-time and adding bit-at-a-time processing
- * for any fractional bytes at the end.
- *
- * The only remaining optimization is to the byte-at-a-time table method.
- * Here, rather than just shifting one bit of the remainder to decide
- * in the correct multiple to subtract, we can shift a byte at a time.
- * This produces a 40-bit (rather than a 33-bit) intermediate remainder,
- * but again the multiple of the polynomial to subtract depends only on
- * the high bits, the high 8 bits in this case.
- *
- * The multiple we need in that case is the low 32 bits of a 40-bit
- * value whose high 8 bits are given, and which is a multiple of the
- * generator polynomial. This is simply the CRC-32 of the given
- * one-byte message.
- *
- * Two more details: normally, appending zero bits to a message which
- * is already a multiple of a polynomial produces a larger multiple of that
- * polynomial. To enable a CRC to detect this condition, it's common to
- * invert the CRC before appending it. This makes the remainder of the
- * message+crc come out not as zero, but some fixed non-zero value.
- *
- * The same problem applies to zero bits prepended to the message, and
- * a similar solution is used. Instead of starting with a remainder of
- * 0, an initial remainder of all ones is used. As long as you start
- * the same way on decoding, it doesn't make a difference.
- */
-
#ifdef UNITTEST

#include <stdlib.h>

2011-12-01 20:14:10

by djwong

[permalink] [raw]
Subject: [PATCH 04/14] crc32: Speed up memory table access on powerpc

Replace 2D array references by pointer references in loops.
This change has no effect on X86 code but improves PPC
performance.

From: Bob Pearson <[email protected]>
Signed-off-by: Bob Pearson <[email protected]>
[[email protected]: Minor changelog tweaks]
Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/crc32.c | 21 +++++++++++----------
1 files changed, 11 insertions(+), 10 deletions(-)


diff --git a/lib/crc32.c b/lib/crc32.c
index 7a0e5a9..c93c9ae 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -53,20 +53,21 @@ static inline u32
crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
{
# ifdef __LITTLE_ENDIAN
-# define DO_CRC(x) crc = tab[0][(crc ^ (x)) & 255] ^ (crc >> 8)
-# define DO_CRC4 crc = tab[3][(crc) & 255] ^ \
- tab[2][(crc >> 8) & 255] ^ \
- tab[1][(crc >> 16) & 255] ^ \
- tab[0][(crc >> 24) & 255]
+# define DO_CRC(x) (crc = t0[(crc ^ (x)) & 255] ^ (crc >> 8))
+# define DO_CRC4 crc = t3[(crc) & 255] ^ \
+ t2[(crc >> 8) & 255] ^ \
+ t1[(crc >> 16) & 255] ^ \
+ t0[(crc >> 24) & 255]
# else
-# define DO_CRC(x) crc = tab[0][((crc >> 24) ^ (x)) & 255] ^ (crc << 8)
-# define DO_CRC4 crc = tab[0][(crc) & 255] ^ \
- tab[1][(crc >> 8) & 255] ^ \
- tab[2][(crc >> 16) & 255] ^ \
- tab[3][(crc >> 24) & 255]
+# define DO_CRC(x) (crc = t0[((crc >> 24) ^ (x)) & 255] ^ (crc << 8))
+# define DO_CRC4 crc = t0[(crc) & 255] ^ \
+ t1[(crc >> 8) & 255] ^ \
+ t2[(crc >> 16) & 255] ^ \
+ t3[(crc >> 24) & 255]
# endif
const u32 *b;
size_t rem_len;
+ const u32 *t0 = tab[0], *t1 = tab[1], *t2 = tab[2], *t3 = tab[3];

/* Align it */
if (unlikely((long)buf & 3 && len)) {

2011-12-01 20:14:04

by djwong

[permalink] [raw]
Subject: [PATCH 03/14] crc32: Simplify unit test code

Replaced the unit test provided in crc32.c, which doesn't have a
makefile and doesn't compile with current headers, with a simpler
self test routine that also gives a measure of performance and
runs at module init time. The self test option can be enabled
through a configuration option CONFIG_CRC32_SELFTEST.

The test stresses the pre and post loops and is thus not very
realistic since actual uses will likely have addresses and lengths
that are at least 4 byte aligned. However, the main loop is long
enough so that the performance is dominated by that loop.

The expected values for crc32_le and crc32_be were generated
with the original version of crc32.c using CRC_BITS_LE = 8 and
CRC_BITS_BE = 8. These values were then used to check all the
values of the BITS parameters in both the original and new versions.

The performance results show some variability from run to run
in spite of attempts to both warm the cache and reduce the amount
of OS noise by limiting interrutps during the test. To get comparable
results and to analyse options wrt performance the best time
reported over a small sample of runs has been taken.

From: Bob Pearson <[email protected]>
Signed-off-by: Bob Pearson <[email protected]>
[[email protected]: Minor changelog tweaks]
Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/Kconfig | 10 +
lib/crc32.c | 798 ++++++++++++++++++++++++++++++++++++++++++++++++++---------
2 files changed, 691 insertions(+), 117 deletions(-)


diff --git a/lib/Kconfig b/lib/Kconfig
index 32f3e5a..2bc5834 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -60,6 +60,16 @@ config CRC32
kernel tree does. Such modules that use library CRC32 functions
require M here.

+config CRC32_SELFTEST
+ bool "CRC32 perform self test on init"
+ default n
+ depends on CRC32
+ help
+ This option enables the CRC32 library functions to perform a
+ self test on initialization. The self test computes crc32_le
+ and crc32_be over byte strings with random alignment and length
+ and computes the total elapsed time and number of bytes processed.
+
config CRC7
tristate "CRC7 functions"
help
diff --git a/lib/crc32.c b/lib/crc32.c
index 7ac8b0d..7a0e5a9 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -210,137 +210,701 @@ u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len)
EXPORT_SYMBOL(crc32_le);
EXPORT_SYMBOL(crc32_be);

-#ifdef UNITTEST
+#ifdef CONFIG_CRC32_SELFTEST

-#include <stdlib.h>
-#include <stdio.h>
-
-#if 0 /*Not used at present */
-static void
-buf_dump(char const *prefix, unsigned char const *buf, size_t len)
+/* 4096 random bytes */
+static u8 __attribute__((__aligned__(8))) test_buf[] =
{
- fputs(prefix, stdout);
- while (len--)
- printf(" %02x", *buf++);
- putchar('\n');
-
-}
-#endif
-
-static void bytereverse(unsigned char *buf, size_t len)
+ 0x5b, 0x85, 0x21, 0xcb, 0x09, 0x68, 0x7d, 0x30,
+ 0xc7, 0x69, 0xd7, 0x30, 0x92, 0xde, 0x59, 0xe4,
+ 0xc9, 0x6e, 0x8b, 0xdb, 0x98, 0x6b, 0xaa, 0x60,
+ 0xa8, 0xb5, 0xbc, 0x6c, 0xa9, 0xb1, 0x5b, 0x2c,
+ 0xea, 0xb4, 0x92, 0x6a, 0x3f, 0x79, 0x91, 0xe4,
+ 0xe9, 0x70, 0x51, 0x8c, 0x7f, 0x95, 0x6f, 0x1a,
+ 0x56, 0xa1, 0x5c, 0x27, 0x03, 0x67, 0x9f, 0x3a,
+ 0xe2, 0x31, 0x11, 0x29, 0x6b, 0x98, 0xfc, 0xc4,
+ 0x53, 0x24, 0xc5, 0x8b, 0xce, 0x47, 0xb2, 0xb9,
+ 0x32, 0xcb, 0xc1, 0xd0, 0x03, 0x57, 0x4e, 0xd4,
+ 0xe9, 0x3c, 0xa1, 0x63, 0xcf, 0x12, 0x0e, 0xca,
+ 0xe1, 0x13, 0xd1, 0x93, 0xa6, 0x88, 0x5c, 0x61,
+ 0x5b, 0xbb, 0xf0, 0x19, 0x46, 0xb4, 0xcf, 0x9e,
+ 0xb6, 0x6b, 0x4c, 0x3a, 0xcf, 0x60, 0xf9, 0x7a,
+ 0x8d, 0x07, 0x63, 0xdb, 0x40, 0xe9, 0x0b, 0x6f,
+ 0xad, 0x97, 0xf1, 0xed, 0xd0, 0x1e, 0x26, 0xfd,
+ 0xbf, 0xb7, 0xc8, 0x04, 0x94, 0xf8, 0x8b, 0x8c,
+ 0xf1, 0xab, 0x7a, 0xd4, 0xdd, 0xf3, 0xe8, 0x88,
+ 0xc3, 0xed, 0x17, 0x8a, 0x9b, 0x40, 0x0d, 0x53,
+ 0x62, 0x12, 0x03, 0x5f, 0x1b, 0x35, 0x32, 0x1f,
+ 0xb4, 0x7b, 0x93, 0x78, 0x0d, 0xdb, 0xce, 0xa4,
+ 0xc0, 0x47, 0xd5, 0xbf, 0x68, 0xe8, 0x5d, 0x74,
+ 0x8f, 0x8e, 0x75, 0x1c, 0xb2, 0x4f, 0x9a, 0x60,
+ 0xd1, 0xbe, 0x10, 0xf4, 0x5c, 0xa1, 0x53, 0x09,
+ 0xa5, 0xe0, 0x09, 0x54, 0x85, 0x5c, 0xdc, 0x07,
+ 0xe7, 0x21, 0x69, 0x7b, 0x8a, 0xfd, 0x90, 0xf1,
+ 0x22, 0xd0, 0xb4, 0x36, 0x28, 0xe6, 0xb8, 0x0f,
+ 0x39, 0xde, 0xc8, 0xf3, 0x86, 0x60, 0x34, 0xd2,
+ 0x5e, 0xdf, 0xfd, 0xcf, 0x0f, 0xa9, 0x65, 0xf0,
+ 0xd5, 0x4d, 0x96, 0x40, 0xe3, 0xdf, 0x3f, 0x95,
+ 0x5a, 0x39, 0x19, 0x93, 0xf4, 0x75, 0xce, 0x22,
+ 0x00, 0x1c, 0x93, 0xe2, 0x03, 0x66, 0xf4, 0x93,
+ 0x73, 0x86, 0x81, 0x8e, 0x29, 0x44, 0x48, 0x86,
+ 0x61, 0x7c, 0x48, 0xa3, 0x43, 0xd2, 0x9c, 0x8d,
+ 0xd4, 0x95, 0xdd, 0xe1, 0x22, 0x89, 0x3a, 0x40,
+ 0x4c, 0x1b, 0x8a, 0x04, 0xa8, 0x09, 0x69, 0x8b,
+ 0xea, 0xc6, 0x55, 0x8e, 0x57, 0xe6, 0x64, 0x35,
+ 0xf0, 0xc7, 0x16, 0x9f, 0x5d, 0x5e, 0x86, 0x40,
+ 0x46, 0xbb, 0xe5, 0x45, 0x88, 0xfe, 0xc9, 0x63,
+ 0x15, 0xfb, 0xf5, 0xbd, 0x71, 0x61, 0xeb, 0x7b,
+ 0x78, 0x70, 0x07, 0x31, 0x03, 0x9f, 0xb2, 0xc8,
+ 0xa7, 0xab, 0x47, 0xfd, 0xdf, 0xa0, 0x78, 0x72,
+ 0xa4, 0x2a, 0xe4, 0xb6, 0xba, 0xc0, 0x1e, 0x86,
+ 0x71, 0xe6, 0x3d, 0x18, 0x37, 0x70, 0xe6, 0xff,
+ 0xe0, 0xbc, 0x0b, 0x22, 0xa0, 0x1f, 0xd3, 0xed,
+ 0xa2, 0x55, 0x39, 0xab, 0xa8, 0x13, 0x73, 0x7c,
+ 0x3f, 0xb2, 0xd6, 0x19, 0xac, 0xff, 0x99, 0xed,
+ 0xe8, 0xe6, 0xa6, 0x22, 0xe3, 0x9c, 0xf1, 0x30,
+ 0xdc, 0x01, 0x0a, 0x56, 0xfa, 0xe4, 0xc9, 0x99,
+ 0xdd, 0xa8, 0xd8, 0xda, 0x35, 0x51, 0x73, 0xb4,
+ 0x40, 0x86, 0x85, 0xdb, 0x5c, 0xd5, 0x85, 0x80,
+ 0x14, 0x9c, 0xfd, 0x98, 0xa9, 0x82, 0xc5, 0x37,
+ 0xff, 0x32, 0x5d, 0xd0, 0x0b, 0xfa, 0xdc, 0x04,
+ 0x5e, 0x09, 0xd2, 0xca, 0x17, 0x4b, 0x1a, 0x8e,
+ 0x15, 0xe1, 0xcc, 0x4e, 0x52, 0x88, 0x35, 0xbd,
+ 0x48, 0xfe, 0x15, 0xa0, 0x91, 0xfd, 0x7e, 0x6c,
+ 0x0e, 0x5d, 0x79, 0x1b, 0x81, 0x79, 0xd2, 0x09,
+ 0x34, 0x70, 0x3d, 0x81, 0xec, 0xf6, 0x24, 0xbb,
+ 0xfb, 0xf1, 0x7b, 0xdf, 0x54, 0xea, 0x80, 0x9b,
+ 0xc7, 0x99, 0x9e, 0xbd, 0x16, 0x78, 0x12, 0x53,
+ 0x5e, 0x01, 0xa7, 0x4e, 0xbd, 0x67, 0xe1, 0x9b,
+ 0x4c, 0x0e, 0x61, 0x45, 0x97, 0xd2, 0xf0, 0x0f,
+ 0xfe, 0x15, 0x08, 0xb7, 0x11, 0x4c, 0xe7, 0xff,
+ 0x81, 0x53, 0xff, 0x91, 0x25, 0x38, 0x7e, 0x40,
+ 0x94, 0xe5, 0xe0, 0xad, 0xe6, 0xd9, 0x79, 0xb6,
+ 0x92, 0xc9, 0xfc, 0xde, 0xc3, 0x1a, 0x23, 0xbb,
+ 0xdd, 0xc8, 0x51, 0x0c, 0x3a, 0x72, 0xfa, 0x73,
+ 0x6f, 0xb7, 0xee, 0x61, 0x39, 0x03, 0x01, 0x3f,
+ 0x7f, 0x94, 0x2e, 0x2e, 0xba, 0x3a, 0xbb, 0xb4,
+ 0xfa, 0x6a, 0x17, 0xfe, 0xea, 0xef, 0x5e, 0x66,
+ 0x97, 0x3f, 0x32, 0x3d, 0xd7, 0x3e, 0xb1, 0xf1,
+ 0x6c, 0x14, 0x4c, 0xfd, 0x37, 0xd3, 0x38, 0x80,
+ 0xfb, 0xde, 0xa6, 0x24, 0x1e, 0xc8, 0xca, 0x7f,
+ 0x3a, 0x93, 0xd8, 0x8b, 0x18, 0x13, 0xb2, 0xe5,
+ 0xe4, 0x93, 0x05, 0x53, 0x4f, 0x84, 0x66, 0xa7,
+ 0x58, 0x5c, 0x7b, 0x86, 0x52, 0x6d, 0x0d, 0xce,
+ 0xa4, 0x30, 0x7d, 0xb6, 0x18, 0x9f, 0xeb, 0xff,
+ 0x22, 0xbb, 0x72, 0x29, 0xb9, 0x44, 0x0b, 0x48,
+ 0x1e, 0x84, 0x71, 0x81, 0xe3, 0x6d, 0x73, 0x26,
+ 0x92, 0xb4, 0x4d, 0x2a, 0x29, 0xb8, 0x1f, 0x72,
+ 0xed, 0xd0, 0xe1, 0x64, 0x77, 0xea, 0x8e, 0x88,
+ 0x0f, 0xef, 0x3f, 0xb1, 0x3b, 0xad, 0xf9, 0xc9,
+ 0x8b, 0xd0, 0xac, 0xc6, 0xcc, 0xa9, 0x40, 0xcc,
+ 0x76, 0xf6, 0x3b, 0x53, 0xb5, 0x88, 0xcb, 0xc8,
+ 0x37, 0xf1, 0xa2, 0xba, 0x23, 0x15, 0x99, 0x09,
+ 0xcc, 0xe7, 0x7a, 0x3b, 0x37, 0xf7, 0x58, 0xc8,
+ 0x46, 0x8c, 0x2b, 0x2f, 0x4e, 0x0e, 0xa6, 0x5c,
+ 0xea, 0x85, 0x55, 0xba, 0x02, 0x0e, 0x0e, 0x48,
+ 0xbc, 0xe1, 0xb1, 0x01, 0x35, 0x79, 0x13, 0x3d,
+ 0x1b, 0xc0, 0x53, 0x68, 0x11, 0xe7, 0x95, 0x0f,
+ 0x9d, 0x3f, 0x4c, 0x47, 0x7b, 0x4d, 0x1c, 0xae,
+ 0x50, 0x9b, 0xcb, 0xdd, 0x05, 0x8d, 0x9a, 0x97,
+ 0xfd, 0x8c, 0xef, 0x0c, 0x1d, 0x67, 0x73, 0xa8,
+ 0x28, 0x36, 0xd5, 0xb6, 0x92, 0x33, 0x40, 0x75,
+ 0x0b, 0x51, 0xc3, 0x64, 0xba, 0x1d, 0xc2, 0xcc,
+ 0xee, 0x7d, 0x54, 0x0f, 0x27, 0x69, 0xa7, 0x27,
+ 0x63, 0x30, 0x29, 0xd9, 0xc8, 0x84, 0xd8, 0xdf,
+ 0x9f, 0x68, 0x8d, 0x04, 0xca, 0xa6, 0xc5, 0xc7,
+ 0x7a, 0x5c, 0xc8, 0xd1, 0xcb, 0x4a, 0xec, 0xd0,
+ 0xd8, 0x20, 0x69, 0xc5, 0x17, 0xcd, 0x78, 0xc8,
+ 0x75, 0x23, 0x30, 0x69, 0xc9, 0xd4, 0xea, 0x5c,
+ 0x4f, 0x6b, 0x86, 0x3f, 0x8b, 0xfe, 0xee, 0x44,
+ 0xc9, 0x7c, 0xb7, 0xdd, 0x3e, 0xe5, 0xec, 0x54,
+ 0x03, 0x3e, 0xaa, 0x82, 0xc6, 0xdf, 0xb2, 0x38,
+ 0x0e, 0x5d, 0xb3, 0x88, 0xd9, 0xd3, 0x69, 0x5f,
+ 0x8f, 0x70, 0x8a, 0x7e, 0x11, 0xd9, 0x1e, 0x7b,
+ 0x38, 0xf1, 0x42, 0x1a, 0xc0, 0x35, 0xf5, 0xc7,
+ 0x36, 0x85, 0xf5, 0xf7, 0xb8, 0x7e, 0xc7, 0xef,
+ 0x18, 0xf1, 0x63, 0xd6, 0x7a, 0xc6, 0xc9, 0x0e,
+ 0x4d, 0x69, 0x4f, 0x84, 0xef, 0x26, 0x41, 0x0c,
+ 0xec, 0xc7, 0xe0, 0x7e, 0x3c, 0x67, 0x01, 0x4c,
+ 0x62, 0x1a, 0x20, 0x6f, 0xee, 0x47, 0x4d, 0xc0,
+ 0x99, 0x13, 0x8d, 0x91, 0x4a, 0x26, 0xd4, 0x37,
+ 0x28, 0x90, 0x58, 0x75, 0x66, 0x2b, 0x0a, 0xdf,
+ 0xda, 0xee, 0x92, 0x25, 0x90, 0x62, 0x39, 0x9e,
+ 0x44, 0x98, 0xad, 0xc1, 0x88, 0xed, 0xe4, 0xb4,
+ 0xaf, 0xf5, 0x8c, 0x9b, 0x48, 0x4d, 0x56, 0x60,
+ 0x97, 0x0f, 0x61, 0x59, 0x9e, 0xa6, 0x27, 0xfe,
+ 0xc1, 0x91, 0x15, 0x38, 0xb8, 0x0f, 0xae, 0x61,
+ 0x7d, 0x26, 0x13, 0x5a, 0x73, 0xff, 0x1c, 0xa3,
+ 0x61, 0x04, 0x58, 0x48, 0x55, 0x44, 0x11, 0xfe,
+ 0x15, 0xca, 0xc3, 0xbd, 0xca, 0xc5, 0xb4, 0x40,
+ 0x5d, 0x1b, 0x7f, 0x39, 0xb5, 0x9c, 0x35, 0xec,
+ 0x61, 0x15, 0x32, 0x32, 0xb8, 0x4e, 0x40, 0x9f,
+ 0x17, 0x1f, 0x0a, 0x4d, 0xa9, 0x91, 0xef, 0xb7,
+ 0xb0, 0xeb, 0xc2, 0x83, 0x9a, 0x6c, 0xd2, 0x79,
+ 0x43, 0x78, 0x5e, 0x2f, 0xe5, 0xdd, 0x1a, 0x3c,
+ 0x45, 0xab, 0x29, 0x40, 0x3a, 0x37, 0x5b, 0x6f,
+ 0xd7, 0xfc, 0x48, 0x64, 0x3c, 0x49, 0xfb, 0x21,
+ 0xbe, 0xc3, 0xff, 0x07, 0xfb, 0x17, 0xe9, 0xc9,
+ 0x0c, 0x4c, 0x5c, 0x15, 0x9e, 0x8e, 0x22, 0x30,
+ 0x0a, 0xde, 0x48, 0x7f, 0xdb, 0x0d, 0xd1, 0x2b,
+ 0x87, 0x38, 0x9e, 0xcc, 0x5a, 0x01, 0x16, 0xee,
+ 0x75, 0x49, 0x0d, 0x30, 0x01, 0x34, 0x6a, 0xb6,
+ 0x9a, 0x5a, 0x2a, 0xec, 0xbb, 0x48, 0xac, 0xd3,
+ 0x77, 0x83, 0xd8, 0x08, 0x86, 0x4f, 0x48, 0x09,
+ 0x29, 0x41, 0x79, 0xa1, 0x03, 0x12, 0xc4, 0xcd,
+ 0x90, 0x55, 0x47, 0x66, 0x74, 0x9a, 0xcc, 0x4f,
+ 0x35, 0x8c, 0xd6, 0x98, 0xef, 0xeb, 0x45, 0xb9,
+ 0x9a, 0x26, 0x2f, 0x39, 0xa5, 0x70, 0x6d, 0xfc,
+ 0xb4, 0x51, 0xee, 0xf4, 0x9c, 0xe7, 0x38, 0x59,
+ 0xad, 0xf4, 0xbc, 0x46, 0xff, 0x46, 0x8e, 0x60,
+ 0x9c, 0xa3, 0x60, 0x1d, 0xf8, 0x26, 0x72, 0xf5,
+ 0x72, 0x9d, 0x68, 0x80, 0x04, 0xf6, 0x0b, 0xa1,
+ 0x0a, 0xd5, 0xa7, 0x82, 0x3a, 0x3e, 0x47, 0xa8,
+ 0x5a, 0xde, 0x59, 0x4f, 0x7b, 0x07, 0xb3, 0xe9,
+ 0x24, 0x19, 0x3d, 0x34, 0x05, 0xec, 0xf1, 0xab,
+ 0x6e, 0x64, 0x8f, 0xd3, 0xe6, 0x41, 0x86, 0x80,
+ 0x70, 0xe3, 0x8d, 0x60, 0x9c, 0x34, 0x25, 0x01,
+ 0x07, 0x4d, 0x19, 0x41, 0x4e, 0x3d, 0x5c, 0x7e,
+ 0xa8, 0xf5, 0xcc, 0xd5, 0x7b, 0xe2, 0x7d, 0x3d,
+ 0x49, 0x86, 0x7d, 0x07, 0xb7, 0x10, 0xe3, 0x35,
+ 0xb8, 0x84, 0x6d, 0x76, 0xab, 0x17, 0xc6, 0x38,
+ 0xb4, 0xd3, 0x28, 0x57, 0xad, 0xd3, 0x88, 0x5a,
+ 0xda, 0xea, 0xc8, 0x94, 0xcc, 0x37, 0x19, 0xac,
+ 0x9c, 0x9f, 0x4b, 0x00, 0x15, 0xc0, 0xc8, 0xca,
+ 0x1f, 0x15, 0xaa, 0xe0, 0xdb, 0xf9, 0x2f, 0x57,
+ 0x1b, 0x24, 0xc7, 0x6f, 0x76, 0x29, 0xfb, 0xed,
+ 0x25, 0x0d, 0xc0, 0xfe, 0xbd, 0x5a, 0xbf, 0x20,
+ 0x08, 0x51, 0x05, 0xec, 0x71, 0xa3, 0xbf, 0xef,
+ 0x5e, 0x99, 0x75, 0xdb, 0x3c, 0x5f, 0x9a, 0x8c,
+ 0xbb, 0x19, 0x5c, 0x0e, 0x93, 0x19, 0xf8, 0x6a,
+ 0xbc, 0xf2, 0x12, 0x54, 0x2f, 0xcb, 0x28, 0x64,
+ 0x88, 0xb3, 0x92, 0x0d, 0x96, 0xd1, 0xa6, 0xe4,
+ 0x1f, 0xf1, 0x4d, 0xa4, 0xab, 0x1c, 0xee, 0x54,
+ 0xf2, 0xad, 0x29, 0x6d, 0x32, 0x37, 0xb2, 0x16,
+ 0x77, 0x5c, 0xdc, 0x2e, 0x54, 0xec, 0x75, 0x26,
+ 0xc6, 0x36, 0xd9, 0x17, 0x2c, 0xf1, 0x7a, 0xdc,
+ 0x4b, 0xf1, 0xe2, 0xd9, 0x95, 0xba, 0xac, 0x87,
+ 0xc1, 0xf3, 0x8e, 0x58, 0x08, 0xd8, 0x87, 0x60,
+ 0xc9, 0xee, 0x6a, 0xde, 0xa4, 0xd2, 0xfc, 0x0d,
+ 0xe5, 0x36, 0xc4, 0x5c, 0x52, 0xb3, 0x07, 0x54,
+ 0x65, 0x24, 0xc1, 0xb1, 0xd1, 0xb1, 0x53, 0x13,
+ 0x31, 0x79, 0x7f, 0x05, 0x76, 0xeb, 0x37, 0x59,
+ 0x15, 0x2b, 0xd1, 0x3f, 0xac, 0x08, 0x97, 0xeb,
+ 0x91, 0x98, 0xdf, 0x6c, 0x09, 0x0d, 0x04, 0x9f,
+ 0xdc, 0x3b, 0x0e, 0x60, 0x68, 0x47, 0x23, 0x15,
+ 0x16, 0xc6, 0x0b, 0x35, 0xf8, 0x77, 0xa2, 0x78,
+ 0x50, 0xd4, 0x64, 0x22, 0x33, 0xff, 0xfb, 0x93,
+ 0x71, 0x46, 0x50, 0x39, 0x1b, 0x9c, 0xea, 0x4e,
+ 0x8d, 0x0c, 0x37, 0xe5, 0x5c, 0x51, 0x3a, 0x31,
+ 0xb2, 0x85, 0x84, 0x3f, 0x41, 0xee, 0xa2, 0xc1,
+ 0xc6, 0x13, 0x3b, 0x54, 0x28, 0xd2, 0x18, 0x37,
+ 0xcc, 0x46, 0x9f, 0x6a, 0x91, 0x3d, 0x5a, 0x15,
+ 0x3c, 0x89, 0xa3, 0x61, 0x06, 0x7d, 0x2e, 0x78,
+ 0xbe, 0x7d, 0x40, 0xba, 0x2f, 0x95, 0xb1, 0x2f,
+ 0x87, 0x3b, 0x8a, 0xbe, 0x6a, 0xf4, 0xc2, 0x31,
+ 0x74, 0xee, 0x91, 0xe0, 0x23, 0xaa, 0x5d, 0x7f,
+ 0xdd, 0xf0, 0x44, 0x8c, 0x0b, 0x59, 0x2b, 0xfc,
+ 0x48, 0x3a, 0xdf, 0x07, 0x05, 0x38, 0x6c, 0xc9,
+ 0xeb, 0x18, 0x24, 0x68, 0x8d, 0x58, 0x98, 0xd3,
+ 0x31, 0xa3, 0xe4, 0x70, 0x59, 0xb1, 0x21, 0xbe,
+ 0x7e, 0x65, 0x7d, 0xb8, 0x04, 0xab, 0xf6, 0xe4,
+ 0xd7, 0xda, 0xec, 0x09, 0x8f, 0xda, 0x6d, 0x24,
+ 0x07, 0xcc, 0x29, 0x17, 0x05, 0x78, 0x1a, 0xc1,
+ 0xb1, 0xce, 0xfc, 0xaa, 0x2d, 0xe7, 0xcc, 0x85,
+ 0x84, 0x84, 0x03, 0x2a, 0x0c, 0x3f, 0xa9, 0xf8,
+ 0xfd, 0x84, 0x53, 0x59, 0x5c, 0xf0, 0xd4, 0x09,
+ 0xf0, 0xd2, 0x6c, 0x32, 0x03, 0xb0, 0xa0, 0x8c,
+ 0x52, 0xeb, 0x23, 0x91, 0x88, 0x43, 0x13, 0x46,
+ 0xf6, 0x1e, 0xb4, 0x1b, 0xf5, 0x8e, 0x3a, 0xb5,
+ 0x3d, 0x00, 0xf6, 0xe5, 0x08, 0x3d, 0x5f, 0x39,
+ 0xd3, 0x21, 0x69, 0xbc, 0x03, 0x22, 0x3a, 0xd2,
+ 0x5c, 0x84, 0xf8, 0x15, 0xc4, 0x80, 0x0b, 0xbc,
+ 0x29, 0x3c, 0xf3, 0x95, 0x98, 0xcd, 0x8f, 0x35,
+ 0xbc, 0xa5, 0x3e, 0xfc, 0xd4, 0x13, 0x9e, 0xde,
+ 0x4f, 0xce, 0x71, 0x9d, 0x09, 0xad, 0xf2, 0x80,
+ 0x6b, 0x65, 0x7f, 0x03, 0x00, 0x14, 0x7c, 0x15,
+ 0x85, 0x40, 0x6d, 0x70, 0xea, 0xdc, 0xb3, 0x63,
+ 0x35, 0x4f, 0x4d, 0xe0, 0xd9, 0xd5, 0x3c, 0x58,
+ 0x56, 0x23, 0x80, 0xe2, 0x36, 0xdd, 0x75, 0x1d,
+ 0x94, 0x11, 0x41, 0x8e, 0xe0, 0x81, 0x8e, 0xcf,
+ 0xe0, 0xe5, 0xf6, 0xde, 0xd1, 0xe7, 0x04, 0x12,
+ 0x79, 0x92, 0x2b, 0x71, 0x2a, 0x79, 0x8b, 0x7c,
+ 0x44, 0x79, 0x16, 0x30, 0x4e, 0xf4, 0xf6, 0x9b,
+ 0xb7, 0x40, 0xa3, 0x5a, 0xa7, 0x69, 0x3e, 0xc1,
+ 0x3a, 0x04, 0xd0, 0x88, 0xa0, 0x3b, 0xdd, 0xc6,
+ 0x9e, 0x7e, 0x1e, 0x1e, 0x8f, 0x44, 0xf7, 0x73,
+ 0x67, 0x1e, 0x1a, 0x78, 0xfa, 0x62, 0xf4, 0xa9,
+ 0xa8, 0xc6, 0x5b, 0xb8, 0xfa, 0x06, 0x7d, 0x5e,
+ 0x38, 0x1c, 0x9a, 0x39, 0xe9, 0x39, 0x98, 0x22,
+ 0x0b, 0xa7, 0xac, 0x0b, 0xf3, 0xbc, 0xf1, 0xeb,
+ 0x8c, 0x81, 0xe3, 0x48, 0x8a, 0xed, 0x42, 0xc2,
+ 0x38, 0xcf, 0x3e, 0xda, 0xd2, 0x89, 0x8d, 0x9c,
+ 0x53, 0xb5, 0x2f, 0x41, 0x01, 0x26, 0x84, 0x9c,
+ 0xa3, 0x56, 0xf6, 0x49, 0xc7, 0xd4, 0x9f, 0x93,
+ 0x1b, 0x96, 0x49, 0x5e, 0xad, 0xb3, 0x84, 0x1f,
+ 0x3c, 0xa4, 0xe0, 0x9b, 0xd1, 0x90, 0xbc, 0x38,
+ 0x6c, 0xdd, 0x95, 0x4d, 0x9d, 0xb1, 0x71, 0x57,
+ 0x2d, 0x34, 0xe8, 0xb8, 0x42, 0xc7, 0x99, 0x03,
+ 0xc7, 0x07, 0x30, 0x65, 0x91, 0x55, 0xd5, 0x90,
+ 0x70, 0x97, 0x37, 0x68, 0xd4, 0x11, 0xf9, 0xe8,
+ 0xce, 0xec, 0xdc, 0x34, 0xd5, 0xd3, 0xb7, 0xc4,
+ 0xb8, 0x97, 0x05, 0x92, 0xad, 0xf8, 0xe2, 0x36,
+ 0x64, 0x41, 0xc9, 0xc5, 0x41, 0x77, 0x52, 0xd7,
+ 0x2c, 0xa5, 0x24, 0x2f, 0xd9, 0x34, 0x0b, 0x47,
+ 0x35, 0xa7, 0x28, 0x8b, 0xc5, 0xcd, 0xe9, 0x46,
+ 0xac, 0x39, 0x94, 0x3c, 0x10, 0xc6, 0x29, 0x73,
+ 0x0e, 0x0e, 0x5d, 0xe0, 0x71, 0x03, 0x8a, 0x72,
+ 0x0e, 0x26, 0xb0, 0x7d, 0x84, 0xed, 0x95, 0x23,
+ 0x49, 0x5a, 0x45, 0x83, 0x45, 0x60, 0x11, 0x4a,
+ 0x46, 0x31, 0xd4, 0xd8, 0x16, 0x54, 0x98, 0x58,
+ 0xed, 0x6d, 0xcc, 0x5d, 0xd6, 0x50, 0x61, 0x9f,
+ 0x9d, 0xc5, 0x3e, 0x9d, 0x32, 0x47, 0xde, 0x96,
+ 0xe1, 0x5d, 0xd8, 0xf8, 0xb4, 0x69, 0x6f, 0xb9,
+ 0x15, 0x90, 0x57, 0x7a, 0xf6, 0xad, 0xb0, 0x5b,
+ 0xf5, 0xa6, 0x36, 0x94, 0xfd, 0x84, 0xce, 0x1c,
+ 0x0f, 0x4b, 0xd0, 0xc2, 0x5b, 0x6b, 0x56, 0xef,
+ 0x73, 0x93, 0x0b, 0xc3, 0xee, 0xd9, 0xcf, 0xd3,
+ 0xa4, 0x22, 0x58, 0xcd, 0x50, 0x6e, 0x65, 0xf4,
+ 0xe9, 0xb7, 0x71, 0xaf, 0x4b, 0xb3, 0xb6, 0x2f,
+ 0x0f, 0x0e, 0x3b, 0xc9, 0x85, 0x14, 0xf5, 0x17,
+ 0xe8, 0x7a, 0x3a, 0xbf, 0x5f, 0x5e, 0xf8, 0x18,
+ 0x48, 0xa6, 0x72, 0xab, 0x06, 0x95, 0xe9, 0xc8,
+ 0xa7, 0xf4, 0x32, 0x44, 0x04, 0x0c, 0x84, 0x98,
+ 0x73, 0xe3, 0x89, 0x8d, 0x5f, 0x7e, 0x4a, 0x42,
+ 0x8f, 0xc5, 0x28, 0xb1, 0x82, 0xef, 0x1c, 0x97,
+ 0x31, 0x3b, 0x4d, 0xe0, 0x0e, 0x10, 0x10, 0x97,
+ 0x93, 0x49, 0x78, 0x2f, 0x0d, 0x86, 0x8b, 0xa1,
+ 0x53, 0xa9, 0x81, 0x20, 0x79, 0xe7, 0x07, 0x77,
+ 0xb6, 0xac, 0x5e, 0xd2, 0x05, 0xcd, 0xe9, 0xdb,
+ 0x8a, 0x94, 0x82, 0x8a, 0x23, 0xb9, 0x3d, 0x1c,
+ 0xa9, 0x7d, 0x72, 0x4a, 0xed, 0x33, 0xa3, 0xdb,
+ 0x21, 0xa7, 0x86, 0x33, 0x45, 0xa5, 0xaa, 0x56,
+ 0x45, 0xb5, 0x83, 0x29, 0x40, 0x47, 0x79, 0x04,
+ 0x6e, 0xb9, 0x95, 0xd0, 0x81, 0x77, 0x2d, 0x48,
+ 0x1e, 0xfe, 0xc3, 0xc2, 0x1e, 0xe5, 0xf2, 0xbe,
+ 0xfd, 0x3b, 0x94, 0x9f, 0xc4, 0xc4, 0x26, 0x9d,
+ 0xe4, 0x66, 0x1e, 0x19, 0xee, 0x6c, 0x79, 0x97,
+ 0x11, 0x31, 0x4b, 0x0d, 0x01, 0xcb, 0xde, 0xa8,
+ 0xf6, 0x6d, 0x7c, 0x39, 0x46, 0x4e, 0x7e, 0x3f,
+ 0x94, 0x17, 0xdf, 0xa1, 0x7d, 0xd9, 0x1c, 0x8e,
+ 0xbc, 0x7d, 0x33, 0x7d, 0xe3, 0x12, 0x40, 0xca,
+ 0xab, 0x37, 0x11, 0x46, 0xd4, 0xae, 0xef, 0x44,
+ 0xa2, 0xb3, 0x6a, 0x66, 0x0e, 0x0c, 0x90, 0x7f,
+ 0xdf, 0x5c, 0x66, 0x5f, 0xf2, 0x94, 0x9f, 0xa6,
+ 0x73, 0x4f, 0xeb, 0x0d, 0xad, 0xbf, 0xc0, 0x63,
+ 0x5c, 0xdc, 0x46, 0x51, 0xe8, 0x8e, 0x90, 0x19,
+ 0xa8, 0xa4, 0x3c, 0x91, 0x79, 0xfa, 0x7e, 0x58,
+ 0x85, 0x13, 0x55, 0xc5, 0x19, 0x82, 0x37, 0x1b,
+ 0x0a, 0x02, 0x1f, 0x99, 0x6b, 0x18, 0xf1, 0x28,
+ 0x08, 0xa2, 0x73, 0xb8, 0x0f, 0x2e, 0xcd, 0xbf,
+ 0xf3, 0x86, 0x7f, 0xea, 0xef, 0xd0, 0xbb, 0xa6,
+ 0x21, 0xdf, 0x49, 0x73, 0x51, 0xcc, 0x36, 0xd3,
+ 0x3e, 0xa0, 0xf8, 0x44, 0xdf, 0xd3, 0xa6, 0xbe,
+ 0x8a, 0xd4, 0x57, 0xdd, 0x72, 0x94, 0x61, 0x0f,
+ 0x82, 0xd1, 0x07, 0xb8, 0x7c, 0x18, 0x83, 0xdf,
+ 0x3a, 0xe5, 0x50, 0x6a, 0x82, 0x20, 0xac, 0xa9,
+ 0xa8, 0xff, 0xd9, 0xf3, 0x77, 0x33, 0x5a, 0x9e,
+ 0x7f, 0x6d, 0xfe, 0x5d, 0x33, 0x41, 0x42, 0xe7,
+ 0x6c, 0x19, 0xe0, 0x44, 0x8a, 0x15, 0xf6, 0x70,
+ 0x98, 0xb7, 0x68, 0x4d, 0xfa, 0x97, 0x39, 0xb0,
+ 0x8e, 0xe8, 0x84, 0x8b, 0x75, 0x30, 0xb7, 0x7d,
+ 0x92, 0x69, 0x20, 0x9c, 0x81, 0xfb, 0x4b, 0xf4,
+ 0x01, 0x50, 0xeb, 0xce, 0x0c, 0x1c, 0x6c, 0xb5,
+ 0x4a, 0xd7, 0x27, 0x0c, 0xce, 0xbb, 0xe5, 0x85,
+ 0xf0, 0xb6, 0xee, 0xd5, 0x70, 0xdd, 0x3b, 0xfc,
+ 0xd4, 0x99, 0xf1, 0x33, 0xdd, 0x8b, 0xc4, 0x2f,
+ 0xae, 0xab, 0x74, 0x96, 0x32, 0xc7, 0x4c, 0x56,
+ 0x3c, 0x89, 0x0f, 0x96, 0x0b, 0x42, 0xc0, 0xcb,
+ 0xee, 0x0f, 0x0b, 0x8c, 0xfb, 0x7e, 0x47, 0x7b,
+ 0x64, 0x48, 0xfd, 0xb2, 0x00, 0x80, 0x89, 0xa5,
+ 0x13, 0x55, 0x62, 0xfc, 0x8f, 0xe2, 0x42, 0x03,
+ 0xb7, 0x4e, 0x2a, 0x79, 0xb4, 0x82, 0xea, 0x23,
+ 0x49, 0xda, 0xaf, 0x52, 0x63, 0x1e, 0x60, 0x03,
+ 0x89, 0x06, 0x44, 0x46, 0x08, 0xc3, 0xc4, 0x87,
+ 0x70, 0x2e, 0xda, 0x94, 0xad, 0x6b, 0xe0, 0xe4,
+ 0xd1, 0x8a, 0x06, 0xc2, 0xa8, 0xc0, 0xa7, 0x43,
+ 0x3c, 0x47, 0x52, 0x0e, 0xc3, 0x77, 0x81, 0x11,
+ 0x67, 0x0e, 0xa0, 0x70, 0x04, 0x47, 0x29, 0x40,
+ 0x86, 0x0d, 0x34, 0x56, 0xa7, 0xc9, 0x35, 0x59,
+ 0x68, 0xdc, 0x93, 0x81, 0x70, 0xee, 0x86, 0xd9,
+ 0x80, 0x06, 0x40, 0x4f, 0x1a, 0x0d, 0x40, 0x30,
+ 0x0b, 0xcb, 0x96, 0x47, 0xc1, 0xb7, 0x52, 0xfd,
+ 0x56, 0xe0, 0x72, 0x4b, 0xfb, 0xbd, 0x92, 0x45,
+ 0x61, 0x71, 0xc2, 0x33, 0x11, 0xbf, 0x52, 0x83,
+ 0x79, 0x26, 0xe0, 0x49, 0x6b, 0xb7, 0x05, 0x8b,
+ 0xe8, 0x0e, 0x87, 0x31, 0xd7, 0x9d, 0x8a, 0xf5,
+ 0xc0, 0x5f, 0x2e, 0x58, 0x4a, 0xdb, 0x11, 0xb3,
+ 0x6c, 0x30, 0x2a, 0x46, 0x19, 0xe3, 0x27, 0x84,
+ 0x1f, 0x63, 0x6e, 0xf6, 0x57, 0xc7, 0xc9, 0xd8,
+ 0x5e, 0xba, 0xb3, 0x87, 0xd5, 0x83, 0x26, 0x34,
+ 0x21, 0x9e, 0x65, 0xde, 0x42, 0xd3, 0xbe, 0x7b,
+ 0xbc, 0x91, 0x71, 0x44, 0x4d, 0x99, 0x3b, 0x31,
+ 0xe5, 0x3f, 0x11, 0x4e, 0x7f, 0x13, 0x51, 0x3b,
+ 0xae, 0x79, 0xc9, 0xd3, 0x81, 0x8e, 0x25, 0x40,
+ 0x10, 0xfc, 0x07, 0x1e, 0xf9, 0x7b, 0x9a, 0x4b,
+ 0x6c, 0xe3, 0xb3, 0xad, 0x1a, 0x0a, 0xdd, 0x9e,
+ 0x59, 0x0c, 0xa2, 0xcd, 0xae, 0x48, 0x4a, 0x38,
+ 0x5b, 0x47, 0x41, 0x94, 0x65, 0x6b, 0xbb, 0xeb,
+ 0x5b, 0xe3, 0xaf, 0x07, 0x5b, 0xd4, 0x4a, 0xa2,
+ 0xc9, 0x5d, 0x2f, 0x64, 0x03, 0xd7, 0x3a, 0x2c,
+ 0x6e, 0xce, 0x76, 0x95, 0xb4, 0xb3, 0xc0, 0xf1,
+ 0xe2, 0x45, 0x73, 0x7a, 0x5c, 0xab, 0xc1, 0xfc,
+ 0x02, 0x8d, 0x81, 0x29, 0xb3, 0xac, 0x07, 0xec,
+ 0x40, 0x7d, 0x45, 0xd9, 0x7a, 0x59, 0xee, 0x34,
+ 0xf0, 0xe9, 0xd5, 0x7b, 0x96, 0xb1, 0x3d, 0x95,
+ 0xcc, 0x86, 0xb5, 0xb6, 0x04, 0x2d, 0xb5, 0x92,
+ 0x7e, 0x76, 0xf4, 0x06, 0xa9, 0xa3, 0x12, 0x0f,
+ 0xb1, 0xaf, 0x26, 0xba, 0x7c, 0xfc, 0x7e, 0x1c,
+ 0xbc, 0x2c, 0x49, 0x97, 0x53, 0x60, 0x13, 0x0b,
+ 0xa6, 0x61, 0x83, 0x89, 0x42, 0xd4, 0x17, 0x0c,
+ 0x6c, 0x26, 0x52, 0xc3, 0xb3, 0xd4, 0x67, 0xf5,
+ 0xe3, 0x04, 0xb7, 0xf4, 0xcb, 0x80, 0xb8, 0xcb,
+ 0x77, 0x56, 0x3e, 0xaa, 0x57, 0x54, 0xee, 0xb4,
+ 0x2c, 0x67, 0xcf, 0xf2, 0xdc, 0xbe, 0x55, 0xf9,
+ 0x43, 0x1f, 0x6e, 0x22, 0x97, 0x67, 0x7f, 0xc4,
+ 0xef, 0xb1, 0x26, 0x31, 0x1e, 0x27, 0xdf, 0x41,
+ 0x80, 0x47, 0x6c, 0xe2, 0xfa, 0xa9, 0x8c, 0x2a,
+ 0xf6, 0xf2, 0xab, 0xf0, 0x15, 0xda, 0x6c, 0xc8,
+ 0xfe, 0xb5, 0x23, 0xde, 0xa9, 0x05, 0x3f, 0x06,
+ 0x54, 0x4c, 0xcd, 0xe1, 0xab, 0xfc, 0x0e, 0x62,
+ 0x33, 0x31, 0x73, 0x2c, 0x76, 0xcb, 0xb4, 0x47,
+ 0x1e, 0x20, 0xad, 0xd8, 0xf2, 0x31, 0xdd, 0xc4,
+ 0x8b, 0x0c, 0x77, 0xbe, 0xe1, 0x8b, 0x26, 0x00,
+ 0x02, 0x58, 0xd6, 0x8d, 0xef, 0xad, 0x74, 0x67,
+ 0xab, 0x3f, 0xef, 0xcb, 0x6f, 0xb0, 0xcc, 0x81,
+ 0x44, 0x4c, 0xaf, 0xe9, 0x49, 0x4f, 0xdb, 0xa0,
+ 0x25, 0xa4, 0xf0, 0x89, 0xf1, 0xbe, 0xd8, 0x10,
+ 0xff, 0xb1, 0x3b, 0x4b, 0xfa, 0x98, 0xf5, 0x79,
+ 0x6d, 0x1e, 0x69, 0x4d, 0x57, 0xb1, 0xc8, 0x19,
+ 0x1b, 0xbd, 0x1e, 0x8c, 0x84, 0xb7, 0x7b, 0xe8,
+ 0xd2, 0x2d, 0x09, 0x41, 0x41, 0x37, 0x3d, 0xb1,
+ 0x6f, 0x26, 0x5d, 0x71, 0x16, 0x3d, 0xb7, 0x83,
+ 0x27, 0x2c, 0xa7, 0xb6, 0x50, 0xbd, 0x91, 0x86,
+ 0xab, 0x24, 0xa1, 0x38, 0xfd, 0xea, 0x71, 0x55,
+ 0x7e, 0x9a, 0x07, 0x77, 0x4b, 0xfa, 0x61, 0x66,
+ 0x20, 0x1e, 0x28, 0x95, 0x18, 0x1b, 0xa4, 0xa0,
+ 0xfd, 0xc0, 0x89, 0x72, 0x43, 0xd9, 0x3b, 0x49,
+ 0x5a, 0x3f, 0x9d, 0xbf, 0xdb, 0xb4, 0x46, 0xea,
+ 0x42, 0x01, 0x77, 0x23, 0x68, 0x95, 0xb6, 0x24,
+ 0xb3, 0xa8, 0x6c, 0x28, 0x3b, 0x11, 0x40, 0x7e,
+ 0x18, 0x65, 0x6d, 0xd8, 0x24, 0x42, 0x7d, 0x88,
+ 0xc0, 0x52, 0xd9, 0x05, 0xe4, 0x95, 0x90, 0x87,
+ 0x8c, 0xf4, 0xd0, 0x6b, 0xb9, 0x83, 0x99, 0x34,
+ 0x6d, 0xfe, 0x54, 0x40, 0x94, 0x52, 0x21, 0x4f,
+ 0x14, 0x25, 0xc5, 0xd6, 0x5e, 0x95, 0xdc, 0x0a,
+ 0x2b, 0x89, 0x20, 0x11, 0x84, 0x48, 0xd6, 0x3a,
+ 0xcd, 0x5c, 0x24, 0xad, 0x62, 0xe3, 0xb1, 0x93,
+ 0x25, 0x8d, 0xcd, 0x7e, 0xfc, 0x27, 0xa3, 0x37,
+ 0xfd, 0x84, 0xfc, 0x1b, 0xb2, 0xf1, 0x27, 0x38,
+ 0x5a, 0xb7, 0xfc, 0xf2, 0xfa, 0x95, 0x66, 0xd4,
+ 0xfb, 0xba, 0xa7, 0xd7, 0xa3, 0x72, 0x69, 0x48,
+ 0x48, 0x8c, 0xeb, 0x28, 0x89, 0xfe, 0x33, 0x65,
+ 0x5a, 0x36, 0x01, 0x7e, 0x06, 0x79, 0x0a, 0x09,
+ 0x3b, 0x74, 0x11, 0x9a, 0x6e, 0xbf, 0xd4, 0x9e,
+ 0x58, 0x90, 0x49, 0x4f, 0x4d, 0x08, 0xd4, 0xe5,
+ 0x4a, 0x09, 0x21, 0xef, 0x8b, 0xb8, 0x74, 0x3b,
+ 0x91, 0xdd, 0x36, 0x85, 0x60, 0x2d, 0xfa, 0xd4,
+ 0x45, 0x7b, 0x45, 0x53, 0xf5, 0x47, 0x87, 0x7e,
+ 0xa6, 0x37, 0xc8, 0x78, 0x7a, 0x68, 0x9d, 0x8d,
+ 0x65, 0x2c, 0x0e, 0x91, 0x5c, 0xa2, 0x60, 0xf0,
+ 0x8e, 0x3f, 0xe9, 0x1a, 0xcd, 0xaa, 0xe7, 0xd5,
+ 0x77, 0x18, 0xaf, 0xc9, 0xbc, 0x18, 0xea, 0x48,
+ 0x1b, 0xfb, 0x22, 0x48, 0x70, 0x16, 0x29, 0x9e,
+ 0x5b, 0xc1, 0x2c, 0x66, 0x23, 0xbc, 0xf0, 0x1f,
+ 0xef, 0xaf, 0xe4, 0xd6, 0x04, 0x19, 0x82, 0x7a,
+ 0x0b, 0xba, 0x4b, 0x46, 0xb1, 0x6a, 0x85, 0x5d,
+ 0xb4, 0x73, 0xd6, 0x21, 0xa1, 0x71, 0x60, 0x14,
+ 0xee, 0x0a, 0x77, 0xc4, 0x66, 0x2e, 0xf9, 0x69,
+ 0x30, 0xaf, 0x41, 0x0b, 0xc8, 0x83, 0x3c, 0x53,
+ 0x99, 0x19, 0x27, 0x46, 0xf7, 0x41, 0x6e, 0x56,
+ 0xdc, 0x94, 0x28, 0x67, 0x4e, 0xb7, 0x25, 0x48,
+ 0x8a, 0xc2, 0xe0, 0x60, 0x96, 0xcc, 0x18, 0xf4,
+ 0x84, 0xdd, 0xa7, 0x5e, 0x3e, 0x05, 0x0b, 0x26,
+ 0x26, 0xb2, 0x5c, 0x1f, 0x57, 0x1a, 0x04, 0x7e,
+ 0x6a, 0xe3, 0x2f, 0xb4, 0x35, 0xb6, 0x38, 0x40,
+ 0x40, 0xcd, 0x6f, 0x87, 0x2e, 0xef, 0xa3, 0xd7,
+ 0xa9, 0xc2, 0xe8, 0x0d, 0x27, 0xdf, 0x44, 0x62,
+ 0x99, 0xa0, 0xfc, 0xcf, 0x81, 0x78, 0xcb, 0xfe,
+ 0xe5, 0xa0, 0x03, 0x4e, 0x6c, 0xd7, 0xf4, 0xaf,
+ 0x7a, 0xbb, 0x61, 0x82, 0xfe, 0x71, 0x89, 0xb2,
+ 0x22, 0x7c, 0x8e, 0x83, 0x04, 0xce, 0xf6, 0x5d,
+ 0x84, 0x8f, 0x95, 0x6a, 0x7f, 0xad, 0xfd, 0x32,
+ 0x9c, 0x5e, 0xe4, 0x9c, 0x89, 0x60, 0x54, 0xaa,
+ 0x96, 0x72, 0xd2, 0xd7, 0x36, 0x85, 0xa9, 0x45,
+ 0xd2, 0x2a, 0xa1, 0x81, 0x49, 0x6f, 0x7e, 0x04,
+ 0xfa, 0xe2, 0xfe, 0x90, 0x26, 0x77, 0x5a, 0x33,
+ 0xb8, 0x04, 0x9a, 0x7a, 0xe6, 0x4c, 0x4f, 0xad,
+ 0x72, 0x96, 0x08, 0x28, 0x58, 0x13, 0xf8, 0xc4,
+ 0x1c, 0xf0, 0xc3, 0x45, 0x95, 0x49, 0x20, 0x8c,
+ 0x9f, 0x39, 0x70, 0xe1, 0x77, 0xfe, 0xd5, 0x4b,
+ 0xaf, 0x86, 0xda, 0xef, 0x22, 0x06, 0x83, 0x36,
+ 0x29, 0x12, 0x11, 0x40, 0xbc, 0x3b, 0x86, 0xaa,
+ 0xaa, 0x65, 0x60, 0xc3, 0x80, 0xca, 0xed, 0xa9,
+ 0xf3, 0xb0, 0x79, 0x96, 0xa2, 0x55, 0x27, 0x28,
+ 0x55, 0x73, 0x26, 0xa5, 0x50, 0xea, 0x92, 0x4b,
+ 0x3c, 0x5c, 0x82, 0x33, 0xf0, 0x01, 0x3f, 0x03,
+ 0xc1, 0x08, 0x05, 0xbf, 0x98, 0xf4, 0x9b, 0x6d,
+ 0xa5, 0xa8, 0xb4, 0x82, 0x0c, 0x06, 0xfa, 0xff,
+ 0x2d, 0x08, 0xf3, 0x05, 0x4f, 0x57, 0x2a, 0x39,
+ 0xd4, 0x83, 0x0d, 0x75, 0x51, 0xd8, 0x5b, 0x1b,
+ 0xd3, 0x51, 0x5a, 0x32, 0x2a, 0x9b, 0x32, 0xb2,
+ 0xf2, 0xa4, 0x96, 0x12, 0xf2, 0xae, 0x40, 0x34,
+ 0x67, 0xa8, 0xf5, 0x44, 0xd5, 0x35, 0x53, 0xfe,
+ 0xa3, 0x60, 0x96, 0x63, 0x0f, 0x1f, 0x6e, 0xb0,
+ 0x5a, 0x42, 0xa6, 0xfc, 0x51, 0x0b, 0x60, 0x27,
+ 0xbc, 0x06, 0x71, 0xed, 0x65, 0x5b, 0x23, 0x86,
+ 0x4a, 0x07, 0x3b, 0x22, 0x07, 0x46, 0xe6, 0x90,
+ 0x3e, 0xf3, 0x25, 0x50, 0x1b, 0x4c, 0x7f, 0x03,
+ 0x08, 0xa8, 0x36, 0x6b, 0x87, 0xe5, 0xe3, 0xdb,
+ 0x9a, 0x38, 0x83, 0xff, 0x9f, 0x1a, 0x9f, 0x57,
+ 0xa4, 0x2a, 0xf6, 0x37, 0xbc, 0x1a, 0xff, 0xc9,
+ 0x1e, 0x35, 0x0c, 0xc3, 0x7c, 0xa3, 0xb2, 0xe5,
+ 0xd2, 0xc6, 0xb4, 0x57, 0x47, 0xe4, 0x32, 0x16,
+ 0x6d, 0xa9, 0xae, 0x64, 0xe6, 0x2d, 0x8d, 0xc5,
+ 0x8d, 0x50, 0x8e, 0xe8, 0x1a, 0x22, 0x34, 0x2a,
+ 0xd9, 0xeb, 0x51, 0x90, 0x4a, 0xb1, 0x41, 0x7d,
+ 0x64, 0xf9, 0xb9, 0x0d, 0xf6, 0x23, 0x33, 0xb0,
+ 0x33, 0xf4, 0xf7, 0x3f, 0x27, 0x84, 0xc6, 0x0f,
+ 0x54, 0xa5, 0xc0, 0x2e, 0xec, 0x0b, 0x3a, 0x48,
+ 0x6e, 0x80, 0x35, 0x81, 0x43, 0x9b, 0x90, 0xb1,
+ 0xd0, 0x2b, 0xea, 0x21, 0xdc, 0xda, 0x5b, 0x09,
+ 0xf4, 0xcc, 0x10, 0xb4, 0xc7, 0xfe, 0x79, 0x51,
+ 0xc3, 0xc5, 0xac, 0x88, 0x74, 0x84, 0x0b, 0x4b,
+ 0xca, 0x79, 0x16, 0x29, 0xfb, 0x69, 0x54, 0xdf,
+ 0x41, 0x7e, 0xe9, 0xc7, 0x8e, 0xea, 0xa5, 0xfe,
+ 0xfc, 0x76, 0x0e, 0x90, 0xc4, 0x92, 0x38, 0xad,
+ 0x7b, 0x48, 0xe6, 0x6e, 0xf7, 0x21, 0xfd, 0x4e,
+ 0x93, 0x0a, 0x7b, 0x41, 0x83, 0x68, 0xfb, 0x57,
+ 0x51, 0x76, 0x34, 0xa9, 0x6c, 0x00, 0xaa, 0x4f,
+ 0x66, 0x65, 0x98, 0x4a, 0x4f, 0xa3, 0xa0, 0xef,
+ 0x69, 0x3f, 0xe3, 0x1c, 0x92, 0x8c, 0xfd, 0xd8,
+ 0xe8, 0xde, 0x7c, 0x7f, 0x3e, 0x84, 0x8e, 0x69,
+ 0x3c, 0xf1, 0xf2, 0x05, 0x46, 0xdc, 0x2f, 0x9d,
+ 0x5e, 0x6e, 0x4c, 0xfb, 0xb5, 0x99, 0x2a, 0x59,
+ 0x63, 0xc1, 0x34, 0xbc, 0x57, 0xc0, 0x0d, 0xb9,
+ 0x61, 0x25, 0xf3, 0x33, 0x23, 0x51, 0xb6, 0x0d,
+ 0x07, 0xa6, 0xab, 0x94, 0x4a, 0xb7, 0x2a, 0xea,
+ 0xee, 0xac, 0xa3, 0xc3, 0x04, 0x8b, 0x0e, 0x56,
+ 0xfe, 0x44, 0xa7, 0x39, 0xe2, 0xed, 0xed, 0xb4,
+ 0x22, 0x2b, 0xac, 0x12, 0x32, 0x28, 0x91, 0xd8,
+ 0xa5, 0xab, 0xff, 0x5f, 0xe0, 0x4b, 0xda, 0x78,
+ 0x17, 0xda, 0xf1, 0x01, 0x5b, 0xcd, 0xe2, 0x5f,
+ 0x50, 0x45, 0x73, 0x2b, 0xe4, 0x76, 0x77, 0xf4,
+ 0x64, 0x1d, 0x43, 0xfb, 0x84, 0x7a, 0xea, 0x91,
+ 0xae, 0xf9, 0x9e, 0xb7, 0xb4, 0xb0, 0x91, 0x5f,
+ 0x16, 0x35, 0x9a, 0x11, 0xb8, 0xc7, 0xc1, 0x8c,
+ 0xc6, 0x10, 0x8d, 0x2f, 0x63, 0x4a, 0xa7, 0x57,
+ 0x3a, 0x51, 0xd6, 0x32, 0x2d, 0x64, 0x72, 0xd4,
+ 0x66, 0xdc, 0x10, 0xa6, 0x67, 0xd6, 0x04, 0x23,
+ 0x9d, 0x0a, 0x11, 0x77, 0xdd, 0x37, 0x94, 0x17,
+ 0x3c, 0xbf, 0x8b, 0x65, 0xb0, 0x2e, 0x5e, 0x66,
+ 0x47, 0x64, 0xac, 0xdd, 0xf0, 0x84, 0xfd, 0x39,
+ 0xfa, 0x15, 0x5d, 0xef, 0xae, 0xca, 0xc1, 0x36,
+ 0xa7, 0x5c, 0xbf, 0xc7, 0x08, 0xc2, 0x66, 0x00,
+ 0x74, 0x74, 0x4e, 0x27, 0x3f, 0x55, 0x8a, 0xb7,
+ 0x38, 0x66, 0x83, 0x6d, 0xcf, 0x99, 0x9e, 0x60,
+ 0x8f, 0xdd, 0x2e, 0x62, 0x22, 0x0e, 0xef, 0x0c,
+ 0x98, 0xa7, 0x85, 0x74, 0x3b, 0x9d, 0xec, 0x9e,
+ 0xa9, 0x19, 0x72, 0xa5, 0x7f, 0x2c, 0x39, 0xb7,
+ 0x7d, 0xb7, 0xf1, 0x12, 0x65, 0x27, 0x4b, 0x5a,
+ 0xde, 0x17, 0xfe, 0xad, 0x44, 0xf3, 0x20, 0x4d,
+ 0xfd, 0xe4, 0x1f, 0xb5, 0x81, 0xb0, 0x36, 0x37,
+ 0x08, 0x6f, 0xc3, 0x0c, 0xe9, 0x85, 0x98, 0x82,
+ 0xa9, 0x62, 0x0c, 0xc4, 0x97, 0xc0, 0x50, 0xc8,
+ 0xa7, 0x3c, 0x50, 0x9f, 0x43, 0xb9, 0xcd, 0x5e,
+ 0x4d, 0xfa, 0x1c, 0x4b, 0x0b, 0xa9, 0x98, 0x85,
+ 0x38, 0x92, 0xac, 0x8d, 0xe4, 0xad, 0x9b, 0x98,
+ 0xab, 0xd9, 0x38, 0xac, 0x62, 0x52, 0xa3, 0x22,
+ 0x63, 0x0f, 0xbf, 0x95, 0x48, 0xdf, 0x69, 0xe7,
+ 0x8b, 0x33, 0xd5, 0xb2, 0xbd, 0x05, 0x49, 0x49,
+ 0x9d, 0x57, 0x73, 0x19, 0x33, 0xae, 0xfa, 0x33,
+ 0xf1, 0x19, 0xa8, 0x80, 0xce, 0x04, 0x9f, 0xbc,
+ 0x1d, 0x65, 0x82, 0x1b, 0xe5, 0x3a, 0x51, 0xc8,
+ 0x1c, 0x21, 0xe3, 0x5d, 0xf3, 0x7d, 0x9b, 0x2f,
+ 0x2c, 0x1d, 0x4a, 0x7f, 0x9b, 0x68, 0x35, 0xa3,
+ 0xb2, 0x50, 0xf7, 0x62, 0x79, 0xcd, 0xf4, 0x98,
+ 0x4f, 0xe5, 0x63, 0x7c, 0x3e, 0x45, 0x31, 0x8c,
+ 0x16, 0xa0, 0x12, 0xc8, 0x58, 0xce, 0x39, 0xa6,
+ 0xbc, 0x54, 0xdb, 0xc5, 0xe0, 0xd5, 0xba, 0xbc,
+ 0xb9, 0x04, 0xf4, 0x8d, 0xe8, 0x2f, 0x15, 0x9d,
+};
+
+/* 100 test cases */
+static struct crc_test {
+ u32 crc; /* random starting crc */
+ u32 start; /* random 6 bit offset in buf */
+ u32 length; /* random 11 bit length of test */
+ u32 crc_le; /* expected crc32_le result */
+ u32 crc_be; /* expected crc32_be result */
+} test[] =
{
- while (len--) {
- unsigned char x = bitrev8(*buf);
- *buf++ = x;
- }
-}
-
-static void random_garbage(unsigned char *buf, size_t len)
+ {0x674bf11d, 0x00000038, 0x00000542, 0x0af6d466, 0xd8b6e4c1},
+ {0x35c672c6, 0x0000003a, 0x000001aa, 0xc6d3dfba, 0x28aaf3ad},
+ {0x496da28e, 0x00000039, 0x000005af, 0xd933660f, 0x5d57e81f},
+ {0x09a9b90e, 0x00000027, 0x000001f8, 0xb45fe007, 0xf45fca9a},
+ {0xdc97e5a9, 0x00000025, 0x000003b6, 0xf81a3562, 0xe0126ba2},
+ {0x47c58900, 0x0000000a, 0x000000b9, 0x8e58eccf, 0xf3afc793},
+ {0x292561e8, 0x0000000c, 0x00000403, 0xa2ba8aaf, 0x0b797aed},
+ {0x415037f6, 0x00000003, 0x00000676, 0xa17d52e8, 0x7f0fdf35},
+ {0x3466e707, 0x00000026, 0x00000042, 0x258319be, 0x75c484a2},
+ {0xafd1281b, 0x00000023, 0x000002ee, 0x4428eaf8, 0x06c7ad10},
+ {0xd3857b18, 0x00000028, 0x000004a2, 0x5c430821, 0xb062b7cb},
+ {0x1d825a8f, 0x0000002b, 0x0000050b, 0xd2c45f0c, 0xd68634e0},
+ {0x5033e3bc, 0x0000000b, 0x00000078, 0xa3ea4113, 0xac6d31fb},
+ {0x94f1fb5e, 0x0000000f, 0x000003a2, 0xfbfc50b1, 0x3cfe50ed},
+ {0xc9a0fe14, 0x00000009, 0x00000473, 0x5fb61894, 0x87070591},
+ {0x88a034b1, 0x0000001c, 0x000005ad, 0xc1b16053, 0x46f95c67},
+ {0xf0f72239, 0x00000020, 0x0000026d, 0xa6fa58f3, 0xf8c2c1dd},
+ {0xcc20a5e3, 0x0000003b, 0x0000067a, 0x7740185a, 0x308b979a},
+ {0xce589c95, 0x0000002b, 0x00000641, 0xd055e987, 0x40aae25b},
+ {0x78edc885, 0x00000035, 0x000005be, 0xa39cb14b, 0x035b0d1f},
+ {0x9d40a377, 0x0000003b, 0x00000038, 0x1f47ccd2, 0x197fbc9d},
+ {0x703d0e01, 0x0000003c, 0x000006f1, 0x88735e7c, 0xfed57c5a},
+ {0x776bf505, 0x0000000f, 0x000005b2, 0x5cc4fc01, 0xf32efb97},
+ {0x4a3e7854, 0x00000027, 0x000004b8, 0x8d923c82, 0x0cbfb4a2},
+ {0x209172dd, 0x0000003b, 0x00000356, 0xb89e9c2b, 0xd7868138},
+ {0x3ba4cc5b, 0x0000002f, 0x00000203, 0xe51601a9, 0x5b2a1032},
+ {0xfc62f297, 0x00000000, 0x00000079, 0x71a8e1a2, 0x5d88685f},
+ {0x64280b8b, 0x00000016, 0x000007ab, 0x0fa7a30c, 0xda3a455f},
+ {0x97dd724b, 0x00000033, 0x000007ad, 0x5788b2f4, 0xd7326d32},
+ {0x61394b52, 0x00000035, 0x00000571, 0xc66525f1, 0xcabe7fef},
+ {0x29b4faff, 0x00000024, 0x0000006e, 0xca13751e, 0x993648e0},
+ {0x29bfb1dc, 0x0000000b, 0x00000244, 0x436c43f7, 0x429f7a59},
+ {0x86ae934b, 0x00000035, 0x00000104, 0x0760ec93, 0x9cf7d0f4},
+ {0xc4c1024e, 0x0000002e, 0x000006b1, 0x6516a3ec, 0x19321f9c},
+ {0x3287a80a, 0x00000026, 0x00000496, 0x0b257eb1, 0x754ebd51},
+ {0xa4db423e, 0x00000023, 0x0000045d, 0x9b3a66dc, 0x873e9f11},
+ {0x7a1078df, 0x00000015, 0x0000014a, 0x8c2484c5, 0x6a628659},
+ {0x6048bd5b, 0x00000006, 0x0000006a, 0x897e3559, 0xac9961af},
+ {0xd8f9ea20, 0x0000003d, 0x00000277, 0x60eb905b, 0xed2aaf99},
+ {0xea5ec3b4, 0x0000002a, 0x000004fe, 0x869965dc, 0x6c1f833b},
+ {0x2dfb005d, 0x00000016, 0x00000345, 0x6a3b117e, 0xf05e8521},
+ {0x5a214ade, 0x00000020, 0x000005b6, 0x467f70be, 0xcb22ccd3},
+ {0xf0ab9cca, 0x00000032, 0x00000515, 0xed223df3, 0x7f3ef01d},
+ {0x91b444f9, 0x0000002e, 0x000007f8, 0x84e9a983, 0x5676756f},
+ {0x1b5d2ddb, 0x0000002e, 0x0000012c, 0xba638c4c, 0x3f42047b},
+ {0xd824d1bb, 0x0000003a, 0x000007b5, 0x6288653b, 0x3a3ebea0},
+ {0x0470180c, 0x00000034, 0x000001f0, 0x9d5b80d6, 0x3de08195},
+ {0xffaa3a3f, 0x00000036, 0x00000299, 0xf3a82ab8, 0x53e0c13d},
+ {0x6406cfeb, 0x00000023, 0x00000600, 0xa920b8e8, 0xe4e2acf4},
+ {0xb24aaa38, 0x0000003e, 0x000004a1, 0x657cc328, 0x5077b2c3},
+ {0x58b2ab7c, 0x00000039, 0x000002b4, 0x3a17ee7e, 0x9dcb3643},
+ {0x3db85970, 0x00000006, 0x000002b6, 0x95268b59, 0xb9812c10},
+ {0x857830c5, 0x00000003, 0x00000590, 0x4ef439d5, 0xf042161d},
+ {0xe1fcd978, 0x0000003e, 0x000007d8, 0xae8d8699, 0xce0a1ef5},
+ {0xb982a768, 0x00000016, 0x000006e0, 0x62fad3df, 0x5f8a067b},
+ {0x1d581ce8, 0x0000001e, 0x0000058b, 0xf0f5da53, 0x26e39eee},
+ {0x2456719b, 0x00000025, 0x00000503, 0x4296ac64, 0xd50e4c14},
+ {0xfae6d8f2, 0x00000000, 0x0000055d, 0x057fdf2e, 0x2a31391a},
+ {0xcba828e3, 0x00000039, 0x000002ce, 0xe3f22351, 0x8f00877b},
+ {0x13d25952, 0x0000000a, 0x0000072d, 0x76d4b4cc, 0x5eb67ec3},
+ {0x0342be3f, 0x00000015, 0x00000599, 0xec75d9f1, 0x9d4d2826},
+ {0xeaa344e0, 0x00000014, 0x000004d8, 0x72a4c981, 0x2064ea06},
+ {0xbbb52021, 0x0000003b, 0x00000272, 0x04af99fc, 0xaf042d35},
+ {0xb66384dc, 0x0000001d, 0x000007fc, 0xd7629116, 0x782bd801},
+ {0x616c01b6, 0x00000022, 0x000002c8, 0x5b1dab30, 0x783ce7d2},
+ {0xce2bdaad, 0x00000016, 0x0000062a, 0x932535c8, 0x3f02926d},
+ {0x00fe84d7, 0x00000005, 0x00000205, 0x850e50aa, 0x753d649c},
+ {0xbebdcb4c, 0x00000006, 0x0000055d, 0xbeaa37a2, 0x2d8c9eba},
+ {0xd8b1a02a, 0x00000010, 0x00000387, 0x5017d2fc, 0x503541a5},
+ {0x3b96cad2, 0x00000036, 0x00000347, 0x1d2372ae, 0x926cd90b},
+ {0xc94c1ed7, 0x00000005, 0x0000038b, 0x9e9fdb22, 0x144a9178},
+ {0x1aad454e, 0x00000025, 0x000002b2, 0xc3f6315c, 0x5c7a35b3},
+ {0xa4fec9a6, 0x00000000, 0x000006d6, 0x90be5080, 0xa4107605},
+ {0x1bbe71e2, 0x0000001f, 0x000002fd, 0x4e504c3b, 0x284ccaf1},
+ {0x4201c7e4, 0x00000002, 0x000002b7, 0x7822e3f9, 0x0cc912a9},
+ {0x23fddc96, 0x00000003, 0x00000627, 0x8a385125, 0x07767e78},
+ {0xd82ba25c, 0x00000016, 0x0000063e, 0x98e4148a, 0x283330c9},
+ {0x786f2032, 0x0000002d, 0x0000060f, 0xf201600a, 0xf561bfcd},
+ {0xfebe4e1f, 0x0000002a, 0x000004f2, 0x95e51961, 0xfd80dcab},
+ {0x1a6e0a39, 0x00000008, 0x00000672, 0x8af6c2a5, 0x78dd84cb},
+ {0x56000ab8, 0x0000000e, 0x000000e5, 0x36bacb8f, 0x22ee1f77},
+ {0x4717fe0c, 0x00000000, 0x000006ec, 0x8439f342, 0x5c8e03da},
+ {0xd5d5d68e, 0x0000003c, 0x000003a3, 0x46fff083, 0x177d1b39},
+ {0xc25dd6c6, 0x00000024, 0x000006c0, 0x5ceb8eb4, 0x892b0d16},
+ {0xe9b11300, 0x00000023, 0x00000683, 0x07a5d59a, 0x6c6a3208},
+ {0x95cd285e, 0x00000001, 0x00000047, 0x7b3a4368, 0x0202c07e},
+ {0xd9245a25, 0x0000001e, 0x000003a6, 0xd33c1841, 0x1936c0d5},
+ {0x103279db, 0x00000006, 0x0000039b, 0xca09b8a0, 0x77d62892},
+ {0x1cba3172, 0x00000027, 0x000001c8, 0xcb377194, 0xebe682db},
+ {0x8f613739, 0x0000000c, 0x000001df, 0xb4b0bc87, 0x7710bd43},
+ {0x1c6aa90d, 0x0000001b, 0x0000053c, 0x70559245, 0xda7894ac},
+ {0xaabe5b93, 0x0000003d, 0x00000715, 0xcdbf42fa, 0x0c3b99e7},
+ {0xf15dd038, 0x00000006, 0x000006db, 0x6e104aea, 0x8d5967f2},
+ {0x584dd49c, 0x00000020, 0x000007bc, 0x36b6cfd6, 0xad4e23b2},
+ {0x5d8c9506, 0x00000020, 0x00000470, 0x4c62378e, 0x31d92640},
+ {0xb80d17b0, 0x00000032, 0x00000346, 0x22a5bb88, 0x9a7ec89f},
+ {0xdaf0592e, 0x00000023, 0x000007b0, 0x3cab3f99, 0x9b1fdd99},
+ {0x4793cc85, 0x0000000d, 0x00000706, 0xe82e04f6, 0xed3db6b7},
+ {0x82ebf64e, 0x00000009, 0x000007c3, 0x69d590a9, 0x9efa8499},
+ {0xb18a0319, 0x00000026, 0x000007db, 0x1cf98dcc, 0x8fa9ad6a},
+};
+
+#include <linux/time.h>
+
+static int __init crc32_init(void)
{
- while (len--)
- *buf++ = (unsigned char) random();
-}
+ int i;
+ int errors = 0;
+ int bytes = 0;
+ struct timespec start, stop;
+ u64 nsec;
+ unsigned long flags;

-#if 0 /* Not used at present */
-static void store_le(u32 x, unsigned char *buf)
-{
- buf[0] = (unsigned char) x;
- buf[1] = (unsigned char) (x >> 8);
- buf[2] = (unsigned char) (x >> 16);
- buf[3] = (unsigned char) (x >> 24);
-}
-#endif
+ /* keep static to prevent cache warming code from
+ * getting eliminated by the compiler */
+ static u32 crc;

-static void store_be(u32 x, unsigned char *buf)
-{
- buf[0] = (unsigned char) (x >> 24);
- buf[1] = (unsigned char) (x >> 16);
- buf[2] = (unsigned char) (x >> 8);
- buf[3] = (unsigned char) x;
-}
+ /* pre-warm the cache */
+ for (i = 0; i < 100; i++) {
+ bytes += 2*test[i].length;

-/*
- * This checks that CRC(buf + CRC(buf)) = 0, and that
- * CRC commutes with bit-reversal. This has the side effect
- * of bytewise bit-reversing the input buffer, and returns
- * the CRC of the reversed buffer.
- */
-static u32 test_step(u32 init, unsigned char *buf, size_t len)
-{
- u32 crc1, crc2;
- size_t i;
-
- crc1 = crc32_be(init, buf, len);
- store_be(crc1, buf + len);
- crc2 = crc32_be(init, buf, len + 4);
- if (crc2)
- printf("\nCRC cancellation fail: 0x%08x should be 0\n",
- crc2);
-
- for (i = 0; i <= len + 4; i++) {
- crc2 = crc32_be(init, buf, i);
- crc2 = crc32_be(crc2, buf + i, len + 4 - i);
- if (crc2)
- printf("\nCRC split fail: 0x%08x\n", crc2);
+ crc ^= crc32_le(test[i].crc, test_buf +
+ test[i].start, test[i].length);
+
+ crc ^= crc32_be(test[i].crc, test_buf +
+ test[i].start, test[i].length);
}

- /* Now swap it around for the other test */
-
- bytereverse(buf, len + 4);
- init = bitrev32(init);
- crc2 = bitrev32(crc1);
- if (crc1 != bitrev32(crc2))
- printf("\nBit reversal fail: 0x%08x -> 0x%08x -> 0x%08x\n",
- crc1, crc2, bitrev32(crc2));
- crc1 = crc32_le(init, buf, len);
- if (crc1 != crc2)
- printf("\nCRC endianness fail: 0x%08x != 0x%08x\n", crc1,
- crc2);
- crc2 = crc32_le(init, buf, len + 4);
- if (crc2)
- printf("\nCRC cancellation fail: 0x%08x should be 0\n",
- crc2);
-
- for (i = 0; i <= len + 4; i++) {
- crc2 = crc32_le(init, buf, i);
- crc2 = crc32_le(crc2, buf + i, len + 4 - i);
- if (crc2)
- printf("\nCRC split fail: 0x%08x\n", crc2);
+ /* reduce OS noise */
+ local_irq_save(flags);
+ local_irq_disable();
+
+ getnstimeofday(&start);
+ for (i = 0; i < 100; i++) {
+ if (test[i].crc_le != crc32_le(test[i].crc, test_buf +
+ test[i].start, test[i].length))
+ errors++;
+
+ if (test[i].crc_be != crc32_be(test[i].crc, test_buf +
+ test[i].start, test[i].length))
+ errors++;
}
+ getnstimeofday(&stop);

- return crc1;
-}
+ local_irq_restore(flags);
+ local_irq_enable();

-#define SIZE 64
-#define INIT1 0
-#define INIT2 0
+ nsec = stop.tv_nsec - start.tv_nsec +
+ 1000000000 * (stop.tv_sec - start.tv_sec);

-int main(void)
-{
- unsigned char buf1[SIZE + 4];
- unsigned char buf2[SIZE + 4];
- unsigned char buf3[SIZE + 4];
- int i, j;
- u32 crc1, crc2, crc3;
-
- for (i = 0; i <= SIZE; i++) {
- printf("\rTesting length %d...", i);
- fflush(stdout);
- random_garbage(buf1, i);
- random_garbage(buf2, i);
- for (j = 0; j < i; j++)
- buf3[j] = buf1[j] ^ buf2[j];
-
- crc1 = test_step(INIT1, buf1, i);
- crc2 = test_step(INIT2, buf2, i);
- /* Now check that CRC(buf1 ^ buf2) = CRC(buf1) ^ CRC(buf2) */
- crc3 = test_step(INIT1 ^ INIT2, buf3, i);
- if (crc3 != (crc1 ^ crc2))
- printf("CRC XOR fail: 0x%08x != 0x%08x ^ 0x%08x\n",
- crc3, crc1, crc2);
+ pr_info("crc32: CRC_LE_BITS = %d, CRC_BE BITS = %d\n",
+ CRC_LE_BITS, CRC_BE_BITS);
+
+ if (errors)
+ pr_warn("crc32: %d self tests failed\n", errors);
+ else {
+ pr_info("crc32: self tests passed, processed %d bytes in %lld nsec\n",
+ bytes, nsec);
}
- printf("\nAll test complete. No failures expected.\n");
+
return 0;
}

-#endif /* UNITTEST */
+static void __exit crc32_exit(void)
+{
+}
+
+module_init(crc32_init);
+module_exit(crc32_exit);
+#endif /* CONFIG_CRC32_SELFTEST */


2011-12-01 20:14:40

by djwong

[permalink] [raw]
Subject: [PATCH 07/14] crc32: Make CRC_*_BITS definition correspond to actual bit counts

crc32.c provides a choice of one of several algorithms for
computing the LSB and LSB versions of the CRC32 checksum
based on the parameters CRC_LE_BITS and CRC_BE_BITS. In the
original version the values 1, 2, 4 and 8 respectively selected
versions of the alrogithm that computed the crc 1, 2, 4 and 32
bits as a time. This patch series adds a new version that computes
the CRC 64 bits at a time. To make things easier to understand
the parameter has been reinterpreted to actually stand for the
number of bits processed in each step of the algorithm so that
the old value 8 has been replaced with the value 32. This also
allows us to add in a widely used crc algorithm that
computes the crc 8 bits at a time called the Sarwate algorithm.

From: Bob Pearson <[email protected]>
Signed-off-by: Bob Pearson <[email protected]>
[[email protected]: Minor changelog tweaks]
Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/crc32.c | 17 ++++++++++++++---
lib/crc32defs.h | 18 ++++++++++--------
lib/gen_crc32table.c | 11 ++++++++++-
3 files changed, 34 insertions(+), 12 deletions(-)


diff --git a/lib/crc32.c b/lib/crc32.c
index ff6bb9a..157b35f 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -27,13 +27,13 @@
#include <linux/types.h>
#include "crc32defs.h"

-#if CRC_LE_BITS == 8
+#if CRC_LE_BITS > 8
# define tole(x) (__force u32) __constant_cpu_to_le32(x)
#else
# define tole(x) (x)
#endif

-#if CRC_BE_BITS == 8
+#if CRC_BE_BITS > 8
# define tobe(x) (__force u32) __constant_cpu_to_be32(x)
#else
# define tobe(x) (x)
@@ -45,7 +45,7 @@ MODULE_AUTHOR("Matt Domsch <[email protected]>");
MODULE_DESCRIPTION("Ethernet CRC32 calculations");
MODULE_LICENSE("GPL");

-#if CRC_LE_BITS == 8 || CRC_BE_BITS == 8
+#if CRC_LE_BITS > 8 || CRC_BE_BITS > 8

static inline u32
crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
@@ -126,6 +126,12 @@ u32 __pure crc32_le(u32 crc, unsigned char const *p, size_t len)
crc = (crc >> 4) ^ crc32table_le[0][crc & 15];
}
# elif CRC_LE_BITS == 8
+ /* aka Sarwate algorithm */
+ while (len--) {
+ crc ^= *p++;
+ crc = (crc >> 8) ^ crc32table_le[0][crc & 255];
+ }
+# else
const u32 (*tab)[] = crc32table_le;

crc = (__force u32) __cpu_to_le32(crc);
@@ -169,6 +175,11 @@ u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len)
crc = (crc << 4) ^ crc32table_be[0][crc >> 28];
}
# elif CRC_BE_BITS == 8
+ while (len--) {
+ crc ^= *p++ << 24;
+ crc = (crc << 8) ^ crc32table_be[0][crc >> 24];
+ }
+# else
const u32 (*tab)[] = crc32table_be;

crc = (__force u32) __cpu_to_be32(crc);
diff --git a/lib/crc32defs.h b/lib/crc32defs.h
index f5a5401..daa3a5e 100644
--- a/lib/crc32defs.h
+++ b/lib/crc32defs.h
@@ -6,27 +6,29 @@
#define CRCPOLY_LE 0xedb88320
#define CRCPOLY_BE 0x04c11db7

-/* How many bits at a time to use. Requires a table of 4<<CRC_xx_BITS bytes. */
-/* For less performance-sensitive, use 4 */
+/* How many bits at a time to use. Valid values are 1, 2, 4, 8, and 32. */
+/* For less performance-sensitive, use 4 or 8 */
#ifndef CRC_LE_BITS
-# define CRC_LE_BITS 8
+# define CRC_LE_BITS 32
#endif
#ifndef CRC_BE_BITS
-# define CRC_BE_BITS 8
+# define CRC_BE_BITS 32
#endif

/*
* Little-endian CRC computation. Used with serial bit streams sent
* lsbit-first. Be sure to use cpu_to_le32() to append the computed CRC.
*/
-#if CRC_LE_BITS > 8 || CRC_LE_BITS < 1 || CRC_LE_BITS & CRC_LE_BITS-1
-# error CRC_LE_BITS must be a power of 2 between 1 and 8
+#if CRC_LE_BITS > 32 || CRC_LE_BITS < 1 || CRC_LE_BITS == 16 || \
+ CRC_LE_BITS & CRC_LE_BITS-1
+# error "CRC_LE_BITS must be one of {1, 2, 4, 8, 32}"
#endif

/*
* Big-endian CRC computation. Used with serial bit streams sent
* msbit-first. Be sure to use cpu_to_be32() to append the computed CRC.
*/
-#if CRC_BE_BITS > 8 || CRC_BE_BITS < 1 || CRC_BE_BITS & CRC_BE_BITS-1
-# error CRC_BE_BITS must be a power of 2 between 1 and 8
+#if CRC_BE_BITS > 32 || CRC_BE_BITS < 1 || CRC_BE_BITS == 16 || \
+ CRC_BE_BITS & CRC_BE_BITS-1
+# error "CRC_BE_BITS must be one of {1, 2, 4, 8, 32}"
#endif
diff --git a/lib/gen_crc32table.c b/lib/gen_crc32table.c
index eced769..99ac744 100644
--- a/lib/gen_crc32table.c
+++ b/lib/gen_crc32table.c
@@ -4,8 +4,17 @@

#define ENTRIES_PER_LINE 4

+#if CRC_LE_BITS <= 8
#define LE_TABLE_SIZE (1 << CRC_LE_BITS)
+#else
+#define LE_TABLE_SIZE 256
+#endif
+
+#if CRC_BE_BITS <= 8
#define BE_TABLE_SIZE (1 << CRC_BE_BITS)
+#else
+#define BE_TABLE_SIZE 256
+#endif

static uint32_t crc32table_le[4][256];
static uint32_t crc32table_be[4][256];
@@ -24,7 +33,7 @@ static void crc32init_le(void)

crc32table_le[0][0] = 0;

- for (i = 1 << (CRC_LE_BITS - 1); i; i >>= 1) {
+ for (i = LE_TABLE_SIZE >> 1; i; i >>= 1) {
crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0);
for (j = 0; j < LE_TABLE_SIZE; j += 2 * i)
crc32table_le[0][i + j] = crc ^ crc32table_le[0][j];

2011-12-01 20:14:24

by djwong

[permalink] [raw]
Subject: [PATCH 06/14] crc32: Fix mixing of endian-specific types

crc32.c in its original version freely mixed u32, __le32 and __be32 types
which caused warnings from sparse with __CHECK_ENDIAN__.
This patch fixes these by forcing the types to u32.

From: Bob Pearson <[email protected]>
Signed-off-by: Bob Pearson <[email protected]>
[[email protected]: Minor changelog tweaks]
Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/crc32.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)


diff --git a/lib/crc32.c b/lib/crc32.c
index 2a87ea2..ff6bb9a 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -28,13 +28,13 @@
#include "crc32defs.h"

#if CRC_LE_BITS == 8
-# define tole(x) __constant_cpu_to_le32(x)
+# define tole(x) (__force u32) __constant_cpu_to_le32(x)
#else
# define tole(x) (x)
#endif

#if CRC_BE_BITS == 8
-# define tobe(x) __constant_cpu_to_be32(x)
+# define tobe(x) (__force u32) __constant_cpu_to_be32(x)
#else
# define tobe(x) (x)
#endif
@@ -128,9 +128,9 @@ u32 __pure crc32_le(u32 crc, unsigned char const *p, size_t len)
# elif CRC_LE_BITS == 8
const u32 (*tab)[] = crc32table_le;

- crc = __cpu_to_le32(crc);
+ crc = (__force u32) __cpu_to_le32(crc);
crc = crc32_body(crc, p, len, tab);
- crc = __le32_to_cpu(crc);
+ crc = __le32_to_cpu((__force __le32)crc);
#endif
return crc;
}
@@ -171,9 +171,9 @@ u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len)
# elif CRC_BE_BITS == 8
const u32 (*tab)[] = crc32table_be;

- crc = __cpu_to_be32(crc);
+ crc = (__force u32) __cpu_to_be32(crc);
crc = crc32_body(crc, p, len, tab);
- crc = __be32_to_cpu(crc);
+ crc = __be32_to_cpu((__force __be32)crc);
# endif
return crc;
}

2011-12-01 20:14:37

by djwong

[permalink] [raw]
Subject: [PATCH 08/14] crc32: Add slice-by-8 algorithm to existing code

add slicing-by-8 algorithm to the existing
slicing-by-4 algorithm. This consists of:
- extend largest BITS size from 32 to 64
- extend tables from tab[4][256] to up to tab[8][256]
- Add code for inner loop.

From: Bob Pearson <[email protected]>
Signed-off-by: Bob Pearson <[email protected]>
[[email protected]: Minor changelog tweaks]
Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/crc32.c | 40 ++++++++++++++++++++++++++++------------
lib/crc32defs.h | 29 +++++++++++++++++++++--------
lib/gen_crc32table.c | 43 +++++++++++++++++++++++++++----------------
3 files changed, 76 insertions(+), 36 deletions(-)


diff --git a/lib/crc32.c b/lib/crc32.c
index 157b35f..6311712 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -47,25 +47,28 @@ MODULE_LICENSE("GPL");

#if CRC_LE_BITS > 8 || CRC_BE_BITS > 8

+/* implements slicing-by-4 or slicing-by-8 algorithm */
static inline u32
crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
{
# ifdef __LITTLE_ENDIAN
# define DO_CRC(x) (crc = t0[(crc ^ (x)) & 255] ^ (crc >> 8))
-# define DO_CRC4 crc = t3[(crc) & 255] ^ \
- t2[(crc >> 8) & 255] ^ \
- t1[(crc >> 16) & 255] ^ \
- t0[(crc >> 24) & 255]
+# define DO_CRC4 (t3[(q) & 255] ^ t2[(q >> 8) & 255] ^ \
+ t1[(q >> 16) & 255] ^ t0[(q >> 24) & 255])
+# define DO_CRC8 (t7[(q) & 255] ^ t6[(q >> 8) & 255] ^ \
+ t5[(q >> 16) & 255] ^ t4[(q >> 24) & 255])
# else
# define DO_CRC(x) (crc = t0[((crc >> 24) ^ (x)) & 255] ^ (crc << 8))
-# define DO_CRC4 crc = t0[(crc) & 255] ^ \
- t1[(crc >> 8) & 255] ^ \
- t2[(crc >> 16) & 255] ^ \
- t3[(crc >> 24) & 255]
+# define DO_CRC4 (t0[(q) & 255] ^ t1[(q >> 8) & 255] ^ \
+ t2[(q >> 16) & 255] ^ t3[(q >> 24) & 255])
+# define DO_CRC8 (t4[(q) & 255] ^ t5[(q >> 8) & 255] ^ \
+ t6[(q >> 16) & 255] ^ t7[(q >> 24) & 255])
# endif
const u32 *b;
- size_t rem_len;
+ size_t rem_len;
const u32 *t0 = tab[0], *t1 = tab[1], *t2 = tab[2], *t3 = tab[3];
+ const u32 *t4 = tab[4], *t5 = tab[5], *t6 = tab[6], *t7 = tab[7];
+ u32 q;

/* Align it */
if (unlikely((long)buf & 3 && len)) {
@@ -73,13 +76,25 @@ crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
DO_CRC(*buf++);
} while ((--len) && ((long)buf)&3);
}
+
+# if CRC_LE_BITS == 32
rem_len = len & 3;
- /* load data 32 bits wide, xor data 32 bits wide. */
len = len >> 2;
+# else
+ rem_len = len & 7;
+ len = len >> 3;
+# endif
+
b = (const u32 *)buf;
for (--b; len; --len) {
- crc ^= *++b; /* use pre increment for speed */
- DO_CRC4;
+ q = crc ^ *++b; /* use pre increment for speed */
+# if CRC_LE_BITS == 32
+ crc = DO_CRC4;
+# else
+ crc = DO_CRC8;
+ q = *++b;
+ crc ^= DO_CRC4;
+# endif
}
len = rem_len;
/* And the last few bytes */
@@ -92,6 +107,7 @@ crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
return crc;
#undef DO_CRC
#undef DO_CRC4
+#undef DO_CRC8
}
#endif

diff --git a/lib/crc32defs.h b/lib/crc32defs.h
index daa3a5e..8181592 100644
--- a/lib/crc32defs.h
+++ b/lib/crc32defs.h
@@ -6,29 +6,42 @@
#define CRCPOLY_LE 0xedb88320
#define CRCPOLY_BE 0x04c11db7

-/* How many bits at a time to use. Valid values are 1, 2, 4, 8, and 32. */
-/* For less performance-sensitive, use 4 or 8 */
+/*
+ * How many bits at a time to use. Valid values are 1, 2, 4, 8, 32 and 64.
+ * For less performance-sensitive, use 4 or 8 to save table size.
+ * For larger systems choose same as CPU architecture as default.
+ * This works well on X86_64, SPARC64 systems. This may require some
+ * elaboration after experiments with other architectures.
+ */
#ifndef CRC_LE_BITS
-# define CRC_LE_BITS 32
+# ifdef CONFIG_64BIT
+# define CRC_LE_BITS 64
+# else
+# define CRC_LE_BITS 32
+# endif
#endif
#ifndef CRC_BE_BITS
-# define CRC_BE_BITS 32
+# ifdef CONFIG_64BIT
+# define CRC_BE_BITS 64
+# else
+# define CRC_BE_BITS 32
+# endif
#endif

/*
* Little-endian CRC computation. Used with serial bit streams sent
* lsbit-first. Be sure to use cpu_to_le32() to append the computed CRC.
*/
-#if CRC_LE_BITS > 32 || CRC_LE_BITS < 1 || CRC_LE_BITS == 16 || \
+#if CRC_LE_BITS > 64 || CRC_LE_BITS < 1 || CRC_LE_BITS == 16 || \
CRC_LE_BITS & CRC_LE_BITS-1
-# error "CRC_LE_BITS must be one of {1, 2, 4, 8, 32}"
+# error "CRC_LE_BITS must be one of {1, 2, 4, 8, 32, 64}"
#endif

/*
* Big-endian CRC computation. Used with serial bit streams sent
* msbit-first. Be sure to use cpu_to_be32() to append the computed CRC.
*/
-#if CRC_BE_BITS > 32 || CRC_BE_BITS < 1 || CRC_BE_BITS == 16 || \
+#if CRC_BE_BITS > 64 || CRC_BE_BITS < 1 || CRC_BE_BITS == 16 || \
CRC_BE_BITS & CRC_BE_BITS-1
-# error "CRC_BE_BITS must be one of {1, 2, 4, 8, 32}"
+# error "CRC_BE_BITS must be one of {1, 2, 4, 8, 32, 64}"
#endif
diff --git a/lib/gen_crc32table.c b/lib/gen_crc32table.c
index 99ac744..0d9edd1 100644
--- a/lib/gen_crc32table.c
+++ b/lib/gen_crc32table.c
@@ -1,23 +1,28 @@
#include <stdio.h>
+#include "../include/generated/autoconf.h"
#include "crc32defs.h"
#include <inttypes.h>

#define ENTRIES_PER_LINE 4

-#if CRC_LE_BITS <= 8
-#define LE_TABLE_SIZE (1 << CRC_LE_BITS)
+#if CRC_LE_BITS > 8
+# define LE_TABLE_ROWS (CRC_LE_BITS/8)
+# define LE_TABLE_SIZE 256
#else
-#define LE_TABLE_SIZE 256
+# define LE_TABLE_ROWS 1
+# define LE_TABLE_SIZE (1 << CRC_LE_BITS)
#endif

-#if CRC_BE_BITS <= 8
-#define BE_TABLE_SIZE (1 << CRC_BE_BITS)
+#if CRC_BE_BITS > 8
+# define BE_TABLE_ROWS (CRC_BE_BITS/8)
+# define BE_TABLE_SIZE 256
#else
-#define BE_TABLE_SIZE 256
+# define BE_TABLE_ROWS 1
+# define BE_TABLE_SIZE (1 << CRC_BE_BITS)
#endif

-static uint32_t crc32table_le[4][256];
-static uint32_t crc32table_be[4][256];
+static uint32_t crc32table_le[LE_TABLE_ROWS][256];
+static uint32_t crc32table_be[BE_TABLE_ROWS][256];

/**
* crc32init_le() - allocate and initialize LE table data
@@ -40,7 +45,7 @@ static void crc32init_le(void)
}
for (i = 0; i < LE_TABLE_SIZE; i++) {
crc = crc32table_le[0][i];
- for (j = 1; j < 4; j++) {
+ for (j = 1; j < LE_TABLE_ROWS; j++) {
crc = crc32table_le[0][crc & 0xff] ^ (crc >> 8);
crc32table_le[j][i] = crc;
}
@@ -64,18 +69,18 @@ static void crc32init_be(void)
}
for (i = 0; i < BE_TABLE_SIZE; i++) {
crc = crc32table_be[0][i];
- for (j = 1; j < 4; j++) {
+ for (j = 1; j < BE_TABLE_ROWS; j++) {
crc = crc32table_be[0][(crc >> 24) & 0xff] ^ (crc << 8);
crc32table_be[j][i] = crc;
}
}
}

-static void output_table(uint32_t (*table)[256], int len, char *trans)
+static void output_table(uint32_t (*table)[256], int rows, int len, char *trans)
{
int i, j;

- for (j = 0 ; j < 4; j++) {
+ for (j = 0 ; j < rows; j++) {
printf("{");
for (i = 0; i < len - 1; i++) {
if (i % ENTRIES_PER_LINE == 0)
@@ -92,15 +97,21 @@ int main(int argc, char** argv)

if (CRC_LE_BITS > 1) {
crc32init_le();
- printf("static const u32 crc32table_le[4][256] = {");
- output_table(crc32table_le, LE_TABLE_SIZE, "tole");
+ printf("static const u32 __cacheline_aligned "
+ "crc32table_le[%d][%d] = {",
+ LE_TABLE_ROWS, LE_TABLE_SIZE);
+ output_table(crc32table_le, LE_TABLE_ROWS,
+ LE_TABLE_SIZE, "tole");
printf("};\n");
}

if (CRC_BE_BITS > 1) {
crc32init_be();
- printf("static const u32 crc32table_be[4][256] = {");
- output_table(crc32table_be, BE_TABLE_SIZE, "tobe");
+ printf("static const u32 __cacheline_aligned "
+ "crc32table_be[%d][%d] = {",
+ BE_TABLE_ROWS, BE_TABLE_SIZE);
+ output_table(crc32table_be, LE_TABLE_ROWS,
+ BE_TABLE_SIZE, "tobe");
printf("};\n");
}


2011-12-01 20:15:03

by djwong

[permalink] [raw]
Subject: [PATCH 12/14] crypto: crc32c should use library implementation

Since lib/crc32.c now provides crc32c, remove the software implementation here
and call the library function instead.

Signed-off-by: Darrick J. Wong <[email protected]>
---
crypto/Kconfig | 1 +
crypto/crc32c.c | 94 ++-----------------------------------------------------
2 files changed, 4 insertions(+), 91 deletions(-)


diff --git a/crypto/Kconfig b/crypto/Kconfig
index 527a857..4c9e93a 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -310,6 +310,7 @@ comment "Digest"
config CRYPTO_CRC32C
tristate "CRC32c CRC algorithm"
select CRYPTO_HASH
+ select CRC32
help
Castagnoli, et al Cyclic Redundancy-Check Algorithm. Used
by iSCSI for header and data digests and by others.
diff --git a/crypto/crc32c.c b/crypto/crc32c.c
index 3f9ad28..06f7018 100644
--- a/crypto/crc32c.c
+++ b/crypto/crc32c.c
@@ -40,6 +40,7 @@
#include <linux/module.h>
#include <linux/string.h>
#include <linux/kernel.h>
+#include <linux/crc32.h>

#define CHKSUM_BLOCK_SIZE 1
#define CHKSUM_DIGEST_SIZE 4
@@ -53,95 +54,6 @@ struct chksum_desc_ctx {
};

/*
- * This is the CRC-32C table
- * Generated with:
- * width = 32 bits
- * poly = 0x1EDC6F41
- * reflect input bytes = true
- * reflect output bytes = true
- */
-
-static const u32 crc32c_table[256] = {
- 0x00000000L, 0xF26B8303L, 0xE13B70F7L, 0x1350F3F4L,
- 0xC79A971FL, 0x35F1141CL, 0x26A1E7E8L, 0xD4CA64EBL,
- 0x8AD958CFL, 0x78B2DBCCL, 0x6BE22838L, 0x9989AB3BL,
- 0x4D43CFD0L, 0xBF284CD3L, 0xAC78BF27L, 0x5E133C24L,
- 0x105EC76FL, 0xE235446CL, 0xF165B798L, 0x030E349BL,
- 0xD7C45070L, 0x25AFD373L, 0x36FF2087L, 0xC494A384L,
- 0x9A879FA0L, 0x68EC1CA3L, 0x7BBCEF57L, 0x89D76C54L,
- 0x5D1D08BFL, 0xAF768BBCL, 0xBC267848L, 0x4E4DFB4BL,
- 0x20BD8EDEL, 0xD2D60DDDL, 0xC186FE29L, 0x33ED7D2AL,
- 0xE72719C1L, 0x154C9AC2L, 0x061C6936L, 0xF477EA35L,
- 0xAA64D611L, 0x580F5512L, 0x4B5FA6E6L, 0xB93425E5L,
- 0x6DFE410EL, 0x9F95C20DL, 0x8CC531F9L, 0x7EAEB2FAL,
- 0x30E349B1L, 0xC288CAB2L, 0xD1D83946L, 0x23B3BA45L,
- 0xF779DEAEL, 0x05125DADL, 0x1642AE59L, 0xE4292D5AL,
- 0xBA3A117EL, 0x4851927DL, 0x5B016189L, 0xA96AE28AL,
- 0x7DA08661L, 0x8FCB0562L, 0x9C9BF696L, 0x6EF07595L,
- 0x417B1DBCL, 0xB3109EBFL, 0xA0406D4BL, 0x522BEE48L,
- 0x86E18AA3L, 0x748A09A0L, 0x67DAFA54L, 0x95B17957L,
- 0xCBA24573L, 0x39C9C670L, 0x2A993584L, 0xD8F2B687L,
- 0x0C38D26CL, 0xFE53516FL, 0xED03A29BL, 0x1F682198L,
- 0x5125DAD3L, 0xA34E59D0L, 0xB01EAA24L, 0x42752927L,
- 0x96BF4DCCL, 0x64D4CECFL, 0x77843D3BL, 0x85EFBE38L,
- 0xDBFC821CL, 0x2997011FL, 0x3AC7F2EBL, 0xC8AC71E8L,
- 0x1C661503L, 0xEE0D9600L, 0xFD5D65F4L, 0x0F36E6F7L,
- 0x61C69362L, 0x93AD1061L, 0x80FDE395L, 0x72966096L,
- 0xA65C047DL, 0x5437877EL, 0x4767748AL, 0xB50CF789L,
- 0xEB1FCBADL, 0x197448AEL, 0x0A24BB5AL, 0xF84F3859L,
- 0x2C855CB2L, 0xDEEEDFB1L, 0xCDBE2C45L, 0x3FD5AF46L,
- 0x7198540DL, 0x83F3D70EL, 0x90A324FAL, 0x62C8A7F9L,
- 0xB602C312L, 0x44694011L, 0x5739B3E5L, 0xA55230E6L,
- 0xFB410CC2L, 0x092A8FC1L, 0x1A7A7C35L, 0xE811FF36L,
- 0x3CDB9BDDL, 0xCEB018DEL, 0xDDE0EB2AL, 0x2F8B6829L,
- 0x82F63B78L, 0x709DB87BL, 0x63CD4B8FL, 0x91A6C88CL,
- 0x456CAC67L, 0xB7072F64L, 0xA457DC90L, 0x563C5F93L,
- 0x082F63B7L, 0xFA44E0B4L, 0xE9141340L, 0x1B7F9043L,
- 0xCFB5F4A8L, 0x3DDE77ABL, 0x2E8E845FL, 0xDCE5075CL,
- 0x92A8FC17L, 0x60C37F14L, 0x73938CE0L, 0x81F80FE3L,
- 0x55326B08L, 0xA759E80BL, 0xB4091BFFL, 0x466298FCL,
- 0x1871A4D8L, 0xEA1A27DBL, 0xF94AD42FL, 0x0B21572CL,
- 0xDFEB33C7L, 0x2D80B0C4L, 0x3ED04330L, 0xCCBBC033L,
- 0xA24BB5A6L, 0x502036A5L, 0x4370C551L, 0xB11B4652L,
- 0x65D122B9L, 0x97BAA1BAL, 0x84EA524EL, 0x7681D14DL,
- 0x2892ED69L, 0xDAF96E6AL, 0xC9A99D9EL, 0x3BC21E9DL,
- 0xEF087A76L, 0x1D63F975L, 0x0E330A81L, 0xFC588982L,
- 0xB21572C9L, 0x407EF1CAL, 0x532E023EL, 0xA145813DL,
- 0x758FE5D6L, 0x87E466D5L, 0x94B49521L, 0x66DF1622L,
- 0x38CC2A06L, 0xCAA7A905L, 0xD9F75AF1L, 0x2B9CD9F2L,
- 0xFF56BD19L, 0x0D3D3E1AL, 0x1E6DCDEEL, 0xEC064EEDL,
- 0xC38D26C4L, 0x31E6A5C7L, 0x22B65633L, 0xD0DDD530L,
- 0x0417B1DBL, 0xF67C32D8L, 0xE52CC12CL, 0x1747422FL,
- 0x49547E0BL, 0xBB3FFD08L, 0xA86F0EFCL, 0x5A048DFFL,
- 0x8ECEE914L, 0x7CA56A17L, 0x6FF599E3L, 0x9D9E1AE0L,
- 0xD3D3E1ABL, 0x21B862A8L, 0x32E8915CL, 0xC083125FL,
- 0x144976B4L, 0xE622F5B7L, 0xF5720643L, 0x07198540L,
- 0x590AB964L, 0xAB613A67L, 0xB831C993L, 0x4A5A4A90L,
- 0x9E902E7BL, 0x6CFBAD78L, 0x7FAB5E8CL, 0x8DC0DD8FL,
- 0xE330A81AL, 0x115B2B19L, 0x020BD8EDL, 0xF0605BEEL,
- 0x24AA3F05L, 0xD6C1BC06L, 0xC5914FF2L, 0x37FACCF1L,
- 0x69E9F0D5L, 0x9B8273D6L, 0x88D28022L, 0x7AB90321L,
- 0xAE7367CAL, 0x5C18E4C9L, 0x4F48173DL, 0xBD23943EL,
- 0xF36E6F75L, 0x0105EC76L, 0x12551F82L, 0xE03E9C81L,
- 0x34F4F86AL, 0xC69F7B69L, 0xD5CF889DL, 0x27A40B9EL,
- 0x79B737BAL, 0x8BDCB4B9L, 0x988C474DL, 0x6AE7C44EL,
- 0xBE2DA0A5L, 0x4C4623A6L, 0x5F16D052L, 0xAD7D5351L
-};
-
-/*
- * Steps through buffer one byte at at time, calculates reflected
- * crc using table.
- */
-
-static u32 crc32c(u32 crc, const u8 *data, unsigned int length)
-{
- while (length--)
- crc = crc32c_table[(crc ^ *data++) & 0xFFL] ^ (crc >> 8);
-
- return crc;
-}
-
-/*
* Steps through buffer one byte at at time, calculates reflected
* crc using table.
*/
@@ -179,7 +91,7 @@ static int chksum_update(struct shash_desc *desc, const u8 *data,
{
struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);

- ctx->crc = crc32c(ctx->crc, data, length);
+ ctx->crc = __crc32c_le(ctx->crc, data, length);
return 0;
}

@@ -193,7 +105,7 @@ static int chksum_final(struct shash_desc *desc, u8 *out)

static int __chksum_finup(u32 *crcp, const u8 *data, unsigned int len, u8 *out)
{
- *(__le32 *)out = ~cpu_to_le32(crc32c(*crcp, data, len));
+ *(__le32 *)out = ~cpu_to_le32(__crc32c_le(*crcp, data, len));
return 0;
}


2011-12-01 20:15:17

by djwong

[permalink] [raw]
Subject: [PATCH 14/14] crc32: Select an algorithm via kconfig

Allow the kernel builder to choose a crc32* algorithm for the kernel.

Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/Kconfig | 36 ++++++++++++++++++++++++++++++++++++
lib/crc32defs.h | 18 ++++++++++++++++++
2 files changed, 54 insertions(+), 0 deletions(-)


diff --git a/lib/Kconfig b/lib/Kconfig
index cfddafc..e9b9134 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -70,6 +70,42 @@ config CRC32_SELFTEST
and crc32_be over byte strings with random alignment and length
and computes the total elapsed time and number of bytes processed.

+choice
+ prompt "CRC32 implementation"
+ depends on CRC32
+ default CRC32_SLICEBY8
+
+config CRC32_SLICEBY8
+ bool "Slice by 8 bytes"
+ help
+ Calculate checksum 8 bytes at a time with a clever slicing algorithm.
+ This is the fastest algorithm, but comes with a 8KiB lookup table.
+ Most modern processors have enough cache that this shouldn't be
+ a problem.
+
+ If you don't know which to choose, choose this one.
+
+config CRC32_SLICEBY4
+ bool "Slice by 4 bytes"
+ help
+ Calculate checksum 4 bytes at a time with a clever slicing algorithm.
+ This is a bit slower than slice by 8, but has a smaller 4KiB lookup
+ table.
+
+config CRC32_SARWATE
+ bool "Sarwate's Algorithm (one byte at a time)"
+ help
+ Calculate checksum a byte at a time using Sarwate's algorithm. This
+ is not particularly fast, but has a small 256 byte lookup table.
+
+config CRC32_BIT
+ bool "Classic Algorithm (one bit at a time)"
+ help
+ Calculate checksum one bit at a time. This is VERY slow, but has
+ no lookup table. This is provided as a debugging option.
+
+endchoice
+
config CRC7
tristate "CRC7 functions"
help
diff --git a/lib/crc32defs.h b/lib/crc32defs.h
index 6fd1917..64cba2c 100644
--- a/lib/crc32defs.h
+++ b/lib/crc32defs.h
@@ -13,6 +13,24 @@
*/
#define CRC32C_POLY_LE 0x82F63B78

+/* Try to choose an implementation variant via Kconfig */
+#ifdef CONFIG_CRC32_SLICEBY8
+# define CRC_LE_BITS 64
+# define CRC_BE_BITS 64
+#endif
+#ifdef CONFIG_CRC32_SLICEBY4
+# define CRC_LE_BITS 32
+# define CRC_BE_BITS 32
+#endif
+#ifdef CONFIG_CRC32_SARWATE
+# define CRC_LE_BITS 8
+# define CRC_BE_BITS 8
+#endif
+#ifdef CONFIG_CRC32_BIT
+# define CRC_LE_BITS 1
+# define CRC_BE_BITS 1
+#endif
+
/*
* How many bits at a time to use. Valid values are 1, 2, 4, 8, 32 and 64.
* For less performance-sensitive, use 4 or 8 to save table size.

2011-12-01 20:15:10

by djwong

[permalink] [raw]
Subject: [PATCH 13/14] crc32: Add self-test code for crc32c

Add self-test code for crc32c.

Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/crc32.c | 363 ++++++++++++++++++++++++++++++++++++++++++-----------------
1 files changed, 261 insertions(+), 102 deletions(-)


diff --git a/lib/crc32.c b/lib/crc32.c
index 8df9561..382fa76 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -765,113 +765,265 @@ static struct crc_test {
u32 length; /* random 11 bit length of test */
u32 crc_le; /* expected crc32_le result */
u32 crc_be; /* expected crc32_be result */
+ u32 crc32c_le; /* expected crc32c_le result */
} test[] =
{
- {0x674bf11d, 0x00000038, 0x00000542, 0x0af6d466, 0xd8b6e4c1},
- {0x35c672c6, 0x0000003a, 0x000001aa, 0xc6d3dfba, 0x28aaf3ad},
- {0x496da28e, 0x00000039, 0x000005af, 0xd933660f, 0x5d57e81f},
- {0x09a9b90e, 0x00000027, 0x000001f8, 0xb45fe007, 0xf45fca9a},
- {0xdc97e5a9, 0x00000025, 0x000003b6, 0xf81a3562, 0xe0126ba2},
- {0x47c58900, 0x0000000a, 0x000000b9, 0x8e58eccf, 0xf3afc793},
- {0x292561e8, 0x0000000c, 0x00000403, 0xa2ba8aaf, 0x0b797aed},
- {0x415037f6, 0x00000003, 0x00000676, 0xa17d52e8, 0x7f0fdf35},
- {0x3466e707, 0x00000026, 0x00000042, 0x258319be, 0x75c484a2},
- {0xafd1281b, 0x00000023, 0x000002ee, 0x4428eaf8, 0x06c7ad10},
- {0xd3857b18, 0x00000028, 0x000004a2, 0x5c430821, 0xb062b7cb},
- {0x1d825a8f, 0x0000002b, 0x0000050b, 0xd2c45f0c, 0xd68634e0},
- {0x5033e3bc, 0x0000000b, 0x00000078, 0xa3ea4113, 0xac6d31fb},
- {0x94f1fb5e, 0x0000000f, 0x000003a2, 0xfbfc50b1, 0x3cfe50ed},
- {0xc9a0fe14, 0x00000009, 0x00000473, 0x5fb61894, 0x87070591},
- {0x88a034b1, 0x0000001c, 0x000005ad, 0xc1b16053, 0x46f95c67},
- {0xf0f72239, 0x00000020, 0x0000026d, 0xa6fa58f3, 0xf8c2c1dd},
- {0xcc20a5e3, 0x0000003b, 0x0000067a, 0x7740185a, 0x308b979a},
- {0xce589c95, 0x0000002b, 0x00000641, 0xd055e987, 0x40aae25b},
- {0x78edc885, 0x00000035, 0x000005be, 0xa39cb14b, 0x035b0d1f},
- {0x9d40a377, 0x0000003b, 0x00000038, 0x1f47ccd2, 0x197fbc9d},
- {0x703d0e01, 0x0000003c, 0x000006f1, 0x88735e7c, 0xfed57c5a},
- {0x776bf505, 0x0000000f, 0x000005b2, 0x5cc4fc01, 0xf32efb97},
- {0x4a3e7854, 0x00000027, 0x000004b8, 0x8d923c82, 0x0cbfb4a2},
- {0x209172dd, 0x0000003b, 0x00000356, 0xb89e9c2b, 0xd7868138},
- {0x3ba4cc5b, 0x0000002f, 0x00000203, 0xe51601a9, 0x5b2a1032},
- {0xfc62f297, 0x00000000, 0x00000079, 0x71a8e1a2, 0x5d88685f},
- {0x64280b8b, 0x00000016, 0x000007ab, 0x0fa7a30c, 0xda3a455f},
- {0x97dd724b, 0x00000033, 0x000007ad, 0x5788b2f4, 0xd7326d32},
- {0x61394b52, 0x00000035, 0x00000571, 0xc66525f1, 0xcabe7fef},
- {0x29b4faff, 0x00000024, 0x0000006e, 0xca13751e, 0x993648e0},
- {0x29bfb1dc, 0x0000000b, 0x00000244, 0x436c43f7, 0x429f7a59},
- {0x86ae934b, 0x00000035, 0x00000104, 0x0760ec93, 0x9cf7d0f4},
- {0xc4c1024e, 0x0000002e, 0x000006b1, 0x6516a3ec, 0x19321f9c},
- {0x3287a80a, 0x00000026, 0x00000496, 0x0b257eb1, 0x754ebd51},
- {0xa4db423e, 0x00000023, 0x0000045d, 0x9b3a66dc, 0x873e9f11},
- {0x7a1078df, 0x00000015, 0x0000014a, 0x8c2484c5, 0x6a628659},
- {0x6048bd5b, 0x00000006, 0x0000006a, 0x897e3559, 0xac9961af},
- {0xd8f9ea20, 0x0000003d, 0x00000277, 0x60eb905b, 0xed2aaf99},
- {0xea5ec3b4, 0x0000002a, 0x000004fe, 0x869965dc, 0x6c1f833b},
- {0x2dfb005d, 0x00000016, 0x00000345, 0x6a3b117e, 0xf05e8521},
- {0x5a214ade, 0x00000020, 0x000005b6, 0x467f70be, 0xcb22ccd3},
- {0xf0ab9cca, 0x00000032, 0x00000515, 0xed223df3, 0x7f3ef01d},
- {0x91b444f9, 0x0000002e, 0x000007f8, 0x84e9a983, 0x5676756f},
- {0x1b5d2ddb, 0x0000002e, 0x0000012c, 0xba638c4c, 0x3f42047b},
- {0xd824d1bb, 0x0000003a, 0x000007b5, 0x6288653b, 0x3a3ebea0},
- {0x0470180c, 0x00000034, 0x000001f0, 0x9d5b80d6, 0x3de08195},
- {0xffaa3a3f, 0x00000036, 0x00000299, 0xf3a82ab8, 0x53e0c13d},
- {0x6406cfeb, 0x00000023, 0x00000600, 0xa920b8e8, 0xe4e2acf4},
- {0xb24aaa38, 0x0000003e, 0x000004a1, 0x657cc328, 0x5077b2c3},
- {0x58b2ab7c, 0x00000039, 0x000002b4, 0x3a17ee7e, 0x9dcb3643},
- {0x3db85970, 0x00000006, 0x000002b6, 0x95268b59, 0xb9812c10},
- {0x857830c5, 0x00000003, 0x00000590, 0x4ef439d5, 0xf042161d},
- {0xe1fcd978, 0x0000003e, 0x000007d8, 0xae8d8699, 0xce0a1ef5},
- {0xb982a768, 0x00000016, 0x000006e0, 0x62fad3df, 0x5f8a067b},
- {0x1d581ce8, 0x0000001e, 0x0000058b, 0xf0f5da53, 0x26e39eee},
- {0x2456719b, 0x00000025, 0x00000503, 0x4296ac64, 0xd50e4c14},
- {0xfae6d8f2, 0x00000000, 0x0000055d, 0x057fdf2e, 0x2a31391a},
- {0xcba828e3, 0x00000039, 0x000002ce, 0xe3f22351, 0x8f00877b},
- {0x13d25952, 0x0000000a, 0x0000072d, 0x76d4b4cc, 0x5eb67ec3},
- {0x0342be3f, 0x00000015, 0x00000599, 0xec75d9f1, 0x9d4d2826},
- {0xeaa344e0, 0x00000014, 0x000004d8, 0x72a4c981, 0x2064ea06},
- {0xbbb52021, 0x0000003b, 0x00000272, 0x04af99fc, 0xaf042d35},
- {0xb66384dc, 0x0000001d, 0x000007fc, 0xd7629116, 0x782bd801},
- {0x616c01b6, 0x00000022, 0x000002c8, 0x5b1dab30, 0x783ce7d2},
- {0xce2bdaad, 0x00000016, 0x0000062a, 0x932535c8, 0x3f02926d},
- {0x00fe84d7, 0x00000005, 0x00000205, 0x850e50aa, 0x753d649c},
- {0xbebdcb4c, 0x00000006, 0x0000055d, 0xbeaa37a2, 0x2d8c9eba},
- {0xd8b1a02a, 0x00000010, 0x00000387, 0x5017d2fc, 0x503541a5},
- {0x3b96cad2, 0x00000036, 0x00000347, 0x1d2372ae, 0x926cd90b},
- {0xc94c1ed7, 0x00000005, 0x0000038b, 0x9e9fdb22, 0x144a9178},
- {0x1aad454e, 0x00000025, 0x000002b2, 0xc3f6315c, 0x5c7a35b3},
- {0xa4fec9a6, 0x00000000, 0x000006d6, 0x90be5080, 0xa4107605},
- {0x1bbe71e2, 0x0000001f, 0x000002fd, 0x4e504c3b, 0x284ccaf1},
- {0x4201c7e4, 0x00000002, 0x000002b7, 0x7822e3f9, 0x0cc912a9},
- {0x23fddc96, 0x00000003, 0x00000627, 0x8a385125, 0x07767e78},
- {0xd82ba25c, 0x00000016, 0x0000063e, 0x98e4148a, 0x283330c9},
- {0x786f2032, 0x0000002d, 0x0000060f, 0xf201600a, 0xf561bfcd},
- {0xfebe4e1f, 0x0000002a, 0x000004f2, 0x95e51961, 0xfd80dcab},
- {0x1a6e0a39, 0x00000008, 0x00000672, 0x8af6c2a5, 0x78dd84cb},
- {0x56000ab8, 0x0000000e, 0x000000e5, 0x36bacb8f, 0x22ee1f77},
- {0x4717fe0c, 0x00000000, 0x000006ec, 0x8439f342, 0x5c8e03da},
- {0xd5d5d68e, 0x0000003c, 0x000003a3, 0x46fff083, 0x177d1b39},
- {0xc25dd6c6, 0x00000024, 0x000006c0, 0x5ceb8eb4, 0x892b0d16},
- {0xe9b11300, 0x00000023, 0x00000683, 0x07a5d59a, 0x6c6a3208},
- {0x95cd285e, 0x00000001, 0x00000047, 0x7b3a4368, 0x0202c07e},
- {0xd9245a25, 0x0000001e, 0x000003a6, 0xd33c1841, 0x1936c0d5},
- {0x103279db, 0x00000006, 0x0000039b, 0xca09b8a0, 0x77d62892},
- {0x1cba3172, 0x00000027, 0x000001c8, 0xcb377194, 0xebe682db},
- {0x8f613739, 0x0000000c, 0x000001df, 0xb4b0bc87, 0x7710bd43},
- {0x1c6aa90d, 0x0000001b, 0x0000053c, 0x70559245, 0xda7894ac},
- {0xaabe5b93, 0x0000003d, 0x00000715, 0xcdbf42fa, 0x0c3b99e7},
- {0xf15dd038, 0x00000006, 0x000006db, 0x6e104aea, 0x8d5967f2},
- {0x584dd49c, 0x00000020, 0x000007bc, 0x36b6cfd6, 0xad4e23b2},
- {0x5d8c9506, 0x00000020, 0x00000470, 0x4c62378e, 0x31d92640},
- {0xb80d17b0, 0x00000032, 0x00000346, 0x22a5bb88, 0x9a7ec89f},
- {0xdaf0592e, 0x00000023, 0x000007b0, 0x3cab3f99, 0x9b1fdd99},
- {0x4793cc85, 0x0000000d, 0x00000706, 0xe82e04f6, 0xed3db6b7},
- {0x82ebf64e, 0x00000009, 0x000007c3, 0x69d590a9, 0x9efa8499},
- {0xb18a0319, 0x00000026, 0x000007db, 0x1cf98dcc, 0x8fa9ad6a},
+ {0x674bf11d, 0x00000038, 0x00000542, 0x0af6d466, 0xd8b6e4c1,
+ 0xf6e93d6c},
+ {0x35c672c6, 0x0000003a, 0x000001aa, 0xc6d3dfba, 0x28aaf3ad,
+ 0x0fe92aca},
+ {0x496da28e, 0x00000039, 0x000005af, 0xd933660f, 0x5d57e81f,
+ 0x52e1ebb8},
+ {0x09a9b90e, 0x00000027, 0x000001f8, 0xb45fe007, 0xf45fca9a,
+ 0x0798af9a},
+ {0xdc97e5a9, 0x00000025, 0x000003b6, 0xf81a3562, 0xe0126ba2,
+ 0x18eb3152},
+ {0x47c58900, 0x0000000a, 0x000000b9, 0x8e58eccf, 0xf3afc793,
+ 0xd00d08c7},
+ {0x292561e8, 0x0000000c, 0x00000403, 0xa2ba8aaf, 0x0b797aed,
+ 0x8ba966bc},
+ {0x415037f6, 0x00000003, 0x00000676, 0xa17d52e8, 0x7f0fdf35,
+ 0x11d694a2},
+ {0x3466e707, 0x00000026, 0x00000042, 0x258319be, 0x75c484a2,
+ 0x6ab3208d},
+ {0xafd1281b, 0x00000023, 0x000002ee, 0x4428eaf8, 0x06c7ad10,
+ 0xba4603c5},
+ {0xd3857b18, 0x00000028, 0x000004a2, 0x5c430821, 0xb062b7cb,
+ 0xe6071c6f},
+ {0x1d825a8f, 0x0000002b, 0x0000050b, 0xd2c45f0c, 0xd68634e0,
+ 0x179ec30a},
+ {0x5033e3bc, 0x0000000b, 0x00000078, 0xa3ea4113, 0xac6d31fb,
+ 0x0903beb8},
+ {0x94f1fb5e, 0x0000000f, 0x000003a2, 0xfbfc50b1, 0x3cfe50ed,
+ 0x6a7cb4fa},
+ {0xc9a0fe14, 0x00000009, 0x00000473, 0x5fb61894, 0x87070591,
+ 0xdb535801},
+ {0x88a034b1, 0x0000001c, 0x000005ad, 0xc1b16053, 0x46f95c67,
+ 0x92bed597},
+ {0xf0f72239, 0x00000020, 0x0000026d, 0xa6fa58f3, 0xf8c2c1dd,
+ 0x192a3f1b},
+ {0xcc20a5e3, 0x0000003b, 0x0000067a, 0x7740185a, 0x308b979a,
+ 0xccbaec1a},
+ {0xce589c95, 0x0000002b, 0x00000641, 0xd055e987, 0x40aae25b,
+ 0x7eabae4d},
+ {0x78edc885, 0x00000035, 0x000005be, 0xa39cb14b, 0x035b0d1f,
+ 0x28c72982},
+ {0x9d40a377, 0x0000003b, 0x00000038, 0x1f47ccd2, 0x197fbc9d,
+ 0xc3cd4d18},
+ {0x703d0e01, 0x0000003c, 0x000006f1, 0x88735e7c, 0xfed57c5a,
+ 0xbca8f0e7},
+ {0x776bf505, 0x0000000f, 0x000005b2, 0x5cc4fc01, 0xf32efb97,
+ 0x713f60b3},
+ {0x4a3e7854, 0x00000027, 0x000004b8, 0x8d923c82, 0x0cbfb4a2,
+ 0xebd08fd5},
+ {0x209172dd, 0x0000003b, 0x00000356, 0xb89e9c2b, 0xd7868138,
+ 0x64406c59},
+ {0x3ba4cc5b, 0x0000002f, 0x00000203, 0xe51601a9, 0x5b2a1032,
+ 0x7421890e},
+ {0xfc62f297, 0x00000000, 0x00000079, 0x71a8e1a2, 0x5d88685f,
+ 0xe9347603},
+ {0x64280b8b, 0x00000016, 0x000007ab, 0x0fa7a30c, 0xda3a455f,
+ 0x1bef9060},
+ {0x97dd724b, 0x00000033, 0x000007ad, 0x5788b2f4, 0xd7326d32,
+ 0x34720072},
+ {0x61394b52, 0x00000035, 0x00000571, 0xc66525f1, 0xcabe7fef,
+ 0x48310f59},
+ {0x29b4faff, 0x00000024, 0x0000006e, 0xca13751e, 0x993648e0,
+ 0x783a4213},
+ {0x29bfb1dc, 0x0000000b, 0x00000244, 0x436c43f7, 0x429f7a59,
+ 0x9e8efd41},
+ {0x86ae934b, 0x00000035, 0x00000104, 0x0760ec93, 0x9cf7d0f4,
+ 0xfc3d34a5},
+ {0xc4c1024e, 0x0000002e, 0x000006b1, 0x6516a3ec, 0x19321f9c,
+ 0x17a52ae2},
+ {0x3287a80a, 0x00000026, 0x00000496, 0x0b257eb1, 0x754ebd51,
+ 0x886d935a},
+ {0xa4db423e, 0x00000023, 0x0000045d, 0x9b3a66dc, 0x873e9f11,
+ 0xeaaeaeb2},
+ {0x7a1078df, 0x00000015, 0x0000014a, 0x8c2484c5, 0x6a628659,
+ 0x8e900a4b},
+ {0x6048bd5b, 0x00000006, 0x0000006a, 0x897e3559, 0xac9961af,
+ 0xd74662b1},
+ {0xd8f9ea20, 0x0000003d, 0x00000277, 0x60eb905b, 0xed2aaf99,
+ 0xd26752ba},
+ {0xea5ec3b4, 0x0000002a, 0x000004fe, 0x869965dc, 0x6c1f833b,
+ 0x8b1fcd62},
+ {0x2dfb005d, 0x00000016, 0x00000345, 0x6a3b117e, 0xf05e8521,
+ 0xf54342fe},
+ {0x5a214ade, 0x00000020, 0x000005b6, 0x467f70be, 0xcb22ccd3,
+ 0x5b95b988},
+ {0xf0ab9cca, 0x00000032, 0x00000515, 0xed223df3, 0x7f3ef01d,
+ 0x2e1176be},
+ {0x91b444f9, 0x0000002e, 0x000007f8, 0x84e9a983, 0x5676756f,
+ 0x66120546},
+ {0x1b5d2ddb, 0x0000002e, 0x0000012c, 0xba638c4c, 0x3f42047b,
+ 0xf256a5cc},
+ {0xd824d1bb, 0x0000003a, 0x000007b5, 0x6288653b, 0x3a3ebea0,
+ 0x4af1dd69},
+ {0x0470180c, 0x00000034, 0x000001f0, 0x9d5b80d6, 0x3de08195,
+ 0x56f0a04a},
+ {0xffaa3a3f, 0x00000036, 0x00000299, 0xf3a82ab8, 0x53e0c13d,
+ 0x74f6b6b2},
+ {0x6406cfeb, 0x00000023, 0x00000600, 0xa920b8e8, 0xe4e2acf4,
+ 0x085951fd},
+ {0xb24aaa38, 0x0000003e, 0x000004a1, 0x657cc328, 0x5077b2c3,
+ 0xc65387eb},
+ {0x58b2ab7c, 0x00000039, 0x000002b4, 0x3a17ee7e, 0x9dcb3643,
+ 0x1ca9257b},
+ {0x3db85970, 0x00000006, 0x000002b6, 0x95268b59, 0xb9812c10,
+ 0xfd196d76},
+ {0x857830c5, 0x00000003, 0x00000590, 0x4ef439d5, 0xf042161d,
+ 0x5ef88339},
+ {0xe1fcd978, 0x0000003e, 0x000007d8, 0xae8d8699, 0xce0a1ef5,
+ 0x2c3714d9},
+ {0xb982a768, 0x00000016, 0x000006e0, 0x62fad3df, 0x5f8a067b,
+ 0x58576548},
+ {0x1d581ce8, 0x0000001e, 0x0000058b, 0xf0f5da53, 0x26e39eee,
+ 0xfd7c57de},
+ {0x2456719b, 0x00000025, 0x00000503, 0x4296ac64, 0xd50e4c14,
+ 0xd5fedd59},
+ {0xfae6d8f2, 0x00000000, 0x0000055d, 0x057fdf2e, 0x2a31391a,
+ 0x1cc3b17b},
+ {0xcba828e3, 0x00000039, 0x000002ce, 0xe3f22351, 0x8f00877b,
+ 0x270eed73},
+ {0x13d25952, 0x0000000a, 0x0000072d, 0x76d4b4cc, 0x5eb67ec3,
+ 0x91ecbb11},
+ {0x0342be3f, 0x00000015, 0x00000599, 0xec75d9f1, 0x9d4d2826,
+ 0x05ed8d0c},
+ {0xeaa344e0, 0x00000014, 0x000004d8, 0x72a4c981, 0x2064ea06,
+ 0x0b09ad5b},
+ {0xbbb52021, 0x0000003b, 0x00000272, 0x04af99fc, 0xaf042d35,
+ 0xf8d511fb},
+ {0xb66384dc, 0x0000001d, 0x000007fc, 0xd7629116, 0x782bd801,
+ 0x5ad832cc},
+ {0x616c01b6, 0x00000022, 0x000002c8, 0x5b1dab30, 0x783ce7d2,
+ 0x1214d196},
+ {0xce2bdaad, 0x00000016, 0x0000062a, 0x932535c8, 0x3f02926d,
+ 0x5747218a},
+ {0x00fe84d7, 0x00000005, 0x00000205, 0x850e50aa, 0x753d649c,
+ 0xde8f14de},
+ {0xbebdcb4c, 0x00000006, 0x0000055d, 0xbeaa37a2, 0x2d8c9eba,
+ 0x3563b7b9},
+ {0xd8b1a02a, 0x00000010, 0x00000387, 0x5017d2fc, 0x503541a5,
+ 0x071475d0},
+ {0x3b96cad2, 0x00000036, 0x00000347, 0x1d2372ae, 0x926cd90b,
+ 0x54c79d60},
+ {0xc94c1ed7, 0x00000005, 0x0000038b, 0x9e9fdb22, 0x144a9178,
+ 0x4c53eee6},
+ {0x1aad454e, 0x00000025, 0x000002b2, 0xc3f6315c, 0x5c7a35b3,
+ 0x10137a3c},
+ {0xa4fec9a6, 0x00000000, 0x000006d6, 0x90be5080, 0xa4107605,
+ 0xaa9d6c73},
+ {0x1bbe71e2, 0x0000001f, 0x000002fd, 0x4e504c3b, 0x284ccaf1,
+ 0xb63d23e7},
+ {0x4201c7e4, 0x00000002, 0x000002b7, 0x7822e3f9, 0x0cc912a9,
+ 0x7f53e9cf},
+ {0x23fddc96, 0x00000003, 0x00000627, 0x8a385125, 0x07767e78,
+ 0x13c1cd83},
+ {0xd82ba25c, 0x00000016, 0x0000063e, 0x98e4148a, 0x283330c9,
+ 0x49ff5867},
+ {0x786f2032, 0x0000002d, 0x0000060f, 0xf201600a, 0xf561bfcd,
+ 0x8467f211},
+ {0xfebe4e1f, 0x0000002a, 0x000004f2, 0x95e51961, 0xfd80dcab,
+ 0x3f9683b2},
+ {0x1a6e0a39, 0x00000008, 0x00000672, 0x8af6c2a5, 0x78dd84cb,
+ 0x76a3f874},
+ {0x56000ab8, 0x0000000e, 0x000000e5, 0x36bacb8f, 0x22ee1f77,
+ 0x863b702f},
+ {0x4717fe0c, 0x00000000, 0x000006ec, 0x8439f342, 0x5c8e03da,
+ 0xdc6c58ff},
+ {0xd5d5d68e, 0x0000003c, 0x000003a3, 0x46fff083, 0x177d1b39,
+ 0x0622cc95},
+ {0xc25dd6c6, 0x00000024, 0x000006c0, 0x5ceb8eb4, 0x892b0d16,
+ 0xe85605cd},
+ {0xe9b11300, 0x00000023, 0x00000683, 0x07a5d59a, 0x6c6a3208,
+ 0x31da5f06},
+ {0x95cd285e, 0x00000001, 0x00000047, 0x7b3a4368, 0x0202c07e,
+ 0xa1f2e784},
+ {0xd9245a25, 0x0000001e, 0x000003a6, 0xd33c1841, 0x1936c0d5,
+ 0xb07cc616},
+ {0x103279db, 0x00000006, 0x0000039b, 0xca09b8a0, 0x77d62892,
+ 0xbf943b6c},
+ {0x1cba3172, 0x00000027, 0x000001c8, 0xcb377194, 0xebe682db,
+ 0x2c01af1c},
+ {0x8f613739, 0x0000000c, 0x000001df, 0xb4b0bc87, 0x7710bd43,
+ 0x0fe5f56d},
+ {0x1c6aa90d, 0x0000001b, 0x0000053c, 0x70559245, 0xda7894ac,
+ 0xf8943b2d},
+ {0xaabe5b93, 0x0000003d, 0x00000715, 0xcdbf42fa, 0x0c3b99e7,
+ 0xe4d89272},
+ {0xf15dd038, 0x00000006, 0x000006db, 0x6e104aea, 0x8d5967f2,
+ 0x7c2f6bbb},
+ {0x584dd49c, 0x00000020, 0x000007bc, 0x36b6cfd6, 0xad4e23b2,
+ 0xabbf388b},
+ {0x5d8c9506, 0x00000020, 0x00000470, 0x4c62378e, 0x31d92640,
+ 0x1dca1f4e},
+ {0xb80d17b0, 0x00000032, 0x00000346, 0x22a5bb88, 0x9a7ec89f,
+ 0x5c170e23},
+ {0xdaf0592e, 0x00000023, 0x000007b0, 0x3cab3f99, 0x9b1fdd99,
+ 0xc0e9d672},
+ {0x4793cc85, 0x0000000d, 0x00000706, 0xe82e04f6, 0xed3db6b7,
+ 0xc18bdc86},
+ {0x82ebf64e, 0x00000009, 0x000007c3, 0x69d590a9, 0x9efa8499,
+ 0xa874fcdd},
+ {0xb18a0319, 0x00000026, 0x000007db, 0x1cf98dcc, 0x8fa9ad6a,
+ 0x9dc0bb48},
};

#include <linux/time.h>

-static int __init crc32_init(void)
+static int __init crc32c_test(void)
+{
+ int i;
+ int errors = 0;
+ int bytes = 0;
+ struct timespec start, stop;
+ u64 nsec;
+ unsigned long flags;
+
+ /* keep static to prevent cache warming code from
+ * getting eliminated by the compiler */
+ static u32 crc;
+
+ /* pre-warm the cache */
+ for (i = 0; i < 100; i++) {
+ bytes += 2*test[i].length;
+
+ crc ^= __crc32c_le(test[i].crc, test_buf +
+ test[i].start, test[i].length);
+ }
+
+ /* reduce OS noise */
+ local_irq_save(flags);
+ local_irq_disable();
+
+ getnstimeofday(&start);
+ for (i = 0; i < 100; i++) {
+ if (test[i].crc32c_le != __crc32c_le(test[i].crc, test_buf +
+ test[i].start, test[i].length))
+ errors++;
+ }
+ getnstimeofday(&stop);
+
+ local_irq_restore(flags);
+ local_irq_enable();
+
+ nsec = stop.tv_nsec - start.tv_nsec +
+ 1000000000 * (stop.tv_sec - start.tv_sec);
+
+ pr_info("crc32c: CRC_LE_BITS = %d\n", CRC_LE_BITS);
+
+ if (errors)
+ pr_warn("crc32c: %d self tests failed\n", errors);
+ else {
+ pr_info("crc32c: self tests passed, processed %d bytes in %lld nsec\n",
+ bytes, nsec);
+ }
+
+ return 0;
+}
+
+static int __init crc32_test(void)
{
int i;
int errors = 0;
@@ -930,10 +1082,17 @@ static int __init crc32_init(void)
return 0;
}

+static int __init crc32test_init(void)
+{
+ crc32_test();
+ crc32c_test();
+ return 0;
+}
+
static void __exit crc32_exit(void)
{
}

-module_init(crc32_init);
+module_init(crc32test_init);
module_exit(crc32_exit);
#endif /* CONFIG_CRC32_SELFTEST */

2011-12-01 20:14:57

by djwong

[permalink] [raw]
Subject: [PATCH 11/14] crc32: Bolt on crc32c

Reuse the existing crc32 code to stamp out a crc32c implementation.

Signed-off-by: Darrick J. Wong <[email protected]>
---
include/linux/crc32.h | 2 ++
lib/Kconfig | 8 +++---
lib/crc32.c | 62 +++++++++++++++++++++++++++++++------------------
lib/crc32defs.h | 7 ++++++
lib/gen_crc32table.c | 35 ++++++++++++++++++++++------
5 files changed, 80 insertions(+), 34 deletions(-)


diff --git a/include/linux/crc32.h b/include/linux/crc32.h
index 391a259..68267b6 100644
--- a/include/linux/crc32.h
+++ b/include/linux/crc32.h
@@ -11,6 +11,8 @@
extern u32 crc32_le(u32 crc, unsigned char const *p, size_t len);
extern u32 crc32_be(u32 crc, unsigned char const *p, size_t len);

+extern u32 __crc32c_le(u32 crc, unsigned char const *p, size_t len);
+
#define crc32(seed, data, length) crc32_le(seed, (unsigned char const *)(data), length)

/*
diff --git a/lib/Kconfig b/lib/Kconfig
index 2bc5834..cfddafc 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -51,14 +51,14 @@ config CRC_ITU_T
functions require M here.

config CRC32
- tristate "CRC32 functions"
+ tristate "CRC32/CRC32c functions"
default y
select BITREVERSE
help
This option is provided for the case where no in-kernel-tree
- modules require CRC32 functions, but a module built outside the
- kernel tree does. Such modules that use library CRC32 functions
- require M here.
+ modules require CRC32/CRC32c functions, but a module built outside
+ the kernel tree does. Such modules that use library CRC32/CRC32c
+ functions require M here.

config CRC32_SELFTEST
bool "CRC32 perform self test on init"
diff --git a/lib/crc32.c b/lib/crc32.c
index d56516d..8df9561 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -46,7 +46,7 @@
#include "crc32table.h"

MODULE_AUTHOR("Matt Domsch <[email protected]>");
-MODULE_DESCRIPTION("Ethernet CRC32 calculations");
+MODULE_DESCRIPTION("Various CRC32 calculations");
MODULE_LICENSE("GPL");

#if CRC_LE_BITS > 8 || CRC_BE_BITS > 8
@@ -135,46 +135,57 @@ crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
* @p: pointer to buffer over which CRC is run
* @len: length of buffer @p
*/
-u32 __pure crc32_le(u32 crc, unsigned char const *p, size_t len)
+static inline u32 __pure crc32_le_generic(u32 crc, unsigned char const *p,
+ size_t len, const u32 (*tab)[256],
+ u32 polynomial)
{
#if CRC_LE_BITS == 1
int i;
while (len--) {
crc ^= *p++;
for (i = 0; i < 8; i++)
- crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0);
+ crc = (crc >> 1) ^ ((crc & 1) ? polynomial : 0);
}
# elif CRC_LE_BITS == 2
while (len--) {
crc ^= *p++;
- crc = (crc >> 2) ^ crc32table_le[0][crc & 3];
- crc = (crc >> 2) ^ crc32table_le[0][crc & 3];
- crc = (crc >> 2) ^ crc32table_le[0][crc & 3];
- crc = (crc >> 2) ^ crc32table_le[0][crc & 3];
+ crc = (crc >> 2) ^ tab[0][crc & 3];
+ crc = (crc >> 2) ^ tab[0][crc & 3];
+ crc = (crc >> 2) ^ tab[0][crc & 3];
+ crc = (crc >> 2) ^ tab[0][crc & 3];
}
# elif CRC_LE_BITS == 4
while (len--) {
crc ^= *p++;
- crc = (crc >> 4) ^ crc32table_le[0][crc & 15];
- crc = (crc >> 4) ^ crc32table_le[0][crc & 15];
+ crc = (crc >> 4) ^ tab[0][crc & 15];
+ crc = (crc >> 4) ^ tab[0][crc & 15];
}
# elif CRC_LE_BITS == 8
/* aka Sarwate algorithm */
while (len--) {
crc ^= *p++;
- crc = (crc >> 8) ^ crc32table_le[0][crc & 255];
+ crc = (crc >> 8) ^ tab[0][crc & 255];
}
# else
- const u32 (*tab)[] = crc32table_le;
-
crc = (__force u32) __cpu_to_le32(crc);
crc = crc32_body(crc, p, len, tab);
crc = __le32_to_cpu((__force __le32)crc);
#endif
return crc;
}
+
+u32 __pure crc32_le(u32 crc, unsigned char const *p, size_t len)
+{
+ return crc32_le_generic(crc, p, len, crc32table_le, CRCPOLY_LE);
+}
EXPORT_SYMBOL(crc32_le);

+u32 __pure __crc32c_le(u32 crc, unsigned char const *p, size_t len)
+{
+ return crc32_le_generic(crc, p, len, crc32ctable_le, CRC32C_POLY_LE);
+}
+EXPORT_SYMBOL(__crc32c_le);
+
/**
* crc32_be() - Calculate bitwise big-endian Ethernet AUTODIN II CRC32
* @crc: seed value for computation. ~0 for Ethernet, sometimes 0 for
@@ -182,7 +193,9 @@ EXPORT_SYMBOL(crc32_le);
* @p: pointer to buffer over which CRC is run
* @len: length of buffer @p
*/
-u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len)
+static inline u32 __pure crc32_be_generic(u32 crc, unsigned char const *p,
+ size_t len, const u32 (*tab)[256],
+ u32 polynomial)
{
#if CRC_BE_BITS == 1
int i;
@@ -190,37 +203,40 @@ u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len)
crc ^= *p++ << 24;
for (i = 0; i < 8; i++)
crc =
- (crc << 1) ^ ((crc & 0x80000000) ? CRCPOLY_BE :
+ (crc << 1) ^ ((crc & 0x80000000) ? polynomial :
0);
}
# elif CRC_BE_BITS == 2
while (len--) {
crc ^= *p++ << 24;
- crc = (crc << 2) ^ crc32table_be[0][crc >> 30];
- crc = (crc << 2) ^ crc32table_be[0][crc >> 30];
- crc = (crc << 2) ^ crc32table_be[0][crc >> 30];
- crc = (crc << 2) ^ crc32table_be[0][crc >> 30];
+ crc = (crc << 2) ^ tab[0][crc >> 30];
+ crc = (crc << 2) ^ tab[0][crc >> 30];
+ crc = (crc << 2) ^ tab[0][crc >> 30];
+ crc = (crc << 2) ^ tab[0][crc >> 30];
}
# elif CRC_BE_BITS == 4
while (len--) {
crc ^= *p++ << 24;
- crc = (crc << 4) ^ crc32table_be[0][crc >> 28];
- crc = (crc << 4) ^ crc32table_be[0][crc >> 28];
+ crc = (crc << 4) ^ tab[0][crc >> 28];
+ crc = (crc << 4) ^ tab[0][crc >> 28];
}
# elif CRC_BE_BITS == 8
while (len--) {
crc ^= *p++ << 24;
- crc = (crc << 8) ^ crc32table_be[0][crc >> 24];
+ crc = (crc << 8) ^ tab[0][crc >> 24];
}
# else
- const u32 (*tab)[] = crc32table_be;
-
crc = (__force u32) __cpu_to_be32(crc);
crc = crc32_body(crc, p, len, tab);
crc = __be32_to_cpu((__force __be32)crc);
# endif
return crc;
}
+
+u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len)
+{
+ return crc32_be_generic(crc, p, len, crc32table_be, CRCPOLY_BE);
+}
EXPORT_SYMBOL(crc32_be);

#ifdef CONFIG_CRC32_SELFTEST
diff --git a/lib/crc32defs.h b/lib/crc32defs.h
index 8181592..6fd1917 100644
--- a/lib/crc32defs.h
+++ b/lib/crc32defs.h
@@ -7,6 +7,13 @@
#define CRCPOLY_BE 0x04c11db7

/*
+ * This is the CRC32c polynomial, as outlined by Castagnoli.
+ * x^32+x^28+x^27+x^26+x^25+x^23+x^22+x^20+x^19+x^18+x^14+x^13+x^11+x^10+x^9+
+ * x^8+x^6+x^0
+ */
+#define CRC32C_POLY_LE 0x82F63B78
+
+/*
* How many bits at a time to use. Valid values are 1, 2, 4, 8, 32 and 64.
* For less performance-sensitive, use 4 or 8 to save table size.
* For larger systems choose same as CPU architecture as default.
diff --git a/lib/gen_crc32table.c b/lib/gen_crc32table.c
index 0d9edd1..8f8d543 100644
--- a/lib/gen_crc32table.c
+++ b/lib/gen_crc32table.c
@@ -23,6 +23,7 @@

static uint32_t crc32table_le[LE_TABLE_ROWS][256];
static uint32_t crc32table_be[BE_TABLE_ROWS][256];
+static uint32_t crc32ctable_le[LE_TABLE_ROWS][256];

/**
* crc32init_le() - allocate and initialize LE table data
@@ -31,27 +32,38 @@ static uint32_t crc32table_be[BE_TABLE_ROWS][256];
* fact that crctable[i^j] = crctable[i] ^ crctable[j].
*
*/
-static void crc32init_le(void)
+static void crc32init_le_generic(const uint32_t polynomial,
+ uint32_t (*tab)[256])
{
unsigned i, j;
uint32_t crc = 1;

- crc32table_le[0][0] = 0;
+ tab[0][0] = 0;

for (i = LE_TABLE_SIZE >> 1; i; i >>= 1) {
- crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0);
+ crc = (crc >> 1) ^ ((crc & 1) ? polynomial : 0);
for (j = 0; j < LE_TABLE_SIZE; j += 2 * i)
- crc32table_le[0][i + j] = crc ^ crc32table_le[0][j];
+ tab[0][i + j] = crc ^ tab[0][j];
}
for (i = 0; i < LE_TABLE_SIZE; i++) {
- crc = crc32table_le[0][i];
+ crc = tab[0][i];
for (j = 1; j < LE_TABLE_ROWS; j++) {
- crc = crc32table_le[0][crc & 0xff] ^ (crc >> 8);
- crc32table_le[j][i] = crc;
+ crc = tab[0][crc & 0xff] ^ (crc >> 8);
+ tab[j][i] = crc;
}
}
}

+static void crc32init_le(void)
+{
+ crc32init_le_generic(CRCPOLY_LE, crc32table_le);
+}
+
+static void crc32cinit_le(void)
+{
+ crc32init_le_generic(CRC32C_POLY_LE, crc32ctable_le);
+}
+
/**
* crc32init_be() - allocate and initialize BE table data
*/
@@ -114,6 +126,15 @@ int main(int argc, char** argv)
BE_TABLE_SIZE, "tobe");
printf("};\n");
}
+ if (CRC_LE_BITS > 1) {
+ crc32cinit_le();
+ printf("static const u32 __cacheline_aligned "
+ "crc32ctable_le[%d][%d] = {",
+ LE_TABLE_ROWS, LE_TABLE_SIZE);
+ output_table(crc32ctable_le, LE_TABLE_ROWS,
+ LE_TABLE_SIZE, "tole");
+ printf("};\n");
+ }

return 0;
}


2011-12-01 20:14:50

by djwong

[permalink] [raw]
Subject: [PATCH 10/14] crc32: Add note about this patchset to crc32.c

Some final changes
- added a comment at the top of crc32.c

From: Bob Pearson <[email protected]>
Signed-off-by: Bob Pearson <[email protected]>
[[email protected]: Minor changelog tweaks]
Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/crc32.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)


diff --git a/lib/crc32.c b/lib/crc32.c
index 2c8e8c0..d56516d 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -1,4 +1,8 @@
/*
+ * Aug 8, 2011 Bob Pearson with help from Joakim Tjernlund and George Spelvin
+ * cleaned up code to current version of sparse and added the slicing-by-8
+ * algorithm to the closely similar existing slicing-by-4 algorithm.
+ *
* Oct 15, 2000 Matt Domsch <[email protected]>
* Nicer crc32 functions/docs submitted by [email protected]. Thanks!
* Code was from the public domain, copyright abandoned. Code was

2011-12-01 20:14:44

by djwong

[permalink] [raw]
Subject: [PATCH 09/14] crc32: Optimize loop counter for x86

Add two changes that improve the performance of x86 systems
1. replace main loop with incrementing counter
this change improves the performance of the selftest
by about 5-6% on Nehalem CPUs. The apparent
reason is that the compiler can use the loop index
to perform an indexed memory access. This is
reported to make the performance of PowerPC CPUs
to get worse.
2. replace the rem_len loop with incrementing counter
this change improves the performance of the selftest,
which has more than the usual number of occurances,
by about 1-2% on x86 CPUs. In actual work loads
the length is most often a multiple of 4 bytes and
this code does not get executed as often if at all.
Again this change is reported to make the performance
of PowerPC get worse.

From: Bob Pearson <[email protected]>
Signed-off-by: Bob Pearson <[email protected]>
[[email protected]: Minor changelog tweaks]
Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/crc32.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)


diff --git a/lib/crc32.c b/lib/crc32.c
index 6311712..2c8e8c0 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -66,6 +66,9 @@ crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
# endif
const u32 *b;
size_t rem_len;
+# ifdef CONFIG_X86
+ size_t i;
+# endif
const u32 *t0 = tab[0], *t1 = tab[1], *t2 = tab[2], *t3 = tab[3];
const u32 *t4 = tab[4], *t5 = tab[5], *t6 = tab[6], *t7 = tab[7];
u32 q;
@@ -86,7 +89,12 @@ crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
# endif

b = (const u32 *)buf;
+# ifdef CONFIG_X86
+ --b;
+ for (i = 0; i < len; i++) {
+# else
for (--b; len; --len) {
+# endif
q = crc ^ *++b; /* use pre increment for speed */
# if CRC_LE_BITS == 32
crc = DO_CRC4;
@@ -100,9 +108,14 @@ crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
/* And the last few bytes */
if (len) {
u8 *p = (u8 *)(b + 1) - 1;
+# ifdef CONFIG_X86
+ for (i = 0; i < len; i++)
+ DO_CRC(*++p); /* use pre increment for speed */
+# else
do {
DO_CRC(*++p); /* use pre increment for speed */
} while (--len);
+# endif
}
return crc;
#undef DO_CRC

2011-12-01 20:14:16

by djwong

[permalink] [raw]
Subject: [PATCH 05/14] crc32: Miscellaneous cleanups

Misc cleanup of lib/crc32.c and related files
- removed unnecessary header files.
- straightened out some convoluted ifdef's
- rewrote some references to 2 dimensional arrays as 1 dimensional
arrays to make them correct. I.e. replaced tab[i] with tab[0][i].
- a few trivial whitespace changes
- fixed a warning in gen_crc32tables.c caused by a mismatch in the
type of the pointer passed to output table. Since the table is
only used at kernel compile time, it is simpler to make the table
big enough to hold the largest column size used. One cannot make the
column size smaller in output_table because it has to be used by
both the le and be tables and they can have different column sizes.

From: Bob Pearson <[email protected]>
Signed-off-by: Bob Pearson <[email protected]>
[[email protected]: Minor changelog tweaks]
Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/crc32.c | 104 +++++++++++++++++---------------------------------
lib/gen_crc32table.c | 6 +--
2 files changed, 39 insertions(+), 71 deletions(-)


diff --git a/lib/crc32.c b/lib/crc32.c
index c93c9ae..2a87ea2 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -23,13 +23,10 @@
/* see: Documentation/crc32.txt for a description of algorithms */

#include <linux/crc32.h>
-#include <linux/kernel.h>
#include <linux/module.h>
-#include <linux/compiler.h>
#include <linux/types.h>
-#include <linux/init.h>
-#include <linux/atomic.h>
#include "crc32defs.h"
+
#if CRC_LE_BITS == 8
# define tole(x) __constant_cpu_to_le32(x)
#else
@@ -41,6 +38,7 @@
#else
# define tobe(x) (x)
#endif
+
#include "crc32table.h"

MODULE_AUTHOR("Matt Domsch <[email protected]>");
@@ -96,6 +94,7 @@ crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
#undef DO_CRC4
}
#endif
+
/**
* crc32_le() - Calculate bitwise little-endian Ethernet AUTODIN II CRC32
* @crc: seed value for computation. ~0 for Ethernet, sometimes 0 for
@@ -103,53 +102,39 @@ crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
* @p: pointer to buffer over which CRC is run
* @len: length of buffer @p
*/
-u32 __pure crc32_le(u32 crc, unsigned char const *p, size_t len);
-
-#if CRC_LE_BITS == 1
-/*
- * In fact, the table-based code will work in this case, but it can be
- * simplified by inlining the table in ?: form.
- */
-
u32 __pure crc32_le(u32 crc, unsigned char const *p, size_t len)
{
+#if CRC_LE_BITS == 1
int i;
while (len--) {
crc ^= *p++;
for (i = 0; i < 8; i++)
crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0);
}
- return crc;
-}
-#else /* Table-based approach */
-
-u32 __pure crc32_le(u32 crc, unsigned char const *p, size_t len)
-{
-# if CRC_LE_BITS == 8
- const u32 (*tab)[] = crc32table_le;
-
- crc = __cpu_to_le32(crc);
- crc = crc32_body(crc, p, len, tab);
- return __le32_to_cpu(crc);
-# elif CRC_LE_BITS == 4
+# elif CRC_LE_BITS == 2
while (len--) {
crc ^= *p++;
- crc = (crc >> 4) ^ crc32table_le[crc & 15];
- crc = (crc >> 4) ^ crc32table_le[crc & 15];
+ crc = (crc >> 2) ^ crc32table_le[0][crc & 3];
+ crc = (crc >> 2) ^ crc32table_le[0][crc & 3];
+ crc = (crc >> 2) ^ crc32table_le[0][crc & 3];
+ crc = (crc >> 2) ^ crc32table_le[0][crc & 3];
}
- return crc;
-# elif CRC_LE_BITS == 2
+# elif CRC_LE_BITS == 4
while (len--) {
crc ^= *p++;
- crc = (crc >> 2) ^ crc32table_le[crc & 3];
- crc = (crc >> 2) ^ crc32table_le[crc & 3];
- crc = (crc >> 2) ^ crc32table_le[crc & 3];
- crc = (crc >> 2) ^ crc32table_le[crc & 3];
+ crc = (crc >> 4) ^ crc32table_le[0][crc & 15];
+ crc = (crc >> 4) ^ crc32table_le[0][crc & 15];
}
+# elif CRC_LE_BITS == 8
+ const u32 (*tab)[] = crc32table_le;
+
+ crc = __cpu_to_le32(crc);
+ crc = crc32_body(crc, p, len, tab);
+ crc = __le32_to_cpu(crc);
+#endif
return crc;
-# endif
}
-#endif
+EXPORT_SYMBOL(crc32_le);

/**
* crc32_be() - Calculate bitwise big-endian Ethernet AUTODIN II CRC32
@@ -158,16 +143,9 @@ u32 __pure crc32_le(u32 crc, unsigned char const *p, size_t len)
* @p: pointer to buffer over which CRC is run
* @len: length of buffer @p
*/
-u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len);
-
-#if CRC_BE_BITS == 1
-/*
- * In fact, the table-based code will work in this case, but it can be
- * simplified by inlining the table in ?: form.
- */
-
u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len)
{
+#if CRC_BE_BITS == 1
int i;
while (len--) {
crc ^= *p++ << 24;
@@ -176,39 +154,29 @@ u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len)
(crc << 1) ^ ((crc & 0x80000000) ? CRCPOLY_BE :
0);
}
- return crc;
-}
-
-#else /* Table-based approach */
-u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len)
-{
-# if CRC_BE_BITS == 8
- const u32 (*tab)[] = crc32table_be;
-
- crc = __cpu_to_be32(crc);
- crc = crc32_body(crc, p, len, tab);
- return __be32_to_cpu(crc);
-# elif CRC_BE_BITS == 4
+# elif CRC_BE_BITS == 2
while (len--) {
crc ^= *p++ << 24;
- crc = (crc << 4) ^ crc32table_be[crc >> 28];
- crc = (crc << 4) ^ crc32table_be[crc >> 28];
+ crc = (crc << 2) ^ crc32table_be[0][crc >> 30];
+ crc = (crc << 2) ^ crc32table_be[0][crc >> 30];
+ crc = (crc << 2) ^ crc32table_be[0][crc >> 30];
+ crc = (crc << 2) ^ crc32table_be[0][crc >> 30];
}
- return crc;
-# elif CRC_BE_BITS == 2
+# elif CRC_BE_BITS == 4
while (len--) {
crc ^= *p++ << 24;
- crc = (crc << 2) ^ crc32table_be[crc >> 30];
- crc = (crc << 2) ^ crc32table_be[crc >> 30];
- crc = (crc << 2) ^ crc32table_be[crc >> 30];
- crc = (crc << 2) ^ crc32table_be[crc >> 30];
+ crc = (crc << 4) ^ crc32table_be[0][crc >> 28];
+ crc = (crc << 4) ^ crc32table_be[0][crc >> 28];
}
- return crc;
+# elif CRC_BE_BITS == 8
+ const u32 (*tab)[] = crc32table_be;
+
+ crc = __cpu_to_be32(crc);
+ crc = crc32_body(crc, p, len, tab);
+ crc = __be32_to_cpu(crc);
# endif
+ return crc;
}
-#endif
-
-EXPORT_SYMBOL(crc32_le);
EXPORT_SYMBOL(crc32_be);

#ifdef CONFIG_CRC32_SELFTEST
diff --git a/lib/gen_crc32table.c b/lib/gen_crc32table.c
index 85d0e41..eced769 100644
--- a/lib/gen_crc32table.c
+++ b/lib/gen_crc32table.c
@@ -7,8 +7,8 @@
#define LE_TABLE_SIZE (1 << CRC_LE_BITS)
#define BE_TABLE_SIZE (1 << CRC_BE_BITS)

-static uint32_t crc32table_le[4][LE_TABLE_SIZE];
-static uint32_t crc32table_be[4][BE_TABLE_SIZE];
+static uint32_t crc32table_le[4][256];
+static uint32_t crc32table_be[4][256];

/**
* crc32init_le() - allocate and initialize LE table data
@@ -62,7 +62,7 @@ static void crc32init_be(void)
}
}

-static void output_table(uint32_t table[4][256], int len, char *trans)
+static void output_table(uint32_t (*table)[256], int len, char *trans)
{
int i, j;



2011-12-01 20:13:50

by djwong

[permalink] [raw]
Subject: [PATCH 01/14] crc32: removed two instances of trailing whitespaces

- remove trailing whitespace from lib/crc32.c
- remove trailing whitespace from lib/crc32defs.h

From: Bob Pearson <[email protected]>
Signed-off-by: Bob Pearson <[email protected]>
[[email protected]: changelog tweaks]
Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/crc32.c | 2 +-
lib/crc32defs.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)


diff --git a/lib/crc32.c b/lib/crc32.c
index a6e633a..23b08ba 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -316,7 +316,7 @@ EXPORT_SYMBOL(crc32_be);
* in the correct multiple to subtract, we can shift a byte at a time.
* This produces a 40-bit (rather than a 33-bit) intermediate remainder,
* but again the multiple of the polynomial to subtract depends only on
- * the high bits, the high 8 bits in this case.
+ * the high bits, the high 8 bits in this case.
*
* The multiple we need in that case is the low 32 bits of a 40-bit
* value whose high 8 bits are given, and which is a multiple of the
diff --git a/lib/crc32defs.h b/lib/crc32defs.h
index 9b6773d..f5a5401 100644
--- a/lib/crc32defs.h
+++ b/lib/crc32defs.h
@@ -8,7 +8,7 @@

/* How many bits at a time to use. Requires a table of 4<<CRC_xx_BITS bytes. */
/* For less performance-sensitive, use 4 */
-#ifndef CRC_LE_BITS
+#ifndef CRC_LE_BITS
# define CRC_LE_BITS 8
#endif
#ifndef CRC_BE_BITS


2011-12-01 20:21:16

by Joel Becker

[permalink] [raw]
Subject: Re: [PATCH v5.2 00/14] crc32c: Add faster algorithm and self-test code

On Thu, Dec 01, 2011 at 12:13:41PM -0800, Darrick J. Wong wrote:
> Hi all,
>
> This patchset (re)uses Bob Pearson's crc32 slice-by-8 code to stamp out a
> software crc32c implementation. It removes the crc32c implementation in
> crypto/ in favor of using the stamped-out one in lib/. There is also a change
> to Kconfig so that the kernel builder can pick an implementation best suited
> for the hardware.
>
> The motivation for this patchset is that I am working on adding full metadata
> checksumming to ext4. As far as performance impact of adding checksumming
> goes, I see nearly no change with a standard mail server ffsb simulation. On a
> test that involves only file creation and deletion and extent tree writes, I
> see a drop of about 50 pcercent with the current kernel crc32c implementation;
> this improves to a drop of about 20 percent with the enclosed crc32c code.
>
> When metadata is usually a small fraction of total IO, this new implementation
> doesn't help much because metadata is usually a small fraction of total IO.
> However, when we are doing IO that is almost all metadata (such as rm -rf'ing a
> tree), then this patch speeds up the operation substantially.
>
> Incidentally, given that iscsi, sctp, and btrfs also use crc32c, this patchset
> should improve their speed as well. I have not yet quantified that, however.

I thought they usually used the SSE instruction for crc32 or
equivalent.

Joel

--

"I almost ran over an angel
He had a nice big fat cigar.
'In a sense,' he said, 'You're alone here
So if you jump, you'd best jump far.'"

http://www.jlbec.org/
[email protected]

2011-12-01 20:31:29

by djwong

[permalink] [raw]
Subject: Re: [PATCH v5.2 00/14] crc32c: Add faster algorithm and self-test code

On Thu, Dec 01, 2011 at 12:20:53PM -0800, Joel Becker wrote:
> On Thu, Dec 01, 2011 at 12:13:41PM -0800, Darrick J. Wong wrote:
> > Hi all,
> >
> > This patchset (re)uses Bob Pearson's crc32 slice-by-8 code to stamp out a
> > software crc32c implementation. It removes the crc32c implementation in
> > crypto/ in favor of using the stamped-out one in lib/. There is also a change
> > to Kconfig so that the kernel builder can pick an implementation best suited
> > for the hardware.
> >
> > The motivation for this patchset is that I am working on adding full metadata
> > checksumming to ext4. As far as performance impact of adding checksumming
> > goes, I see nearly no change with a standard mail server ffsb simulation. On a
> > test that involves only file creation and deletion and extent tree writes, I
> > see a drop of about 50 pcercent with the current kernel crc32c implementation;
> > this improves to a drop of about 20 percent with the enclosed crc32c code.
> >
> > When metadata is usually a small fraction of total IO, this new implementation
> > doesn't help much because metadata is usually a small fraction of total IO.
> > However, when we are doing IO that is almost all metadata (such as rm -rf'ing a
> > tree), then this patch speeds up the operation substantially.
> >
> > Incidentally, given that iscsi, sctp, and btrfs also use crc32c, this patchset
> > should improve their speed as well. I have not yet quantified that, however.
>
> I thought they usually used the SSE instruction for crc32 or
> equivalent.

They seem to call crc32c(), which is in crypto/crc32c. If you're interested in
hardware accelerated crc32c on Intel, it is still the case that the wrapper for
that can be loaded via crc32c-intel.

--D
>
> Joel
>
> --
>
> "I almost ran over an angel
> He had a nice big fat cigar.
> 'In a sense,' he said, 'You're alone here
> So if you jump, you'd best jump far.'"
>
> http://www.jlbec.org/
> [email protected]
>

2011-12-02 00:23:58

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v5.2 00/14] crc32c: Add faster algorithm and self-test code

On Thu, Dec 01, 2011 at 12:31:22PM -0800, Darrick J. Wong wrote:
.
> They seem to call crc32c(), which is in crypto/crc32c. If you're interested in

Nope, the crypto API layer will use the SSE implementation
where available. Only when it isn't available will the C version
in crypto/ be used.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2011-12-02 00:25:16

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 14/14] crc32: Select an algorithm via kconfig

On Thu, Dec 01, 2011 at 12:15:17PM -0800, Darrick J. Wong wrote:
> Allow the kernel builder to choose a crc32* algorithm for the kernel.
>
> Signed-off-by: Darrick J. Wong <[email protected]>

I don't like this at all. How do you expect distros or indeed
anyone to make this choice? For generic C implementations like
this we should only have one, and not many.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2011-12-03 02:30:37

by djwong

[permalink] [raw]
Subject: Re: [PATCH v5.2 00/14] crc32c: Add faster algorithm and self-test code

On Fri, Dec 02, 2011 at 08:23:58AM +0800, Herbert Xu wrote:
> On Thu, Dec 01, 2011 at 12:31:22PM -0800, Darrick J. Wong wrote:
> .
> > They seem to call crc32c(), which is in crypto/crc32c. If you're interested in
>
> Nope, the crypto API layer will use the SSE implementation
> where available. Only when it isn't available will the C version
> in crypto/ be used.

There's a SSE version other than what's in crc32c-intel?

(I suspect we're talking about the same thing?)

--D
>
> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


2011-12-03 02:36:59

by djwong

[permalink] [raw]
Subject: Re: [PATCH 14/14] crc32: Select an algorithm via kconfig

On Fri, Dec 02, 2011 at 08:25:05AM +0800, Herbert Xu wrote:
> On Thu, Dec 01, 2011 at 12:15:17PM -0800, Darrick J. Wong wrote:
> > Allow the kernel builder to choose a crc32* algorithm for the kernel.
> >
> > Signed-off-by: Darrick J. Wong <[email protected]>
>
> I don't like this at all. How do you expect distros or indeed
> anyone to make this choice? For generic C implementations like
> this we should only have one, and not many.

Slice-by-8 should be picked automatically if the builder doesn't explicitly
pick another one. The other choices are provided for people who want a slimmer
cache footprint. I guess I could make the Kconfig file a bit more explicit
about slice-by-8 being default, or I guess we could just ignore this one patch
and thereby keeping us with the old method where anyone who wants the slimmer
implementations patches the #defines.

--D
>
> Cheers,
> --
> Email: Herbert Xu <[email protected]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2011-12-03 11:00:50

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH v5.2 00/14] crc32c: Add faster algorithm and self-test code

On Fri, Dec 02, 2011 at 06:30:37PM -0800, Darrick J. Wong wrote:
> On Fri, Dec 02, 2011 at 08:23:58AM +0800, Herbert Xu wrote:
> > On Thu, Dec 01, 2011 at 12:31:22PM -0800, Darrick J. Wong wrote:
> > .
> > > They seem to call crc32c(), which is in crypto/crc32c. If you're interested in
> >
> > Nope, the crypto API layer will use the SSE implementation
> > where available. Only when it isn't available will the C version
> > in crypto/ be used.
>
> There's a SSE version other than what's in crc32c-intel?

crc32c-intel is what I meant. My point is that it'll be used
automatically.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2011-12-12 22:58:57

by djwong

[permalink] [raw]
Subject: Re: [PATCH 14/14] crc32: Select an algorithm via kconfig

On Fri, Dec 02, 2011 at 06:36:46PM -0800, Darrick J. Wong wrote:
> On Fri, Dec 02, 2011 at 08:25:05AM +0800, Herbert Xu wrote:
> > On Thu, Dec 01, 2011 at 12:15:17PM -0800, Darrick J. Wong wrote:
> > > Allow the kernel builder to choose a crc32* algorithm for the kernel.
> > >
> > > Signed-off-by: Darrick J. Wong <[email protected]>
> >
> > I don't like this at all. How do you expect distros or indeed
> > anyone to make this choice? For generic C implementations like
> > this we should only have one, and not many.
>
> Slice-by-8 should be picked automatically if the builder doesn't explicitly
> pick another one. The other choices are provided for people who want a slimmer
> cache footprint. I guess I could make the Kconfig file a bit more explicit
> about slice-by-8 being default, or I guess we could just ignore this one patch
> and thereby keeping us with the old method where anyone who wants the slimmer
> implementations patches the #defines.

Ok, here's a patch that makes it more explicit that sliceby8 is the default.
I expect distros and anyone else to simply hit <Enter>. The only people who
should do otherwise are people who know they are building for machines that
have small cache sizes such that the crc table fights for cache lines with the
data being checksummed.

I made a quick survey of CPU L1 cache quantities:

All Intel CPUs since the Pentium MMX have > 8KiB of L1.
All AMD CPUs since the K5 have had > 8KiB of L1.
Most SPARC64 CPUs except the UltraSparc T1 and T2 CPUs have > 8KiB of L1.
Most PowerPC CPUs since the 601 seem to have > 8KiB of L1.
All IBM POWER CPUs since at least the POWER2 have had > 8KiB of L1.
There are too many different ARM cores for me to track. My smartphones and
embedded ARM controllers all have > 8KIB of L1, but that's not enough to
generalize.

While I might've been tempted to agree with Herbert and hardwire the code to
use slice by 8, there are enough CPUs out there that *could* have too-small L1
caches that I'm not comfortable with _removing_ the Kconfig option to use a
slimmer algorithm. I can't gate the decision on 64-bitness either, since I've
seen plenty of i386 CPUs that benefit from slice by 8, and the UltraSparc T2 is
a 64-bit processor that seems likely to suffer cache thrashing.

I think having a configurable menu that steers people towards slice by 8 is
fine. Bob, was there a reason for picking slice by 4 for 32-bit machines?

D
---
Allow the kernel builder to choose a crc32* algorithm for the kernel.

Signed-off-by: Darrick J. Wong <[email protected]>
---

lib/Kconfig | 43 +++++++++++++++++++++++++++++++++++++++++++
lib/crc32defs.h | 18 ++++++++++++++++++
2 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/lib/Kconfig b/lib/Kconfig
index cfddafc..029c0e3 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -70,6 +70,49 @@ config CRC32_SELFTEST
and crc32_be over byte strings with random alignment and length
and computes the total elapsed time and number of bytes processed.

+choice
+ prompt "CRC32 implementation"
+ depends on CRC32
+ default CRC32_SLICEBY8
+
+config CRC32_SLICEBY8
+ bool "Slice by 8 bytes"
+ help
+ Calculate checksum 8 bytes at a time with a clever slicing algorithm.
+ This is the fastest algorithm, but comes with a 8KiB lookup table.
+ Most modern processors have enough cache to hold this table without
+ thrashing the cache.
+
+ This is the default implementation choice. Choose this one unless
+ you have a good reason not to.
+
+config CRC32_SLICEBY4
+ bool "Slice by 4 bytes"
+ help
+ Calculate checksum 4 bytes at a time with a clever slicing algorithm.
+ This is a bit slower than slice by 8, but has a smaller 4KiB lookup
+ table.
+
+ Only choose this option if you know what you are doing.
+
+config CRC32_SARWATE
+ bool "Sarwate's Algorithm (one byte at a time)"
+ help
+ Calculate checksum a byte at a time using Sarwate's algorithm. This
+ is not particularly fast, but has a small 256 byte lookup table.
+
+ Only choose this option if you know what you are doing.
+
+config CRC32_BIT
+ bool "Classic Algorithm (one bit at a time)"
+ help
+ Calculate checksum one bit at a time. This is VERY slow, but has
+ no lookup table. This is provided as a debugging option.
+
+ Only choose this option if you are debugging crc32.
+
+endchoice
+
config CRC7
tristate "CRC7 functions"
help
diff --git a/lib/crc32defs.h b/lib/crc32defs.h
index 6fd1917..64cba2c 100644
--- a/lib/crc32defs.h
+++ b/lib/crc32defs.h
@@ -13,6 +13,24 @@
*/
#define CRC32C_POLY_LE 0x82F63B78

+/* Try to choose an implementation variant via Kconfig */
+#ifdef CONFIG_CRC32_SLICEBY8
+# define CRC_LE_BITS 64
+# define CRC_BE_BITS 64
+#endif
+#ifdef CONFIG_CRC32_SLICEBY4
+# define CRC_LE_BITS 32
+# define CRC_BE_BITS 32
+#endif
+#ifdef CONFIG_CRC32_SARWATE
+# define CRC_LE_BITS 8
+# define CRC_BE_BITS 8
+#endif
+#ifdef CONFIG_CRC32_BIT
+# define CRC_LE_BITS 1
+# define CRC_BE_BITS 1
+#endif
+
/*
* How many bits at a time to use. Valid values are 1, 2, 4, 8, 32 and 64.
* For less performance-sensitive, use 4 or 8 to save table size.


2011-12-12 23:10:49

by Bob Pearson

[permalink] [raw]
Subject: RE: [PATCH 14/14] crc32: Select an algorithm via kconfig

That choice was for Joakim who measured better performance on his 32 bit PPC
platform with "by 4".

> -----Original Message-----
> From: Darrick J. Wong [mailto:[email protected]]
> Sent: Monday, December 12, 2011 4:59 PM
> To: Herbert Xu; Bob Pearson
> Cc: Andrew Morton; Theodore Tso; Joakim Tjernlund; linux-kernel; Andreas
> Dilger; linux-crypto; linux-fsdevel; Mingming Cao;
[email protected]
> Subject: Re: [PATCH 14/14] crc32: Select an algorithm via kconfig
>
> On Fri, Dec 02, 2011 at 06:36:46PM -0800, Darrick J. Wong wrote:
> > On Fri, Dec 02, 2011 at 08:25:05AM +0800, Herbert Xu wrote:
> > > On Thu, Dec 01, 2011 at 12:15:17PM -0800, Darrick J. Wong wrote:
> > > > Allow the kernel builder to choose a crc32* algorithm for the
kernel.
> > > >
> > > > Signed-off-by: Darrick J. Wong <[email protected]>
> > >
> > > I don't like this at all. How do you expect distros or indeed
> > > anyone to make this choice? For generic C implementations like
> > > this we should only have one, and not many.
> >
> > Slice-by-8 should be picked automatically if the builder doesn't
explicitly
> > pick another one. The other choices are provided for people who want a
> slimmer
> > cache footprint. I guess I could make the Kconfig file a bit more
explicit
> > about slice-by-8 being default, or I guess we could just ignore this one
> patch
> > and thereby keeping us with the old method where anyone who wants the
> slimmer
> > implementations patches the #defines.
>
> Ok, here's a patch that makes it more explicit that sliceby8 is the
default.
> I expect distros and anyone else to simply hit <Enter>. The only people
who
> should do otherwise are people who know they are building for machines
> that
> have small cache sizes such that the crc table fights for cache lines with
the
> data being checksummed.
>
> I made a quick survey of CPU L1 cache quantities:
>
> All Intel CPUs since the Pentium MMX have > 8KiB of L1.
> All AMD CPUs since the K5 have had > 8KiB of L1.
> Most SPARC64 CPUs except the UltraSparc T1 and T2 CPUs have > 8KiB of L1.
> Most PowerPC CPUs since the 601 seem to have > 8KiB of L1.
> All IBM POWER CPUs since at least the POWER2 have had > 8KiB of L1.
> There are too many different ARM cores for me to track. My smartphones
> and
> embedded ARM controllers all have > 8KIB of L1, but that's not enough to
> generalize.
>
> While I might've been tempted to agree with Herbert and hardwire the code
> to
> use slice by 8, there are enough CPUs out there that *could* have
too-small
> L1
> caches that I'm not comfortable with _removing_ the Kconfig option to use
a
> slimmer algorithm. I can't gate the decision on 64-bitness either, since
I've
> seen plenty of i386 CPUs that benefit from slice by 8, and the UltraSparc
T2 is
> a 64-bit processor that seems likely to suffer cache thrashing.
>
> I think having a configurable menu that steers people towards slice by 8
is
> fine. Bob, was there a reason for picking slice by 4 for 32-bit machines?
>
> D
> ---
> Allow the kernel builder to choose a crc32* algorithm for the kernel.
>
> Signed-off-by: Darrick J. Wong <[email protected]>
> ---
>
> lib/Kconfig | 43 +++++++++++++++++++++++++++++++++++++++++++
> lib/crc32defs.h | 18 ++++++++++++++++++
> 2 files changed, 61 insertions(+), 0 deletions(-)
>
> diff --git a/lib/Kconfig b/lib/Kconfig
> index cfddafc..029c0e3 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -70,6 +70,49 @@ config CRC32_SELFTEST
> and crc32_be over byte strings with random alignment and length
> and computes the total elapsed time and number of bytes
> processed.
>
> +choice
> + prompt "CRC32 implementation"
> + depends on CRC32
> + default CRC32_SLICEBY8
> +
> +config CRC32_SLICEBY8
> + bool "Slice by 8 bytes"
> + help
> + Calculate checksum 8 bytes at a time with a clever slicing
algorithm.
> + This is the fastest algorithm, but comes with a 8KiB lookup table.
> + Most modern processors have enough cache to hold this table
> without
> + thrashing the cache.
> +
> + This is the default implementation choice. Choose this one unless
> + you have a good reason not to.
> +
> +config CRC32_SLICEBY4
> + bool "Slice by 4 bytes"
> + help
> + Calculate checksum 4 bytes at a time with a clever slicing
algorithm.
> + This is a bit slower than slice by 8, but has a smaller 4KiB
lookup
> + table.
> +
> + Only choose this option if you know what you are doing.
> +
> +config CRC32_SARWATE
> + bool "Sarwate's Algorithm (one byte at a time)"
> + help
> + Calculate checksum a byte at a time using Sarwate's algorithm.
This
> + is not particularly fast, but has a small 256 byte lookup table.
> +
> + Only choose this option if you know what you are doing.
> +
> +config CRC32_BIT
> + bool "Classic Algorithm (one bit at a time)"
> + help
> + Calculate checksum one bit at a time. This is VERY slow, but has
> + no lookup table. This is provided as a debugging option.
> +
> + Only choose this option if you are debugging crc32.
> +
> +endchoice
> +
> config CRC7
> tristate "CRC7 functions"
> help
> diff --git a/lib/crc32defs.h b/lib/crc32defs.h
> index 6fd1917..64cba2c 100644
> --- a/lib/crc32defs.h
> +++ b/lib/crc32defs.h
> @@ -13,6 +13,24 @@
> */
> #define CRC32C_POLY_LE 0x82F63B78
>
> +/* Try to choose an implementation variant via Kconfig */
> +#ifdef CONFIG_CRC32_SLICEBY8
> +# define CRC_LE_BITS 64
> +# define CRC_BE_BITS 64
> +#endif
> +#ifdef CONFIG_CRC32_SLICEBY4
> +# define CRC_LE_BITS 32
> +# define CRC_BE_BITS 32
> +#endif
> +#ifdef CONFIG_CRC32_SARWATE
> +# define CRC_LE_BITS 8
> +# define CRC_BE_BITS 8
> +#endif
> +#ifdef CONFIG_CRC32_BIT
> +# define CRC_LE_BITS 1
> +# define CRC_BE_BITS 1
> +#endif
> +
> /*
> * How many bits at a time to use. Valid values are 1, 2, 4, 8, 32 and
64.
> * For less performance-sensitive, use 4 or 8 to save table size.

2011-12-13 06:32:28

by djwong

[permalink] [raw]
Subject: Re: [PATCH 14/14] crc32: Select an algorithm via kconfig

On Mon, Dec 12, 2011 at 05:10:45PM -0600, Bob Pearson wrote:
> That choice was for Joakim who measured better performance on his 32 bit PPC
> platform with "by 4".

Ok. On my 1.33GHz PowerBook I get ~255MB/s with slice by 4 and ~270MB/s with
slice by 8. I think it's a PPC 7447, and definitely 32-bit. In any case, it
reports having 32K of L1D cache.

--D
>
> > -----Original Message-----
> > From: Darrick J. Wong [mailto:[email protected]]
> > Sent: Monday, December 12, 2011 4:59 PM
> > To: Herbert Xu; Bob Pearson
> > Cc: Andrew Morton; Theodore Tso; Joakim Tjernlund; linux-kernel; Andreas
> > Dilger; linux-crypto; linux-fsdevel; Mingming Cao;
> [email protected]
> > Subject: Re: [PATCH 14/14] crc32: Select an algorithm via kconfig
> >
> > On Fri, Dec 02, 2011 at 06:36:46PM -0800, Darrick J. Wong wrote:
> > > On Fri, Dec 02, 2011 at 08:25:05AM +0800, Herbert Xu wrote:
> > > > On Thu, Dec 01, 2011 at 12:15:17PM -0800, Darrick J. Wong wrote:
> > > > > Allow the kernel builder to choose a crc32* algorithm for the
> kernel.
> > > > >
> > > > > Signed-off-by: Darrick J. Wong <[email protected]>
> > > >
> > > > I don't like this at all. How do you expect distros or indeed
> > > > anyone to make this choice? For generic C implementations like
> > > > this we should only have one, and not many.
> > >
> > > Slice-by-8 should be picked automatically if the builder doesn't
> explicitly
> > > pick another one. The other choices are provided for people who want a
> > slimmer
> > > cache footprint. I guess I could make the Kconfig file a bit more
> explicit
> > > about slice-by-8 being default, or I guess we could just ignore this one
> > patch
> > > and thereby keeping us with the old method where anyone who wants the
> > slimmer
> > > implementations patches the #defines.
> >
> > Ok, here's a patch that makes it more explicit that sliceby8 is the
> default.
> > I expect distros and anyone else to simply hit <Enter>. The only people
> who
> > should do otherwise are people who know they are building for machines
> > that
> > have small cache sizes such that the crc table fights for cache lines with
> the
> > data being checksummed.
> >
> > I made a quick survey of CPU L1 cache quantities:
> >
> > All Intel CPUs since the Pentium MMX have > 8KiB of L1.
> > All AMD CPUs since the K5 have had > 8KiB of L1.
> > Most SPARC64 CPUs except the UltraSparc T1 and T2 CPUs have > 8KiB of L1.
> > Most PowerPC CPUs since the 601 seem to have > 8KiB of L1.
> > All IBM POWER CPUs since at least the POWER2 have had > 8KiB of L1.
> > There are too many different ARM cores for me to track. My smartphones
> > and
> > embedded ARM controllers all have > 8KIB of L1, but that's not enough to
> > generalize.
> >
> > While I might've been tempted to agree with Herbert and hardwire the code
> > to
> > use slice by 8, there are enough CPUs out there that *could* have
> too-small
> > L1
> > caches that I'm not comfortable with _removing_ the Kconfig option to use
> a
> > slimmer algorithm. I can't gate the decision on 64-bitness either, since
> I've
> > seen plenty of i386 CPUs that benefit from slice by 8, and the UltraSparc
> T2 is
> > a 64-bit processor that seems likely to suffer cache thrashing.
> >
> > I think having a configurable menu that steers people towards slice by 8
> is
> > fine. Bob, was there a reason for picking slice by 4 for 32-bit machines?
> >
> > D
> > ---
> > Allow the kernel builder to choose a crc32* algorithm for the kernel.
> >
> > Signed-off-by: Darrick J. Wong <[email protected]>
> > ---
> >
> > lib/Kconfig | 43 +++++++++++++++++++++++++++++++++++++++++++
> > lib/crc32defs.h | 18 ++++++++++++++++++
> > 2 files changed, 61 insertions(+), 0 deletions(-)
> >
> > diff --git a/lib/Kconfig b/lib/Kconfig
> > index cfddafc..029c0e3 100644
> > --- a/lib/Kconfig
> > +++ b/lib/Kconfig
> > @@ -70,6 +70,49 @@ config CRC32_SELFTEST
> > and crc32_be over byte strings with random alignment and length
> > and computes the total elapsed time and number of bytes
> > processed.
> >
> > +choice
> > + prompt "CRC32 implementation"
> > + depends on CRC32
> > + default CRC32_SLICEBY8
> > +
> > +config CRC32_SLICEBY8
> > + bool "Slice by 8 bytes"
> > + help
> > + Calculate checksum 8 bytes at a time with a clever slicing
> algorithm.
> > + This is the fastest algorithm, but comes with a 8KiB lookup table.
> > + Most modern processors have enough cache to hold this table
> > without
> > + thrashing the cache.
> > +
> > + This is the default implementation choice. Choose this one unless
> > + you have a good reason not to.
> > +
> > +config CRC32_SLICEBY4
> > + bool "Slice by 4 bytes"
> > + help
> > + Calculate checksum 4 bytes at a time with a clever slicing
> algorithm.
> > + This is a bit slower than slice by 8, but has a smaller 4KiB
> lookup
> > + table.
> > +
> > + Only choose this option if you know what you are doing.
> > +
> > +config CRC32_SARWATE
> > + bool "Sarwate's Algorithm (one byte at a time)"
> > + help
> > + Calculate checksum a byte at a time using Sarwate's algorithm.
> This
> > + is not particularly fast, but has a small 256 byte lookup table.
> > +
> > + Only choose this option if you know what you are doing.
> > +
> > +config CRC32_BIT
> > + bool "Classic Algorithm (one bit at a time)"
> > + help
> > + Calculate checksum one bit at a time. This is VERY slow, but has
> > + no lookup table. This is provided as a debugging option.
> > +
> > + Only choose this option if you are debugging crc32.
> > +
> > +endchoice
> > +
> > config CRC7
> > tristate "CRC7 functions"
> > help
> > diff --git a/lib/crc32defs.h b/lib/crc32defs.h
> > index 6fd1917..64cba2c 100644
> > --- a/lib/crc32defs.h
> > +++ b/lib/crc32defs.h
> > @@ -13,6 +13,24 @@
> > */
> > #define CRC32C_POLY_LE 0x82F63B78
> >
> > +/* Try to choose an implementation variant via Kconfig */
> > +#ifdef CONFIG_CRC32_SLICEBY8
> > +# define CRC_LE_BITS 64
> > +# define CRC_BE_BITS 64
> > +#endif
> > +#ifdef CONFIG_CRC32_SLICEBY4
> > +# define CRC_LE_BITS 32
> > +# define CRC_BE_BITS 32
> > +#endif
> > +#ifdef CONFIG_CRC32_SARWATE
> > +# define CRC_LE_BITS 8
> > +# define CRC_BE_BITS 8
> > +#endif
> > +#ifdef CONFIG_CRC32_BIT
> > +# define CRC_LE_BITS 1
> > +# define CRC_BE_BITS 1
> > +#endif
> > +
> > /*
> > * How many bits at a time to use. Valid values are 1, 2, 4, 8, 32 and
> 64.
> > * For less performance-sensitive, use 4 or 8 to save table size.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


2011-12-13 08:27:10

by Joakim Tjernlund

[permalink] [raw]
Subject: Re: [PATCH 14/14] crc32: Select an algorithm via kconfig

"Darrick J. Wong" <[email protected]> wrote on 2011/12/13 07:32:28:
>
> On Mon, Dec 12, 2011 at 05:10:45PM -0600, Bob Pearson wrote:
> > That choice was for Joakim who measured better performance on his 32 bit PPC
> > platform with "by 4".
>
> Ok. On my 1.33GHz PowerBook I get ~255MB/s with slice by 4 and ~270MB/s with
> slice by 8. I think it's a PPC 7447, and definitely 32-bit. In any case, it
> reports having 32K of L1D cache.

I tested Bobs early version on my mpc8321(266MHz, embedded CPU) and it was just
half the speed compared with current crc32.

Jocke

2011-12-13 18:43:08

by djwong

[permalink] [raw]
Subject: Re: [PATCH 14/14] crc32: Select an algorithm via kconfig

On Tue, Dec 13, 2011 at 09:27:10AM +0100, Joakim Tjernlund wrote:
> "Darrick J. Wong" <[email protected]> wrote on 2011/12/13 07:32:28:
> >
> > On Mon, Dec 12, 2011 at 05:10:45PM -0600, Bob Pearson wrote:
> > > That choice was for Joakim who measured better performance on his 32 bit PPC
> > > platform with "by 4".
> >
> > Ok. On my 1.33GHz PowerBook I get ~255MB/s with slice by 4 and ~270MB/s with
> > slice by 8. I think it's a PPC 7447, and definitely 32-bit. In any case, it
> > reports having 32K of L1D cache.
>
> I tested Bobs early version on my mpc8321(266MHz, embedded CPU) and it was just
> half the speed compared with current crc32.

I wonder, given the patch "crc32: Speed up memory table access on powerpc"
would you mind retesting to see if slice by 8 still trails slice by 4 on your
powerpc? I see that your mpc8321 has 16K of L1D cache and a 32-bit memory bus
whereas my 7447 has a 64-bit memory bus. I wonder if memory bus size could be
a defining characteristic...?

I tried it out the crc32c code on a s390x today; apparently by-8 trails by-4
there too. It's unfortunately difficult to figure out the hardware details of
whatever's going on underneath that VM.

--D
>
> Jocke
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>