In trying to get my EFI GUID Partition Table patch included into the stock
kernel, Andreas Dilger suggested it was time for some crc32 cleanup, as
the GPT patch added yet another copy of the common crc32 function. So,
here's my first pass at it. Comments welcome.
First patch:
http://domsch.com/linux/patches/linux-2.4.13-pre1-crc32-20011012.patch
linux-2.4.13-pre1.crc/include/linux/crc32.h | 62 +++++++++++
linux-2.4.13-pre1.crc/lib/Makefile | 4
linux-2.4.13-pre1.crc/lib/crc32.c | 151 ++++++++++++++++++++++++++++
linux-2.4.13-pre1.gpt/init/main.c | 2
4 files changed, 217 insertions(+), 2 deletions(-)
This patch (appended below), makes include/linux/crc32.h and
lib/crc32.c. It generates a table based on the commonly used polynomial
at init time provided a driver needs it. It changes ether_crc_le() to use
this table. It renames the commonly seen ether_crc() to be
ether_crc_be(), and puts it in crc32.h (allowing lots of copies to be
deleted).
ether_crc_be() continues to use the calculated method, rather than a table
lookup method, for computing CRC. Since I don't understand the
differences between big-endian and little-endian CRC32 functions in this
case, and I don't have access to BE systems, I tried to do the safest
thing here. If someone wishes to make this use a table lookup, please do
so.
Exactly how init_crc32() is called is subject to debate. The method
presented below avoids allocating and filling the table unless something
that uses it puts a line in lib/crc32.c. A second method would be to
have each user call init_crc32() itself, but if a driver neglects to do
such, the kernel would Oops. I'm sure there's yet a better method I
haven't thought of.
Assuming the above patch is somewhat close to correct, then a
corresponding patch to various network drivers and jffs2 can be applied
(not appended)
http://domsch.com/linux/patches/linux-2.4.13-pre1-crc32-drivers-20011012.patch
drivers/net/8139too.c | 24 --------
drivers/net/at1700.c | 22 -------
drivers/net/atp.c | 21 -------
drivers/net/au1000_eth.c | 19 ------
drivers/net/dl2k.c | 15 -----
drivers/net/dl2k.h | 1
drivers/net/dmfe.c | 84 ++----------------------------
drivers/net/epic100.c | 22 -------
drivers/net/fealnx.c | 24 --------
drivers/net/pci-skeleton.c | 25 ---------
drivers/net/pcmcia/smc91c92_cs.c | 28 ----------
drivers/net/pcmcia/xircom_tulip_cb.c | 40 --------------
drivers/net/smc9194.c | 35 +-----------
drivers/net/starfire.c | 26 ---------
drivers/net/sundance.c | 27 ---------
drivers/net/tulip/tulip_core.c | 40 +-------------
drivers/net/via-rhine.c | 23 --------
drivers/net/winbond-840.c | 19 ------
drivers/net/yellowfin.c | 24 --------
fs/jffs2/crc32.c | 97 -----------------------------------
fs/jffs2/crc32.h | 21 -------
fs/jffs2/dir.c | 2
fs/jffs2/erase.c | 2
fs/jffs2/file.c | 2
fs/jffs2/gc.c | 2
fs/jffs2/read.c | 2
fs/jffs2/readinode.c | 2
fs/jffs2/scan.c | 2
fs/jffs2/write.c | 2
29 files changed, 46 insertions(+), 607 deletions(-)
I've got GPT and common uuid patches also at
http://domsch.com/linux/patches/ (see linux-2.4.13-*) to submit pending the
inclusion of the crc32 patches.
Feedback welcomed.
Thanks,
Matt
--
Matt Domsch
Sr. Software Engineer
Dell Linux Solutions
http://www.dell.com/linux
#2 Linux Server provider with 17% in the US and 14% Worldwide (IDC)!
#3 Unix provider with 18% in the US (Dataquest)!
diff -burNp --exclude='*~' --exclude='*.orig' linux-2.4.13-pre1/include/linux/crc32.h linux-2.4.13-pre1.gpt/include/linux/crc32.h
--- linux-2.4.13-pre1/include/linux/crc32.h Wed Dec 31 18:00:00 1969
+++ linux-2.4.13-pre1.crc/include/linux/crc32.h Fri Oct 12 12:20:47 2001
@@ -0,0 +1,62 @@
+/*
+ * crc32.h
+ * See linux/lib/crc32.c for license and changes
+ */
+#ifndef _LINUX_CRC32_H
+#define _LINUX_CRC32_H
+
+#include <linux/types.h>
+#include <linux/init.h>
+
+/* This is used only by init/main.c */
+extern void crc32_init(void) __init;
+
+/*
+ * This computes a 32 bit CRC of the data in the buffer, and returns the CRC.
+ * The polynomial used is 0xedb88320.
+ */
+
+extern u32 crc32 (u32 seed, const void *buf, unsigned long len);
+
+/**
+ * ether_crc_le(int length, unsigned char *data)
+ * @length - length of buffer @data
+ * @data - buffer to generate CRC32 value for
+ *
+ * The little-endian AUTODIN32 ethernet CRC calculation.
+ * ethernet_polynomial_le = 0xedb88320U;
+ * Seed of ~0, no final xor of ~0.
+ */
+static inline unsigned ether_crc_le(int length, unsigned char *data)
+{
+ return crc32(~0, data, length);
+}
+
+/**
+ * ether_crc_be(int length, unsigned char *data)
+ * @length - length of buffer @data
+ * @data - buffer to generate CRC32 value for
+ *
+ * The big-endian AUTODIN II ethernet CRC calculation.
+ * N.B. Do not use for bulk data, use a table-based routine instead.
+ */
+static inline u32 ether_crc_be(int length, unsigned char *data)
+{
+ int crc = ~0;
+ unsigned const ethernet_polynomial_be = 0x04c11db7U;
+
+ while(--length >= 0) {
+ unsigned char current_octet = *data++;
+ int bit;
+ for (bit = 0; bit < 8; bit++, current_octet >>= 1) {
+ crc = (crc << 1) ^
+ ((crc < 0) ^ (current_octet & 1) ?
+ ethernet_polynomial_be : 0);
+ }
+ }
+ return crc;
+}
+
+
+
+#endif /* _LINUX_CRC32_H */
diff -burNp --exclude='*~' --exclude='*.orig' linux-2.4.13-pre1/lib/Makefile linux-2.4.13-pre1.crc/lib/Makefile
--- linux-2.4.13-pre1/lib/Makefile Mon Sep 17 17:31:15 2001
+++ linux-2.4.13-pre1.crc/lib/Makefile Thu Oct 11 15:45:06 2001
@@ -8,9 +8,9 @@
L_TARGET := lib.a
-export-objs := cmdline.o dec_and_lock.o rwsem-spinlock.o rwsem.o
+export-objs := cmdline.o dec_and_lock.o rwsem-spinlock.o rwsem.o crc32.o
-obj-y := errno.o ctype.o string.o vsprintf.o brlock.o cmdline.o bust_spinlocks.o rbtree.o
+obj-y := errno.o ctype.o string.o vsprintf.o brlock.o cmdline.o bust_spinlocks.o rbtree.o crc32.o
obj-$(CONFIG_RWSEM_GENERIC_SPINLOCK) += rwsem-spinlock.o
obj-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
diff -burNp --exclude='*~' --exclude='*.orig' linux-2.4.13-pre1/lib/crc32.c linux-2.4.13-pre1.crc/lib/crc32.c
--- linux-2.4.13-pre1/lib/crc32.c Wed Dec 31 18:00:00 1969
+++ linux-2.4.13-pre1.crc/lib/crc32.c Fri Oct 12 11:48:03 2001
@@ -0,0 +1,151 @@
+/*
+ * Oct 12, 2000 Matt Domsch <[email protected]>
+ * Same crc32 function was used in 5 other places in the kernel.
+ * I made one version, and deleted the others.
+ * There are various incantations of crc32(). Some use a seed of 0 or ~0.
+ * Some xor at the end with ~0. The generic crc32() function takes
+ * seed as an argument, and doesn't xor at the end. Then individual
+ * users can do whatever they need.
+ * drivers/net/smc9194.c uses seed ~0, doesn't xor with ~0.
+ * fs/jffs2 uses seed 0, doesn't xor with ~0.
+ * fs/partitions/efi.c uses seed ~0, xor's with ~0.
+ *
+ * Dec 5, 2000 Matt Domsch <[email protected]>
+ * - Copied crc32.c from the linux/drivers/net/cipe directory.
+ * - Now pass seed as an arg
+ * - changed unsigned long to u32, added #include<linux/types.h>
+ * - changed len to be an unsigned long
+ * - changed crc32val to be a register
+ * - License remains unchanged! It's still GPL-compatable!
+ */
+
+ /* ============================================================= */
+ /* COPYRIGHT (C) 1986 Gary S. Brown. You may use this program, or */
+ /* code or tables extracted from it, as desired without restriction. */
+ /* */
+ /* First, the polynomial itself and its table of feedback terms. The */
+ /* polynomial is */
+ /* X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X^1+X^0 */
+ /* */
+ /* Note that we take it "backwards" and put the highest-order term in */
+ /* the lowest-order bit. The X^32 term is "implied"; the LSB is the */
+ /* X^31 term, etc. The X^0 term (usually shown as "+1") results in */
+ /* the MSB being 1. */
+ /* */
+ /* Note that the usual hardware shift register implementation, which */
+ /* is what we're using (we're merely optimizing it by doing eight-bit */
+ /* chunks at a time) shifts bits into the lowest-order term. In our */
+ /* implementation, that means shifting towards the right. Why do we */
+ /* do it this way? Because the calculated CRC must be transmitted in */
+ /* order from highest-order term to lowest-order term. UARTs transmit */
+ /* characters in order from LSB to MSB. By storing the CRC this way, */
+ /* we hand it to the UART in the order low-byte to high-byte; the UART */
+ /* sends each low-bit to hight-bit; and the result is transmission bit */
+ /* by bit from highest- to lowest-order term without requiring any bit */
+ /* shuffling on our part. Reception works similarly. */
+ /* */
+ /* The feedback terms table consists of 256, 32-bit entries. Notes: */
+ /* */
+ /* The table can be generated at runtime if desired; code to do so */
+ /* is shown later. It might not be obvious, but the feedback */
+ /* terms simply represent the results of eight shift/xor opera- */
+ /* tions for all combinations of data and CRC register values. */
+ /* */
+ /* The values must be right-shifted by eight bits by the "updcrc" */
+ /* logic; the shift must be unsigned (bring in zeroes). On some */
+ /* hardware you could probably optimize the shift in assembler by */
+ /* using byte-swap instructions. */
+ /* polynomial $edb88320 */
+ /* */
+ /* -------------------------------------------------------------------- */
+
+#include <linux/crc32.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+
+static u32 *crc_32_tab;
+
+/**
+ * crc32_init(): generates CRC32 table for polynomial $edb88320
+ *
+ * Description: Code to compute the CRC-32 table. Borrowed from
+ * gzip-1.0.3/makecrc.c.
+ */
+#if defined(CONFIG_AT1700) || \
+ defined(CONFIG_ATP) || \
+ defined(CONFIG_DL2K) || \
+ defined(CONFIG_DM9102) || \
+ defined(CONFIG_EPIC100) || \
+ defined(CONFIG_SMC9194) || \
+ defined(CONFIG_STARFIRE) || \
+ defined(CONFIG_SUNDANCE) || \
+ defined(CONFIG_TULIP) || \
+ defined(CONFIG_YELLOWFIN) || \
+ defined(CONFIG_FS_JFFS2) || \
+ defined(CONFIG_EFI_PARTITION)
+void __init crc32_init(void)
+{
+/* Not copyrighted 1990 Mark Adler */
+
+ u32 c; /* crc shift register */
+ u32 e; /* polynomial exclusive-or pattern */
+ int i; /* counter for all possible eight bit values */
+ int k; /* byte being shifted into crc apparatus */
+ /* terms of polynomial defining this crc (except x^32): */
+ const int p[] = {0,1,2,4,5,7,8,10,11,12,16,22,23,26};
+
+ if (crc_32_tab) return;
+ else crc_32_tab = kmalloc(256*sizeof(u32), GFP_KERNEL);
+
+ /* Make exclusive-or pattern from polynomial */
+ e = 0;
+ for (i = 0; i < sizeof(p)/sizeof(int); i++)
+ e |= 1L << (31 - p[i]);
+
+ crc_32_tab[0] = 0;
+
+ for (i = 1; i < 256; i++) {
+ c = 0;
+ for (k = i | 256; k != 1; k >>= 1) {
+ c = c & 1 ? (c >> 1) ^ e : c >> 1;
+ if (k & 1)
+ c ^= e;
+ }
+ crc_32_tab[i] = c;
+ }
+}
+#else
+#define init_crc32()
+#define NEED_INIT_CRC32
+#endif
+
+/**
+ * crc32(): Generates CRC32 checksum of buffer
+ * @seed: initial value, often 0 or ~0, or the
+ * running crc if checksumming multiple buffers.
+ * @buf: buffer to be checksummed
+ * @len: length of @buf
+ *
+ */
+
+
+u32
+crc32(u32 seed, const void *buf, unsigned long len)
+{
+ unsigned long i;
+ register u32 crc32val;
+ const u8 *s = buf;
+
+#ifdef NEED_INIT_CRC32
+#error You must add the CONFIG variable for this module to lib/crc32.c:init_crc32().
+#endif
+ crc32val = seed;
+ for (i = 0; i < len; i++) {
+ crc32val = crc_32_tab[(crc32val ^ s[i]) & 0xff] ^
+ (crc32val >> 8);
+ }
+ return crc32val;
+}
+
+EXPORT_SYMBOL(crc32);
diff -burNp --exclude='*~' --exclude='*.orig' linux-2.4.13-pre1/init/main.c linux-2.4.13-pre1.gpt/init/main.c
--- linux-2.4.13-pre1/init/main.c Sat Oct 6 10:49:16 2001
+++ linux-2.4.13-pre1.gpt/init/main.c Fri Oct 12 10:40:02 2001
@@ -27,6 +27,7 @@
#include <linux/iobuf.h>
#include <linux/bootmem.h>
#include <linux/tty.h>
+#include <linux/crc32.h>
#include <asm/io.h>
#include <asm/bugs.h>
@@ -608,6 +609,7 @@ asmlinkage void __init start_kernel(void
#if defined(CONFIG_SYSVIPC)
ipc_init();
#endif
+ crc32_init();
check_bugs();
printk("POSIX conformance testing by UNIFIX\n");
I like it. I even had local patches that did something similar,
creating lib/crc32, but it never got to the polished stage. Donald
Becker had also suggested at the 2.5 kernel summit that the ether_crc
stuff become generic code, so your code here follows along well with
that plan. So, after some testing, I'm definitely interested in these
patches (at least as they related to net drivers).
WRT initialization, I would suggest refcounting: driver calls
init_crc32 at module load time, and cleanup_crc32 at module removal
time. When the first reference appears, the desired poly table
is initialized. When the last reference disappears, the poly table
is kfree'd. I considered other initialization scenarios but this seems
to be the cleanest.
Jeff
More comments:
* if ether_crc is always == ether_crc_be, then create a #define instead
of patching driver code
* no need to inline ether_crc_be, stick it in lib/crc32.c also
* using a ref-counting init_crc32 and cleanup_crc32, you do not need
CONFIG_xxx tests, per driver, in the code itself. Either make
lib/crc32 a permanent part of the kernel, or make it a separate module
which is enabled by makefile rules. Example:
(linux/lib/Makefile)
obj-$(CONFIG_TULIP) += crc32.o
obj-$(CONFIG_NATSEMI) += crc32.o
obj-$(CONFIG_DMFE) += crc32.o
obj-$(CONFIG_ANOTHERDRIVER) += crc32.o
makefile rules eliminate the duplicates...
Matt Domsch <[email protected]> said:
> In trying to get my EFI GUID Partition Table patch included into the stock
> kernel, Andreas Dilger suggested it was time for some crc32 cleanup, as
> the GPT patch added yet another copy of the common crc32 function. So,
> here's my first pass at it. Comments welcome.
[...]
> This patch (appended below), makes include/linux/crc32.h and
> lib/crc32.c. It generates a table based on the commonly used polynomial
> at init time provided a driver needs it. It changes ether_crc_le() to use
> this table. It renames the commonly seen ether_crc() to be
> ether_crc_be(), and puts it in crc32.h (allowing lots of copies to be
> deleted).
You could just place the functions (each with its table) into separate .c
files, and place them in the library. The linker will then include them iff
they are referenced somewhere. OTOH, they _are_ all over the place right
now, so they could just be included unconditionally. Thinking of ways to
set up to have them included in case only modules use them makes me
shiver...
--
Dr. Horst H. von Brand Usuario #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
On Fri, 12 Oct 2001, Horst von Brand wrote:
> Matt Domsch <[email protected]> said:
> > In trying to get my EFI GUID Partition Table patch included into the stock
> > kernel, Andreas Dilger suggested it was time for some crc32 cleanup, as
> > the GPT patch added yet another copy of the common crc32 function. So,
> > here's my first pass at it. Comments welcome.
>
> [...]
>
> > This patch (appended below), makes include/linux/crc32.h and
> > lib/crc32.c. It generates a table based on the commonly used polynomial
> > at init time provided a driver needs it. It changes ether_crc_le() to use
> > this table. It renames the commonly seen ether_crc() to be
> > ether_crc_be(), and puts it in crc32.h (allowing lots of copies to be
> > deleted).
>
> You could just place the functions (each with its table) into separate .c
> files, and place them in the library. The linker will then include them iff
> they are referenced somewhere. OTOH, they _are_ all over the place right
> now, so they could just be included unconditionally. Thinking of ways to
> set up to have them included in case only modules use them makes me
> shiver...
Then leave that task to others ;-)
We must consider the case where the kernel is built, and then a random
3rd party module comes along that needs crc32 features. An ar library
won't cut it, and neither will [the existing crc32 method of] patching
linux/lib/crc32.c. That leaves (a) unconditionally building it into the
kernel, or (b) Makefile and Config.in rules.
Jeff
> That leaves (a) unconditionally building
> it into the kernel, or (b) Makefile and Config.in rules.
(a) is simple, but needs a 1KB malloc (or alternately, a 1KB static const
array - I've taken the approach that the malloc is better)
(b) isn't that much harder, but requires drivers to be sure to call
init_crc32 and cleanup_crc32. If somehow they manage not to do that, Oops.
I don't want to add a runtime check for the existance of the array in
crc32().
--
Matt Domsch
Sr. Software Engineer
Dell Linux Solutions
http://www.dell.com/linux
#2 Linux Server provider with 17% in the US and 14% Worldwide (IDC)!
#3 Unix provider with 18% in the US (Dataquest)!
On Fri, 12 Oct 2001 [email protected] wrote:
> > That leaves (a) unconditionally building
> > it into the kernel, or (b) Makefile and Config.in rules.
>
> (a) is simple, but needs a 1KB malloc (or alternately, a 1KB static const
> array - I've taken the approach that the malloc is better)
> (b) isn't that much harder, but requires drivers to be sure to call
> init_crc32 and cleanup_crc32. If somehow they manage not to do that, Oops.
> I don't want to add a runtime check for the existance of the array in
> crc32().
You are talking about the data; I was talking about the code.
I do not think kernels need the data table, kmalloc'd or statically
built, unless it will be used. That implies a refcounting scheme.
[WRT "Oops", that is a driver bug, not a case to be considered. In
Linuxland we do not write code to protect us from rogue code.]
I was pondering whether it was ok to unconditionally include the
lib/crc32.c code, regardless of need. I am leaning towards "no," which
implies Makefile and Config.in rules which must be updated for each
driver that uses crc32.
Jeff
> I was pondering whether it was ok to unconditionally include the
> lib/crc32.c code, regardless of need. I am leaning towards
> "no," which
> implies Makefile and Config.in rules which must be updated for each
> driver that uses crc32.
OK, I'm taking this approach.
> * if ether_crc is always == ether_crc_be, then create a
> #define instead of patching driver code
Done.
> * no need to inline ether_crc_be, stick it in lib/crc32.c also
Done.
> * using a ref-counting init_crc32 and cleanup_crc32
> (linux/lib/Makefile)
> obj-$(CONFIG_TULIP) += crc32.o
> obj-$(CONFIG_NATSEMI) += crc32.o
> obj-$(CONFIG_DMFE) += crc32.o
> obj-$(CONFIG_ANOTHERDRIVER) += crc32.o
Done. init_crc32() needs to be called from all drivers which call crc32()
or ether_crc_le(). I think I've got them added, but will verify and report
back after the weekend. For the curious, I've got one big patch with crc32,
GPT, efivars, and uuid stuff (separates pretty easily into multiple patches,
but for review now, it's just one).
http://domsch.com/linux/patches/big-crc32-20011012.patch
Thanks,
Matt
--
Matt Domsch
Sr. Software Engineer
Dell Linux Solutions
http://www.dell.com/linux
#2 Linux Server provider with 17% in the US and 14% Worldwide (IDC)!
#3 Unix provider with 18% in the US (Dataquest)!
* Still need init_crc32:
8139too, au1000_eth, fealnx, smc91c92_cs, xircom_tulip_cb, smc9194,
via-rhine, winbond-840,
* lib/uuid.o needs to be in lib/Makefile's export-objs
* in init/cleanup_crc32, don't hold spinlock during kmalloc/kfree.
Check out other refcounting schemes in the various fs/*.c files.
* Add a Config.in entry, so that people can manually select to compile
crc32.o as a module. This takes care of the 3rd party module case,
the module that might for example have been built some time
after the kernel itself was built.
Jeff Garzik <[email protected]> said:
[...]
> We must consider the case where the kernel is built, and then a random
> 3rd party module comes along that needs crc32 features. An ar library
> won't cut it, and neither will [the existing crc32 method of] patching
> linux/lib/crc32.c. That leaves (a) unconditionally building it into the
> kernel, or (b) Makefile and Config.in rules.
(b) won't work if my kernel has no CRC32 modules, and a random 3rd party
module needs it. So it looks like firm builtin is the only real option (a).
--
Horst von Brand [email protected]
Casilla 9G, Vin~a del Mar, Chile +56 32 672616
[email protected] said:
> > That leaves (a) unconditionally building
> > it into the kernel, or (b) Makefile and Config.in rules.
> (a) is simple, but needs a 1KB malloc (or alternately, a 1KB static const
> array - I've taken the approach that the malloc is better)
Better static (less overhead in size and at runtime), initialized at build
time (you could compute it then). In case of _dire_ kernel size problems, it
can be left out anyway. AFAIU, there are now a _lot_ of copies of this
around, so you'll win overall in any case.
> (b) isn't that much harder, but requires drivers to be sure to call
> init_crc32 and cleanup_crc32. If somehow they manage not to do that, Oops.
> I don't want to add a runtime check for the existance of the array in
> crc32().
KISS: Keep It Simple, Stupid. Unless it won't cut it, that is.
--
Horst von Brand [email protected]
Casilla 9G, Vin~a del Mar, Chile +56 32 672616
On Fri, 12 Oct 2001, Horst von Brand wrote:
> Jeff Garzik <[email protected]> said:
> > We must consider the case where the kernel is built, and then a random
> > 3rd party module comes along that needs crc32 features. An ar library
> > won't cut it, and neither will [the existing crc32 method of] patching
> > linux/lib/crc32.c. That leaves (a) unconditionally building it into the
> > kernel, or (b) Makefile and Config.in rules.
>
> (b) won't work if my kernel has no CRC32 modules, and a random 3rd party
> module needs it. So it looks like firm builtin is the only real option (a).
Sure it will. (b) not only will work, but it is the preferred option.
lib/Makefile will contain the line
obj-$(CONFIG_CRC32) += crc32.o
and arch/$arch/config.in will contain
tristate 'Include crc32 library?' CONFIG_CRC32
On Fri, 12 Oct 2001, Horst von Brand wrote:
> [email protected] said:
> > > That leaves (a) unconditionally building
> > > it into the kernel, or (b) Makefile and Config.in rules.
>
> > (a) is simple, but needs a 1KB malloc (or alternately, a 1KB static const
> > array - I've taken the approach that the malloc is better)
>
> Better static (less overhead in size and at runtime), initialized at build
> time (you could compute it then). In case of _dire_ kernel size problems, it
> can be left out anyway. AFAIU, there are now a _lot_ of copies of this
> around, so you'll win overall in any case.
>
> > (b) isn't that much harder, but requires drivers to be sure to call
> > init_crc32 and cleanup_crc32. If somehow they manage not to do that, Oops.
> > I don't want to add a runtime check for the existance of the array in
> > crc32().
>
> KISS: Keep It Simple, Stupid. Unless it won't cut it, that is.
Currently we are only talking about one CRC table, but we can set up
two crc tables, if we consider ether_crc_be as well.
Being static definitely has the advantage of not needing the refcounting
scheme; it really depends on what the code size looks like in the end,
with and without a statically-calculated CRC table.
Jeff
On Fri, 12 Oct 2001 14:37:52 -0500 (CDT),
Jeff Garzik <[email protected]> wrote:
>(linux/lib/Makefile)
>obj-$(CONFIG_TULIP) += crc32.o
>obj-$(CONFIG_NATSEMI) += crc32.o
>obj-$(CONFIG_DMFE) += crc32.o
>obj-$(CONFIG_ANOTHERDRIVER) += crc32.o
It is better to define CONFIG_CRC32 and have the Config.in files set
CONFIG_CRC32 for selected drivers. That avoids the problem of lots of
drivers wanting to patch the same Makefile, instead the selection of
crc32 is kept with the driver selection.
lib/Makefile
obj-$(CONFIG_CRC32) += crc32.o
drivers/foo/Config.in
if [ "$CONFIG_FOO" = "y" ]; then
define_bool CONFIG_CRC32 y
fi
It is even cleaner in CML2.
require FOO implies CRC32=y
In general it is a bad idea to handle selections in the Makefile, that
is what CML is for. Makefiles should just build the code based on CML
output, not try to decide what to build.
[email protected] said:
> [email protected] said:
> > > That leaves (a) unconditionally building
> > > it into the kernel, or (b) Makefile and Config.in rules.
>
> > (a) is simple, but needs a 1KB malloc (or alternately, a 1KB static const
> > array - I've taken the approach that the malloc is better)
>
> Better static (less overhead in size and at runtime), initialized at build
> time (you could compute it then). In case of _dire_ kernel size problems, it
> can be left out anyway. AFAIU, there are now a _lot_ of copies of this
> around, so you'll win overall in any case.
>
> > (b) isn't that much harder, but requires drivers to be sure to call
> > init_crc32 and cleanup_crc32. If somehow they manage not to do that, Oops.
> > I don't want to add a runtime check for the existance of the array in
> > crc32().
I have to agree. A single global config that controls if CRC32 is available
as a static table, and an inline function that allows me to use it.
Any module that tries to use it will either not compile or if will not load
due an unresolved address.
Least impact. Simplest API. The lowest overhead for kernels that do or do
not need CRC.
We should probably provide CRC10 and CRC16 as well.
--
__O
Lineo - Where Open Meets Smart _-\<,_
PGP Fingerprint: 28 E2 A0 15 99 62 9A 00 (_)/ (_) 88 EC A3 EE 2D 1C 15 68
Stuart Lynne <[email protected]> http://www.lineo.com 604-461-7532
On Sat, 13 Oct 2001, Keith Owens wrote:
> On Fri, 12 Oct 2001 14:37:52 -0500 (CDT),
> Jeff Garzik <[email protected]> wrote:
> >(linux/lib/Makefile)
> >obj-$(CONFIG_TULIP) += crc32.o
> >obj-$(CONFIG_NATSEMI) += crc32.o
> >obj-$(CONFIG_DMFE) += crc32.o
> >obj-$(CONFIG_ANOTHERDRIVER) += crc32.o
>
> It is better to define CONFIG_CRC32 and have the Config.in files set
> CONFIG_CRC32 for selected drivers. That avoids the problem of lots of
> drivers wanting to patch the same Makefile, instead the selection of
> crc32 is kept with the driver selection.
>
> lib/Makefile
> obj-$(CONFIG_CRC32) += crc32.o
>
> drivers/foo/Config.in
> if [ "$CONFIG_FOO" = "y" ]; then
> define_bool CONFIG_CRC32 y
> fi
>
> It is even cleaner in CML2.
> require FOO implies CRC32=y
No, because that doesn't take care of the module case (CONFIG_CRC32=m).
Note how things get a whole lot uglier when you remember that. Now
consider when CONFIG_FOO=m (implies CONFIG_CRC32=m), and then later on
in the Config.in files, CONFIG_BAR=y (which means CONFIG_CRC32 much be
switched from 'm' to 'y').
> In general it is a bad idea to handle selections in the Makefile, that
> is what CML is for. Makefiles should just build the code based on CML
> output, not try to decide what to build.
Um, whatever. That's the whole purpose of
obj-$(CONFIG_FOO) += ...
it allows the makefile to automagically decide whether or not to build
that particular module into the kernel or separately with -DMODULE. And
that decision occurs at build time, after all the 'make config' steps
have occurred, and we know exactly what modules to build.
Jeff
On Fri, 12 Oct 2001 21:36:45 -0500 (CDT),
Jeff Garzik <[email protected]> wrote:
>On Sat, 13 Oct 2001, Keith Owens wrote:
>> It is better to define CONFIG_CRC32 and have the Config.in files set
>> CONFIG_CRC32 for selected drivers. That avoids the problem of lots of
>> drivers wanting to patch the same Makefile, instead the selection of
>> crc32 is kept with the driver selection.
>
>No, because that doesn't take care of the module case (CONFIG_CRC32=m).
Don't build crc32.o as a module, if it is required at all then make it
built in. Building small service routines as modules is a waste of
space, they take up complete pages when they are loaded.
On Sat, 13 Oct 2001, Keith Owens wrote:
> On Fri, 12 Oct 2001 21:36:45 -0500 (CDT),
> Jeff Garzik <[email protected]> wrote:
> >On Sat, 13 Oct 2001, Keith Owens wrote:
> >> It is better to define CONFIG_CRC32 and have the Config.in files set
> >> CONFIG_CRC32 for selected drivers. That avoids the problem of lots of
> >> drivers wanting to patch the same Makefile, instead the selection of
> >> crc32 is kept with the driver selection.
> >
> >No, because that doesn't take care of the module case (CONFIG_CRC32=m).
>
> Don't build crc32.o as a module, if it is required at all then make it
> built in. Building small service routines as modules is a waste of
> space, they take up complete pages when they are loaded.
If a page matters then CONFIG_CRC32=y can always be chosen by the user.
I prefer not to take away choice from the users unnecessarily, in this
case because you (as I read it) find the case of a library module too
ugly to deal with in Config.in language. This type of case is going to
pop up again and again. To me for example it seems reasonable to make
bzlib or zlib into a support module, if a similar case arose.
The easy solution to all this is obviously to build crc32 into the
kernel unconditionally, and live with the kernel bloat. I don't like
kernel bloat, so I prefer the non-easy option.
Jeff
On Fri, 12 Oct 2001 21:56:42 -0500 (CDT),
Jeff Garzik <[email protected]> wrote:
>The easy solution to all this is obviously to build crc32 into the
>kernel unconditionally, and live with the kernel bloat. I don't like
>kernel bloat, so I prefer the non-easy option.
Not when your non-easy option requires every driver that needs a
library routine to patch lib/Makefile. Adding a new driver should not
require a patch to an unrelated area of the kernel, it is bad design.
It also results on overlapping patches with the attendent risk of patch
rejects. Anybody remember the common Space.c and all the problems that
file caused?
Kernel bloat is bad. A kbuild design that will cause maintainence
problems in future is even worse. Setting CONFIG_CRC32 at the time the
driver is selected is the cleanest solution.
And, just when I thought I understood the crc32 stuff, here's an even better
explanation/code/etc. With thanks.
-Matt
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Friday, October 12, 2001 10:06 PM
Subject: Here's table-optimized crc32 code for both ways...
I think just having these in the kernel unconditionally is best.
This version allows various space/time tradeoffs.
For a RAM kernel, the initialization code is smaller than the tables,
and can be made initcode. For a ROM kernel, it would make sense
to precompute the tables and compile them in.
#include <stddef.h> /* For size_t */
typedef unsigned _u32;
/*
* This code is in the public domain; copyright abandoned.
* Liability for non-performance of this code is limited to the amount
* you paid for it. Since it is distributed for free, your refund will
* be very very small. If it breaks, you get to keep both pieces.
*/
/*
* There are multiple 16-bit CRC polynomials in common use, but this is
* *the* standard CRC-32 polynomial, first popularized by Ethernet.
* x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x^1+x^0
*/
#define CRCPOLY_LE 0xedb88320
#define CRCPOLY_BE 0x04c11db7
/* How many bits at a time to use. Requires a table of 4<<CRC_xx_BITS
bytes. */
#define CRC_LE_BITS 8
#define CRC_BE_BITS 4 /* Half the speed, 960 bytes less space */
/*
* Little-endian CRC computation. Used with serial bit streams sent
* lsbit-first. Be sure to use cpu_to_le32() to append the computed CRC.
*/
#if CRC_LE_BITS > 8 || CRC_LE_BITS < 1 || CRC_LE_BITS & CRC_LE_BITS-1
# error CRC_LE_BITS must be a power of 2 between 1 and 8
#endif
#if CRC_LE_BITS == 1
/*
* In fact, the table-based code will work in this case, but it can be
* simplified by inlining the table in ?: form.
*/
void
crc32init_le(void)
{/* no-op */;}
_u32
crc32_le(_u32 crc, unsigned char const *p, size_t len)
{
int i;
while (len--) {
crc ^= *p++;
for (i = 0; i < i; i++)
crc = (crc >> 1) ^ (crc & 1 ? CRCPOLY_LE : 0);
}
return crc;
}
#else /* Table-based approach */
_u32 crc32table_le[1<<CRC_LE_BITS];
/*
* crc is the crc of the byte i; other entries are filled in based on the
* fact that crctable[i^j] = crctable[i] ^ crctable[j].
*
* Note that the _init functions never write anything but the final correct
* value to each table entry, so they're safe to call repeatedly, even if
* someone else is currently using the table.
*/
void
crc32init_le(void)
{
unsigned i, j;
_u32 crc = 1;
crc32table_le[0] = 0;
for (i = 1 ; i < 1<<CRC_LE_BITS; i <<= 1) {
crc = (crc >> 1) ^ (crc & 1 ? CRCPOLY_LE : 0);
for (j = 0; j < i; j++)
crc32table_le[i+j] = crc ^ crc32table_le[j];
}
}
_u32
crc32_le(_u32 crc, unsigned char const *p, size_t len)
{
while (len--) {
# if CRC_LE_BITS == 8
crc = (crc >> 8) ^ crc32table_le[(crc ^ *p++) & 255];
# elif CRC_LE_BITS == 4
crc ^= *p++;
crc = (crc >> 4) ^ crc32table_le[crc & 15];
crc = (crc >> 4) ^ crc32table_le[crc & 15];
# elif CRC_LE_BITS == 2
crc ^= *p++;
crc = (crc >> 2) ^ crc32table_le[crc & 3];
crc = (crc >> 2) ^ crc32table_le[crc & 3];
crc = (crc >> 2) ^ crc32table_le[crc & 3];
crc = (crc >> 2) ^ crc32table_le[crc & 3];
# endif
}
return crc;
}
#endif
/*
* Big-endian CRC computation. Used with serial bit streams sent
* msbit-first. Be sure to use cpu_to_be32() to append the computed CRC.
*/
#if CRC_BE_BITS > 8 || CRC_BE_BITS < 1 || CRC_BE_BITS & CRC_BE_BITS-1
# error CRC_BE_BITS must be a power of 2 between 1 and 8
#endif
#if CRC_BE_BITS == 1
/*
* In fact, the table-based code will work in this case, but it can be
* simplified by inlining the table in ?: form.
*/
void
crc32init_be(void)
{/*no-op*/;}
_u32
crc32_be(_u32 crc, unsigned char const *p, size_t len)
{
int i;
while (len--) {
crc ^= *p++ << 24;
for (i = 0; i < i; i++)
crc = (crc << 1) ^ (crc & 0x80000000 ? CRCPOLY_BE :
0);
}
return crc;
}
#else /* Table-based approach */
_u32 crc32table_be[256];
void
crc32init_be(void)
{
unsigned i, j;
_u32 crc = 0x80000000;
crc32table_le[0] = 0;
for (i = 1<<(CRC_BE_BITS-1); i; i >>= 1) {
crc = (crc << 1) ^ (crc & 0x80000000 ? CRCPOLY_BE : 0);
for (j = 0; j < 1<<CRC_BE_BITS; j += 2*i)
crc32table_le[i+j] = crc ^ crc32table_le[j];
}
}
_u32
crc32_be(_u32 crc, unsigned char const *p, size_t len)
{
while (len--) {
# if CRC_BE_BITS == 8
crc = (crc << 8) ^ crc32table_be[(crc >> 24) ^ *p++];
# elif CRC_BE_BITS == 4
crc ^= *p++ << 24;
crc = (crc << 4) ^ crc32table_be[crc >> 28];
crc = (crc << 4) ^ crc32table_be[crc >> 28];
# elif CRC_BE_BITS == 2
crc ^= *p++ << 24;
crc = (crc << 2) ^ crc32table_be[crc >> 30];
crc = (crc << 2) ^ crc32table_be[crc >> 30];
crc = (crc << 2) ^ crc32table_be[crc >> 30];
crc = (crc << 2) ^ crc32table_be[crc >> 30];
# endif
}
return crc;
}
#endif
/*
* A brief CRC tutorial.
*
* A CRC is a long-division remainder. You add the CRC to the message,
* and the whole thing (message+CRC) is a multiple of the given
* CRC polynomial. To check the CRC, you can either check that the
* CRC matches the recomputed value, *or* you can check that the
* remainder computed on the message+CRC is 0. This latter approach
* is used by a lot of hardware implementations, and is why so many
* protocols put the end-of-frame flag after the CRC.
*
* It's actually the same long division you learned in school, except that
* - We're working in binary, so the digits are only 0 and 1, and
* - When dividing polynomials, there are no carries. Rather than add and
* subtract, we just xor. Thus, we tend to get a bit sloppy about
* the difference between adding and subtracting.
*
* A 32-bit CRC polynomial is actually 33 bits long. But since it's
* 33 bits long, bit 32 is always going to be set, so usually the CRC
* is written in hex with the most significant bit omitted. (If you're
* familiar with the IEEE 754 floating-point format, it's the same idea.)
*
* Note that a CRC is computed over a string of *bits*, so you have
* to decide on the endianness of the bits within each byte. To get
* the best error-detecting properties, this should correspond to the
* order they're actually sent. For example, standard RS-232 serial is
* little-endian; the most significant bit (sometimes used for parity)
* is sent last. And when appending a CRC word to a message, you should
* do it in the right order, matching the endianness.
*
* Just like with ordinary division, the remainder is always smaller than
* the divisor (the CRC polynomial) you're dividing by. Each step of the
* division, you take one more digit (bit) of the dividend and append it
* to the current remainder. Then you figure out the appropriate multiple
* of the divisor to subtract to being the remainder back into range.
* In binary, it's easy - it has to be either 0 or 1, and to make the
* XOR cancel, it's just a copy of bit 32 of the remainder.
*
* When computing a CRC, we don't care about the quotient, so we can
* throw the quotient bit away, but subtract the appropriate multiple of
* the polynomial from the remainder and we're back to where we started,
* ready to process the next bit.
*
* A big-endian CRC written this way would be coded like:
* for (i = 0; i < input_bits; i++) {
* multiple = remainder & 0x80000000 ? CRCPOLY : 0;
* remainder = (remainder << 1 | next_input_bit()) ^ multiple;
* }
* Notice how, to get at bit 32 of the shifted remainder, we look
* at bit 31 of the remainder *before* shifting it.
*
* But also notice how the next_input_bit() bits we're shifting into
* the remainder don't actually affect any decision-making until
* 32 bits later. Thus, the first 32 cycles of this are pretty boring.
* Also, to add the CRC to a message, we need a 32-bit-long hole for it at
* the end, so we have to add 32 extra cycles shifting in zeros at the
* end of every message,
*
* So the standard trick is to rearrage merging in the next_input_bit()
* until the moment it's needed. Then the first 32 cycles can be
precomputed,
* and merging in the final 32 zero bits to make room for the CRC can be
* skipped entirely.
* This changes the code to:
* for (i = 0; i < input_bits; i++) {
* remainder ^= next_input_bit() << 31;
* multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
* remainder = (remainder << 1) ^ multiple;
* }
* With this optimization, the little-endian code is simpler:
* for (i = 0; i < input_bits; i++) {
* remainder ^= next_input_bit();
* multiple = (remainder & 1) ? CRCPOLY : 0;
* remainder = (remainder >> 1) ^ multiple;
* }
*
* Note that the other details of endianness have been hidden in CRCPOLY
* (which must be bit-reversed) and next_input_bit().
*
* However, as long as next_input_bit is returning the bits in a sensible
* order, we can actually do the merging 8 or more bits at a time rather
* than one bit at a time:
* for (i = 0; i < input_bytes; i++) {
* remainder ^= next_input_byte() << 24;
* for (j = 0; j < 8; j++) {
* multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
* remainder = (remainder << 1) ^ multiple;
* }
* }
* Or in little-endian:
* for (i = 0; i < input_bytes; i++) {
* remainder ^= next_input_byte();
* for (j = 0; j < 8; j++) {
* multiple = (remainder & 1) ? CRCPOLY : 0;
* remainder = (remainder << 1) ^ multiple;
* }
* }
* If the input is a multiple of 32 bits, you can even XOR in a 32-bit
* word at a time and increase the inner loop count to 32.
*
* You can also mix and match the two loop styles, for example doing the
* bulk of a message byte-at-a-time and adding bit-at-a-time processing
* for any fractional bytes at the end.
*
* The only remaining optimization is to the byte-at-a-time table method.
* Here, rather than just shifting one bit of the remainder to decide
* in the correct multiple to subtract, we can shift a byte at a time.
* This produces a 40-bit (rather than a 33-bit) intermediate remainder,
* but again the multiple of the polynomial to subtract depends only on
* the high bits, the high 8 bits in this case.
*
* The multile we need in that case is the low 32 bits of a 40-bit
* value whose high 8 bits are given, and which is a multiple of the
* generator polynomial. This is simply the CRC-32 of the given
* one-byte message.
*
* Two more details: normally, appending zero bits to a message which
* is already a multiple of a polynomial produces a larger multiple of that
* polynomial. To enable a CRC to detect this condition, it's common to
* invert the CRC before appending it. This makes the remainder of the
* message+crc come out not as zero, but some fixed non-zero value.
*
* The same problem applies to zero bits prepended to the message, and
* a similar solution is used. Instead of starting with a remainder of
* 0, an initial remainder of all ones is used. As long as you start
* the same way on decoding, it doesn't make a difference.
*/
Jeff Garzik <[email protected]> said:
[...]
> I was pondering whether it was ok to unconditionally include the
> lib/crc32.c code, regardless of need. I am leaning towards "no," which
> implies Makefile and Config.in rules which must be updated for each
> driver that uses crc32.
It is easier in that case just to make it another module.
--
Horst von Brand [email protected]
Casilla 9G, Vin~a del Mar, Chile +56 32 672616
I think this summarizes the problems with the crc code
1. Should the crc code be included in the kernel generally ?
2. If not - should the user or the driver modules change the behaviour of
the kernel build process ?
3. How do we provide crc functions for modules not included in the main
kernel tree ?
4. Should we use precompiled tables (bigger kernel) or use more compex
(slower) algorithms ?
I think the code should generally be included in the kernel - we can add
somewhere an option like "Remove crc code from kernel if not used" marked
as "dangerous". So the user can force removal if he is sure, he will never
use the code. The drivers included in the kernel may still enable the
code, if needed, but extern drivers won't work.
To safe space or gain performance we should add a select option next to
the "remove crc" one ("Optimize crc code for space/performance")
Comments
Jan-Marek Glogowski
In article <Pine.LNX.4.30.0110131131200.32076-100000@stud.fbi.fh-darmstadt.de>,
Jan-Marek Glogowski <[email protected]> writes:
> Comments
Just use the existing linker features. Link the crc code as an .a library.
If some code needs it it'll get included. If it needs initialization
use the existing __initcall() setup. It'll generate a call to the
initialization function when it is linked in, and none if it is not.
[Note that __initcall may be a bit tricky here if some other __initcall
user like an ethernet driver needs crc32 too; there is unfortunately no
priority mechanism in __initcall to enforce that the crc32 init runs before
the other initcalls. best would probably just to use a static table to avoid
this issue]
-Andi
That doesn't solve the problem, if your kernel doesn't use the crc
functions but an externel one (special driver - eventually binary) module
needs the code. The external driver may give a message like "crc functions
needed - please recompile your kernel with crc support". Is this ok for
the normal user ?
> Just use the existing linker features. Link the crc code as an .a library.
> If some code needs it it'll get included. If it needs initialization
> use the existing __initcall() setup. It'll generate a call to the
> initialization function when it is linked in, and none if it is not.
>
> [Note that __initcall may be a bit tricky here if some other __initcall
> user like an ethernet driver needs crc32 too; there is unfortunately no
> priority mechanism in __initcall to enforce that the crc32 init runs before
> the other initcalls. best would probably just to use a static table to avoid
> this issue]
Jan-Marek
On 13 Oct 2001 12:02:16 +0200,
Andi Kleen <[email protected]> wrote:
>Just use the existing linker features. Link the crc code as an .a library.
Does not work if all the code that uses crc32 is in a module. No
references from the main kernel so crc32 is not included by the linker.
>[Note that __initcall may be a bit tricky here if some other __initcall
>user like an ethernet driver needs crc32 too; there is unfortunately no
>priority mechanism in __initcall to enforce that the crc32 init runs before
>the other initcalls. best would probably just to use a static table to avoid
>this issue]
???! __initcall entries are executed in the order that they are linked
into the kernel. The linkage order is controlled by the order that
Makefiles are processed during kbuild and by line order within each
Makefile. There is definitely a priority order for __initcall code.
Jeff Garzik <[email protected]> said:
> On Fri, 12 Oct 2001, Horst von Brand wrote:
> > Jeff Garzik <[email protected]> said:
> > > We must consider the case where the kernel is built, and then a random
> > > 3rd party module comes along that needs crc32 features. An ar library
> > > won't cut it, and neither will [the existing crc32 method of] patching
> > > linux/lib/crc32.c. That leaves (a) unconditionally building it into the
> > > kernel, or (b) Makefile and Config.in rules.
> > (b) won't work if my kernel has no CRC32 modules, and a random 3rd party
> > module needs it. So it looks like firm builtin is the only real option (a).
> Sure it will. (b) not only will work, but it is the preferred option.
3rd party module, added _after_ the fact?
In any case, either this stuff is small (so adding it into the base kernel
is no big deal if needed somewhere) or it is large (in which case the
overhead of a module is no big deal). I'd vote for using the existing
infrastructure for handling transient code/data if needed, not adding yet
another scheme with refcounting &c just for this particular use.
--
Horst von Brand [email protected]
Casilla 9G, Vin~a del Mar, Chile +56 32 672616
[email protected] wrote on 12.10.01 in <[email protected]>:
> And, just when I thought I understood the crc32 stuff, here's an even better
> explanation/code/etc. With thanks.
Except for one thing:
> for (i = 0; i < i; i++)
I don't think this does what was intended. Should that perhaps be
> for (i = 0; i < 1; i++)
? That would mean one pass through the loop, whereas the original does no
passes through the loop.
MfG Kai
Horst von Brand wrote:
>
> Jeff Garzik <[email protected]> said:
>
> [...]
>
> > I was pondering whether it was ok to unconditionally include the
> > lib/crc32.c code, regardless of need. I am leaning towards "no," which
> > implies Makefile and Config.in rules which must be updated for each
> > driver that uses crc32.
>
> It is easier in that case just to make it another module.
That will waste space due to page granularity.
Jan-Marek Glogowski <[email protected]> said:
> That doesn't solve the problem, if your kernel doesn't use the crc
> functions but an externel one (special driver - eventually binary) module
> needs the code. The external driver may give a message like "crc functions
> needed - please recompile your kernel with crc support". Is this ok for
> the normal user ?
Make it a configure option, forced by drivers that need them and up to the
builder otherwise (default to "yes", I'd suppose).
Going a bit further, there are several cases of "library" stuff in the
kernel (CRC, gzip come to mind). Dunno if it is enough stuff to set up a
specific policy and perhaps CML support (or suggestions on how to use CML2
for this case).
--
Horst von Brand [email protected]
Casilla 9G, Vin~a del Mar, Chile +56 32 672616
> I don't think this does what was intended. Should that perhaps be
>
> > for (i = 0; i < 1; i++)
>
Good catch. [email protected] sent me a cleaned up version.
--
Matt Domsch
Sr. Software Engineer
Dell Linux Solutions
http://www.dell.com/linux
#2 Linux Server provider with 17% in the US and 14% Worldwide (IDC)!
#3 Unix provider with 18% in the US (Dataquest)!
That'll teach me to clean up code by hand and not test it afterwards.
That's supposed to be "i < 8" in that for() loop, but there are a couple
of other glitches, too. In particular, I got bits of the
table-initialization
code transposed when I generalized it for less than 8 bits.
Here's a *tested* version, with test harness (if compiled with -DUNITTEST).
#include <stddef.h> /* For size_t */
typedef unsigned _u32;
#if __GNUC__ >= 3 /* 2.x has "attribute", but only 3.0 has "pure */
#define attribute(x) __attribute__(x)
#else
#define attribute(x)
#endif
/*
* This code is in the public domain; copyright abandoned.
* Liability for non-performance of this code is limited to the amount
* you paid for it. Since it is distributed for free, your refund will
* be very very small. If it breaks, you get to keep both pieces.
*/
/*
* There are multiple 16-bit CRC polynomials in common use, but this is
* *the* standard CRC-32 polynomial, first popularized by Ethernet.
* x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x^1+x^0
*/
#define CRCPOLY_LE 0xedb88320
#define CRCPOLY_BE 0x04c11db7
/* How many bits at a time to use. Requires a table of 4<<CRC_xx_BITS
bytes. */
#define CRC_LE_BITS 8
#define CRC_BE_BITS 4 /* Less performance-sensitive */
/*
* Little-endian CRC computation. Used with serial bit streams sent
* lsbit-first. Be sure to use cpu_to_le32() to append the computed CRC.
*/
#if CRC_LE_BITS > 8 || CRC_LE_BITS < 1 || CRC_LE_BITS & CRC_LE_BITS-1
# error CRC_LE_BITS must be a power of 2 between 1 and 8
#endif
#if CRC_LE_BITS == 1
/*
* In fact, the table-based code will work in this case, but it can be
* simplified by inlining the table in ?: form.
*/
void
crc32init_le(void)
{/* no-op */;}
_u32 attribute((pure))
crc32_le(_u32 crc, unsigned char const *p, size_t len)
{
int i;
while (len--) {
crc ^= *p++;
for (i = 0; i < 8; i++)
crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0);
}
return crc;
}
#else /* Table-based approach */
_u32 crc32table_le[1<<CRC_LE_BITS];
/*
* crc is the crc of the byte i; other entries are filled in based on the
* fact that crctable[i^j] = crctable[i] ^ crctable[j].
*
* Note that the _init functions never write anything but the final correct
* value to each table entry, so they're safe to call repeatedly, even if
* someone else is currently using the table.
*/
void
crc32init_le(void)
{
unsigned i, j;
_u32 crc = 1;
crc32table_le[0] = 0;
for (i = 1<<(CRC_LE_BITS-1); i; i >>= 1) {
crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0);
for (j = 0; j < 1<<CRC_LE_BITS; j += 2*i)
crc32table_le[i+j] = crc ^ crc32table_le[j];
}
}
_u32 attribute((pure))
crc32_le(_u32 crc, unsigned char const *p, size_t len)
{
while (len--) {
# if CRC_LE_BITS == 8
crc = (crc >> 8) ^ crc32table_le[(crc ^ *p++) & 255];
# elif CRC_LE_BITS == 4
crc ^= *p++;
crc = (crc >> 4) ^ crc32table_le[crc & 15];
crc = (crc >> 4) ^ crc32table_le[crc & 15];
# elif CRC_LE_BITS == 2
crc ^= *p++;
crc = (crc >> 2) ^ crc32table_le[crc & 3];
crc = (crc >> 2) ^ crc32table_le[crc & 3];
crc = (crc >> 2) ^ crc32table_le[crc & 3];
crc = (crc >> 2) ^ crc32table_le[crc & 3];
# endif
}
return crc;
}
#endif
/*
* Big-endian CRC computation. Used with serial bit streams sent
* msbit-first. Be sure to use cpu_to_be32() to append the computed CRC.
*/
#if CRC_BE_BITS > 8 || CRC_BE_BITS < 1 || CRC_BE_BITS & CRC_BE_BITS-1
# error CRC_BE_BITS must be a power of 2 between 1 and 8
#endif
#if CRC_BE_BITS == 1
/*
* In fact, the table-based code will work in this case, but it can be
* simplified by inlining the table in ?: form.
*/
void
crc32init_be(void)
{/*no-op*/;}
_u32 attribute((pure))
crc32_be(_u32 crc, unsigned char const *p, size_t len)
{
int i;
while (len--) {
crc ^= *p++ << 24;
for (i = 0; i < 8; i++)
crc = (crc << 1) ^ ((crc & 0x80000000) ? CRCPOLY_BE
: 0);
}
return crc;
}
#else /* Table-based approach */
_u32 crc32table_be[256];
void
crc32init_be(void)
{
unsigned i, j;
_u32 crc = 0x80000000;
crc32table_be[0] = 0;
for (i = 1 ; i < 1<<CRC_BE_BITS; i <<= 1) {
crc = (crc << 1) ^ ((crc & 0x80000000) ? CRCPOLY_BE : 0);
for (j = 0; j < i; j++)
crc32table_be[i+j] = crc ^ crc32table_be[j];
}
}
_u32 attribute((pure))
crc32_be(_u32 crc, unsigned char const *p, size_t len)
{
while (len--) {
# if CRC_BE_BITS == 8
crc = (crc << 8) ^ crc32table_be[(crc >> 24) ^ *p++];
# elif CRC_BE_BITS == 4
crc ^= *p++ << 24;
crc = (crc << 4) ^ crc32table_be[crc >> 28];
crc = (crc << 4) ^ crc32table_be[crc >> 28];
# elif CRC_BE_BITS == 2
crc ^= *p++ << 24;
crc = (crc << 2) ^ crc32table_be[crc >> 30];
crc = (crc << 2) ^ crc32table_be[crc >> 30];
crc = (crc << 2) ^ crc32table_be[crc >> 30];
crc = (crc << 2) ^ crc32table_be[crc >> 30];
# endif
}
return crc;
}
#endif
/*
* A brief CRC tutorial.
*
* A CRC is a long-division remainder. You add the CRC to the message,
* and the whole thing (message+CRC) is a multiple of the given
* CRC polynomial. To check the CRC, you can either check that the
* CRC matches the recomputed value, *or* you can check that the
* remainder computed on the message+CRC is 0. This latter approach
* is used by a lot of hardware implementations, and is why so many
* protocols put the end-of-frame flag after the CRC.
*
* It's actually the same long division you learned in school, except that
* - We're working in binary, so the digits are only 0 and 1, and
* - When dividing polynomials, there are no carries. Rather than add and
* subtract, we just xor. Thus, we tend to get a bit sloppy about
* the difference between adding and subtracting.
*
* A 32-bit CRC polynomial is actually 33 bits long. But since it's
* 33 bits long, bit 32 is always going to be set, so usually the CRC
* is written in hex with the most significant bit omitted. (If you're
* familiar with the IEEE 754 floating-point format, it's the same idea.)
*
* Note that a CRC is computed over a string of *bits*, so you have
* to decide on the endianness of the bits within each byte. To get
* the best error-detecting properties, this should correspond to the
* order they're actually sent. For example, standard RS-232 serial is
* little-endian; the most significant bit (sometimes used for parity)
* is sent last. And when appending a CRC word to a message, you should
* do it in the right order, matching the endianness.
*
* Just like with ordinary division, the remainder is always smaller than
* the divisor (the CRC polynomial) you're dividing by. Each step of the
* division, you take one more digit (bit) of the dividend and append it
* to the current remainder. Then you figure out the appropriate multiple
* of the divisor to subtract to being the remainder back into range.
* In binary, it's easy - it has to be either 0 or 1, and to make the
* XOR cancel, it's just a copy of bit 32 of the remainder.
*
* When computing a CRC, we don't care about the quotient, so we can
* throw the quotient bit away, but subtract the appropriate multiple of
* the polynomial from the remainder and we're back to where we started,
* ready to process the next bit.
*
* A big-endian CRC written this way would be coded like:
* for (i = 0; i < input_bits; i++) {
* multiple = remainder & 0x80000000 ? CRCPOLY : 0;
* remainder = (remainder << 1 | next_input_bit()) ^ multiple;
* }
* Notice how, to get at bit 32 of the shifted remainder, we look
* at bit 31 of the remainder *before* shifting it.
*
* But also notice how the next_input_bit() bits we're shifting into
* the remainder don't actually affect any decision-making until
* 32 bits later. Thus, the first 32 cycles of this are pretty boring.
* Also, to add the CRC to a message, we need a 32-bit-long hole for it at
* the end, so we have to add 32 extra cycles shifting in zeros at the
* end of every message,
*
* So the standard trick is to rearrage merging in the next_input_bit()
* until the moment it's needed. Then the first 32 cycles can be
precomputed,
* and merging in the final 32 zero bits to make room for the CRC can be
* skipped entirely.
* This changes the code to:
* for (i = 0; i < input_bits; i++) {
* remainder ^= next_input_bit() << 31;
* multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
* remainder = (remainder << 1) ^ multiple;
* }
* With this optimization, the little-endian code is simpler:
* for (i = 0; i < input_bits; i++) {
* remainder ^= next_input_bit();
* multiple = (remainder & 1) ? CRCPOLY : 0;
* remainder = (remainder >> 1) ^ multiple;
* }
*
* Note that the other details of endianness have been hidden in CRCPOLY
* (which must be bit-reversed) and next_input_bit().
*
* However, as long as next_input_bit is returning the bits in a sensible
* order, we can actually do the merging 8 or more bits at a time rather
* than one bit at a time:
* for (i = 0; i < input_bytes; i++) {
* remainder ^= next_input_byte() << 24;
* for (j = 0; j < 8; j++) {
* multiple = (remainder & 0x80000000) ? CRCPOLY : 0;
* remainder = (remainder << 1) ^ multiple;
* }
* }
* Or in little-endian:
* for (i = 0; i < input_bytes; i++) {
* remainder ^= next_input_byte();
* for (j = 0; j < 8; j++) {
* multiple = (remainder & 1) ? CRCPOLY : 0;
* remainder = (remainder << 1) ^ multiple;
* }
* }
* If the input is a multiple of 32 bits, you can even XOR in a 32-bit
* word at a time and increase the inner loop count to 32.
*
* You can also mix and match the two loop styles, for example doing the
* bulk of a message byte-at-a-time and adding bit-at-a-time processing
* for any fractional bytes at the end.
*
* The only remaining optimization is to the byte-at-a-time table method.
* Here, rather than just shifting one bit of the remainder to decide
* in the correct multiple to subtract, we can shift a byte at a time.
* This produces a 40-bit (rather than a 33-bit) intermediate remainder,
* but again the multiple of the polynomial to subtract depends only on
* the high bits, the high 8 bits in this case.
*
* The multile we need in that case is the low 32 bits of a 40-bit
* value whose high 8 bits are given, and which is a multiple of the
* generator polynomial. This is simply the CRC-32 of the given
* one-byte message.
*
* Two more details: normally, appending zero bits to a message which
* is already a multiple of a polynomial produces a larger multiple of that
* polynomial. To enable a CRC to detect this condition, it's common to
* invert the CRC before appending it. This makes the remainder of the
* message+crc come out not as zero, but some fixed non-zero value.
*
* The same problem applies to zero bits prepended to the message, and
* a similar solution is used. Instead of starting with a remainder of
* 0, an initial remainder of all ones is used. As long as you start
* the same way on decoding, it doesn't make a difference.
*/
#if UNITTEST
#include <stdlib.h>
#include <stdio.h>
#if 0 /*Not used at present */
static void
buf_dump(char const *prefix, unsigned char const *buf, size_t len)
{
fputs(prefix, stdout);
while (len--)
printf(" %02x", *buf++);
putchar('\n');
}
#endif
static _u32 attribute((const))
bitreverse(_u32 x)
{
x = (x >> 16) | (x << 16);
x = (x >> 8 & 0x00ff00ff) | (x << 8 & 0xff00ff00);
x = (x >> 4 & 0x0f0f0f0f) | (x << 4 & 0xf0f0f0f0);
x = (x >> 2 & 0x33333333) | (x << 2 & 0xcccccccc);
x = (x >> 1 & 0x55555555) | (x << 1 & 0xaaaaaaaa);
return x;
}
static void
bytereverse(unsigned char *buf, size_t len)
{
while (len--) {
unsigned char x = *buf;
x = (x >> 4) | (x << 4);
x = (x >> 2 & 0x33) | (x << 2 & 0xcc);
x = (x >> 1 & 0x55) | (x << 1 & 0xaa);
*buf++ = x;
}
}
static void
random_garbage(unsigned char *buf, size_t len)
{
while (len--)
*buf++ = (unsigned char)random();
}
#if 0 /* Not used at present */
static void
store_le(_u32 x, unsigned char *buf)
{
buf[0] = (unsigned char)x;
buf[1] = (unsigned char)(x >> 8);
buf[2] = (unsigned char)(x >> 16);
buf[3] = (unsigned char)(x >> 24);
}
#endif
static void
store_be(_u32 x, unsigned char *buf)
{
buf[0] = (unsigned char)(x >> 24);
buf[1] = (unsigned char)(x >> 16);
buf[2] = (unsigned char)(x >> 8);
buf[3] = (unsigned char)x;
}
/*
* This checks that CRC(buf + CRC(buf)) = 0, and that
* CRC commutes with bit-reversal. This has the side effect
* of bytewise bit-reversing the input buffer, and returns
* the CRC of the reversed buffer.
*/
static _u32
test_step(_u32 init, unsigned char *buf, size_t len)
{
_u32 crc1, crc2;
size_t i;
crc1 = crc32_be(init, buf, len);
store_be(crc1, buf+len);
crc2 = crc32_be(init, buf, len+4);
if (crc2)
printf("\nCRC cancellation fail: 0x%08x should be 0\n",
crc2);
for (i = 0; i <= len+4; i++) {
crc2 = crc32_be(init, buf, i);
crc2 = crc32_be(crc2, buf+i, len+4-i);
if (crc2)
printf("\nCRC split fail: 0x%08x\n", crc2);
}
/* Now swap it around for the other test */
bytereverse(buf, len+4);
init = bitreverse(init);
crc2 = bitreverse(crc1);
if (crc1 != bitreverse(crc2))
printf("\nBit reversal fail: 0x%08x -> %0x08x -> 0x%08x\n",
crc1, crc2, bitreverse(crc2));
crc1 = crc32_le(init, buf, len);
if (crc1 != crc2)
printf("\nCRC endianness fail: 0x%08x != 0x%08x\n", crc1,
crc2);
crc2 = crc32_le(init, buf, len+4);
if (crc2)
printf("\nCRC cancellation fail: 0x%08x should be 0\n",
crc2);
for (i = 0; i <= len+4; i++) {
crc2 = crc32_le(init, buf, i);
crc2 = crc32_le(crc2, buf+i, len+4-i);
if (crc2)
printf("\nCRC split fail: 0x%08x\n", crc2);
}
return crc1;
}
#define SIZE 64
#define INIT1 0
#define INIT2 0
int
main(void)
{
unsigned char buf1[SIZE+4];
unsigned char buf2[SIZE+4];
unsigned char buf3[SIZE+4];
int i, j;
_u32 crc1, crc2, crc3;
crc32init_le();
crc32init_be();
for (i = 0; i <= SIZE; i++) {
printf("\rTesting length %d...", i);
fflush(stdout);
random_garbage(buf1, i);
random_garbage(buf2, i);
for (j = 0; j < i; j++)
buf3[j] = buf1[j] ^ buf2[j];
crc1 = test_step(INIT1, buf1, i);
crc2 = test_step(INIT2, buf2, i);
/* Now check that CRC(buf1 ^ buf2) = CRC(buf1) ^ CRC(buf2)
*/
crc3 = test_step(INIT1^INIT2, buf3, i);
if (crc3 != (crc1 ^ crc2))
printf("CRC XOR fail: 0x%08x != 0x%08x ^ 0x%08x\n",
crc3, crc1, crc2);
}
printf("\nAll test complete. No failures expected.\n");
return 0;
}
#endif /* UNITTEST */
On Sat, 13 Oct 2001, Keith Owens wrote:
> Does not work if all the code that uses crc32 is in a module. No
> references from the main kernel so crc32 is not included by the linker.
So make the CRC32 code a module itself ?
> ???! __initcall entries are executed in the order that they are linked
> into the kernel. The linkage order is controlled by the order that
> Makefiles are processed during kbuild and by line order within each
> Makefile. There is definitely a priority order for __initcall code.
That is in practice an unuseable "priority" (I'd like to consider that a
highly stochastic variable :)
Not to mention that as an individual sub-project maintainer you can't go
around changing higher level makefiles all the time just to get your
particular initcall chain in order (again, in practice).
You could _conceivably_ build an initcall dependency system by adding some
"initcall_requires" macros which put the dependant other calls into
another linker table, which the kernel would resolve at boot.
/BW
> > Does not work if all the code that uses crc32 is in a module. No
> > references from the main kernel so crc32 is not included by the linker.
>
> So make the CRC32 code a module itself ?
Nasty idea, since its ideologically unclean.
The idea behind the CONFIG_xxx options is to give user the control over
the kernel content in the cases he really can win from it.
I suppose no sane user knows what crc32 is needed by, nor even whats it..
Nor he should, since maximum of his knowledge lies in device selection,
which is the proper source for such information.
What i`m trying to say is that CONFIG_xxx is just the wrong information
bearer for that.
Cheers, Samium Gromoff
On Mon, 15 Oct 2001 10:29:39 +0200 (MET DST),
Bjorn Wesen <[email protected]> wrote:
>On Sat, 13 Oct 2001, Keith Owens wrote:
>> ???! __initcall entries are executed in the order that they are linked
>> into the kernel. The linkage order is controlled by the order that
>> Makefiles are processed during kbuild and by line order within each
>> Makefile. There is definitely a priority order for __initcall code.
>
>Not to mention that as an individual sub-project maintainer you can't go
>around changing higher level makefiles all the time just to get your
>particular initcall chain in order (again, in practice).
>
>You could _conceivably_ build an initcall dependency system by adding some
>"initcall_requires" macros which put the dependant other calls into
>another linker table, which the kernel would resolve at boot.
Absolutely agree. I would love to separate the initcall order from
directory and Makefile order, using a clean and well documented method
of describing initialisation order. But there is one massive problem
with changing the existing method, .... Linus likes it this way.
Maybe after kbuild 2.5 is in. I have learnt to fight one battle at a
time.
Changes since last time:
* A lot more drivers were updated. These had in-lined the CRC32 code
(either BE or LE). I think I got them all except natsemi, which I was
afraid I'd screw up.
* Incorporated the new crc32 library routines. Both BE and LE now use a
table. LE is faster, BE uses less space. It's trivial to make this
tradeoff now.
> * Still need init_crc32:
> 8139too, au1000_eth, fealnx, smc91c92_cs, xircom_tulip_cb, smc9194,
> via-rhine, winbond-840
Now that the BE functions use a table, yes.
> * lib/uuid.o needs to be in lib/Makefile's export-objs
Oops, good catch. I meant to submit uuid as a separate patch, but I'll
include it in the crc32-lib + efivars patches (it's small).
> * in init/cleanup_crc32, don't hold spinlock during kmalloc/kfree.
> Check out other refcounting schemes in the various fs/*.c files.
OK, hope this is better.
> * Add a Config.in entry, so that people can manually select to compile
> crc32.o as a module. This takes care of the 3rd party module case,
> the module that might for example have been built some time
> after the kernel itself was built.
OK. Turns out this then needs to change arch/$arch/config.in to add
lib/Config.in to the menus too. :-(
Here's today's pass at all this.
http://domsch.com/linux/patches/linux-2.4.13-pre3-crc32-lib-20011016.patch
http://domsch.com/linux/patches/linux-2.4.13-pre3-crc32-drivers-20011016.pat
ch
http://domsch.com/linux/patches/linux-2.4.13-pre3-gpt-20011016.patch
http://domsch.com/linux/patches/linux-2.4.13-pre3-efivars-20011016.patch
This is working on my x86 system with CONFIG_EFI_PARTITION enabled, so I
feel pretty good about at least the LE crc32 function. All the drivers that
could be built on x86 were, but I don't have hardware to test any of them,
nor any non-x86 platforms.
I'm not sure I got the init/cleanup right in all the modules. I tried to
call init_crc32() once only if/when we know the driver's init routine
actually found one or more cards. For the hassle of getting this right, I'm
tempted to make crc32.c have module_init/cleanup() that allocates the memory
unconditionally... It's only 2*1KB at most.
Feedback appreciated.
Thanks,
Matt
--
Matt Domsch
Sr. Software Engineer
Dell Linux Solutions
http://www.dell.com/linux
#2 Linux Server provider with 17% in the US and 14% Worldwide (IDC)!
#3 Unix provider with 18% in the US (Dataquest)!
Yet another try...
Changes since last time:
* Do away with modules calling init/cleanup_crc32(). This now happens using
module_init(). This avoids refcounting (though I finally got it right with
assistance) and unnecessary pain associated with knowing exactly when in
each module to call init/cleanup_crc32, how many times, etc.
* Each driver that uses crc32 functions now calls request_module("crc32") in
their init or probe function. This way if crc32.o is a module it gets
loaded and initialized prior to use. If it's static, no harm.
I'm concerned by Keith's comment:
> __initcall entries are executed in the order that they are linked
> into the kernel. The linkage order is controlled by the order that
> Makefiles are processed during kbuild and by line order within each
> Makefile. There is definitely a priority order for __initcall code.
Top-level Makefile has:
SUBDIRS = kernel drivers mm fs net ipc lib
And indeed, drivers and fs get built before lib. Therefore, drivers that
use any of the crc32 functions as part of their probe or module_init will be
broken. So, I've moved lib before drivers, both there and in the link
(where it really matters). Now library __init functions are called after
the file system drivers (which I think is OK, as no file systems, even
initrds, are used before the crc32 functions), and before static drivers are
loaded, which is critical. Any objections?
Patches at:
http://domsch.com/linux/patches/linux-2.4.13-pre6-crc32-lib-20011023.patch
http://domsch.com/linux/patches/linux-2.4.13-pre6-crc32-drivers-20011023.pat
ch
http://domsch.com/linux/patches/linux-2.4.13-pre6-gpt-20011023.patch
http://domsch.com/linux/patches/linux-2.4.13-pre6-efivars-20011023.patch
Feedback welcome. I'm hoping to wrap this up soon... :-)
--
Matt Domsch
Sr. Software Engineer
Dell Linux Solutions
http://www.dell.com/linux
#2 Linux Server provider with 17% in the US and 14% Worldwide (IDC)!
#3 Unix provider with 18% in the US (Dataquest)!