2014-02-24 21:15:51

Subject: [PATCH v5 0/3] New RAID library supporting up to six parities

Hi,

Here is a new version of the new RAID library, finally with *working* btrfs support!

It includes patches for both the kernel and btrfs-progs to add the new parity
modes "par3", "par4", "par5" and "par6", which work similarly to the existing
"raid5" and "raid6" ones.

The patches apply cleanly to kernel v3.14-rc3 and btrfs-progs v3.12.

If you are willing to test it, you can do something like this:

mkfs.btrfs -d par3 -m par3 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
mount /dev/sdb1 /mnt/tmp
...copy something to /mnt/tmp...
md5deep -r /mnt/tmp > test_before.hash
umount /mnt/tmp
dd if=/dev/urandom of=/dev/sdc1 count=...
dd if=/dev/urandom of=/dev/sdd1 count=...
dd if=/dev/urandom of=/dev/sde1 count=...
mount -o degraded /dev/sdb1 /mnt/tmp
md5deep -r /mnt/tmp > test_after.hash
umount /mnt/tmp
diff -u test_before.hash test_after.hash && echo OK

I ran various tests like this, and everything seems to work.

The first patch is the new RAID library for the kernel, supporting up to six
parities. It's verified with automated tests that reach 99.3% code coverage.
It also passes clang and valgrind tests with no errors.

It applies cleanly to kernel v3.14-rc3, but it should work with any other
version because it consists only of new files. The only kernel change
is the new CONFIG_RAID_CAUCHY option in the "lib" configuration section.

For reviewing, I recommend starting from include/linux/raid/raid.h, which
describes the new generic raid interface. Then continue in lib/raid/raid.c,
where the interface is implemented. You can start by reading the documentation
about the RAID mathematics used, keeping in mind that its correctness is proven
both mathematically and by brute force by the test programs.
You can then review raid_gen() and raid_rec(), which are high-level forwarders
to the generic and optimized asm functions that generate parity
and recover data. Their internal structure is very similar to that of the
functions in RAID6. The main difference is that they use a generic matrix of
parity coefficients.
All these functions are verified by the test programs, with full line and
branch coverage, meaning that you can concentrate the review on their
structure rather than on the computation and asm details.
Finally, you can review the test programs in lib/raid/test to ensure that
everything is really tested; the coverage test can help you with that.

The second patch contains the kernel btrfs modifications. Besides adding the
new parity modes it also removes a lot of code about raid details that are
now handled by the new raid library.

It applies cleanly to kernel v3.14-rc3. You can also use it with previous
kernels, with an obvious adjustment in fs/btrfs/ctree.h.

For reviewing, you can start from the diff and check it chunk after chunk.
The two most complex changes are likely where the new raid_gen() and raid_rec()
are called, replacing big chunks of code. The rest is mostly straightforward,
as I just extended all the checks about RAID5 and RAID6 to six parities.
It certainly needs a more careful review, though, as my knowledge of btrfs
internals is very limited.

The third patch contains the btrfs-progs modifications. They just match
the kernel changes, and the same considerations apply.

It applies cleanly to btrfs-progs v3.12.

Please let me know what you think, and whether it can be considered for
inclusion or something more is required.

If some patch is missing due to the mailing list size limit, you can download them at:

http://snapraid.sourceforge.net/linux/v5/

You can see the code coverage analysis made by lcov at:

http://snapraid.sourceforge.net/linux/v5/coverage/

Changes from v4 to v5:
- Adds more comments in the libraid patch.
- Reviews and completes the btrfs patch. The previous patch was not
really working due to some missing pieces.
- Adds a new patch for btrfs-progs to extend the mkfs.btrfs
functionality to create filesystems with up to six parity levels.
- Removes the async_tx patch as not yet ready for inclusion.

Changes from v3 to v4:
- Adds a code coverage test
- Adds a matrix inversion test.
- Everything updated to kernel 3.13.

Changes from v2 to v3:
- Adds a new patch to change async_tx to use the new raid library
for synchronous cases and to export a similar interface.
Also modified md/raid5.c to use the new interface of async_tx.
This is just example code not meant for inclusion!
- Renamed raid_par() to raid_gen() to match better existing naming.
- Removed raid_sort() and replaced with raid_insert() that allows
to build a vector already in order instead of sorting it later.
This function is declared in the new raid/helper.h.
- Better documentation in the raid.h/c files. Start from raid.h
to see the documentation of the new interface.

Changes from v1 to v2:
- Adds a patch to btrfs to extend its support to more than double parity.
This is just example code not meant for inclusion!
- Changes the main raid_rec() interface to merge the failed data
and parity index vectors. This matches better the kernel usage.
- Uses alloc_pages_exact() instead of __get_free_pages().
- Removes unnecessary register loads from par1_sse().
- Converts the asm_begin/end() macros to inlined functions.
- Fixes some more checkpatch.pl warnings.
- Other minor style/comment changes.

Andrea Mazzoleni (2):
lib: raid: New RAID library supporting up to six parities
fs: btrfs: Adds new par3456 modes to support up to six parities

fs/btrfs/Kconfig | 1 +
fs/btrfs/ctree.h | 50 +-
fs/btrfs/disk-io.c | 7 +-
fs/btrfs/extent-tree.c | 67 +-
fs/btrfs/inode.c | 3 +-
fs/btrfs/raid56.c | 273 +++-----
fs/btrfs/raid56.h | 19 +-
fs/btrfs/scrub.c | 3 +-
fs/btrfs/volumes.c | 144 ++--
include/linux/raid/helper.h | 32 +
include/linux/raid/raid.h | 87 +++
include/trace/events/btrfs.h | 16 +-
include/uapi/linux/btrfs.h | 19 +-
lib/Kconfig | 17 +
lib/Makefile | 1 +
lib/raid/.gitignore | 3 +
lib/raid/Makefile | 14 +
lib/raid/cpu.h | 44 ++
lib/raid/gf.h | 109 +++
lib/raid/helper.c | 38 +
lib/raid/int.c | 567 +++++++++++++++
lib/raid/internal.h | 148 ++++
lib/raid/mktables.c | 383 +++++++++++
lib/raid/module.c | 458 ++++++++++++
lib/raid/raid.c | 492 +++++++++++++
lib/raid/test/Makefile | 72 ++
lib/raid/test/combo.h | 155 +++++
lib/raid/test/fulltest.c | 79 +++
lib/raid/test/invtest.c | 172 +++++
lib/raid/test/memory.c | 79 +++
lib/raid/test/memory.h | 78 +++
lib/raid/test/selftest.c | 44 ++
lib/raid/test/speedtest.c | 578 ++++++++++++++++
lib/raid/test/test.c | 314 +++++++++
lib/raid/test/test.h | 59 ++
lib/raid/test/usermode.h | 95 +++
lib/raid/test/xor.c | 41 ++
lib/raid/x86.c | 1565 ++++++++++++++++++++++++++++++++++++++++++
38 files changed, 6037 insertions(+), 289 deletions(-)
create mode 100644 include/linux/raid/helper.h
create mode 100644 include/linux/raid/raid.h
create mode 100644 lib/raid/.gitignore
create mode 100644 lib/raid/Makefile
create mode 100644 lib/raid/cpu.h
create mode 100644 lib/raid/gf.h
create mode 100644 lib/raid/helper.c
create mode 100644 lib/raid/int.c
create mode 100644 lib/raid/internal.h
create mode 100644 lib/raid/mktables.c
create mode 100644 lib/raid/module.c
create mode 100644 lib/raid/raid.c
create mode 100644 lib/raid/test/Makefile
create mode 100644 lib/raid/test/combo.h
create mode 100644 lib/raid/test/fulltest.c
create mode 100644 lib/raid/test/invtest.c
create mode 100644 lib/raid/test/memory.c
create mode 100644 lib/raid/test/memory.h
create mode 100644 lib/raid/test/selftest.c
create mode 100644 lib/raid/test/speedtest.c
create mode 100644 lib/raid/test/test.c
create mode 100644 lib/raid/test/test.h
create mode 100644 lib/raid/test/usermode.h
create mode 100644 lib/raid/test/xor.c
create mode 100644 lib/raid/x86.c

--
1.7.12.1

2014-02-24 21:16:12

by Andrea Mazzoleni

Subject: [PATCH v5 1/3] lib: raid: New RAID library supporting up to six parities

This patch adds a new lib/raid directory, containing new RAID support
based on a Cauchy matrix that works for up to six parities and is backward
compatible with the existing RAID6 support.

The interface is defined in include/linux/raid/raid.h and provides two new
functions, raid_gen() and raid_rec(), that handle parity generation and data
recovery for up to six levels of redundancy, replacing the previous
RAID6 interface.

The library provides fast implementations using SSE2 and SSSE3 for x86/x64
and a portable C implementation that works everywhere.
If the RAID6 library is enabled in the kernel, its functionality is also used
to maintain the existing level of performance for the first two parities on
architectures other than x86.

At startup the module runs a very fast self test (about 1ms) to ensure that
the functions in use are correct.
You can also enable a speed test similar to the one used by raid6, using the
"speedtest=1" argument when loading the module.

The lib/raid/test directory also contains some user-mode test programs:
selftest - Runs the same selftest and speedtest executed at the module startup.
fulltest - Runs a more extensive test that checks all the built-in functions.
speedtest - Runs a more complete speed test.
invtest - Runs an extensive matrix inversion test of all the 377.342.351.231
possible square submatrices of the Cauchy matrix used.
covtest - Runs a coverage test using lcov.

As a reference, on my Core i7 2.7GHz the speedtest program reports:

...
Speed test using 16 data buffers of 4096 bytes, for a total of 64 KiB.
Memory blocks have a displacement of 64 bytes to improve cache performance.
The reported value is the aggregate bandwidth of all data blocks in MiB/s,
not counting parity blocks.

Memory write speed using the C memset() function:
memset 33518

RAID functions used for computing the parity:
        int8   int32   int64    sse2   sse2e   ssse3  ssse3e
gen1           11762   21450   44621
gen2            3520    6176   18100   20338
gen3     848                                    8009    9210
gen4     659                                    6518    7303
gen5     531                                    4931    5363
gen6     430                                    4069    4471

RAID functions used for recovering:
        int8   ssse3
rec1     591    1126
rec2     272     456
rec3      80     305
rec4      49     216
rec5      34     151
...

Legend:
genX functions to generate X parities
recX functions to recover X data blocks
int8 implementation based on 8-bit arithmetic
int32 implementation based on 32-bit arithmetic
int64 implementation based on 64-bit arithmetic
sse2 implementation based on SSE2
sse2e implementation based on SSE2 with 16 registers (x64)
ssse3 implementation based on SSSE3
ssse3e implementation based on SSSE3 with 16 registers (x64)

Signed-off-by: Andrea Mazzoleni <[email protected]>
---
include/linux/raid/helper.h | 32 +
include/linux/raid/raid.h | 87 +++
lib/Kconfig | 17 +
lib/Makefile | 1 +
lib/raid/.gitignore | 3 +
lib/raid/Makefile | 14 +
lib/raid/cpu.h | 44 ++
lib/raid/gf.h | 109 +++
lib/raid/helper.c | 38 ++
lib/raid/int.c | 567 ++++++++++++++++
lib/raid/internal.h | 148 ++++
lib/raid/mktables.c | 383 +++++++++++
lib/raid/module.c | 458 +++++++++++++
lib/raid/raid.c | 492 ++++++++++++++
lib/raid/test/Makefile | 72 ++
lib/raid/test/combo.h | 155 +++++
lib/raid/test/fulltest.c | 79 +++
lib/raid/test/invtest.c | 172 +++++
lib/raid/test/memory.c | 79 +++
lib/raid/test/memory.h | 78 +++
lib/raid/test/selftest.c | 44 ++
lib/raid/test/speedtest.c | 578 ++++++++++++++++
lib/raid/test/test.c | 314 +++++++++
lib/raid/test/test.h | 59 ++
lib/raid/test/usermode.h | 95 +++
lib/raid/test/xor.c | 41 ++
lib/raid/x86.c | 1565 +++++++++++++++++++++++++++++++++++++++++++
27 files changed, 5724 insertions(+)
create mode 100644 include/linux/raid/helper.h
create mode 100644 include/linux/raid/raid.h
create mode 100644 lib/raid/.gitignore
create mode 100644 lib/raid/Makefile
create mode 100644 lib/raid/cpu.h
create mode 100644 lib/raid/gf.h
create mode 100644 lib/raid/helper.c
create mode 100644 lib/raid/int.c
create mode 100644 lib/raid/internal.h
create mode 100644 lib/raid/mktables.c
create mode 100644 lib/raid/module.c
create mode 100644 lib/raid/raid.c
create mode 100644 lib/raid/test/Makefile
create mode 100644 lib/raid/test/combo.h
create mode 100644 lib/raid/test/fulltest.c
create mode 100644 lib/raid/test/invtest.c
create mode 100644 lib/raid/test/memory.c
create mode 100644 lib/raid/test/memory.h
create mode 100644 lib/raid/test/selftest.c
create mode 100644 lib/raid/test/speedtest.c
create mode 100644 lib/raid/test/test.c
create mode 100644 lib/raid/test/test.h
create mode 100644 lib/raid/test/usermode.h
create mode 100644 lib/raid/test/xor.c
create mode 100644 lib/raid/x86.c

diff --git a/include/linux/raid/helper.h b/include/linux/raid/helper.h
new file mode 100644
index 0000000..4787df9
--- /dev/null
+++ b/include/linux/raid/helper.h
@@ -0,0 +1,32 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __RAID_HELPER_H
+#define __RAID_HELPER_H
+
+/**
+ * Inserts an integer in a sorted vector.
+ *
+ * This function can be used to insert indexes in order, ready to be used for
+ * calling raid_rec().
+ *
+ * @n Number of integers currently in the vector.
+ * @v Vector of integers already sorted.
+ * It must have extra space for the new element at the end.
+ * @i Value to insert.
+ */
+void raid_insert(int n, int *v, int i);
+
+#endif
+
diff --git a/include/linux/raid/raid.h b/include/linux/raid/raid.h
new file mode 100644
index 0000000..ef61846
--- /dev/null
+++ b/include/linux/raid/raid.h
@@ -0,0 +1,87 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __RAID_H
+#define __RAID_H
+
+#ifdef __KERNEL__ /* to build the user mode test */
+#include <linux/types.h> /* for size_t */
+#endif
+
+/**
+ * Maximum number of parity disks supported.
+ */
+#define RAID_PARITY_MAX 6
+
+/**
+ * Maximum number of data disks supported.
+ */
+#define RAID_DATA_MAX 251
+
+/**
+ * Computes the parity blocks.
+ *
+ * This function computes the specified number of parity blocks of the
+ * provided set of data blocks.
+ *
+ * Each parity block allows recovering one data block.
+ *
+ * @nd Number of data blocks.
+ * @np Number of parity blocks to compute.
+ * @size Size of the blocks pointed by @v. It must be a multiple of 64.
+ * @v Vector of pointers to the blocks of data and parity.
+ * It has (@nd + @np) elements. The starting elements are the blocks for
+ * data, following with the parity blocks.
+ * Data blocks are only read and not modified. Parity blocks are written.
+ * Each block has @size bytes.
+ */
+void raid_gen(int nd, int np, size_t size, void **v);
+
+/**
+ * Recovers failures in data and parity blocks.
+ *
+ * This function recovers all the data and parity blocks marked as bad
+ * in the @ir vector.
+ *
+ * Ensure to have @nr <= @np, otherwise recovering is not possible.
+ *
+ * The parity blocks used for recovering are automatically selected from
+ * the ones NOT present in the @ir vector.
+ *
+ * In case there are more parity blocks than needed to recover, the parities
+ * at lower indexes are used in the recovering, and the others are ignored.
+ *
+ * Note that no internal integrity check is done when recovering. If the
+ * provided parities are correct the resulting data will be also correct.
+ * If parities are wrong, also the resulting recovered data will be wrong.
+ * This happens even in the case you have more parity blocks than needed,
+ * and some form of integrity verification is possible.
+ *
+ * @nr Number of failed data and parity blocks to recover.
+ * @ir[] Vector of @nr indexes of the data and parity blocks to recover.
+ * The indexes start from 0. They must be in order.
+ * The first parity is represented with value @nd, the second with value
+ * @nd + 1, just like positions in the @v vector.
+ * @nd Number of data blocks.
+ * @np Number of parity blocks.
+ * @size Size of the blocks pointed by @v. It must be a multiple of 64.
+ * @v Vector of pointers to the blocks of data and parity.
+ * It has (@nd + @np) elements. The starting elements are the blocks
+ * for data, following with the parity blocks.
+ * Each block has @size bytes.
+ */
+void raid_rec(int nr, int *ir, int nd, int np, size_t size, void **v);
+
+#endif
+
diff --git a/lib/Kconfig b/lib/Kconfig
index 991c98b..9865862 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -10,6 +10,23 @@ menu "Library routines"
config RAID6_PQ
tristate

+config RAID_CAUCHY
+ tristate "RAID Cauchy functions"
+ help
+ This option enables the RAID parity library based on a Cauchy matrix
+ that supports up to six parities, and it's compatible with the
+ existing RAID6 support.
+
+ This library provides optimized functions for triple parity and
+ beyond for architectures with SSSE3 support.
+
+ The new interface is defined in the linux/raid/raid.h file.
+ If the RAID6 module is enabled, it's used to maintain the same
+ performance level for RAID5 and RAID6 in all the architectures
+ when using the new interface.
+
+ Module will be called raid_cauchy.
+
config BITREVERSE
tristate

diff --git a/lib/Makefile b/lib/Makefile
index 48140e3..28135a4 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -82,6 +82,7 @@ obj-$(CONFIG_LZ4HC_COMPRESS) += lz4/
obj-$(CONFIG_LZ4_DECOMPRESS) += lz4/
obj-$(CONFIG_XZ_DEC) += xz/
obj-$(CONFIG_RAID6_PQ) += raid6/
+obj-$(CONFIG_RAID_CAUCHY) += raid/

lib-$(CONFIG_DECOMPRESS_GZIP) += decompress_inflate.o
lib-$(CONFIG_DECOMPRESS_BZIP2) += decompress_bunzip2.o
diff --git a/lib/raid/.gitignore b/lib/raid/.gitignore
new file mode 100644
index 0000000..aef693b
--- /dev/null
+++ b/lib/raid/.gitignore
@@ -0,0 +1,3 @@
+mktables
+tables.c
+
diff --git a/lib/raid/Makefile b/lib/raid/Makefile
new file mode 100644
index 0000000..9eedf4a
--- /dev/null
+++ b/lib/raid/Makefile
@@ -0,0 +1,14 @@
+obj-$(CONFIG_RAID_CAUCHY) += raid_cauchy.o
+
+raid_cauchy-y += module.o raid.o tables.o int.o helper.o
+
+raid_cauchy-$(CONFIG_X86) += x86.o
+
+hostprogs-y += mktables
+
+quiet_cmd_mktable = TABLE $@
+ cmd_mktable = $(obj)/mktables > $@ || ( rm -f $@ && exit 1 )
+
+targets += tables.c
+$(obj)/tables.c: $(obj)/mktables FORCE
+ $(call if_changed,mktable)
diff --git a/lib/raid/cpu.h b/lib/raid/cpu.h
new file mode 100644
index 0000000..4295aa7
--- /dev/null
+++ b/lib/raid/cpu.h
@@ -0,0 +1,44 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __RAID_CPU_H
+#define __RAID_CPU_H
+
+#ifdef CONFIG_X86
+static inline int raid_cpu_has_sse2(void)
+{
+ return boot_cpu_has(X86_FEATURE_XMM2);
+}
+
+static inline int raid_cpu_has_ssse3(void)
+{
+ /* checks also for SSE2 */
+ /* likely it's implicit, but just to be sure */
+ return boot_cpu_has(X86_FEATURE_XMM2)
+ && boot_cpu_has(X86_FEATURE_SSSE3);
+}
+
+static inline int raid_cpu_has_avx2(void)
+{
+ /* checks also for SSE2 and SSSE3 */
+ /* likely it's implicit, but just to be sure */
+ return boot_cpu_has(X86_FEATURE_XMM2)
+ && boot_cpu_has(X86_FEATURE_SSSE3)
+ && boot_cpu_has(X86_FEATURE_AVX)
+ && boot_cpu_has(X86_FEATURE_AVX2);
+}
+#endif
+
+#endif
+
diff --git a/lib/raid/gf.h b/lib/raid/gf.h
new file mode 100644
index 0000000..f444e63
--- /dev/null
+++ b/lib/raid/gf.h
@@ -0,0 +1,109 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __RAID_GF_H
+#define __RAID_GF_H
+
+/*
+ * Galois field operations.
+ *
+ * Basic range checks are implemented using BUG_ON().
+ */
+
+/*
+ * GF a*b.
+ */
+static __always_inline uint8_t mul(uint8_t a, uint8_t b)
+{
+ return gfmul[a][b];
+}
+
+/*
+ * GF 1/a.
+ * Not defined for a == 0.
+ */
+static __always_inline uint8_t inv(uint8_t v)
+{
+ BUG_ON(v == 0); /* division by zero */
+
+ return gfinv[v];
+}
+
+/*
+ * GF 2^a.
+ */
+static __always_inline uint8_t pow2(int v)
+{
+ BUG_ON(v < 0 || v > 254); /* invalid exponent */
+
+ return gfexp[v];
+}
+
+/*
+ * Gets the multiplication table for a specified value.
+ */
+static __always_inline const uint8_t *table(uint8_t v)
+{
+ return gfmul[v];
+}
+
+/*
+ * Gets the generator matrix coefficient for parity 'p' and disk 'd'.
+ */
+static __always_inline uint8_t A(int p, int d)
+{
+ return gfgen[p][d];
+}
+
+/*
+ * Dereference as uint8_t
+ */
+#define v_8(p) (*(uint8_t *)&(p))
+
+/*
+ * Dereference as uint32_t
+ */
+#define v_32(p) (*(uint32_t *)&(p))
+
+/*
+ * Dereference as uint64_t
+ */
+#define v_64(p) (*(uint64_t *)&(p))
+
+/*
+ * Multiply each byte of a uint32 by 2 in the GF(2^8).
+ */
+static __always_inline uint32_t x2_32(uint32_t v)
+{
+ uint32_t mask = v & 0x80808080U;
+ mask = (mask << 1) - (mask >> 7);
+ v = (v << 1) & 0xfefefefeU;
+ v ^= mask & 0x1d1d1d1dU;
+ return v;
+}
+
+/*
+ * Multiply each byte of a uint64 by 2 in the GF(2^8).
+ */
+static __always_inline uint64_t x2_64(uint64_t v)
+{
+ uint64_t mask = v & 0x8080808080808080ULL;
+ mask = (mask << 1) - (mask >> 7);
+ v = (v << 1) & 0xfefefefefefefefeULL;
+ v ^= mask & 0x1d1d1d1d1d1d1d1dULL;
+ return v;
+}
+
+#endif
+
diff --git a/lib/raid/helper.c b/lib/raid/helper.c
new file mode 100644
index 0000000..03f7ecc
--- /dev/null
+++ b/lib/raid/helper.c
@@ -0,0 +1,38 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include "internal.h"
+
+void raid_insert(int n, int *v, int i)
+{
+ /* we don't use binary search because this is intended */
+ /* for very small vectors and we want to optimize the case */
+ /* of elements inserted already in order */
+
+ /* insert at the end */
+ v[n] = i;
+
+ /* swap until in the correct position */
+ while (n > 0 && v[n-1] > v[n]) {
+ /* swap */
+ int t = v[n-1];
+ v[n-1] = v[n];
+ v[n] = t;
+
+ /* previous position */
+ --n;
+ }
+}
+EXPORT_SYMBOL_GPL(raid_insert);
+
diff --git a/lib/raid/int.c b/lib/raid/int.c
new file mode 100644
index 0000000..bd03b52
--- /dev/null
+++ b/lib/raid/int.c
@@ -0,0 +1,567 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include "internal.h"
+#include "gf.h"
+
+/*
+ * GEN1 (RAID5 with xor) 32bit C implementation
+ */
+void raid_gen1_int32(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ int d, l;
+ size_t i;
+
+ uint32_t p0;
+ uint32_t p1;
+
+ l = nd - 1;
+ p = v[nd];
+
+ for (i = 0; i < size; i += 8) {
+ p0 = v_32(v[l][i]);
+ p1 = v_32(v[l][i+4]);
+ for (d = l-1; d >= 0; --d) {
+ p0 ^= v_32(v[d][i]);
+ p1 ^= v_32(v[d][i+4]);
+ }
+ v_32(p[i]) = p0;
+ v_32(p[i+4]) = p1;
+ }
+}
+
+/*
+ * GEN1 (RAID5 with xor) 64bit C implementation
+ */
+void raid_gen1_int64(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ int d, l;
+ size_t i;
+
+ uint64_t p0;
+ uint64_t p1;
+
+ l = nd - 1;
+ p = v[nd];
+
+ for (i = 0; i < size; i += 16) {
+ p0 = v_64(v[l][i]);
+ p1 = v_64(v[l][i+8]);
+ for (d = l-1; d >= 0; --d) {
+ p0 ^= v_64(v[d][i]);
+ p1 ^= v_64(v[d][i+8]);
+ }
+ v_64(p[i]) = p0;
+ v_64(p[i+8]) = p1;
+ }
+}
+
+/*
+ * GEN2 (RAID6 with powers of 2) 32bit C implementation
+ */
+void raid_gen2_int32(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ int d, l;
+ size_t i;
+
+ uint32_t d0, q0, p0;
+ uint32_t d1, q1, p1;
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+
+ for (i = 0; i < size; i += 8) {
+ q0 = p0 = v_32(v[l][i]);
+ q1 = p1 = v_32(v[l][i+4]);
+ for (d = l-1; d >= 0; --d) {
+ d0 = v_32(v[d][i]);
+ d1 = v_32(v[d][i+4]);
+
+ p0 ^= d0;
+ p1 ^= d1;
+
+ q0 = x2_32(q0);
+ q1 = x2_32(q1);
+
+ q0 ^= d0;
+ q1 ^= d1;
+ }
+ v_32(p[i]) = p0;
+ v_32(p[i+4]) = p1;
+ v_32(q[i]) = q0;
+ v_32(q[i+4]) = q1;
+ }
+}
+
+/*
+ * GEN2 (RAID6 with powers of 2) 64bit C implementation
+ */
+void raid_gen2_int64(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ int d, l;
+ size_t i;
+
+ uint64_t d0, q0, p0;
+ uint64_t d1, q1, p1;
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+
+ for (i = 0; i < size; i += 16) {
+ q0 = p0 = v_64(v[l][i]);
+ q1 = p1 = v_64(v[l][i+8]);
+ for (d = l-1; d >= 0; --d) {
+ d0 = v_64(v[d][i]);
+ d1 = v_64(v[d][i+8]);
+
+ p0 ^= d0;
+ p1 ^= d1;
+
+ q0 = x2_64(q0);
+ q1 = x2_64(q1);
+
+ q0 ^= d0;
+ q1 ^= d1;
+ }
+ v_64(p[i]) = p0;
+ v_64(p[i+8]) = p1;
+ v_64(q[i]) = q0;
+ v_64(q[i+8]) = q1;
+ }
+}
+
+/*
+ * GEN3 (triple parity with Cauchy matrix) 8bit C implementation
+ *
+ * Note that instead of a generic multiplication table, likely resulting
+ * in multiple cache misses, a precomputed table could be used.
+ * But this is only a kind of reference function, and we are not really
+ * interested in speed.
+ */
+void raid_gen3_int8(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ uint8_t *r;
+ int d, l;
+ size_t i;
+
+ uint8_t d0, r0, q0, p0;
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+ r = v[nd+2];
+
+ for (i = 0; i < size; i += 1) {
+ p0 = q0 = r0 = 0;
+ for (d = l; d > 0; --d) {
+ d0 = v_8(v[d][i]);
+
+ p0 ^= d0;
+ q0 ^= gfmul[d0][gfgen[1][d]];
+ r0 ^= gfmul[d0][gfgen[2][d]];
+ }
+
+ /* first disk with all coefficients at 1 */
+ d0 = v_8(v[0][i]);
+
+ p0 ^= d0;
+ q0 ^= d0;
+ r0 ^= d0;
+
+ v_8(p[i]) = p0;
+ v_8(q[i]) = q0;
+ v_8(r[i]) = r0;
+ }
+}
+
+/*
+ * GEN4 (quad parity with Cauchy matrix) 8bit C implementation
+ *
+ * Note that instead of a generic multiplication table, likely resulting
+ * in multiple cache misses, a precomputed table could be used.
+ * But this is only a kind of reference function, and we are not really
+ * interested in speed.
+ */
+void raid_gen4_int8(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ uint8_t *r;
+ uint8_t *s;
+ int d, l;
+ size_t i;
+
+ uint8_t d0, s0, r0, q0, p0;
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+ r = v[nd+2];
+ s = v[nd+3];
+
+ for (i = 0; i < size; i += 1) {
+ p0 = q0 = r0 = s0 = 0;
+ for (d = l; d > 0; --d) {
+ d0 = v_8(v[d][i]);
+
+ p0 ^= d0;
+ q0 ^= gfmul[d0][gfgen[1][d]];
+ r0 ^= gfmul[d0][gfgen[2][d]];
+ s0 ^= gfmul[d0][gfgen[3][d]];
+ }
+
+ /* first disk with all coefficients at 1 */
+ d0 = v_8(v[0][i]);
+
+ p0 ^= d0;
+ q0 ^= d0;
+ r0 ^= d0;
+ s0 ^= d0;
+
+ v_8(p[i]) = p0;
+ v_8(q[i]) = q0;
+ v_8(r[i]) = r0;
+ v_8(s[i]) = s0;
+ }
+}
+
+/*
+ * GEN5 (penta parity with Cauchy matrix) 8bit C implementation
+ *
+ * Note that instead of a generic multiplication table, likely resulting
+ * in multiple cache misses, a precomputed table could be used.
+ * But this is only a kind of reference function, and we are not really
+ * interested in speed.
+ */
+void raid_gen5_int8(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ uint8_t *r;
+ uint8_t *s;
+ uint8_t *t;
+ int d, l;
+ size_t i;
+
+ uint8_t d0, t0, s0, r0, q0, p0;
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+ r = v[nd+2];
+ s = v[nd+3];
+ t = v[nd+4];
+
+ for (i = 0; i < size; i += 1) {
+ p0 = q0 = r0 = s0 = t0 = 0;
+ for (d = l; d > 0; --d) {
+ d0 = v_8(v[d][i]);
+
+ p0 ^= d0;
+ q0 ^= gfmul[d0][gfgen[1][d]];
+ r0 ^= gfmul[d0][gfgen[2][d]];
+ s0 ^= gfmul[d0][gfgen[3][d]];
+ t0 ^= gfmul[d0][gfgen[4][d]];
+ }
+
+ /* first disk with all coefficients at 1 */
+ d0 = v_8(v[0][i]);
+
+ p0 ^= d0;
+ q0 ^= d0;
+ r0 ^= d0;
+ s0 ^= d0;
+ t0 ^= d0;
+
+ v_8(p[i]) = p0;
+ v_8(q[i]) = q0;
+ v_8(r[i]) = r0;
+ v_8(s[i]) = s0;
+ v_8(t[i]) = t0;
+ }
+}
+
+/*
+ * GEN6 (hexa parity with Cauchy matrix) 8bit C implementation
+ *
+ * Note that instead of a generic multiplication table, likely resulting
+ * in multiple cache misses, a precomputed table could be used.
+ * But this is only a kind of reference function, and we are not really
+ * interested in speed.
+ */
+void raid_gen6_int8(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ uint8_t *r;
+ uint8_t *s;
+ uint8_t *t;
+ uint8_t *u;
+ int d, l;
+ size_t i;
+
+ uint8_t d0, u0, t0, s0, r0, q0, p0;
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+ r = v[nd+2];
+ s = v[nd+3];
+ t = v[nd+4];
+ u = v[nd+5];
+
+ for (i = 0; i < size; i += 1) {
+ p0 = q0 = r0 = s0 = t0 = u0 = 0;
+ for (d = l; d > 0; --d) {
+ d0 = v_8(v[d][i]);
+
+ p0 ^= d0;
+ q0 ^= gfmul[d0][gfgen[1][d]];
+ r0 ^= gfmul[d0][gfgen[2][d]];
+ s0 ^= gfmul[d0][gfgen[3][d]];
+ t0 ^= gfmul[d0][gfgen[4][d]];
+ u0 ^= gfmul[d0][gfgen[5][d]];
+ }
+
+ /* first disk with all coefficients at 1 */
+ d0 = v_8(v[0][i]);
+
+ p0 ^= d0;
+ q0 ^= d0;
+ r0 ^= d0;
+ s0 ^= d0;
+ t0 ^= d0;
+ u0 ^= d0;
+
+ v_8(p[i]) = p0;
+ v_8(q[i]) = q0;
+ v_8(r[i]) = r0;
+ v_8(s[i]) = s0;
+ v_8(t[i]) = t0;
+ v_8(u[i]) = u0;
+ }
+}
+
+/*
+ * Recover failure of one data block at index id[0] using parity at index
+ * ip[0] for any RAID level.
+ *
+ * Starting from the equation:
+ *
+ * Pd = A[ip[0],id[0]] * Dx
+ *
+ * and solving we get:
+ *
+ * Dx = A[ip[0],id[0]]^-1 * Pd
+ */
+void raid_rec1_int8(int nr, int *id, int *ip, int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *pa;
+ const uint8_t *T;
+ uint8_t G;
+ uint8_t V;
+ size_t i;
+
+ (void)nr; /* unused, it's always 1 */
+
+ /* if it's RAID5 uses the faster function */
+ if (ip[0] == 0) {
+ raid_rec1of1(id, nd, size, vv);
+ return;
+ }
+
+#ifdef RAID_USE_RAID6_PQ
+ /* if it's RAID6 recovering with Q uses the faster function */
+ if (ip[0] == 1) {
+ raid6_datap_recov(nd + 2, size, id[0], vv);
+ return;
+ }
+#endif
+
+ /* setup the coefficients matrix */
+ G = A(ip[0], id[0]);
+
+ /* invert it to solve the system of linear equations */
+ V = inv(G);
+
+ /* get multiplication tables */
+ T = table(V);
+
+ /* compute delta parity */
+ raid_delta_gen(1, id, ip, nd, size, vv);
+
+ p = v[nd+ip[0]];
+ pa = v[id[0]];
+
+ for (i = 0; i < size; ++i) {
+ /* delta */
+ uint8_t Pd = p[i] ^ pa[i];
+
+ /* reconstruct */
+ pa[i] = T[Pd];
+ }
+}
+
+/*
+ * Recover failure of two data blocks at indexes id[0],id[1] using parity at
+ * indexes ip[0],ip[1] for any RAID level.
+ *
+ * Starting from the equations:
+ *
+ * Pd = A[ip[0],id[0]] * Dx + A[ip[0],id[1]] * Dy
+ * Qd = A[ip[1],id[0]] * Dx + A[ip[1],id[1]] * Dy
+ *
+ * we solve inverting the coefficients matrix.
+ */
+void raid_rec2_int8(int nr, int *id, int *ip, int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *pa;
+ uint8_t *q;
+ uint8_t *qa;
+ const int N = 2;
+ const uint8_t *T[N][N];
+ uint8_t G[N*N];
+ uint8_t V[N*N];
+ size_t i;
+ int j, k;
+
+ (void)nr; /* unused, it's always 2 */
+
+ /* if it's RAID6 recovering with P and Q, use the faster function */
+ if (ip[0] == 0 && ip[1] == 1) {
+#ifdef RAID_USE_RAID6_PQ
+ raid6_2data_recov(nd + 2, size, id[0], id[1], vv);
+#else
+ raid_rec2of2_int8(id, ip, nd, size, vv);
+#endif
+ return;
+ }
+
+ /* setup the coefficients matrix */
+ for (j = 0; j < N; ++j)
+ for (k = 0; k < N; ++k)
+ G[j*N+k] = A(ip[j], id[k]);
+
+ /* invert it to solve the system of linear equations */
+ raid_invert(G, V, N);
+
+ /* get multiplication tables */
+ for (j = 0; j < N; ++j)
+ for (k = 0; k < N; ++k)
+ T[j][k] = table(V[j*N+k]);
+
+ /* compute delta parity */
+ raid_delta_gen(2, id, ip, nd, size, vv);
+
+ p = v[nd+ip[0]];
+ q = v[nd+ip[1]];
+ pa = v[id[0]];
+ qa = v[id[1]];
+
+ for (i = 0; i < size; ++i) {
+ /* delta */
+ uint8_t Pd = p[i] ^ pa[i];
+ uint8_t Qd = q[i] ^ qa[i];
+
+ /* reconstruct */
+ pa[i] = T[0][0][Pd] ^ T[0][1][Qd];
+ qa[i] = T[1][0][Pd] ^ T[1][1][Qd];
+ }
+}
+
+/*
+ * Recover failure of N data blocks at indexes id[N] using parity at indexes
+ * ip[N] for any RAID level.
+ *
+ * Starting from the N equations, with 0<=i<N :
+ *
+ * PD[i] = sum(A[ip[i],id[j]] * D[j]) 0<=j<N
+ *
+ * we solve inverting the coefficients matrix.
+ *
+ * Note that, referring to the previous equations, you have:
+ * PD[0] = Pd, PD[1] = Qd, PD[2] = Rd, ...
+ * D[0] = Dx, D[1] = Dy, D[2] = Dz, ...
+ */
+void raid_recX_int8(int nr, int *id, int *ip, int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p[RAID_PARITY_MAX];
+ uint8_t *pa[RAID_PARITY_MAX];
+ const uint8_t *T[RAID_PARITY_MAX][RAID_PARITY_MAX];
+ uint8_t G[RAID_PARITY_MAX*RAID_PARITY_MAX];
+ uint8_t V[RAID_PARITY_MAX*RAID_PARITY_MAX];
+ size_t i;
+ int j, k;
+
+ /* setup the coefficients matrix */
+ for (j = 0; j < nr; ++j)
+ for (k = 0; k < nr; ++k)
+ G[j*nr+k] = A(ip[j], id[k]);
+
+ /* invert it to solve the system of linear equations */
+ raid_invert(G, V, nr);
+
+ /* get multiplication tables */
+ for (j = 0; j < nr; ++j)
+ for (k = 0; k < nr; ++k)
+ T[j][k] = table(V[j*nr+k]);
+
+ /* compute delta parity */
+ raid_delta_gen(nr, id, ip, nd, size, vv);
+
+ for (j = 0; j < nr; ++j) {
+ p[j] = v[nd+ip[j]];
+ pa[j] = v[id[j]];
+ }
+
+ for (i = 0; i < size; ++i) {
+ uint8_t PD[RAID_PARITY_MAX];
+
+ /* delta */
+ for (j = 0; j < nr; ++j)
+ PD[j] = p[j][i] ^ pa[j][i];
+
+ /* reconstruct */
+ for (j = 0; j < nr; ++j) {
+ uint8_t b = 0;
+ for (k = 0; k < nr; ++k)
+ b ^= T[j][k][PD[k]];
+ pa[j][i] = b;
+ }
+ }
+}
+
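As an aside for reviewers (not part of the patch): the two-block recovery in
raid_rec2_int8() above boils down to inverting a 2x2 coefficient matrix over
GF(2^8). The following is a minimal user-space sketch of that step for a
single byte. The names gf_mul(), gf_inv() and gf_solve2() are illustrative
and do not exist in the patch, and the brute-force inverse only stands in for
the library's gfinv table.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply a*b in GF(2^8), reduction constant 0x1d as in mktables.c. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t v = 0;

	while (b) {
		if (b & 1)
			v ^= a;
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return v;
}

/* Brute-force inverse (illustration only); a must be non-zero. */
static uint8_t gf_inv(uint8_t a)
{
	unsigned b;

	assert(a != 0);
	for (b = 1; b < 256; ++b)
		if (gf_mul(a, (uint8_t)b) == 1)
			return (uint8_t)b;
	return 0; /* not reached for a != 0 */
}

/*
 * Solve   Pd = g[0]*Dx + g[1]*Dy
 *         Qd = g[2]*Dx + g[3]*Dy
 * by inverting the 2x2 matrix g. Subtraction is XOR in GF(2^8), so the
 * usual minus signs of the adjugate disappear.
 */
static void gf_solve2(const uint8_t g[4], uint8_t pd, uint8_t qd,
		      uint8_t *dx, uint8_t *dy)
{
	uint8_t det = gf_mul(g[0], g[3]) ^ gf_mul(g[1], g[2]);
	uint8_t idet = gf_inv(det);
	uint8_t v0 = gf_mul(g[3], idet), v1 = gf_mul(g[1], idet);
	uint8_t v2 = gf_mul(g[2], idet), v3 = gf_mul(g[0], idet);

	*dx = gf_mul(v0, pd) ^ gf_mul(v1, qd);
	*dy = gf_mul(v2, pd) ^ gf_mul(v3, qd);
}
```

For example, with g = {1, 1, 2, 8} (the P and Q coefficients for data disks 1
and 3), feeding in Pd = Dx ^ Dy and Qd = 2*Dx ^ 8*Dy gives back the original
Dx and Dy.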
diff --git a/lib/raid/internal.h b/lib/raid/internal.h
new file mode 100644
index 0000000..b3bf9e5
--- /dev/null
+++ b/lib/raid/internal.h
@@ -0,0 +1,148 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __RAID_INTERNAL_H
+#define __RAID_INTERNAL_H
+
+/*
+ * Includes anything required for compatibility.
+ */
+#ifdef __KERNEL__ /* to build the user mode test */
+
+#include <linux/module.h>
+#include <linux/kconfig.h> /* for IS_* macros */
+#include <linux/export.h> /* for EXPORT_SYMBOL/EXPORT_SYMBOL_GPL */
+#include <linux/bug.h> /* for BUG_ON */
+#include <linux/gfp.h> /* for __get_free_pages */
+
+#ifdef CONFIG_X86
+#include <asm/i387.h> /* for kernel_fpu_begin/end() */
+#endif
+
+/* if we can use the XOR_BLOCKS library */
+#if IS_BUILTIN(CONFIG_XOR_BLOCKS) \
+ || (IS_MODULE(CONFIG_XOR_BLOCKS) && IS_MODULE(CONFIG_RAID_CAUCHY))
+#define RAID_USE_XOR_BLOCKS 1
+#include <linux/raid/xor.h> /* for xor_blocks */
+#endif
+
+/* if we can use the RAID6 library */
+#if IS_BUILTIN(CONFIG_RAID6_PQ) \
+ || (IS_MODULE(CONFIG_RAID6_PQ) && IS_MODULE(CONFIG_RAID_CAUCHY))
+#define RAID_USE_RAID6_PQ 1
+#include <linux/raid/pq.h> /* for tables/functions */
+#endif
+
+#else /* __KERNEL__ */
+#include "test/usermode.h"
+#endif /* __KERNEL__ */
+
+/*
+ * Includes the headers.
+ */
+#include <linux/raid/raid.h>
+#include <linux/raid/helper.h>
+
+/*
+ * Internal functions.
+ *
+ * These are intended to provide access for testing.
+ */
+void raid_init(void);
+int raid_selftest(void);
+int raid_speedtest(int displacement);
+void raid_gen_ref(int nd, int np, size_t size, void **vv);
+void raid_invert(uint8_t *M, uint8_t *V, int n);
+void raid_delta_gen(int nr, int *id, int *ip, int nd, size_t size, void **v);
+void raid_rec1of1(int *id, int nd, size_t size, void **v);
+void raid_rec2of2_int8(int *id, int *ip, int nd, size_t size, void **vv);
+void raid_gen1_xorblocks(int nd, size_t size, void **v);
+void raid_gen1_int32(int nd, size_t size, void **vv);
+void raid_gen1_int64(int nd, size_t size, void **vv);
+void raid_gen1_sse2(int nd, size_t size, void **vv);
+void raid_gen2_raid6(int nd, size_t size, void **vv);
+void raid_gen2_int32(int nd, size_t size, void **vv);
+void raid_gen2_int64(int nd, size_t size, void **vv);
+void raid_gen2_sse2(int nd, size_t size, void **vv);
+void raid_gen2_sse2ext(int nd, size_t size, void **vv);
+void raid_gen3_int8(int nd, size_t size, void **vv);
+void raid_gen3_ssse3(int nd, size_t size, void **vv);
+void raid_gen3_ssse3ext(int nd, size_t size, void **vv);
+void raid_gen4_int8(int nd, size_t size, void **vv);
+void raid_gen4_ssse3(int nd, size_t size, void **vv);
+void raid_gen4_ssse3ext(int nd, size_t size, void **vv);
+void raid_gen5_int8(int nd, size_t size, void **vv);
+void raid_gen5_ssse3(int nd, size_t size, void **vv);
+void raid_gen5_ssse3ext(int nd, size_t size, void **vv);
+void raid_gen6_int8(int nd, size_t size, void **vv);
+void raid_gen6_ssse3(int nd, size_t size, void **vv);
+void raid_gen6_ssse3ext(int nd, size_t size, void **vv);
+void raid_rec1_int8(int nr, int *id, int *ip, int nd, size_t size, void **vv);
+void raid_rec2_int8(int nr, int *id, int *ip, int nd, size_t size, void **vv);
+void raid_recX_int8(int nr, int *id, int *ip, int nd, size_t size, void **vv);
+void raid_rec1_ssse3(int nr, int *id, int *ip, int nd, size_t size, void **vv);
+void raid_rec2_ssse3(int nr, int *id, int *ip, int nd, size_t size, void **vv);
+void raid_recX_ssse3(int nr, int *id, int *ip, int nd, size_t size, void **vv);
+
+/*
+ * Internal forwarders.
+ */
+extern void (*raid_gen_ptr[RAID_PARITY_MAX])(
+ int nd, size_t size, void **vv);
+extern void (*raid_rec_ptr[RAID_PARITY_MAX])(
+ int nr, int *id, int *ip, int nd, size_t size, void **vv);
+
+/*
+ * Tables.
+ *
+ * Uses RAID6 tables if available, otherwise the ones in tables.c.
+ */
+#ifdef RAID_USE_RAID6_PQ
+#define gfmul raid6_gfmul
+#define gfinv raid6_gfinv
+#define gfexp raid6_gfexp
+#else
+extern const uint8_t raid_gfmul[256][256] __aligned(256);
+extern const uint8_t raid_gfexp[256] __aligned(256);
+extern const uint8_t raid_gfinv[256] __aligned(256);
+#define gfmul raid_gfmul
+#define gfexp raid_gfexp
+#define gfinv raid_gfinv
+#endif
+
+extern const uint8_t raid_gfcauchy[6][256] __aligned(256);
+extern const uint8_t raid_gfcauchypshufb[251][4][2][16] __aligned(256);
+extern const uint8_t raid_gfmulpshufb[256][2][16] __aligned(256);
+#define gfgen raid_gfcauchy
+#define gfgenpshufb raid_gfcauchypshufb
+#define gfmulpshufb raid_gfmulpshufb
+
+/*
+ * Assembler blocks.
+ */
+#ifdef CONFIG_X86
+static __always_inline void raid_asm_begin(void)
+{
+ kernel_fpu_begin();
+}
+
+static __always_inline void raid_asm_end(void)
+{
+ asm volatile("sfence" : : : "memory");
+ kernel_fpu_end();
+}
+#endif
+
+#endif
+
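A note on the raid_gfmulpshufb[256][2][16] declaration above (not part of the
patch): the [2][16] shape suggests one 16-entry table for the low nibble and
one for the high nibble of each multiplier, so that a full GF(2^8) product is
two lookups and one XOR. The scalar sketch below illustrates that identity
under this assumption. The names gf_nibble_tables() and gf_mul_nibble() are
hypothetical, and the SIMD part (16 lookups at once with PSHUFB) is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply a*b in GF(2^8), reduction constant 0x1d as in mktables.c. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t v = 0;

	while (b) {
		if (b & 1)
			v ^= a;
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return v;
}

/*
 * Build the two 16-entry tables for a fixed multiplier c:
 *   t[0][k] = c * k          (contribution of the low nibble)
 *   t[1][k] = c * k * 16     (contribution of the high nibble)
 */
static void gf_nibble_tables(uint8_t c, uint8_t t[2][16])
{
	unsigned k;

	for (k = 0; k < 16; ++k) {
		t[0][k] = gf_mul(c, (uint8_t)k);
		t[1][k] = gf_mul(t[0][k], 16);
	}
}

/* Multiply b by the tables' multiplier: two lookups and one XOR. */
static uint8_t gf_mul_nibble(uint8_t t[2][16], uint8_t b)
{
	return (uint8_t)(t[0][b & 0x0f] ^ t[1][b >> 4]);
}
```

This works because multiplication in GF(2^8) distributes over XOR, so
c * b = c * (b & 15) ^ c * ((b >> 4) * 16).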
diff --git a/lib/raid/mktables.c b/lib/raid/mktables.c
new file mode 100644
index 0000000..8e7fa03
--- /dev/null
+++ b/lib/raid/mktables.c
@@ -0,0 +1,383 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+
+/**
+ * Multiplication a*b in GF(2^8).
+ */
+static uint8_t gfmul(uint8_t a, uint8_t b)
+{
+ uint8_t v;
+
+ v = 0;
+ while (b) {
+ if ((b & 1) != 0)
+ v ^= a;
+
+ if ((a & 0x80) != 0) {
+ a <<= 1;
+ a ^= 0x1d;
+ } else {
+ a <<= 1;
+ }
+
+ b >>= 1;
+ }
+
+ return v;
+}
+
+/**
+ * Inversion (1/a) in GF(2^8).
+ */
+uint8_t gfinv[256];
+
+/**
+ * Number of parities.
+ * This is the number of rows of the generator matrix.
+ */
+#define PARITY 6
+
+/**
+ * Number of disks.
+ * This is the number of columns of the generator matrix.
+ */
+#define DISK (257-PARITY)
+
+/**
+ * Setup the Cauchy matrix used to generate the parity.
+ */
+static void set_cauchy(uint8_t *matrix)
+{
+ int i, j;
+ uint8_t inv_x, y;
+
+ /*
+ * The first row of the generator matrix is formed by all 1.
+ *
+ * The generator matrix is an Extended Cauchy matrix built from
+ * a Cauchy matrix adding at the top a row of all 1.
+ *
+ * Extending a Cauchy matrix in this way maintains the MDS property
+ * of the matrix.
+ *
+ * For example, considering a generator matrix of 4x6 we have now:
+ *
+ * 1 1 1 1 1 1
+ * - - - - - -
+ * - - - - - -
+ * - - - - - -
+ */
+ for (i = 0; i < DISK; ++i)
+ matrix[0*DISK+i] = 1;
+
+ /*
+ * Second row is formed with powers 2^i, and it's the first
+ * row of the Cauchy matrix.
+ *
+ * Each element of the Cauchy matrix is in the form 1/(x_i + y_j)
+ * where all x_i and y_j must be different for any i and j.
+ *
+ * For the first row with j=0, we choose x_i = 2^-i and y_0 = 0
+ * and we obtain a first row formed as:
+ *
+ * 1/(x_i + y_0) = 1/(2^-i + 0) = 2^i
+ *
+ * with 2^-i != 0 for any i
+ *
+ * In the example we get:
+ *
+ * x_0 = 1
+ * x_1 = 142
+ * x_2 = 71
+ * x_3 = 173
+ * x_4 = 216
+ * x_5 = 108
+ * y_0 = 0
+ *
+ * with the matrix:
+ *
+ * 1 1 1 1 1 1
+ * 1 2 4 8 16 32
+ * - - - - - -
+ * - - - - - -
+ */
+ inv_x = 1;
+ for (i = 0; i < DISK; ++i) {
+ matrix[1*DISK+i] = inv_x;
+ inv_x = gfmul(2, inv_x);
+ }
+
+ /*
+ * The rest of the Cauchy matrix is formed choosing for each row j
+ * a new y_j = 2^j and reusing the x_i already assigned in the first
+ * row obtaining :
+ *
+ * 1/(x_i + y_j) = 1/(2^-i + 2^j)
+ *
+ * with 2^-i + 2^j != 0 for any i,j with i>=0,j>=1,i+j<255
+ *
+ * In the example we get:
+ *
+ * y_1 = 2
+ * y_2 = 4
+ *
+ * with the matrix:
+ *
+ * 1 1 1 1 1 1
+ * 1 2 4 8 16 32
+ * 244 83 78 183 118 47
+ * 167 39 213 59 153 82
+ */
+ y = 2;
+ for (j = 0; j < PARITY-2; ++j) {
+ inv_x = 1;
+ for (i = 0; i < DISK; ++i) {
+ uint8_t x = gfinv[inv_x];
+ matrix[(j+2)*DISK+i] = gfinv[y ^ x];
+ inv_x = gfmul(2, inv_x);
+ }
+
+ y = gfmul(2, y);
+ }
+
+ /*
+ * Finally we adjust the matrix multiplying each row by
+ * the inverse of the first element in the row.
+ *
+ * Also this operation maintains the MDS property of the matrix.
+ *
+ * Resulting in:
+ *
+ * 1 1 1 1 1 1
+ * 1 2 4 8 16 32
+ * 1 245 210 196 154 113
+ * 1 187 166 215 7 106
+ */
+ for (j = 0; j < PARITY-2; ++j) {
+ uint8_t f = gfinv[matrix[(j+2)*DISK]];
+
+ for (i = 0; i < DISK; ++i)
+ matrix[(j+2)*DISK+i] = gfmul(matrix[(j+2)*DISK+i], f);
+ }
+}
+
+/**
+ * Next power of 2.
+ */
+static unsigned np(unsigned v)
+{
+ --v;
+ v |= v >> 1;
+ v |= v >> 2;
+ v |= v >> 4;
+ v |= v >> 8;
+ v |= v >> 16;
+ ++v;
+
+ return v;
+}
+
+int main(void)
+{
+ uint8_t v;
+ int i, j, k, p;
+ uint8_t matrix[PARITY * 256];
+
+ printf("/*\n");
+ printf(" * Copyright (C) 2013 Andrea Mazzoleni\n");
+ printf(" *\n");
+ printf(" * This program is free software: you can redistribute it and/or modify\n");
+ printf(" * it under the terms of the GNU General Public License as published by\n");
+ printf(" * the Free Software Foundation, either version 2 of the License, or\n");
+ printf(" * (at your option) any later version.\n");
+ printf(" *\n");
+ printf(" * This program is distributed in the hope that it will be useful,\n");
+ printf(" * but WITHOUT ANY WARRANTY; without even the implied warranty of\n");
+ printf(" * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\n");
+ printf(" * GNU General Public License for more details.\n");
+ printf(" */\n");
+ printf("\n");
+
+ printf("#include \"internal.h\"\n");
+ printf("\n");
+
+ /* a*b */
+ printf("#ifndef RAID_USE_RAID6_PQ\n");
+ printf("const uint8_t __aligned(256) raid_gfmul[256][256] =\n");
+ printf("{\n");
+ for (i = 0; i < 256; ++i) {
+ printf("\t{\n");
+ for (j = 0; j < 256; ++j) {
+ if (j % 8 == 0)
+ printf("\t\t");
+ v = gfmul(i, j);
+ if (v == 1)
+ gfinv[i] = j;
+ printf("0x%02x,", (unsigned)v);
+ if (j % 8 == 7)
+ printf("\n");
+ else
+ printf(" ");
+ }
+ printf("\t},\n");
+ }
+ printf("};\n");
+ printf("EXPORT_SYMBOL(raid_gfmul);\n");
+ printf("#endif\n");
+ printf("\n");
+
+ /* 2^a */
+ printf("#ifndef RAID_USE_RAID6_PQ\n");
+ printf("const uint8_t __aligned(256) raid_gfexp[256] =\n");
+ printf("{\n");
+ v = 1;
+ for (i = 0; i < 256; ++i) {
+ if (i % 8 == 0)
+ printf("\t");
+ printf("0x%02x,", v);
+ v = gfmul(v, 2);
+ if (i % 8 == 7)
+ printf("\n");
+ else
+ printf(" ");
+ }
+ printf("};\n");
+ printf("EXPORT_SYMBOL(raid_gfexp);\n");
+ printf("#endif\n");
+ printf("\n");
+
+ /* 1/a */
+ printf("#ifndef RAID_USE_RAID6_PQ\n");
+ printf("const uint8_t __aligned(256) raid_gfinv[256] =\n");
+ printf("{\n");
+ printf("\t/* note that the first element is not significant */\n");
+ for (i = 0; i < 256; ++i) {
+ if (i % 8 == 0)
+ printf("\t");
+ if (i == 0)
+ v = 0;
+ else
+ v = gfinv[i];
+ printf("0x%02x,", v);
+ if (i % 8 == 7)
+ printf("\n");
+ else
+ printf(" ");
+ }
+ printf("};\n");
+ printf("EXPORT_SYMBOL(raid_gfinv);\n");
+ printf("#endif\n");
+ printf("\n");
+
+ /* cauchy matrix */
+ set_cauchy(matrix);
+
+ printf("/**\n");
+ printf(" * Cauchy matrix used to generate parity.\n");
+ printf(" * This matrix is valid for up to %u parity with %u data disks.\n", PARITY, DISK);
+ printf(" *\n");
+ for (p = 0; p < PARITY; ++p) {
+ printf(" * ");
+ for (i = 0; i < DISK; ++i)
+ printf("%02x ", matrix[p*DISK+i]);
+ printf("\n");
+ }
+ printf(" */\n");
+ printf("const uint8_t __aligned(256) raid_gfcauchy[%u][256] =\n", PARITY);
+ printf("{\n");
+ for (p = 0; p < PARITY; ++p) {
+ printf("\t{\n");
+ for (i = 0; i < DISK; ++i) {
+ if (i % 8 == 0)
+ printf("\t\t");
+ printf("0x%02x,", matrix[p*DISK+i]);
+ if (i % 8 == 7)
+ printf("\n");
+ else
+ printf(" ");
+ }
+ printf("\n\t},\n");
+ }
+ printf("};\n");
+ printf("EXPORT_SYMBOL(raid_gfcauchy);\n");
+ printf("\n");
+
+ printf("#ifdef CONFIG_X86\n");
+ printf("/**\n");
+ printf(" * PSHUFB tables for the Cauchy matrix.\n");
+ printf(" *\n");
+ printf(" * Indexes are [DISK][PARITY - 2][LH].\n");
+ printf(" * Where DISK is from 0 to %u, PARITY from 2 to %u, LH from 0 to 1.\n", DISK - 1, PARITY - 1);
+ printf(" */\n");
+ printf("const uint8_t __aligned(256) raid_gfcauchypshufb[%u][%u][2][16] =\n", DISK, np(PARITY - 2));
+ printf("{\n");
+ for (i = 0; i < DISK; ++i) {
+ printf("\t{\n");
+ for (p = 2; p < PARITY; ++p) {
+ printf("\t\t{\n");
+ for (j = 0; j < 2; ++j) {
+ printf("\t\t\t{ ");
+ for (k = 0; k < 16; ++k) {
+ v = gfmul(matrix[p*DISK+i], k);
+ if (j == 1)
+ v = gfmul(v, 16);
+ printf("0x%02x", (unsigned)v);
+ if (k != 15)
+ printf(", ");
+ }
+ printf(" },\n");
+ }
+ printf("\t\t},\n");
+ }
+ printf("\t},\n");
+ }
+ printf("};\n");
+ printf("EXPORT_SYMBOL(raid_gfcauchypshufb);\n");
+ printf("#endif\n\n");
+
+ printf("#ifdef CONFIG_X86\n");
+ printf("/**\n");
+ printf(" * PSHUFB tables for generic multiplication.\n");
+ printf(" *\n");
+ printf(" * Indexes are [MULTIPLIER][LH].\n");
+ printf(" * Where MULTIPLIER is from 0 to 255, LH from 0 to 1.\n");
+ printf(" */\n");
+ printf("const uint8_t __aligned(256) raid_gfmulpshufb[256][2][16] =\n");
+ printf("{\n");
+ for (i = 0; i < 256; ++i) {
+ printf("\t{\n");
+ for (j = 0; j < 2; ++j) {
+ printf("\t\t{ ");
+ for (k = 0; k < 16; ++k) {
+ v = gfmul(i, k);
+ if (j == 1)
+ v = gfmul(v, 16);
+ printf("0x%02x", (unsigned)v);
+ if (k != 15)
+ printf(", ");
+ }
+ printf(" },\n");
+ }
+ printf("\t},\n");
+ }
+ printf("};\n");
+ printf("EXPORT_SYMBOL(raid_gfmulpshufb);\n");
+ printf("#endif\n\n");
+
+ return 0;
+}
+
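To make the set_cauchy() construction above easier to follow (again, not part
of the patch), here is a reduced user-space sketch that rebuilds only the 4x6
example from its comments. ROWS, COLS, cauchy_build() and the brute-force
gf_inv() are illustrative assumptions. The real generator uses PARITY and DISK
and fills gfinv[] while printing the multiplication table.

```c
#include <assert.h>
#include <stdint.h>

#define ROWS 4	/* parities in the example */
#define COLS 6	/* data disks in the example */

/* Multiply a*b in GF(2^8), reduction constant 0x1d as in mktables.c. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t v = 0;

	while (b) {
		if (b & 1)
			v ^= a;
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return v;
}

/* Brute-force inverse (illustration only); a must be non-zero. */
static uint8_t gf_inv(uint8_t a)
{
	unsigned b;

	assert(a != 0);
	for (b = 1; b < 256; ++b)
		if (gf_mul(a, (uint8_t)b) == 1)
			return (uint8_t)b;
	return 0; /* not reached for a != 0 */
}

static void cauchy_build(uint8_t m[ROWS][COLS])
{
	uint8_t inv_x, y;
	int i, j;

	/* row 0: all ones; row 1: 1/x_i = 2^i (x_i = 2^-i, y_0 = 0) */
	inv_x = 1;
	for (i = 0; i < COLS; ++i) {
		m[0][i] = 1;
		m[1][i] = inv_x;
		inv_x = gf_mul(2, inv_x);
	}

	/* remaining rows: 1/(x_i + y_j), with y doubling on every row */
	y = 2;
	for (j = 2; j < ROWS; ++j) {
		inv_x = 1;
		for (i = 0; i < COLS; ++i) {
			m[j][i] = gf_inv((uint8_t)(gf_inv(inv_x) ^ y));
			inv_x = gf_mul(2, inv_x);
		}
		y = gf_mul(2, y);
	}

	/* scale each Cauchy row so that its first element becomes 1 */
	for (j = 2; j < ROWS; ++j) {
		uint8_t f = gf_inv(m[j][0]);

		for (i = 0; i < COLS; ++i)
			m[j][i] = gf_mul(m[j][i], f);
	}
}
```

Calling cauchy_build() on a uint8_t[4][6] array should reproduce the final
matrix listed in the set_cauchy() comments, with a first column of all 1.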
diff --git a/lib/raid/module.c b/lib/raid/module.c
new file mode 100644
index 0000000..8d45ab4
--- /dev/null
+++ b/lib/raid/module.c
@@ -0,0 +1,458 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include "internal.h"
+#include "cpu.h"
+
+/*
+ * Initializes and selects the best algorithm.
+ */
+void raid_init(void)
+{
+ /* setup parity functions */
+ if (sizeof(void *) == 8) {
+ raid_gen_ptr[0] = raid_gen1_int64;
+ raid_gen_ptr[1] = raid_gen2_int64;
+ } else {
+ raid_gen_ptr[0] = raid_gen1_int32;
+ raid_gen_ptr[1] = raid_gen2_int32;
+ }
+ raid_gen_ptr[2] = raid_gen3_int8;
+ raid_gen_ptr[3] = raid_gen4_int8;
+ raid_gen_ptr[4] = raid_gen5_int8;
+ raid_gen_ptr[5] = raid_gen6_int8;
+
+ /* if XOR_BLOCKS is present, use it */
+#ifdef RAID_USE_XOR_BLOCKS
+ raid_gen_ptr[0] = raid_gen1_xorblocks;
+#endif
+ /* if RAID6 is present, use it */
+#ifdef RAID_USE_RAID6_PQ
+ raid_gen_ptr[1] = raid_gen2_raid6;
+#endif
+
+ /* optimized SSE2 functions */
+#ifdef CONFIG_X86
+ if (raid_cpu_has_sse2()) {
+ raid_gen_ptr[0] = raid_gen1_sse2;
+ raid_gen_ptr[1] = raid_gen2_sse2;
+#ifdef CONFIG_X86_64
+ raid_gen_ptr[1] = raid_gen2_sse2ext;
+#endif
+ }
+#endif
+
+ /* optimized SSSE3 functions */
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3()) {
+ raid_gen_ptr[2] = raid_gen3_ssse3;
+ raid_gen_ptr[3] = raid_gen4_ssse3;
+ raid_gen_ptr[4] = raid_gen5_ssse3;
+ raid_gen_ptr[5] = raid_gen6_ssse3;
+#ifdef CONFIG_X86_64
+ raid_gen_ptr[2] = raid_gen3_ssse3ext;
+ raid_gen_ptr[3] = raid_gen4_ssse3ext;
+ raid_gen_ptr[4] = raid_gen5_ssse3ext;
+ raid_gen_ptr[5] = raid_gen6_ssse3ext;
+#endif
+ }
+#endif
+
+ /* setup recovering functions */
+ raid_rec_ptr[0] = raid_rec1_int8;
+ raid_rec_ptr[1] = raid_rec2_int8;
+ raid_rec_ptr[2] = raid_recX_int8;
+ raid_rec_ptr[3] = raid_recX_int8;
+ raid_rec_ptr[4] = raid_recX_int8;
+ raid_rec_ptr[5] = raid_recX_int8;
+
+ /* optimized SSSE3 functions */
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3()) {
+ raid_rec_ptr[0] = raid_rec1_ssse3;
+ raid_rec_ptr[1] = raid_rec2_ssse3;
+ raid_rec_ptr[2] = raid_recX_ssse3;
+ raid_rec_ptr[3] = raid_recX_ssse3;
+ raid_rec_ptr[4] = raid_recX_ssse3;
+ raid_rec_ptr[5] = raid_recX_ssse3;
+ }
+#endif
+}
+
+/*
+ * Reference parity computation.
+ */
+void raid_gen_ref(int nd, int np, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ size_t i;
+
+ for (i = 0; i < size; ++i) {
+ uint8_t p[RAID_PARITY_MAX];
+ int j, d;
+
+ for (j = 0; j < np; ++j)
+ p[j] = 0;
+
+ for (d = 0; d < nd; ++d) {
+ uint8_t b = v[d][i];
+
+ for (j = 0; j < np; ++j)
+ p[j] ^= gfmul[b][gfgen[j][d]];
+ }
+
+ for (j = 0; j < np; ++j)
+ v[nd + j][i] = p[j];
+ }
+}
+
+/*
+ * Size of the blocks to test.
+ */
+#define TEST_SIZE PAGE_SIZE
+
+/*
+ * Number of data blocks to test.
+ */
+#define TEST_COUNT (65536 / TEST_SIZE)
+
+/*
+ * Period for the speed test.
+ */
+#ifdef __KERNEL__ /* to build the user mode test */
+#define TEST_PERIOD 16
+#else
+#ifdef COVERAGE
+#define TEST_PERIOD 100 /* fast in coverage test */
+#else
+#define TEST_PERIOD 512 /* more time in usermode */
+#endif
+#endif
+
+/*
+ * Parity generation test.
+ */
+static int raid_test_par(int nd, int np, size_t size, void **v, void **ref)
+{
+ int i;
+ void *t[TEST_COUNT + RAID_PARITY_MAX];
+
+ /* setup data */
+ for (i = 0; i < nd; ++i)
+ t[i] = ref[i];
+
+ /* setup parity */
+ for (i = 0; i < np; ++i)
+ t[nd+i] = v[nd+i];
+
+ raid_gen(nd, np, size, t);
+
+ /* compare parity */
+ for (i = 0; i < np; ++i) {
+ if (memcmp(t[nd+i], ref[nd+i], size) != 0) {
+ pr_err("raid: Self test failed!\n");
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * Recovering test.
+ */
+static int raid_test_rec(int nr, int *ir, int nd, int np, size_t size, void **v, void **ref)
+{
+ int i, j;
+ void *t[TEST_COUNT + RAID_PARITY_MAX];
+
+ /* setup vector */
+ for (i = 0, j = 0; i < nd+np; ++i) {
+ if (j < nr && ir[j] == i) {
+ /* this block has to be recovered */
+ t[i] = v[i];
+ ++j;
+ } else {
+ /* this block is left unchanged */
+ t[i] = ref[i];
+ }
+ }
+
+ raid_rec(nr, ir, nd, np, size, t);
+
+ /* compare all data and parity */
+ for (i = 0; i < nd+np; ++i) {
+ if (t[i] != ref[i] && memcmp(t[i], ref[i], size) != 0) {
+ pr_err("raid: Self test failed!\n");
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * Basic functionality self test.
+ */
+int raid_selftest(void)
+{
+ const int nd = TEST_COUNT;
+ const size_t size = TEST_SIZE;
+ const int nv = nd + RAID_PARITY_MAX * 2;
+ uint8_t *pages;
+ void *v[nd + RAID_PARITY_MAX * 2];
+ void *ref[nd + RAID_PARITY_MAX];
+ int ir[RAID_PARITY_MAX];
+ int i, np;
+ int ret = 0;
+
+ /* ensure to have enough space for data */
+ BUG_ON(nd * size > 65536);
+
+ /* allocates pages for data and parity */
+ pages = alloc_pages_exact(nv * size, GFP_KERNEL);
+ if (!pages) {
+ pr_err("raid: No memory available.\n");
+ return -ENOMEM;
+ }
+
+ /* setup working vector */
+ for (i = 0; i < nv; ++i)
+ v[i] = pages + size * i;
+
+ /* use the multiplication table as data */
+ for (i = 0; i < nd; ++i)
+ ref[i] = ((uint8_t *)gfmul) + size * i;
+
+ /* setup reference parity */
+ for (i = 0; i < RAID_PARITY_MAX; ++i)
+ ref[nd+i] = v[nd+RAID_PARITY_MAX+i];
+
+ /* compute reference parity */
+ raid_gen_ref(nd, RAID_PARITY_MAX, size, ref);
+
+ /* test for each parity level */
+ for (np = 1; np <= RAID_PARITY_MAX; ++np) {
+ /* test parity generation */
+ ret = raid_test_par(nd, np, size, v, ref);
+ if (ret != 0)
+ goto bail;
+
+ /* test recovering with full broken data disks */
+ for (i = 0; i < np; ++i)
+ ir[i] = nd - np + i;
+
+ ret = raid_test_rec(np, ir, nd, np, size, v, ref);
+ if (ret != 0)
+ goto bail;
+
+ /* test recovering with half broken data and leading parity */
+ for (i = 0; i < np / 2; ++i)
+ ir[i] = i;
+
+ for (i = 0; i < (np + 1) / 2; ++i)
+ ir[np / 2 + i] = nd + i;
+
+ ret = raid_test_rec(np, ir, nd, np, size, v, ref);
+ if (ret != 0)
+ goto bail;
+
+ /* test recovering with half broken data and ending parity */
+ for (i = 0; i < np / 2; ++i)
+ ir[i] = i;
+
+ for (i = 0; i < (np + 1) / 2; ++i)
+ ir[np / 2 + i] = nd + np - (np + 1) / 2 + i;
+
+ ret = raid_test_rec(np, ir, nd, np, size, v, ref);
+ if (ret != 0)
+ goto bail;
+ }
+
+bail:
+ free_pages_exact(pages, nv * size);
+
+ return ret;
+}
+
+/*
+ * Test the speed of a single function.
+ */
+static void raid_test_speed(
+ void (*func)(int nd, size_t size, void **vv),
+ const char *tag, const char *imp,
+ void **vv)
+{
+ unsigned count;
+ unsigned long j_start, j_stop;
+ unsigned long speed;
+
+ count = 0;
+
+ preempt_disable();
+
+ j_start = jiffies;
+ while ((j_stop = jiffies) == j_start)
+ cpu_relax();
+
+ j_stop += TEST_PERIOD;
+ while (time_before(jiffies, j_stop)) {
+#ifdef __KERNEL__
+ func(TEST_COUNT, TEST_SIZE, vv);
+ ++count;
+#else
+ /* in usermode reading jiffies is a slow operation */
+ unsigned i;
+ for (i = 0; i < 16; ++i) {
+ func(TEST_COUNT, TEST_SIZE, vv);
+ ++count;
+ }
+#endif
+ }
+
+ preempt_enable();
+
+ speed = count * HZ / (TEST_PERIOD * 1024 * 1024 / (TEST_SIZE * TEST_COUNT));
+
+ pr_info("raid: %-4s %-6s %5ld MB/s\n", tag, imp, speed);
+}
+
+/*
+ * Basic speed test.
+ *
+ * @displacement Memory displacement to use to improve cache coloring.
+ * Use 0 for a non-optimized memory layout.
+ */
+int raid_speedtest(int displacement)
+{
+ const int nd = TEST_COUNT;
+ const size_t size = TEST_SIZE;
+ const int nv = nd + RAID_PARITY_MAX;
+ uint8_t *pages;
+ void *v[nd + RAID_PARITY_MAX];
+ int i;
+
+ /* ensure to have enough space for data */
+ BUG_ON(nd * size > 65536);
+
+ /* allocates pages for parity */
+ pages = alloc_pages_exact(nv * (size + displacement), GFP_KERNEL);
+ if (!pages) {
+ pr_err("raid: No memory available.\n");
+ return -ENOMEM;
+ }
+
+ /* setup working vector */
+ for (i = 0; i < nv; ++i)
+ v[i] = pages + (size + displacement) * i;
+
+ /* if we use optimized memory layout */
+ if (displacement != 0) {
+ /* reverse the data buffers because they are accessed */
+ /* in reverse order */
+ for (i = 0; i < nd / 2; ++i) {
+ void *t = v[i];
+ v[i] = v[nd-1-i];
+ v[nd-1-i] = t;
+ }
+ }
+
+ /* use the multiplication table as data */
+ for (i = 0; i < nd; ++i)
+ memcpy(v[i], ((uint8_t *)gfmul) + size * i, size);
+
+ raid_test_speed(raid_gen1_int32, "gen1", "int32", v);
+ raid_test_speed(raid_gen2_int32, "gen2", "int32", v);
+ raid_test_speed(raid_gen1_int64, "gen1", "int64", v);
+ raid_test_speed(raid_gen2_int64, "gen2", "int64", v);
+ raid_test_speed(raid_gen3_int8, "gen3", "int8", v);
+ raid_test_speed(raid_gen4_int8, "gen4", "int8", v);
+ raid_test_speed(raid_gen5_int8, "gen5", "int8", v);
+ raid_test_speed(raid_gen6_int8, "gen6", "int8", v);
+#ifdef RAID_USE_XOR_BLOCKS
+ raid_test_speed(raid_gen1_xorblocks, "gen1", "xor", v);
+#endif
+#ifdef RAID_USE_RAID6_PQ
+ raid_test_speed(raid_gen2_raid6, "gen2", "raid6", v);
+#endif
+#ifdef CONFIG_X86
+ if (raid_cpu_has_sse2()) {
+ raid_test_speed(raid_gen1_sse2, "gen1", "sse2", v);
+ raid_test_speed(raid_gen2_sse2, "gen2", "sse2", v);
+ }
+ if (raid_cpu_has_ssse3()) {
+ raid_test_speed(raid_gen3_ssse3, "gen3", "ssse3", v);
+ raid_test_speed(raid_gen4_ssse3, "gen4", "ssse3", v);
+ raid_test_speed(raid_gen5_ssse3, "gen5", "ssse3", v);
+ raid_test_speed(raid_gen6_ssse3, "gen6", "ssse3", v);
+#ifdef CONFIG_X86_64
+ raid_test_speed(raid_gen2_sse2ext, "gen2", "sse2e", v);
+ raid_test_speed(raid_gen3_ssse3ext, "gen3", "ssse3e", v);
+ raid_test_speed(raid_gen4_ssse3ext, "gen4", "ssse3e", v);
+ raid_test_speed(raid_gen5_ssse3ext, "gen5", "ssse3e", v);
+ raid_test_speed(raid_gen6_ssse3ext, "gen6", "ssse3e", v);
+#endif
+ }
+#endif
+
+ free_pages_exact(pages, nv * (size + displacement));
+
+ return 0;
+}
+
+#ifdef __KERNEL__ /* to build the user mode test */
+static int speedtest;
+
+int __init raid_cauchy_init(void)
+{
+ int ret;
+
+ raid_init();
+
+#ifdef RAID_USE_XOR_BLOCKS
+ pr_info("raid: Using xor_blocks\n");
+#endif
+#ifdef RAID_USE_RAID6_PQ
+ pr_info("raid: Using raid6\n");
+#endif
+
+ ret = raid_selftest();
+ if (ret != 0)
+ return ret;
+
+ pr_info("raid: Self test passed\n");
+
+ if (speedtest) {
+ pr_info("raid: Speed test\n");
+ raid_speedtest(0);
+ pr_info("raid: Speed test with optimized memory layout\n");
+ raid_speedtest(64); /* 64 is the typical cache line size */
+ }
+
+ return 0;
+}
+
+static void raid_cauchy_exit(void)
+{
+}
+
+subsys_initcall(raid_cauchy_init);
+module_exit(raid_cauchy_exit);
+module_param(speedtest, int, 0);
+MODULE_PARM_DESC(speedtest, "Runs a startup speed test");
+MODULE_AUTHOR("Andrea Mazzoleni <[email protected]>");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("RAID Cauchy functions");
+#endif
+
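A short worked example of the throughput formula in raid_test_speed() above
(not part of the patch). The helper speed_mbs() is a hypothetical stand-alone
restatement of the same integer arithmetic, with HZ, TEST_PERIOD and
TEST_SIZE * TEST_COUNT passed in as plain parameters.

```c
#include <assert.h>

/*
 * MB/s for `count` passes over `bytes` bytes completed in `period`
 * jiffies, with `hz` jiffies per second. Same integer arithmetic as
 * raid_test_speed(): count * HZ / (period * 1 MiB / bytes).
 */
static unsigned long speed_mbs(unsigned long count, unsigned long hz,
			       unsigned long period, unsigned long bytes)
{
	unsigned long div = period * 1024 * 1024 / bytes;

	assert(div != 0); /* bytes must not exceed period * 1 MiB */
	return count * hz / div;
}
```

With the kernel settings (TEST_PERIOD of 16 jiffies, 65536 bytes per pass) the
divisor is 16 * 1048576 / 65536 = 256. Assuming HZ = 250 purely for
illustration, 4096 passes in that window are 256 MiB in 0.064 s, and
speed_mbs(4096, 250, 16, 65536) returns 4000.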
diff --git a/lib/raid/raid.c b/lib/raid/raid.c
new file mode 100644
index 0000000..e1b1660
--- /dev/null
+++ b/lib/raid/raid.c
@@ -0,0 +1,492 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include "internal.h"
+#include "gf.h"
+
+/*
+ * This is a RAID implementation working in the Galois Field GF(2^8) with
+ * the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (285 decimal), and
+ * supporting up to six parity levels.
+ *
+ * For RAID5 and RAID6 it works as described in H. Peter Anvin's
+ * paper "The mathematics of RAID-6" [1]. Please refer to this paper for a
+ * complete explanation.
+ *
+ * To support triple parity, an extension of the same approach was first
+ * evaluated and then dropped: additional parity coefficients set as
+ * powers of 2^-1, with equations:
+ *
+ * P = sum(Di)
+ * Q = sum(2^i * Di)
+ * R = sum(2^-i * Di) with 0<=i<N
+ *
+ * This approach works well for triple parity and it's very efficient,
+ * because we can implement very fast parallel multiplications and
+ * divisions by 2 in GF(2^8).
+ *
+ * It's also similar to the approach used by ZFS RAIDZ3, with the
+ * difference that ZFS uses powers of 4 instead of 2^-1.
+ *
+ * Unfortunately it doesn't work beyond triple parity, because whatever
+ * value we choose to generate the power coefficients to compute other
+ * parities, the resulting equations are not solvable for some
+ * combinations of missing disks.
+ *
+ * This is expected, because the Vandermonde matrix used to compute the
+ * parity is not guaranteed to have all submatrices nonsingular
+ * [2, Chap 11, Problem 7], and this is a requirement for
+ * an MDS (Maximum Distance Separable) code [2, Chap 11, Theorem 8].
+ *
+ * To overcome this limitation, we use a Cauchy matrix [3][4] to compute
+ * the parity. A Cauchy matrix has the property to have all the square
+ * submatrices not singular, resulting in always solvable equations,
+ * for any combination of missing disks.
+ *
+ * The drawback of this approach is that it requires the use of
+ * generic multiplications, and not only by 2 or 2^-1, potentially
+ * hurting performance.
+ *
+ * Fortunately there is a method to implement parallel multiplications
+ * using SSSE3 instructions [1][5]. This method is competitive with the
+ * computation of triple parity using power coefficients.
+ *
+ * Another important property of the Cauchy matrix is that we can set up
+ * the first two rows with coefficients equal to those of the RAID5 and
+ * RAID6 approach described, resulting in a compatible extension, and
+ * requiring SSSE3 instructions only if triple parity or beyond is used.
+ *
+ * The matrix is also adjusted, multiplying each row by a constant factor
+ * to make the first column all 1, to optimize the computation for
+ * the first disk.
+ *
+ * This results in the matrix A[row,col] defined as:
+ *
+ * 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01...
+ * 01 02 04 08 10 20 40 80 1d 3a 74 e8 cd 87 13 26 4c 98 2d 5a b4 75...
+ * 01 f5 d2 c4 9a 71 f1 7f fc 87 c1 c6 19 2f 40 55 3d ba 53 04 9c 61...
+ * 01 bb a6 d7 c7 07 ce 82 4a 2f a5 9b b6 60 f1 ad e7 f4 06 d2 df 2e...
+ * 01 97 7f 9c 7c 18 bd a2 58 1a da 74 70 a3 e5 47 29 07 f5 80 23 e9...
+ * 01 2b 3f cf 73 2c d6 ed cb 74 15 78 8a c1 17 c9 89 68 21 ab 76 3b...
+ *
+ * This matrix supports 6 levels of parity, one for each row, for up to 251
+ * data disks, one for each column, with all the 377,342,351,231 square
+ * submatrices not singular, verified also by brute force.
+ *
+ * This matrix can be extended to support any number of parities, just
+ * adding additional rows, and removing one column for each new row.
+ * (see mktables.c for more details on how the matrix is generated)
+ *
+ * In detail, parity is computed as:
+ *
+ * P = sum(Di)
+ * Q = sum(2^i * Di)
+ * R = sum(A[2,i] * Di)
+ * S = sum(A[3,i] * Di)
+ * T = sum(A[4,i] * Di)
+ * U = sum(A[5,i] * Di) with 0<=i<N
+ *
+ * To recover from a failure of six disks at indexes x,y,z,h,v,w,
+ * with 0<=x<y<z<h<v<w<N, we compute the parity of the available N-6
+ * disks as:
+ *
+ * Pa = sum(Di)
+ * Qa = sum(2^i * Di)
+ * Ra = sum(A[2,i] * Di)
+ * Sa = sum(A[3,i] * Di)
+ * Ta = sum(A[4,i] * Di)
+ * Ua = sum(A[5,i] * Di) with 0<=i<N,i!=x,i!=y,i!=z,i!=h,i!=v,i!=w.
+ *
+ * And if we define:
+ *
+ * Pd = Pa + P
+ * Qd = Qa + Q
+ * Rd = Ra + R
+ * Sd = Sa + S
+ * Td = Ta + T
+ * Ud = Ua + U
+ *
+ * we can sum these two sets of equations, obtaining:
+ *
+ * Pd = Dx + Dy + Dz + Dh + Dv + Dw
+ * Qd = 2^x * Dx + 2^y * Dy + 2^z * Dz + 2^h * Dh + 2^v * Dv + 2^w * Dw
+ * Rd = A[2,x] * Dx + A[2,y] * Dy + A[2,z] * Dz + A[2,h] * Dh + A[2,v] * Dv + A[2,w] * Dw
+ * Sd = A[3,x] * Dx + A[3,y] * Dy + A[3,z] * Dz + A[3,h] * Dh + A[3,v] * Dv + A[3,w] * Dw
+ * Td = A[4,x] * Dx + A[4,y] * Dy + A[4,z] * Dz + A[4,h] * Dh + A[4,v] * Dv + A[4,w] * Dw
+ * Ud = A[5,x] * Dx + A[5,y] * Dy + A[5,z] * Dz + A[5,h] * Dh + A[5,v] * Dv + A[5,w] * Dw
+ *
+ * This linear system is always solvable because the coefficient matrix is
+ * always not singular due to the properties of the matrix A[].
+ *
+ * Resulting speed on x64, with 16 data disks, using a stripe of 4 KiB,
+ * for a Core i7-3740QM CPU @ 2.7GHz is:
+ *
+ *             int8   int32   int64    sse2   sse2e   ssse3  ssse3e
+ *     gen1           11469   21579   44743
+ *     gen2            3474    6176   17930   20435
+ *     gen3     850                                    7908    9069
+ *     gen4     647                                    6357    7159
+ *     gen5     527                                    5041    5412
+ *     gen6     432                                    4094    4470
+ *
+ * Values are in MiB/s of data processed, not counting generated parity.
+ *
+ * References:
+ * [1] Anvin, "The mathematics of RAID-6", 2004
+ * [2] MacWilliams, Sloane, "The Theory of Error-Correcting Codes", 1977
+ * [3] Blomer, "An XOR-Based Erasure-Resilient Coding Scheme", 1995
+ * [4] Roth, "Introduction to Coding Theory", 2006
+ * [5] Plank, "Screaming Fast Galois Field Arithmetic Using Intel SIMD Instructions", 2013
+ */
+
+/**
+ * Buffer filled with 0 used in recovering.
+ */
+uint8_t raid_zero_block[PAGE_SIZE] __aligned(256);
+
+#ifdef RAID_USE_XOR_BLOCKS
+/*
+ * PAR1 (RAID5 with xor) implementation using the kernel xor_blocks()
+ * function.
+ */
+void raid_gen1_xorblocks(int nd, size_t size, void **v)
+{
+ int i;
+
+ /* copy the first block */
+ memcpy(v[nd], v[0], size);
+
+ i = 1;
+ while (i < nd) {
+ int run = nd - i;
+
+ /* xor_blocks supports no more than MAX_XOR_BLOCKS blocks */
+ if (run > MAX_XOR_BLOCKS)
+ run = MAX_XOR_BLOCKS;
+
+ xor_blocks(run, size, v[nd], v + i);
+
+ i += run;
+ }
+}
+#endif
+
+#ifdef RAID_USE_RAID6_PQ
+/**
+ * PAR2 (RAID6 with powers of 2) implementation using raid6 library.
+ */
+void raid_gen2_raid6(int nd, size_t size, void **vv)
+{
+ raid6_call.gen_syndrome(nd + 2, size, vv);
+}
+#endif
+
+/*
+ * Forwarders for parity computation.
+ *
+ * These functions compute the parity blocks from the provided data.
+ *
+ * The number of parities to compute is implicit in the position in the
+ * forwarder vector. The function at index #i computes (#i+1) parities.
+ *
+ * @nd Number of data blocks
+ * @size Size of the blocks pointed to by @v. It must be a multiple of 64.
+ * @v Vector of pointers to the blocks of data and parity.
+ * It has (@nd + #parities) elements. The starting elements are the blocks for
+ * data, followed by the parity blocks.
+ * Each block has @size bytes.
+ */
+void (*raid_gen_ptr[RAID_PARITY_MAX])(int nd, size_t size, void **v);
+
+void raid_gen(int nd, int np, size_t size, void **v)
+{
+ BUG_ON(np < 1 || np > RAID_PARITY_MAX);
+ BUG_ON(size % 64 != 0);
+
+ raid_gen_ptr[np - 1](nd, size, v);
+}
+EXPORT_SYMBOL_GPL(raid_gen);
+
+/**
+ * Inverts the square matrix M of size nxn into V.
+ *
+ * This is not a general matrix inversion because we assume the matrix M
+ * to have all its square submatrices not singular.
+ * We use Gauss elimination to invert.
+ *
+ * @M Matrix to invert with @n rows and @n columns. It is overwritten.
+ * @V Destination matrix where the result is put.
+ * @n Number of rows and columns of the matrix.
+ */
+void raid_invert(uint8_t *M, uint8_t *V, int n)
+{
+ int i, j, k;
+
+ /* set the identity matrix in V */
+ for (i = 0; i < n; ++i)
+ for (j = 0; j < n; ++j)
+ V[i*n+j] = i == j;
+
+ /* for each element in the diagonal */
+ for (k = 0; k < n; ++k) {
+ uint8_t f;
+
+ /* the diagonal element cannot be 0 because */
+ /* we are inverting matrices with all the square */
+ /* submatrices not singular */
+ BUG_ON(M[k*n+k] == 0);
+
+ /* make the diagonal element to be 1 */
+ f = inv(M[k*n+k]);
+ for (j = 0; j < n; ++j) {
+ M[k*n+j] = mul(f, M[k*n+j]);
+ V[k*n+j] = mul(f, V[k*n+j]);
+ }
+
+ /* make all the elements over and under the diagonal */
+ /* to be zero */
+ for (i = 0; i < n; ++i) {
+ if (i == k)
+ continue;
+ f = M[i*n+k];
+ for (j = 0; j < n; ++j) {
+ M[i*n+j] ^= mul(f, M[k*n+j]);
+ V[i*n+j] ^= mul(f, V[k*n+j]);
+ }
+ }
+ }
+}
+
+/**
+ * Computes the parity without the missing data blocks
+ * and stores it in the buffers of such data blocks.
+ *
+ * This is the parity expressed as Pa,Qa,Ra,Sa,Ta,Ua
+ * in the equations.
+ *
+ * Note that all the other parities not in the ip[] vector
+ * are destroyed.
+ */
+void raid_delta_gen(int nr, int *id, int *ip, int nd, size_t size, void **v)
+{
+ void *p[RAID_PARITY_MAX];
+ void *pa[RAID_PARITY_MAX];
+ int i;
+
+ for (i = 0; i < nr; ++i) {
+ /* keep a copy of the parity buffer */
+ p[i] = v[nd+ip[i]];
+
+ /* buffer for missing data blocks */
+ pa[i] = v[id[i]];
+
+ /* set to zero the missing data blocks */
+ v[id[i]] = raid_zero_block;
+
+ /* compute the parity into the buffers of the missing data blocks */
+ v[nd+ip[i]] = pa[i];
+ }
+
+ /* recompute the minimal parity required */
+ raid_gen(nd, ip[nr - 1] + 1, size, v);
+
+ for (i = 0; i < nr; ++i) {
+ /* restore disk buffers as before */
+ v[id[i]] = pa[i];
+
+ /* restore parity buffers as before */
+ v[nd+ip[i]] = p[i];
+ }
+}
+
+/**
+ * Recover failure of one data block for PAR1.
+ *
+ * Starting from the equation:
+ *
+ * Pd = Dx
+ *
+ * and solving we get:
+ *
+ * Dx = Pd
+ */
+void raid_rec1of1(int *id, int nd, size_t size, void **v)
+{
+ void *p;
+ void *pa;
+
+ /* for PAR1 we can directly compute the missing block */
+ /* and we don't need to use the zero buffer */
+ p = v[nd];
+ pa = v[id[0]];
+
+ /* use the parity as missing data block */
+ v[id[0]] = p;
+
+ /* compute the parity into the buffer of the missing data block */
+ v[nd] = pa;
+
+ /* compute */
+ raid_gen(nd, 1, size, v);
+
+ /* restore as before */
+ v[id[0]] = pa;
+ v[nd] = p;
+}
+
+/**
+ * Recover failure of two data blocks for PAR2.
+ *
+ * Starting from the equations:
+ *
+ * Pd = Dx + Dy
+ * Qd = 2^id[0] * Dx + 2^id[1] * Dy
+ *
+ * and solving we get:
+ *
+ *               1                     2^(-id[0])
+ * Dy = ------------------- * Pd + ------------------- * Qd
+ *      2^(id[1]-id[0]) + 1        2^(id[1]-id[0]) + 1
+ *
+ * Dx = Dy + Pd
+ *
+ * with conditions:
+ *
+ * 2^id[0] != 0
+ * 2^(id[1]-id[0]) + 1 != 0
+ *
+ * These are always satisfied for any 0<=id[0]<id[1]<255.
+ */
+void raid_rec2of2_int8(int *id, int *ip, int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ size_t i;
+ uint8_t *p;
+ uint8_t *pa;
+ uint8_t *q;
+ uint8_t *qa;
+ const uint8_t *T[2];
+
+ /* get multiplication tables */
+ T[0] = table(inv(pow2(id[1]-id[0]) ^ 1));
+ T[1] = table(inv(pow2(id[0]) ^ pow2(id[1])));
+
+ /* compute delta parity */
+ raid_delta_gen(2, id, ip, nd, size, vv);
+
+ p = v[nd];
+ q = v[nd+1];
+ pa = v[id[0]];
+ qa = v[id[1]];
+
+ for (i = 0; i < size; ++i) {
+ /* delta */
+ uint8_t Pd = p[i] ^ pa[i];
+ uint8_t Qd = q[i] ^ qa[i];
+
+ /* reconstruct */
+ uint8_t Dy = T[0][Pd] ^ T[1][Qd];
+ uint8_t Dx = Pd ^ Dy;
+
+ /* set */
+ pa[i] = Dx;
+ qa[i] = Dy;
+ }
+}
+
+/*
+ * Forwarders for data recovery.
+ *
+ * These functions recover data blocks using the specified parity
+ * to recompute the missing data.
+ *
+ * Note that the format of the vectors @id/@ip is different from that of
+ * raid_rec(). For example, in the vector @ip the first parity is
+ * represented with the value 0 and not @nd.
+ *
+ * @nr Number of failed data blocks to recover.
+ * @id[] Vector of @nr indexes of the data blocks to recover.
+ * The indexes start from 0. They must be in order.
+ * @ip[] Vector of @nr indexes of the parity blocks to use for the recovery.
+ * The indexes start from 0. They must be in order.
+ * @nd Number of data blocks.
+ * @np Number of parity blocks.
+ * @size Size of the blocks pointed to by @v. It must be a multiple of 64.
+ * @v Vector of pointers to the blocks of data and parity.
+ * It has (@nd + @np) elements. The starting elements are the blocks
+ * for data, followed by the parity blocks.
+ * Each block has @size bytes.
+ */
+void (*raid_rec_ptr[RAID_PARITY_MAX])(
+ int nr, int *id, int *ip, int nd, size_t size, void **vv);
+
+void raid_rec(int nr, int *ir, int nd, int np, size_t size, void **v)
+{
+ int nrd; /* number of data blocks to recover */
+ int nrp; /* number of parity blocks to recover */
+
+ /* enforce limits on size */
+ BUG_ON(size % 64 != 0);
+ BUG_ON(size > PAGE_SIZE);
+
+ /* enforce the order in the index vector */
+ BUG_ON(nr >= 2 && ir[0] > ir[1]);
+ BUG_ON(nr >= 3 && ir[1] > ir[2]);
+ BUG_ON(nr >= 4 && ir[2] > ir[3]);
+ BUG_ON(nr >= 5 && ir[3] > ir[4]);
+ BUG_ON(nr >= 6 && ir[4] > ir[5]);
+
+ /* count the number of data blocks to recover */
+ nrd = 0;
+ while (nrd < nr && ir[nrd] < nd)
+ ++nrd;
+
+ /* all the remaining are parity */
+ nrp = nr - nrd;
+
+ /* enforce basic sanity in arguments */
+ BUG_ON(nrd > nd);
+ BUG_ON(nrp > np);
+
+ /* ensure that we have enough parity to recover */
+ BUG_ON(nrd + nrp > np);
+
+ /* if failed data is present */
+ if (nrd != 0) {
+ int ip[RAID_PARITY_MAX];
+ int i, j, k;
+
+ /* setup the vector of parities to use */
+ for (i = 0, j = 0, k = 0; i < np; ++i) {
+ if (j < nrp && ir[nrd + j] == nd + i) {
+ /* this parity has to be recovered */
+ ++j;
+ } else {
+ /* this parity is used for recovering */
+ ip[k] = i;
+ ++k;
+ }
+ }
+
+ /* recover the nrd data blocks specified in ir[], */
+ /* using the first nrd parity in ip[] for recovering */
+ raid_rec_ptr[nrd - 1](nrd, ir, ip, nd, size, v);
+ }
+
+ /* recompute all the parities up to the last bad one */
+ if (nrp != 0)
+ raid_gen(nd, ir[nr - 1] - nd + 1, size, v);
+}
+EXPORT_SYMBOL_GPL(raid_rec);
+
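
The PAR2 recovery equations used in lib/raid/raid.c can be tried out one byte at a time with the minimal sketch below. It is not code from the patch: the names gf_mul(), gf_pow2(), gf_inv(), par2_gen() and recover2() are hypothetical, the multiplication is bit-by-bit instead of table driven, and the GF(2^8) polynomial 0x11d is an assumption (it is consistent with the second matrix row 01 02 04 ... 80 1d 3a shown in the comment).

```c
#include <assert.h>
#include <stdint.h>

/* Multiply in GF(2^8), reducing with the polynomial 0x11d (assumed). */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t r = 0;

	while (b) {
		if (b & 1)
			r ^= a;
		/* a = a * 2, with reduction when the high bit is set */
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return r;
}

/* Compute 2^n in GF(2^8) by repeated multiplication. */
uint8_t gf_pow2(int n)
{
	uint8_t r = 1;

	while (n-- > 0)
		r = gf_mul(r, 2);
	return r;
}

/* Multiplicative inverse found by brute force; a must be nonzero. */
uint8_t gf_inv(uint8_t a)
{
	int b;

	assert(a != 0);
	for (b = 1; b < 256; ++b)
		if (gf_mul(a, (uint8_t)b) == 1)
			return (uint8_t)b;
	return 0; /* not reached for a nonzero a */
}

/* P = sum(Di), Q = sum(2^i * Di) over one byte per data disk. */
void par2_gen(const uint8_t *d, int nd, uint8_t *p, uint8_t *q)
{
	int i;

	*p = 0;
	*q = 0;
	for (i = 0; i < nd; ++i) {
		*p ^= d[i];
		*q ^= gf_mul(gf_pow2(i), d[i]);
	}
}

/* Rebuild d[x] and d[y] from P and Q; their old content is ignored. */
void recover2(uint8_t *d, int nd, int x, int y, uint8_t p, uint8_t q)
{
	uint8_t pd = p, qd = q;
	int i;

	assert(0 <= x && x < y && y < nd);

	/* Pd = Pa + P and Qd = Qa + Q, summing only the available disks */
	for (i = 0; i < nd; ++i) {
		if (i == x || i == y)
			continue;
		pd ^= d[i];
		qd ^= gf_mul(gf_pow2(i), d[i]);
	}

	/* Dy = Pd / (2^(y-x) + 1) + Qd / (2^x + 2^y), then Dx = Dy + Pd */
	d[y] = gf_mul(gf_inv(gf_pow2(y - x) ^ 1), pd)
	     ^ gf_mul(gf_inv(gf_pow2(x) ^ gf_pow2(y)), qd);
	d[x] = d[y] ^ pd;
}
```

A caller would compute P and Q with par2_gen(), overwrite two data bytes, and then call recover2() with the two failed indexes in increasing order. The two coefficients are the same ones raid_rec2of2_int8() loads into its T[0] and T[1] tables.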
diff --git a/lib/raid/test/Makefile b/lib/raid/test/Makefile
new file mode 100644
index 0000000..d19fe29
--- /dev/null
+++ b/lib/raid/test/Makefile
@@ -0,0 +1,72 @@
+#
+# Test programs for the RAID library
+#
+# selftest - Runs the same selftest and speedtest executed at the module startup.
+# fulltest - Runs a more extensive test that checks all the built-in functions.
+# speedtest - Runs a more complete speed test.
+# invtest - Runs an extensive matrix inversion test of all the 377,342,351,231
+# possible square submatrices of the Cauchy matrix used.
+# covtest - Runs a coverage test.
+#
+
+CC = gcc
+CFLAGS = -I.. -I../../../include -Wall -Wextra -g
+ifeq ($(COVERAGE),)
+CFLAGS += -O2
+else
+CFLAGS += -O0 --coverage -DCOVERAGE=1
+endif
+LD = ld
+OBJS = raid.o int.o x86.o tables.o memory.o test.o helper.o module.o xor.o
+
+%.o: ../%.c
+ $(CC) $(CFLAGS) -c -o $@ $<
+
+all: fulltest speedtest selftest invtest
+
+fulltest: $(OBJS) fulltest.o
+ $(CC) $(CFLAGS) -o fulltest $^
+
+speedtest: $(OBJS) speedtest.o
+ $(CC) $(CFLAGS) -o speedtest $^
+
+selftest: $(OBJS) selftest.o
+ $(CC) $(CFLAGS) -o selftest $^
+
+invtest: $(OBJS) invtest.o
+ $(CC) $(CFLAGS) -o invtest $^
+
+mktables: mktables.o
+ $(CC) $(CFLAGS) -o mktables $^
+
+tables.c: mktables
+ ./mktables > tables.c
+
+# Use this target to run a coverage test using lcov
+covtest:
+ $(MAKE) clean
+ $(MAKE) lcov_reset
+ $(MAKE) COVERAGE=1 all
+ ./fulltest
+ ./selftest
+ ./speedtest
+ $(MAKE) lcov_capture
+ $(MAKE) lcov_html
+
+lcov_reset:
+ lcov --directory . -z
+ rm -f lcov.info
+
+lcov_capture:
+ lcov --directory . --capture --rc lcov_branch_coverage=1 -o lcov.info
+
+lcov_html:
+ rm -rf coverage
+ mkdir coverage
+ genhtml --branch-coverage -o coverage lcov.info
+
+clean:
+ rm -f *.o mktables.c mktables tables.c fulltest speedtest selftest invtest
+ rm -f *.gcda *.gcno lcov.info
+ rm -rf coverage
+
diff --git a/lib/raid/test/combo.h b/lib/raid/test/combo.h
new file mode 100644
index 0000000..30ae7b7
--- /dev/null
+++ b/lib/raid/test/combo.h
@@ -0,0 +1,155 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __RAID_COMBO_H
+#define __RAID_COMBO_H
+
+#include <assert.h>
+
+/**
+ * Get the first permutation with repetition of r of n elements.
+ *
+ * Typical use is with permutation_next() in the form:
+ *
+ * int i[R];
+ * permutation_first(R, N, i);
+ * do {
+ * code using i[0], i[1], ..., i[R-1]
+ * } while (permutation_next(R, N, i));
+ *
+ * It's equivalent to the code:
+ *
+ * for(i[0]=0;i[0]<N;++i[0])
+ * for(i[1]=0;i[1]<N;++i[1])
+ * ...
+ * for(i[R-2]=0;i[R-2]<N;++i[R-2])
+ * for(i[R-1]=0;i[R-1]<N;++i[R-1])
+ * code using i[0], i[1], ..., i[R-1]
+ */
+static __always_inline void permutation_first(int r, int n, int *c)
+{
+ int i;
+
+ (void)n; /* unused, but kept for clarity */
+ assert(0 < r && r <= n);
+
+ for (i = 0; i < r; ++i)
+ c[i] = 0;
+}
+
+/**
+ * Get the next permutation with repetition of r of n elements.
+ * Returns 0 when finished.
+ */
+static __always_inline int permutation_next(int r, int n, int *c)
+{
+ int i = r - 1; /* present position */
+
+recurse:
+ /* next element at position i */
+ ++c[i];
+
+ /* if the position has reached the max */
+ if (c[i] >= n) {
+
+ /* if we are at the first level, we have finished */
+ if (i == 0)
+ return 0;
+
+ /* increase the previous position */
+ --i;
+ goto recurse;
+ }
+
+ ++i;
+
+ /* initialize all the next positions, if any */
+ while (i < r) {
+ c[i] = 0;
+ ++i;
+ }
+
+ return 1;
+}
+
+/**
+ * Get the first combination without repetition of r of n elements.
+ *
+ * Typical use is with combination_next() in the form:
+ *
+ * int i[R];
+ * combination_first(R, N, i);
+ * do {
+ * code using i[0], i[1], ..., i[R-1]
+ * } while (combination_next(R, N, i));
+ *
+ * It's equivalent to the code:
+ *
+ * for(i[0]=0;i[0]<N-(R-1);++i[0])
+ * for(i[1]=i[0]+1;i[1]<N-(R-2);++i[1])
+ * ...
+ * for(i[R-2]=i[R-3]+1;i[R-2]<N-1;++i[R-2])
+ * for(i[R-1]=i[R-2]+1;i[R-1]<N;++i[R-1])
+ * code using i[0], i[1], ..., i[R-1]
+ */
+static __always_inline void combination_first(int r, int n, int *c)
+{
+ int i;
+
+ (void)n; /* unused, but kept for clarity */
+ assert(0 < r && r <= n);
+
+ for (i = 0; i < r; ++i)
+ c[i] = i;
+}
+
+/**
+ * Get the next combination without repetition of r of n elements.
+ * Returns 0 when finished.
+ */
+static __always_inline int combination_next(int r, int n, int *c)
+{
+ int i = r - 1; /* present position */
+ int h = n; /* high limit for this position */
+
+recurse:
+ /* next element at position i */
+ ++c[i];
+
+ /* if the position has reached the max */
+ if (c[i] >= h) {
+
+ /* if we are at the first level, we have finished */
+ if (i == 0)
+ return 0;
+
+ /* increase the previous position */
+ --i;
+ --h;
+ goto recurse;
+ }
+
+ ++i;
+
+ /* initialize all the next positions, if any */
+ while (i < r) {
+ /* each position starts at the next value of the previous one */
+ c[i] = c[i-1] + 1;
+ ++i;
+ }
+
+ return 1;
+}
+#endif
+
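
The enumeration done by combination_first()/combination_next() can also be written without the goto. The sketch below is a hypothetical loop-based variant (the names combo_first(), combo_next() and combo_count() are not part of the patch) that is intended to visit the same lexicographic sequence, and it counts the combinations so that the result can be compared with a binomial coefficient.

```c
#include <assert.h>

/* First combination of r out of n elements: c = {0, 1, ..., r-1}. */
void combo_first(int r, int n, int *c)
{
	int i;

	assert(0 < r && r <= n);
	for (i = 0; i < r; ++i)
		c[i] = i;
}

/* Advance c to the next combination; returns 0 when finished. */
int combo_next(int r, int n, int *c)
{
	int i = r - 1;

	/*
	 * Find the rightmost position that can still be increased.
	 * Position i can hold at most the value n - r + i.
	 */
	while (i >= 0 && c[i] >= n - r + i)
		--i;
	if (i < 0)
		return 0;

	++c[i];

	/* each following position starts right after the previous one */
	for (++i; i < r; ++i)
		c[i] = c[i - 1] + 1;

	return 1;
}

/* Count the combinations of r out of n elements by enumerating them. */
long combo_count(int r, int n)
{
	int c[16];
	long count = 0;

	assert(r <= 16);
	combo_first(r, n, c);
	do {
		++count;
	} while (combo_next(r, n, c));

	return count;
}
```

For r = 2 and n = 3 the visited vectors are {0,1}, {0,2} and {1,2}, which is the order the nested for loops in the comment of combination_first() would produce.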
diff --git a/lib/raid/test/fulltest.c b/lib/raid/test/fulltest.c
new file mode 100644
index 0000000..0923ff4
--- /dev/null
+++ b/lib/raid/test/fulltest.c
@@ -0,0 +1,79 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+/* Full sanity test for the RAID library */
+
+#include "internal.h"
+#include "test.h"
+#include "cpu.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+
+/*
+ * Size of the blocks to test.
+ */
+#define TEST_SIZE 256
+
+/**
+ * Number of disks in the long parity test.
+ */
+#ifdef COVERAGE
+#define TEST_COUNT 10
+#else
+#define TEST_COUNT 32
+#endif
+
+int main(void)
+{
+ printf("Full sanity test for the RAID Cauchy library\n\n");
+
+ raid_init();
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_sse2())
+ printf("Including x86 SSE2 functions\n");
+ if (raid_cpu_has_ssse3())
+ printf("Including x86 SSSE3 functions\n");
+#endif
+#ifdef CONFIG_X86_64
+ printf("Including x64 extended SSE register set\n");
+#endif
+
+ printf("\nPlease wait about 60 seconds...\n\n");
+
+ printf("Test insertion...\n");
+ if (raid_test_insert() != 0)
+ goto bail;
+ printf("Test combinations/permutations...\n");
+ if (raid_test_combo() != 0)
+ goto bail;
+ printf("Test parity generation with %u data disks...\n", RAID_DATA_MAX);
+ if (raid_test_par(RAID_DATA_MAX, TEST_SIZE) != 0)
+ goto bail;
+ printf("Test parity generation with 1 data disk...\n");
+ if (raid_test_par(1, TEST_SIZE) != 0)
+ goto bail;
+ printf("Test recovering with all combinations of %u data and 6 parity blocks...\n", TEST_COUNT);
+ if (raid_test_rec(TEST_COUNT, TEST_SIZE) != 0)
+ goto bail;
+
+ printf("OK\n");
+ return 0;
+
+bail:
+ printf("FAILED!\n");
+ exit(EXIT_FAILURE);
+}
+
diff --git a/lib/raid/test/invtest.c b/lib/raid/test/invtest.c
new file mode 100644
index 0000000..180a052
--- /dev/null
+++ b/lib/raid/test/invtest.c
@@ -0,0 +1,172 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+/* Matrix inversion test for the RAID library */
+
+#include "internal.h"
+
+#include "combo.h"
+#include "gf.h"
+
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+
+/**
+ * Like raid_invert() but optimized to only check if the matrix is
+ * invertible.
+ */
+static __always_inline int raid_invert_fast(uint8_t *M, int n)
+{
+ int i, j, k;
+
+ /* for each element in the diagonal */
+ for (k = 0; k < n; ++k) {
+ uint8_t f;
+
+ /* the diagonal element cannot be 0 because */
+ /* we are inverting matrices with all the square */
+ /* submatrices not singular */
+ if (M[k*n+k] == 0)
+ return -1;
+
+ /* make the diagonal element to be 1 */
+ f = inv(M[k*n+k]);
+ for (j = 0; j < n; ++j)
+ M[k*n+j] = mul(f, M[k*n+j]);
+
+ /* make all the elements over and under the diagonal */
+ /* to be zero */
+ for (i = 0; i < n; ++i) {
+ if (i == k)
+ continue;
+ f = M[i*n+k];
+ for (j = 0; j < n; ++j)
+ M[i*n+j] ^= mul(f, M[k*n+j]);
+ }
+ }
+
+ return 0;
+}
+
+#define TEST_REFRESH (4*1024*1024)
+
+/**
+ * Precomputed number of square submatrices of size nr.
+ *
+ * It's bc(np,nr) * bc(nd,nr)
+ *
+ * With 1<=nr<=6 and bc(n, r) == binomial coefficient of (n over r).
+ */
+long long EXPECTED[RAID_PARITY_MAX] = {
+ 1506LL,
+ 470625LL,
+ 52082500LL,
+ 2421836250LL,
+ 47855484300LL,
+ 327012476050LL
+};
+
+static __always_inline int test_sub_matrix(int nr, long long *total)
+{
+ uint8_t M[RAID_PARITY_MAX * RAID_PARITY_MAX];
+ int np = RAID_PARITY_MAX;
+ int nd = RAID_DATA_MAX;
+ int ip[RAID_PARITY_MAX];
+ int id[RAID_DATA_MAX];
+ long long count;
+ long long expected;
+
+ printf("\n%ux%u\n", nr, nr);
+
+ count = 0;
+ expected = EXPECTED[nr - 1];
+
+ /* all combinations (nr of nd) disks */
+ combination_first(nr, nd, id);
+ do {
+ /* all combinations (nr of np) parities */
+ combination_first(nr, np, ip);
+ do {
+ int i, j;
+
+ /* setup the submatrix */
+ for (i = 0; i < nr; ++i)
+ for (j = 0; j < nr; ++j)
+ M[i*nr+j] = gfgen[ip[i]][id[j]];
+
+ /* invert */
+ if (raid_invert_fast(M, nr) != 0)
+ return -1;
+
+ if (++count % TEST_REFRESH == 0) {
+ printf("\r%.3f %%", count * (double)100 / expected);
+ fflush(stdout);
+ }
+ } while (combination_next(nr, np, ip));
+ } while (combination_next(nr, nd, id));
+
+ if (count != expected)
+ return -1;
+
+ printf("\rTested %lld matrix\n", count);
+
+ *total += count;
+
+ return 0;
+}
+
+int test_all_sub_matrix(void)
+{
+ long long total;
+
+ printf("Invert all square submatrices of the %dx%d Cauchy matrix\n",
+ RAID_PARITY_MAX, RAID_DATA_MAX);
+
+ printf("\nPlease wait about 2 days...\n");
+
+ total = 0;
+
+ /* force inlining of everything */
+ if (test_sub_matrix(1, &total) != 0)
+ return -1;
+ if (test_sub_matrix(2, &total) != 0)
+ return -1;
+ if (test_sub_matrix(3, &total) != 0)
+ return -1;
+ if (test_sub_matrix(4, &total) != 0)
+ return -1;
+ if (test_sub_matrix(5, &total) != 0)
+ return -1;
+ if (test_sub_matrix(6, &total) != 0)
+ return -1;
+
+ printf("\nTested in total %lld matrix\n", total);
+
+ return 0;
+}
+
+int main(void)
+{
+ printf("Matrix inversion test for the RAID Cauchy library\n\n");
+
+ if (test_all_sub_matrix() != 0) {
+ printf("FAILED!\n");
+ exit(EXIT_FAILURE);
+ }
+ printf("OK\n");
+
+ return 0;
+}
+
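
The precomputed EXPECTED[] values in invtest.c can be cross-checked with a few lines of C. This is only a sketch with hypothetical helper names (binomial(), submatrix_count(), submatrix_total()); it assumes the 6 parities and 251 data disks stated in the raid.c comment, and applies the bc(np,nr) * bc(nd,nr) formula from the EXPECTED[] comment.

```c
#include <assert.h>

/* Binomial coefficient "n over r", exact for the small values used here. */
unsigned long long binomial(unsigned n, unsigned r)
{
	unsigned long long b = 1;
	unsigned i;

	assert(r <= n);
	for (i = 1; i <= r; ++i)
		b = b * (n - r + i) / i; /* the division is exact at every step */
	return b;
}

/* Number of square nr x nr submatrices of a matrix with np rows, nd columns. */
unsigned long long submatrix_count(unsigned np, unsigned nd, unsigned nr)
{
	return binomial(np, nr) * binomial(nd, nr);
}

/* Total number of square submatrices over all the sizes 1..np. */
unsigned long long submatrix_total(unsigned np, unsigned nd)
{
	unsigned long long total = 0;
	unsigned nr;

	for (nr = 1; nr <= np; ++nr)
		total += submatrix_count(np, nd, nr);
	return total;
}
```

With np = 6 and nd = 251, submatrix_count() gives the six EXPECTED[] entries, and submatrix_total() gives the 377,342,351,231 figure quoted in the raid.c comment and in the Makefile.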
diff --git a/lib/raid/test/memory.c b/lib/raid/test/memory.c
new file mode 100644
index 0000000..6807ee4
--- /dev/null
+++ b/lib/raid/test/memory.c
@@ -0,0 +1,79 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include "internal.h"
+#include "memory.h"
+
+void *raid_malloc_align(size_t size, void **freeptr)
+{
+ unsigned char *ptr;
+ uintptr_t offset;
+
+ ptr = malloc(size + RAID_MALLOC_ALIGN);
+ if (!ptr)
+ return 0;
+
+ *freeptr = ptr;
+
+ offset = ((uintptr_t)ptr) % RAID_MALLOC_ALIGN;
+
+ if (offset != 0)
+ ptr += RAID_MALLOC_ALIGN - offset;
+
+ return ptr;
+}
+
+void **raid_malloc_vector(int nd, int n, size_t size, void **freeptr)
+{
+ void **v;
+ unsigned char *va;
+ int i;
+
+ v = malloc(n * sizeof(void *));
+ if (!v)
+ return 0;
+
+ va = raid_malloc_align(n * (size + RAID_MALLOC_DISPLACEMENT), freeptr);
+ if (!va) {
+ free(v);
+ return 0;
+ }
+
+ for (i = 0; i < n; ++i) {
+ v[i] = va;
+ va += size + RAID_MALLOC_DISPLACEMENT;
+ }
+
+ /* reverse order of the data blocks */
+ /* because they are usually accessed from the last one */
+ for (i = 0; i < nd/2; ++i) {
+ void *ptr = v[i];
+ v[i] = v[nd - 1 - i];
+ v[nd - 1 - i] = ptr;
+ }
+
+ return v;
+}
+
+void raid_mrand_vector(int n, size_t size, void **vv)
+{
+ unsigned char **v = (unsigned char **)vv;
+ int i;
+ size_t j;
+
+ for (i = 0; i < n; ++i)
+ for (j = 0; j < size; ++j)
+ v[i][j] = rand();
+}
+
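
The over-allocate-and-round-up technique of raid_malloc_align(), and the block layout produced by raid_malloc_vector(), can be illustrated in isolation. The following is a minimal sketch under stated assumptions: align_up(), malloc_align() and block_offset() are hypothetical names, and the alignment and displacement are passed as parameters instead of using the RAID_MALLOC_ALIGN and RAID_MALLOC_DISPLACEMENT constants.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round ptr up to the next multiple of align; align must be nonzero. */
void *align_up(void *ptr, size_t align)
{
	uintptr_t offset;

	assert(align != 0);
	offset = (uintptr_t)ptr % align;
	if (offset != 0)
		ptr = (unsigned char *)ptr + (align - offset);
	return ptr;
}

/*
 * Allocate size bytes aligned to align.
 * The pointer to pass to free() is stored in *freeptr.
 */
void *malloc_align(size_t size, size_t align, void **freeptr)
{
	unsigned char *ptr = malloc(size + align);

	if (!ptr)
		return NULL;
	*freeptr = ptr;
	return align_up(ptr, align);
}

/*
 * Offset of block i from the aligned base when blocks of size bytes
 * are separated by displacement extra bytes.
 */
size_t block_offset(int i, size_t size, size_t displacement)
{
	return (size_t)i * (size + displacement);
}
```

With 4096-byte blocks and a 64-byte displacement, consecutive blocks start 4160 bytes apart, so they no longer begin at the same offset inside a 4 KiB page.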
diff --git a/lib/raid/test/memory.h b/lib/raid/test/memory.h
new file mode 100644
index 0000000..44f4b15
--- /dev/null
+++ b/lib/raid/test/memory.h
@@ -0,0 +1,78 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __RAID_MEMORY_H
+#define __RAID_MEMORY_H
+
+/**
+ * Memory alignment provided by raid_malloc_align().
+ *
+ * It should guarantee good cache performance everywhere.
+ */
+#define RAID_MALLOC_ALIGN 256
+
+/**
+ * Memory displacement to avoid cache address sharing on contiguous blocks,
+ * used by raid_malloc_vector().
+ *
+ * When allocating a sequence of blocks with a size that is a power of 2,
+ * there is the risk that the addresses of each block are mapped into the
+ * same cache line and prefetching predictor, resulting in a lot of cache
+ * sharing if you access all the blocks in parallel, from the start to the
+ * end.
+ *
+ * To avoid this effect, it's better if all the blocks are allocated
+ * with a fixed displacement, trying to reduce the cache address sharing.
+ *
+ * The selected displacement was chosen empirically with some speed tests
+ * with 16 data buffers of 4 KB.
+ *
+ * These are the results in MB/s with no displacement:
+ *
+ *             int8   int32   int64    sse2   sse2e   ssse3  ssse3e
+ *     gen1            6940   13971   29824
+ *     gen2            2530    4675   14840   16485
+ *     gen3     490                                    6859    7710
+ *
+ * These are the results with displacement, resulting in improvements
+ * from 20% to more than 50%:
+ *
+ *             int8   int32   int64    sse2   sse2e   ssse3  ssse3e
+ *     gen1           11762   21450   44621
+ *     gen2            3520    6176   18100   20338
+ *     gen3     848                                    8009    9210
+ *
+ */
+#define RAID_MALLOC_DISPLACEMENT 64
+
+/**
+ * Aligned malloc.
+ */
+void *raid_malloc_align(size_t size, void **freeptr);
+
+/**
+ * Aligned vector allocation.
+ * Returns a vector of @n pointers, each one pointing to a block of
+ * the specified @size.
+ * The first @nd elements are reversed in order.
+ */
+void **raid_malloc_vector(int nd, int n, size_t size, void **freeptr);
+
+/**
+ * Fills the memory vector with random data.
+ */
+void raid_mrand_vector(int n, size_t size, void **vv);
+
+#endif
+
diff --git a/lib/raid/test/selftest.c b/lib/raid/test/selftest.c
new file mode 100644
index 0000000..57ef059
--- /dev/null
+++ b/lib/raid/test/selftest.c
@@ -0,0 +1,44 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+/* Self sanity test for the RAID library */
+
+#include "internal.h"
+#include "cpu.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+
+int main(void)
+{
+ printf("Self sanity test for the RAID Cauchy library\n\n");
+
+ raid_init();
+
+ printf("Self test...\n");
+ if (raid_selftest() != 0) {
+ printf("FAILED!\n");
+ exit(EXIT_FAILURE);
+ }
+ printf("OK\n\n");
+
+ printf("Speed test...\n");
+ raid_speedtest(0);
+
+ printf("\nSpeed test with optimized memory layout...\n");
+ raid_speedtest(64);
+
+ return 0;
+}
+
diff --git a/lib/raid/test/speedtest.c b/lib/raid/test/speedtest.c
new file mode 100644
index 0000000..e52ba64
--- /dev/null
+++ b/lib/raid/test/speedtest.c
@@ -0,0 +1,578 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+/* Speed test for the RAID library */
+
+#include "internal.h"
+#include "memory.h"
+#include "cpu.h"
+
+#include <sys/time.h>
+#include <stdio.h>
+#include <inttypes.h>
+
+/*
+ * Size of the blocks to test.
+ */
+#define TEST_SIZE PAGE_SIZE
+
+/*
+ * Number of data blocks to test.
+ */
+#define TEST_COUNT (65536 / TEST_SIZE)
+
+/**
+ * Difference in microseconds between two timevals.
+ */
+static int64_t diffgettimeofday(struct timeval *start, struct timeval *stop)
+{
+ int64_t d;
+
+ d = 1000000LL * (stop->tv_sec - start->tv_sec);
+ d += stop->tv_usec - start->tv_usec;
+
+ return d;
+}
+
+/**
+ * Test period.
+ */
+#ifdef COVERAGE
+#define TEST_PERIOD 100000LL
+#define TEST_DELTA 1
+#else
+#define TEST_PERIOD 1000000LL
+#define TEST_DELTA 10
+#endif
+
+/**
+ * Start time measurement.
+ */
+#define SPEED_START \
+ count = 0; \
+ gettimeofday(&start, 0); \
+ do { \
+ for (i = 0; i < delta; ++i)
+
+/**
+ * Stop time measurement.
+ */
+#define SPEED_STOP \
+ count += delta; \
+ gettimeofday(&stop, 0); \
+ } while (diffgettimeofday(&start, &stop) < TEST_PERIOD); \
+ ds = size * (int64_t)count * nd; \
+ dt = diffgettimeofday(&start, &stop);
+
+void speed(void)
+{
+ struct timeval start;
+ struct timeval stop;
+ int64_t ds;
+ int64_t dt;
+ int i, j;
+ int id[RAID_PARITY_MAX];
+ int ip[RAID_PARITY_MAX];
+ int count;
+ int delta = TEST_DELTA;
+ int size = TEST_SIZE;
+ int nd = TEST_COUNT;
+ int nv;
+ void *v_alloc;
+ void **v;
+
+ nv = nd + RAID_PARITY_MAX;
+
+ v = raid_malloc_vector(nd, nv, size, &v_alloc);
+
+ /* initialize disks with fixed data */
+ for (i = 0; i < nd; ++i)
+ memset(v[i], i, size);
+
+ /* basic disks and parity mapping */
+ for (i = 0; i < RAID_PARITY_MAX; ++i) {
+ id[i] = i;
+ ip[i] = i;
+ }
+
+ printf("Speed test using %u data buffers of %u bytes, for a total of %u KiB.\n", nd, size, nd * size / 1024);
+ printf("Memory blocks have a displacement of %u bytes to improve cache performance.\n", RAID_MALLOC_DISPLACEMENT);
+ printf("The reported value is the aggregate bandwidth of all data blocks in MiB/s,\n");
+ printf("not counting parity blocks.\n");
+ printf("\n");
+
+ printf("Memory write speed using the C memset() function:\n");
+ printf("%8s", "memset");
+ fflush(stdout);
+
+ SPEED_START {
+ for (j = 0; j < nd; ++j)
+ memset(v[j], j, size);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ printf("\n");
+ printf("\n");
+
+ /* RAID table */
+ printf("RAID functions used for computing the parity:\n");
+ printf("%8s", "");
+ printf("%8s", "int8");
+ printf("%8s", "int32");
+ printf("%8s", "int64");
+#ifdef CONFIG_X86
+ printf("%8s", "sse2");
+#ifdef CONFIG_X86_64
+ printf("%8s", "sse2e");
+#endif
+ printf("%8s", "ssse3");
+#ifdef CONFIG_X86_64
+ printf("%8s", "ssse3e");
+#endif
+#endif
+ printf("\n");
+
+ /* GEN1 */
+ printf("%8s", "gen1");
+ fflush(stdout);
+
+ printf("%8s", "");
+
+ SPEED_START {
+ raid_gen1_int32(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+ SPEED_START {
+ raid_gen1_int64(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_sse2()) {
+ SPEED_START {
+ raid_gen1_sse2(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+ }
+#endif
+ printf("\n");
+
+ /* GEN2 */
+ printf("%8s", "gen2");
+ fflush(stdout);
+
+ printf("%8s", "");
+
+ SPEED_START {
+ raid_gen2_int32(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+ SPEED_START {
+ raid_gen2_int64(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_sse2()) {
+ SPEED_START {
+ raid_gen2_sse2(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+#ifdef CONFIG_X86_64
+ SPEED_START {
+ raid_gen2_sse2ext(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+#endif
+ }
+#endif
+ printf("\n");
+
+ /* GEN3 */
+ printf("%8s", "gen3");
+ fflush(stdout);
+
+ SPEED_START {
+ raid_gen3_int8(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+ printf("%8s", "");
+ printf("%8s", "");
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_sse2()) {
+ printf("%8s", "");
+
+#ifdef CONFIG_X86_64
+ printf("%8s", "");
+#endif
+ }
+#endif
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3()) {
+ SPEED_START {
+ raid_gen3_ssse3(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+#ifdef CONFIG_X86_64
+ SPEED_START {
+ raid_gen3_ssse3ext(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+#endif
+ }
+#endif
+ printf("\n");
+
+ /* GEN4 */
+ printf("%8s", "gen4");
+ fflush(stdout);
+
+ SPEED_START {
+ raid_gen4_int8(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+ printf("%8s", "");
+ printf("%8s", "");
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_sse2()) {
+ printf("%8s", "");
+
+#ifdef CONFIG_X86_64
+ printf("%8s", "");
+#endif
+ }
+#endif
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3()) {
+ SPEED_START {
+ raid_gen4_ssse3(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+#ifdef CONFIG_X86_64
+ SPEED_START {
+ raid_gen4_ssse3ext(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+#endif
+ }
+#endif
+ printf("\n");
+
+ /* GEN5 */
+ printf("%8s", "gen5");
+ fflush(stdout);
+
+ SPEED_START {
+ raid_gen5_int8(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+ printf("%8s", "");
+ printf("%8s", "");
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_sse2()) {
+ printf("%8s", "");
+
+#ifdef CONFIG_X86_64
+ printf("%8s", "");
+#endif
+ }
+#endif
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3()) {
+ SPEED_START {
+ raid_gen5_ssse3(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+#ifdef CONFIG_X86_64
+ SPEED_START {
+ raid_gen5_ssse3ext(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+#endif
+ }
+#endif
+ printf("\n");
+
+ /* GEN6 */
+ printf("%8s", "gen6");
+ fflush(stdout);
+
+ SPEED_START {
+ raid_gen6_int8(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+ printf("%8s", "");
+ printf("%8s", "");
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_sse2()) {
+ printf("%8s", "");
+
+#ifdef CONFIG_X86_64
+ printf("%8s", "");
+#endif
+ }
+#endif
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3()) {
+ SPEED_START {
+ raid_gen6_ssse3(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+#ifdef CONFIG_X86_64
+ SPEED_START {
+ raid_gen6_ssse3ext(nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+#endif
+ }
+#endif
+ printf("\n");
+ printf("\n");
+
+ /* recover table */
+ printf("RAID functions used for recovering:\n");
+ printf("%8s", "");
+ printf("%8s", "int8");
+#ifdef CONFIG_X86
+ printf("%8s", "ssse3");
+#endif
+ printf("\n");
+
+ printf("%8s", "rec1");
+ fflush(stdout);
+
+ SPEED_START {
+ for (j = 0; j < nd; ++j)
+ /* +1 to avoid GEN1 optimized case */
+ raid_rec1_int8(1, id, ip + 1, nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3()) {
+ SPEED_START {
+ for (j = 0; j < nd; ++j)
+ /* +1 to avoid GEN1 optimized case */
+ raid_rec1_ssse3(1, id, ip + 1, nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ }
+#endif
+ printf("\n");
+
+ printf("%8s", "rec2");
+ fflush(stdout);
+
+ SPEED_START {
+ for (j = 0; j < nd; ++j)
+ /* +1 to avoid GEN2 optimized case */
+ raid_rec2_int8(2, id, ip + 1, nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3()) {
+ SPEED_START {
+ for (j = 0; j < nd; ++j)
+ /* +1 to avoid GEN2 optimized case */
+ raid_rec2_ssse3(2, id, ip + 1, nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ }
+#endif
+ printf("\n");
+
+ printf("%8s", "rec3");
+ fflush(stdout);
+
+ SPEED_START {
+ for (j = 0; j < nd; ++j)
+ raid_recX_int8(3, id, ip, nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3()) {
+ SPEED_START {
+ for (j = 0; j < nd; ++j)
+ raid_recX_ssse3(3, id, ip, nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ }
+#endif
+ printf("\n");
+
+ printf("%8s", "rec4");
+ fflush(stdout);
+
+ SPEED_START {
+ for (j = 0; j < nd; ++j)
+ raid_recX_int8(4, id, ip, nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3()) {
+ SPEED_START {
+ for (j = 0; j < nd; ++j)
+ raid_recX_ssse3(4, id, ip, nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ }
+#endif
+ printf("\n");
+
+ printf("%8s", "rec5");
+ fflush(stdout);
+
+ SPEED_START {
+ for (j = 0; j < nd; ++j)
+ raid_recX_int8(5, id, ip, nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3()) {
+ SPEED_START {
+ for (j = 0; j < nd; ++j)
+ raid_recX_ssse3(5, id, ip, nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ }
+#endif
+ printf("\n");
+
+ printf("%8s", "rec6");
+ fflush(stdout);
+
+ SPEED_START {
+ for (j = 0; j < nd; ++j)
+ raid_recX_int8(6, id, ip, nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ fflush(stdout);
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3()) {
+ SPEED_START {
+ for (j = 0; j < nd; ++j)
+ raid_recX_ssse3(6, id, ip, nd, size, v);
+ } SPEED_STOP
+
+ printf("%8"PRIu64, ds / dt);
+ }
+#endif
+ printf("\n");
+ printf("\n");
+
+ free(v_alloc);
+ free(v);
+}
+
+int main(void)
+{
+ printf("Speed test for the RAID Cauchy library\n\n");
+
+ raid_init();
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_sse2())
+ printf("Including x86 SSE2 functions\n");
+ if (raid_cpu_has_ssse3())
+ printf("Including x86 SSSE3 functions\n");
+#endif
+#ifdef CONFIG_X86_64
+ printf("Including x64 extended SSE register set\n");
+#endif
+
+ printf("\nPlease wait about 30 seconds...\n\n");
+
+ speed();
+
+ return 0;
+}
+
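
Note on the figures printed by speed(): each cell is ds / dt, where ds is
the number of data bytes processed (size * count * nd) and dt is the elapsed
time. The sketch below only illustrates that arithmetic. It assumes that
diffgettimeofday() returns microseconds (its definition is outside this
excerpt), and both helper names are hypothetical, not part of the patch.
Under that assumption ds / dt is bytes per microsecond, that is 10^6 bytes/s,
which reads slightly higher than the same rate expressed in 2^20 bytes/s.

```c
#include <stdint.h>

/* Hypothetical helper: same arithmetic as SPEED_STOP followed by "ds / dt".
 * dt_usec is assumed to be the elapsed time in microseconds. */
static int64_t sketch_rate_per_usec(int size, int count, int nd, int64_t dt_usec)
{
	int64_t ds = size * (int64_t)count * nd; /* bytes of data processed */

	return ds / dt_usec; /* bytes per microsecond == 10^6 bytes/s */
}

/* Hypothetical helper: the same rate expressed in units of 2^20 bytes/s. */
static int64_t sketch_rate_mib(int size, int count, int nd, int64_t dt_usec)
{
	int64_t ds = size * (int64_t)count * nd;

	return ds * 1000000 / (dt_usec * 1048576);
}
```

For example, 8 buffers of 4096 bytes processed 1000 times in one second give
32 with the first helper and 31 with the second.
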
diff --git a/lib/raid/test/test.c b/lib/raid/test/test.c
new file mode 100644
index 0000000..248fbec
--- /dev/null
+++ b/lib/raid/test/test.c
@@ -0,0 +1,314 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include "internal.h"
+#include "cpu.h"
+#include "combo.h"
+#include "memory.h"
+
+/**
+ * Binomial coefficient of n over r.
+ */
+static int ibc(int n, int r)
+{
+ if (r == 0 || n == r)
+ return 1;
+ else
+ return ibc(n - 1, r - 1) + ibc(n - 1, r);
+}
+
+/**
+ * Power n ^ r.
+ */
+static int ipow(int n, int r)
+{
+ int v = 1;
+ while (r) {
+ v *= n;
+ --r;
+ }
+ return v;
+}
+
+int raid_test_combo(void)
+{
+ int r;
+ int count;
+ int p[RAID_PARITY_MAX];
+
+ for (r = 1; r <= RAID_PARITY_MAX; ++r) {
+ /* count combination (r of RAID_PARITY_MAX) elements */
+ count = 0;
+ combination_first(r, RAID_PARITY_MAX, p);
+
+ do {
+ ++count;
+ } while (combination_next(r, RAID_PARITY_MAX, p));
+
+ if (count != ibc(RAID_PARITY_MAX, r))
+ return -1;
+ }
+
+ for (r = 1; r <= RAID_PARITY_MAX; ++r) {
+ /* count permutation (r of RAID_PARITY_MAX) elements */
+ count = 0;
+ permutation_first(r, RAID_PARITY_MAX, p);
+
+ do {
+ ++count;
+ } while (permutation_next(r, RAID_PARITY_MAX, p));
+
+ if (count != ipow(RAID_PARITY_MAX, r))
+ return -1;
+ }
+
+ return 0;
+}
+
+int raid_test_insert(void)
+{
+ int p[RAID_PARITY_MAX];
+ int r;
+
+ for (r = 1; r <= RAID_PARITY_MAX; ++r) {
+ permutation_first(r, RAID_PARITY_MAX, p);
+ do {
+ int i[RAID_PARITY_MAX];
+ int j;
+
+ /* insert in order */
+ for (j = 0; j < r; ++j)
+ raid_insert(j, i, p[j]);
+
+ /* check order */
+ for (j = 1; j < r; ++j)
+ if (i[j-1] > i[j])
+ return -1;
+ } while (permutation_next(r, RAID_PARITY_MAX, p));
+ }
+
+ return 0;
+}
+
+int raid_test_rec(int nd, size_t size)
+{
+ void *v_alloc;
+ void **v;
+ void **data;
+ void **parity;
+ void **test;
+ void *data_save[RAID_PARITY_MAX];
+ void *parity_save[RAID_PARITY_MAX];
+ void *waste;
+ int nv;
+ int id[RAID_PARITY_MAX];
+ int ip[RAID_PARITY_MAX];
+ int i;
+ int j;
+ int nr;
+ void (*f[RAID_PARITY_MAX][4])(
+ int nr, int *id, int *ip, int nd, size_t size, void **vbuf);
+ int nf[RAID_PARITY_MAX];
+ int np;
+
+ np = RAID_PARITY_MAX;
+
+ nv = nd + np * 2 + 1;
+
+ v = raid_malloc_vector(nd, nv, size, &v_alloc);
+ if (!v)
+ return -1;
+
+ data = v;
+ parity = v + nd;
+ test = v + nd + np;
+
+ for (i = 0; i < np; ++i)
+ parity_save[i] = parity[i];
+
+ waste = v[nv-1];
+
+ /* fill the data disks with random data */
+ raid_mrand_vector(nd, size, v);
+
+ /* set up the recovery functions */
+ for (i = 0; i < np; ++i) {
+ nf[i] = 0;
+ if (i == 0) {
+ f[i][nf[i]++] = raid_rec1_int8;
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3())
+ f[i][nf[i]++] = raid_rec1_ssse3;
+#endif
+ } else if (i == 1) {
+ f[i][nf[i]++] = raid_rec2_int8;
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3())
+ f[i][nf[i]++] = raid_rec2_ssse3;
+#endif
+ } else {
+ f[i][nf[i]++] = raid_recX_int8;
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3())
+ f[i][nf[i]++] = raid_recX_ssse3;
+#endif
+ }
+ }
+
+ /* compute the parity */
+ raid_gen_ref(nd, np, size, v);
+
+ /* point all the parity buffers at the waste buffer */
+ for (i = 0; i < np; ++i)
+ parity[i] = waste;
+
+ /* all parity levels */
+ for (nr = 1; nr <= np; ++nr) {
+ /* all combinations (nr of nd) disks */
+ combination_first(nr, nd, id);
+ do {
+ /* all combinations (nr of np) parities */
+ combination_first(nr, np, ip);
+ do {
+ /* for each recover function */
+ for (j = 0; j < nf[nr-1]; ++j) {
+ /* set */
+ for (i = 0; i < nr; ++i) {
+ /* remove the missing data */
+ data_save[i] = data[id[i]];
+ data[id[i]] = test[i];
+ /* set the parity to use */
+ parity[ip[i]] = parity_save[ip[i]];
+ }
+
+ /* recover */
+ f[nr-1][j](nr, id, ip, nd, size, v);
+
+ /* check */
+ for (i = 0; i < nr; ++i)
+ if (memcmp(test[i], data_save[i], size) != 0)
+ goto bail;
+
+ /* restore */
+ for (i = 0; i < nr; ++i) {
+ /* restore the data */
+ data[id[i]] = data_save[i];
+ /* restore the parity */
+ parity[ip[i]] = waste;
+ }
+ }
+ } while (combination_next(nr, np, ip));
+ } while (combination_next(nr, nd, id));
+ }
+
+ free(v_alloc);
+ free(v);
+ return 0;
+
+bail:
+ free(v_alloc);
+ free(v);
+ return -1;
+}
+
+int raid_test_par(int nd, size_t size)
+{
+ void *v_alloc;
+ void **v;
+ int nv;
+ int i, j;
+ void (*f[64])(int nd, size_t size, void **vbuf);
+ int nf;
+ int np;
+
+ np = RAID_PARITY_MAX;
+
+ nv = nd + np * 2;
+
+ v = raid_malloc_vector(nd, nv, size, &v_alloc);
+ if (!v)
+ return -1;
+
+ /* fill with random */
+ raid_mrand_vector(nv, size, v);
+
+ /* compute the parity */
+ raid_gen_ref(nd, np, size, v);
+
+ /* copy the reference parity into the backup buffers */
+ for (i = 0; i < np; ++i)
+ memcpy(v[nd + np + i], v[nd + i], size);
+
+ /* load all the available functions */
+ nf = 0;
+
+#ifdef RAID_USE_XOR_BLOCKS
+ f[nf++] = raid_gen1_xorblocks;
+#endif
+ f[nf++] = raid_gen1_int32;
+ f[nf++] = raid_gen1_int64;
+ f[nf++] = raid_gen2_int32;
+ f[nf++] = raid_gen2_int64;
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_sse2()) {
+ f[nf++] = raid_gen1_sse2;
+ f[nf++] = raid_gen2_sse2;
+#ifdef CONFIG_X86_64
+ f[nf++] = raid_gen2_sse2ext;
+#endif
+ }
+#endif
+
+ f[nf++] = raid_gen3_int8;
+ f[nf++] = raid_gen4_int8;
+ f[nf++] = raid_gen5_int8;
+ f[nf++] = raid_gen6_int8;
+
+#ifdef CONFIG_X86
+ if (raid_cpu_has_ssse3()) {
+ f[nf++] = raid_gen3_ssse3;
+ f[nf++] = raid_gen4_ssse3;
+ f[nf++] = raid_gen5_ssse3;
+ f[nf++] = raid_gen6_ssse3;
+#ifdef CONFIG_X86_64
+ f[nf++] = raid_gen3_ssse3ext;
+ f[nf++] = raid_gen4_ssse3ext;
+ f[nf++] = raid_gen5_ssse3ext;
+ f[nf++] = raid_gen6_ssse3ext;
+#endif
+ }
+#endif
+
+ /* check all the functions */
+ for (j = 0; j < nf; ++j) {
+ /* compute parity */
+ f[j](nd, size, v);
+
+ /* check it */
+ for (i = 0; i < np; ++i)
+ if (memcmp(v[nd + np + i], v[nd + i], size) != 0)
+ goto bail;
+ }
+
+ free(v_alloc);
+ free(v);
+ return 0;
+
+bail:
+ free(v_alloc);
+ free(v);
+ return -1;
+}
+
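
Note on raid_test_combo(): the first loop checks that enumerating all the
r-element subsets of RAID_PARITY_MAX elements visits exactly C(n, r) of them.
The real combination_first()/combination_next() live in combo.h, which is
not shown in this excerpt, so the sketch below uses hypothetical stand-ins
(prefixed sketch_) that enumerate subsets in lexicographic order. It is only
meant to show the counting property the test relies on, not the library's
actual implementation.

```c
#include <assert.h>

/* Hypothetical stand-in for combination_first(): start from {0, 1, ..., r-1}. */
static void sketch_combination_first(int r, int n, int *c)
{
	int i;

	(void)n;
	for (i = 0; i < r; ++i)
		c[i] = i;
}

/* Hypothetical stand-in for combination_next(): advance c[] to the next
 * r-subset of {0..n-1} in lexicographic order.
 * Returns 0 when every subset has been visited. */
static int sketch_combination_next(int r, int n, int *c)
{
	int i = r - 1;

	/* find the rightmost element that can still be incremented */
	while (i >= 0 && c[i] == n - r + i)
		--i;
	if (i < 0)
		return 0;

	++c[i];
	/* reset the elements to its right to the smallest valid values */
	for (++i; i < r; ++i)
		c[i] = c[i - 1] + 1;
	return 1;
}

/* Count the r-subsets of n elements by plain enumeration. */
static int sketch_count_combinations(int r, int n)
{
	int c[16];
	int count = 0;

	assert(r >= 1 && r <= n && n <= 16);

	sketch_combination_first(r, n, c);
	do {
		++count;
	} while (sketch_combination_next(r, n, c));

	return count;
}
```

With n = 6 (the value of RAID_PARITY_MAX implied by the six-parity library)
the counts are the binomial coefficients 6, 15, 20, 15, 6 and 1.
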
diff --git a/lib/raid/test/test.h b/lib/raid/test/test.h
new file mode 100644
index 0000000..7ca48af
--- /dev/null
+++ b/lib/raid/test/test.h
@@ -0,0 +1,59 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __RAID_TEST_H
+#define __RAID_TEST_H
+
+/**
+ * Tests insertion function.
+ *
+ * Tests raid_insert() with all the possible permutations of elements to insert.
+ *
+ * Returns 0 on success.
+ */
+int raid_test_insert(void);
+
+/**
+ * Tests combination and permutation functions.
+ *
+ * Tests the combination and permutation iterators for all parity levels.
+ *
+ * Returns 0 on success.
+ */
+int raid_test_combo(void);
+
+/**
+ * Tests recovery functions.
+ *
+ * All the recovery functions are tested with all the combinations
+ * of failing disks and recovering parities.
+ *
+ * Note that the test time grows combinatorially with the number of disks.
+ *
+ * Returns 0 on success.
+ */
+int raid_test_rec(int nd, size_t size);
+
+/**
+ * Tests parity generation functions.
+ *
+ * All the parity generation functions are tested with the specified
+ * number of disks.
+ *
+ * Returns 0 on success.
+ */
+int raid_test_par(int nd, size_t size);
+
+#endif
+
diff --git a/lib/raid/test/usermode.h b/lib/raid/test/usermode.h
new file mode 100644
index 0000000..732cbc5
--- /dev/null
+++ b/lib/raid/test/usermode.h
@@ -0,0 +1,95 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __RAID_USERMODE_H
+#define __RAID_USERMODE_H
+
+/*
+ * Compatibility layer for user mode applications.
+ */
+#include <stdlib.h>
+#include <stdint.h>
+#include <assert.h>
+#include <string.h>
+#include <malloc.h>
+#include <errno.h>
+#include <sys/time.h>
+
+#define pr_err printf
+#define pr_info printf
+#define __aligned(a) __attribute__((aligned(a)))
+#define PAGE_SIZE 4096
+#define EXPORT_SYMBOL_GPL(a) int dummy_##a
+#define EXPORT_SYMBOL(a) int dummy_##a
+#if defined(__i386__)
+#define CONFIG_X86 1
+#define CONFIG_X86_32 1
+#endif
+#if defined(__x86_64__)
+#define CONFIG_X86 1
+#define CONFIG_X86_64 1
+#endif
+#ifdef COVERAGE
+#define BUG_ON(a) do { } while (0)
+#else
+#define BUG_ON(a) assert(!(a))
+#endif
+#define RAID_USE_XOR_BLOCKS 1
+#define MAX_XOR_BLOCKS 1
+void xor_blocks(unsigned count, unsigned size, void *dest, void **srcs);
+#define GFP_KERNEL 0
+#define alloc_pages_exact(size, x) memalign(PAGE_SIZE, size)
+#define free_pages_exact(p, size) free(p)
+#define preempt_disable() do { } while (0)
+#define preempt_enable() do { } while (0)
+#define cpu_relax() do { } while (0)
+#define HZ 1000
+#define jiffies get_jiffies()
+static inline unsigned long get_jiffies(void)
+{
+ struct timeval t;
+ gettimeofday(&t, 0);
+ return t.tv_sec * 1000 + t.tv_usec / 1000;
+}
+#define time_before(x, y) ((x) < (y))
+
+#ifdef CONFIG_X86
+#define X86_FEATURE_XMM2 (0*32+26)
+#define X86_FEATURE_SSSE3 (4*32+9)
+#define X86_FEATURE_AVX (4*32+28)
+#define X86_FEATURE_AVX2 (9*32+5)
+
+static inline int boot_cpu_has(int flag)
+{
+ uint32_t eax, ebx, ecx, edx;
+
+ eax = (flag & 0x100) ? 7 : (flag & 0x20) ? 0x80000001 : 1;
+ ecx = 0;
+
+ asm volatile("cpuid" : "+a" (eax), "=b" (ebx), "=d" (edx), "+c" (ecx));
+
+ return ((flag & 0x100 ? ebx : (flag & 0x80) ? ecx : edx) >> (flag & 31)) & 1;
+}
+
+static inline void kernel_fpu_begin(void)
+{
+}
+
+static inline void kernel_fpu_end(void)
+{
+}
+#endif /* CONFIG_X86 */
+
+#endif
+
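
Note on the user-mode boot_cpu_has(): the feature flag encodes both which
CPUID leaf to query and which output register holds the bit. The sketch
below separates that decoding from the cpuid instruction itself so it can be
checked on any machine. The struct and function names are hypothetical; the
bit tests are copied from boot_cpu_has() above, and they are only meant to
cover the four X86_FEATURE_* values defined in this header.

```c
#include <stdint.h>

/* Same encodings as in usermode.h: word * 32 + bit. */
#define SKETCH_FEATURE_XMM2	(0*32+26)
#define SKETCH_FEATURE_SSSE3	(4*32+9)
#define SKETCH_FEATURE_AVX	(4*32+28)
#define SKETCH_FEATURE_AVX2	(9*32+5)

/* Hypothetical description of one CPUID query. */
struct sketch_cpuid_query {
	uint32_t leaf;	/* value loaded in EAX before CPUID */
	char reg;	/* 'b', 'c' or 'd': output register to test */
	unsigned bit;	/* bit index inside that register */
};

/* Decode a feature flag with the same tests used by boot_cpu_has(). */
static struct sketch_cpuid_query sketch_decode(int flag)
{
	struct sketch_cpuid_query q;

	/* word 9 (bit 0x100 set) maps to leaf 7, the others to leaf 1 */
	q.leaf = (flag & 0x100) ? 7 : (flag & 0x20) ? 0x80000001u : 1;
	/* word 9 is reported in EBX, word 4 in ECX, word 0 in EDX */
	q.reg = (flag & 0x100) ? 'b' : (flag & 0x80) ? 'c' : 'd';
	q.bit = flag & 31;

	return q;
}
```

For instance SSSE3 (4*32+9) decodes to leaf 1, register ECX, bit 9, and AVX2
(9*32+5) decodes to leaf 7, register EBX, bit 5.
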
diff --git a/lib/raid/test/xor.c b/lib/raid/test/xor.c
new file mode 100644
index 0000000..2d68636
--- /dev/null
+++ b/lib/raid/test/xor.c
@@ -0,0 +1,41 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include "internal.h"
+
+/**
+ * User-mode stand-in for the kernel xor_blocks(); supports one source only.
+ */
+void xor_blocks(unsigned int count, unsigned int bytes, void *dest, void **srcs)
+{
+ uint32_t *p1 = dest;
+ uint32_t *p2 = srcs[0];
+ long lines = bytes / (sizeof(uint32_t)) / 8;
+
+ BUG_ON(count != 1);
+
+ do {
+ p1[0] ^= p2[0];
+ p1[1] ^= p2[1];
+ p1[2] ^= p2[2];
+ p1[3] ^= p2[3];
+ p1[4] ^= p2[4];
+ p1[5] ^= p2[5];
+ p1[6] ^= p2[6];
+ p1[7] ^= p2[7];
+ p1 += 8;
+ p2 += 8;
+ } while (--lines > 0);
+}
+
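
Note on the XOR parity that xor_blocks() and the GEN1 functions compute: the
first parity is the byte-wise XOR of all the data blocks, so any single
missing block is the XOR of the parity with the surviving blocks. The sketch
below shows that property with plain byte loops. All the names are
hypothetical and it does not follow the library's calling convention (the
parity is passed separately instead of being stored at v[nd]).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* dest ^= src, one byte at a time */
static void sketch_xor_into(uint8_t *dest, const uint8_t *src, size_t size)
{
	size_t i;

	for (i = 0; i < size; ++i)
		dest[i] ^= src[i];
}

/* parity = d[0] ^ d[1] ^ ... ^ d[nd-1] */
static void sketch_gen1(int nd, size_t size, uint8_t **d, uint8_t *parity)
{
	int j;

	memcpy(parity, d[nd - 1], size);
	for (j = nd - 2; j >= 0; --j)
		sketch_xor_into(parity, d[j], size);
}

/* rebuild d[missing] from the parity and the surviving data blocks */
static void sketch_rec1(int missing, int nd, size_t size, uint8_t **d,
			const uint8_t *parity)
{
	int j;

	memcpy(d[missing], parity, size);
	for (j = 0; j < nd; ++j)
		if (j != missing)
			sketch_xor_into(d[missing], d[j], size);
}
```

Calling sketch_gen1() on three blocks, overwriting one of them, and then
calling sketch_rec1() with its index restores the original bytes.
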
diff --git a/lib/raid/x86.c b/lib/raid/x86.c
new file mode 100644
index 0000000..a2a8f0d
--- /dev/null
+++ b/lib/raid/x86.c
@@ -0,0 +1,1565 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include "internal.h"
+#include "gf.h"
+
+#ifdef CONFIG_X86
+/*
+ * GEN1 (RAID5 with xor) SSE2 implementation
+ *
+ * Intentionally process no more than 64 bytes per iteration: 64 bytes is the
+ * typical cache line size, and processing 128 bytes doesn't increase
+ * performance; in some cases it even decreases it.
+ */
+void raid_gen1_sse2(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ int d, l;
+ size_t i;
+
+ l = nd - 1;
+ p = v[nd];
+
+ raid_asm_begin();
+
+ for (i = 0; i < size; i += 64) {
+ asm volatile("movdqa %0,%%xmm0" : : "m" (v[l][i]));
+ asm volatile("movdqa %0,%%xmm1" : : "m" (v[l][i+16]));
+ asm volatile("movdqa %0,%%xmm2" : : "m" (v[l][i+32]));
+ asm volatile("movdqa %0,%%xmm3" : : "m" (v[l][i+48]));
+ for (d = l-1; d >= 0; --d) {
+ asm volatile("pxor %0,%%xmm0" : : "m" (v[d][i]));
+ asm volatile("pxor %0,%%xmm1" : : "m" (v[d][i+16]));
+ asm volatile("pxor %0,%%xmm2" : : "m" (v[d][i+32]));
+ asm volatile("pxor %0,%%xmm3" : : "m" (v[d][i+48]));
+ }
+ asm volatile("movntdq %%xmm0,%0" : "=m" (p[i]));
+ asm volatile("movntdq %%xmm1,%0" : "=m" (p[i+16]));
+ asm volatile("movntdq %%xmm2,%0" : "=m" (p[i+32]));
+ asm volatile("movntdq %%xmm3,%0" : "=m" (p[i+48]));
+ }
+
+ raid_asm_end();
+}
+#endif
+
+#ifdef CONFIG_X86
+static const struct gfconst16 {
+ uint8_t poly[16];
+ uint8_t low4[16];
+} gfconst16 __aligned(32) = {
+ { 0x1d, 0x1d, 0x1d, 0x1d, 0x1d, 0x1d, 0x1d, 0x1d,
+ 0x1d, 0x1d, 0x1d, 0x1d, 0x1d, 0x1d, 0x1d, 0x1d },
+ { 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f,
+ 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f },
+};
+#endif
+
+#ifdef CONFIG_X86
+/*
+ * GEN2 (RAID6 with powers of 2) SSE2 implementation
+ */
+void raid_gen2_sse2(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ int d, l;
+ size_t i;
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+
+ raid_asm_begin();
+
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.poly[0]));
+
+ for (i = 0; i < size; i += 32) {
+ asm volatile("movdqa %0,%%xmm0" : : "m" (v[l][i]));
+ asm volatile("movdqa %0,%%xmm1" : : "m" (v[l][i+16]));
+ asm volatile("movdqa %xmm0,%xmm2");
+ asm volatile("movdqa %xmm1,%xmm3");
+ for (d = l-1; d >= 0; --d) {
+ asm volatile("pxor %xmm4,%xmm4");
+ asm volatile("pxor %xmm5,%xmm5");
+ asm volatile("pcmpgtb %xmm2,%xmm4");
+ asm volatile("pcmpgtb %xmm3,%xmm5");
+ asm volatile("paddb %xmm2,%xmm2");
+ asm volatile("paddb %xmm3,%xmm3");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pand %xmm7,%xmm5");
+ asm volatile("pxor %xmm4,%xmm2");
+ asm volatile("pxor %xmm5,%xmm3");
+
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[d][i]));
+ asm volatile("movdqa %0,%%xmm5" : : "m" (v[d][i+16]));
+ asm volatile("pxor %xmm4,%xmm0");
+ asm volatile("pxor %xmm5,%xmm1");
+ asm volatile("pxor %xmm4,%xmm2");
+ asm volatile("pxor %xmm5,%xmm3");
+ }
+ asm volatile("movntdq %%xmm0,%0" : "=m" (p[i]));
+ asm volatile("movntdq %%xmm1,%0" : "=m" (p[i+16]));
+ asm volatile("movntdq %%xmm2,%0" : "=m" (q[i]));
+ asm volatile("movntdq %%xmm3,%0" : "=m" (q[i+16]));
+ }
+
+ raid_asm_end();
+}
+#endif
+
+#ifdef CONFIG_X86_64
+/*
+ * GEN2 (RAID6 with powers of 2) SSE2 implementation
+ *
+ * Note that it uses 16 registers, meaning that x64 is required.
+ */
+void raid_gen2_sse2ext(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ int d, l;
+ size_t i;
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+
+ raid_asm_begin();
+
+ asm volatile("movdqa %0,%%xmm15" : : "m" (gfconst16.poly[0]));
+
+ for (i = 0; i < size; i += 64) {
+ asm volatile("movdqa %0,%%xmm0" : : "m" (v[l][i]));
+ asm volatile("movdqa %0,%%xmm1" : : "m" (v[l][i+16]));
+ asm volatile("movdqa %0,%%xmm2" : : "m" (v[l][i+32]));
+ asm volatile("movdqa %0,%%xmm3" : : "m" (v[l][i+48]));
+ asm volatile("movdqa %xmm0,%xmm4");
+ asm volatile("movdqa %xmm1,%xmm5");
+ asm volatile("movdqa %xmm2,%xmm6");
+ asm volatile("movdqa %xmm3,%xmm7");
+ for (d = l-1; d >= 0; --d) {
+ asm volatile("pxor %xmm8,%xmm8");
+ asm volatile("pxor %xmm9,%xmm9");
+ asm volatile("pxor %xmm10,%xmm10");
+ asm volatile("pxor %xmm11,%xmm11");
+ asm volatile("pcmpgtb %xmm4,%xmm8");
+ asm volatile("pcmpgtb %xmm5,%xmm9");
+ asm volatile("pcmpgtb %xmm6,%xmm10");
+ asm volatile("pcmpgtb %xmm7,%xmm11");
+ asm volatile("paddb %xmm4,%xmm4");
+ asm volatile("paddb %xmm5,%xmm5");
+ asm volatile("paddb %xmm6,%xmm6");
+ asm volatile("paddb %xmm7,%xmm7");
+ asm volatile("pand %xmm15,%xmm8");
+ asm volatile("pand %xmm15,%xmm9");
+ asm volatile("pand %xmm15,%xmm10");
+ asm volatile("pand %xmm15,%xmm11");
+ asm volatile("pxor %xmm8,%xmm4");
+ asm volatile("pxor %xmm9,%xmm5");
+ asm volatile("pxor %xmm10,%xmm6");
+ asm volatile("pxor %xmm11,%xmm7");
+
+ asm volatile("movdqa %0,%%xmm8" : : "m" (v[d][i]));
+ asm volatile("movdqa %0,%%xmm9" : : "m" (v[d][i+16]));
+ asm volatile("movdqa %0,%%xmm10" : : "m" (v[d][i+32]));
+ asm volatile("movdqa %0,%%xmm11" : : "m" (v[d][i+48]));
+ asm volatile("pxor %xmm8,%xmm0");
+ asm volatile("pxor %xmm9,%xmm1");
+ asm volatile("pxor %xmm10,%xmm2");
+ asm volatile("pxor %xmm11,%xmm3");
+ asm volatile("pxor %xmm8,%xmm4");
+ asm volatile("pxor %xmm9,%xmm5");
+ asm volatile("pxor %xmm10,%xmm6");
+ asm volatile("pxor %xmm11,%xmm7");
+ }
+ asm volatile("movntdq %%xmm0,%0" : "=m" (p[i]));
+ asm volatile("movntdq %%xmm1,%0" : "=m" (p[i+16]));
+ asm volatile("movntdq %%xmm2,%0" : "=m" (p[i+32]));
+ asm volatile("movntdq %%xmm3,%0" : "=m" (p[i+48]));
+ asm volatile("movntdq %%xmm4,%0" : "=m" (q[i]));
+ asm volatile("movntdq %%xmm5,%0" : "=m" (q[i+16]));
+ asm volatile("movntdq %%xmm6,%0" : "=m" (q[i+32]));
+ asm volatile("movntdq %%xmm7,%0" : "=m" (q[i+48]));
+ }
+
+ raid_asm_end();
+}
+#endif
+
+#ifdef CONFIG_X86
+/*
+ * GEN3 (triple parity with Cauchy matrix) SSSE3 implementation
+ */
+void raid_gen3_ssse3(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ uint8_t *r;
+ int d, l;
+ size_t i;
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+ r = v[nd+2];
+
+ /* special case with only one data disk */
+ if (l == 0) {
+ for (i = 0; i < 3; ++i)
+ memcpy(v[1+i], v[0], size);
+ return;
+ }
+
+ raid_asm_begin();
+
+ /* generic case with at least two data disks */
+ asm volatile("movdqa %0,%%xmm3" : : "m" (gfconst16.poly[0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.low4[0]));
+
+ for (i = 0; i < size; i += 16) {
+ /* last disk, without the multiplication by two */
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[l][i]));
+
+ asm volatile("movdqa %xmm4,%xmm0");
+ asm volatile("movdqa %xmm4,%xmm1");
+
+ asm volatile("movdqa %xmm4,%xmm5");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pand %xmm7,%xmm5");
+
+ asm volatile("movdqa %0,%%xmm2" : : "m" (gfgenpshufb[l][0][0][0]));
+ asm volatile("movdqa %0,%%xmm6" : : "m" (gfgenpshufb[l][0][1][0]));
+ asm volatile("pshufb %xmm4,%xmm2");
+ asm volatile("pshufb %xmm5,%xmm6");
+ asm volatile("pxor %xmm6,%xmm2");
+
+ /* intermediate disks */
+ for (d = l-1; d > 0; --d) {
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[d][i]));
+
+ asm volatile("pxor %xmm5,%xmm5");
+ asm volatile("pcmpgtb %xmm1,%xmm5");
+ asm volatile("paddb %xmm1,%xmm1");
+ asm volatile("pand %xmm3,%xmm5");
+ asm volatile("pxor %xmm5,%xmm1");
+
+ asm volatile("pxor %xmm4,%xmm0");
+ asm volatile("pxor %xmm4,%xmm1");
+
+ asm volatile("movdqa %xmm4,%xmm5");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pand %xmm7,%xmm5");
+
+ asm volatile("movdqa %0,%%xmm6" : : "m" (gfgenpshufb[d][0][0][0]));
+ asm volatile("pshufb %xmm4,%xmm6");
+ asm volatile("pxor %xmm6,%xmm2");
+ asm volatile("movdqa %0,%%xmm6" : : "m" (gfgenpshufb[d][0][1][0]));
+ asm volatile("pshufb %xmm5,%xmm6");
+ asm volatile("pxor %xmm6,%xmm2");
+ }
+
+ /* first disk with all coefficients at 1 */
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[0][i]));
+
+ asm volatile("pxor %xmm5,%xmm5");
+ asm volatile("pcmpgtb %xmm1,%xmm5");
+ asm volatile("paddb %xmm1,%xmm1");
+ asm volatile("pand %xmm3,%xmm5");
+ asm volatile("pxor %xmm5,%xmm1");
+
+ asm volatile("pxor %xmm4,%xmm0");
+ asm volatile("pxor %xmm4,%xmm1");
+ asm volatile("pxor %xmm4,%xmm2");
+
+ asm volatile("movntdq %%xmm0,%0" : "=m" (p[i]));
+ asm volatile("movntdq %%xmm1,%0" : "=m" (q[i]));
+ asm volatile("movntdq %%xmm2,%0" : "=m" (r[i]));
+ }
+
+ raid_asm_end();
+}
+#endif
+
+#ifdef CONFIG_X86_64
+/*
+ * GEN3 (triple parity with Cauchy matrix) SSSE3 implementation
+ *
+ * Note that it uses 16 registers, meaning that x64 is required.
+ */
+void raid_gen3_ssse3ext(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ uint8_t *r;
+ int d, l;
+ size_t i;
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+ r = v[nd+2];
+
+ /* special case with only one data disk */
+ if (l == 0) {
+ for (i = 0; i < 3; ++i)
+ memcpy(v[1+i], v[0], size);
+ return;
+ }
+
+ raid_asm_begin();
+
+ /* generic case with at least two data disks */
+ asm volatile("movdqa %0,%%xmm3" : : "m" (gfconst16.poly[0]));
+ asm volatile("movdqa %0,%%xmm11" : : "m" (gfconst16.low4[0]));
+
+ for (i = 0; i < size; i += 32) {
+ /* last disk, without the multiplication by two */
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[l][i]));
+ asm volatile("movdqa %0,%%xmm12" : : "m" (v[l][i+16]));
+
+ asm volatile("movdqa %xmm4,%xmm0");
+ asm volatile("movdqa %xmm4,%xmm1");
+ asm volatile("movdqa %xmm12,%xmm8");
+ asm volatile("movdqa %xmm12,%xmm9");
+
+ asm volatile("movdqa %xmm4,%xmm5");
+ asm volatile("movdqa %xmm12,%xmm13");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("psrlw $4,%xmm13");
+ asm volatile("pand %xmm11,%xmm4");
+ asm volatile("pand %xmm11,%xmm12");
+ asm volatile("pand %xmm11,%xmm5");
+ asm volatile("pand %xmm11,%xmm13");
+
+ asm volatile("movdqa %0,%%xmm2" : : "m" (gfgenpshufb[l][0][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[l][0][1][0]));
+ asm volatile("movdqa %xmm2,%xmm10");
+ asm volatile("movdqa %xmm7,%xmm15");
+ asm volatile("pshufb %xmm4,%xmm2");
+ asm volatile("pshufb %xmm12,%xmm10");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pshufb %xmm13,%xmm15");
+ asm volatile("pxor %xmm7,%xmm2");
+ asm volatile("pxor %xmm15,%xmm10");
+
+ /* intermediate disks */
+ for (d = l-1; d > 0; --d) {
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[d][i]));
+ asm volatile("movdqa %0,%%xmm12" : : "m" (v[d][i+16]));
+
+ asm volatile("pxor %xmm5,%xmm5");
+ asm volatile("pxor %xmm13,%xmm13");
+ asm volatile("pcmpgtb %xmm1,%xmm5");
+ asm volatile("pcmpgtb %xmm9,%xmm13");
+ asm volatile("paddb %xmm1,%xmm1");
+ asm volatile("paddb %xmm9,%xmm9");
+ asm volatile("pand %xmm3,%xmm5");
+ asm volatile("pand %xmm3,%xmm13");
+ asm volatile("pxor %xmm5,%xmm1");
+ asm volatile("pxor %xmm13,%xmm9");
+
+ asm volatile("pxor %xmm4,%xmm0");
+ asm volatile("pxor %xmm4,%xmm1");
+ asm volatile("pxor %xmm12,%xmm8");
+ asm volatile("pxor %xmm12,%xmm9");
+
+ asm volatile("movdqa %xmm4,%xmm5");
+ asm volatile("movdqa %xmm12,%xmm13");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("psrlw $4,%xmm13");
+ asm volatile("pand %xmm11,%xmm4");
+ asm volatile("pand %xmm11,%xmm12");
+ asm volatile("pand %xmm11,%xmm5");
+ asm volatile("pand %xmm11,%xmm13");
+
+ asm volatile("movdqa %0,%%xmm6" : : "m" (gfgenpshufb[d][0][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[d][0][1][0]));
+ asm volatile("movdqa %xmm6,%xmm14");
+ asm volatile("movdqa %xmm7,%xmm15");
+ asm volatile("pshufb %xmm4,%xmm6");
+ asm volatile("pshufb %xmm12,%xmm14");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pshufb %xmm13,%xmm15");
+ asm volatile("pxor %xmm6,%xmm2");
+ asm volatile("pxor %xmm14,%xmm10");
+ asm volatile("pxor %xmm7,%xmm2");
+ asm volatile("pxor %xmm15,%xmm10");
+ }
+
+ /* first disk with all coefficients at 1 */
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[0][i]));
+ asm volatile("movdqa %0,%%xmm12" : : "m" (v[0][i+16]));
+
+ asm volatile("pxor %xmm5,%xmm5");
+ asm volatile("pxor %xmm13,%xmm13");
+ asm volatile("pcmpgtb %xmm1,%xmm5");
+ asm volatile("pcmpgtb %xmm9,%xmm13");
+ asm volatile("paddb %xmm1,%xmm1");
+ asm volatile("paddb %xmm9,%xmm9");
+ asm volatile("pand %xmm3,%xmm5");
+ asm volatile("pand %xmm3,%xmm13");
+ asm volatile("pxor %xmm5,%xmm1");
+ asm volatile("pxor %xmm13,%xmm9");
+
+ asm volatile("pxor %xmm4,%xmm0");
+ asm volatile("pxor %xmm4,%xmm1");
+ asm volatile("pxor %xmm4,%xmm2");
+ asm volatile("pxor %xmm12,%xmm8");
+ asm volatile("pxor %xmm12,%xmm9");
+ asm volatile("pxor %xmm12,%xmm10");
+
+ asm volatile("movntdq %%xmm0,%0" : "=m" (p[i]));
+ asm volatile("movntdq %%xmm8,%0" : "=m" (p[i+16]));
+ asm volatile("movntdq %%xmm1,%0" : "=m" (q[i]));
+ asm volatile("movntdq %%xmm9,%0" : "=m" (q[i+16]));
+ asm volatile("movntdq %%xmm2,%0" : "=m" (r[i]));
+ asm volatile("movntdq %%xmm10,%0" : "=m" (r[i+16]));
+ }
+
+ raid_asm_end();
+}
+#endif
+
+#ifdef CONFIG_X86
+/*
+ * GEN4 (quad parity with Cauchy matrix) SSSE3 implementation
+ */
+void raid_gen4_ssse3(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ uint8_t *r;
+ uint8_t *s;
+ int d, l;
+ size_t i;
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+ r = v[nd+2];
+ s = v[nd+3];
+
+ /* special case with only one data disk */
+ if (l == 0) {
+ for (i = 0; i < 4; ++i)
+ memcpy(v[1+i], v[0], size);
+ return;
+ }
+
+ raid_asm_begin();
+
+ /* generic case with at least two data disks */
+ for (i = 0; i < size; i += 16) {
+ /* last disk, without the multiplication by two */
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.low4[0]));
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[l][i]));
+
+ asm volatile("movdqa %xmm4,%xmm0");
+ asm volatile("movdqa %xmm4,%xmm1");
+
+ asm volatile("movdqa %xmm4,%xmm5");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pand %xmm7,%xmm5");
+
+ asm volatile("movdqa %0,%%xmm2" : : "m" (gfgenpshufb[l][0][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[l][0][1][0]));
+ asm volatile("pshufb %xmm4,%xmm2");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm7,%xmm2");
+
+ asm volatile("movdqa %0,%%xmm3" : : "m" (gfgenpshufb[l][1][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[l][1][1][0]));
+ asm volatile("pshufb %xmm4,%xmm3");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm7,%xmm3");
+
+ /* intermediate disks */
+ for (d = l-1; d > 0; --d) {
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.poly[0]));
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[d][i]));
+
+ asm volatile("pxor %xmm5,%xmm5");
+ asm volatile("pcmpgtb %xmm1,%xmm5");
+ asm volatile("paddb %xmm1,%xmm1");
+ asm volatile("pand %xmm7,%xmm5");
+ asm volatile("pxor %xmm5,%xmm1");
+
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.low4[0]));
+
+ asm volatile("pxor %xmm4,%xmm0");
+ asm volatile("pxor %xmm4,%xmm1");
+
+ asm volatile("movdqa %xmm4,%xmm5");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pand %xmm7,%xmm5");
+
+ asm volatile("movdqa %0,%%xmm6" : : "m" (gfgenpshufb[d][0][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[d][0][1][0]));
+ asm volatile("pshufb %xmm4,%xmm6");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm6,%xmm2");
+ asm volatile("pxor %xmm7,%xmm2");
+
+ asm volatile("movdqa %0,%%xmm6" : : "m" (gfgenpshufb[d][1][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[d][1][1][0]));
+ asm volatile("pshufb %xmm4,%xmm6");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm6,%xmm3");
+ asm volatile("pxor %xmm7,%xmm3");
+ }
+
+ /* first disk with all coefficients at 1 */
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.poly[0]));
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[0][i]));
+
+ asm volatile("pxor %xmm5,%xmm5");
+ asm volatile("pcmpgtb %xmm1,%xmm5");
+ asm volatile("paddb %xmm1,%xmm1");
+ asm volatile("pand %xmm7,%xmm5");
+ asm volatile("pxor %xmm5,%xmm1");
+
+ asm volatile("pxor %xmm4,%xmm0");
+ asm volatile("pxor %xmm4,%xmm1");
+ asm volatile("pxor %xmm4,%xmm2");
+ asm volatile("pxor %xmm4,%xmm3");
+
+ asm volatile("movntdq %%xmm0,%0" : "=m" (p[i]));
+ asm volatile("movntdq %%xmm1,%0" : "=m" (q[i]));
+ asm volatile("movntdq %%xmm2,%0" : "=m" (r[i]));
+ asm volatile("movntdq %%xmm3,%0" : "=m" (s[i]));
+ }
+
+ raid_asm_end();
+}
+#endif
+
+#ifdef CONFIG_X86_64
+/*
+ * GEN4 (quad parity with Cauchy matrix) SSSE3 implementation
+ *
+ * Note that it uses registers beyond xmm7, meaning that x86-64 is required.
+ */
+void raid_gen4_ssse3ext(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ uint8_t *r;
+ uint8_t *s;
+ int d, l;
+ size_t i;
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+ r = v[nd+2];
+ s = v[nd+3];
+
+ /* special case with only one data disk */
+ if (l == 0) {
+ for (i = 0; i < 4; ++i)
+ memcpy(v[1+i], v[0], size);
+ return;
+ }
+
+ raid_asm_begin();
+
+ /* generic case with at least two data disks */
+ for (i = 0; i < size; i += 32) {
+ /* last disk, without the multiplication by two */
+ asm volatile("movdqa %0,%%xmm15" : : "m" (gfconst16.low4[0]));
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[l][i]));
+ asm volatile("movdqa %0,%%xmm12" : : "m" (v[l][i+16]));
+
+ asm volatile("movdqa %xmm4,%xmm0");
+ asm volatile("movdqa %xmm4,%xmm1");
+ asm volatile("movdqa %xmm12,%xmm8");
+ asm volatile("movdqa %xmm12,%xmm9");
+
+ asm volatile("movdqa %xmm4,%xmm5");
+ asm volatile("movdqa %xmm12,%xmm13");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("psrlw $4,%xmm13");
+ asm volatile("pand %xmm15,%xmm4");
+ asm volatile("pand %xmm15,%xmm12");
+ asm volatile("pand %xmm15,%xmm5");
+ asm volatile("pand %xmm15,%xmm13");
+
+ asm volatile("movdqa %0,%%xmm2" : : "m" (gfgenpshufb[l][0][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[l][0][1][0]));
+ asm volatile("movdqa %xmm2,%xmm10");
+ asm volatile("movdqa %xmm7,%xmm15");
+ asm volatile("pshufb %xmm4,%xmm2");
+ asm volatile("pshufb %xmm12,%xmm10");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pshufb %xmm13,%xmm15");
+ asm volatile("pxor %xmm7,%xmm2");
+ asm volatile("pxor %xmm15,%xmm10");
+
+ asm volatile("movdqa %0,%%xmm3" : : "m" (gfgenpshufb[l][1][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[l][1][1][0]));
+ asm volatile("movdqa %xmm3,%xmm11");
+ asm volatile("movdqa %xmm7,%xmm15");
+ asm volatile("pshufb %xmm4,%xmm3");
+ asm volatile("pshufb %xmm12,%xmm11");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pshufb %xmm13,%xmm15");
+ asm volatile("pxor %xmm7,%xmm3");
+ asm volatile("pxor %xmm15,%xmm11");
+
+ /* intermediate disks */
+ for (d = l-1; d > 0; --d) {
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.poly[0]));
+ asm volatile("movdqa %0,%%xmm15" : : "m" (gfconst16.low4[0]));
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[d][i]));
+ asm volatile("movdqa %0,%%xmm12" : : "m" (v[d][i+16]));
+
+ asm volatile("pxor %xmm5,%xmm5");
+ asm volatile("pxor %xmm13,%xmm13");
+ asm volatile("pcmpgtb %xmm1,%xmm5");
+ asm volatile("pcmpgtb %xmm9,%xmm13");
+ asm volatile("paddb %xmm1,%xmm1");
+ asm volatile("paddb %xmm9,%xmm9");
+ asm volatile("pand %xmm7,%xmm5");
+ asm volatile("pand %xmm7,%xmm13");
+ asm volatile("pxor %xmm5,%xmm1");
+ asm volatile("pxor %xmm13,%xmm9");
+
+ asm volatile("pxor %xmm4,%xmm0");
+ asm volatile("pxor %xmm4,%xmm1");
+ asm volatile("pxor %xmm12,%xmm8");
+ asm volatile("pxor %xmm12,%xmm9");
+
+ asm volatile("movdqa %xmm4,%xmm5");
+ asm volatile("movdqa %xmm12,%xmm13");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("psrlw $4,%xmm13");
+ asm volatile("pand %xmm15,%xmm4");
+ asm volatile("pand %xmm15,%xmm12");
+ asm volatile("pand %xmm15,%xmm5");
+ asm volatile("pand %xmm15,%xmm13");
+
+ asm volatile("movdqa %0,%%xmm6" : : "m" (gfgenpshufb[d][0][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[d][0][1][0]));
+ asm volatile("movdqa %xmm6,%xmm14");
+ asm volatile("movdqa %xmm7,%xmm15");
+ asm volatile("pshufb %xmm4,%xmm6");
+ asm volatile("pshufb %xmm12,%xmm14");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pshufb %xmm13,%xmm15");
+ asm volatile("pxor %xmm6,%xmm2");
+ asm volatile("pxor %xmm14,%xmm10");
+ asm volatile("pxor %xmm7,%xmm2");
+ asm volatile("pxor %xmm15,%xmm10");
+
+ asm volatile("movdqa %0,%%xmm6" : : "m" (gfgenpshufb[d][1][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[d][1][1][0]));
+ asm volatile("movdqa %xmm6,%xmm14");
+ asm volatile("movdqa %xmm7,%xmm15");
+ asm volatile("pshufb %xmm4,%xmm6");
+ asm volatile("pshufb %xmm12,%xmm14");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pshufb %xmm13,%xmm15");
+ asm volatile("pxor %xmm6,%xmm3");
+ asm volatile("pxor %xmm14,%xmm11");
+ asm volatile("pxor %xmm7,%xmm3");
+ asm volatile("pxor %xmm15,%xmm11");
+ }
+
+ /* first disk with all coefficients at 1 */
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.poly[0]));
+ asm volatile("movdqa %0,%%xmm15" : : "m" (gfconst16.low4[0]));
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[0][i]));
+ asm volatile("movdqa %0,%%xmm12" : : "m" (v[0][i+16]));
+
+ asm volatile("pxor %xmm5,%xmm5");
+ asm volatile("pxor %xmm13,%xmm13");
+ asm volatile("pcmpgtb %xmm1,%xmm5");
+ asm volatile("pcmpgtb %xmm9,%xmm13");
+ asm volatile("paddb %xmm1,%xmm1");
+ asm volatile("paddb %xmm9,%xmm9");
+ asm volatile("pand %xmm7,%xmm5");
+ asm volatile("pand %xmm7,%xmm13");
+ asm volatile("pxor %xmm5,%xmm1");
+ asm volatile("pxor %xmm13,%xmm9");
+
+ asm volatile("pxor %xmm4,%xmm0");
+ asm volatile("pxor %xmm4,%xmm1");
+ asm volatile("pxor %xmm4,%xmm2");
+ asm volatile("pxor %xmm4,%xmm3");
+ asm volatile("pxor %xmm12,%xmm8");
+ asm volatile("pxor %xmm12,%xmm9");
+ asm volatile("pxor %xmm12,%xmm10");
+ asm volatile("pxor %xmm12,%xmm11");
+
+ asm volatile("movntdq %%xmm0,%0" : "=m" (p[i]));
+ asm volatile("movntdq %%xmm8,%0" : "=m" (p[i+16]));
+ asm volatile("movntdq %%xmm1,%0" : "=m" (q[i]));
+ asm volatile("movntdq %%xmm9,%0" : "=m" (q[i+16]));
+ asm volatile("movntdq %%xmm2,%0" : "=m" (r[i]));
+ asm volatile("movntdq %%xmm10,%0" : "=m" (r[i+16]));
+ asm volatile("movntdq %%xmm3,%0" : "=m" (s[i]));
+ asm volatile("movntdq %%xmm11,%0" : "=m" (s[i+16]));
+ }
+
+ raid_asm_end();
+}
+#endif
+
+#ifdef CONFIG_X86
+/*
+ * GEN5 (penta parity with Cauchy matrix) SSSE3 implementation
+ */
+void raid_gen5_ssse3(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ uint8_t *r;
+ uint8_t *s;
+ uint8_t *t;
+ int d, l;
+ size_t i;
+ uint8_t p0[16] __aligned(16);
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+ r = v[nd+2];
+ s = v[nd+3];
+ t = v[nd+4];
+
+ /* special case with only one data disk */
+ if (l == 0) {
+ for (i = 0; i < 5; ++i)
+ memcpy(v[1+i], v[0], size);
+ return;
+ }
+
+ raid_asm_begin();
+
+ /* generic case with at least two data disks */
+ for (i = 0; i < size; i += 16) {
+ /* last disk, without the multiplication by two */
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[l][i]));
+
+ asm volatile("movdqa %xmm4,%xmm0");
+ asm volatile("movdqa %%xmm4,%0" : "=m" (p0[0]));
+
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.low4[0]));
+ asm volatile("movdqa %xmm4,%xmm5");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pand %xmm7,%xmm5");
+
+ asm volatile("movdqa %0,%%xmm1" : : "m" (gfgenpshufb[l][0][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[l][0][1][0]));
+ asm volatile("pshufb %xmm4,%xmm1");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm7,%xmm1");
+
+ asm volatile("movdqa %0,%%xmm2" : : "m" (gfgenpshufb[l][1][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[l][1][1][0]));
+ asm volatile("pshufb %xmm4,%xmm2");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm7,%xmm2");
+
+ asm volatile("movdqa %0,%%xmm3" : : "m" (gfgenpshufb[l][2][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[l][2][1][0]));
+ asm volatile("pshufb %xmm4,%xmm3");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm7,%xmm3");
+
+ /* intermediate disks */
+ for (d = l-1; d > 0; --d) {
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[d][i]));
+ asm volatile("movdqa %0,%%xmm6" : : "m" (p0[0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.poly[0]));
+
+ asm volatile("pxor %xmm5,%xmm5");
+ asm volatile("pcmpgtb %xmm0,%xmm5");
+ asm volatile("paddb %xmm0,%xmm0");
+ asm volatile("pand %xmm7,%xmm5");
+ asm volatile("pxor %xmm5,%xmm0");
+
+ asm volatile("pxor %xmm4,%xmm0");
+ asm volatile("pxor %xmm4,%xmm6");
+ asm volatile("movdqa %%xmm6,%0" : "=m" (p0[0]));
+
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.low4[0]));
+ asm volatile("movdqa %xmm4,%xmm5");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pand %xmm7,%xmm5");
+
+ asm volatile("movdqa %0,%%xmm6" : : "m" (gfgenpshufb[d][0][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[d][0][1][0]));
+ asm volatile("pshufb %xmm4,%xmm6");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm6,%xmm1");
+ asm volatile("pxor %xmm7,%xmm1");
+
+ asm volatile("movdqa %0,%%xmm6" : : "m" (gfgenpshufb[d][1][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[d][1][1][0]));
+ asm volatile("pshufb %xmm4,%xmm6");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm6,%xmm2");
+ asm volatile("pxor %xmm7,%xmm2");
+
+ asm volatile("movdqa %0,%%xmm6" : : "m" (gfgenpshufb[d][2][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[d][2][1][0]));
+ asm volatile("pshufb %xmm4,%xmm6");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm6,%xmm3");
+ asm volatile("pxor %xmm7,%xmm3");
+ }
+
+ /* first disk with all coefficients at 1 */
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[0][i]));
+ asm volatile("movdqa %0,%%xmm6" : : "m" (p0[0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.poly[0]));
+
+ asm volatile("pxor %xmm5,%xmm5");
+ asm volatile("pcmpgtb %xmm0,%xmm5");
+ asm volatile("paddb %xmm0,%xmm0");
+ asm volatile("pand %xmm7,%xmm5");
+ asm volatile("pxor %xmm5,%xmm0");
+
+ asm volatile("pxor %xmm4,%xmm0");
+ asm volatile("pxor %xmm4,%xmm1");
+ asm volatile("pxor %xmm4,%xmm2");
+ asm volatile("pxor %xmm4,%xmm3");
+ asm volatile("pxor %xmm4,%xmm6");
+
+ asm volatile("movntdq %%xmm6,%0" : "=m" (p[i]));
+ asm volatile("movntdq %%xmm0,%0" : "=m" (q[i]));
+ asm volatile("movntdq %%xmm1,%0" : "=m" (r[i]));
+ asm volatile("movntdq %%xmm2,%0" : "=m" (s[i]));
+ asm volatile("movntdq %%xmm3,%0" : "=m" (t[i]));
+ }
+
+ raid_asm_end();
+}
+#endif
+
+#ifdef CONFIG_X86_64
+/*
+ * GEN5 (penta parity with Cauchy matrix) SSSE3 implementation
+ *
+ * Note that it uses registers beyond xmm7, meaning that x86-64 is required.
+ */
+void raid_gen5_ssse3ext(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ uint8_t *r;
+ uint8_t *s;
+ uint8_t *t;
+ int d, l;
+ size_t i;
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+ r = v[nd+2];
+ s = v[nd+3];
+ t = v[nd+4];
+
+ /* special case with only one data disk */
+ if (l == 0) {
+ for (i = 0; i < 5; ++i)
+ memcpy(v[1+i], v[0], size);
+ return;
+ }
+
+ raid_asm_begin();
+
+ /* generic case with at least two data disks */
+ asm volatile("movdqa %0,%%xmm14" : : "m" (gfconst16.poly[0]));
+ asm volatile("movdqa %0,%%xmm15" : : "m" (gfconst16.low4[0]));
+
+ for (i = 0; i < size; i += 16) {
+ /* last disk, without the multiplication by two */
+ asm volatile("movdqa %0,%%xmm10" : : "m" (v[l][i]));
+
+ asm volatile("movdqa %xmm10,%xmm0");
+ asm volatile("movdqa %xmm10,%xmm1");
+
+ asm volatile("movdqa %xmm10,%xmm11");
+ asm volatile("psrlw $4,%xmm11");
+ asm volatile("pand %xmm15,%xmm10");
+ asm volatile("pand %xmm15,%xmm11");
+
+ asm volatile("movdqa %0,%%xmm2" : : "m" (gfgenpshufb[l][0][0][0]));
+ asm volatile("movdqa %0,%%xmm13" : : "m" (gfgenpshufb[l][0][1][0]));
+ asm volatile("pshufb %xmm10,%xmm2");
+ asm volatile("pshufb %xmm11,%xmm13");
+ asm volatile("pxor %xmm13,%xmm2");
+
+ asm volatile("movdqa %0,%%xmm3" : : "m" (gfgenpshufb[l][1][0][0]));
+ asm volatile("movdqa %0,%%xmm13" : : "m" (gfgenpshufb[l][1][1][0]));
+ asm volatile("pshufb %xmm10,%xmm3");
+ asm volatile("pshufb %xmm11,%xmm13");
+ asm volatile("pxor %xmm13,%xmm3");
+
+ asm volatile("movdqa %0,%%xmm4" : : "m" (gfgenpshufb[l][2][0][0]));
+ asm volatile("movdqa %0,%%xmm13" : : "m" (gfgenpshufb[l][2][1][0]));
+ asm volatile("pshufb %xmm10,%xmm4");
+ asm volatile("pshufb %xmm11,%xmm13");
+ asm volatile("pxor %xmm13,%xmm4");
+
+ /* intermediate disks */
+ for (d = l-1; d > 0; --d) {
+ asm volatile("movdqa %0,%%xmm10" : : "m" (v[d][i]));
+
+ asm volatile("pxor %xmm11,%xmm11");
+ asm volatile("pcmpgtb %xmm1,%xmm11");
+ asm volatile("paddb %xmm1,%xmm1");
+ asm volatile("pand %xmm14,%xmm11");
+ asm volatile("pxor %xmm11,%xmm1");
+
+ asm volatile("pxor %xmm10,%xmm0");
+ asm volatile("pxor %xmm10,%xmm1");
+
+ asm volatile("movdqa %xmm10,%xmm11");
+ asm volatile("psrlw $4,%xmm11");
+ asm volatile("pand %xmm15,%xmm10");
+ asm volatile("pand %xmm15,%xmm11");
+
+ asm volatile("movdqa %0,%%xmm12" : : "m" (gfgenpshufb[d][0][0][0]));
+ asm volatile("movdqa %0,%%xmm13" : : "m" (gfgenpshufb[d][0][1][0]));
+ asm volatile("pshufb %xmm10,%xmm12");
+ asm volatile("pshufb %xmm11,%xmm13");
+ asm volatile("pxor %xmm12,%xmm2");
+ asm volatile("pxor %xmm13,%xmm2");
+
+ asm volatile("movdqa %0,%%xmm12" : : "m" (gfgenpshufb[d][1][0][0]));
+ asm volatile("movdqa %0,%%xmm13" : : "m" (gfgenpshufb[d][1][1][0]));
+ asm volatile("pshufb %xmm10,%xmm12");
+ asm volatile("pshufb %xmm11,%xmm13");
+ asm volatile("pxor %xmm12,%xmm3");
+ asm volatile("pxor %xmm13,%xmm3");
+
+ asm volatile("movdqa %0,%%xmm12" : : "m" (gfgenpshufb[d][2][0][0]));
+ asm volatile("movdqa %0,%%xmm13" : : "m" (gfgenpshufb[d][2][1][0]));
+ asm volatile("pshufb %xmm10,%xmm12");
+ asm volatile("pshufb %xmm11,%xmm13");
+ asm volatile("pxor %xmm12,%xmm4");
+ asm volatile("pxor %xmm13,%xmm4");
+ }
+
+ /* first disk with all coefficients at 1 */
+ asm volatile("movdqa %0,%%xmm10" : : "m" (v[0][i]));
+
+ asm volatile("pxor %xmm11,%xmm11");
+ asm volatile("pcmpgtb %xmm1,%xmm11");
+ asm volatile("paddb %xmm1,%xmm1");
+ asm volatile("pand %xmm14,%xmm11");
+ asm volatile("pxor %xmm11,%xmm1");
+
+ asm volatile("pxor %xmm10,%xmm0");
+ asm volatile("pxor %xmm10,%xmm1");
+ asm volatile("pxor %xmm10,%xmm2");
+ asm volatile("pxor %xmm10,%xmm3");
+ asm volatile("pxor %xmm10,%xmm4");
+
+ asm volatile("movntdq %%xmm0,%0" : "=m" (p[i]));
+ asm volatile("movntdq %%xmm1,%0" : "=m" (q[i]));
+ asm volatile("movntdq %%xmm2,%0" : "=m" (r[i]));
+ asm volatile("movntdq %%xmm3,%0" : "=m" (s[i]));
+ asm volatile("movntdq %%xmm4,%0" : "=m" (t[i]));
+ }
+
+ raid_asm_end();
+}
+#endif
+
+#ifdef CONFIG_X86
+/*
+ * GEN6 (hexa parity with Cauchy matrix) SSSE3 implementation
+ */
+void raid_gen6_ssse3(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ uint8_t *r;
+ uint8_t *s;
+ uint8_t *t;
+ uint8_t *u;
+ int d, l;
+ size_t i;
+ uint8_t p0[16] __aligned(16);
+ uint8_t q0[16] __aligned(16);
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+ r = v[nd+2];
+ s = v[nd+3];
+ t = v[nd+4];
+ u = v[nd+5];
+
+ /* special case with only one data disk */
+ if (l == 0) {
+ for (i = 0; i < 6; ++i)
+ memcpy(v[1+i], v[0], size);
+ return;
+ }
+
+ raid_asm_begin();
+
+ /* generic case with at least two data disks */
+ for (i = 0; i < size; i += 16) {
+ /* last disk, without the multiplication by two */
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[l][i]));
+
+ asm volatile("movdqa %%xmm4,%0" : "=m" (p0[0]));
+ asm volatile("movdqa %%xmm4,%0" : "=m" (q0[0]));
+
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.low4[0]));
+ asm volatile("movdqa %xmm4,%xmm5");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pand %xmm7,%xmm5");
+
+ asm volatile("movdqa %0,%%xmm0" : : "m" (gfgenpshufb[l][0][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[l][0][1][0]));
+ asm volatile("pshufb %xmm4,%xmm0");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm7,%xmm0");
+
+ asm volatile("movdqa %0,%%xmm1" : : "m" (gfgenpshufb[l][1][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[l][1][1][0]));
+ asm volatile("pshufb %xmm4,%xmm1");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm7,%xmm1");
+
+ asm volatile("movdqa %0,%%xmm2" : : "m" (gfgenpshufb[l][2][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[l][2][1][0]));
+ asm volatile("pshufb %xmm4,%xmm2");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm7,%xmm2");
+
+ asm volatile("movdqa %0,%%xmm3" : : "m" (gfgenpshufb[l][3][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[l][3][1][0]));
+ asm volatile("pshufb %xmm4,%xmm3");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm7,%xmm3");
+
+ /* intermediate disks */
+ for (d = l-1; d > 0; --d) {
+ asm volatile("movdqa %0,%%xmm5" : : "m" (p0[0]));
+ asm volatile("movdqa %0,%%xmm6" : : "m" (q0[0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.poly[0]));
+
+ asm volatile("pxor %xmm4,%xmm4");
+ asm volatile("pcmpgtb %xmm6,%xmm4");
+ asm volatile("paddb %xmm6,%xmm6");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pxor %xmm4,%xmm6");
+
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[d][i]));
+
+ asm volatile("pxor %xmm4,%xmm5");
+ asm volatile("pxor %xmm4,%xmm6");
+ asm volatile("movdqa %%xmm5,%0" : "=m" (p0[0]));
+ asm volatile("movdqa %%xmm6,%0" : "=m" (q0[0]));
+
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.low4[0]));
+ asm volatile("movdqa %xmm4,%xmm5");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pand %xmm7,%xmm5");
+
+ asm volatile("movdqa %0,%%xmm6" : : "m" (gfgenpshufb[d][0][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[d][0][1][0]));
+ asm volatile("pshufb %xmm4,%xmm6");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm6,%xmm0");
+ asm volatile("pxor %xmm7,%xmm0");
+
+ asm volatile("movdqa %0,%%xmm6" : : "m" (gfgenpshufb[d][1][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[d][1][1][0]));
+ asm volatile("pshufb %xmm4,%xmm6");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm6,%xmm1");
+ asm volatile("pxor %xmm7,%xmm1");
+
+ asm volatile("movdqa %0,%%xmm6" : : "m" (gfgenpshufb[d][2][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[d][2][1][0]));
+ asm volatile("pshufb %xmm4,%xmm6");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm6,%xmm2");
+ asm volatile("pxor %xmm7,%xmm2");
+
+ asm volatile("movdqa %0,%%xmm6" : : "m" (gfgenpshufb[d][3][0][0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfgenpshufb[d][3][1][0]));
+ asm volatile("pshufb %xmm4,%xmm6");
+ asm volatile("pshufb %xmm5,%xmm7");
+ asm volatile("pxor %xmm6,%xmm3");
+ asm volatile("pxor %xmm7,%xmm3");
+ }
+
+ /* first disk with all coefficients at 1 */
+ asm volatile("movdqa %0,%%xmm5" : : "m" (p0[0]));
+ asm volatile("movdqa %0,%%xmm6" : : "m" (q0[0]));
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.poly[0]));
+
+ asm volatile("pxor %xmm4,%xmm4");
+ asm volatile("pcmpgtb %xmm6,%xmm4");
+ asm volatile("paddb %xmm6,%xmm6");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pxor %xmm4,%xmm6");
+
+ asm volatile("movdqa %0,%%xmm4" : : "m" (v[0][i]));
+ asm volatile("pxor %xmm4,%xmm0");
+ asm volatile("pxor %xmm4,%xmm1");
+ asm volatile("pxor %xmm4,%xmm2");
+ asm volatile("pxor %xmm4,%xmm3");
+ asm volatile("pxor %xmm4,%xmm5");
+ asm volatile("pxor %xmm4,%xmm6");
+
+ asm volatile("movntdq %%xmm5,%0" : "=m" (p[i]));
+ asm volatile("movntdq %%xmm6,%0" : "=m" (q[i]));
+ asm volatile("movntdq %%xmm0,%0" : "=m" (r[i]));
+ asm volatile("movntdq %%xmm1,%0" : "=m" (s[i]));
+ asm volatile("movntdq %%xmm2,%0" : "=m" (t[i]));
+ asm volatile("movntdq %%xmm3,%0" : "=m" (u[i]));
+ }
+
+ raid_asm_end();
+}
+#endif
+
+#ifdef CONFIG_X86_64
+/*
+ * GEN6 (hexa parity with Cauchy matrix) SSSE3 implementation
+ *
+ * Note that it uses registers beyond xmm7, meaning that x86-64 is required.
+ */
+void raid_gen6_ssse3ext(int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *q;
+ uint8_t *r;
+ uint8_t *s;
+ uint8_t *t;
+ uint8_t *u;
+ int d, l;
+ size_t i;
+
+ l = nd - 1;
+ p = v[nd];
+ q = v[nd+1];
+ r = v[nd+2];
+ s = v[nd+3];
+ t = v[nd+4];
+ u = v[nd+5];
+
+ /* special case with only one data disk */
+ if (l == 0) {
+ for (i = 0; i < 6; ++i)
+ memcpy(v[1+i], v[0], size);
+ return;
+ }
+
+ raid_asm_begin();
+
+ /* generic case with at least two data disks */
+ asm volatile("movdqa %0,%%xmm14" : : "m" (gfconst16.poly[0]));
+ asm volatile("movdqa %0,%%xmm15" : : "m" (gfconst16.low4[0]));
+
+ for (i = 0; i < size; i += 16) {
+ /* last disk, without the multiplication by two */
+ asm volatile("movdqa %0,%%xmm10" : : "m" (v[l][i]));
+
+ asm volatile("movdqa %xmm10,%xmm0");
+ asm volatile("movdqa %xmm10,%xmm1");
+
+ asm volatile("movdqa %xmm10,%xmm11");
+ asm volatile("psrlw $4,%xmm11");
+ asm volatile("pand %xmm15,%xmm10");
+ asm volatile("pand %xmm15,%xmm11");
+
+ asm volatile("movdqa %0,%%xmm2" : : "m" (gfgenpshufb[l][0][0][0]));
+ asm volatile("movdqa %0,%%xmm13" : : "m" (gfgenpshufb[l][0][1][0]));
+ asm volatile("pshufb %xmm10,%xmm2");
+ asm volatile("pshufb %xmm11,%xmm13");
+ asm volatile("pxor %xmm13,%xmm2");
+
+ asm volatile("movdqa %0,%%xmm3" : : "m" (gfgenpshufb[l][1][0][0]));
+ asm volatile("movdqa %0,%%xmm13" : : "m" (gfgenpshufb[l][1][1][0]));
+ asm volatile("pshufb %xmm10,%xmm3");
+ asm volatile("pshufb %xmm11,%xmm13");
+ asm volatile("pxor %xmm13,%xmm3");
+
+ asm volatile("movdqa %0,%%xmm4" : : "m" (gfgenpshufb[l][2][0][0]));
+ asm volatile("movdqa %0,%%xmm13" : : "m" (gfgenpshufb[l][2][1][0]));
+ asm volatile("pshufb %xmm10,%xmm4");
+ asm volatile("pshufb %xmm11,%xmm13");
+ asm volatile("pxor %xmm13,%xmm4");
+
+ asm volatile("movdqa %0,%%xmm5" : : "m" (gfgenpshufb[l][3][0][0]));
+ asm volatile("movdqa %0,%%xmm13" : : "m" (gfgenpshufb[l][3][1][0]));
+ asm volatile("pshufb %xmm10,%xmm5");
+ asm volatile("pshufb %xmm11,%xmm13");
+ asm volatile("pxor %xmm13,%xmm5");
+
+ /* intermediate disks */
+ for (d = l-1; d > 0; --d) {
+ asm volatile("movdqa %0,%%xmm10" : : "m" (v[d][i]));
+
+ asm volatile("pxor %xmm11,%xmm11");
+ asm volatile("pcmpgtb %xmm1,%xmm11");
+ asm volatile("paddb %xmm1,%xmm1");
+ asm volatile("pand %xmm14,%xmm11");
+ asm volatile("pxor %xmm11,%xmm1");
+
+ asm volatile("pxor %xmm10,%xmm0");
+ asm volatile("pxor %xmm10,%xmm1");
+
+ asm volatile("movdqa %xmm10,%xmm11");
+ asm volatile("psrlw $4,%xmm11");
+ asm volatile("pand %xmm15,%xmm10");
+ asm volatile("pand %xmm15,%xmm11");
+
+ asm volatile("movdqa %0,%%xmm12" : : "m" (gfgenpshufb[d][0][0][0]));
+ asm volatile("movdqa %0,%%xmm13" : : "m" (gfgenpshufb[d][0][1][0]));
+ asm volatile("pshufb %xmm10,%xmm12");
+ asm volatile("pshufb %xmm11,%xmm13");
+ asm volatile("pxor %xmm12,%xmm2");
+ asm volatile("pxor %xmm13,%xmm2");
+
+ asm volatile("movdqa %0,%%xmm12" : : "m" (gfgenpshufb[d][1][0][0]));
+ asm volatile("movdqa %0,%%xmm13" : : "m" (gfgenpshufb[d][1][1][0]));
+ asm volatile("pshufb %xmm10,%xmm12");
+ asm volatile("pshufb %xmm11,%xmm13");
+ asm volatile("pxor %xmm12,%xmm3");
+ asm volatile("pxor %xmm13,%xmm3");
+
+ asm volatile("movdqa %0,%%xmm12" : : "m" (gfgenpshufb[d][2][0][0]));
+ asm volatile("movdqa %0,%%xmm13" : : "m" (gfgenpshufb[d][2][1][0]));
+ asm volatile("pshufb %xmm10,%xmm12");
+ asm volatile("pshufb %xmm11,%xmm13");
+ asm volatile("pxor %xmm12,%xmm4");
+ asm volatile("pxor %xmm13,%xmm4");
+
+ asm volatile("movdqa %0,%%xmm12" : : "m" (gfgenpshufb[d][3][0][0]));
+ asm volatile("movdqa %0,%%xmm13" : : "m" (gfgenpshufb[d][3][1][0]));
+ asm volatile("pshufb %xmm10,%xmm12");
+ asm volatile("pshufb %xmm11,%xmm13");
+ asm volatile("pxor %xmm12,%xmm5");
+ asm volatile("pxor %xmm13,%xmm5");
+ }
+
+ /* first disk with all coefficients at 1 */
+ asm volatile("movdqa %0,%%xmm10" : : "m" (v[0][i]));
+
+ asm volatile("pxor %xmm11,%xmm11");
+ asm volatile("pcmpgtb %xmm1,%xmm11");
+ asm volatile("paddb %xmm1,%xmm1");
+ asm volatile("pand %xmm14,%xmm11");
+ asm volatile("pxor %xmm11,%xmm1");
+
+ asm volatile("pxor %xmm10,%xmm0");
+ asm volatile("pxor %xmm10,%xmm1");
+ asm volatile("pxor %xmm10,%xmm2");
+ asm volatile("pxor %xmm10,%xmm3");
+ asm volatile("pxor %xmm10,%xmm4");
+ asm volatile("pxor %xmm10,%xmm5");
+
+ asm volatile("movntdq %%xmm0,%0" : "=m" (p[i]));
+ asm volatile("movntdq %%xmm1,%0" : "=m" (q[i]));
+ asm volatile("movntdq %%xmm2,%0" : "=m" (r[i]));
+ asm volatile("movntdq %%xmm3,%0" : "=m" (s[i]));
+ asm volatile("movntdq %%xmm4,%0" : "=m" (t[i]));
+ asm volatile("movntdq %%xmm5,%0" : "=m" (u[i]));
+ }
+
+ raid_asm_end();
+}
+#endif
+
+#ifdef CONFIG_X86
+/*
+ * RAID recovery of one disk, SSSE3 implementation
+ */
+void raid_rec1_ssse3(int nr, int *id, int *ip, int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ uint8_t *p;
+ uint8_t *pa;
+ uint8_t G;
+ uint8_t V;
+ size_t i;
+
+ (void)nr; /* unused, it's always 1 */
+
+ /* if it's RAID5, use the faster function */
+ if (ip[0] == 0) {
+ raid_rec1of1(id, nd, size, vv);
+ return;
+ }
+
+#ifdef RAID_USE_RAID6_PQ
+ /* if it's RAID6 recovering with Q, use the faster function */
+ if (ip[0] == 1) {
+ raid6_datap_recov(nd + 2, size, id[0], vv);
+ return;
+ }
+#endif
+
+ /* set up the coefficient matrix */
+ G = A(ip[0], id[0]);
+
+ /* invert it to solve the system of linear equations */
+ V = inv(G);
+
+ /* compute delta parity */
+ raid_delta_gen(1, id, ip, nd, size, vv);
+
+ p = v[nd+ip[0]];
+ pa = v[id[0]];
+
+ raid_asm_begin();
+
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.low4[0]));
+ asm volatile("movdqa %0,%%xmm4" : : "m" (gfmulpshufb[V][0][0]));
+ asm volatile("movdqa %0,%%xmm5" : : "m" (gfmulpshufb[V][1][0]));
+
+ for (i = 0; i < size; i += 16) {
+ asm volatile("movdqa %0,%%xmm0" : : "m" (p[i]));
+ asm volatile("movdqa %0,%%xmm1" : : "m" (pa[i]));
+ asm volatile("movdqa %xmm4,%xmm2");
+ asm volatile("movdqa %xmm5,%xmm3");
+ asm volatile("pxor %xmm0,%xmm1");
+ asm volatile("movdqa %xmm1,%xmm0");
+ asm volatile("psrlw $4,%xmm1");
+ asm volatile("pand %xmm7,%xmm0");
+ asm volatile("pand %xmm7,%xmm1");
+ asm volatile("pshufb %xmm0,%xmm2");
+ asm volatile("pshufb %xmm1,%xmm3");
+ asm volatile("pxor %xmm3,%xmm2");
+ asm volatile("movdqa %%xmm2,%0" : "=m" (pa[i]));
+ }
+
+ raid_asm_end();
+}
+#endif
+
+#ifdef CONFIG_X86
+/*
+ * RAID recovery of two disks, SSSE3 implementation
+ */
+void raid_rec2_ssse3(int nr, int *id, int *ip, int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ const int N = 2;
+ uint8_t *p[N];
+ uint8_t *pa[N];
+ uint8_t G[N*N];
+ uint8_t V[N*N];
+ size_t i;
+ int j, k;
+
+ (void)nr; /* unused, it's always 2 */
+
+#ifdef RAID_USE_RAID6_PQ
+ /* if it's RAID6 recovering with P and Q, use the faster function */
+ if (ip[0] == 0 && ip[1] == 1) {
+ raid6_2data_recov(nd + 2, size, id[0], id[1], vv);
+ return;
+ }
+#endif
+
+ /* set up the coefficient matrix */
+ for (j = 0; j < N; ++j)
+ for (k = 0; k < N; ++k)
+ G[j*N+k] = A(ip[j], id[k]);
+
+ /* invert it to solve the system of linear equations */
+ raid_invert(G, V, N);
+
+ /* compute delta parity */
+ raid_delta_gen(N, id, ip, nd, size, vv);
+
+ for (j = 0; j < N; ++j) {
+ p[j] = v[nd+ip[j]];
+ pa[j] = v[id[j]];
+ }
+
+ raid_asm_begin();
+
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.low4[0]));
+
+ for (i = 0; i < size; i += 16) {
+ asm volatile("movdqa %0,%%xmm0" : : "m" (p[0][i]));
+ asm volatile("movdqa %0,%%xmm2" : : "m" (pa[0][i]));
+ asm volatile("movdqa %0,%%xmm1" : : "m" (p[1][i]));
+ asm volatile("movdqa %0,%%xmm3" : : "m" (pa[1][i]));
+ asm volatile("pxor %xmm2,%xmm0");
+ asm volatile("pxor %xmm3,%xmm1");
+
+ asm volatile("pxor %xmm6,%xmm6");
+
+ asm volatile("movdqa %0,%%xmm2" : : "m" (gfmulpshufb[V[0]][0][0]));
+ asm volatile("movdqa %0,%%xmm3" : : "m" (gfmulpshufb[V[0]][1][0]));
+ asm volatile("movdqa %xmm0,%xmm4");
+ asm volatile("movdqa %xmm0,%xmm5");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pand %xmm7,%xmm5");
+ asm volatile("pshufb %xmm4,%xmm2");
+ asm volatile("pshufb %xmm5,%xmm3");
+ asm volatile("pxor %xmm2,%xmm6");
+ asm volatile("pxor %xmm3,%xmm6");
+
+ asm volatile("movdqa %0,%%xmm2" : : "m" (gfmulpshufb[V[1]][0][0]));
+ asm volatile("movdqa %0,%%xmm3" : : "m" (gfmulpshufb[V[1]][1][0]));
+ asm volatile("movdqa %xmm1,%xmm4");
+ asm volatile("movdqa %xmm1,%xmm5");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pand %xmm7,%xmm5");
+ asm volatile("pshufb %xmm4,%xmm2");
+ asm volatile("pshufb %xmm5,%xmm3");
+ asm volatile("pxor %xmm2,%xmm6");
+ asm volatile("pxor %xmm3,%xmm6");
+
+ asm volatile("movdqa %%xmm6,%0" : "=m" (pa[0][i]));
+
+ asm volatile("pxor %xmm6,%xmm6");
+
+ asm volatile("movdqa %0,%%xmm2" : : "m" (gfmulpshufb[V[2]][0][0]));
+ asm volatile("movdqa %0,%%xmm3" : : "m" (gfmulpshufb[V[2]][1][0]));
+ asm volatile("movdqa %xmm0,%xmm4");
+ asm volatile("movdqa %xmm0,%xmm5");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pand %xmm7,%xmm5");
+ asm volatile("pshufb %xmm4,%xmm2");
+ asm volatile("pshufb %xmm5,%xmm3");
+ asm volatile("pxor %xmm2,%xmm6");
+ asm volatile("pxor %xmm3,%xmm6");
+
+ asm volatile("movdqa %0,%%xmm2" : : "m" (gfmulpshufb[V[3]][0][0]));
+ asm volatile("movdqa %0,%%xmm3" : : "m" (gfmulpshufb[V[3]][1][0]));
+ asm volatile("movdqa %xmm1,%xmm4");
+ asm volatile("movdqa %xmm1,%xmm5");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pand %xmm7,%xmm5");
+ asm volatile("pshufb %xmm4,%xmm2");
+ asm volatile("pshufb %xmm5,%xmm3");
+ asm volatile("pxor %xmm2,%xmm6");
+ asm volatile("pxor %xmm3,%xmm6");
+
+ asm volatile("movdqa %%xmm6,%0" : "=m" (pa[1][i]));
+ }
+
+ raid_asm_end();
+}
+#endif
+
+#ifdef CONFIG_X86
+/*
+ * RAID recovering SSSE3 implementation
+ */
+void raid_recX_ssse3(int nr, int *id, int *ip, int nd, size_t size, void **vv)
+{
+ uint8_t **v = (uint8_t **)vv;
+ int N = nr;
+ uint8_t *p[RAID_PARITY_MAX];
+ uint8_t *pa[RAID_PARITY_MAX];
+ uint8_t G[RAID_PARITY_MAX*RAID_PARITY_MAX];
+ uint8_t V[RAID_PARITY_MAX*RAID_PARITY_MAX];
+ size_t i;
+ int j, k;
+
+ /* setup the coefficients matrix */
+ for (j = 0; j < N; ++j)
+ for (k = 0; k < N; ++k)
+ G[j*N+k] = A(ip[j], id[k]);
+
+ /* invert it to solve the system of linear equations */
+ raid_invert(G, V, N);
+
+ /* compute delta parity */
+ raid_delta_gen(N, id, ip, nd, size, vv);
+
+ for (j = 0; j < N; ++j) {
+ p[j] = v[nd+ip[j]];
+ pa[j] = v[id[j]];
+ }
+
+ raid_asm_begin();
+
+ asm volatile("movdqa %0,%%xmm7" : : "m" (gfconst16.low4[0]));
+
+ for (i = 0; i < size; i += 16) {
+ uint8_t PD[RAID_PARITY_MAX][16] __aligned(16);
+
+ /* delta */
+ for (j = 0; j < N; ++j) {
+ asm volatile("movdqa %0,%%xmm0" : : "m" (p[j][i]));
+ asm volatile("movdqa %0,%%xmm1" : : "m" (pa[j][i]));
+ asm volatile("pxor %xmm1,%xmm0");
+ asm volatile("movdqa %%xmm0,%0" : "=m" (PD[j][0]));
+ }
+
+ /* reconstruct */
+ for (j = 0; j < N; ++j) {
+ asm volatile("pxor %xmm0,%xmm0");
+ asm volatile("pxor %xmm1,%xmm1");
+
+ for (k = 0; k < N; ++k) {
+ uint8_t m = V[j*N+k];
+
+ asm volatile("movdqa %0,%%xmm2" : : "m" (gfmulpshufb[m][0][0]));
+ asm volatile("movdqa %0,%%xmm3" : : "m" (gfmulpshufb[m][1][0]));
+ asm volatile("movdqa %0,%%xmm4" : : "m" (PD[k][0]));
+ asm volatile("movdqa %xmm4,%xmm5");
+ asm volatile("psrlw $4,%xmm5");
+ asm volatile("pand %xmm7,%xmm4");
+ asm volatile("pand %xmm7,%xmm5");
+ asm volatile("pshufb %xmm4,%xmm2");
+ asm volatile("pshufb %xmm5,%xmm3");
+ asm volatile("pxor %xmm2,%xmm0");
+ asm volatile("pxor %xmm3,%xmm1");
+ }
+
+ asm volatile("pxor %xmm1,%xmm0");
+ asm volatile("movdqa %%xmm0,%0" : "=m" (pa[j][i]));
+ }
+ }
+
+ raid_asm_end();
+}
+#endif
+
--
1.7.12.1
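
A note for readers of the SSSE3 recovery loops above: the pshufb sequence
multiplies 16 bytes at once by a constant in GF(2^8), looking up the low and
high nibble of each byte in two 16-entry tables and XORing the two results.
The scalar sketch below shows the same idea one byte at a time. The names
gf_mul, gf_split_table, gf_mul_split and gf_split_check are hypothetical and
not part of the patch, and the exact layout of the real gfmulpshufb table is
an assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bitwise multiply in GF(2^8), polynomial 0x11d. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t v = 0;

	while (b) {
		if (b & 1)
			v ^= a;
		/* multiply a by 2, reducing when the top bit overflows */
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return v;
}

/* Build the two 16-entry tables for a fixed multiplier m. */
static void gf_split_table(uint8_t m, uint8_t lo[16], uint8_t hi[16])
{
	int n;

	for (n = 0; n < 16; ++n) {
		lo[n] = gf_mul(m, (uint8_t)n);        /* m * low nibble */
		hi[n] = gf_mul(m, (uint8_t)(n << 4)); /* m * high nibble */
	}
}

/* Multiply one byte by m using the split tables (scalar pshufb analogue). */
static uint8_t gf_mul_split(const uint8_t lo[16], const uint8_t hi[16],
			    uint8_t b)
{
	return (uint8_t)(lo[b & 0x0f] ^ hi[b >> 4]);
}

/* Check the split tables against gf_mul() for every byte value. */
static void gf_split_check(uint8_t m)
{
	uint8_t lo[16], hi[16];
	int b;

	gf_split_table(m, lo, hi);
	for (b = 0; b < 256; ++b)
		assert(gf_mul_split(lo, hi, (uint8_t)b) == gf_mul(m, (uint8_t)b));
}
```

The split works because multiplication by a constant is linear over XOR, so
m*b equals m*(b & 0x0f) XOR m*(b & 0xf0). Calling gf_split_check() with any
multiplier should pass without tripping the assert.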

2014-02-24 21:16:22

by Andrea Mazzoleni

Subject: [PATCH v5 3/3] btrfs-progs: Adds new par3456 modes to support up to six parities

Extends mkfs.btrfs to support the new par1/2/3/4/5/6 modes to create
filesystems with up to six parities.
Replaces the raid6 code with a new reference function able to compute up
to six parities.
Replaces the existing BLOCK_GROUP_RAID5/6 with new PAR1/2/3/4/5/6 ones that
handle up to six parities, and updates all the code to use them.

Signed-off-by: Andrea Mazzoleni <[email protected]>
---
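A note for reviewers of mktables.c: the example values quoted in the
set_cauchy() comments can be checked with a few lines of standalone C. The
sketch below rebuilds single elements 1/(x_i + y_j) of the Cauchy part of the
matrix, before the final row normalisation. The helpers gf_mul, gf_inv,
cauchy_elem and cauchy_check are hypothetical names used only for this
illustration; the brute-force inverse is an assumption made for brevity and
is not how mktables.c fills gfinv[].

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise multiply in GF(2^8), polynomial 0x11d (same field as gfmul()). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t v = 0;

	while (b) {
		if (b & 1)
			v ^= a;
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return v;
}

/* Brute-force inverse: the b with a*b == 1. Returns 0 for a == 0. */
static uint8_t gf_inv(uint8_t a)
{
	int b;

	for (b = 1; b < 256; ++b)
		if (gf_mul(a, (uint8_t)b) == 1)
			return (uint8_t)b;
	return 0;
}

/* Element 1/(x_i + y_j) with x_i = 2^-i and y_j = 2^j, for j >= 1. */
static uint8_t cauchy_elem(int i, int j)
{
	uint8_t inv_x = 1, y = 1;
	int n;

	for (n = 0; n < i; ++n)
		inv_x = gf_mul(2, inv_x);	/* 2^i */
	for (n = 0; n < j; ++n)
		y = gf_mul(2, y);		/* 2^j */

	return gf_inv((uint8_t)(gf_inv(inv_x) ^ y));
}

/* Compare against the example values quoted in the set_cauchy() comments. */
static void cauchy_check(void)
{
	assert(gf_inv(2) == 142);		/* x_1 */
	assert(cauchy_elem(0, 1) == 244);	/* third row, first column */
	assert(cauchy_elem(1, 1) == 83);	/* third row, second column */
	/* after dividing the row by its first element: 83/244 = 245 */
	assert(gf_mul(cauchy_elem(1, 1), gf_inv(cauchy_elem(0, 1))) == 245);
}
```

Calling cauchy_check() should pass silently; the values 142, 244, 83 and 245
are the ones listed in the comments of set_cauchy().
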
Makefile | 14 ++-
chunk-recover.c | 18 +---
cmds-balance.c | 20 +++-
cmds-check.c | 7 +-
cmds-chunk.c | 18 +---
cmds-filesystem.c | 12 ++-
ctree.h | 42 ++++++++-
disk-io.h | 2 -
extent-tree.c | 3 +-
ioctl.h | 18 +++-
man/mkfs.btrfs.8.in | 4 +-
mkfs.c | 28 +++++-
mktables.c | 256 ++++++++++++++++++++++++++++++++++++++++++++++++++++
raid.c | 44 +++++++++
raid.h | 34 +++++++
raid6.c | 101 ---------------------
utils.c | 12 ++-
volumes.c | 112 ++++++++++-------------
volumes.h | 12 ++-
19 files changed, 530 insertions(+), 227 deletions(-)
create mode 100644 mktables.c
create mode 100644 raid.c
create mode 100644 raid.h
delete mode 100644 raid6.c

diff --git a/Makefile b/Makefile
index 0874a41..72c5c01 100644
--- a/Makefile
+++ b/Makefile
@@ -9,7 +9,7 @@ CFLAGS = -g -O1 -fno-strict-aliasing
objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \
root-tree.o dir-item.o file-item.o inode-item.o inode-map.o \
extent-cache.o extent_io.o volumes.o utils.o repair.o \
- qgroup.o raid6.o free-space-cache.o list_sort.o
+ qgroup.o raid.o tables.o free-space-cache.o list_sort.o
cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-check.o \
@@ -140,6 +140,10 @@ version.h:
@echo " [SH] $@"
$(Q)bash version.sh

+tables.c: mktables
+ @echo " [MK] $@"
+ $(Q)./mktables > tables.c
+
$(libs_shared): $(libbtrfs_objects) $(lib_links) send.h
@echo " [LD] $@"
$(Q)$(CC) $(CFLAGS) $(libbtrfs_objects) $(LDFLAGS) $(lib_LIBS) \
@@ -193,6 +197,10 @@ mkfs.btrfs: $(objects) $(libs) mkfs.o
@echo " [LD] $@"
$(Q)$(CC) $(CFLAGS) -o mkfs.btrfs $(objects) mkfs.o $(LDFLAGS) $(LIBS)

+mktables: $(libs) mktables.o
+ @echo " [LD] $@"
+ $(Q)$(CC) $(CFLAGS) -o mktables mktables.o $(LDFLAGS) $(LIBS)
+
mkfs.btrfs.static: $(static_objects) mkfs.static.o $(static_libbtrfs_objects)
@echo " [LD] $@"
$(Q)$(CC) $(STATIC_CFLAGS) -o mkfs.btrfs.static mkfs.static.o $(static_objects) \
@@ -225,8 +233,8 @@ clean: $(CLEANDIRS)
@echo "Cleaning"
$(Q)rm -f $(progs) cscope.out *.o *.o.d btrfs-convert btrfs-image btrfs-select-super \
btrfs-zero-log btrfstune dir-test ioctl-test quick-test send-test btrfsck \
- btrfs.static mkfs.btrfs.static btrfs-calc-size \
- version.h $(check_defs) \
+ btrfs.static mkfs.btrfs.static btrfs-calc-size mktables \
+ version.h tables.c $(check_defs) \
$(libs) $(lib_links)

$(CLEANDIRS):
diff --git a/chunk-recover.c b/chunk-recover.c
index bcde39e..cec14cd 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -1327,8 +1327,7 @@ static int calc_num_stripes(u64 type)
{
if (type & (BTRFS_BLOCK_GROUP_RAID0 |
BTRFS_BLOCK_GROUP_RAID10 |
- BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6))
+ BTRFS_BLOCK_GROUP_PARX))
return 0;
else if (type & (BTRFS_BLOCK_GROUP_RAID1 |
BTRFS_BLOCK_GROUP_DUP))
@@ -1404,13 +1403,8 @@ static int btrfs_calc_stripe_index(struct chunk_record *chunk, u64 logical)
} else if (chunk->type_flags & BTRFS_BLOCK_GROUP_RAID10) {
index = stripe_nr % (chunk->num_stripes / chunk->sub_stripes);
index *= chunk->sub_stripes;
- } else if (chunk->type_flags & BTRFS_BLOCK_GROUP_RAID5) {
- nr_data_stripes = chunk->num_stripes - 1;
- index = stripe_nr % nr_data_stripes;
- stripe_nr /= nr_data_stripes;
- index = (index + stripe_nr) % chunk->num_stripes;
- } else if (chunk->type_flags & BTRFS_BLOCK_GROUP_RAID6) {
- nr_data_stripes = chunk->num_stripes - 2;
+ } else if (chunk->type_flags & BTRFS_BLOCK_GROUP_PARX) {
+ nr_data_stripes = chunk->num_stripes - btrfs_flags_par(chunk->type_flags);
index = stripe_nr % nr_data_stripes;
stripe_nr /= nr_data_stripes;
index = (index + stripe_nr) % chunk->num_stripes;
@@ -1503,8 +1497,7 @@ no_extent_record:
if (list_empty(&devexts))
return 0;

- if (chunk->type_flags & (BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6)) {
+ if (chunk->type_flags & BTRFS_BLOCK_GROUP_PARX) {
/* Fixme: try to recover the order by the parity block. */
list_splice_tail(&devexts, &chunk->dextents);
return -EINVAL;
@@ -1540,8 +1533,7 @@ no_extent_record:

#define BTRFS_ORDERED_RAID (BTRFS_BLOCK_GROUP_RAID0 | \
BTRFS_BLOCK_GROUP_RAID10 | \
- BTRFS_BLOCK_GROUP_RAID5 | \
- BTRFS_BLOCK_GROUP_RAID6)
+ BTRFS_BLOCK_GROUP_PARX)

static int btrfs_rebuild_chunk_stripes(struct recover_control *rc,
struct chunk_record *chunk)
diff --git a/cmds-balance.c b/cmds-balance.c
index a151475..7d116bb 100644
--- a/cmds-balance.c
+++ b/cmds-balance.c
@@ -48,10 +48,22 @@ static int parse_one_profile(const char *profile, u64 *flags)
*flags |= BTRFS_BLOCK_GROUP_RAID1;
} else if (!strcmp(profile, "raid10")) {
*flags |= BTRFS_BLOCK_GROUP_RAID10;
- } else if (!strcmp(profile, "raid5")) {
- *flags |= BTRFS_BLOCK_GROUP_RAID5;
- } else if (!strcmp(profile, "raid6")) {
- *flags |= BTRFS_BLOCK_GROUP_RAID6;
+ } else if (!strcmp(profile, "raid5")) { /* synonym of "par1" */
+ *flags |= BTRFS_BLOCK_GROUP_PAR1;
+ } else if (!strcmp(profile, "raid6")) { /* synonym of "par2" */
+ *flags |= BTRFS_BLOCK_GROUP_PAR2;
+ } else if (!strcmp(profile, "par1")) {
+ *flags |= BTRFS_BLOCK_GROUP_PAR1;
+ } else if (!strcmp(profile, "par2")) {
+ *flags |= BTRFS_BLOCK_GROUP_PAR2;
+ } else if (!strcmp(profile, "par3")) {
+ *flags |= BTRFS_BLOCK_GROUP_PAR3;
+ } else if (!strcmp(profile, "par4")) {
+ *flags |= BTRFS_BLOCK_GROUP_PAR4;
+ } else if (!strcmp(profile, "par5")) {
+ *flags |= BTRFS_BLOCK_GROUP_PAR5;
+ } else if (!strcmp(profile, "par6")) {
+ *flags |= BTRFS_BLOCK_GROUP_PAR6;
} else if (!strcmp(profile, "dup")) {
*flags |= BTRFS_BLOCK_GROUP_DUP;
} else if (!strcmp(profile, "single")) {
diff --git a/cmds-check.c b/cmds-check.c
index a65670e..46e1a83 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -5189,12 +5189,9 @@ u64 calc_stripe_length(u64 type, u64 length, int num_stripes)
} else if (type & BTRFS_BLOCK_GROUP_RAID10) {
stripe_size = length * 2;
stripe_size /= num_stripes;
- } else if (type & BTRFS_BLOCK_GROUP_RAID5) {
+ } else if (type & BTRFS_BLOCK_GROUP_PARX) {
stripe_size = length;
- stripe_size /= (num_stripes - 1);
- } else if (type & BTRFS_BLOCK_GROUP_RAID6) {
- stripe_size = length;
- stripe_size /= (num_stripes - 2);
+ stripe_size /= num_stripes - btrfs_flags_par(type);
} else {
stripe_size = length;
}
diff --git a/cmds-chunk.c b/cmds-chunk.c
index 4d7fce0..b4c067d 100644
--- a/cmds-chunk.c
+++ b/cmds-chunk.c
@@ -1347,8 +1347,7 @@ static int calc_num_stripes(u64 type)
{
if (type & (BTRFS_BLOCK_GROUP_RAID0 |
BTRFS_BLOCK_GROUP_RAID10 |
- BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6))
+ BTRFS_BLOCK_GROUP_PARX))
return 0;
else if (type & (BTRFS_BLOCK_GROUP_RAID1 |
BTRFS_BLOCK_GROUP_DUP))
@@ -1424,13 +1423,8 @@ static int btrfs_calc_stripe_index(struct chunk_record *chunk, u64 logical)
} else if (chunk->type_flags & BTRFS_BLOCK_GROUP_RAID10) {
index = stripe_nr % (chunk->num_stripes / chunk->sub_stripes);
index *= chunk->sub_stripes;
- } else if (chunk->type_flags & BTRFS_BLOCK_GROUP_RAID5) {
- nr_data_stripes = chunk->num_stripes - 1;
- index = stripe_nr % nr_data_stripes;
- stripe_nr /= nr_data_stripes;
- index = (index + stripe_nr) % chunk->num_stripes;
- } else if (chunk->type_flags & BTRFS_BLOCK_GROUP_RAID6) {
- nr_data_stripes = chunk->num_stripes - 2;
+ } else if (chunk->type_flags & BTRFS_BLOCK_GROUP_PARX) {
+ nr_data_stripes = chunk->num_stripes - btrfs_flags_par(chunk->type_flags);
index = stripe_nr % nr_data_stripes;
stripe_nr /= nr_data_stripes;
index = (index + stripe_nr) % chunk->num_stripes;
@@ -1523,8 +1517,7 @@ no_extent_record:
if (list_empty(&devexts))
return 0;

- if (chunk->type_flags & (BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6)) {
+ if (chunk->type_flags & BTRFS_BLOCK_GROUP_PARX) {
/* Fixme: try to recover the order by the parity block. */
list_splice_tail(&devexts, &chunk->dextents);
return -EINVAL;
@@ -1560,8 +1553,7 @@ no_extent_record:

#define BTRFS_ORDERED_RAID (BTRFS_BLOCK_GROUP_RAID0 | \
BTRFS_BLOCK_GROUP_RAID10 | \
- BTRFS_BLOCK_GROUP_RAID5 | \
- BTRFS_BLOCK_GROUP_RAID6)
+ BTRFS_BLOCK_GROUP_PARX)

static int btrfs_rebuild_chunk_stripes(struct recover_control *rc,
struct chunk_record *chunk)
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 1c1926b..861cbb3 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -142,10 +142,18 @@ static char *group_profile_str(u64 flag)
return "RAID0";
case BTRFS_BLOCK_GROUP_RAID1:
return "RAID1";
- case BTRFS_BLOCK_GROUP_RAID5:
+ case BTRFS_BLOCK_GROUP_PAR1:
return "RAID5";
- case BTRFS_BLOCK_GROUP_RAID6:
+ case BTRFS_BLOCK_GROUP_PAR2:
return "RAID6";
+ case BTRFS_BLOCK_GROUP_PAR3:
+ return "PAR3";
+ case BTRFS_BLOCK_GROUP_PAR4:
+ return "PAR4";
+ case BTRFS_BLOCK_GROUP_PAR5:
+ return "PAR5";
+ case BTRFS_BLOCK_GROUP_PAR6:
+ return "PAR6";
case BTRFS_BLOCK_GROUP_DUP:
return "DUP";
case BTRFS_BLOCK_GROUP_RAID10:
diff --git a/ctree.h b/ctree.h
index 2117374..4d2d1b6 100644
--- a/ctree.h
+++ b/ctree.h
@@ -470,6 +470,7 @@ struct btrfs_super_block {
#define BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF (1ULL << 6)
#define BTRFS_FEATURE_INCOMPAT_RAID56 (1ULL << 7)
#define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA (1ULL << 8)
+#define BTRFS_FEATURE_INCOMPAT_PAR3456 (1ULL << 10)

#define BTRFS_FEATURE_COMPAT_SUPP 0ULL
@@ -482,7 +483,8 @@ struct btrfs_super_block {
BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF | \
BTRFS_FEATURE_INCOMPAT_RAID56 | \
BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS | \
- BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
+ BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \
+ BTRFS_FEATURE_INCOMPAT_PAR3456)

/*
* A leaf is full of items. offset and size tell us where to find
@@ -830,8 +832,39 @@ struct btrfs_csum_item {
#define BTRFS_BLOCK_GROUP_RAID1 (1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP (1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10 (1ULL << 6)
-#define BTRFS_BLOCK_GROUP_RAID5 (1ULL << 7)
-#define BTRFS_BLOCK_GROUP_RAID6 (1ULL << 8)
+#define BTRFS_BLOCK_GROUP_PAR1 (1ULL << 7)
+#define BTRFS_BLOCK_GROUP_PAR2 (1ULL << 8)
+#define BTRFS_BLOCK_GROUP_PAR3 (1ULL << 9)
+#define BTRFS_BLOCK_GROUP_PAR4 (1ULL << 10)
+#define BTRFS_BLOCK_GROUP_PAR5 (1ULL << 11)
+#define BTRFS_BLOCK_GROUP_PAR6 (1ULL << 12)
+
+/* tags for all the parity groups */
+#define BTRFS_BLOCK_GROUP_PARX (BTRFS_BLOCK_GROUP_PAR1 | \
+ BTRFS_BLOCK_GROUP_PAR2 | \
+ BTRFS_BLOCK_GROUP_PAR3 | \
+ BTRFS_BLOCK_GROUP_PAR4 | \
+ BTRFS_BLOCK_GROUP_PAR5 | \
+ BTRFS_BLOCK_GROUP_PAR6)
+
+/* gets the parity number from the parity group */
+static inline int btrfs_flags_par(unsigned group)
+{
+ switch (group & BTRFS_BLOCK_GROUP_PARX) {
+ case BTRFS_BLOCK_GROUP_PAR1: return 1;
+ case BTRFS_BLOCK_GROUP_PAR2: return 2;
+ case BTRFS_BLOCK_GROUP_PAR3: return 3;
+ case BTRFS_BLOCK_GROUP_PAR4: return 4;
+ case BTRFS_BLOCK_GROUP_PAR5: return 5;
+ case BTRFS_BLOCK_GROUP_PAR6: return 6;
+ }
+
+ /* ensures that no multiple groups are defined */
+ BUG_ON(group & BTRFS_BLOCK_GROUP_PARX);
+
+ return 0;
+}
+
#define BTRFS_BLOCK_GROUP_RESERVED BTRFS_AVAIL_ALLOC_BIT_SINGLE

#define BTRFS_BLOCK_GROUP_TYPE_MASK (BTRFS_BLOCK_GROUP_DATA | \
@@ -840,8 +873,7 @@ struct btrfs_csum_item {

#define BTRFS_BLOCK_GROUP_PROFILE_MASK (BTRFS_BLOCK_GROUP_RAID0 | \
BTRFS_BLOCK_GROUP_RAID1 | \
- BTRFS_BLOCK_GROUP_RAID5 | \
- BTRFS_BLOCK_GROUP_RAID6 | \
+ BTRFS_BLOCK_GROUP_PARX | \
BTRFS_BLOCK_GROUP_DUP | \
BTRFS_BLOCK_GROUP_RAID10)

diff --git a/disk-io.h b/disk-io.h
index ca6af2d..27e3dc4 100644
--- a/disk-io.h
+++ b/disk-io.h
@@ -110,5 +110,3 @@ int write_and_map_eb(struct btrfs_trans_handle *trans, struct btrfs_root *root,
struct extent_buffer *eb);
#endif

-/* raid6.c */
-void raid6_gen_syndrome(int disks, size_t bytes, void **ptrs);
diff --git a/extent-tree.c b/extent-tree.c
index 7860d1d..98a8cb4 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1862,8 +1862,7 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
u64 extra_flags = flags & (BTRFS_BLOCK_GROUP_RAID0 |
BTRFS_BLOCK_GROUP_RAID1 |
BTRFS_BLOCK_GROUP_RAID10 |
- BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6 |
+ BTRFS_BLOCK_GROUP_PARX |
BTRFS_BLOCK_GROUP_DUP);
if (extra_flags) {
if (flags & BTRFS_BLOCK_GROUP_DATA)
diff --git a/ioctl.h b/ioctl.h
index a589cd7..f798d22 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -466,7 +466,11 @@ enum btrfs_err_code {
BTRFS_ERROR_DEV_TGT_REPLACE,
BTRFS_ERROR_DEV_MISSING_NOT_FOUND,
BTRFS_ERROR_DEV_ONLY_WRITABLE,
- BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS
+ BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS,
+ BTRFS_ERROR_DEV_PAR3_MIN_NOT_MET,
+ BTRFS_ERROR_DEV_PAR4_MIN_NOT_MET,
+ BTRFS_ERROR_DEV_PAR5_MIN_NOT_MET,
+ BTRFS_ERROR_DEV_PAR6_MIN_NOT_MET
};

/* An error code to error string mapping for the kernel
@@ -480,9 +484,9 @@ static inline char *btrfs_err_str(enum btrfs_err_code err_code)
case BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET:
return "unable to go below four devices on raid10";
case BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET:
- return "unable to go below three devices on raid5";
+ return "unable to go below two devices on raid5/par1";
case BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET:
- return "unable to go below four devices on raid6";
+ return "unable to go below three devices on raid6/par2";
case BTRFS_ERROR_DEV_TGT_REPLACE:
return "unable to remove the dev_replace target dev";
case BTRFS_ERROR_DEV_MISSING_NOT_FOUND:
@@ -492,6 +496,14 @@ static inline char *btrfs_err_str(enum btrfs_err_code err_code)
case BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS:
return "add/delete/balance/replace/resize operation "
"in progress";
+ case BTRFS_ERROR_DEV_PAR3_MIN_NOT_MET:
+ return "unable to go below four devices on par3";
+ case BTRFS_ERROR_DEV_PAR4_MIN_NOT_MET:
+ return "unable to go below five devices on par4";
+ case BTRFS_ERROR_DEV_PAR5_MIN_NOT_MET:
+ return "unable to go below six devices on par5";
+ case BTRFS_ERROR_DEV_PAR6_MIN_NOT_MET:
+ return "unable to go below seven devices on par6";
default:
return NULL;
}
diff --git a/man/mkfs.btrfs.8.in b/man/mkfs.btrfs.8.in
index b54e935..e3f4ec7 100644
--- a/man/mkfs.btrfs.8.in
+++ b/man/mkfs.btrfs.8.in
@@ -38,7 +38,9 @@ mkfs.btrfs uses all the available storage for the filesystem.
.TP
\fB\-d\fR, \fB\-\-data \fItype\fR
Specify how the data must be spanned across the devices specified. Valid
-values are raid0, raid1, raid5, raid6, raid10 or single.
+values are raid0, raid1, raid5, raid6, raid10, par1, par2, par3, par4, par5,
+par6 or single. The parX values enable RAID for up to six parity levels.
+Note that raid5 and raid6 are synonyms of par1 and par2.
.TP
\fB\-f\fR, \fB\-\-force\fR
Force overwrite when an existing filesystem is detected on the device.
diff --git a/mkfs.c b/mkfs.c
index 33369f9..661e59f 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -276,7 +276,7 @@ static void print_usage(void)
fprintf(stderr, "options:\n");
fprintf(stderr, "\t -A --alloc-start the offset to start the FS\n");
fprintf(stderr, "\t -b --byte-count total number of bytes in the FS\n");
- fprintf(stderr, "\t -d --data data profile, raid0, raid1, raid5, raid6, raid10, dup or single\n");
+ fprintf(stderr, "\t -d --data data profile, raid0, raid1, raid5, raid6, par[1,2,3,4,5,6], raid10, dup or single\n");
fprintf(stderr, "\t -f --force force overwrite of existing filesystem\n");
fprintf(stderr, "\t -l --leafsize size of btree leaves\n");
fprintf(stderr, "\t -L --label set a label\n");
@@ -306,9 +306,21 @@ static u64 parse_profile(char *s)
} else if (strcmp(s, "raid1") == 0) {
return BTRFS_BLOCK_GROUP_RAID1;
} else if (strcmp(s, "raid5") == 0) {
- return BTRFS_BLOCK_GROUP_RAID5;
+ return BTRFS_BLOCK_GROUP_PAR1;
} else if (strcmp(s, "raid6") == 0) {
- return BTRFS_BLOCK_GROUP_RAID6;
+ return BTRFS_BLOCK_GROUP_PAR2;
+ } else if (strcmp(s, "par1") == 0) {
+ return BTRFS_BLOCK_GROUP_PAR1;
+ } else if (strcmp(s, "par2") == 0) {
+ return BTRFS_BLOCK_GROUP_PAR2;
+ } else if (strcmp(s, "par3") == 0) {
+ return BTRFS_BLOCK_GROUP_PAR3;
+ } else if (strcmp(s, "par4") == 0) {
+ return BTRFS_BLOCK_GROUP_PAR4;
+ } else if (strcmp(s, "par5") == 0) {
+ return BTRFS_BLOCK_GROUP_PAR5;
+ } else if (strcmp(s, "par6") == 0) {
+ return BTRFS_BLOCK_GROUP_PAR6;
} else if (strcmp(s, "raid10") == 0) {
return BTRFS_BLOCK_GROUP_RAID10;
} else if (strcmp(s, "dup") == 0) {
@@ -1147,6 +1159,8 @@ static const struct btrfs_fs_feature {
"raid56 extended format" },
{ "skinny-metadata", BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA,
"reduced-size metadata extent refs" },
+ { "par3456", BTRFS_FEATURE_INCOMPAT_PAR3456,
+ "raid support with up to six parities" },
/* Keep this one last */
{ "list-all", BTRFS_FEATURE_LIST_ALL, NULL }
};
@@ -1491,10 +1505,16 @@ int main(int ac, char **av)
features |= BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS;

if ((data_profile | metadata_profile) &
- (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)) {
+ (BTRFS_BLOCK_GROUP_PAR1 | BTRFS_BLOCK_GROUP_PAR2)) {
features |= BTRFS_FEATURE_INCOMPAT_RAID56;
}

+ if ((data_profile | metadata_profile) &
+ (BTRFS_BLOCK_GROUP_PAR3 | BTRFS_BLOCK_GROUP_PAR4
+ | BTRFS_BLOCK_GROUP_PAR5 | BTRFS_BLOCK_GROUP_PAR6)) {
+ features |= BTRFS_FEATURE_INCOMPAT_PAR3456;
+ }
+
process_fs_features(features);

ret = make_btrfs(fd, file, label, blocks, dev_block_count,
diff --git a/mktables.c b/mktables.c
new file mode 100644
index 0000000..21c0222
--- /dev/null
+++ b/mktables.c
@@ -0,0 +1,256 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+
+/**
+ * Multiplication a*b in GF(2^8).
+ */
+static uint8_t gfmul(uint8_t a, uint8_t b)
+{
+ uint8_t v;
+
+ v = 0;
+ while (b) {
+ if ((b & 1) != 0)
+ v ^= a;
+
+ if ((a & 0x80) != 0) {
+ a <<= 1;
+ a ^= 0x1d;
+ } else {
+ a <<= 1;
+ }
+
+ b >>= 1;
+ }
+
+ return v;
+}
+
+/**
+ * Inversion (1/a) in GF(2^8).
+ */
+uint8_t gfinv[256];
+
+/**
+ * Number of parities.
+ * This is the number of rows of the generator matrix.
+ */
+#define PARITY 6
+
+/**
+ * Number of disks.
+ * This is the number of columns of the generator matrix.
+ */
+#define DISK (257-PARITY)
+
+/**
+ * Setup the Cauchy matrix used to generate the parity.
+ */
+static void set_cauchy(uint8_t *matrix)
+{
+ int i, j;
+ uint8_t inv_x, y;
+
+ /*
+ * The first row of the generator matrix is formed by all 1.
+ *
+ * The generator matrix is an Extended Cauchy matrix built from
+ * a Cauchy matrix adding at the top a row of all 1.
+ *
+ * Extending a Cauchy matrix in this way maintains the MDS property
+ * of the matrix.
+ *
+ * For example, considering a generator matrix of 4x6 we have now:
+ *
+ * 1 1 1 1 1 1
+ * - - - - - -
+ * - - - - - -
+ * - - - - - -
+ */
+ for (i = 0; i < DISK; ++i)
+ matrix[0*DISK+i] = 1;
+
+ /*
+ * Second row is formed with powers 2^i, and it's the first
+ * row of the Cauchy matrix.
+ *
+ * Each element of the Cauchy matrix is in the form 1/(x_i + y_j)
+ * where all x_i and y_j must be different for any i and j.
+ *
+ * For the first row with j=0, we choose x_i = 2^-i and y_0 = 0
+ * and we obtain a first row formed as:
+ *
+ * 1/(x_i + y_0) = 1/(2^-i + 0) = 2^i
+ *
+ * with 2^-i != 0 for any i
+ *
+ * In the example we get:
+ *
+ * x_0 = 1
+ * x_1 = 142
+ * x_2 = 71
+ * x_3 = 173
+ * x_4 = 216
+ * x_5 = 108
+ * y_0 = 0
+ *
+ * with the matrix:
+ *
+ * 1 1 1 1 1 1
+ * 1 2 4 8 16 32
+ * - - - - - -
+ * - - - - - -
+ */
+ inv_x = 1;
+ for (i = 0; i < DISK; ++i) {
+ matrix[1*DISK+i] = inv_x;
+ inv_x = gfmul(2, inv_x);
+ }
+
+ /*
+ * The rest of the Cauchy matrix is formed choosing for each row j
+ * a new y_j = 2^j and reusing the x_i already assigned in the first
+ * row obtaining :
+ *
+ * 1/(x_i + y_j) = 1/(2^-i + 2^j)
+ *
+ * with 2^-i + 2^j != 0 for any i,j with i>=0,j>=1,i+j<255
+ *
+ * In the example we get:
+ *
+ * y_1 = 2
+ * y_2 = 4
+ *
+ * with the matrix:
+ *
+ * 1 1 1 1 1 1
+ * 1 2 4 8 16 32
+ * 244 83 78 183 118 47
+ * 167 39 213 59 153 82
+ */
+ y = 2;
+ for (j = 0; j < PARITY-2; ++j) {
+ inv_x = 1;
+ for (i = 0; i < DISK; ++i) {
+ uint8_t x = gfinv[inv_x];
+ matrix[(j+2)*DISK+i] = gfinv[y ^ x];
+ inv_x = gfmul(2, inv_x);
+ }
+
+ y = gfmul(2, y);
+ }
+
+ /*
+ * Finally we adjust the matrix multiplying each row by
+ * the inverse of the first element in the row.
+ *
+ * Also this operation maintains the MDS property of the matrix.
+ *
+ * Resulting in:
+ *
+ * 1 1 1 1 1 1
+ * 1 2 4 8 16 32
+ * 1 245 210 196 154 113
+ * 1 187 166 215 7 106
+ */
+ for (j = 0; j < PARITY-2; ++j) {
+ uint8_t f = gfinv[matrix[(j+2)*DISK]];
+
+ for (i = 0; i < DISK; ++i)
+ matrix[(j+2)*DISK+i] = gfmul(matrix[(j+2)*DISK+i], f);
+ }
+}
+
+int main(void)
+{
+ uint8_t v;
+ int i, j, p;
+ uint8_t matrix[PARITY * 256];
+
+ printf("/*\n");
+ printf(" * Copyright (C) 2013 Andrea Mazzoleni\n");
+ printf(" *\n");
+ printf(" * This program is free software: you can redistribute it and/or modify\n");
+ printf(" * it under the terms of the GNU General Public License as published by\n");
+ printf(" * the Free Software Foundation, either version 2 of the License, or\n");
+ printf(" * (at your option) any later version.\n");
+ printf(" *\n");
+ printf(" * This program is distributed in the hope that it will be useful,\n");
+ printf(" * but WITHOUT ANY WARRANTY; without even the implied warranty of\n");
+ printf(" * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\n");
+ printf(" * GNU General Public License for more details.\n");
+ printf(" */\n");
+ printf("\n");
+
+ printf("#include \"kerncompat.h\"\n");
+ printf("\n");
+
+ /* a*b */
+ printf("const u8 raid_gfmul[256][256] =\n");
+ printf("{\n");
+ for (i = 0; i < 256; ++i) {
+ printf("\t{\n");
+ for (j = 0; j < 256; ++j) {
+ if (j % 8 == 0)
+ printf("\t\t");
+ v = gfmul(i, j);
+ if (v == 1)
+ gfinv[i] = j;
+ printf("0x%02x,", (unsigned)v);
+ if (j % 8 == 7)
+ printf("\n");
+ else
+ printf(" ");
+ }
+ printf("\t},\n");
+ }
+ printf("};\n\n");
+
+ /* cauchy matrix */
+ set_cauchy(matrix);
+
+ printf("/**\n");
+ printf(" * Cauchy matrix used to generate parity.\n");
+ printf(" * This matrix is valid for up to %u parity with %u data disks.\n", PARITY, DISK);
+ printf(" *\n");
+ for (p = 0; p < PARITY; ++p) {
+ printf(" * ");
+ for (i = 0; i < DISK; ++i)
+ printf("%02x ", matrix[p*DISK+i]);
+ printf("\n");
+ }
+ printf(" */\n");
+ printf("const u8 raid_gfcauchy[%u][256] =\n", PARITY);
+ printf("{\n");
+ for (p = 0; p < PARITY; ++p) {
+ printf("\t{\n");
+ for (i = 0; i < DISK; ++i) {
+ if (i % 8 == 0)
+ printf("\t\t");
+ printf("0x%02x,", matrix[p*DISK+i]);
+ if (i % 8 == 7)
+ printf("\n");
+ else
+ printf(" ");
+ }
+ printf("\n\t},\n");
+ }
+ printf("};\n\n");
+
+ return 0;
+}
+
diff --git a/raid.c b/raid.c
new file mode 100644
index 0000000..2aa275e
--- /dev/null
+++ b/raid.c
@@ -0,0 +1,44 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include "raid.h"
+
+/* tables defined in tables.c */
+extern const u8 raid_gfmul[256][256];
+extern const u8 raid_gfcauchy[6][256];
+
+void raid_gen(int nd, int np, size_t size, void **vv)
+{
+ u8 **v = (u8 **)vv;
+ size_t i;
+
+ for (i = 0; i < size; ++i) {
+ u8 p[RAID_PARITY_MAX];
+ int j, d;
+
+ for (j = 0; j < np; ++j)
+ p[j] = 0;
+
+ for (d = 0; d < nd; ++d) {
+ u8 b = v[d][i];
+
+ for (j = 0; j < np; ++j)
+ p[j] ^= raid_gfmul[b][raid_gfcauchy[j][d]];
+ }
+
+ for (j = 0; j < np; ++j)
+ v[nd + j][i] = p[j];
+ }
+}
+
diff --git a/raid.h b/raid.h
new file mode 100644
index 0000000..83f8b25
--- /dev/null
+++ b/raid.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright (C) 2013 Andrea Mazzoleni
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __RAID_H
+#define __RAID_H
+
+#include "kerncompat.h"
+
+/*
+ * Max number of parities supported.
+ */
+#define RAID_PARITY_MAX 6
+
+/*
+ * Generate the RAID Cauchy parity.
+ *
+ * Note that this is the slow reference implementation.
+ * For a faster one and documentation see lib/raid/raid.c in the Linux Kernel.
+ */
+void raid_gen(int nd, int np, size_t size, void **vv);
+
+#endif
+
diff --git a/raid6.c b/raid6.c
deleted file mode 100644
index a6ee483..0000000
--- a/raid6.c
+++ /dev/null
@@ -1,101 +0,0 @@
-/* -*- linux-c -*- ------------------------------------------------------- *
- *
- * Copyright 2002-2004 H. Peter Anvin - All Rights Reserved
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation, Inc., 53 Temple Place Ste 330,
- * Boston MA 02111-1307, USA; either version 2 of the License, or
- * (at your option) any later version; incorporated herein by reference.
- *
- * ----------------------------------------------------------------------- */
-
-/*
- * raid6int1.c
- *
- * 1-way unrolled portable integer math RAID-6 instruction set
- *
- * This file was postprocessed using unroll.pl and then ported to userspace
- */
-#include <stdint.h>
-#include <unistd.h>
-#include "kerncompat.h"
-#include "ctree.h"
-#include "disk-io.h"
-
-/*
- * This is the C data type to use
- */
-
-/* Change this from BITS_PER_LONG if there is something better... */
-#if BITS_PER_LONG == 64
-# define NBYTES(x) ((x) * 0x0101010101010101UL)
-# define NSIZE 8
-# define NSHIFT 3
-typedef uint64_t unative_t;
-#else
-# define NBYTES(x) ((x) * 0x01010101U)
-# define NSIZE 4
-# define NSHIFT 2
-typedef uint32_t unative_t;
-#endif
-
-/*
- * These sub-operations are separate inlines since they can sometimes be
- * specially optimized using architecture-specific hacks.
- */
-
-/*
- * The SHLBYTE() operation shifts each byte left by 1, *not*
- * rolling over into the next byte
- */
-static inline __attribute_const__ unative_t SHLBYTE(unative_t v)
-{
- unative_t vv;
-
- vv = (v << 1) & NBYTES(0xfe);
- return vv;
-}
-
-/*
- * The MASK() operation returns 0xFF in any byte for which the high
- * bit is 1, 0x00 for any byte for which the high bit is 0.
- */
-static inline __attribute_const__ unative_t MASK(unative_t v)
-{
- unative_t vv;
-
- vv = v & NBYTES(0x80);
- vv = (vv << 1) - (vv >> 7); /* Overflow on the top bit is OK */
- return vv;
-}
-
-
-void raid6_gen_syndrome(int disks, size_t bytes, void **ptrs)
-{
- uint8_t **dptr = (uint8_t **)ptrs;
- uint8_t *p, *q;
- int d, z, z0;
-
- unative_t wd0, wq0, wp0, w10, w20;
-
- z0 = disks - 3; /* Highest data disk */
- p = dptr[z0+1]; /* XOR parity */
- q = dptr[z0+2]; /* RS syndrome */
-
- for ( d = 0 ; d < bytes ; d += NSIZE*1 ) {
- wq0 = wp0 = *(unative_t *)&dptr[z0][d+0*NSIZE];
- for ( z = z0-1 ; z >= 0 ; z-- ) {
- wd0 = *(unative_t *)&dptr[z][d+0*NSIZE];
- wp0 ^= wd0;
- w20 = MASK(wq0);
- w10 = SHLBYTE(wq0);
- w20 &= NBYTES(0x1d);
- w10 ^= w20;
- wq0 = w10 ^ wd0;
- }
- *(unative_t *)&p[d+NSIZE*0] = wp0;
- *(unative_t *)&q[d+NSIZE*0] = wq0;
- }
-}
-
diff --git a/utils.c b/utils.c
index f499023..52b090b 100644
--- a/utils.c
+++ b/utils.c
@@ -1856,13 +1856,19 @@ int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile,

switch (dev_cnt) {
default:
+ case 7:
+ allowed |= BTRFS_BLOCK_GROUP_PAR6;
+ case 6:
+ allowed |= BTRFS_BLOCK_GROUP_PAR5;
+ case 5:
+ allowed |= BTRFS_BLOCK_GROUP_PAR4;
case 4:
- allowed |= BTRFS_BLOCK_GROUP_RAID10;
+ allowed |= BTRFS_BLOCK_GROUP_RAID10 | BTRFS_BLOCK_GROUP_PAR3;
case 3:
- allowed |= BTRFS_BLOCK_GROUP_RAID6;
+ allowed |= BTRFS_BLOCK_GROUP_PAR2;
case 2:
allowed |= BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
- BTRFS_BLOCK_GROUP_RAID5;
+ BTRFS_BLOCK_GROUP_PAR1;
break;
case 1:
allowed |= BTRFS_BLOCK_GROUP_DUP;
diff --git a/volumes.c b/volumes.c
index c38da6c..b1fb7de 100644
--- a/volumes.c
+++ b/volumes.c
@@ -30,6 +30,7 @@
#include "print-tree.h"
#include "volumes.h"
#include "math.h"
+#include "raid.h"

struct stripe {
struct btrfs_device *dev;
@@ -38,12 +39,7 @@ struct stripe {

static inline int nr_parity_stripes(struct map_lookup *map)
{
- if (map->type & BTRFS_BLOCK_GROUP_RAID5)
- return 1;
- else if (map->type & BTRFS_BLOCK_GROUP_RAID6)
- return 2;
- else
- return 0;
+ return btrfs_flags_par(map->type);
}

static inline int nr_data_stripes(struct map_lookup *map)
@@ -51,8 +47,6 @@ static inline int nr_data_stripes(struct map_lookup *map)
return map->num_stripes - nr_parity_stripes(map);
}

-#define is_parity_stripe(x) ( ((x) == BTRFS_RAID5_P_STRIPE) || ((x) == BTRFS_RAID6_Q_STRIPE) )
-
static LIST_HEAD(fs_uuids);

static struct btrfs_device *__find_device(struct list_head *head, u64 devid,
@@ -643,10 +637,8 @@ static u64 chunk_bytes_by_type(u64 type, u64 calc_size, int num_stripes,
return calc_size;
else if (type & BTRFS_BLOCK_GROUP_RAID10)
return calc_size * (num_stripes / sub_stripes);
- else if (type & BTRFS_BLOCK_GROUP_RAID5)
- return calc_size * (num_stripes - 1);
- else if (type & BTRFS_BLOCK_GROUP_RAID6)
- return calc_size * (num_stripes - 2);
+ else if (type & BTRFS_BLOCK_GROUP_PARX)
+ return calc_size * (num_stripes - btrfs_flags_par(type));
else
return calc_size * num_stripes;
}
@@ -782,7 +774,7 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
}

if (type & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
- BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6 |
+ BTRFS_BLOCK_GROUP_PARX |
BTRFS_BLOCK_GROUP_RAID10 |
BTRFS_BLOCK_GROUP_DUP)) {
if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
@@ -822,20 +814,13 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
sub_stripes = 2;
min_stripes = 4;
}
- if (type & (BTRFS_BLOCK_GROUP_RAID5)) {
- num_stripes = btrfs_super_num_devices(info->super_copy);
- if (num_stripes < 2)
- return -ENOSPC;
- min_stripes = 2;
- stripe_len = find_raid56_stripe_len(num_stripes - 1,
- btrfs_super_stripesize(info->super_copy));
- }
- if (type & (BTRFS_BLOCK_GROUP_RAID6)) {
+ if (type & BTRFS_BLOCK_GROUP_PARX) {
+ min_stripes = 1 + btrfs_flags_par(type);
num_stripes = btrfs_super_num_devices(info->super_copy);
- if (num_stripes < 3)
+ if (num_stripes < min_stripes)
return -ENOSPC;
- min_stripes = 3;
- stripe_len = find_raid56_stripe_len(num_stripes - 2,
+
+ stripe_len = find_raid56_stripe_len(num_stripes - btrfs_flags_par(type),
btrfs_super_stripesize(info->super_copy));
}

@@ -1107,10 +1092,8 @@ int btrfs_num_copies(struct btrfs_mapping_tree *map_tree, u64 logical, u64 len)
ret = map->num_stripes;
else if (map->type & BTRFS_BLOCK_GROUP_RAID10)
ret = map->sub_stripes;
- else if (map->type & BTRFS_BLOCK_GROUP_RAID5)
- ret = 2;
- else if (map->type & BTRFS_BLOCK_GROUP_RAID6)
- ret = 3;
+ else if (map->type & BTRFS_BLOCK_GROUP_PARX)
+ ret = 1 + btrfs_flags_par(map->type);
else
ret = 1;
return ret;
@@ -1163,8 +1146,7 @@ int btrfs_rmap_block(struct btrfs_mapping_tree *map_tree,
length = ce->size / (map->num_stripes / map->sub_stripes);
else if (map->type & BTRFS_BLOCK_GROUP_RAID0)
length = ce->size / map->num_stripes;
- else if (map->type & (BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6)) {
+ else if (map->type & BTRFS_BLOCK_GROUP_PARX) {
length = ce->size / nr_data_stripes(map);
rmap_len = map->stripe_len * nr_data_stripes(map);
}
@@ -1294,9 +1276,9 @@ again:
stripes_required = map->sub_stripes;
}
}
- if (map->type & (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)
+ if ((map->type & BTRFS_BLOCK_GROUP_PARX)
&& multi_ret && ((rw & WRITE) || mirror_num > 1) && raid_map_ret) {
- /* RAID[56] write or recovery. Return all stripes */
+ /* PAR write or recovery. Return all stripes */
stripes_required = map->num_stripes;

/* Only allocate the map if we've already got a large enough multi_ret */
@@ -1330,7 +1312,7 @@ again:
stripe_offset = offset - stripe_offset;

if (map->type & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
- BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6 |
+ BTRFS_BLOCK_GROUP_PARX |
BTRFS_BLOCK_GROUP_RAID10 |
BTRFS_BLOCK_GROUP_DUP)) {
/* we limit the length of each bio to what fits in a stripe */
@@ -1369,14 +1351,14 @@ again:
multi->num_stripes = map->num_stripes;
else if (mirror_num)
stripe_index = mirror_num - 1;
- } else if (map->type & (BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6)) {
+ } else if (map->type & BTRFS_BLOCK_GROUP_PARX) {

if (raid_map) {
int rot;
u64 tmp;
u64 raid56_full_stripe_start;
u64 full_stripe_len = nr_data_stripes(map) * map->stripe_len;
+ int j;

/*
* align the start of our data stripe in the logical
@@ -1399,9 +1381,8 @@ again:
raid_map[(i+rot) % map->num_stripes] =
ce->start + (tmp + i) * map->stripe_len;

- raid_map[(i+rot) % map->num_stripes] = BTRFS_RAID5_P_STRIPE;
- if (map->type & BTRFS_BLOCK_GROUP_RAID6)
- raid_map[(i+rot+1) % map->num_stripes] = BTRFS_RAID6_Q_STRIPE;
+ for (j = 0; j < btrfs_flags_par(map->type); j++)
+ raid_map[(i+rot+j) % map->num_stripes] = BTRFS_RAID_PAR1_STRIPE + j;

*length = map->stripe_len;
stripe_index = 0;
@@ -1413,8 +1394,9 @@ again:

/*
* Mirror #0 or #1 means the original data block.
- * Mirror #2 is RAID5 parity block.
- * Mirror #3 is RAID6 Q block.
+ * Mirror #2 is RAID5/PAR1 P block.
+ * Mirror #3 is RAID6/PAR2 Q block.
+ * .. and so on up to PAR6
*/
if (mirror_num > 1)
stripe_index = nr_data_stripes(map) + mirror_num - 2;
@@ -1838,7 +1820,7 @@ static void split_eb_for_raid56(struct btrfs_fs_info *info,
int ret;

for (i = 0; i < num_stripes; i++) {
- if (raid_map[i] >= BTRFS_RAID5_P_STRIPE)
+ if (raid_map[i] >= BTRFS_RAID_PAR1_STRIPE)
break;

eb = malloc(sizeof(struct extent_buffer) + stripe_len);
@@ -1871,11 +1853,13 @@ int write_raid56_with_parity(struct btrfs_fs_info *info,
struct btrfs_multi_bio *multi,
u64 stripe_len, u64 *raid_map)
{
- struct extent_buffer **ebs, *p_eb = NULL, *q_eb = NULL;
+ struct extent_buffer **ebs;
+ struct extent_buffer *p_eb[RAID_PARITY_MAX];
int i;
int j;
int ret;
int alloc_size = eb->len;
+ int np;

ebs = kmalloc(sizeof(*ebs) * multi->num_stripes, GFP_NOFS);
BUG_ON(!ebs);
@@ -1883,12 +1867,16 @@ int write_raid56_with_parity(struct btrfs_fs_info *info,
if (stripe_len > alloc_size)
alloc_size = stripe_len;

+ np = 0;
+ for (i = 0; i < RAID_PARITY_MAX; i++)
+ p_eb[i] = NULL;
+
split_eb_for_raid56(info, eb, ebs, stripe_len, raid_map,
multi->num_stripes);

for (i = 0; i < multi->num_stripes; i++) {
struct extent_buffer *new_eb;
- if (raid_map[i] < BTRFS_RAID5_P_STRIPE) {
+ if (raid_map[i] < BTRFS_RAID_PAR1_STRIPE) {
ebs[i]->dev_bytenr = multi->stripes[i].physical;
ebs[i]->fd = multi->stripes[i].dev->fd;
multi->stripes[i].dev->total_ios++;
@@ -1902,35 +1890,33 @@ int write_raid56_with_parity(struct btrfs_fs_info *info,
multi->stripes[i].dev->total_ios++;
new_eb->len = stripe_len;

- if (raid_map[i] == BTRFS_RAID5_P_STRIPE)
- p_eb = new_eb;
- else if (raid_map[i] == BTRFS_RAID6_Q_STRIPE)
- q_eb = new_eb;
+ /* parity index */
+ j = raid_map[i] - BTRFS_RAID_PAR1_STRIPE;
+
+ BUG_ON(j < 0 || j >= RAID_PARITY_MAX);
+
+ p_eb[j] = new_eb;
+
+ /* keep track of the number of parities used */
+ if (j + 1 > np)
+ np = j + 1;
}
- if (q_eb) {
+
+ if (np != 0) {
void **pointers;

- pointers = kmalloc(sizeof(*pointers) * multi->num_stripes,
- GFP_NOFS);
+ pointers = kmalloc(sizeof(*pointers) * multi->num_stripes, GFP_NOFS);
BUG_ON(!pointers);

- ebs[multi->num_stripes - 2] = p_eb;
- ebs[multi->num_stripes - 1] = q_eb;
+ for (i = 0; i < np; i++)
+ ebs[multi->num_stripes - np + i] = p_eb[i];

for (i = 0; i < multi->num_stripes; i++)
pointers[i] = ebs[i]->data;

- raid6_gen_syndrome(multi->num_stripes, stripe_len, pointers);
+ raid_gen(multi->num_stripes - np, np, stripe_len, pointers);
+
kfree(pointers);
- } else {
- ebs[multi->num_stripes - 1] = p_eb;
- memcpy(p_eb->data, ebs[0]->data, stripe_len);
- for (j = 1; j < multi->num_stripes - 1; j++) {
- for (i = 0; i < stripe_len; i += sizeof(unsigned long)) {
- *(unsigned long *)(p_eb->data + i) ^=
- *(unsigned long *)(ebs[j]->data + i);
- }
- }
}

for (i = 0; i < multi->num_stripes; i++) {
diff --git a/volumes.h b/volumes.h
index 2802cb0..0a73084 100644
--- a/volumes.h
+++ b/volumes.h
@@ -137,9 +137,15 @@ struct map_lookup {
#define BTRFS_BALANCE_ARGS_CONVERT (1ULL << 8)
#define BTRFS_BALANCE_ARGS_SOFT (1ULL << 9)

-#define BTRFS_RAID5_P_STRIPE ((u64)-2)
-#define BTRFS_RAID6_Q_STRIPE ((u64)-1)
-
+/*
+ * Parity stripe indexes.
+ */
+#define BTRFS_RAID_PAR1_STRIPE ((u64)-6)
+#define BTRFS_RAID_PAR2_STRIPE ((u64)-5)
+#define BTRFS_RAID_PAR3_STRIPE ((u64)-4)
+#define BTRFS_RAID_PAR4_STRIPE ((u64)-3)
+#define BTRFS_RAID_PAR5_STRIPE ((u64)-2)
+#define BTRFS_RAID_PAR6_STRIPE ((u64)-1)

int __btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw,
u64 logical, u64 *length, u64 *type,
--
1.7.12.1

2014-02-24 21:17:02

by Andrea Mazzoleni

Subject: [PATCH v5 2/3] fs: btrfs: Adds new par3456 modes to support up to six parities

Removes the open-coded RAID5/6 parity generation and recovery logic from
raid56.c; it is now handled by the new raid_gen() and raid_rec() calls.
Replaces the faila/failb failure indexes with a fail[] vector, plus an
nr_fail counter, that tracks up to six failed stripes.
Replaces the existing BLOCK_GROUP_RAID5/6 flags with new PAR1/2/3/4/5/6 ones
that handle up to six parities, and updates all the code to use them.
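
For readers who want to try the bookkeeping described above outside the
kernel, here is a minimal userspace sketch. It is not the patch code:
flags_par(), fail_insert() and fail_add() are hypothetical stand-ins for
btrfs_flags_par(), raid_insert() and fail_rbio_index(), the flag values are
copied from the ctree.h hunk, assert() replaces BUG_ON(), and the ascending
ordering of fail[] is an assumption inferred from the way the recovery loops
walk the vector with a single ifail index.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from the ctree.h hunk of this patch. */
#define BLOCK_GROUP_PAR1 (1ULL << 7)
#define BLOCK_GROUP_PAR3 (1ULL << 9)
#define BLOCK_GROUP_PAR6 (1ULL << 12)
#define BLOCK_GROUP_PARX (0x3fULL << 7)	/* PAR1 | ... | PAR6 */

#define PARITY_MAX 6	/* stand-in for RAID_PARITY_MAX */

/*
 * Hypothetical stand-in for btrfs_flags_par(): returns the number of
 * parities selected by the profile flags, or 0 for non-parity profiles.
 */
static int flags_par(uint64_t flags)
{
	uint64_t par = flags & BLOCK_GROUP_PARX;
	int n;

	if (par == 0)
		return 0;

	/* at most one PARn bit may be set (BUG_ON in the patch) */
	assert((par & (par - 1)) == 0);

	for (n = 1; n <= PARITY_MAX; n++)
		if (par == (1ULL << (6 + n)))
			return n;

	return 0; /* not reached */
}

/*
 * Hypothetical stand-in for raid_insert(): inserts i into the vector v,
 * which holds n values in ascending order and has room for one more.
 */
static void fail_insert(int n, int *v, int i)
{
	int k = n;

	while (k > 0 && v[k - 1] > i) {
		v[k] = v[k - 1];
		k--;
	}
	v[k] = i;
}

/*
 * Records a failed stripe the way fail_rbio_index() does in the patch,
 * minus locking and error counters. Returns 0 if the failure is recorded
 * or was already known, -1 if the vector is already full.
 */
static int fail_add(int *fail, int *nr_fail, int failed)
{
	int i;

	for (i = 0; i < *nr_fail; i++)
		if (fail[i] == failed)
			return 0;

	if (*nr_fail == PARITY_MAX)
		return -1;

	fail_insert(*nr_fail, fail, failed);
	++*nr_fail;
	return 0;
}
```

Keeping fail[] sorted means a caller can match failed stripes against an
increasing stripe index without searching the whole vector each time.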

Signed-off-by: Andrea Mazzoleni <[email protected]>
---
fs/btrfs/Kconfig | 1 +
fs/btrfs/ctree.h | 50 ++++++--
fs/btrfs/disk-io.c | 7 +-
fs/btrfs/extent-tree.c | 67 +++++++----
fs/btrfs/inode.c | 3 +-
fs/btrfs/raid56.c | 273 ++++++++++++++-----------------------------
fs/btrfs/raid56.h | 19 ++-
fs/btrfs/scrub.c | 3 +-
fs/btrfs/volumes.c | 144 +++++++++++++++--------
include/trace/events/btrfs.h | 16 ++-
include/uapi/linux/btrfs.h | 19 ++-
11 files changed, 313 insertions(+), 289 deletions(-)

diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index a66768e..fb011b8 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -6,6 +6,7 @@ config BTRFS_FS
select ZLIB_DEFLATE
select LZO_COMPRESS
select LZO_DECOMPRESS
+ select RAID_CAUCHY
select RAID6_PQ
select XOR_BLOCKS

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2c1a42c..7e6d2bf 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -522,6 +522,7 @@ struct btrfs_super_block {
#define BTRFS_FEATURE_INCOMPAT_RAID56 (1ULL << 7)
#define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA (1ULL << 8)
#define BTRFS_FEATURE_INCOMPAT_NO_HOLES (1ULL << 9)
+#define BTRFS_FEATURE_INCOMPAT_PAR3456 (1ULL << 10)

#define BTRFS_FEATURE_COMPAT_SUPP 0ULL
#define BTRFS_FEATURE_COMPAT_SAFE_SET 0ULL
@@ -539,7 +540,8 @@ struct btrfs_super_block {
BTRFS_FEATURE_INCOMPAT_RAID56 | \
BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF | \
BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \
- BTRFS_FEATURE_INCOMPAT_NO_HOLES)
+ BTRFS_FEATURE_INCOMPAT_NO_HOLES | \
+ BTRFS_FEATURE_INCOMPAT_PAR3456)

#define BTRFS_FEATURE_INCOMPAT_SAFE_SET \
(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
@@ -983,8 +985,39 @@ struct btrfs_dev_replace_item {
#define BTRFS_BLOCK_GROUP_RAID1 (1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP (1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10 (1ULL << 6)
-#define BTRFS_BLOCK_GROUP_RAID5 (1ULL << 7)
-#define BTRFS_BLOCK_GROUP_RAID6 (1ULL << 8)
+#define BTRFS_BLOCK_GROUP_PAR1 (1ULL << 7)
+#define BTRFS_BLOCK_GROUP_PAR2 (1ULL << 8)
+#define BTRFS_BLOCK_GROUP_PAR3 (1ULL << 9)
+#define BTRFS_BLOCK_GROUP_PAR4 (1ULL << 10)
+#define BTRFS_BLOCK_GROUP_PAR5 (1ULL << 11)
+#define BTRFS_BLOCK_GROUP_PAR6 (1ULL << 12)
+
+/* tags for all the parity groups */
+#define BTRFS_BLOCK_GROUP_PARX (BTRFS_BLOCK_GROUP_PAR1 | \
+ BTRFS_BLOCK_GROUP_PAR2 | \
+ BTRFS_BLOCK_GROUP_PAR3 | \
+ BTRFS_BLOCK_GROUP_PAR4 | \
+ BTRFS_BLOCK_GROUP_PAR5 | \
+ BTRFS_BLOCK_GROUP_PAR6)
+
+/* gets the parity number from the parity group */
+static inline int btrfs_flags_par(unsigned group)
+{
+ switch (group & BTRFS_BLOCK_GROUP_PARX) {
+ case BTRFS_BLOCK_GROUP_PAR1: return 1;
+ case BTRFS_BLOCK_GROUP_PAR2: return 2;
+ case BTRFS_BLOCK_GROUP_PAR3: return 3;
+ case BTRFS_BLOCK_GROUP_PAR4: return 4;
+ case BTRFS_BLOCK_GROUP_PAR5: return 5;
+ case BTRFS_BLOCK_GROUP_PAR6: return 6;
+ }
+
+ /* ensures that at most one parity group is set */
+ BUG_ON(group & BTRFS_BLOCK_GROUP_PARX);
+
+ return 0;
+}
+
#define BTRFS_BLOCK_GROUP_RESERVED BTRFS_AVAIL_ALLOC_BIT_SINGLE

enum btrfs_raid_types {
@@ -993,8 +1026,12 @@ enum btrfs_raid_types {
BTRFS_RAID_DUP,
BTRFS_RAID_RAID0,
BTRFS_RAID_SINGLE,
- BTRFS_RAID_RAID5,
- BTRFS_RAID_RAID6,
+ BTRFS_RAID_PAR1,
+ BTRFS_RAID_PAR2,
+ BTRFS_RAID_PAR3,
+ BTRFS_RAID_PAR4,
+ BTRFS_RAID_PAR5,
+ BTRFS_RAID_PAR6,
BTRFS_NR_RAID_TYPES
};

@@ -1004,8 +1041,7 @@ enum btrfs_raid_types {

#define BTRFS_BLOCK_GROUP_PROFILE_MASK (BTRFS_BLOCK_GROUP_RAID0 | \
BTRFS_BLOCK_GROUP_RAID1 | \
- BTRFS_BLOCK_GROUP_RAID5 | \
- BTRFS_BLOCK_GROUP_RAID6 | \
+ BTRFS_BLOCK_GROUP_PARX | \
BTRFS_BLOCK_GROUP_DUP | \
BTRFS_BLOCK_GROUP_RAID10)
/*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 81ea553..9931cf3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3337,12 +3337,11 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
num_tolerated_disk_barrier_failures = 0;
else if (num_tolerated_disk_barrier_failures > 1) {
if (flags & (BTRFS_BLOCK_GROUP_RAID1 |
- BTRFS_BLOCK_GROUP_RAID5 |
BTRFS_BLOCK_GROUP_RAID10)) {
num_tolerated_disk_barrier_failures = 1;
- } else if (flags &
- BTRFS_BLOCK_GROUP_RAID6) {
- num_tolerated_disk_barrier_failures = 2;
+ } else if (flags & BTRFS_BLOCK_GROUP_PARX) {
+ num_tolerated_disk_barrier_failures
+ = btrfs_flags_par(flags);
}
}
}
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 32312e0..a5d1f9d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3516,21 +3516,35 @@ static u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
/* First, mask out the RAID levels which aren't possible */
if (num_devices == 1)
flags &= ~(BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID0 |
- BTRFS_BLOCK_GROUP_RAID5);
+ BTRFS_BLOCK_GROUP_PAR1);
if (num_devices < 3)
- flags &= ~BTRFS_BLOCK_GROUP_RAID6;
+ flags &= ~BTRFS_BLOCK_GROUP_PAR2;
if (num_devices < 4)
- flags &= ~BTRFS_BLOCK_GROUP_RAID10;
+ flags &= ~(BTRFS_BLOCK_GROUP_RAID10 | BTRFS_BLOCK_GROUP_PAR3);
+ if (num_devices < 5)
+ flags &= ~BTRFS_BLOCK_GROUP_PAR4;
+ if (num_devices < 6)
+ flags &= ~BTRFS_BLOCK_GROUP_PAR5;
+ if (num_devices < 7)
+ flags &= ~BTRFS_BLOCK_GROUP_PAR6;

tmp = flags & (BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID0 |
- BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6 | BTRFS_BLOCK_GROUP_RAID10);
+ BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_PARX |
+ BTRFS_BLOCK_GROUP_RAID10);
flags &= ~tmp;

- if (tmp & BTRFS_BLOCK_GROUP_RAID6)
- tmp = BTRFS_BLOCK_GROUP_RAID6;
- else if (tmp & BTRFS_BLOCK_GROUP_RAID5)
- tmp = BTRFS_BLOCK_GROUP_RAID5;
+ if (tmp & BTRFS_BLOCK_GROUP_PAR6)
+ tmp = BTRFS_BLOCK_GROUP_PAR6;
+ else if (tmp & BTRFS_BLOCK_GROUP_PAR5)
+ tmp = BTRFS_BLOCK_GROUP_PAR5;
+ else if (tmp & BTRFS_BLOCK_GROUP_PAR4)
+ tmp = BTRFS_BLOCK_GROUP_PAR4;
+ else if (tmp & BTRFS_BLOCK_GROUP_PAR3)
+ tmp = BTRFS_BLOCK_GROUP_PAR3;
+ else if (tmp & BTRFS_BLOCK_GROUP_PAR2)
+ tmp = BTRFS_BLOCK_GROUP_PAR2;
+ else if (tmp & BTRFS_BLOCK_GROUP_PAR1)
+ tmp = BTRFS_BLOCK_GROUP_PAR1;
else if (tmp & BTRFS_BLOCK_GROUP_RAID10)
tmp = BTRFS_BLOCK_GROUP_RAID10;
else if (tmp & BTRFS_BLOCK_GROUP_RAID1)
@@ -3769,8 +3783,7 @@ static u64 get_system_chunk_thresh(struct btrfs_root *root, u64 type)

if (type & (BTRFS_BLOCK_GROUP_RAID10 |
BTRFS_BLOCK_GROUP_RAID0 |
- BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6))
+ BTRFS_BLOCK_GROUP_PARX))
num_dev = root->fs_info->fs_devices->rw_devices;
else if (type & BTRFS_BLOCK_GROUP_RAID1)
num_dev = 2;
@@ -6104,10 +6117,18 @@ int __get_raid_index(u64 flags)
return BTRFS_RAID_DUP;
else if (flags & BTRFS_BLOCK_GROUP_RAID0)
return BTRFS_RAID_RAID0;
- else if (flags & BTRFS_BLOCK_GROUP_RAID5)
- return BTRFS_RAID_RAID5;
- else if (flags & BTRFS_BLOCK_GROUP_RAID6)
- return BTRFS_RAID_RAID6;
+ else if (flags & BTRFS_BLOCK_GROUP_PAR1)
+ return BTRFS_RAID_PAR1;
+ else if (flags & BTRFS_BLOCK_GROUP_PAR2)
+ return BTRFS_RAID_PAR2;
+ else if (flags & BTRFS_BLOCK_GROUP_PAR3)
+ return BTRFS_RAID_PAR3;
+ else if (flags & BTRFS_BLOCK_GROUP_PAR4)
+ return BTRFS_RAID_PAR4;
+ else if (flags & BTRFS_BLOCK_GROUP_PAR5)
+ return BTRFS_RAID_PAR5;
+ else if (flags & BTRFS_BLOCK_GROUP_PAR6)
+ return BTRFS_RAID_PAR6;

return BTRFS_RAID_SINGLE; /* BTRFS_BLOCK_GROUP_SINGLE */
}
@@ -6123,8 +6144,12 @@ static const char *btrfs_raid_type_names[BTRFS_NR_RAID_TYPES] = {
[BTRFS_RAID_DUP] = "dup",
[BTRFS_RAID_RAID0] = "raid0",
[BTRFS_RAID_SINGLE] = "single",
- [BTRFS_RAID_RAID5] = "raid5",
- [BTRFS_RAID_RAID6] = "raid6",
+ [BTRFS_RAID_PAR1] = "raid5",
+ [BTRFS_RAID_PAR2] = "raid6",
+ [BTRFS_RAID_PAR3] = "par3",
+ [BTRFS_RAID_PAR4] = "par4",
+ [BTRFS_RAID_PAR5] = "par5",
+ [BTRFS_RAID_PAR6] = "par6",
};

static const char *get_raid_name(enum btrfs_raid_types type)
@@ -6269,8 +6294,7 @@ search:
if (!block_group_bits(block_group, flags)) {
u64 extra = BTRFS_BLOCK_GROUP_DUP |
BTRFS_BLOCK_GROUP_RAID1 |
- BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6 |
+ BTRFS_BLOCK_GROUP_PARX |
BTRFS_BLOCK_GROUP_RAID10;

/*
@@ -7856,7 +7880,7 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
root->fs_info->fs_devices->missing_devices;

stripped = BTRFS_BLOCK_GROUP_RAID0 |
- BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6 |
+ BTRFS_BLOCK_GROUP_PARX |
BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID10;

if (num_devices == 1) {
@@ -8539,8 +8563,7 @@ int btrfs_read_block_groups(struct btrfs_root *root)
if (!(get_alloc_profile(root, space_info->flags) &
(BTRFS_BLOCK_GROUP_RAID10 |
BTRFS_BLOCK_GROUP_RAID1 |
- BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6 |
+ BTRFS_BLOCK_GROUP_PARX |
BTRFS_BLOCK_GROUP_DUP)))
continue;
/*
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d3d4448..46b4b49 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7184,8 +7184,7 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
}

/* async crcs make it difficult to collect full stripe writes. */
- if (btrfs_get_alloc_profile(root, 1) &
- (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6))
+ if (btrfs_get_alloc_profile(root, 1) & BTRFS_BLOCK_GROUP_PARX)
async_submit = 0;
else
async_submit = 1;
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 9af0b25..c7573dc 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -27,10 +27,10 @@
#include <linux/capability.h>
#include <linux/ratelimit.h>
#include <linux/kthread.h>
-#include <linux/raid/pq.h>
+#include <linux/raid/raid.h>
+#include <linux/raid/helper.h>
#include <linux/hash.h>
#include <linux/list_sort.h>
-#include <linux/raid/xor.h>
#include <linux/vmalloc.h>
#include <asm/div64.h>
#include "ctree.h"
@@ -125,11 +125,11 @@ struct btrfs_raid_bio {
*/
int read_rebuild;

- /* first bad stripe */
- int faila;
+ /* bad stripes */
+ int fail[RAID_PARITY_MAX];

- /* second bad stripe (for raid6 use) */
- int failb;
+ /* number of bad stripes in fail[] */
+ int nr_fail;

/*
* number of pages needed to represent the full
@@ -496,26 +496,6 @@ static void cache_rbio(struct btrfs_raid_bio *rbio)
}

/*
- * helper function to run the xor_blocks api. It is only
- * able to do MAX_XOR_BLOCKS at a time, so we need to
- * loop through.
- */
-static void run_xor(void **pages, int src_cnt, ssize_t len)
-{
- int src_off = 0;
- int xor_src_cnt = 0;
- void *dest = pages[src_cnt];
-
- while(src_cnt > 0) {
- xor_src_cnt = min(src_cnt, MAX_XOR_BLOCKS);
- xor_blocks(xor_src_cnt, len, dest, pages + src_off);
-
- src_cnt -= xor_src_cnt;
- src_off += xor_src_cnt;
- }
-}
-
-/*
* returns true if the bio list inside this rbio
* covers an entire stripe (no rmw required).
* Must be called with the bio list lock held, or
@@ -587,25 +567,18 @@ static int rbio_can_merge(struct btrfs_raid_bio *last,
}

/*
- * helper to index into the pstripe
- */
-static struct page *rbio_pstripe_page(struct btrfs_raid_bio *rbio, int index)
-{
- index += (rbio->nr_data * rbio->stripe_len) >> PAGE_CACHE_SHIFT;
- return rbio->stripe_pages[index];
-}
-
-/*
- * helper to index into the qstripe, returns null
- * if there is no qstripe
+ * helper to index into the parity stripe
+ * returns null if there is no stripe
*/
-static struct page *rbio_qstripe_page(struct btrfs_raid_bio *rbio, int index)
+static struct page *rbio_pstripe_page(struct btrfs_raid_bio *rbio,
+ int index, int parity)
{
- if (rbio->nr_data + 1 == rbio->bbio->num_stripes)
+ if (rbio->nr_data + parity >= rbio->bbio->num_stripes)
return NULL;

- index += ((rbio->nr_data + 1) * rbio->stripe_len) >>
- PAGE_CACHE_SHIFT;
+ index += ((rbio->nr_data + parity) * rbio->stripe_len)
+ >> PAGE_CACHE_SHIFT;
+
return rbio->stripe_pages[index];
}

@@ -946,8 +919,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_root *root,
rbio->fs_info = root->fs_info;
rbio->stripe_len = stripe_len;
rbio->nr_pages = num_pages;
- rbio->faila = -1;
- rbio->failb = -1;
+ rbio->nr_fail = 0;
atomic_set(&rbio->refs, 1);

/*
@@ -958,10 +930,10 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_root *root,
rbio->stripe_pages = p;
rbio->bio_pages = p + sizeof(struct page *) * num_pages;

- if (raid_map[bbio->num_stripes - 1] == RAID6_Q_STRIPE)
- nr_data = bbio->num_stripes - 2;
- else
- nr_data = bbio->num_stripes - 1;
+ /* get the number of data stripes removing all the parities */
+ nr_data = bbio->num_stripes;
+ while (nr_data > 0 && is_parity_stripe(raid_map[nr_data - 1]))
+ --nr_data;

rbio->nr_data = nr_data;
return rbio;
@@ -1072,8 +1044,7 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
*/
static void validate_rbio_for_rmw(struct btrfs_raid_bio *rbio)
{
- if (rbio->faila >= 0 || rbio->failb >= 0) {
- BUG_ON(rbio->faila == rbio->bbio->num_stripes - 1);
+ if (rbio->nr_fail > 0) {
__raid56_parity_recover(rbio);
} else {
finish_rmw(rbio);
@@ -1137,10 +1108,10 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
void *pointers[bbio->num_stripes];
int stripe_len = rbio->stripe_len;
int nr_data = rbio->nr_data;
+ int nr_parity;
+ int parity;
int stripe;
int pagenr;
- int p_stripe = -1;
- int q_stripe = -1;
struct bio_list bio_list;
struct bio *bio;
int pages_per_stripe = stripe_len >> PAGE_CACHE_SHIFT;
@@ -1148,14 +1119,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)

bio_list_init(&bio_list);

- if (bbio->num_stripes - rbio->nr_data == 1) {
- p_stripe = bbio->num_stripes - 1;
- } else if (bbio->num_stripes - rbio->nr_data == 2) {
- p_stripe = bbio->num_stripes - 2;
- q_stripe = bbio->num_stripes - 1;
- } else {
- BUG();
- }
+ nr_parity = bbio->num_stripes - rbio->nr_data;

/* at this point we either have a full stripe,
* or we've read the full stripe from the drive.
@@ -1194,29 +1158,15 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
pointers[stripe] = kmap(p);
}

- /* then add the parity stripe */
- p = rbio_pstripe_page(rbio, pagenr);
- SetPageUptodate(p);
- pointers[stripe++] = kmap(p);
-
- if (q_stripe != -1) {
-
- /*
- * raid6, add the qstripe and call the
- * library function to fill in our p/q
- */
- p = rbio_qstripe_page(rbio, pagenr);
+ /* then add the parity stripes */
+ for (parity = 0; parity < nr_parity; ++parity) {
+ p = rbio_pstripe_page(rbio, pagenr, parity);
SetPageUptodate(p);
pointers[stripe++] = kmap(p);
-
- raid6_call.gen_syndrome(bbio->num_stripes, PAGE_SIZE,
- pointers);
- } else {
- /* raid5 */
- memcpy(pointers[nr_data], pointers[0], PAGE_SIZE);
- run_xor(pointers + 1, nr_data - 1, PAGE_CACHE_SIZE);
}

+ /* compute the parity */
+ raid_gen(rbio->nr_data, nr_parity, PAGE_SIZE, pointers);

for (stripe = 0; stripe < bbio->num_stripes; stripe++)
kunmap(page_in_rbio(rbio, stripe, pagenr, 0));
@@ -1321,24 +1271,25 @@ static int fail_rbio_index(struct btrfs_raid_bio *rbio, int failed)
{
unsigned long flags;
int ret = 0;
+ int i;

spin_lock_irqsave(&rbio->bio_list_lock, flags);

/* we already know this stripe is bad, move on */
- if (rbio->faila == failed || rbio->failb == failed)
- goto out;
+ for (i = 0; i < rbio->nr_fail; ++i)
+ if (rbio->fail[i] == failed)
+ goto out;

- if (rbio->faila == -1) {
- /* first failure on this rbio */
- rbio->faila = failed;
- atomic_inc(&rbio->bbio->error);
- } else if (rbio->failb == -1) {
- /* second failure on this rbio */
- rbio->failb = failed;
- atomic_inc(&rbio->bbio->error);
- } else {
+ if (rbio->nr_fail == RAID_PARITY_MAX) {
ret = -EIO;
+ goto out;
}
+
+ /* new failure on this rbio */
+ raid_insert(rbio->nr_fail, rbio->fail, failed);
+ ++rbio->nr_fail;
+ atomic_inc(&rbio->bbio->error);
+
out:
spin_unlock_irqrestore(&rbio->bio_list_lock, flags);

@@ -1724,8 +1675,10 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
{
int pagenr, stripe;
void **pointers;
- int faila = -1, failb = -1;
+ int ifail;
int nr_pages = (rbio->stripe_len + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+ int nr_parity;
+ int nr_fail;
struct page *page;
int err;
int i;
@@ -1737,8 +1690,8 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
goto cleanup_io;
}

- faila = rbio->faila;
- failb = rbio->failb;
+ nr_parity = rbio->bbio->num_stripes - rbio->nr_data;
+ nr_fail = rbio->nr_fail;

if (rbio->read_rebuild) {
spin_lock_irq(&rbio->bio_list_lock);
@@ -1752,98 +1705,30 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
/* setup our array of pointers with pages
* from each stripe
*/
+ ifail = 0;
for (stripe = 0; stripe < rbio->bbio->num_stripes; stripe++) {
/*
* if we're rebuilding a read, we have to use
* pages from the bio list
*/
if (rbio->read_rebuild &&
- (stripe == faila || stripe == failb)) {
+ rbio->fail[ifail] == stripe) {
page = page_in_rbio(rbio, stripe, pagenr, 0);
+ ++ifail;
} else {
page = rbio_stripe_page(rbio, stripe, pagenr);
}
pointers[stripe] = kmap(page);
}

- /* all raid6 handling here */
- if (rbio->raid_map[rbio->bbio->num_stripes - 1] ==
- RAID6_Q_STRIPE) {
-
- /*
- * single failure, rebuild from parity raid5
- * style
- */
- if (failb < 0) {
- if (faila == rbio->nr_data) {
- /*
- * Just the P stripe has failed, without
- * a bad data or Q stripe.
- * TODO, we should redo the xor here.
- */
- err = -EIO;
- goto cleanup;
- }
- /*
- * a single failure in raid6 is rebuilt
- * in the pstripe code below
- */
- goto pstripe;
- }
-
- /* make sure our ps and qs are in order */
- if (faila > failb) {
- int tmp = failb;
- failb = faila;
- faila = tmp;
- }
-
- /* if the q stripe is failed, do a pstripe reconstruction
- * from the xors.
- * If both the q stripe and the P stripe are failed, we're
- * here due to a crc mismatch and we can't give them the
- * data they want
- */
- if (rbio->raid_map[failb] == RAID6_Q_STRIPE) {
- if (rbio->raid_map[faila] == RAID5_P_STRIPE) {
- err = -EIO;
- goto cleanup;
- }
- /*
- * otherwise we have one bad data stripe and
- * a good P stripe. raid5!
- */
- goto pstripe;
- }
-
- if (rbio->raid_map[failb] == RAID5_P_STRIPE) {
- raid6_datap_recov(rbio->bbio->num_stripes,
- PAGE_SIZE, faila, pointers);
- } else {
- raid6_2data_recov(rbio->bbio->num_stripes,
- PAGE_SIZE, faila, failb,
- pointers);
- }
- } else {
- void *p;
-
- /* rebuild from P stripe here (raid5 or raid6) */
- BUG_ON(failb != -1);
-pstripe:
- /* Copy parity block into failed block to start with */
- memcpy(pointers[faila],
- pointers[rbio->nr_data],
- PAGE_CACHE_SIZE);
-
- /* rearrange the pointer array */
- p = pointers[faila];
- for (stripe = faila; stripe < rbio->nr_data - 1; stripe++)
- pointers[stripe] = pointers[stripe + 1];
- pointers[rbio->nr_data - 1] = p;
-
- /* xor in the rest */
- run_xor(pointers, rbio->nr_data - 1, PAGE_CACHE_SIZE);
+ /* if we have too many failures */
+ if (nr_fail > nr_parity) {
+ err = -EIO;
+ goto cleanup;
}
+ raid_rec(nr_fail, rbio->fail, rbio->nr_data, nr_parity,
+ PAGE_SIZE, pointers);
+
/* if we're doing this rebuild as part of an rmw, go through
* and set all of our private rbio pages in the
* failed stripes as uptodate. This way finish_rmw will
@@ -1852,24 +1737,23 @@ pstripe:
*/
if (!rbio->read_rebuild) {
for (i = 0; i < nr_pages; i++) {
- if (faila != -1) {
- page = rbio_stripe_page(rbio, faila, i);
- SetPageUptodate(page);
- }
- if (failb != -1) {
- page = rbio_stripe_page(rbio, failb, i);
+ for (ifail = 0; ifail < nr_fail; ++ifail) {
+ int sfail = rbio->fail[ifail];
+ page = rbio_stripe_page(rbio, sfail, i);
SetPageUptodate(page);
}
}
}
+ ifail = 0;
for (stripe = 0; stripe < rbio->bbio->num_stripes; stripe++) {
/*
* if we're rebuilding a read, we have to use
* pages from the bio list
*/
if (rbio->read_rebuild &&
- (stripe == faila || stripe == failb)) {
+ rbio->fail[ifail] == stripe) {
page = page_in_rbio(rbio, stripe, pagenr, 0);
+ ++ifail;
} else {
page = rbio_stripe_page(rbio, stripe, pagenr);
}
@@ -1891,8 +1775,7 @@ cleanup_io:

rbio_orig_end_io(rbio, err, err == 0);
} else if (err == 0) {
- rbio->faila = -1;
- rbio->failb = -1;
+ rbio->nr_fail = 0;
finish_rmw(rbio);
} else {
rbio_orig_end_io(rbio, err, 0);
@@ -1939,6 +1822,7 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
int bios_to_read = 0;
struct btrfs_bio *bbio = rbio->bbio;
struct bio_list bio_list;
+ int ifail;
int ret;
int nr_pages = (rbio->stripe_len + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
int pagenr;
@@ -1958,10 +1842,12 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
* stripe cache, it is possible that some or all of these
* pages are going to be uptodate.
*/
+ ifail = 0;
for (stripe = 0; stripe < bbio->num_stripes; stripe++) {
- if (rbio->faila == stripe ||
- rbio->failb == stripe)
+ if (rbio->fail[ifail] == stripe) {
+ ++ifail;
continue;
+ }

for (pagenr = 0; pagenr < nr_pages; pagenr++) {
struct page *p;
@@ -2037,6 +1923,7 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
{
struct btrfs_raid_bio *rbio;
int ret;
+ int i;

rbio = alloc_rbio(root, bbio, raid_map, stripe_len);
if (IS_ERR(rbio))
@@ -2046,21 +1933,33 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
bio_list_add(&rbio->bio_list, bio);
rbio->bio_list_bytes = bio->bi_iter.bi_size;

- rbio->faila = find_logical_bio_stripe(rbio, bio);
- if (rbio->faila == -1) {
+ rbio->fail[0] = find_logical_bio_stripe(rbio, bio);
+ if (rbio->fail[0] == -1) {
BUG();
kfree(raid_map);
kfree(bbio);
kfree(rbio);
return -EIO;
}
+ rbio->nr_fail = 1;

/*
- * reconstruct from the q stripe if they are
- * asking for mirror 3
+ * Reconstruct from other parity stripes if they are
+ * asking for different mirrors.
+ * For each mirror we disable one extra parity to trigger
+ * a different recovery.
+ * With mirror_num == 2 we disable nothing and we reconstruct
+ * with the first parity, with mirror_num == 3 we disable the
+ * first parity and then we reconstruct with the second,
+ * and so on, up to mirror_num == 7 where we disable the first five
+ * parity levels and we recover with the sixth one.
*/
- if (mirror_num == 3)
- rbio->failb = bbio->num_stripes - 2;
+ if (mirror_num > 2 && mirror_num - 2 < RAID_PARITY_MAX) {
+ for (i = 0; i < mirror_num - 2; ++i) {
+ raid_insert(rbio->nr_fail, rbio->fail, rbio->nr_data + i);
+ ++rbio->nr_fail;
+ }
+ }

ret = lock_stripe_add(rbio);

diff --git a/fs/btrfs/raid56.h b/fs/btrfs/raid56.h
index ea5d73b..b1082b6 100644
--- a/fs/btrfs/raid56.h
+++ b/fs/btrfs/raid56.h
@@ -21,23 +21,22 @@
#define __BTRFS_RAID56__
static inline int nr_parity_stripes(struct map_lookup *map)
{
- if (map->type & BTRFS_BLOCK_GROUP_RAID5)
- return 1;
- else if (map->type & BTRFS_BLOCK_GROUP_RAID6)
- return 2;
- else
- return 0;
+ return btrfs_flags_par(map->type);
}

static inline int nr_data_stripes(struct map_lookup *map)
{
return map->num_stripes - nr_parity_stripes(map);
}
-#define RAID5_P_STRIPE ((u64)-2)
-#define RAID6_Q_STRIPE ((u64)-1)

-#define is_parity_stripe(x) (((x) == RAID5_P_STRIPE) || \
- ((x) == RAID6_Q_STRIPE))
+#define BTRFS_RAID_PAR1_STRIPE ((u64)-6)
+#define BTRFS_RAID_PAR2_STRIPE ((u64)-5)
+#define BTRFS_RAID_PAR3_STRIPE ((u64)-4)
+#define BTRFS_RAID_PAR4_STRIPE ((u64)-3)
+#define BTRFS_RAID_PAR5_STRIPE ((u64)-2)
+#define BTRFS_RAID_PAR6_STRIPE ((u64)-1)
+
+#define is_parity_stripe(x) (((u64)(x) >= BTRFS_RAID_PAR1_STRIPE))

int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
struct btrfs_bio *bbio, u64 *raid_map,
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index efba5d1..495c13e 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2259,8 +2259,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
int extent_mirror_num;
int stop_loop;

- if (map->type & (BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6)) {
+ if (map->type & BTRFS_BLOCK_GROUP_PARX) {
if (num >= nr_data_stripes(map)) {
return 0;
}
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bab0b84..acafb50 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1525,17 +1525,41 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
goto out;
}

- if ((all_avail & BTRFS_BLOCK_GROUP_RAID5) &&
+ if ((all_avail & BTRFS_BLOCK_GROUP_PAR1) &&
root->fs_info->fs_devices->rw_devices <= 2) {
ret = BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET;
goto out;
}
- if ((all_avail & BTRFS_BLOCK_GROUP_RAID6) &&
+
+ if ((all_avail & BTRFS_BLOCK_GROUP_PAR2) &&
root->fs_info->fs_devices->rw_devices <= 3) {
ret = BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET;
goto out;
}

+ if ((all_avail & BTRFS_BLOCK_GROUP_PAR3) &&
+ root->fs_info->fs_devices->rw_devices <= 4) {
+ ret = BTRFS_ERROR_DEV_PAR3_MIN_NOT_MET;
+ goto out;
+ }
+
+ if ((all_avail & BTRFS_BLOCK_GROUP_PAR4) &&
+ root->fs_info->fs_devices->rw_devices <= 5) {
+ ret = BTRFS_ERROR_DEV_PAR4_MIN_NOT_MET;
+ goto out;
+ }
+
+ if ((all_avail & BTRFS_BLOCK_GROUP_PAR5) &&
+ root->fs_info->fs_devices->rw_devices <= 6) {
+ ret = BTRFS_ERROR_DEV_PAR5_MIN_NOT_MET;
+ goto out;
+ }
+
+ if ((all_avail & BTRFS_BLOCK_GROUP_PAR6) &&
+ root->fs_info->fs_devices->rw_devices <= 7) {
+ ret = BTRFS_ERROR_DEV_PAR6_MIN_NOT_MET;
+ goto out;
+ }
if (strcmp(device_path, "missing") == 0) {
struct list_head *devices;
struct btrfs_device *tmp;
@@ -2797,10 +2821,8 @@ static int chunk_drange_filter(struct extent_buffer *leaf,
if (btrfs_chunk_type(leaf, chunk) & (BTRFS_BLOCK_GROUP_DUP |
BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID10)) {
factor = num_stripes / 2;
- } else if (btrfs_chunk_type(leaf, chunk) & BTRFS_BLOCK_GROUP_RAID5) {
- factor = num_stripes - 1;
- } else if (btrfs_chunk_type(leaf, chunk) & BTRFS_BLOCK_GROUP_RAID6) {
- factor = num_stripes - 2;
+ } else if (btrfs_chunk_type(leaf, chunk) & BTRFS_BLOCK_GROUP_PARX) {
+ factor = num_stripes - btrfs_flags_par(btrfs_chunk_type(leaf, chunk));
} else {
factor = num_stripes;
}
@@ -3158,10 +3180,18 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
else if (num_devices > 1)
allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1);
if (num_devices > 2)
- allowed |= BTRFS_BLOCK_GROUP_RAID5;
+ allowed |= BTRFS_BLOCK_GROUP_PAR1;
if (num_devices > 3)
allowed |= (BTRFS_BLOCK_GROUP_RAID10 |
- BTRFS_BLOCK_GROUP_RAID6);
+ BTRFS_BLOCK_GROUP_PAR2);
+ if (num_devices > 4)
+ allowed |= BTRFS_BLOCK_GROUP_PAR3;
+ if (num_devices > 5)
+ allowed |= BTRFS_BLOCK_GROUP_PAR4;
+ if (num_devices > 6)
+ allowed |= BTRFS_BLOCK_GROUP_PAR5;
+ if (num_devices > 7)
+ allowed |= BTRFS_BLOCK_GROUP_PAR6;
if ((bctl->data.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
(!alloc_profile_is_valid(bctl->data.target, 1) ||
(bctl->data.target & ~allowed))) {
@@ -3201,8 +3231,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
/* allow to reduce meta or sys integrity only if force set */
allowed = BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID1 |
BTRFS_BLOCK_GROUP_RAID10 |
- BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6;
+ BTRFS_BLOCK_GROUP_PARX;
do {
seq = read_seqbegin(&fs_info->profiles_lock);

@@ -3940,7 +3969,7 @@ static struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
.devs_increment = 1,
.ncopies = 1,
},
- [BTRFS_RAID_RAID5] = {
+ [BTRFS_RAID_PAR1] = {
.sub_stripes = 1,
.dev_stripes = 1,
.devs_max = 0,
@@ -3948,7 +3977,7 @@ static struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
.devs_increment = 1,
.ncopies = 2,
},
- [BTRFS_RAID_RAID6] = {
+ [BTRFS_RAID_PAR2] = {
.sub_stripes = 1,
.dev_stripes = 1,
.devs_max = 0,
@@ -3956,6 +3985,38 @@ static struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
.devs_increment = 1,
.ncopies = 3,
},
+ [BTRFS_RAID_PAR3] = {
+ .sub_stripes = 1,
+ .dev_stripes = 1,
+ .devs_max = 0,
+ .devs_min = 4,
+ .devs_increment = 1,
+ .ncopies = 4,
+ },
+ [BTRFS_RAID_PAR4] = {
+ .sub_stripes = 1,
+ .dev_stripes = 1,
+ .devs_max = 0,
+ .devs_min = 5,
+ .devs_increment = 1,
+ .ncopies = 5,
+ },
+ [BTRFS_RAID_PAR5] = {
+ .sub_stripes = 1,
+ .dev_stripes = 1,
+ .devs_max = 0,
+ .devs_min = 6,
+ .devs_increment = 1,
+ .ncopies = 6,
+ },
+ [BTRFS_RAID_PAR6] = {
+ .sub_stripes = 1,
+ .dev_stripes = 1,
+ .devs_max = 0,
+ .devs_min = 7,
+ .devs_increment = 1,
+ .ncopies = 7,
+ },
};

static u32 find_raid56_stripe_len(u32 data_devices, u32 dev_stripe_target)
@@ -3966,7 +4027,7 @@ static u32 find_raid56_stripe_len(u32 data_devices, u32 dev_stripe_target)

static void check_raid56_incompat_flag(struct btrfs_fs_info *info, u64 type)
{
- if (!(type & (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)))
+ if (!(type & BTRFS_BLOCK_GROUP_PARX))
return;

btrfs_set_fs_incompat(info, RAID56);
@@ -4134,15 +4195,11 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
*/
data_stripes = num_stripes / ncopies;

- if (type & BTRFS_BLOCK_GROUP_RAID5) {
- raid_stripe_len = find_raid56_stripe_len(ndevs - 1,
+ if (type & BTRFS_BLOCK_GROUP_PARX) {
+ int nr_par = btrfs_flags_par(type);
+ raid_stripe_len = find_raid56_stripe_len(ndevs - nr_par,
btrfs_super_stripesize(info->super_copy));
- data_stripes = num_stripes - 1;
- }
- if (type & BTRFS_BLOCK_GROUP_RAID6) {
- raid_stripe_len = find_raid56_stripe_len(ndevs - 2,
- btrfs_super_stripesize(info->super_copy));
- data_stripes = num_stripes - 2;
+ data_stripes = num_stripes - nr_par;
}

/*
@@ -4500,10 +4557,8 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
ret = map->num_stripes;
else if (map->type & BTRFS_BLOCK_GROUP_RAID10)
ret = map->sub_stripes;
- else if (map->type & BTRFS_BLOCK_GROUP_RAID5)
- ret = 2;
- else if (map->type & BTRFS_BLOCK_GROUP_RAID6)
- ret = 3;
+ else if (map->type & BTRFS_BLOCK_GROUP_PARX)
+ ret = 1 + btrfs_flags_par(map->type);
else
ret = 1;
free_extent_map(em);
@@ -4532,10 +4587,9 @@ unsigned long btrfs_full_stripe_len(struct btrfs_root *root,

BUG_ON(em->start > logical || em->start + em->len < logical);
map = (struct map_lookup *)em->bdev;
- if (map->type & (BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6)) {
+ if (map->type & BTRFS_BLOCK_GROUP_PARX)
len = map->stripe_len * nr_data_stripes(map);
- }
+
free_extent_map(em);
return len;
}
@@ -4555,8 +4609,7 @@ int btrfs_is_parity_mirror(struct btrfs_mapping_tree *map_tree,

BUG_ON(em->start > logical || em->start + em->len < logical);
map = (struct map_lookup *)em->bdev;
- if (map->type & (BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6))
+ if (map->type & BTRFS_BLOCK_GROUP_PARX)
ret = 1;
free_extent_map(em);
return ret;
@@ -4694,7 +4747,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
stripe_offset = offset - stripe_offset;

/* if we're here for raid56, we need to know the stripe aligned start */
- if (map->type & (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)) {
+ if (map->type & BTRFS_BLOCK_GROUP_PARX) {
unsigned long full_stripe_len = stripe_len * nr_data_stripes(map);
raid56_full_stripe_start = offset;

@@ -4707,8 +4760,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,

if (rw & REQ_DISCARD) {
/* we don't discard raid56 yet */
- if (map->type &
- (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)) {
+ if (map->type & BTRFS_BLOCK_GROUP_PARX) {
ret = -EOPNOTSUPP;
goto out;
}
@@ -4718,7 +4770,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
/* For writes to RAID[56], allow a full stripeset across all disks.
For other RAID types and for RAID[56] reads, just allow a single
stripe (on a single disk). */
- if (map->type & (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6) &&
+ if (map->type & BTRFS_BLOCK_GROUP_PARX &&
(rw & REQ_WRITE)) {
max_len = stripe_len * nr_data_stripes(map) -
(offset - raid56_full_stripe_start);
@@ -4882,13 +4934,12 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
mirror_num = stripe_index - old_stripe_index + 1;
}

- } else if (map->type & (BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6)) {
+ } else if (map->type & BTRFS_BLOCK_GROUP_PARX) {
u64 tmp;

if (bbio_ret && ((rw & REQ_WRITE) || mirror_num > 1)
&& raid_map_ret) {
- int i, rot;
+ int i, j, rot;

/* push stripe_nr back to the start of the full stripe */
stripe_nr = raid56_full_stripe_start;
@@ -4917,10 +4968,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
raid_map[(i+rot) % num_stripes] =
em->start + (tmp + i) * map->stripe_len;

- raid_map[(i+rot) % map->num_stripes] = RAID5_P_STRIPE;
- if (map->type & BTRFS_BLOCK_GROUP_RAID6)
- raid_map[(i+rot+1) % num_stripes] =
- RAID6_Q_STRIPE;
+ for (j = 0; j < btrfs_flags_par(map->type); j++)
+ raid_map[(i+rot+j) % num_stripes] = BTRFS_RAID_PAR1_STRIPE + j;

*length = map->stripe_len;
stripe_index = 0;
@@ -4928,8 +4977,9 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
} else {
/*
* Mirror #0 or #1 means the original data block.
- * Mirror #2 is RAID5 parity block.
- * Mirror #3 is RAID6 Q block.
+ * Mirror #2 is RAID5/PAR1 P block.
+ * Mirror #3 is RAID6/PAR2 Q block.
+ * .. and so on up to PAR6
*/
stripe_index = do_div(stripe_nr, nr_data_stripes(map));
if (mirror_num > 1)
@@ -5049,11 +5099,10 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
if (rw & (REQ_WRITE | REQ_GET_READ_MIRRORS)) {
if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
BTRFS_BLOCK_GROUP_RAID10 |
- BTRFS_BLOCK_GROUP_RAID5 |
BTRFS_BLOCK_GROUP_DUP)) {
max_errors = 1;
- } else if (map->type & BTRFS_BLOCK_GROUP_RAID6) {
- max_errors = 2;
+ } else if (map->type & BTRFS_BLOCK_GROUP_PARX) {
+ max_errors = btrfs_flags_par(map->type);
}
}

@@ -5212,8 +5261,7 @@ int btrfs_rmap_block(struct btrfs_mapping_tree *map_tree,
do_div(length, map->num_stripes / map->sub_stripes);
else if (map->type & BTRFS_BLOCK_GROUP_RAID0)
do_div(length, map->num_stripes);
- else if (map->type & (BTRFS_BLOCK_GROUP_RAID5 |
- BTRFS_BLOCK_GROUP_RAID6)) {
+ else if (map->type & BTRFS_BLOCK_GROUP_PARX) {
do_div(length, nr_data_stripes(map));
rmap_len = map->stripe_len * nr_data_stripes(map);
}
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 3176cdc..98a9c78 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -58,8 +58,12 @@ struct extent_buffer;
{ BTRFS_BLOCK_GROUP_RAID1, "RAID1"}, \
{ BTRFS_BLOCK_GROUP_DUP, "DUP"}, \
{ BTRFS_BLOCK_GROUP_RAID10, "RAID10"}, \
- { BTRFS_BLOCK_GROUP_RAID5, "RAID5"}, \
- { BTRFS_BLOCK_GROUP_RAID6, "RAID6"}
+ { BTRFS_BLOCK_GROUP_PAR1, "RAID5"}, \
+ { BTRFS_BLOCK_GROUP_PAR2, "RAID6"}, \
+ { BTRFS_BLOCK_GROUP_PAR3, "PAR3"}, \
+ { BTRFS_BLOCK_GROUP_PAR4, "PAR4"}, \
+ { BTRFS_BLOCK_GROUP_PAR5, "PAR5"}, \
+ { BTRFS_BLOCK_GROUP_PAR6, "PAR6"}

#define BTRFS_UUID_SIZE 16

@@ -623,8 +627,12 @@ DEFINE_EVENT(btrfs_delayed_ref_head, run_delayed_ref_head,
{ BTRFS_BLOCK_GROUP_RAID1, "RAID1" }, \
{ BTRFS_BLOCK_GROUP_DUP, "DUP" }, \
{ BTRFS_BLOCK_GROUP_RAID10, "RAID10"}, \
- { BTRFS_BLOCK_GROUP_RAID5, "RAID5" }, \
- { BTRFS_BLOCK_GROUP_RAID6, "RAID6" })
+ { BTRFS_BLOCK_GROUP_PAR1, "RAID5" }, \
+ { BTRFS_BLOCK_GROUP_PAR2, "RAID6" }, \
+ { BTRFS_BLOCK_GROUP_PAR3, "PAR3" }, \
+ { BTRFS_BLOCK_GROUP_PAR4, "PAR4" }, \
+ { BTRFS_BLOCK_GROUP_PAR5, "PAR5" }, \
+ { BTRFS_BLOCK_GROUP_PAR6, "PAR6" })

DECLARE_EVENT_CLASS(btrfs__chunk,

diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index b4d6909..ba120ba 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -488,8 +488,13 @@ enum btrfs_err_code {
BTRFS_ERROR_DEV_TGT_REPLACE,
BTRFS_ERROR_DEV_MISSING_NOT_FOUND,
BTRFS_ERROR_DEV_ONLY_WRITABLE,
- BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS
+ BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS,
+ BTRFS_ERROR_DEV_PAR3_MIN_NOT_MET,
+ BTRFS_ERROR_DEV_PAR4_MIN_NOT_MET,
+ BTRFS_ERROR_DEV_PAR5_MIN_NOT_MET,
+ BTRFS_ERROR_DEV_PAR6_MIN_NOT_MET
};
+
/* An error code to error string mapping for the kernel
* error codes
*/
@@ -501,9 +506,9 @@ static inline char *btrfs_err_str(enum btrfs_err_code err_code)
case BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET:
return "unable to go below four devices on raid10";
case BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET:
- return "unable to go below two devices on raid5";
+ return "unable to go below two devices on raid5/par1";
case BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET:
- return "unable to go below three devices on raid6";
+ return "unable to go below three devices on raid6/par2";
case BTRFS_ERROR_DEV_TGT_REPLACE:
return "unable to remove the dev_replace target dev";
case BTRFS_ERROR_DEV_MISSING_NOT_FOUND:
@@ -513,6 +518,14 @@ static inline char *btrfs_err_str(enum btrfs_err_code err_code)
case BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS:
return "add/delete/balance/replace/resize operation "\
"in progress";
+ case BTRFS_ERROR_DEV_PAR3_MIN_NOT_MET:
+ return "unable to go below four devices on par3";
+ case BTRFS_ERROR_DEV_PAR4_MIN_NOT_MET:
+ return "unable to go below five devices on par4";
+ case BTRFS_ERROR_DEV_PAR5_MIN_NOT_MET:
+ return "unable to go below six devices on par5";
+ case BTRFS_ERROR_DEV_PAR6_MIN_NOT_MET:
+ return "unable to go below seven devices on par6";
default:
return NULL;
}
--
1.7.12.1
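
For anyone reviewing the raid56.h and raid56_parity_recover() hunks above, here is a minimal standalone sketch of the two conventions they rely on: the parity markers occupy the top six values of the u64 range, so one comparison identifies them, and a mirror_num above 2 disables the first (mirror_num - 2) parities. The helper names parity_index() and disabled_parities() are hypothetical and not part of the patch. Passing 6 as parity_max is an assumption based on the "up to six parities" in the subject line, since RAID_PARITY_MAX is defined elsewhere in the series.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel values copied from the raid56.h hunk: the six parity markers
 * are the six largest u64 values, PAR1 lowest and PAR6 highest. */
#define PAR1_STRIPE ((uint64_t)-6)
#define PAR6_STRIPE ((uint64_t)-1)

/* Same comparison as the patched is_parity_stripe() macro. */
static int is_parity(uint64_t x)
{
	return x >= PAR1_STRIPE;
}

/* Hypothetical helper: 0-based parity index (0 for PAR1 .. 5 for PAR6). */
static int parity_index(uint64_t x)
{
	assert(is_parity(x));
	return (int)(x - PAR1_STRIPE);
}

/* Hypothetical helper following the condition in raid56_parity_recover():
 * how many leading parities are disabled for a given mirror_num.
 * parity_max stands in for RAID_PARITY_MAX (assumed to be 6). */
static int disabled_parities(int mirror_num, int parity_max)
{
	if (mirror_num > 2 && mirror_num - 2 < parity_max)
		return mirror_num - 2;
	return 0;
}
```

Under these assumptions, disabled_parities(3, 6) gives 1 (skip the first parity and recover with the second), and disabled_parities(7, 6) gives 5, which matches the comment in the hunk.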