2011-12-03 14:10:08

by Jussi Kivilinna

[permalink] [raw]
Subject: [PATCH 3.2] crypto: twofish-x86_64-3way - blacklist pentium4 and atom

Performance of twofish-x86_64-3way on Intel Pentium 4 and Atom is lower than
of twofish-x86_64 module. So blacklist these CPUs.

Signed-off-by: Jussi Kivilinna <[email protected]>
---
arch/x86/crypto/twofish_glue_3way.c | 47 +++++++++++++++++++++++++++++++++++
1 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/arch/x86/crypto/twofish_glue_3way.c b/arch/x86/crypto/twofish_glue_3way.c
index 5ede9c4..4897b6b 100644
--- a/arch/x86/crypto/twofish_glue_3way.c
+++ b/arch/x86/crypto/twofish_glue_3way.c
@@ -25,6 +25,7 @@
*
*/

+#include <asm/processor.h>
#include <linux/crypto.h>
#include <linux/init.h>
#include <linux/module.h>
@@ -432,10 +433,56 @@ static struct crypto_alg blk_ctr_alg = {
},
};

+static bool is_blacklisted_cpu(void)
+{
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+ return false;
+
+ if (boot_cpu_data.x86 == 0x06 &&
+ (boot_cpu_data.x86_model == 0x1c ||
+ boot_cpu_data.x86_model == 0x26 ||
+ boot_cpu_data.x86_model == 0x36)) {
+ /*
+ * On Atom, twofish-3way is slower than original assembler
+ * implementation. Twofish-3way trades off some performance in
+ * storing blocks in 64bit registers to allow three blocks to
+ * be processed parallel. Parallel operation then allows gaining
+ * more performance than was trade off, on out-of-order CPUs.
+ * However Atom does not benefit from this parallellism and
+ * should be blacklisted.
+ */
+ return true;
+ }
+
+ if (boot_cpu_data.x86 == 0x0f) {
+ /*
+ * On Pentium 4, twofish-3way is slower than original assembler
+ * implementation because excessive uses of 64bit rotate and
+ * left-shifts (which are really slow on P4) needed to store and
+ * handle 128bit block in two 64bit registers.
+ */
+ return true;
+ }
+
+ return false;
+}
+
+static int force;
+module_param(force, int, 0);
+MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
+
int __init init(void)
{
int err;

+ if (!force && is_blacklisted_cpu()) {
+ printk(KERN_INFO
+ "twofish-x86_64-3way: performance on this CPU "
+ "would be suboptimal: disabling "
+ "twofish-x86_64-3way.\n");
+ return -ENODEV;
+ }
+
err = crypto_register_alg(&blk_ecb_alg);
if (err)
goto ecb_err;


2011-12-20 07:19:47

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 3.2] crypto: twofish-x86_64-3way - blacklist pentium4 and atom

On Sat, Dec 03, 2011 at 04:10:04PM +0200, Jussi Kivilinna wrote:
> Performance of twofish-x86_64-3way on Intel Pentium 4 and Atom is lower than
> of twofish-x86_64 module. So blacklist these CPUs.
>
> Signed-off-by: Jussi Kivilinna <[email protected]>

Please send patches based on cryptodev instead. If necessary I
can then rebase them to crypto.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2011-12-20 10:20:44

by Jussi Kivilinna

[permalink] [raw]
Subject: [PATCH 2/2] crypto: blowfish-x86_64 - blacklist Pentium 4

Implementation in blowfish-x86_64 uses 64bit rotations which are slow on P4,
making blowfish-x86_64 slower than generic C implementation. Therefore
blacklist P4.

Signed-off-by: Jussi Kivilinna <[email protected]>
---
arch/x86/crypto/blowfish_glue.c | 30 ++++++++++++++++++++++++++++++
1 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/arch/x86/crypto/blowfish_glue.c b/arch/x86/crypto/blowfish_glue.c
index b05aa16..2970110 100644
--- a/arch/x86/crypto/blowfish_glue.c
+++ b/arch/x86/crypto/blowfish_glue.c
@@ -25,6 +25,7 @@
*
*/

+#include <asm/processor.h>
#include <crypto/blowfish.h>
#include <linux/crypto.h>
#include <linux/init.h>
@@ -446,10 +447,39 @@ static struct crypto_alg blk_ctr_alg = {
},
};

+static bool is_blacklisted_cpu(void)
+{
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+ return false;
+
+ if (boot_cpu_data.x86 == 0x0f) {
+ /*
+ * On Pentium 4, blowfish-x86_64 is slower than generic C
+ * implementation because use of 64bit rotates (which are really
+ * slow on P4). Therefore blacklist P4s.
+ */
+ return true;
+ }
+
+ return false;
+}
+
+static int force;
+module_param(force, int, 0);
+MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
+
static int __init init(void)
{
int err;

+ if (!force && is_blacklisted_cpu()) {
+ printk(KERN_INFO
+ "blowfish-x86_64: performance on this CPU "
+ "would be suboptimal: disabling "
+ "blowfish-x86_64.\n");
+ return -ENODEV;
+ }
+
err = crypto_register_alg(&bf_alg);
if (err)
goto bf_err;

2011-12-20 10:21:02

by Jussi Kivilinna

[permalink] [raw]
Subject: [PATCH 1/2] crypto: twofish-x86_64-3way - blacklist pentium4 and atom

Performance of twofish-x86_64-3way on Intel Pentium 4 and Atom is lower than
of twofish-x86_64 module. So blacklist these CPUs.

Signed-off-by: Jussi Kivilinna <[email protected]>
---
arch/x86/crypto/twofish_glue_3way.c | 47 +++++++++++++++++++++++++++++++++++
1 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/arch/x86/crypto/twofish_glue_3way.c b/arch/x86/crypto/twofish_glue_3way.c
index 7fee8c1..0afd134 100644
--- a/arch/x86/crypto/twofish_glue_3way.c
+++ b/arch/x86/crypto/twofish_glue_3way.c
@@ -25,6 +25,7 @@
*
*/

+#include <asm/processor.h>
#include <linux/crypto.h>
#include <linux/init.h>
#include <linux/module.h>
@@ -637,10 +638,56 @@ static struct crypto_alg blk_xts_alg = {
},
};

+static bool is_blacklisted_cpu(void)
+{
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+ return false;
+
+ if (boot_cpu_data.x86 == 0x06 &&
+ (boot_cpu_data.x86_model == 0x1c ||
+ boot_cpu_data.x86_model == 0x26 ||
+ boot_cpu_data.x86_model == 0x36)) {
+ /*
+ * On Atom, twofish-3way is slower than original assembler
+ * implementation. Twofish-3way trades off some performance in
+ * storing blocks in 64bit registers to allow three blocks to
+ * be processed parallel. Parallel operation then allows gaining
+ * more performance than was trade off, on out-of-order CPUs.
+ * However Atom does not benefit from this parallellism and
+ * should be blacklisted.
+ */
+ return true;
+ }
+
+ if (boot_cpu_data.x86 == 0x0f) {
+ /*
+ * On Pentium 4, twofish-3way is slower than original assembler
+ * implementation because excessive uses of 64bit rotate and
+ * left-shifts (which are really slow on P4) needed to store and
+ * handle 128bit block in two 64bit registers.
+ */
+ return true;
+ }
+
+ return false;
+}
+
+static int force;
+module_param(force, int, 0);
+MODULE_PARM_DESC(force, "Force module load, ignore CPU blacklist");
+
int __init init(void)
{
int err;

+ if (!force && is_blacklisted_cpu()) {
+ printk(KERN_INFO
+ "twofish-x86_64-3way: performance on this CPU "
+ "would be suboptimal: disabling "
+ "twofish-x86_64-3way.\n");
+ return -ENODEV;
+ }
+
err = crypto_register_alg(&blk_ecb_alg);
if (err)
goto ecb_err;

2012-01-13 05:36:13

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 1/2] crypto: twofish-x86_64-3way - blacklist pentium4 and atom

On Tue, Dec 20, 2011 at 12:20:16PM +0200, Jussi Kivilinna wrote:
> Performance of twofish-x86_64-3way on Intel Pentium 4 and Atom is lower than
> of twofish-x86_64 module. So blacklist these CPUs.
>
> Signed-off-by: Jussi Kivilinna <[email protected]>

Both patches applied. Thanks!
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt