2024-05-14 17:39:15

by Danny Tsen

[permalink] [raw]
Subject: [PATCH 0/3] crypto: X25519 supports for ppc64le

This patch series provide X25519 support for ppc64le with a new module
curve25519-ppc64le.

The implementation is based on CRYPTOGAMs perl output from x25519-ppc64.pl.
Modified and added 3 supporting functions.

This patch has passed the selftest by running modprobe
curve25519-ppc64le.

Danny Tsen (3):
X25519 low-level primitives for ppc64le.
X25519 core functions to handle scalar multiplication for ppc64le.
Update Kconfig and Makefile.

arch/powerpc/crypto/Kconfig | 11 +
arch/powerpc/crypto/Makefile | 2 +
arch/powerpc/crypto/curve25519-ppc64le-core.c | 299 ++++++++
arch/powerpc/crypto/curve25519-ppc64le_asm.S | 648 ++++++++++++++++++
4 files changed, 960 insertions(+)
create mode 100644 arch/powerpc/crypto/curve25519-ppc64le-core.c
create mode 100644 arch/powerpc/crypto/curve25519-ppc64le_asm.S

--
2.31.1



2024-05-14 17:39:21

by Danny Tsen

[permalink] [raw]
Subject: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

Use the perl output of x25519-ppc64.pl from CRYPTOGAMs and added three
supporting functions, x25519_fe51_sqr_times, x25519_fe51_frombytes
and x25519_fe51_tobytes.

Signed-off-by: Danny Tsen <[email protected]>
---
arch/powerpc/crypto/curve25519-ppc64le_asm.S | 648 +++++++++++++++++++
1 file changed, 648 insertions(+)
create mode 100644 arch/powerpc/crypto/curve25519-ppc64le_asm.S

diff --git a/arch/powerpc/crypto/curve25519-ppc64le_asm.S b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
new file mode 100644
index 000000000000..8a018104838a
--- /dev/null
+++ b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
@@ -0,0 +1,648 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#
+# Copyright 2024- IBM Corp. All Rights Reserved.
+#
+# This code is taken from CRYPTOGAMs[1] and is included here using the option
+# in the license to distribute the code under the GPL. Therefore this program
+# is free software; you can redistribute it and/or modify it under the terms of
+# the GNU General Public License version 2 as published by the Free Software
+# Foundation.
+#
+# [1] https://www.openssl.org/~appro/cryptogams/
+
+# Copyright (c) 2006-2017, CRYPTOGAMS by <[email protected]>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain copyright notices,
+# this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above
+# copyright notice, this list of conditions and the following
+# disclaimer in the documentation and/or other materials
+# provided with the distribution.
+#
+# * Neither the name of the CRYPTOGAMS nor the names of its
+# copyright holder and contributors may be used to endorse or
+# promote products derived from this software without specific
+# prior written permission.
+#
+# ALTERNATIVELY, provided that this notice is retained in full, this
+# product may be distributed under the terms of the GNU General Public
+# License (GPL), in which case the provisions of the GPL apply INSTEAD OF
+# those given above.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# ====================================================================
+# Written by Andy Polyakov <[email protected]> for the OpenSSL
+# project. The module is, however, dual licensed under OpenSSL and
+# CRYPTOGAMS licenses depending on where you obtain it. For further
+# details see https://www.openssl.org/~appro/cryptogams/.
+# ====================================================================
+
+#
+# ====================================================================
+# Written and Modified by Danny Tsen <[email protected]>
+# - Added x25519_fe51_sqr_times, x25519_fe51_frombytes, x25519_fe51_tobytes
+#
+# X25519 lower-level primitives for PPC64.
+#
+
+#include <linux/linkage.h>
+
+.machine "any"
+.abiversion 2
+.text
+
+SYM_FUNC_START(x25519_fe51_mul)
+.align 5
+
+ stdu 1,-144(1)
+ std 21,56(1)
+ std 22,64(1)
+ std 23,72(1)
+ std 24,80(1)
+ std 25,88(1)
+ std 26,96(1)
+ std 27,104(1)
+ std 28,112(1)
+ std 29,120(1)
+ std 30,128(1)
+ std 31,136(1)
+
+ ld 6,0(5)
+ ld 7,0(4)
+ ld 8,8(4)
+ ld 9,16(4)
+ ld 10,24(4)
+ ld 11,32(4)
+
+ mulld 22,7,6
+ mulhdu 23,7,6
+
+ mulld 24,8,6
+ mulhdu 25,8,6
+
+ mulld 30,11,6
+ mulhdu 31,11,6
+ ld 4,8(5)
+ mulli 11,11,19
+
+ mulld 26,9,6
+ mulhdu 27,9,6
+
+ mulld 28,10,6
+ mulhdu 29,10,6
+ mulld 12,11,4
+ mulhdu 21,11,4
+ addc 22,22,12
+ adde 23,23,21
+
+ mulld 12,7,4
+ mulhdu 21,7,4
+ addc 24,24,12
+ adde 25,25,21
+
+ mulld 12,10,4
+ mulhdu 21,10,4
+ ld 6,16(5)
+ mulli 10,10,19
+ addc 30,30,12
+ adde 31,31,21
+
+ mulld 12,8,4
+ mulhdu 21,8,4
+ addc 26,26,12
+ adde 27,27,21
+
+ mulld 12,9,4
+ mulhdu 21,9,4
+ addc 28,28,12
+ adde 29,29,21
+ mulld 12,10,6
+ mulhdu 21,10,6
+ addc 22,22,12
+ adde 23,23,21
+
+ mulld 12,11,6
+ mulhdu 21,11,6
+ addc 24,24,12
+ adde 25,25,21
+
+ mulld 12,9,6
+ mulhdu 21,9,6
+ ld 4,24(5)
+ mulli 9,9,19
+ addc 30,30,12
+ adde 31,31,21
+
+ mulld 12,7,6
+ mulhdu 21,7,6
+ addc 26,26,12
+ adde 27,27,21
+
+ mulld 12,8,6
+ mulhdu 21,8,6
+ addc 28,28,12
+ adde 29,29,21
+ mulld 12,9,4
+ mulhdu 21,9,4
+ addc 22,22,12
+ adde 23,23,21
+
+ mulld 12,10,4
+ mulhdu 21,10,4
+ addc 24,24,12
+ adde 25,25,21
+
+ mulld 12,8,4
+ mulhdu 21,8,4
+ ld 6,32(5)
+ mulli 8,8,19
+ addc 30,30,12
+ adde 31,31,21
+
+ mulld 12,11,4
+ mulhdu 21,11,4
+ addc 26,26,12
+ adde 27,27,21
+
+ mulld 12,7,4
+ mulhdu 21,7,4
+ addc 28,28,12
+ adde 29,29,21
+ mulld 12,8,6
+ mulhdu 21,8,6
+ addc 22,22,12
+ adde 23,23,21
+
+ mulld 12,9,6
+ mulhdu 21,9,6
+ addc 24,24,12
+ adde 25,25,21
+
+ mulld 12,10,6
+ mulhdu 21,10,6
+ addc 26,26,12
+ adde 27,27,21
+
+ mulld 12,11,6
+ mulhdu 21,11,6
+ addc 28,28,12
+ adde 29,29,21
+
+ mulld 12,7,6
+ mulhdu 21,7,6
+ addc 30,30,12
+ adde 31,31,21
+
+.Lfe51_reduce:
+ li 0,-1
+ srdi 0,0,13
+
+ srdi 12,26,51
+ and 9,26,0
+ insrdi 12,27,51,0
+ srdi 21,22,51
+ and 7,22,0
+ insrdi 21,23,51,0
+ addc 28,28,12
+ addze 29,29
+ addc 24,24,21
+ addze 25,25
+
+ srdi 12,28,51
+ and 10,28,0
+ insrdi 12,29,51,0
+ srdi 21,24,51
+ and 8,24,0
+ insrdi 21,25,51,0
+ addc 30,30,12
+ addze 31,31
+ add 9,9,21
+
+ srdi 12,30,51
+ and 11,30,0
+ insrdi 12,31,51,0
+ mulli 12,12,19
+
+ add 7,7,12
+
+ srdi 21,9,51
+ and 9,9,0
+ add 10,10,21
+
+ srdi 12,7,51
+ and 7,7,0
+ add 8,8,12
+
+ std 9,16(3)
+ std 10,24(3)
+ std 11,32(3)
+ std 7,0(3)
+ std 8,8(3)
+
+ ld 21,56(1)
+ ld 22,64(1)
+ ld 23,72(1)
+ ld 24,80(1)
+ ld 25,88(1)
+ ld 26,96(1)
+ ld 27,104(1)
+ ld 28,112(1)
+ ld 29,120(1)
+ ld 30,128(1)
+ ld 31,136(1)
+ addi 1,1,144
+ blr
+SYM_FUNC_END(x25519_fe51_mul)
+
+SYM_FUNC_START(x25519_fe51_sqr)
+.align 5
+
+ stdu 1,-144(1)
+ std 21,56(1)
+ std 22,64(1)
+ std 23,72(1)
+ std 24,80(1)
+ std 25,88(1)
+ std 26,96(1)
+ std 27,104(1)
+ std 28,112(1)
+ std 29,120(1)
+ std 30,128(1)
+ std 31,136(1)
+
+ ld 7,0(4)
+ ld 8,8(4)
+ ld 9,16(4)
+ ld 10,24(4)
+ ld 11,32(4)
+
+ add 6,7,7
+ mulli 21,11,19
+
+ mulld 22,7,7
+ mulhdu 23,7,7
+ mulld 24,8,6
+ mulhdu 25,8,6
+ mulld 26,9,6
+ mulhdu 27,9,6
+ mulld 28,10,6
+ mulhdu 29,10,6
+ mulld 30,11,6
+ mulhdu 31,11,6
+ add 6,8,8
+ mulld 12,11,21
+ mulhdu 11,11,21
+ addc 28,28,12
+ adde 29,29,11
+
+ mulli 5,10,19
+
+ mulld 12,8,8
+ mulhdu 11,8,8
+ addc 26,26,12
+ adde 27,27,11
+ mulld 12,9,6
+ mulhdu 11,9,6
+ addc 28,28,12
+ adde 29,29,11
+ mulld 12,10,6
+ mulhdu 11,10,6
+ addc 30,30,12
+ adde 31,31,11
+ mulld 12,21,6
+ mulhdu 11,21,6
+ add 6,10,10
+ addc 22,22,12
+ adde 23,23,11
+ mulld 12,10,5
+ mulhdu 10,10,5
+ addc 24,24,12
+ adde 25,25,10
+ mulld 12,6,21
+ mulhdu 10,6,21
+ add 6,9,9
+ addc 26,26,12
+ adde 27,27,10
+
+ mulld 12,9,9
+ mulhdu 10,9,9
+ addc 30,30,12
+ adde 31,31,10
+ mulld 12,5,6
+ mulhdu 10,5,6
+ addc 22,22,12
+ adde 23,23,10
+ mulld 12,21,6
+ mulhdu 10,21,6
+ addc 24,24,12
+ adde 25,25,10
+
+ b .Lfe51_reduce
+SYM_FUNC_END(x25519_fe51_sqr)
+
+SYM_FUNC_START(x25519_fe51_mul121666)
+.align 5
+
+ stdu 1,-144(1)
+ std 21,56(1)
+ std 22,64(1)
+ std 23,72(1)
+ std 24,80(1)
+ std 25,88(1)
+ std 26,96(1)
+ std 27,104(1)
+ std 28,112(1)
+ std 29,120(1)
+ std 30,128(1)
+ std 31,136(1)
+
+ lis 6,1
+ ori 6,6,56130
+ ld 7,0(4)
+ ld 8,8(4)
+ ld 9,16(4)
+ ld 10,24(4)
+ ld 11,32(4)
+
+ mulld 22,7,6
+ mulhdu 23,7,6
+ mulld 24,8,6
+ mulhdu 25,8,6
+ mulld 26,9,6
+ mulhdu 27,9,6
+ mulld 28,10,6
+ mulhdu 29,10,6
+ mulld 30,11,6
+ mulhdu 31,11,6
+
+ b .Lfe51_reduce
+SYM_FUNC_END(x25519_fe51_mul121666)
+
+SYM_FUNC_START(x25519_fe51_sqr_times)
+.align 5
+
+ stdu 1,-144(1)
+ std 21,56(1)
+ std 22,64(1)
+ std 23,72(1)
+ std 24,80(1)
+ std 25,88(1)
+ std 26,96(1)
+ std 27,104(1)
+ std 28,112(1)
+ std 29,120(1)
+ std 30,128(1)
+ std 31,136(1)
+
+ ld 7,0(4)
+ ld 8,8(4)
+ ld 9,16(4)
+ ld 10,24(4)
+ ld 11,32(4)
+
+ mtctr 5
+
+.Lsqr_times_loop:
+ add 6,7,7
+ mulli 21,11,19
+
+ mulld 22,7,7
+ mulhdu 23,7,7
+ mulld 24,8,6
+ mulhdu 25,8,6
+ mulld 26,9,6
+ mulhdu 27,9,6
+ mulld 28,10,6
+ mulhdu 29,10,6
+ mulld 30,11,6
+ mulhdu 31,11,6
+ add 6,8,8
+ mulld 12,11,21
+ mulhdu 11,11,21
+ addc 28,28,12
+ adde 29,29,11
+
+ mulli 5,10,19
+
+ mulld 12,8,8
+ mulhdu 11,8,8
+ addc 26,26,12
+ adde 27,27,11
+ mulld 12,9,6
+ mulhdu 11,9,6
+ addc 28,28,12
+ adde 29,29,11
+ mulld 12,10,6
+ mulhdu 11,10,6
+ addc 30,30,12
+ adde 31,31,11
+ mulld 12,21,6
+ mulhdu 11,21,6
+ add 6,10,10
+ addc 22,22,12
+ adde 23,23,11
+ mulld 12,10,5
+ mulhdu 10,10,5
+ addc 24,24,12
+ adde 25,25,10
+ mulld 12,6,21
+ mulhdu 10,6,21
+ add 6,9,9
+ addc 26,26,12
+ adde 27,27,10
+
+ mulld 12,9,9
+ mulhdu 10,9,9
+ addc 30,30,12
+ adde 31,31,10
+ mulld 12,5,6
+ mulhdu 10,5,6
+ addc 22,22,12
+ adde 23,23,10
+ mulld 12,21,6
+ mulhdu 10,21,6
+ addc 24,24,12
+ adde 25,25,10
+
+ # fe51_reduce
+ li 0,-1
+ srdi 0,0,13
+
+ srdi 12,26,51
+ and 9,26,0
+ insrdi 12,27,51,0
+ srdi 21,22,51
+ and 7,22,0
+ insrdi 21,23,51,0
+ addc 28,28,12
+ addze 29,29
+ addc 24,24,21
+ addze 25,25
+
+ srdi 12,28,51
+ and 10,28,0
+ insrdi 12,29,51,0
+ srdi 21,24,51
+ and 8,24,0
+ insrdi 21,25,51,0
+ addc 30,30,12
+ addze 31,31
+ add 9,9,21
+
+ srdi 12,30,51
+ and 11,30,0
+ insrdi 12,31,51,0
+ mulli 12,12,19
+
+ add 7,7,12
+
+ srdi 21,9,51
+ and 9,9,0
+ add 10,10,21
+
+ srdi 12,7,51
+ and 7,7,0
+ add 8,8,12
+
+ std 9,16(3)
+ std 10,24(3)
+ std 11,32(3)
+ std 7,0(3)
+ std 8,8(3)
+ bdnz .Lsqr_times_loop
+
+ ld 21,56(1)
+ ld 22,64(1)
+ ld 23,72(1)
+ ld 24,80(1)
+ ld 25,88(1)
+ ld 26,96(1)
+ ld 27,104(1)
+ ld 28,112(1)
+ ld 29,120(1)
+ ld 30,128(1)
+ ld 31,136(1)
+ addi 1,1,144
+ blr
+SYM_FUNC_END(x25519_fe51_sqr_times)
+
+SYM_FUNC_START(x25519_fe51_frombytes)
+.align 5
+
+ li 12, -1
+ srdi 12, 12, 13 # 0x7ffffffffffff
+
+ ld 5, 0(4)
+ ld 6, 8(4)
+ ld 7, 16(4)
+ ld 8, 24(4)
+
+ srdi 10, 5, 51
+ and 5, 5, 12 # h0
+
+ sldi 11, 6, 13
+ or 11, 10, 11 # h1t
+ srdi 10, 6, 38
+ and 6, 11, 12 # h1
+
+ sldi 11, 7, 26
+ or 10, 10, 11 # h2t
+
+ srdi 11, 7, 25
+ and 7, 10, 12 # h2
+ sldi 10, 8, 39
+ or 11, 11, 10 # h3t
+
+ srdi 9, 8, 12
+ and 8, 11, 12 # h3
+ and 9, 9, 12 # h4
+
+ std 5, 0(3)
+ std 6, 8(3)
+ std 7, 16(3)
+ std 8, 24(3)
+ std 9, 32(3)
+
+ blr
+SYM_FUNC_END(x25519_fe51_frombytes)
+
+SYM_FUNC_START(x25519_fe51_tobytes)
+.align 5
+
+ ld 5, 0(4)
+ ld 6, 8(4)
+ ld 7, 16(4)
+ ld 8, 24(4)
+ ld 9, 32(4)
+
+ li 12, -1
+ srdi 12, 12, 13 # 0x7ffffffffffff
+
+ # Full reducuction
+ addi 10, 5, 19
+ srdi 10, 10, 51
+ add 10, 10, 6
+ srdi 10, 10, 51
+ add 10, 10, 7
+ srdi 10, 10, 51
+ add 10, 10, 8
+ srdi 10, 10, 51
+ add 10, 10, 9
+ srdi 10, 10, 51
+
+ mulli 10, 10, 19
+ add 5, 5, 10
+ srdi 11, 5, 51
+ add 6, 6, 11
+ srdi 11, 6, 51
+ add 7, 7, 11
+ srdi 11, 7, 51
+ add 8, 8, 11
+ srdi 11, 8, 51
+ add 9, 9, 11
+
+ and 5, 5, 12
+ and 6, 6, 12
+ and 7, 7, 12
+ and 8, 8, 12
+ and 9, 9, 12
+
+ sldi 10, 6, 51
+ or 5, 5, 10 # s0
+
+ srdi 11, 6, 13
+ sldi 10, 7, 38
+ or 6, 11, 10 # s1
+
+ srdi 11, 7, 26
+ sldi 10, 8, 25
+ or 7, 11, 10 # s2
+
+ srdi 11, 8, 39
+ sldi 10, 9, 12
+ or 8, 11, 10 # s4
+
+ std 5, 0(3)
+ std 6, 8(3)
+ std 7, 16(3)
+ std 8, 24(3)
+
+ blr
+SYM_FUNC_END(x25519_fe51_tobytes)
--
2.31.1


2024-05-14 17:39:26

by Danny Tsen

[permalink] [raw]
Subject: [PATCH 2/3] crypto: X25519 core functions for ppc64le

X25519 core functions to handle scalar multiplication for ppc64le.

Signed-off-by: Danny Tsen <[email protected]>
---
arch/powerpc/crypto/curve25519-ppc64le-core.c | 299 ++++++++++++++++++
1 file changed, 299 insertions(+)
create mode 100644 arch/powerpc/crypto/curve25519-ppc64le-core.c

diff --git a/arch/powerpc/crypto/curve25519-ppc64le-core.c b/arch/powerpc/crypto/curve25519-ppc64le-core.c
new file mode 100644
index 000000000000..6a8b5efc40ce
--- /dev/null
+++ b/arch/powerpc/crypto/curve25519-ppc64le-core.c
@@ -0,0 +1,299 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright 2024- IBM Corp. All rights reserved.
+ *
+ * X25519 scalar multiplication with 51 bits limbs for PPC64le.
+ * Based on RFC7748 and AArch64 optimized implementation for X25519
+ * - Algorithm 1 Scalar multiplication of a variable point
+ */
+
+#include <crypto/curve25519.h>
+#include <crypto/internal/kpp.h>
+
+#include <linux/types.h>
+#include <linux/jump_label.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/scatterlist.h>
+
+#include <linux/cpufeature.h>
+#include <linux/processor.h>
+
+typedef uint64_t fe51[5];
+
+asmlinkage void x25519_fe51_mul(fe51 h, const fe51 f, const fe51 g);
+asmlinkage void x25519_fe51_sqr(fe51 h, const fe51 f);
+asmlinkage void x25519_fe51_mul121666(fe51 h, fe51 f);
+asmlinkage void x25519_fe51_sqr_times(fe51 h, const fe51 f, int n);
+asmlinkage void x25519_fe51_frombytes(fe51 h, const uint8_t *s);
+asmlinkage void x25519_fe51_tobytes(uint8_t *s, const fe51 h);
+
+#define fmul x25519_fe51_mul
+#define fsqr x25519_fe51_sqr
+#define fmul121666 x25519_fe51_mul121666
+#define fe51_tobytes x25519_fe51_tobytes
+#define fe51_frombytes x25519_fe51_frombytes
+
+static void cswap(fe51 p, fe51 q, unsigned int bit)
+{
+ u64 t, i;
+ u64 c = 0 - (u64) bit;
+
+ for (i = 0; i < 5; ++i) {
+ t = c & (p[i] ^ q[i]);
+ p[i] ^= t;
+ q[i] ^= t;
+ }
+}
+
+static void fadd(fe51 h, const fe51 f, const fe51 g)
+{
+ h[0] = f[0] + g[0];
+ h[1] = f[1] + g[1];
+ h[2] = f[2] + g[2];
+ h[3] = f[3] + g[3];
+ h[4] = f[4] + g[4];
+}
+
+/*
+ * Prime = 2 ** 255 - 19, 255 bits
+ * (0x7fffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffed)
+ *
+ * Prime in 5 51-bit limbs
+ */
+static fe51 prime51 = { 0x7ffffffffffed, 0x7ffffffffffff, 0x7ffffffffffff, 0x7ffffffffffff, 0x7ffffffffffff};
+
+static void fsub(fe51 h, const fe51 f, const fe51 g)
+{
+ h[0] = (f[0] + ((prime51[0] * 2))) - g[0];
+ h[1] = (f[1] + ((prime51[1] * 2))) - g[1];
+ h[2] = (f[2] + ((prime51[2] * 2))) - g[2];
+ h[3] = (f[3] + ((prime51[3] * 2))) - g[3];
+ h[4] = (f[4] + ((prime51[4] * 2))) - g[4];
+}
+
+static void finv(fe51 o, const fe51 i)
+{
+ fe51 a0, b, c, t00;
+
+ fsqr(a0, i);
+ x25519_fe51_sqr_times(t00, a0, 2);
+
+ fmul(b, t00, i);
+ fmul(a0, b, a0);
+
+ fsqr(t00, a0);
+
+ fmul(b, t00, b);
+ x25519_fe51_sqr_times(t00, b, 5);
+
+ fmul(b, t00, b);
+ x25519_fe51_sqr_times(t00, b, 10);
+
+ fmul(c, t00, b);
+ x25519_fe51_sqr_times(t00, c, 20);
+
+ fmul(t00, t00, c);
+ x25519_fe51_sqr_times(t00, t00, 10);
+
+ fmul(b, t00, b);
+ x25519_fe51_sqr_times(t00, b, 50);
+
+ fmul(c, t00, b);
+ x25519_fe51_sqr_times(t00, c, 100);
+
+ fmul(t00, t00, c);
+ x25519_fe51_sqr_times(t00, t00, 50);
+
+ fmul(t00, t00, b);
+ x25519_fe51_sqr_times(t00, t00, 5);
+
+ fmul(o, t00, a0);
+}
+
+static void curve25519_fe51(uint8_t out[32], const uint8_t scalar[32],
+ const uint8_t point[32])
+{
+ fe51 x1, x2, z2, x3, z3;
+ uint8_t s[32];
+ unsigned int swap = 0;
+ int i;
+
+ memcpy(s, scalar, 32);
+ s[0] &= 0xf8;
+ s[31] &= 0x7f;
+ s[31] |= 0x40;
+ fe51_frombytes(x1, point);
+
+ z2[0] = z2[1] = z2[2] = z2[3] = z2[4] = 0;
+ x3[0] = x1[0];
+ x3[1] = x1[1];
+ x3[2] = x1[2];
+ x3[3] = x1[3];
+ x3[4] = x1[4];
+
+ x2[0] = z3[0] = 1;
+ x2[1] = z3[1] = 0;
+ x2[2] = z3[2] = 0;
+ x2[3] = z3[3] = 0;
+ x2[4] = z3[4] = 0;
+
+ for (i = 254; i >= 0; --i) {
+ unsigned int k_t = 1 & (s[i / 8] >> (i & 7));
+ fe51 a, b, c, d, e;
+ fe51 da, cb, aa, bb;
+ fe51 dacb_p, dacb_m;
+
+ swap ^= k_t;
+ cswap(x2, x3, swap);
+ cswap(z2, z3, swap);
+ swap = k_t;
+
+ fsub(b, x2, z2); // B = x_2 - z_2
+ fadd(a, x2, z2); // A = x_2 + z_2
+ fsub(d, x3, z3); // D = x_3 - z_3
+ fadd(c, x3, z3); // C = x_3 + z_3
+
+ fsqr(bb, b); // BB = B^2
+ fsqr(aa, a); // AA = A^2
+ fmul(da, d, a); // DA = D * A
+ fmul(cb, c, b); // CB = C * B
+
+ fsub(e, aa, bb); // E = AA - BB
+ fmul(x2, aa, bb); // x2 = AA * BB
+ fadd(dacb_p, da, cb); // DA + CB
+ fsub(dacb_m, da, cb); // DA - CB
+
+ fmul121666(z3, e); // 121666 * E
+ fsqr(z2, dacb_m); // (DA - CB)^2
+ fsqr(x3, dacb_p); // x3 = (DA + CB)^2
+ fadd(b, bb, z3); // BB + 121666 * E
+ fmul(z3, x1, z2); // z3 = x1 * (DA - CB)^2
+ fmul(z2, e, b); // z2 = e * (BB + (DA + CB)^2)
+ }
+
+ finv(z2, z2);
+ fmul(x2, x2, z2);
+ fe51_tobytes(out, x2);
+}
+
+void curve25519_arch(u8 mypublic[CURVE25519_KEY_SIZE],
+ const u8 secret[CURVE25519_KEY_SIZE],
+ const u8 basepoint[CURVE25519_KEY_SIZE])
+{
+ curve25519_fe51(mypublic, secret, basepoint);
+}
+EXPORT_SYMBOL(curve25519_arch);
+
+void curve25519_base_arch(u8 pub[CURVE25519_KEY_SIZE],
+ const u8 secret[CURVE25519_KEY_SIZE])
+{
+ curve25519_fe51(pub, secret, curve25519_base_point);
+}
+EXPORT_SYMBOL(curve25519_base_arch);
+
+static int curve25519_set_secret(struct crypto_kpp *tfm, const void *buf,
+ unsigned int len)
+{
+ u8 *secret = kpp_tfm_ctx(tfm);
+
+ if (!len)
+ curve25519_generate_secret(secret);
+ else if (len == CURVE25519_KEY_SIZE &&
+ crypto_memneq(buf, curve25519_null_point, CURVE25519_KEY_SIZE))
+ memcpy(secret, buf, CURVE25519_KEY_SIZE);
+ else
+ return -EINVAL;
+ return 0;
+}
+
+static int curve25519_generate_public_key(struct kpp_request *req)
+{
+ struct crypto_kpp *tfm = crypto_kpp_reqtfm(req);
+ const u8 *secret = kpp_tfm_ctx(tfm);
+ u8 buf[CURVE25519_KEY_SIZE];
+ int copied, nbytes;
+
+ if (req->src)
+ return -EINVAL;
+
+ curve25519_base_arch(buf, secret);
+
+ /* might want less than we've got */
+ nbytes = min_t(size_t, CURVE25519_KEY_SIZE, req->dst_len);
+ copied = sg_copy_from_buffer(req->dst, sg_nents_for_len(req->dst,
+ nbytes),
+ buf, nbytes);
+ if (copied != nbytes)
+ return -EINVAL;
+ return 0;
+}
+
+static int curve25519_compute_shared_secret(struct kpp_request *req)
+{
+ struct crypto_kpp *tfm = crypto_kpp_reqtfm(req);
+ const u8 *secret = kpp_tfm_ctx(tfm);
+ u8 public_key[CURVE25519_KEY_SIZE];
+ u8 buf[CURVE25519_KEY_SIZE];
+ int copied, nbytes;
+
+ if (!req->src)
+ return -EINVAL;
+
+ copied = sg_copy_to_buffer(req->src,
+ sg_nents_for_len(req->src,
+ CURVE25519_KEY_SIZE),
+ public_key, CURVE25519_KEY_SIZE);
+ if (copied != CURVE25519_KEY_SIZE)
+ return -EINVAL;
+
+ curve25519_arch(buf, secret, public_key);
+
+ /* might want less than we've got */
+ nbytes = min_t(size_t, CURVE25519_KEY_SIZE, req->dst_len);
+ copied = sg_copy_from_buffer(req->dst, sg_nents_for_len(req->dst,
+ nbytes),
+ buf, nbytes);
+ if (copied != nbytes)
+ return -EINVAL;
+ return 0;
+}
+
+static unsigned int curve25519_max_size(struct crypto_kpp *tfm)
+{
+ return CURVE25519_KEY_SIZE;
+}
+
+static struct kpp_alg curve25519_alg = {
+ .base.cra_name = "curve25519",
+ .base.cra_driver_name = "curve25519-ppc64le",
+ .base.cra_priority = 200,
+ .base.cra_module = THIS_MODULE,
+ .base.cra_ctxsize = CURVE25519_KEY_SIZE,
+
+ .set_secret = curve25519_set_secret,
+ .generate_public_key = curve25519_generate_public_key,
+ .compute_shared_secret = curve25519_compute_shared_secret,
+ .max_size = curve25519_max_size,
+};
+
+
+static int __init curve25519_mod_init(void)
+{
+ return IS_REACHABLE(CONFIG_CRYPTO_KPP) ?
+ crypto_register_kpp(&curve25519_alg) : 0;
+}
+
+static void __exit curve25519_mod_exit(void)
+{
+ if (IS_REACHABLE(CONFIG_CRYPTO_KPP))
+ crypto_unregister_kpp(&curve25519_alg);
+}
+
+module_init(curve25519_mod_init);
+module_exit(curve25519_mod_exit);
+
+MODULE_ALIAS_CRYPTO("curve25519");
+MODULE_ALIAS_CRYPTO("curve25519-ppc64le");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Danny Tsen <[email protected]>");
--
2.31.1


2024-05-14 17:39:34

by Danny Tsen

[permalink] [raw]
Subject: [PATCH 3/3] crypto: Update Kconfig and Makefile for ppc64le x25519.

Defined CRYPTO_CURVE25519_PPC64 to support X25519 for ppc64le.

Added new module curve25519-ppc64le for X25519.

Signed-off-by: Danny Tsen <[email protected]>
---
arch/powerpc/crypto/Kconfig | 11 +++++++++++
arch/powerpc/crypto/Makefile | 2 ++
2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/crypto/Kconfig b/arch/powerpc/crypto/Kconfig
index 1e201b7ae2fc..09ebcbdfb34f 100644
--- a/arch/powerpc/crypto/Kconfig
+++ b/arch/powerpc/crypto/Kconfig
@@ -2,6 +2,17 @@

menu "Accelerated Cryptographic Algorithms for CPU (powerpc)"

+config CRYPTO_CURVE25519_PPC64
+ tristate "Public key crypto: Curve25519 (PowerPC64)"
+ depends on PPC64 && CPU_LITTLE_ENDIAN
+ select CRYPTO_LIB_CURVE25519_GENERIC
+ select CRYPTO_ARCH_HAVE_LIB_CURVE25519
+ help
+ Curve25519 algorithm
+
+ Architecture: PowerPC64
+ - Little-endian
+
config CRYPTO_CRC32C_VPMSUM
tristate "CRC32c"
depends on PPC64 && ALTIVEC
diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile
index fca0e9739869..59808592f0a1 100644
--- a/arch/powerpc/crypto/Makefile
+++ b/arch/powerpc/crypto/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_CRYPTO_AES_GCM_P10) += aes-gcm-p10-crypto.o
obj-$(CONFIG_CRYPTO_CHACHA20_P10) += chacha-p10-crypto.o
obj-$(CONFIG_CRYPTO_POLY1305_P10) += poly1305-p10-crypto.o
obj-$(CONFIG_CRYPTO_DEV_VMX_ENCRYPT) += vmx-crypto.o
+obj-$(CONFIG_CRYPTO_CURVE25519_PPC64) += curve25519-ppc64le.o

aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes-spe-glue.o
md5-ppc-y := md5-asm.o md5-glue.o
@@ -29,6 +30,7 @@ aes-gcm-p10-crypto-y := aes-gcm-p10-glue.o aes-gcm-p10.o ghashp10-ppc.o aesp10-p
chacha-p10-crypto-y := chacha-p10-glue.o chacha-p10le-8x.o
poly1305-p10-crypto-y := poly1305-p10-glue.o poly1305-p10le_64.o
vmx-crypto-objs := vmx.o aesp8-ppc.o ghashp8-ppc.o aes.o aes_cbc.o aes_ctr.o aes_xts.o ghash.o
+curve25519-ppc64le-y := curve25519-ppc64le-core.o curve25519-ppc64le_asm.o

ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
override flavour := linux-ppc64le
--
2.31.1


2024-05-15 08:30:09

by Andy Polyakov

[permalink] [raw]
Subject: Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le

Hi,

> +static void cswap(fe51 p, fe51 q, unsigned int bit)
> +{
> + u64 t, i;
> + u64 c = 0 - (u64) bit;
> +
> + for (i = 0; i < 5; ++i) {
> + t = c & (p[i] ^ q[i]);
> + p[i] ^= t;
> + q[i] ^= t;
> + }
> +}

The "c" in cswap stands for "constant-time," and the problem is that
contemporary compilers have exhibited the ability to produce
non-constant-time machine code as result of compilation of the above
kind of technique. The outcome is platform-specific and ironically some
of PPC code generators were observed to generate "most"
non-constant-time code. "Most" in sense that execution time variations
would be most easy to catch. One way to work around the problem, at
least for the time being, is to add 'asm volatile("" : "+r"(c))' after
you calculate 'c'. But there is no guarantee that the next compiler
version won't see through it, hence the permanent solution is to do it
in assembly. I can put together something...

Cheers.


2024-05-15 13:09:51

by Danny Tsen

[permalink] [raw]
Subject: Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le

Hi Andy,

Points taken.  And much appreciate for the help.

Thanks.

-Danny

On 5/15/24 3:29 AM, Andy Polyakov wrote:
> Hi,
>
>> +static void cswap(fe51 p, fe51 q, unsigned int bit)
>> +{
>> +    u64 t, i;
>> +    u64 c = 0 - (u64) bit;
>> +
>> +    for (i = 0; i < 5; ++i) {
>> +        t = c & (p[i] ^ q[i]);
>> +        p[i] ^= t;
>> +        q[i] ^= t;
>> +    }
>> +}
>
> The "c" in cswap stands for "constant-time," and the problem is that
> contemporary compilers have exhibited the ability to produce
> non-constant-time machine code as result of compilation of the above
> kind of technique. The outcome is platform-specific and ironically
> some of PPC code generators were observed to generate "most"
> non-constant-time code. "Most" in sense that execution time variations
> would be most easy to catch. One way to work around the problem, at
> least for the time being, is to add 'asm volatile("" : "+r"(c))' after
> you calculate 'c'. But there is no guarantee that the next compiler
> version won't see through it, hence the permanent solution is to do it
> in assembly. I can put together something...
>
> Cheers.
>

2024-05-15 13:33:16

by Andy Polyakov

[permalink] [raw]
Subject: Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le

>> +static void cswap(fe51 p, fe51 q, unsigned int bit)
>> +{
>> +    u64 t, i;
>> +    u64 c = 0 - (u64) bit;
>> +
>> +    for (i = 0; i < 5; ++i) {
>> +        t = c & (p[i] ^ q[i]);
>> +        p[i] ^= t;
>> +        q[i] ^= t;
>> +    }
>> +}
>
> The "c" in cswap stands for "constant-time," and the problem is that
> contemporary compilers have exhibited the ability to produce
> non-constant-time machine code as result of compilation of the above
> kind of technique. The outcome is platform-specific and ironically some
> of PPC code generators were observed to generate "most"
> non-constant-time code. "Most" in sense that execution time variations
> would be most easy to catch.

Just to substantiate the point, consider
https://godbolt.org/z/faYnEcPT7, and note the conditional branch in the
middle of the loop, which flies in the face of constant-time-ness. In
case you object 'bit &= 1' on line 7 in the C code. Indeed, if you
comment it out, the generated code will be fine. But the point is that
the compiler is capable of and was in fact observed to figure out that
the caller passes either one or zero and generate the machine code in
the assembly window. In other words 'bit &= 1' is just a reflection of
what the caller does.

> ... the permanent solution is to do it
> in assembly. I can put together something...

Though you should be able to do this just as well :-) So should I or
would you?

Cheers.


2024-05-15 14:20:33

by Andy Polyakov

[permalink] [raw]
Subject: Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le

> Thanks for the info.  I should be able to do it.  I was hoping an
> assembly guru like you can show me some tricks here if there is :)

No tricks in cswap, it's as straightforward as it gets, so go ahead :-)


2024-05-16 04:54:07

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

Hi Danny,

Danny Tsen <[email protected]> writes:
> Use the perl output of x25519-ppc64.pl from CRYPTOGAMs and added three
> supporting functions, x25519_fe51_sqr_times, x25519_fe51_frombytes
> and x25519_fe51_tobytes.

For other algorithms we have checked-in the perl script and generated
the code at runtime. Is there a reason you've done it differently this time?

> Signed-off-by: Danny Tsen <[email protected]>
> ---
> arch/powerpc/crypto/curve25519-ppc64le_asm.S | 648 +++++++++++++++++++
> 1 file changed, 648 insertions(+)
> create mode 100644 arch/powerpc/crypto/curve25519-ppc64le_asm.S
>
> diff --git a/arch/powerpc/crypto/curve25519-ppc64le_asm.S b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
> new file mode 100644
> index 000000000000..8a018104838a
> --- /dev/null
> +++ b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
> @@ -0,0 +1,648 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#
> +# Copyright 2024- IBM Corp. All Rights Reserved.

I'm not a lawyer, but AFAIK "All Rights Reserved" is not required and
can be confusing - because we are not reserving all rights, we are
granting some rights under the GPL.

I also think the IBM copyright should be down below where your
modifications are described.

> +# This code is taken from CRYPTOGAMs[1] and is included here using the option
> +# in the license to distribute the code under the GPL. Therefore this program
> +# is free software; you can redistribute it and/or modify it under the terms of
> +# the GNU General Public License version 2 as published by the Free Software
> +# Foundation.
> +#
> +# [1] https://www.openssl.org/~appro/cryptogams/
> +
> +# Copyright (c) 2006-2017, CRYPTOGAMS by <[email protected]>
> +# All rights reserved.
> +#
> +# Redistribution and use in source and binary forms, with or without
> +# modification, are permitted provided that the following conditions
> +# are met:
> +#
> +# * Redistributions of source code must retain copyright notices,
> +# this list of conditions and the following disclaimer.
> +#
> +# * Redistributions in binary form must reproduce the above
> +# copyright notice, this list of conditions and the following
> +# disclaimer in the documentation and/or other materials
> +# provided with the distribution.
> +#
> +# * Neither the name of the CRYPTOGAMS nor the names of its
> +# copyright holder and contributors may be used to endorse or
> +# promote products derived from this software without specific
> +# prior written permission.
> +#
> +# ALTERNATIVELY, provided that this notice is retained in full, this
> +# product may be distributed under the terms of the GNU General Public
> +# License (GPL), in which case the provisions of the GPL apply INSTEAD OF
> +# those given above.
> +#
> +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS
> +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +# ====================================================================
> +# Written by Andy Polyakov <[email protected]> for the OpenSSL
> +# project. The module is, however, dual licensed under OpenSSL and
> +# CRYPTOGAMS licenses depending on where you obtain it. For further
> +# details see https://www.openssl.org/~appro/cryptogams/.
> +# ====================================================================
> +
> +#
> +# ====================================================================
> +# Written and Modified by Danny Tsen <[email protected]>
> +# - Added x25519_fe51_sqr_times, x25519_fe51_frombytes, x25519_fe51_tobytes

ie. here.

> +# X25519 lower-level primitives for PPC64.
> +#
> +
> +#include <linux/linkage.h>
> +
> +.machine "any"

Please don't add new .machine directives unless they are required.

> +.abiversion 2

I'd prefer that was left to the compiler flags.

cheers

2024-05-16 08:38:54

by Andy Polyakov

[permalink] [raw]
Subject: Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

Hi,

>> +.abiversion 2
>
> I'd prefer that was left to the compiler flags.

Problem is that it's the compiler that is responsible for providing this
directive in the intermediate .s prior invoking the assembler. And there
is no assembler flag to pass through -Wa. If concern is ABI neutrality,
then solution would rather be #if (_CALL_ELF-0) == 2/#endif. One can
also make a case for

#ifdef _CALL_ELF
.abiversion _CALL_ELF
#endif

Cheers.


2024-05-16 13:42:25

by Andy Polyakov

[permalink] [raw]
Subject: Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

Hi,

>>>> +.abiversion 2
>>>
>>> I'd prefer that was left to the compiler flags.
>>
>> Problem is that it's the compiler that is responsible for providing this
>> directive in the intermediate .s prior invoking the assembler. And there
>> is no assembler flag to pass through -Wa.
>
> Hmm, right. But none of our existing .S files include .abiversion
> directives.
>
> We build .S files with gcc, passing -mabi=elfv2, but it seems to have no
> effect.
>
> So all the intermediate .o's generated from .S files are not ELFv2:
>
> $ find .build/ -name '*.o' | xargs file | grep Unspecified
> .build/arch/powerpc/kernel/vdso/note-64.o: ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped

I would guess that contemporary linker is more forgiving than it was
back then when the .abiversion directive was added. If it works now,
then it of course can be omitted. I suppose my original remark should be
viewed rather as "you can't replace it with a command line option" than
"you can't make it work without it." :-)

> But the actual code follows ELFv2, because we wrote it that way, and I
> guess the linker doesn't look at the actual ABI version of the .o ?
>
> So it currently works. But it's kind of gross that those .o files are
> not ELFv2 for an ELFv2 build.

Well, as far as passing base types and pointers to/from assembly goes,
there are no differences between the versions. Then it's a question of
meaning assigned to r2 and r13, but as long as you don't touch them, you
can freely reuse the code with either ABI. With this in mind the
.abiversion directive is effectively reduced to just a marker in the .o
file. In other words the instruction sequences by themselves are
customarily ABI-neutral, at least in "general calculation" modules such
as the suggested one, so that if it works 100% without the .abiversion
directive, then it can be safely omitted.

Cheers.


2024-05-16 19:52:40

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

Hi!

On Thu, May 16, 2024 at 10:06:58PM +1000, Michael Ellerman wrote:
> Andy Polyakov <[email protected]> writes:
> >>> +.abiversion 2
> >>
> >> I'd prefer that was left to the compiler flags.
> >
> > Problem is that it's the compiler that is responsible for providing this
> > directive in the intermediate .s prior invoking the assembler. And there
> > is no assembler flag to pass through -Wa.
>
> Hmm, right. But none of our existing .S files include .abiversion
> directives.
>
> We build .S files with gcc, passing -mabi=elfv2, but it seems to have no
> effect.

Yup. You coulds include some header file, maybe? Since you run the
assembler code through the C preprocessor anyway, for some weird reason :-)

> But the actual code follows ELFv2, because we wrote it that way, and I
> guess the linker doesn't look at the actual ABI version of the .o ?

It isn't a version. It is an actual different ABI.

GNU LD allows linking together whatever, yes.

> Is .abiversion documented anywhere? I can't see it in the manual.

Yeah me neither. https://sourceware.org/bugzilla/enter_bug.cgi ?
A commandline flag (to GAS) would seem best?


Segher

2024-05-16 20:10:43

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le

On Wed, May 15, 2024 at 10:29:56AM +0200, Andy Polyakov wrote:
> >+static void cswap(fe51 p, fe51 q, unsigned int bit)
>
> The "c" in cswap stands for "constant-time," and the problem is that
> contemporary compilers have exhibited the ability to produce
> non-constant-time machine code as result of compilation of the above
> kind of technique.

This can happen with *any* comnpiler, on *any* platform. In general,
you have to write machine code if you want to be sure what machine code
will eventually be executed.

> The outcome is platform-specific and ironically some
> of PPC code generators were observed to generate "most"
> non-constant-time code. "Most" in sense that execution time variations
> would be most easy to catch. One way to work around the problem, at
> least for the time being, is to add 'asm volatile("" : "+r"(c))' after
> you calculate 'c'. But there is no guarantee that the next compiler
> version won't see through it, hence the permanent solution is to do it
> in assembly. I can put together something...

Such tricks can help ameliorate the problem, sure. But it is not a
solution ever.


Segher