2024-05-15 09:07:11

by Andy Polyakov

Subject: Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

Hi,

> +SYM_FUNC_START(x25519_fe51_sqr_times)
> ...
> +
> +.Lsqr_times_loop:
> ...
> +
> + std 9,16(3)
> + std 10,24(3)
> + std 11,32(3)
> + std 7,0(3)
> + std 8,8(3)
> + bdnz .Lsqr_times_loop

I see no reason why the stores can't be moved outside the loop in
question.
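The following minimal C sketch makes the suggestion concrete. It is not the patch's code. It implements radix-2^51 repeated squaring modulo 2^255 - 19 in two shapes. `fe51_sqr_times_naive` writes the result back to `h[]` on every iteration, like the quoted loop. `fe51_sqr_times` keeps the limbs in a local array and stores them once, after the loop.

A few things here are assumptions for illustration: the function names, the loosely reduced limb bound (below 2^52), the loop count being at least 1 (as a CTR/`bdnz` countdown would require), and the use of the GCC/Clang `unsigned __int128` extension.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;          /* GCC/Clang extension, 64-bit targets */

#define FE51_MASK ((UINT64_C(1) << 51) - 1)  /* 0x7ffffffffffff */

/* One squaring mod p = 2^255 - 19; limbs in and out are below 2^52. */
static void fe51_sqr(uint64_t h[5], const uint64_t a[5])
{
    uint64_t a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3], a4 = a[4];
    u128 r0, r1, r2, r3, r4;
    uint64_t c;

    /* Products landing at limb i + j >= 5 wrap around times 19,
     * since 2^255 == 19 (mod p). */
    r0 = (u128)a0 * a0 + (u128)(38 * a1) * a4 + (u128)(38 * a2) * a3;
    r1 = (u128)(2 * a0) * a1 + (u128)(38 * a2) * a4 + (u128)(19 * a3) * a3;
    r2 = (u128)(2 * a0) * a2 + (u128)a1 * a1 + (u128)(38 * a3) * a4;
    r3 = (u128)(2 * a0) * a3 + (u128)(2 * a1) * a2 + (u128)(19 * a4) * a4;
    r4 = (u128)(2 * a0) * a4 + (u128)(2 * a1) * a3 + (u128)a2 * a2;

    /* Carry propagation back to 51-bit limbs. */
    r1 += (uint64_t)(r0 >> 51); h[0] = (uint64_t)r0 & FE51_MASK;
    r2 += (uint64_t)(r1 >> 51); h[1] = (uint64_t)r1 & FE51_MASK;
    r3 += (uint64_t)(r2 >> 51); h[2] = (uint64_t)r2 & FE51_MASK;
    r4 += (uint64_t)(r3 >> 51); h[3] = (uint64_t)r3 & FE51_MASK;
    c = (uint64_t)(r4 >> 51);   h[4] = (uint64_t)r4 & FE51_MASK;
    h[0] += c * 19;             /* top carry wraps around times 19 */
    h[1] += h[0] >> 51;
    h[0] &= FE51_MASK;
}

/* Shape of the quoted loop: the result is stored to h[] every iteration. */
static void fe51_sqr_times_naive(uint64_t h[5], const uint64_t a[5], unsigned n)
{
    assert(n >= 1);
    for (int i = 0; i < 5; i++)
        h[i] = a[i];
    do {
        fe51_sqr(h, h);
    } while (--n);
}

/* Suggested shape: limbs stay local (registers, in asm); store once. */
static void fe51_sqr_times(uint64_t h[5], const uint64_t a[5], unsigned n)
{
    uint64_t t[5] = { a[0], a[1], a[2], a[3], a[4] };

    assert(n >= 1);  /* a counted-down loop must not start from zero */
    do {
        fe51_sqr(t, t);
    } while (--n);
    for (int i = 0; i < 5; i++)  /* the only stores to the output */
        h[i] = t[i];
}
```

Both versions produce identical limbs. Each iteration overwrites all five limbs, so only the final set of stores is visible to the caller. In C, the compiler makes its own register and store decisions, so this sketch shows the loop shape and makes no performance claim.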

> +SYM_FUNC_START(x25519_fe51_frombytes)
> +.align 5
> +
> + li 12, -1
> + srdi 12, 12, 13 # 0x7ffffffffffff
> +
> + ld 5, 0(4)
> + ld 6, 8(4)
> + ld 7, 16(4)
> + ld 8, 24(4)

Is there an actual guarantee that the byte input is 64-bit aligned? While
it is true that the ISA specification obliges the processor to handle
misaligned loads and stores, it does not oblige them to be efficient.
The inefficiency is most likely to be noticed at page boundaries. What
I'm trying to say is that it would be more appropriate to avoid
unaligned loads (and stores).
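One way to avoid depending on alignment is sketched below in C. It assumes the input is a 32-byte little-endian string, as in X25519. Each 64-bit word is assembled from individual bytes. The four words are then split into 51-bit limbs with the same mask the patch builds in r12 (`li 12,-1; srdi 12,12,13`). `load64_le` and `fe51_frombytes` are hypothetical names. This is portable reference code, not a claim about what is fastest on POWER.

```c
#include <stdint.h>

#define FE51_MASK ((UINT64_C(1) << 51) - 1)  /* -1 >> 13 == 0x7ffffffffffff */

/* Read 8 little-endian bytes one at a time: no alignment requirement
 * and no dependence on host byte order. */
static uint64_t load64_le(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Unpack 32 little-endian bytes into five 51-bit limbs.
 * Bit 255 (the top bit of s[31]) is discarded by the final mask. */
static void fe51_frombytes(uint64_t h[5], const uint8_t *s)
{
    uint64_t w0 = load64_le(s + 0);
    uint64_t w1 = load64_le(s + 8);
    uint64_t w2 = load64_le(s + 16);
    uint64_t w3 = load64_le(s + 24);

    h[0] = w0 & FE51_MASK;                         /* bits   0..50  */
    h[1] = ((w0 >> 51) | (w1 << 13)) & FE51_MASK;  /* bits  51..101 */
    h[2] = ((w1 >> 38) | (w2 << 26)) & FE51_MASK;  /* bits 102..152 */
    h[3] = ((w2 >> 25) | (w3 << 39)) & FE51_MASK;  /* bits 153..203 */
    h[4] = (w3 >> 12) & FE51_MASK;                 /* bits 204..254 */
}
```

Limb i starts at bit 51*i. Its right shift is that offset modulo 64, and the left shift that pulls in the next word is 64 minus that offset. This gives the pairs 51/13, 38/26 and 25/39, plus a final right shift of 12.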

Cheers.



2024-05-15 13:04:40

by Danny Tsen

Subject: Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.

See inline.

On 5/15/24 4:06 AM, Andy Polyakov wrote:
> Hi,
>
>> +SYM_FUNC_START(x25519_fe51_sqr_times)
>> ...
>> +
>> +.Lsqr_times_loop:
>> ...
>> +
>> +    std    9,16(3)
>> +    std    10,24(3)
>> +    std    11,32(3)
>> +    std    7,0(3)
>> +    std    8,8(3)
>> +    bdnz    .Lsqr_times_loop
>
> I see no reason why the stores can't be moved outside the loop in
> question.
>
Yeah.  I'll fix it.


>> +SYM_FUNC_START(x25519_fe51_frombytes)
>> +.align    5
>> +
>> +    li    12, -1
>> +    srdi    12, 12, 13    # 0x7ffffffffffff
>> +
>> +    ld    5, 0(4)
>> +    ld    6, 8(4)
>> +    ld    7, 16(4)
>> +    ld    8, 24(4)
>
> Is there an actual guarantee that the byte input is 64-bit aligned? While
> it is true that the ISA specification obliges the processor to handle
> misaligned loads and stores, it does not oblige them to be efficient.
> The inefficiency is most likely to be noticed at page boundaries. What
> I'm trying to say is that it would be more appropriate to avoid
> unaligned loads (and stores).

Good point.  Maybe I can handle the input so that the 64-bit loads are aligned.
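One possible reading of this, which is an assumption about the intended fix and not something stated in the thread, is to copy the 32 input bytes once into a naturally aligned local buffer and then read only aligned 64-bit words. The copy itself may use byte or unaligned accesses internally, but it happens only once. An assembly version could take the copy path only when `(s & 7) != 0`.

The minimal C sketch below assumes a little-endian host, as on ppc64le. It uses the GCC/Clang `__BYTE_ORDER__` macros to check that assumption. `fe51_frombytes_bounce` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

#define FE51_MASK ((UINT64_C(1) << 51) - 1)

/* Assumption: little-endian host, so native 64-bit words match the
 * little-endian byte encoding of the input. */
_Static_assert(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__,
               "sketch assumes a little-endian host");

static void fe51_frombytes_bounce(uint64_t h[5], const uint8_t *s)
{
    uint64_t w[4];               /* naturally 8-byte aligned local buffer */

    memcpy(w, s, sizeof(w));     /* one copy absorbs any source alignment */

    /* From here on, only aligned 64-bit words are read. */
    h[0] = w[0] & FE51_MASK;
    h[1] = ((w[0] >> 51) | (w[1] << 13)) & FE51_MASK;
    h[2] = ((w[1] >> 38) | (w[2] << 26)) & FE51_MASK;
    h[3] = ((w[2] >> 25) | (w[3] << 39)) & FE51_MASK;
    h[4] = (w[3] >> 12) & FE51_MASK;   /* also drops bit 255 */
}
```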

Thanks.


>
> Cheers.
>