2022-05-11 13:22:40

by Stafford Horne

[permalink] [raw]
Subject: Re: [PATCH v2] hex2bin: make the function hex_to_bin constant-time

On Sun, May 08, 2022 at 09:37:26AM +0900, Stafford Horne wrote:
> On Thu, May 05, 2022 at 05:38:58AM +0900, Stafford Horne wrote:
> > On Wed, May 04, 2022 at 01:10:03PM -0700, Linus Torvalds wrote:
> > > On Wed, May 4, 2022 at 12:58 PM Stafford Horne <[email protected]> wrote:
> > > >
> > > > I have uploaded a diff I created here:
> > > > https://gist.github.com/54334556f2907104cd12374872a0597c
> > > >
> > > > It shows the same output.
> > >
> > > In hex_to_bin itself it seems to only be a difference due to some
> > > register allocation (r19 and r3 switched around).
> > >
> > > But then it gets inlined into hex2bin and there changes there seem to
> > > be about instruction and basic block scheduling, so it's a lot harder
> > > to see what's going on.
> > >
> > > And a lot of constant changes, which honestly look just like code code
> > > moved around by 16 bytes and offsets changed due to that.
> > >
> > > So I doubt it's hex_to_bin() that is causing problems, I think it's
> > > purely code movement. Which explains why adding a nop or a fake printk
> > > fixes things.
> > >
> > > Some alignment assumption that got broken?
> >
> > This is what it looks like to me too. I will have to do a deep dive on what is
> > going on with this particular build combination as I can't figure out what it is
> > off the top of my head.
> >
> > This test is using a gcc 11 compiler, I tried with my gcc 12 toolchain and the
> > issue cannot be reproduced.
> >
> > - musl gcc 11 - https://musl.cc/or1k-linux-musl-cross.tgz
> > - openrisc gcc 12 - https://github.com/openrisc/or1k-gcc/releases/tag/or1k-12.0.1-20220210-20220304
> >
> > But again the difference between the two compiler outputs is a lot of register
> > allocation and offsets changes. Its not easy to see anything that stands out.
> > I checked the change log for the openrisc specific changes from gcc 11 to gcc
> > 12. Nothing seems to stand out, mcount profiler fix for PIC, a new large binary
> > link flag.
>
> Hello,
>
> Just an update on this. The issue so far has been traced to the alignment of
> the crypto multiplication function fe_mul_impl in lib/crypto/curve25519-fiat32.c.
>
> This patch e5be15767e7e ("hex2bin: make the function hex_to_bin constant-time")
> allowed for the alignment to be just right to cause the failure. Without
> this patch and forcing the alignment we can reproduce the issue. So though the
> bisect is correct, this patch is not the root cause of the issue.
>
> Using some l.nop sliding techniques and some strategically placed .align
> statements I have been able to reproduce the issue on:
>
> - gcc 11 and gcc 12
> - preempt and non-preempt kernels
>
> I have not been able to reproduce this on my FPGA, so far only QEMU. My
> hunch now is that since the fe_mul_impl function contains some rather long basic
> blocks it appears that timer interrupts that interrupt qemu mid basic block
> execution somehow is causing this. The hypothesis is it may be basic block
> resuming behavior in qemu that causes incosistent behavior.
>
> It will take a bit more time to trace this. Since I maintain OpenRISC QEMU the
> issue is on me.
>
> Again, It's safe to say that patch e5be15767e7e ("hex2bin: make the function
> hex_to_bin constant-time") is not an issue.

This issue has been fixed. I sent a patch to QEMU for it:

- https://lore.kernel.org/qemu-devel/[email protected]/T/#u

The issue was a bug in the OpenRISC emulation in QEMU which was triggered when
receiving a TICK TIMER interrupt, in a delay slot, on a page boundary.

The fix was simple enough, but investigation took quite some work.

Thanks,

-Stafford