On Thu, May 5, 2022 at 4:43 AM Jason A. Donenfeld <[email protected]> wrote:
>
> Hi Linus,
>
> On Wed, May 4, 2022 at 8:00 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > On Wed, May 4, 2022 at 3:15 AM Jason A. Donenfeld <[email protected]> wrote:
> > >
> > > > Alignment? Compiler bug? HW issue?
> > >
> > > Probably one of those, yea. Removing the instruction addresses, the only
> > > difference between the two compiles is: https://xn--4db.cc/Rrn8usaX/diff#line-440
> >
> > Well, that address doesn't work for me at all. It turns into א.cc.
> >
> > I'd love to see the compiler problem, since I find those fascinating
> > (mainly because they scare the hell out of me), but those web
> > addresses you use are not working for me.
>
> א.cc is correct. If you can't load it, your browser or something in
> your stack is broken. Is choosing a non-ASCII domain like that clearly a
> bad decision because people with broken stacks can't load it? Yea,
> maybe. But maybe it's like the arch/alpha/ reordering of dependent
> loads applied to the web... A bit of a stretch.

I have uploaded a diff I created here:
https://gist.github.com/54334556f2907104cd12374872a0597c
It shows the same difference.

> > It most definitely looks like an OpenRISC compiler bug - that code
> > doesn't look like it does anything remotely undefined (and with the
> > "unsigned char", nothing implementation-defined either).
>
> I'm not so certain it's in the compiler anymore, actually. The bug
> exhibits itself even when that code isn't actually called. Adding nops
> to unrelated code also makes the problem go away. And removing these
> nops [1] makes the problem go away too. So maybe it's looking more
> like a linker bug (or linker script bug) related to alignment. Or
> whatever is jumping between contexts in the preemption code and
> restoring registers and such is assuming certain things about code
> layout that don't always hold. More fiddling is necessary still.

Bisecting definitely pointed to this patch, which is strange. Reverting
e5be15767e7e ("hex2bin: make the function hex_to_bin constant-time")
also fixed the problem for me.
But any small patch that changes the layout could make this go away.
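For reference, the commit title says it makes hex_to_bin() constant-time, presumably by replacing data-dependent branches with arithmetic masking. Below is a minimal sketch of that general technique, for illustration only: ct_hex_to_bin is a hypothetical name, and this is not claimed to be the exact code from e5be15767e7e. It assumes unsigned char operands promote to int and that int is two's complement (true on every Linux target).

```c
/*
 * Hypothetical illustration (not the kernel source): a branchless
 * hex-digit decoder. Returns 0..15 for '0'-'9', 'a'-'f', 'A'-'F',
 * and -1 for any other byte, with no data-dependent branches.
 * Assumes two's-complement int.
 */
int ct_hex_to_bin(unsigned char ch)
{
	/* Clear bit 5 (0x20) to fold 'a'..'f' onto 'A'..'F'. */
	unsigned char cu = ch & 0xdf;

	/*
	 * ch promotes to int, so (ch - '9' - 1) and ('0' - 1 - ch) are
	 * both negative only when '0' <= ch <= '9'. ANDing two negatives
	 * keeps the sign bit set; otherwise the result is a non-negative
	 * value below 256. Converting to unsigned and shifting right by 8
	 * therefore gives a mask with all low bits set, or 0.
	 */
	int digit = (ch - '0' + 1) &
		    (int)((unsigned)((ch - '9' - 1) & ('0' - 1 - ch)) >> 8);

	/* Same trick for the letter range, giving 11..16 or 0. */
	int letter = (cu - 'A' + 11) &
		     (int)((unsigned)((cu - 'F' - 1) & ('A' - 1 - cu)) >> 8);

	/* At most one term is non-zero; subtracting 1 gives 0..15 or -1. */
	return -1 + digit + letter;
}
```

With this sketch, ct_hex_to_bin('a') returns 10 and ct_hex_to_bin('g') returns -1.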

Things I still want to try:
- a closer look at the generated assembly diff
- a newer compiler (I fixed a few bugs in gcc 12 for OpenRISC, and
  this issue came up while testing with gcc 11)
- trying on FPGAs

I'll report as I find things.
-Stafford