Hi,
I have a 32bit installation here that stopped working. Bisected it
to commit d298b03506d3 ("x86/fpu: Restore the masking out of reserved
MXCSR bits").
dhcpcd was the first thing I noticed being affected on account of
network not coming up, and after trying to look at it with gdb also
gdb turned out to be broken.
strace of dhcpcd shows a SIGFPE getting delivered, after which it gets
stuck (seems to be sitting in poll but not responding to even ^C).
And gdb seems to be stuck in a perpetual SIGFPE loop and won't even
get to the prompt.
The crucial bit here seems to be that most of the software is built
with -mfpmath=sse. After rebuilding dhcpcd without that it started
to work on the broken kernel. Rebuilding gdb didn't help so whatever
SSE usage is causing the issue is presumably happening in a library.
Had to do the rebuilds on a working kernel as well because otherwise
the build itself would die to a SIGFPE somewhere.
Tested the same disk on both a 64bit capable Pentium D
and a 32bit only Pentium 4 just to rule out the specific CPU.
Busted on both.
--
Ville Syrjälä
Intel
On Thu, Oct 14, 2021 at 04:27:07PM +0200, Borislav Petkov wrote:
> On Thu, Oct 14, 2021 at 02:44:33PM +0300, Ville Syrjälä wrote:
> > I have a 32bit installation here that stopped working. Bisected it
> > to commit d298b03506d3 ("x86/fpu: Restore the masking out of reserved
> > MXCSR bits").
>
> Lemme make sure I understand this correctly: this patch is bad and with
> it reverted it works?
Yes.
>
> Because before this patch, the restoring would be more restrictive
> than before and this patch reverts the code to the old behavior for
> invalid MXCSR bits.
>
> > Tested the same disk on both a 64bit capable Pentium D
> > and a 32bit only Pentium 4 just to rule out the specific CPU.
> > Busted on both.
>
> So that's a purely 32-bit installation and a 32-bit kernel and you've
> booted it on two different machines?
Yes.
--
Ville Syrjälä
Intel
On Thu, Oct 14, 2021 at 04:56:56PM +0200, Borislav Petkov wrote:
> On Thu, Oct 14, 2021 at 05:43:14PM +0300, Ville Syrjälä wrote:
> > Hmm. Actually I just stared at the code a bit more and it looks
> > a bit funny. Was it supposed to do this instead?
> >
> > - fpu->state.fxsave.mxcsr &= ~mxcsr_feature_mask;
> > + fpu->state.fxsave.mxcsr &= mxcsr_feature_mask;
>
> Whoops, I had it like that in the original patch:
>
> https://lore.kernel.org/all/163354193576.25758.8132624386883258818.tip-bot2@tip-bot2/
>
> I blame tglx. :-)
>
> Does it work if you remove the mask negation "~"?
The machine is currently preoccupied with other things. Should free up
in an hour or two. Once it does I'll give this a spin and report back.
--
Ville Syrjälä
Intel
On Thu, Oct 14, 2021 at 02:44:33PM +0300, Ville Syrjälä wrote:
> I have a 32bit installation here that stopped working. Bisected it
> to commit d298b03506d3 ("x86/fpu: Restore the masking out of reserved
> MXCSR bits").
Lemme make sure I understand this correctly: this patch is bad and with
it reverted it works?
Because before this patch, the restoring would be more restrictive
than before and this patch reverts the code to the old behavior for
invalid MXCSR bits.
> Tested the same disk on both a 64bit capable Pentium D
> and a 32bit only Pentium 4 just to rule out the specific CPU.
> Busted on both.
So that's a purely 32-bit installation and a 32-bit kernel and you've
booted it on two different machines?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Oct 14, 2021 at 05:34:14PM +0300, Ville Syrjälä wrote:
> > > Tested the same disk on both a 64bit capable Pentium D
> > > and a 32bit only Pentium 4 just to rule out the specific CPU.
> > > Busted on both.
> >
> > So that's a purely 32-bit installation and a 32-bit kernel and you've
> > booted it on two different machines?
>
> Yes.
This is insane, grrr!
So Ser's report was about an old 32-bit Intel CPU failing:
https://lore.kernel.org/all/[email protected]/T/#u
so if we revert, it'll break booting on his machine.
Can you give /proc/cpuinfo from your machines?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Oct 14, 2021 at 05:34:14PM +0300, Ville Syrjälä wrote:
> On Thu, Oct 14, 2021 at 04:27:07PM +0200, Borislav Petkov wrote:
> > On Thu, Oct 14, 2021 at 02:44:33PM +0300, Ville Syrjälä wrote:
> > > I have a 32bit installation here that stopped working. Bisected it
> > > to commit d298b03506d3 ("x86/fpu: Restore the masking out of reserved
> > > MXCSR bits").
> >
> > Lemme make sure I understand this correctly: this patch is bad and with
> > it reverted it works?
>
> Yes.
>
> >
> > Because before this patch, the restoring would be more restrictive
> > than before and this patch reverts the code to the old behavior for
> > invalid MXCSR bits.
Yeah, it's a bit weird.
Hmm. Actually I just stared at the code a bit more and it looks
a bit funny. Was it supposed to do this instead?
- fpu->state.fxsave.mxcsr &= ~mxcsr_feature_mask;
+ fpu->state.fxsave.mxcsr &= mxcsr_feature_mask;
--
Ville Syrjälä
Intel
On Thu, Oct 14, 2021 at 05:43:14PM +0300, Ville Syrjälä wrote:
> Hmm. Actually I just stared at the code a bit more and it looks
> a bit funny. Was it supposed to do this instead?
>
> - fpu->state.fxsave.mxcsr &= ~mxcsr_feature_mask;
> + fpu->state.fxsave.mxcsr &= mxcsr_feature_mask;
Whoops, I had it like that in the original patch:
https://lore.kernel.org/all/163354193576.25758.8132624386883258818.tip-bot2@tip-bot2/
I blame tglx. :-)
Does it work if you remove the mask negation "~"?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Oct 14, 2021 at 06:03:46PM +0300, Ville Syrjälä wrote:
> On Thu, Oct 14, 2021 at 04:56:56PM +0200, Borislav Petkov wrote:
> > On Thu, Oct 14, 2021 at 05:43:14PM +0300, Ville Syrjälä wrote:
> > > Hmm. Actually I just stared at the code a bit more and it looks
> > > a bit funny. Was it supposed to do this instead?
> > >
> > > - fpu->state.fxsave.mxcsr &= ~mxcsr_feature_mask;
> > > + fpu->state.fxsave.mxcsr &= mxcsr_feature_mask;
> >
> > Whoops, I had it like that in the original patch:
> >
> > https://lore.kernel.org/all/163354193576.25758.8132624386883258818.tip-bot2@tip-bot2/
> >
> > I blame tglx. :-)
> >
> > Does it work if you remove the mask negation "~"?
>
> The machine is currently preoccupied with other things. Should free up
> in an hour or two. Once it does I'll give this a spin and report back.
That ~ was indeed the problem. With it gone the machine is happy again.
I presume you'll turn this into a real patch?
Tested-by: Ville Syrjälä <[email protected]>
--
Ville Syrjälä
Intel
On Thu, Oct 14, 2021 at 08:45:33PM +0300, Ville Syrjälä wrote:
> That ~ was indeed the problem. With it gone the machine is happy again.
>
> I presume you'll turn this into a real patch?
Actually, you found it and you should be the one to write it and do the
honors. Unless you don't want to - then I can do it.
If you do, pls add
Ser Olmy <[email protected]>
to Cc so that he can test your patch. I *think* it should work for him
too but I don't know anything anymore. :-)
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Oct 14, 2021 at 08:01:24PM +0200, Borislav Petkov wrote:
> On Thu, Oct 14, 2021 at 08:45:33PM +0300, Ville Syrjälä wrote:
> > That ~ was indeed the problem. With it gone the machine is happy again.
> >
> > I presume you'll turn this into a real patch?
>
> Actually, you found it and you should be the one to write it and do the
> honors. Unless you don't want to - then I can do it.
I figured you can write a reasonably succinct commit message, instead
of having me ramble on incoherently. ATM I don't even know what mxcsr
is or why clobbering it would cause floating point exceptions with
sse specifically.
But I can certainly ramble, if you prefer that.
>
> If you do, pls add
>
> Ser Olmy <[email protected]>
>
> to Cc so that he can test your patch. I *think* it should work for him
> too but I don't know anything anymore. :-)
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
--
Ville Syrjälä
Intel
On Thu, Oct 14, 2021 at 09:46:50PM +0300, Ville Syrjälä wrote:
> I figured you can write a reasonably succinct commit message, instead
> of having me ramble on incoherently. ATM I don't even know what mxcsr
> is or why clobbering it would cause floating point exceptions with
> sse specifically.
>
> But I can certainly ramble, if you prefer that.
Well, you can simply say that d298b03506d3 was supposed to mask out the
reserved bits in the MXCSR register but the author mistakenly used the
negation of the mask.
And if the commit message is not explaining stuff properly, I'll fix it
up, no worries there.
But your call - I'm done for today and I'll do a patch tomorrow. That
is, unless you've decided in the meantime that you wish to write one
and have sent it to me overnight.
:-)))
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Ok, here it is.
Ser, I'd appreciate you running it too, to make sure your box is still
ok.
Thx.
---
From: Borislav Petkov <[email protected]>
Date: Fri, 15 Oct 2021 12:46:25 +0200
Subject: [PATCH] x86/fpu: Mask out the invalid MXCSR bits properly
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This is a fix for the fix (yeah, /facepalm).
The correct mask to use is not the negation of the MXCSR_MASK but the
actual mask which contains the supported bits in the MXCSR register.
Reported and debugged by Ville Syrjälä <[email protected]>
Fixes: d298b03506d3 ("x86/fpu: Restore the masking out of reserved MXCSR bits")
Signed-off-by: Borislav Petkov <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/fpu/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index fa17a27390ab..831b25c5e705 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -385,7 +385,7 @@ static int __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 			return -EINVAL;
 	} else {
 		/* Mask invalid bits out for historical reasons (broken hardware). */
-		fpu->state.fxsave.mxcsr &= ~mxcsr_feature_mask;
+		fpu->state.fxsave.mxcsr &= mxcsr_feature_mask;
 	}

 	/* Enforce XFEATURE_MASK_FPSSE when XSAVE is enabled */
--
2.29.2
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Friday, October 15th, 2021 at 1:04 PM, Borislav Petkov <[email protected]> wrote:
> Ok, here it is.
>
> Ser, I'd appreciate you running it too, to make sure your box is still ok.
Tested-by: [email protected]
Working fine here with the patch applied to a stock 5.14.12 kernel.
Regards,
Olmy
On Sat, Oct 16, 2021 at 07:26:25AM +0000, Ser Olmy wrote:
> Tested-by: [email protected]
>
> Working fine here with the patch applied to a stock 5.14.12 kernel.
Thanks!
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: b2381acd3fd9bacd2c63f53b2c610c89959b31cc
Gitweb: https://git.kernel.org/tip/b2381acd3fd9bacd2c63f53b2c610c89959b31cc
Author: Borislav Petkov <[email protected]>
AuthorDate: Fri, 15 Oct 2021 12:46:25 +02:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Sat, 16 Oct 2021 12:37:50 +02:00
x86/fpu: Mask out the invalid MXCSR bits properly
This is a fix for the fix (yeah, /facepalm).
The correct mask to use is not the negation of the MXCSR_MASK but the
actual mask which contains the supported bits in the MXCSR register.
Reported and debugged by Ville Syrjälä <[email protected]>
Fixes: d298b03506d3 ("x86/fpu: Restore the masking out of reserved MXCSR bits")
Signed-off-by: Borislav Petkov <[email protected]>
Tested-by: Ville Syrjälä <[email protected]>
Tested-by: Ser Olmy <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/fpu/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index fa17a27..831b25c 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -385,7 +385,7 @@ static int __fpu_restore_sig(void __user *buf, void __user *buf_fx,
 			return -EINVAL;
 	} else {
 		/* Mask invalid bits out for historical reasons (broken hardware). */
-		fpu->state.fxsave.mxcsr &= ~mxcsr_feature_mask;
+		fpu->state.fxsave.mxcsr &= mxcsr_feature_mask;
 	}

 	/* Enforce XFEATURE_MASK_FPSSE when XSAVE is enabled */
On Fri, Oct 15, 2021 at 01:04:28PM +0200, Borislav Petkov wrote:
> Ok, here it is.
Thanks. I got distracted by other shiny objects anyway, so
wouldn't even have gotten to cooking up a proper patch until
now.
>
> Ser, I'd appreciate you running it too, to make sure your box is still
> ok.
>
> Thx.
>
> ---
> From: Borislav Petkov <[email protected]>
> Date: Fri, 15 Oct 2021 12:46:25 +0200
> Subject: [PATCH] x86/fpu: Mask out the invalid MXCSR bits properly
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> This is a fix for the fix (yeah, /facepalm).
>
> The correct mask to use is not the negation of the MXCSR_MASK but the
> actual mask which contains the supported bits in the MXCSR register.
>
> Reported and debugged by Ville Syrjälä <[email protected]>
>
> Fixes: d298b03506d3 ("x86/fpu: Restore the masking out of reserved MXCSR bits")
> Signed-off-by: Borislav Petkov <[email protected]>
> Cc: <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> ---
> arch/x86/kernel/fpu/signal.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> index fa17a27390ab..831b25c5e705 100644
> --- a/arch/x86/kernel/fpu/signal.c
> +++ b/arch/x86/kernel/fpu/signal.c
> @@ -385,7 +385,7 @@ static int __fpu_restore_sig(void __user *buf, void __user *buf_fx,
>  			return -EINVAL;
>  	} else {
>  		/* Mask invalid bits out for historical reasons (broken hardware). */
> -		fpu->state.fxsave.mxcsr &= ~mxcsr_feature_mask;
> +		fpu->state.fxsave.mxcsr &= mxcsr_feature_mask;
>  	}
>
>  	/* Enforce XFEATURE_MASK_FPSSE when XSAVE is enabled */
> --
> 2.29.2
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
--
Ville Syrjälä
Intel