2022-01-06 08:35:45

by kernel test robot

[permalink] [raw]
Subject: [x86/entry_32] aa93e2ad74: BUG:soft_lockup-CPU##stuck_for#s![systemd-logind:#]



Greeting,

FYI, we noticed the following commit (built with clang-14):

commit: aa93e2ad7464ffb90155a5ffdde963816f86d5dc ("x86/entry_32: Remove .fixup usage")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/core

in testcase: kernel-selftests
version:
with following parameters:

group: x86

test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt


on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+-----------------------------------------------------+------------+------------+
| | 16e617d05e | aa93e2ad74 |
+-----------------------------------------------------+------------+------------+
| boot_successes | 52 | 5 |
| boot_failures | 0 | 47 |
| BUG:kernel_hang_in_test_stage | 0 | 40 |
| BUG:soft_lockup-CPU##stuck_for#s![systemd-logind:#] | 0 | 7 |
| EIP:smp_call_function_many_cond | 0 | 7 |
| Kernel_panic-not_syncing:softlockup:hung_tasks | 0 | 7 |
+-----------------------------------------------------+------------+------------+


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 1153.108528][ C1] watchdog: BUG: soft lockup - CPU#1 stuck for 536s! [systemd-logind:1589]
[ 1153.113402][ C1] Modules linked in: bochs drm_vram_helper drm_ttm_helper ttm drm_kms_helper drm serio_raw drm_panel_orientation_quirks i2c_piix4 evbug intel_agp evdev rtc_cmos mac_hid intel_gtt agpgart stm_p_basic
[ 1153.123007][ C1] irq event stamp: 270688
[ 1153.126371][ C1] hardirqs last enabled at (270687): irqentry_exit (kernel/entry/common.c:?)
[ 1153.130734][ C1] hardirqs last disabled at (270688): sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1097)
[ 1153.135693][ C1] softirqs last enabled at (203498): do_softirq_own_stack (arch/x86/kernel/irq_32.c:60 arch/x86/kernel/irq_32.c:150)
[ 1153.140292][ C1] softirqs last disabled at (203423): do_softirq_own_stack (arch/x86/kernel/irq_32.c:60 arch/x86/kernel/irq_32.c:150)
[ 1153.144942][ C1] CPU: 1 PID: 1589 Comm: systemd-logind Not tainted 5.16.0-rc4-00015-gaa93e2ad7464 #1 724caf37a2ed720c19b702e6f6c942970fe22427
[ 1153.153337][ C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 1153.158184][ C1] EIP: smp_call_function_many_cond (kernel/smp.c:440 kernel/smp.c:969)
[ 1153.162660][ C1] Code: 00 89 c7 3b 05 2c d7 97 c2 73 3a 8b 45 f0 8b 18 83 ff 20 73 20 8b 04 bd 84 94 4b c2 f7 44 18 04 01 00 00 00 74 d2 90 90 f3 90 <8b> 4c 03 04 f6 c1 01 75 f5 eb c3 57 68 80 7f 87 c2 e8 68 43 4d 00
All code
========
0: 00 89 c7 3b 05 2c add %cl,0x2c053bc7(%rcx)
6: d7 xlat %ds:(%rbx)
7: 97 xchg %eax,%edi
8: c2 73 3a retq $0x3a73
b: 8b 45 f0 mov -0x10(%rbp),%eax
e: 8b 18 mov (%rax),%ebx
10: 83 ff 20 cmp $0x20,%edi
13: 73 20 jae 0x35
15: 8b 04 bd 84 94 4b c2 mov -0x3db46b7c(,%rdi,4),%eax
1c: f7 44 18 04 01 00 00 testl $0x1,0x4(%rax,%rbx,1)
23: 00
24: 74 d2 je 0xfffffffffffffff8
26: 90 nop
27: 90 nop
28: f3 90 pause
2a:* 8b 4c 03 04 mov 0x4(%rbx,%rax,1),%ecx <-- trapping instruction
2e: f6 c1 01 test $0x1,%cl
31: 75 f5 jne 0x28
33: eb c3 jmp 0xfffffffffffffff8
35: 57 push %rdi
36: 68 80 7f 87 c2 pushq $0xffffffffc2877f80
3b: e8 68 43 4d 00 callq 0x4d43a8

Code starting with the faulting instruction
===========================================
0: 8b 4c 03 04 mov 0x4(%rbx,%rax,1),%ecx
4: f6 c1 01 test $0x1,%cl
7: 75 f5 jne 0xfffffffffffffffe
9: eb c3 jmp 0xffffffffffffffce
b: 57 push %rdi
c: 68 80 7f 87 c2 pushq $0xffffffffc2877f80
11: e8 68 43 4d 00 callq 0x4d437e
[ 1153.173395][ C1] EAX: 17173000 EBX: c2a89df0 ECX: 00000011 EDX: 00000001
[ 1153.178489][ C1] ESI: d9c17244 EDI: 00000000 EBP: c70bbcc8 ESP: c70bbc90
[ 1153.183282][ C1] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00000202
[ 1153.188245][ C1] CR0: 80050033 CR2: 005fd180 CR3: 060b3000 CR4: 00040690
[ 1153.193152][ C1] Call Trace:
[ 1153.197354][ C1] ? flush_tlb_all (arch/x86/mm/tlb.c:1027)
[ 1153.201557][ C1] on_each_cpu_cond_mask (kernel/smp.c:1135)
[ 1153.205806][ C1] flush_tlb_kernel_range (include/linux/smp.h:71 arch/x86/mm/tlb.c:1053)
[ 1153.209947][ C1] __kmap_flush_unused (include/linux/spinlock.h:389 mm/highmem.c:201)
[ 1153.213857][ C1] change_page_attr_set_clr (arch/x86/mm/pat/set_memory.c:1743)
[ 1153.217987][ C1] ? rcu_read_lock_sched_held (kernel/rcu/update.c:125)
[ 1153.222472][ C1] set_memory_ro (arch/x86/mm/pat/set_memory.c:1946)
[ 1153.226468][ C1] bpf_prog_select_runtime (include/linux/filter.h:?)
[ 1153.230241][ C1] bpf_prepare_filter (net/core/filter.c:? net/core/filter.c:1343)
[ 1153.234171][ C1] __get_filter (net/core/filter.c:1512)
[ 1153.238208][ C1] sk_attach_filter (net/core/filter.c:1527)
[ 1153.242013][ C1] sock_setsockopt (net/core/sock.c:?)
[ 1153.245655][ C1] __sys_setsockopt (net/socket.c:?)
[ 1153.249508][ C1] __ia32_sys_socketcall (net/socket.c:? net/socket.c:2901 net/socket.c:2901)
[ 1153.253576][ C1] ? kmem_cache_free (mm/slub.c:3501 mm/slub.c:3514 mm/slub.c:3530)
[ 1153.257465][ C1] ? put_cred_rcu (kernel/cred.c:127)
[ 1153.261292][ C1] ? put_cred_rcu (kernel/cred.c:127)
[ 1153.264756][ C1] ? rcu_lock_release (include/linux/rcupdate.h:274)
[ 1153.268279][ C1] ? put_cred_rcu (kernel/cred.c:127)
[ 1153.271688][ C1] ? syscall_enter_from_user_mode (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 kernel/entry/common.c:107)
[ 1153.275128][ C1] ? do_int80_syscall_32 (arch/x86/entry/common.c:110 arch/x86/entry/common.c:132)
[ 1153.278608][ C1] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:51)
[ 1153.282036][ C1] ? syscall_enter_from_user_mode (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 kernel/entry/common.c:107)
[ 1153.285382][ C1] do_int80_syscall_32 (arch/x86/entry/common.c:112 arch/x86/entry/common.c:132)
[ 1153.289305][ C1] ? irqentry_exit_to_user_mode (kernel/entry/common.c:316)
[ 1153.293109][ C1] ? irqentry_exit (kernel/entry/common.c:441)
[ 1153.296533][ C1] ? exc_page_fault (arch/x86/mm/fault.c:1545)
[ 1153.300022][ C1] entry_INT80_32 (init_task.c:?)
[ 1153.303502][ C1] EIP: 0xb7f10092
[ 1153.306552][ C1] Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30 00 00 00 e9 80 ff ff ff ff a3 e0 ff ff ff 66 90 00 00 00 00 00 00 00 00 cd 80 <c3> 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d b4 26 00
All code
========
0: 00 00 add %al,(%rax)
2: 00 e9 add %ch,%cl
4: 90 nop
5: ff (bad)
6: ff (bad)
7: ff (bad)
8: ff a3 24 00 00 00 jmpq *0x24(%rbx)
e: 68 30 00 00 00 pushq $0x30
13: e9 80 ff ff ff jmpq 0xffffffffffffff98
18: ff a3 e0 ff ff ff jmpq *-0x20(%rbx)
1e: 66 90 xchg %ax,%ax
...
28: cd 80 int $0x80
2a:* c3 retq <-- trapping instruction
2b: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
32: 8d b6 00 00 00 00 lea 0x0(%rsi),%esi
38: 8b 1c 24 mov (%rsp),%ebx
3b: c3 retq
3c: 8d .byte 0x8d
3d: b4 26 mov $0x26,%ah
...

Code starting with the faulting instruction
===========================================
0: c3 retq
1: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
8: 8d b6 00 00 00 00 lea 0x0(%rsi),%esi
e: 8b 1c 24 mov (%rsp),%ebx
11: c3 retq
12: 8d .byte 0x8d
13: b4 26 mov $0x26,%ah


To reproduce:

# build kernel
cd linux
cp config-5.16.0-rc4-00015-gaa93e2ad7464 .config
make HOSTCC=clang-14 CC=clang-14 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=clang-14 CC=clang-14 ARCH=i386 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (9.26 kB)
config-5.16.0-rc4-00015-gaa93e2ad7464 (147.96 kB)
job-script (4.71 kB)
dmesg.xz (25.95 kB)
Download all attachments

2022-01-11 11:11:31

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [x86/entry_32] aa93e2ad74: BUG:soft_lockup-CPU##stuck_for#s![systemd-logind:#]

On Thu, Jan 06, 2022 at 04:35:23PM +0800, kernel test robot wrote:
>
>
> Greeting,
>
> FYI, we noticed the following commit (built with clang-14):
>
> commit: aa93e2ad7464ffb90155a5ffdde963816f86d5dc ("x86/entry_32: Remove .fixup usage")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/core
>
> in testcase: kernel-selftests
> version:
> with following parameters:
>
> group: x86
>

It would be very useful if this thing would also say which of the many
x86 selftests fails... it appears to be: ldt_gdt_32.

The below fixes it, but I'm still not entirely sure what the actual
problem is, although Andy did find a bug in that the exception handler
should do: *(ss:esp) = 0, adding ss-base (using insn_get_seg_base())
doesn't seem to cure things.

The below simply emulates the pop instruction and relies on the nested
RESTORE_REGS to actually set the segment register. The testcase now
successfully completes again.

---
arch/x86/entry/entry_32.S | 13 +++++++++----
arch/x86/include/asm/extable_fixup_types.h | 5 ++++-
arch/x86/lib/insn-eval.c | 5 +++++
arch/x86/mm/extable.c | 17 +++--------------
4 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index e0a95d8a6553..fff7f2a08ff5 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -268,11 +268,16 @@
1: popl %ds
2: popl %es
3: popl %fs
- addl $(4 + \pop), %esp /* pop the unused "gs" slot */
+4: addl $(4 + \pop), %esp /* pop the unused "gs" slot */
IRET_FRAME
- _ASM_EXTABLE_TYPE(1b, 1b, EX_TYPE_POP_ZERO)
- _ASM_EXTABLE_TYPE(2b, 2b, EX_TYPE_POP_ZERO)
- _ASM_EXTABLE_TYPE(3b, 3b, EX_TYPE_POP_ZERO)
+
+ /*
+ * There is no _ASM_EXTABLE_TYPE_REG() for ASM, however since this is
+ * ASM the registers are known and we can trivially hard-code them.
+ */
+ _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_POP_ZERO|EX_DATA_REG(8))
+ _ASM_EXTABLE_TYPE(2b, 3b, EX_TYPE_POP_ZERO|EX_DATA_REG(9))
+ _ASM_EXTABLE_TYPE(3b, 4b, EX_TYPE_POP_ZERO|EX_DATA_REG(10))
.endm

.macro RESTORE_ALL_NMI cr3_reg:req pop=0
diff --git a/arch/x86/include/asm/extable_fixup_types.h b/arch/x86/include/asm/extable_fixup_types.h
index b5ab333e064a..81eedabc1ea1 100644
--- a/arch/x86/include/asm/extable_fixup_types.h
+++ b/arch/x86/include/asm/extable_fixup_types.h
@@ -16,6 +16,7 @@
#define EX_DATA_FLAG_SHIFT 12
#define EX_DATA_IMM_SHIFT 16

+#define EX_DATA_REG(reg) ((reg) << EX_DATA_REG_SHIFT)
#define EX_DATA_FLAG(flag) ((flag) << EX_DATA_FLAG_SHIFT)
#define EX_DATA_IMM(imm) ((imm) << EX_DATA_IMM_SHIFT)

@@ -41,7 +42,9 @@
#define EX_TYPE_RDMSR_IN_MCE 13
#define EX_TYPE_DEFAULT_MCE_SAFE 14
#define EX_TYPE_FAULT_MCE_SAFE 15
-#define EX_TYPE_POP_ZERO 16
+
+#define EX_TYPE_POP_REG 16 /* reg := (long)imm */
+#define EX_TYPE_POP_ZERO (EX_TYPE_POP_REG | EX_DATA_IMM(0))

#define EX_TYPE_IMM_REG 17 /* reg := (long)imm */
#define EX_TYPE_EFAULT_REG (EX_TYPE_IMM_REG | EX_DATA_IMM(-EFAULT))
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 666b9c5672a2..b781d324211b 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -428,6 +428,11 @@ static const int pt_regoff[] = {
offsetof(struct pt_regs, r13),
offsetof(struct pt_regs, r14),
offsetof(struct pt_regs, r15),
+#else
+ offsetof(struct pt_regs, ds),
+ offsetof(struct pt_regs, es),
+ offsetof(struct pt_regs, fs),
+ offsetof(struct pt_regs, gs),
#endif
};

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 41eaa648349e..dba2197c05c3 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -126,18 +126,6 @@ static bool ex_handler_clear_fs(const struct exception_table_entry *fixup,
return ex_handler_default(fixup, regs);
}

-static bool ex_handler_pop_zero(const struct exception_table_entry *fixup,
- struct pt_regs *regs)
-{
- /*
- * Typically used for when "pop %seg" traps, in which case we'll clear
- * the stack slot and re-try the instruction, which will then succeed
- * to pop zero.
- */
- *((unsigned long *)regs->sp) = 0;
- return ex_handler_default(fixup, regs);
-}
-
static bool ex_handler_imm_reg(const struct exception_table_entry *fixup,
struct pt_regs *regs, int reg, int imm)
{
@@ -218,8 +206,9 @@ int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
case EX_TYPE_RDMSR_IN_MCE:
ex_handler_msr_mce(regs, false);
break;
- case EX_TYPE_POP_ZERO:
- return ex_handler_pop_zero(e, regs);
+ case EX_TYPE_POP_REG:
+ regs->sp += sizeof(long);
+ fallthrough;
case EX_TYPE_IMM_REG:
return ex_handler_imm_reg(e, regs, reg, imm);
case EX_TYPE_FAULT_SGX:



2022-01-12 01:29:05

by Sean Christopherson

[permalink] [raw]
Subject: Re: [x86/entry_32] aa93e2ad74: BUG:soft_lockup-CPU##stuck_for#s![systemd-logind:#]

On Tue, Jan 11, 2022, Peter Zijlstra wrote:
> On Thu, Jan 06, 2022 at 04:35:23PM +0800, kernel test robot wrote:
> >
> >
> > Greeting,
> >
> > FYI, we noticed the following commit (built with clang-14):
> >
> > commit: aa93e2ad7464ffb90155a5ffdde963816f86d5dc ("x86/entry_32: Remove .fixup usage")
> > https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/core
> >
> > in testcase: kernel-selftests
> > version:
> > with following parameters:
> >
> > group: x86
> >
>
> It would be very useful if this thing would also say which of the many
> x86 selftests fails... it appears to be: ldt_gdt_32.
>
> The below fixes it, but I'm still not entirely sure what the actual
> problem is, although Andy did find a bug in that the exception handler
> should do: *(ss:esp) = 0, adding ss-base (using insn_get_seg_base())
> doesn't seem to cure things.

Because I was curious...

The issue is that PARANOID_EXIT_TO_KERNEL_MODE in the handle_exception_return
path overwrites the entry stack data with the task stack data, restoring the "bad"
segment value.

2022-01-12 10:26:13

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [x86/entry_32] aa93e2ad74: BUG:soft_lockup-CPU##stuck_for#s![systemd-logind:#]

On Wed, Jan 12, 2022 at 01:28:58AM +0000, Sean Christopherson wrote:
> On Tue, Jan 11, 2022, Peter Zijlstra wrote:
> > On Thu, Jan 06, 2022 at 04:35:23PM +0800, kernel test robot wrote:
> > >
> > >
> > > Greeting,
> > >
> > > FYI, we noticed the following commit (built with clang-14):
> > >
> > > commit: aa93e2ad7464ffb90155a5ffdde963816f86d5dc ("x86/entry_32: Remove .fixup usage")
> > > https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/core
> > >
> > > in testcase: kernel-selftests
> > > version:
> > > with following parameters:
> > >
> > > group: x86
> > >
> >
> > It would be very useful if this thing would also say which of the many
> > x86 selftests fails... it appears to be: ldt_gdt_32.
> >
> > The below fixes it, but I'm still not entirely sure what the actual
> > problem is, although Andy did find a bug in that the exception handler
> > should do: *(ss:esp) = 0, adding ss-base (using insn_get_seg_base())
> > doesn't seem to cure things.
>
> Because I was curious...
>
> The issue is that PARANOID_EXIT_TO_KERNEL_MODE in the handle_exception_return
> path overwrites the entry stack data with the task stack data, restoring the "bad"
> segment value.

Oh gawd... that's terrible, and yes, that now makes perfect sense.

However did you find that?

2022-01-12 10:56:03

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH] x86/entry_32: Fix segment exceptions

On Wed, Jan 12, 2022 at 01:28:58AM +0000, Sean Christopherson wrote:
> On Tue, Jan 11, 2022, Peter Zijlstra wrote:
> > On Thu, Jan 06, 2022 at 04:35:23PM +0800, kernel test robot wrote:
> > >
> > >
> > > Greeting,
> > >
> > > FYI, we noticed the following commit (built with clang-14):
> > >
> > > commit: aa93e2ad7464ffb90155a5ffdde963816f86d5dc ("x86/entry_32: Remove .fixup usage")
> > > https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/core
> > >
> > > in testcase: kernel-selftests
> > > version:
> > > with following parameters:
> > >
> > > group: x86
> > >
> >
> > It would be very useful if this thing would also say which of the many
> > x86 selftests fails... it appears to be: ldt_gdt_32.
> >
> > The below fixes it, but I'm still not entirely sure what the actual
> > problem is, although Andy did find a bug in that the exception handler
> > should do: *(ss:esp) = 0, adding ss-base (using insn_get_seg_base())
> > doesn't seem to cure things.
>
> Because I was curious...
>
> The issue is that PARANOID_EXIT_TO_KERNEL_MODE in the handle_exception_return
> path overwrites the entry stack data with the task stack data, restoring the "bad"
> segment value.

Full and proper patch below. Boris, if you could merge in x86/core that
branch should then be ready for a pull req.

---
Subject: x86/entry_32: Fix segment exceptions
From: Peter Zijlstra <[email protected]>
Date: Tue, 11 Jan 2022 12:11:14 +0100

The LKP robot reported that commit aa93e2ad7464 ("x86/entry_32: Remove
.fixup usage") caused failure. Turns out the ldt_gdt_32 selftest turns
into an infinite loop trying to clear the segment.

As discovered by Sean; what happens is that
PARANOID_EXIT_TO_KERNEL_MODE in the handle_exception_return path
overwrites the entry stack data with the task stack data, restoring
the "bad" segment value.

Instead of having the exception re-take the instruction, have it
emulate the full instruction. Replace EX_TYPE_POP_ZERO with
EX_TYPE_POP_REG which will do the equivalent of: POP %reg;
MOV $imm, %reg.

In order to encode the segment registers, add them as registers 8-11
for 32bit.

By setting regs->[defg]s the (nested) RESTORE_REGS will pop this value
at the end of the exception handler and by increasing regs->sp we'll
have skipped the stack slot.

Fixes: aa93e2ad7464 ("x86/entry_32: Remove .fixup usage")
Reported-by: kernel test robot <[email protected]>
Debugged-by: Sean Christopherson <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/Yd1l0gInc4zRcnt/@hirez.programming.kicks-ass.net
---
arch/x86/entry/entry_32.S | 13 +++++++++----
arch/x86/include/asm/extable_fixup_types.h | 5 ++++-
arch/x86/lib/insn-eval.c | 5 +++++
arch/x86/mm/extable.c | 17 +++--------------
4 files changed, 21 insertions(+), 19 deletions(-)

--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -268,11 +268,16 @@
1: popl %ds
2: popl %es
3: popl %fs
- addl $(4 + \pop), %esp /* pop the unused "gs" slot */
+4: addl $(4 + \pop), %esp /* pop the unused "gs" slot */
IRET_FRAME
- _ASM_EXTABLE_TYPE(1b, 1b, EX_TYPE_POP_ZERO)
- _ASM_EXTABLE_TYPE(2b, 2b, EX_TYPE_POP_ZERO)
- _ASM_EXTABLE_TYPE(3b, 3b, EX_TYPE_POP_ZERO)
+
+ /*
+ * There is no _ASM_EXTABLE_TYPE_REG() for ASM, however since this is
+ * ASM the registers are known and we can trivially hard-code them.
+ */
+ _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_POP_ZERO|EX_DATA_REG(8))
+ _ASM_EXTABLE_TYPE(2b, 3b, EX_TYPE_POP_ZERO|EX_DATA_REG(9))
+ _ASM_EXTABLE_TYPE(3b, 4b, EX_TYPE_POP_ZERO|EX_DATA_REG(10))
.endm

.macro RESTORE_ALL_NMI cr3_reg:req pop=0
--- a/arch/x86/include/asm/extable_fixup_types.h
+++ b/arch/x86/include/asm/extable_fixup_types.h
@@ -16,6 +16,7 @@
#define EX_DATA_FLAG_SHIFT 12
#define EX_DATA_IMM_SHIFT 16

+#define EX_DATA_REG(reg) ((reg) << EX_DATA_REG_SHIFT)
#define EX_DATA_FLAG(flag) ((flag) << EX_DATA_FLAG_SHIFT)
#define EX_DATA_IMM(imm) ((imm) << EX_DATA_IMM_SHIFT)

@@ -41,7 +42,9 @@
#define EX_TYPE_RDMSR_IN_MCE 13
#define EX_TYPE_DEFAULT_MCE_SAFE 14
#define EX_TYPE_FAULT_MCE_SAFE 15
-#define EX_TYPE_POP_ZERO 16
+
+#define EX_TYPE_POP_REG 16 /* reg := (long)imm */
+#define EX_TYPE_POP_ZERO (EX_TYPE_POP_REG | EX_DATA_IMM(0))

#define EX_TYPE_IMM_REG 17 /* reg := (long)imm */
#define EX_TYPE_EFAULT_REG (EX_TYPE_IMM_REG | EX_DATA_IMM(-EFAULT))
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -428,6 +428,11 @@ static const int pt_regoff[] = {
offsetof(struct pt_regs, r13),
offsetof(struct pt_regs, r14),
offsetof(struct pt_regs, r15),
+#else
+ offsetof(struct pt_regs, ds),
+ offsetof(struct pt_regs, es),
+ offsetof(struct pt_regs, fs),
+ offsetof(struct pt_regs, gs),
#endif
};

--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -126,18 +126,6 @@ static bool ex_handler_clear_fs(const st
return ex_handler_default(fixup, regs);
}

-static bool ex_handler_pop_zero(const struct exception_table_entry *fixup,
- struct pt_regs *regs)
-{
- /*
- * Typically used for when "pop %seg" traps, in which case we'll clear
- * the stack slot and re-try the instruction, which will then succeed
- * to pop zero.
- */
- *((unsigned long *)regs->sp) = 0;
- return ex_handler_default(fixup, regs);
-}
-
static bool ex_handler_imm_reg(const struct exception_table_entry *fixup,
struct pt_regs *regs, int reg, int imm)
{
@@ -218,8 +206,9 @@ int fixup_exception(struct pt_regs *regs
case EX_TYPE_RDMSR_IN_MCE:
ex_handler_msr_mce(regs, false);
break;
- case EX_TYPE_POP_ZERO:
- return ex_handler_pop_zero(e, regs);
+ case EX_TYPE_POP_REG:
+ regs->sp += sizeof(long);
+ fallthrough;
case EX_TYPE_IMM_REG:
return ex_handler_imm_reg(e, regs, reg, imm);
case EX_TYPE_FAULT_SGX:

2022-01-12 15:42:35

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/entry_32: Fix segment exceptions

On Wed, Jan 12, 2022 at 11:55:41AM +0100, Peter Zijlstra wrote:
> Full and proper patch below. Boris, if you could merge in x86/core that
> branch should then be ready for a pull req.

I've got this as the final version. Scream if something's wrong.

---
From: Peter Zijlstra <[email protected]>
Date: Tue, 11 Jan 2022 12:11:14 +0100
Subject: [PATCH] x86/entry_32: Fix segment exceptions

The LKP robot reported that commit in Fixes: caused a failure. Turns out
the ldt_gdt_32 selftest turns into an infinite loop trying to clear the
segment.

As discovered by Sean, what happens is that PARANOID_EXIT_TO_KERNEL_MODE
in the handle_exception_return path overwrites the entry stack data with
the task stack data, restoring the "bad" segment value.

Instead of having the exception retry the instruction, have it emulate
the full instruction. Replace EX_TYPE_POP_ZERO with EX_TYPE_POP_REG
which will do the equivalent of: POP %reg; MOV $imm, %reg.

In order to encode the segment registers, add them as registers 8-11 for
32-bit.

By setting regs->[defg]s the (nested) RESTORE_REGS will pop this value
at the end of the exception handler and by increasing regs->sp, it will
have skipped the stack slot.

This was debugged by Sean Christopherson <[email protected]>.

[ bp: Add EX_REG_GS too. ]

Fixes: aa93e2ad7464 ("x86/entry_32: Remove .fixup usage")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/Yd1l0gInc4zRcnt/@hirez.programming.kicks-ass.net
---
arch/x86/entry/entry_32.S | 13 +++++++++----
arch/x86/include/asm/extable_fixup_types.h | 11 ++++++++++-
arch/x86/lib/insn-eval.c | 5 +++++
arch/x86/mm/extable.c | 17 +++--------------
4 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index e0a95d8a6553..a7ec22b1d06c 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -268,11 +268,16 @@
1: popl %ds
2: popl %es
3: popl %fs
- addl $(4 + \pop), %esp /* pop the unused "gs" slot */
+4: addl $(4 + \pop), %esp /* pop the unused "gs" slot */
IRET_FRAME
- _ASM_EXTABLE_TYPE(1b, 1b, EX_TYPE_POP_ZERO)
- _ASM_EXTABLE_TYPE(2b, 2b, EX_TYPE_POP_ZERO)
- _ASM_EXTABLE_TYPE(3b, 3b, EX_TYPE_POP_ZERO)
+
+ /*
+ * There is no _ASM_EXTABLE_TYPE_REG() for ASM, however since this is
+ * ASM the registers are known and we can trivially hard-code them.
+ */
+ _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_POP_ZERO|EX_REG_DS)
+ _ASM_EXTABLE_TYPE(2b, 3b, EX_TYPE_POP_ZERO|EX_REG_ES)
+ _ASM_EXTABLE_TYPE(3b, 4b, EX_TYPE_POP_ZERO|EX_REG_FS)
.endm

.macro RESTORE_ALL_NMI cr3_reg:req pop=0
diff --git a/arch/x86/include/asm/extable_fixup_types.h b/arch/x86/include/asm/extable_fixup_types.h
index b5ab333e064a..503622627400 100644
--- a/arch/x86/include/asm/extable_fixup_types.h
+++ b/arch/x86/include/asm/extable_fixup_types.h
@@ -16,9 +16,16 @@
#define EX_DATA_FLAG_SHIFT 12
#define EX_DATA_IMM_SHIFT 16

+#define EX_DATA_REG(reg) ((reg) << EX_DATA_REG_SHIFT)
#define EX_DATA_FLAG(flag) ((flag) << EX_DATA_FLAG_SHIFT)
#define EX_DATA_IMM(imm) ((imm) << EX_DATA_IMM_SHIFT)

+/* segment regs */
+#define EX_REG_DS EX_DATA_REG(8)
+#define EX_REG_ES EX_DATA_REG(9)
+#define EX_REG_FS EX_DATA_REG(10)
+#define EX_REG_GS EX_DATA_REG(11)
+
/* flags */
#define EX_FLAG_CLEAR_AX EX_DATA_FLAG(1)
#define EX_FLAG_CLEAR_DX EX_DATA_FLAG(2)
@@ -41,7 +48,9 @@
#define EX_TYPE_RDMSR_IN_MCE 13
#define EX_TYPE_DEFAULT_MCE_SAFE 14
#define EX_TYPE_FAULT_MCE_SAFE 15
-#define EX_TYPE_POP_ZERO 16
+
+#define EX_TYPE_POP_REG 16 /* sp += sizeof(long) */
+#define EX_TYPE_POP_ZERO (EX_TYPE_POP_REG | EX_DATA_IMM(0))

#define EX_TYPE_IMM_REG 17 /* reg := (long)imm */
#define EX_TYPE_EFAULT_REG (EX_TYPE_IMM_REG | EX_DATA_IMM(-EFAULT))
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 7760d228041b..c8a962c2e653 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -430,6 +430,11 @@ static const int pt_regoff[] = {
offsetof(struct pt_regs, r13),
offsetof(struct pt_regs, r14),
offsetof(struct pt_regs, r15),
+#else
+ offsetof(struct pt_regs, ds),
+ offsetof(struct pt_regs, es),
+ offsetof(struct pt_regs, fs),
+ offsetof(struct pt_regs, gs),
#endif
};

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 41eaa648349e..dba2197c05c3 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -126,18 +126,6 @@ static bool ex_handler_clear_fs(const struct exception_table_entry *fixup,
return ex_handler_default(fixup, regs);
}

-static bool ex_handler_pop_zero(const struct exception_table_entry *fixup,
- struct pt_regs *regs)
-{
- /*
- * Typically used for when "pop %seg" traps, in which case we'll clear
- * the stack slot and re-try the instruction, which will then succeed
- * to pop zero.
- */
- *((unsigned long *)regs->sp) = 0;
- return ex_handler_default(fixup, regs);
-}
-
static bool ex_handler_imm_reg(const struct exception_table_entry *fixup,
struct pt_regs *regs, int reg, int imm)
{
@@ -218,8 +206,9 @@ int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
case EX_TYPE_RDMSR_IN_MCE:
ex_handler_msr_mce(regs, false);
break;
- case EX_TYPE_POP_ZERO:
- return ex_handler_pop_zero(e, regs);
+ case EX_TYPE_POP_REG:
+ regs->sp += sizeof(long);
+ fallthrough;
case EX_TYPE_IMM_REG:
return ex_handler_imm_reg(e, regs, reg, imm);
case EX_TYPE_FAULT_SGX:
--
2.29.2

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-01-12 16:15:14

by Sean Christopherson

[permalink] [raw]
Subject: Re: [x86/entry_32] aa93e2ad74: BUG:soft_lockup-CPU##stuck_for#s![systemd-logind:#]

On Wed, Jan 12, 2022, Peter Zijlstra wrote:
> On Wed, Jan 12, 2022 at 01:28:58AM +0000, Sean Christopherson wrote:
> > The issue is that PARANOID_EXIT_TO_KERNEL_MODE in the handle_exception_return
> > path overwrites the entry stack data with the task stack data, restoring the "bad"
> > segment value.
>
> Oh gawd... that's terrible, and yes, that now makes perfect sense.
>
> However did you find that?

printf and running under QEMU, which has a interactive "monitor" that lets you
read/write guest memory and can also do VA=>PA translations. Code inspection
once I realized the value on the stack was being restored between the exception
fixup and the POP.

2022-01-13 18:54:52

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/entry_32: Fix segment exceptions

On 1/12/22 07:42, Borislav Petkov wrote:
> On Wed, Jan 12, 2022 at 11:55:41AM +0100, Peter Zijlstra wrote:
>> Full and proper patch below. Boris, if you could merge in x86/core that
>> branch should then be ready for a pull req.
>
> I've got this as the final version. Scream if something's wrong.

AAAAAAAAAAAAAAAAAAAAAHHHHHHHHHHHHHHHHHHHHHH!!!!!!!!!!!


>
> ---
> From: Peter Zijlstra <[email protected]>
> Date: Tue, 11 Jan 2022 12:11:14 +0100
> Subject: [PATCH] x86/entry_32: Fix segment exceptions
>
> The LKP robot reported that commit in Fixes: caused a failure. Turns out
> the ldt_gdt_32 selftest turns into an infinite loop trying to clear the
> segment.
>
> As discovered by Sean, what happens is that PARANOID_EXIT_TO_KERNEL_MODE
> in the handle_exception_return path overwrites the entry stack data with
> the task stack data, restoring the "bad" segment value.
>
> Instead of having the exception retry the instruction, have it emulate
> the full instruction. Replace EX_TYPE_POP_ZERO with EX_TYPE_POP_REG
> which will do the equivalent of: POP %reg; MOV $imm, %reg.
>
> In order to encode the segment registers, add them as registers 8-11 for
> 32-bit.
>
> By setting regs->[defg]s the (nested) RESTORE_REGS will pop this value
> at the end of the exception handler and by increasing regs->sp, it will
> have skipped the stack slot.
>
> This was debugged by Sean Christopherson <[email protected]>.
>
> [ bp: Add EX_REG_GS too. ]
>
> Fixes: aa93e2ad7464 ("x86/entry_32: Remove .fixup usage")
> Reported-by: kernel test robot <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>
> Link: https://lore.kernel.org/r/Yd1l0gInc4zRcnt/@hirez.programming.kicks-ass.net
> ---
> arch/x86/entry/entry_32.S | 13 +++++++++----
> arch/x86/include/asm/extable_fixup_types.h | 11 ++++++++++-
> arch/x86/lib/insn-eval.c | 5 +++++
> arch/x86/mm/extable.c | 17 +++--------------
> 4 files changed, 27 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index e0a95d8a6553..a7ec22b1d06c 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -268,11 +268,16 @@
> 1: popl %ds
> 2: popl %es
> 3: popl %fs
> - addl $(4 + \pop), %esp /* pop the unused "gs" slot */
> +4: addl $(4 + \pop), %esp /* pop the unused "gs" slot */
> IRET_FRAME
> - _ASM_EXTABLE_TYPE(1b, 1b, EX_TYPE_POP_ZERO)
> - _ASM_EXTABLE_TYPE(2b, 2b, EX_TYPE_POP_ZERO)
> - _ASM_EXTABLE_TYPE(3b, 3b, EX_TYPE_POP_ZERO)
> +
> + /*
> + * There is no _ASM_EXTABLE_TYPE_REG() for ASM, however since this is
> + * ASM the registers are known and we can trivially hard-code them.
> + */
> + _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_POP_ZERO|EX_REG_DS)
> + _ASM_EXTABLE_TYPE(2b, 3b, EX_TYPE_POP_ZERO|EX_REG_ES)
> + _ASM_EXTABLE_TYPE(3b, 4b, EX_TYPE_POP_ZERO|EX_REG_FS)

Aside from POP_ZERO being a bit mystifying to a naive reader...

> .endm
>
> .macro RESTORE_ALL_NMI cr3_reg:req pop=0
> diff --git a/arch/x86/include/asm/extable_fixup_types.h b/arch/x86/include/asm/extable_fixup_types.h
> index b5ab333e064a..503622627400 100644
> --- a/arch/x86/include/asm/extable_fixup_types.h
> +++ b/arch/x86/include/asm/extable_fixup_types.h
> @@ -16,9 +16,16 @@
> #define EX_DATA_FLAG_SHIFT 12
> #define EX_DATA_IMM_SHIFT 16
>
> +#define EX_DATA_REG(reg) ((reg) << EX_DATA_REG_SHIFT)
> #define EX_DATA_FLAG(flag) ((flag) << EX_DATA_FLAG_SHIFT)
> #define EX_DATA_IMM(imm) ((imm) << EX_DATA_IMM_SHIFT)
>
> +/* segment regs */
> +#define EX_REG_DS EX_DATA_REG(8)
> +#define EX_REG_ES EX_DATA_REG(9)
> +#define EX_REG_FS EX_DATA_REG(10)

These three seem likely to work

> +#define EX_REG_GS EX_DATA_REG(11)

But not this one.

> +
> /* flags */
> #define EX_FLAG_CLEAR_AX EX_DATA_FLAG(1)
> #define EX_FLAG_CLEAR_DX EX_DATA_FLAG(2)
> @@ -41,7 +48,9 @@
> #define EX_TYPE_RDMSR_IN_MCE 13
> #define EX_TYPE_DEFAULT_MCE_SAFE 14
> #define EX_TYPE_FAULT_MCE_SAFE 15
> -#define EX_TYPE_POP_ZERO 16
> +
> +#define EX_TYPE_POP_REG 16 /* sp += sizeof(long) */
> +#define EX_TYPE_POP_ZERO (EX_TYPE_POP_REG | EX_DATA_IMM(0))
>
> #define EX_TYPE_IMM_REG 17 /* reg := (long)imm */
> #define EX_TYPE_EFAULT_REG (EX_TYPE_IMM_REG | EX_DATA_IMM(-EFAULT))
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index 7760d228041b..c8a962c2e653 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -430,6 +430,11 @@ static const int pt_regoff[] = {
> offsetof(struct pt_regs, r13),
> offsetof(struct pt_regs, r14),
> offsetof(struct pt_regs, r15),
> +#else
> + offsetof(struct pt_regs, ds),
> + offsetof(struct pt_regs, es),
> + offsetof(struct pt_regs, fs),
> + offsetof(struct pt_regs, gs),

See the comment in asm/ptrace.h over gs :)

Fortunately nothing uses EX_REG_GS. Maybe just remove all the gs bits
and leave the rest alone?

2022-01-13 19:56:05

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] x86/entry_32: Fix segment exceptions

On Thu, Jan 13, 2022 at 10:54:39AM -0800, Andy Lutomirski wrote:
> On 1/12/22 07:42, Borislav Petkov wrote:
> > On Wed, Jan 12, 2022 at 11:55:41AM +0100, Peter Zijlstra wrote:
> > > Full and proper patch below. Boris, if you could merge in x86/core that
> > > branch should then be ready for a pull req.
> >
> > I've got this as the final version. Scream if something's wrong.
>
> AAAAAAAAAAAAAAAAAAAAAHHHHHHHHHHHHHHHHHHHHHH!!!!!!!!!!!

:-)

> > + /*
> > + * There is no _ASM_EXTABLE_TYPE_REG() for ASM, however since this is
> > + * ASM the registers are known and we can trivially hard-code them.
> > + */
> > + _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_POP_ZERO|EX_REG_DS)
> > + _ASM_EXTABLE_TYPE(2b, 3b, EX_TYPE_POP_ZERO|EX_REG_ES)
> > + _ASM_EXTABLE_TYPE(3b, 4b, EX_TYPE_POP_ZERO|EX_REG_FS)
>
> Aside from POP_ZERO being a bit mystifying to a naive reader...
>
> > .endm
> > .macro RESTORE_ALL_NMI cr3_reg:req pop=0
> > diff --git a/arch/x86/include/asm/extable_fixup_types.h b/arch/x86/include/asm/extable_fixup_types.h
> > index b5ab333e064a..503622627400 100644
> > --- a/arch/x86/include/asm/extable_fixup_types.h
> > +++ b/arch/x86/include/asm/extable_fixup_types.h
> > @@ -16,9 +16,16 @@
> > #define EX_DATA_FLAG_SHIFT 12
> > #define EX_DATA_IMM_SHIFT 16
> > +#define EX_DATA_REG(reg) ((reg) << EX_DATA_REG_SHIFT)
> > #define EX_DATA_FLAG(flag) ((flag) << EX_DATA_FLAG_SHIFT)
> > #define EX_DATA_IMM(imm) ((imm) << EX_DATA_IMM_SHIFT)
> > +/* segment regs */
> > +#define EX_REG_DS EX_DATA_REG(8)
> > +#define EX_REG_ES EX_DATA_REG(9)
> > +#define EX_REG_FS EX_DATA_REG(10)
>
> These three seem likely to work

On IRC Andrew also noted that these EX_REG_* things should be __i386__
only. Previosly when they lived as open-coded EX_DATA_REG() usage in
entry_32.S that was implied, but now ...

2022-01-14 11:24:56

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/entry_32: Fix segment exceptions

On Thu, Jan 13, 2022 at 08:55:37PM +0100, Peter Zijlstra wrote:
> On IRC Andrew also noted that these EX_REG_* things should be __i386__
> only. Previosly when they lived as open-coded EX_DATA_REG() usage in
> entry_32.S that was implied, but now ...

I guess that then below.

amluto, I'd love it if we (and by that I mean you :-)) could document
the rules for GS on 32-bit so that it is clear what's going on there...

entry_32.S is only hinting at what's going on, we have comments here and
there but not all concentrated into a single location.

Thx.

---
From: Borislav Petkov <[email protected]>
Date: Fri, 14 Jan 2022 12:15:21 +0100
Subject: [PATCH] x86/entry_32: Remove GS from the pt_regs offsets and fixup
regs

GS is special on 32-bit and segment exceptions fixup through it won't
work. Leave breadcrumbs for others not to walk into the same nasty.

Fixes: 9cdbeec40968 ("x86/entry_32: Fix segment exceptions")
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/extable_fixup_types.h | 5 +++--
arch/x86/lib/insn-eval.c | 5 ++++-
2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/extable_fixup_types.h b/arch/x86/include/asm/extable_fixup_types.h
index 503622627400..0aa5f4d3234f 100644
--- a/arch/x86/include/asm/extable_fixup_types.h
+++ b/arch/x86/include/asm/extable_fixup_types.h
@@ -20,11 +20,12 @@
#define EX_DATA_FLAG(flag) ((flag) << EX_DATA_FLAG_SHIFT)
#define EX_DATA_IMM(imm) ((imm) << EX_DATA_IMM_SHIFT)

-/* segment regs */
+#ifdef CONFIG_X86_32
+/* segment regs, valid only for 32-bit code, see pt_regoff */
#define EX_REG_DS EX_DATA_REG(8)
#define EX_REG_ES EX_DATA_REG(9)
#define EX_REG_FS EX_DATA_REG(10)
-#define EX_REG_GS EX_DATA_REG(11)
+#endif

/* flags */
#define EX_FLAG_CLEAR_AX EX_DATA_FLAG(1)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index b781d324211b..34001eee7482 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -432,7 +432,10 @@ static const int pt_regoff[] = {
offsetof(struct pt_regs, ds),
offsetof(struct pt_regs, es),
offsetof(struct pt_regs, fs),
- offsetof(struct pt_regs, gs),
+ /*
+ * Can't use that one - it is special - see entry_32.S
+ * offsetof(struct pt_regs, gs),
+ */
#endif
};

--
2.29.2


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-01-15 04:22:34

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/entry_32: Fix segment exceptions



On Fri, Jan 14, 2022, at 3:24 AM, Borislav Petkov wrote:
> On Thu, Jan 13, 2022 at 08:55:37PM +0100, Peter Zijlstra wrote:
>> On IRC Andrew also noted that these EX_REG_* things should be __i386__
>> only. Previosly when they lived as open-coded EX_DATA_REG() usage in
>> entry_32.S that was implied, but now ...
>
> I guess that then below.
>
> amluto, I'd love it if we (and by that I mean you :-)) could document
> the rules for GS on 32-bit so that it is clear what's going on there...
>
> entry_32.S is only hinting at what's going on, we have comments here and
> there but not all concentrated into a single location.

Acked-by: Andy Lutomirski <[email protected]>

>
> Thx.
>
> ---
> From: Borislav Petkov <[email protected]>
> Date: Fri, 14 Jan 2022 12:15:21 +0100
> Subject: [PATCH] x86/entry_32: Remove GS from the pt_regs offsets and fixup
> regs
>
> GS is special on 32-bit and segment exceptions fixup through it won't
> work. Leave breadcrumbs for others not to walk into the same nasty.
>
> Fixes: 9cdbeec40968 ("x86/entry_32: Fix segment exceptions")
> Signed-off-by: Borislav Petkov <[email protected]>
> ---
> arch/x86/include/asm/extable_fixup_types.h | 5 +++--
> arch/x86/lib/insn-eval.c | 5 ++++-
> 2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/extable_fixup_types.h
> b/arch/x86/include/asm/extable_fixup_types.h
> index 503622627400..0aa5f4d3234f 100644
> --- a/arch/x86/include/asm/extable_fixup_types.h
> +++ b/arch/x86/include/asm/extable_fixup_types.h
> @@ -20,11 +20,12 @@
> #define EX_DATA_FLAG(flag) ((flag) << EX_DATA_FLAG_SHIFT)
> #define EX_DATA_IMM(imm) ((imm) << EX_DATA_IMM_SHIFT)
>
> -/* segment regs */
> +#ifdef CONFIG_X86_32
> +/* segment regs, valid only for 32-bit code, see pt_regoff */
> #define EX_REG_DS EX_DATA_REG(8)
> #define EX_REG_ES EX_DATA_REG(9)
> #define EX_REG_FS EX_DATA_REG(10)
> -#define EX_REG_GS EX_DATA_REG(11)
> +#endif
>
> /* flags */
> #define EX_FLAG_CLEAR_AX EX_DATA_FLAG(1)
> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
> index b781d324211b..34001eee7482 100644
> --- a/arch/x86/lib/insn-eval.c
> +++ b/arch/x86/lib/insn-eval.c
> @@ -432,7 +432,10 @@ static const int pt_regoff[] = {
> offsetof(struct pt_regs, ds),
> offsetof(struct pt_regs, es),
> offsetof(struct pt_regs, fs),
> - offsetof(struct pt_regs, gs),
> + /*
> + * Can't use that one - it is special - see entry_32.S
> + * offsetof(struct pt_regs, gs),
> + */
> #endif
> };
>
> --
> 2.29.2
>
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2022-01-15 20:38:58

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/entry_32: Fix segment exceptions

On Fri, Jan 14, 2022 at 03:48:31PM -0800, Andy Lutomirski wrote:
> Acked-by: Andy Lutomirski <[email protected]>

I actually did this new version:

---
From: Borislav Petkov <[email protected]>
Subject: [PATCH] x86/entry_32: Remove GS from the pt_regs offsets and fixup regs

Document how GS (and its stack slot) on 32-bit are used.

Fixes: 9cdbeec40968 ("x86/entry_32: Fix segment exceptions")
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/entry/entry_32.S | 4 +++-
arch/x86/include/asm/extable_fixup_types.h | 5 +++--
arch/x86/lib/insn-eval.c | 5 ++++-
3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index a7ec22b1d06c..addc3966ee20 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -20,7 +20,9 @@
* 1C(%esp) - %ds
* 20(%esp) - %es
* 24(%esp) - %fs
- * 28(%esp) - unused -- was %gs on old stackprotector kernels
+ * 28(%esp) - unused -- was %gs on old stackprotector kernels. %gs is unused in
+ * kernel mode in 32-bit and holds the user value. When handling exceptions, the
+ * C-exception handler address is pushed into the GS-slot on the stack.
* 2C(%esp) - orig_eax
* 30(%esp) - %eip
* 34(%esp) - %cs
diff --git a/arch/x86/include/asm/extable_fixup_types.h b/arch/x86/include/asm/extable_fixup_types.h
index 503622627400..0aa5f4d3234f 100644
--- a/arch/x86/include/asm/extable_fixup_types.h
+++ b/arch/x86/include/asm/extable_fixup_types.h
@@ -20,11 +20,12 @@
#define EX_DATA_FLAG(flag) ((flag) << EX_DATA_FLAG_SHIFT)
#define EX_DATA_IMM(imm) ((imm) << EX_DATA_IMM_SHIFT)

-/* segment regs */
+#ifdef CONFIG_X86_32
+/* segment regs, valid only for 32-bit code, see pt_regoff */
#define EX_REG_DS EX_DATA_REG(8)
#define EX_REG_ES EX_DATA_REG(9)
#define EX_REG_FS EX_DATA_REG(10)
-#define EX_REG_GS EX_DATA_REG(11)
+#endif

/* flags */
#define EX_FLAG_CLEAR_AX EX_DATA_FLAG(1)
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index b781d324211b..cfc4d13b7d5b 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -432,7 +432,10 @@ static const int pt_regoff[] = {
offsetof(struct pt_regs, ds),
offsetof(struct pt_regs, es),
offsetof(struct pt_regs, fs),
- offsetof(struct pt_regs, gs),
+ /*
+ * Can't use that one, see top of entry_32.S
+ * offsetof(struct pt_regs, gs),
+ */
#endif
};

--
2.29.2


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette