2020-03-12 14:21:53

by Miroslav Benes

Subject: [RFC PATCH 2/2] x86/xen: Make the secondary CPU idle tasks reliable

The unwinder reports the secondary CPU idle tasks' stack on XEN PV as
unreliable, which affects at least live patching.
cpu_initialize_context() sets up the context of the CPU through the
VCPUOP_initialise hypercall. After it is woken up, the idle task starts
in the cpu_bringup_and_idle() function and its stack starts at the offset
right below pt_regs. The unwinder correctly detects the end of the stack
there, but it is confused by the NULL return address in the last frame.

RFC: I haven't found a way to teach the unwinder about the state of
the stack there, hence the ugly hack using assembly. It is similar to
what startup_xen() does for the boot CPU.

The patch introduces an objtool "unreachable instruction" warning right
after the jump to cpu_bringup_and_idle(). It should still show the idea
of what needs to be done, I think. Ideas welcome.

Signed-off-by: Miroslav Benes <[email protected]>
---
arch/x86/xen/smp_pv.c | 3 ++-
arch/x86/xen/xen-head.S | 10 ++++++++++
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 802ee5bba66c..6b88cdcbef8f 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -53,6 +53,7 @@ static DEFINE_PER_CPU(struct xen_common_irq, xen_irq_work) = { .irq = -1 };
static DEFINE_PER_CPU(struct xen_common_irq, xen_pmu_irq) = { .irq = -1 };

static irqreturn_t xen_irq_work_interrupt(int irq, void *dev_id);
+extern unsigned char asm_cpu_bringup_and_idle[];

static void cpu_bringup(void)
{
@@ -309,7 +310,7 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
* pointing just below where pt_regs would be if it were a normal
* kernel entry.
*/
- ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
+ ctxt->user_regs.eip = (unsigned long)asm_cpu_bringup_and_idle;
ctxt->flags = VGCF_IN_KERNEL;
ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
ctxt->user_regs.ds = __USER_DS;
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 642f346bfe02..c9a9c0bb79ed 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -56,6 +56,16 @@ SYM_CODE_START(startup_xen)
1:
SYM_CODE_END(startup_xen)
__FINIT
+
+.pushsection .text
+SYM_CODE_START(asm_cpu_bringup_and_idle)
+ UNWIND_HINT_EMPTY
+
+ push $1f
+ jmp cpu_bringup_and_idle
+1:
+SYM_CODE_END(asm_cpu_bringup_and_idle)
+.popsection
#endif

.pushsection .text
--
2.25.1
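
The failure mode described in the commit message can be modeled in user space. The following is only a minimal sketch under stated assumptions: `struct toy_region`, `toy_map`, `toy_lookup()`, `toy_unwind_is_reliable()` and all addresses are hypothetical and are not the kernel's ORC data structures or unwinder code. It only illustrates why a NULL return address in the last frame makes the walk unreliable, while a return address that points into a stub marked as "end" (the effect of `push $1f` together with `UNWIND_HINT_EMPTY`) lets the walk terminate cleanly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for unwind data: one entry per code region (not a kernel struct). */
struct toy_region {
	unsigned long start;	/* first address of the region */
	unsigned long size;	/* length of the region in bytes */
	bool end;		/* true: nothing to unwind past this code */
};

/* Hypothetical address map; every address here is made up for illustration. */
static const struct toy_region toy_map[] = {
	{ 0x1000, 0x100, false },	/* stands in for do_idle() */
	{ 0x2000, 0x100, false },	/* stands in for cpu_bringup_and_idle() */
	{ 0x3000, 0x010, true  },	/* stands in for the asm stub marked as "end" */
};

/* Find the region covering ip, or NULL if no unwind data describes it. */
static const struct toy_region *toy_lookup(unsigned long ip)
{
	size_t i;

	for (i = 0; i < sizeof(toy_map) / sizeof(toy_map[0]); i++) {
		if (ip >= toy_map[i].start &&
		    ip - toy_map[i].start < toy_map[i].size)
			return &toy_map[i];
	}
	return NULL;	/* e.g. a NULL return address */
}

/*
 * Walk the addresses from the innermost frame outwards. The walk counts
 * as reliable only if it stops in a region explicitly marked as "end".
 */
static bool toy_unwind_is_reliable(const unsigned long *ret_addrs, size_t n)
{
	size_t i;

	assert(ret_addrs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct toy_region *r = toy_lookup(ret_addrs[i]);

		if (!r)
			return false;	/* address cannot be explained */
		if (r->end)
			return true;	/* reached a marked end of stack */
	}
	return false;	/* ran out of frames without seeing an end marker */
}

/* Before the patch: the outermost return address is NULL. */
static const unsigned long toy_stack_before[] = { 0x1010, 0x2010, 0x0 };
/* After the patch: the pushed label address lies inside the asm stub. */
static const unsigned long toy_stack_after[] = { 0x1010, 0x2010, 0x3008 };
```

Calling `toy_unwind_is_reliable(toy_stack_before, 3)` models the current behaviour and `toy_unwind_is_reliable(toy_stack_after, 3)` models the stack layout that the assembly stub is meant to produce.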


2020-03-13 08:28:04

by Jürgen Groß

Subject: Re: [RFC PATCH 2/2] x86/xen: Make the secondary CPU idle tasks reliable

On 12.03.20 15:20, Miroslav Benes wrote:
> The unwinder reports the secondary CPU idle tasks' stack on XEN PV as
> unreliable, which affects at least live patching.
> cpu_initialize_context() sets up the context of the CPU through the
> VCPUOP_initialise hypercall. After it is woken up, the idle task starts
> in the cpu_bringup_and_idle() function and its stack starts at the offset
> right below pt_regs. The unwinder correctly detects the end of the stack
> there, but it is confused by the NULL return address in the last frame.
>
> RFC: I haven't found a way to teach the unwinder about the state of
> the stack there, hence the ugly hack using assembly. It is similar to
> what startup_xen() does for the boot CPU.
>
> The patch introduces an objtool "unreachable instruction" warning right
> after the jump to cpu_bringup_and_idle(). It should still show the idea
> of what needs to be done, I think. Ideas welcome.
>
> Signed-off-by: Miroslav Benes <[email protected]>
> ---
> arch/x86/xen/smp_pv.c | 3 ++-
> arch/x86/xen/xen-head.S | 10 ++++++++++
> 2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
> index 802ee5bba66c..6b88cdcbef8f 100644
> --- a/arch/x86/xen/smp_pv.c
> +++ b/arch/x86/xen/smp_pv.c
> @@ -53,6 +53,7 @@ static DEFINE_PER_CPU(struct xen_common_irq, xen_irq_work) = { .irq = -1 };
> static DEFINE_PER_CPU(struct xen_common_irq, xen_pmu_irq) = { .irq = -1 };
>
> static irqreturn_t xen_irq_work_interrupt(int irq, void *dev_id);
> +extern unsigned char asm_cpu_bringup_and_idle[];
>
> static void cpu_bringup(void)
> {

Would adding this here work?

+ asm volatile (UNWIND_HINT(ORC_REG_UNDEFINED, 0, ORC_TYPE_CALL, 1));


Juergen

2020-03-13 09:54:48

by Miroslav Benes

Subject: Re: [RFC PATCH 2/2] x86/xen: Make the secondary CPU idle tasks reliable

On Fri, 13 Mar 2020, Jürgen Groß wrote:

> On 12.03.20 15:20, Miroslav Benes wrote:
> > The unwinder reports the secondary CPU idle tasks' stack on XEN PV as
> > unreliable, which affects at least live patching.
> > cpu_initialize_context() sets up the context of the CPU through the
> > VCPUOP_initialise hypercall. After it is woken up, the idle task starts
> > in the cpu_bringup_and_idle() function and its stack starts at the offset
> > right below pt_regs. The unwinder correctly detects the end of the stack
> > there, but it is confused by the NULL return address in the last frame.
> >
> > RFC: I haven't found a way to teach the unwinder about the state of
> > the stack there, hence the ugly hack using assembly. It is similar to
> > what startup_xen() does for the boot CPU.
> >
> > The patch introduces an objtool "unreachable instruction" warning right
> > after the jump to cpu_bringup_and_idle(). It should still show the idea
> > of what needs to be done, I think. Ideas welcome.
> >
> > Signed-off-by: Miroslav Benes <[email protected]>
> > ---
> > arch/x86/xen/smp_pv.c | 3 ++-
> > arch/x86/xen/xen-head.S | 10 ++++++++++
> > 2 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
> > index 802ee5bba66c..6b88cdcbef8f 100644
> > --- a/arch/x86/xen/smp_pv.c
> > +++ b/arch/x86/xen/smp_pv.c
> > @@ -53,6 +53,7 @@ static DEFINE_PER_CPU(struct xen_common_irq, xen_irq_work)
> > = { .irq = -1 };
> > static DEFINE_PER_CPU(struct xen_common_irq, xen_pmu_irq) = { .irq = -1 };
> >
> > static irqreturn_t xen_irq_work_interrupt(int irq, void *dev_id);
> > +extern unsigned char asm_cpu_bringup_and_idle[];
> >
> > static void cpu_bringup(void)
> > {
>
> Would adding this here work?
>
> + asm volatile (UNWIND_HINT(ORC_REG_UNDEFINED, 0, ORC_TYPE_CALL, 1));

I tried something similar. It did not work, because then the hint is
"bound" to the closest next call in the function, which is cr4_init() in
this case. The unwinder would not take it into account.

In my case, I placed it at the beginning of cpu_bringup_and_idle(). I also
open coded it and played with the offset in the orc entry, but that did
not work for some other reason.

However, now I tried this

diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 6b88cdcbef8f..39afd88309cb 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -92,6 +92,7 @@ asmlinkage __visible void cpu_bringup_and_idle(void)
{
cpu_bringup();
boot_init_stack_canary();
+ asm volatile (UNWIND_HINT(ORC_REG_UNDEFINED, 0, ORC_TYPE_CALL, 1));
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
}

and that seems to work. I need to properly verify and test, but the
explanation is that as opposed to the above, cpu_startup_entry() is on the
idle task's stack and the hint is then taken into account. The unwound
stack seems to be complete, so it could indeed be the fix.

Thanks
Miroslav
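
One way to read the explanation above is through how unwind data is looked up by address. The following is a hedged user-space sketch, not kernel code: `struct toy_orc`, `toy_orc_find()`, `toy_frame_is_end()`, both tables and all addresses are hypothetical. It assumes that entries are sorted by address, that a lookup returns the last entry at or before the queried address, and that a call's return address is attributed to the call instruction by looking up the address minus one. Under those assumptions, a hint placed inside cpu_bringup() never covers the return address saved for the call to cpu_startup_entry(), while a hint placed right before that call does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy unwind entry: applies from ip up to the next entry's ip. */
struct toy_orc {
	unsigned long ip;	/* first address this entry applies to */
	bool end;		/* true: treat this code as the end of the stack */
};

/* Return the last entry with ip <= addr, or NULL if addr precedes the table. */
static const struct toy_orc *toy_orc_find(const struct toy_orc *tab, size_t n,
					  unsigned long addr)
{
	const struct toy_orc *found = NULL;
	size_t lo = 0, hi = n;

	assert(tab != NULL || n == 0);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].ip <= addr) {
			found = &tab[mid];	/* candidate, look further right */
			lo = mid + 1;
		} else {
			hi = mid;
		}
	}
	return found;
}

/* Look up a saved return address the way a call frame would be attributed. */
static bool toy_frame_is_end(const struct toy_orc *tab, size_t n,
			     unsigned long ret_addr)
{
	const struct toy_orc *orc = toy_orc_find(tab, n, ret_addr - 1);

	return orc && orc->end;
}

/* Made-up return address of the call to cpu_startup_entry(). */
#define TOY_RET_ADDR 0x2030UL

/* Hint placed inside cpu_bringup(): it covers addresses in that function only. */
static const struct toy_orc hint_in_cpu_bringup[] = {
	{ 0x1000, false },	/* cpu_bringup() */
	{ 0x1040, true  },	/* hint before the next call in cpu_bringup() */
	{ 0x2000, false },	/* cpu_bringup_and_idle() */
};

/* Hint placed right before the call to cpu_startup_entry(). */
static const struct toy_orc hint_before_startup_call[] = {
	{ 0x1000, false },	/* cpu_bringup() */
	{ 0x2000, false },	/* cpu_bringup_and_idle() */
	{ 0x2020, true  },	/* hint before call cpu_startup_entry */
};
```

With these tables, `toy_frame_is_end(hint_in_cpu_bringup, 3, TOY_RET_ADDR)` models the first attempt and `toy_frame_is_end(hint_before_startup_call, 3, TOY_RET_ADDR)` models the second. The sketch does not model objtool's own validation of the stack state.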

2020-03-16 15:52:05

by Miroslav Benes

Subject: Re: [RFC PATCH 2/2] x86/xen: Make the secondary CPU idle tasks reliable

On Fri, 13 Mar 2020, Miroslav Benes wrote:

> On Fri, 13 Mar 2020, Jürgen Groß wrote:
>
> > On 12.03.20 15:20, Miroslav Benes wrote:
> > > The unwinder reports the secondary CPU idle tasks' stack on XEN PV as
> > > unreliable, which affects at least live patching.
> > > cpu_initialize_context() sets up the context of the CPU through the
> > > VCPUOP_initialise hypercall. After it is woken up, the idle task starts
> > > in the cpu_bringup_and_idle() function and its stack starts at the offset
> > > right below pt_regs. The unwinder correctly detects the end of the stack
> > > there, but it is confused by the NULL return address in the last frame.
> > >
> > > RFC: I haven't found a way to teach the unwinder about the state of
> > > the stack there, hence the ugly hack using assembly. It is similar to
> > > what startup_xen() does for the boot CPU.
> > >
> > > The patch introduces an objtool "unreachable instruction" warning right
> > > after the jump to cpu_bringup_and_idle(). It should still show the idea
> > > of what needs to be done, I think. Ideas welcome.
> > >
> > > Signed-off-by: Miroslav Benes <[email protected]>
> > > ---
> > > arch/x86/xen/smp_pv.c | 3 ++-
> > > arch/x86/xen/xen-head.S | 10 ++++++++++
> > > 2 files changed, 12 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
> > > index 802ee5bba66c..6b88cdcbef8f 100644
> > > --- a/arch/x86/xen/smp_pv.c
> > > +++ b/arch/x86/xen/smp_pv.c
> > > @@ -53,6 +53,7 @@ static DEFINE_PER_CPU(struct xen_common_irq, xen_irq_work)
> > > = { .irq = -1 };
> > > static DEFINE_PER_CPU(struct xen_common_irq, xen_pmu_irq) = { .irq = -1 };
> > >
> > > static irqreturn_t xen_irq_work_interrupt(int irq, void *dev_id);
> > > +extern unsigned char asm_cpu_bringup_and_idle[];
> > >
> > > static void cpu_bringup(void)
> > > {
> >
> > Would adding this here work?
> >
> > + asm volatile (UNWIND_HINT(ORC_REG_UNDEFINED, 0, ORC_TYPE_CALL, 1));
>
> I tried something similar. It did not work, because then the hint is
> "bound" to the closest next call in the function, which is cr4_init() in
> this case. The unwinder would not take it into account.
>
> In my case, I placed it at the beginning of cpu_bringup_and_idle(). I also
> open coded it and played with the offset in the orc entry, but that did
> not work for some other reason.
>
> However, now I tried this
>
> diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
> index 6b88cdcbef8f..39afd88309cb 100644
> --- a/arch/x86/xen/smp_pv.c
> +++ b/arch/x86/xen/smp_pv.c
> @@ -92,6 +92,7 @@ asmlinkage __visible void cpu_bringup_and_idle(void)
> {
> cpu_bringup();
> boot_init_stack_canary();
> + asm volatile (UNWIND_HINT(ORC_REG_UNDEFINED, 0, ORC_TYPE_CALL, 1));
> cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
> }
>
> and that seems to work. I need to properly verify and test, but the
> explanation is that as opposed to the above, cpu_startup_entry() is on the
> idle task's stack and the hint is then taken into account. The unwound
> stack seems to be complete, so it could indeed be the fix.

Not the correct one though. Objtool rightfully complains with

arch/x86/xen/smp_pv.o: warning: objtool: cpu_bringup_and_idle()+0x6a: undefined stack state

and all the other hacks I tried ended up in the same dead end. It seems
to me the correct fix is that all orc entries for cpu_bringup_and_idle()
should have the "end" property set to 1, since it is the first function on
the stack. I don't know how to achieve that without the assembly hack in
the patch I sent. If I am not missing something, of course.

Josh, any idea?

Thanks
Miroslav

2020-03-16 20:37:27

by Josh Poimboeuf

Subject: Re: [RFC PATCH 2/2] x86/xen: Make the secondary CPU idle tasks reliable

On Mon, Mar 16, 2020 at 04:51:12PM +0100, Miroslav Benes wrote:
> > diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
> > index 6b88cdcbef8f..39afd88309cb 100644
> > --- a/arch/x86/xen/smp_pv.c
> > +++ b/arch/x86/xen/smp_pv.c
> > @@ -92,6 +92,7 @@ asmlinkage __visible void cpu_bringup_and_idle(void)
> > {
> > cpu_bringup();
> > boot_init_stack_canary();
> > + asm volatile (UNWIND_HINT(ORC_REG_UNDEFINED, 0, ORC_TYPE_CALL, 1));
> > cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
> > }
> >
> > and that seems to work. I need to properly verify and test, but the
> > explanation is that as opposed to the above, cpu_startup_entry() is on the
> > idle task's stack and the hint is then taken into account. The unwound
> > stack seems to be complete, so it could indeed be the fix.
>
> Not the correct one though. Objtool rightfully complains with
>
> arch/x86/xen/smp_pv.o: warning: objtool: cpu_bringup_and_idle()+0x6a: undefined stack state
>
> and all the other hacks I tried ended up in the same dead end. It seems
> to me the correct fix is that all orc entries for cpu_bringup_and_idle()
> should have the "end" property set to 1, since it is the first function on
> the stack. I don't know how to achieve that without the assembly hack in
> the patch I sent. If I am not missing something, of course.
>
> Josh, any idea?

Yeah, I think mucking with the unwind hints in C code is going to be
precarious. You could maybe have something like

asm("
UNWIND_HINT_EMPTY\n
mov $CPUHP_AP_ONLINE_IDLE, %rdi\n
call cpu_startup_entry\n
");
unreachable();

but that's pretty ugly (and it might not work anyway).

I suppose we could add a new facility to mark an entire C function as an
"end" point.

But I think it would be cleanest to just do something like your patch
and have the entry code be asm which then calls cpu_bringup_and_idle().
That would make it consistent with all other entry code, which all lives
in asm.

--
Josh

2020-03-17 09:16:49

by Miroslav Benes

Subject: Re: [RFC PATCH 2/2] x86/xen: Make the secondary CPU idle tasks reliable

On Mon, 16 Mar 2020, Josh Poimboeuf wrote:

> On Mon, Mar 16, 2020 at 04:51:12PM +0100, Miroslav Benes wrote:
> > > diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
> > > index 6b88cdcbef8f..39afd88309cb 100644
> > > --- a/arch/x86/xen/smp_pv.c
> > > +++ b/arch/x86/xen/smp_pv.c
> > > @@ -92,6 +92,7 @@ asmlinkage __visible void cpu_bringup_and_idle(void)
> > > {
> > > cpu_bringup();
> > > boot_init_stack_canary();
> > > + asm volatile (UNWIND_HINT(ORC_REG_UNDEFINED, 0, ORC_TYPE_CALL, 1));
> > > cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
> > > }
> > >
> > > and that seems to work. I need to properly verify and test, but the
> > > explanation is that as opposed to the above, cpu_startup_entry() is on the
> > > idle task's stack and the hint is then taken into account. The unwound
> > > stack seems to be complete, so it could indeed be the fix.
> >
> > Not the correct one though. Objtool rightfully complains with
> >
> > arch/x86/xen/smp_pv.o: warning: objtool: cpu_bringup_and_idle()+0x6a: undefined stack state
> >
> > and all the other hacks I tried ended up in the same dead end. It seems
> > to me the correct fix is that all orc entries for cpu_bringup_and_idle()
> > should have the "end" property set to 1, since it is the first function on
> > the stack. I don't know how to achieve that without the assembly hack in
> > the patch I sent. If I am not missing something, of course.
> >
> > Josh, any idea?
>
> Yeah, I think mucking with the unwind hints in C code is going to be
> precarious. You could maybe have something like
>
> asm("
> UNWIND_HINT_EMPTY\n
> mov $CPUHP_AP_ONLINE_IDLE, %rdi\n
> call cpu_startup_entry\n
> ");
> unreachable();
>
> but that's pretty ugly (and it might not work anyway).
>
> I suppose we could add a new facility to mark an entire C function as an
> "end" point.

I think it would be overkill for what I perceive as a one-off scenario.
Maybe if more use cases appear in the future, but I doubt it.

> But I think it would be cleanest to just do something like your patch
> and have the entry code be asm which then calls cpu_bringup_and_idle().
> That would make it consistent with all other entry code, which all lives
> in asm.

Ack.

Thanks
Miroslav