Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp3277770imu; Sun, 13 Jan 2019 23:23:51 -0800 (PST) X-Google-Smtp-Source: ALg8bN54Mgij+jcLqxv+LLyhk9mPGK04zOZ22qKnA0xvqMgTQIN4vpWiwY/0ZEZ56yLdErWbZnJ8 X-Received: by 2002:a63:34c3:: with SMTP id b186mr11331519pga.184.1547450631901; Sun, 13 Jan 2019 23:23:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1547450631; cv=none; d=google.com; s=arc-20160816; b=U7tqsbPDVyDaz+cd/OGybZ4ZceGHYj8nwCmVYv60xPWXLI3pyu5ThE3hTExCyiUCtW q4Usr3+oC4/JeG5yHvFsfYeW54qrTuSCw0gX5Nzn4Sse3laCCgZZ2CDWrNXe8IjQly5V UvYjMjJLuh5NtEMsEwyLLLDs1eVa7XlOhUySbAQmoqaW+UoAOyU6Q6oe4tkrOZpP7iER QK9Rs8XStD302Mohd/3SxwGyXb0R3q4UzKBWoPDuMF493Bya9V8tPzVjwAaifhpsPm0J FTKc3QoCBquRTyP+HxUvamfi0VvkcIcM4fxyUOm0DQaGdxc/2MaNxpO4CEGmW598QdDe ahAQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:message-id:in-reply-to:date:references:subject:cc:to :from; bh=NUIS2glZlB5SxJP/TL9d8vVJFlL2O8siZQfS1ypnl24=; b=gvm2ckegRNM4sgPU1oakCjJMAWwd3TmVF7en6GPfriGcQCqZuWFnhU5XvaQjT5bMga s35y4v7ZBiio+P8C/fOo79yz84UR9BM4YKltTC0NN1Q2xZiwm3mwg4uBlx3g4aaDaTwl LYgVY4SfUTIUE50y3MP8SIpio2eDqSlUsPB0Sna1yGpqY2xbwbUBkJoPthofLAqoZz8x pzp73M3pdM5mDG+hYA3bKzVsbzFZ1RXq3TcFn0XtGXgIcuw/Ou41+uPRH/zvnQYH8ftd mQufbS/RsQcRGF/r5J5023quyrhfa6WpjcXPh9Vp6iBoPYIPtFJD1HvJGBB56yT8WXal 8aGA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id u69si44542062pfj.219.2019.01.13.23.23.35; Sun, 13 Jan 2019 23:23:51 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726650AbfANHVn convert rfc822-to-8bit (ORCPT + 99 others); Mon, 14 Jan 2019 02:21:43 -0500 Received: from mx2.suse.de ([195.135.220.15]:39030 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726067AbfANHVn (ORCPT ); Mon, 14 Jan 2019 02:21:43 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 3CDBAAE1D; Mon, 14 Jan 2019 07:21:41 +0000 (UTC) From: Nicolai Stange To: Joe Lawrence Cc: Nicolai Stange , linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, live-patching@vger.kernel.org, Torsten Duwe , Michael Ellerman , Jiri Kosina , Balbir Singh Subject: Re: ppc64le reliable stack unwinder and scheduled tasks References: <7f468285-b149-37e2-e782-c9e538b997a9@redhat.com> <87bm4ocbbt.fsf@suse.de> <20190111010808.GA17858@redhat.com> <87fttzbpid.fsf@suse.de> <20190114040937.GA6739@redhat.com> Date: Mon, 14 Jan 2019 08:21:40 +0100 In-Reply-To: <20190114040937.GA6739@redhat.com> (Joe Lawrence's message of "Sun, 13 Jan 2019 23:09:37 -0500") Message-ID: <877ef7bt6j.fsf@suse.de> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Joe Lawrence writes: > On Fri, Jan 11, 2019 at 08:51:54AM +0100, Nicolai Stange wrote: >> Joe Lawrence writes: >> >> > On Fri, Jan 11, 2019 at 01:00:38AM +0100, Nicolai Stange wrote: > > [ ... snip ... ] > >> >> For testing, could you try whether clearing the word at STACK_FRAME_MARKER >> >> from _switch() helps? >> >> >> >> I.e. something like (completely untested): >> > >> > I'll kick off some builds tonight and try to get tests lined up >> > tomorrow. Unfortunately these take a bit of time to run setup, schedule >> > and complete, so perhaps by next week. >> >> Ok, that's probably still a good test for confirmation, but the solution >> you proposed below is much better. > > Good news, 40 tests (RHEL-7 backport) with clearing out the > STACK_FRAME_MARKER were all successful. Previously I would see stalled > livepatch transitions due to the unreliable report in ~10% of test > cases. > >> >> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S >> >> index 435927f549c4..b747d0647ec4 100644 >> >> --- a/arch/powerpc/kernel/entry_64.S >> >> +++ b/arch/powerpc/kernel/entry_64.S >> >> @@ -596,6 +596,10 @@ _GLOBAL(_switch) >> >> SAVE_8GPRS(14, r1) >> >> SAVE_10GPRS(22, r1) >> >> std r0,_NIP(r1) /* Return to switch caller */ >> >> + >> >> + li r23,0 >> >> + std r23,96(r1) /* 96 == STACK_FRAME_MARKER * sizeof(long) */ >> >> + >> >> mfcr r23 >> >> std r23,_CCR(r1) >> >> std r1,KSP(r3) /* Set old stack pointer */ >> >> >> >> >> > >> > This may be sufficient to avoid the condition, but if the contents of >> > frame 0 are truely uninitialized (not to be trusted), should the >> > unwinder be even looking at that frame (for STACK_FRAMES_REGS_MARKER), >> > aside from the LR and other stack size geometry sanity checks? >> >> That's a very good point: we'll only ever be examining scheduled out tasks >> and current (which at that time is running klp_try_complete_transition()). >> >> current won't be in an interrupt/exception when it's walking its own >> stack. And scheduled out tasks can't have an exception/interrupt frame >> as their frame 0, correct? >> >> Thus, AFAICS, whenever klp_try_complete_transition() finds a >> STACK_FRAMES_REGS_MARKER in frame 0, it is known to be garbage, as you >> said. > > I gave this about 20 tests and they were also all successful, what do > you think? (The log could probably use some trimming and cleanup.) I > think either solution would fix the issue. > > -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- > > From edd3fb9e8a16d5a7ccc98759e9397f386f0e82ca Mon Sep 17 00:00:00 2001 > From: Joe Lawrence > Date: Fri, 11 Jan 2019 10:25:26 -0500 > Subject: [PATCH] powerpc/livepatch: relax reliable stack tracer exception > check > > The ppc64le reliable stack tracer iterates over a given task's stack, > verifying a set of conditions to determine whether the trace is > 'reliable'. These checks include the presence of any exception frames. > > We should be careful when inspecting the bottom-most stack frame (the > first to be unwound), particularly for scheduled-out tasks. As Nicolai > Stange explains, "If I'm reading the code in _switch() correctly, the > first frame is completely uninitialized except for the pointer back to > the caller's stack frame." If a previous do_IRQ() invocation, for > example, has left a residual exception-marker on the first frame, the > stack tracer would incorrectly report this task's trace as unreliable. > FWIW, it's not been do_IRQ() who wrote the exception marker, but it's caller hardware_interrupt_common(), more specifically the EXCEPTION_PROLOG_COMMON_3 part therof. > save_stack_trace_tsk_reliable() already skips the first frame when > verifying the saved Link Register. It should do the same when looking > for the STACK_FRAME_REGS_MARKER. The task it is unwinding will be > either 'current', in which case the tracer will have never been called > from an exception path, or the task will be inactive and its first > frame's contents will be uninitialized (aside from backchain pointer). I thought about this a little more and can't see anything that would prevent higher, i.e. non-_switch() frames to also alias with prior exception frames. That STACK_FRAME_REGS_MARKER is written to a stack frame's "parameter area" and most functions probably don't initialize this either. So, AFAICS, higher stack frames could potentially be affected by the very same problem. I think the best solution would be to clear the STACK_FRAME_REGS_MARKER upon exception return. I have a patch ready for that and will post it after it has passed some basic testing -- hopefully later this day. That being said, I still think that your patch should also get applied in some form -- looking at unitialized memory is just not a good thing to do. > > Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model") > Signed-off-by: Joe Lawrence > --- > arch/powerpc/kernel/stacktrace.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c > index e2c50b55138f..0793e75c45a6 100644 > --- a/arch/powerpc/kernel/stacktrace.c > +++ b/arch/powerpc/kernel/stacktrace.c > @@ -84,6 +84,12 @@ save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace) > EXPORT_SYMBOL_GPL(save_stack_trace_regs); > > #ifdef CONFIG_HAVE_RELIABLE_STACKTRACE > +/* > + * This function returns an error if it detects any unreliable features of the > + * stack. Otherwise it guarantees that the stack trace is reliable. > + * > + * If the task is not 'current', the caller *must* ensure the task is inactive. > + */ > int > save_stack_trace_tsk_reliable(struct task_struct *tsk, > struct stack_trace *trace) > @@ -143,7 +149,7 @@ save_stack_trace_tsk_reliable(struct task_struct *tsk, > return 1; > > /* Mark stacktraces with exception frames as unreliable. */ > - if (sp <= stack_end - STACK_INT_FRAME_SIZE && > + if (!firstframe && sp <= stack_end - STACK_INT_FRAME_SIZE && > stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) { > return 1; > } I would perhaps not limit this to the STACK_FRAME_REGS_MARKER, but also not emit the ip obtained from the first frame into the resulting trace. I.e., how about moving all the sp/newsp handling to the beginning of the loop and doing an 'if (firstframe) continue;' right after that? Thanks, Nicolai -- SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)