Date: Tue, 26 Oct 2021 13:05:16 +0100
From: Mark Rutland
To: madvenka@linux.microsoft.com
Cc: broonie@kernel.org, jpoimboe@redhat.com, ardb@kernel.org,
 nobuta.keiya@fujitsu.com, sjitindarsingh@gmail.com,
 catalin.marinas@arm.com, will@kernel.org, jmorris@namei.org,
 linux-arm-kernel@lists.infradead.org, live-patching@vger.kernel.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v10 05/11] arm64: Make dump_stacktrace() use arch_stack_walk()
Message-ID: <20211026120516.GA34073@C02TD0UTHF1T.local>
References: <20211015025847.17694-1-madvenka@linux.microsoft.com>
 <20211015025847.17694-6-madvenka@linux.microsoft.com>
 <20211025164925.GB2001@C02TD0UTHF1T.local>
In-Reply-To:
 <20211025164925.GB2001@C02TD0UTHF1T.local>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 25, 2021 at 05:49:25PM +0100, Mark Rutland wrote:
> From f3e66ca75aff3474355839f72d123276028204e1 Mon Sep 17 00:00:00 2001
> From: Mark Rutland
> Date: Mon, 25 Oct 2021 13:23:11 +0100
> Subject: [PATCH] arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
>
> When CONFIG_FUNCTION_GRAPH_TRACER is selected, and the function graph
> tracer is in use, unwind_frame() may erroneously associate a traced
> function with an incorrect return address. This can happen when starting
> an unwind from a pt_regs, or when unwinding across an exception
> boundary.
>
> The underlying problem is that ftrace_graph_get_ret_stack() takes an
> index offset from the most recent entry added to the fgraph return
> stack. We start an unwind at offset 0, and increment the offset each
> time we encounter `return_to_handler`, which indicates a rewritten
> return address. This is broken in two cases:
>
> * Between creating a pt_regs and starting the unwind, function calls may
>   place entries on the stack, leaving an arbitrary offset which we can
>   only determine by performing a full unwind from the caller of the
>   unwind code. While this initial unwind is open-coded in
>   dump_backtrace(), this is not performed for other unwinders such as
>   perf_callchain_kernel().
>
> * When unwinding across an exception boundary (whether continuing an
>   unwind or starting a new unwind from regs), we always consume the LR
>   of the interrupted context, though this may not have been live at the
>   time of the exception. Where the LR was not live but happened to
>   contain `return_to_handler`, we'll recover an address from the graph
>   return stack and increment the current offset, leaving subsequent
>   entries off-by-one.
>
>   Where the LR was not live and did not contain `return_to_handler`, we
>   will still report an erroneous address, but subsequent entries will be
>   unaffected.

It turns out I had this backwards, and we currently always *skip* the LR
when unwinding across regs, because:

* The entry assembly creates a synthetic frame record with the original
  FP and the ELR_EL1 value (i.e. the PC at the point of the exception),
  skipping the LR.

* In arch_stack_walk() we start the walk from regs->pc, and continue
  with the frame record, skipping the LR.

* In the existing dump_backtrace, we skip until we hit a frame record
  whose FP value matches the FP in the regs (i.e. the synthetic frame
  record created by the entry assembly). That'll dump the ELR_EL1 value,
  then continue to the next frame record, skipping the LR.

So case two is bogus, and only case one can happen today. This cleanup
shouldn't trigger the WARN_ON_ONCE() in unwind_frame(), and we can fix
the missing LR entry in a subsequent cleanup.

Thanks,
Mark.
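
For anyone who wants to see the "skipped LR" behaviour in isolation, here
is a minimal user-space sketch in C. It is not kernel code: struct
toy_regs, the struct frame_record layout used here, and the toy_*()
helpers are hypothetical names made up for illustration, and the walk is
a simplification of what the entry assembly and arch_stack_walk() do. It
only models the first two points above (the synthetic frame record, and
a walk that starts from regs->pc and then follows frame records).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy frame record: saved FP, then the saved return address. */
struct frame_record {
	const struct frame_record *fp;	/* older record, NULL ends the chain */
	uint64_t lr;			/* return address stored in this record */
};

/* Hypothetical, cut-down stand-in for pt_regs (illustration only). */
struct toy_regs {
	uint64_t pc;			/* PC at the point of the exception */
	uint64_t lr;			/* x30 of the interrupted context */
	const struct frame_record *fp;	/* x29 of the interrupted context */
};

/*
 * Model of the synthetic record: it pairs the original FP with the
 * exception PC. The interrupted context's LR is not stored in it.
 */
static struct frame_record toy_entry_record(const struct toy_regs *regs)
{
	struct frame_record rec = { .fp = regs->fp, .lr = regs->pc };

	return rec;
}

/* Follow a chain of frame records, reporting each saved return address. */
static size_t toy_walk_records(const struct frame_record *rec,
			       uint64_t *out, size_t max)
{
	size_t n = 0;

	assert(out || max == 0);
	for (; rec && n < max; rec = rec->fp)
		out[n++] = rec->lr;
	return n;
}

/*
 * Walk across regs: report regs->pc, then continue with the frame
 * records reachable from regs->fp. regs->lr is never consulted.
 */
static size_t toy_walk_from_regs(const struct toy_regs *regs,
				 uint64_t *out, size_t max)
{
	struct frame_record synthetic = toy_entry_record(regs);

	return toy_walk_records(&synthetic, out, max);
}
```

If the interrupted function had not yet created its own frame record,
its return address lives only in regs->lr, so a walk of this shape
reports regs->pc and then the caller's saved return address, with no
entry for regs->lr in between. That gap is the "missing LR entry"
mentioned above.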