Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp383079pxb; Wed, 3 Mar 2021 05:51:20 -0800 (PST) X-Google-Smtp-Source: ABdhPJxwQHnOq5DSMBiypd+iFVL0U6i5MmUE/msHhmbLqcvNfeA+4ZrUEmZFoKWVKC2+Fe2wi3h9 X-Received: by 2002:aa7:c7c8:: with SMTP id o8mr14777197eds.176.1614779479735; Wed, 03 Mar 2021 05:51:19 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1614779479; cv=none; d=google.com; s=arc-20160816; b=BVyj1oAz1MVZRSWE6e6WJvgL616LNwIZe6wdJCVCEzN4wr2eaZTWbj3loHWFyPvUWf myz9ooSvAQ7xhxX/DUkaCEzX2SRse1cxm3tP3FbBV3uQLD9o0H6BKnw5F8eINmDEngT4 TqKuLDGZjB4EBCuclkKSir/NPUHBsYi/6oP/MTheoLYvJWXNsRxS7fTLuvx3lGKxIWEp Fz5B0uiCMVhVamXgZmxDO4HRpfWA1f69IHUsn51PAX8p8iJPw2duGNT9SuShkK8fkEUd fq5+zVc6vapeUfqssLwy708Ay8hSv+lVpPQmdhtvC/Pgm3vpU4tCYGGMG4UVKxbEmk4p UGpA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature:dkim-filter; bh=5iLiT4zCUvpdCRGwIN7OylfYUPKmTAZ9Z3KeVffXDOc=; b=YCkhhJApuKN1Gt5kG0JrPlMJig44zKQQ2UriiYW+p+uhvMEB0PYVB0+vbxMVEoM8bg BtyIz5DsgqgIj+AvuxMGB6iQ3RfLt7snwgdQCD2mIRIt1ksDN1SyD7B5uG+J4KSiAYTJ Ewzmh1Ff2zwwGLIFGrJjUY0Z8zLCeEQSpsojS4GqYPNIcsX9mOp+8DRF9WmykZB+OCQ/ 1MUrawPoK/vTNuZx1wgZqZXcS35epRdc/TumXjWMgP005A97RxC2t4idszSrEnjowG9x tyIu2qwiM/xiEU3PdMKGKsHVLyq6Gw8fv6Wd3Ngo3WIsIoCgUIAtTP6+ALfcMeEFXYLR wzqQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linux.microsoft.com header.s=default header.b=dS1CAaa3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.microsoft.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id a2si17371149eda.350.2021.03.03.05.50.31; Wed, 03 Mar 2021 05:51:19 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linux.microsoft.com header.s=default header.b=dS1CAaa3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.microsoft.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237654AbhCAUHJ (ORCPT + 99 others); Mon, 1 Mar 2021 15:07:09 -0500 Received: from linux.microsoft.com ([13.77.154.182]:34560 "EHLO linux.microsoft.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236027AbhCAQ7f (ORCPT ); Mon, 1 Mar 2021 11:59:35 -0500 Received: from [192.168.254.32] (unknown [47.187.194.202]) by linux.microsoft.com (Postfix) with ESMTPSA id DA81A20B57A1; Mon, 1 Mar 2021 08:58:51 -0800 (PST) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com DA81A20B57A1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1614617932; bh=5iLiT4zCUvpdCRGwIN7OylfYUPKmTAZ9Z3KeVffXDOc=; h=Subject:To:Cc:References:From:Date:In-Reply-To:From; b=dS1CAaa3yJ0UPOnsU90ZhqIFUaGnM5whBFQxd4ni8FzqozwSNdhHqCBBMfo4UpdRZ yNWiIqBtGNJ7FyPngJzbZcDNLT1EqeU3tLQjuKaN0CVz+YPzSYOuTqxKLDikiqRP09 sQglsDNjAYMPV/CAe9LfgyevWmJNRu5VzfExbUXM= Subject: Re: [RFC PATCH v1 1/1] arm64: Unwinder enhancements for reliable stack trace To: Mark Rutland Cc: broonie@kernel.org, jpoimboe@redhat.com, jthierry@redhat.com, linux-arm-kernel@lists.infradead.org, live-patching@vger.kernel.org, linux-kernel@vger.kernel.org References: <20210223181243.6776-1-madvenka@linux.microsoft.com> <20210223181243.6776-2-madvenka@linux.microsoft.com> <20210224121716.GE50741@C02TD0UTHF1T.local> <4a96b31d-0d16-1f12-fa64-5fdae64cd2d1@linux.microsoft.com> <20210225115825.GB37015@C02TD0UTHF1T.local> From: "Madhavan T. Venkataraman" Message-ID: <0227d386-c392-eb5a-3f52-621a637e46a8@linux.microsoft.com> Date: Mon, 1 Mar 2021 10:58:51 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: <20210225115825.GB37015@C02TD0UTHF1T.local> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2/25/21 5:58 AM, Mark Rutland wrote: > On Wed, Feb 24, 2021 at 01:34:09PM -0600, Madhavan T. Venkataraman wrote: >> On 2/24/21 6:17 AM, Mark Rutland wrote: >>> On Tue, Feb 23, 2021 at 12:12:43PM -0600, madvenka@linux.microsoft.com wrote: >>>> From: "Madhavan T. Venkataraman" >>>> Termination >>>> =========== >>>> >>>> Currently, the unwinder terminates when both the FP (frame pointer) >>>> and the PC (return address) of a frame are 0. But a frame could get >>>> corrupted and zeroed. There needs to be a better check. >>>> >>>> The following special terminating frame and function have been >>>> defined for this purpose: >>>> >>>> const u64 arm64_last_frame[2] __attribute__ ((aligned (16))); >>>> >>>> void arm64_last_func(void) >>>> { >>>> } >>>> >>>> So, set the FP to arm64_last_frame and the PC to arm64_last_func in >>>> the bottom most frame. >>> >>> My expectation was that we'd do this per-task, creating an empty frame >>> record (i.e. with fp=NULL and lr=NULL) on the task's stack at the >>> instant it was created, and chaining this into x29. That way the address >>> is known (since it can be derived from the task), and the frame will >>> also implicitly check that the callchain terminates on the task stack >>> without loops. That also means that we can use it to detect the entry >>> code going wrong (e.g. if the SP gets corrupted), since in that case the >>> entry code would place the record at a different location. >> >> That is exactly what this is doing. arm64_last_frame[] is a marker frame >> that contains fp=0 and pc=0. > > Almost! What I meant was that rather that each task should have its own > final/marker frame record on its task task rather than sharing a common > global variable. > > That way a check for that frame record implicitly checks that a task > started at the expected location *on that stack*, which catches > additional stack corruption cases (e.g. if we left data on the stack > prior to an EL0 entry). > Ok. I will think about this. > [...] > >>> ... I reckon once we've moved the last of the exception triage out to C >>> this will be relatively simple, since all of the exception handlers will >>> look like: >>> >>> | SYM_CODE_START_LOCAL(elX_exception) >>> | kernel_entry X >>> | mov x0, sp >>> | bl elX_exception_handler >>> | kernel_exit X >>> | SYM_CODE_END(elX_exception) >>> >>> ... and so we just need to identify the set of elX_exception functions >>> (which we'll never expect to take exceptions from directly). We could be >>> strict and reject unwinding into arbitrary bits of the entry code (e.g. >>> if we took an unexpected exception), and only permit unwinding to the >>> BL. >>> >>>> It can do this if the FP and PC are also recorded elsewhere in the >>>> pt_regs for comparison. Currently, the FP is also stored in >>>> regs->regs[29]. The PC is stored in regs->pc. However, regs->pc can >>>> be changed by lower level functions. >>>> >>>> Create a new field, pt_regs->orig_pc, and record the return address >>>> PC there. With this, the unwinder can validate the exception frame >>>> and set a flag so that the caller of the unwinder can know when >>>> an exception frame is encountered. >>> >>> I don't understand the case you're trying to solve here. When is >>> regs->pc changed in a way that's problematic? >>> >> >> For instance, I used a test driver in which the driver calls a function >> pointer which is NULL. The low level fault handler sends a signal to the >> task. Looks like it changes regs->pc for this. > > I'm struggling to follow what you mean here. > > If the kernel branches to NULL, the CPU should raise an instruction > abort from the current EL, and our handling of that should terminate the > thread via die_kernel_fault(), without returning to the faulting > context, and without altering the PC in the faulting context. > > Signals are delivered to userspace and alter the userspace PC, not a > kernel PC, so this doesn't seem relevant. Do you mean an exception fixup > handler rather than a signal? > >> When I dump the stack from the low level handler, the comparison with >> regs->pc does not work. But comparison with regs->orig_pc works. > > As above, I'm struggling with this; could you share a concrete example? > Actually, my bad. I needed the orig_pc because of something that my test driver did. And, it slipped my mind entirely. Thanks for pointing it out. I will fix this in my resend. Thanks again for your comments. I am currently studying probing/tracing. As soon as I have a solution for that, I will send out the next version. I look forward to the in-depth review. Thanks, Madhavan