Date: Mon, 26 Mar 2018 12:39:33 +0100
From: Mark Rutland
To: "Ji.Zhang"
Cc: Catalin Marinas, Will Deacon, Matthias Brugger, Ard Biesheuvel,
 James Morse, Dave Martin, Marc Zyngier, Michael Weiser, Julien Thierry,
 Xie XiuQi, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-mediatek@lists.infradead.org,
 wsd_upstream@mediatek.com, shadanji@163.com
Subject: Re: [PATCH] arm64: avoid race condition issue in dump_backtrace
Message-ID: <20180326113932.2i6qp3776jtmcqk4@lakrids.cambridge.arm.com>
References: <1521687960-3744-1-git-send-email-ji.zhang@mediatek.com>
 <20180322055929.z25brvwlmdighz66@salmiak>
 <1521711329.26617.31.camel@mtksdccf07>
In-Reply-To: <1521711329.26617.31.camel@mtksdccf07>

On Thu, Mar 22, 2018 at 05:35:29PM +0800, Ji.Zhang wrote:
> On Thu, 2018-03-22 at 05:59 +0000, Mark Rutland wrote:
> > On Thu, Mar 22, 2018 at 11:06:00AM +0800, Ji Zhang wrote:
> > > When we dump the backtrace of a specific task, there is a potential race
> > > condition because the task may be running on another core if SMP is
> > > enabled. In the current implementation, if the task is not the current
> > > task, we get the registers used for unwinding from the cpu_context saved
> > > in thread_info, which is a snapshot taken before the last context switch;
> > > but if the task is running on another core, the registers and the contents
> > > of the stack are changing under us.
> > > This may give us a wrong or incomplete backtrace, or even crash the kernel.
> >
> > When do we call dump_backtrace() on a running task that is not current?
> >
> > AFAICT, we don't do that in the arm64-specific callers of dump_backtrace(), and
> > this would have to be some caller of show_stack() in generic code.
>
> Yes, show_stack() lets the caller specify a task and dump its backtrace.
> For example, SysRq-T (echo t > /proc/sysrq-trigger) uses this to
> dump the backtrace of specific tasks.

Ok. I see that this eventually calls show_state_filter(0), where we call
sched_show_task() for every task.
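
For reference, a simplified paraphrase of that generic path (not the actual
kernel code; the function name below is made up for illustration):

	#include <linux/rcupdate.h>
	#include <linux/sched/debug.h>
	#include <linux/sched/signal.h>

	/* Roughly what sysrq 't' ends up doing via show_state_filter(0). */
	static void show_all_tasks_sketch(void)
	{
		struct task_struct *g, *p;

		rcu_read_lock();
		for_each_process_thread(g, p) {
			/*
			 * With a filter of 0 every task is dumped, including
			 * TASK_RUNNING tasks that may be live on another CPU.
			 * sched_show_task() ends up in show_stack(p, NULL),
			 * i.e. the arch backtrace code.
			 */
			sched_show_task(p);
		}
		rcu_read_unlock();
	}

So by the time dump_backtrace() runs, the task it was asked about can be
executing (and modifying its stack) on another CPU.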

> > We pin the task's stack via try_get_task_stack(), so this cannot be unmapped
> > while we walk it. In unwind_frame() we check that the frame record falls
> > entirely within the task's stack. So AFAICT, we cannot crash the kernel here,
> > though the backtrace may be misleading (and we could potentially get stuck in
> > an infinite loop).
>
> You are right, I have checked the code and it seems that the check on fp in
> unwind_frame() is strong enough to handle the case where the stack is being
> changed while the task is running. And as you mentioned, if fp unfortunately
> points to its own address, the unwind will be an infinite loop, but that is a
> very low probability event, so can we ignore it?

I think that it would be preferable to try to avoid the infinite loop
case. We could hit that by accident if we're tracing a live task.

It's a little tricky to ensure that we don't loop, since we can have
traces that span several stacks, e.g. overflow -> irq -> task, so we
need to know where the last frame was, and we need to define a strict
order for stack nesting.
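
Something along these lines, perhaps (a rough, untested sketch; the enum and
helper below are invented purely to illustrate the idea, they are not
existing kernel interfaces):

	#include <linux/types.h>

	/* Fixed nesting order: a trace may only move "rightwards". */
	enum bt_stack_type { BT_STACK_OVERFLOW, BT_STACK_IRQ, BT_STACK_TASK };

	struct bt_unwind_state {
		unsigned long prev_fp;		/* last frame record we followed */
		enum bt_stack_type prev_type;	/* stack it lived on */
	};

	/*
	 * Accept a new frame record only if it is strictly higher up the same
	 * stack, or on a stack that nests strictly later in the order above.
	 * Either way the (stack, fp) pair strictly increases, so the walk
	 * must terminate and cannot revisit a frame.
	 */
	static bool bt_frame_acceptable(const struct bt_unwind_state *state,
					unsigned long fp,
					enum bt_stack_type type)
	{
		if (type == state->prev_type)
			return fp > state->prev_fp;

		return type > state->prev_type;
	}

The caller would work out which stack the new fp lies on (as unwind_frame()
already does when it bounds-checks the frame record) and feed that in before
following the record.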

> > > To avoid this case, do not dump the backtrace of tasks which are
> > > running on other cores.
> > > This patch cannot solve the issue completely, but it can shrink the
> > > window of the race condition.
> >
> > > @@ -113,6 +113,9 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
> > >  	if (tsk == current) {
> > >  		frame.fp = (unsigned long)__builtin_frame_address(0);
> > >  		frame.pc = (unsigned long)dump_backtrace;
> > > +	else if (tsk->state == TASK_RUNNING) {
> > > +		pr_notice("Do not dump other running tasks\n");
> > > +		return;
> >
> > As you note, if we can race with the task being scheduled, this doesn't help.
> >
> > Can we rule this out at a higher level?
> >
> > Thanks,
> > Mark.

> Actually, my previous understanding was that a low-level function should be
> transparent to its callers, provide the right result, and handle unexpected
> cases, which means that if the result may be misleading we should drop it.
> That is why I bypass all TASK_RUNNING tasks. I am not sure whether this
> understanding is reasonable for this case.

Given that this can change under our feet, I think this only provides a
false sense of security and complicates the code.

> And as you mentioned ruling this out at a higher level, do you have any
> suggestions? Should this be handled in the caller of show_stack()?

Unfortunately, it doesn't look like we can do this in general, given
cases like sysrq-t.

Thanks,
Mark.