Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp2089391imu; Sat, 10 Nov 2018 07:32:22 -0800 (PST) X-Google-Smtp-Source: AJdET5enm+wGoA0qtfYcWRgd4Mg9mWhNnIqALl/7i4b1gBzhGZVuYKi6O65R7NrAJdcX4VfAH31+ X-Received: by 2002:a63:1560:: with SMTP id 32mr11162333pgv.383.1541863942044; Sat, 10 Nov 2018 07:32:22 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1541863942; cv=none; d=google.com; s=arc-20160816; b=XUjfLImQK+hPPZ1sBd3s71eVFiT+YvbUTkIGJFeamm00I6zclOZwhbUG4vH6sqyAy5 NhFQA7wwIeopo/o1/M/3ylE93cckUMFSaqrrJt2XdDrErP1DK6q9ZrtK16bxJR8NlEA7 AJPyBuZpaIfZ1k0V+7DyHxO6tRl7UJhhXcqopWjKKby58ORPGs8YcSQlGqggJpBfZn6U IB7FcXtrFe+mWTizKBzmaGN82NbHO6ccoS5t2RTPv1dRR585RDVgC/S3PSX81K+4goJh UB6e8fgpx51dc/42pkGM1AOJZQs3VFAhGjEd6dof3daRquPje0VHD6GCrN6bwyXJ73yl twSg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :dkim-signature; bh=MwCEJNB6hvrFY40DJeXHDMLwb9yvkbuNBRMnBK8ogrE=; b=Wp3lhsOllN7ox/3MpqOFkiyLtv7cT6snU/TQfEUbhbXFKu/FclNG55AVZpdZqjrsuJ 2hEMcTTEcwL/PjOwMy/1CbQ1Cv9zeHWS0cpDfdWdiZlvWYnVFiE9y9QZhLkT+vqNCsYR id+zwHG9pkRYGkvuVEtSttjtjgf8WbTjS2oH+ylGGpt9RK8xefasNrrE2QrphJqkThIo LM2wnZ1cbo1mWcvfanCRGPG0ncPvLCs44Cf6yO0mXP+SaIcOf51z4HSwZx0QPq4vtR6J zx1vlTYcbQbCKE8MgHTZAXzPiGpeglLmvU1QHpEgtTv8VZyMKSFw1NgBFgkbu1p2N4p9 SEEA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b="EfBJ/XH1"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r7-v6si11671715ple.339.2018.11.10.07.32.06; Sat, 10 Nov 2018 07:32:22 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b="EfBJ/XH1"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726630AbeKKBRG (ORCPT + 99 others); Sat, 10 Nov 2018 20:17:06 -0500 Received: from mail.kernel.org ([198.145.29.99]:33630 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726068AbeKKBRG (ORCPT ); Sat, 10 Nov 2018 20:17:06 -0500 Received: from devbox (NE2965lan1.rev.em-net.ne.jp [210.141.244.193]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 0B27A20818; Sat, 10 Nov 2018 15:31:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1541863902; bh=0vsm6gonEsw/OJilFDSuH1TJDtMdH/vsOpss6vbTnts=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=EfBJ/XH1MDsmbdLmvmmuowocZqtKJ8Y+nLfl7v0Kp8E8HcVIRNV6XhBa7UhvDIZGs YRpc7Q10JZaVZhf4oDrtkEXsbtuX8hA2NrcSmD8phtKgrRr7BZ1y/HcTMfR7u3GQSx 0BbDnffvQSdHMzNtnuqGmqD9C0qId9FzYzn10Gz0= Date: Sun, 11 Nov 2018 00:31:37 +0900 From: Masami Hiramatsu To: Aleksa Sarai Cc: Aleksa Sarai , Steven Rostedt , "Naveen N. Rao" , Anil S Keshavamurthy , "David S. Miller" , Jonathan Corbet , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Shuah Khan , Alexei Starovoitov , Daniel Borkmann , Brendan Gregg , Christian Brauner , netdev@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Josh Poimboeuf Subject: Re: [PATCH v3 1/2] kretprobe: produce sane stack traces Message-Id: <20181111003137.c9df7a077d983cde57c06ee8@kernel.org> In-Reply-To: <20181109150629.wpedwxsgbftkl3ab@mikami> References: <20181101204720.6ed3fe37@vmware.local.home> <20181102050509.tw3dhvj5urudvtjl@yavin> <20181102065932.bdt4pubbrkvql4mp@yavin> <20181102091658.1bc979a4@gandalf.local.home> <20181103070253.ajrqzs5xu2vf5stu@yavin> <20181104115913.74l4yzecisvtt2j5@yavin> <20181106171501.59ccabbc@gandalf.local.home> <20181108074612.ldy6rozdpsdps6bf@yavin> <20181108080448.rggfn4zawi3por23@yavin> <20181109161551.6b96bd7d932c71432ac65e83@kernel.org> <20181109150629.wpedwxsgbftkl3ab@mikami> X-Mailer: Sylpheed 3.5.1 (GTK+ 2.24.31; x86_64-redhat-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, 10 Nov 2018 02:06:29 +1100 Aleksa Sarai wrote: > On 2018-11-09, Masami Hiramatsu wrote: > > > diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h > > > index ee696efec99f..c4dfafd43e11 100644 > > > --- a/arch/x86/include/asm/ptrace.h > > > +++ b/arch/x86/include/asm/ptrace.h > > > @@ -172,6 +172,7 @@ static inline unsigned long kernel_stack_pointer(struct pt_regs *regs) > > > return regs->sp; > > > } > > > #endif > > > +#define stack_addr(regs) ((unsigned long *) kernel_stack_pointer(regs)) > > > > No, you should use kernel_stack_pointer(regs) itself instead of stack_addr(). > > > > > > > > #define GET_IP(regs) ((regs)->ip) > > > #define GET_FP(regs) ((regs)->bp) > > > diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c > > > index b0d1e81c96bb..eb4da885020c 100644 > > > --- a/arch/x86/kernel/kprobes/core.c > > > +++ b/arch/x86/kernel/kprobes/core.c > > > @@ -69,8 +69,6 @@ > > > DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL; > > > DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk); > > > > > > -#define stack_addr(regs) ((unsigned long *)kernel_stack_pointer(regs)) > > > > I don't like keeping this meaningless macro... this should be replaced with generic > > kernel_stack_pointer() macro. > > Sure. This patch was just an example -- I can remove stack_addr() all > over. > > > > - if (regs) > > > - save_stack_address(trace, regs->ip, nosched); > > > + if (regs) { > > > + /* XXX: Currently broken -- stack_addr(regs) doesn't match entry. */ > > > + addr = regs->ip; > > > > Since this part is for storing regs->ip as a top of call-stack, this > > seems correct code. Stack unwind will be done next block. > > This comment was referring to the usage of stack_addr(). stack_addr() > doesn't give you the right result (it isn't the address of the return > address -- it's slightly wrong). This is the main issue I was having -- > am I doing something wrong here? Of course stack_addr() actually just returns where the stack is. It should not return address, but maybe a return address from this event happens. Note that the "regs != NULL" means you will be in the interrupt handler and it will be returned to the regs->ip. > > > + //addr = ftrace_graph_ret_addr(current, &state.graph_idx, addr, stack_addr(regs)); > > > > so func graph return trampoline address will be shown only when unwinding stack entries. > > I mean func-graph tracer is not used as an event, so it never kicks stackdump. > > Just to make sure I understand what you're saying -- func-graph trace > will never actually call __ftrace_stack_trace? Because if it does, then > this code will be necessary (and then I'm a bit confused why the > unwinder has func-graph trace code -- if stack traces are never taken > under func-graph then the code in the unwinder is not necessary) You seems misunderstanding. Even if this is not called from func-graph tracer, the stack entries are already replaced with func-graph trampoline. However, regs->ip (IRQ return address) is never replaced by the func-graph trampoline. > My reason for commenting this out is because at this point "state" isn't > initialised and thus .graph_idx would not be correctly handled during > unwind (and it's the same reason I commented it out later). OK, but anyway, I think we don't need it. > > > + addr = kretprobe_ret_addr(current, addr, stack_addr(regs)); > > > > But since kretprobe will be an event, which can kick the stackdump. > > BTW, from kretprobe, regs->ip should always be the trampoline handler, > > see arch/x86/kernel/kprobes/core.c:772 :-) > > So it must be fixed always. > > Right, but kretprobe_ret_addr() is returning the *original* return > address (and we need to do an (addr == kretprobe_trampoline)). The > real problem is that stack_addr(regs) isn't the same as it is during > kretprobe setup (but kretprobe_ret_addr() works everywhere else). I think stack_addr(regs) should be same when this is called from kretprobe handler context. Otherwise, yes, it is not same, but in that case, regs->ip is not kretprobe_trampoline too. If you find kretprobe_trampoline on the "stack", of course it's address should be same as it is during kretprobe setup, but if you find kretprobe_trampoline on the regs->ip, that should always happen on kretprobe handler context. Otherwise, some critical violation happens on kretprobe_trampoline. In that case, we should dump the kretprobe_trampoline address itself, should not recover it. > > > @@ -1856,6 +1870,41 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs) > > > } > > > NOKPROBE_SYMBOL(pre_handler_kretprobe); > > > > > > +unsigned long kretprobe_ret_addr(struct task_struct *tsk, unsigned long ret, > > > + unsigned long *retp) > > > +{ > > > + struct kretprobe_instance *ri; > > > + unsigned long flags = 0; > > > + struct hlist_head *head; > > > + bool need_lock; > > > + > > > + if (likely(ret != (unsigned long) &kretprobe_trampoline)) > > > + return ret; > > > + > > > + need_lock = !kretprobe_hash_is_locked(tsk); > > > + if (WARN_ON(need_lock)) > > > + kretprobe_hash_lock(tsk, &head, &flags); > > > + else > > > + head = kretprobe_inst_table_head(tsk); > > > > This may not work unless this is called from the kretprobe handler context, > > since if we are out of kretprobe handler context, another CPU can lock the > > hash table and it can be detected by kretprobe_hash_is_locked();. > > Yeah, I noticed this as well when writing it (but needed a quick impl > that I could test). I will fix this, thanks! > > By is_kretprobe_handler_context() I imagine you are referring to > checking is_kretprobe(current_kprobe())? yes, that's correct :) Thank you, > > > So, we should check we are in the kretprobe handler context if tsk == current, > > if not, we definately can lock the hash lock without any warning. This can > > be something like; > > > > if (is_kretprobe_handler_context()) { > > // kretprobe_hash_lock(current == tsk) has been locked by caller > > if (tsk != current && kretprobe_hash(tsk) != kretprobe_hash(current)) > > // the hash of tsk and current can be same. > > need_lock = true; > > } else > > // we should take a lock for tsk. > > need_lock = true; > > -- > Aleksa Sarai > Senior Software Engineer (Containers) > SUSE Linux GmbH > -- Masami Hiramatsu