Received: by 2002:a25:1985:0:0:0:0:0 with SMTP id 127csp639964ybz; Wed, 15 Apr 2020 15:43:57 -0700 (PDT) X-Google-Smtp-Source: APiQypIPhAsIKbX2mHMXpPgZZMjFzqe4RyC+f9VUqfndbgfkUtDa/p2YUjNTCH8AS7vl/42Tpk39 X-Received: by 2002:a17:906:c4f:: with SMTP id t15mr7296073ejf.193.1586990637677; Wed, 15 Apr 2020 15:43:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1586990637; cv=none; d=google.com; s=arc-20160816; b=cUxdA8cccvbm6TVCfYEdAa8vey/5AsNADS7sVonJekIeNvlAbEdpOJOr4qamUKtBJj Endl6R9yiIaA8t/3CQC+nfgkGRAn1vl71k3XljZC90UzZrLWIxp3NqJDjR9HT+4Gy2nH 2Dp2pBO+SKM4zleLMBHZddMoOB1aRRwxqO/5fOwv/ISY55C1pdpPZYnBb2TbO69RWsjy 0sIcrZmUpmZdwSs4LRrJPhYMPq/yt/4MwDhLYnsIF069k/HVs83yCcIwuB7GeNaE86KD g2qui9o2UIWBX3vgwU0dJdHQsxKS63AoeoMld5FLlv1PKOGdMOXLQqSD79n8nj7AisQq PA0w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=FPt9LCDfa/HfbhKaktspFlYSLG+Xrx8+0Nujdr48VHQ=; b=Ecfc5We8NkvtXHJiZM0/tvwCH1OlpYwzUq/T2U1aaa7sTT6BPSWWRRPOI1z5uF3v6v z314gj4zPqZmkEfa92i7W9Unb67+0zPnDDy28VQ3pbi9/l27pGsXWZ6Jc5jkvhCdnPUM 5mIWNByO8lTpZmn8+FNTzLrReIL5OhDYaVNIK0eh9RJMD9LlnOyntoBSMnMohUhRkMZ1 9jfNUz+ORAzor02y2mveMW3xOOtnJi0od6wdykJRcL5sbjc5+Yi98ggTgmPd1h6I6UL0 AYrkHKhP7/uxwFDNZsYPL2/T++IXCAeswg02XVRBepdHmtSqzhiBeqlcnzslCcdqTODD MpXA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=c0yWD0cp; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g6si4972364edv.126.2020.04.15.15.43.34; Wed, 15 Apr 2020 15:43:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=c0yWD0cp; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2393876AbgDOJGC (ORCPT + 99 others); Wed, 15 Apr 2020 05:06:02 -0400 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:26520 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2393831AbgDOJFV (ORCPT ); Wed, 15 Apr 2020 05:05:21 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586941518; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=FPt9LCDfa/HfbhKaktspFlYSLG+Xrx8+0Nujdr48VHQ=; b=c0yWD0cpMmeZFH3ACyAxJMx1Et7Orru0BRVLNZ9x7Ii20Uu7qtFwbJCMplcIoHvZzXkEpz C9tx/tdK7LTe36aNMxSa8+ZoV+hp7cxJ+2jALH9xmXAC9EYqB1+8XW61XPtmm8cntEEaPD bQa1tOAz+iWkdX6YDYiSy/gEhNjKois= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-478-s037hFZ2PCaaPPVrCVca9w-1; Wed, 15 Apr 2020 05:05:14 -0400 X-MC-Unique: s037hFZ2PCaaPPVrCVca9w-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 29F4B107ACC7; Wed, 15 Apr 2020 09:05:13 +0000 (UTC) Received: from krava (unknown [10.40.193.71]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 7162A10013A1; Wed, 15 Apr 2020 09:05:10 +0000 (UTC) Date: Wed, 15 Apr 2020 11:05:07 +0200 From: Jiri Olsa To: Masami Hiramatsu Cc: Jiri Olsa , "Naveen N. Rao" , Anil S Keshavamurthy , "David S. Miller" , Peter Zijlstra , lkml , "bibo,mao" , "Ziqian SUN (Zamir)" Subject: [PATCH] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task Message-ID: <20200415090507.GG208694@krava> References: <20200408164641.3299633-1-jolsa@kernel.org> <20200409234101.8814f3cbead69337ac5a33fa@kernel.org> <20200409184451.GG3309111@krava> <20200409201336.GH3309111@krava> <20200410093159.0d7000a08fd76c2eaf1398f8@kernel.org> <20200414160338.GE208694@krava> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200414160338.GE208694@krava> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave. My test was also able to trigger lockdep output: ============================================ WARNING: possible recursive locking detected 5.6.0-rc6+ #6 Not tainted -------------------------------------------- sched-messaging/2767 is trying to acquire lock: ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0 but task is already holding lock: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(kretprobe_table_locks[i].lock)); lock(&(kretprobe_table_locks[i].lock)); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by sched-messaging/2767: #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50 stack backtrace: CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6 Call Trace: dump_stack+0x96/0xe0 __lock_acquire.cold.57+0x173/0x2b7 ? native_queued_spin_lock_slowpath+0x42b/0x9e0 ? lockdep_hardirqs_on+0x590/0x590 ? __lock_acquire+0xf63/0x4030 lock_acquire+0x15a/0x3d0 ? kretprobe_hash_lock+0x52/0xa0 _raw_spin_lock_irqsave+0x36/0x70 ? kretprobe_hash_lock+0x52/0xa0 kretprobe_hash_lock+0x52/0xa0 trampoline_handler+0xf8/0x940 ? kprobe_fault_handler+0x380/0x380 ? find_held_lock+0x3a/0x1c0 kretprobe_trampoline+0x25/0x50 ? lock_acquired+0x392/0xbc0 ? _raw_spin_lock_irqsave+0x50/0x70 ? __get_valid_kprobe+0x1f0/0x1f0 ? _raw_spin_unlock_irqrestore+0x3b/0x40 ? finish_task_switch+0x4b9/0x6d0 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 The code within the kretprobe handler checks for probe reentrancy, so we won't trigger any _raw_spin_lock_irqsave probe in there. The problem is in outside kprobe_flush_task, where we call: kprobe_flush_task kretprobe_table_lock raw_spin_lock_irqsave _raw_spin_lock_irqsave where _raw_spin_lock_irqsave triggers the kretprobe and installs kretprobe_trampoline handler on _raw_spin_lock_irqsave return. The kretprobe_trampoline handler is then executed with already locked kretprobe_table_locks, and first thing it does is to lock kretprobe_table_locks ;-) the whole lockup path like: kprobe_flush_task kretprobe_table_lock raw_spin_lock_irqsave _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed ---> kretprobe_table_locks locked kretprobe_trampoline trampoline_handler kretprobe_hash_lock(current, &head, &flags); <--- deadlock Adding kprobe_busy_begin/end helpers that mark code with fake probe installed to prevent triggering of another kprobe within this code. Using these helpers in kprobe_flush_task, so the probe recursion protection check is hit and the probe is never set to prevent above lockup. Reported-by: "Ziqian SUN (Zamir)" Signed-off-by: Jiri Olsa --- arch/x86/kernel/kprobes/core.c | 16 +++------------- include/linux/kprobes.h | 4 ++++ kernel/kprobes.c | 24 ++++++++++++++++++++++++ 3 files changed, 31 insertions(+), 13 deletions(-) diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index 4d7022a740ab..a12adbe1559d 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -753,16 +753,11 @@ asm( NOKPROBE_SYMBOL(kretprobe_trampoline); STACK_FRAME_NON_STANDARD(kretprobe_trampoline); -static struct kprobe kretprobe_kprobe = { - .addr = (void *)kretprobe_trampoline, -}; - /* * Called from kretprobe_trampoline */ __used __visible void *trampoline_handler(struct pt_regs *regs) { - struct kprobe_ctlblk *kcb; struct kretprobe_instance *ri = NULL; struct hlist_head *head, empty_rp; struct hlist_node *tmp; @@ -772,16 +767,12 @@ __used __visible void *trampoline_handler(struct pt_regs *regs) void *frame_pointer; bool skipped = false; - preempt_disable(); - /* * Set a dummy kprobe for avoiding kretprobe recursion. * Since kretprobe never run in kprobe handler, kprobe must not * be running at this point. */ - kcb = get_kprobe_ctlblk(); - __this_cpu_write(current_kprobe, &kretprobe_kprobe); - kcb->kprobe_status = KPROBE_HIT_ACTIVE; + kprobe_busy_begin(); INIT_HLIST_HEAD(&empty_rp); kretprobe_hash_lock(current, &head, &flags); @@ -857,7 +848,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs) __this_cpu_write(current_kprobe, &ri->rp->kp); ri->ret_addr = correct_ret_addr; ri->rp->handler(ri, regs); - __this_cpu_write(current_kprobe, &kretprobe_kprobe); + __this_cpu_write(current_kprobe, &kprobe_busy); } recycle_rp_inst(ri, &empty_rp); @@ -873,8 +864,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs) kretprobe_hash_unlock(current, &flags); - __this_cpu_write(current_kprobe, NULL); - preempt_enable(); + kprobe_busy_end(); hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) { hlist_del(&ri->hlist); diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 04bdaf01112c..645fd401c856 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -350,6 +350,10 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void) return this_cpu_ptr(&kprobe_ctlblk); } +extern struct kprobe kprobe_busy; +void kprobe_busy_begin(void); +void kprobe_busy_end(void); + kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset); int register_kprobe(struct kprobe *p); void unregister_kprobe(struct kprobe *p); diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 2625c241ac00..75bb4a8458e7 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -1236,6 +1236,26 @@ __releases(hlist_lock) } NOKPROBE_SYMBOL(kretprobe_table_unlock); +struct kprobe kprobe_busy = { + .addr = (void *) get_kprobe, +}; + +void kprobe_busy_begin(void) +{ + struct kprobe_ctlblk *kcb; + + preempt_disable(); + __this_cpu_write(current_kprobe, &kprobe_busy); + kcb = get_kprobe_ctlblk(); + kcb->kprobe_status = KPROBE_HIT_ACTIVE; +} + +void kprobe_busy_end(void) +{ + __this_cpu_write(current_kprobe, NULL); + preempt_enable(); +} + /* * This function is called from finish_task_switch when task tk becomes dead, * so that we can recycle any function-return probe instances associated @@ -1253,6 +1273,8 @@ void kprobe_flush_task(struct task_struct *tk) /* Early boot. kretprobe_table_locks not yet initialized. */ return; + kprobe_busy_begin(); + INIT_HLIST_HEAD(&empty_rp); hash = hash_ptr(tk, KPROBE_HASH_BITS); head = &kretprobe_inst_table[hash]; @@ -1266,6 +1288,8 @@ void kprobe_flush_task(struct task_struct *tk) hlist_del(&ri->hlist); kfree(ri); } + + kprobe_busy_end(); } NOKPROBE_SYMBOL(kprobe_flush_task); -- 2.18.2