From: "Naveen N. Rao"
To: Michael Ellerman, Steven Rostedt, Masami Hiramatsu, Ingo Molnar, Nicholas Piggin
Subject: [PATCH v2 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel
Date: Thu, 27 Jun 2019 16:53:52 +0530
Message-Id: <841386feda429a1f0d4b7442c3ede1ed91466f92.1561634177.git.naveen.n.rao@linux.vnet.ibm.com>
X-Mailer: git-send-email 2.22.0
X-Mailing-List: linux-kernel@vger.kernel.org

With -mprofile-kernel, gcc emits 'mflr r0' followed by 'bl _mcount' to enable
function tracing and profiling. So far, with dynamic ftrace, we have only ever
patched out the branch to _mcount(). However, mflr is executed by the branch
unit, which can only execute one such instruction per cycle on POWER9 and is
shared with branches, so it would be nice to avoid it where possible.

We cannot simply nop out the mflr either. When enabling function tracing,
there is a race if a thread was interrupted after executing a nop'ed out
mflr: once tracing is enabled, that thread would execute the now-patched-in
branch to _mcount() without having executed the preceding mflr.

To solve this, we now enable function tracing in two steps: patch in the mflr
instruction, use 'smp_call_function(isync); synchronize_rcu_tasks()' to ensure
all existing threads make progress, and then patch in the branch to _mcount().
We override ftrace_replace_code() with a powerpc64 variant for this purpose.
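To make the ordering concrete, here is a condensed, illustrative sketch of the
enable path described above, assuming it sits alongside the existing helpers in
arch/powerpc/kernel/trace/ftrace.c. It is not part of the patch:
sketch_do_isync(), sketch_enable_call_site() and the (ip, tramp) parameters are
hypothetical names, and error handling is trimmed; the real implementation is
__ftrace_make_call_prep()/ftrace_replace_code() in the diff below.

	/* Illustrative sketch only; not part of this patch */
	static void sketch_do_isync(void *info)
	{
		/* make the just-patched instruction visible on this cpu */
		isync();
	}

	static int sketch_enable_call_site(unsigned long ip, unsigned long tramp)
	{
		int rc;

		/* Step 1: restore 'mflr r0' ahead of the (still nop'ed out) bl */
		rc = patch_instruction((unsigned int *)(ip - MCOUNT_INSN_SIZE),
				       PPC_INST_MFLR);
		if (rc)
			return rc;

		/* Step 1a: make sure every cpu sees the new instruction */
		smp_call_function(sketch_do_isync, NULL, 1);

		/* Step 1b: wait for anyone who executed the old nop to make progress */
		synchronize_rcu_tasks();

		/* Step 2: only now patch in the branch to the ftrace trampoline */
		return patch_branch((unsigned int *)ip, tramp, BRANCH_SET_LINK);
	}

The important property is simply that no cpu can observe the branch to
_mcount() before it observes the restored mflr; the real code below batches
step 1 across all records before doing step 2.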
Suggested-by: Nicholas Piggin
Reviewed-by: Nicholas Piggin
Signed-off-by: Naveen N. Rao
---
 arch/powerpc/kernel/trace/ftrace.c | 258 ++++++++++++++++++++++++++---
 1 file changed, 236 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 517662a56bdc..86c2b50dcaa9 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -125,7 +125,7 @@ __ftrace_make_nop(struct module *mod,
 {
 	unsigned long entry, ptr, tramp;
 	unsigned long ip = rec->ip;
-	unsigned int op, pop;
+	unsigned int op;
 
 	/* read where this goes */
 	if (probe_kernel_read(&op, (void *)ip, sizeof(int))) {
@@ -160,8 +160,6 @@ __ftrace_make_nop(struct module *mod,
 
 #ifdef CONFIG_MPROFILE_KERNEL
 	/* When using -mkernel_profile there is no load to jump over */
-	pop = PPC_INST_NOP;
-
 	if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
 		pr_err("Fetching instruction at %lx failed.\n", ip - 4);
 		return -EFAULT;
@@ -169,26 +167,23 @@ __ftrace_make_nop(struct module *mod,
 
 	/* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
 	if (op != PPC_INST_MFLR && op != PPC_INST_STD_LR) {
-		pr_err("Unexpected instruction %08x around bl _mcount\n", op);
+		pr_err("Unexpected instruction %08x before bl _mcount\n", op);
 		return -EINVAL;
 	}
-#else
-	/*
-	 * Our original call site looks like:
-	 *
-	 * bl <tramp>
-	 * ld r2,XX(r1)
-	 *
-	 * Milton Miller pointed out that we can not simply nop the branch.
-	 * If a task was preempted when calling a trace function, the nops
-	 * will remove the way to restore the TOC in r2 and the r2 TOC will
-	 * get corrupted.
-	 *
-	 * Use a b +8 to jump over the load.
-	 */
-	pop = PPC_INST_BRANCH | 8;	/* b +8 */
 
+	/* We should patch out the bl to _mcount first */
+	if (patch_instruction((unsigned int *)ip, PPC_INST_NOP)) {
+		pr_err("Patching NOP failed.\n");
+		return -EPERM;
+	}
+
+	/* then, nop out the preceding 'mflr r0' as an optimization */
+	if (op == PPC_INST_MFLR &&
+	    patch_instruction((unsigned int *)(ip - 4), PPC_INST_NOP)) {
+		pr_err("Patching NOP failed.\n");
+		return -EPERM;
+	}
+#else
 	/*
 	 * Check what is in the next instruction. We can see ld r2,40(r1), but
 	 * on first pass after boot we will see mflr r0.
@@ -202,12 +197,25 @@ __ftrace_make_nop(struct module *mod,
 		pr_err("Expected %08x found %08x\n", PPC_INST_LD_TOC, op);
 		return -EINVAL;
 	}
-#endif /* CONFIG_MPROFILE_KERNEL */
 
-	if (patch_instruction((unsigned int *)ip, pop)) {
+	/*
+	 * Our original call site looks like:
+	 *
+	 * bl <tramp>
+	 * ld r2,XX(r1)
+	 *
+	 * Milton Miller pointed out that we can not simply nop the branch.
+	 * If a task was preempted when calling a trace function, the nops
+	 * will remove the way to restore the TOC in r2 and the r2 TOC will
+	 * get corrupted.
+	 *
+	 * Use a b +8 to jump over the load.
+	 */
+	if (patch_instruction((unsigned int *)ip, PPC_INST_BRANCH | 8)) {
 		pr_err("Patching NOP failed.\n");
 		return -EPERM;
 	}
+#endif /* CONFIG_MPROFILE_KERNEL */
 
 	return 0;
 }
@@ -421,6 +429,26 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr)
 		return -EPERM;
 	}
 
+#ifdef CONFIG_MPROFILE_KERNEL
+	/* Nop out the preceding 'mflr r0' as an optimization */
+	if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
+		pr_err("Fetching instruction at %lx failed.\n", ip - 4);
+		return -EFAULT;
+	}
+
+	/* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
+	if (op != PPC_INST_MFLR && op != PPC_INST_STD_LR) {
+		pr_err("Unexpected instruction %08x before bl _mcount\n", op);
+		return -EINVAL;
+	}
+
+	if (op == PPC_INST_MFLR &&
+	    patch_instruction((unsigned int *)(ip - 4), PPC_INST_NOP)) {
+		pr_err("Patching NOP failed.\n");
+		return -EPERM;
+	}
+#endif
+
 	return 0;
 }
 
@@ -429,6 +457,7 @@ int ftrace_make_nop(struct module *mod,
 {
 	unsigned long ip = rec->ip;
 	unsigned int old, new;
+	int rc;
 
 	/*
 	 * If the calling address is more that 24 bits away,
@@ -439,7 +468,34 @@ int ftrace_make_nop(struct module *mod,
 		/* within range */
 		old = ftrace_call_replace(ip, addr, 1);
 		new = PPC_INST_NOP;
-		return ftrace_modify_code(ip, old, new);
+		rc = ftrace_modify_code(ip, old, new);
+#ifdef CONFIG_MPROFILE_KERNEL
+		if (rc)
+			return rc;
+
+		/*
+		 * For -mprofile-kernel, we patch out the preceding 'mflr r0'
+		 * instruction, as an optimization. It is important to nop out
+		 * the branch to _mcount() first, as a lone 'mflr r0' is
+		 * harmless.
+		 */
+		if (probe_kernel_read(&old, (void *)(ip - 4), 4)) {
+			pr_err("Fetching instruction at %lx failed.\n", ip - 4);
+			return -EFAULT;
+		}
+
+		/* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
+		if (old != PPC_INST_MFLR && old != PPC_INST_STD_LR) {
+			pr_err("Unexpected instruction %08x before bl _mcount\n",
+			       old);
+			return -EINVAL;
+		}
+
+		if (old == PPC_INST_MFLR)
+			rc = patch_instruction((unsigned int *)(ip - 4),
+					       PPC_INST_NOP);
+#endif
+		return rc;
 	} else if (core_kernel_text(ip))
 		return __ftrace_make_nop_kernel(rec, addr);
 
@@ -567,6 +623,37 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 		return -EINVAL;
 	}
 
+#ifdef CONFIG_MPROFILE_KERNEL
+	/*
+	 * We could end up here without having called __ftrace_make_call_prep()
+	 * if function tracing is enabled before a module is loaded.
+	 *
+	 * ftrace_module_enable() --> ftrace_replace_code_rec() -->
+	 *	ftrace_make_call() --> __ftrace_make_call()
+	 *
+	 * In this scenario, the previous instruction will be a NOP. It is
+	 * safe to patch it with a 'mflr r0' since we know for a fact that
+	 * this code is not yet being run.
+	 */
+	ip -= MCOUNT_INSN_SIZE;
+
+	/* read where this goes */
+	if (probe_kernel_read(op, ip, MCOUNT_INSN_SIZE))
+		return -EFAULT;
+
+	/*
+	 * nothing to do if this is using the older -mprofile-kernel
+	 * instruction sequence
+	 */
+	if (op[0] != PPC_INST_NOP)
+		return 0;
+
+	if (patch_instruction((unsigned int *)ip, PPC_INST_MFLR)) {
+		pr_err("Patching MFLR failed.\n");
+		return -EPERM;
+	}
+#endif
+
 	return 0;
 }
 
@@ -863,6 +950,133 @@ void arch_ftrace_update_code(int command)
 	ftrace_modify_all_code(command);
 }
 
+#ifdef CONFIG_MPROFILE_KERNEL
+/* Returns 1 if we patched in the mflr */
+static int __ftrace_make_call_prep(struct dyn_ftrace *rec)
+{
+	void *ip = (void *)rec->ip - MCOUNT_INSN_SIZE;
+	unsigned int op[2];
+
+	/* read where this goes */
+	if (probe_kernel_read(op, ip, sizeof(op)))
+		return -EFAULT;
+
+	if (op[1] != PPC_INST_NOP) {
+		pr_err("Unexpected call sequence at %p: %x %x\n",
+		       ip, op[0], op[1]);
+		return -EINVAL;
+	}
+
+	/*
+	 * nothing to do if this is using the older -mprofile-kernel
+	 * instruction sequence
+	 */
+	if (op[0] != PPC_INST_NOP)
+		return 0;
+
+	if (patch_instruction((unsigned int *)ip, PPC_INST_MFLR)) {
+		pr_err("Patching MFLR failed.\n");
+		return -EPERM;
+	}
+
+	return 1;
+}
+
+static void do_isync(void *info __maybe_unused)
+{
+	isync();
+}
+
+/*
+ * When enabling function tracing for -mprofile-kernel that uses a
+ * 2-instruction sequence of 'mflr r0, bl _mcount()', we use a 2 step process:
+ * 1. Patch in the 'mflr r0' instruction
+ *    1a. flush icache on all cpus, so that the updated instruction is seen
+ *    1b. synchronize_rcu_tasks() to ensure that any cpus that had executed
+ *	  the earlier nop there make progress (and hence do not branch into
+ *	  ftrace without executing the preceding mflr)
+ * 2. Patch in the branch to ftrace
+ */
+void ftrace_replace_code(int mod_flags)
+{
+	int enable = mod_flags & FTRACE_MODIFY_ENABLE_FL;
+	int schedulable = mod_flags & FTRACE_MODIFY_MAY_SLEEP_FL;
+	int ret, failed, make_call = 0;
+	struct ftrace_rec_iter *iter;
+	struct dyn_ftrace *rec;
+
+	if (unlikely(!ftrace_enabled))
+		return;
+
+	for_ftrace_rec_iter(iter) {
+		rec = ftrace_rec_iter_record(iter);
+
+		if (rec->flags & FTRACE_FL_DISABLED)
+			continue;
+
+		failed = 0;
+		ret = ftrace_test_record(rec, enable);
+		if (ret == FTRACE_UPDATE_MAKE_CALL) {
+			failed = __ftrace_make_call_prep(rec);
+			if (failed < 0) {
+				ftrace_bug(failed, rec);
+				return;
+			} else if (failed == 1)
+				make_call++;
+		}
+
+		if (!failed) {
+			/* We can patch the record directly */
+			failed = ftrace_replace_code_rec(rec, enable);
+			if (failed) {
+				ftrace_bug(failed, rec);
+				return;
+			}
+		}
+
+		if (schedulable)
+			cond_resched();
+	}
+
+	if (!make_call)
+		/* No records needed patching a preceding mflr */
+		return;
+
+	/* Make sure all cpus see the new instruction */
+	smp_call_function(do_isync, NULL, 1);
+
+	/*
+	 * We also need to ensure that all cpus make progress:
+	 * - With !CONFIG_PREEMPT, we want to be sure that cpus return from
+	 *   any interrupts they may be handling, and make progress.
+	 * - With CONFIG_PREEMPT, we want to be additionally sure that there
+	 *   are no pre-empted tasks that have executed the earlier nop, and
+	 *   might end up executing the subsequently patched branch to ftrace.
+	 */
+	synchronize_rcu_tasks();
+
+	for_ftrace_rec_iter(iter) {
+		rec = ftrace_rec_iter_record(iter);
+
+		if (rec->flags & FTRACE_FL_DISABLED)
+			continue;
+
+		ret = ftrace_test_record(rec, enable);
+		if (ret == FTRACE_UPDATE_MAKE_CALL)
+			failed = ftrace_replace_code_rec(rec, enable);
+
+		if (failed) {
+			ftrace_bug(failed, rec);
+			return;
+		}
+
+		if (schedulable)
+			cond_resched();
+	}
+
+}
+#endif
+
 #ifdef CONFIG_PPC64
 #define PACATOC offsetof(struct paca_struct, kernel_toc)
-- 
2.22.0