From: Daniel Bristot de Oliveira
To: linux-kernel@vger.kernel.org
Cc: Steven Rostedt, Arnaldo Carvalho de Melo, Ingo Molnar, Andy Lutomirski,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, "H. Peter Anvin",
	"Joel Fernandes (Google)", Jiri Olsa, Namhyung Kim, Alexander Shishkin,
	Tommaso Cucinotta, Romulo Silva de Oliveira, Clark Williams,
	x86@kernel.org
Subject: [RFC PATCH 1/7] x86/entry: Add support for early task context tracking
Date: Tue, 2 Apr 2019 22:03:53 +0200
Message-Id: <90ce8a6a4ca02e1e8a2a43185f193cd72a59d020.1554234787.git.bristot@redhat.com>

Currently, the identification of the context is made through the
preempt_counter, but it is set after the execution of the first functions
of the IRQ/NMI, causing potential problems in the identification of the
current status. For instance, ftrace/perf might drop events in the early
stage of IRQ/NMI handlers because the preempt_counter was not set.

The proposed approach is to use a dedicated per-cpu variable to keep track
of the context of execution, with values set before the execution of the
first C function of the interrupt handler.

This is a PoC in the x86_64.

Signed-off-by: Daniel Bristot de Oliveira
Cc: Steven Rostedt
Cc: Arnaldo Carvalho de Melo
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Thomas Gleixner
Cc: Borislav Petkov
Cc: Peter Zijlstra
Cc: "H. Peter Anvin"
Cc: "Joel Fernandes (Google)"
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Alexander Shishkin
Cc: Tommaso Cucinotta
Cc: Romulo Silva de Oliveira
Cc: Clark Williams
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
---
 arch/x86/entry/entry_64.S       |  9 +++++++++
 arch/x86/include/asm/irqflags.h | 30 ++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/common.c    |  4 ++++
 include/linux/irqflags.h        |  4 ++++
 kernel/softirq.c                |  5 ++++-
 5 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1f0efdb7b629..1471b544241f 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -545,6 +545,7 @@ ENTRY(interrupt_entry)
 	testb	$3, CS+8(%rsp)
 	jz	1f
 
+	TASK_CONTEXT_SET_BIT context=TASK_CTX_IRQ
 	/*
 	 * IRQ from user mode.
 	 *
@@ -561,6 +562,8 @@ ENTRY(interrupt_entry)
 
 1:
 	ENTER_IRQ_STACK old_rsp=%rdi save_ret=1
+
+	TASK_CONTEXT_SET_BIT context=TASK_CTX_IRQ
 	/* We entered an interrupt context - irqs are off: */
 	TRACE_IRQS_OFF
 
@@ -586,6 +589,7 @@ ret_from_intr:
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF
 
+	TASK_CONTEXT_RESET_BIT context=TASK_CTX_IRQ
 	LEAVE_IRQ_STACK
 
 	testb	$3, CS(%rsp)
@@ -780,6 +784,7 @@ ENTRY(\sym)
 	call	interrupt_entry
 	UNWIND_HINT_REGS indirect=1
 	call	\do_sym	/* rdi points to pt_regs */
+	TASK_CONTEXT_RESET_BIT context=TASK_CTX_IRQ
 	jmp	ret_from_intr
 END(\sym)
 _ASM_NOKPROBE(\sym)
@@ -1403,9 +1408,11 @@ ENTRY(nmi)
 	 * done with the NMI stack.
 	 */
 
+	TASK_CONTEXT_SET_BIT context=TASK_CTX_NMI
 	movq	%rsp, %rdi
 	movq	$-1, %rsi
 	call	do_nmi
+	TASK_CONTEXT_RESET_BIT context=TASK_CTX_NMI
 
 	/*
 	 * Return back to user mode. We must *not* do the normal exit
@@ -1615,10 +1622,12 @@ end_repeat_nmi:
 	call	paranoid_entry
 	UNWIND_HINT_REGS
 
+	TASK_CONTEXT_SET_BIT context=TASK_CTX_NMI
 	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
 	movq	%rsp, %rdi
 	movq	$-1, %rsi
 	call	do_nmi
+	TASK_CONTEXT_RESET_BIT context=TASK_CTX_NMI
 
 	/* Always restore stashed CR3 value (see paranoid_entry) */
 	RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 058e40fed167..5a12bc3ea02b 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -3,6 +3,7 @@
 #define _X86_IRQFLAGS_H_
 
 #include
+#include
 
 #ifndef __ASSEMBLY__
 
@@ -202,4 +203,33 @@ static inline int arch_irqs_disabled(void)
 #endif
 #endif /* __ASSEMBLY__ */
 
+#ifdef CONFIG_X86_64
+/*
+ * NOTE: I know I need to implement this to the 32 bits as well.
+ * But... this is just a POC.
+ */
+#define ARCH_HAS_TASK_CONTEXT	1
+
+#define TASK_CTX_THREAD		0x0
+#define TASK_CTX_SOFTIRQ	0x1
+#define TASK_CTX_IRQ		0x2
+#define TASK_CTX_NMI		0x4
+
+#ifdef __ASSEMBLY__
+.macro TASK_CONTEXT_SET_BIT context:req
+	orb	$\context, PER_CPU_VAR(task_context)
+.endm
+
+.macro TASK_CONTEXT_RESET_BIT context:req
+	andb	$~\context, PER_CPU_VAR(task_context)
+.endm
+#else /* __ASSEMBLY__ */
+DECLARE_PER_CPU(unsigned char, task_context);
+
+static __always_inline void task_context_set(unsigned char context)
+{
+	raw_cpu_write_1(task_context, context);
+}
+#endif /* __ASSEMBLY__ */
+#endif /* CONFIG_X86_64 */
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cb28e98a0659..1acbec22319b 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1531,6 +1531,8 @@ DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
 EXPORT_PER_CPU_SYMBOL(__preempt_count);
 
+DEFINE_PER_CPU(unsigned char, task_context) __visible = 0;
+
 /* May not be marked __init: used by software suspend */
 void syscall_init(void)
 {
@@ -1604,6 +1606,8 @@ EXPORT_PER_CPU_SYMBOL(current_task);
 DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
 EXPORT_PER_CPU_SYMBOL(__preempt_count);
 
+DEFINE_PER_CPU(unsigned char, task_context) __visible = 0;
+
 /*
  * On x86_32, vm86 modifies tss.sp0, so sp0 isn't a reliable way to find
  * the top of the kernel stack. Use an extra percpu variable to track the
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 21619c92c377..1c3473bbe5d2 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -168,4 +168,8 @@ do { \
 
 #define irqs_disabled_flags(flags) raw_irqs_disabled_flags(flags)
 
+#ifndef ARCH_HAS_TASK_CONTEXT
+#define task_context_set(context) do {} while (0)
+#endif
+
 #endif
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 10277429ed84..324de769dc07 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -410,8 +410,11 @@ void irq_exit(void)
 #endif
 	account_irq_exit_time(current);
 	preempt_count_sub(HARDIRQ_OFFSET);
-	if (!in_interrupt() && local_softirq_pending())
+	if (!in_interrupt() && local_softirq_pending()) {
+		task_context_set(TASK_CTX_SOFTIRQ);
 		invoke_softirq();
+		task_context_set(TASK_CTX_IRQ);
+	}
 
 	tick_irq_exit();
 	rcu_irq_exit();
-- 
2.20.1
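
For readers who want to see how the task_context byte behaves, here is a
minimal user-space sketch in C. It is not kernel code and is not part of
the patch. It assumes a single CPU, so a plain static variable stands in
for the per-cpu variable. The TASK_CTX_* values are copied from the patch;
the helpers ctx_set_bit(), ctx_reset_bit(), ctx_write() and ctx_current()
are hypothetical names made up for this illustration. The first two model
the orb/andb assembly macros, ctx_write() models the whole-byte store done
by task_context_set(), and ctx_current() is one possible way a reader
could pick the innermost context.

```c
#include <assert.h>

/* Bit values copied from the patch. */
#define TASK_CTX_THREAD		0x0
#define TASK_CTX_SOFTIRQ	0x1
#define TASK_CTX_IRQ		0x2
#define TASK_CTX_NMI		0x4

/* Assumption: every valid bit, used only for sanity checks below. */
#define TASK_CTX_MASK	(TASK_CTX_SOFTIRQ | TASK_CTX_IRQ | TASK_CTX_NMI)

/* Stand-in for the per-cpu task_context byte (one CPU only). */
static unsigned char task_context;

/* Models TASK_CONTEXT_SET_BIT: orb $context, task_context */
static inline void ctx_set_bit(unsigned char context)
{
	assert((context & ~TASK_CTX_MASK) == 0);
	task_context = (unsigned char)(task_context | context);
}

/* Models TASK_CONTEXT_RESET_BIT: andb $~context, task_context */
static inline void ctx_reset_bit(unsigned char context)
{
	assert((context & ~TASK_CTX_MASK) == 0);
	task_context = (unsigned char)(task_context & ~context);
}

/* Models task_context_set(): a plain store that replaces the whole byte. */
static inline void ctx_write(unsigned char context)
{
	assert((context & ~TASK_CTX_MASK) == 0);
	task_context = context;
}

/*
 * Hypothetical reader, not in the patch: report the innermost context,
 * assuming NMI nests inside IRQ, which nests inside softirq.
 */
static inline unsigned char ctx_current(void)
{
	if (task_context & TASK_CTX_NMI)
		return TASK_CTX_NMI;
	if (task_context & TASK_CTX_IRQ)
		return TASK_CTX_IRQ;
	if (task_context & TASK_CTX_SOFTIRQ)
		return TASK_CTX_SOFTIRQ;
	return TASK_CTX_THREAD;
}
```

With this model, setting TASK_CTX_IRQ and then TASK_CTX_NMI leaves the byte
at 0x6, and ctx_current() reports TASK_CTX_NMI until the NMI bit is
cleared. Note that ctx_write(), like task_context_set() in the patch,
replaces every bit at once, whereas the set/reset helpers leave the other
bits untouched.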