Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932521AbbEHR7F (ORCPT ); Fri, 8 May 2015 13:59:05 -0400
Received: from mail-db3on0095.outbound.protection.outlook.com ([157.55.234.95]:18955
	"EHLO emea01-db3-obe.outbound.protection.outlook.com"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S932501AbbEHR65 (ORCPT ); Fri, 8 May 2015 13:58:57 -0400
Authentication-Results: spf=fail (sender IP is 12.216.194.146)
	smtp.mailfrom=ezchip.com; ezchip.com; dkim=none (message not signed)
	header.d=none;
From: Chris Metcalf
To: Gilad Ben Yossef, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Andrew Morton, "Rik van Riel", Tejun Heo, Frederic Weisbecker,
	Thomas Gleixner, "Paul E. McKenney", Christoph Lameter,
	"Srivatsa S. Bhat", , ,
CC: Chris Metcalf
Subject: [PATCH 1/6] nohz_full: add support for "dataplane" mode
Date: Fri, 8 May 2015 13:58:42 -0400
Message-ID: <1431107927-13998-2-git-send-email-cmetcalf@ezchip.com>
X-Mailer: git-send-email 2.1.2
In-Reply-To: <1431107927-13998-1-git-send-email-cmetcalf@ezchip.com>
References: <1431107927-13998-1-git-send-email-cmetcalf@ezchip.com>
MIME-Version: 1.0
Content-Type: text/plain
X-OriginatorOrg: ezchip.com
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6152
Lines: 182

The existing nohz_full mode makes tradeoffs to minimize userspace
interruptions while still attempting to avoid overheads in the
kernel entry/exit path, to provide 100% kernel semantics, etc.

However, some applications require a stronger commitment from the
kernel to avoid interruptions, in particular userspace device
driver style applications, such as high-speed networking code.

This change introduces a framework to allow applications to elect
to have the stronger semantics as needed, specifying
prctl(PR_SET_DATAPLANE, PR_DATAPLANE_ENABLE) to do so.
Subsequent commits will add additional flags and additional
semantics.

The dataplane state is indicated by setting a new task struct
field, dataplane_flags, to the value passed by prctl().  When the
_ENABLE bit is set for a task, and it is returning to userspace
on a nohz_full core, it calls the new tick_nohz_dataplane_enter()
routine to take additional actions to help the task avoid being
interrupted in the future.

For this first patch, the only action taken is to call
lru_add_drain() to prevent being interrupted by a subsequent
lru_add_drain_all() call on another core.

Signed-off-by: Chris Metcalf
---
 include/linux/sched.h      |  3 +++
 include/linux/tick.h       | 10 ++++++++++
 include/uapi/linux/prctl.h |  5 +++++
 kernel/context_tracking.c  |  3 +++
 kernel/sys.c               |  8 ++++++++
 kernel/time/tick-sched.c   | 13 +++++++++++++
 6 files changed, 42 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8222ae40ecb0..3680aa07c9ea 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1732,6 +1732,9 @@ struct task_struct {
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 	unsigned long	task_state_change;
 #endif
+#ifdef CONFIG_NO_HZ_FULL
+	unsigned int	dataplane_flags;
+#endif
 };
 
 /* Future-safe accessor for struct task_struct's cpus_allowed. */
diff --git a/include/linux/tick.h b/include/linux/tick.h
index f8492da57ad3..d191cda9b71a 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 extern void __init tick_init(void);
@@ -134,11 +135,18 @@ static inline bool tick_nohz_full_cpu(int cpu)
 	return cpumask_test_cpu(cpu, tick_nohz_full_mask);
 }
 
+static inline bool tick_nohz_is_dataplane(void)
+{
+	return tick_nohz_full_cpu(smp_processor_id()) &&
+		(current->dataplane_flags & PR_DATAPLANE_ENABLE);
+}
+
 extern void __tick_nohz_full_check(void);
 extern void tick_nohz_full_kick(void);
 extern void tick_nohz_full_kick_cpu(int cpu);
 extern void tick_nohz_full_kick_all(void);
 extern void __tick_nohz_task_switch(struct task_struct *tsk);
+extern void tick_nohz_dataplane_enter(void);
 #else
 static inline bool tick_nohz_full_enabled(void) { return false; }
 static inline bool tick_nohz_full_cpu(int cpu) { return false; }
@@ -147,6 +155,8 @@ static inline void tick_nohz_full_kick_cpu(int cpu) { }
 static inline void tick_nohz_full_kick(void) { }
 static inline void tick_nohz_full_kick_all(void) { }
 static inline void __tick_nohz_task_switch(struct task_struct *tsk) { }
+static inline bool tick_nohz_is_dataplane(void) { return false; }
+static inline void tick_nohz_dataplane_enter(void) { }
 #endif
 
 static inline bool is_housekeeping_cpu(int cpu)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 31891d9535e2..1aa8fa8a8b05 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -190,4 +190,9 @@ struct prctl_mm_map {
 # define PR_FP_MODE_FR		(1 << 0)	/* 64b FP registers */
 # define PR_FP_MODE_FRE		(1 << 1)	/* 32b compatibility */
 
+/* Enable/disable or query dataplane mode for NO_HZ_FULL kernels. */
+#define PR_SET_DATAPLANE	47
+#define PR_GET_DATAPLANE	48
+# define PR_DATAPLANE_ENABLE	(1 << 0)
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 72d59a1a6eb6..dd6bdd6197b6 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 #define CREATE_TRACE_POINTS
 #include
@@ -85,6 +86,8 @@ void context_tracking_enter(enum ctx_state state)
 			 * on the tick.
 			 */
 			if (state == CONTEXT_USER) {
+				if (tick_nohz_is_dataplane())
+					tick_nohz_dataplane_enter();
 				trace_user_enter(0);
 				vtime_user_enter(current);
 			}
diff --git a/kernel/sys.c b/kernel/sys.c
index a4e372b798a5..930b750aefde 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2243,6 +2243,14 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_GET_FP_MODE:
 		error = GET_FP_MODE(me);
 		break;
+#ifdef CONFIG_NO_HZ_FULL
+	case PR_SET_DATAPLANE:
+		me->dataplane_flags = arg2;
+		break;
+	case PR_GET_DATAPLANE:
+		error = me->dataplane_flags;
+		break;
+#endif
 	default:
 		error = -EINVAL;
 		break;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 914259128145..31c674719647 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -389,6 +390,18 @@ void __init tick_nohz_init(void)
 	pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
 		cpumask_pr_args(tick_nohz_full_mask));
 }
+
+/*
+ * When returning to userspace on a nohz_full core after doing
+ * prctl(PR_SET_DATAPLANE, PR_DATAPLANE_ENABLE), we come here and try more
+ * aggressively to prevent this core from being interrupted later.
+ */
+void tick_nohz_dataplane_enter(void)
+{
+	/* Drain the pagevecs to avoid unnecessary IPI flushes later. */
+	lru_add_drain();
+}
+
 #endif
 
 /*
-- 
2.1.2
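
For anyone wanting to exercise the interface from userspace, the opt-in
described in the commit message might look roughly like the sketch below.
It is not part of the patch. The PR_* values are copied from the prctl.h
hunk above and are not defined by mainline headers; the function names
dataplane_flag_is_set(), dataplane_query() and dataplane_enable() are
hypothetical. Mainline kernels later assigned option 47 to PR_CAP_AMBIENT,
so the sketch probes PR_GET_DATAPLANE first and only issues the SET if the
probe succeeds. That guards only against kernels that reject option 48.

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * ASSUMPTION: values copied from this patch's include/uapi/linux/prctl.h
 * hunk. They exist only on a kernel carrying this series.
 */
#ifndef PR_SET_DATAPLANE
#define PR_SET_DATAPLANE	47
#define PR_GET_DATAPLANE	48
#define PR_DATAPLANE_ENABLE	(1 << 0)
#endif

/* Hypothetical helper: 1 if a PR_GET_DATAPLANE result has the enable bit. */
int dataplane_flag_is_set(int flags)
{
	return flags >= 0 && (flags & PR_DATAPLANE_ENABLE) != 0;
}

/* Hypothetical helper: current dataplane_flags (>= 0), or -errno. */
int dataplane_query(void)
{
	int flags = prctl(PR_GET_DATAPLANE, 0UL, 0UL, 0UL, 0UL);

	return flags < 0 ? -errno : flags;
}

/*
 * Hypothetical helper: set PR_DATAPLANE_ENABLE for the calling task.
 * Returns 0 on success or a negative errno value.
 */
int dataplane_enable(void)
{
	int flags = dataplane_query();

	if (flags < 0)
		return flags;	/* probe failed: do not issue the SET */

	/* The patch stores arg2 verbatim, so keep any bits already set. */
	if (prctl(PR_SET_DATAPLANE,
		  (unsigned long)(flags | PR_DATAPLANE_ENABLE),
		  0UL, 0UL, 0UL) < 0)
		return -errno;

	/* Read back to confirm the kernel recorded the flag. */
	return dataplane_flag_is_set(dataplane_query()) ? 0 : -ENOSYS;
}
```

A thread would presumably pin itself to a nohz_full core before calling
dataplane_enable(). With this patch applied, lru_add_drain() then runs each
time the task returns to userspace on such a core; on other cores
tick_nohz_is_dataplane() is false and nothing extra happens.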