From: Haris Okanovic
Subject: [PATCH v4 1/2] timers: Don't wake ktimersoftd on every tick
Date: Tue, 6 Mar 2018 17:39:43 -0600
Message-ID: <20180306233944.11271-1-haris.okanovic@ni.com>
X-Mailer: git-send-email 2.15.1
X-Mailing-List: linux-kernel@vger.kernel.org

We recently upgraded from 4.1 to 4.6 and noticed a minor latency
regression caused by an additional thread wakeup (ktimersoftd) in
interrupt context on every tick. The wakeups are from
run_local_timers() raising TIMER_SOFTIRQ. Both the TIMER and SCHED
softirqs were coalesced into one ksoftirqd wakeup prior to Sebastian's
change to split timers into their own thread.

There's already logic in run_local_timers() to avoid some unnecessary
wakeups of ksoftirqd, but it doesn't seem to catch them all. In
particular, I've seen many unnecessary wakeups when jiffies increments
prior to run_local_timers().

Change the way timers are collected per Julia and Thomas'
recommendation: Expired timers are now collected in interrupt context
and fired in ktimersoftd to avoid double-walk of `pending_map`.

Collect expired timers in interrupt context to avoid overhead of waking
ktimersoftd on every tick. ktimersoftd now wakes only when one or more
timers are ready, which yields a minor reduction in small latency
spikes measured by cyclictest.

Execution time of run_local_timers() increases by 0.2us to 2.5us as
measured by TSC on a 2-core Intel Atom E3825 system.

This is implemented by storing lists of expired timers in timer_base,
updated on each tick.
Any addition to the lists wakes ktimersoftd (softirq) to process those
timers.

Signed-off-by: Haris Okanovic
---
[PATCH v2] Applied Thomas Gleixner's suggestions:
 - Fix expired_count race
 - Remove unneeded base->clk lookahead
 - Return expired_count in collect_expired_timers()
 - Add block_softirq
 - Rebase to v4.11.8-rt5
[PATCH v3]
 - Fix cosmetic issues
 - Rename "count" to "levels" in timer_base and various functions
 - Move expired_levels and block_softirq to fill holes in timer_base
 - Remove READ_ONCE/WRITE_ONCE around block_softirq
[PATCH v4]
 - Rebase onto v4.14.20-rt17

https://github.com/harisokanovic/linux/tree/dev/hokanovi/timer-peek-v6
---
 kernel/time/timer.c | 111 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 82 insertions(+), 29 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 96fd01c9f6b1..98e952a6428d 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -197,6 +197,7 @@ EXPORT_SYMBOL(jiffies_64);

 struct timer_base {
 	raw_spinlock_t		lock;
+	int			expired_levels;
 	struct timer_list	*running_timer;
 #ifdef CONFIG_PREEMPT_RT_FULL
 	struct swait_queue_head	wait_for_running_timer;
@@ -208,6 +209,7 @@ struct timer_base {
 	bool			must_forward_clk;
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
+	struct hlist_head	expired_lists[LVL_DEPTH];
 } ____cacheline_aligned;

 static DEFINE_PER_CPU(struct timer_base, timer_bases[NR_BASES]);
@@ -1342,7 +1344,8 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(unsigned long),
 	}
 }

-static void expire_timers(struct timer_base *base, struct hlist_head *head)
+static void __expire_timers(struct timer_base *base,
+			    struct hlist_head *head)
 {
 	while (!hlist_empty(head)) {
 		struct timer_list *timer;
@@ -1372,21 +1375,49 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 	}
 }

-static int __collect_expired_timers(struct timer_base *base,
-				    struct hlist_head *heads)
+static void expire_timers(struct timer_base *base)
+{
+	struct hlist_head *head;
+	int levels = base->expired_levels;
+
+	while (levels--) {
+		head = base->expired_lists + levels;
+		__expire_timers(base, head);
+	}
+
+	/*
+	 * Zero base->expired_levels after processing all base->expired_lists
+	 * to signal it's ready to get re-populated. Otherwise, we race with
+	 * tick_find_expired() when base->lock is temporarily dropped in
+	 * __expire_timers()
+	 */
+	base->expired_levels = 0;
+}
+
+static int __collect_expired_timers(struct timer_base *base)
 {
-	unsigned long clk = base->clk;
 	struct hlist_head *vec;
-	int i, levels = 0;
+	struct hlist_head *expired_list = base->expired_lists;
+	unsigned long clk;
+	int i;
 	unsigned int idx;

+	/*
+	 * expire_timers() must be called at least once before we can
+	 * collect more timers.
+	 */
+	if (base->expired_levels)
+		return base->expired_levels;
+
+	clk = base->clk;
 	for (i = 0; i < LVL_DEPTH; i++) {
 		idx = (clk & LVL_MASK) + i * LVL_SIZE;

 		if (__test_and_clear_bit(idx, base->pending_map)) {
 			vec = base->vectors + idx;
-			hlist_move_list(vec, heads++);
-			levels++;
+			hlist_move_list(vec, expired_list);
+			base->expired_levels++;
+			expired_list++;
 		}
 		/* Is it time to look at the next level? */
 		if (clk & LVL_CLK_MASK)
@@ -1394,7 +1425,8 @@ static int __collect_expired_timers(struct timer_base *base,
 		/* Shift clock for the next level granularity */
 		clk >>= LVL_CLK_SHIFT;
 	}
-	return levels;
+
+	return base->expired_levels;
 }

 #ifdef CONFIG_NO_HZ_COMMON
@@ -1593,8 +1625,7 @@ void timer_clear_idle(void)
 	base->is_idle = false;
 }

-static int collect_expired_timers(struct timer_base *base,
-				  struct hlist_head *heads)
+static int collect_expired_timers(struct timer_base *base)
 {
 	/*
 	 * NOHZ optimization. After a long idle sleep we need to forward the
@@ -1615,16 +1646,47 @@ static int collect_expired_timers(struct timer_base *base,
 		}
 		base->clk = next;
 	}
-	return __collect_expired_timers(base, heads);
+	return __collect_expired_timers(base);
 }
 #else
-static inline int collect_expired_timers(struct timer_base *base,
-					 struct hlist_head *heads)
+static inline int collect_expired_timers(struct timer_base *base)
 {
-	return __collect_expired_timers(base, heads);
+	return __collect_expired_timers(base);
 }
 #endif

+/*
+ * Increments timer_base to current jiffies or until first expired
+ * timer is found. Return number of expired levels.
+ */
+static int find_expired_timers(struct timer_base *base)
+{
+	unsigned long int end_clk = jiffies;
+	int expired_levels;
+
+	while (!(expired_levels = collect_expired_timers(base)) &&
+	       time_after_eq(end_clk, base->clk)) {
+		base->clk++;
+	}
+
+	return expired_levels;
+}
+
+/*
+ * Called from CPU tick routine to collect expired timers up to current
+ * jiffies. Return number of expired levels.
+ */
+static int tick_find_expired(struct timer_base *base)
+{
+	int levels;
+
+	raw_spin_lock(&base->lock);
+	levels = find_expired_timers(base);
+	raw_spin_unlock(&base->lock);
+
+	return levels;
+}
+
 /*
  * Called from the timer interrupt handler to charge one tick to the current
  * process. user_tick is 1 if the tick is user time, 0 for system.
@@ -1652,22 +1714,12 @@ void update_process_times(int user_tick)
  */
 static inline void __run_timers(struct timer_base *base)
 {
-	struct hlist_head heads[LVL_DEPTH];
-	int levels;
-
-	if (!time_after_eq(jiffies, base->clk))
-		return;
-
 	raw_spin_lock_irq(&base->lock);

-	while (time_after_eq(jiffies, base->clk)) {
+	do {
+		expire_timers(base);
+	} while (find_expired_timers(base));

-		levels = collect_expired_timers(base, heads);
-		base->clk++;
-
-		while (levels--)
-			expire_timers(base, heads + levels);
-	}
 	raw_spin_unlock_irq(&base->lock);
 	wakeup_timer_waiters(base);
 }
@@ -1707,12 +1759,12 @@ void run_local_timers(void)

 	hrtimer_run_queues();
 	/* Raise the softirq only if required. */
-	if (time_before(jiffies, base->clk)) {
+	if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON))
 			return;
 		/* CPU is awake, so check the deferrable base. */
 		base++;
-		if (time_before(jiffies, base->clk))
+		if (time_before(jiffies, base->clk) || !tick_find_expired(base))
 			return;
 	}
 	raise_softirq(TIMER_SOFTIRQ);
@@ -1887,6 +1939,7 @@ int timers_dead_cpu(unsigned int cpu)
 		raw_spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);

 		BUG_ON(old_base->running_timer);
+		BUG_ON(old_base->expired_levels);

 		for (i = 0; i < WHEEL_SIZE; i++)
 			migrate_timer_list(new_base, old_base->vectors + i);
-- 
2.15.1
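
For readers who want to see the idea outside the kernel, here is a rough
userspace sketch of "collect in the tick, fire in the thread". It is not
the patch's code: the toy_* names, the single-level 8-slot wheel, and the
wakeups counter (standing in for raise_softirq(TIMER_SOFTIRQ)) are all
made up for illustration, and it assumes every timer expires fewer than
WHEEL_SLOTS ticks after it is added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WHEEL_SLOTS 8	/* hypothetical single-level wheel */

struct toy_timer {
	struct toy_timer *next;
	unsigned long expires;
	int fired;
};

struct toy_base {
	unsigned long clk;			/* next tick to scan */
	struct toy_timer *slots[WHEEL_SLOTS];	/* pending timers */
	struct toy_timer *expired;		/* collected, not yet fired */
	int wakeups;				/* simulated softirq raises */
};

/* Queue a timer; assumes expires is less than WHEEL_SLOTS ticks away. */
static void toy_add(struct toy_base *b, struct toy_timer *t,
		    unsigned long expires)
{
	unsigned int idx = expires % WHEEL_SLOTS;

	t->expires = expires;
	t->fired = 0;
	t->next = b->slots[idx];
	b->slots[idx] = t;
}

/*
 * Tick side: advance clk towards now and stop at the first non-empty
 * slot. Returns true if there is a batch waiting to be fired.
 */
static bool toy_collect(struct toy_base *b, unsigned long now)
{
	if (b->expired)
		return true;	/* previous batch must be fired first */

	while ((long)(now - b->clk) >= 0) {
		unsigned int idx = b->clk % WHEEL_SLOTS;

		b->clk++;
		if (b->slots[idx]) {
			b->expired = b->slots[idx];
			b->slots[idx] = NULL;
			return true;
		}
	}
	return false;
}

/* Thread side: run everything the tick collected. */
static void toy_fire(struct toy_base *b)
{
	while (b->expired) {
		struct toy_timer *t = b->expired;

		assert(!t->fired);
		b->expired = t->next;
		t->next = NULL;
		t->fired = 1;
	}
}

/* One tick: only "wake the thread" when something was collected. */
static void toy_tick(struct toy_base *b, unsigned long now)
{
	if (!toy_collect(b, now))
		return;		/* nothing expired, no wakeup */

	b->wakeups++;
	do {
		toy_fire(b);
	} while (toy_collect(b, now));
}
```

Driving toy_tick() once per simulated jiffy with two timers queued
should bump wakeups only on the ticks where a timer is due, which is
the behaviour the commit message describes for ktimersoftd.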