Message-ID: <1433940278.6814.66.camel@gmail.com>
Subject: RFC: futex_wait() can DoS the tick
From: Mike Galbraith
To: LKML
Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt
Date: Wed, 10 Jun 2015 14:44:38 +0200

Greetings,

Like so...

#include <linux/futex.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

int sys_futex(void *addr1, int op, int val1, struct timespec *timeout,
	      void *addr2, int val3)
{
	return syscall(SYS_futex, addr1, op, val1, timeout, addr2, val3);
}

int main()
{
	struct timespec t;
	int f = 1;

	clock_gettime(CLOCK_REALTIME, &t);
	t.tv_sec -= 10;

	while (1) {
		sys_futex(&f, FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME, 1, &t,
			  NULL, FUTEX_BITSET_MATCH_ANY);
	}
}

The above was handed to me by a colleague working on a Xen guest that
livelocked.  I at first thought the Xen arch must have a weird problem,
but when I tried the proggy on my desktop box, while it didn't stop the
tick completely as it did on the Xen box, it slowed it to a crawl.

I noticed that this did not happen with newer kernels, so a bisecting I
did go, and found that...

279f14614 x86: apic: Use tsc deadline for oneshot when available

...is what fixed it up.  Trouble is, while it fixes up my Haswell box,
a Xen dom0 remains busted by that testcase whether that patch is applied
to the host or not, even though the hypervisor supports the deadline
timer, and seemingly regardless of CPU type altogether.
Of all the x86_64 bare metal boxen I've tested, only those with the TSC
deadline timer have shown the issue, and there it goes away as of v3.8
unless you boot lapic=notscdeadline.  However, given that any x86_64
Intel box with the TSC deadline timer (ivy, sandy, hasbeen) can be made
to exhibit the symptom, there may be other arches that get seriously
dinged up, or maybe even as thoroughly b0rked as Xen does, when
hrtimer_interrupt() is pounded into the ground by userspace.
Alternatively, should someone out there know that all bare metal is in
fact fine post 279f14614, that person will likely also know what the
Xen folks need to do to fix up their busted arch.

The below targets the symptom, consider it hrtimer cluebat attractant.

---
 kernel/time/hrtimer.c |   31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -933,6 +933,8 @@ remove_hrtimer(struct hrtimer *timer, st
 	return 0;
 }
 
+static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer);
+
 int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
 		unsigned long delta_ns, const enum hrtimer_mode mode,
 		int wakeup)
@@ -980,8 +982,27 @@ int __hrtimer_start_range_ns(struct hrti
 		 * on dynticks target.
 		 */
 		wake_up_nohz_cpu(new_base->cpu_base->cpu);
-	} else if (new_base->cpu_base == this_cpu_ptr(&hrtimer_bases) &&
-			hrtimer_reprogram(timer, new_base)) {
+	} else if (new_base->cpu_base == this_cpu_ptr(&hrtimer_bases)) {
+		int res = hrtimer_reprogram(timer, new_base);
+
+		if (!res)
+			goto out;
+
+		/*
+		 * If a buggy app tries forever to be awakened in the past,
+		 * banging on hrtimer_interrupt() at high speed can stall
+		 * the tick, and on a Xen box, forever.  On haswell with
+		 * tsc_deadline_timer disabled you can see it, though it
+		 * only slows the tick way down.  Other bare metal boxes
+		 * may also be terminally affected.
+		 */
+		if (unlikely(wakeup && !ret && IS_ERR_VALUE(res) &&
+		    timer->function == hrtimer_wakeup)) {
+			debug_deactivate(timer);
+			__remove_hrtimer(timer, new_base, 0, 0);
+			ret = -ETIMEDOUT;
+		}
+
 		/*
 		 * Only allow reprogramming if the new base is on this CPU.
 		 * (it might still be on another CPU if the timer was pending)
@@ -994,7 +1015,10 @@ int __hrtimer_start_range_ns(struct hrti
 		 * lock ordering issue vs. rq->lock.
 		 */
 		raw_spin_unlock(&new_base->cpu_base->lock);
-		raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+		if (!IS_ERR_VALUE(ret))
+			raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+		else
+			hrtimer_wakeup(timer);
 		local_irq_restore(flags);
 		return ret;
 	} else {
@@ -1002,6 +1026,7 @@ int __hrtimer_start_range_ns(struct hrti
 		}
 	}
 
+out:
 	unlock_hrtimer_base(timer, &flags);
 
 	return ret;