Date: Mon, 22 Jun 2015 15:09:59 -0400
From: Steven Rostedt <rostedt@goodmis.org>
To: Sebastian Andrzej Siewior
Cc: LKML, linux-rt-users, Thomas Gleixner, Peter Zijlstra
Subject: [PATCH][RT][RFC] irq_work: Have non HARD_IRQ irq work just run from ticks
Message-ID: <20150622150959.44055b32@gandalf.local.home>

With PREEMPT_RT, the irq work callbacks are called from the softirq
thread, unless the HARD_IRQ flag is set for the irq work. When an irq
work item is added without the HARD_IRQ flag set, and without the LAZY
flag set, an interrupt is raised, and that interrupt will wake up the
softirq thread to run the irq work just as it would without PREEMPT_RT.

The current logic in irq_work_queue() will not raise the interrupt when
the first irq work item added has the LAZY flag set. But if another irq
work item is then added without the LAZY flag set, and without the
HARD_IRQ flag set, the interrupt is still not raised, because the
interrupt is only raised when the list was empty before the current
item was added.

This means that if an irq work item is added with the LAZY flag set, it
will not raise the interrupt, and that work item will have to wait
until the next timer tick (which in computer terms is a long way away).
Now if in the meantime another irq work item is added without the LAZY
flag set, and without the HARD_IRQ flag set (meaning it wants to run
from the softirq), the interrupt will still not be raised, because the
interrupt is only raised when the first item is added to the list;
items added after that do not raise it.

This makes the timing of the irq work callbacks non-deterministic.
Rather ironic, considering this only happens when PREEMPT_RT is
enabled.
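To make the failure mode concrete, below is a minimal user-space model
of the current -rt queueing decision. The counters stand in for the
per-CPU llists and the tick is assumed to be running; this is only an
illustration of the logic described above, not the kernel code itself:

#include <stdbool.h>
#include <stdio.h>

#define IRQ_WORK_LAZY		(1 << 0)
#define IRQ_WORK_HARD_IRQ	(1 << 1)

/* Element counts standing in for the per-CPU raised_list/lazy_list. */
static int raised_len, lazy_len;

/* Mirrors the -rt irq_work_queue() decision (tick assumed running). */
static void queue(const char *name, unsigned int flags)
{
	bool lazy_work = flags & IRQ_WORK_LAZY;
	bool to_lazy = lazy_work || !(flags & IRQ_WORK_HARD_IRQ);
	int *len = to_lazy ? &lazy_len : &raised_len;
	bool was_empty = ((*len)++ == 0);	/* llist_add() return value */

	if (was_empty && !lazy_work)
		printf("%-8s -> arch_irq_work_raise(), runs promptly\n", name);
	else
		printf("%-8s -> no interrupt, waits for the next tick\n", name);
}

int main(void)
{
	queue("LAZY", IRQ_WORK_LAZY);	/* list was empty: no raise, by design */
	queue("plain", 0);		/* list now non-empty: the raise is lost */
	return 0;
}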
I have several ideas on how to fix this:

1) Add another list (softirq_list), and add to it if PREEMPT_RT is
enabled and the work item has neither the LAZY nor the HARD_IRQ flag
set. This list, instead of the lazy_list, is what would be checked in
the interrupt irq work callback.

2) Raise the interrupt whenever a first item is added to a list (lazy
or otherwise) when PREEMPT_RT is enabled, and have the lazy work
handled by the softirq along with the non-lazy work.

3) Only raise the hard interrupt when something is added to the
raised_list. That is, for PREEMPT_RT, only irq work that has the
HARD_IRQ flag set. All other irq work would be done when the tick
happens. To keep things deterministic, irq_work_run() no longer checks
the lazy_list and is the same as in the vanilla kernel.

I'm thinking that ideally, #1 is the best choice. #2 has the issue
where something may add itself as lazy, really expecting to be done at
the next timer tick, but then run from a "sooner" softirq. Although I
doubt that will really be an issue.

#3 (this patch) is something that I discussed with Sebastian, and he
said that nothing should break if we wait at most 10ms for the next
tick. My concern here is the IPI call function (sending an irq work
from another CPU without the HARD_IRQ flag set): on a NO_HZ CPU, this
may no longer wake the CPU up to run the work. Although, I'm not sure
there's anything that uses cross-CPU irq work without setting HARD_IRQ.
I can add back the check to wake up the softirq, but then we make the
timing of irq work non-deterministic again. Is that an issue?

But here's the patch, presented to you as an RFC. I can write up #1 too
if people think that would be the better solution; a rough sketch of
what it might look like is included below, before the patch.

Oh, and then there's #4, which is to do nothing. Just let irq work come
in non-deterministically, and that may not hurt anything either.
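For reference, here is that rough, untested sketch of #1. The
softirq_list name and the exact shape of irq_work_queue() here are my
own guesses, not existing code, and this is not part of the patch
below:

/* Untested sketch of idea #1: a third per-CPU list for -rt softirq work. */
static DEFINE_PER_CPU(struct llist_head, softirq_list);

bool irq_work_queue(struct irq_work *work)
{
	struct llist_head *list;
	bool raise = false;

	/* Only queue if not already pending */
	if (!irq_work_claim(work))
		return false;

	preempt_disable();

	if (work->flags & IRQ_WORK_LAZY) {
		list = this_cpu_ptr(&lazy_list);
	} else if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
		   !(work->flags & IRQ_WORK_HARD_IRQ)) {
		/*
		 * Neither LAZY nor HARD_IRQ: the work runs from the
		 * softirq thread, but gets a prompt interrupt when it
		 * is the first item on its own list, so lazy items can
		 * no longer suppress the raise.
		 */
		list = this_cpu_ptr(&softirq_list);
		raise = true;
	} else {
		list = this_cpu_ptr(&raised_list);
		raise = true;
	}

	if (llist_add(&work->llnode, list) &&
	    (raise || tick_nohz_tick_stopped()))
		arch_irq_work_raise();

	preempt_enable();

	return true;
}

The interrupt handler would then raise TIMER_SOFTIRQ when softirq_list
is non-empty, instead of checking the lazy_list.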
Not-so-signed-off-by: Steven Rostedt

-- Steve

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 9678fd1382a7..f247388b0fda 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -67,6 +67,7 @@ void __weak arch_irq_work_raise(void)
 bool irq_work_queue_on(struct irq_work *work, int cpu)
 {
 	struct llist_head *list;
+	bool lazy_work;
 
 	/* All work should have been flushed before going offline */
 	WARN_ON_ONCE(cpu_is_offline(cpu));
@@ -78,12 +79,15 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
 	if (!irq_work_claim(work))
 		return false;
 
-	if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && !(work->flags & IRQ_WORK_HARD_IRQ))
+	lazy_work = IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
+			!(work->flags & IRQ_WORK_HARD_IRQ);
+
+	if (lazy_work)
 		list = &per_cpu(lazy_list, cpu);
 	else
 		list = &per_cpu(raised_list, cpu);
 
-	if (llist_add(&work->llnode, list))
+	if (llist_add(&work->llnode, list) && !lazy_work)
 		arch_send_call_function_single_ipi(cpu);
 
 	return true;
@@ -95,7 +99,7 @@ EXPORT_SYMBOL_GPL(irq_work_queue_on);
 bool irq_work_queue(struct irq_work *work)
 {
 	struct llist_head *list;
-	bool lazy_work, realtime = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
+	bool lazy_work;
 
 	/* Only queue if not already pending */
 	if (!irq_work_claim(work))
 		return false;
@@ -104,17 +108,23 @@ bool irq_work_queue(struct irq_work *work)
 	/* Queue the entry and raise the IPI if needed. */
 	preempt_disable();
 
-	lazy_work = work->flags & IRQ_WORK_LAZY;
+	lazy_work = work->flags & IRQ_WORK_LAZY ||
+			(IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
+			 !(work->flags & IRQ_WORK_HARD_IRQ));
 
-	if (lazy_work || (realtime && !(work->flags & IRQ_WORK_HARD_IRQ)))
+	if (lazy_work)
 		list = this_cpu_ptr(&lazy_list);
 	else
 		list = this_cpu_ptr(&raised_list);
 
-	if (llist_add(&work->llnode, list)) {
-		if (!lazy_work || tick_nohz_tick_stopped())
-			arch_irq_work_raise();
-	}
+	/*
+	 * If the list was empty prior to adding this item and that
+	 * list was the raised list, or if the tick is stopped,
+	 * then force an interrupt to occur.
+	 */
+	if (llist_add(&work->llnode, list) &&
+	    (!lazy_work || tick_nohz_tick_stopped()))
+		arch_irq_work_raise();
 
 	preempt_enable();
 
@@ -181,16 +191,7 @@ static void irq_work_run_list(struct llist_head *list)
 void irq_work_run(void)
 {
 	irq_work_run_list(this_cpu_ptr(&raised_list));
-	if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL)) {
-		/*
-		 * NOTE: we raise softirq via IPI for safety,
-		 * and execute in irq_work_tick() to move the
-		 * overhead from hard to soft irq context.
-		 */
-		if (!llist_empty(this_cpu_ptr(&lazy_list)))
-			raise_softirq(TIMER_SOFTIRQ);
-	} else
-		irq_work_run_list(this_cpu_ptr(&lazy_list));
+	irq_work_run_list(this_cpu_ptr(&lazy_list));
 }
 EXPORT_SYMBOL_GPL(irq_work_run);
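For completeness, this is the sort of caller the change affects. A
plain irq_work item like the one below (neither LAZY nor HARD_IRQ set)
would now typically wait for the next timer tick on -rt instead of
getting an immediate wakeup. Purely illustrative; my_work and kick_it()
are made-up names, not code from the tree:

#include <linux/irq_work.h>

static void my_work_func(struct irq_work *work)
{
	/* With this patch, normally runs from irq_work_tick() on -rt. */
}

/* .flags deliberately left clear: neither IRQ_WORK_LAZY nor
 * IRQ_WORK_HARD_IRQ, i.e. exactly the case this patch changes. */
static struct irq_work my_work = {
	.func = my_work_func,
};

void kick_it(void)
{
	irq_work_queue(&my_work);	/* lands on the lazy_list under -rt */
}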