Subject: IRQ affinity notifiers vs RT
From: Ben Hutchings
To: Thomas Gleixner, Steven Rostedt
CC: "Alexandra N. Kossovsky", linux-kernel
Date: Thu, 25 Jul 2013 00:31:23 +0100
Organization: Solarflare

Alexandra Kossovsky reported the following lockdep splat when testing an
out-of-tree version of sfc on 3.6-rt.  The problem is specific to RT; we
haven't tested anything later, but I think it is still unfixed.

> ======================================================
> [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
> 3.6.11.2-rt33.39.el6rt.x86_64.debug #1 Tainted: G O
> ------------------------------------------------------
> insmod/3076 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
>  (&(&(&gcwq->lock)->lock)->wait_lock){+.+...}, at: [] rt_spin_lock_slowlock+0x48/0x2f0
>
> and this task is already holding:
>  (&irq_desc_lock_class){-.....}, at: [] irq_set_affinity+0x46/0x80

irq_set_affinity() holds the irq_desc lock and then schedules a work item
to call the IRQ affinity notifier.
> which would create a new lock dependency:
>  (&irq_desc_lock_class){-.....} -> (&(&(&gcwq->lock)->lock)->wait_lock){+.+...}
>
> but this new dependency connects a HARDIRQ-irq-safe lock:
>  (&irq_desc_lock_class){-.....}
> ... which became HARDIRQ-irq-safe at:
>   [] mark_irqflags+0x172/0x190
>   [] __lock_acquire+0x344/0x4e0
>   [] lock_acquire+0x8a/0x140
>   [] _raw_spin_lock+0x40/0x80
>   [] handle_level_irq+0x1e/0x100
>   [] handle_irq+0x71/0x190
>   [] do_IRQ+0x5d/0xe0
>   [] ret_from_intr+0x0/0x13
>   [] tsc_init+0x24/0x102
>   [] x86_late_time_init+0xf/0x11
>   [] start_kernel+0x312/0x3c6
>   [] x86_64_start_reservations+0x131/0x136
>   [] x86_64_start_kernel+0xed/0xf4

Obviously, irq_desc is used in hard-IRQ context.

> to a HARDIRQ-irq-unsafe lock:
>  (&(&(&gcwq->lock)->lock)->wait_lock){+.+...}
> ... which became HARDIRQ-irq-unsafe at:
> ...  [] mark_irqflags+0x120/0x190
>   [] __lock_acquire+0x344/0x4e0
>   [] lock_acquire+0x8a/0x140
>   [] _raw_spin_lock+0x40/0x80
>   [] rt_spin_lock_slowlock+0x48/0x2f0
>   [] rt_spin_lock+0x16/0x40
>   [] create_worker+0x69/0x220
>   [] init_workqueues+0x24b/0x3f8
>   [] do_one_initcall+0x42/0x170
>   [] kernel_init+0xe5/0x17a
>   [] kernel_thread_helper+0x4/0x10

[...]

The workqueue code uses spin_lock_irq() on the workqueue lock, which with
PREEMPT_RT enabled doesn't actually block IRQs.

In 3.6, the irq_cpu_rmap functions rely on a workqueue flush to finish any
outstanding notifications before freeing the cpu_rmap that they use.  This
won't be reliable if the notification is scheduled after releasing the
irq_desc lock.  However, following commit 896f97ea95c1 ('lib: cpu_rmap:
avoid flushing all workqueues') in 3.8, I think it is sufficient to do only
kref_get(&desc->affinity_notify->kref) in __irq_set_affinity_locked() and
then call schedule_work() in irq_set_affinity() after releasing the lock.
Something like this (untested):

diff --git a/arch/mips/cavium-octeon/octeon-irq.c b/arch/mips/cavium-octeon/octeon-irq.c
index 9d36774..0406481 100644
--- a/arch/mips/cavium-octeon/octeon-irq.c
+++ b/arch/mips/cavium-octeon/octeon-irq.c
@@ -635,7 +635,8 @@ static void octeon_irq_cpu_offline_ciu(struct irq_data *data)
 		cpumask_clear(&new_affinity);
 		cpumask_set_cpu(cpumask_first(cpu_online_mask), &new_affinity);
 	}
-	__irq_set_affinity_locked(data, &new_affinity);
+	/* XXX No-one else calls this; why does this chip need it? */
+	__irq_set_affinity_locked(data, &new_affinity, NULL);
 }
 
 static int octeon_irq_ciu_set_affinity(struct irq_data *data,
diff --git a/include/linux/irq.h b/include/linux/irq.h
index f04d3ba..de992f4 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -380,7 +380,9 @@ extern void remove_percpu_irq(unsigned int irq, struct irqaction *act);
 extern void irq_cpu_online(void);
 extern void irq_cpu_offline(void);
 
-extern int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *cpumask);
+extern int __irq_set_affinity_locked(struct irq_data *data,
+				     const struct cpumask *cpumask,
+				     struct irq_affinity_notify **notify);
 
 #ifdef CONFIG_GENERIC_HARDIRQS
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 514bcfd..157afa2 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -162,12 +162,16 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	return ret;
 }
 
-int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask)
+int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask,
+			      struct irq_affinity_notify **notify)
 {
 	struct irq_chip *chip = irq_data_get_irq_chip(data);
 	struct irq_desc *desc = irq_data_to_desc(data);
 	int ret = 0;
 
+	if (notify)
+		*notify = NULL;
+
 	if (!chip || !chip->irq_set_affinity)
 		return -EINVAL;
 
@@ -178,9 +182,9 @@ int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask)
 		irq_copy_pending(desc, mask);
 	}
 
-	if (desc->affinity_notify) {
+	if (notify && desc->affinity_notify) {
 		kref_get(&desc->affinity_notify->kref);
-		schedule_work(&desc->affinity_notify->work);
+		*notify = desc->affinity_notify;
 	}
 	irqd_set(data, IRQD_AFFINITY_SET);
 
@@ -196,6 +200,7 @@ int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask)
 int irq_set_affinity(unsigned int irq, const struct cpumask *mask)
 {
 	struct irq_desc *desc = irq_to_desc(irq);
+	struct irq_affinity_notify *notify;
 	unsigned long flags;
 	int ret;
 
@@ -203,8 +208,13 @@ int irq_set_affinity(unsigned int irq, const struct cpumask *mask)
 		return -EINVAL;
 
 	raw_spin_lock_irqsave(&desc->lock, flags);
-	ret = __irq_set_affinity_locked(irq_desc_get_irq_data(desc), mask);
+	ret = __irq_set_affinity_locked(irq_desc_get_irq_data(desc), mask,
+					&notify);
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+	if (notify)
+		schedule_work(&notify->work);
+
 	return ret;
 }
--- END ---

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.