2012-11-14 02:25:30

by Shuah Khan

Subject: Re: [PATCH 03/11] intel-iommu: Fix AB-BA lockdep report

On Sun, 2011-12-04 at 13:54 -0500, Steven Rostedt wrote:
> From: Roland Dreier <[email protected]>
>
> When unbinding a device so that I could pass it through to a KVM VM, I
> got the lockdep report below. It looks like a legitimate lock
> ordering problem:

Did this patch not make it into stable releases other than 3.1? I
couldn't find it in any other stable trees prior to 3.1.

-- Shuah
>
> - domain_context_mapping_one() takes iommu->lock and calls
> iommu_support_dev_iotlb(), which takes device_domain_lock (inside
> iommu->lock).
>
> - domain_remove_one_dev_info() starts by taking device_domain_lock
> then takes iommu->lock inside it (near the end of the function).
>
> So this is the classic AB-BA deadlock. It looks like a safe fix is to
> simply release device_domain_lock a bit earlier, since as far as I can
> tell, it doesn't protect any of the stuff accessed at the end of
> domain_remove_one_dev_info() anyway.
>
> BTW, the use of device_domain_lock looks a bit unsafe to me... it's
> at least not obvious to me why we aren't vulnerable to the race below:
>
> iommu_support_dev_iotlb()
>                                       domain_remove_dev_info()
>
> lock device_domain_lock
>   find info
> unlock device_domain_lock
>
>                                       lock device_domain_lock
>                                         find same info
>                                       unlock device_domain_lock
>
>                                       free_devinfo_mem(info)
>
> do stuff with info after it's free
>
> However I don't understand the locking here well enough to know if
> this is a real problem, let alone what the best fix is.
>
> Anyway here's the full lockdep output that prompted all of this:
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.39.1+ #1
> -------------------------------------------------------
> bash/13954 is trying to acquire lock:
> (&(&iommu->lock)->rlock){......}, at: [<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
>
> but task is already holding lock:
> (device_domain_lock){-.-...}, at: [<ffffffff812f6508>] domain_remove_one_dev_info+0x208/0x230
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (device_domain_lock){-.-...}:
> [<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
> [<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
> [<ffffffff812f8350>] domain_context_mapping_one+0x600/0x750
> [<ffffffff812f84df>] domain_context_mapping+0x3f/0x120
> [<ffffffff812f9175>] iommu_prepare_identity_map+0x1c5/0x1e0
> [<ffffffff81ccf1ca>] intel_iommu_init+0x88e/0xb5e
> [<ffffffff81cab204>] pci_iommu_init+0x16/0x41
> [<ffffffff81002165>] do_one_initcall+0x45/0x190
> [<ffffffff81ca3d3f>] kernel_init+0xe3/0x168
> [<ffffffff8157ac24>] kernel_thread_helper+0x4/0x10
>
> -> #0 (&(&iommu->lock)->rlock){......}:
> [<ffffffff8109bf3e>] __lock_acquire+0x195e/0x1e10
> [<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
> [<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
> [<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
> [<ffffffff812f8b42>] device_notifier+0x72/0x90
> [<ffffffff8157555c>] notifier_call_chain+0x8c/0xc0
> [<ffffffff81089768>] __blocking_notifier_call_chain+0x78/0xb0
> [<ffffffff810897b6>] blocking_notifier_call_chain+0x16/0x20
> [<ffffffff81373a5c>] __device_release_driver+0xbc/0xe0
> [<ffffffff81373ccf>] device_release_driver+0x2f/0x50
> [<ffffffff81372ee3>] driver_unbind+0xa3/0xc0
> [<ffffffff813724ac>] drv_attr_store+0x2c/0x30
> [<ffffffff811e4506>] sysfs_write_file+0xe6/0x170
> [<ffffffff8117569e>] vfs_write+0xce/0x190
> [<ffffffff811759e4>] sys_write+0x54/0xa0
> [<ffffffff81579a82>] system_call_fastpath+0x16/0x1b
>
> other info that might help us debug this:
>
> 6 locks held by bash/13954:
> #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff811e4464>] sysfs_write_file+0x44/0x170
> #1: (s_active#3){++++.+}, at: [<ffffffff811e44ed>] sysfs_write_file+0xcd/0x170
> #2: (&__lockdep_no_validate__){+.+.+.}, at: [<ffffffff81372edb>] driver_unbind+0x9b/0xc0
> #3: (&__lockdep_no_validate__){+.+.+.}, at: [<ffffffff81373cc7>] device_release_driver+0x27/0x50
> #4: (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: [<ffffffff8108974f>] __blocking_notifier_call_chain+0x5f/0xb0
> #5: (device_domain_lock){-.-...}, at: [<ffffffff812f6508>] domain_remove_one_dev_info+0x208/0x230
>
> stack backtrace:
> Pid: 13954, comm: bash Not tainted 2.6.39.1+ #1
> Call Trace:
> [<ffffffff810993a7>] print_circular_bug+0xf7/0x100
> [<ffffffff8109bf3e>] __lock_acquire+0x195e/0x1e10
> [<ffffffff810972bd>] ? trace_hardirqs_off+0xd/0x10
> [<ffffffff8109d57d>] ? trace_hardirqs_on_caller+0x13d/0x180
> [<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
> [<ffffffff812f6421>] ? domain_remove_one_dev_info+0x121/0x230
> [<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
> [<ffffffff812f6421>] ? domain_remove_one_dev_info+0x121/0x230
> [<ffffffff810972bd>] ? trace_hardirqs_off+0xd/0x10
> [<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
> [<ffffffff812f8b42>] device_notifier+0x72/0x90
> [<ffffffff8157555c>] notifier_call_chain+0x8c/0xc0
> [<ffffffff81089768>] __blocking_notifier_call_chain+0x78/0xb0
> [<ffffffff810897b6>] blocking_notifier_call_chain+0x16/0x20
> [<ffffffff81373a5c>] __device_release_driver+0xbc/0xe0
> [<ffffffff81373ccf>] device_release_driver+0x2f/0x50
> [<ffffffff81372ee3>] driver_unbind+0xa3/0xc0
> [<ffffffff813724ac>] drv_attr_store+0x2c/0x30
> [<ffffffff811e4506>] sysfs_write_file+0xe6/0x170
> [<ffffffff8117569e>] vfs_write+0xce/0x190
> [<ffffffff811759e4>] sys_write+0x54/0xa0
> [<ffffffff81579a82>] system_call_fastpath+0x16/0x1b
>
> Signed-off-by: Roland Dreier <[email protected]>
> Signed-off-by: David Woodhouse <[email protected]>
> ---
> drivers/pci/intel-iommu.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index 8c2564d..bc05a51 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -3569,6 +3569,8 @@ static void domain_remove_one_dev_info(struct dmar_domain *domain,
> found = 1;
> }
>
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> +
> if (found == 0) {
> unsigned long tmp_flags;
> spin_lock_irqsave(&domain->iommu_lock, tmp_flags);
> @@ -3585,8 +3587,6 @@ static void domain_remove_one_dev_info(struct dmar_domain *domain,
> spin_unlock_irqrestore(&iommu->lock, tmp_flags);
> }
> }
> -
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> }
>
> static void vm_domain_remove_all_dev_info(struct dmar_domain *domain)


2012-11-14 03:04:59

by Steven Rostedt

Subject: Re: [PATCH 03/11] intel-iommu: Fix AB-BA lockdep report

On Tue, 2012-11-13 at 19:25 -0700, Shuah Khan wrote:
> On Sun, 2011-12-04 at 13:54 -0500, Steven Rostedt wrote:
> > From: Roland Dreier <[email protected]>
> >
> > When unbinding a device so that I could pass it through to a KVM VM, I
> > got the lockdep report below. It looks like a legitimate lock
> > ordering problem:
>
> Did this patch not make it into stable releases other than 3.1? I
> couldn't find it in any other stable trees prior to 3.1.

Ah, this was done in the early stable-rt releases, where we took
mainline fixes as long as they were in tglx's tree. Today the
stable-rt tree waits for mainline fixes to come in via the stable tree,
so that we don't do something like this, that is, miss getting a fix
into stable. Yeah, this should be added to 3.0 (it looks to be in 3.2
and 3.4 already).


Greg,

Can you add the following commit to the 3.0 stable tree? We've had this
in v3.0-rt for some time now. :-/

commit 3e7abe2556b583e87dabda3e0e6178a67b20d06f
Author: Roland Dreier <[email protected]>
Date: Wed Jul 20 06:22:21 2011 -0700

intel-iommu: Fix AB-BA lockdep report

Thanks,

-- Steve


2012-11-14 03:34:10

by Greg Kroah-Hartman

Subject: Re: [PATCH 03/11] intel-iommu: Fix AB-BA lockdep report

On Tue, Nov 13, 2012 at 10:04:53PM -0500, Steven Rostedt wrote:
> On Tue, 2012-11-13 at 19:25 -0700, Shuah Khan wrote:
> > On Sun, 2011-12-04 at 13:54 -0500, Steven Rostedt wrote:
> > > From: Roland Dreier <[email protected]>
> > >
> > > When unbinding a device so that I could pass it through to a KVM VM, I
> > > got the lockdep report below. It looks like a legitimate lock
> > > ordering problem:
> >
> > Did this patch not make it into stable releases other than 3.1? I
> > couldn't find it in any other stable trees prior to 3.1.
>
> Ah, this was done in the early stable-rt releases, where we took
> mainline fixes as long as they were in tglx's tree. Today the
> stable-rt tree waits for mainline fixes to come in via the stable tree,
> so that we don't do something like this, that is, miss getting a fix
> into stable. Yeah, this should be added to 3.0 (it looks to be in 3.2
> and 3.4 already).
>
>
> Greg,
>
> Can you add the following commit to the 3.0 stable tree? We've had this
> in v3.0-rt for some time now. :-/
>
> commit 3e7abe2556b583e87dabda3e0e6178a67b20d06f
> Author: Roland Dreier <[email protected]>
> Date: Wed Jul 20 06:22:21 2011 -0700
>
> intel-iommu: Fix AB-BA lockdep report

I had to edit the path to the file, but it looks like it applies
properly now, thanks.

greg k-h

2012-11-14 03:43:09

by Steven Rostedt

Subject: Re: [PATCH 03/11] intel-iommu: Fix AB-BA lockdep report

On Tue, 2012-11-13 at 19:34 -0800, Greg Kroah-Hartman wrote:

> I had to edit the path to the file, but it looks like it applies
> properly now, thanks.

Thanks,

Just in case, below is what I applied to v3.0-rt.

-- Steve

From fd92bf3e21c77fe06647b57feebb1b52da54969a Mon Sep 17 00:00:00 2001
From: Roland Dreier <[email protected]>
Date: Wed, 20 Jul 2011 06:22:21 -0700
Subject: [PATCH] intel-iommu: Fix AB-BA lockdep report

When unbinding a device so that I could pass it through to a KVM VM, I
got the lockdep report below. It looks like a legitimate lock
ordering problem:

- domain_context_mapping_one() takes iommu->lock and calls
iommu_support_dev_iotlb(), which takes device_domain_lock (inside
iommu->lock).

- domain_remove_one_dev_info() starts by taking device_domain_lock
then takes iommu->lock inside it (near the end of the function).

So this is the classic AB-BA deadlock. It looks like a safe fix is to
simply release device_domain_lock a bit earlier, since as far as I can
tell, it doesn't protect any of the stuff accessed at the end of
domain_remove_one_dev_info() anyway.
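
The lock-ordering idea behind the fix can be sketched in user space. This is
only an illustration, not the driver code: pthread mutexes stand in for the
kernel spinlocks, and the names map_one(), remove_one(), run_demo() and the
two counters are hypothetical. The mapping path nests device_domain_lock
inside iommu_lock; the removal path drops device_domain_lock before it takes
iommu_lock, so the reverse nesting never occurs.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space stand-ins for iommu->lock (A) and
 * device_domain_lock (B); the kernel uses spinlocks, not mutexes. */
static pthread_mutex_t iommu_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t device_domain_lock = PTHREAD_MUTEX_INITIALIZER;

static int mapped;	/* protected by iommu_lock */
static int dev_infos;	/* protected by device_domain_lock */

/* Mapping path: takes A, then B nested inside it. */
static void map_one(void)
{
	pthread_mutex_lock(&iommu_lock);
	pthread_mutex_lock(&device_domain_lock);
	dev_infos++;
	pthread_mutex_unlock(&device_domain_lock);
	mapped++;
	pthread_mutex_unlock(&iommu_lock);
}

/* Removal path, shaped like the fix: B is dropped before A is taken,
 * so this path never holds B while waiting for A. */
static void remove_one(void)
{
	int found = 0;

	pthread_mutex_lock(&device_domain_lock);
	if (dev_infos > 0) {
		dev_infos--;
		found = 1;
	}
	pthread_mutex_unlock(&device_domain_lock);	/* released early */

	if (found) {
		pthread_mutex_lock(&iommu_lock);
		mapped--;
		pthread_mutex_unlock(&iommu_lock);
	}
}

static void *mapper(void *arg)
{
	int i, n = *(int *)arg;

	for (i = 0; i < n; i++)
		map_one();
	return NULL;
}

static void *remover(void *arg)
{
	int i, n = *(int *)arg;

	for (i = 0; i < n; i++)
		remove_one();
	return NULL;
}

/* Runs both paths concurrently; returns 0 when the two counters agree,
 * or -1 if a thread could not be created. */
int run_demo(int iterations)
{
	pthread_t a, b;

	mapped = 0;
	dev_infos = 0;
	if (pthread_create(&a, NULL, mapper, &iterations) != 0)
		return -1;
	if (pthread_create(&b, NULL, remover, &iterations) != 0) {
		pthread_join(a, NULL);
		return -1;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return mapped - dev_infos;
}
```

If remove_one() instead kept device_domain_lock held while taking
iommu_lock, the two threads could each hold one lock and wait forever for
the other, which is the AB-BA pattern lockdep reported.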

BTW, the use of device_domain_lock looks a bit unsafe to me... it's
at least not obvious to me why we aren't vulnerable to the race below:

iommu_support_dev_iotlb()
                                      domain_remove_dev_info()

lock device_domain_lock
  find info
unlock device_domain_lock

                                      lock device_domain_lock
                                        find same info
                                      unlock device_domain_lock

                                      free_devinfo_mem(info)

do stuff with info after it's free

However I don't understand the locking here well enough to know if
this is a real problem, let alone what the best fix is.
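
For illustration only, here is one generic way to close a window of that
shape in user-space C: copy the fields that are needed while the lock is
still held, so that no pointer to the object outlives the unlock. This is
not what the driver does and is not proposed as the fix; struct dev_info,
add_info(), find_info_copy() and remove_info() are hypothetical names, and
a pthread mutex stands in for the spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the per-device info structure. */
struct dev_info {
	int bus;
	int devfn;
};

static pthread_mutex_t device_domain_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dev_info *the_info;	/* protected by device_domain_lock */

/* Publishes a new info object; returns 0 on success, -1 on allocation failure. */
int add_info(int bus, int devfn)
{
	struct dev_info *old, *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	p->bus = bus;
	p->devfn = devfn;

	pthread_mutex_lock(&device_domain_lock);
	old = the_info;
	the_info = p;
	pthread_mutex_unlock(&device_domain_lock);

	free(old);	/* free(NULL) is a no-op */
	return 0;
}

/* Copies the info out under the lock instead of returning a pointer that
 * could be freed as soon as the lock is dropped. Returns 1 if found. */
int find_info_copy(struct dev_info *out)
{
	int found = 0;

	pthread_mutex_lock(&device_domain_lock);
	if (the_info) {
		*out = *the_info;
		found = 1;
	}
	pthread_mutex_unlock(&device_domain_lock);
	return found;
}

/* Unpublishes the object under the lock, then frees it outside the lock.
 * No other thread can still reach it through the_info at that point. */
void remove_info(void)
{
	struct dev_info *p;

	pthread_mutex_lock(&device_domain_lock);
	p = the_info;
	the_info = NULL;
	pthread_mutex_unlock(&device_domain_lock);

	free(p);
}
```

Copying only works when a snapshot is enough for the caller; code that must
modify the live object would need a different scheme, such as reference
counting, which is outside what this message discusses.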

Anyway here's the full lockdep output that prompted all of this:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.39.1+ #1
-------------------------------------------------------
bash/13954 is trying to acquire lock:
(&(&iommu->lock)->rlock){......}, at: [<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230

but task is already holding lock:
(device_domain_lock){-.-...}, at: [<ffffffff812f6508>] domain_remove_one_dev_info+0x208/0x230

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (device_domain_lock){-.-...}:
[<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
[<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
[<ffffffff812f8350>] domain_context_mapping_one+0x600/0x750
[<ffffffff812f84df>] domain_context_mapping+0x3f/0x120
[<ffffffff812f9175>] iommu_prepare_identity_map+0x1c5/0x1e0
[<ffffffff81ccf1ca>] intel_iommu_init+0x88e/0xb5e
[<ffffffff81cab204>] pci_iommu_init+0x16/0x41
[<ffffffff81002165>] do_one_initcall+0x45/0x190
[<ffffffff81ca3d3f>] kernel_init+0xe3/0x168
[<ffffffff8157ac24>] kernel_thread_helper+0x4/0x10

-> #0 (&(&iommu->lock)->rlock){......}:
[<ffffffff8109bf3e>] __lock_acquire+0x195e/0x1e10
[<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
[<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
[<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
[<ffffffff812f8b42>] device_notifier+0x72/0x90
[<ffffffff8157555c>] notifier_call_chain+0x8c/0xc0
[<ffffffff81089768>] __blocking_notifier_call_chain+0x78/0xb0
[<ffffffff810897b6>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff81373a5c>] __device_release_driver+0xbc/0xe0
[<ffffffff81373ccf>] device_release_driver+0x2f/0x50
[<ffffffff81372ee3>] driver_unbind+0xa3/0xc0
[<ffffffff813724ac>] drv_attr_store+0x2c/0x30
[<ffffffff811e4506>] sysfs_write_file+0xe6/0x170
[<ffffffff8117569e>] vfs_write+0xce/0x190
[<ffffffff811759e4>] sys_write+0x54/0xa0
[<ffffffff81579a82>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

6 locks held by bash/13954:
#0: (&buffer->mutex){+.+.+.}, at: [<ffffffff811e4464>] sysfs_write_file+0x44/0x170
#1: (s_active#3){++++.+}, at: [<ffffffff811e44ed>] sysfs_write_file+0xcd/0x170
#2: (&__lockdep_no_validate__){+.+.+.}, at: [<ffffffff81372edb>] driver_unbind+0x9b/0xc0
#3: (&__lockdep_no_validate__){+.+.+.}, at: [<ffffffff81373cc7>] device_release_driver+0x27/0x50
#4: (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: [<ffffffff8108974f>] __blocking_notifier_call_chain+0x5f/0xb0
#5: (device_domain_lock){-.-...}, at: [<ffffffff812f6508>] domain_remove_one_dev_info+0x208/0x230

stack backtrace:
Pid: 13954, comm: bash Not tainted 2.6.39.1+ #1
Call Trace:
[<ffffffff810993a7>] print_circular_bug+0xf7/0x100
[<ffffffff8109bf3e>] __lock_acquire+0x195e/0x1e10
[<ffffffff810972bd>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff8109d57d>] ? trace_hardirqs_on_caller+0x13d/0x180
[<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
[<ffffffff812f6421>] ? domain_remove_one_dev_info+0x121/0x230
[<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
[<ffffffff812f6421>] ? domain_remove_one_dev_info+0x121/0x230
[<ffffffff810972bd>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
[<ffffffff812f8b42>] device_notifier+0x72/0x90
[<ffffffff8157555c>] notifier_call_chain+0x8c/0xc0
[<ffffffff81089768>] __blocking_notifier_call_chain+0x78/0xb0
[<ffffffff810897b6>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff81373a5c>] __device_release_driver+0xbc/0xe0
[<ffffffff81373ccf>] device_release_driver+0x2f/0x50
[<ffffffff81372ee3>] driver_unbind+0xa3/0xc0
[<ffffffff813724ac>] drv_attr_store+0x2c/0x30
[<ffffffff811e4506>] sysfs_write_file+0xe6/0x170
[<ffffffff8117569e>] vfs_write+0xce/0x190
[<ffffffff811759e4>] sys_write+0x54/0xa0
[<ffffffff81579a82>] system_call_fastpath+0x16/0x1b

Signed-off-by: Roland Dreier <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
drivers/pci/intel-iommu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 8c2564d..bc05a51 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -3569,6 +3569,8 @@ static void domain_remove_one_dev_info(struct dmar_domain *domain,
found = 1;
}

+ spin_unlock_irqrestore(&device_domain_lock, flags);
+
if (found == 0) {
unsigned long tmp_flags;
spin_lock_irqsave(&domain->iommu_lock, tmp_flags);
@@ -3585,8 +3587,6 @@ static void domain_remove_one_dev_info(struct dmar_domain *domain,
spin_unlock_irqrestore(&iommu->lock, tmp_flags);
}
}
-
- spin_unlock_irqrestore(&device_domain_lock, flags);
}

static void vm_domain_remove_all_dev_info(struct dmar_domain *domain)
--
1.7.10.4


2012-11-14 15:58:53

by Shuah Khan

Subject: Re: [PATCH 03/11] intel-iommu: Fix AB-BA lockdep report

On Tue, 2012-11-13 at 19:34 -0800, Greg Kroah-Hartman wrote:
> On Tue, Nov 13, 2012 at 10:04:53PM -0500, Steven Rostedt wrote:
> > On Tue, 2012-11-13 at 19:25 -0700, Shuah Khan wrote:
> > > On Sun, 2011-12-04 at 13:54 -0500, Steven Rostedt wrote:
> > > > From: Roland Dreier <[email protected]>
> > > >
> > > > When unbinding a device so that I could pass it through to a KVM VM, I
> > > > got the lockdep report below. It looks like a legitimate lock
> > > > ordering problem:
> > >
> > > Did this patch not make it into stable releases other than 3.1? I
> > > couldn't find it in any other stable trees prior to 3.1.
> >
> > Ah, this was done in the early stable-rt releases, where we took
> > mainline fixes as long as they were in tglx's tree. Today the
> > stable-rt tree waits for mainline fixes to come in via the stable tree,
> > so that we don't do something like this, that is, miss getting a fix
> > into stable. Yeah, this should be added to 3.0 (it looks to be in 3.2
> > and 3.4 already).
> >
> >
> > Greg,
> >
> > Can you add the following commit to the 3.0 stable tree? We've had this
> > in v3.0-rt for some time now. :-/
> >
> > commit 3e7abe2556b583e87dabda3e0e6178a67b20d06f
> > Author: Roland Dreier <[email protected]>
> > Date: Wed Jul 20 06:22:21 2011 -0700
> >
> > intel-iommu: Fix AB-BA lockdep report
>
> I had to edit the path to the file, but it looks like it applies
> properly now, thanks.


Thanks,
-- Shuah