2010-02-09 07:00:18

by Tao Ma

[permalink] [raw]
Subject: lockdep warning for iscsi in 2.6.33-rc6

Feb 9 11:39:29 ocfs2-test7 kernel: =======================================================
Feb 9 11:39:29 ocfs2-test7 kernel: [ INFO: possible circular locking dependency detected ]
Feb 9 11:39:29 ocfs2-test7 kernel: 2.6.33-rc6 #6
Feb 9 11:39:29 ocfs2-test7 kernel: -------------------------------------------------------
Feb 9 11:39:29 ocfs2-test7 kernel: iscsid/2960 is trying to acquire lock:
Feb 9 11:39:29 ocfs2-test7 kernel: (&shost->scan_mutex){+.+.+.}, at: [<ffffffffa006aec5>] scsi_scan_target+0x6d/0xad [scsi_mod]
Feb 9 11:39:29 ocfs2-test7 kernel:
Feb 9 11:39:29 ocfs2-test7 kernel: but task is already holding lock:
Feb 9 11:39:30 ocfs2-test7 kernel: (&ihost->mutex){+.+...}, at: [<ffffffffa02aeb34>] iscsi_user_scan_session+0x96/0x14b [scsi_transport_iscsi]
Feb 9 11:39:30 ocfs2-test7 kernel:
Feb 9 11:39:30 ocfs2-test7 kernel: which lock already depends on the new lock.
Feb 9 11:39:30 ocfs2-test7 kernel:
Feb 9 11:39:30 ocfs2-test7 kernel:
Feb 9 11:39:30 ocfs2-test7 kernel: the existing dependency chain (in reverse order) is:
Feb 9 11:39:30 ocfs2-test7 kernel:
Feb 9 11:39:30 ocfs2-test7 kernel: -> #2 (&ihost->mutex){+.+...}:
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff820641ea>] validate_chain+0xa31/0xedb
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff82064f9c>] __lock_acquire+0x908/0x95f
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff82065d62>] lock_acquire+0xbc/0x102
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8232d3c5>] mutex_lock_nested+0x48/0x29f
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffffa02aeb34>] iscsi_user_scan_session+0x96/0x14b [scsi_transport_iscsi]
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8221702f>] device_for_each_child+0x32/0x64
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffffa02aea83>] iscsi_user_scan+0x28/0x2a [scsi_transport_iscsi]
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffffa006b979>] store_scan+0x9d/0xc9 [scsi_mod]
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff82216c88>] dev_attr_store+0x1b/0x1d
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8212b78b>] sysfs_write_file+0xf7/0x133
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff820d881e>] vfs_write+0xae/0x14a
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff820d8d2a>] sys_write+0x47/0x6f
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b
Feb 9 11:39:30 ocfs2-test7 kernel:
Feb 9 11:39:30 ocfs2-test7 kernel: -> #1 (s_active){++++.+}:
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff820641ea>] validate_chain+0xa31/0xedb
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff82064f9c>] __lock_acquire+0x908/0x95f
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff82065d62>] lock_acquire+0xbc/0x102
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8212c885>] sysfs_addrm_finish+0xba/0x125
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8212adbe>] sysfs_hash_and_remove+0x4f/0x6b
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8212ce6f>] sysfs_remove_link+0x1c/0x1e
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8221a10e>] driver_sysfs_remove+0x29/0x3c
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8221a3e5>] driver_probe_device+0xdc/0x139
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8221a627>] __device_attach+0x33/0x3c
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff822194a9>] bus_for_each_drv+0x4d/0x80
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8221a541>] device_attach+0x60/0x77
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff82219445>] bus_probe_device+0x22/0x39
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff82217edb>] device_add+0x44c/0x5d3
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffffa006c026>] scsi_sysfs_add_sdev+0xb8/0x1e2 [scsi_mod]
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffffa006a0e4>] scsi_probe_and_add_lun+0xadd/0xbcf [scsi_mod]
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffffa006adf7>] __scsi_add_device+0xb6/0xe5 [scsi_mod]
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffffa00b16d6>] ata_scsi_scan_host+0x89/0x17a [libata]
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffffa00afb25>] async_port_probe+0xa9/0xb4 [libata]
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8205a5cc>] async_thread+0x115/0x206
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff82054a5d>] kthread+0x7d/0x85
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff82003794>] kernel_thread_helper+0x4/0x10
Feb 9 11:39:30 ocfs2-test7 kernel:
Feb 9 11:39:30 ocfs2-test7 kernel: -> #0 (&shost->scan_mutex){+.+.+.}:
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff82063ed6>] validate_chain+0x71d/0xedb
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff82064f9c>] __lock_acquire+0x908/0x95f
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff82065d62>] lock_acquire+0xbc/0x102
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8232d3c5>] mutex_lock_nested+0x48/0x29f
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffffa006aec5>] scsi_scan_target+0x6d/0xad [scsi_mod]
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffffa02aeb9f>] iscsi_user_scan_session+0x101/0x14b [scsi_transport_iscsi]
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8221702f>] device_for_each_child+0x32/0x64
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffffa02aea83>] iscsi_user_scan+0x28/0x2a [scsi_transport_iscsi]
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffffa006b979>] store_scan+0x9d/0xc9 [scsi_mod]
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff82216c88>] dev_attr_store+0x1b/0x1d
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8212b78b>] sysfs_write_file+0xf7/0x133
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff820d881e>] vfs_write+0xae/0x14a
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff820d8d2a>] sys_write+0x47/0x6f
Feb 9 11:39:30 ocfs2-test7 kernel: [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b
Feb 9 11:39:30 ocfs2-test7 kernel:
Feb 9 11:39:30 ocfs2-test7 kernel: other info that might help us debug this:
Feb 9 11:39:31 ocfs2-test7 kernel:
Feb 9 11:39:31 ocfs2-test7 kernel: 4 locks held by iscsid/2960:
Feb 9 11:39:31 ocfs2-test7 kernel: #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff8212b6cd>] sysfs_write_file+0x39/0x133
Feb 9 11:39:31 ocfs2-test7 kernel: #1: (s_active){++++.+}, at: [<ffffffff8212cb61>] sysfs_get_active_two+0x1f/0x44
Feb 9 11:39:31 ocfs2-test7 kernel: #2: (s_active){++++.+}, at: [<ffffffff8212cb6e>] sysfs_get_active_two+0x2c/0x44
Feb 9 11:39:31 ocfs2-test7 kernel: #3: (&ihost->mutex){+.+...}, at: [<ffffffffa02aeb34>] iscsi_user_scan_session+0x96/0x14b [scsi_transport_iscsi]
Feb 9 11:39:31 ocfs2-test7 kernel:
Feb 9 11:39:31 ocfs2-test7 kernel: stack backtrace:
Feb 9 11:39:31 ocfs2-test7 kernel: Pid: 2960, comm: iscsid Not tainted 2.6.33-rc6 #6
Feb 9 11:39:31 ocfs2-test7 kernel: Call Trace:
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffff820630b2>] print_circular_bug+0xb3/0xc1
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffff82063ed6>] validate_chain+0x71d/0xedb
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffff820613d0>] ? add_lock_to_list+0x82/0xc2
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffff82064f9c>] __lock_acquire+0x908/0x95f
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffff82065d62>] lock_acquire+0xbc/0x102
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffffa006aec5>] ? scsi_scan_target+0x6d/0xad [scsi_mod]
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffff8232d3c5>] mutex_lock_nested+0x48/0x29f
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffffa006aec5>] ? scsi_scan_target+0x6d/0xad [scsi_mod]
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffff820622ef>] ? mark_held_locks+0x49/0x69
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffff8232ef48>] ? _raw_spin_unlock_irqrestore+0x3f/0x47
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffffa006aec5>] scsi_scan_target+0x6d/0xad [scsi_mod]
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffffa02aeb9f>] iscsi_user_scan_session+0x101/0x14b [scsi_transport_iscsi]
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffffa02aea9e>] ? iscsi_user_scan_session+0x0/0x14b [scsi_transport_iscsi]
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffff8221702f>] device_for_each_child+0x32/0x64
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffffa02aea83>] iscsi_user_scan+0x28/0x2a [scsi_transport_iscsi]
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffffa006b979>] store_scan+0x9d/0xc9 [scsi_mod]
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffff8212cb6e>] ? sysfs_get_active_two+0x2c/0x44
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffff82216c88>] dev_attr_store+0x1b/0x1d
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffff8212b78b>] sysfs_write_file+0xf7/0x133
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffff820d881e>] vfs_write+0xae/0x14a
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffff820d8d2a>] sys_write+0x47/0x6f
Feb 9 11:39:31 ocfs2-test7 kernel: [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b


Attachments:
iscsi_lockdep (8.68 kB)

2010-02-09 19:30:53

by Mike Christie

[permalink] [raw]
Subject: Re: lockdep warning for iscsi in 2.6.33-rc6

On 02/09/2010 12:59 AM, Tao Ma wrote:
> Hi Mike,
> I meet with a lockdep warning for iscsi in 2.6.33-rc6.
> the lockdep is attached.
>

If the lockdep output is saying that there is a problem with the iscsi
host taking its mutex then the scsi host mutex, I think that might be a
mistake in the lockdep detection. We always take the iscsi host mutex
then take the scsi host mutex.

Could it get confused if we are scanning two hosts at the same time? If
it is just looking at if a lock is being taken then it would look like
host1 has its ihost->mutex and then took its shost->mutex, but then
host2 could start to get scanned at the same time, and it is going to
take its ihost->mutex. It would then look like we are trying to grab a
ihost->mutex while holding a shost->mutex. However, the mutexs are not
global and they are different instances of the mutex because each host
has its own.

If that is not the problem, maybe it has something to do with some sysfs
lock and ata doing scanning while iscsi is.

2010-02-10 00:42:52

by Tao Ma

[permalink] [raw]
Subject: Re: lockdep warning for iscsi in 2.6.33-rc6

Hi Mike,

Mike Christie wrote:
> On 02/09/2010 12:59 AM, Tao Ma wrote:
>> Hi Mike,
>> I meet with a lockdep warning for iscsi in 2.6.33-rc6.
>> the lockdep is attached.
>>
>
> If the lockdep output is saying that there is a problem with the iscsi
> host taking its mutex then the scsi host mutex, I think that might be a
> mistake in the lockdep detection. We always take the iscsi host mutex
> then take the scsi host mutex.
>
> Could it get confused if we are scanning two hosts at the same time? If
> it is just looking at if a lock is being taken then it would look like
> host1 has its ihost->mutex and then took its shost->mutex, but then
> host2 could start to get scanned at the same time, and it is going to
> take its ihost->mutex. It would then look like we are trying to grab a
> ihost->mutex while holding a shost->mutex. However, the mutexs are not
> global and they are different instances of the mutex because each host
> has its own.
If these 2 mutexes are grabed in the same order(in your description,
ihost->mutex first and then shost->mutex), there would be no problem.
>
> If that is not the problem, maybe it has something to do with some sysfs
> lock and ata doing scanning while iscsi is.
yeah it looks so from the lockdep output.

You know the real bad thing is that after lockdep find an error, it will
set debug_locks to 0, so any future check will be disabled. As iscsi
start up very early, no other check can be proceeded.

Regards,
Tao

2010-02-11 20:08:17

by Maciej Rutecki

[permalink] [raw]
Subject: Re: lockdep warning for iscsi in 2.6.33-rc6

Dnia wtorek, 9 lutego 2010 o 07:59:27 Tao Ma napisaƂ(a):
> Hi Mike,
> I meet with a lockdep warning for iscsi in 2.6.33-rc6.
> the lockdep is attached.
>
> Regards,
> Tao
>

I created a Bugzilla entry at http://bugzilla.kernel.org/show_bug.cgi?id=15278
for Your bug report, please add Your address to the CC list in there, thanks!
--
Maciej Rutecki
http://www.maciek.unixy.pl

2010-02-12 17:21:34

by Mike Christie

[permalink] [raw]
Subject: Re: lockdep warning for iscsi in 2.6.33-rc6

On 02/09/2010 06:42 PM, Tao Ma wrote:
> Hi Mike,
>
> Mike Christie wrote:
>> On 02/09/2010 12:59 AM, Tao Ma wrote:
>>> Hi Mike,
>>> I meet with a lockdep warning for iscsi in 2.6.33-rc6.
>>> the lockdep is attached.
>>>
>>
>> If the lockdep output is saying that there is a problem with the iscsi
>> host taking its mutex then the scsi host mutex, I think that might be
>> a mistake in the lockdep detection. We always take the iscsi host
>> mutex then take the scsi host mutex.
>>
>> Could it get confused if we are scanning two hosts at the same time?
>> If it is just looking at if a lock is being taken then it would look
>> like host1 has its ihost->mutex and then took its shost->mutex, but
>> then host2 could start to get scanned at the same time, and it is
>> going to take its ihost->mutex. It would then look like we are trying
>> to grab a ihost->mutex while holding a shost->mutex. However, the
>> mutexs are not global and they are different instances of the mutex
>> because each host has its own.
> If these 2 mutexes are grabed in the same order(in your description,
> ihost->mutex first and then shost->mutex), there would be no problem.

I checked the code and we are taking the locks in the proper order, but
I did some more testing and I do not think it was detecting what I
thought above.

I just did a login with one iscsi session and got the lockdep warning.
So I will do some more digging into what is going on.