2014-02-22 17:52:07

by Sasha Levin

[permalink] [raw]
Subject: kernfs/rtc: circular dependency between kernfs and ops_lock

Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest -next kernel I've stumbled on
the following (now with pretty line numbers!) spew:

[ 218.132898] ======================================================
[ 218.133902] [ INFO: possible circular locking dependency detected ]
[ 218.134820] 3.14.0-rc3-next-20140221-sasha-00008-g0e660cf-dirty #116 Tainted: G W
[ 218.135911] -------------------------------------------------------
[ 218.136842] trinity-c47/11528 is trying to acquire lock:
[ 218.137613] (&rtc->ops_lock){+.+.+.}, at: [<drivers/rtc/interface.c:42>] rtc_read_time+0x2b/0x60
[ 218.138909]
[ 218.138909] but task is already holding lock:
[ 218.139802] (s_active#20){++++.+}, at: [<fs/kernfs/file.c:99>] kernfs_seq_start+0x34/0xa0
[ 218.140087]
[ 218.140087] which lock already depends on the new lock.
[ 218.140087]
[ 218.140087]
[ 218.140087] the existing dependency chain (in reverse order) is:
[ 218.140087]
-> #1 (s_active#20){++++.+}:
[ 218.140087] [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>]
validate_chain+0x6ca/0x7c0
[ 218.140087] [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4ad/0x580
[ 218.140087] [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>]
lock_acquire+0x182/0x1d0
[ 218.140087] [<arch/x86/include/asm/atomic.h:26 fs/kernfs/dir.c:349>] kernfs_drain+0xb8/0x180
[ 218.140087] [<fs/kernfs/dir.c:1031>] __kernfs_remove+0x10b/0x1d0
[ 218.140087] [<fs/kernfs/dir.c:1226>] kernfs_remove_by_name_ns+0x79/0xb0
[ 218.140087] [<fs/sysfs/group.c:28>] remove_files+0x31/0x70
[ 218.140087] [<fs/sysfs/group.c:228>] sysfs_remove_group+0x7e/0xb0
[ 218.140087] [<fs/sysfs/group.c:250>] sysfs_remove_groups+0x32/0x50
[ 218.140087] [<drivers/base/core.c:524>] device_remove_attrs+0x6d/0x90
[ 218.140087] [<drivers/base/core.c:1206>] device_del+0x135/0x1c0
[ 218.140087] [<drivers/base/core.c:1237>] device_unregister+0x50/0x60
[ 218.140087] [<drivers/rtc/class.c:273>] rtc_device_unregister+0x4f/0x80
[ 218.140087] [<include/linux/ioport.h:166 drivers/rtc/rtc-cmos.c:805>]
cmos_do_probe+0x3b0/0x3f0
[ 218.157097] [<drivers/rtc/rtc-cmos.c:1153>] cmos_platform_probe+0x46/0x4e
[ 218.157097] [<drivers/base/platform.c:487>] platform_drv_probe+0x36/0x90
[ 218.157097] [<drivers/base/dd.c:292>] really_probe+0x11f/0x2f0
[ 218.157097] [<include/linux/pm_runtime.h:189 drivers/base/dd.c:383>]
driver_probe_device+0x83/0xb0
[ 218.157097] [<drivers/base/dd.c:460>] __driver_attach+0x69/0xa0
[ 218.157097] [<drivers/base/bus.c:311>] bus_for_each_dev+0x66/0xc0
[ 218.157097] [<drivers/base/dd.c:480>] driver_attach+0x19/0x20
[ 218.157097] [<drivers/base/bus.c:692>] bus_add_driver+0x116/0x240
[ 218.157097] [<drivers/base/driver.c:167>] driver_register+0xa8/0xf0
[ 218.157097] [<drivers/base/platform.c:542>] __platform_driver_register+0x45/0x50
[ 218.157097] [<drivers/base/platform.c:590>] platform_driver_probe+0x1f/0xc0
[ 218.157097] [<drivers/rtc/rtc-cmos.c:1200>] cmos_init+0x3f/0x77
[ 218.157097] [<init/main.c:696>] do_one_initcall+0xc2/0x1e0
[ 218.157097] [<init/main.c:761 init/main.c:770 init/main.c:789>] do_basic_setup+0x9c/0xbf
[ 218.157097] [<init/main.c:913>] kernel_init_freeable+0x2d1/0x354
[ 218.157097] [<init/main.c:842>] kernel_init+0x9/0xf0
[ 218.157097] [<arch/x86/kernel/entry_64.S:555>] ret_from_fork+0x7c/0xb0
[ 218.157097]
-> #0 (&rtc->ops_lock){+.+.+.}:
[ 218.157097] [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560
[ 218.157097] [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>]
validate_chain+0x6ca/0x7c0
[ 218.157097] [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4ad/0x580
[ 218.157097] [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>]
lock_acquire+0x182/0x1d0
[ 218.157097] [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:600>]
mutex_lock_interruptible_nested+0x6a/0x600
[ 218.157097] [<drivers/rtc/interface.c:42>] rtc_read_time+0x2b/0x60
[ 218.157097] [<drivers/rtc/rtc-sysfs.c:56>] time_show+0x15/0x40
[ 218.157097] [<drivers/base/core.c:122>] dev_attr_show+0x22/0x50
[ 218.157097] [<fs/sysfs/file.c:62>] sysfs_kf_seq_show+0xa8/0x130
[ 218.157097] [<fs/kernfs/file.c:155>] kernfs_seq_show+0x21/0x30
[ 218.157097] [<fs/seq_file.c:109>] traverse+0xd8/0x1f0
[ 218.157097] [<fs/seq_file.c:179>] seq_read+0x5c/0x430
[ 218.157097] [<fs/kernfs/file.c:230>] kernfs_fop_read+0x24/0x40
[ 218.157097] [<fs/read_write.c:396>] vfs_read+0xb4/0x180
[ 218.157097] [<include/linux/file.h:36 fs/read_write.c:545 fs/read_write.c:532>]
SyS_pread64+0x67/0x90
[ 218.157097] [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2
[ 218.157097]
[ 218.157097] other info that might help us debug this:
[ 218.157097]
[ 218.157097] Possible unsafe locking scenario:
[ 218.157097]
[ 218.157097] CPU0 CPU1
[ 218.157097] ---- ----
[ 218.157097] lock(s_active#20);
[ 218.157097] lock(&rtc->ops_lock);
[ 218.157097] lock(s_active#20);
[ 218.157097] lock(&rtc->ops_lock);
[ 218.157097]
[ 218.157097] *** DEADLOCK ***
[ 218.157097]
[ 218.157097] 3 locks held by trinity-c47/11528:
[ 218.157097] #0: (&p->lock){+.+.+.}, at: [<fs/seq_file.c:175>] seq_read+0x3a/0x430
[ 218.157097] #1: (&of->mutex){+.+.+.}, at: [<fs/kernfs/file.c:99>] kernfs_seq_start+0x2c/0xa0
[ 218.157097] #2: (s_active#20){++++.+}, at: [<fs/kernfs/file.c:99>] kernfs_seq_start+0x34/0xa0
[ 218.157097]
[ 218.157097] stack backtrace:
[ 218.157097] CPU: 8 PID: 11528 Comm: trinity-c47 Tainted: G W
3.14.0-rc3-next-20140221-sasha-00008-g0e660cf-dirty #116
[ 218.157097] 0000000000000000 ffff8806a74dba18 ffffffff843894c8 0000000000000000
[ 218.157097] 0000000000000000 ffff8806a74dba68 ffffffff811a0379 ffff8806a74dba88
[ 218.157097] ffff8806a74dba88 ffff8806a74dba68 ffff8806a96d0d38 ffff8806a96d0d70
[ 218.157097] Call Trace:
[ 218.157097] [<lib/dump_stack.c:52>] dump_stack+0x52/0x7f
[ 218.157097] [<kernel/locking/lockdep.c:1213>] print_circular_bug+0x129/0x160
[ 218.157097] [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560
[ 218.157097] [<arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:305>] ?
sched_clock+0x1d/0x30
[ 218.157097] [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>]
validate_chain+0x6ca/0x7c0
[ 218.157097] [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4ad/0x580
[ 218.157097] [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>]
lock_acquire+0x182/0x1d0
[ 218.157097] [<drivers/rtc/interface.c:42>] ? rtc_read_time+0x2b/0x60
[ 218.157097] [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:600>]
mutex_lock_interruptible_nested+0x6a/0x600
[ 218.157097] [<drivers/rtc/interface.c:42>] ? rtc_read_time+0x2b/0x60
[ 218.157097] [<kernel/locking/lockdep.c:3189>] ? __lock_acquire+0x4cd/0x580
[ 218.157097] [<drivers/rtc/interface.c:42>] ? rtc_read_time+0x2b/0x60
[ 218.157097] [<drivers/rtc/interface.c:42>] rtc_read_time+0x2b/0x60
[ 218.157097] [<drivers/rtc/rtc-sysfs.c:56>] time_show+0x15/0x40
[ 218.157097] [<drivers/base/core.c:122>] dev_attr_show+0x22/0x50
[ 218.157097] [<fs/sysfs/file.c:33>] ? sysfs_file_ops+0x30/0x60
[ 218.157097] [<fs/sysfs/file.c:62>] sysfs_kf_seq_show+0xa8/0x130
[ 218.157097] [<fs/kernfs/file.c:155>] kernfs_seq_show+0x21/0x30
[ 218.157097] [<fs/seq_file.c:109>] traverse+0xd8/0x1f0
[ 218.157097] [<fs/seq_file.c:179>] seq_read+0x5c/0x430
[ 218.157097] [<security/security.c:677>] ? security_file_permission+0x85/0x90
[ 218.157097] [<fs/kernfs/file.c:230>] kernfs_fop_read+0x24/0x40
[ 218.157097] [<fs/read_write.c:396>] vfs_read+0xb4/0x180
[ 218.157097] [<include/linux/file.h:36 fs/read_write.c:545 fs/read_write.c:532>]
SyS_pread64+0x67/0x90
[ 218.157097] [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2


Thanks,
Sasha


2014-02-22 20:52:22

by Tejun Heo

[permalink] [raw]
Subject: Re: kernfs/rtc: circular dependency between kernfs and ops_lock

Hello,

On Sat, Feb 22, 2014 at 12:51:42PM -0500, Sasha Levin wrote:
> While fuzzing with trinity inside a KVM tools guest running the
> latest -next kernel I've stumbled on the following (now with pretty
> line numbers!) spew:

Heh, that's pretty awesome.

> [ 218.132898] ======================================================
> [ 218.133902] [ INFO: possible circular locking dependency detected ]
> [ 218.134820] 3.14.0-rc3-next-20140221-sasha-00008-g0e660cf-dirty #116 Tainted: G W
> [ 218.135911] -------------------------------------------------------
> [ 218.136842] trinity-c47/11528 is trying to acquire lock:
> [ 218.137613] (&rtc->ops_lock){+.+.+.}, at: [<drivers/rtc/interface.c:42>] rtc_read_time+0x2b/0x60
> [ 218.138909]
> [ 218.138909] but task is already holding lock:
> [ 218.139802] (s_active#20){++++.+}, at: [<fs/kernfs/file.c:99>] kernfs_seq_start+0x34/0xa0
> [ 218.140087]
> [ 218.140087] which lock already depends on the new lock.

The problem is rtc_device_unregister(). It grabs rtc->ops_lock and
then tries to remove the sysfs files. Some of those sysfs file
implementations acquire rtc->ops_lock, so it of course can lead to
deadlocks. One can't try to delete a sysfs file which acquires a lock
while holding the same lock.

Thanks.

--
tejun

2014-02-24 11:10:07

by Alessandro Zummo

[permalink] [raw]
Subject: Re: kernfs/rtc: circular dependency between kernfs and ops_lock

On Sat, 22 Feb 2014 15:52:17 -0500
Tejun Heo <[email protected]> wrote:

> The problem is rtc_device_unregister(). It grabs rtc->ops_lock and
> then tries to remove the sysfs files. Some of those sysfs file
> implementations acquire rtc->ops_lock, so it of course can lead to
> deadlocks. One can't try to delete a sysfs file which acquires a lock
> while holding the same lock.

Pretty interesting indeed. One option would be to remove
the sys files before acquiring the lock. But I wonder
if this could lead to other issues.

--

Best regards,

Alessandro Zummo,
Tower Technologies - Torino, Italy

http://www.towertech.it

2014-04-02 22:52:29

by Sasha Levin

[permalink] [raw]
Subject: Re: kernfs/rtc: circular dependency between kernfs and ops_lock

On 03/31/2014 08:19 AM, Lars-Peter Clausen wrote:
> On 03/31/2014 02:03 PM, Alessandro Zummo wrote:
>> On Mon, 31 Mar 2014 13:07:10 +0200
>> Lars-Peter Clausen <[email protected]> wrote:
>>
>>> It doesn't really matter where it is unregistered. device_unregister() will
>>> (somewhere down it's callchain) take the kernfs lock, hence it must be
>>> callled with the rtc mutex being held.
>>
>> Maybe device_remove_attrs could be called in the rtc base class,
>> before the device removal?
>>
>
> Just move the device_unregister() call outside the lock. I think the only thing that needs to be protected is the ops = NULL assignment. Moving the unregister after the unlock also means that the extra get_device()/put_device() pair can be removed.

That seems to cause errors on boot:

[ 23.714976] BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
[ 23.716017] IP: [<ffffffffb438ee5f>] kernfs_find_ns+0x1f/0x160
[ 23.716620] PGD 0
[ 23.716843] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 23.717448] Dumping ftrace buffer:
[ 23.717861] (ftrace buffer empty)
[ 23.718233] Modules linked in:
[ 23.718561] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.14.0-next-20140402-sasha-00013-g0cfaf7e-dirty #364
[ 23.719523] task: ffff8806ec5b0000 ti: ffff88003e284000 task.ti: ffff88003e284000
[ 23.720134] RIP: 0010:[<ffffffffb438ee5f>] [<ffffffffb438ee5f>] kernfs_find_ns+0x1f/0x160
[ 23.720134] RSP: 0000:ffff88003e285af8 EFLAGS: 00010292
[ 23.720134] RAX: 0000000080000000 RBX: 0000000000000000 RCX: 0000000000000006
[ 23.720134] RDX: 0000000000000000 RSI: ffffffffb78bd6c0 RDI: 0000000000000000
[ 23.720134] RBP: ffff88003e285b28 R08: 0000000000000000 R09: 0000000000000000
[ 23.720134] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 23.720134] R13: ffffffffb78bd6c0 R14: 0000000000000000 R15: 0000000000000008
[ 23.720134] FS: 0000000000000000(0000) GS:ffff8800bec00000(0000) knlGS:0000000000000000
[ 23.720134] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 23.720134] CR2: 0000000000000090 CR3: 0000000038e2c000 CR4: 00000000000006a0
[ 23.720134] Stack:
[ 23.720134] ffff88003e285b28 0000000000000000 ffffffffb78bd6c0 0000000000000000
[ 23.720134] 0000000000000080 0000000000000008 ffff88003e285b58 ffffffffb438f0be
[ 23.720134] ffff880000000010 ffffffffb9c0f6c0 ffff8800bdd89148 ffffffffb8e430b0
[ 23.720134] Call Trace:
[ 23.720134] [<ffffffffb438f0be>] kernfs_find_and_get_ns+0x3e/0x70
[ 23.730419] kobject: 'tlan' (ffff88026bf2a760): kobject_cleanup, parent ffff8803ec71efe8
[ 23.730422] kobject: 'tlan' (ffff88026bf2a760): auto cleanup 'remove' event
[ 23.730424] kobject: 'tlan' (ffff88026bf2a760): kobject_uevent_env
[ 23.730460] kobject: 'tlan' (ffff88026bf2a760): fill_kobj_path: path = '/bus/pci/drivers/tlan'
[ 23.730559] kobject: 'tlan' (ffff88026bf2a760): auto cleanup kobject_del
[ 23.720134] [<ffffffffb439392d>] sysfs_unmerge_group+0x1d/0x70
[ 23.720134] [<ffffffffb50d858b>] dpm_sysfs_remove+0x2b/0x70
[ 23.720134] [<ffffffffb50cdd97>] device_del+0x47/0x1c0
[ 23.720134] [<ffffffffb50cdf68>] device_unregister+0x58/0x70
[ 23.720134] [<ffffffffb636bb95>] rtc_device_unregister+0x65/0x80
[ 23.720134] [<ffffffffb63720a3>] cmos_do_probe+0x3b3/0x400
[ 23.720134] [<ffffffffba35ddd3>] cmos_platform_probe+0x44/0x4d
[ 23.720134] [<ffffffffb50d3862>] platform_drv_probe+0x32/0x90
[ 23.720134] [<ffffffffb50d1ba5>] driver_probe_device+0x175/0x370
[ 23.720134] [<ffffffffb50d1e24>] __driver_attach+0x84/0xc0
[ 23.720134] [<ffffffffb50d1da0>] ? driver_probe_device+0x370/0x370
[ 23.720134] [<ffffffffb50cfa39>] bus_for_each_dev+0x69/0xb0
[ 23.720134] [<ffffffffb50d13de>] driver_attach+0x1e/0x20
[ 23.720134] [<ffffffffb50d1018>] bus_add_driver+0x138/0x260
[ 23.720134] [<ffffffffba35dcf0>] ? bq4802_driver_init+0x14/0x14
[ 23.720134] [<ffffffffb50d2608>] driver_register+0x98/0xe0
[ 23.720134] [<ffffffffba35dcf0>] ? bq4802_driver_init+0x14/0x14
[ 23.720134] [<ffffffffb50d36da>] __platform_driver_register+0x4a/0x50
[ 23.720134] [<ffffffffb50d3704>] platform_driver_probe+0x24/0xc0
[ 23.720134] [<ffffffffba35dcf0>] ? bq4802_driver_init+0x14/0x14
[ 23.720134] [<ffffffffba35dd2c>] cmos_init+0x3c/0x74
[ 23.720134] [<ffffffffb400216a>] do_one_initcall+0xca/0x1d0
[ 23.720134] [<ffffffffb4185100>] ? parse_args+0x280/0x460
[ 23.720134] [<ffffffffba2b8128>] kernel_init_freeable+0x1d5/0x26c
[ 23.720134] [<ffffffffba2b7848>] ? loglevel+0x31/0x31
[ 23.720134] [<ffffffffb74a1fe0>] ? rest_init+0x140/0x140
[ 23.720134] [<ffffffffb74a1fee>] kernel_init+0xe/0x100
[ 23.720134] [<ffffffffb7514b7c>] ret_from_fork+0x7c/0xb0
[ 23.720134] [<ffffffffb74a1fe0>] ? rest_init+0x140/0x140
[ 23.720134] Code: 8b 75 f8 c9 c3 0f 1f 80 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 56 49 89 d6 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 08 <0f> b7 87 90 00 00 00 8b 15 d4 00 7d 05 48 8b 5f 78 66 c1 e8 05
[ 23.720134] RIP [<ffffffffb438ee5f>] kernfs_find_ns+0x1f/0x160
[ 23.720134] RSP <ffff88003e285af8>
[ 23.720134] CR2: 0000000000000090


Thanks,
Sasha