2013-03-02 11:00:31

by Russ Dill

Subject: fasync race in fs/fcntl.c

I'm seeing a race in fs/fcntl.c. I'm not sure exactly how the race is
occurring, but the following is my best guess. A kernel log is
attached.

The comment for fasync_insert_entry:

* NOTE! It is very important that the FASYNC flag always
* match the state "is the filp on a fasync list".

describes an invariant that is not always upheld, leading to deadlock.

CPU0 calls the syscall fcntl(fd, F_SETFL, FASYNC).
fcntl calls fdget_raw; the count on the filp is 1, so it is not
incremented (no reference taken).
fcntl calls do_fcntl, which calls setfl, which calls filp->f_op->fasync,
which calls fasync_helper.
fasync_helper calls fasync_add_entry, which calls fasync_insert_entry.
fasync_insert_entry adds a fasync_struct to the list for the current
filp and assigns the pointer; before it gets to filp->f_flags |= FASYNC,
we switch to CPU1.

CPU1 calls fput on the same filp; the counter is decremented to 0, and
the filp is either added to delayed_fput_list or scheduled for
____fput task_work.
CPU1 calls __fput.
__fput checks 'if (unlikely(file->f_flags & FASYNC)) {'; the flag is
not set, so it continues on.
CPU1 calls file_free, scheduling the file to be freed at the end of
the RCU grace period.

CPU0 continues, setting the flag

CPU0 and CPU1 run their RCU callbacks; the file struct gets freed by file_free_rcu.

Someone then calls kill_fasync, which calls kill_fasync_rcu. As it
walks the list, it eventually reaches the fasync_struct whose fa_file
pointer points to freed memory. send_sigio is called with this
pointer and calls read_lock(&fown->lock); however, the memory used
by that lock has been reused, and the system hardlocks.

I exercise this by running a UML instance which uses /dev/random.
After a day or so, I'll eventually get a crash or a hang. I captured
this with netconsole.


[172635.399438] ------------[ cut here ]------------
[172635.399449] WARNING: at
/build/buildd/linux-3.8.0/kernel/watchdog.c:246
watchdog_overflow_callback+0x9c/0xd0()
[172635.399451] Hardware name: VPCSE190X
[172635.399454] Watchdog detected hard LOCKUP on cpu 0
[172635.399456] Modules linked in: ftdi_sio netconsole(F) configfs(F)
ext2(F) nls_iso8859_1(F) usb_storage(F) pl2303 usbserial snd
_usb_audio snd_usbmidi_lib parport_pc(F) ppdev(F) lp(F) parport(F)
bnep rfcomm bluetooth binfmt_misc(F) dm_crypt(F) uvcvideo arc4(
F) snd_hda_codec_hdmi iwldvm snd_hda_codec_realtek snd_hda_intel
videobuf2_vmalloc videobuf2_memops mac80211 videobuf2_core snd_hd
a_codec videodev snd_hwdep(F) coretemp snd_pcm(F) snd_page_alloc(F)
joydev(F) iwlwifi[172635.399521] ------------[ cut here ]------------
[172635.399525] WARNING: at
/build/buildd/linux-3.8.0/net/core/skbuff.c:573
skb_release_head_state+0xed/0x100()
[172635.399525] Hardware name: VPCSE190X
[172635.399526] Modules linked in: ftdi_sio netconsole(F) configfs(F)
ext2(F) nls_iso8859_1(F) usb_storage(F) pl2303 usbserial snd_usb_audio
snd_usbmidi_lib parport_pc(F) ppdev(F) lp(F) parport(F) bnep rfcomm
bluetooth binfmt_misc(F) dm_crypt(F) uvcvideo arc4(F)
snd_hda_codec_hdmi iwldvm snd_hda_codec_realtek snd_hda_intel
videobuf2_vmalloc videobuf2_memops mac80211 videobuf2_core
snd_hda_codec videodev snd_hwdep(F) coretemp snd_pcm(F)
snd_page_alloc(F) joydev(F) iwlwifi kvm_intel snd_seq_midi(F)
snd_seq_midi_event(F) snd_rawmidi(F) kvm cfg80211 dm_multipath(F)
snd_seq(F) snd_seq_device(F) tpm_infineon snd_timer(F) psmouse(F)
snd(F) soundcore(F) scsi_dh serio_raw(F) sony_laptop rtsx_pci_ms mei
microcode(F) memstick mac_hid pcspkr lpc_ich tpm_tis firewire_sbp2
firewire_core crc_itu_t(F) btrfs(F) zlib_deflate(F) libcrc32c(F)
hid_generic usbhid hid rtsx_pci_sdmmc ghash_clmulni_intel(F)
aesni_intel(F) aes_x86_64(F) xts(F) lrw(F) gf128mul(F) ablk_helper(F)
cryptd(F) radeon ahci(F) libahci(F) r8169 i915 ttm i2c_algo_bit
drm_kms_helper video(F) drm rtsx_pci
[172635.399572] Pid: 12575, comm: flock Tainted: GF
3.8.0-7-generic #15-Ubuntu
[172635.399572] Call Trace:
[172635.399573] <NMI> [<ffffffff810587cf>] warn_slowpath_common+0x7f/0xc0
[172635.399577] [<ffffffff8105882a>] warn_slowpath_null+0x1a/0x20
[172635.399579] [<ffffffff815b51bd>] skb_release_head_state+0xed/0x100
[172635.399580] [<ffffffff815b4fb2>] __kfree_skb+0x12/0xa0
[172635.399582] [<ffffffff815b551c>] consume_skb+0x2c/0x80
[172635.399586] [<ffffffffa014cf47>] rtl8169_poll+0x4b7/0x6d0 [r8169]
[172635.399590] [<ffffffff815d96c2>] netpoll_poll_dev+0x162/0x580
[172635.399593] [<ffffffff815b666b>] ? __alloc_skb+0x8b/0x2a0
[172635.399595] [<ffffffff815d9c6c>] netpoll_send_skb_on_dev+0x18c/0x3a0
[172635.399598] [<ffffffff815da198>] netpoll_send_udp+0x278/0x2a0
[172635.399600] [<ffffffffa07c4967>] write_msg+0xc7/0x110 [netconsole]
[172635.399603] [<ffffffff810594f1>]
call_console_drivers.constprop.13+0x91/0x100
[172635.399605] [<ffffffff81059e4b>] console_unlock+0x2db/0x420
[172635.399606] [<ffffffff8105a7dd>] vprintk_emit+0x1fd/0x4e0
[172635.399608] [<ffffffff810e8ffc>] ? watchdog_overflow_callback+0x9c/0xd0
[172635.399610] [<ffffffff816b78c1>] printk+0x67/0x69
[172635.399613] [<ffffffff810c0423>] print_modules+0xa3/0xd0
[172635.399617] [<ffffffff810587ca>] warn_slowpath_common+0x7a/0xc0
[172635.399618] [<ffffffff810588cc>] warn_slowpath_fmt+0x4c/0x50
[172635.399620] [<ffffffff8109170d>] ? sched_clock_cpu+0xbd/0x110
[172635.399623] [<ffffffff810e8f60>] ? touch_nmi_watchdog+0x80/0x80
[172635.399625] [<ffffffff810e8ffc>] watchdog_overflow_callback+0x9c/0xd0
[172635.399627] [<ffffffff8112609d>] __perf_event_overflow+0x9d/0x230
[172635.399630] [<ffffffff81025277>] ? x86_perf_event_set_period+0xd7/0x160
[172635.399632] [<ffffffff81126cb4>] perf_event_overflow+0x14/0x20
[172635.399634] [<ffffffff8102ad8b>] intel_pmu_handle_irq+0x1ab/0x330
[172635.399637] [<ffffffff816c5b9d>] perf_event_nmi_handler+0x1d/0x20
[172635.399640] [<ffffffff816c5351>] nmi_handle.isra.0+0x51/0x80
[172635.399641] [<ffffffff816c5460>] do_nmi+0xe0/0x360
[172635.399643] [<ffffffff816c4981>] end_repeat_nmi+0x1e/0x2e
[172635.399645] [<ffffffff813536d9>] ? __read_lock_failed+0x9/0x20
[172635.399648] [<ffffffff813536d9>] ? __read_lock_failed+0x9/0x20
[172635.399649] [<ffffffff813536d9>] ? __read_lock_failed+0x9/0x20
[172635.399651] <<EOE>> [<ffffffff816c3e13>] _raw_read_lock+0x13/0x20
[172635.399654] [<ffffffff811a4532>] send_sigio+0x52/0xf0
[172635.399656] [<ffffffff811a4631>] kill_fasync+0x61/0x90
[172635.399658] [<ffffffff8143ad83>] account+0x113/0x1d0
[172635.399661] [<ffffffff816c4618>] ? page_fault+0x28/0x30
[172635.399663] [<ffffffff8143b335>] extract_entropy+0x65/0x140
[172635.399664] [<ffffffff8143b650>] get_random_bytes+0x20/0x30
[172635.399666] [<ffffffff816bb2a9>] create_elf_tables+0xaa/0x614
[172635.399669] [<ffffffff811e8a84>] load_elf_binary+0xae4/0xe00
[172635.399672] [<ffffffff811e7fa0>] ? load_elf_library+0x240/0x240
[172635.399674] [<ffffffff8119999e>] search_binary_handler+0x19e/0x340
[172635.399677] [<ffffffff8119ad15>] do_execve_common.isra.22+0x3c5/0x470
[172635.399679] [<ffffffff8119add8>] do_execve+0x18/0x20
[172635.399680] [<ffffffff8119b07d>] sys_execve+0x3d/0x60
[172635.399682] [<ffffffff816ccb49>] stub_execve+0x69/0xc0
[172635.399685] ---[ end trace 5ccb38c703860d58 ]---
[172635.399687] ------------[ cut here ]------------
[172635.399688] WARNING: at
/build/buildd/linux-3.8.0/net/core/skbuff.c:573
skb_release_head_state+0xed/0x100()
[172635.399689] Hardware name: VPCSE190X
[172635.399689] Modules linked in: ftdi_sio netconsole(F) configfs(F)
ext2(F) nls_iso8859_1(F) usb_storage(F) pl2303 usbserial snd_usb_audio
snd_usbmidi_lib parport_pc(F) ppdev(F) lp(F) parport(F) bnep rfcomm
bluetooth binfmt_misc(F) dm_crypt(F) uvcvideo arc4(F)
snd_hda_codec_hdmi iwldvm snd_hda_codec_realtek snd_hda_intel
videobuf2_vmalloc videobuf2_memops mac80211 videobuf2_core
snd_hda_codec videodev snd_hwdep(F) coretemp snd_pcm(F)
snd_page_alloc(F) joydev(F) iwlwifi kvm_intel snd_seq_midi(F)
snd_seq_midi_event(F) snd_rawmidi(F) kvm cfg80211 dm_multipath(F)
snd_seq(F) snd_seq_device(F) tpm_infineon snd_timer(F) psmouse(F)
snd(F) soundcore(F) scsi_dh serio_raw(F) sony_laptop rtsx_pci_ms mei
microcode(F) memstick mac_hid pcspkr lpc_ich tpm_tis firewire_sbp2
firewire_core crc_itu_t(F) btrfs(F) zlib_deflate(F) libcrc32c(F)
hid_generic usbhid hid rtsx_pci_sdmmc ghash_clmulni_intel(F)
aesni_intel(F) aes_x86_64(F) xts(F) lrw(F) gf128mul(F) ablk_helper(F)
cryptd(F) radeon ahci(F) libahci(F) r8169 i915 ttm i2c_algo_bit
drm_kms_helper video(F) drm rtsx_pci
[172635.399725] Pid: 12575, comm: flock Tainted: GF W
3.8.0-7-generic #15-Ubuntu
[172635.399726] Call Trace:
[172635.399726] <NMI> [<ffffffff810587cf>] warn_slowpath_common+0x7f/0xc0
[172635.399728] [<ffffffff8105882a>] warn_slowpath_null+0x1a/0x20
[172635.399729] [<ffffffff815b51bd>] skb_release_head_state+0xed/0x100
[172635.399731] [<ffffffff815b4fb2>] __kfree_skb+0x12/0xa0
[172635.399732] [<ffffffff815b551c>] consume_skb+0x2c/0x80
[172635.399735] [<ffffffffa014cf47>] rtl8169_poll+0x4b7/0x6d0 [r8169]
[172635.399738] [<ffffffff815d96c2>] netpoll_poll_dev+0x162/0x580
[172635.399740] [<ffffffff815b666b>] ? __alloc_skb+0x8b/0x2a0
[172635.399742] [<ffffffff815d9c6c>] netpoll_send_skb_on_dev+0x18c/0x3a0
[172635.399744] [<ffffffff815da198>] netpoll_send_udp+0x278/0x2a0
[172635.399746] [<ffffffffa07c4967>] write_msg+0xc7/0x110 [netconsole]
[172635.399749] [<ffffffff810594f1>]
call_console_drivers.constprop.13+0x91/0x100
[172635.399750] [<ffffffff81059e4b>] console_unlock+0x2db/0x420
[172635.399752] [<ffffffff8105a7dd>] vprintk_emit+0x1fd/0x4e0
[172635.399754] [<ffffffff810e8ffc>] ? watchdog_overflow_callback+0x9c/0xd0
[172635.399756] [<ffffffff816b78c1>] printk+0x67/0x69
[172635.399757] [<ffffffff810c0423>] print_modules+0xa3/0xd0
[172635.399760] [<ffffffff810587ca>] warn_slowpath_common+0x7a/0xc0
[172635.399761] [<ffffffff810588cc>] warn_slowpath_fmt+0x4c/0x50
[172635.399763] [<ffffffff8109170d>] ? sched_clock_cpu+0xbd/0x110
[172635.399765] [<ffffffff810e8f60>] ? touch_nmi_watchdog+0x80/0x80
[172635.399766] [<ffffffff816c4981>] end_repeat_nmi+0x1e/0x2e
[<ffffffff811a4631>] kill_fasync+0x61/0x90
[172635.399793] [<ffffffff8143ad83>] account+0x113/0x1d0
[172635.399794] [<ffffffff816c4618>] ? page_fault+0x28/0x30
[172635.399796] [172635.399800] [<ffffffff816bb2a9>]
create_elf_tables+0xaa/0x614
[172635.399802] [<ffffffff811e8a84>] load_elf_binary+0xae4/0xe00
[172635.399804] [<ffffffff811e7fa0>] ? load_elf_library+0x240/0x240
[<ffffffff8119ad15>] do_execve_common.isra.22+0x3c5/0x470
[172635.399810] [<ffffffff8119add8>] do_execve+0x18/0x20
[172635.399812] [<ffffffff8119b07d>] sys_execve+0x3d/0x60
[172635.399813] netconsole(F) configfs(F) ext2(F) bnep rfcomm
bluetooth binfmt_misc(F) dm_crypt(F) uvcvideo arc4(F)
snd_hda_codec_hdmi iwldvm snd_hwdep(F) coretemp snd_pcm(F)
snd_page_alloc(F) joydev(F) iwlwifi kvm_intel snd_seq_midi(F) scsi_dh
serio_raw(F) sony_laptop rtsx_pci_ms mei aes_x86_64(F) xts(F) lrw(F)
gf128mul(F) ablk_helper(F) i2c_algo_bit drm_kms_helper[172635.399858]
[<ffffffff815b51bd>] skb_release_head_state+0xed/0x100
[172635.399860] [<ffffffff815b4fb2>] __kfree_skb+0x12/0xa0
[172635.399861] [<ffffffff815b551c>] consume_skb+0x2c/0x80
[<ffffffff815b666b>] ? __alloc_skb+0x8b/0x2a0
[172635.399870] [<ffffffff815d9c6c>] netpoll_send_skb_on_dev+0x18c/0x3a0
[<ffffffff810e8ffc>] ? watchdog_overflow_callback+0x9c/0xd0
[172635.399884] [<ffffffff816b78c1>] printk+0x67/0x69
[<ffffffff810588cc>] warn_slowpath_fmt+0x4c/0x50
[172635.399891] [<ffffffff8109170d>] ? sched_clock_cpu+0xbd/0x110
[172635.399893] [<ffffffff810e8f60>] ? touch_nmi_watchdog+0x80/0x80
[172635.399894] [<ffffffff810e8ffc>] watchdog_overflow_callback+0x9c/0xd0
[172635.399896] [172635.399903] [172635.399907] [172635.399912]
[<ffffffff813536d9>] ? __read_lock_failed+0x9/0x20
[172635.399914] [<ffffffff813536d9>] ? __read_lock_failed+0x9/0x20
[172635.399916] <<EOE>> [<ffffffff816c3e13>] _raw_read_lock+0x13/0x20
[172635.399918] [<ffffffff811a4532>] send_sigio+0x52/0xf0
[<ffffffff816bb2a9>] create_elf_tables+0xaa/0x614
[172635.399929] [172635.399939] [<ffffffff8119b07d>] sys_execve+0x3d/0x60
[172635.399941] [<ffffffff816ccb49>] stub_execve+0x69/0xc0
[172635.399943] ---[ end trace 5ccb38c703860d5a ]---
[172635.399944] ------------[ cut here ]------------
nls_iso8859_1(F) usb_storage(F) parport_pc(F) ppdev(F) bluetooth
binfmt_misc(F) snd_hda_intel videobuf2_vmalloc videobuf2_memops
mac80211 videobuf2_core snd_hda_codec videodev snd_hwdep(F) snd_pcm(F)
snd_page_alloc(F) joydev(F) iwlwifi kvm_intel snd_seq_midi(F)
snd_seq_midi_event(F) dm_multipath(F) snd_seq(F) snd_seq_device(F)
tpm_infineon snd_timer(F)


2013-03-02 17:54:43

by Al Viro

Subject: Re: fasync race in fs/fcntl.c

On Sat, Mar 02, 2013 at 03:00:28AM -0800, Russ Dill wrote:

> CPU0 calls syscall fcntl(fd, F_SETFL, FASYNC)
> fcntl calls fdget_raw, the count on the filp is 1, so it is not
> incremented (no reference taken)
> fcntl calls do_fcntl, which calls setfl which calls filp->op->fasync
> which calls fasync_helper
> fasync_helper calls fasync_add_entry, which calls fasync_insert_entry
> fasync_insert_entry adds a fasync_struct to the list for the current
> filp and assigns the pointer,
> before getting to filp->f_flags |= FASYNC, we go to CPU1
>
> CPU1 calls fput on the same filp, the counter is decremented to 0 and
> that filp is either added to the delayed_fput_list or scheduled for
> ____fput task_work.

Stop here. Just how does CPU1 manage to do that? fdget_raw() will not
increment ->f_count *only* if there's nobody else with a reference to its
descriptor table. And if ->f_count is 1, we'd better have no references
outside of that descriptor table.

So where had the reference dropped by the process on CPU1 come from?

2013-03-02 18:42:45

by Al Viro

Subject: Re: fasync race in fs/fcntl.c

On Sat, Mar 02, 2013 at 03:00:28AM -0800, Russ Dill wrote:
> I'm seeing a race in fs/fcntl.c. I'm not sure exactly how the race is
> occurring, but the following is my best guess. A kernel log is
> attached.
>
> The comment for fasync_insert_entry:
>
> * NOTE! It is very important that the FASYNC flag always
> * match the state "is the filp on a fasync list".
>
> Is not always true leading to deadlock.
>
> CPU0 calls syscall fcntl(fd, F_SETFL, FASYNC)
> fcntl calls fdget_raw, the count on the filp is 1, so it is not
> incremented (no reference taken)

You misunderstand what fdget_raw() checks, but in any case...

> pointer points to freed memory. send_sigio is called with this
> pointer, which calls read_lock(&fown->lock), however, the memory used
> by that lock has been reused and the system hardlocks.

... what makes you think that it's fown->lock, in the first place?

> [172635.399651] <<EOE>> [<ffffffff816c3e13>] _raw_read_lock+0x13/0x20
> [172635.399654] [<ffffffff811a4532>] send_sigio+0x52/0xf0

send_sigio() is
[initialization of a local variable to 1]
read_lock(&fown->lock);
[getting type and pid, checking them]
read_lock(&tasklist_lock);
[loop doing the majority of work]
read_unlock(&tasklist_lock);
read_unlock(&fown->lock);
and you are at about 1/3 into the function. Who said it's fown->lock and
not tasklist_lock? Could you check (or post) disassembly of send_sigio
to see which one it is?

2013-03-02 19:25:20

by Al Viro

Subject: Re: fasync race in fs/fcntl.c

On Sat, Mar 02, 2013 at 06:42:43PM +0000, Al Viro wrote:
> ... what makes you think that it's fown->lock, in the first place?
>
> > [172635.399651] <<EOE>> [<ffffffff816c3e13>] _raw_read_lock+0x13/0x20
> > [172635.399654] [<ffffffff811a4532>] send_sigio+0x52/0xf0
>
> send_sigio() is
> [initialization of a local variable to 1]
> read_lock(&fown->lock);
> [getting type and pid, checking them]
> read_lock(&tasklist_lock);
> [loop doing the majority of work]
> read_unlock(&tasklist_lock);
> read_unlock(&fown->lock);
> and you are at about 1/3 into the function. Who said it's fown->lock and
> not tasklist_lock? Could you check (or post) disassembly of send_sigio
> to see which one it is?

FWIW, after having grabbed the Ubuntu binary of 3.8.0-8-generic (yours is -7;
it's not immediately visible on their site, and the odds that the damn thing
has changed in send_sigio() between those revisions are low):

send_sigio: (address is off by 16 compared to yours)
ffffffff811a4bf0: e8 cb b5 52 00 callq 0xffffffff816d01c0
(that's __fentry__)
ffffffff811a4bf5: 55 push %rbp
ffffffff811a4bf6: 48 89 e5 mov %rsp,%rbp
ffffffff811a4bf9: 41 57 push %r15
ffffffff811a4bfb: 41 89 d7 mov %edx,%r15d
ffffffff811a4bfe: 41 56 push %r14
ffffffff811a4c00: 41 89 f6 mov %esi,%r14d
ffffffff811a4c03: 41 55 push %r13
ffffffff811a4c05: 41 54 push %r12
ffffffff811a4c07: 53 push %rbx
ffffffff811a4c08: 48 89 fb mov %rdi,%rbx
ffffffff811a4c0b: 48 83 ec 10 sub $0x10,%rsp
ffffffff811a4c0f: e8 1c 32 52 00 callq 0xffffffff816c7e30
(first _raw_read_lock())

ffffffff811a4c14: 44 8b 4b 10 mov 0x10(%rbx),%r9d
ffffffff811a4c18: 41 83 f9 03 cmp $0x3,%r9d
ffffffff811a4c1c: 0f 84 a6 00 00 00 je 0xffffffff811a4cc8
ffffffff811a4c22: c7 45 d4 01 00 00 00 movl $0x1,-0x2c(%rbp)
ffffffff811a4c29: 4c 8b 63 08 mov 0x8(%rbx),%r12
ffffffff811a4c2d: 4d 85 e4 test %r12,%r12
ffffffff811a4c30: 74 5b je 0xffffffff811a4c8d
ffffffff811a4c32: 48 c7 c7 40 60 c0 81 mov $0xffffffff81c06040,%rdi
ffffffff811a4c39: 44 89 4d c8 mov %r9d,-0x38(%rbp)
ffffffff811a4c3d: e8 ee 31 52 00 callq 0xffffffff816c7e30
ffffffff811a4c42:
... and this was the second one, i.e. read_lock(&tasklist_lock). Note that
return address is send_sigio + 0x52, just as in your trace.

In other words, tasklist_lock it is...

2013-03-02 19:49:25

by Al Viro

Subject: Re: fasync race in fs/fcntl.c

On Sat, Mar 02, 2013 at 03:00:28AM -0800, Russ Dill wrote:
> I'm seeing a race in fs/fcntl.c. I'm not sure exactly how the race is
> occurring, but the following is my best guess. A kernel log is
> attached.

[snip the analysis - it's a different lock anyway]

The traces below are essentially sys_execve() getting to get_random_bytes(),
to kill_fasync(), to send_sigio(), which spins on tasklist_lock.

Could you rebuild it with lockdep enabled and try to reproduce that?
I very much doubt that this execve() is a part of deadlock - it's
getting caught on one, but it shouldn't be holding any locks that
nest inside tasklist_lock at that point, so even if it hadn't been there,
the process holding tasklist_lock probably wouldn't have progressed any
further...

> I exercise this by running a UML instance which uses /dev/random.
> After a day or so, I'll eventually get a crash or a hang. I captured
> this with netconsole.
>
>
> [snip the log - same netconsole trace as quoted in full above]

2013-03-03 00:09:47

by Russ Dill

Subject: Re: fasync race in fs/fcntl.c

On Sat, Mar 2, 2013 at 11:49 AM, Al Viro <[email protected]> wrote:
> On Sat, Mar 02, 2013 at 03:00:28AM -0800, Russ Dill wrote:
>> I'm seeing a race in fs/fcntl.c. I'm not sure exactly how the race is
>> occurring, but the following is my best guess. A kernel log is
>> attached.
>
> [snip the analysis - it's a different lock anyway]
>
> The traces below are essentially sys_execve() getting to get_random_bytes(),
> to kill_fasync(), to send_sigio(), which spins on tasklist_lock.
>
> Could you rebuild it with lockdep enabled and try to reproduce that?
> I very much doubt that this execve() is a part of deadlock - it's
> getting caught on one, but it shouldn't be holding any locks that
> nest inside tasklist_lock at that point, so even if it hadn't been there,
> the process holding tasklist_lock probably wouldn't have progressed any
> further...

OK, I did screw up the analysis quite badly; luckily, lockdep caught it right away.


[ 199.411236]
[ 199.411261] ======================================================
[ 199.411268] [ INFO: possible circular locking dependency detected ]
[ 199.411277] 3.8.0-9-generic #18 Not tainted
[ 199.411283] -------------------------------------------------------
[ 199.411290] slirp-wrapper.s/5574 is trying to acquire lock:
[ 199.411296] (&(&new->fa_lock)->rlock){-.-...}, at:
[<ffffffff811b3fc9>] kill_fasync+0x69/0xe0
[ 199.411332]
[ 199.411332] but task is already holding lock:
[ 199.411338] (&nonblocking_pool.lock){......}, at:
[<ffffffff81456789>] account+0x39/0x1d0
[ 199.411364]
[ 199.411364] which lock already depends on the new lock.
[ 199.411364]
[ 199.411371]
[ 199.411371] the existing dependency chain (in reverse order) is:
[ 199.411379]
[ 199.411379] -> #2 (&nonblocking_pool.lock){......}:
[ 199.411400] [<ffffffff810bdd58>] lock_acquire+0x98/0x120
[ 199.411416] [<ffffffff816f8b1e>] _raw_spin_lock_irqsave+0x4e/0x70
[ 199.411430] [<ffffffff81456543>] mix_pool_bytes.constprop.20+0x43/0xb0
[ 199.411442] [<ffffffff81456724>] add_device_randomness+0x64/0x90
[ 199.411454] [<ffffffff81081e0e>] posix_cpu_timers_exit+0x1e/0x50
[ 199.411469] [<ffffffff8105d0b2>] release_task+0xe2/0x470
[ 199.411484] [<ffffffff8105eb77>] do_exit+0x5d7/0x9c0
[ 199.411495] [<ffffffff810c6b4a>] __module_put_and_exit+0x1a/0x20
[ 199.411509] [<ffffffff8132b725>] cryptomgr_test+0x25/0x30
[ 199.411526] [<ffffffff8107f5ea>] kthread+0xea/0xf0
[ 199.411537] [<ffffffff8170146c>] ret_from_fork+0x7c/0xb0
[ 199.411550]
[ 199.411550] -> #1 (&(&sighand->siglock)->rlock){-.-...}:
[ 199.411689] [<ffffffff810bdd58>] lock_acquire+0x98/0x120
[ 199.411702] [<ffffffff816f8b1e>] _raw_spin_lock_irqsave+0x4e/0x70
[ 199.411713] [<ffffffffa0008200>] vblank_disable_fn+0x60/0xd0 [drm]
[ 199.411768] [<ffffffff810695da>] call_timer_fn+0x7a/0x180
[ 199.411781] [<ffffffff8106aa2c>] run_timer_softirq+0x1fc/0x2a0
[ 199.411794] [<ffffffff81061580>] __do_softirq+0xe0/0x220
[ 199.411807] [<ffffffff81702a3c>] call_softirq+0x1c/0x30
[ 199.411819] [<ffffffff81016685>] do_softirq+0xa5/0xe0
[ 199.411833] [<ffffffff81061855>] irq_exit+0xb5/0xc0
[ 199.411846] [<ffffffff817033de>] smp_apic_timer_interrupt+0x6e/0x99
[ 199.411859] [<ffffffff817022f2>] apic_timer_interrupt+0x72/0x80
[ 199.411871] [<ffffffff8158e3c0>] cpuidle_enter_tk+0x10/0x20
[ 199.411885] [<ffffffff8158df95>] cpuidle_idle_call+0xa5/0x270
[ 199.411897] [<ffffffff8101d865>] cpu_idle+0xb5/0x120
[ 199.411910] [<ffffffff816d4d64>] rest_init+0xb4/0xc0
[ 199.411924] [<ffffffff81d22c07>] start_kernel+0x3ea/0x3f6
[ 199.411939] [<ffffffff81d22355>] x86_64_start_reservations+0x130/0x133
[ 199.411952] [<ffffffff81d22458>] x86_64_start_kernel+0x100/0x10f
[ 199.411966]
[ 199.411966] -> #0 (&(&new->fa_lock)->rlock){-.-...}:
[ 199.411984] [<ffffffff810bcfc9>] __lock_acquire+0x1279/0x1aa0
[ 199.411997] [<ffffffff810bdd58>] lock_acquire+0x98/0x120
[ 199.412103] [<ffffffff816f8b1e>] _raw_spin_lock_irqsave+0x4e/0x70
[ 199.412115] [<ffffffff811b3fc9>] kill_fasync+0x69/0xe0
[ 199.412127] [<ffffffff81456863>] account+0x113/0x1d0
[ 199.412138] [<ffffffff81456e15>] extract_entropy+0x65/0x140
[ 199.412149] [<ffffffff81457120>] get_random_bytes+0x20/0x30
[ 199.412159] [<ffffffff816ee9b2>] create_elf_tables+0xaa/0x615
[ 199.412172] [<ffffffff811f9a94>] load_elf_binary+0xae4/0xe00
[ 199.412185] [<ffffffff811a9081>] search_binary_handler+0x171/0x390
[ 199.412197] [<ffffffff811f7515>] load_script+0x265/0x2b0
[ 199.412208] [<ffffffff811a9081>] search_binary_handler+0x171/0x390
[ 199.412219] [<ffffffff811aa305>] do_execve_common.isra.23+0x405/0x4c0
[ 199.412230] [<ffffffff811aa3d8>] do_execve+0x18/0x20
[ 199.412240] [<ffffffff811aa67d>] sys_execve+0x3d/0x60
[ 199.412251] [<ffffffff81701b19>] stub_execve+0x69/0xc0
[ 199.412263]
[ 199.412263] other info that might help us debug this:
[ 199.412263]
[ 199.412271] Chain exists of:
[ 199.412271] &(&new->fa_lock)->rlock -->
&(&sighand->siglock)->rlock --> &nonblocking_pool.lock
[ 199.412271]
[ 199.412298] Possible unsafe locking scenario:
[ 199.412298]
[ 199.412305] CPU0 CPU1
[ 199.412310] ---- ----
[ 199.412316] lock(&nonblocking_pool.lock);
[ 199.412328] lock(&(&sighand->siglock)->rlock);
[ 199.412339] lock(&nonblocking_pool.lock);
[ 199.412452] lock(&(&new->fa_lock)->rlock);
[ 199.412464]
[ 199.412464] *** DEADLOCK ***
[ 199.412464]
[ 199.412473] 2 locks held by slirp-wrapper.s/5574:
[ 199.412478] #0: (&nonblocking_pool.lock){......}, at:
[<ffffffff81456789>] account+0x39/0x1d0
[ 199.412504] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff811b3f81>]
kill_fasync+0x21/0xe0
[ 199.412531]
[ 199.412531] stack backtrace:
[ 199.412540] Pid: 5574, comm: slirp-wrapper.s Not tainted 3.8.0-9-generic #18
[ 199.412546] Call Trace:
[ 199.412560] [<ffffffff816eb393>] print_circular_bug+0x1f9/0x208
[ 199.412572] [<ffffffff810bcfc9>] __lock_acquire+0x1279/0x1aa0
[ 199.412584] [<ffffffff810bdd58>] lock_acquire+0x98/0x120
[ 199.412596] [<ffffffff811b3fc9>] ? kill_fasync+0x69/0xe0
[ 199.412606] [<ffffffff816f8b1e>] _raw_spin_lock_irqsave+0x4e/0x70
[ 199.412618] [<ffffffff811b3fc9>] ? kill_fasync+0x69/0xe0
[ 199.412628] [<ffffffff811b3fc9>] kill_fasync+0x69/0xe0
[ 199.412639] [<ffffffff811b3f81>] ? kill_fasync+0x21/0xe0
[ 199.412649] [<ffffffff81456863>] account+0x113/0x1d0
[ 199.412658] [<ffffffff81456e15>] extract_entropy+0x65/0x140
[ 199.412668] [<ffffffff81457120>] get_random_bytes+0x20/0x30
[ 199.412678] [<ffffffff816ee9b2>] create_elf_tables+0xaa/0x615
[ 199.412689] [<ffffffff811f9a94>] load_elf_binary+0xae4/0xe00
[ 199.412700] [<ffffffff811f8fb0>] ? load_elf_library+0x240/0x240
[ 199.412710] [<ffffffff811a9081>] search_binary_handler+0x171/0x390
[ 199.412720] [<ffffffff811a8f8f>] ? search_binary_handler+0x7f/0x390
[ 199.412730] [<ffffffff811f72b0>] ? compat_sys_ioctl+0x320/0x320
[ 199.412740] [<ffffffff811f7515>] load_script+0x265/0x2b0
[ 199.412750] [<ffffffff811a909f>] ? search_binary_handler+0x18f/0x390
[ 199.412760] [<ffffffff811f72b0>] ? compat_sys_ioctl+0x320/0x320
[ 199.412770] [<ffffffff811a9081>] search_binary_handler+0x171/0x390
[ 199.412779] [<ffffffff811a8f8f>] ? search_binary_handler+0x7f/0x390
[ 199.412789] [<ffffffff811aa305>] do_execve_common.isra.23+0x405/0x4c0
[ 199.412799] [<ffffffff811aa014>] ? do_execve_common.isra.23+0x114/0x4c0
[ 199.412809] [<ffffffff811aa3d8>] do_execve+0x18/0x20
[ 199.412922] [<ffffffff811aa67d>] sys_execve+0x3d/0x60
[ 199.412933] [<ffffffff81701b19>] stub_execve+0x69/0xc0

2013-03-04 06:16:13

by Russ Dill

[permalink] [raw]
Subject: Re: fasync race in fs/fcntl.c

On Sat, Mar 2, 2013 at 4:09 PM, Russ Dill <[email protected]> wrote:
> On Sat, Mar 2, 2013 at 11:49 AM, Al Viro <[email protected]> wrote:
>> On Sat, Mar 02, 2013 at 03:00:28AM -0800, Russ Dill wrote:
>>> I'm seeing a race in fs/fcntl.c. I'm not sure exactly how the race is
>>> occurring, but the following is my best guess. A kernel log is
>>> attached.
>>
>> [snip the analysis - it's a different lock anyway]
>>
>> The traces below are essentially sys_execve() getting to get_random_bytes(),
>> to kill_fasync(), to send_sigio(), which spins on tasklist_lock.
>>
>> Could you rebuild it with lockdep enabled and try to reproduce that?
>> I very much doubt that this execve() is a part of deadlock - it's
>> getting caught on one, but it shouldn't be holding any locks that
>> nest inside tasklist_lock at that point, so even it hadn't been there,
>> the process holding tasklist_lock probably wouldn't have progressed any
>> further...
>
> OK, I did screw up the analysis quite badly; luckily, lockdep caught it right away.
>

Lockdep gives some clues, though it seems a bit confused, so here's what actually happened.

mix_pool_bytes /* takes nonblocking_pool.lock */
add_device_randomness
posix_cpu_timers_exit
__exit_signal
release_task /* takes write lock on tasklist_lock */
do_exit
__module_put_and_exit
cryptomgr_test

send_sigio /* takes read lock on tasklist_lock */
kill_fasync_rcu
kill_fasync
account /* takes nonblocking_pool.lock */
extract_entropy
get_random_bytes
create_elf_tables
load_elf_binary
load_elf_library
search_binary_handler

This points to commit 6133705494bb ('random: Mix cputime from
each thread that exits to the pool') as the culprit. Assuming I'm not as
far off on this analysis as on the last one, may I suggest reverting this
commit for 3.8.3?

> [lockdep report snipped; quoted in full above]

2013-03-04 07:38:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: fasync race in fs/fcntl.c

On Sun, Mar 03, 2013 at 10:16:10PM -0800, Russ Dill wrote:
> On Sat, Mar 2, 2013 at 4:09 PM, Russ Dill <[email protected]> wrote:
> > [snip]
>
> Lockdep gives some clues, though it seems a bit confused, so here's what actually happened.
>
> mix_pool_bytes /* takes nonblocking_pool.lock */
> add_device_randomness
> posix_cpu_timers_exit
> __exit_signal
> release_task /* takes write lock on tasklist_lock */
> do_exit
> __module_put_and_exit
> cryptomgr_test
>
> send_sigio /* takes read lock on tasklist_lock */
> kill_fasync_rcu
> kill_fasync
> account /* takes nonblocking_pool.lock */
> extract_entropy
> get_random_bytes
> create_elf_tables
> load_elf_binary
> load_elf_library
> search_binary_handler
>
> This points to commit 6133705494bb ('random: Mix cputime from
> each thread that exits to the pool') as the culprit. Assuming I'm not as
> far off on this analysis as on the last one, may I suggest reverting this
> commit for 3.8.3?

I'll revert it, but shouldn't we fix this properly upstream in Linus's
tree as well? I'd rather take the fix than a revert so that we don't
have a problem that no one remembers to fix until 3.9-final is out.

thanks,

greg k-h

2013-03-04 08:03:32

by Russ Dill

[permalink] [raw]
Subject: [PATCH] Revert "random: Mix cputime from each thread that exits to the pool"

This reverts commit 6133705494bb02953e1e2cc3018a4373981b3c97.

The above commit introduces a circular locking dependency. Existing code
takes nonblocking_pool.lock and then tasklist_lock. The code in this
commit takes nonblocking_pool.lock while tasklist_lock is already held,
leading to a potential deadlock. For now, simply revert the commit.

Existing lock order:

send_sigio /* takes read lock on tasklist_lock */
kill_fasync_rcu
kill_fasync
account /* takes nonblocking_pool.lock */
extract_entropy
get_random_bytes
create_elf_tables
load_elf_binary
load_elf_library
search_binary_handler

New code:

mix_pool_bytes /* takes nonblocking_pool.lock */
add_device_randomness
posix_cpu_timers_exit
__exit_signal
release_task /* takes write lock on tasklist_lock */
do_exit
__module_put_and_exit
cryptomgr_test

Signed-off-by: Russ Dill <[email protected]>
---
kernel/posix-cpu-timers.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 942ca27..1f2ef3d 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -9,7 +9,6 @@
#include <asm/uaccess.h>
#include <linux/kernel_stat.h>
#include <trace/events/timer.h>
-#include <linux/random.h>

/*
* Called after updating RLIMIT_CPU to run cpu timer and update
@@ -471,8 +470,6 @@ static void cleanup_timers(struct list_head *head,
*/
void posix_cpu_timers_exit(struct task_struct *tsk)
{
- add_device_randomness((const void*) &tsk->se.sum_exec_runtime,
- sizeof(unsigned long long));
cleanup_timers(tsk->cpu_timers,
tsk->utime, tsk->stime, tsk->se.sum_exec_runtime);

--
1.8.1.2

2013-03-04 17:05:16

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] Revert "random: Mix cputime from each thread that exits to the pool"

Russ, how about this patch instead? Can you verify whether or not
this fixes your issue? If so, I'll push it to Linus.

- Ted

From 24c598b5abd466f2bab954d0cc8c225e0d3a2a74 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <[email protected]>
Date: Mon, 4 Mar 2013 11:59:12 -0500
Subject: [PATCH] random: fix locking dependency with the tasklist_lock

Commit 6133705494bb introduced a circular lock dependency because
posix_cpu_timers_exit() is called by release_task(), which is holding
a writer lock on tasklist_lock, and this can cause a deadlock since
kill_fasync() gets called with nonblocking_pool.lock taken.

There's no reason why kill_fasync() needs to be called while the random
pool is locked, so move it out of the locked region to fix this locking
dependency.

Signed-off-by: "Theodore Ts'o" <[email protected]>
Reported-by: Russ Dill <[email protected]>
Cc: [email protected]
---
drivers/char/random.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 85e81ec..57d4b15 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -852,6 +852,7 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
int reserved)
{
unsigned long flags;
+ int wakeup_write = 0;

/* Hold lock while accounting */
spin_lock_irqsave(&r->lock, flags);
@@ -873,10 +874,8 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
else
r->entropy_count = reserved;

- if (r->entropy_count < random_write_wakeup_thresh) {
- wake_up_interruptible(&random_write_wait);
- kill_fasync(&fasync, SIGIO, POLL_OUT);
- }
+ if (r->entropy_count < random_write_wakeup_thresh)
+ wakeup_write = 1;
}

DEBUG_ENT("debiting %zu entropy credits from %s%s\n",
@@ -884,6 +883,11 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,

spin_unlock_irqrestore(&r->lock, flags);

+ if (wakeup_write) {
+ wake_up_interruptible(&random_write_wait);
+ kill_fasync(&fasync, SIGIO, POLL_OUT);
+ }
+
return nbytes;
}

--
1.7.12.rc0.22.gcdd159b

2013-03-04 19:33:14

by Russ Dill

[permalink] [raw]
Subject: Re: [PATCH] Revert "random: Mix cputime from each thread that exits to the pool"

On Mon, Mar 4, 2013 at 9:05 AM, Theodore Ts'o <[email protected]> wrote:
> Russ, how about this patch instead? Can you verify whether or not
> this fixes your issue? If so, I'll push it to Linus.
>
> - Ted

Yup, fixes the circular locking dependency.

Tested-by: [email protected]

> From 24c598b5abd466f2bab954d0cc8c225e0d3a2a74 Mon Sep 17 00:00:00 2001
> From: Theodore Ts'o <[email protected]>
> Date: Mon, 4 Mar 2013 11:59:12 -0500
> Subject: [PATCH] random: fix locking dependency with the tasklist_lock
>
> Commit 6133705494bb introduced a circular lock dependency because
> posix_cpu_timers_exit() is called by release_task(), which is holding
> a writer lock on tasklist_lock, and this can cause a deadlock since
> kill_fasync() gets called with nonblocking_pool.lock taken.
>
> There's no reason why kill_fasync() needs to be taken while the random
> pool is locked, so move it out to fix this locking dependency.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> Reported-by: Russ Dill <[email protected]>
> Cc: [email protected]
> ---
> drivers/char/random.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 85e81ec..57d4b15 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -852,6 +852,7 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
> int reserved)
> {
> unsigned long flags;
> + int wakeup_write = 0;
>
> /* Hold lock while accounting */
> spin_lock_irqsave(&r->lock, flags);
> @@ -873,10 +874,8 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
> else
> r->entropy_count = reserved;
>
> - if (r->entropy_count < random_write_wakeup_thresh) {
> - wake_up_interruptible(&random_write_wait);
> - kill_fasync(&fasync, SIGIO, POLL_OUT);
> - }
> + if (r->entropy_count < random_write_wakeup_thresh)
> + wakeup_write = 1;
> }
>
> DEBUG_ENT("debiting %zu entropy credits from %s%s\n",
> @@ -884,6 +883,11 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
>
> spin_unlock_irqrestore(&r->lock, flags);
>
> + if (wakeup_write) {
> + wake_up_interruptible(&random_write_wait);
> + kill_fasync(&fasync, SIGIO, POLL_OUT);
> + }
> +
> return nbytes;
> }
>
> --
> 1.7.12.rc0.22.gcdd159b
>