Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965364AbbHKQsh (ORCPT ); Tue, 11 Aug 2015 12:48:37 -0400 Received: from mx1.redhat.com ([209.132.183.28]:35622 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752728AbbHKQsf (ORCPT ); Tue, 11 Aug 2015 12:48:35 -0400 Date: Tue, 11 Aug 2015 13:48:33 -0300 From: "Herton R. Krzesinski" To: Manfred Spraul Cc: linux-kernel@vger.kernel.org, Andrew Morton , Davidlohr Bueso , Rafael Aquini , Joe Perches , Aristeu Rozanski , djeffery@redhat.com Subject: Re: [PATCH] ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits Message-ID: <20150811164833.GA29948@dhcppc4.redhat.com> References: <1438967375-14877-1-git-send-email-herton@redhat.com> <55C79294.2010006@colorfullife.com> <20150810153147.GA3540@dhcppc4.redhat.com> <55C8F533.1090007@colorfullife.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <55C8F533.1090007@colorfullife.com> User-Agent: Mutt/1.5.23.1-rc1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 12009 Lines: 211 On Mon, Aug 10, 2015 at 09:02:11PM +0200, Manfred Spraul wrote: > Hi Herton, > > On 08/10/2015 05:31 PM, Herton R. Krzesinski wrote: > >Well without the synchronize_rcu() and with the semid list loop fix I was still > >able to get issues, and I thought the problem is related to racing with IPC_RMID > >on freeary again. This is one scenario I would imagine: > > > > A B > > > >freeary() > > list_del(&un->list_id) > > spin_lock(&un->ulp->lock) > > un->semid = -1 > > list_del_rcu(&un->list_proc) > > __list_del_entry(&un->list_proc) > > __list_del(entry->prev, entry->next) exit_sem() > > next->prev = prev; ... > > prev->next = next; ... > > ... un = list_entry_rcu(ulp->list_proc.next...) > > (&un->list_proc)->prev = LIST_POISON2 if (&un->list_proc == &ulp->list_proc) > > ... kfree(ulp) > > spin_unlock(&un->ulp->lock) <---- bug > > > >Now that is a very tight window, but I had problems even when I tried this patch > >first: > > > >(...) > >- if (&un->list_proc == &ulp->list_proc) > >- semid = -1; > >- else > >- semid = un->semid; > >+ if (&un->list_proc == &ulp->list_proc) { > >+ rcu_read_unlock(); > What about: > + spin_unlock_wait(&ulp->lock); spin_unlock_wait works, thanks (both in theory and in practice, I tested). > >+ break; > >+ } > >+ spin_lock(&ulp->lock); > >+ semid = un->semid; > >+ spin_unlock(&ulp->lock); > > > >+ /* exit_sem raced with IPC_RMID, nothing to do */ > > if (semid == -1) { > > rcu_read_unlock(); > >- break; > >+ synchronize_rcu(); > >+ continue; > > } > >(...) > > > >So even with the bad/uneeded synchronize_rcu() which I had placed there, I could > >still get issues (however the testing on patch above was on an older kernel than > >latest upstream, from RHEL 6, I can test without synchronize_rcu() on latest > >upstream, however the affected code is the same). That's when I thought of > >scenario above. I was able to get this oops: > Adding sleep() usually help, too. But it is ugly, so let's try to understand > the race and to fix it. Yes, the race should be what I described earlier, and it's understood. I also now have proof in practice that without synchronization before exit, I'm able to trigger a crash on latest upstream kernel as well. This is the change I tested with latest stock 4.2-rc6: diff --git a/ipc/sem.c b/ipc/sem.c index bc3d530..3b8b66b 100644 --- a/ipc/sem.c +++ b/ipc/sem.c @@ -2074,17 +2074,26 @@ void exit_sem(struct task_struct *tsk) rcu_read_lock(); un = list_entry_rcu(ulp->list_proc.next, struct sem_undo, list_proc); - if (&un->list_proc == &ulp->list_proc) - semid = -1; - else - semid = un->semid; + if (&un->list_proc == &ulp->list_proc) { + /* we must wait for freeary() before freeing this ulp, + * in case we raced with last sem_undo. There is a small + * possibility where we exit while freeary() didn't + * finish unlocking sem_undo_list */ + /*spin_unlock_wait(&ulp->lock);*/ + rcu_read_unlock(); + break; + } + spin_lock(&ulp->lock); + semid = un->semid; + spin_unlock(&ulp->lock); + /* exit_sem raced with IPC_RMID, nothing to do */ if (semid == -1) { rcu_read_unlock(); - break; + continue; } - sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, un->semid); + sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, semid); /* exit_sem raced with IPC_RMID, nothing to do */ if (IS_ERR(sma)) { rcu_read_unlock(); I commented the synchronization for testing (the spin_unlock_wait call). Because of the comment/removing the spin_unlock_wait, I'm then able to trigger the following crash: [ 587.768363] BUG: spinlock wrong owner on CPU#2, test/419 [ 587.768434] general protection fault: 0000 [#1] SMP [ 587.768454] Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6 table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_rng virtio_balloon virtio_console virtio_net iosf_mbi crct10dif_pclmul c rc32_pclmul ghash_clmulni_intel pcspkr snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcor e qxl ttm drm_kms_helper drm crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib] [ 587.768712] CPU: 2 PID: 419 Comm: test Not tainted 4.2.0-rc6+ #1 [ 587.768732] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 [ 587.768762] task: ffff88003ad68340 ti: ffff88003a0ec000 task.ti: ffff88003a0ec000 [ 587.768785] RIP: 0010:[] [] spin_dump+0x53/0xc0 [ 587.768818] RSP: 0018:ffff88003a0efd68 EFLAGS: 00010202 [ 587.768836] RAX: 000000000000002c RBX: ffff88003d016f08 RCX: 0000000000000000 [ 587.768859] RDX: ffff88003fd11518 RSI: ffff88003fd0eb68 RDI: ffff88003fd0eb68 [ 587.768882] RBP: ffff88003a0efd78 R08: 0000000000000000 R09: 000000006b6b6b6b [ 587.768904] R10: 000000000000025f R11: ffffc90000b30020 R12: 6b6b6b6b6b6b6b6b [ 587.768927] R13: ffff88003a5e5c40 R14: ffff88003d1eb340 R15: ffff88003a5e5c40 [ 587.768949] FS: 00007fc2fcd89700(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000 [ 587.768981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 587.769002] CR2: 00007fc2fcb84dd4 CR3: 000000003d230000 CR4: 00000000000406e0 [ 587.769004] Stack: [ 587.769004] ffff88003d016f08 ffffffff81a4f4ad ffff88003a0efd98 ffffffff810d6150 [ 587.769004] ffff88003d016f08 ffff88003a5e5cb8 ffff88003a0efdb8 ffffffff810d61e2 [ 587.769004] ffff88003a5e5c40 ffff88003a5e5c90 ffff88003a0efdc8 ffffffff8175d50e [ 587.769004] Call Trace: [ 587.769004] [] spin_bug+0x30/0x40 [ 587.769004] [] do_raw_spin_unlock+0x82/0xa0 [ 587.769004] [] _raw_spin_unlock+0xe/0x10 [ 587.769004] [] freeary+0x82/0x2a0 [ 587.769004] [] ? _raw_spin_lock+0xe/0x10 [ 587.769004] [] semctl_down.clone.0+0xce/0x160 [ 587.769004] [] ? __do_page_fault+0x19a/0x430 [ 587.769004] [] ? __audit_syscall_entry+0xa8/0x100 [ 587.769004] [] SyS_semctl+0x236/0x2c0 [ 587.769004] [] ? syscall_trace_leave+0xde/0x130 [ 587.769004] [] entry_SYSCALL_64_fastpath+0x12/0x71 [ 587.769004] Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2d a4 81 31 c0 65 8b 15 8b 40 f3 7e e8 21 33 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89 [ 587.769004] RIP [] spin_dump+0x53/0xc0 [ 587.769004] RSP [ 587.769563] ---[ end trace 844d186084062ba4 ]--- [ 612.022002] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:446] [ 612.022002] Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_rng virtio_balloon virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore qxl ttm drm_kms_helper drm crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib] [ 612.022002] CPU: 3 PID: 446 Comm: test Tainted: G D 4.2.0-rc6+ #1 [ 612.022002] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 [ 612.022002] task: ffff88003e3ab040 ti: ffff88003ac20000 task.ti: ffff88003ac20000 [ 612.022002] RIP: 0010:[] [] native_read_tsc+0x6/0x20 [ 612.022002] RSP: 0018:ffff88003ac23c08 EFLAGS: 00000206 [ 612.022002] RAX: 00000000e0df8c7d RBX: 800000002060b065 RCX: 00000000e0df8c5f [ 612.022002] RDX: 00000000000001a4 RSI: 0000000000000000 RDI: 0000000000000001 [ 612.022002] RBP: ffff88003ac23c08 R08: 0000000000000000 R09: ffff88003aea5cc0 [ 612.022002] R10: 00007ffdf0e89830 R11: 0000000000000008 R12: ffffffff8106a938 [ 612.022002] R13: ffff88003ac23b98 R14: ffff88003a295300 R15: ffff88003a295000 [ 612.022002] FS: 00007fc2fcd89700(0000) GS:ffff88003fd80000(0000) knlGS:0000000000000000 [ 612.022002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 612.022002] CR2: 00007fc2fc8dffd0 CR3: 000000003905d000 CR4: 00000000000406e0 [ 612.022002] Stack: [ 612.022002] ffff88003ac23c38 ffffffff8138bd30 ffff88003a5e5c40 00000000ac762d50 [ 612.022002] 000000003662aefc 0000000000000001 ffff88003ac23c48 ffffffff8138bcaf [ 612.022002] ffff88003ac23c78 ffffffff810d6386 ffff88003a5e5c40 ffff880037a0f880 [ 612.022002] Call Trace: [ 612.022002] [] delay_tsc+0x40/0x70 [ 612.022002] [] __delay+0xf/0x20 [ 612.022002] [] do_raw_spin_lock+0x96/0x140 [ 612.022002] [] _raw_spin_lock+0xe/0x10 [ 612.022002] [] sem_lock_and_putref+0x11/0x70 [ 612.022002] [] SYSC_semtimedop+0x7bf/0x960 [ 612.022002] [] ? handle_mm_fault+0xbf6/0x1880 [ 612.022002] [] ? _raw_spin_unlock+0xe/0x10 [ 612.022002] [] ? free_pcppages_bulk+0x436/0x5a0 [ 612.022002] [] ? __do_page_fault+0x19a/0x430 [ 612.022002] [] ? kfree_debugcheck+0x16/0x40 [ 612.022002] [] ? __do_page_fault+0x19a/0x430 [ 612.022002] [] ? __audit_syscall_entry+0xa8/0x100 [ 612.022002] [] ? do_audit_syscall_entry+0x66/0x70 [ 612.022002] [] ? syscall_trace_enter_phase1+0x139/0x160 [ 612.022002] [] SyS_semtimedop+0xe/0x10 [ 612.022002] [] SyS_semop+0x10/0x20 [ 612.022002] [] entry_SYSCALL_64_fastpath+0x12/0x71 [ 612.022002] Code: c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 55 48 89 e5 0f 31 <89> c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9 c3 66 2e 0f 1f 84 [ 612.022002] Kernel panic - not syncing: softlockup: hung tasks Then in another build with the spin_unlock_wait in, I left the test case running for some hours, I didn't see issues. In theory also I expect there is no more problems. I'll submit a new version of the patch, it's same diff as above but without the comment. > > Best regards, > Manfred Thanks, -- []'s Herton -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/