In-Reply-To: <201611021950.FEJ34368.HFFJOOMLtQOVSF@I-love.SAKURA.ne.jp>
References: <201611012336.IAC18714.VLMOQSHOFtOFJF@I-love.SAKURA.ne.jp> <201611021950.FEJ34368.HFFJOOMLtQOVSF@I-love.SAKURA.ne.jp>
From: Andy Lutomirski
Date: Wed, 2 Nov 2016 07:05:35 -0700
Subject: Re: [4.9-rc3] BUG: unable to handle kernel paging request at ffffc900144dfc60
To: Tetsuo Handa
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Andrew Lutomirski, X86 ML, "linux-kernel@vger.kernel.org", Brian Gerst, Borislav Petkov, Jann Horn, Linux API, Kees Cook, Tycho Andersen
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 2, 2016 at 3:50 AM, Tetsuo Handa wrote:
> Linus Torvalds wrote:
>> On Tue, Nov 1, 2016 at 8:36 AM, Tetsuo Handa wrote:
>> >
>> > I got an Oops with khungtaskd. This kernel was built with CONFIG_THREAD_INFO_IN_TASK=y .
>> > Is this same reason?
>>
>> CONFIG_THREAD_INFO_IN_TASK is always set on x86, but I assume you also
>> did VMAP_STACK
>
> Yes. And I wrote a reproducer.
>
> ---------- Reproducer start ----------
> #include <stdlib.h>
> #include <unistd.h>
>
> int main(int argc, char *argv[])
> {
>     if (fork() == 0)
>         _exit(0);
>     sleep(1);
>     system("echo t > /proc/sysrq-trigger");
>     return 0;
> }
> ---------- Reproducer end ----------
>
> ---------- Serial console log start ----------
> [ 328.528734] a.out x
> [ 328.529293] BUG: unable to handle kernel
> [ 328.530655] paging request at ffffc90001f43e18
> [ 328.531837] IP: [] thread_saved_pc+0xb/0x20
> [ 328.533512] PGD 7f4c0067
> [ 328.533972] PUD 7f4c1067
> [ 328.535065] PMD 74cba067
> [ 328.535296] PTE 0
>
> [ 328.537173] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [ 328.538698] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_filter coretemp pcspkr sg i2c_piix4 shpchp vmw_vmci ip_tables sd_mod ata_generic pata_acpi serio_raw mptspi vmwgfx scsi_transport_spi drm_kms_helper ahci syscopyarea sysfillrect sysimgblt mptscsih e1000 fb_sys_fops libahci ttm drm mptbase ata_piix i2c_core libata
> [ 328.552465] CPU: 0 PID: 4299 Comm: sh Tainted: G W 4.9.0-rc3+ #83
> [ 328.554403] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
> [ 328.556939] task: ffff8800792b5380 task.stack: ffffc90001f58000
> [ 328.558686] RIP: 0010:[] [] thread_saved_pc+0xb/0x20
> [ 328.560926] RSP: 0018:ffffc90001f5bd28 EFLAGS: 00010202
> [ 328.562603] RAX: ffffc90001f43de8 RBX: ffff88007826d380 RCX: 0000000000000006
> [ 328.564507] RDX: 0000000000000000 RSI: ffffffff8197f2d1 RDI: ffff88007826d380
> [ 328.566437] RBP: ffffc90001f5bd28 R08: 0000000000000001 R09: 0000000000000001
> [ 328.568354] R10: 0000000000000001 R11: 0000000000000004 R12: 0000000000000007
> [ 328.570266] R13: ffff88007826d638 R14: ffff88007826d380 R15: 0000000000000002
> [ 328.572197] FS: 00007ff7b501e740(0000) GS:ffff88007c200000(0000) knlGS:0000000000000000
> [ 328.574303] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 328.576006] CR2: ffffc90001f43e18 CR3: 000000007894c000 CR4: 00000000001406f0
> [ 328.577995] Stack:
> [ 328.579024] ffffc90001f5bd50 ffffffff810974c0 ffffc90001f5bd50 ffff88007826d380
> [ 328.581219] 0000000000000000 ffffc90001f5bd88 ffffffff81097767 ffffffff810976b0
> [ 328.583300] ffffffff81c74e60 0000000000000074 0000000000000000 0000000000000007
> [ 328.585404] Call Trace:
> [ 328.586531] [] sched_show_task+0x50/0x240
> [ 328.588184] [] show_state_filter+0xb7/0x190
> [ 328.589860] [] ? sched_show_task+0x240/0x240
> [ 328.591553] [] sysrq_handle_showstate+0xb/0x20
> [ 328.593304] [] __handle_sysrq+0x136/0x220
> [ 328.594992] [] ? __sysrq_get_key_op+0x30/0x30
> [ 328.596678] [] write_sysrq_trigger+0x41/0x50
> [ 328.598386] [] proc_reg_write+0x38/0x70
> [ 328.600038] [] __vfs_write+0x32/0x140
> [ 328.601604] [] ? rcu_read_lock_sched_held+0x87/0x90
> [ 328.603365] [] ? rcu_sync_lockdep_assert+0x2a/0x50
> [ 328.605111] [] ? __sb_start_write+0x189/0x240
> [ 328.606735] [] ? vfs_write+0x182/0x1b0
> [ 328.608278] [] vfs_write+0xb0/0x1b0
> [ 328.609777] [] ? syscall_trace_enter+0x1b0/0x240
> [ 328.611513] [] SyS_write+0x53/0xc0
> [ 328.612989] [] ? __this_cpu_preempt_check+0x13/0x20
> [ 328.614757] [] do_syscall_64+0x61/0x1d0
> [ 328.616329] [] entry_SYSCALL64_slow_path+0x25/0x25
> [ 328.618057] Code: 55 48 8b bf d0 01 00 00 be 00 00 00 02 48 89 e5 e8 6b 58 3f 00 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 8b 87 e0 15 00 00 48 89 e5 <48> 8b 40 30 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
> [ 328.624402] RIP [] thread_saved_pc+0xb/0x20
> [ 328.626124] RSP
> [ 328.627375] CR2: ffffc90001f43e18
> [ 328.628646] ---[ end trace 70b31f25a2ce0c0c ]---
> ---------- Serial console log end ----------
>
>> Considering that we just print out a useless hex number, not even a
>> symbol, and there's a big question mark whether this even makes sense
>> anyway, I suspect we should just remove it all. The real information
>> would have come later as part of "show_stack()", which seems to be
>> doing the proper try_get_task_stack().
>>
>> So I _think_ the fix is to just remove this. Perhaps something like
>> the attached? Adding scheduler people since this is in their code..
>
> That is not sufficient, for another Oops occurs inside stack_not_used().
> Since I don't want to break stack_not_used(), can we tolerate nested
> try_get_task_stack() usage and protect the whole sched_show_task()?
>
> ----------------------------------------
> >From 9cf83a0a8c48d281434b040694835743940a88b2 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa
> Date: Wed, 2 Nov 2016 19:31:07 +0900
> Subject: [PATCH] sched: Fix oops in sched_show_task()
>
> When CONFIG_VMAP_STACK=y, it is possible that an exited thread remains in

Nit: It's CONFIG_THREAD_INFO_IN_TASK=y that does this.

This patch looks fine to me. Linus, your patch also looks almost good
(I think the lines you deleted were spaced like that to preserve
output alignment, which may or may not matter), and maybe it would
make sense to apply both.