Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935470AbdLRVop (ORCPT ); Mon, 18 Dec 2017 16:44:45 -0500 Received: from scorn.kernelslacker.org ([45.56.101.199]:44006 "EHLO scorn.kernelslacker.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933574AbdLRVoj (ORCPT ); Mon, 18 Dec 2017 16:44:39 -0500 Date: Mon, 18 Dec 2017 16:44:38 -0500 From: Dave Jones To: Linux Kernel Cc: Linus Torvalds , Al Viro Subject: proc_flush_task oops Message-ID: <20171218214438.GA32728@codemonkey.org.uk> Mail-Followup-To: Dave Jones , Linux Kernel , Linus Torvalds , Al Viro MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.9.2 (2017-12-15) X-Spam-Note: SpamAssassin invocation failed Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4360 Lines: 107 I've hit this twice today. It's odd, because afaics, none of this code has really changed in a long time. Dave Oops: 0000 [#1] SMP CPU: 2 PID: 6743 Comm: trinity-c117 Not tainted 4.15.0-rc4-think+ #2 RIP: 0010:proc_flush_task+0x8e/0x1b0 RSP: 0018:ffffc9000bbffc40 EFLAGS: 00010286 RAX: 0000000000000001 RBX: 0000000000000001 RCX: 00000000fffffffb RDX: 0000000000000000 RSI: ffffc9000bbffc50 RDI: 0000000000000000 RBP: ffffc9000bbffc63 R08: 0000000000000000 R09: 0000000000000002 R10: ffffc9000bbffb70 R11: ffffc9000bbffc64 R12: 0000000000000003 R13: 0000000000000000 R14: 0000000000000003 R15: ffff8804c10d7840 FS: 00007f7cb8965700(0000) GS:ffff88050a200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000003e21ae003 CR4: 00000000001606e0 DR0: 00007fb1d6c22000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Call Trace: ? release_task+0xaf/0x680 release_task+0xd2/0x680 ? wait_consider_task+0xb82/0xce0 wait_consider_task+0xbe9/0xce0 ? do_wait+0xe1/0x330 do_wait+0x151/0x330 kernel_wait4+0x8d/0x150 ? task_stopped_code+0x50/0x50 SYSC_wait4+0x95/0xa0 ? rcu_read_lock_sched_held+0x6c/0x80 ? syscall_trace_enter+0x2d7/0x340 ? do_syscall_64+0x60/0x210 do_syscall_64+0x60/0x210 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f7cb82603aa RSP: 002b:00007ffd60770bc8 EFLAGS: 00000246 ORIG_RAX: 000000000000003d RAX: ffffffffffffffda RBX: 00007f7cb6cd4000 RCX: 00007f7cb82603aa RDX: 000000000000000b RSI: 00007ffd60770bd0 RDI: 0000000000007cca RBP: 0000000000007cca R08: 00007f7cb8965700 R09: 00007ffd607c7080 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffd60770bd0 R14: 00007f7cb6cd4058 R15: 00000000cccccccd Code: c1 e2 04 44 8b 60 30 48 8b 40 38 44 8b 34 11 48 c7 c2 60 3a f5 81 44 89 e1 4c 8b 68 58 e8 4b b4 77 00 89 44 24 14 48 8d 74 24 10 <49> 8b 7d 00 e8 b9 6a f9 ff 48 85 c0 74 1a 48 89 c7 48 89 44 24 RIP: proc_flush_task+0x8e/0x1b0 RSP: ffffc9000bbffc40 CR2: 0000000000000000 ---[ end trace 53d67a6481059862 ]--- All code ======== 0: c1 e2 04 shl $0x4,%edx 3: 44 8b 60 30 mov 0x30(%rax),%r12d 7: 48 8b 40 38 mov 0x38(%rax),%rax b: 44 8b 34 11 mov (%rcx,%rdx,1),%r14d f: 48 c7 c2 60 3a f5 81 mov $0xffffffff81f53a60,%rdx 16: 44 89 e1 mov %r12d,%ecx 19: 4c 8b 68 58 mov 0x58(%rax),%r13 1d: e8 4b b4 77 00 callq 0x77b46d 22: 89 44 24 14 mov %eax,0x14(%rsp) 26: 48 8d 74 24 10 lea 0x10(%rsp),%rsi 2b:* 49 8b 7d 00 mov 0x0(%r13),%rdi <-- trapping instruction 2f: e8 b9 6a f9 ff callq 0xfffffffffff96aed 34: 48 85 c0 test %rax,%rax 37: 74 1a je 0x53 39: 48 89 c7 mov %rax,%rdi 3c: 48 rex.W 3d: 89 .byte 0x89 3e: 44 rex.R 3f: 24 .byte 0x24 Code starting with the faulting instruction =========================================== 0: 49 8b 7d 00 mov 0x0(%r13),%rdi 4: e8 b9 6a f9 ff callq 0xfffffffffff96ac2 9: 48 85 c0 test %rax,%rax c: 74 1a je 0x28 e: 48 89 c7 mov %rax,%rdi 11: 48 rex.W 12: 89 .byte 0x89 13: 44 rex.R 14: 24 .byte 0x24 This looks like an inlined part of proc_flush_task_mnt dentry = d_hash_and_lookup(mnt->mnt_root, &name); 4f99: 48 8d 74 24 10 lea 0x10(%rsp),%rsi 4f9e: 49 8b 7d 00 mov 0x0(%r13),%rdi 4fa2: e8 00 00 00 00 callq 4fa7 So it looks like this.. 3097 for (i = 0; i <= pid->level; i++) { 3098 upid = &pid->numbers[i]; 3099 proc_flush_task_mnt(upid->ns->proc_mnt, upid->nr, 3100 tgid->numbers[i].nr); 3101 } somehow passed a null upid->ns->proc_mnt down there. I'll try and narrow down a reproducer tomorrow. Any obvious recent changes that might explain this, or did I just finally appease the entropy gods enough to find the right combination of args to hit this ? Dave