From: ebiederm@xmission.com (Eric W. Biederman)
To: Dave Jones
Cc: Linus Torvalds, Al Viro, Linux Kernel, syzkaller-bugs@googlegroups.com, Gargi Sharma, Alexey Dobriyan
References: <20171218214438.GA32728@codemonkey.org.uk>
	<20171218221541.GP21978@ZenIV.linux.org.uk>
	<20171218231013.GA9481@codemonkey.org.uk>
	<20171219033926.GA26981@codemonkey.org.uk>
	<87lghy7eul.fsf@xmission.com>
	<20171219193020.GA9237@codemonkey.org.uk>
	<878tdy5r5t.fsf@xmission.com>
	<87mv2e17vz.fsf@xmission.com>
	<20171220052803.GA17079@codemonkey.org.uk>
Date: Wed, 20 Dec 2017 12:25:52 -0600
In-Reply-To: <20171220052803.GA17079@codemonkey.org.uk> (Dave Jones's message of "Wed, 20 Dec 2017 00:28:03 -0500")
Message-ID: <871sjp1cjz.fsf@xmission.com>
Subject: Re: proc_flush_task oops
X-Mailing-List: linux-kernel@vger.kernel.org

Dave Jones writes:

> On Tue, Dec 19, 2017 at 07:54:24PM -0600, Eric W. Biederman wrote:
>  >
>  > *Scratches my head*  I am not seeing anything obvious.
>  >
>  > Can you try this patch as you reproduce this issue?
>  >
>  > diff --git a/kernel/pid.c b/kernel/pid.c
>  > index b13b624e2c49..df9e5d4d8f83 100644
>  > --- a/kernel/pid.c
>  > +++ b/kernel/pid.c
>  > @@ -210,6 +210,7 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>  >                 goto out_unlock;
>  >         for ( ; upid >= pid->numbers; --upid) {
>  >                 /* Make the PID visible to find_pid_ns. */
>  > +               WARN_ON(!upid->ns->proc_mnt);
>  >                 idr_replace(&upid->ns->idr, pid, upid->nr);
>  >                 upid->ns->pid_allocated++;
>  >         }
>  >
>  >
>  > If the warning triggers it means the bug is in alloc_pid and somehow
>  > something has gotten past the is_child_reaper check.
>
> You're onto something.
>
> WARNING: CPU: 1 PID: 12020 at kernel/pid.c:213 alloc_pid+0x230/0x280
> CPU: 1 PID: 12020 Comm: trinity-c29 Not tainted 4.15.0-rc4-think+ #3
> RIP: 0010:alloc_pid+0x230/0x280
> RSP: 0018:ffffc90009977d48 EFLAGS: 00010046
> RAX: 0000000000000030 RBX: ffff8804fb431280 RCX: 8f5c28f5c28f5c29
> RDX: ffff88050a00de40 RSI: ffffffff82005218 RDI: ffff8804fc6aa9a8
> RBP: ffff8804fb431270 R08: 0000000000000000 R09: 0000000000000001
> R10: ffffc90009977cc0 R11: eab94e31da7171b7 R12: ffff8804fb431260
> R13: ffff8804fb431240 R14: ffffffff82005200 R15: ffff8804fb431268
> FS:  00007f49b9065700(0000) GS:ffff88050a000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f49b906a000 CR3: 00000004f7446001 CR4: 00000000001606e0
> DR0: 00007f0b4c405000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> Call Trace:
>  copy_process.part.41+0x14fa/0x1e30
>  _do_fork+0xe7/0x720
>  ? rcu_read_lock_sched_held+0x6c/0x80
>  ? syscall_trace_enter+0x2d7/0x340
>  do_syscall_64+0x60/0x210
>  entry_SYSCALL64_slow_path+0x25/0x25
>
> followed immediately by...
>
> Oops: 0000 [#1] SMP
> CPU: 1 PID: 12020 Comm: trinity-c29 Tainted: G        W        4.15.0-rc4-think+ #3
> RIP: 0010:proc_flush_task+0x8e/0x1b0
> RSP: 0018:ffffc90009977c40 EFLAGS: 00010286
> RAX: 0000000000000001 RBX: 0000000000000001 RCX: 00000000fffffffb
> RDX: 0000000000000000 RSI: ffffc90009977c50 RDI: 0000000000000000
> RBP: ffffc90009977c63 R08: 0000000000000000 R09: 0000000000000002
> R10: ffffc90009977b70 R11: ffffc90009977c64 R12: 0000000000000004
> R13: 0000000000000000 R14: 0000000000000004 R15: ffff8804fb431240
> FS:  00007f49b9065700(0000) GS:ffff88050a000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 00000004f7446001 CR4: 00000000001606e0
> DR0: 00007f0b4c405000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> Call Trace:
>  ? release_task+0xaf/0x680
>  release_task+0xd2/0x680
>  ? wait_consider_task+0xb82/0xce0
>  wait_consider_task+0xbe9/0xce0
>  ? do_wait+0xe1/0x330
>  do_wait+0x151/0x330
>  kernel_wait4+0x8d/0x150
>  ? task_stopped_code+0x50/0x50
>  SYSC_wait4+0x95/0xa0
>  ? rcu_read_lock_sched_held+0x6c/0x80
>  ? syscall_trace_enter+0x2d7/0x340
>  ? do_syscall_64+0x60/0x210
>  do_syscall_64+0x60/0x210
>  entry_SYSCALL64_slow_path+0x25/0x25

I am not seeing where things go wrong, but that puts the recent
conversion of the pid bitmap and pid hash to the IDR in the suspect zone.

Can you try reverting that change:

e8cfbc245e24 ("pid: remove pidhash")
95846ecf9dac ("pid: replace pid bitmap implementation with IDR API")

while keeping the warning in place, so we can see if this fixes the
allocation problem?

Eric
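
For anyone following along, here is a rough userspace sketch of the condition the WARN_ON in the debugging patch tests: while alloc_pid publishes the pid at each namespace level, every namespace involved should already have a proc mount. The toy_* structures, TOY_MAX_LEVELS and count_missing_proc_mnt() are made up for illustration and are not the kernel definitions; only the field names proc_mnt, ns, nr, level and numbers are borrowed from the patch context. The walk uses an index rather than the kernel's pointer decrement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
#define TOY_MAX_LEVELS 4

struct toy_pid_namespace {
	void *proc_mnt;			/* NULL until proc is set up for this namespace */
};

struct toy_upid {
	int nr;				/* pid number as seen in this namespace */
	struct toy_pid_namespace *ns;
};

struct toy_pid {
	unsigned int level;		/* index of the deepest namespace */
	struct toy_upid numbers[TOY_MAX_LEVELS];
};

/*
 * Walk from the deepest namespace level up to level 0, in the same order
 * as the loop in the patch, and count the levels where the patch's
 * WARN_ON(!upid->ns->proc_mnt) would have fired.
 */
int count_missing_proc_mnt(const struct toy_pid *pid)
{
	int missing = 0;

	assert(pid->level < TOY_MAX_LEVELS);
	for (int i = (int)pid->level; i >= 0; --i) {
		if (pid->numbers[i].ns->proc_mnt == NULL)
			missing++;
	}
	return missing;
}
```

A nonzero count corresponds to the situation in the report above: a pid becoming visible in a namespace that has no proc mount, which would be consistent with the later NULL dereference in proc_flush_task when that task is reaped.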