Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756973AbdLXDQf (ORCPT ); Sat, 23 Dec 2017 22:16:35 -0500 Received: from out01.mta.xmission.com ([166.70.13.231]:39919 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750876AbdLXDQd (ORCPT ); Sat, 23 Dec 2017 22:16:33 -0500 From: ebiederm@xmission.com (Eric W. Biederman) To: Alexey Dobriyan Cc: Dave Jones , Linus Torvalds , Al Viro , Linux Kernel , syzkaller-bugs@googlegroups.com, Gargi Sharma , Oleg Nesterov , Rik van Riel , Andrew Morton References: <871sjp1cjz.fsf@xmission.com> <20171221031606.GA4636@codemonkey.org.uk> <87po78trjm.fsf@xmission.com> <20171221220044.GA4977@codemonkey.org.uk> <87wp1fk0pd.fsf@xmission.com> <20171222033500.GA17273@codemonkey.org.uk> <87y3lvi480.fsf@xmission.com> <87lghuesel.fsf@xmission.com> <20171222161123.GA2632@avx2> <87r2rkddkh.fsf@xmission.com> Date: Sat, 23 Dec 2017 21:16:03 -0600 In-Reply-To: <87r2rkddkh.fsf@xmission.com> (Eric W. Biederman's message of "Sat, 23 Dec 2017 21:12:14 -0600") Message-ID: <87efnkdde4.fsf_-_@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1eSwm7-0004Ae-OO;;;mid=<87efnkdde4.fsf_-_@xmission.com>;;;hst=in01.mta.xmission.com;;;ip=67.3.133.177;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1/PuDsRPvOm3/WFP5wT0KiSgvxaIJJ8QoE= X-SA-Exim-Connect-IP: 67.3.133.177 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.7 XMSubLong Long Subject * 0.0 TVD_RCVD_IP Message was received from an IP address * 1.2 LotsOfNums_01 BODY: Lots of long strings of numbers * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa07 1397; Body=1 Fuz1=1 Fuz2=1] X-Spam-DCC: XMission; sa07 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: *;Alexey Dobriyan X-Spam-Relay-Country: X-Spam-Timing: total 1046 ms - load_scoreonly_sql: 0.03 (0.0%), signal_user_changed: 2.4 (0.2%), b_tie_ro: 1.64 (0.2%), parse: 0.81 (0.1%), extract_message_metadata: 13 (1.3%), get_uri_detail_list: 2.3 (0.2%), tests_pri_-1000: 7 (0.6%), tests_pri_-950: 1.17 (0.1%), tests_pri_-900: 0.97 (0.1%), tests_pri_-400: 29 (2.8%), check_bayes: 28 (2.7%), b_tokenize: 10 (0.9%), b_tok_get_all: 9 (0.9%), b_comp_prob: 2.8 (0.3%), b_tok_touch_all: 4.9 (0.5%), b_finish: 0.56 (0.1%), tests_pri_0: 455 (43.5%), check_dkim_signature: 0.50 (0.0%), check_dkim_adsp: 4.4 (0.4%), tests_pri_500: 534 (51.1%), poll_dns_idle: 530 (50.7%), rewrite_mail: 0.00 (0.0%) Subject: [PATCH] pid: Handle failure to allocate the first pid in a pid namespace X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4038 Lines: 101 With the replacement of the pid bitmap and hashtable with an idr in alloc_pid started occassionally failing when allocating the first pid in a pid namespace. Things were not completely reset resulting in the first allocated pid getting the number 2 (not 1). Which further resulted in ns->proc_mnt not getting set and eventually causing an oops in proc_flush_task. Oops: 0000 [#1] SMP CPU: 2 PID: 6743 Comm: trinity-c117 Not tainted 4.15.0-rc4-think+ #2 RIP: 0010:proc_flush_task+0x8e/0x1b0 RSP: 0018:ffffc9000bbffc40 EFLAGS: 00010286 RAX: 0000000000000001 RBX: 0000000000000001 RCX: 00000000fffffffb RDX: 0000000000000000 RSI: ffffc9000bbffc50 RDI: 0000000000000000 RBP: ffffc9000bbffc63 R08: 0000000000000000 R09: 0000000000000002 R10: ffffc9000bbffb70 R11: ffffc9000bbffc64 R12: 0000000000000003 R13: 0000000000000000 R14: 0000000000000003 R15: ffff8804c10d7840 FS: 00007f7cb8965700(0000) GS:ffff88050a200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000003e21ae003 CR4: 00000000001606e0 DR0: 00007fb1d6c22000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Call Trace: ? release_task+0xaf/0x680 release_task+0xd2/0x680 ? wait_consider_task+0xb82/0xce0 wait_consider_task+0xbe9/0xce0 ? do_wait+0xe1/0x330 do_wait+0x151/0x330 kernel_wait4+0x8d/0x150 ? task_stopped_code+0x50/0x50 SYSC_wait4+0x95/0xa0 ? rcu_read_lock_sched_held+0x6c/0x80 ? syscall_trace_enter+0x2d7/0x340 ? do_syscall_64+0x60/0x210 do_syscall_64+0x60/0x210 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f7cb82603aa RSP: 002b:00007ffd60770bc8 EFLAGS: 00000246 ORIG_RAX: 000000000000003d RAX: ffffffffffffffda RBX: 00007f7cb6cd4000 RCX: 00007f7cb82603aa RDX: 000000000000000b RSI: 00007ffd60770bd0 RDI: 0000000000007cca RBP: 0000000000007cca R08: 00007f7cb8965700 R09: 00007ffd607c7080 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffd60770bd0 R14: 00007f7cb6cd4058 R15: 00000000cccccccd Code: c1 e2 04 44 8b 60 30 48 8b 40 38 44 8b 34 11 48 c7 c2 60 3a f5 81 44 89 e1 4c 8b 68 58 e8 4b b4 77 00 89 44 24 14 48 8d 74 24 10 <49> 8b 7d 00 e8 b9 6a f9 ff 48 85 c0 74 1a 48 89 c7 48 89 44 24 RIP: proc_flush_task+0x8e/0x1b0 RSP: ffffc9000bbffc40 CR2: 0000000000000000 ---[ end trace 53d67a6481059862 ]--- Improve the quality of the implementation by resetting the place to start allocating pids on failure to allocate the first pid. As improving the quality of the implementation is the goal remove the now unnecesarry disable_pid_allocations call when we fail to mount proc. Fixes: 95846ecf9dac ("pid: replace pid bitmap implementation with IDR API") Fixes: 8ef047aaaeb8 ("pid namespaces: make alloc_pid(), free_pid() and put_pid() work with struct upid") Reported-by: Dave Jones Signed-off-by: "Eric W. Biederman" --- I have tested this by forcing an allocation failure after allocating pid 1, and the symptoms are identical to what Dave Jones has observed. I have further confirmed that this change causes the failure to go away. If no one objects I will send a pull request to Linus in the next day or two. kernel/pid.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/kernel/pid.c b/kernel/pid.c index b13b624e2c49..1e8bb6550ec4 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -193,10 +193,8 @@ struct pid *alloc_pid(struct pid_namespace *ns) } if (unlikely(is_child_reaper(pid))) { - if (pid_ns_prepare_proc(ns)) { - disable_pid_allocation(ns); + if (pid_ns_prepare_proc(ns)) goto out_free; - } } get_pid_ns(ns); @@ -226,6 +224,10 @@ struct pid *alloc_pid(struct pid_namespace *ns) while (++i <= ns->level) idr_remove(&ns->idr, (pid->numbers + i)->nr); + /* On failure to allocate the first pid, reset the state */ + if (ns->pid_allocated == PIDNS_ADDING) + idr_set_cursor(&ns->idr, 0); + spin_unlock_irq(&pidmap_lock); kmem_cache_free(ns->pid_cachep, pid); -- 2.14.1