Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756305AbdLXDMt (ORCPT ); Sat, 23 Dec 2017 22:12:49 -0500 Received: from out01.mta.xmission.com ([166.70.13.231]:39565 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750805AbdLXDMq (ORCPT ); Sat, 23 Dec 2017 22:12:46 -0500 From: ebiederm@xmission.com (Eric W. Biederman) To: Alexey Dobriyan Cc: Dave Jones , Linus Torvalds , Al Viro , Linux Kernel , syzkaller-bugs@googlegroups.com, Gargi Sharma , Oleg Nesterov , Rik van Riel , Andrew Morton References: <871sjp1cjz.fsf@xmission.com> <20171221031606.GA4636@codemonkey.org.uk> <87po78trjm.fsf@xmission.com> <20171221220044.GA4977@codemonkey.org.uk> <87wp1fk0pd.fsf@xmission.com> <20171222033500.GA17273@codemonkey.org.uk> <87y3lvi480.fsf@xmission.com> <87lghuesel.fsf@xmission.com> <20171222161123.GA2632@avx2> Date: Sat, 23 Dec 2017 21:12:14 -0600 In-Reply-To: <20171222161123.GA2632@avx2> (Alexey Dobriyan's message of "Fri, 22 Dec 2017 19:11:23 +0300") Message-ID: <87r2rkddkh.fsf@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1eSwiR-0000Vy-2q;;;mid=<87r2rkddkh.fsf@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=67.3.133.177;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1/PFWQipX4xclGbcvpp0/KsfrygDwEqeTg= X-SA-Exim-Connect-IP: 67.3.133.177 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.0 TVD_RCVD_IP Message was received from an IP address * 0.7 XMSubLong Long Subject * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.4994] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa05 1397; Body=1 Fuz1=1 Fuz2=1] * 0.0 T_TooManySym_01 4+ unique symbols in subject * 0.0 T_TooManySym_02 5+ unique symbols in subject X-Spam-DCC: XMission; sa05 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: ;Alexey Dobriyan X-Spam-Relay-Country: X-Spam-Timing: total 397 ms - load_scoreonly_sql: 0.07 (0.0%), signal_user_changed: 4.3 (1.1%), b_tie_ro: 2.8 (0.7%), parse: 1.49 (0.4%), extract_message_metadata: 5 (1.3%), get_uri_detail_list: 2.2 (0.5%), tests_pri_-1000: 6 (1.5%), tests_pri_-950: 1.71 (0.4%), tests_pri_-900: 1.43 (0.4%), tests_pri_-400: 27 (6.8%), check_bayes: 26 (6.5%), b_tokenize: 9 (2.2%), b_tok_get_all: 9 (2.2%), b_comp_prob: 2.8 (0.7%), b_tok_touch_all: 3.2 (0.8%), b_finish: 0.77 (0.2%), tests_pri_0: 332 (83.7%), check_dkim_signature: 0.69 (0.2%), check_dkim_adsp: 4.5 (1.1%), tests_pri_500: 4.0 (1.0%), rewrite_mail: 0.00 (0.0%) Subject: Re: [TEST PATCH] pid: fix allocating pid 2 for init (was Re: proc_flush_task oops) X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1745 Lines: 55 Alexey Dobriyan writes: > On Fri, Dec 22, 2017 at 08:41:54AM -0600, Eric W. Biederman wrote: >> Alexey Dobriyan writes: > >> > unshare >> > fork >> > alloc_pid in level 1 succeeds >> > alloc_pid in level 0 fails, ->idr_next is 2 >> > fork >> > alloc pid 2 >> > exit >> > >> > Reliable reproducer and fail injection patch attached >> > >> > I'd say proper fix is allocating pids in the opposite order >> > so that failure in the last layer doesn't move IDR cursor >> > in baby child pidns. >> >> I agree with you about changing the order. That will make >> the code simpler and in the second loop actually conforming C code, >> and fix the immediate problem. > > Something like that (barely tested) I have thought about this some more and I think we can do better. I don't like the on stack pid_ns array. The only reason the code calls disable_pid_allocation is that we don't handle this error. The semantics of least surprise, are that if we run out of resources while allocating something, trying again when more resources are available will make it work. So it looks like handling the error will improve the quality of the implemenation, and be a simpler, less dangerous patch. diff --git a/kernel/pid.c b/kernel/pid.c --- a/kernel/pid.c +++ b/kernel/pid.c @@ -226,6 +224,10 @@ struct pid *alloc_pid(struct pid_namespace *ns) while (++i <= ns->level) idr_remove(&ns->idr, (pid->numbers + i)->nr); + /* On failure to allocate the first pid, reset the state */ + if (ns->pid_allocated == PIDNS_ADDING) + idr_set_cursor(&ns->idr, 0); + spin_unlock_irq(&pidmap_lock); kmem_cache_free(ns->pid_cachep, pid); Eric