From: ebiederm@xmission.com (Eric W. Biederman)
To: Dave Jones
Cc: Alexey Dobriyan, Linus Torvalds, Al Viro, Linux Kernel,
    syzkaller-bugs@googlegroups.com, Gargi Sharma, Oleg Nesterov,
    Rik van Riel, Andrew Morton
Date: Thu, 21 Dec 2017 10:41:22 -0600
In-Reply-To: <20171221142535.GA17258@codemonkey.org.uk> (Dave Jones's message of "Thu, 21 Dec 2017 09:25:35 -0500")
Message-ID: <87vah0rq31.fsf@xmission.com>
Subject: Re: proc_flush_task oops
X-Mailing-List: linux-kernel@vger.kernel.org

Dave Jones writes:

> On Thu, Dec 21, 2017 at 12:38:12PM +0200, Alexey Dobriyan wrote:
> > On 12/21/17, Eric W. Biederman wrote:
> > > I have stared at this code, and written some test programs and I can't
> > > see what is going on.  alloc_pid by design and in implementation (as far
> > > as I can see) is always single threaded when allocating the first pid
> > > in a pid namespace.  idr_init always initialized idr_next to 0.
> > >
> > > So how we can get past:
> > >
> > >         if (unlikely(is_child_reaper(pid))) {
> > >                 if (pid_ns_prepare_proc(ns)) {
> > >                         disable_pid_allocation(ns);
> > >                         goto out_free;
> > >                 }
> > >         }
> > >
> > > with proc_mnt still set to NULL is a mystery to me.
> > >
> > > Is there any chance the idr code doesn't always return the lowest valid
> > > free number?  So init gets assigned something other than 1?
> >
> > Well, this theory is easy to test (attached).
>
> I'll give this a shot and report back when I get to the office.
>
> > There is a "valid" way to break the code via kernel.ns_last_pid:
> > unshare+write+fork but the reproducer doesn't seem to use it (or it does?)
>
> that sysctl is root only, so that isn't at play here.

The check is ns_capable(CAP_SYS_ADMIN), which will allow root in a user
namespace.  So the sysctl should be fuzzable.

Even so, the ns_last_pid sysctl is not in play here, because it changes
task_active_pid_ns (aka the pid namespace of the caller's pid), not
pid_ns_for_children.

Every time I think of a "valid" way to break the code, I double check
myself and find there are already checks in place to prevent that.

Eric