Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp3761146ybb; Mon, 6 Apr 2020 15:21:29 -0700 (PDT) X-Google-Smtp-Source: APiQypItJH3hRoQGmhQFMzfV4xrrL1K+N2TBmn4NA1vPmm3UXmrTEgFeXtaqijyttHxZhGn355iu X-Received: by 2002:a9d:2002:: with SMTP id n2mr20507217ota.127.1586211689516; Mon, 06 Apr 2020 15:21:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1586211689; cv=none; d=google.com; s=arc-20160816; b=vEFGipsIMDlyjTGg4HZlWio+ViBXj49KBUmbK4R4irBIxuvikCRHe2qruCydjol+Sl 4QMnV3ZFY8UrnH7eN9Wc7O1icE+ykztKx5N3ogDBB1ZMYFLiBE90GV3yN6jHKg0Zh5os D/XnZJ0NTFnvDidEMGNAVBlmwP0qo7A01K/oVgylGYOtVybo3U1xIREM96nRNIOosEEI 7YcxECvXNkcRru0YWE8bE/dGQbkPCQD4wxY3Z/PwYaohJCWzSZrMM86WbIkAQEAswXF2 AAH5iOZFM8WJvSZ/Euei44w8XPU8i5ekGx/pzzr/0f4vMw37i5ZYl3xxz8m/Xq2i5jic EGkw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:subject:mime-version:user-agent :message-id:in-reply-to:date:references:cc:to:from; bh=6elV+BjRpJBkaMTQhSzWyZwkuPInnbDognNmtYQqGkE=; b=yO3Ut+5fypxYoa81Uky1zQz3NCdYyUkU6jRWr2ahwfq9siKe4D4RM+BLTGxmLMuy+4 CDbl9yE6hmsyd/LUBH2z939otip7lgoh6Cz34lbUEZbuiZM5X6aWwVU/IwSwRvcCH2zJ IO7tGEl6JucPsLn/vgAXR2ryQaa1hIMwtenOJYnO09tVDXOkS1D9wVyJGJPFa/MXnsJh V/gNqJUVSg1mvM4OUiK/SmXmkeHxwOFN/CamkZYpZCdKAl4sDYXkY0ivtADx4RhutdxD Dd5egmOpYlspFk1joO4b48CPOcfdeN7t5X5Bvzoymc95QWBgYmEyYKQcNQzsPcxAomYF N9zw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=xmission.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y5si7918753oiy.150.2020.04.06.15.21.17; Mon, 06 Apr 2020 15:21:29 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=xmission.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726307AbgDFWUT (ORCPT + 99 others); Mon, 6 Apr 2020 18:20:19 -0400 Received: from out03.mta.xmission.com ([166.70.13.233]:35886 "EHLO out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725933AbgDFWUS (ORCPT ); Mon, 6 Apr 2020 18:20:18 -0400 Received: from in02.mta.xmission.com ([166.70.13.52]) by out03.mta.xmission.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1jLa6L-0007a2-Ah; Mon, 06 Apr 2020 16:20:17 -0600 Received: from ip68-227-160-95.om.om.cox.net ([68.227.160.95] helo=x220.xmission.com) by in02.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.87) (envelope-from ) id 1jLa6J-0004ZD-QZ; Mon, 06 Apr 2020 16:20:17 -0600 From: ebiederm@xmission.com (Eric W. Biederman) To: Linus Torvalds Cc: Waiman Long , Ingo Molnar , Will Deacon , Bernd Edlinger , Linux Kernel Mailing List , Alexey Gladkov References: <87blobnq02.fsf@x220.int.ebiederm.org> <87lfnda3w3.fsf@x220.int.ebiederm.org> Date: Mon, 06 Apr 2020 17:17:27 -0500 In-Reply-To: (Linus Torvalds's message of "Fri, 3 Apr 2020 12:26:51 -0700") Message-ID: <87blo45keg.fsf@x220.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1jLa6J-0004ZD-QZ;;;mid=<87blo45keg.fsf@x220.int.ebiederm.org>;;;hst=in02.mta.xmission.com;;;ip=68.227.160.95;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1/Hw6BC0R8d+nTg64OR2vW+KvY7sHH/xm8= X-SA-Exim-Connect-IP: 68.227.160.95 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on sa02.xmission.com X-Spam-Level: ** X-Spam-Status: No, score=2.1 required=8.0 tests=ALL_TRUSTED,BAYES_50, DCC_CHECK_NEGATIVE,T_TM2_M_HEADER_IN_MSG,T_TooManySym_01, XMSubMetaSxObfu_03,XMSubMetaSx_00 autolearn=disabled version=3.4.2 X-Spam-Virus: No X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa02 1397; Body=1 Fuz1=1 Fuz2=1] * 1.2 XMSubMetaSxObfu_03 Obfuscated Sexy Noun-People * 1.0 XMSubMetaSx_00 1+ Sexy Words * 0.0 T_TooManySym_01 4+ unique symbols in subject X-Spam-DCC: XMission; sa02 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: **;Linus Torvalds X-Spam-Relay-Country: X-Spam-Timing: total 1091 ms - load_scoreonly_sql: 0.04 (0.0%), signal_user_changed: 4.6 (0.4%), b_tie_ro: 3.0 (0.3%), parse: 1.34 (0.1%), extract_message_metadata: 16 (1.5%), get_uri_detail_list: 4.2 (0.4%), tests_pri_-1000: 19 (1.8%), tests_pri_-950: 1.00 (0.1%), tests_pri_-900: 0.81 (0.1%), tests_pri_-90: 222 (20.3%), check_bayes: 219 (20.1%), b_tokenize: 10 (0.9%), b_tok_get_all: 85 (7.8%), b_comp_prob: 3.9 (0.4%), b_tok_touch_all: 117 (10.7%), b_finish: 0.72 (0.1%), tests_pri_0: 813 (74.4%), check_dkim_signature: 0.45 (0.0%), check_dkim_adsp: 2.3 (0.2%), poll_dns_idle: 0.62 (0.1%), tests_pri_10: 2.8 (0.3%), tests_pri_500: 8 (0.7%), rewrite_mail: 0.00 (0.0%) Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1 X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Linus Torvalds writes: > [ For Waiman & co - the problem is that the current cred_guard_mutex > is horrendous and has problems with execve() deadlocking against > various users. We've had this bug before, there's a new one, it's just > nasty ] > > On Thu, Apr 2, 2020 at 4:04 PM Eric W. Biederman wrote: >> >> That is not the direction I intend to take either. >> >> I was hoping I could put off replying to this thread for a bit because >> I only managed to get 4 hours of sleep last night and I am not as alert >> to technical details as I would like to be. > > Hmm.. So I've been looking at this cred_guard_mutex, and I wonder... > > This is a bit hand-wavy, because I haven't walker through all the > paths, but could we perhaps work around a lot of the problems a > different way., namely: > > - make the "cred_guard_mutex" an rwsem-like thing instead of being a mutex. > > - make the ptrace_attach() case get it for writing - not because > ptrace changes the creds, but because ptrace changes 'task->ptrace' > and depends on dumpability etc. > > - change the *name* of that damn thing. Not because it's now > rwsem'ish rather than a mutex, but because it was never really about > just "creds". It was about creds+ptrace+dumpable flags etc. > > - make all the ones that read the creds to just take it for reading > (IOW, the cases that were basically switched over to > exec_update_mutex). > > - and finally: make "execve()" take it just for reading too, but > introduce a "upgrade to write" at the very end (when it actually is > all done and then finally changes the creds and dumpability) > > Wouldn't that solve all problems? We wouldn't get deadlocks wrt > execve(), simply because execve() doesn't need it to be writable, and > the things execve() does and can deadlock all only want readability. > > But hear me out, because the above is fundamentally broken in a couple > of ways, so let me address that brokenness before you tell me I'm a > complete nincompoop and an idiot. > > I'm including some locking people here because of these issues, so > that they can maybe verify my thinking. > > (a) our rwsem's are fair > > So the whole "execve takes it for reading, so now others can take > it for reading too without deadlocks" is simply not true - if you use > the existing rwsem. > > Because a concurrent (blocked) writer will then block other > readers for fairness reasons, and holding it for reading doesn't > guarantee that others can get it for reading. > > So clearly, the above doesn't even *fix* the deadlocks - unless > we have an unfair mode (or just a special lock for just this that is > not our standard rwsem, but a special unfair one). > > So I'm suggesting we use a special unfair rwsem here (we can make > a simple spinlock-based one - it doesn't need to be as clever or > optimized as the real rwsems are) > > (b) similarly, our rwsem's don't actually have a "upgrade from read > to write", because that's also a fundamentally deadlocky operation. > > Again, that's true. Except execve() is special, and we know > there's only _one_ execve() at a time that will complete, since we're > serializing them. So for this particular use, "upgrade to write" would > be possible without the general-case deadlock issues. > > (c) I didn't think things through, and even with these special > semantics, my idea is complete garbage > > Ok, this may well be true. > > Anyway, the advantage of this (if it works) is that it would allow us > to go back to the _really_ simple original model of just taking this > lock for reading at the beginning of execve(), and not worrying so > much about complex nesting or very complex rules for exactly when we > got the lock and error handling. > > The final part when we actually update the credentials and dumpability > and stuff in execve() is actually fairly simple. So the "upgrade to a > write lock" phase doesn't worry me too much. It's the interaction > with all the previous parts (which happen with it held just for > reading) that tend to be the nastier ones. > > And ptrace_attach() really is special, and I think it would be the > only one that really needs that write lock. > > The disadvantage, of course, is that it would require that > special-case lock semantic, and I might also be missing some thing > that makes it not work anyway. > > Comments? Am I just dreaming of a simpler model without my medications > again? Withough reading everything through at least. * There is also security_setprocattr which needs ptrace and nnp state not to change it needs to set something that at least selinux's cred calculations needs to remain constant (like nnp and ptrace). Which means one thread calling security_setprocattr and another thread calling exec can deadlock in de_thread. * Even with your lock and just the ptrace case I can deadlock. Ptracer: Thread A Tread B ptrace_attach A exec ptrace_attach B uprade R to RW ---------------------- DEADLOCKED ------------------------- Those are the first two cases I have thought of. There are probably more. But fundamentally the only reason we need this information stable before the point of no return is so that we can return a nice error code to the process calling exec. Instead of terminating the process with SIGSEGV. These are for the most part unlikely scenarios or people would have been complaining much more loudly about deadlock. So my plan is to perform the relevant calculations effectively twice. Once just before the point of no return, and give a graceful return code if necessary and possible. Once just afer the point of no return, and SIGSEGV if necessary. Of course this all only applies to LSMs that refuse to continue under NNP or ptrace without changing the cred. Linux without those LSMs enabled will just continue with the original credentials. So I don't think we will noticably be sacraficing the quality of the user experience with my plan. In the worst case a deadlock will become a SIGSEGV killing the execing program. Eric p.s. Yes we can do better than a mutex that makes everything mutually exclusive. I am just starting there for simplicity, and to see if we need anything better. Unfortuantely too many things are changing simultaneously for rcu to cover all of the read side cases.