Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp2075773ybb; Thu, 2 Apr 2020 12:36:15 -0700 (PDT) X-Google-Smtp-Source: APiQypJg/Nk5oTA1tbMRt4KFBUu8l6L+dtNqEo7rJEErqwgureZ/AVqT83qNp7UapMwScBRyvbAN X-Received: by 2002:aca:4e47:: with SMTP id c68mr571759oib.16.1585856175170; Thu, 02 Apr 2020 12:36:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1585856175; cv=none; d=google.com; s=arc-20160816; b=qgLRC9xyAYxHqhckv7lBLTus1JJWHPMz7YFIJ9eU68LspI9o08DoDOOvlD/w854Gj7 AH3skhgMz6+HXaFLFmIX9p99tBF2JkLz6w6WYdi1iYyumRfR39piJJeEO5bHScR9E84L fpGKXr3p7OuhapLcwHIBEwDUG5PQAG4lg5F57DWhbkhhY2PI8TZazrwimUl+JTEOV5iN uiXhNyLpve1Z9wnJHrQGUtxmEClbBq36uwovpxDbArwBhMZpq+XNZlC+m6Gnxj6k+lw7 1gouPj7T0iMzQTuBw0pzs2Pk3pRTng3FEvD4xuX9Ak9KTmHq9KP0FYhIJyUeop14BIBy 0opQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=NAVPeSq9DvyCnnULUdYg1UKNlLeynSptkjCtMfU+YO4=; b=iYGSY4nLGVqnsoj0rjFV/DN/4hbcKPMJ+EjWZga2+BmBZRq6J1xxruZElLm87974M+ +y1skkPPxTMgKnGfJ5zLhpFzoZEsHsWsaG7eUwkoCY6ve+qZQZ9+uJ51u64fdrlsJFOV qb28ClIZkrBSeUayFvn2cICEex7oYfZLg0W+FGdZz0kIkwrrFf2MziqnH9v1uM2JMzRE zhTSvMGnHhuk4yBtkVn7Ejrn65/Vvu0uo80NIPgSxmggtWiPkvDlYQfszoz2VDRyemow PbsksuyjLB7NTDr7GKST1hQiiyHbHXwGocJFUKvupaS43VRDQzcltNTOUZtnyFkhZuyG G11A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linux-foundation.org header.s=google header.b=PDOYB19t; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l21si2811629otd.63.2020.04.02.12.36.01; Thu, 02 Apr 2020 12:36:15 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linux-foundation.org header.s=google header.b=PDOYB19t; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389892AbgDBTFA (ORCPT + 99 others); Thu, 2 Apr 2020 15:05:00 -0400 Received: from mail-lf1-f68.google.com ([209.85.167.68]:42641 "EHLO mail-lf1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731715AbgDBTFA (ORCPT ); Thu, 2 Apr 2020 15:05:00 -0400 Received: by mail-lf1-f68.google.com with SMTP id s13so3663859lfb.9 for ; Thu, 02 Apr 2020 12:04:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=NAVPeSq9DvyCnnULUdYg1UKNlLeynSptkjCtMfU+YO4=; b=PDOYB19tl0R/qSHJMsVSKCPIn+QbVP2dcFk7SuiZ5HDheZU7jxtchi028JYyuoufiT wDCyhOEjYd/BWubosz/o8ESqCvA/7Wy+YgBhiWqwe/rCw0PoB3whed+cGtUozSBoz1bz wdDyV6eJ2YIlByIyvRqqh7IuTKmEvT/pWJjRc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=NAVPeSq9DvyCnnULUdYg1UKNlLeynSptkjCtMfU+YO4=; b=TUdMIJZ9Dr1vY+UNPTWTHme7wN9E8a83vtJlRdf7fAwdyFna8lidUu6pK6pr02l8Xs Go/L15QUClT3fUoDfPfO7wgQ+A+HaQfHYx1gGMQ2X8i3goTV8D6/7RlJKZaZ62M3u01n pR9FaqGPY2QJJz9iQ8kcRyiEKLYWoerOmpBDGRu8mLJ0xhFtyI4MYiYqBzfd3V7NooAH csF8o2gF8UI8FULd48i4+TJGhDXfcvzRV4BkrJhT5X/l9i7Vzw7vvnB46nIEFof/NTKA o66QMs8N8Ih3MppGmuM2t9CvlYtlCPy6WMQTQOuBfuXU0qfhGgd+FLgg2rkleqssg7PO JjOw== X-Gm-Message-State: AGi0PuZejKlTZARWUEPl1lGPaPWj9FSsBmO4Uj2s3TFczh+D+ws/r3uH tqUs1VU9dX1/KHjGYNv0czVH2e+wGbA= X-Received: by 2002:a05:6512:3127:: with SMTP id p7mr3040761lfd.108.1585854296505; Thu, 02 Apr 2020 12:04:56 -0700 (PDT) Received: from mail-lj1-f181.google.com (mail-lj1-f181.google.com. [209.85.208.181]) by smtp.gmail.com with ESMTPSA id c4sm548019ljd.30.2020.04.02.12.04.54 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 02 Apr 2020 12:04:55 -0700 (PDT) Received: by mail-lj1-f181.google.com with SMTP id r7so4371558ljg.13 for ; Thu, 02 Apr 2020 12:04:54 -0700 (PDT) X-Received: by 2002:a2e:8911:: with SMTP id d17mr2888851lji.16.1585854294457; Thu, 02 Apr 2020 12:04:54 -0700 (PDT) MIME-Version: 1.0 References: <87blobnq02.fsf@x220.int.ebiederm.org> In-Reply-To: <87blobnq02.fsf@x220.int.ebiederm.org> From: Linus Torvalds Date: Thu, 2 Apr 2020 12:04:38 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1 To: "Eric W. Biederman" Cc: Linux Kernel Mailing List , Bernd Edlinger , Alexey Gladkov Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Apr 1, 2020 at 9:16 AM Eric W. Biederman wrote: > > The work on exec starts solving a long standing issue with exec that > it takes mutexes of blocking userspace applications, which makes exec > extremely deadlock prone. For the moment this adds a second mutex > with a narrower scope that handles all of the easy cases. Which > makes the tricky cases easy to spot. With a little luck the code to > solve those deadlocks will be ready by next merge window. So this worries me. I've pulled it, but I'm not entirely happy about some of it. For example, the "rationale" for some of the changes is This should be safe, as the credentials are only used for reading. which is just nonsensical. "Only used for reading" is immaterial, and there's no explanation for why that would matter at all. Most of the credentials are ever used for reading, and the worry about execve() is that the credentials can change, and people race with them and use the new 'suid' credentials and allow things that shouldn't be allowed. So the rationale makes no sense at all. Btw, if "this only takes it for reading" is such a big deal, why is that mutex not an rw-semaphore? The pidfd change at least has a rationale that makes sense: This should be safe, as the credentials do not change before exec_update_mutex is locked. Therefore whatever file access is possible with holding the cred_guard_mutex here is also possbile with the exec_update_mutex. so now you at least have an explanation of why that particular lock makes sense and works and is equivalent. It's still not a *great* explanation for why it would be equivalent, because cred_guard_mutex ends up not just guarding the write of the credentials, but makes it atomic wrt *other* data. That's the same problem as "I'm only reading". Locking is not about *one* value being consistent - if that was the case, then you could just do a "get refcount on the credentials, now I have a stable set of creds I can read forever". No lock needed. So locking is about *multiple* values being consistent, which is why that "I'm only reading" is not an explanation for why you can change the lock. It's also why that second one is questionable: it's a _better_ attempt at explaining things, but the point is really that cred_guard_mutex protects *other* things too. A real explanation would have absolutely *nothing* to do with "reading" or "same value of credentials". Both of those are entirely immaterial, since - as mentioned - you could just get a snapshot of the creds instead. A real explanation would be about how there is no other state that cred_guard_mutex protects that matters. See what I'm saying? This code is subtle as h*ll, and we've had bugs in it, and it has a series of tens of patches to fix them. But that also means that the explanations for the patches should take the subtleties into account, and not gloss over them with things like this. Ok, enough about the explanations. The actual _code_ is kind of odd too. For example, you have that "bprm->called_exec_mmap" flag to say "I've taken the exec_update_mutex, and need to drop it". But that flag is not set anywhere _near_ actually taking the lock. Sure, it is taken after exec_mmap() returns successfully, and that makes sense from a naming standpoint, but wouldn't it have been a _lot_ more obvious if you just set the flag when you took that lock, and instead of naming it by some magical code sequence, you named it for what it does? Again, this looks all technically correct, but it's written in a way that doesn't seem to make a lot of sense. Why is the code literally written with a magical assumption of "calling exec_mmap takes this lock, so if the flag named called_exec_mmap is set, I have to free that lock that is not named that at all". I hate conditional locking in the first place, but if it has to exist, then the conditional should be named after the lock, and the lock getting should be very very explicitly tied to it. Wouldn't it have been much clearer if you called that flag "exec_update_mutex_taken", and set it WHEN YOU TAKE IT? In fact, then you could drop the mutex_unlock(&tsk->signal->exec_update_mutex); in the error case of exec_mmap(), because now the error handling in free_bprm() would do the cleanup automatically. See what I'm saying? You've made the locking more complex and subtle than it needed to be. And since the whole point of the *new* lock is that it should replace an old lock that was really complex and subtle, that's a problem. Linus