Date: Mon, 6 Apr 2020 14:13:42 +0100
From: Will Deacon
To: Linus Torvalds
Cc: Waiman Long, "Eric W. Biederman", Ingo Molnar, Bernd Edlinger,
    Linux Kernel Mailing List, Alexey Gladkov, peterz@infradead.org
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1
Message-ID: <20200406131341.GA3178@willie-the-truck>

[+Peter]

On Fri, Apr 03, 2020 at 07:28:36PM -0700, Linus Torvalds wrote:
> On Fri, Apr 3, 2020 at 7:02 PM Waiman Long wrote:
> >
> > So in term of priority, my current thinking is
> >
> >     upgrading unfair reader > unfair reader > reader/writer
> >
> > A higher priority locker will block other lockers from acquiring the lock.
>
> An alternative option might be to have readers normally be 100% normal
> (ie with fairness wrt writers), and not really introduce any special
> "unfair reader" lock.
>
> Instead, all the unfairness would come into play only when the special
> case - execve() - does its special "lock for reading with intent to
> upgrade".
>
> But when it enters that kind of "intent to upgrade" lock state, it
> would not only block all subsequent writers, it would also guarantee
> that all other readers can continue to go.
>
> So then the new rwsem operations would be
>
>  - read_with_write_intent_lock_interruptible()
>
>    This is the beginning of "execve()", and waits for all writers to
>    exit, and puts the lock into "all readers can go" mode.
>
>    You could think of it as a "I'm queuing myself for a write lock,
>    but I'm allowing readers to go ahead" state.
>
>  - read_lock_to_write_upgrade()
>
>    This is the "now this turns into a regular write lock". It needs to
>    wait for all other readers to exit, of course.

... and at this point, subsequent readers queue behind the upgrader so we
can't run into the usual "stream of readers prevents forward progress"
issue, which was my initial worry when I started reading the thread.
Makes sense.

>  - read_with_write_intent_unlock()
>
>    This is the "I'm unqueuing myself, I aborted and will not become a
>    write lock after all" operation.
>
> NOTE! In this model, there may be multiple threads that do that
> initial queuing thing. We only guarantee that only one of them will
> get to the actual write lock stage, and the others will abort before
> that happens.

I do worry a bit about how much of this we can enforce, but I suppose I'll
wait for the patches. For example, it would be nice for
read_lock_to_write_upgrade() to return -EBUSY if there was a concurrent
(successful) upgrade rather than some pathological failure mode like
deadlock, but that feels like it might be a pain to do. It would probably
also be nice to scream if read_lock_to_write_upgrade() is called on a lock
where the upgrade *did* go ahead. Maybe some of this is food for lockdep.

That said, if this all ends up being spelled task_cred_*() then perhaps it
doesn't matter.

Will
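
For anyone trying to follow the state transitions discussed above, here is a
rough, single-threaded toy model of the three proposed operations. It is only
a sketch under loud assumptions: it is Python rather than kernel C, nothing
ever sleeps (a "try" call reports WOULD_BLOCK where the real lock would wait),
and the class name, the Status enum, the try_* method names and the exact
point at which EBUSY is returned are all invented for illustration. It is not
the rwsem implementation and not taken from any patch.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"                    # operation completed
    WOULD_BLOCK = "would_block"  # a real lock would sleep here
    EBUSY = "ebusy"              # another task already claimed the upgrade


class IntentRwsemModel:
    """Hypothetical, non-blocking model of the proposed lock states."""

    def __init__(self):
        self.readers = set()   # tasks holding a plain read lock
        self.intents = set()   # tasks holding "read with write intent"
        self.upgrader = None   # task that has started its upgrade
        self.writer = None     # task holding the write lock

    def try_read_lock(self, task):
        # Readers proceed in "intent" mode, but queue behind an upgrader.
        if self.writer is not None or self.upgrader is not None:
            return Status.WOULD_BLOCK
        self.readers.add(task)
        return Status.OK

    def read_unlock(self, task):
        self.readers.remove(task)

    def try_write_lock(self, task):
        # Ordinary writers are held off by readers and by intent holders.
        if self.writer is not None or self.upgrader is not None:
            return Status.WOULD_BLOCK
        if self.readers or self.intents:
            return Status.WOULD_BLOCK
        self.writer = task
        return Status.OK

    def write_unlock(self, task):
        assert self.writer == task
        self.writer = None

    def try_read_with_write_intent_lock(self, task):
        # Waits for writers only; several tasks may queue themselves here.
        if self.writer is not None or self.upgrader is not None:
            return Status.WOULD_BLOCK
        self.intents.add(task)
        return Status.OK

    def try_read_lock_to_write_upgrade(self, task):
        assert task in self.intents
        # Assumption: losing the race is reported as EBUSY, not deadlock.
        if self.upgrader not in (None, task):
            return Status.EBUSY
        self.upgrader = task
        # The winner must wait for plain readers and other intent holders.
        if self.readers or self.intents != {task}:
            return Status.WOULD_BLOCK
        self.intents.remove(task)
        self.upgrader = None
        self.writer = task
        return Status.OK

    def read_with_write_intent_unlock(self, task):
        # Abort path: drop the intent and release the upgrade slot if held.
        self.intents.remove(task)
        if self.upgrader == task:
            self.upgrader = None
```

In this model a second intent holder that calls
try_read_lock_to_write_upgrade() after the first has claimed the slot gets
EBUSY and has to back out via read_with_write_intent_unlock(); once it and any
plain readers have gone, the first caller's retry succeeds and it owns the
write lock. Whether the real thing can report the loser that cleanly is
exactly the open question.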