Date: Tue, 12 Jan 2021 18:51:04 -0500
From: Jerome Glisse
To: Linus Torvalds
Cc: John Hubbard, Andrea Arcangeli, Andrew Morton, Linux-MM, Linux Kernel Mailing List, Yu Zhao, Andy Lutomirski, Peter Xu, Pavel Emelyanov, Mike Kravetz, Mike Rapoport, Minchan Kim, Will Deacon, Peter Zijlstra, Hugh Dickins, "Kirill A. Shutemov", Matthew Wilcox, Oleg Nesterov, Jann Horn, Kees Cook, Leon Romanovsky, Jason Gunthorpe, Jan Kara, Kirill Tkhai, Nadav Amit, Jens Axboe
Subject: Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse
Message-ID: <20210112235104.GA490399@redhat.com>
References: <20210110004435.26382-1-aarcange@redhat.com> <45806a5a-65c2-67ce-fc92-dc8c2144d766@nvidia.com>

On Mon, Jan 11, 2021 at 02:18:13PM -0800, Linus Torvalds wrote:
> On Mon, Jan 11, 2021 at 11:19 AM Linus Torvalds wrote:
> >
> > On Sun, Jan 10, 2021 at 11:27 PM John Hubbard wrote:
> > > IMHO, a lot of the bits in page _refcount are still being wasted (even
> > > after GUP_PIN_COUNTING_BIAS overloading), because it's unlikely that
> > > there are many callers of gup/pup per page.
> >
> > It may be unlikely under real loads.
> >
> > But we've actually had overflow issues on this because rather than
> > real loads you can do attack loads (ie "lots of processes, lots of
> > pipe file descriptors, lots of vmsplice() operations on the same
> > page".
> >
> > We had to literally add that conditional "try_get_page()" that
> > protects against overflow..
>
> Actually, what I think might be a better model is to actually
> strengthen the rules even more, and get rid of GUP_PIN_COUNTING_BIAS
> entirely.
>
> What we could do is just make a few clear rules explicit (most of
> which we already basically hold to). Starting from that basic
>
>  (a) Anonymous pages are made writable (ie COW) only when they have a
> page_count() of 1
>
> That very simple rule then automatically results in the corollary
>
>  (b) a writable page in a COW mapping always starts out reachable
> _only_ from the page tables
>
> and now we could have a couple of really simple new rules:
>
>  (c) we never ever make a writable page in a COW mapping read-only
> _unless_ it has a page_count() of 1

This breaks mprotect(PROT_READ), and I do not think we want to do that.
It might break security schemes in user space applications that expect
mprotect() to make the CPU mapping read-only.

Maybe an alternative would be to copy the page on mprotect() for pages
that do not have a page_count() of 1? But that makes me uneasy when it
comes to short-lived GUP (direct IO racing with an mprotect(), or maybe
even simply page migration) versus unbounded GUP (like RDMA).

Also, I want to make sure I properly understand what happens on fork()
in a COW mapping for a page that has a page_count() > 1. Do we copy the
page instead of write-protecting it?

I believe a better approach here would be to write-protect the page on
the CPU but forbid the child from reusing it, i.e. if the child ever
inherits the page (because the parent unmapped it, for instance), it
will have to make a copy, and the GUP reference (taken before the fork)
might linger on a page that is no longer associated with any VM. This
way we keep fork fast.

Jérôme