Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp641465pxk; Thu, 17 Sep 2020 12:05:38 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxD+ngrOxU6PMDYzscIorFTAlnfXsi6JZgdg79hgIlrSyJljLVUNo0OVgkZ3T+HcX/XTe2G X-Received: by 2002:aa7:c308:: with SMTP id l8mr34445943edq.361.1600369537994; Thu, 17 Sep 2020 12:05:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1600369537; cv=none; d=google.com; s=arc-20160816; b=IeLdORwyzbY+GUO53WVcwaTS3vG16LVWlk4/Ko7TRBeLx93VN/9IsO93KSdxFJa0y1 zc9TsL1FFjjZWWo/JkrT2oJwtBQ/p/TWY6Ti5+hGMKsT10h5W7PhgIsdcceQEKyaQS56 4px0rlgd+D7clhfl//x4xjZK5z10VJ7FcUa1hX988y/Qaqwita+5/KQy2d9sWv8vg3Oz dTfRzOQzTWA5RGq6U+H69iQlYxU8f/xfuklrABvxMxXAR330oaFwTxW/JK1gT8y4JPAA wpNo8x6FPO4xQ0Nax8Pq4je3K/eTN5cKdhGmBwFVDqNizFgV+gUFjz75RFst6eoUlmlh AC4Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=werZBEA6aQYY57n+G7CDD2kpsyT+DQ1TWdsoiO3KsTk=; b=mpaIPLMlu5tsv4YMJOZT5Yo3S/349IIuRWXo6k14dwi4iV4zj5MjsGVzdzq97Kbrf5 tPUpcpj1pNrb81vM2NMKKyHSS1o19b06CSXeT3zMqrFle49CpOGxl6G+tdNvVXfFgMBH kYrS7fmKnLgtrmNvmEZTptoYZNAmSDE7KmWoTigxkfxL9BzN1tA9moUrfTiyCEvSBXkH UrPES3ZfEge+cpKv4WAfmQSnVeGNw22ct35i2ayIP+WXuzfZsVk7yc1/NeJbXqDvHA4x DjM0Jin+D/e8rw+awxogJA+/Iiyxe04LdEWT9qkIbfna/hPvyAq6JKOo3GI4cYF0Djl5 g6Wg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=aRxotHNl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id r26si522829eja.484.2020.09.17.12.05.13; Thu, 17 Sep 2020 12:05:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=aRxotHNl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726600AbgIQTDw (ORCPT + 99 others); Thu, 17 Sep 2020 15:03:52 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:36396 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726598AbgIQTDl (ORCPT ); Thu, 17 Sep 2020 15:03:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600369419; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=werZBEA6aQYY57n+G7CDD2kpsyT+DQ1TWdsoiO3KsTk=; b=aRxotHNlZRv+xdPlVocKzzxCi3mkgvhN5XkEeaJp2/Kts+BMTZ5EMJRO6lTt8BcD7DxiGj yPJaueqRLvx2agZgjY4OY0Y2ODOJgAYI6xM9gCJL171KiOL6Iby0m1oSSGhwPQWCx90xAq I97erkXaHGp8VUkeVtMXIw3d2Je3gRw= Received: from mail-qv1-f70.google.com (mail-qv1-f70.google.com [209.85.219.70]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-459-VgTHY9PnOpuSrGqjNwpciA-1; Thu, 17 Sep 2020 15:03:35 -0400 X-MC-Unique: VgTHY9PnOpuSrGqjNwpciA-1 Received: by mail-qv1-f70.google.com with SMTP id dj20so2120553qvb.23 for ; Thu, 17 Sep 2020 12:03:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=werZBEA6aQYY57n+G7CDD2kpsyT+DQ1TWdsoiO3KsTk=; b=onaXMrT5WTDm/zgxAFp3g87Akl1uc2mDYMl9TfxsH4MSXCB3BdZ+EjjUc2pU/CDbGr Fi3MOR6CyoUcnxhMVFDwWQjQK9stgktcfPXamYZ9S+kpvApUySsEBEaaxSCw3vxsMUir B1BDcMOnhD/DzwDRrBBM+fpi6HFr1bTI9YBH+hshr9qbLcflICWUHpTWq9XVV600QwoH X/NeAsPbxQn4neZDwJvTGp09g5T/e+8/y11VYcVdAL9BNp6zESFStp27dT6e7vr5K819 sdc2T8b7JoJbj/Neg+b0HUIDjMYDiH+zoj9qA37ALP/oddGs21J4Wd/CCrg8EtlZfJ+K aTlQ== X-Gm-Message-State: AOAM5305YWFNs+EeuOZVjTaLg//imPXj8TNoalsSfJSg9bCnFkgLjrt/ 79QCsawxSkYe5tLbGXpXyt4GrS2KCKA2I88FBMbnIhZp67J5pUFV1Md3jpT7Ouy26OeKkXT07KX AjrycToe0zRBFoJkcPcMjwrf0 X-Received: by 2002:ac8:3261:: with SMTP id y30mr29831090qta.242.1600369415120; Thu, 17 Sep 2020 12:03:35 -0700 (PDT) X-Received: by 2002:ac8:3261:: with SMTP id y30mr29831049qta.242.1600369414812; Thu, 17 Sep 2020 12:03:34 -0700 (PDT) Received: from xz-x1 (bras-vprn-toroon474qw-lp130-11-70-53-122-15.dsl.bell.ca. [70.53.122.15]) by smtp.gmail.com with ESMTPSA id g25sm423162qto.47.2020.09.17.12.03.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Sep 2020 12:03:33 -0700 (PDT) Date: Thu, 17 Sep 2020 15:03:32 -0400 From: Peter Xu To: Linus Torvalds Cc: Jason Gunthorpe , John Hubbard , Leon Romanovsky , Linux-MM , Linux Kernel Mailing List , "Maya B . Gokhale" , Yang Shi , Marty Mcfadden , Kirill Shutemov , Oleg Nesterov , Jann Horn , Jan Kara , Kirill Tkhai , Andrea Arcangeli , Christoph Hellwig , Andrew Morton Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification Message-ID: <20200917190332.GB133226@xz-x1> References: <20200915191346.GD2949@xz-x1> <20200915193838.GN1221970@ziepe.ca> <20200915213330.GE2949@xz-x1> <20200915232238.GO1221970@ziepe.ca> <20200916174804.GC8409@ziepe.ca> <20200916184619.GB40154@xz-x1> <20200917112538.GD8409@ziepe.ca> <20200917181411.GA133226@xz-x1> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Sep 17, 2020 at 11:26:01AM -0700, Linus Torvalds wrote: > On Thu, Sep 17, 2020 at 11:14 AM Peter Xu wrote: > > > > In my humble opinion, the real solution is still to use MADV_DONTFORK properly > > so we should never share the DMA pages with others when we know the fact. > > Is this all just because somebody does a fork() after doing page pinning? > > If so, I feel this should be trivially fixed in copy_one_pte(). > That's where we currently do > > /* > * If it's a COW mapping, write protect it both > * in the parent and the child > */ > if (is_cow_mapping(vm_flags) && pte_write(pte)) { > ptep_set_wrprotect(src_mm, addr, src_pte); > pte = pte_wrprotect(pte); > } > > and I feel that that is where we could just change the code to do a > COW event for pinned pages (and *not* mark the parent write protected, > since the parent page now isn't a COW page). > > Because if that's the case that Jason is hitting, then I feel that > really is the correct fix: make sure that the pinning action is > meaningful. > > As mentioned, I really think the whole (and only) point of page > pinning is that it should keep the page locked in the page tables. And > by "locked" I mean exactly that: not just present, but writable. > > And then the "we never COW a pinned page" comes not from the COW code > doing magic, but by it simply never becoming non-writable - because > the page table entry is locked! Looks reasonable to me. The fork() should be slightly slower though, since we'll need to copy the data for all the DMA buffers for each of the child processes, even if we should be pretty sure those processes won't use these pages at all. But it seems a good approach anyway if we care about the potential breakages in the userspace so the breakage is turned into perf degrades, and if any userspace noticed such perf degrade on fork() then people will probably remember to use MADV_DONTFORK properly since afaict MADV_DONTFORK can remove this extra overhead.. Another side effect I can think of is that we'll bring some uncertainty to fork() starting from when page_maybe_dma_pinned() is used, since it's sometimes bogus (hpage_pincount_available()==false) so some COWs might be triggered during fork() even when not necessary if we've got some normal pages with too many refcounts (over GUP_PIN_COUNTING_BIAS). But assuming that's not a big deal since it should be extremely rare, or is it?.. -- Peter Xu