Date: Mon, 15 May 2023 12:16:21 +0100
From: Lorenzo Stoakes
To: "Kirill A . Shutemov"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
 Jason Gunthorpe, Jens Axboe, Matthew Wilcox, Dennis Dalessandro,
 Leon Romanovsky, Christian Benvenuti, Nelson Escobar, Bernard Metzler,
 Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
 Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
 Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon,
 "David S . Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Christian Brauner, Richard Cochran, Alexei Starovoitov, Daniel Borkmann,
 Jesper Dangaard Brouer, John Fastabend, linux-fsdevel@vger.kernel.org,
 linux-perf-users@vger.kernel.org, netdev@vger.kernel.org,
 bpf@vger.kernel.org, Oleg Nesterov, Jason Gunthorpe, John Hubbard,
 Jan Kara, Pavel Begunkov, Mika Penttila, David Hildenbrand, Dave Chinner,
 Theodore Ts'o, Peter Xu, Matthew Rosato, "Paul E . McKenney",
 Christian Borntraeger
Subject: Re: [PATCH v9 0/3] mm/gup: disallow GUP writing to file-backed mappings by default
Message-ID: <7f6dbe36-88f2-468e-83c1-c97e666d8317@lucifer.local>
References: <20230515110315.uqifqgqkzcrrrubv@box.shutemov.name>
In-Reply-To: <20230515110315.uqifqgqkzcrrrubv@box.shutemov.name>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 15, 2023 at 02:03:15PM +0300, Kirill A . Shutemov wrote:
> On Thu, May 04, 2023 at 10:27:50PM +0100, Lorenzo Stoakes wrote:
> > Writing to file-backed mappings which require folio dirty tracking using
> > GUP is a fundamentally broken operation, as kernel write access to GUP
> > mappings do not adhere to the semantics expected by a file system.
> >
> > A GUP caller uses the direct mapping to access the folio, which does not
> > cause write notify to trigger, nor does it enforce that the caller marks
> > the folio dirty.
>
> Okay, problem is clear and the patchset look good to me. But I'm worried
> breaking existing users.
>
> Do we expect the change to be visible to real world users?
> If yes, are we okay to break them?

The general consensus at the moment is that there is no entirely reasonable
usage of this case and you're already running the risk of a kernel oops if you
do this, so it's already broken.

>
> One thing that came to mind is KVM with "qemu -object memory-backend-file,share=on..."
> It is mostly used for pmem emulation.
>
> Do we have plan B?

Yes, we can make it opt-in or opt-out via a FOLL_FLAG. This would be easy to
implement in the event of any issues arising.

>
> Just a random/crazy/broken idea:
>
>  - Allow folio_mkclean() (and folio_clear_dirty_for_io()) to fail,
>    indicating that the page cannot be cleared because it is pinned;
>
>  - Introduce a new vm_operations_struct::mkclean() that would be called by
>    page_vma_mkclean_one() before clearing the range and can fail;
>
>  - On GUP, create an in-kernel fake VMA that represents the file, but with
>    custom vm_ops. The VMA registered in rmap to get notified on
>    folio_mkclean() and fail it because of GUP.
>
>  - folio_clear_dirty_for_io() callers will handle the new failure as
>    indication that the page can be written back but will stay dirty and
>    fs-specific data that is associated with the page writeback cannot be
>    freed.
>
> I'm sure the idea is broken on many levels (I have never looked closely at
> the writeback path). But maybe it is good enough as conversation started?
>

Yeah, there are definitely a few ideas down this road that might be possible.
I am not sure how a filesystem can be expected to cope, or how this could be
reasonably used without dirty/writeback though, because you'll just not track
anything. Or I guess you mean the mapping would be read-only but somehow stay
dirty?

I also had ideas along these lines of e.g. having a special vmalloc mode which
mimics the correct wrprotect settings + does the right thing, but of course that
does nothing to help DMA writing to a GUP-pinned page.

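To make the "plan B" answer above concrete, here is a minimal userspace sketch
of what gating on a flag could look like. It is a toy model only, not the
patch series' code: struct toy_vma, toy_gup_allowed() and both TOY_FOLL_* bits
are hypothetical names invented for illustration, and the opt-in flag in
particular does not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model. These are NOT the kernel's definitions. */
#define TOY_FOLL_WRITE            (1u << 0) /* caller wants write access */
#define TOY_FOLL_ALLOW_FILE_WRITE (1u << 1) /* hypothetical opt-in flag */

struct toy_vma {
	bool file_backed;          /* mapping is backed by a file */
	bool needs_dirty_tracking; /* fs relies on write notify for this mapping */
};

/* Decide whether a GUP-style request on this mapping is permitted. */
static bool toy_gup_allowed(const struct toy_vma *vma, unsigned int flags)
{
	assert(vma != NULL);

	/* Read-only access never dirties the folio. */
	if (!(flags & TOY_FOLL_WRITE))
		return true;

	/* Anonymous memory, or files without dirty tracking, are unaffected. */
	if (!vma->file_backed || !vma->needs_dirty_tracking)
		return true;

	/* Default is to refuse; the caller must opt in explicitly. */
	return (flags & TOY_FOLL_ALLOW_FILE_WRITE) != 0;
}
```

In this model a write request against a dirty-tracked file mapping is refused
unless the hypothetical opt-in bit is set. An opt-out variant would simply
invert the default in the final return.
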
Though if the issue is at the point of the kernel marking the page dirty
unexpectedly, perhaps we can just invoke the mkwrite() _there_ before marking
dirty? There are probably some synchronisation issues there too.

Jason will have some thoughts on this I'm sure. I guess the key question here
is - is it actually feasible for this to work at all? Once we establish that,
the rest are details :)

> --
>   Kiryl Shutsemau / Kirill A. Shutemov
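
The "invoke mkwrite() before marking dirty" idea above can be illustrated with
a small userspace model. This is a sketch under loud assumptions: struct
toy_folio and every toy_* function are hypothetical stand-ins, and the model
deliberately ignores the locking and synchronisation problems mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page-cache folio; not the kernel's struct folio. */
struct toy_folio {
	bool dirty;       /* page-cache dirty bit */
	bool fs_prepared; /* fs was notified and is tracking the pending write */
};

/* Writeback finished: the folio is clean and write-protected again. */
static void toy_mkclean(struct toy_folio *f)
{
	f->dirty = false;
	f->fs_prepared = false;
}

/* Stand-in for write notify: the fs learns a write is coming. */
static void toy_mkwrite(struct toy_folio *f)
{
	f->fs_prepared = true;
}

/* Dirtying behind the filesystem's back, as the patch description warns. */
static void toy_dirty_unnotified(struct toy_folio *f)
{
	f->dirty = true;
}

/* The idea floated above: notify first, then mark dirty. */
static void toy_dirty_notified(struct toy_folio *f)
{
	if (!f->fs_prepared)
		toy_mkwrite(f);
	f->dirty = true;
}

/* Consistent means: every dirty folio is one the fs knows about. */
static bool toy_consistent(const struct toy_folio *f)
{
	return !f->dirty || f->fs_prepared;
}
```

After toy_mkclean(), calling toy_dirty_unnotified() leaves the model in the
inconsistent state (dirty but unknown to the fs), while toy_dirty_notified()
keeps it consistent. Whether the real mkwrite path can be called safely at
that point is exactly the open question.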