Message-ID: <1b34e9a4-83c0-2f44-1457-dd8800b9287a@redhat.com>
Date: Wed, 3 May 2023 09:08:52 +0200
From: David Hildenbrand
Organization: Red Hat
To: Matthew Rosato, Lorenzo Stoakes, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Andrew Morton
Cc: Jason Gunthorpe, Jens Axboe, Matthew Wilcox, Dennis Dalessandro,
    Leon Romanovsky, Christian Benvenuti, Nelson Escobar, Bernard Metzler,
    Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
    Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon,
    "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Christian Brauner, Richard Cochran, Alexei Starovoitov, Daniel Borkmann,
    Jesper Dangaard Brouer, John Fastabend, linux-fsdevel@vger.kernel.org,
    linux-perf-users@vger.kernel.org, netdev@vger.kernel.org,
    bpf@vger.kernel.org, Oleg Nesterov, Jason Gunthorpe, John Hubbard,
    Jan Kara, "Kirill A. Shutemov", Pavel Begunkov, Mika Penttila,
    Dave Chinner, Theodore Ts'o, Peter Xu, "Paul E. McKenney",
    Christian Borntraeger
Subject: Re: [PATCH v8 0/3] mm/gup: disallow GUP writing to file-backed mappings by default
In-Reply-To: <20d078c5-4ee6-18dc-d3a5-d76b6a68f64e@linux.ibm.com>

On 03.05.23 02:31, Matthew Rosato wrote:
> On 5/2/23 6:51 PM, Lorenzo Stoakes wrote:
>> Writing to file-backed mappings which require folio dirty tracking using
>> GUP is a fundamentally broken operation, as kernel write access to GUP
>> mappings do not adhere to the semantics expected by a file system.
>>
>> A GUP caller uses the direct mapping to access the folio, which does not
>> cause write notify to trigger, nor does it enforce that the caller marks
>> the folio dirty.
>>
>> The problem arises when, after an initial write to the folio, writeback
>> results in the folio being cleaned and then the caller, via the GUP
>> interface, writes to the folio again.
>>
>> As a result of the use of this secondary, direct, mapping to the folio no
>> write notify will occur, and if the caller does mark the folio dirty, this
>> will be done so unexpectedly.
>>
>> For example, consider the following scenario:-
>>
>> 1. A folio is written to via GUP which write-faults the memory, notifying
>>    the file system and dirtying the folio.
>> 2. Later, writeback is triggered, resulting in the folio being cleaned and
>>    the PTE being marked read-only.
>> 3. The GUP caller writes to the folio, as it is mapped read/write via the
>>    direct mapping.
>> 4. The GUP caller, now done with the page, unpins it and sets it dirty
>>    (though it does not have to).
>>
>> This change updates both the PUP FOLL_LONGTERM slow and fast APIs. As
>> pin_user_pages_fast_only() does not exist, we can rely on a slightly
>> imperfect whitelisting in the PUP-fast case and fall back to the slow case
>> should this fail.
>>
>> v8:
>> - Fixed typo writeable -> writable.
>> - Fixed bug in writable_file_mapping_allowed() - must check combination of
>>   FOLL_PIN AND FOLL_LONGTERM not either/or.
>> - Updated vma_needs_dirty_tracking() to include write/shared to account for
>>   MAP_PRIVATE mappings.
>> - Move to open-coding the checks in folio_pin_allowed() so we can
>>   READ_ONCE() the mapping and avoid unexpected compiler loads. Rename to
>>   account for fact we now check flags here.
>> - Disallow mapping == NULL or mapping & PAGE_MAPPING_FLAGS other than
>>   anon. Defer to slow path.
>> - Perform GUP-fast check _after_ the lowest page table level is confirmed to
>>   be stable.
>> - Updated comments and commit message for final patch as per Jason's
>>   suggestions.
>
> Tested again on s390 using QEMU with a memory backend file (on ext4) and
> vfio-pci -- This time both vfio_pin_pages_remote (which will call
> pin_user_pages_remote(flags | FOLL_LONGTERM)) and the
> pin_user_pages_fast(FOLL_WRITE | FOLL_LONGTERM) in kvm_s390_pci_aif_enable
> are being allowed (e.g. returning positive pin count)

At least it's consistent now ;) And it might be working as expected ...

In v7:
* pin_user_pages_fast() succeeded
* vfio_pin_pages_remote() failed

But also in v7:
* GUP-fast allows pinning (anonymous) pages in MAP_PRIVATE file mappings
* Ordinary GUP allows pinning pages in MAP_PRIVATE file mappings

In v8:
* pin_user_pages_fast() succeeds
* vfio_pin_pages_remote() succeeds

But also in v8:
* GUP-fast allows pinning (anonymous) pages in MAP_PRIVATE file mappings
* Ordinary GUP allows pinning pages in MAP_PRIVATE file mappings

I have to speculate, but ... could it be that you are using a private
mapping?

In QEMU, unfortunately, the default for memory-backend-file is
"share=off" (private) ... for memory-backend-memfd it is "share=on"
(shared). The default is stupid ...

If you invoke QEMU manually, can you specify "share=on" for the
memory-backend-file? I thought libvirt would always default to
"share=on" for file mappings (everything else doesn't make much sense)
... but you might have to specify in addition to

-- 
Thanks,

David / dhildenb
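
For anyone following along, here is a rough userspace illustration of why the private vs. shared question matters. It is only a sketch in Python, not QEMU or kernel code, and write_through_mapping() is a made-up helper name. It maps a small temporary file either MAP_SHARED or MAP_PRIVATE, writes through the mapping, and then reads the file back. With MAP_PRIVATE the write lands in a copy-on-write anonymous page and never reaches the file, so there is no folio for the file system to dirty-track.

```python
import mmap
import os
import tempfile


def write_through_mapping(flags: int) -> bytes:
    """Hypothetical helper: map a 4-byte temp file with `flags`, overwrite
    it through the mapping, and return what the file itself then contains."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"old!")
        with mmap.mmap(fd, 4, flags=flags,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[:] = b"new!"          # write via the mapping, not via write(2)
            if flags == mmap.MAP_SHARED:
                m.flush()           # msync; only meaningful for shared mappings
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print("MAP_SHARED :", write_through_mapping(mmap.MAP_SHARED))
    print("MAP_PRIVATE:", write_through_mapping(mmap.MAP_PRIVATE))
```

If the same holds for the memory backend in the test above (an assumption on my side), a "share=off" backend would be exercising the MAP_PRIVATE case, where pinning is still expected to succeed.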