Date: Tue, 2 May 2023 21:23:53 +0200
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH v6 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast writing to file-backed mappings
To: Jason Gunthorpe
Cc: Peter Xu, Matthew Rosato, Christian Borntraeger, Lorenzo Stoakes,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton,
    Jens Axboe, Matthew Wilcox, Dennis Dalessandro, Leon Romanovsky,
    Christian Benvenuti, Nelson Escobar, Bernard Metzler, Peter Zijlstra,
    Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
    Adrian Hunter, Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski,
    Jonathan Lemon, David S. Miller, Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Christian Brauner, Richard Cochran, Alexei Starovoitov,
    Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
    linux-fsdevel@vger.kernel.org, linux-perf-users@vger.kernel.org,
    netdev@vger.kernel.org, bpf@vger.kernel.org, Oleg Nesterov,
    John Hubbard, Jan Kara, Kirill A. Shutemov, Pavel Begunkov,
    Mika Penttila, Dave Chinner, Theodore Ts'o
X-Mailing-List: linux-kernel@vger.kernel.org

On 02.05.23 19:46, Jason Gunthorpe wrote:
> On Tue, May 02, 2023 at 06:32:23PM +0200, David Hildenbrand wrote:
>> On 02.05.23 18:19, Jason Gunthorpe wrote:
>>> On Tue, May 02, 2023 at 06:12:39PM +0200, David Hildenbrand wrote:
>>>
>>>>> It misses the general architectural point why we have all these
>>>>> shootdown mechanisms in other places - places are not supposed to make
>>>>> these kinds of assumptions. When the userspace unplugs the memory from
>>>>> KVM or unmaps it from VFIO it is not still being accessed by the
>>>>> kernel.
>>>>
>>>> Yes. Like having memory in a vfio iommu v1 and doing the same (mremap,
>>>> munmap, MADV_DONTNEED, ...). Which is why we disable MADV_DONTNEED (e.g.,
>>>> virtio-balloon) in QEMU with vfio.
>>>
>>> That is different, VFIO has its own contract how it consumes the
>>> memory from the MM and VFIO breaks all this stuff.
>>>
>>> But when you tell VFIO to unmap the memory it doesn't keep accessing
>>> it in the background like this does.
>>
>> To me, this is similar to when QEMU (user space) triggers
>> KVM_S390_ZPCIOP_DEREG_AEN, to tell KVM to disable AIF and stop using the
>> page (1) When triggered by the guest explicitly (2) when resetting the VM
>> (3) when resetting the virtual PCI device / configuration.
>>
>> Interrupt gets unregistered from HW (which stops using the page), the pages
>> get unpinned. Pages get no longer used.
>>
>> I guess I am still missing (a) how this is fundamentally different (b) how
>> it could be done differently.
>
> It uses an address that is already scoped within the KVM memory map
> and uses KVM's gpa_to_gfn() to translate it to some pinnable page
>
> It is not some independent thing like VFIO, it is explicitly scoped
> within the existing KVM structure and it does not follow any mutations
> that are done to the gpa map through the usual KVM APIs.

Right, it consumes guest physical addresses that are translated via the
KVM memslots. Agreed that it does not (and possibly cannot easily)
update the hardware if the KVM mapping (memslots) ever changes. I guess
it's also not documented that this is not supported.

>
>> I'd really be happy to learn how a better approach would look like that does
>> not use longterm pinnings.
>
> Sounds like the FW sadly needs pinnings. This is why I said it looks
> like DMA. If possible it would be better to get the pinning through
> VFIO, eg as a mdev
>
> Otherwise, it would have been cleaner if this was divorced from KVM
> and took in a direct user pointer, then maybe you could make the
> argument is its own thing with its own lifetime rules. (then you are
> kind of making your own mdev)

It would be cleaner if user space would translate the GPA to a HVA and
provide that, agreed ...

>
> Or, perhaps, this is really part of some radical "irqfd" that we've
> been on and off talking about specifically to get this area of
> interrupt bypass uAPI'd properly..

Most probably. It's one of these very special cases ... thankfully:

$ git grep -i longterm | grep kvm
arch/s390/kvm/pci.c:    npages = pin_user_pages_fast(hva, 1, FOLL_WRITE | FOLL_LONGTERM, pages);
arch/s390/kvm/pci.c:    npages = pin_user_pages_fast(hva, 1, FOLL_WRITE | FOLL_LONGTERM,

-- 
Thanks,

David / dhildenb
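
For readers following along, the GPA-to-HVA translation through memslots
mentioned above could look roughly like the sketch below. It is a
simplified user-space model, not the kernel's code: struct memslot,
gpa_to_hva(), HVA_INVALID and the 4 KiB PAGE_SHIFT are all assumptions
made up for illustration (the field names merely mirror what a memslot
conceptually carries).

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

#define PAGE_SHIFT  12              /* assumption: 4 KiB pages */
#define HVA_INVALID UINT64_MAX      /* hypothetical "no mapping" marker */

/* Hypothetical, simplified stand-in for one guest memory slot. */
struct memslot {
	uint64_t base_gfn;        /* first guest frame number in the slot */
	uint64_t npages;          /* number of pages the slot covers */
	uint64_t userspace_addr;  /* host virtual address backing base_gfn */
};

/*
 * Translate a guest physical address to a host virtual address by
 * scanning the slot table. Returns HVA_INVALID if no slot covers gpa.
 */
static uint64_t gpa_to_hva(const struct memslot *slots, size_t nslots,
			   uint64_t gpa)
{
	uint64_t gfn = gpa >> PAGE_SHIFT;
	uint64_t offset = gpa & ((UINT64_C(1) << PAGE_SHIFT) - 1);

	assert(slots != NULL || nslots == 0);

	for (size_t i = 0; i < nslots; i++) {
		const struct memslot *s = &slots[i];

		/* gfn falls inside [base_gfn, base_gfn + npages) */
		if (gfn >= s->base_gfn && gfn - s->base_gfn < s->npages)
			return s->userspace_addr +
			       ((gfn - s->base_gfn) << PAGE_SHIFT) + offset;
	}
	return HVA_INVALID;
}
```

Note that the result is only a snapshot: if the slot table changes after
the lookup, an HVA (or a page pinned through it) obtained earlier is not
updated, which is the lifetime concern raised in this thread.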