Date: Tue, 2 May 2023 11:32:57 -0400
From: Peter Xu
To: Jason Gunthorpe
Cc: Matthew Rosato, David Hildenbrand, Christian Borntraeger,
 Lorenzo Stoakes, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Andrew Morton, Jens Axboe, Matthew Wilcox, Dennis Dalessandro,
 Leon Romanovsky, Christian Benvenuti, Nelson Escobar, Bernard Metzler,
 Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
 Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
 Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon,
 "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Christian Brauner, Richard Cochran, Alexei Starovoitov, Daniel Borkmann,
 Jesper Dangaard Brouer, John Fastabend, linux-fsdevel@vger.kernel.org,
 linux-perf-users@vger.kernel.org, netdev@vger.kernel.org,
 bpf@vger.kernel.org, Oleg Nesterov, John Hubbard, Jan Kara, "Kirill A.
 Shutemov", Pavel Begunkov, Mika Penttila, Dave Chinner, Theodore Ts'o
Subject: Re: [PATCH v6 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast writing to file-backed mappings
References: <1ffbbfb7-6bca-0ab0-1a96-9ca81d5fa373@redhat.com> <3c17e07a-a7f9-18fc-fa99-fa55a5920803@linux.ibm.com>

On Tue, May 02, 2023 at 12:20:46PM -0300, Jason Gunthorpe wrote:
> On Tue, May 02, 2023 at 10:54:35AM -0400, Matthew Rosato wrote:
> > On 5/2/23 10:15 AM, David Hildenbrand wrote:
> > > On 02.05.23 16:04, Jason Gunthorpe wrote:
> > >> On Tue, May 02, 2023 at 03:57:30PM +0200, David Hildenbrand wrote:
> > >>> On 02.05.23 15:50, Jason Gunthorpe wrote:
> > >>>> On Tue, May 02, 2023 at 03:47:43PM +0200, David Hildenbrand wrote:
> > >>>>>> Eventually we want to implement a mechanism where we can dynamically pin in response to RPCIT.
> > >>>>>
> > >>>>> Okay, so IIRC we'll fail starting the domain early, that's good. And if we
> > >>>>> pin all guest memory (instead of small pieces dynamically), there is little
> > >>>>> existing use for file-backed RAM in such zPCI configurations (because memory
> > >>>>> cannot be reclaimed either way if it's all pinned), so likely there are no
> > >>>>> real existing users.
> > >>>>
> > >>>> Right, this is VFIO, the physical HW can't tolerate not having pinned
> > >>>> memory, so something somewhere is always pinning it.
> > >>>>
> > >>>> Which, again, makes it weird/wrong that this KVM code is pinning it
> > >>>> again :\
> > >>>
> > >>> IIUC, that pinning is not for ordinary IOMMU / KVM memory access. It's for
> > >>> passthrough of (adapter) interrupts.
> > >>>
> > >>> I have to speculate, but I guess for hardware to forward interrupts to the
> > >>> VM, it has to pin the special guest memory page that will receive the
> > >>> indications, to then configure (interrupt) hardware to target the interrupt
> > >>> indications to that special guest page (using a host physical address).
> > >>
> > >> Either the emulated access is "CPU" based happening through the KVM
> > >> page table so it should use mmu_notifier locking.
> > >>
> > >> Or it is "DMA" and should go through an IOVA through iommufd pinning
> > >> and locking.
> > >>
> > >> There is no other ground, nothing in KVM should be inventing its own
> > >> access methodology.
> > >
> > > I might be wrong, but this seems to be a bit different.
> > >
> > > It cannot tolerate page faults (needs a host physical address), so
> > > memory notifiers don't really apply. (as a side note, KVM on s390x
> > > does not use mmu notifiers as we know them)
> >
> > The host physical address is one shared between underlying firmware
> > and the host kvm. Either might make changes to the referenced page
> > and then issue an alert to the guest via a mechanism called GISA,
> > giving impetus to the guest to look at that page and process the
> > event. As you say, firmware can't tolerate the page being
> > unavailable; it's expecting that once we feed it that location it's
> > always available until we remove it (kvm_s390_pci_aif_disable).
>
> That is a CPU access delegated to the FW without any locking scheme to
> make it safe with KVM :\
>
> It would have been better if FW could inject it through the kvm page
> tables so it has some coherency.
>
> Otherwise you have to call this "DMA", I think.
>
> How does s390 avoid mmu notifiers without having lots of problems?? It
> is not really optional to hook the invalidations if you need to build
> a shadow page table..

Totally no idea on s390 details, but.. per my read above, if the firmware
needs to make sure the page is always available (so no way to fault it in
on demand), then a longterm pin seems appropriate here. And if pinning is
a must, there's no need for mmu notifiers (as the page will simply not be
invalidated anyway)?

Thanks,

-- 
Peter Xu