Date: Fri, 1 Dec 2023 19:37:55 -0800 (PST)
From: Stefano Stabellini
To: Roger Pau Monné
Cc: Stefano Stabellini, Jiqian Chen, Juergen Gross, Oleksandr Tyshchenko,
    Thomas Gleixner, Boris Ostrovsky, "Rafael J. Wysocki", Len Brown,
    Bjorn Helgaas, xen-devel@lists.xenproject.org,
    linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
    Alex Deucher, Christian Koenig, Stewart Hildebrand, Xenia Ragiadakou,
    Honglei Huang, Julia Zhang, Huang Rui
Subject: Re: [RFC KERNEL PATCH v2 2/3] xen/pvh: Unmask irq for passthrough device in PVH dom0
References: <20231124103123.3263471-1-Jiqian.Chen@amd.com> <20231124103123.3263471-3-Jiqian.Chen@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 1 Dec 2023, Roger Pau Monné wrote:
> On Thu, Nov 30, 2023 at 07:15:17PM -0800, Stefano Stabellini wrote:
> > On Thu, 30 Nov 2023, Roger Pau Monné wrote:
> > > On Wed, Nov 29, 2023 at 07:53:59PM -0800, Stefano Stabellini wrote:
> > > > On Fri, 24 Nov 2023, Jiqian Chen wrote:
> > > > > This patch is to solve two problems we encountered when we try to
> > > > > passthrough a device to hvm domU based on Xen PVH dom0.
> > > > >
> > > > > First, the hvm guest will allocate a pirq and irq for a passthrough
> > > > > device by using its gsi. Before that, the gsi must first have a
> > > > > mapping in dom0: see the Xen code
> > > > > pci_add_dm_done->xc_domain_irq_permission, which calls into Xen and
> > > > > checks whether dom0 has the mapping. See
> > > > > XEN_DOMCTL_irq_permission->pirq_access_permitted: "current" is the
> > > > > PVH dom0, the returned irq is 0, and so -EPERM is returned.
> > > > > This is because passthrough devices don't do PHYSDEVOP_map_pirq
> > > > > when they are enabled.
> > > > >
> > > > > Second, in PVH dom0, the gsi of a passthrough device doesn't get
> > > > > registered, but the gsi must be configured for it to be able to be
> > > > > mapped into a domU.
> > > > >
> > > > > After searching the code, we can find that map_pirq and register_gsi
> > > > > are done in function vioapic_write_redirent->vioapic_hwdom_map_gsi
> > > > > when the gsi (aka the ioapic's pin) is unmasked in PVH dom0. So the
> > > > > problems come down to the gsi of a passthrough device not being
> > > > > unmasked.
> > > > >
> > > > > To solve the unmask problem, this patch calls unmask_irq when we
> > > > > assign a device to be passthrough, so that the gsi can get registered
> > > > > and mapped in PVH dom0.
> > > >
> > > >
> > > > Roger, this seems to be more of a Xen issue than a Linux issue. Why do
> > > > we need the unmask check in Xen?
> > > > Couldn't we just do:
> > > >
> > > >
> > > > diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
> > > > index 4e40d3609a..df262a4a18 100644
> > > > --- a/xen/arch/x86/hvm/vioapic.c
> > > > +++ b/xen/arch/x86/hvm/vioapic.c
> > > > @@ -287,7 +287,7 @@ static void vioapic_write_redirent(
> > > >              hvm_dpci_eoi(d, gsi);
> > > >      }
> > > >
> > > > -    if ( is_hardware_domain(d) && unmasked )
> > > > +    if ( is_hardware_domain(d) )
> > > >      {
> > > >          /*
> > > >           * NB: don't call vioapic_hwdom_map_gsi while holding hvm.irq_lock
> > >
> > > There are some issues with this approach.
> > >
> > > mp_register_gsi() will only set up the trigger and polarity of the
> > > IO-APIC pin once, so we do so once the guest unmasks the pin in order
> > > to assert that the configuration is the intended one. A guest is
> > > allowed to write all kinds of nonsense to the IO-APIC RTE, but
> > > that doesn't take effect unless the pin is unmasked.
> > >
> > > Overall the question would be whether we have any guarantees that
> > > the hardware domain has properly configured the pin, even if it's not
> > > using it itself (as it hasn't been unmasked).
> > >
> > > IIRC PCI legacy interrupts are level triggered and low polarity, so we
> > > could configure any pins that are not set up at bind time?
> >
> > That could work.
> >
> > Another idea is to move only the call to allocate_and_map_gsi_pirq to
> > bind time? That might be enough to pass a pirq_access_permitted check.
>
> Maybe, albeit that would change the behavior of XEN_DOMCTL_bind_pt_irq
> just for PT_IRQ_TYPE_PCI and only when called from a PVH dom0 (as the
> parameter would be a GSI instead of a previously mapped IRQ). Such a
> difference just for PT_IRQ_TYPE_PCI is slightly weird - if we go that
> route I would recommend that we instead introduce a new dmop that has
> this syntax regardless of the domain type it's called from.

Looking at the code it is certainly a bit confusing.
My point was that we don't need to wait until polarity and trigger are
set appropriately to allow Dom0 to successfully pass a
pirq_access_permitted() check. Xen should be able to figure out that
Dom0 is permitted pirq access.

So the idea was to move the call to allocate_and_map_gsi_pirq()
somewhere earlier, because allocate_and_map_gsi_pirq() doesn't require
trigger or polarity to be configured to work. But the suggestion of
doing it at "bind time" (meaning: XEN_DOMCTL_bind_pt_irq) was a bad
idea.

But maybe we can find another location, maybe within
xen/arch/x86/hvm/vioapic.c, to call allocate_and_map_gsi_pirq() before
trigger and polarity are set and before the interrupt is unmasked.

Then we change the implementation of vioapic_hwdom_map_gsi() to skip the
call to allocate_and_map_gsi_pirq(), because by the time
vioapic_hwdom_map_gsi() is called we assume that
allocate_and_map_gsi_pirq() has already been done.

I am not familiar with vioapic.c but to give you an idea of what I was
thinking:


diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 4e40d3609a..16d56fe851 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -189,14 +189,6 @@ static int vioapic_hwdom_map_gsi(unsigned int gsi, unsigned int trig,
         return ret;
     }
 
-    ret = allocate_and_map_gsi_pirq(currd, pirq, &pirq);
-    if ( ret )
-    {
-        gprintk(XENLOG_WARNING, "vioapic: error mapping GSI %u: %d\n",
-                 gsi, ret);
-        return ret;
-    }
-
     pcidevs_lock();
     ret = pt_irq_create_bind(currd, &pt_irq_bind);
     if ( ret )
@@ -287,6 +279,17 @@ static void vioapic_write_redirent(
             hvm_dpci_eoi(d, gsi);
     }
 
+    if ( is_hardware_domain(d) )
+    {
+        int pirq = gsi, ret;
+        ret = allocate_and_map_gsi_pirq(currd, pirq, &pirq);
+        if ( ret )
+        {
+            gprintk(XENLOG_WARNING, "vioapic: error mapping GSI %u: %d\n",
+                    gsi, ret);
+            return ret;
+        }
+    }
     if ( is_hardware_domain(d) && unmasked )
     {
         /*
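
For readers who want to see the proposed ordering in isolation, here is a minimal, self-contained C toy model of it. It is not Xen code and does not claim to match how vioapic.c behaves: every identifier starting with toy_ or TOY_ is hypothetical, the real functions (allocate_and_map_gsi_pirq(), mp_register_gsi(), pt_irq_create_bind(), pirq_access_permitted()) appear only in comments as the things each toy flag stands in for, and locking and error handling are ignored entirely. It also assumes that the hardware domain writes the redirection entry at least once while the pin is still masked.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_GSIS 24  /* hypothetical number of pins on one toy IO-APIC */

/* Toy per-GSI state; none of these fields mirror real Xen structures. */
struct toy_gsi {
    bool pirq_mapped;  /* stands in for allocate_and_map_gsi_pirq() having run */
    bool bound;        /* stands in for mp_register_gsi() + pt_irq_create_bind() */
    bool masked;       /* mask bit of the toy redirection entry */
};

static struct toy_gsi toy_gsis[TOY_NR_GSIS];

/* Put every pin back into the masked, unmapped, unbound state. */
void toy_reset(void)
{
    for (unsigned int i = 0; i < TOY_NR_GSIS; i++)
        toy_gsis[i] = (struct toy_gsi){ .masked = true };
}

/*
 * Toy version of a hardware-domain write to a redirection entry.
 * Step 1 (every write): make sure the GSI has a pirq mapping.
 * Step 2 (only on a masked -> unmasked transition): do the work that
 * depends on trigger and polarity being final.
 */
void toy_write_redirent(unsigned int gsi, bool mask)
{
    assert(gsi < TOY_NR_GSIS);

    struct toy_gsi *g = &toy_gsis[gsi];
    bool unmasked = g->masked && !mask;

    g->pirq_mapped = true;  /* step 1: no dependency on trigger/polarity */
    g->masked = mask;

    if (unmasked)
        g->bound = true;    /* step 2: deferred until the pin is unmasked */
}

/* Toy stand-in for pirq_access_permitted(): only the mapping matters. */
bool toy_pirq_access_permitted(unsigned int gsi)
{
    assert(gsi < TOY_NR_GSIS);
    return toy_gsis[gsi].pirq_mapped;
}
```

In this model, calling toy_write_redirent(5, true) on a freshly reset table leaves GSI 5 masked and unbound, yet toy_pirq_access_permitted(5) already returns true. Only a later toy_write_redirent(5, false) sets the bound flag, which is the split between "mapping" and "trigger/polarity setup" that the sketch above is trying to express.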