Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90871C433EF for ; Tue, 23 Nov 2021 13:15:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237089AbhKWNTF (ORCPT ); Tue, 23 Nov 2021 08:19:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45732 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237018AbhKWNTF (ORCPT ); Tue, 23 Nov 2021 08:19:05 -0500 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:e::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 46453C061574; Tue, 23 Nov 2021 05:15:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=MIME-Version:Content-Type:References: In-Reply-To:Date:Cc:To:From:Subject:Message-ID:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=hga9pTUILmsdhjR4qf2q36AKWJFzGCqCUOOYCcq2wb0=; b=CqipqGwyc0ygY9pLLshiT+cwHq 2NSqub0lGqPhR43TaXqrf7Vl4uIw37hpyDFPlv16V53elfLcyCKk94f6iYp4eaPpLBGp6WLoxEgvT TEvxsaONsOVk0Wzg2qBCNmdxava1bE/ceUh/5HZjJ7sTn/riowtJRDFvoAPBkInPEJ00+7RAyRBwq DRg6VMJbEiIx5zi9z7I0GSoDXORiHpewRl+P1TA2iKnLXjCzCpDBgHa3lyxos13NZeVHPcOmaGukK xy+ZjiOE7IizLJOt/Pq3LTf4IiYrcgo9+qJxtdKv7raMGZGqxSVZEWlDnXNdmm7O7FTJ3Pezkx/QI mc/hhLbg==; Received: from [2001:8b0:10b:1:4a2a:e3ff:fe14:8625] (helo=u3832b3a9db3152.ant.amazon.com) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1mpVe6-002JLU-KS; Tue, 23 Nov 2021 13:15:39 +0000 Message-ID: <137637f96af57825eedca0c8bdb91efd049c0819.camel@infradead.org> Subject: Re: [PATCH RFC 11/39] KVM: x86/xen: evtchn signaling via eventfd From: David Woodhouse To: Joao Martins , Ankur Arora , Paul Durrant Cc: Boris Ostrovsky , Paolo Bonzini , Radim =?UTF-8?Q?Kr=C4=8Dm=C3=A1=C5=99?= , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Date: Tue, 23 Nov 2021 13:15:35 +0000 In-Reply-To: <0b9d3901-c10b-effd-6278-6afd1e95b09e@oracle.com> References: <20190220201609.28290-1-joao.m.martins@oracle.com> <20190220201609.28290-12-joao.m.martins@oracle.com> <874d1fa922cb56238676b90bbeeba930d0706500.camel@infradead.org> <18e854e2a84750c2de2d32384710132b83d84286.camel@infradead.org> <0b9d3901-c10b-effd-6278-6afd1e95b09e@oracle.com> Content-Type: multipart/signed; micalg="sha-256"; protocol="application/pkcs7-signature"; boundary="=-OGJ+GLZGwBhf127XZHIV" User-Agent: Evolution 3.36.5-0ubuntu1 MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-OGJ+GLZGwBhf127XZHIV Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable So... what we have in the kernel already is entirely sufficient to run real Xen guests so they don't know they're running under KVM =E2=80=94 with= PV netback and blkback, Xen timers, etc. We can even run the PV Xen shim itself, to host PV guests. There are some pieces we really want to move into the kernel though, IPIs being top of the list =E2=80=94 we really don't want them bouncing out= to the userspace VMM. In-kernel timers would be good, too. I've finally got the basic 2l event channel delivery working in the kernel, with a minor detour into fixing nesting UAF bugs. The first simple use case for that is routing MSIs of assigned PCI devices to Xen PIRQs. So I'm ready to stare at the patch in $SUBJECT again... On Mon, 2020-11-30 at 15:08 +0000, Joao Martins wrote: > On 11/30/20 12:55 PM, David Woodhouse wrote: > > On Mon, 2020-11-30 at 12:17 +0000, Joao Martins wrote: > > > But still, eventfd is probably unnecessary complexity when another @t= ype > > > (XEN_EVTCHN_TYPE_USER) would serve, and then just exiting to userspac= e > > > and let it route its evtchn port handling to the its own I/O handling= thread. > >=20 > > Hmm... so the benefit of the eventfd is that we can wake the I/O thread > > directly instead of bouncing out to userspace on the vCPU thread only > > for it to send a signal and return to the guest? Did you ever use that, > > and it is worth the additional in-kernel code? > >=20 > This was my own preemptive optimization to the interface -- it's not wort= h > the added code for vIRQ and IPI at this point which is what *for sure* th= e > kernel will handle. >=20 > > Is there any reason we'd want that for IPI or VIRQ event channels, or > > can it be only for INTERDOM/UNBOUND event channels which come later? > >=20 > /me nods. >=20 > No reason to keep that for IPI/vIRQ. >=20 > > I'm tempted to leave it out of the first patch, and *maybe* add it back > > in a later patch, putting it in the union alongside .virq.type. > >=20 > >=20 > > struct kvm_xen_eventfd { > > =20 > > #define XEN_EVTCHN_TYPE_VIRQ 0 > > #define XEN_EVTCHN_TYPE_IPI 1 > > __u32 type; > > __u32 port; > > __u32 vcpu; > > - __s32 fd; > > =20 > > #define KVM_XEN_EVENTFD_DEASSIGN (1 << 0) > > #define KVM_XEN_EVENTFD_UPDATE (1 << 1) > > eentfd_ __u32 flags; > > union { > > struct { > > __u8 type; > > } virq; > > + struct { > > + __s32 eventfd; > > + } interdom; /* including unbound */ > > __u32 padding[2]; > > }; > > } evtchn; > >=20 > > Does that make sense to you? > >=20 > Yeap! :) It turns out there are actually two cases for interdom event channels. As well as signalling "other domains" (i.e. userspace) they can also be set up as loopback, so sending an event on one port actually raises an event on another of that guest's same ports, a bit like an IPI. So it isn't quite as simple as "if we handle interdom in the kernel it has an eventfd".=20 It also means we need to provide both *source* and *target* ports. Which brings me to the KVM_XEN_EVENTFD_DEASSIGN/KVM_XEN_EVENTFD_UPDATE flags that you used. I'm not sure I like that API; given that we now need separate source and target ports can we handle 'deassign' just by setting the target to zero? (and eventfd to -1) ?=20 I think your original patch leaked refcounts on the eventfd in kvm_xen_eventfd_update() because it would call eventfd_ctx_fdget() then neither used nor released the result? So I'm thinking of making it just a 'set'. And in fact I'm thinking of modelling it on the irq routing table by having a single call to set/update all of them at once? Any objections? --=-OGJ+GLZGwBhf127XZHIV Content-Type: application/pkcs7-signature; name="smime.p7s" Content-Disposition: attachment; filename="smime.p7s" Content-Transfer-Encoding: base64 MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgEFADCABgkqhkiG9w0BBwEAAKCCECow ggUcMIIEBKADAgECAhEA4rtJSHkq7AnpxKUY8ZlYZjANBgkqhkiG9w0BAQsFADCBlzELMAkGA1UE BhMCR0IxGzAZBgNVBAgTEkdyZWF0ZXIgTWFuY2hlc3RlcjEQMA4GA1UEBxMHU2FsZm9yZDEaMBgG A1UEChMRQ09NT0RPIENBIExpbWl0ZWQxPTA7BgNVBAMTNENPTU9ETyBSU0EgQ2xpZW50IEF1dGhl bnRpY2F0aW9uIGFuZCBTZWN1cmUgRW1haWwgQ0EwHhcNMTkwMTAyMDAwMDAwWhcNMjIwMTAxMjM1 OTU5WjAkMSIwIAYJKoZIhvcNAQkBFhNkd213MkBpbmZyYWRlYWQub3JnMIIBIjANBgkqhkiG9w0B AQEFAAOCAQ8AMIIBCgKCAQEAsv3wObLTCbUA7GJqKj9vHGf+Fa+tpkO+ZRVve9EpNsMsfXhvFpb8 RgL8vD+L133wK6csYoDU7zKiAo92FMUWaY1Hy6HqvVr9oevfTV3xhB5rQO1RHJoAfkvhy+wpjo7Q cXuzkOpibq2YurVStHAiGqAOMGMXhcVGqPuGhcVcVzVUjsvEzAV9Po9K2rpZ52FE4rDkpDK1pBK+ uOAyOkgIg/cD8Kugav5tyapydeWMZRJQH1vMQ6OVT24CyAn2yXm2NgTQMS1mpzStP2ioPtTnszIQ Ih7ASVzhV6csHb8Yrkx8mgllOyrt9Y2kWRRJFm/FPRNEurOeNV6lnYAXOymVJwIDAQABo4IB0zCC Ac8wHwYDVR0jBBgwFoAUgq9sjPjF/pZhfOgfPStxSF7Ei8AwHQYDVR0OBBYEFLfuNf820LvaT4AK xrGK3EKx1DE7MA4GA1UdDwEB/wQEAwIFoDAMBgNVHRMBAf8EAjAAMB0GA1UdJQQWMBQGCCsGAQUF BwMEBggrBgEFBQcDAjBGBgNVHSAEPzA9MDsGDCsGAQQBsjEBAgEDBTArMCkGCCsGAQUFBwIBFh1o dHRwczovL3NlY3VyZS5jb21vZG8ubmV0L0NQUzBaBgNVHR8EUzBRME+gTaBLhklodHRwOi8vY3Js LmNvbW9kb2NhLmNvbS9DT01PRE9SU0FDbGllbnRBdXRoZW50aWNhdGlvbmFuZFNlY3VyZUVtYWls Q0EuY3JsMIGLBggrBgEFBQcBAQR/MH0wVQYIKwYBBQUHMAKGSWh0dHA6Ly9jcnQuY29tb2RvY2Eu Y29tL0NPTU9ET1JTQUNsaWVudEF1dGhlbnRpY2F0aW9uYW5kU2VjdXJlRW1haWxDQS5jcnQwJAYI KwYBBQUHMAGGGGh0dHA6Ly9vY3NwLmNvbW9kb2NhLmNvbTAeBgNVHREEFzAVgRNkd213MkBpbmZy YWRlYWQub3JnMA0GCSqGSIb3DQEBCwUAA4IBAQALbSykFusvvVkSIWttcEeifOGGKs7Wx2f5f45b nv2ghcxK5URjUvCnJhg+soxOMoQLG6+nbhzzb2rLTdRVGbvjZH0fOOzq0LShq0EXsqnJbbuwJhK+ PnBtqX5O23PMHutP1l88AtVN+Rb72oSvnD+dK6708JqqUx2MAFLMevrhJRXLjKb2Mm+/8XBpEw+B 7DisN4TMlLB/d55WnT9UPNHmQ+3KFL7QrTO8hYExkU849g58Dn3Nw3oCbMUgny81ocrLlB2Z5fFG Qu1AdNiBA+kg/UxzyJZpFbKfCITd5yX49bOriL692aMVDyqUvh8fP+T99PqorH4cIJP6OxSTdxKM MIIFHDCCBASgAwIBAgIRAOK7SUh5KuwJ6cSlGPGZWGYwDQYJKoZIhvcNAQELBQAwgZcxCzAJBgNV BAYTAkdCMRswGQYDVQQIExJHcmVhdGVyIE1hbmNoZXN0ZXIxEDAOBgNVBAcTB1NhbGZvcmQxGjAY BgNVBAoTEUNPTU9ETyBDQSBMaW1pdGVkMT0wOwYDVQQDEzRDT01PRE8gUlNBIENsaWVudCBBdXRo ZW50aWNhdGlvbiBhbmQgU2VjdXJlIEVtYWlsIENBMB4XDTE5MDEwMjAwMDAwMFoXDTIyMDEwMTIz NTk1OVowJDEiMCAGCSqGSIb3DQEJARYTZHdtdzJAaW5mcmFkZWFkLm9yZzCCASIwDQYJKoZIhvcN AQEBBQADggEPADCCAQoCggEBALL98Dmy0wm1AOxiaio/bxxn/hWvraZDvmUVb3vRKTbDLH14bxaW /EYC/Lw/i9d98CunLGKA1O8yogKPdhTFFmmNR8uh6r1a/aHr301d8YQea0DtURyaAH5L4cvsKY6O 0HF7s5DqYm6tmLq1UrRwIhqgDjBjF4XFRqj7hoXFXFc1VI7LxMwFfT6PStq6WedhROKw5KQytaQS vrjgMjpICIP3A/CroGr+bcmqcnXljGUSUB9bzEOjlU9uAsgJ9sl5tjYE0DEtZqc0rT9oqD7U57My ECIewElc4VenLB2/GK5MfJoJZTsq7fWNpFkUSRZvxT0TRLqznjVepZ2AFzsplScCAwEAAaOCAdMw ggHPMB8GA1UdIwQYMBaAFIKvbIz4xf6WYXzoHz0rcUhexIvAMB0GA1UdDgQWBBS37jX/NtC72k+A CsaxitxCsdQxOzAOBgNVHQ8BAf8EBAMCBaAwDAYDVR0TAQH/BAIwADAdBgNVHSUEFjAUBggrBgEF BQcDBAYIKwYBBQUHAwIwRgYDVR0gBD8wPTA7BgwrBgEEAbIxAQIBAwUwKzApBggrBgEFBQcCARYd aHR0cHM6Ly9zZWN1cmUuY29tb2RvLm5ldC9DUFMwWgYDVR0fBFMwUTBPoE2gS4ZJaHR0cDovL2Ny bC5jb21vZG9jYS5jb20vQ09NT0RPUlNBQ2xpZW50QXV0aGVudGljYXRpb25hbmRTZWN1cmVFbWFp bENBLmNybDCBiwYIKwYBBQUHAQEEfzB9MFUGCCsGAQUFBzAChklodHRwOi8vY3J0LmNvbW9kb2Nh LmNvbS9DT01PRE9SU0FDbGllbnRBdXRoZW50aWNhdGlvbmFuZFNlY3VyZUVtYWlsQ0EuY3J0MCQG CCsGAQUFBzABhhhodHRwOi8vb2NzcC5jb21vZG9jYS5jb20wHgYDVR0RBBcwFYETZHdtdzJAaW5m cmFkZWFkLm9yZzANBgkqhkiG9w0BAQsFAAOCAQEAC20spBbrL71ZEiFrbXBHonzhhirO1sdn+X+O W579oIXMSuVEY1LwpyYYPrKMTjKECxuvp24c829qy03UVRm742R9Hzjs6tC0oatBF7KpyW27sCYS vj5wbal+TttzzB7rT9ZfPALVTfkW+9qEr5w/nSuu9PCaqlMdjABSzHr64SUVy4ym9jJvv/FwaRMP gew4rDeEzJSwf3eeVp0/VDzR5kPtyhS+0K0zvIWBMZFPOPYOfA59zcN6AmzFIJ8vNaHKy5QdmeXx RkLtQHTYgQPpIP1Mc8iWaRWynwiE3ecl+PWzq4i+vdmjFQ8qlL4fHz/k/fT6qKx+HCCT+jsUk3cS jDCCBeYwggPOoAMCAQICEGqb4Tg7/ytrnwHV2binUlYwDQYJKoZIhvcNAQEMBQAwgYUxCzAJBgNV BAYTAkdCMRswGQYDVQQIExJHcmVhdGVyIE1hbmNoZXN0ZXIxEDAOBgNVBAcTB1NhbGZvcmQxGjAY BgNVBAoTEUNPTU9ETyBDQSBMaW1pdGVkMSswKQYDVQQDEyJDT01PRE8gUlNBIENlcnRpZmljYXRp b24gQXV0aG9yaXR5MB4XDTEzMDExMDAwMDAwMFoXDTI4MDEwOTIzNTk1OVowgZcxCzAJBgNVBAYT AkdCMRswGQYDVQQIExJHcmVhdGVyIE1hbmNoZXN0ZXIxEDAOBgNVBAcTB1NhbGZvcmQxGjAYBgNV BAoTEUNPTU9ETyBDQSBMaW1pdGVkMT0wOwYDVQQDEzRDT01PRE8gUlNBIENsaWVudCBBdXRoZW50 aWNhdGlvbiBhbmQgU2VjdXJlIEVtYWlsIENBMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKC AQEAvrOeV6wodnVAFsc4A5jTxhh2IVDzJXkLTLWg0X06WD6cpzEup/Y0dtmEatrQPTRI5Or1u6zf +bGBSyD9aH95dDSmeny1nxdlYCeXIoymMv6pQHJGNcIDpFDIMypVpVSRsivlJTRENf+RKwrB6vcf WlP8dSsE3Rfywq09N0ZfxcBa39V0wsGtkGWC+eQKiz4pBZYKjrc5NOpG9qrxpZxyb4o4yNNwTqza aPpGRqXB7IMjtf7tTmU2jqPMLxFNe1VXj9XB1rHvbRikw8lBoNoSWY66nJN/VCJv5ym6Q0mdCbDK CMPybTjoNCQuelc0IAaO4nLUXk0BOSxSxt8kCvsUtQIDAQABo4IBPDCCATgwHwYDVR0jBBgwFoAU u69+Aj36pvE8hI6t7jiY7NkyMtQwHQYDVR0OBBYEFIKvbIz4xf6WYXzoHz0rcUhexIvAMA4GA1Ud DwEB/wQEAwIBhjASBgNVHRMBAf8ECDAGAQH/AgEAMBEGA1UdIAQKMAgwBgYEVR0gADBMBgNVHR8E RTBDMEGgP6A9hjtodHRwOi8vY3JsLmNvbW9kb2NhLmNvbS9DT01PRE9SU0FDZXJ0aWZpY2F0aW9u QXV0aG9yaXR5LmNybDBxBggrBgEFBQcBAQRlMGMwOwYIKwYBBQUHMAKGL2h0dHA6Ly9jcnQuY29t b2RvY2EuY29tL0NPTU9ET1JTQUFkZFRydXN0Q0EuY3J0MCQGCCsGAQUFBzABhhhodHRwOi8vb2Nz cC5jb21vZG9jYS5jb20wDQYJKoZIhvcNAQEMBQADggIBAHhcsoEoNE887l9Wzp+XVuyPomsX9vP2 SQgG1NgvNc3fQP7TcePo7EIMERoh42awGGsma65u/ITse2hKZHzT0CBxhuhb6txM1n/y78e/4ZOs 0j8CGpfb+SJA3GaBQ+394k+z3ZByWPQedXLL1OdK8aRINTsjk/H5Ns77zwbjOKkDamxlpZ4TKSDM KVmU/PUWNMKSTvtlenlxBhh7ETrN543j/Q6qqgCWgWuMAXijnRglp9fyadqGOncjZjaaSOGTTFB+ E2pvOUtY+hPebuPtTbq7vODqzCM6ryEhNhzf+enm0zlpXK7q332nXttNtjv7VFNYG+I31gnMrwfH M5tdhYF/8v5UY5g2xANPECTQdu9vWPoqNSGDt87b3gXb1AiGGaI06vzgkejL580ul+9hz9D0S0U4 jkhJiA7EuTecP/CFtR72uYRBcunwwH3fciPjviDDAI9SnC/2aPY8ydehzuZutLbZdRJ5PDEJM/1t yZR2niOYihZ+FCbtf3D9mB12D4ln9icgc7CwaxpNSCPt8i/GqK2HsOgkL3VYnwtx7cJUmpvVdZ4o gnzgXtgtdk3ShrtOS1iAN2ZBXFiRmjVzmehoMof06r1xub+85hFQzVxZx5/bRaTKTlL8YXLI8nAb R9HWdFqzcOoB/hxfEyIQpx9/s81rgzdEZOofSlZHynoSMYIDyjCCA8YCAQEwga0wgZcxCzAJBgNV BAYTAkdCMRswGQYDVQQIExJHcmVhdGVyIE1hbmNoZXN0ZXIxEDAOBgNVBAcTB1NhbGZvcmQxGjAY BgNVBAoTEUNPTU9ETyBDQSBMaW1pdGVkMT0wOwYDVQQDEzRDT01PRE8gUlNBIENsaWVudCBBdXRo ZW50aWNhdGlvbiBhbmQgU2VjdXJlIEVtYWlsIENBAhEA4rtJSHkq7AnpxKUY8ZlYZjANBglghkgB ZQMEAgEFAKCCAe0wGAYJKoZIhvcNAQkDMQsGCSqGSIb3DQEHATAcBgkqhkiG9w0BCQUxDxcNMjEx MTIzMTMxNTM1WjAvBgkqhkiG9w0BCQQxIgQgxUQRN8YZz7uBRQE+7eQyPXk7YLWOX6evAJnuCQCQ aJAwgb4GCSsGAQQBgjcQBDGBsDCBrTCBlzELMAkGA1UEBhMCR0IxGzAZBgNVBAgTEkdyZWF0ZXIg TWFuY2hlc3RlcjEQMA4GA1UEBxMHU2FsZm9yZDEaMBgGA1UEChMRQ09NT0RPIENBIExpbWl0ZWQx PTA7BgNVBAMTNENPTU9ETyBSU0EgQ2xpZW50IEF1dGhlbnRpY2F0aW9uIGFuZCBTZWN1cmUgRW1h aWwgQ0ECEQDiu0lIeSrsCenEpRjxmVhmMIHABgsqhkiG9w0BCRACCzGBsKCBrTCBlzELMAkGA1UE BhMCR0IxGzAZBgNVBAgTEkdyZWF0ZXIgTWFuY2hlc3RlcjEQMA4GA1UEBxMHU2FsZm9yZDEaMBgG A1UEChMRQ09NT0RPIENBIExpbWl0ZWQxPTA7BgNVBAMTNENPTU9ETyBSU0EgQ2xpZW50IEF1dGhl bnRpY2F0aW9uIGFuZCBTZWN1cmUgRW1haWwgQ0ECEQDiu0lIeSrsCenEpRjxmVhmMA0GCSqGSIb3 DQEBAQUABIIBAFVDfG6e7I6rPB87UC3sF9v3tbAsY4sNpjg3lE0X8QyjOXXxBd1NffNGBJVSzW87 A7ENJ6M02CnM640DEMtcXx/TZfw5bgyPM96k3FS10inRPyur0G2cIqL/DR0gBci0hfyZL/JND61n GG96RNm4Vxf13JG2fPBs5puXtHC92d4r5t5FaEOOW7fRfrlT/0FhNbz9nHIkxpCCnTgElF+DGvqN a4iCGz5djs3Z8Uhxv7VApr5QX5xLnNirG0jb0bkVgrBIhvNx+55NoXBtKXfJWVG/oPi7gfggUKhq 7uxfhCgrU12ehmzyhYoQRsW27OIRSz6EjO3/2EGb9Eser18dOTQAAAAAAAA= --=-OGJ+GLZGwBhf127XZHIV--