Date: Thu, 24 Jun 2021 11:52:36 -0600
From: Alex Williamson
To: "Tian, Kevin"
Cc: Thomas Gleixner, Jason Gunthorpe, "Dey, Megha", "Raj, Ashok",
    "Pan, Jacob jun", "Jiang, Dave", "Liu, Yi L", "Lu, Baolu",
    "Williams, Dan J", "Luck, Tony", "Kumar, Sanjay K", LKML, KVM,
    Kirti Wankhede, "Peter Zijlstra", Marc Zyngier, "Bjorn Helgaas"
Subject: Re: Virtualizing MSI-X on IMS via VFIO
Message-ID: <20210624115236.309d6b48.alex.williamson@redhat.com>
References: <20210622131217.76b28f6f.alex.williamson@redhat.com>
    <87o8bxcuxv.ffs@nanos.tec.linutronix.de>
    <20210623091935.3ab3e378.alex.williamson@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 24 Jun 2021 00:00:37 +0000
"Tian, Kevin" wrote:

> > From: Alex Williamson
> > Sent: Wednesday, June 23, 2021 11:20 PM
> >
> [...]
> > > So the only downside today of allocating more MSI-X vectors than
> > > necessary is memory consumption for the irq descriptors.
> >
> > As above, this is a QEMU policy of essentially trying to be a good
> > citizen and allocate only what we can infer the guest is using. What's
> > a good way for QEMU, or any userspace, to know it's running on a host
> > where vector exhaustion is not an issue?
>
> In my proposal a new command (VFIO_DEVICE_ALLOC_IRQS) is
> introduced to separate allocation from enabling. The availability
> of this command could be the indicator whether vector
> exhaustion is not an issue now?

We have options with existing interfaces if we want to provide some
programmatic means through vfio to hint to userspace about vector
usage. Otherwise I don't see much justification for this new ioctl, it
can largely be done with SET_IRQS, or certainly with extensions of
flags.

> > > So no, we are not going to proliferate this complete ignorance of how
> > > MSI-X actually works and just cram another "feature" into code which is
> > > known to be incorrect.
> >
> > Some of the issues of virtualizing MSI-X are unsolvable without
> > creating a new paravirtual interface, but obviously we want to work
> > with existing drivers and unmodified guests, so that's not an option.
> >
> > To work with what we've got, the vfio API describes the limitation of
> > the host interfaces via the VFIO_IRQ_INFO_NORESIZE flag. QEMU then
> > makes a choice in an attempt to better reflect what we can infer of the
> > guest programming of the device to incrementally enable vectors. We
>
> It's a surprise to me that Qemu even doesn't look at this flag today after
> searching its code...

There are no examples of the alternative, it would be dead, untested
code.
The flag exists in the uAPI to indicate a limitation of the
underlying implementation that has always existed. Should we remove
that limitation, as Thomas now sees as possible, then QEMU wouldn't
need to make a choice whether to fully allocate the vector table or
incrementally tear-down and re-init.

> > could a) work to provide host kernel interfaces that allow us to remove
> > that noresize flag and b) decide whether QEMU's usage policy can be
> > improved on kernels where vector exhaustion is no longer an issue.
>
> Thomas can help confirm but looks noresize limitation is still there.
> b) makes more sense since Thomas thinks vector exhaustion is not
> an issue now (except one minor open about irte).

As noted elsewhere, a) is indeed a limitation of the host interfaces,
not implicit to MSI-X. Obviously we can look at different QEMU
policies, including generating hardware faults to the VM on exhaustion
or unmask failures, interrupt injection or better inferring potential
vector usage. Thanks,

Alex
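
To illustrate the VFIO_IRQ_INFO_NORESIZE point discussed above, here is a
minimal sketch of how a userspace driver might query the MSI-X index and
choose between the two behaviours mentioned (tear-down and re-init versus
growing the vector set in place). It is not taken from QEMU or from this
thread. The enum msix_policy and the helpers msix_policy_from_flags() and
query_msix_policy() are hypothetical names made up for illustration; only
struct vfio_irq_info, VFIO_DEVICE_GET_IRQ_INFO, VFIO_PCI_MSIX_IRQ_INDEX and
VFIO_IRQ_INFO_NORESIZE come from <linux/vfio.h>. It assumes device_fd is an
already-open VFIO device file descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hypothetical policy names, for illustration only. */
enum msix_policy {
    MSIX_POLICY_REINIT,      /* NORESIZE set: disable the index, then re-enable larger */
    MSIX_POLICY_INCREMENTAL  /* NORESIZE clear: vectors could be added in place */
};

/* Pure decision helper: map vfio_irq_info.flags to a policy. */
static enum msix_policy msix_policy_from_flags(unsigned int flags)
{
    return (flags & VFIO_IRQ_INFO_NORESIZE) ? MSIX_POLICY_REINIT
                                            : MSIX_POLICY_INCREMENTAL;
}

/*
 * Query the MSI-X IRQ index of a VFIO device.
 * Returns 0 on success and fills *policy and *count;
 * returns -1 if the ioctl fails (errno is left as set by ioctl).
 */
static int query_msix_policy(int device_fd, enum msix_policy *policy,
                             unsigned int *count)
{
    struct vfio_irq_info info;

    assert(policy != NULL && count != NULL);

    memset(&info, 0, sizeof(info));
    info.argsz = sizeof(info);
    info.index = VFIO_PCI_MSIX_IRQ_INDEX;

    if (ioctl(device_fd, VFIO_DEVICE_GET_IRQ_INFO, &info) < 0)
        return -1;

    *policy = msix_policy_from_flags(info.flags);
    *count = info.count;   /* number of vectors the index supports */
    return 0;
}
```

Keeping the flag test in a separate pure function means the policy choice
can be exercised without a real device; on kernels that report NORESIZE
the sketch always falls back to the tear-down and re-init path.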