Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp88804rwd; Wed, 14 Jun 2023 12:37:30 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4R39aW8Lo2NoRkVfvgHTiwwO9kL94Kc3N+dTzT86Iw/mMHjD3tov1rezHB3siBmngmZkH9 X-Received: by 2002:a05:6808:1b06:b0:39a:abd1:36f5 with SMTP id bx6-20020a0568081b0600b0039aabd136f5mr14040216oib.31.1686771449872; Wed, 14 Jun 2023 12:37:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686771449; cv=none; d=google.com; s=arc-20160816; b=U1mTP5r1fh9MXIViGuS9+GXgcaJaspXzJMn0n4PycTAlEgXU7uIKXaGxooj5PViSvL 9cyWqwCbtflF3JWIqmTOo0VHKnTlbEY8+S0EU+AqF8ecHb900dqtPMdv3XTdt3eQrnV1 lC1U/8pOIEnCo9nkUWVK/BONnaA0X8HIU+asOyfPm3rirb2QdjjwC6CWEvYPOHZRart1 dw1yGxwiasBbCDmpKgLZ0L/dKEoHaBZmvBZwdKogbnprid2TCDmRVy2CPMGxMX2B6DFN TPkrQaU0gIrOEoyi78HUnub3WN4EWiUjrgOT67S5eAYf64Y9GN1eoUaK8WAOrEOnGZQU cThw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :organization:references:in-reply-to:message-id:subject:cc:to:from :date:dkim-signature; bh=fd3UDbr2Nlwpbzx+9+5vMnQCL/akEJaw/zMcUhpGS2E=; b=YAXfD9vJ5jKbGU8wMyYT3EyahjAhcVQ8AX4sHtzwZjpay/GAHtZDBmAkuN8QiSx3pP fds1vrFvo9NPDt9UHbn73VioYZApfHBIABJaIJacsqAJO/3pl0xy3whEQLHp3gkCv9c1 IIB/+LE7ZYjalJvHPxwoepOI8oPhoU0B7eFwkXujuSeFfQA0gUbAeL7LVxz/URqjngoo xc40W/ynXZMqzE8VT1tUrX9lNo54cR9Yw2qF98blKLVc+zdKKQOJsRo3kt3/TpfyvmNX Q7co1+eiLQz3ohYu3vFz3MHMZxbyEGRYTq8ogKUKjD4XnMx525yoRO0LRB4g7nlPte+y BH+w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ZibbrmyS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j69-20020a638048000000b0054a9a35520bsi752663pgd.347.2023.06.14.12.37.18; Wed, 14 Jun 2023 12:37:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ZibbrmyS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236045AbjFNTVv (ORCPT + 99 others); Wed, 14 Jun 2023 15:21:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47204 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229496AbjFNTVn (ORCPT ); Wed, 14 Jun 2023 15:21:43 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9E9352682 for ; Wed, 14 Jun 2023 12:20:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1686770454; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fd3UDbr2Nlwpbzx+9+5vMnQCL/akEJaw/zMcUhpGS2E=; b=ZibbrmySBPKt41oh/S8uGR95Bop1+qo0YoDlN0/4BzeISsEQULpZymxz1Fm5+++lJdpRaV qOBvO45Zd3JKwis0LO86feRDleIiMNAzp1KzCopdetr87+RG5jNfq4/aKU+wuyBYUAc4/P Lpdoygk8bRNFz2Q9QyVx9Q48VWdnzgU= Received: from mail-io1-f71.google.com (mail-io1-f71.google.com [209.85.166.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-551-nRrmgRycNNSEQ89Btfic0Q-1; Wed, 14 Jun 2023 15:20:53 -0400 X-MC-Unique: nRrmgRycNNSEQ89Btfic0Q-1 Received: by mail-io1-f71.google.com with SMTP id ca18e2360f4ac-770222340cfso852921939f.3 for ; Wed, 14 Jun 2023 12:20:53 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686770452; x=1689362452; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=fd3UDbr2Nlwpbzx+9+5vMnQCL/akEJaw/zMcUhpGS2E=; b=O0YZMLWUCDu1KfNXarmCQT2y7GW+R22QQKpW7m3zKk3ao8mKoDsxd6T9HizFz8HHmN T4ekpX83ivw47AyZjSBdi1wDJHhnokOIN8H1qcRk8bB9Q98cmtBUsqhzkwJ9tsv9aymm cv4kfjtSGy1anMNJUlj44rwPSyc4DanUKN2ccZTpK5cuhAbGCjcdAMSzR/lEJeboL85c 0yvVkfP4H3gBv/FbByf184Ulq7UbbzCEalyKn5u0pOrFUV9yZp+JiQskXM5UWQ5muv5f jEOzO2NR7K2Nvuhh5oyq/kQMNmyQUpMJNf56nJ2rudgZ084deapo1+/7UsEb1m9CKBtV TxxQ== X-Gm-Message-State: AC+VfDxGEsbT2PW0G99tn16xQWXRMRxult45VJln3SqTfMNRJ1ZEPfEH HmMy9cQ95aPEMjd9Dt3i9IYXqk9tuKJCk1dZoN0IhSDqvpR+YXta6bw1z9goQ0YqPP3B9iSfD2v F8LfLaAY/puTzC5jtCtT19ejD X-Received: by 2002:a92:d34f:0:b0:340:998a:e966 with SMTP id a15-20020a92d34f000000b00340998ae966mr2855913ilh.17.1686770452308; Wed, 14 Jun 2023 12:20:52 -0700 (PDT) X-Received: by 2002:a92:d34f:0:b0:340:998a:e966 with SMTP id a15-20020a92d34f000000b00340998ae966mr2855893ilh.17.1686770451956; Wed, 14 Jun 2023 12:20:51 -0700 (PDT) Received: from redhat.com ([38.15.36.239]) by smtp.gmail.com with ESMTPSA id k8-20020a02c768000000b00416650ba62esm5153404jao.76.2023.06.14.12.20.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 14 Jun 2023 12:20:50 -0700 (PDT) Date: Wed, 14 Jun 2023 13:20:47 -0600 From: Alex Williamson To: Jason Gunthorpe Cc: ankita@nvidia.com, aniketa@nvidia.com, cjia@nvidia.com, kwankhede@nvidia.com, targupta@nvidia.com, vsethi@nvidia.com, acurrid@nvidia.com, apopple@nvidia.com, jhubbard@nvidia.com, danw@nvidia.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v3 1/1] vfio/nvgpu: Add vfio pci variant module for grace hopper Message-ID: <20230614132047.519abe95.alex.williamson@redhat.com> In-Reply-To: References: <20230606110510.0f87952c.alex.williamson@redhat.com> <20230606121348.670229ff.alex.williamson@redhat.com> <20230606153057.4cbc36a0.alex.williamson@redhat.com> <20230607122303.5d25c973.alex.williamson@redhat.com> <20230613132402.2765b6cb.alex.williamson@redhat.com> Organization: Red Hat MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 14 Jun 2023 14:55:28 -0300 Jason Gunthorpe wrote: > On Tue, Jun 13, 2023 at 01:24:02PM -0600, Alex Williamson wrote: > > > I'd even forgotten about the sparse mmap solution here, that's even > > better than trying to do something clever with the mmap. > > Okay, Ankit please try this, it sounds good > > > > You may be right, I think this patch is trying to make things > > > automatic for user, but a dedicated machine type might make more > > > sense. > > > > Juan and I discussed this with Ankit last week, there are a lot of down > > sides with another machine type, but the automatic manipulation of the > > machine is still problematic. Another option we have is to use QEMU > > command line options for each feature. For example we already support > > NUMA VM configurations and loading command line ACPI tables, hopefully > > also associating devices to nodes. Do we end up with just a > > configuration spec for the VM to satisfy the in-guest drivers? > > Potentially guest driver requirements may changes over time, so a hard > > coded recipe built-in to QEMU might not be the best solution anyway. > > Let's have those discussions settle then, I know there are a few > different ideas here people are looking at. > > > I think NVIDIA might have an interest in enabling Atomic Ops support in > > VMs as well, so please comment in the series thread if there are > > concerns here or if anyone can definitively says that another guest OS > > we might care about does cache root port capability bits. Thanks, > > I expect we do - I haven't heard of atomic ops specifically yet > though. > > We just did a big exercise on relaxed ordering which is similarly > troubled. > > Here we deciced to just not use the VM's config space at all. The > device itself knows if it can do relaxed ordering and it just reports > this directly to the driver. > > In many ways I would prefer to do the same for atomic.. I haven't > checked fully but I think we do this anyhow as you can see mlx5 simply > tries to enable PCI atomics but doesn't appear to do anything with the > result of it. I expect the actual success/fail is looped back through > the device interface itself. > > So, for mlx5, it probably already works in most real cases. Passing a > PF might not work I guess. > > It is not a satisfying answer from a VMM design perspective.. > > Some qemu command line to say what root ports with what atomic caps to > create seems like a reasonable thing to do. The referenced QEMU proposal puts a number of restrictions on automatically flipping bits on the root port, ex. as exposed in the VM the endpoint must be directly connected to a root port (avoiding complications around atomic ops routing support) and must be a single function device at devfn 0x0 (avoiding heterogeneous paths on the host). It also tests the root port bits to make sure they aren't otherwise set in order to be compatible with some future root port device switch to enable fixed atomic completer support. This tries to balance the idea that we want to support device switches for these sort of fine grained VM configuration, but there are also isolated cases which can be automatically enabled that can potentially cover the vast majority of use cases. OTOH, trying to do something automatic for 'AtomicOps Routing Support' looks far more challenging and we probably would rely on command line device switches for that. Regarding relaxed ordering, are we talking about the 'No RO-enabled PR-PR Passing' bit in the devcap2 register? Unfortunately that bit is labeled HwInit, so we don't have the same leniency towards modifying it at runtime as we do for the immediately preceding AtomicOps completer support bits. In my interpretation, that bit also only seems to be reporting whether a specific reordering is implemented, so more important in determining expected performance than functionality(?) In general, I think we put driver writers in an awkward place if they start trying things that are clearly not supported as reported by hardware capability bits. Error handling can be pretty fragile, especially when value-add firmware thinks it knows best. Thanks, Alex