Date: Tue, 12 Dec 2023 15:21:00 -0400
From: Jason Gunthorpe
To: Nicolin Chen
Cc: Yi Liu, "Giani, Dhaval", Vasant Hegde, Suravee Suthikulpanit,
 joro@8bytes.org, alex.williamson@redhat.com, kevin.tian@intel.com,
 robin.murphy@arm.com, baolu.lu@linux.intel.com, cohuck@redhat.com,
 eric.auger@redhat.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
 chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com, peterx@redhat.com,
 jasowang@redhat.com, shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
 iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, zhenzhong.duan@intel.com,
 joao.m.martins@oracle.com, xin.zeng@intel.com, yan.y.zhao@intel.com
Subject: Re: [PATCH v6 0/6] iommufd: Add nesting infrastructure (part 2/2)
Message-ID: <20231212192100.GP3014157@nvidia.com>
References: <20231117130717.19875-1-yi.l.liu@intel.com>
 <20231209014726.GA2945299@nvidia.com>
 <20231211215738.GB3014157@nvidia.com>
 <20231212144421.GH3014157@nvidia.com>

On Tue, Dec 12, 2023 at 11:13:37AM -0800, Nicolin Chen wrote:
> On Tue, Dec 12, 2023 at 10:44:21AM -0400, Jason Gunthorpe wrote:
> > On Mon, Dec 11, 2023 at 11:30:00PM -0800, Nicolin Chen wrote:
> >
> > > > > Could the structure just look like this?
> > > > > struct iommu_dev_assign_virtual_id {
> > > > > 	__u32 size;
> > > > > 	__u32 dev_id;
> > > > > 	__u32 id_type;
> > > > > 	__u32 id;
> > > > > };
> > > >
> > > > It needs to take in the viommu_id also, and I'd make the id 64 bits
> > > > just for good luck.
> > >
> > > What is viommu_id required for in this context? I thought we
> > > already know which SMMU instance to issue commands via dev_id?
> >
> > The viommu_id would be the container that holds the xarray that maps
> > the vRID to pRID
> >
> > Logically we could have multiple mappings per iommufd as we could have
> > multiple iommu instances working here.
>
> I see. This is the object to hold a shared stage-2 HWPT/domain then.

It could be done like that, yes. I wasn't thinking about linking the
stage two so tightly but perhaps? If we can avoid putting the hwpt
here that might be more general.

> // iommufd_private.h
>
> enum iommufd_object_type {
> 	...
> +	IOMMUFD_OBJ_VIOMMU,
> 	...
> };
>
> +struct iommufd_viommu {
> +	struct iommufd_object obj;
> +	struct iommufd_hwpt_paging *hwpt;
> +	struct xarray devices;
> +};
>
> struct iommufd_hwpt_paging hwpt {
> 	...
> +	struct list_head viommu_list;
> 	...
> };

I'd probably first try to go backwards and link the hwpt to the
viommu.

> struct iommufd_group {
> 	...
> +	struct iommufd_viommu *viommu; // should we attach to viommu instead of hwpt?
> 	...
> };

No. Attach is a statement of translation so you still attach to the
HWPT.

> Question to finalize how we maps vRID-pRID in the xarray:
> how should IOMMUFD_DEV_INVALIDATE work? The ioctl structure has
> a dev_id and a list of commands that belongs to the device. So,
> it forwards the struct device pointer to the driver along with
> the commands. Then, doesn't the driver already know the pRID
> from the dev pointer without looking up a vRID-pRID table?

The first version of DEV_INVALIDATE should have no xarray. The
invalidate commands are stripped of the SID and executed on the given
dev_id, period. The VMM splits up the invalidate command list.

In the second version maybe we have the xarray, or maybe we just push
the xarray to the eventual viommu series.

> struct iommu_hwpt_alloc {
> 	...
> +	__u32 viommu_id;
> };
>
> +enum iommu_dev_virtual_id_type {
> +	IOMMU_DEV_VIRTUAL_ID_TYPE_AMD_VIOMMU_DID, // not sure how this fits the xarray in viommu obj.
> +	IOMMU_DEV_VIRTUAL_ID_TYPE_AMD_VIOMMU_RID,

It is just DID. In both cases the ID is the index to the "STE" radix
tree, whatever the driver happens to call it.

> Then, I think that we also need an iommu_viommu_alloc structure
> and ioctl to allocate an object, and that VMM should know if it
> needs to allocate multiple viommu objects -- this probably needs
> the hw_info ioctl to return a piommu_id so VMM gets the list of
> piommus from the attached devices?

Yes and yes.

> Another question, introducing the viommu obj complicates things
> a lot. Do we want it to come with iommu_dev_assign_virtual_id,
> or maybe put in a later series? We could stage the xarray in the
> iommu_hwpt_paging struct for now, so a single-IOMMU system could
> still work with that.

All this would be in its own series to enable HW accelerated viommu
support on ARM & AMD as we've been doing so far.
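For the first DEV_INVALIDATE version discussed above (no xarray; commands stripped of the SID and executed on the given dev_id), the VMM has to split the guest's command list itself. The following is a minimal userspace sketch of that split, assuming a simplified command format: the VMM maps each command's virtual SID to an iommufd dev_id using its own table, drops the SID, and groups the remaining payloads so that each batch could be handed to one DEV_INVALIDATE call. All names here (guest_inv_cmd, inv_payload, vsid_map, dev_batch, split_by_device) are hypothetical and are not part of any proposed uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest command as seen by the VMM: each entry still
 * carries the virtual stream ID (vSID) the guest issued it for. */
struct guest_inv_cmd {
	uint32_t vsid;
	uint32_t opcode;	/* arbitrary value in this sketch */
	uint64_t addr;
};

/* Payload forwarded to the kernel once the vSID has been stripped. */
struct inv_payload {
	uint32_t opcode;
	uint64_t addr;
};

/* Hypothetical VMM-owned table: guest vSID -> iommufd dev_id. */
struct vsid_map {
	uint32_t vsid;
	uint32_t dev_id;
};

#define MAX_BATCH 16

/* One batch per device; each batch would become the command list of a
 * single DEV_INVALIDATE call on dev_id. */
struct dev_batch {
	uint32_t dev_id;
	size_t n;
	struct inv_payload cmds[MAX_BATCH];
};

static int lookup_dev_id(const struct vsid_map *map, size_t nmap,
			 uint32_t vsid, uint32_t *dev_id)
{
	for (size_t i = 0; i < nmap; i++) {
		if (map[i].vsid == vsid) {
			*dev_id = map[i].dev_id;
			return 0;
		}
	}
	return -1;	/* vSID not assigned to any device */
}

/* Group cmds by target device, preserving the relative order of the
 * commands within each device. Returns the number of batches written
 * to out, or -1 on an unknown vSID or when a fixed limit is exceeded. */
static int split_by_device(const struct guest_inv_cmd *cmds, size_t ncmds,
			   const struct vsid_map *map, size_t nmap,
			   struct dev_batch *out, size_t max_out)
{
	size_t nout = 0;

	for (size_t i = 0; i < ncmds; i++) {
		uint32_t dev_id;
		size_t b;

		if (lookup_dev_id(map, nmap, cmds[i].vsid, &dev_id))
			return -1;

		/* Find this device's batch, or open a new one. */
		for (b = 0; b < nout; b++)
			if (out[b].dev_id == dev_id)
				break;
		if (b == nout) {
			if (nout == max_out)
				return -1;
			out[nout].dev_id = dev_id;
			out[nout].n = 0;
			nout++;
		}
		assert(b < nout);

		if (out[b].n == MAX_BATCH)
			return -1;
		/* Drop the vSID: the kernel side only sees dev_id + payload. */
		out[b].cmds[out[b].n].opcode = cmds[i].opcode;
		out[b].cmds[out[b].n].addr = cmds[i].addr;
		out[b].n++;
	}
	return (int)nout;
}
```

Grouping per device keeps each device's commands in guest order but does not preserve ordering between devices; a real VMM would also need to handle sync/completion commands, which this sketch leaves out.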

I imagine it coming after we get the basic invalidation done.

> > > And should we rename the "cache_invalidate_user"? Would VT-d
> > > still uses it for device cache?
> >
> > I think vt-d will not implement it
>
> Then should we "s/cache_invalidate_user/iotlb_sync_user"?

I think cache_invalidate is still a fine name; vt-d will generate ATC
invalidations under that function too.

Jason