Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
To: Parav Pandit, Jakub Kicinski
CC: Or Gerlitz, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, michal.lkml@markovi.net, davem@davemloft.net, gregkh@linuxfoundation.org, Jiri Pirko, Alex Williamson
From: Kirti Wankhede
Date: Fri, 8 Mar 2019 03:31:56 +0530
Message-ID: <931d95b3-7153-4bfc-353b-7226d4ea9465@nvidia.com>

On 3/8/2019 2:51 AM, Parav Pandit wrote:
>
>
>> -----Original Message-----
>> From: Kirti Wankhede
>> Sent: Thursday, March 7, 2019 3:08 PM
>> To: Parav Pandit; Jakub Kicinski
>> Cc: Or Gerlitz; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
>> michal.lkml@markovi.net; davem@davemloft.net; gregkh@linuxfoundation.org;
>> Jiri Pirko; Alex Williamson
>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>> On 3/8/2019 2:32 AM, Parav Pandit wrote:
>>>
>>>> -----Original Message-----
>>>> From: Kirti Wankhede
>>>> Sent: Thursday, March 7, 2019 2:54 PM
>>>> To: Parav Pandit; Jakub Kicinski
>>>> Cc: Or Gerlitz; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
>>>> michal.lkml@markovi.net; davem@davemloft.net; gregkh@linuxfoundation.org;
>>>> Jiri Pirko; Alex Williamson
>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>>>
>>>>>>>
>>>>>>> Yes. I got my patches adapted to the mdev way. Will be posting RFC v2 soon.
>>>>>>> Will wait a day to receive more comments/views from Greg and others.
>>>>>>>
>>>>>>> As I explained in this cover letter and discussion, the first use case
>>>>>>> is to create and use mdevs in the host (and not in a VM).
>>>>>>> Later on, I am sure that once we have mdevs available, VM users will
>>>>>>> likely use them.
>>>>>>>
>>>>>>> So, the mlx5_core driver will have two components as a starting point.
>>>>>>>
>>>>>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
>>>>>>> This is the mdev device life cycle driver, which will call
>>>>>>> mdev_register_device() and implement mlx5_mdev_ops.
>>>>>>>
>>>>>> Ok. I would suggest not using the mdev.c file name; maybe add the device
>>>>>> name, something like mlx_mdev.c or vfio_mlx.c
>>>>>>
>>>>> The mlx5/core coding convention does not prefix its 40+ files with mlx.
>>>>>
>>>>> It uses the actual subsystem or functionality name, such as sriov.c,
>>>>> eswitch.c, fw.c, en_tc.c (en for Ethernet), lag.c; so mdev.c aligns with
>>>>> the rest of the 40+ files.
>>>>>
>>>>>
>>>>>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
>>>>>>> This is the mdev device driver, which calls mdev_register_driver(); its
>>>>>>> probe() creates the netdev by heavily reusing existing code of the PF device.
>>>>>>> These drivers will not be placed under drivers/vfio/mdev, because this is
>>>>>>> not a vfio driver.
>>>>>>> This is fine, right?
>>>>>>>
>>>>>>
>>>>>> I'm not too familiar with netdev, but can you create the netdev in the
>>>>>> open() call on the mlx mdev device? Then you don't have to write an mdev
>>>>>> device driver.
>>>>>>
>>>>> Who invokes open() and release()?
>>>>> I believe it is QEMU that would do open(), release(), read/write/mmap?
>>>>>
>>>>> Assuming that is the case,
>>>>> I think it's incorrect to create the netdev in open(),
>>>>> because when we want to map the mdev to a VM using the above mdev calls,
>>>>> we actually won't be creating a netdev in the host.
>>>>> Instead, some queues etc. will be set up as part of these calls.
>>>>>
>>>>> By default, the newly created mdev is bound to vfio_mdev.
>>>>> Once we unbind the device from this driver, we need to bind it to the
>>>>> mlx5 driver so that driver can create the netdev etc.
>>>>>
>>>>> Or did I get open() and friends wrong?
>>>>>
>>>>
>>>> In 'struct mdev_parent_ops' there are create() and remove(). When the
>>>> user creates an mdev device by writing a UUID to the create sysfs file,
>>>> the vendor driver's create() callback gets called. This should be used
>>>> to allocate/commit
>>> Yes. I am already past that stage.
>>>
>>>> resources from the parent device, and the remove() callback should free
>>>> those resources.
>>>> So there is no need to bind the mlx5 driver to that mdev device.
>>>>
>>> If we don't bind the mlx5 driver, the vfio_mdev driver is bound to it. Such a driver won't create a netdev.
>>
>> It doesn't need to.
>>
>> Create the netdev from the create() callback.
>>
> I strongly believe this is an incorrect way to use the create() API,
> because an mdev is a mediated device of its primary PCI device. It is not a protocol device.
>
> It is also incorrect to tell the user that the vfio_mdev driver is bound to this mdev while the mlx5_core driver creates a netdev on top of the mdev.
> vfio_mdev is a generic common driver.

A vendor driver that wants to partition its device should handle its
child's creation and its life cycle. What is wrong with that? Why does the
netdev have to be created from probe() only and not from create()?
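To make the two placements being argued about concrete, here is a minimal userspace C model. It is an illustration only, not kernel code: struct model_mdev, create_with_netdev(), create_resources_only(), host_driver_probe(), model_create() and model_rebind_to_host() are hypothetical stand-ins, not the real mdev API. In the first variant the parent's create() callback also creates the netdev. In the second, create() only allocates resources and the netdev appears when a host-side driver's probe() runs after the mdev is unbound from vfio_mdev.

```c
/*
 * Userspace model (NOT kernel code) of where netdev creation could live
 * for an mdev. All names are hypothetical stand-ins for illustration.
 */
#include <stdbool.h>

enum bound_driver { DRV_VFIO_MDEV, DRV_HOST };

struct model_mdev {
	bool resources_allocated; /* done by the parent's create() */
	bool netdev_created;      /* the point under debate */
	enum bound_driver driver; /* which mdev bus driver is bound */
};

/* Variant 1: create() allocates resources AND creates the netdev,
 * regardless of which driver is bound to the mdev. */
static void create_with_netdev(struct model_mdev *m)
{
	m->resources_allocated = true;
	m->netdev_created = true;
}

/* Variant 2: create() only allocates resources. */
static void create_resources_only(struct model_mdev *m)
{
	m->resources_allocated = true;
}

/* Variant 2 continued: a host-side driver's probe() creates the netdev. */
static void host_driver_probe(struct model_mdev *m)
{
	if (m->resources_allocated)
		m->netdev_created = true;
}

/* Creation: the device is registered and bound to vfio_mdev first,
 * then the parent's create() callback runs. */
static void model_create(struct model_mdev *m,
			 void (*create)(struct model_mdev *))
{
	m->resources_allocated = false;
	m->netdev_created = false;
	m->driver = DRV_VFIO_MDEV;
	create(m);
}

/* "Unbind from vfio_mdev, bind to the host driver": probe() runs. */
static void model_rebind_to_host(struct model_mdev *m)
{
	m->driver = DRV_HOST;
	host_driver_probe(m);
}
```

Under the first variant the netdev exists while vfio_mdev is still bound. Under the second it exists only after the rebind, which is the host-versus-VM distinction Parav is drawing.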
> When we want to map this mdev to a VM, what should create() do?

The mediated device should be created before it is mapped to a VM.
If you look at the sequence of mdev device creation:
- a 'struct device' is created with bus 'mdev_bus_type'
- the device is registered: device_register(&mdev->dev) -> which calls
  vfio_mdev's probe() -> common code for all vendor drivers
- mdev_device_create_ops() -> calls the vendor driver's create() -> this is
  for vendor-specific allocation and initialization. This is the callback
  from which you can do whatever you need for mdev device creation and
  initialization.
Why does it have to be named probe()?

> We will have to shift the code from create() to mdev_device_driver()->probe() to address the use case of selectively mapping an mdev to a VM or to the host, and implement the appropriate open/close etc. functions for the VM case.
>
> So why not start correctly from the beginning?
>

What is wrong with the current implementation, which is being used and
tested for multiple devices?

Thanks,
Kirti

>
>> Thanks,
>> Kirti
>>
>>> Again, we do not want to map this mdev to a VM.
>>> We want to consume it in the host where the mdev is created.
>>> So I am able to detach this mdev from the vfio_mdev driver as usual using
>>> $ echo mdev_name > ../drivers/vfio_mdev/unbind
>>>
>>> followed by binding it to the mlx5_core driver.
>>>
>>> Below is sample output before binding it to the mlx5_core driver.
>>> When we bind it to the mlx5_core driver, that driver creates the netdev in the host.
>>> If the user wants to map this mdev to a VM, the user won't bind it to the mlx5_core driver; instead they will bind it to the vfio driver, which does the usual open/release/...
>>>
>>>
>>> lrwxrwxrwx 1 root root 0 Mar 7 14:24 69ea1551-d054-46e9-974d-8edae8f0aefe -> ../../../devices/pci0000:00/0000:00:02.2/0000:05:00.0/69ea1551-d054-46e9-974d-8edae8f0aefe
>>> [root@sw-mtx-036 net-next]# ls -l /sys/bus/mdev/devices/69ea1551-d054-46e9-974d-8edae8f0aefe/
>>> total 0
>>> lrwxrwxrwx 1 root root 0 Mar 7 14:24 driver -> ../../../../../bus/mdev/drivers/vfio_mdev
>>> lrwxrwxrwx 1 root root 0 Mar 7 14:24 iommu_group -> ../../../../../kernel/iommu_groups/0
>>> lrwxrwxrwx 1 root root 0 Mar 7 14:24 mdev_type -> ../mdev_supported_types/mlx5_core-mgmt
>>> drwxr-xr-x 2 root root 0 Mar 7 14:24 power
>>> --w------- 1 root root 4096 Mar 7 14:24 remove
>>> lrwxrwxrwx 1 root root 0 Mar 7 14:24 subsystem -> ../../../../../bus/mdev
>>> -rw-r--r-- 1 root root 4096 Mar 7 14:24 uevent
>>>
>>>> open/release/read/write/mmap/ioctl are regular file operations for
>>>> that mdev device.
>>>>
>>>
>>>> Thanks,
>>>> Kirti
>>>
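For reference, the userspace side of this flow can be sketched in C. The flow creates an mdev by writing a UUID to the parent's create attribute, then detaches it from vfio_mdev through the driver-core unbind file, as in Parav's echo command. This is a sketch under assumptions. The helper names mdev_create_path(), mdev_unbind_path() and sysfs_write() are hypothetical. The create path goes through the /sys/class/mdev_bus/<parent> link. The parent 0000:05:00.0, the type mlx5_core-mgmt and the UUID are only the values seen in the listing above.

```c
/*
 * Sketch of the sysfs writes used to create an mdev and detach it from
 * vfio_mdev. Helper names are hypothetical; paths assume the mdev sysfs
 * layout (mdev_supported_types/<type>/create) and the driver-core
 * /sys/bus/<bus>/drivers/<driver>/unbind attribute.
 */
#include <stdio.h>

/* Build <root>/class/mdev_bus/<parent>/mdev_supported_types/<type>/create.
 * Returns 0 on success, -1 if the buffer is too small. */
static int mdev_create_path(char *buf, size_t len, const char *root,
			    const char *parent, const char *type)
{
	int n = snprintf(buf, len,
			 "%s/class/mdev_bus/%s/mdev_supported_types/%s/create",
			 root, parent, type);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Build <root>/bus/mdev/drivers/<driver>/unbind. */
static int mdev_unbind_path(char *buf, size_t len, const char *root,
			    const char *driver)
{
	int n = snprintf(buf, len, "%s/bus/mdev/drivers/%s/unbind",
			 root, driver);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a string to an attribute file, like `echo value > path`.
 * Returns 0 on success, -1 on failure. */
static int sysfs_write(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	/* stdio buffers the write, so a sysfs store error surfaces here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Writing 69ea1551-d054-46e9-974d-8edae8f0aefe to the create path built for parent 0000:05:00.0 and type mlx5_core-mgmt, then writing the same UUID to the vfio_mdev unbind path, would reproduce the state Parav describes before binding mlx5_core. Both writes need root and a parent driver that has called mdev_register_device().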