From: Parav Pandit
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, michal.lkml@markovi.net, davem@davemloft.net, gregkh@linuxfoundation.org, jiri@mellanox.com, kwankhede@nvidia.com
Cc: parav@mellanox.com, alex.williamson@redhat.com, vuhuong@mellanox.com, yuvalav@mellanox.com, jakub.kicinski@netronome.com, kvm@vger.kernel.org
Subject: [RFC net-next v1 0/3] Support mlx5 mediated devices in host
Date: Fri, 8 Mar 2019 16:07:53 -0600
Message-Id: <1552082876-60228-1-git-send-email-parav@mellanox.com>

Use case:
---------
A user wants to create/delete hardware linked sub devices
without using SR-IOV.
These devices for a PCI device can be a netdev (optionally with an rdma
device) or other devices. Such sub devices share some of the PCI device
resources and also have their own dedicated resources.

A user wants to use this device in a host where the PF PCI device exists
(not in a guest VM). A user may want to use such a sub device in a guest VM
in the future.

A few examples are:
1. netdev having its own txq(s), rq(s) and/or hw offload parameters.
2. netdev with switchdev mode using netdev representor
3. rdma device with IB link layer and IPoIB netdev
4. rdma/RoCE device and a netdev
5. rdma device with multiple ports

Requirements for above use cases:
--------------------------------
1. A generic user interface and core APIs are needed to create sub devices
   from a parent PCI device; they should be generic enough for other parent
   devices
2. Interface should be vendor agnostic
3. User should be able to set device params at creation time
4. In future if needed, tool should be able to create passthrough device to
   map to a virtual machine
5. A device can have multiple ports
6. Orchestration software wants to know how many such sub devices can be
   created from a parent device so that it can manage them in global cluster
   resources.

So how is it done?
------------------
The kernel has an existing mediated device infrastructure, provided by the
mdev driver, for the lifecycle of such sub devices. Hence, these sub devices
are created with the help of the mdev driver.
The mlx5_core driver registers with mdev core to do so and exposes the
necessary sysfs files.
Each created sub device has a unique UUID assigned by the user.
mdev sub devices inherit their parent's dma parameters.
Each registered mdev has a corresponding devlink instance.
Through this devlink instance, such a device and its port(s) are managed.

In order to use a mediated device in a VM or in the host, the user decides
which driver to use.
Typically vfio_mdev is used to expose a mdev in a guest VM.
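
As a rough illustration of the flow described above (not taken from the patches), creating an mdev and choosing its driver could be scripted as below. The create attribute under mdev_supported_types and the drivers/<name>/bind file are the standard mdev and driver-model sysfs files. SYSFS_ROOT, the function names and the type argument are assumptions made for this sketch; the type name exposed by mlx5_core is not stated here and has to be read from the parent's mdev_supported_types directory.

```shell
# Sketch only: SYSFS_ROOT, mdev_create and mdev_bind are illustrative
# names, not part of this patchset. SYSFS_ROOT can be pointed at a
# scratch directory for dry runs.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Create an mdev under a parent PCI device by writing a user-chosen UUID
# to the "create" attribute of one of the parent's supported types.
# $1 = parent PCI address, $2 = mdev type directory name, $3 = UUID
mdev_create() {
	printf '%s\n' "$3" > \
		"$SYSFS_ROOT/bus/pci/devices/$1/mdev_supported_types/$2/create"
}

# Bind an existing mdev to the driver chosen by the user
# (mlx5_core for use in the host, vfio_mdev for use in a guest VM).
# $1 = driver name, $2 = UUID
mdev_bind() {
	printf '%s\n' "$2" > "$SYSFS_ROOT/bus/mdev/drivers/$1/bind"
}
```

For example, after sourcing the functions: mdev_create 0000:05:00.0 "$MDEV_TYPE" "$UUID" followed by mdev_bind mlx5_core "$UUID", where MDEV_TYPE is whatever type directory the parent exposes and UUID is generated by the user (for instance with uuidgen).
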
In the current use case, mlx5 mediated devices are only usable inside the
host, through the mlx5_core driver binding to them.

Patchset summary:
-----------------
Patch-1 adds support to inherit dma params of parent device in child mdev.
Patch-2 registers with mdev core.
Patch-3 registers a mdev device driver to create actual netdev.

Summary of alternatives considered and discussed:
-------------------------------------------------
1. new subdev bus
   Fits the need but mdev simplifies it.
2. visorbus
   Very specific to Unisys s-Par devices.
3. platform devices
   Primarily meant for autonomous, SoC etc devices.
4. mfd devices
   Depends on platform device infra.
5. Directly creating netdev, rdma device instead of sub device
   Doesn't fit use case of passthrough mode.
6. creating subports of devlink instance
   Doesn't cover multiport rdma device usecase.

While discussion [1], [2] is still ongoing, v1 is posted to describe how the
two use cases, using an mdev in the host or in a guest, are addressed via the
standard Linux device driver model.

[1] https://www.spinics.net/lists/netdev/msg556552.html
[2] https://www.spinics.net/lists/netdev/msg556944.html

All patches are only a reference implementation to see the framework at work
at the devlink, sysfs, mdev and device model level.
Once the RFC looks good, a solid upstreamable version of the implementation
will be done.

System view with one mdev:
--------------------------

$ ls -l /sys/bus/pci/devices/0000:05:00.0
[..]
drwxr-xr-x 3 root root    0 Mar  8 14:53 69ea1551-d054-46e9-974d-8edae8f0aefe
drwxr-xr-x 3 root root    0 Mar  8 15:41 infiniband
drwxr-xr-x 3 root root    0 Mar  8 15:41 mdev_supported_types
-rw-r--r-- 1 root root 4096 Mar  8 13:17 msi_bus
drwxr-xr-x 2 root root    0 Mar  8 15:41 msi_irqs
drwxr-xr-x 3 root root    0 Mar  8 15:41 net

$ ls -l /sys/bus/mdev/drivers
total 0
drwxr-xr-x 2 root root 0 Mar  8 13:39 mlx5_core
drwxr-xr-x 2 root root 0 Mar  8 14:53 vfio_mdev

$ ls -l /sys/bus/mdev/devices/
total 0
lrwxrwxrwx 1 root root 0 Mar  8 14:53 69ea1551-d054-46e9-974d-8edae8f0aefe -> ../../../devices/pci0000:00/0000:00:02.2/0000:05:00.0/69ea1551-d054-46e9-974d-8edae8f0aefe

Bind mdev to mlx5_core driver:
$ echo 69ea1551-d054-46e9-974d-8edae8f0aefe > /sys/bus/mdev/drivers/mlx5_core/bind

$ ls -l /sys/class/net/eth0/
-r--r--r-- 1 root root 4096 Mar  8 15:43 carrier_up_count
lrwxrwxrwx 1 root root    0 Mar  8 15:43 device -> ../../../69ea1551-d054-46e9-974d-8edae8f0aefe
-r--r--r-- 1 root root 4096 Mar  8 15:43 dev_id

$ devlink dev show
pci/0000:05:00.0
mdev/69ea1551-d054-46e9-974d-8edae8f0aefe

Changelog
---
v0->v1:
 - Removed subdev bus, instead using existing mdev bus which fits the need.
 - Dropped devlink patches which are not needed anymore due to use of mdev
   framework.
 - Updated SPDX license line in patches.
 - Added TODO to patches where more hardware specific code will be added.

Parav Pandit (3):
  vfio/mdev: Inherit dma masks of parent device
  net/mlx5: Add mdev sub device life cycle command support
  net/mlx5: Add mdev driver to bind to mdev devices

 drivers/net/ethernet/mellanox/mlx5/core/Kconfig    |   9 ++
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   5 +
 drivers/net/ethernet/mellanox/mlx5/core/dev.c      |  18 ++++
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |  22 ++++
 drivers/net/ethernet/mellanox/mlx5/core/mdev.c     | 120 +++++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/mdev_driver.c  | 106 ++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |  19 ++++
 drivers/vfio/mdev/mdev_core.c                      |   4 +
 include/linux/mlx5/driver.h                        |   5 +
 9 files changed, 308 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/mdev.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/mdev_driver.c

-- 
1.8.3.1