Received: by 2002:a89:413:0:b0:1fd:dba5:e537 with SMTP id m19csp369598lqs; Thu, 13 Jun 2024 12:22:48 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCWBCtSpju2AiuMFNl1aKCQXmkFiZYT7l3pqL/pl/b2niZL3+B1FigCWcdivNkkXCJbhJY+fivWVZnDFZG7xE9ire3NE6u22FozaubhRpA== X-Google-Smtp-Source: AGHT+IFiCw8fYA3w7Jc4DXNl75g/8HQly2Sbip4JzhaUsbz9sZ2IN+B2lfs9w3NW5DkIkdzqMR/H X-Received: by 2002:ad4:4181:0:b0:6b0:8222:ba49 with SMTP id 6a1803df08f44-6b2afc8ca62mr6028606d6.16.1718306568540; Thu, 13 Jun 2024 12:22:48 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1718306568; cv=pass; d=google.com; s=arc-20160816; b=E+w5RULvE3wSEMcAOE9x0Jzt7wlg8I/mMAUfQKOSxAfBbGt3jLthLfPpr4uQRdVmdF /h/h5cKlAAzcQJ3tvZahYwP3kCnJxwmcf+am/iL5MPtxdhMOQEu7La7tlFEVlnTzh6uZ DJ9ySQhm6p0Fbia/nQrn7VTnjgZnLQvr0xW/GCw1cTB1eyxoKI3JTJbSjoagRAqn+zmi ICvhkIxznSndJATkzDRRBujJ+6FNUtXThXPJc3vhY3gFJaPKH0EMtd7zbCkNJLjXgNnQ IjCh0WVDIm+W3qUumKDDVmJ4Ud3wQr8vBBIAmOGQu3VPoJYz1B14ghejn3iXqiGYtDks LHOg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date:dkim-signature; bh=CkWWhpDkkEY8YScsWvx3QzGVOZ53WW1sEeVzTMGLcj0=; fh=N0oSx7g7MGqzcj6jzEYv16BYGHMx2xR+0hSvozunbVg=; b=peC7S/fLJ4+XLgdrDWSehXqfTwysQqC7cnrshXP37DkB7TfXNvfMJm4IVFV4SWy5J3 G+tPKaaGV8FYD3uXTihfVB53ZAxuZ1qDQL9cBYAmkHUBpOplY+9dESBBSEOCI2nRSLoc ip2NY6o5fLe8gHg5YZv7XkkVknSkkCzGikDSYUXIwA7AIAzNI+G7gRtNv+Xbv16BVebl tFIksAPdNaicrTi677gyku9vW1RrufW3086U3yw/1nADibEXlBzK6IbB0jkpmkwBzQmB VWHMFKQln8VaE4vW1Mu/aTW5R5ciWk+984zg2iGQO4UzE3H3M1W+E5XSCd7ByAzMZaye iYvw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=BiODEcA+; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-213910-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-213910-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id d75a77b69052e-441f2ea8d0esi20671101cf.258.2024.06.13.12.22.48 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 13 Jun 2024 12:22:48 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-213910-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=BiODEcA+; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-213910-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-213910-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 35D6B1C25722 for ; Thu, 13 Jun 2024 19:22:48 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 8E5BE155CA2; Thu, 13 Jun 2024 19:18:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="BiODEcA+" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7D37C130AC8; Thu, 13 Jun 2024 19:18:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718306313; cv=none; b=cUkF8BEwK/d1zXFOOj0JrNMcgUhFTtFPTX6g0kOuOIURvu005N80apde3kIWezkM1P5f9dGAfDQMd8FR44IpVC871ai6w50qhRxx2qlQ6dffV1kOARMwyiT+RZK8Qt0zsacNkdQbGHYQAssMRxyQ1hq3pm2m5Hs5zUYi9r+TMDk= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718306313; c=relaxed/simple; bh=0dUoIERvha84Ttm88+VYuMJSLvLnVGmZhewF3c9D9Ps=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=L7sU2irM7vE6izDa99CtBzZfydBxC4isGPgwGrIYmTWlMeHj4XbtG6ZFmw2/TpY6fLNF6gRXevUJA0xwTQqnpfLIaDvrvDWuAN1k19vp/91R6vAhkRWXn41eXv5YS5saiQwdTgTrzkXH2Wv0cvHJoLDmOUPvNm2e8PJ3QSnQRjY= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=BiODEcA+; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 90473C2BBFC; Thu, 13 Jun 2024 19:18:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1718306313; bh=0dUoIERvha84Ttm88+VYuMJSLvLnVGmZhewF3c9D9Ps=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=BiODEcA+596m57waePRbr913TZX70bGwTVw2U5cJow/u7yy3sF6R2rIlvUrh+qb2f TcdMkIQhP4yRqZPODt5+ZhhoESIkGBCi4njvMtRkhzTjrrR1hZuStkxs0VpIkm2uYj 8zRoIcad7KwuZMVgmpERwGErcraSzkmsweD4f5EAJXXFK6lyM0ISNarP9ZOw1aMhrJ V2oNjlo+fhM/YdcYWZ8WZIHYJ+Ko5I6NQfAR6kum2OGlYXDsPqO7QqXgpkUPyAFb4+ 8Yeg5tEOuSzNF3D437mE5IOh8SqDhmlzOz8xMFJZSSL0PtlFQylD4tnTAywoGj0faz jx19BE/xTN7mg== Date: Thu, 13 Jun 2024 22:18:28 +0300 From: Leon Romanovsky To: Omer Shpigelman Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, netdev@vger.kernel.org, dri-devel@lists.freedesktop.org, ogabbay@kernel.org, zyehudai@habana.ai Subject: Re: [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver Message-ID: <20240613191828.GJ4966@unreal> References: <20240613082208.1439968-1-oshpigelman@habana.ai> <20240613082208.1439968-12-oshpigelman@habana.ai> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240613082208.1439968-12-oshpigelman@habana.ai> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote: > Add an RDMA driver of Gaudi ASICs family for AI scaling. > The driver itself is agnostic to the ASIC in action, it operates according > to the capabilities that were passed on device initialization. > The device is initialized by the hbl_cn driver via auxiliary bus. > The driver also supports QP resource tracking and port/device HW counters. > > Signed-off-by: Omer Shpigelman > Co-developed-by: Abhilash K V > Signed-off-by: Abhilash K V > Co-developed-by: Andrey Agranovich > Signed-off-by: Andrey Agranovich > Co-developed-by: Bharat Jauhari > Signed-off-by: Bharat Jauhari > Co-developed-by: David Meriin > Signed-off-by: David Meriin > Co-developed-by: Sagiv Ozeri > Signed-off-by: Sagiv Ozeri > Co-developed-by: Zvika Yehudai > Signed-off-by: Zvika Yehudai I afraid that you misinterpreted the "Co-developed-by" tag. All these people are probably touch the code and not actually sit together at the same room and write the code together. So, please remove the extensive "Co-developed-by" tags. It is not full review yet, but simple pass-by-comments. > --- > MAINTAINERS | 10 + > drivers/infiniband/Kconfig | 1 + > drivers/infiniband/hw/Makefile | 1 + > drivers/infiniband/hw/hbl/Kconfig | 17 + > drivers/infiniband/hw/hbl/Makefile | 8 + > drivers/infiniband/hw/hbl/hbl.h | 326 +++ > drivers/infiniband/hw/hbl/hbl_main.c | 478 ++++ > drivers/infiniband/hw/hbl/hbl_verbs.c | 2686 ++++++++++++++++++++++ > include/uapi/rdma/hbl-abi.h | 204 ++ > include/uapi/rdma/hbl_user_ioctl_cmds.h | 66 + > include/uapi/rdma/hbl_user_ioctl_verbs.h | 106 + > include/uapi/rdma/ib_user_ioctl_verbs.h | 1 + > 12 files changed, 3904 insertions(+) > create mode 100644 drivers/infiniband/hw/hbl/Kconfig > create mode 100644 drivers/infiniband/hw/hbl/Makefile > create mode 100644 drivers/infiniband/hw/hbl/hbl.h > create mode 100644 drivers/infiniband/hw/hbl/hbl_main.c > create mode 100644 drivers/infiniband/hw/hbl/hbl_verbs.c > create mode 100644 include/uapi/rdma/hbl-abi.h > create mode 100644 include/uapi/rdma/hbl_user_ioctl_cmds.h > create mode 100644 include/uapi/rdma/hbl_user_ioctl_verbs.h <...> > +#define hbl_ibdev_emerg(ibdev, format, ...) ibdev_emerg(ibdev, format, ##__VA_ARGS__) > +#define hbl_ibdev_alert(ibdev, format, ...) ibdev_alert(ibdev, format, ##__VA_ARGS__) > +#define hbl_ibdev_crit(ibdev, format, ...) ibdev_crit(ibdev, format, ##__VA_ARGS__) > +#define hbl_ibdev_err(ibdev, format, ...) ibdev_err(ibdev, format, ##__VA_ARGS__) > +#define hbl_ibdev_warn(ibdev, format, ...) ibdev_warn(ibdev, format, ##__VA_ARGS__) > +#define hbl_ibdev_notice(ibdev, format, ...) ibdev_notice(ibdev, format, ##__VA_ARGS__) > +#define hbl_ibdev_info(ibdev, format, ...) ibdev_info(ibdev, format, ##__VA_ARGS__) > +#define hbl_ibdev_dbg(ibdev, format, ...) ibdev_dbg(ibdev, format, ##__VA_ARGS__) > + > +#define hbl_ibdev_emerg_ratelimited(ibdev, fmt, ...) \ > + ibdev_emerg_ratelimited(ibdev, fmt, ##__VA_ARGS__) > +#define hbl_ibdev_alert_ratelimited(ibdev, fmt, ...) \ > + ibdev_alert_ratelimited(ibdev, fmt, ##__VA_ARGS__) > +#define hbl_ibdev_crit_ratelimited(ibdev, fmt, ...) \ > + ibdev_crit_ratelimited(ibdev, fmt, ##__VA_ARGS__) > +#define hbl_ibdev_err_ratelimited(ibdev, fmt, ...) \ > + ibdev_err_ratelimited(ibdev, fmt, ##__VA_ARGS__) > +#define hbl_ibdev_warn_ratelimited(ibdev, fmt, ...) \ > + ibdev_warn_ratelimited(ibdev, fmt, ##__VA_ARGS__) > +#define hbl_ibdev_notice_ratelimited(ibdev, fmt, ...) \ > + ibdev_notice_ratelimited(ibdev, fmt, ##__VA_ARGS__) > +#define hbl_ibdev_info_ratelimited(ibdev, fmt, ...) \ > + ibdev_info_ratelimited(ibdev, fmt, ##__VA_ARGS__) > +#define hbl_ibdev_dbg_ratelimited(ibdev, fmt, ...) \ > + ibdev_dbg_ratelimited(ibdev, fmt, ##__VA_ARGS__) > + Please don't redefine the existing macros. Just use the existing ones. <...> > + if (hbl_ib_match_netdev(ibdev, netdev)) > + ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port); > + else > + return NOTIFY_DONE; It is not kernel coding style. Please write: if (!hbl_ib_match_netdev(ibdev, netdev)) return NOTIFY_DONE; ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port); > + <...> > +static int hbl_ib_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id) > +{ > + struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev); > + struct hbl_ib_aux_ops *aux_ops = aux_dev->aux_ops; > + struct hbl_ib_device *hdev; > + ktime_t timeout; > + int rc; > + > + rc = hdev_init(aux_dev); > + if (rc) { > + dev_err(&aux_dev->adev.dev, "Failed to init hdev\n"); > + return -EIO; > + } > + > + hdev = aux_dev->priv; > + > + /* don't allow module unloading while it is attached */ > + if (!try_module_get(THIS_MODULE)) { This part makes wonder, what are you trying to do here? What doesn't work for you in standard driver core and module load mechanism? > + dev_err(hdev->dev, "Failed to increment %s module refcount\n", > + module_name(THIS_MODULE)); > + rc = -EIO; > + goto module_get_err; > + } > + > + timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC); > + while (1) { > + aux_ops->hw_access_lock(aux_dev); > + > + /* if the device is operational, proceed to actual init while holding the lock in > + * order to prevent concurrent hard reset > + */ > + if (aux_ops->device_operational(aux_dev)) > + break; > + > + aux_ops->hw_access_unlock(aux_dev); > + > + if (ktime_compare(ktime_get(), timeout) > 0) { > + dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n"); > + rc = -EBUSY; > + goto timeout_err; > + } > + > + dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing IB\n"); > + > + msleep_interruptible(MSEC_PER_SEC); > + } The code above is unexpected. > + > + rc = hbl_ib_dev_init(hdev); > + if (rc) { > + dev_err(hdev->dev, "Failed to init ib device\n"); > + goto dev_init_err; > + } > + > + aux_ops->hw_access_unlock(aux_dev); > + > + return 0; > + > +dev_init_err: > + aux_ops->hw_access_unlock(aux_dev); > +timeout_err: > + module_put(THIS_MODULE); > +module_get_err: > + hdev_fini(aux_dev); > + > + return rc; > +} <...> > +static int __init hbl_ib_init(void) > +{ > + pr_info("loading driver\n"); Please remove all these debug prints and leave only the necessary ones. > + > + return auxiliary_driver_register(&hbl_ib_driver); > +} > + > +static void __exit hbl_ib_exit(void) > +{ > + auxiliary_driver_unregister(&hbl_ib_driver); > + > + pr_info("driver removed\n"); > +} > + > +module_init(hbl_ib_init); > +module_exit(hbl_ib_exit) Thanks