Received: by 2002:a89:413:0:b0:1fd:dba5:e537 with SMTP id m19csp435956lqs; Thu, 13 Jun 2024 14:49:52 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCWpDa0HbUHpD7lZ8+8XnaTKkeUUt587n+ECzsGwBzvWcjBpJTQmNk1RHsKx8MNLq1TfFwzXvbJfPHWQjxIaRv4EKN0w0dXB0TvFktz7Cw== X-Google-Smtp-Source: AGHT+IGb1yMQ2BJgNA2VZEJ4+u3CD2NUL03yBB1yfA6PQEKxOD/Cm39p+/BcG5C5LEq745NxskUu X-Received: by 2002:a17:906:b790:b0:a6f:45b7:f69e with SMTP id a640c23a62f3a-a6f60d2bd20mr68038166b.20.1718315392816; Thu, 13 Jun 2024 14:49:52 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1718315392; cv=pass; d=google.com; s=arc-20160816; b=YHxa2dA/4joVpc3fen6raEhfAOiR5g+UJWg6EFFS57h6bGp+Wcf9O+aWXChXrgTRVd F8y57kxJu62UkihW9m+XHUPoWMePrSmxwmiQGt0duPxoN5Ohz0GGdOoT3NhuH+aYwWV6 6efbACovo68daiPYTrB3q2gJTe+npvIZMwCozNb3n/lv9GoUFor2apGH3J2MQk44q/Mb aSb/x9n4U2jkbwzhbbPpoVccmNpq7QUXvMRbRJRbBTIHtu47ioSH2f6cXuMdQOlGSP1f us0ofdfeE9DfrqHusUjRSRIqvjC+q8yOyO/pAj0n2M+CENv5e7D5yCzvELbH0hujOrqg Ql5g== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date:dkim-signature; bh=2tgawcyxp7e6LF/gNJHmG9oPpwqaaAvXUh9+M7UWf7c=; fh=N0oSx7g7MGqzcj6jzEYv16BYGHMx2xR+0hSvozunbVg=; b=k78O/RbOPtrfrLEV6Hzwp6dtmBXYj2cE0SKCC3tjrfv3FfaaWYXCiTjCK1IxbuzL7t Un7JN7dTb8Yh3LSpJDfHG9GlmxB9AOQaBiaUBDJQqGPz3m2J7JTb+FvPcUm1yDEEqaVm Ycz+ZU6iUAyKSe2fwUdN5+t55gpZCE9RfLsT1o4ATqgUAY2ihC6NJHDK7noe9y+ZVkEi VG6FYA0Laaj4GZfL7EuMXfT0VY1neGuQOcf6wQccf5ThadwXX/R8Dr9fVXIxxSQc9KNo 523Rjw20Iv5QCh+xJOlTkWqxBOdp37DbA+Dfm8EeYNROEbkeSdkRbMifyZarTO4BCiXh oWZQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@lunn.ch header.s=20171124 header.b=3EnM92Rk; arc=pass (i=1 spf=pass spfdomain=lunn.ch dkim=pass dkdomain=lunn.ch dmarc=pass fromdomain=lunn.ch); spf=pass (google.com: domain of linux-kernel+bounces-214038-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-214038-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=lunn.ch Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id a640c23a62f3a-a6f56dd664asi103832366b.308.2024.06.13.14.49.52 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 13 Jun 2024 14:49:52 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-214038-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=pass header.i=@lunn.ch header.s=20171124 header.b=3EnM92Rk; arc=pass (i=1 spf=pass spfdomain=lunn.ch dkim=pass dkdomain=lunn.ch dmarc=pass fromdomain=lunn.ch); spf=pass (google.com: domain of linux-kernel+bounces-214038-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-214038-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=lunn.ch Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 5F1371F23A26 for ; Thu, 13 Jun 2024 21:49:52 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 00B7214B94D; Thu, 13 Jun 2024 21:49:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=lunn.ch header.i=@lunn.ch header.b="3EnM92Rk" Received: from vps0.lunn.ch (vps0.lunn.ch [156.67.10.101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CF9A65A4FD; Thu, 13 Jun 2024 21:49:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=156.67.10.101 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718315378; cv=none; b=hnXP4lfClevBt2I9dYO1zFBZz9iarqBaYg6ilY8tLqQSGTcqNgC4HRYSF+Wav4oTrjM9naU19NRu220RZupTH7ltUzTqUqFjBKjw/9kKLPFMu5ILm+L4FR/qPk0aePcpZSJ8L4BlhEDbjjTQvCLdEXBh104kBabD904PFb1as5A= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718315378; c=relaxed/simple; bh=ETYgORMuqtSoTr818gKpIfKridOj8I0R7TG0CAFQppc=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=M7WPuz0bR/wZdo8uVSLCI8miQNvHsETG/2B+rp1rvCv9n5aaEDfauWxqoYrfmZTXmWTpvIGrs7W6D2SZ9MaBX0JWrj5Bog66opt0c7osOyPYrZIagSnkSYMexhhVRl9w8pZQgC9XulPSFobKeyI/Qjh8Ct7LJkn6QiA7fKuOhCk= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=lunn.ch; spf=pass smtp.mailfrom=lunn.ch; dkim=pass (1024-bit key) header.d=lunn.ch header.i=@lunn.ch header.b=3EnM92Rk; arc=none smtp.client-ip=156.67.10.101 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=lunn.ch Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=lunn.ch DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lunn.ch; s=20171124; h=In-Reply-To:Content-Disposition:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:From:Sender:Reply-To:Subject: Date:Message-ID:To:Cc:MIME-Version:Content-Type:Content-Transfer-Encoding: Content-ID:Content-Description:Content-Disposition:In-Reply-To:References; bh=2tgawcyxp7e6LF/gNJHmG9oPpwqaaAvXUh9+M7UWf7c=; b=3EnM92Rk7DmDJc+RAa3EiMUH1m hHpvDI0GQPzxsTAWddB2Q20QiDM/mMCPCPtKzRZ1hi+4tCZ+WoImbaFDu5RTREUPfcIN6iStA3T9J /pA8sqcDg0PcH5KpLpk6yqSMBIufj7NfpabiOrUoHNLerqkOxh/pkPonVcnEdcTsWSVk=; Received: from andrew by vps0.lunn.ch with local (Exim 4.94.2) (envelope-from ) id 1sHsK5-0000ZS-5O; Thu, 13 Jun 2024 23:49:33 +0200 Date: Thu, 13 Jun 2024 23:49:33 +0200 From: Andrew Lunn To: Omer Shpigelman Cc: linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, netdev@vger.kernel.org, dri-devel@lists.freedesktop.org, ogabbay@kernel.org, zyehudai@habana.ai Subject: Re: [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver Message-ID: <10902044-fb02-4328-bf88-0b386ee51c78@lunn.ch> References: <20240613082208.1439968-1-oshpigelman@habana.ai> <20240613082208.1439968-10-oshpigelman@habana.ai> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240613082208.1439968-10-oshpigelman@habana.ai> > +static int hbl_en_napi_poll(struct napi_struct *napi, int budget); > +static int hbl_en_port_open(struct hbl_en_port *port); When you do the Intel internal review, i expect this is crop up. No forward declarations please. Put the code in the right order so they are not needed. > +static int hbl_en_get_src_ip(struct hbl_aux_dev *aux_dev, u32 port_idx, u32 *src_ip) > +{ > + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx); > + struct net_device *ndev = port->ndev; > + struct in_device *in_dev; > + struct in_ifaddr *ifa; > + int rc = 0; > + > + /* for the case where no src IP is configured */ > + *src_ip = 0; > + > + /* rtnl lock should be acquired in relevant flows before taking configuration lock */ > + if (!rtnl_is_locked()) { > + netdev_err(port->ndev, "Rtnl lock is not acquired, can't proceed\n"); > + rc = -EFAULT; > + goto out; > + } You will find all other drivers just do: ASSERT_RTNL(). If your locking is broken, you are probably dead anyway, so you might as well keep going and try to explode in the most interesting way possible. > +static void hbl_en_reset_stats(struct hbl_aux_dev *aux_dev, u32 port_idx) > +{ > + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx); > + > + port->net_stats.rx_packets = 0; > + port->net_stats.tx_packets = 0; > + port->net_stats.rx_bytes = 0; > + port->net_stats.tx_bytes = 0; > + port->net_stats.tx_errors = 0; > + atomic64_set(&port->net_stats.rx_dropped, 0); > + atomic64_set(&port->net_stats.tx_dropped, 0); Why atomic64_set? Atomics are expensive, so you should not be using them. netdev has other cheaper methods, which other Intel developers should be happy to tell you all about. > +static u32 hbl_en_get_mtu(struct hbl_aux_dev *aux_dev, u32 port_idx) > +{ > + struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx); > + struct net_device *ndev = port->ndev; > + u32 mtu; > + > + if (atomic_cmpxchg(&port->in_reset, 0, 1)) { > + netdev_err(ndev, "port is in reset, can't get MTU\n"); > + return 0; > + } > + > + mtu = ndev->mtu; I think you need a better error message. All this does is access ndev->mtu. What does it matter if the port is in reset? You don't access it. > +static int hbl_en_close(struct net_device *netdev) > +{ > + struct hbl_en_port *port = hbl_netdev_priv(netdev); > + struct hbl_en_device *hdev = port->hdev; > + ktime_t timeout; > + > + /* Looks like the return value of this function is not checked, so we can't just return > + * EBUSY if the port is under reset. We need to wait until the reset is finished and then > + * close the port. Otherwise the netdev will set the port as closed although port_close() > + * wasn't called. Only if we waited long enough and the reset hasn't finished, we can return > + * an error without actually closing the port as it is a fatal flow anyway. > + */ > + timeout = ktime_add_ms(ktime_get(), PORT_RESET_TIMEOUT_MSEC); > + while (atomic_cmpxchg(&port->in_reset, 0, 1)) { > + /* If this is called from unregister_netdev() then the port was already closed and > + * hence we can safely return. > + * We could have just check the port_open boolean, but that might hide some future > + * bugs. Hence it is better to use a dedicated flag for that. > + */ > + if (READ_ONCE(hdev->in_teardown)) > + return 0; > + > + usleep_range(50, 200); > + if (ktime_compare(ktime_get(), timeout) > 0) { > + netdev_crit(netdev, > + "Timeout while waiting for port to finish reset, can't close it\n" > + ); > + return -EBUSY; > + } This has the usual bug. Please look at include/linux/iopoll.h. > + timeout = ktime_add_ms(ktime_get(), PORT_RESET_TIMEOUT_MSEC); > + while (atomic_cmpxchg(&port->in_reset, 0, 1)) { > + usleep_range(50, 200); > + if (ktime_compare(ktime_get(), timeout) > 0) { > + netdev_crit(port->ndev, > + "Timeout while waiting for port %d to finish reset\n", > + port->idx); > + break; > + } > + } and again. Don't roll your own timeout loops like this, use the core version. > +static int hbl_en_change_mtu(struct net_device *netdev, int new_mtu) > +{ > + struct hbl_en_port *port = hbl_netdev_priv(netdev); > + int rc = 0; > + > + if (atomic_cmpxchg(&port->in_reset, 0, 1)) { > + netdev_err(netdev, "port is in reset, can't change MTU\n"); > + return -EBUSY; > + } > + > + if (netif_running(port->ndev)) { > + hbl_en_port_close(port); > + > + /* Sleep in order to let obsolete events to be dropped before re-opening the port */ > + msleep(20); > + > + netdev->mtu = new_mtu; > + > + rc = hbl_en_port_open(port); > + if (rc) > + netdev_err(netdev, "Failed to reinit port for MTU change, rc %d\n", rc); Does that mean the port is FUBAR? Most operations like this are expected to roll back to the previous working configuration on failure. So if changing the MTU requires new buffers in your ring, you should first allocate the new buffers, then free the old buffers, so that if allocation fails, you still have buffers, and the device can continue operating. > +module_param(poll_enable, bool, 0444); > +MODULE_PARM_DESC(poll_enable, > + "Enable Rx polling rather than IRQ + NAPI (0 = no, 1 = yes, default: no)"); Module parameters are not liked. This probably needs to go away. > +static int hbl_en_ethtool_get_module_info(struct net_device *ndev, struct ethtool_modinfo *modinfo) > +{ > + modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN; > + modinfo->type = ETH_MODULE_SFF_8636; Is this an SFF, not an SFP? How else can you know what module it is without doing an I2C transfer to ask the module what it is? > +static int hbl_en_ethtool_get_module_eeprom(struct net_device *ndev, struct ethtool_eeprom *ee, > + u8 *data) > +{ This is the old API. Please update to the new API so there is access to all the pages of the SFF/SFP. > +static int hbl_en_ethtool_get_link_ksettings(struct net_device *ndev, > + struct ethtool_link_ksettings *cmd) > +{ > + struct hbl_en_aux_ops *aux_ops; > + struct hbl_aux_dev *aux_dev; > + struct hbl_en_device *hdev; > + struct hbl_en_port *port; > + u32 port_idx, speed; > + > + port = hbl_netdev_priv(ndev); > + hdev = port->hdev; > + port_idx = port->idx; > + aux_dev = hdev->aux_dev; > + aux_ops = aux_dev->aux_ops; > + speed = aux_ops->get_speed(aux_dev, port_idx); > + > + cmd->base.speed = speed; > + cmd->base.duplex = DUPLEX_FULL; > + > + ethtool_link_ksettings_zero_link_mode(cmd, supported); > + ethtool_link_ksettings_zero_link_mode(cmd, advertising); > + > + switch (speed) { > + case SPEED_100000: > + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseCR4_Full); > + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseSR4_Full); > + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseKR4_Full); > + ethtool_link_ksettings_add_link_mode(cmd, supported, 100000baseLR4_ER4_Full); > + > + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseCR4_Full); > + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseSR4_Full); > + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseKR4_Full); > + ethtool_link_ksettings_add_link_mode(cmd, advertising, 100000baseLR4_ER4_Full); > + > + cmd->base.port = PORT_FIBRE; > + > + ethtool_link_ksettings_add_link_mode(cmd, supported, FIBRE); > + ethtool_link_ksettings_add_link_mode(cmd, advertising, FIBRE); > + > + ethtool_link_ksettings_add_link_mode(cmd, supported, Backplane); > + ethtool_link_ksettings_add_link_mode(cmd, advertising, Backplane); > + break; > + case SPEED_50000: > + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseSR2_Full); > + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseCR2_Full); > + ethtool_link_ksettings_add_link_mode(cmd, supported, 50000baseKR2_Full); > + > + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseSR2_Full); > + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseCR2_Full); > + ethtool_link_ksettings_add_link_mode(cmd, advertising, 50000baseKR2_Full); > + break; > + case SPEED_25000: > + ethtool_link_ksettings_add_link_mode(cmd, supported, 25000baseCR_Full); > + > + ethtool_link_ksettings_add_link_mode(cmd, advertising, 25000baseCR_Full); > + break; > + case SPEED_200000: > + ethtool_link_ksettings_add_link_mode(cmd, supported, 200000baseCR4_Full); > + ethtool_link_ksettings_add_link_mode(cmd, supported, 200000baseKR4_Full); > + > + ethtool_link_ksettings_add_link_mode(cmd, advertising, 200000baseCR4_Full); > + ethtool_link_ksettings_add_link_mode(cmd, advertising, 200000baseKR4_Full); > + break; > + case SPEED_400000: > + ethtool_link_ksettings_add_link_mode(cmd, supported, 400000baseCR4_Full); > + ethtool_link_ksettings_add_link_mode(cmd, supported, 400000baseKR4_Full); > + > + ethtool_link_ksettings_add_link_mode(cmd, advertising, 400000baseCR4_Full); > + ethtool_link_ksettings_add_link_mode(cmd, advertising, 400000baseKR4_Full); > + break; > + default: > + netdev_err(port->ndev, "unknown speed %d\n", speed); > + return -EFAULT; > + } > + > + ethtool_link_ksettings_add_link_mode(cmd, supported, Autoneg); > + > + if (port->auto_neg_enable) { > + ethtool_link_ksettings_add_link_mode(cmd, advertising, Autoneg); > + cmd->base.autoneg = AUTONEG_ENABLE; > + if (port->auto_neg_resolved) > + ethtool_link_ksettings_add_link_mode(cmd, lp_advertising, Autoneg); That looks odd. Care to explain? > + } else { > + cmd->base.autoneg = AUTONEG_DISABLE; > + } > + > + ethtool_link_ksettings_add_link_mode(cmd, supported, Pause); > + > + if (port->pfc_enable) > + ethtool_link_ksettings_add_link_mode(cmd, advertising, Pause); And is suspect that is wrong. Everybody gets pause wrong. Please double check my previous posts about pause. > + if (auto_neg && !(hdev->auto_neg_mask & BIT(port_idx))) { > + netdev_err(port->ndev, "port autoneg is disabled by BMC\n"); > + rc = -EFAULT; > + goto out; Don't say you support autoneg in supported if that is the case. And EFAULT is about memory problems. EINVAL, maybe EPERM? or EOPNOTSUPP. Andrew