Subject: Re: [PATCH v2 net-next 3/3] octeontx2-af: Add devlink health reporters for NIX
From: Saeed Mahameed
To: Sunil Kovvuri
Cc: George Cherian, "netdev@vger.kernel.org", "linux-kernel@vger.kernel.org", Jiri Pirko, "kuba@kernel.org", "davem@davemloft.net", Sunil Kovvuri Goutham, Linu Cherian, Geethasowjanya Akula, "masahiroy@kernel.org", "willemdebruijn.kernel@gmail.com"
Date: Fri, 06 Nov 2020 12:58:25 -0800
References: <1dd085b9f7013e9a28057f3080ee7b920bfbc9fc.camel@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2020-11-06 at 00:59 +0530, Sunil Kovvuri wrote:
> > > > > Output:
> > > > > # ./devlink health
> > > > > pci/0002:01:00.0:
> > > > >   reporter npa
> > > > >     state healthy error 0 recover 0
> > > > >   reporter nix
> > > > >     state healthy error 0 recover 0
> > > > > # ./devlink health dump show pci/0002:01:00.0 reporter nix
> > > > > NIX_AF_GENERAL:
> > > > >   Memory Fault on NIX_AQ_INST_S read: 0
> > > > >   Memory Fault on NIX_AQ_RES_S write: 0
> > > > >   AQ Doorbell error: 0
> > > > >   Rx on unmapped PF_FUNC: 0
> > > > >   Rx multicast replication error: 0
> > > > >   Memory fault on NIX_RX_MCE_S read: 0
> > > > >   Memory fault on multicast WQE read: 0
> > > > >   Memory fault on mirror WQE read: 0
> > > > >   Memory fault on mirror pkt write: 0
> > > > >   Memory fault on multicast pkt write: 0
> > > > > NIX_AF_RAS:
> > > > >   Poisoned data on NIX_AQ_INST_S read: 0
> > > > >   Poisoned data on NIX_AQ_RES_S write: 0
> > > > >   Poisoned data on HW context read: 0
> > > > >   Poisoned data on packet read from mirror buffer: 0
> > > > >   Poisoned data on packet read from mcast buffer: 0
> > > > >   Poisoned data on WQE read from mirror buffer: 0
> > > > >   Poisoned data on WQE read from multicast buffer: 0
> > > > >   Poisoned data on NIX_RX_MCE_S read: 0
> > > > > NIX_AF_RVU:
> > > > >   Unmap Slot Error: 0
> > > > >
> > > >
> > > > Now I am a little bit skeptical here; the devlink health reporter
> > > > infrastructure was never meant to deal with a dump op only. The
> > > > main purpose is to diagnose/dump and recover.
> > > >
> > > > Especially in your use case, where you only report counters, I
> > > > don't believe devlink health dump is a proper interface for this.
> > > These are not counters. These are error interrupts raised by HW
> > > blocks. The count is provided to understand how frequently the
> > > errors are seen. Error recovery for some of the blocks happens
> > > internally. That is the reason currently only the dump op is added.
> >
> > So you are counting these events in the driver; that sounds like a
> > counter to me. I really think this shouldn't belong to devlink,
> > unless you really utilize devlink health ops for actual reporting
> > and recovery.
> >
> > What's wrong with just dumping these counters to ethtool?
>
> This driver is an administrative driver which handles all the
> resources in the system and doesn't do any IO.
> NIX and NPA are key co-processor blocks which this driver handles.
> With NIX and NPA, there are pieces which get attached to a PCI device
> to make it a networking device. We have netdev drivers registered to
> this networking device. Some more information about the drivers is
> available at
> https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/marvell/octeontx2.html
>
> So we don't have a netdev here to report these co-processor block
> level errors over ethtool.
>

But the AF driver can't operate your HW standalone; it must have a
PF/VF with a netdev interface to do IO. So even if your model is
modular, a common user of this driver will always see a netdev.
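
To illustrate the point about counters, here is a minimal sketch (not part of the patch, and not any devlink API) showing that the dump quoted earlier in this thread reduces to plain name/value pairs grouped by block. The function names parse_dump and total_events are hypothetical, the sample text is abridged from the quoted output, and the flat "SECTION:" / "name: value" layout is an assumption based on how the dump appears in this mail.

```python
# Abridged copy of the dump text quoted earlier in this thread.
SAMPLE_DUMP = """\
NIX_AF_GENERAL:
Memory Fault on NIX_AQ_INST_S read: 0
AQ Doorbell error: 0
NIX_AF_RAS:
Poisoned data on HW context read: 0
NIX_AF_RVU:
Unmap Slot Error: 0
"""


def parse_dump(text):
    """Group 'name: value' lines under the most recent 'SECTION:' header."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the last colon so names containing colons stay intact.
        name, sep, value = line.rpartition(":")
        if not sep:
            continue  # neither a section header nor a counter line
        value = value.strip()
        if value == "":
            # A line ending in ':' opens a new section, e.g. "NIX_AF_RAS:".
            current = sections.setdefault(name.strip(), {})
        elif current is not None and value.isdigit():
            current[name.strip()] = int(value)
    return sections


def total_events(sections):
    """Sum every counter across all sections."""
    return sum(sum(counters.values()) for counters in sections.values())


if __name__ == "__main__":
    parsed = parse_dump(SAMPLE_DUMP)
    for section, counters in parsed.items():
        print(section, counters)
    print("total events:", total_events(parsed))
```

Nothing in the parsed result carries state beyond an integer per event name, which is the shape of data that ethtool-style statistics already expose.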