Date: Tue, 1 Dec 2020 10:59:33 -0800
From: Jakub Kicinski
To: George Cherian
Cc: "netdev@vger.kernel.org", "linux-kernel@vger.kernel.org",
 "davem@davemloft.net", Sunil Kovvuri Goutham, Linu Cherian,
 "Geethasowjanya Akula", "masahiroy@kernel.org",
 "willemdebruijn.kernel@gmail.com", "saeed@kernel.org",
 "jiri@resnulli.us"
Subject: Re: [PATCHv5 net-next 2/3]
 octeontx2-af: Add devlink health reporters for NPA
Message-ID: <20201201105933.7b22d119@kicinski-fedora-pc1c0hjn.DHCP.thefacebook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 1 Dec 2020 05:23:23 +0000 George Cherian wrote:
> > > > You seem to have missed the feedback Saeed and I gave you on v2.
> > > >
> > > > Did you test this with the errors actually triggering? Devlink
> > > > should store only
> > > Yes, the same was tested using devlink health test interface by
> > > injecting errors.
> > > The dump gets generated automatically and the counters do get out of
> > > sync, in case of continuous error.
> > > That wouldn't be much of an issue as the user could manually trigger a
> > > dump clear and Re-dump the counters to get the exact status of the
> > > counters at any point of time.
> >
> > Now that recover op is added the devlink error counter and recover counter
> > will be proper. The internal counter for each event is needed just to
> > understand within a specific reporter, how many such events occurred.
> >
> > Following is the log snippet of the devlink health test being done on hw_nix
> > reporter.
> > # for i in `seq 1 33` ; do devlink health test pci/0002:01:00.0 reporter hw_nix;
> > done //Inject 33 errors (16 of NIX_AF_RVU and 17 of NIX_AF_RAS and
> > NIX_AF_GENERAL errors) # devlink health
> > pci/0002:01:00.0:
> > reporter hw_npa
> > state healthy error 0 recover 0 grace_period 0 auto_recover true
> > auto_dump true
> > reporter hw_nix
> > state healthy error 250 recover 250 last_dump_date 1970-01-01
> > last_dump_time 00:04:16 grace_period 0 auto_recover true auto_dump true
> Oops, There was a log copy paste error above its not 250 (that was from a run, in which test was done
> for 250 error injections)
> # devlink health
> pci/0002:01:00.0:
> reporter hw_npa
> state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
> reporter hw_nix
> state healthy error 33 recover 33

I thought it'd be better to just add each error as its own reporter
rather than combining them and abusing context for reporting detailed
stats. This seems to be harder to get done than I thought. Maybe just
go back to the prints and we can move on.

> last_dump_date 1970-01-01 last_dump_time 00:02:16 grace_period 0 auto_recover true auto_dump true

Why the weird date? Is this something on your system?