Subject: Re: [EXT] [PATCH 09/15] qed: use new module_firmware_crashed()
To: Luis Chamberlain
CC: Ariel Elior, GR-everest-linux-l2
From: Igor Russkikh
Date: Tue, 12 May 2020 19:23:28 +0300
X-Mailing-List: linux-kernel@vger.kernel.org

>> So I think its not a good place to insert this call.
>> Its hard to find exact good place to insert it in qed.
>
> Is there a way to check if what happened was indeed a fw crash?

Our driver has two firmwares (slowpath and fastpath).
For the slowpath firmware, the way to tell that it crashed is to observe a
command response timeout. This is in qed_mcp.c, around the
"The MFW failed to respond to command" trace message.

For fastpath this is tricky. I think you may leave the above place as the
only place to invoke module_firmware_crashed().

>> One more thing is that AFAIU taint flag gets permanent on kernel, but for
>> example our device can recover itself from some FW crashes, thus it'd be
>> transparent for user.
>
> Similar things are *supposed* to recoverable with other device, however
> this can also sometimes lead to a situation where devices are not usable
> anymore, and require a full driver unload / load.
>
>> Whats the logical purpose of module_firmware_crashed? Does it mean fatal
>> unrecoverable error on device?
>
> Its just to annotate on the module and kernel that this has happened.
>
> I take it you may agree that, firmware crashing *often* is not good design,
> and these issues should be reported to / fixed by vendors. In cases
> where driver bugs are reported it is good to see if a firmware crash has
> happened before, so that during analysis this is ruled out.

Probably, but I still see some misalignment here, in the sense that taint is
about the kernel state, not an indication of hardware state.

devlink health could really be a much better candidate for such things.

Regards
  Igor