Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp2473347pxk; Mon, 14 Sep 2020 14:32:45 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzGAhO2IIxGr/oEfoUGigAE2YuKykD5OQsKujVaeek7VysWU7jpenbhdpde5VjLuV2V54xN X-Received: by 2002:a17:906:fb97:: with SMTP id lr23mr17064500ejb.257.1600119165196; Mon, 14 Sep 2020 14:32:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1600119165; cv=none; d=google.com; s=arc-20160816; b=BT+EDJtFqdWIvxxI//8HXJlNoeFNd13ofCTA0a+cF66vu6neNxKx6jQeJdBQvSsNj7 EQx3Rfu4A7ynMOxlfkG+hg3kWivJ3g5rTniXX/Y34fgybPvintjaB05s+ZCe0EWOdJxE 0XR54Gq/hDAEtmRtEmrWJ4kkHeOzeALFwLXrMGmFGoU4yfcyQea8s4mNa3o9/8whHW2o /M7wxQzawPr6FBhQjyzFx8wx/e8nsLIvDuI9alHLnQJxrrpCT4Uu8Ihe105t70Hrbmcw 1hwF4rm9mUtIycWPHAQbCyHqpcscP8bERv5/FIVedMbpgwwSMCee/3R82r/xA+XAc/Dt sIQQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :dkim-signature; bh=zgXvAOxnqXqQ6nm3VifvOWfihZgwTJ41ShQnb7/dzcE=; b=aTfcw2gFIYdNVBa6y5CJ+eP5eRXIuD2zfG6JjA9l/qcvlqbdb54RFabPQunHbclrKC hGbKbEz/RHeCfKugKEihSHWSt74YbM+ZII1zA5z5G6LoEA+jcjbLAYECrbAXQ72X6Ns1 ap/O18SbNcBuMt3DY3teSyrBBw8qopAjQXr5o6NPll868ICWFL+gX9PtqDF0Q6U+3/P1 eVlZIL2gKh9Xi+2kjm3cO6IBmWkQQMbg7M5d19nU4AItG5PpV+nDlxBavk7sruXw2CpI PQ5+RN0eh+PL6uaIibJo2lcD6prRUI8iRUfz+RILYi+QJztPEaaDcIimMLEX9HLX19Tg icvw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=1XMZ4k3P; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id n11si8027448ejg.69.2020.09.14.14.32.22; Mon, 14 Sep 2020 14:32:45 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=1XMZ4k3P; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726064AbgINVbE (ORCPT + 99 others); Mon, 14 Sep 2020 17:31:04 -0400 Received: from mail.kernel.org ([198.145.29.99]:55990 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725961AbgINVbD (ORCPT ); Mon, 14 Sep 2020 17:31:03 -0400 Received: from kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com (unknown [163.114.132.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id B8FE120715; Mon, 14 Sep 2020 21:31:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1600119063; bh=3rkZ7CdRjA/a6sBAPu8azs7PpRx/PIItk/odvnzR4Lg=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=1XMZ4k3PkXzdo83RypURrov6El8Lv/9IadYO3elfz2vWmH4qh5XzNycExEvgb08ie nyN/qb2rXSPj5DVSLTeMNqtaoca6LBhFJg9OsTpsn8VyXixZ2F6BNuRVP0iaFFmtJ6 rtzaUq2nXYOlytOcKqB4EjLTmhCnxodV91WI23YY= Date: Mon, 14 Sep 2020 14:31:00 -0700 From: Jakub Kicinski To: Jiri Pirko Cc: Vasundhara Volam , Moshe Shemesh , "David S. Miller" , Jiri Pirko , Netdev , open list , Michael Chan Subject: Re: [PATCH net-next RFC v4 01/15] devlink: Add reload action option to devlink reload command Message-ID: <20200914143100.06a4641d@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com> In-Reply-To: <20200914112829.GC2236@nanopsycho.orion> References: <1600063682-17313-1-git-send-email-moshe@mellanox.com> <1600063682-17313-2-git-send-email-moshe@mellanox.com> <20200914093234.GB2236@nanopsycho.orion> <20200914112829.GC2236@nanopsycho.orion> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 14 Sep 2020 13:28:29 +0200 Jiri Pirko wrote: > Mon, Sep 14, 2020 at 11:54:55AM CEST, vasundhara-v.volam@broadcom.com wrote: > >On Mon, Sep 14, 2020 at 3:02 PM Jiri Pirko wrote: > >> >> +mlxsw_devlink_core_bus_device_reload_up(struct devlink *devlink, enum devlink_reload_action action, > >> >> + struct netlink_ext_ack *extack, > >> >> + unsigned long *actions_performed) > >> >Sorry for repeating again, for fw_activate action on our device, all > >> >the driver entities undergo reset asynchronously once user initiates > >> >"devlink dev reload action fw_activate" and reload_up does not have > >> >much to do except reporting actions that will be/being performed. > >> > > >> >Once reset is complete, the health reporter will be notified using > >> > >> Hmm, how is the fw reset related to health reporter recovery? Recovery > >> happens after some error event. I don't believe it is wise to mix it. > >Our device has a fw_reset health reporter, which is updated on reset > >events and firmware activation is one among them. All non-fatal > >firmware reset events are reported on fw_reset health reporter. > > Hmm, interesting. In that case, assuming this is fine, should we have > some standard in this. I mean, if the driver supports reset, should it > also define the "fw_reset" reporter to report such events? > > Jakub, what is your take here? Sounds doubly wrong to me. As you say health reporters should trigger on error events, communicating completion of an action requested by the user seems very wrong. IIUC operators should monitor and collect health failures. In this case looks like all events from fw_reset would need to be discarded, since they are not meaningful without the context of what triggered them. And secondly, reporting the completion via some async mechanism that user has to monitor is just plain lazy. That's pushing out the work that has to be done out to user space. Wait for the completion in the driver. > >> Instead, why don't you block in reload_up() until the reset is complete? > > > >Though user initiate "devlink dev reload" event on a single interface, > >all driver entities undergo reset and all entities recover > >independently. I don't think we can block the reload_up() on the > >interface(that user initiated the command), until whole reset is > >complete. > > Why not? mlxsw reset takes up to like 10 seconds for example. +1, why?