Date: Mon, 19 Mar 2018 08:22:11 -0700
From: Stephen Hemminger
To: Rahul Lakkireddy
Cc: "linux-kernel@vger.kernel.org", "netdev@vger.kernel.org", "kexec@lists.infradead.org", "davem@davemloft.net", "ebiederm@xmission.com", "akpm@linux-foundation.org", "torvalds@linux-foundation.org", Ganesh GR, Nirranjan Kirubaharan, Indranil Choudhury
Subject: Re: [RFC v2 0/2] kernel: add support to collect hardware logs in crash recovery kernel
Message-ID: <20180319082211.6651b45a@xeon-e3>
In-Reply-To: <20180319075555.GA22955@chelsio.com>
References: <1521198725-13463-1-git-send-email-rahul.lakkireddy@chelsio.com> <20180319075555.GA22955@chelsio.com>

On Mon, 19 Mar 2018 13:25:56 +0530
Rahul Lakkireddy wrote:

> On Friday, March 03/16/18, 2018 at 16:42:03 +0530, Rahul Lakkireddy wrote:
> > On production servers running variety of workloads over time, kernel
> > panic can happen sporadically after days or even months.
> > It is important to collect as much debug logs as possible to root cause
> > and fix the problem, that may not be easy to reproduce. Snapshot of
> > underlying hardware/firmware state (like register dump, firmware
> > logs, adapter memory, etc.), at the time of kernel panic will be very
> > helpful while debugging the culprit device driver.
> >
> > This series of patches add new generic framework that enable device
> > drivers to collect device specific snapshot of the hardware/firmware
> > state of the underlying device in the crash recovery kernel. In crash
> > recovery kernel, the collected logs are exposed via /proc/crashdd/
> > directory, which is copied by user space scripts for post-analysis.
> >
> > A kernel module crashdd is newly added. In crash recovery kernel,
> > crashdd exposes /proc/crashdd/ directory containing device specific
> > hardware/firmware logs.
> >
> > The sequence of actions done by device drivers to append their device
> > specific hardware/firmware logs to /proc/crashdd/ directory are as
> > follows:
> >
> > 1. During probe (before hardware is initialized), device drivers
> >    register to the crashdd module (via crashdd_add_dump()), with
> >    callback function, along with buffer size and log name needed for
> >    firmware/hardware log collection.
> >
> > 2. Crashdd creates a driver's directory under /proc/crashdd/.
> >    Then, it allocates the buffer with requested size and invokes the
> >    device driver's registered callback function.
> >
> > 3. Device driver collects all hardware/firmware logs into the buffer
> >    and returns control back to crashdd.
> >
> > 4. Crashdd exposes the buffer as a file via
> >    /proc/crashdd//.
> >
> > 5. User space script (/usr/lib/kdump/kdump-lib-initramfs.sh) copies
> >    the entire /proc/crashdd/ directory to /var/crash/ directory.
> >
> > Patch 1 adds crashdd module to allow drivers to register callback to
> > collect the device specific hardware/firmware logs.
> > The module also exports /proc/crashdd/ directory containing the
> > hardware/firmware logs.
> >
> > Patch 2 shows a cxgb4 driver example using the API to collect
> > hardware/firmware logs in crash recovery kernel, before hardware is
> > initialized. The logs for the devices are made available under
> > /proc/crashdd/cxgb4/ directory.
> >
> > Suggestions and feedback will be much appreciated.
> >
> > Thanks,
> > Rahul
> >
> > RFC v1: https://www.spinics.net/lists/netdev/msg486562.html
> >
> > ---
> > v2:
> > - Added new crashdd module that exports /proc/crashdd/ containing
> >   driver's registered hardware/firmware logs in patch 1.
> > - Replaced the API to allow drivers to register their hardware/firmware
> >   log collect routine in crash recovery kernel in patch 1.
> > - Updated patch 2 to use the new API in patch 1.
> >
> > Rahul Lakkireddy (2):
> >   proc/crashdd: add API to collect hardware dump in second kernel
> >   cxgb4: collect hardware dump in second kernel
> >
> >  drivers/net/ethernet/chelsio/cxgb4/cxgb4.h       |   4 +
> >  drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c |  25 +++
> >  drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h |   3 +
> >  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  |  12 ++
> >  fs/proc/Kconfig                                  |  11 +
> >  fs/proc/Makefile                                 |   1 +
> >  fs/proc/crashdd.c                                | 263 +++++++++++++++++++++++
> >  include/linux/crashdd.h                          |  43 ++++
> >  8 files changed, 362 insertions(+)
> >  create mode 100644 fs/proc/crashdd.c
> >  create mode 100644 include/linux/crashdd.h
> >
> > --
> > 2.14.1
> >
>
> Does anyone have any comments with this approach? If there are no
> comments, then I'll re-spin this RFC to Patch series.
>
> Thanks,
> Rahul

This does look like it gives useful data, but it is not clear that
this cannot already be done with existing APIs or small extensions.
Introducing a new /proc interface, and one that is mostly device
specific, is unlikely to be warmly received by the current Linux
kernel community.
For example, getting firmware logs seems like something more related to ethtool or sysfs.
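
For readers following the thread, the registration sequence quoted above
(steps 1 to 4) can be mocked in plain userspace C. This is only a rough
sketch, not the code from patch 1: crashdd_add_dump() is the only name
taken from the cover letter, and its real signature is not shown in this
thread. The callback type, the struct, crashdd_find() and demo_collect()
are all hypothetical stand-ins, and malloc-style allocation replaces
whatever the kernel module actually does.

```c
/*
 * Userspace mock of the crashdd flow described in the quoted cover letter.
 * Only the name crashdd_add_dump() comes from the thread; its parameters,
 * the callback type and every other identifier here are assumptions.
 */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: the driver fills buf with at most size bytes. */
typedef int (*crashdd_collect_fn)(void *priv, void *buf, size_t size);

struct crashdd_dump {
    char name[32];   /* log name passed at registration */
    size_t size;     /* buffer size requested by the driver */
    void *buf;       /* filled by the driver callback */
};

#define MAX_DUMPS 8
static struct crashdd_dump dumps[MAX_DUMPS];
static int ndumps;

/* Step 1: register. Step 2: allocate and invoke. Step 4: keep for export. */
int crashdd_add_dump(const char *name, size_t size,
                     crashdd_collect_fn fn, void *priv)
{
    struct crashdd_dump *d;
    int ret;

    if (!name || !fn || size == 0 || ndumps == MAX_DUMPS)
        return -1;

    d = &dumps[ndumps];
    d->buf = calloc(1, size);
    if (!d->buf)
        return -1;

    /* Step 3 happens inside the driver's callback. */
    ret = fn(priv, d->buf, size);
    if (ret) {
        free(d->buf);
        d->buf = NULL;
        return ret;
    }

    snprintf(d->name, sizeof(d->name), "%s", name);
    d->size = size;
    ndumps++;
    return 0;
}

/* Stand-in for reading the exported file back by its log name. */
const struct crashdd_dump *crashdd_find(const char *name)
{
    for (int i = 0; i < ndumps; i++)
        if (strcmp(dumps[i].name, name) == 0)
            return &dumps[i];
    return NULL;
}

/* Hypothetical driver callback: copies a fake register dump string. */
int demo_collect(void *priv, void *buf, size_t size)
{
    const char *regs = priv;

    assert(buf != NULL);   /* the buffer is allocated before the callback */
    snprintf(buf, size, "%s", regs);
    return 0;
}
```

A driver's probe path would make a single crashdd_add_dump() call with
its log name, size and callback, and a reader would then look the buffer
up by name. The mock shows how small the registration surface is, which
is relevant to the question of whether an existing interface could carry
the same data.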