From: Rahul Lakkireddy
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
    kexec@lists.infradead.org
Cc: davem@davemloft.net, ebiederm@xmission.com, akpm@linux-foundation.org,
    torvalds@linux-foundation.org, ganeshgr@chelsio.com,
    nirranjan@chelsio.com, indranil@chelsio.com, Rahul Lakkireddy
Subject: [RFC 0/2] kernel: add support to collect hardware logs in panic
Date: Fri, 2 Mar 2018 17:49:56 +0530

On production servers running a variety of workloads, a kernel panic
can happen sporadically after days or even months of uptime. It is
important to collect as many debug logs as possible to root-cause and
fix a problem that may not be easy to reproduce.
A snapshot of the underlying hardware/firmware state (register dump,
firmware logs, adapter memory, etc.) at the time of the kernel panic is
very helpful when debugging the culprit device driver.

This series of patches adds a new generic framework that enables device
drivers to collect a device-specific snapshot of the hardware/firmware
state of the underlying device at the time of kernel panic. The
collected logs are appended to vmcore along with the details, such as
the start address and length of the logs, that are required for
extraction during post-analysis.

Device drivers can use crash_driver_dump_register() to register a
callback that collects the underlying device-specific hardware/firmware
logs during kernel panic (i.e. before booting into the second kernel).
Drivers can unregister with crash_driver_dump_unregister().

To extract the device-specific hardware/firmware logs using crash:

crash> help -D | grep DRIVERDUMP
DRIVERDUMP=(cxgb4_0000:02:00.4, ffffb131090bd000, 37782968)
crash> rd ffffb131090bd000 37782968 -r hardware.log
37782968 bytes copied from 0xffffb131090bd000 to hardware.log

Patch 1 adds an API that allows drivers to register a callback to
collect the device-specific hardware/firmware logs.

Patch 2 shows a cxgb4 driver example that uses the API to collect
hardware/firmware logs during kernel panic.

Suggestions and feedback will be much appreciated.

Thanks,
Rahul

Rahul Lakkireddy (2):
  kernel/crash_core: add API to collect hardware dump in kernel panic
  cxgb4: collect hardware dump in kernel panic

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h       |  6 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c | 95 +++++++++++++++++++++++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h |  4 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  | 12 +++
 include/linux/crash_core.h                       | 33 ++++++++
 kernel/crash_core.c                              | 83 ++++++++++++++++++++-
 kernel/kexec_core.c                              |  1 +
 7 files changed, 229 insertions(+), 5 deletions(-)

-- 
2.14.1
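
The extraction steps shown in the crash session above could be scripted
along the following lines. This is only a rough sketch, not part of the
series: it assumes the DRIVERDUMP entry always has the form
(name, hex start address without a 0x prefix, decimal length in bytes),
as in the single example above. The helper names parse_driverdump() and
rd_command() are hypothetical, and the default output file name simply
mirrors the example.

```python
import re

# Pattern inferred from the one example line in this mail (an assumption):
#   DRIVERDUMP=(<name>, <hex start address>, <decimal length in bytes>)
DRIVERDUMP_RE = re.compile(
    r"DRIVERDUMP=\(\s*([^,]+?)\s*,\s*([0-9a-fA-F]+)\s*,\s*(\d+)\s*\)"
)


def parse_driverdump(text):
    """Return a list of (name, address, length) tuples found in `help -D` output."""
    entries = []
    for match in DRIVERDUMP_RE.finditer(text):
        name, addr, length = match.groups()
        # The address is printed in hex, the length in decimal.
        entries.append((name, int(addr, 16), int(length)))
    return entries


def rd_command(entry, outfile="hardware.log"):
    """Build the crash `rd ... -r <file>` command that copies one dump to a file."""
    _name, address, length = entry
    return f"rd {address:x} {length} -r {outfile}"


if __name__ == "__main__":
    sample = "DRIVERDUMP=(cxgb4_0000:02:00.4, ffffb131090bd000, 37782968)"
    for entry in parse_driverdump(sample):
        print(rd_command(entry))
```

Feeding the saved output of "help -D" to parse_driverdump() yields one
tuple per registered driver dump, and rd_command() turns each tuple into
the command to type at the crash prompt.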