Message-ID: <20240306015910.766510873@goodmis.org>
User-Agent: quilt/0.67
Date: Tue, 05 Mar 2024 20:59:10 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
 Vincent Donnefort, Joel Fernandes, Daniel Bristot de Oliveira,
 Ingo Molnar, Peter Zijlstra, suleiman@google.com, Thomas Gleixner,
 Vineeth Pillai, Youssef Esmat, Beau Belgrave, Alexander Graf,
 Baoquan He, Borislav Petkov, "Paul E. McKenney", David Howells
Subject: [PATCH 0/8] tracing: Persistent traces across a reboot or crash
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

This is a way to map a ring buffer instance across reboots.
The requirement is that you have a memory region that is not erased.
I tested this on a Debian VM running on qemu on a Debian server,
and even tested it on a baremetal box running Fedora. I was
surprised that it worked on the baremetal box, but it does so
surprisingly consistently.

The idea is that you can reserve a memory region and save it in two
special variables:

  trace_buffer_start and trace_buffer_size

If these are set by fs_initcall() then a "boot_mapped" instance is
created. The memory that was reserved is used by the ring buffer of this
instance.
It acts like a memory mapped instance so it has some
limitations. It does not allow snapshots nor does it allow tracers which
use a snapshot buffer (like irqsoff and wakeup tracers).

On boot up, when setting up the ring buffer, it looks at the current
content and does a rigorous test to see if the content is valid.
It even walks the events in all the sub-buffers to make sure the
ring buffer metadata is correct. If it determines that the content
is valid, it will reconstruct the ring buffer to use the content
it has found.

If the buffer is valid, on the next boot, the boot_mapped instance
will contain the data from the previous boot. You can cat the
trace or trace_pipe file, or even run trace-cmd extract on it to
make a trace.dat file that holds the data. This is much better than
dealing with ftrace_dump_on_oops (I wish I had this a decade ago!)

There are still some limitations of this buffer. One is that it assumes
that the kernel you are booting back into is the same one that crashed.
At least the trace_events (like sched_switch and friends) all have the
same ids. This would be true with the same kernel as the ids are
determined at link time.

Module events could possibly be a problem as the ids may not match.

One idea is to just print the raw fields and not process the print
formats for this instance, as the print formats may do some crazy things
with data that does not match.

Another limitation is that any print format that has "%pS" will likely
not work. That's because the pointer in the old ring buffer is for an
address that may be different from where the function is located now.
I was thinking of adding a file in the boot_mapped instance that holds
the delta of the old mapping to the new mapping, so that trace-cmd and
perf could calculate the current kallsyms from the old pointers.

Finally, this is still a proof of concept. How to create this memory
mapping isn't decided yet.
In this patch set I simply hacked into kexec crash code and
hard coded an address that worked for one of my machines (for the other
machine I had to play around to find another address). Perhaps we could
add a kernel command line parameter that lets people decide, or an option
where it could possibly look at the ACPI (for Intel) tables to come up
with an address on its own.

Anyway, I plan on using this for debugging, as it already is pretty
featureful but there's much more that can be done.

Basically, all you need to do is:

  echo 1 > /sys/kernel/tracing/instances/boot_mapped/events/enable

Do whatever you want and the system crashes (and boots to the same
kernel). Then:

  cat /sys/kernel/tracing/instances/boot_mapped/trace

and it will have the trace.

I'm sure there are still some gotchas here, which is why this is
currently still just a POC.

Enjoy...

Steven Rostedt (Google) (8):
      ring-buffer: Allow mapped field to be set without mapping
      ring-buffer: Add ring_buffer_alloc_range()
      tracing: Create "boot_mapped" instance for memory mapped buffer
      HACK: Hard code in mapped tracing buffer address
      ring-buffer: Add ring_buffer_meta data
      ring-buffer: Add output of ring buffer meta page
      ring-buffer: Add test if range of boot buffer is valid
      ring-buffer: Validate boot range memory events

----
 arch/x86/kernel/setup.c     |  20 ++
 include/linux/ring_buffer.h |  17 +
 include/linux/trace.h       |   7 +
 kernel/trace/ring_buffer.c  | 826 ++++++++++++++++++++++++++++++++++++++------
 kernel/trace/trace.c        |  95 ++++-
 kernel/trace/trace.h        |   5 +
 6 files changed, 856 insertions(+), 114 deletions(-)
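
As a rough illustration of the "%pS" delta idea mentioned earlier (it is
not part of this patch set), a userspace tool could shift each pointer
recorded by the previous boot by the delta and then look it up in the
current symbol table. Everything in this sketch is an assumption made
for illustration: the helper names, the base addresses, the symbol
offsets, and the way the delta would be obtained.

```python
import bisect

U64_MASK = 0xFFFFFFFFFFFFFFFF


def rebase_address(old_addr, delta):
    """Shift an address recorded by the previous boot into the current
    boot's address space. delta = new mapping base - old mapping base."""
    return (old_addr + delta) & U64_MASK


def resolve_symbol(addr, symbols):
    """Look up addr in a kallsyms-like list of (start_address, name)
    tuples sorted by start address. Returns "name+0xoffset", or None if
    addr lies below the first symbol."""
    starts = [start for start, _ in symbols]
    idx = bisect.bisect_right(starts, addr) - 1
    if idx < 0:
        return None
    start, name = symbols[idx]
    return f"{name}+0x{addr - start:x}"


# Hypothetical values: the kernel text moved between the two boots.
old_text_base = 0xFFFFFFFF9A000000
new_text_base = 0xFFFFFFFF8C000000
delta = new_text_base - old_text_base

# Hypothetical symbol table of the currently running kernel.
current_symbols = [
    (new_text_base + 0x1000, "schedule"),
    (new_text_base + 0x2000, "do_idle"),
]

# A pointer as it might be stored in the old ring buffer.
old_pointer = old_text_base + 0x1010

print(resolve_symbol(rebase_address(old_pointer, delta), current_symbols))
```

The lookup only makes sense under the same assumption stated above:
the kernel booted into is the same one that crashed, so the symbol
offsets are unchanged and only the base has moved.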