Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp2527164imu; Thu, 17 Jan 2019 16:16:31 -0800 (PST) X-Google-Smtp-Source: ALg8bN7Qptlm0nShqYZAmGib1ytFELRt52vtd8a+JS7ZASFYhNuvlVbIxNehtptg4OyqFA2oW6cJ X-Received: by 2002:a63:26c1:: with SMTP id m184mr14519917pgm.367.1547770591553; Thu, 17 Jan 2019 16:16:31 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1547770591; cv=none; d=google.com; s=arc-20160816; b=cg40g4nhluncpKwr5R9n2CfvCvDbVX+MtbvF+b4sQ/pw7pZTy5qvhDStme+A8PlX6N Hk9gkCQZGljiVxqG5W9mvESqWEqJ+rrGWtbSjgCJNHJeWIx19AmqVdQmFx9KGyKYFtkj FJJ92vlRfQOIBHKEYee2P2Gbtl3GFiuNiQloMdYhVUO3pzYqUplR/O7rLo1gmxcN1+m4 qGhmxN/+SzXWb+Jw6xzm11NekQItrsZvbThX34dzEPiCAQHz6f4l67LveRDr2gmXj/TU BhejOFtVipmKwFHIy3IN2G8L129Dqri3IcIonm87UlMF8FTDrf1mQuazILTV+Ak1aWTl tc9g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=XmKxm4jtlJkr4axIe+NWOqqfiIg6Vi9T4D2Yd79Y6Qg=; b=JZtyp4gTOEEIbu/+/F0RIsYLu/wgGD3Mh/P75IHctfqbXDToTUT1EdgFeH2RfSMi1Z 2JD59siQut89/EzNsLeoPtloGW9sxfNMTnqdHoPzkHhwLsHxjNgR71H28Jj8kcgZn7ld Y1qhms/TQgIDuhwAS4sxCiuhgN7uFb1piVc08SD1J08nW18G3iRM27jlyxoGt55naeZD F4hgrvRyBd7IkxtI2fDt6Puy03XyKmtbu5ydGIdTJ6kKkYeku4W93qLo/Y4/lGQ6YxPb qQFAiMBPk6UVG6HHvam5t1LtCcVs+djRYGSLQT6WwzyqDYUVSI8IQ6H6yBqFLLgGa9JX mvKQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b="djx2/fBB"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id v2si2813661plz.53.2019.01.17.16.16.14; Thu, 17 Jan 2019 16:16:31 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b="djx2/fBB"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726158AbfARAMw (ORCPT + 99 others); Thu, 17 Jan 2019 19:12:52 -0500 Received: from mail-vs1-f67.google.com ([209.85.217.67]:40900 "EHLO mail-vs1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725892AbfARAMw (ORCPT ); Thu, 17 Jan 2019 19:12:52 -0500 Received: by mail-vs1-f67.google.com with SMTP id z3so7397715vsf.7 for ; Thu, 17 Jan 2019 16:12:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=XmKxm4jtlJkr4axIe+NWOqqfiIg6Vi9T4D2Yd79Y6Qg=; b=djx2/fBBWVLs6M+HRv5aaSToM23msIngxki/rZsmx6V5SEFzDXdRtVZ/F/YNTzulvQ 7dKEhZ2fqonZ3qRm+3fxJHkB/hTFRwKkaAdf9ZA9qBiSBWiOTRYBLYonEOcTLwbl/9RU SjFCu2YwuHsML1zX932mXtu8PbrjJbi+wOYyI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=XmKxm4jtlJkr4axIe+NWOqqfiIg6Vi9T4D2Yd79Y6Qg=; b=d5yW2TaBpUIkRIxwWWSJEcrzqTd1+OzzvD/0RkgIJahjxZYXYPFz8Djw0+GzM/c4jz zqWThBowV3OgCyI8IoxBI+VXHCrZqTJyk39ZbFX99W/09s3ySr0xmaeHtVCkkPqXpgDD 2KBaxXhrFKIDEkQfxn2SNpWXhmAH8ygSAnDjPZHI++eu14dVlRufWqOCZfQkqlbcHbXb OtTHlP58xXXpo5jnXSx9Vcr/C3zCVz6Ge9b3EFQ+zBg9vPOu1E6R8gXAcvfBKIGP+POe upPLhfTWMYG0iFnmtiBffVxwn29wfhWSejYmHpcGtvyi0Poic+0zgaKSJBHRVpe0kc7O epyg== X-Gm-Message-State: AJcUukdOGe40px+weDhBM5eKzaZvHC1+dRMH6belOHzdFP0G2oWwXMlH e1RMSSHtP/OxnQ3yd0QfYhFit1aFrbc= X-Received: by 2002:a67:fa98:: with SMTP id f24mr7512427vsq.125.1547770369123; Thu, 17 Jan 2019 16:12:49 -0800 (PST) Received: from mail-vs1-f53.google.com (mail-vs1-f53.google.com. [209.85.217.53]) by smtp.gmail.com with ESMTPSA id j95sm5524268uad.6.2019.01.17.16.12.47 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 17 Jan 2019 16:12:47 -0800 (PST) Received: by mail-vs1-f53.google.com with SMTP id x28so7357470vsh.12 for ; Thu, 17 Jan 2019 16:12:47 -0800 (PST) X-Received: by 2002:a67:e15e:: with SMTP id o30mr7118947vsl.66.1547770366630; Thu, 17 Jan 2019 16:12:46 -0800 (PST) MIME-Version: 1.0 References: <1546862462-19640-1-git-send-email-liaoweixiong@allwinnertech.com> <1546862462-19640-2-git-send-email-liaoweixiong@allwinnertech.com> In-Reply-To: <1546862462-19640-2-git-send-email-liaoweixiong@allwinnertech.com> From: Kees Cook Date: Thu, 17 Jan 2019 16:12:32 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [RFC v5 1/4] pstore/blk: new support logger for block devices To: liaoweixiong Cc: Anton Vorontsov , Colin Cross , Tony Luck , Jonathan Corbet , Mauro Carvalho Chehab , Greg Kroah-Hartman , "David S. Miller" , Andrew Morton , Nicolas Ferre , Arnd Bergmann , "open list:DOCUMENTATION" , LKML Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jan 7, 2019 at 4:01 AM liaoweixiong wrote: > > pstore_blk is similar to pstore_ram, but dump log to block devices > rather than persistent ram. > > Why should we need pstore_blk? > 1. Most embedded intelligent equipment have no persistent ram, which > increases costs. We perfer to cheaper solutions, like block devices. > In fast, there is already a sample for block device logger in driver typo: fast -> fact > MTD (drivers/mtd/mtdoops.c). Would mtdoops get dropped in favor of pstore/blk, or do they not share features? > 2. Do not any equipment have battery, which means that it lost all data > on general ram if power failure. Pstore has little to do for these > equipments. > > pstore_blk can only dump Oops/Panic log to block devices. It only > supports dmesg now. To make pstore_blk work, the block driver should > provide the path of a block partition device (/dev/XXXX) and the > read/write apis when on panic. > > pstore_blk begins at 'blkz_register', by witch block device can register > a block partition to pstore_blk. Then pstore_blk divide and manage the > partition as zones, which is similar to pstore_ram. > > Recommend that, block driver register pstore_blk after block device is > ready. > > pstore_blk works well on allwinner(sunxi) platform. > > Signed-off-by: liaoweixiong > --- > fs/pstore/Kconfig | 7 + > fs/pstore/Makefile | 3 + > fs/pstore/blkzone.c | 958 +++++++++++++++++++++++++++++++++++++++++++++ > include/linux/pstore_blk.h | 61 +++ > 4 files changed, 1029 insertions(+) > create mode 100644 fs/pstore/blkzone.c > create mode 100644 include/linux/pstore_blk.h > > diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig > index 8b3ba27..a881c53 100644 > --- a/fs/pstore/Kconfig > +++ b/fs/pstore/Kconfig > @@ -152,3 +152,10 @@ config PSTORE_RAM > "ramoops.ko". > > For more information, see Documentation/admin-guide/ramoops.rst. > + > +config PSTORE_BLK > + tristate "Log panic/oops to a block device" > + depends on PSTORE > + help > + This enables panic and oops message to be logged to a block dev > + where it can be read back at some later point. > diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile > index 967b589..c431700 100644 > --- a/fs/pstore/Makefile > +++ b/fs/pstore/Makefile > @@ -12,3 +12,6 @@ pstore-$(CONFIG_PSTORE_PMSG) += pmsg.o > > ramoops-objs += ram.o ram_core.o > obj-$(CONFIG_PSTORE_RAM) += ramoops.o > + > +obj-$(CONFIG_PSTORE_BLK) += pstore_blk.o > +pstore_blk-y += blkzone.o > diff --git a/fs/pstore/blkzone.c b/fs/pstore/blkzone.c > new file mode 100644 > index 0000000..e1b7f26 > --- /dev/null > +++ b/fs/pstore/blkzone.c > @@ -0,0 +1,958 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * > + * blkzone.c: Block device Oops/Panic logger > + * > + * Copyright (C) 2019 liaoweixiong > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + */ > + > +#define pr_fmt(fmt) "blkzone: " fmt To follow with "ramoops", maybe this should be "blkoops"? I don't have a strong opinion, but "zone" is only used internally in ram.c for the memory segments (zones), so it's not clear to me if "blkzone" is meaningful to a given pstore end user. > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +/** > + * struct blkz_head - head of zone to flush to storage > + * > + * @sig: signature to indicate header (BLK_SIG xor BLKZONE-type value) > + * @datalen: length of data in @data > + * @data: zone data. > + */ > +struct blkz_buffer { > +#define BLK_SIG (0x43474244) /* DBGC */ > + uint32_t sig; > + atomic_t datalen; > + uint8_t data[0]; nit: I prefer this to be [] instead of [0], but technically is doesn't matter. :) > +}; > + > +/** > + * sruct blkz_dmesg_header: dmesg information typo: sruct -> struct > + * > + * @magic: magic num for dmesg header > + * @time: trigger time > + * @compressed: whether conpressed > + * @count: oops/panic counter > + * @reason: identify oops or panic > + */ > +struct blkz_dmesg_header { > +#define DMESG_HEADER_MAGIC 0x4dfc3ae5 > + uint32_t magic; > + struct timespec64 time; > + bool compressed; > + uint32_t counter; > + enum kmsg_dump_reason reason; > + uint8_t data[0]; Same. > +}; > + > +/** > + * struct blkz_zone - zone information > + * @off: > + * zone offset of partition > + * @type: > + * frontent type for this zone > + * @name: > + * frontent name for this zone > + * @buffer: > + * pointer to data buffer managed by this zone > + * @buffer_size: > + * bytes in @buffer->data > + * @should_recover: > + * should recover from storage > + * @dirty: > + * mark whether the data in @buffer are dirty (not flush to storage yet) > + */ > +struct blkz_zone { > + unsigned long off; > + const char *name; > + enum pstore_type_id type; > + > + struct blkz_buffer *buffer; > + size_t buffer_size; > + bool should_recover; > + atomic_t dirty; > +}; > + > +struct blkoops_context { > + struct blkz_zone **dbzs; /* dmesg block zones */ > + unsigned int dmesg_max_cnt; > + unsigned int dmesg_read_cnt; > + unsigned int dmesg_write_cnt; > + /** > + * the counter should be recovered when do recovery > + * It records the oops/panic times after burning rather than booting. > + */ > + unsigned int oops_counter; > + unsigned int panic_counter; > + atomic_t blkdev_up; > + atomic_t recovery; > + atomic_t on_panic; > + > + /* > + * bzinfo_lock just protects "bzinfo" during calls to > + * blkz_register/blkz_unregister > + */ > + spinlock_t bzinfo_lock; > + struct blkz_info *bzinfo; > + struct pstore_info pstore; > +}; > +static struct blkoops_context blkz_cxt; > + > +enum blkz_flush_mode { > + FLUSH_NONE = 0, > + FLUSH_PART, > + FLUSH_META, > + FLUSH_ALL, > +}; > + > +static inline int buffer_datalen(struct blkz_zone *zone) > +{ > + return atomic_read(&zone->buffer->datalen); > +} > + > +static inline bool is_on_panic(void) > +{ > + struct blkoops_context *cxt = &blkz_cxt; > + > + return atomic_read(&cxt->on_panic); > +} > + > +static inline bool is_blkdev_up(void) > +{ > + struct blkoops_context *cxt = &blkz_cxt; > + const char *devpath = cxt->bzinfo->part_path; > + > + if (atomic_read(&cxt->blkdev_up)) > + return true; > + if (is_on_panic()) > + goto set_up; > + if (devpath && !name_to_dev_t(devpath)) > + return false; > +set_up: > + atomic_set(&cxt->blkdev_up, 1); > + return true; > +} > + > +static int blkz_zone_read(struct blkz_zone *zone, char *buf, > + size_t len, unsigned long off) > +{ > + if (!buf || !zone->buffer) > + return -EINVAL; > + len = min((size_t)len, (size_t)(zone->buffer_size - off)); I'd rather this had an explicit overflow test before the min(): make sure "off" is not > buffer_size. (And probably you want min_t(size_t, ...) instead of the double-cast. > + memcpy(buf, zone->buffer->data + off, len); > + return 0; > +} > + > +static int blkz_zone_write(struct blkz_zone *zone, > + enum blkz_flush_mode flush_mode, const char *buf, > + size_t len, unsigned long off) > +{ > + struct blkz_info *info = blkz_cxt.bzinfo; > + ssize_t wcnt; > + ssize_t (*writeop)(const char *buf, size_t bytes, loff_t pos); > + size_t wlen; > + > + wlen = min((size_t)len, (size_t)(zone->buffer_size - off)); Same overflow test here. > + if (flush_mode != FLUSH_META && flush_mode != FLUSH_NONE) { > + if (buf && zone->buffer) > + memcpy(zone->buffer->data + off, buf, wlen); > + atomic_set(&zone->buffer->datalen, wlen + off); > + } > + > + if (!is_blkdev_up()) > + goto set_dirty; > + > + writeop = is_on_panic() ? info->panic_write : info->write; > + if (!writeop) > + return -EINVAL; > + > + switch (flush_mode) { > + case FLUSH_NONE: > + return 0; > + case FLUSH_PART: > + wcnt = writeop((const char *)zone->buffer->data + off, wlen, > + zone->off + sizeof(*zone->buffer) + off); > + if (wcnt != wlen) > + goto set_dirty; > + case FLUSH_META: > + wlen = sizeof(struct blkz_buffer); > + wcnt = writeop((const char *)zone->buffer, wlen, zone->off); > + if (wcnt != wlen) > + goto set_dirty; > + break; > + case FLUSH_ALL: > + wlen = buffer_datalen(zone) + sizeof(*zone->buffer); > + wcnt = writeop((const char *)zone->buffer, wlen, zone->off); > + if (wcnt != wlen) > + goto set_dirty; > + break; > + } > + > + return 0; > +set_dirty: > + atomic_set(&zone->dirty, true); > + return -EBUSY; > +} > + > +/* > + * blkz_move_zone: move data from a old zone to a new zone > + * > + * @old: the old zone > + * @new: the new zone > + * > + * NOTE: > + * Call blkz_zone_write to copy and flush data. If it failed, we > + * should reset new->dirty, because the new zone not realy dirty. typo: realy -> really > + */ > +static int blkz_move_zone(struct blkz_zone *old, struct blkz_zone *new) > +{ > + const char *data = (const char *)old->buffer->data; > + int ret; > + > + ret = blkz_zone_write(new, FLUSH_ALL, data, buffer_datalen(old), 0); > + if (ret) { > + atomic_set(&new->buffer->datalen, 0); > + atomic_set(&new->dirty, false); > + return ret; > + } > + atomic_set(&old->buffer->datalen, 0); > + return 0; > +} > + > +static int blkz_recover_dmesg_data(struct blkoops_context *cxt) > +{ > + struct blkz_info *info = cxt->bzinfo; > + struct blkz_zone *zone = NULL; > + struct blkz_buffer *buf; > + unsigned long i; > + ssize_t (*readop)(char *buf, size_t bytes, loff_t pos); > + ssize_t rcnt; > + > + readop = is_on_panic() ? info->panic_read : info->read; > + if (!readop) > + return -EINVAL; > + > + for (i = 0; i < cxt->dmesg_max_cnt; i++) { > + zone = cxt->dbzs[i]; > + if (unlikely(!zone)) > + return -EINVAL; > + if (atomic_read(&zone->dirty)) { > + unsigned int wcnt = cxt->dmesg_write_cnt; > + struct blkz_zone *new = cxt->dbzs[wcnt]; > + int ret; > + > + ret = blkz_move_zone(zone, new); > + if (ret) { > + pr_err("move zone from %lu to %d failed\n", > + i, wcnt); > + return ret; > + } > + cxt->dmesg_write_cnt = (wcnt + 1) % cxt->dmesg_max_cnt; > + } > + if (!zone->should_recover) > + continue; > + buf = zone->buffer; > + rcnt = readop((char *)buf, zone->buffer_size + sizeof(*buf), > + zone->off); > + if (rcnt != zone->buffer_size + sizeof(*buf)) > + return (int)rcnt < 0 ? (int)rcnt : -EIO; > + } > + return 0; > +} > + > +/* typo: for kern-doc the opening comment should be /** > + * blkz_recover_dmesg_meta: recover meta datas of dmesg typo: meta datas -> metadata ("data" is already plural) > + * > + * Recover datas as follow: Should this be "metadata" ? > + * @cxt->dmesg_write_cnt > + * @cxt->oops_counter > + * @cxt->panic_counter > + */ > +static int blkz_recover_dmesg_meta(struct blkoops_context *cxt) > +{ > + struct blkz_info *info = cxt->bzinfo; > + struct blkz_zone *zone; > + size_t rcnt, len; > + struct blkz_buffer *buf; > + struct blkz_dmesg_header *hdr; > + ssize_t (*readop)(char *buf, size_t bytes, loff_t pos); > + struct timespec64 time = {0}; > + unsigned long i; > + /** typo: double-* not needed here > + * Recover may on panic, we can't allocate any memory by kmalloc. > + * So, we use local array instead. > + */ > + char buffer_header[sizeof(*buf) + sizeof(*hdr)] = {0}; > + > + readop = is_on_panic() ? info->panic_read : info->read; > + if (!readop) > + return -EINVAL; > + > + len = sizeof(*buf) + sizeof(*hdr); > + buf = (struct blkz_buffer *)buffer_header; > + for (i = 0; i < cxt->dmesg_max_cnt; i++) { > + zone = cxt->dbzs[i]; > + if (unlikely(!zone)) > + return -EINVAL; > + > + rcnt = readop((char *)buf, len, zone->off); > + if (rcnt != len) > + return (int)rcnt < 0 ? (int)rcnt : -EIO; > + > + /* > + * If sig NOT match, it means this zone never used before, > + * because we write one by one, and we never modify sig even > + * when erase. So, we do not need to check next one. > + */ > + if (buf->sig != zone->buffer->sig) { > + cxt->dmesg_write_cnt = i; > + pr_debug("no valid data in dmesg zone %lu\n", i); > + break; > + } > + > + if (zone->buffer_size < atomic_read(&buf->datalen)) { > + pr_info("found overtop zone: %s: id %lu, off %lu, size %zu\n", > + zone->name, i, zone->off, > + zone->buffer_size); > + continue; > + } > + > + hdr = (struct blkz_dmesg_header *)buf->data; > + if (hdr->magic != DMESG_HEADER_MAGIC) { > + pr_info("found invalid zone: %s: id %lu, off %lu, size %zu\n", > + zone->name, i, zone->off, > + zone->buffer_size); > + continue; > + } > + > + /* > + * we get the newest zone, and the next one must be the oldest > + * or unused zone, because we do write one by one like a circle. > + */ > + if (hdr->time.tv_sec >= time.tv_sec) { > + time.tv_sec = hdr->time.tv_sec; > + cxt->dmesg_write_cnt = (i + 1) % cxt->dmesg_max_cnt; > + } > + > + if (hdr->reason == KMSG_DUMP_OOPS) > + cxt->oops_counter = > + max(cxt->oops_counter, hdr->counter); > + else > + cxt->panic_counter = > + max(cxt->panic_counter, hdr->counter); > + > + if (!atomic_read(&buf->datalen)) { > + pr_debug("found erased zone: %s: id %ld, off %lu, size %zu, datalen %d\n", > + zone->name, i, zone->off, > + zone->buffer_size, > + atomic_read(&buf->datalen)); > + continue; > + } > + > + zone->should_recover = true; > + pr_debug("found nice zone: %s: id %ld, off %lu, size %zu, datalen %d\n", > + zone->name, i, zone->off, > + zone->buffer_size, atomic_read(&buf->datalen)); > + } > + > + return 0; > +} > + > +static int blkz_recover_dmesg(struct blkoops_context *cxt) > +{ > + int ret; > + > + if (!cxt->dbzs) > + return 0; > + > + ret = blkz_recover_dmesg_meta(cxt); > + if (ret) > + goto recover_fail; > + > + ret = blkz_recover_dmesg_data(cxt); > + if (ret) > + goto recover_fail; > + > + return 0; > +recover_fail: > + pr_debug("recovery dmesg failed\n"); > + return ret; > +} > + > +static inline int blkz_recovery(struct blkoops_context *cxt) > +{ > + int ret = -EBUSY; > + > + if (atomic_read(&cxt->recovery)) > + return 0; > + > + if (!is_blkdev_up()) > + goto recover_fail; > + > + ret = blkz_recover_dmesg(cxt); > + if (ret) > + goto recover_fail; > + > + atomic_set(&cxt->recovery, 1); > + pr_debug("recover end!\n"); > + return 0; > + > +recover_fail: > + pr_debug("recovery failed, handle buffer\n"); > + return ret; > +} > + > +static int blkoops_pstore_open(struct pstore_info *psi) > +{ > + struct blkoops_context *cxt = psi->data; > + > + cxt->dmesg_read_cnt = 0; > + return 0; > +} > + > +static inline bool blkz_ok(struct blkz_zone *zone) > +{ > + if (!zone || !zone->buffer || !buffer_datalen(zone)) > + return false; > + return true; > +} > + > +static int blkoops_pstore_erase(struct pstore_record *record) > +{ > + struct blkoops_context *cxt = record->psi->data; > + struct blkz_zone *zone = NULL; > + > + /* > + * before read/erase, we must recover from storage. > + * if recover failed, handle buffer > + */ > + blkz_recovery(cxt); > + > + if (record->type == PSTORE_TYPE_DMESG) > + zone = cxt->dbzs[record->id]; > + if (!blkz_ok(zone)) > + return 0; > + > + atomic_set(&zone->buffer->datalen, 0); > + return blkz_zone_write(zone, FLUSH_META, NULL, 0, 0); > +} > + > +static void blkoops_write_kmsg_hdr(struct blkz_zone *zone, > + struct pstore_record *record) > +{ > + struct blkoops_context *cxt = record->psi->data; > + struct blkz_buffer *buffer = zone->buffer; > + struct blkz_dmesg_header *hdr = > + (struct blkz_dmesg_header *)buffer->data; > + > + hdr->magic = DMESG_HEADER_MAGIC; > + hdr->compressed = record->compressed; > + hdr->time.tv_sec = record->time.tv_sec; > + hdr->time.tv_nsec = record->time.tv_nsec; > + hdr->reason = record->reason; > + if (hdr->reason == KMSG_DUMP_OOPS) > + hdr->counter = ++cxt->oops_counter; > + else > + hdr->counter = ++cxt->panic_counter; > +} > + > +static int notrace blkz_dmesg_write(struct blkoops_context *cxt, > + struct pstore_record *record) > +{ > + struct blkz_info *info = cxt->bzinfo; > + struct blkz_zone *zone; > + size_t size, hlen; > + > + /* > + * Out of the various dmesg dump types, ramoops is currently designed Heh, copy/paste-o from ramoops. :) ramoops -> blkoops ? > + * to only store crash logs, rather than storing general kernel logs. > + */ > + if (record->reason != KMSG_DUMP_OOPS && > + record->reason != KMSG_DUMP_PANIC) > + return -EINVAL; > + > + /* Skip Oopes when configured to do so. */ > + if (record->reason == KMSG_DUMP_OOPS && !info->dump_oops) > + return -EINVAL; > + > + /* > + * Explicitly only take the first part of any new crash. > + * If our buffer is larger than kmsg_bytes, this can never happen, > + * and if our buffer is smaller than kmsg_bytes, we don't want the > + * report split across multiple records. > + */ > + if (record->part != 1) > + return -ENOSPC; > + > + if (!cxt->dbzs) > + return -ENOSPC; > + > + zone = cxt->dbzs[cxt->dmesg_write_cnt]; > + if (!zone) > + return -ENOSPC; > + > + blkoops_write_kmsg_hdr(zone, record); > + hlen = sizeof(struct blkz_dmesg_header); > + size = record->size; > + if (size + hlen > zone->buffer_size) > + size = zone->buffer_size - hlen; > + blkz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen); > + > + pr_debug("write %s to zone id %d\n", zone->name, cxt->dmesg_write_cnt); > + cxt->dmesg_write_cnt = (cxt->dmesg_write_cnt + 1) % cxt->dmesg_max_cnt; > + return 0; > +} > + > +static int notrace blkoops_pstore_write(struct pstore_record *record) > +{ > + struct blkoops_context *cxt = record->psi->data; > + > + if (record->type == PSTORE_TYPE_DMESG && > + record->reason == KMSG_DUMP_PANIC) > + atomic_set(&cxt->on_panic, 1); > + > + /* > + * before read/erase, we must recover from storage. > + * if recover failed, handle buffer > + */ > + blkz_recovery(cxt); I don't understand this part: you're doing reads before the write? Can you clarify this a bit? (Or maybe, describe what "recovery" is? I thought it was just reading the blk records?) > + > + switch (record->type) { > + case PSTORE_TYPE_DMESG: > + return blkz_dmesg_write(cxt, record); > + default: > + return -EINVAL; > + } > +} > + > +#define READ_NEXT_ZONE ((ssize_t)(-1024)) > +static struct blkz_zone *blkz_read_next_zone(struct blkoops_context *cxt) > +{ > + struct blkz_zone *zone = NULL; > + > + while (cxt->dmesg_read_cnt < cxt->dmesg_max_cnt) { > + zone = cxt->dbzs[cxt->dmesg_read_cnt++]; > + if (blkz_ok(zone)) > + return zone; > + } > + > + return NULL; > +} > + > +static int blkoops_read_dmesg_hdr(struct blkz_zone *zone, > + struct pstore_record *record) > +{ > + struct blkz_buffer *buffer = zone->buffer; > + struct blkz_dmesg_header *hdr = > + (struct blkz_dmesg_header *)buffer->data; > + > + if (hdr->magic != DMESG_HEADER_MAGIC) > + return -EINVAL; > + record->compressed = hdr->compressed; > + record->time.tv_sec = hdr->time.tv_sec; > + record->time.tv_nsec = hdr->time.tv_nsec; > + record->reason = hdr->reason; > + record->count = hdr->counter; > + return 0; > +} > + > +static ssize_t blkz_dmesg_read(struct blkz_zone *zone, > + struct pstore_record *record) > +{ > + size_t size, hlen = 0; > + > + size = buffer_datalen(zone); > + /* Clear and skip this DMESG record if it has no valid header */ > + if (blkoops_read_dmesg_hdr(zone, record)) { > + atomic_set(&zone->buffer->datalen, 0); > + atomic_set(&zone->dirty, 0); > + return READ_NEXT_ZONE; > + } > + size -= sizeof(struct blkz_dmesg_header); > + > + if (!record->compressed) { > + char *buf = kasprintf(GFP_KERNEL, > + "blkoops: %s: Total %d times\n", > + record->reason == KMSG_DUMP_OOPS ? "Oops" : > + "Panic", record->count); > + hlen = strlen(buf); > + record->buf = krealloc(buf, hlen + size, GFP_KERNEL); > + if (!record->buf) { > + kfree(buf); > + return -ENOMEM; > + } > + } else { > + record->buf = kmalloc(size, GFP_KERNEL); > + if (!record->buf) > + return -ENOMEM; > + } > + > + if (unlikely(blkz_zone_read(zone, record->buf + hlen, size, > + sizeof(struct blkz_dmesg_header)) < 0)) { > + kfree(record->buf); > + return READ_NEXT_ZONE; > + } > + > + return size + hlen; > +} > + > +static ssize_t blkoops_pstore_read(struct pstore_record *record) > +{ > + struct blkoops_context *cxt = record->psi->data; > + ssize_t (*blkz_read)(struct blkz_zone *zone, > + struct pstore_record *record); > + struct blkz_zone *zone; > + ssize_t ret; > + > + /* > + * before read/erase, we must recover from storage. > + * if recover failed, handle buffer > + */ > + blkz_recovery(cxt); > + > +next_zone: > + zone = blkz_read_next_zone(cxt); > + if (!zone) > + return 0; > + > + record->id = 0; > + record->type = zone->type; > + > + record->time.tv_sec = 0; > + record->time.tv_nsec = 0; > + record->compressed = false; None of the zeroing is needed: platform.c already zero-initializes, populates record->psi and record->time. > + > + switch (record->type) { > + case PSTORE_TYPE_DMESG: > + blkz_read = blkz_dmesg_read; > + record->id = cxt->dmesg_read_cnt - 1; > + break; > + default: > + goto next_zone; > + } > + > + ret = blkz_read(zone, record); > + if (ret == READ_NEXT_ZONE) > + goto next_zone; > + return ret; > +} > + > +static struct blkoops_context blkz_cxt = { > + .bzinfo_lock = __SPIN_LOCK_UNLOCKED(blkz_cxt.bzinfo_lock), > + .blkdev_up = ATOMIC_INIT(0), > + .recovery = ATOMIC_INIT(0), > + .on_panic = ATOMIC_INIT(0), > + .pstore = { > + .owner = THIS_MODULE, > + .name = "blkoops", > + .open = blkoops_pstore_open, > + .read = blkoops_pstore_read, > + .write = blkoops_pstore_write, > + .erase = blkoops_pstore_erase, > + }, > +}; > + > +static ssize_t blkz_sample_read(char *buf, size_t bytes, loff_t pos) > +{ > + struct blkoops_context *cxt = &blkz_cxt; > + const char *devpath = cxt->bzinfo->part_path; > + struct file *filp; > + ssize_t ret; > + > + if (!devpath) > + return -EINVAL; > + > + if (!is_blkdev_up()) > + return -EBUSY; > + > + filp = filp_open(devpath, O_RDONLY, 0); > + if (IS_ERR(filp)) { > + pr_debug("open %s failed, maybe unready\n", devpath); > + return -EACCES; > + } > + ret = kernel_read(filp, buf, bytes, &pos); > + filp_close(filp, NULL); > + > + return ret; > +} > + > +static ssize_t blkz_sample_write(const char *buf, size_t bytes, loff_t pos) > +{ > + struct blkoops_context *cxt = &blkz_cxt; > + const char *devpath = cxt->bzinfo->part_path; > + struct file *filp; > + ssize_t ret; > + > + if (!devpath) > + return -EINVAL; > + > + if (!is_blkdev_up()) > + return -EBUSY; > + > + filp = filp_open(devpath, O_WRONLY, 0); > + if (IS_ERR(filp)) { > + pr_debug("open %s failed, maybe unready\n", devpath); > + return -EACCES; > + } > + ret = kernel_write(filp, buf, bytes, &pos); > + vfs_fsync(filp, 0); > + filp_close(filp, NULL); > + > + return ret; > +} > + > +static struct blkz_zone *blkz_init_zone(enum pstore_type_id type, > + unsigned long *off, size_t size) > +{ > + struct blkz_info *info = blkz_cxt.bzinfo; > + struct blkz_zone *zone; > + const char *name = pstore_type_to_name(type); > + > + if (!size) > + return NULL; > + > + if (*off + size > info->part_size) { > + pr_err("no room for %s (0x%zx@0x%lx over 0x%lx)\n", > + name, size, *off, info->part_size); > + return ERR_PTR(-ENOMEM); > + } > + > + zone = kzalloc(sizeof(struct blkz_zone), GFP_KERNEL); > + if (!zone) > + return ERR_PTR(-ENOMEM); > + > + /** > + * NOTE: allocate buffer for blk zones for two reasons: > + * 1. It can temporarily hold the data before sample_read/write > + * are useable. > + * 2. It makes pstore usable even if no persistent storage. Most > + * events of pstore except panic are suitable!! > + */ > + zone->buffer = kmalloc(size, GFP_KERNEL); > + if (!zone->buffer) { > + kfree(zone); > + return ERR_PTR(-ENOMEM); > + } > + memset(zone->buffer, 0xFF, size); > + zone->off = *off; > + zone->name = name; > + zone->type = type; > + zone->buffer_size = size - sizeof(struct blkz_buffer); > + zone->buffer->sig = type ^ BLK_SIG; > + atomic_set(&zone->dirty, 0); > + atomic_set(&zone->buffer->datalen, 0); > + > + *off += size; > + > + pr_debug("blkzone %s: off 0x%lx, %zu header, %zu data\n", zone->name, > + zone->off, sizeof(*zone->buffer), zone->buffer_size); > + return zone; > +} > + > +static struct blkz_zone **blkz_init_zones(enum pstore_type_id type, > + unsigned long *off, size_t total_size, ssize_t record_size, > + unsigned int *cnt) > +{ > + struct blkz_info *info = blkz_cxt.bzinfo; > + struct blkz_zone **zones, *zone; > + const char *name = pstore_type_to_name(type); > + int c, i; > + > + if (!total_size || !record_size) > + return NULL; > + > + if (*off + total_size > info->part_size) { > + pr_err("no room for zones %s (0x%zx@0x%lx over 0x%lx)\n", > + name, total_size, *off, info->part_size); > + return ERR_PTR(-ENOMEM); > + } > + > + c = total_size / record_size; > + zones = kcalloc(c, sizeof(*zones), GFP_KERNEL); > + if (!zones) { > + pr_err("allocate for zones %s failed\n", name); > + return ERR_PTR(-ENOMEM); > + } > + memset(zones, 0, c * sizeof(*zones)); > + > + for (i = 0; i < c; i++) { > + zone = blkz_init_zone(type, off, record_size); > + if (!zone || IS_ERR(zone)) { > + pr_err("initialize zones %s failed\n", name); > + while (--i >= 0) > + kfree(zones[i]); > + kfree(zones); > + return (void *)zone; > + } > + zones[i] = zone; > + } > + > + *cnt = c; > + return zones; > +} > + > +static void blkz_free_zone(struct blkz_zone **blkzone) > +{ > + struct blkz_zone *zone = *blkzone; > + > + if (!zone) > + return; > + > + kfree(zone->buffer); > + kfree(zone); > + *blkzone = NULL; > +} > + > +static void blkz_free_zones(struct blkz_zone ***blkzones, unsigned int *cnt) > +{ > + struct blkz_zone **zones = *blkzones; > + > + while (*cnt > 0) { > + blkz_free_zone(&zones[*cnt]); > + (*cnt)--; > + } > + kfree(zones); > + *blkzones = NULL; > +} > + > +static int blkz_cut_zones(struct blkoops_context *cxt) > +{ > + struct blkz_info *info = cxt->bzinfo; > + unsigned long off = 0; > + int err; > + size_t size; > + > + size = info->part_size; > + cxt->dbzs = blkz_init_zones(PSTORE_TYPE_DMESG, &off, size, > + info->dmesg_size, &cxt->dmesg_max_cnt); > + if (IS_ERR(cxt->dbzs)) { > + err = PTR_ERR(cxt->dbzs); > + goto fail_out; > + } > + > + return 0; > +fail_out: > + return err; > +} > + > +int blkz_register(struct blkz_info *info) > +{ > + int err = -EINVAL; > + struct blkoops_context *cxt = &blkz_cxt; > + struct module *owner = info->owner; > + > + if (!info->part_size || !info->dmesg_size) { > + pr_warn("The memory size and the dmesg size must be non-zero\n"); > + return -EINVAL; > + } > + > + if (info->part_size < 4096) { > + pr_err("partition size must be over 4096 bytes\n"); > + return -EINVAL; > + } > + > +#define check_size(name, size) { \ > + if (info->name & (size)) { \ > + pr_err(#name " must be a multiple of %d\n", \ > + (size)); \ > + return -EINVAL; \ > + } \ > + } > + > + check_size(part_size, 4096 - 1); > + check_size(dmesg_size, SECTOR_SIZE - 1); > + > +#undef check_size > + > + if (!info->read) > + info->read = blkz_sample_read; > + if (!info->write) > + info->write = blkz_sample_write; Can you explain the write vs panic_write difference? And "sample" seems unusual here -- maybe "default"? Are there other handlers you imagine that wouldn't want to use these sample handlers? > + > + if (owner && !try_module_get(owner)) > + return -EINVAL; > + > + spin_lock(&cxt->bzinfo_lock); > + if (cxt->bzinfo) { > + pr_warn("blk '%s' already loaded: ignoring '%s'\n", > + cxt->bzinfo->name, info->name); > + spin_unlock(&cxt->bzinfo_lock); > + return -EBUSY; > + } > + cxt->bzinfo = info; > + spin_unlock(&cxt->bzinfo_lock); > + > + if (blkz_cut_zones(cxt)) { > + pr_err("cut zones fialed\n"); > + goto fail_out; > + } > + > + cxt->pstore.bufsize = cxt->dbzs[0]->buffer_size - > + sizeof(struct blkz_dmesg_header); > + cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL); > + if (!cxt->pstore.buf) { > + pr_err("cannot allocate pstore crash dump buffer\n"); > + err = -ENOMEM; > + goto fail_out; > + } > + cxt->pstore.data = cxt; > + cxt->pstore.flags = PSTORE_FLAGS_DMESG; > + > + err = pstore_register(&cxt->pstore); > + if (err) { > + pr_err("registering with pstore failed\n"); > + goto free_pstore_buf; > + } > + > + pr_info("Registered %s as blkzone backend for %s%s\n", info->name, > + cxt->dbzs && cxt->bzinfo->dump_oops ? "Oops " : "", > + cxt->dbzs && cxt->bzinfo->panic_write ? "Panic " : ""); > + So if panic_write isn't registered, panics will be dropped? > + module_put(owner); > + return 0; > + > +free_pstore_buf: > + kfree(cxt->pstore.buf); > +fail_out: > + spin_lock(&blkz_cxt.bzinfo_lock); > + blkz_cxt.bzinfo = NULL; > + spin_unlock(&blkz_cxt.bzinfo_lock); > + return err; > +} > +EXPORT_SYMBOL_GPL(blkz_register); > + > +void blkz_unregister(struct blkz_info *info) > +{ > + struct blkoops_context *cxt = &blkz_cxt; > + > + pstore_unregister(&cxt->pstore); > + kfree(cxt->pstore.buf); > + cxt->pstore.bufsize = 0; > + > + spin_lock(&cxt->bzinfo_lock); > + blkz_cxt.bzinfo = NULL; > + spin_unlock(&cxt->bzinfo_lock); > + > + blkz_free_zones(&cxt->dbzs, &cxt->dmesg_max_cnt); > + > +} > +EXPORT_SYMBOL_GPL(blkz_unregister); > + > +MODULE_LICENSE("GPL"); > +MODULE_AUTHOR("liaoweixiong "); > +MODULE_DESCRIPTION("Block device Oops/Panic logger"); > diff --git a/include/linux/pstore_blk.h b/include/linux/pstore_blk.h > new file mode 100644 > index 0000000..426cae4 > --- /dev/null > +++ b/include/linux/pstore_blk.h Is there any reason for this file to live in include/linux? I think it can just be in fs/pstore/ as blkzone.h or something? > @@ -0,0 +1,61 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > + > +#ifndef __PSTORE_BLK_H_ > +#define __PSTORE_BLK_H_ > + > +#include > +#include > + > +#ifndef SECTOR_SIZE > +#define SECTOR_SIZE 512 > +#endif > + > +/** > + * struct blkz_info - backend blkzone driver structure > + * > + * @owner: > + * module which is responsible for this backend driver > + * @name: > + * name of the backend driver > + * @part_path: > + * path of a storage partition. It's ok to keep it as NULL > + * if you passing @read and @write in blkz_info. @part_path > + * is needed by stoz_simple_read/write. If both of @part_path, stoz -> blkz ? > + * @read and @write are NULL, it will temporarity hold the data > + * in buffer allocated by 'vmalloc'. > + * @part_size: > + * total size of a storage partition in bytes. The partition > + * will be used to save data of pstore. > + * @dmesg_size: > + * the size of each zones for dmesg (oops & panic). > + * @dump_oops: > + * dump oops and panic log or only panic. > + * @read: > + * the normal (not panic) read operation. If NULL, replaced as > + * stoz_sample_read. See also @part_path > + * @write: > + * the normal (not panic) write operation. If NULL, replaced as > + * stoz_sample_write. See also @part_path Both as above? > + * @panic_read: > + * the read operation only used for panic. > + * @panic_write: > + * the write operation only used for panic. > + */ > +struct blkz_info { > + struct module *owner; > + const char *name; > + > + const char *part_path; > + unsigned long part_size; > + unsigned long dmesg_size; > + int dump_oops; > + ssize_t (*read)(char *buf, size_t bytes, loff_t pos); > + ssize_t (*write)(const char *buf, size_t bytes, loff_t pos); > + ssize_t (*panic_read)(char *buf, size_t bytes, loff_t pos); > + ssize_t (*panic_write)(const char *buf, size_t bytes, loff_t pos); > +}; > + > +extern int blkz_register(struct blkz_info *info); > +extern void blkz_unregister(struct blkz_info *info); > + > +#endif > -- > 1.9.1 > Slowly making my way through the series... :) -- Kees Cook