Received: by 2002:a25:1506:0:0:0:0:0 with SMTP id 6csp4052642ybv; Tue, 25 Feb 2020 12:12:24 -0800 (PST) X-Google-Smtp-Source: APXvYqx0g8j8RmXLTeOfkXRUDD+7F6FfvV2BxLjXmA901tKf/BmiNWQZZlQWokz0Dyl28iu9TjpE X-Received: by 2002:a05:6808:104:: with SMTP id b4mr456604oie.169.1582661544143; Tue, 25 Feb 2020 12:12:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1582661544; cv=none; d=google.com; s=arc-20160816; b=MU2bT6vn4GYCtd6oKK59BzrAwRmgW1Q2wm631x+TsEE1jG19SZtZh/HFhjI8tEb7oX qUL2yYc0q3jlwvEYPyxWV3IT0WhoGWQYFfwQCtqSawZJ5JP4idicecNkrnoofCzjCjOy 9yZHUKxjGiBUdqYhtRicxeWvI6Ny3DElif499l2Kz80+Tih6lo3fe3Di8jvMcKqnHpaj +V16aEeZVyKFgsEljJkACkC1xPIKTruqHoOIZJOCYROmD41gD1VO9tNnki2Ptj3kb8xN 8MIqHYQTtgJk1LHy9/YzykJPaRPxXujm0Oy7fwNOOZqt4C+uW0G5eTLbgN/q7LdUSHeU 1drA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:message-id :in-reply-to:date:references:subject:cc:to:from; bh=3bI/ROscbd3XWSgLOT03O50aep/XtWZJZaUiK/524Ks=; b=zuxYKYDP0MJE/eJbYcTps8BrQ1Wu2brWyqS/jLaNw0w5SendKSau+PTDpRAq7ePt4y 769Jc8W34ykfFRNAPZLfIABOxCWqFLhEZhZ99IBRg5RtCCmZBtx5dI/XkVFYz/PXtohA Lzd8Fpb7VP7bTSMc58fEKwxytjzZrqZHoRTgj+t9LT+qolsoUpBVt91t9Tbwvk6S23ma 9JfzsYHJuBfNUf3Vy7q6DydIf4hWC/E8lPT3+y4BJYbLghwKkxJRZOoU5Zconj5UQfwt ACZ69RMnPzB/E0Gqt70Nv7yEFJ146FTXMTKQ7y0L0iKis8UgybeIeS11AhjtMwu4cyGL dnSA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id w7si125336oie.196.2020.02.25.12.12.10; Tue, 25 Feb 2020 12:12:24 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731640AbgBYULj (ORCPT + 99 others); Tue, 25 Feb 2020 15:11:39 -0500 Received: from Galois.linutronix.de ([193.142.43.55]:54422 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730794AbgBYULi (ORCPT ); Tue, 25 Feb 2020 15:11:38 -0500 Received: from localhost ([127.0.0.1] helo=vostro.local) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1j6gYH-00024R-8I; Tue, 25 Feb 2020 21:11:33 +0100 From: John Ogness To: Petr Mladek Cc: Peter Zijlstra , Sergey Senozhatsky , Sergey Senozhatsky , Steven Rostedt , Linus Torvalds , Greg Kroah-Hartman , Andrea Parri , Thomas Gleixner , kexec@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: misc details: Re: [PATCH 2/2] printk: use the lockless ringbuffer References: <20200128161948.8524-1-john.ogness@linutronix.de> <20200128161948.8524-3-john.ogness@linutronix.de> <20200217144110.xiqlzhs6ynoqdpun@pathway.suse.cz> Date: Tue, 25 Feb 2020 21:11:31 +0100 In-Reply-To: <20200217144110.xiqlzhs6ynoqdpun@pathway.suse.cz> (Petr Mladek's message of "Mon, 17 Feb 2020 15:41:10 +0100") Message-ID: <87h7zeqvf0.fsf@linutronix.de> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org >> - Record meta-data is now stored in a separate array of descriptors. >> This is an additional 72 * (2 ^ ((CONFIG_LOG_BUF_SHIFT - 6))) bytes >> for the static array and 72 * (2 ^ ((log_buf_len - 6))) bytes for >> the dynamic array. > > It might help to show some examples. I mean to mention the sizes > when CONFIG_LOG_BUF_SHIFT is 12 or so. OK. >> --- a/kernel/printk/printk.c >> +++ b/kernel/printk/printk.c >> - * Every record carries the monotonic timestamp in microseconds, as well as >> - * the standard userspace syslog level and syslog facility. The usual >> + * Every record meta-data carries the monotonic timestamp in microseconds, as > > I am afraid that we could not guarantee monotonic timestamp because > the writers are not synchronized. I hope that it will not create > real problems and we could just remove the word "monotonic" ;-) I removed "monotonic". I hope userspace doesn't require the ringbuffer to be chronologically sorted. That would explain why the safe buffers use bogus timestamps. :-/ >> +/* >> + * Define the average message size. This only affects the number of >> + * descriptors that will be available. Underestimating is better than >> + * overestimating (too many available descriptors is better than not enough). >> + * The dictionary buffer will be the same size as the text buffer. >> + */ >> +#define PRB_AVGBITS 6 > > Do I get it correctly that '6' means 2^6 = 64 characters? Correct. > Some ugly counting on my test systems shows the average 49 chars: > > $> dmesg | cut -d ']' -f 2- | wc -c > 30172 > $> dmesg | cut -d ']' -f 2- | wc -l > 612 > $> echo $((30172 / 612)) > 49 > > If I get it correctly then lower number is the more safe side. > So, a more safe default should be 5? For v2 the value will be lowered to 5. >> - if (log_make_free_space(size)) { >> + if (!prb_reserve(&e, prb, &r)) { >> /* truncate the message if it is too long for empty buffer */ >> - size = truncate_msg(&text_len, &trunc_msg_len, >> - &dict_len, &pad_len); >> + truncate_msg(&text_len, &trunc_msg_len, &dict_len); >> + r.text_buf_size = text_len + trunc_msg_len; Note that the additional space for the trunc_msg_len is being reserved. >> + r.dict_buf_size = dict_len; >> /* survive when the log buffer is too small for trunc_msg */ >> - if (log_make_free_space(size)) >> + if (!prb_reserve(&e, prb, &r)) >> return 0; >> } >> >> - if (log_next_idx + size + sizeof(struct printk_log) > log_buf_len) { >> - /* >> - * This message + an additional empty header does not fit >> - * at the end of the buffer. Add an empty header with len == 0 >> - * to signify a wrap around. >> - */ >> - memset(log_buf + log_next_idx, 0, sizeof(struct printk_log)); >> - log_next_idx = 0; >> - } >> - >> /* fill message */ >> - msg = (struct printk_log *)(log_buf + log_next_idx); >> - memcpy(log_text(msg), text, text_len); >> - msg->text_len = text_len; >> - if (trunc_msg_len) { >> - memcpy(log_text(msg) + text_len, trunc_msg, trunc_msg_len); >> - msg->text_len += trunc_msg_len; > > Note that the old code updates msg->text_len. msg->text_len is equivalent to r.info->text_len, which was already set by the prb_reserve() (and already includes the trunc_msg_len). >> - } >> - memcpy(log_dict(msg), dict, dict_len); >> - msg->dict_len = dict_len; >> - msg->facility = facility; >> - msg->level = level & 7; >> - msg->flags = flags & 0x1f; >> + memcpy(&r.text_buf[0], text, text_len); >> + if (trunc_msg_len) >> + memcpy(&r.text_buf[text_len], trunc_msg, trunc_msg_len); > > The new one just appends the string. That is all it needs to do here. >> + if (r.dict_buf) >> + memcpy(&r.dict_buf[0], dict, dict_len); >> + r.info->facility = facility; >> + r.info->level = level & 7; >> + r.info->flags = flags & 0x1f; >> if (ts_nsec > 0) >> - msg->ts_nsec = ts_nsec; >> + r.info->ts_nsec = ts_nsec; >> else >> - msg->ts_nsec = local_clock(); >> -#ifdef CONFIG_PRINTK_CALLER >> - msg->caller_id = caller_id; >> -#endif >> - memset(log_dict(msg) + dict_len, 0, pad_len); >> - msg->len = size; >> + r.info->ts_nsec = local_clock(); >> + r.info->caller_id = caller_id; >> >> /* insert message */ >> - log_next_idx += msg->len; >> - log_next_seq++; >> + prb_commit(&e); >> >> - return msg->text_len; >> + return text_len; > > So, this should be text_len + trunc_msg_len. Good catch! Yes. Fixed for v2. Thank you. (Note that simply returning r.info->text_len is not allowed because the writer must not access that data after calling prb_commit()). >> @@ -1974,9 +1966,9 @@ asmlinkage int vprintk_emit(int facility, int level, >> >> /* This stops the holder of console_sem just where we want him */ >> logbuf_lock_irqsave(flags); >> - curr_log_seq = log_next_seq; >> + pending_output = !prb_read_valid(prb, console_seq, NULL); >> printed_len = vprintk_store(facility, level, dict, dictlen, fmt, args); >> - pending_output = (curr_log_seq != log_next_seq); >> + pending_output &= prb_read_valid(prb, console_seq, NULL); > > The original code checked whether vprintk_store() stored the text > into the main log buffer or only into the cont buffer. > > The new code checks whether console is behind which is something > different. I would argue that they are the same thing in this context. Keep in mind that we are under the logbuf_lock. If there was previously nothing pending and now there is, this context is the only one that could have added it. This logic will change significantly when we remove the locks (and it will disappear once we go to kthreads). But we aren't that far at this stage and I'd like to keep the general logic somewhat close to the current mainline implementation for now. > I prefer to call wake_up_klogd() directly from log_output() or > log_store() instead. It might later be used to wake up > printk kthreads as well. > > It was done this way because consoles were historically preferred > over userspace loggers. But the difference will be lower when > consoles are handled by kthread. Agreed, but that is something I would like to save for a later series. Right now I only want to replace the ringbuffer without rearranging priorities. >> -skip: >> - if (console_seq == log_next_seq) >> - break; >> + console_seq = console_record.info->seq; > > This code suggests that it might be possible to get > console_seq > console_record.info->seq and we just > ignore it. I prefer to make it clear by: > > if (console_seq != console_record.info->seq) { OK. Thanks for your help. John Ogness