From: "Luis R. Rodriguez" <[email protected]>
The default size of the ring buffer is too small for machines
with a large amount of CPUs under heavy load. What ends up
happening when debugging is the ring buffer overlaps and chews
up old messages making debugging impossible unless the size is
passed as a kernel parameter. An idle system upon boot up will
on average spew out only about one or two extra lines but where
this really matters is on heavy load and that will vary widely
depending on the system and environment.
There are mechanisms to help increase the kernel ring buffer
for tracing through debugfs, and those interfaces even allow growing
the kernel ring buffer per CPU. We also have a static value which
can be passed upon boot. Relying on debugfs however is not ideal
for production, and relying on the value passed upon bootup is
can only used *after* an issue has creeped up. Instead of being
reactive this adds a proactive measure which lets you scale the
amount of contributions you'd expect to the kernel ring buffer
under load by each CPU in the worst case scenario.
We use num_possible_cpus() to avoid complexities which could be
introduced by dynamically changing the ring buffer size at run
time, num_possible_cpus() lets us use the upper limit on possible
number of CPUs therefore avoiding having to deal with hotplugging
CPUs on and off. This introduces the kernel configuration option
LOG_CPU_MIN_BUF_SHIFT which is used to specify the maximum amount
of contributions to the kernel ring buffer in the worst case before
the kernel ring buffer flips over, the size is specified as a power
of 2. The total amount of contributions made by each CPU must be
greater than half of the default kernel ring buffer size
(1 << LOG_BUF_SHIFT bytes) in order to trigger an increase upon
bootup. The kernel ring buffer is increased to the next power of
two that would fit the required minimum kernel ring buffer size
plus the additional CPU contribution. For example if LOG_BUF_SHIFT
is 18 (256 KB) you'd require at least 128 KB contributions by
other CPUs in order to trigger an increase of the kernel ring buffer.
With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least
anything over > 64 possible CPUs to trigger an increase. If you
had 128 possible CPUs the amount of minimum required kernel ring
buffer bumps to:
((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB
Since we require the ring buffer to be a power of two this would
the new required size would be 1024 KB.
This CPU contributions are ignored when the "log_buf_len" kernel parameter
is used as it forces the exact size of the ring buffer to an expected power
of two value.
In order to make this code a bit more legible, add a small enum to keep
track of when the reasons of setting the ring buffer, and extend the
documentation quite a bit to make all this clear.
Cc: Michal Hocko <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: [email protected]
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
I've modified the computation to just go to the next power of two. All
other implementations do that, and although its not well documented
I rather follow that logic. Without the enum stuff this code can get
ugly easy, while at it I also extended the documentation a bit more.
Documentation/kernel-parameters.txt | 8 ++-
init/Kconfig | 53 +++++++++++++++++-
kernel/printk/printk.c | 108 ++++++++++++++++++++++++++++++++++--
3 files changed, 162 insertions(+), 7 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 6eaa9cd..229d031 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1685,8 +1685,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
7 (KERN_DEBUG) debug-level messages
log_buf_len=n[KMG] Sets the size of the printk ring buffer,
- in bytes. n must be a power of two. The default
- size is set in the kernel config file.
+ in bytes. n must be a power of two and greater
+ than the minimal size. The minimal size is defined
+ by LOG_BUF_SHIFT kernel config parameter. There is
+ also CONFIG_LOG_CPU_MIN_BUF_SHIFT config parameter
+ that allows to increase the default size depending on
+ the number of CPUs. See init/Kconfig for more details.
logo.nologo [FB] Disables display of the built-in Linux logo.
This may be used to provide more screen space for
diff --git a/init/Kconfig b/init/Kconfig
index 9d76b99..69bdbcf 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -807,7 +807,11 @@ config LOG_BUF_SHIFT
range 12 21
default 17
help
- Select kernel log buffer size as a power of 2.
+ Select the minimal kernel log buffer size as a power of 2.
+ The final size is affected by LOG_CPU_MIN_BUF_SHIFT config
+ parameter, see below. Any higher size also might be forced
+ by "log_buf_len" boot parameter.
+
Examples:
17 => 128 KB
16 => 64 KB
@@ -816,6 +820,53 @@ config LOG_BUF_SHIFT
13 => 8 KB
12 => 4 KB
+config LOG_CPU_MIN_BUF_SHIFT
+ int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
+ range 0 21
+ default 12
+ depends on SMP
+ depends on !BASE_SMALL
+ help
+ The kernel ring buffer will get additional data logged onto it
+ when multiple CPUs are supported. Typically the contributions are
+ only a few lines when idle however under under load this can vary
+ and in the worst case it can mean losing logging information. You
+ can use this to set the maximum expected mount of amount of logging
+ contribution under load by each CPU in the worst case scenario, as
+ a power of 2. The total amount of contributions made by each CPU
+ must be greater than half of the default kernel ring buffer size
+ ((1 << LOG_BUF_SHIFT / 2 bytes)) in order to trigger an increase upon
+ bootup. If an increase is required the ring buffer is increated to
+ the next power of 2 that can fit both the minimum kernel ring buffer
+ (LOG_BUF_SHIFT) plus the additional worst case CPU contributions.
+ For example if LOG_BUF_SHIFT is 18 (256 KB) you'd require at laest
+ 128 KB contributions by other CPUs in order to trigger an increase.
+ With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least anything
+ over > 64 possible CPUs to trigger an increase. If you had 128
+ possible CPUs the new minimum required kernel ring buffer size
+ would be:
+
+ ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB
+
+ Since we only allow powers of two for the kernel ring buffer size the
+ new kernel ring buffer size would be 1024 KB.
+
+ CPU contributions are ignored when "log_buf_len" kernel parameter is
+ used as it forces an exact (power of two) size of the ring buffer to
+ an expected value.
+
+ The number of possible CPUs is used for this computation ignoring
+ hotplugging making the compuation optimal for the the worst case
+ scenerio while allowing a simple algorithm to be used from bootup.
+
+ Examples shift values and their meaning:
+ 17 => 128 KB for each CPU
+ 16 => 64 KB for each CPU
+ 15 => 32 KB for each CPU
+ 14 => 16 KB for each CPU
+ 13 => 8 KB for each CPU
+ 12 => 4 KB for each CPU
+
#
# Architectures with an unreliable sched_clock() should select this:
#
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ea2d5f6..54632a0c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -214,6 +214,67 @@ enum log_flags {
LOG_CONT = 8, /* text is a fragment of a continuation line */
};
+/**
+ * DOC: Kernel ring buffer setup
+ *
+ * The size of the kernel ring buffer is always set to a size as a power of 2.
+ * By default the kernel ring buffer size is set up using static data through
+ * LOG_BUF_SHIFT. You can manually override the kernel ring buffer on boot by
+ * using the log_buf_len=n parameter. On really large CPU systems (default is
+ * more than 64 CPUs) an extra CPU contribution is computed in which case the
+ * ring buffer is adjusted to the next power 2 that can fit the expected worst
+ * case CPU contributions to the kernel ring buffer. The LOG_CPU_MIN_BUF_SHIFT
+ * contributions are computed late after setup_arch() in order to make use of
+ * num_possible_cpus(). LOG_CPU_MIN_BUF_SHIFT is a proactive measure for large
+ * systems. With a LOG_BUF_SHIFT of 18 and LOG_CPU_MIN_BUF_SHIFT 12 you'd need
+ * at least 64 CPUs in order to trigger an increase.
+ *
+ * setup_log_buf() can be called at early init during setup_arch(), at
+ * which point the early parameter must be set to true, this varies accross
+ * architectures but currently only x86 uses this. The early case of
+ * setup_log_buf() cannot use num_possible_cpus() as set_cpu_possible()
+ * is not guaranteed to have been called then.
+ */
+
+
+/**
+ * enum klog_setup_state - states for the kernel ring buffer setup
+ *
+ * This is used by by setup_log_buf() to keep track of the state of the
+ * kernel ring buffer via klog_state.
+ *
+ * @KLOG_STATIC: the minimum configured kernel ring buffer size is used. The
+ * ring buffer size is set from static kernel data. The size is set with
+ * LOG_BUF_SHIFT. Each architecture can define its own default size
+ * through its respective architecture defconfig. This is a power of 2.
+ * The maximum allowed value is 21, allowing for a maximum static size
+ * of 2 GiB.
+ * @KLOG_PARAM: the log_buf_len=n kernel parameter was passed, the passsed
+ * size will be used, it must be a power of 2 and greater than the
+ * mimimum required size set with LOG_BUF_SHIFT. When the kernel parameter
+ * is set CPU contributions set by LOG_CPU_MIN_BUF_SHIFT will be ignored.
+ * Architectures can require the kernel ring buffer to be set up
+ * early in the boot process during setup_arch(). These architectures must
+ * call setup_log_buf() with the early parameter set. Since the kernel
+ * ring buffer is already set with the minimum required size an early
+ * allocation can only occur when the log_buf_len=n parameter is used.
+ * Currently only the x86 architectures requires the early setup.
+ * @KLOG_CPU_EXTRA: after setup_arch() has been called setup_log_buf() will
+ * be called for all architectures. For architectures that did not use
+ * the early call during its setup_arch() this is where the log_buf_len=n
+ * kernel parameter is checked for. The LOG_CPU_MIN_BUF_SHIFT
+ * contributions will be ignored if klog_state is already set to
+ * KLOG_PARAM. Architecures are expected to have prefilled the possible
+ * CPU bits cpu_possible_bits with set_cpu_possible() during setup_arch()
+ * in order for this extra CPU computation to work, for those that haven't
+ * this evaluation will have no effect.
+ */
+enum klog_setup_state {
+ KLOG_STATIC,
+ KLOG_PARAM,
+ KLOG_CPU_EXTRA,
+};
+
struct printk_log {
u64 ts_nsec; /* timestamp in nanoseconds */
u16 len; /* length of entire record */
@@ -266,9 +327,11 @@ static u32 clear_idx;
#define LOG_ALIGN __alignof__(struct printk_log)
#endif
#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
+#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
static char *log_buf = __log_buf;
static u32 log_buf_len = __LOG_BUF_LEN;
+static enum klog_setup_state klog_state = KLOG_STATIC;
/* human readable text of the record */
static char *log_text(const struct printk_log *msg)
@@ -842,20 +905,56 @@ static int __init log_buf_len_setup(char *str)
}
early_param("log_buf_len", log_buf_len_setup);
+static unsigned __init compute_cpu_contrib(void)
+{
+ int cpu_extra;
+ unsigned extra_cpu_log_size;
+
+ /*
+ * archs should set up cpu_possible_bits properly with
+ * set_cpu_possible() after setup_arch() but just in
+ * case lets ensure this is valid.
+ */
+ if (num_possible_cpus() <= 1)
+ return 0;
+
+ cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
+ extra_cpu_log_size = roundup_pow_of_two(cpu_extra + __LOG_BUF_LEN);
+
+ if (cpu_extra <= __LOG_BUF_LEN / 2)
+ return 0;
+
+ pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
+ pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
+
+ return extra_cpu_log_size;
+}
+
void __init setup_log_buf(int early)
{
unsigned long flags;
char *new_log_buf;
int free;
-
- if (!new_log_buf_len)
- return;
+ enum klog_setup_state new_klog_state;
if (early) {
+ if (!new_log_buf_len)
+ return;
new_log_buf =
memblock_virt_alloc(new_log_buf_len, PAGE_SIZE);
+ new_klog_state = KLOG_PARAM;
} else {
- new_log_buf = memblock_virt_alloc_nopanic(new_log_buf_len, 0);
+ if (klog_state == KLOG_PARAM)
+ return;
+ if (new_log_buf_len)
+ new_klog_state = KLOG_PARAM;
+ else {
+ new_log_buf_len = compute_cpu_contrib();
+ new_klog_state = KLOG_CPU_EXTRA;
+ }
+ if (!new_log_buf_len)
+ return;
+ new_log_buf = memblock_virt_alloc(new_log_buf_len, PAGE_SIZE);
}
if (unlikely(!new_log_buf)) {
@@ -868,6 +967,7 @@ void __init setup_log_buf(int early)
log_buf_len = new_log_buf_len;
log_buf = new_log_buf;
new_log_buf_len = 0;
+ klog_state = new_klog_state;
free = __LOG_BUF_LEN - log_next_idx;
memcpy(log_buf, __log_buf, __LOG_BUF_LEN);
raw_spin_unlock_irqrestore(&logbuf_lock, flags);
--
1.9.3
On Mon 2014-06-16 17:37:44, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <[email protected]>
>
> The default size of the ring buffer is too small for machines
> with a large amount of CPUs under heavy load. What ends up
> happening when debugging is the ring buffer overlaps and chews
> up old messages making debugging impossible unless the size is
> passed as a kernel parameter. An idle system upon boot up will
> on average spew out only about one or two extra lines but where
> this really matters is on heavy load and that will vary widely
> depending on the system and environment.
>
> There are mechanisms to help increase the kernel ring buffer
> for tracing through debugfs, and those interfaces even allow growing
> the kernel ring buffer per CPU. We also have a static value which
> can be passed upon boot. Relying on debugfs however is not ideal
> for production, and relying on the value passed upon bootup is
> can only used *after* an issue has creeped up. Instead of being
> reactive this adds a proactive measure which lets you scale the
> amount of contributions you'd expect to the kernel ring buffer
> under load by each CPU in the worst case scenario.
>
> We use num_possible_cpus() to avoid complexities which could be
> introduced by dynamically changing the ring buffer size at run
> time, num_possible_cpus() lets us use the upper limit on possible
> number of CPUs therefore avoiding having to deal with hotplugging
> CPUs on and off. This introduces the kernel configuration option
> LOG_CPU_MIN_BUF_SHIFT which is used to specify the maximum amount
> of contributions to the kernel ring buffer in the worst case before
> the kernel ring buffer flips over, the size is specified as a power
> of 2. The total amount of contributions made by each CPU must be
> greater than half of the default kernel ring buffer size
> (1 << LOG_BUF_SHIFT bytes) in order to trigger an increase upon
> bootup. The kernel ring buffer is increased to the next power of
> two that would fit the required minimum kernel ring buffer size
> plus the additional CPU contribution. For example if LOG_BUF_SHIFT
> is 18 (256 KB) you'd require at least 128 KB contributions by
> other CPUs in order to trigger an increase of the kernel ring buffer.
> With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least
> anything over > 64 possible CPUs to trigger an increase. If you
> had 128 possible CPUs the amount of minimum required kernel ring
> buffer bumps to:
>
> ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB
>
> Since we require the ring buffer to be a power of two this would
> the new required size would be 1024 KB.
>
> This CPU contributions are ignored when the "log_buf_len" kernel parameter
> is used as it forces the exact size of the ring buffer to an expected power
> of two value.
>
> In order to make this code a bit more legible, add a small enum to keep
> track of when the reasons of setting the ring buffer, and extend the
> documentation quite a bit to make all this clear.
>
> Cc: Michal Hocko <[email protected]>
> Cc: Petr Mladek <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Joe Perches <[email protected]>
> Cc: Arun KS <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Davidlohr Bueso <[email protected]>
> Cc: Chris Metcalf <[email protected]>
> Cc: [email protected]
> Signed-off-by: Luis R. Rodriguez <[email protected]>
> ---
>
> I've modified the computation to just go to the next power of two. All
> other implementations do that, and although its not well documented
> I rather follow that logic. Without the enum stuff this code can get
> ugly easy, while at it I also extended the documentation a bit more.
>
> Documentation/kernel-parameters.txt | 8 ++-
> init/Kconfig | 53 +++++++++++++++++-
> kernel/printk/printk.c | 108 ++++++++++++++++++++++++++++++++++--
> 3 files changed, 162 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 6eaa9cd..229d031 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -1685,8 +1685,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> 7 (KERN_DEBUG) debug-level messages
>
> log_buf_len=n[KMG] Sets the size of the printk ring buffer,
> - in bytes. n must be a power of two. The default
> - size is set in the kernel config file.
> + in bytes. n must be a power of two and greater
> + than the minimal size. The minimal size is defined
> + by LOG_BUF_SHIFT kernel config parameter. There is
> + also CONFIG_LOG_CPU_MIN_BUF_SHIFT config parameter
> + that allows to increase the default size depending on
> + the number of CPUs. See init/Kconfig for more details.
>
> logo.nologo [FB] Disables display of the built-in Linux logo.
> This may be used to provide more screen space for
> diff --git a/init/Kconfig b/init/Kconfig
> index 9d76b99..69bdbcf 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -807,7 +807,11 @@ config LOG_BUF_SHIFT
> range 12 21
> default 17
> help
> - Select kernel log buffer size as a power of 2.
> + Select the minimal kernel log buffer size as a power of 2.
> + The final size is affected by LOG_CPU_MIN_BUF_SHIFT config
> + parameter, see below. Any higher size also might be forced
> + by "log_buf_len" boot parameter.
> +
> Examples:
> 17 => 128 KB
> 16 => 64 KB
> @@ -816,6 +820,53 @@ config LOG_BUF_SHIFT
> 13 => 8 KB
> 12 => 4 KB
>
> +config LOG_CPU_MIN_BUF_SHIFT
> + int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
> + range 0 21
> + default 12
> + depends on SMP
> + depends on !BASE_SMALL
> + help
> + The kernel ring buffer will get additional data logged onto it
> + when multiple CPUs are supported. Typically the contributions are
> + only a few lines when idle however under under load this can vary
> + and in the worst case it can mean losing logging information. You
> + can use this to set the maximum expected mount of amount of logging
> + contribution under load by each CPU in the worst case scenario, as
> + a power of 2. The total amount of contributions made by each CPU
> + must be greater than half of the default kernel ring buffer size
> + ((1 << LOG_BUF_SHIFT / 2 bytes)) in order to trigger an increase upon
> + bootup. If an increase is required the ring buffer is increated to
> + the next power of 2 that can fit both the minimum kernel ring buffer
> + (LOG_BUF_SHIFT) plus the additional worst case CPU contributions.
> + For example if LOG_BUF_SHIFT is 18 (256 KB) you'd require at laest
> + 128 KB contributions by other CPUs in order to trigger an increase.
> + With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least anything
> + over > 64 possible CPUs to trigger an increase. If you had 128
> + possible CPUs the new minimum required kernel ring buffer size
> + would be:
> +
> + ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB
> +
> + Since we only allow powers of two for the kernel ring buffer size the
> + new kernel ring buffer size would be 1024 KB.
> +
> + CPU contributions are ignored when "log_buf_len" kernel parameter is
> + used as it forces an exact (power of two) size of the ring buffer to
> + an expected value.
> +
> + The number of possible CPUs is used for this computation ignoring
> + hotplugging making the compuation optimal for the the worst case
> + scenerio while allowing a simple algorithm to be used from bootup.
> +
> + Examples shift values and their meaning:
> + 17 => 128 KB for each CPU
> + 16 => 64 KB for each CPU
> + 15 => 32 KB for each CPU
> + 14 => 16 KB for each CPU
> + 13 => 8 KB for each CPU
> + 12 => 4 KB for each CPU
> +
> #
> # Architectures with an unreliable sched_clock() should select this:
> #
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ea2d5f6..54632a0c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -214,6 +214,67 @@ enum log_flags {
> LOG_CONT = 8, /* text is a fragment of a continuation line */
> };
>
> +/**
> + * DOC: Kernel ring buffer setup
> + *
> + * The size of the kernel ring buffer is always set to a size as a power of 2.
> + * By default the kernel ring buffer size is set up using static data through
> + * LOG_BUF_SHIFT. You can manually override the kernel ring buffer on boot by
> + * using the log_buf_len=n parameter. On really large CPU systems (default is
> + * more than 64 CPUs) an extra CPU contribution is computed in which case the
> + * ring buffer is adjusted to the next power 2 that can fit the expected worst
> + * case CPU contributions to the kernel ring buffer. The LOG_CPU_MIN_BUF_SHIFT
> + * contributions are computed late after setup_arch() in order to make use of
> + * num_possible_cpus(). LOG_CPU_MIN_BUF_SHIFT is a proactive measure for large
> + * systems. With a LOG_BUF_SHIFT of 18 and LOG_CPU_MIN_BUF_SHIFT 12 you'd need
> + * at least 64 CPUs in order to trigger an increase.
> + *
> + * setup_log_buf() can be called at early init during setup_arch(), at
> + * which point the early parameter must be set to true, this varies accross
> + * architectures but currently only x86 uses this. The early case of
> + * setup_log_buf() cannot use num_possible_cpus() as set_cpu_possible()
> + * is not guaranteed to have been called then.
> + */
> +
> +
> +/**
> + * enum klog_setup_state - states for the kernel ring buffer setup
> + *
> + * This is used by by setup_log_buf() to keep track of the state of the
> + * kernel ring buffer via klog_state.
> + *
> + * @KLOG_STATIC: the minimum configured kernel ring buffer size is used. The
> + * ring buffer size is set from static kernel data. The size is set with
> + * LOG_BUF_SHIFT. Each architecture can define its own default size
> + * through its respective architecture defconfig. This is a power of 2.
> + * The maximum allowed value is 21, allowing for a maximum static size
> + * of 2 GiB.
> + * @KLOG_PARAM: the log_buf_len=n kernel parameter was passed, the passsed
> + * size will be used, it must be a power of 2 and greater than the
> + * mimimum required size set with LOG_BUF_SHIFT. When the kernel parameter
> + * is set CPU contributions set by LOG_CPU_MIN_BUF_SHIFT will be ignored.
> + * Architectures can require the kernel ring buffer to be set up
> + * early in the boot process during setup_arch(). These architectures must
> + * call setup_log_buf() with the early parameter set. Since the kernel
> + * ring buffer is already set with the minimum required size an early
> + * allocation can only occur when the log_buf_len=n parameter is used.
> + * Currently only the x86 architectures requires the early setup.
> + * @KLOG_CPU_EXTRA: after setup_arch() has been called setup_log_buf() will
> + * be called for all architectures. For architectures that did not use
> + * the early call during its setup_arch() this is where the log_buf_len=n
> + * kernel parameter is checked for. The LOG_CPU_MIN_BUF_SHIFT
> + * contributions will be ignored if klog_state is already set to
> + * KLOG_PARAM. Architecures are expected to have prefilled the possible
> + * CPU bits cpu_possible_bits with set_cpu_possible() during setup_arch()
> + * in order for this extra CPU computation to work, for those that haven't
> + * this evaluation will have no effect.
> + */
> +enum klog_setup_state {
> + KLOG_STATIC,
> + KLOG_PARAM,
> + KLOG_CPU_EXTRA,
> +};
> +
> struct printk_log {
> u64 ts_nsec; /* timestamp in nanoseconds */
> u16 len; /* length of entire record */
> @@ -266,9 +327,11 @@ static u32 clear_idx;
> #define LOG_ALIGN __alignof__(struct printk_log)
> #endif
> #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
> +#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
> static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> static char *log_buf = __log_buf;
> static u32 log_buf_len = __LOG_BUF_LEN;
> +static enum klog_setup_state klog_state = KLOG_STATIC;
>
> /* human readable text of the record */
> static char *log_text(const struct printk_log *msg)
> @@ -842,20 +905,56 @@ static int __init log_buf_len_setup(char *str)
> }
> early_param("log_buf_len", log_buf_len_setup);
>
> +static unsigned __init compute_cpu_contrib(void)
The function name is slightly misleading. It does not compute the
extra space but the whole length of the ring buffer. What about using
default_len_by_cpu_num() or so?
> +{
> + int cpu_extra;
> + unsigned extra_cpu_log_size;
> +
> + /*
> + * archs should set up cpu_possible_bits properly with
> + * set_cpu_possible() after setup_arch() but just in
> + * case lets ensure this is valid.
> + */
> + if (num_possible_cpus() <= 1)
> + return 0;
> +
> + cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> + extra_cpu_log_size = roundup_pow_of_two(cpu_extra +
> __LOG_BUF_LEN);
Great catch. Well, I am not sure if this is really
needed. memblock_virt_alloc() is called on another locations with "any" size.
It might be enough to make sure that the size is aligned to
LOG_ALIGN. This is how the messages are aligned. I would do:
cpu_extra %= LOG_ALIGN;
Another possibility would be to set the minimal size of
LOG_CPU_MIN_BUF_SHIFT to "6" or so. I hope that the alignment of the
"struct printk_log" newer would be bigger than 64 bytes. Well, we
could add a compile check if we want to be sure.
Anyway, I do not have any strong opinion here. I might be too careful
and the roundup_pow_of_two() is perfectly fine.
> +
> + if (cpu_extra <= __LOG_BUF_LEN / 2)
> + return 0;
> +
> + pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
> + pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
> +
> + return extra_cpu_log_size;
> +}
> +
>
> void __init setup_log_buf(int early)
> {
> unsigned long flags;
> char *new_log_buf;
> int free;
> -
> - if (!new_log_buf_len)
> - return;
> + enum klog_setup_state new_klog_state;
>
> if (early) {
> + if (!new_log_buf_len)
> + return;
> new_log_buf =
> memblock_virt_alloc(new_log_buf_len, PAGE_SIZE);
> + new_klog_state = KLOG_PARAM;
> } else {
> - new_log_buf = memblock_virt_alloc_nopanic(new_log_buf_len, 0);
> + if (klog_state == KLOG_PARAM)
> + return;
> + if (new_log_buf_len)
> + new_klog_state = KLOG_PARAM;
> + else {
> + new_log_buf_len = compute_cpu_contrib();
> + new_klog_state = KLOG_CPU_EXTRA;
> + }
> + if (!new_log_buf_len)
> + return;
> + new_log_buf = memblock_virt_alloc(new_log_buf_len,
> PAGE_SIZE);
We should call memblock_virt_allocc_nopanic() in this else part.
Well, I am not sure if the new klog states make the code really better
readable. I wonder where we lost the simplicity from v3 of this patch ;-)
What about replacing the above changes in kernel/printk/printk.c with
the following ones:
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ea2d5f6962ed..e00a9600f5fa 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -266,6 +266,7 @@ static u32 clear_idx;
#define LOG_ALIGN __alignof__(struct printk_log)
#endif
#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
+#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
static char *log_buf = __log_buf;
static u32 log_buf_len = __LOG_BUF_LEN;
@@ -842,12 +843,52 @@ static int __init log_buf_len_setup(char *str)
}
early_param("log_buf_len", log_buf_len_setup);
+static unsigned __init default_len_by_cpu_num(void)
+{
+ int cpu_extra;
+ unsigned extra_cpu_log_size;
+
+ /*
+ * archs should set up cpu_possible_bits properly with
+ * set_cpu_possible() after setup_arch() but just in
+ * case lets ensure this is valid.
+ */
+ if (num_possible_cpus() <= 1)
+ return 0;
+
+ cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
+ /* make sure that the buffer is aligned */
+ cpu_extra %= LOG_ALIGN;
+ extra_cpu_log_size = roundup_pow_of_two(cpu_extra + __LOG_BUF_LEN);
+
+ if (cpu_extra <= __LOG_BUF_LEN / 2)
+ return 0;
+
+ pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
+ pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
+
+ return extra_cpu_log_size;
+}
+
void __init setup_log_buf(int early)
{
unsigned long flags;
char *new_log_buf;
int free;
+ /* nope when already allocated earlier */
+ if (log_buf != __log_buf)
+ return;
+
+ /*
+ * The default size need to be increased on systems with many CPUs.
+ * It is done only when an exact size is not forced by log_buf_len=n
+ * kernel parameter.
+ */
+ if (!new_log_buf_len)
+ new_log_buf_len = default_len_by_cpu_num();
+
+ /* nope when nobody wants to increase the size after all */
if (!new_log_buf_len)
return;
--
1.8.4
I think that it is better readable than the two level if-magic with
the three new flags. The long description of the three flags looked
scary in itself ;-)
Best Regards,
Petr
On Tue 2014-06-17 16:52:00, Petr Ml?dek wrote:
> On Mon 2014-06-16 17:37:44, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <[email protected]>
> >
> > The default size of the ring buffer is too small for machines
> > with a large amount of CPUs under heavy load. What ends up
> > happening when debugging is the ring buffer overlaps and chews
> > up old messages making debugging impossible unless the size is
> > passed as a kernel parameter. An idle system upon boot up will
> > on average spew out only about one or two extra lines but where
> > this really matters is on heavy load and that will vary widely
> > depending on the system and environment.
> >
> > There are mechanisms to help increase the kernel ring buffer
> > for tracing through debugfs, and those interfaces even allow growing
> > the kernel ring buffer per CPU. We also have a static value which
> > can be passed upon boot. Relying on debugfs however is not ideal
> > for production, and relying on the value passed upon bootup is
> > can only used *after* an issue has creeped up. Instead of being
> > reactive this adds a proactive measure which lets you scale the
> > amount of contributions you'd expect to the kernel ring buffer
> > under load by each CPU in the worst case scenario.
> >
> > We use num_possible_cpus() to avoid complexities which could be
> > introduced by dynamically changing the ring buffer size at run
> > time, num_possible_cpus() lets us use the upper limit on possible
> > number of CPUs therefore avoiding having to deal with hotplugging
> > CPUs on and off. This introduces the kernel configuration option
> > LOG_CPU_MIN_BUF_SHIFT which is used to specify the maximum amount
> > of contributions to the kernel ring buffer in the worst case before
> > the kernel ring buffer flips over, the size is specified as a power
> > of 2. The total amount of contributions made by each CPU must be
> > greater than half of the default kernel ring buffer size
> > (1 << LOG_BUF_SHIFT bytes) in order to trigger an increase upon
> > bootup. The kernel ring buffer is increased to the next power of
> > two that would fit the required minimum kernel ring buffer size
> > plus the additional CPU contribution. For example if LOG_BUF_SHIFT
> > is 18 (256 KB) you'd require at least 128 KB contributions by
> > other CPUs in order to trigger an increase of the kernel ring buffer.
> > With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least
> > anything over > 64 possible CPUs to trigger an increase. If you
> > had 128 possible CPUs the amount of minimum required kernel ring
> > buffer bumps to:
> >
> > ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB
> >
> > Since we require the ring buffer to be a power of two this would
> > the new required size would be 1024 KB.
> >
> > This CPU contributions are ignored when the "log_buf_len" kernel parameter
> > is used as it forces the exact size of the ring buffer to an expected power
> > of two value.
> >
> > In order to make this code a bit more legible, add a small enum to keep
> > track of when the reasons of setting the ring buffer, and extend the
> > documentation quite a bit to make all this clear.
> >
> > Cc: Michal Hocko <[email protected]>
> > Cc: Petr Mladek <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Joe Perches <[email protected]>
> > Cc: Arun KS <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Davidlohr Bueso <[email protected]>
> > Cc: Chris Metcalf <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Luis R. Rodriguez <[email protected]>
> > ---
> >
> > I've modified the computation to just go to the next power of two. All
> > other implementations do that, and although its not well documented
> > I rather follow that logic. Without the enum stuff this code can get
> > ugly easy, while at it I also extended the documentation a bit more.
> >
> > Documentation/kernel-parameters.txt | 8 ++-
> > init/Kconfig | 53 +++++++++++++++++-
> > kernel/printk/printk.c | 108 ++++++++++++++++++++++++++++++++++--
> > 3 files changed, 162 insertions(+), 7 deletions(-)
[...]
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index ea2d5f6..54632a0c 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
[...]
> > /* human readable text of the record */
> > static char *log_text(const struct printk_log *msg)
> > @@ -842,20 +905,56 @@ static int __init log_buf_len_setup(char *str)
> > }
> > early_param("log_buf_len", log_buf_len_setup);
> >
> > +static unsigned __init compute_cpu_contrib(void)
>
> The function name is slightly misleading. It does not compute the
> extra space but the whole length of the ring buffer. What about using
> default_len_by_cpu_num() or so?
>
> > +{
> > + int cpu_extra;
> > + unsigned extra_cpu_log_size;
> > +
> > + /*
> > + * archs should set up cpu_possible_bits properly with
> > + * set_cpu_possible() after setup_arch() but just in
> > + * case lets ensure this is valid.
> > + */
> > + if (num_possible_cpus() <= 1)
> > + return 0;
> > +
> > + cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> > + extra_cpu_log_size = roundup_pow_of_two(cpu_extra +
> > __LOG_BUF_LEN);
>
> Great catch. Well, I am not sure if this is really
> needed. memblock_virt_alloc() is called on another locations with "any" size.
>
> It might be enough to make sure that the size is aligned to
> LOG_ALIGN. This is how the messages are aligned. I would do:
>
> cpu_extra %= LOG_ALIGN;
Shame on me :-) I have added this computation just before sending the
mail wihtout enough thingking and testing. Better solution would be:
cpu_extra &= ~(LOG_ALIGN - 1);
If we would want to round up, we could use a similar trick that is used in
msg_used_size(). But I think that it is not really needed here.
> Another possibility would be to set the minimal size of
> LOG_CPU_MIN_BUF_SHIFT to "6" or so. I hope that the alignment of the
> "struct printk_log" newer would be bigger than 64 bytes. Well, we
> could add a compile check if we want to be sure.
>
> Anyway, I do not have any strong opinion here. I might be too careful
> and the roundup_pow_of_two() is perfectly fine.
>
> > +
> > + if (cpu_extra <= __LOG_BUF_LEN / 2)
> > + return 0;
> > +
> > + pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
> > + pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
> > +
> > + return extra_cpu_log_size;
> > +}
> > +
> >
> > void __init setup_log_buf(int early)
> > {
> > unsigned long flags;
> > char *new_log_buf;
> > int free;
> > -
> > - if (!new_log_buf_len)
> > - return;
> > + enum klog_setup_state new_klog_state;
> >
> > if (early) {
> > + if (!new_log_buf_len)
> > + return;
> > new_log_buf =
> > memblock_virt_alloc(new_log_buf_len, PAGE_SIZE);
> > + new_klog_state = KLOG_PARAM;
> > } else {
> > - new_log_buf = memblock_virt_alloc_nopanic(new_log_buf_len, 0);
> > + if (klog_state == KLOG_PARAM)
> > + return;
> > + if (new_log_buf_len)
> > + new_klog_state = KLOG_PARAM;
> > + else {
> > + new_log_buf_len = compute_cpu_contrib();
> > + new_klog_state = KLOG_CPU_EXTRA;
> > + }
> > + if (!new_log_buf_len)
> > + return;
> > + new_log_buf = memblock_virt_alloc(new_log_buf_len,
> > PAGE_SIZE);
>
> We should call memblock_virt_allocc_nopanic() in this else part.
>
>
> Well, I am not sure if the new klog states make the code really better
> readable. I wonder where we lost the simplicity from v3 of this patch ;-)
>
> What about replacing the above changes in kernel/printk/printk.c with
> the following ones:
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ea2d5f6962ed..e00a9600f5fa 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -266,6 +266,7 @@ static u32 clear_idx;
> #define LOG_ALIGN __alignof__(struct printk_log)
> #endif
> #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
> +#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
> static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> static char *log_buf = __log_buf;
> static u32 log_buf_len = __LOG_BUF_LEN;
> @@ -842,12 +843,52 @@ static int __init log_buf_len_setup(char *str)
> }
> early_param("log_buf_len", log_buf_len_setup);
>
> +static unsigned __init default_len_by_cpu_num(void)
> +{
> + int cpu_extra;
> + unsigned extra_cpu_log_size;
> +
> + /*
> + * archs should set up cpu_possible_bits properly with
> + * set_cpu_possible() after setup_arch() but just in
> + * case lets ensure this is valid.
> + */
> + if (num_possible_cpus() <= 1)
> + return 0;
> +
> + cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> + /* make sure that the buffer is aligned */
> + cpu_extra %= LOG_ALIGN;
> + extra_cpu_log_size = roundup_pow_of_two(cpu_extra + __LOG_BUF_LEN);
The correct lines are:
+ cpu_extra &= ~(LOG_ALIGN - 1);
+ extra_cpu_log_size = __LOG_BUF_LEN + cpu_extra;
Best Regards,
Petr
> +
> + if (cpu_extra <= __LOG_BUF_LEN / 2)
> + return 0;
> +
> + pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
> + pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
> +
> + return extra_cpu_log_size;
> +}
> +
> void __init setup_log_buf(int early)
> {
> unsigned long flags;
> char *new_log_buf;
> int free;
>
> + /* nope when already allocated earlier */
> + if (log_buf != __log_buf)
> + return;
> +
> + /*
> + * The default size need to be increased on systems with many CPUs.
> + * It is done only when an exact size is not forced by log_buf_len=n
> + * kernel parameter.
> + */
> + if (!new_log_buf_len)
> + new_log_buf_len = default_len_by_cpu_num();
> +
> + /* nope when nobody wants to increase the size after all */
> if (!new_log_buf_len)
> return;
>
> --
> 1.8.4
>
>
> I think that it is better readable than the two level if-magic with
> the three new flags. The long description of the three flags looked
> scary in itself ;-)
>
> Best Regards,
> Petr
On Tue 2014-06-17 16:52:00, Petr Ml?dek wrote:
> What about replacing the above changes in kernel/printk/printk.c with
> the following ones:
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ea2d5f6962ed..e00a9600f5fa 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -266,6 +266,7 @@ static u32 clear_idx;
> #define LOG_ALIGN __alignof__(struct printk_log)
> #endif
> #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
> +#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
> static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> static char *log_buf = __log_buf;
> static u32 log_buf_len = __LOG_BUF_LEN;
> @@ -842,12 +843,52 @@ static int __init log_buf_len_setup(char *str)
> }
> early_param("log_buf_len", log_buf_len_setup);
>
> +static unsigned __init default_len_by_cpu_num(void)
> +{
> + int cpu_extra;
> + unsigned extra_cpu_log_size;
> +
> + /*
> + * archs should set up cpu_possible_bits properly with
> + * set_cpu_possible() after setup_arch() but just in
> + * case lets ensure this is valid.
> + */
> + if (num_possible_cpus() <= 1)
> + return 0;
> +
> + cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> + /* make sure that the buffer is aligned */
> + cpu_extra %= LOG_ALIGN;
> + extra_cpu_log_size = roundup_pow_of_two(cpu_extra + __LOG_BUF_LEN);
> +
> + if (cpu_extra <= __LOG_BUF_LEN / 2)
> + return 0;
> +
> + pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
> + pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
> +
> + return extra_cpu_log_size;
> +}
> +
> void __init setup_log_buf(int early)
> {
> unsigned long flags;
> char *new_log_buf;
> int free;
>
> + /* nope when already allocated earlier */
> + if (log_buf != __log_buf)
> + return;
> +
> + /*
> + * The default size need to be increased on systems with many CPUs.
> + * It is done only when an exact size is not forced by log_buf_len=n
> + * kernel parameter.
> + */
> + if (!new_log_buf_len)
> + new_log_buf_len = default_len_by_cpu_num();
Also I forgot to explain that default_len_by_cpu_num()
could be called even in the early stage. In this case
it returns zero because num_possible_cpus() returns 1. It means that
the buffer wont be allocated at this stage. I think that it is pretty
safe.
If you want to avoid this speculative call, you might use:
if (!new_log_buf_len && !early)
new_log_buf_len = default_len_by_cpu_num();
> + /* nope when nobody wants to increase the size after all */
> if (!new_log_buf_len)
> return;
>
> --
> 1.8.4
Best Regards,
Petr
On Tue, Jun 17, 2014 at 04:52:01PM +0200, Petr Ml?dek wrote:
> On Mon 2014-06-16 17:37:44, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <[email protected]>
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index ea2d5f6..54632a0c 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -842,20 +905,56 @@ static int __init log_buf_len_setup(char *str)
> > }
> > early_param("log_buf_len", log_buf_len_setup);
> >
> > +static unsigned __init compute_cpu_contrib(void)
>
> The function name is slightly misleading. It does not compute the
> extra space but the whole length of the ring buffer. What about using
> default_len_by_cpu_num() or so?
Sure.
> > +{
> > + int cpu_extra;
> > + unsigned extra_cpu_log_size;
> > +
> > + /*
> > + * archs should set up cpu_possible_bits properly with
> > + * set_cpu_possible() after setup_arch() but just in
> > + * case lets ensure this is valid.
> > + */
> > + if (num_possible_cpus() <= 1)
> > + return 0;
> > +
> > + cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> > + extra_cpu_log_size = roundup_pow_of_two(cpu_extra +
> > __LOG_BUF_LEN);
>
> Great catch. Well, I am not sure if this is really
> needed. memblock_virt_alloc() is called on another locations with "any" size.
Ok, on the other hand using roundup_pow_of_two() would guarantee both general
alignment and upkeeping the tradition of having a ring buffer as a power of
two. I think the implicit trick here was that alignment is ensured by
becauase __LOG_BUF_LEN would be aligned to the architecture already as its
an architecture specific value, furthermore no value smaller than this would
be allowed and since we're doing power of two's any further extra multiplication
by two should be aligned to the architecture. That should also mean that we
wouldn't have to care for the size of log_buf_add_cpu in consideration for
the archicture alignment as its already done for us.
cpu_extra &= ~(LOG_ALIGN - 1); seems fair to me as well if we want to save
some bytes, it probably does no harm if we bumped to the next power of two
and and we could shared aligning code to make this explicit.
I'll let you make the call as you'd know more than me of the requirements
here I'm sure.
> It might be enough to make sure that the size is aligned to
> LOG_ALIGN. This is how the messages are aligned. I would do:
>
> cpu_extra %= LOG_ALIGN;
cpu_extra &= ~(LOG_ALIGN - 1) also works as I noticed in your follow up.
> Another possibility would be to set the minimal size of
> LOG_CPU_MIN_BUF_SHIFT to "6" or so. I hope that the alignment of the
> "struct printk_log" newer would be bigger than 64 bytes. Well, we
> could add a compile check if we want to be sure.
commit 6ebb017de seems to to have added __alignof__(struct log) but more
than anything to ensure it lets the compiler figure it out. I think
using roundup_pow_of_two() and ensuring its > __LOG_BUF_LEN would
suffice the concern too. In fact we could / should probably repurpose
the implicit alignement check with log_buf_len_setup() to share
code.
> Anyway, I do not have any strong opinion here. I might be too careful
> and the roundup_pow_of_two() is perfectly fine.
I think sharing the alignment strategy can help document how this is done.
> > +
> > + if (cpu_extra <= __LOG_BUF_LEN / 2)
> > + return 0;
> > +
> > + pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
> > + pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
> > +
> > + return extra_cpu_log_size;
> > +}
> > +
> >
> > void __init setup_log_buf(int early)
> > {
> > unsigned long flags;
> > char *new_log_buf;
> > int free;
> > -
> > - if (!new_log_buf_len)
> > - return;
> > + enum klog_setup_state new_klog_state;
> >
> > if (early) {
> > + if (!new_log_buf_len)
> > + return;
> > new_log_buf =
> > memblock_virt_alloc(new_log_buf_len, PAGE_SIZE);
> > + new_klog_state = KLOG_PARAM;
> > } else {
> > - new_log_buf = memblock_virt_alloc_nopanic(new_log_buf_len, 0);
> > + if (klog_state == KLOG_PARAM)
> > + return;
> > + if (new_log_buf_len)
> > + new_klog_state = KLOG_PARAM;
> > + else {
> > + new_log_buf_len = compute_cpu_contrib();
> > + new_klog_state = KLOG_CPU_EXTRA;
> > + }
> > + if (!new_log_buf_len)
> > + return;
> > + new_log_buf = memblock_virt_alloc(new_log_buf_len,
> > PAGE_SIZE);
>
> We should call memblock_virt_allocc_nopanic() in this else part.
Oops, sorry yes.
> Well, I am not sure if the new klog states make the code really better
> readable. I wonder where we lost the simplicity from v3 of this patch ;-)
The issue came up after I realized that an architecture that uses the early
call will end up on this path twice, the trick that the last developer did for
this situation was to use the kernel parameter as an indicator of whether or
not the parameter was treated already or not. Towards the end of the call it
uses:
new_log_buf_len = 0;
Upon entry on the second call when the kernel parameter was set we bail
as its now 0. The issue I had not addressed was three fold:
* if an architecture had used the early call new_log_buf_len would
be 0 now, and so the so check in place on v3 would not have
worked well, instead you'd have to check for log_buf_len as
Davidlohr had suspected but only for the non-early case, but only
if new_log_buf_len was 0 as an early call could have been issued.
* we want to ensure that for archs that don't use the early call
but did pass the kernel parameter that its respected and the extra
cpu stuff is ignored
* num_possible_cpus() setup_arch() requirement
After realizing the double entry issue, as well as the num_possible_cpus()
requirement to be done later, the above was the cleanest solution I could come
up with without having to repurpose new_log_buf_len and setting it to some
magic value, or adding just a bool. I decided a simpler hacky solution would
would work but considered the lack of documentation on all this nasty enough to
merit an enum and some clarification.
> What about replacing the above changes in kernel/printk/printk.c with
> the following ones:
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ea2d5f6962ed..e00a9600f5fa 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -266,6 +266,7 @@ static u32 clear_idx;
> #define LOG_ALIGN __alignof__(struct printk_log)
> #endif
> #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
> +#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
> static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> static char *log_buf = __log_buf;
> static u32 log_buf_len = __LOG_BUF_LEN;
> @@ -842,12 +843,52 @@ static int __init log_buf_len_setup(char *str)
> }
> early_param("log_buf_len", log_buf_len_setup);
>
> +static unsigned __init default_len_by_cpu_num(void)
> +{
> + int cpu_extra;
> + unsigned extra_cpu_log_size;
> +
> + /*
> + * archs should set up cpu_possible_bits properly with
> + * set_cpu_possible() after setup_arch() but just in
> + * case lets ensure this is valid.
> + */
> + if (num_possible_cpus() <= 1)
> + return 0;
> +
> + cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> + /* make sure that the buffer is aligned */
> + cpu_extra %= LOG_ALIGN;
> + extra_cpu_log_size = roundup_pow_of_two(cpu_extra + __LOG_BUF_LEN);
>From your other e-mail:
cpu_extra &= ~(LOG_ALIGN - 1);
extra_cpu_log_size = cpu_extra + __LOG_BUF_LEN;
I think ensuring the extra_cpu_log_size is > __LOG_BUF_LEN and using
roundup_pow_of_two() with the assumption archs have made __LOG_BUF_LEN
properly arch aligned should suffice to not have to do the alignment
check above, both should work, its a judgement call thing. I'm fine either
way. Let me know.
> +
> + if (cpu_extra <= __LOG_BUF_LEN / 2)
> + return 0;
We could do this earlier to save ourself the not needed roundup_pow_of_two()
computation on the false case or the addition for extra_cpu_log_size.
> +
> + pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
> + pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
> +
> + return extra_cpu_log_size;
> +}
> +
> void __init setup_log_buf(int early)
> {
> unsigned long flags;
> char *new_log_buf;
> int free;
>
> + /* nope when already allocated earlier */
> + if (log_buf != __log_buf)
> + return;
That would do it too, all this just makes all the requirements and use
cases only legible to the reader of the callers / use cases. I do think
its cleaner.
> +
> + /*
> + * The default size need to be increased on systems with many CPUs.
> + * It is done only when an exact size is not forced by log_buf_len=n
> + * kernel parameter.
> + */
> + if (!new_log_buf_len)
> + new_log_buf_len = default_len_by_cpu_num();
Lets let my documentation exist on archive :), I'll take this and
modify it slightly with some basic comments and sharing aligment
and documenting a bit more.
> +
> + /* nope when nobody wants to increase the size after all */
> if (!new_log_buf_len)
> return;
>
> --
> 1.8.4
>
>
> I think that it is better readable than the two level if-magic with
> the three new flags. The long description of the three flags looked
> scary in itself ;-)
Yeah, true, do we want to share the alignment code with log_buf_len_setup() ?
How about this then?
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ea2d5f6..da57f6f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -266,6 +266,7 @@ static u32 clear_idx;
#define LOG_ALIGN __alignof__(struct printk_log)
#endif
#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
+#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
static char *log_buf = __log_buf;
static u32 log_buf_len = __LOG_BUF_LEN;
@@ -828,29 +829,68 @@ void log_buf_kexec_setup(void)
/* requested log_buf_len from kernel cmdline */
static unsigned long __initdata new_log_buf_len;
-/* save requested log_buf_len since it's too early to process it */
-static int __init log_buf_len_setup(char *str)
+/*
+ * CONFIG_LOG_BUF_SHIFT would be architecture aligned, anything > than it and
+ * a multiple of two of it upkeeps the alignment.
+ */
+static void __init log_buf_len_align(unsigned size)
{
- unsigned size = memparse(str, &str);
-
if (size)
size = roundup_pow_of_two(size);
if (size > log_buf_len)
new_log_buf_len = size;
+}
+
+/* save requested log_buf_len since it's too early to process it */
+static int __init log_buf_len_setup(char *str)
+{
+ unsigned size = memparse(str, &str);
+
+ log_buf_len_align(size);
return 0;
}
early_param("log_buf_len", log_buf_len_setup);
+static void __init log_buf_add_cpu(void)
+{
+ int cpu_extra;
+
+ /*
+ * archs should set up cpu_possible_bits properly with
+ * set_cpu_possible() after setup_arch() but just in
+ * case lets ensure this is valid. During an early
+ * call before setup_arch()() this will be 1.
+ */
+ if (num_possible_cpus() <= 1)
+ return;
+
+ cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
+
+ /* by default this will only continue through for large > 64 CPUs */
+ if (cpu_extra <= __LOG_BUF_LEN / 2)
+ return;
+
+ pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
+ pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
+
+ log_buf_len_align(cpu_extra + __LOG_BUF_LEN);
+}
+
void __init setup_log_buf(int early)
{
unsigned long flags;
char *new_log_buf;
int free;
- if (!new_log_buf_len)
+ if (log_buf != __log_buf)
return;
+ if (!early && !new_log_buf_len)
+ log_buf_add_cpu();
+
+ if (!new_log_buf_len)
+ return;
if (early) {
new_log_buf =
memblock_virt_alloc(new_log_buf_len, PAGE_SIZE);
On Tue, Jun 17, 2014 at 5:18 PM, Luis R. Rodriguez <[email protected]> wrote:
> @@ -828,29 +829,68 @@ void log_buf_kexec_setup(void)
> /* requested log_buf_len from kernel cmdline */
> static unsigned long __initdata new_log_buf_len;
>
> -/* save requested log_buf_len since it's too early to process it */
> -static int __init log_buf_len_setup(char *str)
> +/*
> + * CONFIG_LOG_BUF_SHIFT would be architecture aligned, anything > than it and
> + * a multiple of two of it upkeeps the alignment.
> + */
> +static void __init log_buf_len_align(unsigned size)
> {
> - unsigned size = memparse(str, &str);
> -
> if (size)
> size = roundup_pow_of_two(size);
> if (size > log_buf_len)
> new_log_buf_len = size;
> +}
> +
> +/* save requested log_buf_len since it's too early to process it */
> +static int __init log_buf_len_setup(char *str)
> +{
> + unsigned size = memparse(str, &str);
> +
> + log_buf_len_align(size);
>
> return 0;
> }
> early_param("log_buf_len", log_buf_len_setup);
This could go as a separate patch first too, which would make the
addition very clean.
Luis
On Wed 2014-06-18 02:18:16, Luis R. Rodriguez wrote:
> On Tue, Jun 17, 2014 at 04:52:01PM +0200, Petr Ml?dek wrote:
> > On Mon 2014-06-16 17:37:44, Luis R. Rodriguez wrote:
> > > From: "Luis R. Rodriguez" <[email protected]>
> > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > > index ea2d5f6..54632a0c 100644
> > > --- a/kernel/printk/printk.c
> > > +++ b/kernel/printk/printk.c
> > > @@ -842,20 +905,56 @@ static int __init log_buf_len_setup(char *str)
> > > }
> > > early_param("log_buf_len", log_buf_len_setup);
> > >
> > > +static unsigned __init compute_cpu_contrib(void)
> >
> > The function name is slightly misleading. It does not compute the
> > extra space but the whole length of the ring buffer. What about using
> > default_len_by_cpu_num() or so?
>
> Sure.
>
> > > +{
> > > + int cpu_extra;
> > > + unsigned extra_cpu_log_size;
> > > +
> > > + /*
> > > + * archs should set up cpu_possible_bits properly with
> > > + * set_cpu_possible() after setup_arch() but just in
> > > + * case lets ensure this is valid.
> > > + */
> > > + if (num_possible_cpus() <= 1)
> > > + return 0;
> > > +
> > > + cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> > > + extra_cpu_log_size = roundup_pow_of_two(cpu_extra +
> > > __LOG_BUF_LEN);
> >
> > Great catch. Well, I am not sure if this is really
> > needed. memblock_virt_alloc() is called on another locations with "any" size.
>
> Ok, on the other hand using roundup_pow_of_two() would guarantee both general
> alignment and upkeeping the tradition of having a ring buffer as a power of
> two. I think the implicit trick here was that alignment is ensured by
> becauase __LOG_BUF_LEN would be aligned to the architecture already as its
> an architecture specific value, furthermore no value smaller than this would
> be allowed and since we're doing power of two's any further extra multiplication
> by two should be aligned to the architecture. That should also mean that we
> wouldn't have to care for the size of log_buf_add_cpu in consideration for
> the archicture alignment as its already done for us.
>
> cpu_extra &= ~(LOG_ALIGN - 1); seems fair to me as well if we want to save
> some bytes, it probably does no harm if we bumped to the next power of two
> and and we could shared aligning code to make this explicit.
>
> I'll let you make the call as you'd know more than me of the requirements
> here I'm sure.
I do not have any strong opinion here. Let's use roundup_pow_of_two()
for now. It is easier and should not cause any harm.
[...]
> > > +
> > > + if (cpu_extra <= __LOG_BUF_LEN / 2)
> > > + return 0;
> > > +
> > > + pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
> > > + pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
> > > +
> > > + return extra_cpu_log_size;
> > > +}
> > > +
> > >
> > > void __init setup_log_buf(int early)
> > > {
> > > unsigned long flags;
> > > char *new_log_buf;
> > > int free;
> > > -
> > > - if (!new_log_buf_len)
> > > - return;
> > > + enum klog_setup_state new_klog_state;
> > >
> > > if (early) {
> > > + if (!new_log_buf_len)
> > > + return;
> > > new_log_buf =
> > > memblock_virt_alloc(new_log_buf_len, PAGE_SIZE);
> > > + new_klog_state = KLOG_PARAM;
> > > } else {
> > > - new_log_buf = memblock_virt_alloc_nopanic(new_log_buf_len, 0);
> > > + if (klog_state == KLOG_PARAM)
> > > + return;
> > > + if (new_log_buf_len)
> > > + new_klog_state = KLOG_PARAM;
> > > + else {
> > > + new_log_buf_len = compute_cpu_contrib();
> > > + new_klog_state = KLOG_CPU_EXTRA;
> > > + }
> > > + if (!new_log_buf_len)
> > > + return;
> > > + new_log_buf = memblock_virt_alloc(new_log_buf_len,
> > > PAGE_SIZE);
> >
> > We should call memblock_virt_allocc_nopanic() in this else part.
>
> Oops, sorry yes.
>
> > Well, I am not sure if the new klog states make the code really better
> > readable. I wonder where we lost the simplicity from v3 of this patch ;-)
>
> The issue came up after I realized that an architecture that uses the early
> call will end up on this path twice, the trick that the last developer did for
> this situation was to use the kernel parameter as an indicator of whether or
> not the parameter was treated already or not. Towards the end of the call it
> uses:
>
> new_log_buf_len = 0;
>
> Upon entry on the second call when the kernel parameter was set we bail
> as its now 0. The issue I had not addressed was three fold:
>
> * if an architecture had used the early call new_log_buf_len would
> be 0 now, and so the so check in place on v3 would not have
> worked well, instead you'd have to check for log_buf_len as
> Davidlohr had suspected but only for the non-early case, but only
> if new_log_buf_len was 0 as an early call could have been issued.
>
> * we want to ensure that for archs that don't use the early call
> but did pass the kernel parameter that its respected and the extra
> cpu stuff is ignored
>
> * num_possible_cpus() setup_arch() requirement
>
> After realizing the double entry issue, as well as the num_possible_cpus()
> requirement to be done later, the above was the cleanest solution I could come
> up with without having to repurpose new_log_buf_len and setting it to some
> magic value, or adding just a bool. I decided a simpler hacky solution would
> would work but considered the lack of documentation on all this nasty enough to
> merit an enum and some clarification.
I see the point. I have solved this by
if (log_buf != __log_buf)
return;
You have even used this in some earlier version of the patch. It is much
easier, and better readable. IMHO, it solves the reentrancy pretty
clear way.
[...]
> >
> > I think that it is better readable than the two level if-magic with
> > the three new flags. The long description of the three flags looked
> > scary in itself ;-)
>
> Yeah, true, do we want to share the alignment code with log_buf_len_setup() ?
>
> How about this then?
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ea2d5f6..da57f6f 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -266,6 +266,7 @@ static u32 clear_idx;
> #define LOG_ALIGN __alignof__(struct printk_log)
> #endif
> #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
> +#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
> static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> static char *log_buf = __log_buf;
> static u32 log_buf_len = __LOG_BUF_LEN;
> @@ -828,29 +829,68 @@ void log_buf_kexec_setup(void)
> /* requested log_buf_len from kernel cmdline */
> static unsigned long __initdata new_log_buf_len;
>
> -/* save requested log_buf_len since it's too early to process it */
> -static int __init log_buf_len_setup(char *str)
> +/*
> + * CONFIG_LOG_BUF_SHIFT would be architecture aligned, anything > than it and
> + * a multiple of two of it upkeeps the alignment.
> + */
> +static void __init log_buf_len_align(unsigned size)
> {
> - unsigned size = memparse(str, &str);
> -
> if (size)
> size = roundup_pow_of_two(size);
> if (size > log_buf_len)
> new_log_buf_len = size;
> +}
> +
> +/* save requested log_buf_len since it's too early to process it */
> +static int __init log_buf_len_setup(char *str)
> +{
> + unsigned size = memparse(str, &str);
> +
> + log_buf_len_align(size);
>
> return 0;
> }
> early_param("log_buf_len", log_buf_len_setup);
>
> +static void __init log_buf_add_cpu(void)
> +{
> + int cpu_extra;
> +
> + /*
> + * archs should set up cpu_possible_bits properly with
> + * set_cpu_possible() after setup_arch() but just in
> + * case lets ensure this is valid. During an early
> + * call before setup_arch()() this will be 1.
> + */
> + if (num_possible_cpus() <= 1)
> + return;
> +
> + cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> +
> + /* by default this will only continue through for large > 64 CPUs */
> + if (cpu_extra <= __LOG_BUF_LEN / 2)
> + return;
> +
> + pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
> + pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
> +
> + log_buf_len_align(cpu_extra + __LOG_BUF_LEN);
> +}
> +
> void __init setup_log_buf(int early)
> {
> unsigned long flags;
> char *new_log_buf;
> int free;
>
> - if (!new_log_buf_len)
> + if (log_buf != __log_buf)
> return;
>
> + if (!early && !new_log_buf_len)
> + log_buf_add_cpu();
> +
> + if (!new_log_buf_len)
> + return;
I would add empty line here.
> if (early) {
> new_log_buf =
> memblock_virt_alloc(new_log_buf_len, PAGE_SIZE);
I am happy with this solution. And I agree that it is better to split
log_buf_len_align() in a separate patch as you suggested in the other
mail.
Best Regards,
Petr
On Wed, Jun 18, 2014 at 10:31:02AM +0200, Petr Ml?dek wrote:
> On Wed 2014-06-18 02:18:16, Luis R. Rodriguez wrote:
> I am happy with this solution. And I agree that it is better to split
> log_buf_len_align() in a separate patch as you suggested in the other
> mail.
OK just to be on safe side I noticed memblock_virt_alloc() and
memblock_virt_alloc_nopanic() allow passing an explicit alignment
requirement, traced back the orignal code with no good reason to
not use the LOG_ALIGN, so I think using that would be the safest
thing to do. Will roll that into the first patch, curious if the
folks that ran into the alignment issues on ARM could reproduce
an align barf without this on some situations, perhaps not because
of the power of 2 thing and since the min value for LOG_BUF_SHIFT
is 12.
Luis
On Wed 2014-06-18 12:59:26, Luis R. Rodriguez wrote:
> On Wed, Jun 18, 2014 at 10:31:02AM +0200, Petr Ml?dek wrote:
> > On Wed 2014-06-18 02:18:16, Luis R. Rodriguez wrote:
> > I am happy with this solution. And I agree that it is better to split
> > log_buf_len_align() in a separate patch as you suggested in the other
> > mail.
>
> OK just to be on safe side I noticed memblock_virt_alloc() and
> memblock_virt_alloc_nopanic() allow passing an explicit alignment
> requirement, traced back the orignal code with no good reason to
> not use the LOG_ALIGN, so I think using that would be the safest
> thing to do. Will roll that into the first patch, curious if the
> folks that ran into the alignment issues on ARM could reproduce
> an align barf without this on some situations, perhaps not because
> of the power of 2 thing and since the min value for LOG_BUF_SHIFT
> is 12.
Great catch. It makes sense to me. There is no reason to have aligned
stores when the buffer itself is not properly aligned.
IMHO, it would make sense to have separate patch for this change. It might be
candidate for stable releases.
Best Regards,
Petr
On Wed, Jun 18, 2014 at 04:21:45PM +0200, Petr Ml?dek wrote:
> On Wed 2014-06-18 12:59:26, Luis R. Rodriguez wrote:
> > On Wed, Jun 18, 2014 at 10:31:02AM +0200, Petr Ml?dek wrote:
> > > On Wed 2014-06-18 02:18:16, Luis R. Rodriguez wrote:
> > > I am happy with this solution. And I agree that it is better to split
> > > log_buf_len_align() in a separate patch as you suggested in the other
> > > mail.
> >
> > OK just to be on safe side I noticed memblock_virt_alloc() and
> > memblock_virt_alloc_nopanic() allow passing an explicit alignment
> > requirement, traced back the orignal code with no good reason to
> > not use the LOG_ALIGN, so I think using that would be the safest
> > thing to do. Will roll that into the first patch, curious if the
> > folks that ran into the alignment issues on ARM could reproduce
> > an align barf without this on some situations, perhaps not because
> > of the power of 2 thing and since the min value for LOG_BUF_SHIFT
> > is 12.
>
> Great catch. It makes sense to me. There is no reason to have aligned
> stores when the buffer itself is not properly aligned.
>
> IMHO, it would make sense to have separate patch for this change. It might be
> candidate for stable releases.
OK thanks for the review and all your help, I'll split that up into another
patch, so it'll be 3 total.
Luis