2014-06-18 11:14:33

by Luis Chamberlain

[permalink] [raw]
Subject: [RFT 1/2] printk: make dynamic kernel ring buffer alignemnt explicit

From: "Luis R. Rodriguez" <[email protected]>

We have to consider alignment for the ring buffer both for the
default static size, and then also for when an dynamic allocation
is made when the log_buf_len=n kernel parameter is passed to set
the size specifically to a size larger than the default size set
by the architecture through CONFIG_LOG_BUF_SHIFT.

The default static kernel ring buffer can be aligned properly if
architectures set CONFIG_LOG_BUF_SHIFT properly, we provide ranges
for the size though so even if CONFIG_LOG_BUF_SHIFT has a sensible
aligned value it can be reduced to a non aligned value. Commit
6ebb017de9 by Andrew ensures the static buffer is always aligned
and the decision of alignment is done by the compiler by using
__alignof__(struct log) (curious what value caused the crash?).

When log_buf_len=n is used we allocate the ring buffer dynamically.
Dynamic allocation varies, for the early allocation called
before setup_arch() memblock_virt_alloc() requests a page aligment
and for the default kernel allocation memblock_virt_alloc_nopanic()
requests no special alignment, which in turn ends up aligning the
allocation to SMP_CACHE_BYTES, which is L1 cache aligned.

Since we already have the required alignment for the kernel ring
buffer though we can do better and request explicit alignment for
LOG_ALIGN. Do that and also put the power of 2 practice of
the ring buffer size into a helper which we'll use later.

Cc: Andrew Lunn <[email protected]>
Cc: Stephen Warren <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: [email protected]
Signed-off-by: Luis R. Rodriguez <[email protected]>
---

This is perhaps not required given that we stick to powers of 2
and the min LOG_BUF_SHIFT is 12, if the min LOG_BUF_SHIFT is
aligned then I think any passed log_buf_len=n would be aligned
as well as we don't make log_buf_len=n take effect unless
its > than the default size, and we round to the produced size
to the next power of 2. If the min length produced by
LOG_BUF_SHIFT is aligned, multiples of 2 of this should be as
well I think.

This might be perhaps safest thing to do though given we'll
add other alloc entries next.

kernel/printk/printk.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ea2d5f6..af164a7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -828,15 +828,21 @@ void log_buf_kexec_setup(void)
/* requested log_buf_len from kernel cmdline */
static unsigned long __initdata new_log_buf_len;

-/* save requested log_buf_len since it's too early to process it */
-static int __init log_buf_len_setup(char *str)
+/* we practice scaling the ring buffer by powers of 2 */
+static void __init log_buf_len_update(unsigned size)
{
- unsigned size = memparse(str, &str);
-
if (size)
size = roundup_pow_of_two(size);
if (size > log_buf_len)
new_log_buf_len = size;
+}
+
+/* save requested log_buf_len since it's too early to process it */
+static int __init log_buf_len_setup(char *str)
+{
+ unsigned size = memparse(str, &str);
+
+ log_buf_len_update(size);

return 0;
}
@@ -853,9 +859,10 @@ void __init setup_log_buf(int early)

if (early) {
new_log_buf =
- memblock_virt_alloc(new_log_buf_len, PAGE_SIZE);
+ memblock_virt_alloc(new_log_buf_len, LOG_ALIGN);
} else {
- new_log_buf = memblock_virt_alloc_nopanic(new_log_buf_len, 0);
+ new_log_buf = memblock_virt_alloc_nopanic(new_log_buf_len,
+ LOG_ALIGN);
}

if (unlikely(!new_log_buf)) {
--
1.9.3


2014-06-18 11:14:40

by Luis Chamberlain

[permalink] [raw]
Subject: [RFT 2/2] printk: allow increasing the ring buffer depending on the number of CPUs

From: "Luis R. Rodriguez" <[email protected]>

The default size of the ring buffer is too small for machines
with a large amount of CPUs under heavy load. What ends up
happening when debugging is the ring buffer overlaps and chews
up old messages making debugging impossible unless the size is
passed as a kernel parameter. An idle system upon boot up will
on average spew out only about one or two extra lines but where
this really matters is on heavy load and that will vary widely
depending on the system and environment.

There are mechanisms to help increase the kernel ring buffer
for tracing through debugfs, and those interfaces even allow growing
the kernel ring buffer per CPU. We also have a static value which
can be passed upon boot. Relying on debugfs however is not ideal
for production, and relying on the value passed upon bootup is
can only used *after* an issue has creeped up. Instead of being
reactive this adds a proactive measure which lets you scale the
amount of contributions you'd expect to the kernel ring buffer
under load by each CPU in the worst case scenario.

We use num_possible_cpus() to avoid complexities which could be
introduced by dynamically changing the ring buffer size at run
time, num_possible_cpus() lets us use the upper limit on possible
number of CPUs therefore avoiding having to deal with hotplugging
CPUs on and off. This introduces the kernel configuration option
LOG_CPU_MIN_BUF_SHIFT which is used to specify the maximum amount
of contributions to the kernel ring buffer in the worst case before
the kernel ring buffer flips over, the size is specified as a power
of 2. The total amount of contributions made by each CPU must be
greater than half of the default kernel ring buffer size
(1 << LOG_BUF_SHIFT bytes) in order to trigger an increase upon
bootup. The kernel ring buffer is increased to the next power of
two that would fit the required minimum kernel ring buffer size
plus the additional CPU contribution. For example if LOG_BUF_SHIFT
is 18 (256 KB) you'd require at least 128 KB contributions by
other CPUs in order to trigger an increase of the kernel ring buffer.
With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least
anything over > 64 possible CPUs to trigger an increase. If you
had 128 possible CPUs the amount of minimum required kernel ring
buffer bumps to:

((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB

Since we require the ring buffer to be a power of two the new
required size would be 1024 KB.

This CPU contributions are ignored when the "log_buf_len" kernel parameter
is used as it forces the exact size of the ring buffer to an expected power
of two value.

Cc: Andrew Lunn <[email protected]>
Cc: Stephen Warren <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: [email protected]
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
Documentation/kernel-parameters.txt | 8 ++++--
init/Kconfig | 53 ++++++++++++++++++++++++++++++++++++-
kernel/printk/printk.c | 32 ++++++++++++++++++++++
3 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 6eaa9cd..229d031 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1685,8 +1685,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
7 (KERN_DEBUG) debug-level messages

log_buf_len=n[KMG] Sets the size of the printk ring buffer,
- in bytes. n must be a power of two. The default
- size is set in the kernel config file.
+ in bytes. n must be a power of two and greater
+ than the minimal size. The minimal size is defined
+ by LOG_BUF_SHIFT kernel config parameter. There is
+ also CONFIG_LOG_CPU_MIN_BUF_SHIFT config parameter
+ that allows to increase the default size depending on
+ the number of CPUs. See init/Kconfig for more details.

logo.nologo [FB] Disables display of the built-in Linux logo.
This may be used to provide more screen space for
diff --git a/init/Kconfig b/init/Kconfig
index 9d76b99..69bdbcf 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -807,7 +807,11 @@ config LOG_BUF_SHIFT
range 12 21
default 17
help
- Select kernel log buffer size as a power of 2.
+ Select the minimal kernel log buffer size as a power of 2.
+ The final size is affected by LOG_CPU_MIN_BUF_SHIFT config
+ parameter, see below. Any higher size also might be forced
+ by "log_buf_len" boot parameter.
+
Examples:
17 => 128 KB
16 => 64 KB
@@ -816,6 +820,53 @@ config LOG_BUF_SHIFT
13 => 8 KB
12 => 4 KB

+config LOG_CPU_MIN_BUF_SHIFT
+ int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
+ range 0 21
+ default 12
+ depends on SMP
+ depends on !BASE_SMALL
+ help
+ The kernel ring buffer will get additional data logged onto it
+ when multiple CPUs are supported. Typically the contributions are
+ only a few lines when idle however under under load this can vary
+ and in the worst case it can mean losing logging information. You
+ can use this to set the maximum expected mount of amount of logging
+ contribution under load by each CPU in the worst case scenario, as
+ a power of 2. The total amount of contributions made by each CPU
+ must be greater than half of the default kernel ring buffer size
+ ((1 << LOG_BUF_SHIFT / 2 bytes)) in order to trigger an increase upon
+ bootup. If an increase is required the ring buffer is increated to
+ the next power of 2 that can fit both the minimum kernel ring buffer
+ (LOG_BUF_SHIFT) plus the additional worst case CPU contributions.
+ For example if LOG_BUF_SHIFT is 18 (256 KB) you'd require at laest
+ 128 KB contributions by other CPUs in order to trigger an increase.
+ With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least anything
+ over > 64 possible CPUs to trigger an increase. If you had 128
+ possible CPUs the new minimum required kernel ring buffer size
+ would be:
+
+ ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB
+
+ Since we only allow powers of two for the kernel ring buffer size the
+ new kernel ring buffer size would be 1024 KB.
+
+ CPU contributions are ignored when "log_buf_len" kernel parameter is
+ used as it forces an exact (power of two) size of the ring buffer to
+ an expected value.
+
+ The number of possible CPUs is used for this computation ignoring
+ hotplugging making the compuation optimal for the the worst case
+ scenerio while allowing a simple algorithm to be used from bootup.
+
+ Examples shift values and their meaning:
+ 17 => 128 KB for each CPU
+ 16 => 64 KB for each CPU
+ 15 => 32 KB for each CPU
+ 14 => 16 KB for each CPU
+ 13 => 8 KB for each CPU
+ 12 => 4 KB for each CPU
+
#
# Architectures with an unreliable sched_clock() should select this:
#
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index af164a7..7c7b599 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -266,6 +266,7 @@ static u32 clear_idx;
#define LOG_ALIGN __alignof__(struct printk_log)
#endif
#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
+#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
static char *log_buf = __log_buf;
static u32 log_buf_len = __LOG_BUF_LEN;
@@ -848,12 +849,43 @@ static int __init log_buf_len_setup(char *str)
}
early_param("log_buf_len", log_buf_len_setup);

+static void __init log_buf_add_cpu(void)
+{
+ int cpu_extra;
+
+ /*
+ * archs should set up cpu_possible_bits properly with
+ * set_cpu_possible() after setup_arch() but just in
+ * case lets ensure this is valid. During an early
+ * call before setup_arch()() this will be 1.
+ */
+ if (num_possible_cpus() <= 1)
+ return;
+
+ cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
+
+ /* by default this will only continue through for large > 64 CPUs */
+ if (cpu_extra <= __LOG_BUF_LEN / 2)
+ return;
+
+ pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
+ pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
+
+ log_buf_len_update(cpu_extra + __LOG_BUF_LEN);
+}
+
void __init setup_log_buf(int early)
{
unsigned long flags;
char *new_log_buf;
int free;

+ if (log_buf != __log_buf)
+ return;
+
+ if (!early && !new_log_buf_len)
+ log_buf_add_cpu();
+
if (!new_log_buf_len)
return;

--
1.9.3

2014-06-18 15:40:28

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFT 1/2] printk: make dynamic kernel ring buffer alignemnt explicit

On Wed 2014-06-18 04:14:24, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <[email protected]>
>
> We have to consider alignment for the ring buffer both for the
> default static size, and then also for when an dynamic allocation
> is made when the log_buf_len=n kernel parameter is passed to set
> the size specifically to a size larger than the default size set
> by the architecture through CONFIG_LOG_BUF_SHIFT.
>
> The default static kernel ring buffer can be aligned properly if
> architectures set CONFIG_LOG_BUF_SHIFT properly, we provide ranges
> for the size though so even if CONFIG_LOG_BUF_SHIFT has a sensible
> aligned value it can be reduced to a non aligned value. Commit
> 6ebb017de9 by Andrew ensures the static buffer is always aligned
> and the decision of alignment is done by the compiler by using
> __alignof__(struct log) (curious what value caused the crash?).
>
> When log_buf_len=n is used we allocate the ring buffer dynamically.
> Dynamic allocation varies, for the early allocation called
> before setup_arch() memblock_virt_alloc() requests a page aligment
> and for the default kernel allocation memblock_virt_alloc_nopanic()
> requests no special alignment, which in turn ends up aligning the
> allocation to SMP_CACHE_BYTES, which is L1 cache aligned.
>
> Since we already have the required alignment for the kernel ring
> buffer though we can do better and request explicit alignment for
> LOG_ALIGN. Do that and also put the power of 2 practice of
> the ring buffer size into a helper which we'll use later.
>
> Cc: Andrew Lunn <[email protected]>
> Cc: Stephen Warren <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Petr Mladek <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Joe Perches <[email protected]>
> Cc: Arun KS <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Davidlohr Bueso <[email protected]>
> Cc: Chris Metcalf <[email protected]>
> Cc: [email protected]
> Signed-off-by: Luis R. Rodriguez <[email protected]>

The change makes sense to me.

Acked-by: Petr Mladek <[email protected]>
Tested-by: Petr Mladek <[email protected]>

> This is perhaps not required given that we stick to powers of 2
> and the min LOG_BUF_SHIFT is 12, if the min LOG_BUF_SHIFT is
> aligned then I think any passed log_buf_len=n would be aligned
> as well as we don't make log_buf_len=n take effect unless
> its > than the default size, and we round to the produced size
> to the next power of 2. If the min length produced by
> LOG_BUF_SHIFT is aligned, multiples of 2 of this should be as
> well I think.
>
> This might be perhaps safest thing to do though given we'll
> add other alloc entries next.
>
> kernel/printk/printk.c | 19 +++++++++++++------
> 1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ea2d5f6..af164a7 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -828,15 +828,21 @@ void log_buf_kexec_setup(void)
> /* requested log_buf_len from kernel cmdline */
> static unsigned long __initdata new_log_buf_len;
>
> -/* save requested log_buf_len since it's too early to process it */
> -static int __init log_buf_len_setup(char *str)
> +/* we practice scaling the ring buffer by powers of 2 */
> +static void __init log_buf_len_update(unsigned size)
> {
> - unsigned size = memparse(str, &str);
> -
> if (size)
> size = roundup_pow_of_two(size);
> if (size > log_buf_len)
> new_log_buf_len = size;
> +}
> +
> +/* save requested log_buf_len since it's too early to process it */
> +static int __init log_buf_len_setup(char *str)
> +{
> + unsigned size = memparse(str, &str);
> +
> + log_buf_len_update(size);
>
> return 0;
> }
> @@ -853,9 +859,10 @@ void __init setup_log_buf(int early)
>
> if (early) {
> new_log_buf =
> - memblock_virt_alloc(new_log_buf_len, PAGE_SIZE);
> + memblock_virt_alloc(new_log_buf_len, LOG_ALIGN);
> } else {
> - new_log_buf = memblock_virt_alloc_nopanic(new_log_buf_len, 0);
> + new_log_buf = memblock_virt_alloc_nopanic(new_log_buf_len,
> + LOG_ALIGN);
> }
>
> if (unlikely(!new_log_buf)) {
> --
> 1.9.3
>

2014-06-18 15:42:58

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFT 2/2] printk: allow increasing the ring buffer depending on the number of CPUs

On Wed 2014-06-18 04:14:25, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <[email protected]>
>
> The default size of the ring buffer is too small for machines
> with a large amount of CPUs under heavy load. What ends up
> happening when debugging is the ring buffer overlaps and chews
> up old messages making debugging impossible unless the size is
> passed as a kernel parameter. An idle system upon boot up will
> on average spew out only about one or two extra lines but where
> this really matters is on heavy load and that will vary widely
> depending on the system and environment.
>
> There are mechanisms to help increase the kernel ring buffer
> for tracing through debugfs, and those interfaces even allow growing
> the kernel ring buffer per CPU. We also have a static value which
> can be passed upon boot. Relying on debugfs however is not ideal
> for production, and relying on the value passed upon bootup is
> can only used *after* an issue has creeped up. Instead of being
> reactive this adds a proactive measure which lets you scale the
> amount of contributions you'd expect to the kernel ring buffer
> under load by each CPU in the worst case scenario.
>
> We use num_possible_cpus() to avoid complexities which could be
> introduced by dynamically changing the ring buffer size at run
> time, num_possible_cpus() lets us use the upper limit on possible
> number of CPUs therefore avoiding having to deal with hotplugging
> CPUs on and off. This introduces the kernel configuration option
> LOG_CPU_MIN_BUF_SHIFT which is used to specify the maximum amount
> of contributions to the kernel ring buffer in the worst case before
> the kernel ring buffer flips over, the size is specified as a power
> of 2. The total amount of contributions made by each CPU must be
> greater than half of the default kernel ring buffer size
> (1 << LOG_BUF_SHIFT bytes) in order to trigger an increase upon
> bootup. The kernel ring buffer is increased to the next power of
> two that would fit the required minimum kernel ring buffer size
> plus the additional CPU contribution. For example if LOG_BUF_SHIFT
> is 18 (256 KB) you'd require at least 128 KB contributions by
> other CPUs in order to trigger an increase of the kernel ring buffer.
> With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least
> anything over > 64 possible CPUs to trigger an increase. If you
> had 128 possible CPUs the amount of minimum required kernel ring
> buffer bumps to:
>
> ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB
>
> Since we require the ring buffer to be a power of two the new
> required size would be 1024 KB.
>
> This CPU contributions are ignored when the "log_buf_len" kernel parameter
> is used as it forces the exact size of the ring buffer to an expected power
> of two value.
>
> Cc: Andrew Lunn <[email protected]>
> Cc: Stephen Warren <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Petr Mladek <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Joe Perches <[email protected]>
> Cc: Arun KS <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Davidlohr Bueso <[email protected]>
> Cc: Chris Metcalf <[email protected]>
> Cc: [email protected]
> Signed-off-by: Luis R. Rodriguez <[email protected]>

It is where we ended after several versions. I am happy with this
state.

Signed-off-by: Petr Mladek <[email protected]>
Tested-by: Petr Mladek <[email protected]>

> ---
> Documentation/kernel-parameters.txt | 8 ++++--
> init/Kconfig | 53 ++++++++++++++++++++++++++++++++++++-
> kernel/printk/printk.c | 32 ++++++++++++++++++++++
> 3 files changed, 90 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 6eaa9cd..229d031 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -1685,8 +1685,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> 7 (KERN_DEBUG) debug-level messages
>
> log_buf_len=n[KMG] Sets the size of the printk ring buffer,
> - in bytes. n must be a power of two. The default
> - size is set in the kernel config file.
> + in bytes. n must be a power of two and greater
> + than the minimal size. The minimal size is defined
> + by LOG_BUF_SHIFT kernel config parameter. There is
> + also CONFIG_LOG_CPU_MIN_BUF_SHIFT config parameter
> + that allows to increase the default size depending on
> + the number of CPUs. See init/Kconfig for more details.
>
> logo.nologo [FB] Disables display of the built-in Linux logo.
> This may be used to provide more screen space for
> diff --git a/init/Kconfig b/init/Kconfig
> index 9d76b99..69bdbcf 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -807,7 +807,11 @@ config LOG_BUF_SHIFT
> range 12 21
> default 17
> help
> - Select kernel log buffer size as a power of 2.
> + Select the minimal kernel log buffer size as a power of 2.
> + The final size is affected by LOG_CPU_MIN_BUF_SHIFT config
> + parameter, see below. Any higher size also might be forced
> + by "log_buf_len" boot parameter.
> +
> Examples:
> 17 => 128 KB
> 16 => 64 KB
> @@ -816,6 +820,53 @@ config LOG_BUF_SHIFT
> 13 => 8 KB
> 12 => 4 KB
>
> +config LOG_CPU_MIN_BUF_SHIFT
> + int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
> + range 0 21
> + default 12
> + depends on SMP
> + depends on !BASE_SMALL
> + help
> + The kernel ring buffer will get additional data logged onto it
> + when multiple CPUs are supported. Typically the contributions are
> + only a few lines when idle however under under load this can vary
> + and in the worst case it can mean losing logging information. You
> + can use this to set the maximum expected mount of amount of logging
> + contribution under load by each CPU in the worst case scenario, as
> + a power of 2. The total amount of contributions made by each CPU
> + must be greater than half of the default kernel ring buffer size
> + ((1 << LOG_BUF_SHIFT / 2 bytes)) in order to trigger an increase upon
> + bootup. If an increase is required the ring buffer is increated to
> + the next power of 2 that can fit both the minimum kernel ring buffer
> + (LOG_BUF_SHIFT) plus the additional worst case CPU contributions.
> + For example if LOG_BUF_SHIFT is 18 (256 KB) you'd require at laest
> + 128 KB contributions by other CPUs in order to trigger an increase.
> + With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least anything
> + over > 64 possible CPUs to trigger an increase. If you had 128
> + possible CPUs the new minimum required kernel ring buffer size
> + would be:
> +
> + ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB
> +
> + Since we only allow powers of two for the kernel ring buffer size the
> + new kernel ring buffer size would be 1024 KB.
> +
> + CPU contributions are ignored when "log_buf_len" kernel parameter is
> + used as it forces an exact (power of two) size of the ring buffer to
> + an expected value.
> +
> + The number of possible CPUs is used for this computation ignoring
> + hotplugging making the compuation optimal for the the worst case
> + scenerio while allowing a simple algorithm to be used from bootup.
> +
> + Examples shift values and their meaning:
> + 17 => 128 KB for each CPU
> + 16 => 64 KB for each CPU
> + 15 => 32 KB for each CPU
> + 14 => 16 KB for each CPU
> + 13 => 8 KB for each CPU
> + 12 => 4 KB for each CPU
> +
> #
> # Architectures with an unreliable sched_clock() should select this:
> #
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index af164a7..7c7b599 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -266,6 +266,7 @@ static u32 clear_idx;
> #define LOG_ALIGN __alignof__(struct printk_log)
> #endif
> #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
> +#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
> static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> static char *log_buf = __log_buf;
> static u32 log_buf_len = __LOG_BUF_LEN;
> @@ -848,12 +849,43 @@ static int __init log_buf_len_setup(char *str)
> }
> early_param("log_buf_len", log_buf_len_setup);
>
> +static void __init log_buf_add_cpu(void)
> +{
> + int cpu_extra;
> +
> + /*
> + * archs should set up cpu_possible_bits properly with
> + * set_cpu_possible() after setup_arch() but just in
> + * case lets ensure this is valid. During an early
> + * call before setup_arch()() this will be 1.
> + */
> + if (num_possible_cpus() <= 1)
> + return;
> +
> + cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> +
> + /* by default this will only continue through for large > 64 CPUs */
> + if (cpu_extra <= __LOG_BUF_LEN / 2)
> + return;
> +
> + pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
> + pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
> +
> + log_buf_len_update(cpu_extra + __LOG_BUF_LEN);
> +}
> +
> void __init setup_log_buf(int early)
> {
> unsigned long flags;
> char *new_log_buf;
> int free;
>
> + if (log_buf != __log_buf)
> + return;
> +
> + if (!early && !new_log_buf_len)
> + log_buf_add_cpu();
> +
> if (!new_log_buf_len)
> return;
>
> --
> 1.9.3
>

2014-06-18 15:56:09

by Stephen Warren

[permalink] [raw]
Subject: Re: [RFT 1/2] printk: make dynamic kernel ring buffer alignemnt explicit

On 06/18/2014 05:14 AM, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <[email protected]>
>
> We have to consider alignment for the ring buffer both for the
> default static size, and then also for when an dynamic allocation
> is made when the log_buf_len=n kernel parameter is passed to set
> the size specifically to a size larger than the default size set
> by the architecture through CONFIG_LOG_BUF_SHIFT.
>
> The default static kernel ring buffer can be aligned properly if
> architectures set CONFIG_LOG_BUF_SHIFT properly, we provide ranges
> for the size though so even if CONFIG_LOG_BUF_SHIFT has a sensible
> aligned value it can be reduced to a non aligned value. Commit
> 6ebb017de9 by Andrew ensures the static buffer is always aligned
> and the decision of alignment is done by the compiler by using
> __alignof__(struct log) (curious what value caused the crash?).

IIRC the issue was that __log_buf's type is char[] so without the
__aligned it could have any alignment at all, e.g. 1 or 2. However,
struct printk_log is stored in the buffer rather than just char*, and so
if __log_buf isn't aligned to the required alignment for that structure,
that can caused unaligned accesses to fields in the structure, which
isn't supported on ARM in at least some cases.

As such, I think the change to setup_log_buf() in this patch makes sense
(although I suppose in practice memblock_virt_alloc() probably has some
minimum internal alignment that dwards LOG_ALIGN, but that's an
implementation detail we shouldn't rely on).

2014-06-18 18:11:31

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [RFT 2/2] printk: allow increasing the ring buffer depending on the number of CPUs

On Wed, 2014-06-18 at 04:14 -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <[email protected]>
>
> The default size of the ring buffer is too small for machines
> with a large amount of CPUs under heavy load. What ends up
> happening when debugging is the ring buffer overlaps and chews
> up old messages making debugging impossible unless the size is
> passed as a kernel parameter. An idle system upon boot up will
> on average spew out only about one or two extra lines but where
> this really matters is on heavy load and that will vary widely
> depending on the system and environment.
>
> There are mechanisms to help increase the kernel ring buffer
> for tracing through debugfs, and those interfaces even allow growing
> the kernel ring buffer per CPU. We also have a static value which
> can be passed upon boot. Relying on debugfs however is not ideal
> for production, and relying on the value passed upon bootup is
> can only used *after* an issue has creeped up. Instead of being
> reactive this adds a proactive measure which lets you scale the
> amount of contributions you'd expect to the kernel ring buffer
> under load by each CPU in the worst case scenario.
>
> We use num_possible_cpus() to avoid complexities which could be
> introduced by dynamically changing the ring buffer size at run
> time, num_possible_cpus() lets us use the upper limit on possible
> number of CPUs therefore avoiding having to deal with hotplugging
> CPUs on and off. This introduces the kernel configuration option
> LOG_CPU_MIN_BUF_SHIFT which is used to specify the maximum amount
> of contributions to the kernel ring buffer in the worst case before
> the kernel ring buffer flips over, the size is specified as a power
> of 2. The total amount of contributions made by each CPU must be
> greater than half of the default kernel ring buffer size
> (1 << LOG_BUF_SHIFT bytes) in order to trigger an increase upon
> bootup. The kernel ring buffer is increased to the next power of
> two that would fit the required minimum kernel ring buffer size
> plus the additional CPU contribution. For example if LOG_BUF_SHIFT
> is 18 (256 KB) you'd require at least 128 KB contributions by
> other CPUs in order to trigger an increase of the kernel ring buffer.
> With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least
> anything over > 64 possible CPUs to trigger an increase. If you
> had 128 possible CPUs the amount of minimum required kernel ring
> buffer bumps to:
>
> ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB
>
> Since we require the ring buffer to be a power of two the new
> required size would be 1024 KB.
>
> This CPU contributions are ignored when the "log_buf_len" kernel parameter
> is used as it forces the exact size of the ring buffer to an expected power
> of two value.
>
> Cc: Andrew Lunn <[email protected]>
> Cc: Stephen Warren <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Petr Mladek <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Joe Perches <[email protected]>
> Cc: Arun KS <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Davidlohr Bueso <[email protected]>
> Cc: Chris Metcalf <[email protected]>
> Cc: [email protected]
> Signed-off-by: Luis R. Rodriguez <[email protected]>

Looks good Luis, thanks a lot for doing this -- it will definitely help
my everyday debugging issues on huge machines.

I ran this on my 160-core Westmere. Some nits below, otherwise:

Reviewed-and-tested-by: Davidlohr Bueso <[email protected]>

> ---
> Documentation/kernel-parameters.txt | 8 ++++--
> init/Kconfig | 53 ++++++++++++++++++++++++++++++++++++-
> kernel/printk/printk.c | 32 ++++++++++++++++++++++
> 3 files changed, 90 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 6eaa9cd..229d031 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -1685,8 +1685,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> 7 (KERN_DEBUG) debug-level messages
>
> log_buf_len=n[KMG] Sets the size of the printk ring buffer,
> - in bytes. n must be a power of two. The default
> - size is set in the kernel config file.
> + in bytes. n must be a power of two and greater
> + than the minimal size. The minimal size is defined
> + by LOG_BUF_SHIFT kernel config parameter. There is
> + also CONFIG_LOG_CPU_MIN_BUF_SHIFT config parameter
> + that allows to increase the default size depending on
> + the number of CPUs. See init/Kconfig for more details.
>
> logo.nologo [FB] Disables display of the built-in Linux logo.
> This may be used to provide more screen space for
> diff --git a/init/Kconfig b/init/Kconfig
> index 9d76b99..69bdbcf 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -807,7 +807,11 @@ config LOG_BUF_SHIFT
> range 12 21
> default 17
> help
> - Select kernel log buffer size as a power of 2.
> + Select the minimal kernel log buffer size as a power of 2.
> + The final size is affected by LOG_CPU_MIN_BUF_SHIFT config
> + parameter, see below. Any higher size also might be forced
> + by "log_buf_len" boot parameter.
> +
> Examples:
> 17 => 128 KB
> 16 => 64 KB
> @@ -816,6 +820,53 @@ config LOG_BUF_SHIFT
> 13 => 8 KB
> 12 => 4 KB
>
> +config LOG_CPU_MIN_BUF_SHIFT
> + int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
> + range 0 21
> + default 12
> + depends on SMP
> + depends on !BASE_SMALL
> + help
> + The kernel ring buffer will get additional data logged onto it
> + when multiple CPUs are supported. Typically the contributions are
> + only a few lines when idle however under under load this can vary
> + and in the worst case it can mean losing logging information. You
> + can use this to set the maximum expected mount of amount of logging
> + contribution under load by each CPU in the worst case scenario, as
> + a power of 2. The total amount of contributions made by each CPU
> + must be greater than half of the default kernel ring buffer size
> + ((1 << LOG_BUF_SHIFT / 2 bytes)) in order to trigger an increase upon
> + bootup. If an increase is required the ring buffer is increated to
> + the next power of 2 that can fit both the minimum kernel ring buffer
> + (LOG_BUF_SHIFT) plus the additional worst case CPU contributions.
> + For example if LOG_BUF_SHIFT is 18 (256 KB) you'd require at laest
> + 128 KB contributions by other CPUs in order to trigger an increase.
> + With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least anything
> + over > 64 possible CPUs to trigger an increase. If you had 128
> + possible CPUs the new minimum required kernel ring buffer size
> + would be:
> +
> + ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB
> +
> + Since we only allow powers of two for the kernel ring buffer size the
> + new kernel ring buffer size would be 1024 KB.
> +
> + CPU contributions are ignored when "log_buf_len" kernel parameter is
> + used as it forces an exact (power of two) size of the ring buffer to
> + an expected value.
> +
> + The number of possible CPUs is used for this computation ignoring
> + hotplugging making the compuation optimal for the the worst case
> + scenerio while allowing a simple algorithm to be used from bootup.
> +
> + Examples shift values and their meaning:
> + 17 => 128 KB for each CPU
> + 16 => 64 KB for each CPU
> + 15 => 32 KB for each CPU
> + 14 => 16 KB for each CPU
> + 13 => 8 KB for each CPU
> + 12 => 4 KB for each CPU
> +
> #
> # Architectures with an unreliable sched_clock() should select this:
> #
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index af164a7..7c7b599 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -266,6 +266,7 @@ static u32 clear_idx;
> #define LOG_ALIGN __alignof__(struct printk_log)
> #endif
> #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
> +#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
> static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> static char *log_buf = __log_buf;
> static u32 log_buf_len = __LOG_BUF_LEN;
> @@ -848,12 +849,43 @@ static int __init log_buf_len_setup(char *str)
> }
> early_param("log_buf_len", log_buf_len_setup);
>
> +static void __init log_buf_add_cpu(void)
> +{
> + int cpu_extra;

unsigned int

> +
> + /*
> + * archs should set up cpu_possible_bits properly with
> + * set_cpu_possible() after setup_arch() but just in
> + * case lets ensure this is valid. During an early
> + * call before setup_arch()() this will be 1.
> + */
> + if (num_possible_cpus() <= 1)

This can never return 0, so how about making it == 1?

> + return;
> +
> + cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> +
> + /* by default this will only continue through for large > 64 CPUs */
> + if (cpu_extra <= __LOG_BUF_LEN / 2)
> + return;
> +
> + pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);

We should add 'bytes' for units. Also, while at it, how about making it
easier for users and also print the total size (num_possible_cpus() *
cpu_extra)?

> + pr_info("log_buf_len min size: %d\n", __LOG_BUF_LEN);
> +
> + log_buf_len_update(cpu_extra + __LOG_BUF_LEN);
> +}
> +
> void __init setup_log_buf(int early)
> {
> unsigned long flags;
> char *new_log_buf;
> int free;
>
> + if (log_buf != __log_buf)
> + return;
> +
> + if (!early && !new_log_buf_len)
> + log_buf_add_cpu();
> +
> if (!new_log_buf_len)
> return;
>

2014-06-18 18:15:09

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [RFT 2/2] printk: allow increasing the ring buffer depending on the number of CPUs

On Wed, 2014-06-18 at 11:11 -0700, Davidlohr Bueso wrote:
> Also, while at it, how about making it
> easier for users and also print the total size (num_possible_cpus() *
> cpu_extra)?

Sorry, I meant the individual contribution of each CPU, not the total
which you already have.

2014-06-18 19:21:38

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [RFT 2/2] printk: allow increasing the ring buffer depending on the number of CPUs

On Wed, Jun 18, 2014 at 11:11:12AM -0700, Davidlohr Bueso wrote:
> On Wed, 2014-06-18 at 04:14 -0700, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <[email protected]>
> > Signed-off-by: Luis R. Rodriguez <[email protected]>
>
> Looks good Luis, thanks a lot for doing this -- it will definitely help
> my everyday debugging issues on huge machines.
>
> I ran this on my 160-core Westmere. Some nits below, otherwise:
>
> Reviewed-and-tested-by: Davidlohr Bueso <[email protected]>

Great thanks for testing and your review!

> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index af164a7..7c7b599 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -848,12 +849,43 @@ static int __init log_buf_len_setup(char *str)
> > }
> > early_param("log_buf_len", log_buf_len_setup);
> >
> > +static void __init log_buf_add_cpu(void)
> > +{
> > + int cpu_extra;
>
> unsigned int

Amended.

> > +
> > + /*
> > + * archs should set up cpu_possible_bits properly with
> > + * set_cpu_possible() after setup_arch() but just in
> > + * case lets ensure this is valid. During an early
> > + * call before setup_arch()() this will be 1.
> > + */
> > + if (num_possible_cpus() <= 1)
>
> This can never return 0, so how about making it == 1?

I was originally concerned over the early boot code which had
not yet called setup_arch() but since we now have a check for
early on setup_log_buf() before calling log_buf_add_cpu() I
think its safe to check for 1 then, will change! I'll also
remove the note about this always returning 1 on early init
before setup_arch() as I only confirmed that for x86 -- unless
of course there is code that ensures this for early boot for
all archs, I just can't find it.

> > + return;
> > +
> > + cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> > +
> > + /* by default this will only continue through for large > 64 CPUs */
> > + if (cpu_extra <= __LOG_BUF_LEN / 2)
> > + return;
> > +
> > + pr_info("log_buf_len cpu_extra contribution: %d\n", cpu_extra);
>
> We should add 'bytes' for units.

That should mean also amending the orignal setup_log_buf() for the final size,
will do that too.

> Also, while at it, how about making it easier for users and also print
> the individual contribution of each CPU

Sure, done. While at it I renamed LOG_CPU_MIN_BUF_SHIFT to MAX to annotate folks
want to to consider the worst case scenario to help with debugging on production.

Luis

2014-06-18 19:33:32

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [RFT 1/2] printk: make dynamic kernel ring buffer alignemnt explicit

On Wed, Jun 18, 2014 at 09:56:03AM -0600, Stephen Warren wrote:
> On 06/18/2014 05:14 AM, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <[email protected]>
> >
> > We have to consider alignment for the ring buffer both for the
> > default static size, and then also for when an dynamic allocation
> > is made when the log_buf_len=n kernel parameter is passed to set
> > the size specifically to a size larger than the default size set
> > by the architecture through CONFIG_LOG_BUF_SHIFT.
> >
> > The default static kernel ring buffer can be aligned properly if
> > architectures set CONFIG_LOG_BUF_SHIFT properly, we provide ranges
> > for the size though so even if CONFIG_LOG_BUF_SHIFT has a sensible
> > aligned value it can be reduced to a non aligned value. Commit
> > 6ebb017de9 by Andrew ensures the static buffer is always aligned
> > and the decision of alignment is done by the compiler by using
> > __alignof__(struct log) (curious what value caused the crash?).
>
> IIRC the issue was that __log_buf's type is char[] so without the
> __aligned it could have any alignment at all, e.g. 1 or 2. However,
> struct printk_log is stored in the buffer rather than just char*, and so
> if __log_buf isn't aligned to the required alignment for that structure,
> that can caused unaligned accesses to fields in the structure, which
> isn't supported on ARM in at least some cases.
>
> As such, I think the change to setup_log_buf() in this patch makes sense
> (although I suppose in practice memblock_virt_alloc() probably has some
> minimum internal alignment that dwards LOG_ALIGN, but that's an
> implementation detail we shouldn't rely on).

Thanks for the feedback.

memblock_virt_alloc() will by default align to L1 cache, so if that satisfies
the architecture alignment it should be safe, but perhaps not optimal for
saving a few bytes. Still curious if without this patch a crash can be
triggered somehow with some log_buf_len=n, if so this can go to stable.

Luis

2014-06-18 19:46:23

by Stephen Warren

[permalink] [raw]
Subject: Re: [RFT 1/2] printk: make dynamic kernel ring buffer alignemnt explicit

On 06/18/2014 01:33 PM, Luis R. Rodriguez wrote:
> On Wed, Jun 18, 2014 at 09:56:03AM -0600, Stephen Warren wrote:
>> On 06/18/2014 05:14 AM, Luis R. Rodriguez wrote:
>>> From: "Luis R. Rodriguez" <[email protected]>
>>>
>>> We have to consider alignment for the ring buffer both for the
>>> default static size, and then also for when an dynamic allocation
>>> is made when the log_buf_len=n kernel parameter is passed to set
>>> the size specifically to a size larger than the default size set
>>> by the architecture through CONFIG_LOG_BUF_SHIFT.
>>>
>>> The default static kernel ring buffer can be aligned properly if
>>> architectures set CONFIG_LOG_BUF_SHIFT properly, we provide ranges
>>> for the size though so even if CONFIG_LOG_BUF_SHIFT has a sensible
>>> aligned value it can be reduced to a non aligned value. Commit
>>> 6ebb017de9 by Andrew ensures the static buffer is always aligned
>>> and the decision of alignment is done by the compiler by using
>>> __alignof__(struct log) (curious what value caused the crash?).
>>
>> IIRC the issue was that __log_buf's type is char[] so without the
>> __aligned it could have any alignment at all, e.g. 1 or 2. However,
>> struct printk_log is stored in the buffer rather than just char*, and so
>> if __log_buf isn't aligned to the required alignment for that structure,
>> that can caused unaligned accesses to fields in the structure, which
>> isn't supported on ARM in at least some cases.
>>
>> As such, I think the change to setup_log_buf() in this patch makes sense
>> (although I suppose in practice memblock_virt_alloc() probably has some
>> minimum internal alignment that dwards LOG_ALIGN, but that's an
>> implementation detail we shouldn't rely on).
>
> Thanks for the feedback.
>
> memblock_virt_alloc() will by default align to L1 cache, so if that satisfies
> the architecture alignment it should be safe, but perhaps not optimal for
> saving a few bytes. Still curious if without this patch a crash can be
> triggered somehow with some log_buf_len=n, if so this can go to stable.

If memblock_virt_alloc() aligns to L1 cache, then I believe that the
crash would never trigger.

2014-06-18 20:03:37

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [RFT 1/2] printk: make dynamic kernel ring buffer alignemnt explicit

On Wed, Jun 18, 2014 at 01:46:18PM -0600, Stephen Warren wrote:
> On 06/18/2014 01:33 PM, Luis R. Rodriguez wrote:
> > On Wed, Jun 18, 2014 at 09:56:03AM -0600, Stephen Warren wrote:
> >> On 06/18/2014 05:14 AM, Luis R. Rodriguez wrote:
> >>> From: "Luis R. Rodriguez" <[email protected]>
> >>>
> >>> We have to consider alignment for the ring buffer both for the
> >>> default static size, and then also for when an dynamic allocation
> >>> is made when the log_buf_len=n kernel parameter is passed to set
> >>> the size specifically to a size larger than the default size set
> >>> by the architecture through CONFIG_LOG_BUF_SHIFT.
> >>>
> >>> The default static kernel ring buffer can be aligned properly if
> >>> architectures set CONFIG_LOG_BUF_SHIFT properly, we provide ranges
> >>> for the size though so even if CONFIG_LOG_BUF_SHIFT has a sensible
> >>> aligned value it can be reduced to a non aligned value. Commit
> >>> 6ebb017de9 by Andrew ensures the static buffer is always aligned
> >>> and the decision of alignment is done by the compiler by using
> >>> __alignof__(struct log) (curious what value caused the crash?).
> >>
> >> IIRC the issue was that __log_buf's type is char[] so without the
> >> __aligned it could have any alignment at all, e.g. 1 or 2. However,
> >> struct printk_log is stored in the buffer rather than just char*, and so
> >> if __log_buf isn't aligned to the required alignment for that structure,
> >> that can caused unaligned accesses to fields in the structure, which
> >> isn't supported on ARM in at least some cases.
> >>
> >> As such, I think the change to setup_log_buf() in this patch makes sense
> >> (although I suppose in practice memblock_virt_alloc() probably has some
> >> minimum internal alignment that dwards LOG_ALIGN, but that's an
> >> implementation detail we shouldn't rely on).
> >
> > Thanks for the feedback.
> >
> > memblock_virt_alloc() will by default align to L1 cache, so if that satisfies
> > the architecture alignment it should be safe, but perhaps not optimal for
> > saving a few bytes. Still curious if without this patch a crash can be
> > triggered somehow with some log_buf_len=n, if so this can go to stable.
>
> If memblock_virt_alloc() aligns to L1 cache, then I believe that the
> crash would never trigger.

By default it does, that is, if no alignment requirement is passed.

Luis

2014-06-18 20:43:44

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [RFT 2/2] printk: allow increasing the ring buffer depending on the number of CPUs

On Wed, Jun 18, 2014 at 05:42:55PM +0200, Petr Ml?dek wrote:
> On Wed 2014-06-18 04:14:25, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <[email protected]>
> >
> > The default size of the ring buffer is too small for machines
> > with a large amount of CPUs under heavy load. What ends up
> > happening when debugging is the ring buffer overlaps and chews
> > up old messages making debugging impossible unless the size is
> > passed as a kernel parameter. An idle system upon boot up will
> > on average spew out only about one or two extra lines but where
> > this really matters is on heavy load and that will vary widely
> > depending on the system and environment.
> >
> > There are mechanisms to help increase the kernel ring buffer
> > for tracing through debugfs, and those interfaces even allow growing
> > the kernel ring buffer per CPU. We also have a static value which
> > can be passed upon boot. Relying on debugfs however is not ideal
> > for production, and relying on the value passed upon bootup is
> > can only used *after* an issue has creeped up. Instead of being
> > reactive this adds a proactive measure which lets you scale the
> > amount of contributions you'd expect to the kernel ring buffer
> > under load by each CPU in the worst case scenario.
> >
> > We use num_possible_cpus() to avoid complexities which could be
> > introduced by dynamically changing the ring buffer size at run
> > time, num_possible_cpus() lets us use the upper limit on possible
> > number of CPUs therefore avoiding having to deal with hotplugging
> > CPUs on and off. This introduces the kernel configuration option
> > LOG_CPU_MIN_BUF_SHIFT which is used to specify the maximum amount
> > of contributions to the kernel ring buffer in the worst case before
> > the kernel ring buffer flips over, the size is specified as a power
> > of 2. The total amount of contributions made by each CPU must be
> > greater than half of the default kernel ring buffer size
> > (1 << LOG_BUF_SHIFT bytes) in order to trigger an increase upon
> > bootup. The kernel ring buffer is increased to the next power of
> > two that would fit the required minimum kernel ring buffer size
> > plus the additional CPU contribution. For example if LOG_BUF_SHIFT
> > is 18 (256 KB) you'd require at least 128 KB contributions by
> > other CPUs in order to trigger an increase of the kernel ring buffer.
> > With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least
> > anything over > 64 possible CPUs to trigger an increase. If you
> > had 128 possible CPUs the amount of minimum required kernel ring
> > buffer bumps to:
> >
> > ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB
> >
> > Since we require the ring buffer to be a power of two the new
> > required size would be 1024 KB.
> >
> > This CPU contributions are ignored when the "log_buf_len" kernel parameter
> > is used as it forces the exact size of the ring buffer to an expected power
> > of two value.
> >
> > Cc: Andrew Lunn <[email protected]>
> > Cc: Stephen Warren <[email protected]>
> > Cc: Michal Hocko <[email protected]>
> > Cc: Petr Mladek <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Joe Perches <[email protected]>
> > Cc: Arun KS <[email protected]>
> > Cc: Kees Cook <[email protected]>
> > Cc: Davidlohr Bueso <[email protected]>
> > Cc: Chris Metcalf <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Luis R. Rodriguez <[email protected]>
>
> It is where we ended after several versions. I am happy with this
> state.
>
> Signed-off-by: Petr Mladek <[email protected]>
> Tested-by: Petr Mladek <[email protected]>

Thanks, I did some minor spell fixes, and also renamed min to max
for the kconfig variable. Will send as final patches now.

Luis