2021-10-27 21:28:53

by Petr Mladek

[permalink] [raw]
Subject: [PATCH v2] printk: ringbuffer: Improve prb_next_seq() performance

prb_next_seq() always iterates from the first known sequence number.
In the worst case, it might loop 8k times for 256kB buffer,
15k times for 512kB buffer, and 64k times for 2MB buffer.

It was reported that pooling and reading using syslog interface
might occupy 50% of CPU.

Speedup the search by storing @id of the last finalized descriptor.

The loop is still needed because the @id is stored and read in the best
effort way. An atomic variable is used to keep the @id consistent.
But the stores and reads are not serialized against each other.
The descriptor could get reused in the meantime. The related sequence
number will be used only when it is still valid.

An invalid value should be read _only_ when there is a flood of messages
and the ringbuffer is rapidly reused. The performance is the least
problem in this case.

Reported-by: Chunlei Wang <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
---
Changes against v1:

+ Read @seq by the last finalized @id directly in prb_next_seq() (John)

kernel/printk/printk_ringbuffer.c | 49 ++++++++++++++++++++++++++++---
kernel/printk/printk_ringbuffer.h | 2 ++
2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 8a7b7362c0dd..24f47fbefbb5 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -474,8 +474,10 @@ static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
* state has been re-checked. A memcpy() for all of @desc
* cannot be used because of the atomic_t @state_var field.
*/
- memcpy(&desc_out->text_blk_lpos, &desc->text_blk_lpos,
- sizeof(desc_out->text_blk_lpos)); /* LMM(desc_read:C) */
+ if (desc_out) {
+ memcpy(&desc_out->text_blk_lpos, &desc->text_blk_lpos,
+ sizeof(desc_out->text_blk_lpos)); /* LMM(desc_read:C) */
+ }
if (seq_out)
*seq_out = info->seq; /* also part of desc_read:C */
if (caller_id_out)
@@ -1449,6 +1451,9 @@ static void desc_make_final(struct prb_desc_ring *desc_ring, unsigned long id)

atomic_long_cmpxchg_relaxed(&d->state_var, prev_state_val,
DESC_SV(id, desc_finalized)); /* LMM(desc_make_final:A) */
+
+ /* Best effort to remember the last finalized @id. */
+ atomic_long_set(&desc_ring->last_finalized_id, id);
}

/**
@@ -1657,7 +1662,12 @@ void prb_commit(struct prb_reserved_entry *e)
*/
void prb_final_commit(struct prb_reserved_entry *e)
{
+ struct prb_desc_ring *desc_ring = &e->rb->desc_ring;
+
_prb_commit(e, desc_finalized);
+
+ /* Best effort to remember the last finalized @id. */
+ atomic_long_set(&desc_ring->last_finalized_id, e->id);
}

/*
@@ -2005,9 +2015,39 @@ u64 prb_first_valid_seq(struct printk_ringbuffer *rb)
*/
u64 prb_next_seq(struct printk_ringbuffer *rb)
{
- u64 seq = 0;
+ struct prb_desc_ring *desc_ring = &rb->desc_ring;
+ enum desc_state d_state;
+ unsigned long id;
+ u64 seq;
+
+ /* Check if the cached @id still points to a valid @seq. */
+ id = atomic_long_read(&desc_ring->last_finalized_id);
+ d_state = desc_read(desc_ring, id, NULL, &seq, NULL);

- /* Search forward from the oldest descriptor. */
+ if (d_state == desc_finalized || d_state == desc_reusable) {
+ /*
+ * Begin searching after the last finalized record.
+ *
+ * On 0, the search must begin at 0 because of hack#2
+ * of the bootstrapping phase it is not known if a
+ * record at index 0 exists.
+ */
+ if (seq != 0)
+ seq++;
+ } else {
+ /*
+ * The information about the last finalized sequence number
+ * has gone. It should happen only when there is a flood of
+ * new messages and the ringbuffer is rapidly recycled.
+ * Give up and start from the beginning.
+ */
+ seq = 0;
+ }
+
+ /*
+ * The information about the last finalized @seq might be inaccurate.
+ * Search forward to find the current one.
+ */
while (_prb_read_valid(rb, &seq, NULL, NULL))
seq++;

@@ -2044,6 +2084,7 @@ void prb_init(struct printk_ringbuffer *rb,
rb->desc_ring.infos = infos;
atomic_long_set(&rb->desc_ring.head_id, DESC0_ID(descbits));
atomic_long_set(&rb->desc_ring.tail_id, DESC0_ID(descbits));
+ atomic_long_set(&rb->desc_ring.last_finalized_id, DESC0_ID(descbits));

rb->text_data_ring.size_bits = textbits;
rb->text_data_ring.data = text_buf;
diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
index 73cc80e01cef..18cd25e489b8 100644
--- a/kernel/printk/printk_ringbuffer.h
+++ b/kernel/printk/printk_ringbuffer.h
@@ -75,6 +75,7 @@ struct prb_desc_ring {
struct printk_info *infos;
atomic_long_t head_id;
atomic_long_t tail_id;
+ atomic_long_t last_finalized_id;
};

/*
@@ -258,6 +259,7 @@ static struct printk_ringbuffer name = { \
.infos = &_##name##_infos[0], \
.head_id = ATOMIC_INIT(DESC0_ID(descbits)), \
.tail_id = ATOMIC_INIT(DESC0_ID(descbits)), \
+ .last_finalized_id = ATOMIC_INIT(DESC0_ID(descbits)), \
}, \
.text_data_ring = { \
.size_bits = (avgtextbits) + (descbits), \
--
2.26.2


2021-10-27 21:32:13

by John Ogness

[permalink] [raw]
Subject: Re: [PATCH v2] printk: ringbuffer: Improve prb_next_seq() performance

On 2021-10-27, Petr Mladek <[email protected]> wrote:
> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> index 8a7b7362c0dd..24f47fbefbb5 100644
> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -474,8 +474,10 @@ static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
> * state has been re-checked. A memcpy() for all of @desc
> * cannot be used because of the atomic_t @state_var field.
> */
> - memcpy(&desc_out->text_blk_lpos, &desc->text_blk_lpos,
> - sizeof(desc_out->text_blk_lpos)); /* LMM(desc_read:C) */
> + if (desc_out) {
> + memcpy(&desc_out->text_blk_lpos, &desc->text_blk_lpos,
> + sizeof(desc_out->text_blk_lpos)); /* LMM(desc_read:C) */
> + }
> if (seq_out)
> *seq_out = info->seq; /* also part of desc_read:C */
> if (caller_id_out)

You also need this hunk:

@@ -530,7 +530,8 @@ static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
state_val = atomic_long_read(state_var); /* LMM(desc_read:E) */
d_state = get_desc_state(id, state_val);
out:
- atomic_long_set(&desc_out->state_var, state_val);
+ if (desc_out)
+ atomic_long_set(&desc_out->state_var, state_val);
return d_state;
}

Sorry, I missed it in my last review. This time I build+boot tested the patch.

With this hunk added:

Reviewed-by: John Ogness <[email protected]>

John

2021-10-31 02:12:03

by kernel test robot

[permalink] [raw]
Subject: [printk] ebd5ebf476: BUG:kernel_NULL_pointer_dereference,address



Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: ebd5ebf4767439d2528d8c9afc6c86363dc92488 ("[PATCH v2] printk: ringbuffer: Improve prb_next_seq() performance")
url: https://github.com/0day-ci/linux/commits/Petr-Mladek/printk-ringbuffer-Improve-prb_next_seq-performance/20211027-220945
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git d25f27432f80a800a3592db128254c8140bd71bf
patch link: https://lore.kernel.org/linux-arm-kernel/YXlddJxLh77DKfIO@alley

in testcase: stress-ng
version: stress-ng-x86_64-0.11-06_20211029
with following parameters:

nr_threads: 10%
disk: 1HDD
testtime: 60s
fs: ext4
class: os
test: dev
cpufreq_governor: performance
ucode: 0x5003006



on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>



[ 56.519320][ T5857] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 56.527377][ T5857] #PF: supervisor write access in kernel mode
[ 56.527379][ T5857] #PF: error_code(0x0002) - not-present page
[ 56.527381][ T5857] PGD 0 P4D 0
[ 56.540017][ T5857] Oops: 0002 [#1] SMP NOPTI
[ 56.540020][ T5857] CPU: 6 PID: 5857 Comm: stress-ng Not tainted 5.15.0-rc7-00019-gebd5ebf47674 #1
[ 56.557846][ T5857] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 56.557848][ T5857] RIP: 0010:desc_read+0xa1/0xc0
[ 56.574672][ T5857] Code: 49 89 09 4d 85 c0 74 06 8b 40 14 41 89 00 48 b8 ff ff ff ff ff ff ff 3f 48 8b 0b 48 21 c8 48 39 c6 75 1b 48 89 c8 48 c1 e8 3e
<48> 89 0a 48 8b 5d f8 c9 c3 48 89 0a 48 8b 5d f8 31 c0 c9 c3 b8 ff
[ 56.574675][ T5857] RSP: 0018:ffffc90025ab3ec0 EFLAGS: 00010a02
[ 56.574678][ T5857] RAX: 0000000000000002 RBX: ffffffff82b2ff78 RCX: 80000000ffff8e81
[ 56.574680][ T5857] RDX: 0000000000000000 RSI: 00000000ffff8e81 RDI: 0000000000004885
[ 56.620421][ T5857] RBP: ffffc90025ab3ed0 R08: 0000000000000000 R09: ffffc90025ab3ee0
[ 56.620427][ T5857] R10: 00000000ffff8e81 R11: 0000000000000002 R12: 0000000000000002
[ 56.620429][ T5857] R13: ffffffffffffffea R14: ffff88b01dd09f00 R15: 0000000000000000
[ 56.646759][ T5857] FS: 00007fc24c328740(0000) GS:ffff8897e0780000(0000) knlGS:0000000000000000
[ 56.646764][ T5857] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 56.646768][ T5857] CR2: 0000000000000000 CR3: 00000001ab9cc005 CR4: 00000000007706e0
[ 56.673115][ T5857] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 56.673118][ T5857] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 56.673121][ T5857] PKRU: 55555554
[ 56.695346][ T5857] Call Trace:
[ 56.695352][ T5857] prb_next_seq+0x2e/0xc0
[ 56.704792][ T5857] devkmsg_llseek+0x71/0xc0
[ 56.704799][ T5857] ksys_lseek+0x60/0xc0
[ 56.714966][ T5857] do_syscall_64+0x38/0xc0
[ 56.714979][ T5857] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 56.726717][ T5857] RIP: 0033:0x7fc24c438597
[ 56.726726][ T5857] Code: c7 c0 ff ff ff ff c3 48 8b 15 f5 08 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b6 0f 1f 80 00 00 00 00 b8 08 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 c9 08 0d 00 f7 d8 64 89 02 48
[ 56.753847][ T5857] RSP: 002b:00007ffc4d28fe48 EFLAGS: 00000246 ORIG_RAX: 0000000000000008
[ 56.753853][ T5857] RAX: ffffffffffffffda RBX: 000055fe3be2c700 RCX: 00007fc24c438597
[ 56.753856][ T5857] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000005
[ 56.781564][ T5857] RBP: 0000000000000005 R08: 00000000617cb6ed R09: 00007ffc4d337080
[ 56.781566][ T5857] R10: 0000000000018c78 R11: 0000000000000246 R12: 000055fe3be2c710
[ 56.799455][ T5857] R13: 00007ffc4d291360 R14: 0000000000000001 R15: 000055fe3bbe4ba0
[ 56.799464][ T5857] Modules linked in: vfio_iommu_type1 vfio loop hci_vhci bluetooth ecdh_generic ecc rfkill cuse fuse dm_mod binfmt_misc btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp sd_mod t10_pi sg kvm_intel kvm ast irqbypass drm_vram_helper crct10dif_pclmul drm_ttm_helper crc32_pclmul ipmi_ssif crc32c_intel ttm ghash_clmulni_intel drm_kms_helper ahci syscopyarea sysfillrect acpi_ipmi libahci sysimgblt rapl ipmi_si fb_sys_fops intel_cstate mei_me drm ioatdma intel_uncore libata mei joydev ipmi_devintf intel_pch_thermal dca wmi ipmi_msghandler acpi_pad acpi_power_meter ip_tables
[ 56.878664][ T5857] CR2: 0000000000000000
[ 56.884268][ T5857] ---[ end trace 1d2ed0095167d93e ]---



To reproduce:
# build kernel with attached config file

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (5.37 kB)
config-5.15.0-rc7-00019-gebd5ebf47674 (175.30 kB)
job-script (8.42 kB)
dmesg.xz (48.10 kB)
job.yaml (5.50 kB)
Download all attachments

2021-11-01 10:09:52

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v2] printk: ringbuffer: Improve prb_next_seq() performance

On Wed 2021-10-27 17:54:42, John Ogness wrote:
> On 2021-10-27, Petr Mladek <[email protected]> wrote:
> > diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> > index 8a7b7362c0dd..24f47fbefbb5 100644
> > --- a/kernel/printk/printk_ringbuffer.c
> > +++ b/kernel/printk/printk_ringbuffer.c
> > @@ -474,8 +474,10 @@ static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
> > * state has been re-checked. A memcpy() for all of @desc
> > * cannot be used because of the atomic_t @state_var field.
> > */
> > - memcpy(&desc_out->text_blk_lpos, &desc->text_blk_lpos,
> > - sizeof(desc_out->text_blk_lpos)); /* LMM(desc_read:C) */
> > + if (desc_out) {
> > + memcpy(&desc_out->text_blk_lpos, &desc->text_blk_lpos,
> > + sizeof(desc_out->text_blk_lpos)); /* LMM(desc_read:C) */
> > + }
> > if (seq_out)
> > *seq_out = info->seq; /* also part of desc_read:C */
> > if (caller_id_out)
>
> You also need this hunk:
>
> @@ -530,7 +530,8 @@ static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
> state_val = atomic_long_read(state_var); /* LMM(desc_read:E) */
> d_state = get_desc_state(id, state_val);
> out:
> - atomic_long_set(&desc_out->state_var, state_val);
> + if (desc_out)
> + atomic_long_set(&desc_out->state_var, state_val);
> return d_state;
> }
>
> Sorry, I missed it in my last review. This time I build+boot tested the patch.

Urgh. Great catch! It seems that my testing was not good enough. Or
that I was just lucky.

> With this hunk added:
>
> Reviewed-by: John Ogness <[email protected]>

I am going to queue it for 5.17 after the pull request for 5.16 is
accepted. I do not feel comfortable to rush this into 5.16, especially
after finding the bug. The merge window has just started...

Anyway, thanks a lot for review and catching the mistake.

Best Regards,
Petr

2022-04-08 01:05:35

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2] printk: ringbuffer: Improve prb_next_seq() performance

> --- a/kernel/printk/printk_ringbuffer.h
> +++ b/kernel/printk/printk_ringbuffer.h
> @@ -75,6 +75,7 @@ struct prb_desc_ring {
> struct printk_info *infos;
> atomic_long_t head_id;
> atomic_long_t tail_id;
> + atomic_long_t last_finalized_id;

I really know nothing about this code, but while looking around
kernel/printk/ I noticed kernel/printk/printk.c has these lines in
log_buf_vmcoreinfo_setup().

VMCOREINFO_STRUCT_SIZE(prb_desc_ring);
VMCOREINFO_OFFSET(prb_desc_ring, count_bits);
VMCOREINFO_OFFSET(prb_desc_ring, descs);
VMCOREINFO_OFFSET(prb_desc_ring, infos);
VMCOREINFO_OFFSET(prb_desc_ring, head_id);
VMCOREINFO_OFFSET(prb_desc_ring, tail_id);

Would this header file change also require a change to that location?
Something like

VMCOREINFO_OFFSET(prb_desc_ring, head_id);
VMCOREINFO_OFFSET(prb_desc_ring, tail_id);
+ VMCOREINFO_OFFSET(prb_desc_ring, last_finalized_id);

-Saravana

> };
>
> /*
> @@ -258,6 +259,7 @@ static struct printk_ringbuffer name = { \
> .infos = &_##name##_infos[0], \
> .head_id = ATOMIC_INIT(DESC0_ID(descbits)), \
> .tail_id = ATOMIC_INIT(DESC0_ID(descbits)), \
> + .last_finalized_id = ATOMIC_INIT(DESC0_ID(descbits)), \
> }, \
> .text_data_ring = { \
> .size_bits = (avgtextbits) + (descbits), \

2022-04-08 01:14:17

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2] printk: ringbuffer: Improve prb_next_seq() performance

Messed up mail headers. I'll resend this with proper mail headers.

-Saravana

On Thu, Apr 7, 2022 at 5:36 PM Saravana Kannan <[email protected]> wrote:
>
> > --- a/kernel/printk/printk_ringbuffer.h
> > +++ b/kernel/printk/printk_ringbuffer.h
> > @@ -75,6 +75,7 @@ struct prb_desc_ring {
> > struct printk_info *infos;
> > atomic_long_t head_id;
> > atomic_long_t tail_id;
> > + atomic_long_t last_finalized_id;
>
> I really know nothing about this code, but while looking around
> kernel/printk/ I noticed kernel/printk/printk.c has these lines in
> log_buf_vmcoreinfo_setup().
>
> VMCOREINFO_STRUCT_SIZE(prb_desc_ring);
> VMCOREINFO_OFFSET(prb_desc_ring, count_bits);
> VMCOREINFO_OFFSET(prb_desc_ring, descs);
> VMCOREINFO_OFFSET(prb_desc_ring, infos);
> VMCOREINFO_OFFSET(prb_desc_ring, head_id);
> VMCOREINFO_OFFSET(prb_desc_ring, tail_id);
>
> Would this header file change also require a change to that location?
> Something like
>
> VMCOREINFO_OFFSET(prb_desc_ring, head_id);
> VMCOREINFO_OFFSET(prb_desc_ring, tail_id);
> + VMCOREINFO_OFFSET(prb_desc_ring, last_finalized_id);
>
> -Saravana
>
> > };
> >
> > /*
> > @@ -258,6 +259,7 @@ static struct printk_ringbuffer name = { \
> > .infos = &_##name##_infos[0], \
> > .head_id = ATOMIC_INIT(DESC0_ID(descbits)), \
> > .tail_id = ATOMIC_INIT(DESC0_ID(descbits)), \
> > + .last_finalized_id = ATOMIC_INIT(DESC0_ID(descbits)), \
> > }, \
> > .text_data_ring = { \
> > .size_bits = (avgtextbits) + (descbits), \

2022-04-08 01:17:35

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2] printk: ringbuffer: Improve prb_next_seq() performance


> Petr Mladek <[email protected]> wrote:
>
> prb_next_seq() always iterates from the first known sequence number.
> In the worst case, it might loop 8k times for 256kB buffer,
> 15k times for 512kB buffer, and 64k times for 2MB buffer.
>
> It was reported that pooling and reading using syslog interface
> might occupy 50% of CPU.
>
> Speedup the search by storing @id of the last finalized descriptor.
>
> The loop is still needed because the @id is stored and read in the best
> effort way. An atomic variable is used to keep the @id consistent.
> But the stores and reads are not serialized against each other.
> The descriptor could get reused in the meantime. The related sequence
> number will be used only when it is still valid.
>
> An invalid value should be read _only_ when there is a flood of messages
> and the ringbuffer is rapidly reused. The performance is the least
> problem in this case.
>
> Reported-by: Chunlei Wang <[email protected]>
> Signed-off-by: Petr Mladek <[email protected]>
> ---
> Changes against v1:
>
> + Read @seq by the last finalized @id directly in prb_next_seq() (John)
>
> kernel/printk/printk_ringbuffer.c | 49 ++++++++++++++++++++++++++++---
> kernel/printk/printk_ringbuffer.h | 2 ++
> 2 files changed, 47 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> index 8a7b7362c0dd..24f47fbefbb5 100644
> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -474,8 +474,10 @@ static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
> * state has been re-checked. A memcpy() for all of @desc
> * cannot be used because of the atomic_t @state_var field.
> */
> - memcpy(&desc_out->text_blk_lpos, &desc->text_blk_lpos,
> - sizeof(desc_out->text_blk_lpos)); /* LMM(desc_read:C) */
> + if (desc_out) {
> + memcpy(&desc_out->text_blk_lpos, &desc->text_blk_lpos,
> + sizeof(desc_out->text_blk_lpos)); /* LMM(desc_read:C) */
> + }
> if (seq_out)
> *seq_out = info->seq; /* also part of desc_read:C */
> if (caller_id_out)
> @@ -1449,6 +1451,9 @@ static void desc_make_final(struct prb_desc_ring *desc_ring, unsigned long id)
>
> atomic_long_cmpxchg_relaxed(&d->state_var, prev_state_val,
> DESC_SV(id, desc_finalized)); /* LMM(desc_make_final:A) */
> +
> + /* Best effort to remember the last finalized @id. */
> + atomic_long_set(&desc_ring->last_finalized_id, id);
> }
>
> /**
> @@ -1657,7 +1662,12 @@ void prb_commit(struct prb_reserved_entry *e)
> */
> void prb_final_commit(struct prb_reserved_entry *e)
> {
> + struct prb_desc_ring *desc_ring = &e->rb->desc_ring;
> +
> _prb_commit(e, desc_finalized);
> +
> + /* Best effort to remember the last finalized @id. */
> + atomic_long_set(&desc_ring->last_finalized_id, e->id);
> }
>
> /*
> @@ -2005,9 +2015,39 @@ u64 prb_first_valid_seq(struct printk_ringbuffer *rb)
> */
> u64 prb_next_seq(struct printk_ringbuffer *rb)
> {
> - u64 seq = 0;
> + struct prb_desc_ring *desc_ring = &rb->desc_ring;
> + enum desc_state d_state;
> + unsigned long id;
> + u64 seq;
> +
> + /* Check if the cached @id still points to a valid @seq. */
> + id = atomic_long_read(&desc_ring->last_finalized_id);
> + d_state = desc_read(desc_ring, id, NULL, &seq, NULL);
>
> - /* Search forward from the oldest descriptor. */
> + if (d_state == desc_finalized || d_state == desc_reusable) {
> + /*
> + * Begin searching after the last finalized record.
> + *
> + * On 0, the search must begin at 0 because of hack#2
> + * of the bootstrapping phase it is not known if a
> + * record at index 0 exists.
> + */
> + if (seq != 0)
> + seq++;
> + } else {
> + /*
> + * The information about the last finalized sequence number
> + * has gone. It should happen only when there is a flood of
> + * new messages and the ringbuffer is rapidly recycled.
> + * Give up and start from the beginning.
> + */
> + seq = 0;
> + }
> +
> + /*
> + * The information about the last finalized @seq might be inaccurate.
> + * Search forward to find the current one.
> + */
> while (_prb_read_valid(rb, &seq, NULL, NULL))
> seq++;
>
> @@ -2044,6 +2084,7 @@ void prb_init(struct printk_ringbuffer *rb,
> rb->desc_ring.infos = infos;
> atomic_long_set(&rb->desc_ring.head_id, DESC0_ID(descbits));
> atomic_long_set(&rb->desc_ring.tail_id, DESC0_ID(descbits));
> + atomic_long_set(&rb->desc_ring.last_finalized_id, DESC0_ID(descbits));
>
> rb->text_data_ring.size_bits = textbits;
> rb->text_data_ring.data = text_buf;
> diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
> index 73cc80e01cef..18cd25e489b8 100644
> --- a/kernel/printk/printk_ringbuffer.h
> +++ b/kernel/printk/printk_ringbuffer.h
> @@ -75,6 +75,7 @@ struct prb_desc_ring {
> struct printk_info *infos;
> atomic_long_t head_id;
> atomic_long_t tail_id;
> + atomic_long_t last_finalized_id;
> };


I really know nothing about this code, but while looking around
kernel/printk/ I noticed kernel/printk/printk.c has these lines in
log_buf_vmcoreinfo_setup().

VMCOREINFO_STRUCT_SIZE(prb_desc_ring);
VMCOREINFO_OFFSET(prb_desc_ring, count_bits);
VMCOREINFO_OFFSET(prb_desc_ring, descs);
VMCOREINFO_OFFSET(prb_desc_ring, infos);
VMCOREINFO_OFFSET(prb_desc_ring, head_id);
VMCOREINFO_OFFSET(prb_desc_ring, tail_id);

Would this header file change also require a change to that location?
Something like

VMCOREINFO_OFFSET(prb_desc_ring, head_id);
VMCOREINFO_OFFSET(prb_desc_ring, tail_id);
+ VMCOREINFO_OFFSET(prb_desc_ring, last_finalized_id);

-Saravana

>
> /*
> @@ -258,6 +259,7 @@ static struct printk_ringbuffer name = { \
> .infos = &_##name##_infos[0], \
> .head_id = ATOMIC_INIT(DESC0_ID(descbits)), \
> .tail_id = ATOMIC_INIT(DESC0_ID(descbits)), \
> + .last_finalized_id = ATOMIC_INIT(DESC0_ID(descbits)), \
> }, \
> .text_data_ring = { \
> .size_bits = (avgtextbits) + (descbits), \
> --
> 2.26.2
>

2022-04-08 07:22:35

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v2] printk: ringbuffer: Improve prb_next_seq() performance

On Thu 2022-04-07 17:48:21, Saravana Kannan wrote:
>
> > Petr Mladek <[email protected]> wrote:
> >
> > prb_next_seq() always iterates from the first known sequence number.
> > In the worst case, it might loop 8k times for 256kB buffer,
> > 15k times for 512kB buffer, and 64k times for 2MB buffer.
> >
> > It was reported that pooling and reading using syslog interface
> > might occupy 50% of CPU.
> >
> > Speedup the search by storing @id of the last finalized descriptor.
> >
> > The loop is still needed because the @id is stored and read in the best
> > effort way. An atomic variable is used to keep the @id consistent.
> > But the stores and reads are not serialized against each other.
> > The descriptor could get reused in the meantime. The related sequence
> > number will be used only when it is still valid.
> >
> > An invalid value should be read _only_ when there is a flood of messages
> > and the ringbuffer is rapidly reused. The performance is the least
> > problem in this case.
> >
> > --- a/kernel/printk/printk_ringbuffer.h
> > +++ b/kernel/printk/printk_ringbuffer.h
> > @@ -75,6 +75,7 @@ struct prb_desc_ring {
> > struct printk_info *infos;
> > atomic_long_t head_id;
> > atomic_long_t tail_id;
> > + atomic_long_t last_finalized_id;
> > };
>
>
> I really know nothing about this code, but while looking around
> kernel/printk/ I noticed kernel/printk/printk.c has these lines in
> log_buf_vmcoreinfo_setup().
>
> VMCOREINFO_STRUCT_SIZE(prb_desc_ring);
> VMCOREINFO_OFFSET(prb_desc_ring, count_bits);
> VMCOREINFO_OFFSET(prb_desc_ring, descs);
> VMCOREINFO_OFFSET(prb_desc_ring, infos);
> VMCOREINFO_OFFSET(prb_desc_ring, head_id);
> VMCOREINFO_OFFSET(prb_desc_ring, tail_id);
>
> Would this header file change also require a change to that location?
> Something like
>
> VMCOREINFO_OFFSET(prb_desc_ring, head_id);
> VMCOREINFO_OFFSET(prb_desc_ring, tail_id);
> + VMCOREINFO_OFFSET(prb_desc_ring, last_finalized_id);

It is actually not needed. VMCOREINFO_*() macros define some data
that are used by the "crash" tool when reading crashdumps. The field
"last_finalized_id" is used only at runtime. It is not needed
for reading the log in the crashdump.

Anyway, thanks for the question. It is great to know that there are
more people checking our changes.

Best Regards,
Petr