2022-09-02 14:52:56

by Varsha Teratipally

[permalink] [raw]
Subject: Request to cherry-pick 20401d1058f3f841f35a594ac2fc1293710e55b9 to v5.10 and v5.4

Hi all,

Commit 20401d1058f3f841f35a594ac2fc1293710e55b9("ipc: replace costly
bailout check in sysvipc_find_ipc()" fixes a high cve and optimizes the
costly loop by adding a checkpoint, which I think might be a good
candidate for the stable branches

Let me know what you think



2022-09-02 15:15:24

by Varsha Teratipally

[permalink] [raw]
Subject: [PATCH] ipc: replace costly bailout check in sysvipc_find_ipc()

From: Rafael Aquini <[email protected]>

sysvipc_find_ipc() was left with a costly way to check if the offset
position fed to it is bigger than the total number of IPC IDs in use. So
much so that the time it takes to iterate over /proc/sysvipc/* files grows
exponentially for a custom benchmark that creates "N" SYSV shm segments
and then times the read of /proc/sysvipc/shm (milliseconds):

12 msecs to read 1024 segs from /proc/sysvipc/shm
18 msecs to read 2048 segs from /proc/sysvipc/shm
65 msecs to read 4096 segs from /proc/sysvipc/shm
325 msecs to read 8192 segs from /proc/sysvipc/shm
1303 msecs to read 16384 segs from /proc/sysvipc/shm
5182 msecs to read 32768 segs from /proc/sysvipc/shm

The root problem lies with the loop that computes the total amount of ids
in use to check if the "pos" feeded to sysvipc_find_ipc() grew bigger than
"ids->in_use". That is a quite inneficient way to get to the maximum
index in the id lookup table, specially when that value is already
provided by struct ipc_ids.max_idx.

This patch follows up on the optimization introduced via commit
15df03c879836 ("sysvipc: make get_maxid O(1) again") and gets rid of the
aforementioned costly loop replacing it by a simpler checkpoint based on
ipc_get_maxidx() returned value, which allows for a smooth linear increase
in time complexity for the same custom benchmark:

2 msecs to read 1024 segs from /proc/sysvipc/shm
2 msecs to read 2048 segs from /proc/sysvipc/shm
4 msecs to read 4096 segs from /proc/sysvipc/shm
9 msecs to read 8192 segs from /proc/sysvipc/shm
19 msecs to read 16384 segs from /proc/sysvipc/shm
39 msecs to read 32768 segs from /proc/sysvipc/shm

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rafael Aquini <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Acked-by: Manfred Spraul <[email protected]>
Cc: Waiman Long <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
ipc/util.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/ipc/util.c b/ipc/util.c
index 0027e47626b7..d48d8cfa1f3f 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -788,21 +788,13 @@ struct pid_namespace *ipc_seq_pid_ns(struct seq_file *s)
static struct kern_ipc_perm *sysvipc_find_ipc(struct ipc_ids *ids, loff_t pos,
loff_t *new_pos)
{
- struct kern_ipc_perm *ipc;
- int total, id;
-
- total = 0;
- for (id = 0; id < pos && total < ids->in_use; id++) {
- ipc = idr_find(&ids->ipcs_idr, id);
- if (ipc != NULL)
- total++;
- }
+ struct kern_ipc_perm *ipc = NULL;
+ int max_idx = ipc_get_maxidx(ids);

- ipc = NULL;
- if (total >= ids->in_use)
+ if (max_idx == -1 || pos > max_idx)
goto out;

- for (; pos < ipc_mni; pos++) {
+ for (; pos <= max_idx; pos++) {
ipc = idr_find(&ids->ipcs_idr, pos);
if (ipc != NULL) {
rcu_read_lock();
--

2022-09-02 15:39:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: Request to cherry-pick 20401d1058f3f841f35a594ac2fc1293710e55b9 to v5.10 and v5.4

On Fri, Sep 02, 2022 at 01:59:11PM +0000, Varsha Teratipally wrote:
> Hi all,
>
> Commit 20401d1058f3f841f35a594ac2fc1293710e55b9("ipc: replace costly
> bailout check in sysvipc_find_ipc()" fixes a high cve and optimizes the
> costly loop by adding a checkpoint, which I think might be a good
> candidate for the stable branches

What do you mean by "high cve"?

And that feels like it's an artificial benchmark fixup, what real
workload benefits from this change?

thanks,

greg k-h

2022-09-04 17:59:09

by Manfred Spraul

[permalink] [raw]
Subject: Re: Request to cherry-pick 20401d1058f3f841f35a594ac2fc1293710e55b9 to v5.10 and v5.4

Hi,

On 9/2/22 16:27, Greg Kroah-Hartman wrote:
> On Fri, Sep 02, 2022 at 01:59:11PM +0000, Varsha Teratipally wrote:
>> Hi all,
>>
>> Commit 20401d1058f3f841f35a594ac2fc1293710e55b9("ipc: replace costly
>> bailout check in sysvipc_find_ipc()" fixes a high cve and optimizes the
>> costly loop by adding a checkpoint, which I think might be a good
>> candidate for the stable branches
> What do you mean by "high cve"?
>
> And that feels like it's an artificial benchmark fixup, what real
> workload benefits from this change?

Standard ipcs end up parsing /proc/sysvipc/*, thus there are real users
where the performance of /proc/sysvsem/* matters.

But:

The performance of the function was bad since 2007, i.e. why is is now
urgent? I do not see a bug that must be fixed.

Initial patch:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/ipc/util.c?id=7ca7e564e049d8b350ec9d958ff25eaa24226352

(core issue: The code needs to find the next entry in an idr. And
instead of using idr_get_next(), it uses idr_find() in a for(;;id++) loop.)

<<<

[manfred@localhost Input]$ rpm -qf /usr/bin/ipcs
util-linux-core-2.38-1.fc36.x86_64


[manfred@localhost Input]$ strace -e openat /usr/bin/ipcs
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_TIME",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_TIME",
O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 3

openat(AT_FDCWD,
"/usr/share/locale/en_US.UTF-8/LC_MESSAGES/util-linux.mo", O_RDONLY) =
-1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale/en_US.utf8/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1
ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/util-linux.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/util-linux.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/util-linux.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/util-linux.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
------ Message Queues --------
key        msqid      owner      perms      used-bytes messages
openat(AT_FDCWD, "/proc/sysvipc/msg", O_RDONLY) = 3

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch status
openat(AT_FDCWD, "/proc/sysvipc/shm", O_RDONLY) = 3
openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
0x00000000 18         manfred    600        524288     2 dest
openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
0x5125004a 19         manfred    600        3208 1

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
openat(AT_FDCWD, "/proc/sysvipc/sem", O_RDONLY) = 3
openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
0x51250047 0          manfred    600        1
openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
0x51250049 2          manfred    600        1

>>>

2022-09-05 05:52:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: Request to cherry-pick 20401d1058f3f841f35a594ac2fc1293710e55b9 to v5.10 and v5.4

On Sun, Sep 04, 2022 at 07:38:30PM +0200, Manfred Spraul wrote:
> Hi,
>
> On 9/2/22 16:27, Greg Kroah-Hartman wrote:
> > On Fri, Sep 02, 2022 at 01:59:11PM +0000, Varsha Teratipally wrote:
> > > Hi all,
> > >
> > > Commit 20401d1058f3f841f35a594ac2fc1293710e55b9("ipc: replace costly
> > > bailout check in sysvipc_find_ipc()" fixes a high cve and optimizes the
> > > costly loop by adding a checkpoint, which I think might be a good
> > > candidate for the stable branches
> > What do you mean by "high cve"?
> >
> > And that feels like it's an artificial benchmark fixup, what real
> > workload benefits from this change?
>
> Standard ipcs end up parsing /proc/sysvipc/*, thus there are real users
> where the performance of /proc/sysvsem/* matters.

What real users are that? What workload needs that becides monitoring
tools?

>
> But:
>
> The performance of the function was bad since 2007, i.e. why is is now
> urgent? I do not see a bug that must be fixed.


Me either.

thanks,

greg k-h