2023-09-19 01:13:03

by Brian Pardy

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

[RS removed from CC due to bounce message]

On Wed, Sep 6, 2023 at 5:03 PM Brian Pardy <[email protected]> wrote:
> On Tue, Sep 5, 2023 at 9:01 PM Bagas Sanjaya <[email protected]> wrote:
> > Thanks for the regression report. But if you want to get it fixed,
> > you have to do your part: perform bisection. See Documentation/admin-guide/bug-bisect.rst in the kernel sources for how to do that.
> >
> > Anyway, I'm adding it to regzbot:
> >
> > #regzbot ^introduced: v6.4..v6.5
> > #regzbot title: incorrect CPU utilization report (multiplied) when mounting CIFS
>
> Thank you for directing me to the bug-bisect documentation. Results below:
>
> # git bisect bad
> d14de8067e3f9653cdef5a094176d00f3260ab20 is the first bad commit
> commit d14de8067e3f9653cdef5a094176d00f3260ab20
> Author: Ronnie Sahlberg <[email protected]>
> Date: Thu Jul 6 12:32:24 2023 +1000
>
> cifs: Add a laundromat thread for cached directories
>
> and drop cached directories after 30 seconds
>
> Signed-off-by: Ronnie Sahlberg <[email protected]>
> Signed-off-by: Steve French <[email protected]>
>
> fs/smb/client/cached_dir.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++
> fs/smb/client/cached_dir.h | 1 +
> 2 files changed, 68 insertions(+)

Is there any further information I can provide to aid in debugging
this issue? Should I just expect incorrect load average reporting when
a CIFS share is mounted on any kernel >= 6.5.0?

I'm not clear on the value or necessity of this "laundromat thread" -
everything worked as expected before it was added - shall I just patch
it out of my kernel builds going forward if there is no interest in
fixing it? Is a .config option to disable it possible?


2023-09-19 04:15:22

by Steve French

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

> I'm not clear on the value or necessity of this "laundromat thread"

This thread cleans up cached directories when the server supports
directory leases (in those cases directory contents, and the
information needed to revalidate directory paths, can be cached
safely).
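
As a rough illustration of what such a cleanup pass does, here is a simplified userspace model. It is a sketch only: the struct, the function name, and the strict "older than" comparison are assumptions made for the example, and only the 30-second timeout comes from the commit message quoted in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* From the commit message: cached directories are dropped after 30 seconds. */
#define DIR_CACHE_TIMEOUT_SECS 30

/* Hypothetical stand-in for one cached directory entry. */
struct cached_dir_model {
	long cached_at; /* time the entry was cached, in seconds */
	bool valid;     /* false once the entry has been dropped */
};

/*
 * One cleanup pass: invalidate every entry older than the timeout and
 * return how many were dropped. The strict ">" is an assumption.
 */
static inline size_t laundromat_pass(struct cached_dir_model *dirs, size_t n,
				     long now)
{
	size_t dropped = 0;

	for (size_t i = 0; i < n; i++) {
		if (dirs[i].valid &&
		    now - dirs[i].cached_at > DIR_CACHE_TIMEOUT_SECS) {
			dirs[i].valid = false;
			dropped++;
		}
	}
	return dropped;
}
```

In this model the pass itself is cheap; the load question raised in this thread concerns how the thread sleeps between passes, not the work it does.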

--
Thanks,

Steve

2023-09-19 06:45:49

by Steve French

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

Paulo and I were discussing the laundromat thread at the SMB3.1.1 test
event (at SDC this week) which is now going on - will let you know
what we find.

One obvious thing is that it probably isn't necessary for cases when
the server does not support directory leases, but we noticed another
problem as well.


--
Thanks,

Steve

2023-09-19 08:50:24

by Steve French

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

Does the attached patch help in your case? It avoids starting the
laundromat thread for IPC shares (which cuts the number of threads
in half in many cases) and also avoids starting it if the server
does not support directory leases (e.g. a Samba server instead of
a Windows server).


--
Thanks,

Steve


Attachments:
0001-smb3-do-not-start-laundromat-thread-when-dir-leases-.patch (4.81 kB)

2023-09-19 14:56:42

by Brian Pardy

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

On Tue, Sep 19, 2023 at 1:36 AM Steve French <[email protected]> wrote:
>
> Does the attached patch help in your case? It avoids starting the
> laundromat thread for IPC shares (which cuts the number of the threads
> in half for many cases) and also avoids starting them if the server
> does not support directory leases (e.g. if Samba server instead of
> Windows server).

Hello,

I applied the 0001-smb3-do-not-start-laundromat-thread-when-dir-leases-.patch
you provided against the 6.5.3 kernel.

I can confirm that it resolves this issue - no laundromat threads are
created, and the reported load average is as expected, not falsely
high.

This appears to fully fix the issue in my case. Thank you very much!

2023-09-19 16:39:24

by Steve French

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

Minor updates (pointed out by Paulo) to patch. See attached.

--
Thanks,

Steve


Attachments:
0001-smb3-do-not-start-laundromat-thread-when-dir-leases.patch (4.84 kB)

2023-09-19 18:12:29

by Tom Talpey

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

On 9/19/2023 9:38 AM, Steve French wrote:
> Minor updates (pointed out by Paulo) to patch. See attached.

So, was the thread crashing before??

+	if (cfids == NULL)
+		return;
+
 	spin_lock(&cfids->cfid_list_lock);


This if/else, IMO...

@@ -2492,7 +2493,10 @@ cifs_get_tcon(struct cifs_ses *ses, struct smb3_fs_context *ctx)
 		goto out_fail;
 	}

-	tcon = tconInfoAlloc();
+	if (ses->server->capabilities & SMB2_GLOBAL_CAP_DIRECTORY_LEASING)
+		tcon = tcon_info_alloc(true);
+	else
+		tcon = tcon_info_alloc(false);

would be more readable as...

	tcon = tcon_info_alloc((ses->server->capabilities &
				SMB2_GLOBAL_CAP_DIRECTORY_LEASING) != 0);
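
A caution when folding a flag test into a boolean argument in C: `!=` binds more tightly than `&`, so the mask needs its own parentheses, otherwise the expression degenerates into `capabilities & 1`. A small userspace sketch of the difference follows. The flag value is assumed from the MS-SMB2 capability bits rather than copied from the patch, and both helper names are purely illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: the directory-leasing capability bit from MS-SMB2. */
#define SMB2_GLOBAL_CAP_DIRECTORY_LEASING 0x00000020u

/*
 * How the compiler parses "caps & FLAG != 0": the comparison runs first
 * and yields 1, so this tests bit 0 instead of the leasing flag.
 */
static inline bool has_dir_leasing_as_parsed(uint32_t caps)
{
	return (caps & (SMB2_GLOBAL_CAP_DIRECTORY_LEASING != 0)) != 0;
}

/* Mask first, then compare: the intended test. */
static inline bool has_dir_leasing(uint32_t caps)
{
	return (caps & SMB2_GLOBAL_CAP_DIRECTORY_LEASING) != 0;
}
```

With only the leasing bit set, the first helper reports false and the second reports true, which is why the parenthesized form is the one to use.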


These changes are good, but I'm skeptical they will reduce the load
when the laundromat thread is actually running. All these do is avoid
creating it when not necessary, right?

Acked-by: Tom Talpey <[email protected]>

Tom.


2023-09-19 18:25:09

by Steve French

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

On Tue, Sep 19, 2023 at 1:07 PM Tom Talpey <[email protected]> wrote:
>
> On 9/19/2023 9:38 AM, Steve French wrote:
> > Minor updates (pointed out by Paulo) to patch. See attached.
>
> So, was the thread crashing before??
>
> + if (cfids == NULL)
> + return;
> +

If the laundromat is not initialized, cfids can be NULL, so we need to
check whether cfids is initialized in a few places. The check was added
to avoid an oops at unmount when the laundromat was never started; it
may also help in a few corner cases if there is a race while closing
the laundromat thread at umount.

> These changes are good, but I'm skeptical they will reduce the load
> when the laundromat thread is actually running. All these do is avoid
> creating it when not necessary, right?

It does create half as many laundromat threads even in the Windows
server case (we don't need a laundromat on the connection to IPC$),
but it helps more when the server doesn't support directory leases.

--
Thanks,

Steve

2023-09-19 21:43:11

by Brian Pardy

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

On Tue, Sep 19, 2023 at 12:39 PM Steve French <[email protected]> wrote:
>
> Minor updates (pointed out by Paulo) to patch. See attached.

I can't comment on the updates between the two versions of the patch,
but I can confirm that the second patch also works as expected to
resolve this issue on my system.

-Brian



2023-09-27 02:06:05

by Paul Aurich

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

On 2023-09-19 13:23:44 -0500, Steve French wrote:
>On Tue, Sep 19, 2023 at 1:07 PM Tom Talpey <[email protected]> wrote:
>> These changes are good, but I'm skeptical they will reduce the load
>> when the laundromat thread is actually running. All these do is avoid
>> creating it when not necessary, right?
>
>It does create half as many laundromat threads (we don't need
>laundromat on connection to IPC$) even for the Windows server target
>example, but helps more for cases where server doesn't support
>directory leases.

Perhaps the laundromat thread should be using msleep_interruptible()?

Using an interruptible sleep appears to prevent the thread from contributing
to the load average, and has the happy side-effect of removing the up-to-1s delay
when tearing down the tcon (since a7c01fa93ae, the interruptible sleep returns
early when kthread_stop() is called).

~Paul

2023-10-05 16:11:29

by Dr. Bernd Feige

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

Am Dienstag, dem 26.09.2023 um 17:54 -0700 schrieb Paul Aurich:
> Perhaps the laundromat thread should be using msleep_interruptible()?
>
> Using an interruptible sleep appears to prevent the thread from contributing
> to the load average, and has the happy side-effect of removing the up-to-1s delay
> when tearing down the tcon (since a7c01fa93ae, the interruptible sleep returns
> early when kthread_stop() is called).

Sorry for chiming in so late - I'm also on Gentoo (kernel
6.5.5-gentoo), but as a client of Windows AD.

Just want to emphasize that using uninterruptible sleep has not just
unhappy but devastating side-effects.

I have 8 processors and 16 cifsd-cfid-laundromat processes, so
/proc/loadavg reports a load average of 16 on a totally idle system.

This means that load-dependent tools will never start additional
tasks on this system: "make -l" is affected, as is any other
load-balancing software. Just reducing the number of
cifsd-cfid-laundromat processes does not fix this - even a single one
makes loadavg report a wrong result for load balancing.
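
To spell out why idle threads register as load: the kernel's load average counts runnable tasks plus tasks in uninterruptible sleep ("D" state), while interruptibly sleeping tasks are not counted. Below is a simplified userspace model of that counting rule. The enum and function names are illustrative rather than kernel identifiers, and the real calculation additionally applies exponential damping over time.

```c
#include <stddef.h>

/* Illustrative task states; not the kernel's own definitions. */
enum task_state_model {
	MODEL_RUNNING,         /* on a CPU or waiting for one */
	MODEL_INTERRUPTIBLE,   /* "S": sleeping, can be woken by signals */
	MODEL_UNINTERRUPTIBLE, /* "D": sleeping, not woken by signals */
};

/* Count the tasks that the load average treats as active. */
static inline size_t count_load_contributors(const enum task_state_model *tasks,
					     size_t n)
{
	size_t active = 0;

	for (size_t i = 0; i < n; i++) {
		if (tasks[i] == MODEL_RUNNING ||
		    tasks[i] == MODEL_UNINTERRUPTIBLE)
			active++;
	}
	return active;
}
```

Sixteen threads in the uninterruptible state on an otherwise idle machine give a count of 16 in this model; the same sixteen threads in interruptible sleep give 0.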

So, if cifsd-cfid-laundromat must really be uninterruptible, the only
solution would be to change the way loadavg is computed by the kernel
to exclude uninterruptible but sleeping processes. But must it be
uninterruptible?

Thanks and best regards,
Bernd

2023-10-13 23:22:41

by matoro

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

This is a huge problem here as well, as a client to Samba using SMB1
(for Unix extensions).

For others encountering this problem, I was able to work around it with
the following snippet:

diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index 2d5e9a9d5b8b..fc2caccb597a 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -576,7 +576,7 @@ cifs_cfids_laundromat_thread(void *p)
 	struct list_head entry;

 	while (!kthread_should_stop()) {
-		ssleep(1);
+		msleep_interruptible(1000);
 		INIT_LIST_HEAD(&entry);
 		if (kthread_should_stop())
 			return 0;
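
The second benefit of an interruptible wait, prompt shutdown, can be mimicked in userspace. The sketch below is an analogy only, not kernel code: a hypothetical `periodic_worker` waits on a POSIX condition variable with a deadline instead of sleeping for a fixed interval, so a stop request wakes it immediately rather than after the remainder of the period. All names are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for a periodic kernel thread. */
struct periodic_worker {
	pthread_t thread;
	pthread_mutex_t lock;
	pthread_cond_t wake;
	bool stop;            /* set by worker_stop(), protected by lock */
	long period_ms;       /* time between cleanup passes */
	unsigned long passes; /* completed cleanup passes */
};

static void *worker_main(void *arg)
{
	struct periodic_worker *w = arg;

	pthread_mutex_lock(&w->lock);
	while (!w->stop) {
		struct timespec deadline;

		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += w->period_ms / 1000;
		deadline.tv_nsec += (w->period_ms % 1000) * 1000000L;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec++;
			deadline.tv_nsec -= 1000000000L;
		}

		/* Wait for the deadline, but wake at once on a stop request. */
		while (!w->stop &&
		       pthread_cond_timedwait(&w->wake, &w->lock, &deadline) == 0)
			;

		if (!w->stop)
			w->passes++; /* the periodic cleanup would run here */
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

static inline int worker_start(struct periodic_worker *w, long period_ms)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->wake, NULL);
	w->stop = false;
	w->period_ms = period_ms;
	w->passes = 0;
	return pthread_create(&w->thread, NULL, worker_main, w);
}

static inline void worker_stop(struct periodic_worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->stop = true;
	pthread_cond_signal(&w->wake);
	pthread_mutex_unlock(&w->lock);
	pthread_join(w->thread, NULL);
}
```

A worker started with a 5000 ms period and stopped right away should join almost immediately instead of waiting out the period, which is the same property the interruptible sleep gives the laundromat thread at tcon teardown.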

2023-10-14 00:01:30

by Paulo Alcantara

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

Could you please try two commits[1][2] from for-next?

[1] https://git.samba.org/?p=sfrench/cifs-2.6.git;a=commit;h=e95f3f74465072c2545d8e65a3c3a96e37129cf8
[2] https://git.samba.org/?p=sfrench/cifs-2.6.git;a=commit;h=81ba10959970d15c388bf29866b01b62f387e6a3


2023-10-14 00:01:32

by Paulo Alcantara

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

You probably want these two as well

https://git.samba.org/?p=sfrench/cifs-2.6.git;a=commit;h=2da338ff752a2789470d733111a5241f30026675

https://git.samba.org/?p=sfrench/cifs-2.6.git;a=commit;h=3b8bb3171571f92eda863e5f78b063604c61f72a

Directory leases aren't supported in SMB1, so with these commits no system resources are wasted on running those kthreads.
