2023-10-14 00:59:56

by matoro

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in "D" state)

On 2023-10-13 20:13, Steve French wrote:
> Let me know if those fixes help, as two of them have not been sent to
> Linus yet, but I could send them tomorrow.
>
> On Fri, Oct 13, 2023, 19:01 Paulo Alcantara <[email protected]> wrote:
>
>> You probably want these two as well
>>
>>
>> https://git.samba.org/?p=sfrench/cifs-2.6.git;a=commit;h=2da338ff752a2789470d733111a5241f30026675
>>
>>
>> https://git.samba.org/?p=sfrench/cifs-2.6.git;a=commit;h=3b8bb3171571f92eda863e5f78b063604c61f72a
>>
>> as directory leases aren't supported in SMB1, so no system resources
>> are wasted by having those kthreads running.
>>
>> On 13 October 2023 20:52:11 GMT-03:00, Paulo Alcantara <[email protected]> wrote:
>> >Could you please try two commits[1][2] from for-next?
>> >
>> >[1] https://git.samba.org/?p=sfrench/cifs-2.6.git;a=commit;h=e95f3f74465072c2545d8e65a3c3a96e37129cf8
>> >[2] https://git.samba.org/?p=sfrench/cifs-2.6.git;a=commit;h=81ba10959970d15c388bf29866b01b62f387e6a3
>> >
>> >On 13 October 2023 20:19:37 GMT-03:00, matoro <[email protected]> wrote:
>> >>On 2023-10-05 05:55, Dr. Bernd Feige wrote:
>> >>> On Tuesday, 26.09.2023 at 17:54 -0700, Paul Aurich wrote:
>> >>>> Perhaps the laundromat thread should be using msleep_interruptible()?
>> >>>>
>> >>>> Using an interruptible sleep appears to prevent the thread from
>> >>>> contributing to the load average, and has the happy side-effect of
>> >>>> removing the up-to-1s delay when tearing down the tcon (since
>> >>>> a7c01fa93ae, the sleep will return early when triggered by
>> >>>> kthread_stop()).
>> >>>
>> >>> Sorry for chiming in so late - I'm also on gentoo (kernel
>> >>> 6.5.5-gentoo), but as a client of Windows AD.
>> >>>
>> >>> Just want to emphasize that using uninterruptible sleep has not just
>> >>> unhappy but devastating side-effects.
>> >>>
>> >>> I have 8 processors and 16 cifsd-cfid-laundromat processes, so
>> >>> /proc/loadavg reports a load average of 16 on a totally idle system.
>> >>>
>> >>> This means that load-dependent software will never start additional
>> >>> tasks on this system: "make -l", but also any other tool that
>> >>> schedules work based on load. Just reducing the number of
>> >>> cifsd-cfid-laundromat processes does not fix this - even a single
>> >>> one makes loadavg report a wrong result for load balancing.
>> >>>
>> >>> So, if cifsd-cfid-laundromat must really be uninterruptible, the only
>> >>> solution would be to change the way loadavg is computed by the kernel
>> >>> to exclude uninterruptible but sleeping processes. But must it be
>> >>> uninterruptible?
>> >>>
>> >>> Thanks and best regards,
>> >>> Bernd
>> >>
>> >>This is a huge problem here as well, as a client to Samba using SMB1
>> >>(for Unix extensions).
>> >>
>> >>For others encountering this problem, I was able to work around it with
>> >>the following snippet:
>> >>
>> >>diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
>> >>index 2d5e9a9d5b8b..fc2caccb597a 100644
>> >>--- a/fs/smb/client/cached_dir.c
>> >>+++ b/fs/smb/client/cached_dir.c
>> >>@@ -576,7 +576,7 @@ cifs_cfids_laundromat_thread(void *p)
>> >> 	struct list_head entry;
>> >>
>> >> 	while (!kthread_should_stop()) {
>> >>-		ssleep(1);
>> >>+		msleep_interruptible(1000);
>> >> 		INIT_LIST_HEAD(&entry);
>> >> 		if (kthread_should_stop())
>> >> 			return 0;
>>

Do you have backports of these to 6.5? I tried to do it manually, but
there are already so many changes between 6.5 and these commits.
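
For anyone wondering why 16 sleeping threads show up as a load of 16 in the quoted report: the Linux load average counts tasks in uninterruptible sleep (state "D") together with runnable ones, and folds that count into an exponentially damped average roughly every 5 seconds. Below is a rough floating-point sketch of that averaging. The kernel itself uses fixed-point arithmetic, and the function name and parameters here are made up for illustration only.

```python
import math


def simulate_loadavg(active_tasks, seconds, window=60.0, interval=5.0, start=0.0):
    """Approximate a load-average value after `seconds` of constant activity.

    `active_tasks` stands for the number of runnable plus uninterruptible
    tasks; `window` is the averaging window in seconds (60 for the 1-minute
    figure). This is a simplified model, not the kernel's fixed-point code.
    """
    decay = math.exp(-interval / window)
    load = start
    for _ in range(int(seconds // interval)):
        # Each sample pulls the average a little closer to the task count.
        load = load * decay + active_tasks * (1.0 - decay)
    return load


if __name__ == "__main__":
    # 16 threads permanently in "D" state on an otherwise idle machine.
    for minutes in (1, 5, 15):
        print(minutes, "min:", round(simulate_loadavg(16, minutes * 60), 2))
```

In this model the 1-minute figure climbs towards 16 and stays there for as long as the threads sleep uninterruptibly, no matter how idle the CPUs are.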


2023-10-16 18:41:50

by Paulo Alcantara

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in "D" state)

matoro <[email protected]> writes:

> Do you have backports of these to 6.5? I tried to do it manually, but
> there are already so many changes between 6.5 and these commits.

Please find attached two patches that should fix your SMB1 case. They
applied cleanly on top of the v6.5.y branch.

Let me know if they work for you and then I'll ask the stable team to
pick them up.


Attachments:
0001-smb3-do-not-start-laundromat-thread-when-dir-leases-.patch (5.02 kB)
0002-smb-client-do-not-start-laundromat-thread-on-nohandl.patch (2.34 kB)

2023-10-20 14:46:07

by Dr. Bernd Feige

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in "D" state)

On Monday, 16.10.2023 at 15:41 -0300, Paulo Alcantara wrote:
> matoro <[email protected]> writes:
>
> > Do you have backports of these to 6.5? I tried to do it manually, but
> > there are already so many changes between 6.5 and these commits.
>
> Please find attached two patches that should fix your SMB1 case. They
> applied cleanly on top of v6.5.y branch.
>
> Let me know if it works for you and then I'll ask stable team to pick
> those up.

Thanks!
I can confirm that the patches apply cleanly on 6.5.8 and help a lot
with the issue here (vers=3.1.1, gentoo client, MS AD server with DFS).
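
For others who want to check the same thing on their own machine, here is a small sketch that lists tasks currently in uninterruptible sleep by reading /proc/<pid>/stat. It assumes a Linux procfs mounted at /proc. The helper names are invented for this example, and filtering on the "cifsd-cfid-laun" prefix is only a guess at how the thread name appears on a given kernel.

```python
import os


def parse_stat(stat_text):
    """Return (comm, state) from the contents of a /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so cut at the last ')' instead of splitting on spaces.
    start = stat_text.index("(")
    end = stat_text.rindex(")")
    comm = stat_text[start + 1:end]
    state = stat_text[end + 1:].split()[0]
    return comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every top-level task whose state is 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the task exited meanwhile, or the file was unreadable
        if state == "D":
            found.append((int(entry), comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        marker = "  <-- laundromat?" if comm.startswith("cifsd-cfid-laun") else ""
        print(pid, comm + marker)
```

Running it before and after applying the patches, with the share mounted and the system idle, should show whether laundromat threads still sit in "D" state. Note that it only looks at the entries directly under /proc, not at the per-thread entries in /proc/<pid>/task.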

2023-10-23 14:07:54

by Paulo Alcantara

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in "D" state)

"Dr. Bernd Feige" <[email protected]> writes:

> I can confirm that the patches apply cleanly on 6.5.8 and help a lot
> with the issue here (vers=3.1.1, gentoo client, MS AD server with DFS).

Thanks for testing it.

Steve, I would suggest the commits below for v6.5.y:

238b351d0935 ("smb3: allow controlling length of time directory entries are cached with dir leases")
6a50d71d0fff ("smb3: allow controlling maximum number of cached directories")
2da338ff752a ("smb3: do not start laundromat thread when dir leases disabled")
e95f3f744650 ("smb: client: make laundromat a delayed worker")
81ba10959970 ("smb: client: prevent new fids from being removed by laundromat")

If OK, please ask the stable team to pick those up.

2023-10-24 02:52:05

by Steve French

Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in "D" state)

We probably want this one as well ...

commit 2da338ff752a2789470d733111a5241f30026675
Author: Steve French <[email protected]>
Date: Tue Sep 19 11:35:53 2023 -0500

    smb3: do not start laundromat thread when dir leases disabled

    When no directory lease support, or for IPC shares where directories
    can not be opened, do not start an unneeded laundromat thread for
    that mount (it wastes resources).

    Fixes: d14de8067e3f ("cifs: Add a laundromat thread for cached directories")
    Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
    Acked-by: Tom Talpey <[email protected]>
    Signed-off-by: Steve French <[email protected]>



Any objections to adding that one to the list as well? The patches
all seem to apply fine to the current 6.5-stable-rc.

On Mon, Oct 23, 2023 at 9:07 AM Paulo Alcantara <[email protected]> wrote:
>
> "Dr. Bernd Feige" <[email protected]> writes:
>
> > I can confirm that the patches apply cleanly on 6.5.8 and help a lot
> > with the issue here (vers=3.1.1, gentoo client, MS AD server with DFS).
>
> Thanks for testing it.
>
> Steve, I would suggest the commits below for v6.5.y:
>
> 238b351d0935 ("smb3: allow controlling length of time directory entries are cached with dir leases")
> 6a50d71d0fff ("smb3: allow controlling maximum number of cached directories")
> 2da338ff752a ("smb3: do not start laundromat thread when dir leases disabled")
> e95f3f744650 ("smb: client: make laundromat a delayed worker")
> 81ba10959970 ("smb: client: prevent new fids from being removed by laundromat")
>
> If OK, please ask the stable team to pick those up.



--
Thanks,

Steve