Hi,
starting with 5.4.188 we see a massive performance regression on our
NFS server: it serves requests very slowly, with CPU utilization at
100% (with 5.4.187 and earlier it is about 10%), so that it is
unusable as a fileserver.
The culprits are the following commits (or one of them):
c32f1041382a88b17da5736886da4a492353a1bb "nfsd: cleanup
nfsd_file_lru_dispose()"
628adfa21815f74c04724abc85847f24b5dd1645 "nfsd: Containerise filecache
laundrette"
(upstream 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63 and
9542e6a643fc69d528dfb3303f145719c61d3050)
If I revert them in v5.4.192, the kernel works as before and
performance is OK again.
I did not try to revert them one by one, as any disruption of our NFS
server is a severe problem for us, and I'm not sure whether they are
related.
5.10 and 5.15 have both always performed very badly on our NFS server
in a similar way, so we were stuck on 5.4.
I now think this is because of 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63
and/or 9542e6a643fc69d528dfb3303f145719c61d3050, though I have not yet
tried reverting them in 5.15.
Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
On Wed, May 11, 2022 at 12:03:13PM +0200, Wolfgang Walter wrote:
> Hi,
>
> starting with 5.4.188 we see a massive performance regression on our
> NFS server: it serves requests very slowly, with CPU utilization at
> 100% (with 5.4.187 and earlier it is about 10%), so that it is
> unusable as a fileserver.
[...]
> I now think this is because of 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63
> and/or 9542e6a643fc69d528dfb3303f145719c61d3050, though I have not yet
> tried reverting them in 5.15.
Odds are 5.18-rc6 is also a problem?
If so, I'll just wait for the fix to get into Linus's tree as this does
not seem to be a stable-tree-only issue.
thanks,
greg k-h
> On May 11, 2022, at 8:38 AM, Greg KH <[email protected]> wrote:
>
> On Wed, May 11, 2022 at 12:03:13PM +0200, Wolfgang Walter wrote:
>> Hi,
>>
>> starting with 5.4.188 we see a massive performance regression on our
>> NFS server: it serves requests very slowly, with CPU utilization at
>> 100% (with 5.4.187 and earlier it is about 10%), so that it is
>> unusable as a fileserver.
[...]
>
> Odds are 5.18-rc6 is also a problem?
We believe that
6b8a94332ee4 ("nfsd: Fix a write performance regression")
addresses the performance regression. It was merged into 5.18-rc.
> If so, I'll just wait for the fix to get into Linus's tree as this does
> not seem to be a stable-tree-only issue.
Unfortunately I've received a recent report that the fix introduces
a "sleep while spinlock is held" for NFSv4.0 in rare cases.
--
Chuck Lever
On 2022-05-11 16:36, Chuck Lever III wrote:
>> On May 11, 2022, at 10:23 AM, Greg KH <[email protected]>
>> wrote:
>>
>> On Wed, May 11, 2022 at 02:16:19PM +0000, Chuck Lever III wrote:
>>>
>>>
>>>> On May 11, 2022, at 8:38 AM, Greg KH <[email protected]>
>>>> wrote:
>>>>
>>>> On Wed, May 11, 2022 at 12:03:13PM +0200, Wolfgang Walter wrote:
>>>>> Hi,
>>>>>
>>>>> starting with 5.4.188 we see a massive performance regression on
>>>>> our NFS server: it serves requests very slowly, with CPU
>>>>> utilization at 100% (with 5.4.187 and earlier it is about 10%),
>>>>> so that it is unusable as a fileserver.
[...]
>>>>
>>>> Odds are 5.18-rc6 is also a problem?
>>>
>>> We believe that
>>>
>>> 6b8a94332ee4 ("nfsd: Fix a write performance regression")
>>>
>>> addresses the performance regression. It was merged into 5.18-rc.
>>
>> And into 5.17.4 if someone wants to try that release.
>
> I don't have a lot of time to backport this one myself, so
> I welcome anyone who wants to apply that commit to their
> favorite LTS kernel and test it for us.
>
>
>>>> If so, I'll just wait for the fix to get into Linus's tree as this
>>>> does
>>>> not seem to be a stable-tree-only issue.
>>>
>>> Unfortunately I've received a recent report that the fix introduces
>>> a "sleep while spinlock is held" for NFSv4.0 in rare cases.
>>
>> Ick, not good, any potential fixes for that?
>
> Not yet. I was at LSF last week, so I've just started digging
> into this one. I've confirmed that the report is a real bug,
> but we still don't know how hard it is to hit it with real
> workloads.
>
>
> --
> Chuck Lever
This may be unrelated: when I used 5.4.188 for the first time, I got this:
[Mon Apr 11 09:20:23 2022] ------------[ cut here ]------------
[Mon Apr 11 09:20:23 2022] refcount_t: underflow; use-after-free.
[Mon Apr 11 09:20:23 2022] WARNING: CPU: 18 PID: 7443 at
lib/refcount.c:190 refcount_sub_and_test_checked+0x48/0x50
[Mon Apr 11 09:20:23 2022] Modules linked in: rpcsec_gss_krb5(E) msr(E)
8021q(E) garp(E) stp(E) mrp(E) llc(E) dm_cache_smq(E) binfmt_misc(E)
dm_cache(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E)
ipmi_ssif(E) intel_rapl_msr(E) intel_rapl_common(E) sb_edac(E)
x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E)
kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ast(E)
snd_pcm(E) drm_vram_helper(E) ttm(E) ghash_clmulni_intel(E) snd_timer(E)
rapl(E) snd(E) soundcore(E) drm_kms_helper(E) intel_cstate(E) pcspkr(E)
iTCO_wdt(E) intel_uncore(E) drm(E) iTCO_vendor_support(E) mei_me(E)
joydev(E) i2c_algo_bit(E) watchdog(E) mei(E) ioatdma(E) evdev(E) sg(E)
ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_power_meter(E)
acpi_pad(E) button(E) poly1305_x86_64(E) poly1305_generic(E)
sha256_ssse3(E) sha1_ssse3(E) chacha_generic(E) chacha20poly1305(E)
aesni_intel(E) crypto_simd(E) glue_helper(E) cryptd(E) drbd(E)
lru_cache(E) nfsd(E) nfs_acl(E) lockd(E) grace(E) auth_rpcgss(E)
sunrpc(E)
[Mon Apr 11 09:20:23 2022] ip_tables(E) x_tables(E) autofs4(E) ext4(E)
crc16(E) mbcache(E) jbd2(E) dm_mod(E) raid10(E) raid456(E)
async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E)
async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) raid0(E)
multipath(E) linear(E) ses(E) enclosure(E) hid_generic(E) usbhid(E)
hid(E) raid1(E) md_mod(E) sd_mod(E) crc32c_intel(E) ixgbe(E) ahci(E)
xhci_pci(E) libahci(E) ehci_pci(E) xfrm_algo(E) mpt3sas(E) xhci_hcd(E)
ehci_hcd(E) dca(E) raid_class(E) libata(E) libphy(E) usbcore(E)
scsi_transport_sas(E) lpc_ich(E) ptp(E) i2c_i801(E) mfd_core(E)
usb_common(E) pps_core(E) scsi_mod(E) mdio(E) wmi(E)
[Mon Apr 11 09:20:23 2022] CPU: 18 PID: 7443 Comm: kworker/u50:1
Tainted: G W E 5.4.188-debian64.all+1.2 #1
[Mon Apr 11 09:20:23 2022] Hardware name: Supermicro X10DRi/X10DRI-T,
BIOS 1.1a 10/16/2015
[Mon Apr 11 09:20:23 2022] Workqueue: rpciod rpc_async_schedule [sunrpc]
[Mon Apr 11 09:20:23 2022] RIP:
0010:refcount_sub_and_test_checked+0x48/0x50
[Mon Apr 11 09:20:23 2022] Code: 31 e4 44 89 e0 41 5c c3 eb f8 44 0f b6
25 7c ee ce 00 45 84 e4 75 e8 48 c7 c7 80 97 8b b1 c6 05 69 ee ce 00 01
e8 78 3f 3a 00 <0f> 0b eb d4 0f 1f 40 00 48 89 fe bf 01 00 00 00 eb a6
66 0f 1f 44
[Mon Apr 11 09:20:23 2022] RSP: 0018:ffffad8a8d01fe48 EFLAGS: 00010286
[Mon Apr 11 09:20:23 2022] RAX: 0000000000000000 RBX: 0000000000002a81
RCX: 0000000000000000
[Mon Apr 11 09:20:23 2022] RDX: 0000000000000001 RSI: 0000000000000082
RDI: 00000000ffffffff
[Mon Apr 11 09:20:23 2022] RBP: ffff9fb4a2394400 R08: 0000000000000001
R09: 000000000000085b
[Mon Apr 11 09:20:23 2022] R10: 0000000000030f5c R11: 0000000000000004
R12: 0000000000000000
[Mon Apr 11 09:20:23 2022] R13: ffff9feaf358d400 R14: 0000000000000000
R15: ffff9fb7e1b25148
[Mon Apr 11 09:20:23 2022] FS: 0000000000000000(0000)
GS:ffff9feaff980000(0000) knlGS:0000000000000000
[Mon Apr 11 09:20:23 2022] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[Mon Apr 11 09:20:23 2022] CR2: 00007ffe62ca3080 CR3: 000000350880a002
CR4: 00000000001606e0
[Mon Apr 11 09:20:23 2022] Call Trace:
[Mon Apr 11 09:20:23 2022] nfsd4_cb_offload_release+0x16/0x30 [nfsd]
[Mon Apr 11 09:20:23 2022] rpc_free_task+0x39/0x70 [sunrpc]
[Mon Apr 11 09:20:23 2022] rpc_async_schedule+0x29/0x40 [sunrpc]
[Mon Apr 11 09:20:23 2022] process_one_work+0x1ab/0x390
[Mon Apr 11 09:20:23 2022] worker_thread+0x50/0x3c0
[Mon Apr 11 09:20:23 2022] ? rescuer_thread+0x370/0x370
[Mon Apr 11 09:20:23 2022] kthread+0x13c/0x160
[Mon Apr 11 09:20:23 2022] ? __kthread_bind_mask+0x60/0x60
[Mon Apr 11 09:20:23 2022] ret_from_fork+0x1f/0x40
[Mon Apr 11 09:20:23 2022] ---[ end trace c58c1aaca9be5d21 ]---
[Mon Apr 11 09:56:32 2022] perf: interrupt took too long (6293 > 6228),
lowering kernel.perf_event_max_sample_rate to 31750
[Mon Apr 11 09:59:17 2022] ------------[ cut here ]------------
[Mon Apr 11 09:59:17 2022] nfsd4_process_open2 failed to open
newly-created file! status=10008
[Mon Apr 11 09:59:17 2022] WARNING: CPU: 2 PID: 3061 at
fs/nfsd/nfs4proc.c:456 nfsd4_open+0x39a/0x710 [nfsd]
[Mon Apr 11 09:59:17 2022] Modules linked in: rpcsec_gss_krb5(E) msr(E)
8021q(E) garp(E) stp(E) mrp(E) llc(E) dm_cache_smq(E) binfmt_misc(E)
dm_cache(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E)
ipmi_ssif(E) intel_rapl_msr(E) intel_rapl_common(E) sb_edac(E)
x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E)
kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ast(E)
snd_pcm(E) drm_vram_helper(E) ttm(E) ghash_clmulni_intel(E) snd_timer(E)
rapl(E) snd(E) soundcore(E) drm_kms_helper(E) intel_cstate(E) pcspkr(E)
iTCO_wdt(E) intel_uncore(E) drm(E) iTCO_vendor_support(E) mei_me(E)
joydev(E) i2c_algo_bit(E) watchdog(E) mei(E) ioatdma(E) evdev(E) sg(E)
ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_power_meter(E)
acpi_pad(E) button(E) poly1305_x86_64(E) poly1305_generic(E)
sha256_ssse3(E) sha1_ssse3(E) chacha_generic(E) chacha20poly1305(E)
aesni_intel(E) crypto_simd(E) glue_helper(E) cryptd(E) drbd(E)
lru_cache(E) nfsd(E) nfs_acl(E) lockd(E) grace(E) auth_rpcgss(E)
sunrpc(E)
[Mon Apr 11 09:59:17 2022] ip_tables(E) x_tables(E) autofs4(E) ext4(E)
crc16(E) mbcache(E) jbd2(E) dm_mod(E) raid10(E) raid456(E)
async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E)
async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) raid0(E)
multipath(E) linear(E) ses(E) enclosure(E) hid_generic(E) usbhid(E)
hid(E) raid1(E) md_mod(E) sd_mod(E) crc32c_intel(E) ixgbe(E) ahci(E)
xhci_pci(E) libahci(E) ehci_pci(E) xfrm_algo(E) mpt3sas(E) xhci_hcd(E)
ehci_hcd(E) dca(E) raid_class(E) libata(E) libphy(E) usbcore(E)
scsi_transport_sas(E) lpc_ich(E) ptp(E) i2c_i801(E) mfd_core(E)
usb_common(E) pps_core(E) scsi_mod(E) mdio(E) wmi(E)
[Mon Apr 11 09:59:17 2022] CPU: 2 PID: 3061 Comm: nfsd Tainted: G
W E 5.4.188-debian64.all+1.2 #1
[Mon Apr 11 09:59:17 2022] Hardware name: Supermicro X10DRi/X10DRI-T,
BIOS 1.1a 10/16/2015
[Mon Apr 11 09:59:17 2022] RIP: 0010:nfsd4_open+0x39a/0x710 [nfsd]
[Mon Apr 11 09:59:17 2022] Code: e8 db 0f 01 00 41 89 c5 85 c0 0f 84 5e
01 00 00 80 bd 15 01 00 00 00 74 13 44 89 ee 48 c7 c7 a8 08 6f c0 0f ce
e8 b6 f8 ae f0 <0f> 0b 4d 85 ff 0f 84 58 fd ff ff 4d 39 fc 0f 84 4f fd
ff ff 4c 89
[Mon Apr 11 09:59:17 2022] RSP: 0018:ffffad8ab09c7db8 EFLAGS: 00010286
[Mon Apr 11 09:59:17 2022] RAX: 0000000000000000 RBX: ffff9feaaf7cb000
RCX: 0000000000000000
[Mon Apr 11 09:59:17 2022] RDX: 0000000000000001 RSI: 0000000000000082
RDI: 00000000ffffffff
[Mon Apr 11 09:59:17 2022] RBP: ffff9feaae54f460 R08: 0000000000000001
R09: 000000000000087a
[Mon Apr 11 09:59:17 2022] R10: 0000000000031e00 R11: 0000000000000004
R12: ffff9feaae550070
[Mon Apr 11 09:59:17 2022] R13: 0000000018270000 R14: ffff9feaae5ae000
R15: ffff9fc07ccfd400
[Mon Apr 11 09:59:17 2022] FS: 0000000000000000(0000)
GS:ffff9fcb1f880000(0000) knlGS:0000000000000000
[Mon Apr 11 09:59:17 2022] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[Mon Apr 11 09:59:17 2022] CR2: 00007fdae60db000 CR3: 00000037e87fe005
CR4: 00000000001606e0
[Mon Apr 11 09:59:17 2022] Call Trace:
[Mon Apr 11 09:59:17 2022] nfsd4_proc_compound+0x45d/0x730 [nfsd]
[Mon Apr 11 09:59:17 2022] nfsd_dispatch+0xc1/0x200 [nfsd]
[Mon Apr 11 09:59:17 2022] svc_process_common+0x399/0x6e0 [sunrpc]
[Mon Apr 11 09:59:17 2022] ? svc_recv+0x312/0x9f0 [sunrpc]
[Mon Apr 11 09:59:17 2022] ? nfsd_svc+0x2f0/0x2f0 [nfsd]
[Mon Apr 11 09:59:17 2022] ? nfsd_destroy+0x60/0x60 [nfsd]
[Mon Apr 11 09:59:17 2022] svc_process+0xd4/0x110 [sunrpc]
[Mon Apr 11 09:59:17 2022] nfsd+0xed/0x150 [nfsd]
[Mon Apr 11 09:59:17 2022] kthread+0x13c/0x160
[Mon Apr 11 09:59:17 2022] ? __kthread_bind_mask+0x60/0x60
[Mon Apr 11 09:59:17 2022] ret_from_fork+0x1f/0x40
[Mon Apr 11 09:59:17 2022] ---[ end trace c58c1aaca9be5d22 ]---
I never saw this with an earlier kernel (and we have been using 5.4 for
a long time).
I did not see it with unpatched 5.4.192, though, which I tested before
running 5.4.192 with the above-mentioned patches reverted.
Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
> On May 11, 2022, at 10:23 AM, Greg KH <[email protected]> wrote:
>
> On Wed, May 11, 2022 at 02:16:19PM +0000, Chuck Lever III wrote:
>>
>>
>>> On May 11, 2022, at 8:38 AM, Greg KH <[email protected]> wrote:
>>>
>>> On Wed, May 11, 2022 at 12:03:13PM +0200, Wolfgang Walter wrote:
>>>> Hi,
>>>>
>>>> starting with 5.4.188 we see a massive performance regression on our
>>>> NFS server: it serves requests very slowly, with CPU utilization at
>>>> 100% (with 5.4.187 and earlier it is about 10%), so that it is
>>>> unusable as a fileserver.
[...]
>>>
>>> Odds are 5.18-rc6 is also a problem?
>>
>> We believe that
>>
>> 6b8a94332ee4 ("nfsd: Fix a write performance regression")
>>
>> addresses the performance regression. It was merged into 5.18-rc.
>
> And into 5.17.4 if someone wants to try that release.
I don't have a lot of time to backport this one myself, so
I welcome anyone who wants to apply that commit to their
favorite LTS kernel and test it for us.
>>> If so, I'll just wait for the fix to get into Linus's tree as this does
>>> not seem to be a stable-tree-only issue.
>>
>> Unfortunately I've received a recent report that the fix introduces
>> a "sleep while spinlock is held" for NFSv4.0 in rare cases.
>
> Ick, not good, any potential fixes for that?
Not yet. I was at LSF last week, so I've just started digging
into this one. I've confirmed that the report is a real bug,
but we still don't know how hard it is to hit it with real
workloads.
--
Chuck Lever
On 2022-05-11 17:50, Chuck Lever III wrote:
>> On May 11, 2022, at 11:30 AM, Wolfgang Walter <[email protected]> wrote:
>>
>> On 2022-05-11 16:36, Chuck Lever III wrote:
>>>> On May 11, 2022, at 10:23 AM, Greg KH <[email protected]>
>>>> wrote:
>>>> On Wed, May 11, 2022 at 02:16:19PM +0000, Chuck Lever III wrote:
>>>>>> On May 11, 2022, at 8:38 AM, Greg KH <[email protected]>
>>>>>> wrote:
>>>>>> On Wed, May 11, 2022 at 12:03:13PM +0200, Wolfgang Walter wrote:
>>>>>>> Hi,
>>>>>>> starting with 5.4.188 we see a massive performance regression on
>>>>>>> our NFS server: it serves requests very slowly, with CPU
>>>>>>> utilization at 100% (with 5.4.187 and earlier it is about 10%),
>>>>>>> so that it is unusable as a fileserver.
[...]
>>>>>> Odds are 5.18-rc6 is also a problem?
>>>>> We believe that
>>>>> 6b8a94332ee4 ("nfsd: Fix a write performance regression")
>>>>> addresses the performance regression. It was merged into 5.18-rc.
>>>> And into 5.17.4 if someone wants to try that release.
>>> I don't have a lot of time to backport this one myself, so
>>> I welcome anyone who wants to apply that commit to their
>>> favorite LTS kernel and test it for us.
>>>>>> If so, I'll just wait for the fix to get into Linus's tree as this
>>>>>> does
>>>>>> not seem to be a stable-tree-only issue.
>>>>> Unfortunately I've received a recent report that the fix introduces
>>>>> a "sleep while spinlock is held" for NFSv4.0 in rare cases.
>>>> Ick, not good, any potential fixes for that?
>>> Not yet. I was at LSF last week, so I've just started digging
>>> into this one. I've confirmed that the report is a real bug,
>>> but we still don't know how hard it is to hit it with real
>>> workloads.
>>> --
>>> Chuck Lever
>>
>> This may be unrelated: when I used 5.4.188 for the first time, I got
>> this:
>>
>> [Mon Apr 11 09:20:23 2022] ------------[ cut here ]------------
>> [Mon Apr 11 09:20:23 2022] refcount_t: underflow; use-after-free.
>
> I don't believe this is the same issue as the performance
> regression. Hard to say more without knowing what the
> server's workload was when this occurred.
>
> The only two NFSD-related patches that appear between
> v5.4.187 and v5.4.188 are Trond's filecache patches. It's implausible
> that a callback reference count issue would be related.
Maybe it is because of the high CPU load with these two patches.
Usually there are only about 10 nfsd threads in the R state, even under
high load, and only for short periods; with these two patches there are
periods with 50 or more.
Our monitoring shows that the 1-minute load average is normally < 1;
with these two patches it was 18. So there is much more parallelism.
>
>
>> [Mon Apr 11 09:20:23 2022] WARNING: CPU: 18 PID: 7443 at
>> lib/refcount.c:190 refcount_sub_and_test_checked+0x48/0x50
>> [...]
>> [Mon Apr 11 09:20:23 2022] Call Trace:
>> [Mon Apr 11 09:20:23 2022] nfsd4_cb_offload_release+0x16/0x30 [nfsd]
>> [...]
>> [Mon Apr 11 09:56:32 2022] perf: interrupt took too long (6293 >
>> 6228), lowering kernel.perf_event_max_sample_rate to 31750
>> [Mon Apr 11 09:59:17 2022] ------------[ cut here ]------------
>
>
>
>> [Mon Apr 11 09:59:17 2022] nfsd4_process_open2 failed to open
>> newly-created file! status=10008
>
> This is definitely not the same problem. This is a failure
> to open a freshly created file. Maybe due to a memory shortage?
No, I don't think so; there was about 100 GB available.
> But this area has been a problem for a long time.
>
> I have some patches going into 5.19 that might help in this
> area by making NFSv4 OPEN(CREATE) atomic.
>
>
>> [Mon Apr 11 09:59:17 2022] WARNING: CPU: 2 PID: 3061 at
>> fs/nfsd/nfs4proc.c:456 nfsd4_open+0x39a/0x710 [nfsd]
>> [...]
>>
>> I never saw this with an earlier kernel (and we have been using 5.4
>> for a long time).
>>
>> I did not see it with unpatched 5.4.192, though, which I tested
>> before running 5.4.192 with the above-mentioned patches reverted.
>>
>> Regards,
>> --
>> Wolfgang Walter
>> Studentenwerk München
>> Anstalt des öffentlichen Rechts
>
> --
> Chuck Lever
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
> On May 11, 2022, at 11:30 AM, Wolfgang Walter <[email protected]> wrote:
>
> On 2022-05-11 16:36, Chuck Lever III wrote:
>>> On May 11, 2022, at 10:23 AM, Greg KH <[email protected]> wrote:
>>> On Wed, May 11, 2022 at 02:16:19PM +0000, Chuck Lever III wrote:
>>>>> On May 11, 2022, at 8:38 AM, Greg KH <[email protected]> wrote:
>>>>> On Wed, May 11, 2022 at 12:03:13PM +0200, Wolfgang Walter wrote:
>>>>>> Hi,
>>>>>> starting with 5.4.188 we see a massive performance regression on
>>>>>> our NFS server: it serves requests very slowly, with CPU
>>>>>> utilization at 100% (with 5.4.187 and earlier it is about 10%),
>>>>>> so that it is unusable as a fileserver.
[...]
>>>>> Odds are 5.18-rc6 is also a problem?
>>>> We believe that
>>>> 6b8a94332ee4 ("nfsd: Fix a write performance regression")
>>>> addresses the performance regression. It was merged into 5.18-rc.
>>> And into 5.17.4 if someone wants to try that release.
>> I don't have a lot of time to backport this one myself, so
>> I welcome anyone who wants to apply that commit to their
>> favorite LTS kernel and test it for us.
>>>>> If so, I'll just wait for the fix to get into Linus's tree as this does
>>>>> not seem to be a stable-tree-only issue.
>>>> Unfortunately I've received a recent report that the fix introduces
>>>> a "sleep while spinlock is held" for NFSv4.0 in rare cases.
>>> Ick, not good, any potential fixes for that?
>> Not yet. I was at LSF last week, so I've just started digging
>> into this one. I've confirmed that the report is a real bug,
>> but we still don't know how hard it is to hit it with real
>> workloads.
>> --
>> Chuck Lever
>
> This may be unrelated: when I used 5.4.188 for the first time I got this:
>
> [Mon Apr 11 09:20:23 2022] ------------[ cut here ]------------
> [Mon Apr 11 09:20:23 2022] refcount_t: underflow; use-after-free.
I don't believe this is the same issue as the performance
regression. Hard to say more without knowing what the
server's workload was when this occurred.
The only two NFSD-related patches that appear between
v5.4.187 and v5.4.188 are Trond's filecache changes. It's
implausible that a callback reference count issue would be related.
> [Mon Apr 11 09:20:23 2022] WARNING: CPU: 18 PID: 7443 at lib/refcount.c:190 refcount_sub_and_test_checked+0x48/0x50
> [Mon Apr 11 09:20:23 2022] Modules linked in: rpcsec_gss_krb5(E) msr(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E) dm_cache_smq(E) binfmt_misc(E) dm_cache(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) ipmi_ssif(E) intel_rapl_ms
> r(E) intel_rapl_common(E) sb_edac(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ast(E) snd_pcm(E) drm_vram_helper(E) ttm(E) ghash_clmulni_intel(E) s
> nd_timer(E) rapl(E) snd(E) soundcore(E) drm_kms_helper(E) intel_cstate(E) pcspkr(E) iTCO_wdt(E) intel_uncore(E) drm(E) iTCO_vendor_support(E) mei_me(E) joydev(E) i2c_algo_bit(E) watchdog(E) mei(E) ioatdma(E) evdev(E) sg(E) ipmi_si
> (E) ipmi_devintf(E) ipmi_msghandler(E) acpi_power_meter(E) acpi_pad(E) button(E) poly1305_x86_64(E) poly1305_generic(E) sha256_ssse3(E) sha1_ssse3(E) chacha_generic(E) chacha20poly1305(E) aesni_intel(E) crypto_simd(E) glue_helper(
> E) cryptd(E) drbd(E) lru_cache(E) nfsd(E) nfs_acl(E) lockd(E) grace(E) auth_rpcgss(E) sunrpc(E)
> [Mon Apr 11 09:20:23 2022] ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) dm_mod(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc
> 32c(E) crc32c_generic(E) raid0(E) multipath(E) linear(E) ses(E) enclosure(E) hid_generic(E) usbhid(E) hid(E) raid1(E) md_mod(E) sd_mod(E) crc32c_intel(E) ixgbe(E) ahci(E) xhci_pci(E) libahci(E) ehci_pci(E) xfrm_algo(E) mpt3sas(E)
> xhci_hcd(E) ehci_hcd(E) dca(E) raid_class(E) libata(E) libphy(E) usbcore(E) scsi_transport_sas(E) lpc_ich(E) ptp(E) i2c_i801(E) mfd_core(E) usb_common(E) pps_core(E) scsi_mod(E) mdio(E) wmi(E)
> [Mon Apr 11 09:20:23 2022] CPU: 18 PID: 7443 Comm: kworker/u50:1 Tainted: G W E 5.4.188-debian64.all+1.2 #1
> [Mon Apr 11 09:20:23 2022] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.1a 10/16/2015
> [Mon Apr 11 09:20:23 2022] Workqueue: rpciod rpc_async_schedule [sunrpc]
> [Mon Apr 11 09:20:23 2022] RIP: 0010:refcount_sub_and_test_checked+0x48/0x50
> [Mon Apr 11 09:20:23 2022] Code: 31 e4 44 89 e0 41 5c c3 eb f8 44 0f b6 25 7c ee ce 00 45 84 e4 75 e8 48 c7 c7 80 97 8b b1 c6 05 69 ee ce 00 01 e8 78 3f 3a 00 <0f> 0b eb d4 0f 1f 40 00 48 89 fe bf 01 00 00 00 eb a6 66 0f 1f 44
> [Mon Apr 11 09:20:23 2022] RSP: 0018:ffffad8a8d01fe48 EFLAGS: 00010286
> [Mon Apr 11 09:20:23 2022] RAX: 0000000000000000 RBX: 0000000000002a81 RCX: 0000000000000000
> [Mon Apr 11 09:20:23 2022] RDX: 0000000000000001 RSI: 0000000000000082 RDI: 00000000ffffffff
> [Mon Apr 11 09:20:23 2022] RBP: ffff9fb4a2394400 R08: 0000000000000001 R09: 000000000000085b
> [Mon Apr 11 09:20:23 2022] R10: 0000000000030f5c R11: 0000000000000004 R12: 0000000000000000
> [Mon Apr 11 09:20:23 2022] R13: ffff9feaf358d400 R14: 0000000000000000 R15: ffff9fb7e1b25148
> [Mon Apr 11 09:20:23 2022] FS: 0000000000000000(0000) GS:ffff9feaff980000(0000) knlGS:0000000000000000
> [Mon Apr 11 09:20:23 2022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Mon Apr 11 09:20:23 2022] CR2: 00007ffe62ca3080 CR3: 000000350880a002 CR4: 00000000001606e0
> [Mon Apr 11 09:20:23 2022] Call Trace:
> [Mon Apr 11 09:20:23 2022] nfsd4_cb_offload_release+0x16/0x30 [nfsd]
> [Mon Apr 11 09:20:23 2022] rpc_free_task+0x39/0x70 [sunrpc]
> [Mon Apr 11 09:20:23 2022] rpc_async_schedule+0x29/0x40 [sunrpc]
> [Mon Apr 11 09:20:23 2022] process_one_work+0x1ab/0x390
> [Mon Apr 11 09:20:23 2022] worker_thread+0x50/0x3c0
> [Mon Apr 11 09:20:23 2022] ? rescuer_thread+0x370/0x370
> [Mon Apr 11 09:20:23 2022] kthread+0x13c/0x160
> [Mon Apr 11 09:20:23 2022] ? __kthread_bind_mask+0x60/0x60
> [Mon Apr 11 09:20:23 2022] ret_from_fork+0x1f/0x40
> [Mon Apr 11 09:20:23 2022] ---[ end trace c58c1aaca9be5d21 ]---
> [Mon Apr 11 09:56:32 2022] perf: interrupt took too long (6293 > 6228), lowering kernel.perf_event_max_sample_rate to 31750
> [Mon Apr 11 09:59:17 2022] ------------[ cut here ]------------
> [Mon Apr 11 09:59:17 2022] nfsd4_process_open2 failed to open newly-created file! status=10008
This is definitely not the same problem. This is a failure
to open a freshly created file. Maybe due to a memory shortage?
But this area has been a problem for a long time.
I have some patches going into 5.19 that might help in this
area by making NFSv4 OPEN(CREATE) atomic.
> [Mon Apr 11 09:59:17 2022] WARNING: CPU: 2 PID: 3061 at fs/nfsd/nfs4proc.c:456 nfsd4_open+0x39a/0x710 [nfsd]
> [Mon Apr 11 09:59:17 2022] Modules linked in: rpcsec_gss_krb5(E) msr(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E) dm_cache_smq(E) binfmt_misc(E) dm_cache(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) ipmi_ssif(E) intel_rapl_msr(E) intel_rapl_common(E) sb_edac(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ast(E) snd_pcm(E) drm_vram_helper(E) ttm(E) ghash_clmulni_intel(E) snd_timer(E) rapl(E) snd(E) soundcore(E) drm_kms_helper(E) intel_cstate(E) pcspkr(E) iTCO_wdt(E) intel_uncore(E) drm(E) iTCO_vendor_support(E) mei_me(E) joydev(E) i2c_algo_bit(E) watchdog(E) mei(E) ioatdma(E) evdev(E) sg(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_power_meter(E) acpi_pad(E) button(E) poly1305_x86_64(E) poly1305_generic(E) sha256_ssse3(E) sha1_ssse3(E) chacha_generic(E) chacha20poly1305(E) aesni_intel(E) crypto_simd(E) glue_helper(E) cryptd(E) drbd(E) lru_cache(E) nfsd(E) nfs_acl(E) lockd(E) grace(E) auth_rpcgss(E) sunrpc(E)
> [Mon Apr 11 09:59:17 2022] ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) dm_mod(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) raid0(E) multipath(E) linear(E) ses(E) enclosure(E) hid_generic(E) usbhid(E) hid(E) raid1(E) md_mod(E) sd_mod(E) crc32c_intel(E) ixgbe(E) ahci(E) xhci_pci(E) libahci(E) ehci_pci(E) xfrm_algo(E) mpt3sas(E) xhci_hcd(E) ehci_hcd(E) dca(E) raid_class(E) libata(E) libphy(E) usbcore(E) scsi_transport_sas(E) lpc_ich(E) ptp(E) i2c_i801(E) mfd_core(E) usb_common(E) pps_core(E) scsi_mod(E) mdio(E) wmi(E)
> [Mon Apr 11 09:59:17 2022] CPU: 2 PID: 3061 Comm: nfsd Tainted: G W E 5.4.188-debian64.all+1.2 #1
> [Mon Apr 11 09:59:17 2022] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.1a 10/16/2015
> [Mon Apr 11 09:59:17 2022] RIP: 0010:nfsd4_open+0x39a/0x710 [nfsd]
> [Mon Apr 11 09:59:17 2022] Code: e8 db 0f 01 00 41 89 c5 85 c0 0f 84 5e 01 00 00 80 bd 15 01 00 00 00 74 13 44 89 ee 48 c7 c7 a8 08 6f c0 0f ce e8 b6 f8 ae f0 <0f> 0b 4d 85 ff 0f 84 58 fd ff ff 4d 39 fc 0f 84 4f fd ff ff 4c 89
> [Mon Apr 11 09:59:17 2022] RSP: 0018:ffffad8ab09c7db8 EFLAGS: 00010286
> [Mon Apr 11 09:59:17 2022] RAX: 0000000000000000 RBX: ffff9feaaf7cb000 RCX: 0000000000000000
> [Mon Apr 11 09:59:17 2022] RDX: 0000000000000001 RSI: 0000000000000082 RDI: 00000000ffffffff
> [Mon Apr 11 09:59:17 2022] RBP: ffff9feaae54f460 R08: 0000000000000001 R09: 000000000000087a
> [Mon Apr 11 09:59:17 2022] R10: 0000000000031e00 R11: 0000000000000004 R12: ffff9feaae550070
> [Mon Apr 11 09:59:17 2022] R13: 0000000018270000 R14: ffff9feaae5ae000 R15: ffff9fc07ccfd400
> [Mon Apr 11 09:59:17 2022] FS: 0000000000000000(0000) GS:ffff9fcb1f880000(0000) knlGS:0000000000000000
> [Mon Apr 11 09:59:17 2022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Mon Apr 11 09:59:17 2022] CR2: 00007fdae60db000 CR3: 00000037e87fe005 CR4: 00000000001606e0
> [Mon Apr 11 09:59:17 2022] Call Trace:
> [Mon Apr 11 09:59:17 2022] nfsd4_proc_compound+0x45d/0x730 [nfsd]
> [Mon Apr 11 09:59:17 2022] nfsd_dispatch+0xc1/0x200 [nfsd]
> [Mon Apr 11 09:59:17 2022] svc_process_common+0x399/0x6e0 [sunrpc]
> [Mon Apr 11 09:59:17 2022] ? svc_recv+0x312/0x9f0 [sunrpc]
> [Mon Apr 11 09:59:17 2022] ? nfsd_svc+0x2f0/0x2f0 [nfsd]
> [Mon Apr 11 09:59:17 2022] ? nfsd_destroy+0x60/0x60 [nfsd]
> [Mon Apr 11 09:59:17 2022] svc_process+0xd4/0x110 [sunrpc]
> [Mon Apr 11 09:59:17 2022] nfsd+0xed/0x150 [nfsd]
> [Mon Apr 11 09:59:17 2022] kthread+0x13c/0x160
> [Mon Apr 11 09:59:17 2022] ? __kthread_bind_mask+0x60/0x60
> [Mon Apr 11 09:59:17 2022] ret_from_fork+0x1f/0x40
> [Mon Apr 11 09:59:17 2022] ---[ end trace c58c1aaca9be5d22 ]---
>
> I never saw this with an earlier kernel (and we have been using 5.4 for a long time).
>
> I did not see this with the unchanged 5.4.192, though, which I tested before running 5.4.192 with the above mentioned patches reverted.
>
> Regards,
> --
> Wolfgang Walter
> Studentenwerk München
> Anstalt des öffentlichen Rechts
--
Chuck Lever
> On May 11, 2022, at 10:36 AM, Chuck Lever III <[email protected]> wrote:
>
>
>
>> On May 11, 2022, at 10:23 AM, Greg KH <[email protected]> wrote:
>>
>> On Wed, May 11, 2022 at 02:16:19PM +0000, Chuck Lever III wrote:
>>>
>>>
>>>> On May 11, 2022, at 8:38 AM, Greg KH <[email protected]> wrote:
>>>>
>>>> On Wed, May 11, 2022 at 12:03:13PM +0200, Wolfgang Walter wrote:
>>>>> Hi,
>>>>>
>>>>> starting with 5.4.188 we see a massive performance regression on our
>>>>> nfs-server. It basically is serving requests very very slowly with cpu
>>>>> utilization of 100% (with 5.4.187 and earlier it is 10%) so that it is
>>>>> unusable as a fileserver.
>>>>>
>>>>> The culprits are these commits (or one of them):
>>>>>
>>>>> c32f1041382a88b17da5736886da4a492353a1bb "nfsd: cleanup
>>>>> nfsd_file_lru_dispose()"
>>>>> 628adfa21815f74c04724abc85847f24b5dd1645 "nfsd: Containerise filecache
>>>>> laundrette"
>>>>>
>>>>> (upstream 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63 and
>>>>> 9542e6a643fc69d528dfb3303f145719c61d3050)
>>>>>
>>>>> If I revert them in v5.4.192 the kernel works as before and performance is
>>>>> ok again.
>>>>>
>>>>> I did not try to revert them one by one as any disruption of our nfs-server
>>>>> is a severe problem for us and I'm not sure if they are related.
>>>>>
>>>>> 5.10 and 5.15 both always performed very badly on our nfs-server in a
>>>>> similar way so we were stuck with 5.4.
>>>>>
>>>>> I now think this is because of 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63
>>>>> and/or 9542e6a643fc69d528dfb3303f145719c61d3050 though I didn't try to
>>>>> revert them in 5.15 yet.
>>>>
>>>> Odds are 5.18-rc6 is also a problem?
>>>
>>> We believe that
>>>
>>> 6b8a94332ee4 ("nfsd: Fix a write performance regression")
>>>
>>> addresses the performance regression. It was merged into 5.18-rc.
>>
>> And into 5.17.4 if someone wants to try that release.
>
> I don't have a lot of time to backport this one myself, so
> I welcome anyone who wants to apply that commit to their
> favorite LTS kernel and test it for us.
>
>
>>>> If so, I'll just wait for the fix to get into Linus's tree as this does
>>>> not seem to be a stable-tree-only issue.
>>>
>>> Unfortunately I've received a recent report that the fix introduces
>>> a "sleep while spinlock is held" for NFSv4.0 in rare cases.
>>
>> Ick, not good, any potential fixes for that?
>
> Not yet. I was at LSF last week, so I've just started digging
> into this one. I've confirmed that the report is a real bug,
> but we still don't know how hard it is to hit it with real
> workloads.
We believe the following, which should be part of the first
NFSD pull request for 5.19, will properly address the splat.
https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=for-next&id=556082f5e5d7ecfd0ee45c3641e2b364bff9ee44
--
Chuck Lever
On Fri, 2022-05-20 at 15:36 +0000, Chuck Lever III wrote:
>
>
> > On May 11, 2022, at 10:36 AM, Chuck Lever III
> > <[email protected]> wrote:
> >
> >
> >
> > > On May 11, 2022, at 10:23 AM, Greg KH
> > > <[email protected]> wrote:
> > >
> > > On Wed, May 11, 2022 at 02:16:19PM +0000, Chuck Lever III wrote:
> > > >
> > > >
> > > > > On May 11, 2022, at 8:38 AM, Greg KH
> > > > > <[email protected]> wrote:
> > > > >
> > > > > On Wed, May 11, 2022 at 12:03:13PM +0200, Wolfgang Walter
> > > > > wrote:
> > > > > > Hi,
> > > > > >
> > > > > > starting with 5.4.188 we see a massive performance
> > > > > > regression on our
> > > > > > nfs-server. It basically is serving requests very very
> > > > > > slowly with cpu
> > > > > > utilization of 100% (with 5.4.187 and earlier it is 10%) so
> > > > > > that it is
> > > > > > unusable as a fileserver.
> > > > > >
> > > > > > The culprits are these commits (or one of them):
> > > > > >
> > > > > > c32f1041382a88b17da5736886da4a492353a1bb "nfsd: cleanup
> > > > > > nfsd_file_lru_dispose()"
> > > > > > 628adfa21815f74c04724abc85847f24b5dd1645 "nfsd:
> > > > > > Containerise filecache
> > > > > > laundrette"
> > > > > >
> > > > > > (upstream 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63 and
> > > > > > 9542e6a643fc69d528dfb3303f145719c61d3050)
> > > > > >
> > > > > > If I revert them in v5.4.192 the kernel works as before and
> > > > > > performance is
> > > > > > ok again.
> > > > > >
> > > > > > I did not try to revert them one by one as any disruption
> > > > > > of our nfs-server
> > > > > > is a severe problem for us and I'm not sure if they are
> > > > > > related.
> > > > > >
> > > > > > 5.10 and 5.15 both always performed very badly on our nfs-
> > > > > > server in a
> > > > > > similar way so we were stuck with 5.4.
> > > > > >
> > > > > > I now think this is because of
> > > > > > 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63
> > > > > > and/or 9542e6a643fc69d528dfb3303f145719c61d3050 though I
> > > > > > didn't try to
> > > > > > revert them in 5.15 yet.
> > > > >
> > > > > Odds are 5.18-rc6 is also a problem?
> > > >
> > > > We believe that
> > > >
> > > > 6b8a94332ee4 ("nfsd: Fix a write performance regression")
> > > >
> > > > addresses the performance regression. It was merged into 5.18-
> > > > rc.
> > >
> > > And into 5.17.4 if someone wants to try that release.
> >
> > I don't have a lot of time to backport this one myself, so
> > I welcome anyone who wants to apply that commit to their
> > favorite LTS kernel and test it for us.
> >
> >
> > > > > If so, I'll just wait for the fix to get into Linus's tree as
> > > > > this does
> > > > > not seem to be a stable-tree-only issue.
> > > >
> > > > Unfortunately I've received a recent report that the fix
> > > > introduces
> > > > a "sleep while spinlock is held" for NFSv4.0 in rare cases.
> > >
> > > Ick, not good, any potential fixes for that?
> >
> > Not yet. I was at LSF last week, so I've just started digging
> > into this one. I've confirmed that the report is a real bug,
> > but we still don't know how hard it is to hit it with real
> > workloads.
>
> We believe the following, which should be part of the first
> NFSD pull request for 5.19, will properly address the splat.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=for-next&id=556082f5e5d7ecfd0ee45c3641e2b364bff9ee44
>
>
Uh... What happens if you have 2 simultaneous calls to
nfsd4_release_lockowner() for the same file? i.e. 2 separate processes
owned by the same user, both locking the same file.
Can't that cause the 'putlist' to get corrupted when both callers add
the same nf->nf_putfile to two separate lists?
> --
> Chuck Lever
>
>
>
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On Fri, 2022-05-20 at 21:52 +0000, Chuck Lever III wrote:
>
>
> > On May 20, 2022, at 12:40 PM, Trond Myklebust
> > <[email protected]> wrote:
> >
> > On Fri, 2022-05-20 at 15:36 +0000, Chuck Lever III wrote:
> > >
> > >
> > > > On May 11, 2022, at 10:36 AM, Chuck Lever III
> > > > <[email protected]> wrote:
> > > >
> > > >
> > > >
> > > > > On May 11, 2022, at 10:23 AM, Greg KH
> > > > > <[email protected]> wrote:
> > > > >
> > > > > On Wed, May 11, 2022 at 02:16:19PM +0000, Chuck Lever III
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > > > On May 11, 2022, at 8:38 AM, Greg KH
> > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, May 11, 2022 at 12:03:13PM +0200, Wolfgang Walter
> > > > > > > wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > starting with 5.4.188 we see a massive performance
> > > > > > > > regression on our
> > > > > > > > nfs-server. It basically is serving requests very very
> > > > > > > > slowly with cpu
> > > > > > > > utilization of 100% (with 5.4.187 and earlier it is
> > > > > > > > 10%) so
> > > > > > > > that it is
> > > > > > > > unusable as a fileserver.
> > > > > > > >
> > > > > > > > The culprits are these commits (or one of them):
> > > > > > > >
> > > > > > > > c32f1041382a88b17da5736886da4a492353a1bb "nfsd: cleanup
> > > > > > > > nfsd_file_lru_dispose()"
> > > > > > > > 628adfa21815f74c04724abc85847f24b5dd1645 "nfsd:
> > > > > > > > Containerise filecache
> > > > > > > > laundrette"
> > > > > > > >
> > > > > > > > (upstream 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63 and
> > > > > > > > 9542e6a643fc69d528dfb3303f145719c61d3050)
> > > > > > > >
> > > > > > > > If I revert them in v5.4.192 the kernel works as before
> > > > > > > > and
> > > > > > > > performance is
> > > > > > > > ok again.
> > > > > > > >
> > > > > > > > I did not try to revert them one by one as any
> > > > > > > > disruption
> > > > > > > > of our nfs-server
> > > > > > > > is a severe problem for us and I'm not sure if they are
> > > > > > > > related.
> > > > > > > >
> > > > > > > > 5.10 and 5.15 both always performed very badly on our
> > > > > > > > nfs-
> > > > > > > > server in a
> > > > > > > > similar way so we were stuck with 5.4.
> > > > > > > >
> > > > > > > > I now think this is because of
> > > > > > > > 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63
> > > > > > > > and/or 9542e6a643fc69d528dfb3303f145719c61d3050 though
> > > > > > > > I
> > > > > > > > didn't try to
> > > > > > > > revert them in 5.15 yet.
> > > > > > >
> > > > > > > Odds are 5.18-rc6 is also a problem?
> > > > > >
> > > > > > We believe that
> > > > > >
> > > > > > 6b8a94332ee4 ("nfsd: Fix a write performance regression")
> > > > > >
> > > > > > addresses the performance regression. It was merged into
> > > > > > 5.18-
> > > > > > rc.
> > > > >
> > > > > And into 5.17.4 if someone wants to try that release.
> > > >
> > > > I don't have a lot of time to backport this one myself, so
> > > > I welcome anyone who wants to apply that commit to their
> > > > favorite LTS kernel and test it for us.
> > > >
> > > >
> > > > > > > If so, I'll just wait for the fix to get into Linus's
> > > > > > > tree as
> > > > > > > this does
> > > > > > > not seem to be a stable-tree-only issue.
> > > > > >
> > > > > > Unfortunately I've received a recent report that the fix
> > > > > > introduces
> > > > > > a "sleep while spinlock is held" for NFSv4.0 in rare cases.
> > > > >
> > > > > Ick, not good, any potential fixes for that?
> > > >
> > > > Not yet. I was at LSF last week, so I've just started digging
> > > > into this one. I've confirmed that the report is a real bug,
> > > > but we still don't know how hard it is to hit it with real
> > > > workloads.
> > >
> > > We believe the following, which should be part of the first
> > > NFSD pull request for 5.19, will properly address the splat.
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=for-next&id=556082f5e5d7ecfd0ee45c3641e2b364bff9ee44
> > >
> > >
> > Uh... What happens if you have 2 simultaneous calls to
> > nfsd4_release_lockowner() for the same file? i.e. 2 separate
> > processes
> > owned by the same user, both locking the same file.
> >
> > Can't that cause the 'putlist' to get corrupted when both callers
> > add
> > the same nf->nf_putfile to two separate lists?
>
> IIUC, cl_lock serializes the two RELEASE_LOCKOWNER calls.
>
> The first call finds the lockowner in cl_ownerstr_hashtbl and
> unhashes it before releasing cl_lock.
>
> Then the second cannot find that lockowner, thus it can't
> requeue it for bulk_put.
>
> Am I missing something?
In the example I quoted, there are 2 separate processes running on the
client. Those processes could share the same open owner + open stateid,
and hence the same struct nfs4_file, since that depends only on the
process credentials matching. However they will not normally share a
lock owner, since POSIX does not expect different processes to share
locks.
IOW: The point is that one can relatively easily create 2 different
lock owners with different lock stateids that share the same underlying
struct nfs4_file.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
> On May 20, 2022, at 6:24 PM, Trond Myklebust <[email protected]> wrote:
>
> On Fri, 2022-05-20 at 21:52 +0000, Chuck Lever III wrote:
>>
>>
>>> On May 20, 2022, at 12:40 PM, Trond Myklebust
>>> <[email protected]> wrote:
>>>
>>> On Fri, 2022-05-20 at 15:36 +0000, Chuck Lever III wrote:
>>>>
>>>>
>>>>> On May 11, 2022, at 10:36 AM, Chuck Lever III
>>>>> <[email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On May 11, 2022, at 10:23 AM, Greg KH
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> On Wed, May 11, 2022 at 02:16:19PM +0000, Chuck Lever III
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On May 11, 2022, at 8:38 AM, Greg KH
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> On Wed, May 11, 2022 at 12:03:13PM +0200, Wolfgang Walter
>>>>>>>> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> starting with 5.4.188 we see a massive performance
>>>>>>>>> regression on our
>>>>>>>>> nfs-server. It basically is serving requests very very
>>>>>>>>> slowly with cpu
>>>>>>>>> utilization of 100% (with 5.4.187 and earlier it is
>>>>>>>>> 10%) so
>>>>>>>>> that it is
>>>>>>>>> unusable as a fileserver.
>>>>>>>>>
>>>>>>>>> The culprits are these commits (or one of them):
>>>>>>>>>
>>>>>>>>> c32f1041382a88b17da5736886da4a492353a1bb "nfsd: cleanup
>>>>>>>>> nfsd_file_lru_dispose()"
>>>>>>>>> 628adfa21815f74c04724abc85847f24b5dd1645 "nfsd:
>>>>>>>>> Containerise filecache
>>>>>>>>> laundrette"
>>>>>>>>>
>>>>>>>>> (upstream 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63 and
>>>>>>>>> 9542e6a643fc69d528dfb3303f145719c61d3050)
>>>>>>>>>
>>>>>>>>> If I revert them in v5.4.192 the kernel works as before
>>>>>>>>> and
>>>>>>>>> performance is
>>>>>>>>> ok again.
>>>>>>>>>
>>>>>>>>> I did not try to revert them one by one as any
>>>>>>>>> disruption
>>>>>>>>> of our nfs-server
>>>>>>>>> is a severe problem for us and I'm not sure if they are
>>>>>>>>> related.
>>>>>>>>>
>>>>>>>>> 5.10 and 5.15 both always performed very badly on our
>>>>>>>>> nfs-
>>>>>>>>> server in a
>>>>>>>>> similar way so we were stuck with 5.4.
>>>>>>>>>
>>>>>>>>> I now think this is because of
>>>>>>>>> 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63
>>>>>>>>> and/or 9542e6a643fc69d528dfb3303f145719c61d3050 though
>>>>>>>>> I
>>>>>>>>> didn't try to
>>>>>>>>> revert them in 5.15 yet.
>>>>>>>>
>>>>>>>> Odds are 5.18-rc6 is also a problem?
>>>>>>>
>>>>>>> We believe that
>>>>>>>
>>>>>>> 6b8a94332ee4 ("nfsd: Fix a write performance regression")
>>>>>>>
>>>>>>> addresses the performance regression. It was merged into
>>>>>>> 5.18-
>>>>>>> rc.
>>>>>>
>>>>>> And into 5.17.4 if someone wants to try that release.
>>>>>
>>>>> I don't have a lot of time to backport this one myself, so
>>>>> I welcome anyone who wants to apply that commit to their
>>>>> favorite LTS kernel and test it for us.
>>>>>
>>>>>
>>>>>>>> If so, I'll just wait for the fix to get into Linus's
>>>>>>>> tree as
>>>>>>>> this does
>>>>>>>> not seem to be a stable-tree-only issue.
>>>>>>>
>>>>>>> Unfortunately I've received a recent report that the fix
>>>>>>> introduces
>>>>>>> a "sleep while spinlock is held" for NFSv4.0 in rare cases.
>>>>>>
>>>>>> Ick, not good, any potential fixes for that?
>>>>>
>>>>> Not yet. I was at LSF last week, so I've just started digging
>>>>> into this one. I've confirmed that the report is a real bug,
>>>>> but we still don't know how hard it is to hit it with real
>>>>> workloads.
>>>>
>>>> We believe the following, which should be part of the first
>>>> NFSD pull request for 5.19, will properly address the splat.
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=for-next&id=556082f5e5d7ecfd0ee45c3641e2b364bff9ee44
>>>>
>>>>
>>> Uh... What happens if you have 2 simultaneous calls to
>>> nfsd4_release_lockowner() for the same file? i.e. 2 separate
>>> processes
>>> owned by the same user, both locking the same file.
>>>
>>> Can't that cause the 'putlist' to get corrupted when both callers
>>> add
>>> the same nf->nf_putfile to two separate lists?
>>
>> IIUC, cl_lock serializes the two RELEASE_LOCKOWNER calls.
>>
>> The first call finds the lockowner in cl_ownerstr_hashtbl and
>> unhashes it before releasing cl_lock.
>>
>> Then the second cannot find that lockowner, thus it can't
>> requeue it for bulk_put.
>>
>> Am I missing something?
>
> In the example I quoted, there are 2 separate processes running on the
> client. Those processes could share the same open owner + open stateid,
> and hence the same struct nfs4_file, since that depends only on the
> process credentials matching. However they will not normally share a
> lock owner, since POSIX does not expect different processes to share
> locks.
>
> IOW: The point is that one can relatively easily create 2 different
> lock owners with different lock stateids that share the same underlying
> struct nfs4_file.
Is there a similar exposure if two different clients are locking
the same file? If so, then we can't use a per-nfs4_client semaphore
to serialize access to the nf_putfile field.
--
Chuck Lever
On Sat, 2022-05-21 at 17:22 +0000, Chuck Lever III wrote:
>
>
> > On May 20, 2022, at 7:43 PM, Chuck Lever III
> > <[email protected]> wrote:
> >
> >
> >
> > > On May 20, 2022, at 6:24 PM, Trond Myklebust
> > > <[email protected]> wrote:
> > >
> > > On Fri, 2022-05-20 at 21:52 +0000, Chuck Lever III wrote:
> > > >
> > > >
> > > > > On May 20, 2022, at 12:40 PM, Trond Myklebust
> > > > > <[email protected]> wrote:
> > > > >
> > > > > On Fri, 2022-05-20 at 15:36 +0000, Chuck Lever III wrote:
> > > > > >
> > > > > >
> > > > > > > On May 11, 2022, at 10:36 AM, Chuck Lever III
> > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On May 11, 2022, at 10:23 AM, Greg KH
> > > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Wed, May 11, 2022 at 02:16:19PM +0000, Chuck Lever
> > > > > > > > III
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > On May 11, 2022, at 8:38 AM, Greg KH
> > > > > > > > > > <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, May 11, 2022 at 12:03:13PM +0200, Wolfgang
> > > > > > > > > > Walter
> > > > > > > > > > wrote:
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > starting with 5.4.188 we see a massive
> > > > > > > > > > > performance
> > > > > > > > > > > regression on our
> > > > > > > > > > > nfs-server. It basically is serving requests very
> > > > > > > > > > > very
> > > > > > > > > > > slowly with cpu
> > > > > > > > > > > utilization of 100% (with 5.4.187 and earlier it
> > > > > > > > > > > is
> > > > > > > > > > > 10%) so
> > > > > > > > > > > that it is
> > > > > > > > > > > unusable as a fileserver.
> > > > > > > > > > >
> > > > > > > > > > > The culprits are these commits (or one of
> > > > > > > > > > > them):
> > > > > > > > > > >
> > > > > > > > > > > c32f1041382a88b17da5736886da4a492353a1bb "nfsd:
> > > > > > > > > > > cleanup
> > > > > > > > > > > nfsd_file_lru_dispose()"
> > > > > > > > > > > 628adfa21815f74c04724abc85847f24b5dd1645 "nfsd:
> > > > > > > > > > > Containerise filecache
> > > > > > > > > > > laundrette"
> > > > > > > > > > >
> > > > > > > > > > > (upstream
> > > > > > > > > > > 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63 and
> > > > > > > > > > > 9542e6a643fc69d528dfb3303f145719c61d3050)
> > > > > > > > > > >
> > > > > > > > > > > If I revert them in v5.4.192 the kernel works as
> > > > > > > > > > > before
> > > > > > > > > > > and
> > > > > > > > > > > performance is
> > > > > > > > > > > ok again.
> > > > > > > > > > >
> > > > > > > > > > > I did not try to revert them one by one as any
> > > > > > > > > > > disruption
> > > > > > > > > > > of our nfs-server
> > > > > > > > > > > is a severe problem for us and I'm not sure if
> > > > > > > > > > > they are
> > > > > > > > > > > related.
> > > > > > > > > > >
> > > > > > > > > > > 5.10 and 5.15 both always performed very badly on
> > > > > > > > > > > our
> > > > > > > > > > > nfs-
> > > > > > > > > > > server in a
> > > > > > > > > > > similar way so we were stuck with 5.4.
> > > > > > > > > > >
> > > > > > > > > > > I now think this is because of
> > > > > > > > > > > 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63
> > > > > > > > > > > and/or 9542e6a643fc69d528dfb3303f145719c61d3050
> > > > > > > > > > > though
> > > > > > > > > > > I
> > > > > > > > > > > didn't try to
> > > > > > > > > > > revert them in 5.15 yet.
> > > > > > > > > >
> > > > > > > > > > Odds are 5.18-rc6 is also a problem?
> > > > > > > > >
> > > > > > > > > We believe that
> > > > > > > > >
> > > > > > > > > 6b8a94332ee4 ("nfsd: Fix a write performance
> > > > > > > > > regression")
> > > > > > > > >
> > > > > > > > > addresses the performance regression. It was merged
> > > > > > > > > into
> > > > > > > > > 5.18-
> > > > > > > > > rc.
> > > > > > > >
> > > > > > > > And into 5.17.4 if someone wants to try that release.
> > > > > > >
> > > > > > > I don't have a lot of time to backport this one myself,
> > > > > > > so
> > > > > > > I welcome anyone who wants to apply that commit to their
> > > > > > > favorite LTS kernel and test it for us.
> > > > > > >
> > > > > > >
> > > > > > > > > > If so, I'll just wait for the fix to get into
> > > > > > > > > > Linus's
> > > > > > > > > > tree as
> > > > > > > > > > this does
> > > > > > > > > > not seem to be a stable-tree-only issue.
> > > > > > > > >
> > > > > > > > > Unfortunately I've received a recent report that the
> > > > > > > > > fix
> > > > > > > > > introduces
> > > > > > > > > a "sleep while spinlock is held" for NFSv4.0 in rare
> > > > > > > > > cases.
> > > > > > > >
> > > > > > > > Ick, not good, any potential fixes for that?
> > > > > > >
> > > > > > > Not yet. I was at LSF last week, so I've just started
> > > > > > > digging
> > > > > > > into this one. I've confirmed that the report is a real
> > > > > > > bug,
> > > > > > > but we still don't know how hard it is to hit it with
> > > > > > > real
> > > > > > > workloads.
> > > > > >
> > > > > > We believe the following, which should be part of the first
> > > > > > NFSD pull request for 5.19, will properly address the
> > > > > > splat.
> > > > > >
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=for-next&id=556082f5e5d7ecfd0ee45c3641e2b364bff9ee44
> > > > > >
> > > > > >
> > > > > Uh... What happens if you have 2 simultaneous calls to
> > > > > nfsd4_release_lockowner() for the same file? i.e. 2 separate
> > > > > processes
> > > > > owned by the same user, both locking the same file.
> > > > >
> > > > > Can't that cause the 'putlist' to get corrupted when both
> > > > > callers
> > > > > add
> > > > > the same nf->nf_putfile to two separate lists?
> > > >
> > > > IIUC, cl_lock serializes the two RELEASE_LOCKOWNER calls.
> > > >
> > > > The first call finds the lockowner in cl_ownerstr_hashtbl and
> > > > unhashes it before releasing cl_lock.
> > > >
> > > > Then the second cannot find that lockowner, thus it can't
> > > > requeue it for bulk_put.
> > > >
> > > > Am I missing something?
> > >
> > > In the example I quoted, there are 2 separate processes running
> > > on the
> > > client. Those processes could share the same open owner + open
> > > stateid,
> > > and hence the same struct nfs4_file, since that depends only on
> > > the
> > > process credentials matching. However they will not normally
> > > share a
> > > lock owner, since POSIX does not expect different processes to
> > > share
> > > locks.
> > >
> > > IOW: The point is that one can relatively easily create 2
> > > different
> > > lock owners with different lock stateids that share the same
> > > underlying
> > > struct nfs4_file.
> >
> > Is there a similar exposure if two different clients are locking
> > the same file? If so, then we can't use a per-nfs4_client semaphore
> > to serialize access to the nf_putfile field.
>
> I had a thought about an alternate approach.
>
> Create a second nfsd_file_put API that is not allowed to sleep.
> Let's call it "nfsd_file_put_async()". Teach check_for_locks()
> to use that instead of nfsd_file_put().
>
> Here's where I'm a little fuzzy: nfsd_file_put_async() could do
> something like:
>
> void nfsd_file_put_async(struct nfsd_file *nf)
> {
> 	if (refcount_dec_and_test(&nf->nf_ref))
> 		nfsd_file_close_inode(nf->nf_inode);
> }
>
>
That approach moves the sync to the garbage collector, which is
exactly what we're trying to avoid in the first place.
Why not just do this "check_for_locks()" thing differently?
It really shouldn't be too hard to add something to
nfsd4_lm_get_owner()/nfsd4_lm_put_owner() that bumps a counter in the
lockowner in order to tell you whether or not locks are still held,
instead of doing this boneheaded walk through the list of locks.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
> On May 21, 2022, at 2:10 PM, Trond Myklebust <[email protected]> wrote:
>
> On Sat, 2022-05-21 at 17:22 +0000, Chuck Lever III wrote:
>>
>>
>>> On May 20, 2022, at 7:43 PM, Chuck Lever III
>>> <[email protected]> wrote:
>>>
>>>
>>>
>>>> On May 20, 2022, at 6:24 PM, Trond Myklebust
>>>> <[email protected]> wrote:
>>>>
>>>> On Fri, 2022-05-20 at 21:52 +0000, Chuck Lever III wrote:
>>>>>
>>>>>
>>>>>> On May 20, 2022, at 12:40 PM, Trond Myklebust
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> On Fri, 2022-05-20 at 15:36 +0000, Chuck Lever III wrote:
>>>>>>>
>>>>>>>
>>>>>>>> [...]
>>>>>>>
>>>>>>> We believe the following, which should be part of the first
>>>>>>> NFSD pull request for 5.19, will properly address the
>>>>>>> splat.
>>>>>>>
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=for-next&id=556082f5e5d7ecfd0ee45c3641e2b364bff9ee44
>>>>>>>
>>>>>>>
>>>>>> Uh... What happens if you have 2 simultaneous calls to
>>>>>> nfsd4_release_lockowner() for the same file? i.e. 2 separate
>>>>>> processes
>>>>>> owned by the same user, both locking the same file.
>>>>>>
>>>>>> Can't that cause the 'putlist' to get corrupted when both
>>>>>> callers
>>>>>> add
>>>>>> the same nf->nf_putfile to two separate lists?
>>>>>
>>>>> IIUC, cl_lock serializes the two RELEASE_LOCKOWNER calls.
>>>>>
>>>>> The first call finds the lockowner in cl_ownerstr_hashtbl and
>>>>> unhashes it before releasing cl_lock.
>>>>>
>>>>> Then the second cannot find that lockowner, thus it can't
>>>>> requeue it for bulk_put.
>>>>>
>>>>> Am I missing something?
>>>>
>>>> In the example I quoted, there are 2 separate processes running
>>>> on the
>>>> client. Those processes could share the same open owner + open
>>>> stateid,
>>>> and hence the same struct nfs4_file, since that depends only on
>>>> the
>>>> process credentials matching. However they will not normally
>>>> share a
>>>> lock owner, since POSIX does not expect different processes to
>>>> share
>>>> locks.
>>>>
>>>> IOW: The point is that one can relatively easily create 2
>>>> different
>>>> lock owners with different lock stateids that share the same
>>>> underlying
>>>> struct nfs4_file.
>>>
>>> Is there a similar exposure if two different clients are locking
>>> the same file? If so, then we can't use a per-nfs4_client semaphore
>>> to serialize access to the nf_putfile field.
>>
>> I had a thought about an alternate approach.
>>
>> Create a second nfsd_file_put API that is not allowed to sleep.
>> Let's call it "nfsd_file_put_async()". Teach check_for_locks()
>> to use that instead of nfsd_file_put().
>>
>> Here's where I'm a little fuzzy: nfsd_file_put_async() could do
>> something like:
>>
>> void nfsd_file_put_async(struct nfsd_file *nf)
>> {
>> 	if (refcount_dec_and_test(&nf->nf_ref))
>> 		nfsd_file_close_inode(nf->nf_inode);
>> }
>>
>>
>
> That approach moves the sync to the garbage collector, which was
> exactly what we're trying to avoid in the first place.
Totally understood.
My thought was that "put" for RELEASE_LOCKOWNER/FREE_STATEID
would be unlikely to have any data to sync -- callers that
actually have data to flush are elsewhere, and those would
continue to use the synchronous nfsd_file_put() API.
Do you have a workload where we can test this assumption?
> Why not just do this "check_for_locks()" thing differently?
>
> It really shouldn't be too hard to add something to
> nfsd4_lm_get_owner()/nfsd4_lm_put_owner() that bumps a counter in the
> lockowner in order to tell you whether or not locks are still held
> instead of doing this bone headed walk through the list of locks.
I thought of that a couple weeks ago. That doesn't work
because you can lock or unlock by range. That means the
symmetry of LOCK and LOCKU is not guaranteed, and I don't
believe these calls are used that way anyway. So I
abandoned the idea of using get_owner / put_owner.
But maybe we can provide some other mechanism to record
whether a lockowner is associated with file locks.
--
Chuck Lever
> On May 21, 2022, at 3:49 PM, Trond Myklebust <[email protected]> wrote:
>
> On Sat, 2022-05-21 at 19:11 +0000, Chuck Lever III wrote:
>>
>>
>>> On May 21, 2022, at 2:10 PM, Trond Myklebust
>>> <[email protected]> wrote:
>>>
>>> On Sat, 2022-05-21 at 17:22 +0000, Chuck Lever III wrote:
>>>>
>>>>
>>>>> On May 20, 2022, at 7:43 PM, Chuck Lever III
>>>>> <[email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On May 20, 2022, at 6:24 PM, Trond Myklebust
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> On Fri, 2022-05-20 at 21:52 +0000, Chuck Lever III wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On May 20, 2022, at 12:40 PM, Trond Myklebust
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> On Fri, 2022-05-20 at 15:36 +0000, Chuck Lever III wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>> We believe the following, which should be part of the
>>>>>>>>> first
>>>>>>>>> NFSD pull request for 5.19, will properly address the
>>>>>>>>> splat.
>>>>>>>>>
>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=for-next&id=556082f5e5d7ecfd0ee45c3641e2b364bff9ee44
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Uh... What happens if you have 2 simultaneous calls to
>>>>>>>> nfsd4_release_lockowner() for the same file? i.e. 2
>>>>>>>> separate
>>>>>>>> processes
>>>>>>>> owned by the same user, both locking the same file.
>>>>>>>>
>>>>>>>> Can't that cause the 'putlist' to get corrupted when both
>>>>>>>> callers
>>>>>>>> add
>>>>>>>> the same nf->nf_putfile to two separate lists?
>>>>>>>
>>>>>>> IIUC, cl_lock serializes the two RELEASE_LOCKOWNER calls.
>>>>>>>
>>>>>>> The first call finds the lockowner in cl_ownerstr_hashtbl
>>>>>>> and
>>>>>>> unhashes it before releasing cl_lock.
>>>>>>>
>>>>>>> Then the second cannot find that lockowner, thus it can't
>>>>>>> requeue it for bulk_put.
>>>>>>>
>>>>>>> Am I missing something?
>>>>>>
>>>>>> In the example I quoted, there are 2 separate processes
>>>>>> running
>>>>>> on the
>>>>>> client. Those processes could share the same open owner +
>>>>>> open
>>>>>> stateid,
>>>>>> and hence the same struct nfs4_file, since that depends only
>>>>>> on
>>>>>> the
>>>>>> process credentials matching. However they will not normally
>>>>>> share a
>>>>>> lock owner, since POSIX does not expect different processes
>>>>>> to
>>>>>> share
>>>>>> locks.
>>>>>>
>>>>>> IOW: The point is that one can relatively easily create 2
>>>>>> different
>>>>>> lock owners with different lock stateids that share the same
>>>>>> underlying
>>>>>> struct nfs4_file.
>>>>>
>>>>> Is there a similar exposure if two different clients are
>>>>> locking
>>>>> the same file? If so, then we can't use a per-nfs4_client
>>>>> semaphore
>>>>> to serialize access to the nf_putfile field.
>>>>
>>>> I had a thought about an alternate approach.
>>>>
>>>> Create a second nfsd_file_put API that is not allowed to sleep.
>>>> Let's call it "nfsd_file_put_async()". Teach check_for_locks()
>>>> to use that instead of nfsd_file_put().
>>>>
>>>> Here's where I'm a little fuzzy: nfsd_file_put_async() could do
>>>> something like:
>>>>
>>>> void nfsd_file_put_async(struct nfsd_file *nf)
>>>> {
>>>> 	if (refcount_dec_and_test(&nf->nf_ref))
>>>> 		nfsd_file_close_inode(nf->nf_inode);
>>>> }
>>>>
>>>>
>>>
>>> That approach moves the sync to the garbage collector, which was
>>> exactly what we're trying to avoid in the first place.
>>
>> Totally understood.
>>
>> My thought was that "put" for RELEASE_LOCKOWNER/FREE_STATEID
>> would be unlikely to have any data to sync -- callers that
>> actually have data to flush are elsewhere, and those would
>> continue to use the synchronous nfsd_file_put() API.
>>
>> Do you have a workload where we can test this assumption?
>>
>>
>>> Why not just do this "check_for_locks()" thing differently?
>>>
>>> It really shouldn't be too hard to add something to
>>> nfsd4_lm_get_owner()/nfsd4_lm_put_owner() that bumps a counter in
>>> the
>>> lockowner in order to tell you whether or not locks are still held
>>> instead of doing this bone headed walk through the list of locks.
>>
>> I thought of that a couple weeks ago. That doesn't work
>> because you can lock or unlock by range. That means the
>> symmetry of LOCK and LOCKU is not guaranteed, and I don't
>> believe these calls are used that way anyway. So I
>> abandoned the idea of using get_owner / put_owner.
>>
>
> Then you're misunderstanding how it works. lm_get_owner() is called
> when a lock is initialised from another one. The whole point is to
> ensure that each and every object representing a range lock on the
> inode's list maintains its own private reference to the knfsd lockowner
> (i.e. the fl->fl_owner).
>
> For instance when a LOCK call calls posix_lock_inode(), then that
> function uses locks_copy_conflock() (which calls lm_get_owner) to
> initialise the range lock object that is being put on the inode list.
> If the new lock causes multiple existing locks to be replaced, they all
> call lm_put_owner to release their references to fl->fl_owner as part
> of the process of being freed.
>
> Conversely, when LOCKU causes a range to get split, the two locks that
> replace the old one are both initialised using locks_copy_conflock(),
> so they both call lm_get_owner. The lock that represents the range
> being replaced is then made to call lm_put_owner() when it is freed.
>
> etc, etc...
That is definitely not what it looked like when I traced it. The
reference count managed by get_owner / put_owner did not seem
to be usable for tracking whether a lockowner had locks or not.
The reference count was put pretty quickly after the lm_get_owner
call.
But I'm not interested in an argument. I'll go back and look at
get_owner / put_owner again, because I agree that not having to
traverse the inode's flc_posix list during check_for_locks() would
be awesome sauce.
--
Chuck Lever
> On May 20, 2022, at 12:40 PM, Trond Myklebust <[email protected]> wrote:
>
> On Fri, 2022-05-20 at 15:36 +0000, Chuck Lever III wrote:
>>
>>
>>> [...]
>>
>> We believe the following, which should be part of the first
>> NFSD pull request for 5.19, will properly address the splat.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=for-next&id=556082f5e5d7ecfd0ee45c3641e2b364bff9ee44
>>
>>
> Uh... What happens if you have 2 simultaneous calls to
> nfsd4_release_lockowner() for the same file? i.e. 2 separate processes
> owned by the same user, both locking the same file.
>
> Can't that cause the 'putlist' to get corrupted when both callers add
> the same nf->nf_putfile to two separate lists?
IIUC, cl_lock serializes the two RELEASE_LOCKOWNER calls.
The first call finds the lockowner in cl_ownerstr_hashtbl and
unhashes it before releasing cl_lock.
Then the second cannot find that lockowner, thus it can't
requeue it for bulk_put.
Am I missing something?
--
Chuck Lever
> On May 20, 2022, at 7:43 PM, Chuck Lever III <[email protected]> wrote:
>
>
>
>> On May 20, 2022, at 6:24 PM, Trond Myklebust <[email protected]> wrote:
>>
>> On Fri, 2022-05-20 at 21:52 +0000, Chuck Lever III wrote:
>>>
>>>
>>>> On May 20, 2022, at 12:40 PM, Trond Myklebust
>>>> <[email protected]> wrote:
>>>>
>>>> On Fri, 2022-05-20 at 15:36 +0000, Chuck Lever III wrote:
>>>>>
>>>>>
>>>>>> [...]
>>>>>
>>>>> We believe the following, which should be part of the first
>>>>> NFSD pull request for 5.19, will properly address the splat.
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=for-next&id=556082f5e5d7ecfd0ee45c3641e2b364bff9ee44
>>>>>
>>>>>
>>>> Uh... What happens if you have 2 simultaneous calls to
>>>> nfsd4_release_lockowner() for the same file? i.e. 2 separate
>>>> processes
>>>> owned by the same user, both locking the same file.
>>>>
>>>> Can't that cause the 'putlist' to get corrupted when both callers
>>>> add
>>>> the same nf->nf_putfile to two separate lists?
>>>
>>> IIUC, cl_lock serializes the two RELEASE_LOCKOWNER calls.
>>>
>>> The first call finds the lockowner in cl_ownerstr_hashtbl and
>>> unhashes it before releasing cl_lock.
>>>
>>> Then the second cannot find that lockowner, thus it can't
>>> requeue it for bulk_put.
>>>
>>> Am I missing something?
>>
>> In the example I quoted, there are 2 separate processes running on the
>> client. Those processes could share the same open owner + open stateid,
>> and hence the same struct nfs4_file, since that depends only on the
>> process credentials matching. However they will not normally share a
>> lock owner, since POSIX does not expect different processes to share
>> locks.
>>
>> IOW: The point is that one can relatively easily create 2 different
>> lock owners with different lock stateids that share the same underlying
>> struct nfs4_file.
>
> Is there a similar exposure if two different clients are locking
> the same file? If so, then we can't use a per-nfs4_client semaphore
> to serialize access to the nf_putfile field.
I had a thought about an alternate approach.
Create a second nfsd_file_put API that is not allowed to sleep.
Let's call it "nfsd_file_put_async()". Teach check_for_locks()
to use that instead of nfsd_file_put().
Here's where I'm a little fuzzy: nfsd_file_put_async() could do
something like:
void nfsd_file_put_async(struct nfsd_file *nf)
{
	if (refcount_dec_and_test(&nf->nf_ref))
		nfsd_file_close_inode(nf->nf_inode);
}
--
Chuck Lever