Received: by 2002:ac0:a5b6:0:0:0:0:0 with SMTP id m51-v6csp1093052imm; Fri, 15 Jun 2018 11:03:58 -0700 (PDT) X-Google-Smtp-Source: ADUXVKJKeY1YvIrD+3BcQa7+AI4FwgbIUg0kKYZFgiKgYh07noRsnWCkLvyXJZG3wgN06e0XnziB X-Received: by 2002:a62:3e9a:: with SMTP id y26-v6mr3048885pfj.98.1529085838892; Fri, 15 Jun 2018 11:03:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1529085838; cv=none; d=google.com; s=arc-20160816; b=ktD2dEgYii/r5p2coRrQGY0XXszTGnN5+HT6S19CrWJWUPJU/VBes8JDxcYngk52BZ g1TR6CI0r1VVR8+5DzijBkptmnPPSQyRpjnqhev1GWwYxrz6Pc35hof0gNqP44axIZjV 6LwXDy6uZAFe2mVmQB0fulY8vdwPIH2KUK4Hx6XTvEFAcGxmGVsJYEPoo7v3tfiuQ1mP qWgEjxPuOP8TJqvLT/05FP2krZBvflH/Dtt6wJzBRhi50L/04/5MYhX/lrIePR02tywA uRQNaiW3hXHqznhP/OlXHkI7OozG2xPtkAQf5Au1V9FhKDuVE7XH/4aJ1D42RAAhjSDD FyYQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:arc-authentication-results; bh=MdB+/NgF4XtTtqlQYPC0E+AGTsNI/59dzd9B/RGIyGw=; b=WIpyPOKzTZ9SjHXf734FguP16vFiEccqG02CJ+HXg7F1So/aDGqIazsX1fBDx4Q+6g 5H5+wmTLjlIpGY33EueIHx7lAmkkDrg6GA2Nz46+PeQkFGbGiS37mi8Ig2ea7dJ+hS8e Xud3vFWxt42r8iy/9bSmMyE6bnvZSbzx78wll0Ifhsckv4UhrLqfWwirC0bgJtukXAaR 2xigH0u+0pZM8eWAbz3z8RVQ8yrfVZNkbbrOCeXXLpGSpRV6RD21xZeFA6fGOyqAUSy9 tU4ou9m/j9vDS6AQELZzfr2XzAr4TpZN1Yo4CGkl6F+UdYpiQfTS88DAIwiDs0Mr7sUU fakA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f5-v6si6992955pgu.103.2018.06.15.11.03.44; Fri, 15 Jun 2018 11:03:58 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756214AbeFOSDT (ORCPT + 99 others); Fri, 15 Jun 2018 14:03:19 -0400 Received: from static-151-200-251-90.washdc.east.verizon.net ([151.200.251.90]:58126 "EHLO static-151-200-251-90.washdc.east.verizon.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752703AbeFOSDR (ORCPT ); Fri, 15 Jun 2018 14:03:17 -0400 X-Greylist: delayed 829 seconds by postgrey-1.27 at vger.kernel.org; Fri, 15 Jun 2018 14:03:17 EDT Received: from localhost ([127.0.0.1]) by Zia.metrics.net with esmtp (Exim 4.91) (envelope-from ) id 1fTsqk-0006b3-PY; Fri, 15 Jun 2018 13:49:27 -0400 Subject: Re: fscache: fix a kernel BUG at fs/fscache/operation.c:69! To: carmark.dlut@gmail.com Cc: David Howells , linux-cachefs@redhat.com, linux-kernel@vger.kernel.org References: <20180222073330.36259-1-carmark.dlut@gmail.com> From: Anthony DeRobertis Message-ID: Date: Fri, 15 Jun 2018 13:49:24 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: <20180222073330.36259-1-carmark.dlut@gmail.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/22/2018 02:33 AM, carmark.dlut@gmail.com wrote: > From: Lei Xue > > There is a potential race in fscache operation enqueuing for reading and > copying multiple pages from cachefiles to netfs. > Under some heavy load system, it will happen very often. Did anything happen with this patch? I still saw it with 4.16.12 (panic below, captured via netconsole) when running ffmpeg with a source on a cached NFSv3 share. (Actually an overlay on top of that cached NFS mount). Applying this patch seems to have fixed the issue. The only weird thing I've noticed — not sure it's actually new from this patch — is that midway through the encode, it stopped using the cached version and started transferring it from the server again. But even so, that's far better than a panic that takes out the machine (until the watchdog forces a reset). ps: please cc me on replies, not subscribed to the mailing list. [51593.345396] FS-Cache: [51593.345406] FS-Cache: Assertion failed [51593.345409] FS-Cache: 0 > 0 is false [51593.345421] ------------[ cut here ]------------ [51593.345424] kernel BUG at /build/linux-43CEzF/linux-4.16.12/fs/fscache/operation.c:68! [51593.345430] invalid opcode: 0000 [#1] SMP PTI [51593.345433] Modules linked in: netconsole configfs ebtable_filter ebtables devlink ip6table_filter ip6_tables iptable_filter overlay nfsv3 nfs tun cachefiles fscache binfmt_misc snd_usb_audio snd_hda_codec_hdmi snd_usbmidi_lib xts dm_crypt hid_generic usbhid snd_rawmidi snd_seq_device hid pktcdvd snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec bridge snd_hda_core stp llc intel_powerclamp snd_hwdep kvm_intel kvm snd_pcm_oss mxm_wmi snd_mixer_oss snd_pcm snd_timer irqbypass intel_cstate uhci_hcd intel_uncore r8169 pcspkr ehci_pci xhci_pci xhci_hcd wmi snd ehci_hcd i2c_i801 firewire_ohci sr_mod soundcore usbcore firewire_core crc_itu_t cdrom mii sg button asus_atk0110 lpc_ich shpchp usb_common i5500_temp i7core_edac nvidia_drm(PO) drm_kms_helper drm nvidia_modeset(PO) nvidia(PO) [51593.345468] ipmi_devintf ipmi_msghandler nfsd coretemp auth_rpcgss nfs_acl lockd grace sunrpc i2c_dev iTCO_wdt iTCO_vendor_support acpi_cpufreq msr loop ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 dm_mod raid456 libcrc32c crc32c_generic async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid0 multipath linear raid1 raid10 md_mod sd_mod ahci libahci crc32c_intel evdev serio_raw libata scsi_mod [51593.345493] CPU: 6 PID: 0 Comm: swapper/6 Tainted: P W IO 4.16.0-2-amd64 #1 Debian 4.16.12-1 [51593.345495] Hardware name: System manufacturer System Product Name/SABERTOOTH X58, BIOS 1402 08/09/2012 [51593.345502] RIP: 0010:fscache_enqueue_operation+0x106/0x150 [fscache] [51593.345504] RSP: 0018:ffff91a767383c98 EFLAGS: 00010082 [51593.345508] RAX: 0000000000000018 RBX: ffff91a7290d4d80 RCX: 0000000000000006 [51593.345510] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff91a767396730 [51593.345512] RBP: ffff91a3359f8a20 R08: 000000000000043e R09: 0000000000000000 [51593.345515] R10: ffff91a767383c80 R11: 0000000000000001 R12: ffff91a7290d4d80 [51593.345517] R13: ffff91a7290d4e08 R14: ffff91a703575e20 R15: ffff91a7290d4e08 [51593.345520] FS: 0000000000000000(0000) GS:ffff91a767380000(0000) knlGS:0000000000000000 [51593.345522] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [51593.345524] CR2: 000055627b19b000 CR3: 000000015680a000 CR4: 00000000000006e0 [51593.345528] Call Trace: [51593.345531] [51593.345536] cachefiles_read_waiter+0xf8/0x150 [cachefiles] [51593.345541] __wake_up_common+0x76/0x170 [51593.345544] wake_up_page_bit+0xac/0x100 [51593.345561] mpage_end_io+0x30/0x90 [ext4] [51593.345568] clone_endio+0x8e/0x140 [dm_mod] [51593.345572] raid_end_bio_io+0x2c/0xb0 [raid10] [51593.345576] raid10_end_read_request+0x71/0x150 [raid10] [51593.345581] blk_update_request+0x78/0x2b0 [51593.345589] scsi_end_request+0x2c/0x1e0 [scsi_mod] [51593.345597] scsi_io_completion+0x477/0x670 [scsi_mod] [51593.345600] blk_done_softirq+0xa0/0xd0 [51593.345605] __do_softirq+0xde/0x2b4 [51593.345609] irq_exit+0xae/0xb0 [51593.345612] do_IRQ+0x7d/0xc0 [51593.345615] common_interrupt+0xf/0xf [51593.345617] [51593.345622] RIP: 0010:cpuidle_enter_state+0xa0/0x2b0 [51593.345624] RSP: 0018:ffffb3f1431c7eb0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdc [51593.345627] RAX: ffff91a7673a18c0 RBX: 00002eec8305e1d8 RCX: 000000000000001f [51593.345629] RDX: 00002eec8305e1d8 RSI: 000000002d9c529c RDI: 0000000000000000 [51593.345632] RBP: ffff91a7673a9948 R08: 0000000000000001 R09: 0000000000000135 [51593.345634] R10: ffffb3f1431c7e90 R11: 00000000000000a0 R12: 0000000000000003 [51593.345636] R13: ffffffff9cab2cd8 R14: 0000000000000000 R15: 00002eec83032f50 [51593.345641] do_idle+0x183/0x1e0 [51593.345644] cpu_startup_entry+0x6f/0x80 [51593.345649] start_secondary+0x1a4/0x1f0 [51593.345652] secondary_startup_64+0xa5/0xb0 [51593.345655] Code: 0b 48 c7 c7 63 d9 7d c1 e8 72 dd 2f da 48 c7 c7 71 d9 7d c1 e8 66 dd 2f da 48 63 73 44 31 d2 48 c7 c7 a0 ce 7d c1 e8 54 dd 2f da <0f> 0b 48 c7 c7 63 d9 7d c1 e8 46 dd 2f da 48 c7 c7 71 d9 7d c1 [51593.345682] RIP: fscache_enqueue_operation+0x106/0x150 [fscache] RSP: ffff91a767383c98 [51593.345688] ---[ end trace fe061122063cafae ]--- [51593.345690] Kernel panic - not syncing: Fatal exception in interrupt [51593.345782] Kernel Offset: 0x1aa00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [51593.345786] ---[ end Kernel panic - not syncing: Fatal exception in interrupt [51593.345793] WARNING: CPU: 6 PID: 0 at /build/linux-43CEzF/linux-4.16.12/kernel/sched/core.c:1190 set_task_cpu+0x14d/0x160 [51593.345795] Modules linked in: netconsole configfs ebtable_filter ebtables devlink ip6table_filter ip6_tables iptable_filter overlay nfsv3 nfs tun cachefiles fscache binfmt_misc snd_usb_audio snd_hda_codec_hdmi snd_usbmidi_lib xts dm_crypt hid_generic usbhid snd_rawmidi snd_seq_device hid pktcdvd snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec bridge snd_hda_core stp llc intel_powerclamp snd_hwdep kvm_intel kvm snd_pcm_oss mxm_wmi snd_mixer_oss snd_pcm snd_timer irqbypass intel_cstate uhci_hcd intel_uncore r8169 pcspkr ehci_pci xhci_pci xhci_hcd wmi snd ehci_hcd i2c_i801 firewire_ohci sr_mod soundcore usbcore firewire_core crc_itu_t cdrom mii sg button asus_atk0110 lpc_ich shpchp usb_common i5500_temp i7core_edac nvidia_drm(PO) drm_kms_helper drm nvidia_modeset(PO) nvidia(PO) [51593.345823] ipmi_devintf ipmi_msghandler nfsd coretemp auth_rpcgss nfs_acl lockd grace sunrpc i2c_dev iTCO_wdt iTCO_vendor_support acpi_cpufreq msr loop ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 dm_mod raid456 libcrc32c crc32c_generic async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid0 multipath linear raid1 raid10 md_mod sd_mod ahci libahci crc32c_intel evdev serio_raw libata scsi_mod [51593.345844] CPU: 6 PID: 0 Comm: swapper/6 Tainted: P D W IO 4.16.0-2-amd64 #1 Debian 4.16.12-1 [51593.345846] Hardware name: System manufacturer System Product Name/SABERTOOTH X58, BIOS 1402 08/09/2012 [51593.345849] RIP: 0010:set_task_cpu+0x14d/0x160 [51593.345851] RSP: 0018:ffff91a7673837f0 EFLAGS: 00010006 [51593.345853] RAX: 0000000000000200 RBX: ffff91a74e31c380 RCX: 0000000000000000 [51593.345855] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91a74e31c380 [51593.345857] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000006 [51593.345859] R10: 0000000000000007 R11: 0000000000000000 R12: 0000000000000000 [51593.345861] R13: 0000000000000000 R14: 0000000000000046 R15: 00000000000218c0 [51593.345864] FS: 0000000000000000(0000) GS:ffff91a767380000(0000) knlGS:0000000000000000 [51593.345866] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [51593.345868] CR2: 000055627b19b000 CR3: 000000015680a000 CR4: 00000000000006e0 [51593.345870] Call Trace: [51593.345872] [51593.345875] try_to_wake_up+0x154/0x460 [51593.345879] autoremove_wake_function+0x11/0x50 [51593.345881] __wake_up_common+0x76/0x170 [51593.345884] __wake_up_common_lock+0x7c/0xc0 [51593.345888] irq_work_run_list+0x4d/0x70 [51593.345891] smp_irq_work_interrupt+0x32/0xc0 [51593.345894] irq_work_interrupt+0xf/0x20 [51593.345898] RIP: 0010:panic+0x1fb/0x252 [51593.345900] RSP: 0018:ffff91a767383a48 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff09 [51593.345903] RAX: 0000000000000041 RBX: 0000000000000000 RCX: 0000000000000006 [51593.345905] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff91a767396730 [51593.345908] RBP: ffff91a767383ac0 R08: 0000000000000472 R09: 0000000000000000 [51593.345910] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [51593.345912] R13: 0000000000000000 R14: ffffffff9c80922d R15: 0000000000000004 [51593.345915] ? irq_work_interrupt+0xa/0x20 [51593.345919] ? panic+0x1f7/0x252 [51593.345923] oops_end+0xac/0xc0 [51593.345927] do_trap+0x132/0x140 [51593.345930] do_error_trap+0x9d/0x120 [51593.345934] ? fscache_enqueue_operation+0x106/0x150 [fscache] [51593.345938] ? up+0x12/0x60 [51593.345941] ? console_unlock+0x29d/0x550 [51593.345944] invalid_op+0x1b/0x40 [51593.345949] RIP: 0010:fscache_enqueue_operation+0x106/0x150 [fscache]