Subject: Re: bio linked list corruption.
To: Linus Torvalds, Dave Jones, Andy Lutomirski, "Andy Lutomirski",
    Jens Axboe, Al Viro, Josef Bacik, David Sterba, linux-btrfs,
    Linux Kernel, Dave Chinner
References: <20161021200245.kahjzgqzdfyoe3uz@codemonkey.org.uk>
    <20161022152033.gkmm3l75kqjzsije@codemonkey.org.uk>
    <20161024044051.onmh4h6sc2bjxzzc@codemonkey.org.uk>
    <77d9983d-a00a-1dc1-a9a1-631de1d0c146@fb.com>
    <20161026002752.qvrm6yxqb54fiqnd@codemonkey.org.uk>
    <20161026163018.wx57yy554576s6e2@codemonkey.org.uk>
    <20161026184201.6ofblkd3j5uxystq@codemonkey.org.uk>
    <488f9edc-6a1c-2c68-0d33-d3aa32ece9a4@fb.com>
From: Chris Mason
Date: Wed, 26 Oct 2016 17:52:52 -0400
In-Reply-To: <488f9edc-6a1c-2c68-0d33-d3aa32ece9a4@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/26/2016 04:00 PM, Chris Mason wrote:
>
>
> On 10/26/2016 03:06 PM, Linus Torvalds wrote:
>> On Wed, Oct 26, 2016 at 11:42 AM, Dave Jones wrote:
>>>
>>> The stacks show nearly all of them are stuck in sync_inodes_sb
>>
>> That's just
>> wb_wait_for_completion(), and it means that some IO isn't
>> completing.
>>
>> There's also a lot of processes waiting for inode_lock(), and a few
>> waiting for mnt_want_write()
>>
>> Ignoring those, we have
>>
>>> [] btrfs_wait_ordered_roots+0x3f/0x200 [btrfs]
>>> [] btrfs_sync_fs+0x31/0xc0 [btrfs]
>>> [] sync_filesystem+0x6e/0xa0
>>> [] SyS_syncfs+0x3c/0x70
>>> [] do_syscall_64+0x5c/0x170
>>> [] entry_SYSCALL64_slow_path+0x25/0x25
>>> [] 0xffffffffffffffff
>>
>> Don't know this one. There's a couple of them. Could there be some
>> ABBA deadlock on the ordered roots waiting?
>
> It's always possible, but we haven't changed anything here.
>
> I've tried a long list of things to reproduce this on my test boxes,
> including days of trinity runs and a kernel module to exercise vmalloc,
> and thread creation.
>
> Today I turned off every CONFIG_DEBUG_* except for list debugging, and
> ran dbench 2048:
>

This one is special because CONFIG_VMAP_STACK is not set. Btrfs
triggers in < 10 minutes. I've done 30 minutes each with XFS and Ext4
without luck.

This is all in a virtual machine that I can copy on to a bunch of
hosts. So I'll get some parallel tests going tonight to narrow it down.

------------[ cut here ]------------
WARNING: CPU: 6 PID: 4481 at lib/list_debug.c:33 __list_add+0xbe/0xd0
list_add corruption. prev->next should be next (ffffe8ffffd80b08), but was ffff88012b65fb88. (prev=ffff880128c8d500).
Modules linked in: crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper i2c_piix4 cryptd i2c_core virtio_net serio_raw floppy button pcspkr sch_fq_codel autofs4 virtio_blk
CPU: 6 PID: 4481 Comm: dbench Not tainted 4.9.0-rc2-15419-g811d54d #319
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014
 ffff880104eff868 ffffffff814fde0f ffffffff8151c46e ffff880104eff8c8
 ffff880104eff8c8 0000000000000000 ffff880104eff8b8 ffffffff810648cf
 ffff880128cab2c0 000000213fc57c68 ffff8801384e8928 ffff880128cab180
Call Trace:
 [] dump_stack+0x53/0x74
 [] ? __list_add+0xbe/0xd0
 [] __warn+0xff/0x120
 [] warn_slowpath_fmt+0x49/0x50
 [] __list_add+0xbe/0xd0
 [] blk_sq_make_request+0x388/0x580
 [] generic_make_request+0x104/0x200
 [] submit_bio+0x65/0x130
 [] ? __percpu_counter_add+0x96/0xd0
 [] btrfs_map_bio+0x23c/0x310
 [] btrfs_submit_bio_hook+0xd3/0x190
 [] submit_one_bio+0x6d/0xa0
 [] flush_epd_write_bio+0x4e/0x70
 [] extent_writepages+0x5d/0x70
 [] ? btrfs_releasepage+0x50/0x50
 [] ? wbc_attach_and_unlock_inode+0x6e/0x170
 [] btrfs_writepages+0x27/0x30
 [] do_writepages+0x20/0x30
 [] __filemap_fdatawrite_range+0xb5/0x100
 [] filemap_fdatawrite_range+0x13/0x20
 [] btrfs_fdatawrite_range+0x2b/0x70
 [] btrfs_sync_file+0x88/0x490
 [] ? group_send_sig_info+0x42/0x80
 [] ? kill_pid_info+0x5d/0x90
 [] ? SYSC_kill+0xba/0x1d0
 [] ? __sb_end_write+0x58/0x80
 [] vfs_fsync_range+0x4c/0xb0
 [] ? syscall_trace_enter+0x201/0x2e0
 [] vfs_fsync+0x1c/0x20
 [] do_fsync+0x3d/0x70
 [] ? syscall_slow_exit_work+0xfb/0x100
 [] SyS_fsync+0x10/0x20
 [] do_syscall_64+0x55/0xd0
 [] ? prepare_exit_to_usermode+0x37/0x40
 [] entry_SYSCALL64_slow_path+0x25/0x25
---[ end trace efe6b17c6dba2a6e ]---
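
For readers unfamiliar with the check behind the "list_add corruption" warning above, here is a minimal userspace sketch of the kind of consistency test that list debugging performs before linking a new entry. It is a simplified illustration, not the kernel's lib/list_debug.c. The names checked_list_add(), checked_list_add_tail(), init_list_head() and demo_corruption() are hypothetical, and the "corruption" is simulated by overwriting a pointer by hand. Nothing here says which list or which writer was involved in the trace.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty circular list points at itself in both directions. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Insert new_node between prev and next, but only if the two neighbours
 * still point at each other. On a mismatch, report it and return false,
 * leaving the list untouched (an assumption of this sketch).
 */
static bool checked_list_add(struct list_head *new_node,
			     struct list_head *prev,
			     struct list_head *next)
{
	if (next->prev != prev) {
		fprintf(stderr,
			"list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
			(void *)prev, (void *)next->prev, (void *)next);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr,
			"list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
			(void *)next, (void *)prev->next, (void *)prev);
		return false;
	}
	next->prev = new_node;
	new_node->next = next;
	new_node->prev = prev;
	prev->next = new_node;
	return true;
}

/* Tail insertion: the new node goes between head->prev and head. */
static bool checked_list_add_tail(struct list_head *new_node,
				  struct list_head *head)
{
	return checked_list_add(new_node, head->prev, head);
}

/*
 * Hypothetical scenario: node a is added normally, then a.next is
 * overwritten behind the list's back. The following tail insertion
 * trips the prev->next check. Returns how many insertions were rejected.
 */
static int demo_corruption(void)
{
	struct list_head head, a, b, stray;
	int rejected = 0;

	init_list_head(&head);
	init_list_head(&stray);

	if (!checked_list_add_tail(&a, &head))
		rejected++;

	a.next = &stray;	/* simulated corruption of the last entry */

	if (!checked_list_add_tail(&b, &head))
		rejected++;

	return rejected;
}
```

Calling demo_corruption() rejects only the second insertion. In that call, prev is the damaged entry a and next is the list head, which matches the shape of the message in the trace: prev->next was expected to be next but held an unrelated pointer.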