Subject: Re: bio linked list corruption.
From: Jens Axboe
To: Chris Mason, Dave Jones, Al Viro, Josef Bacik, David Sterba, Linux Kernel, Linus Torvalds
Date: Tue, 18 Oct 2016 17:36:21 -0600
Message-ID: <0504f6a2-dbb9-e803-58be-c872061da0e0@fb.com>
In-Reply-To: <20161018233148.GA93792@clm-mbp.masoncoding.com>
References: <20161011144507.okg6baqvodn2m2lh@codemonkey.org.uk> <20161018224205.bjgloslaxcej2td2@codemonkey.org.uk> <20161018233148.GA93792@clm-mbp.masoncoding.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/18/2016 05:31 PM, Chris Mason wrote:
> On Tue, Oct 18, 2016 at 05:12:41PM -0600, Jens Axboe wrote:
>> On 10/18/2016 04:42 PM, Dave Jones wrote:
>>> So Chris had me do a run on ext4 just for giggles. It took a while, but
>>> eventually this fell out...
>>>
>>>
>>> WARNING: CPU: 3 PID: 21324 at lib/list_debug.c:33 __list_add+0x89/0xb0
>>> list_add corruption. prev->next should be next (ffffe8ffffc05648),
>>> but was ffffc9000028bcd8. (prev=ffff880503a145c0).
>>> CPU: 3 PID: 21324 Comm: modprobe Not tainted 4.9.0-rc1-think+ #1
>>> ffffc90000a6b7b8 ffffffff81320e3c ffffc90000a6b808 0000000000000000
>>> ffffc90000a6b7f8 ffffffff8107a711 0000002100000246 ffff8805039f1740
>>> ffff880503a145c0 ffffe8ffffc05648 ffffe8ffffa05600 ffff880502c39548
>>> Call Trace:
>>> [] dump_stack+0x4f/0x73
>>> [] __warn+0xc1/0xe0
>>> [] warn_slowpath_fmt+0x5a/0x80
>>> [] __list_add+0x89/0xb0
>>> [] blk_sq_make_request+0x2f8/0x350
>>> [] ? generic_make_request+0xec/0x240
>>> [] generic_make_request+0xf9/0x240
>>> [] submit_bio+0x78/0x150
>>> [] ? __find_get_block+0x126/0x130
>>> [] submit_bh_wbc+0x16f/0x1e0
>>> [] ? __end_buffer_read_notouch+0x20/0x20
>>> [] ll_rw_block+0xa8/0xb0
>>> [] __breadahead+0x3f/0x70
>>> [] __ext4_get_inode_loc+0x37c/0x3d0
>>> [] ext4_iget+0x8d/0xb90
>>> [] ? d_alloc_parallel+0x329/0x700
>>> [] ext4_iget_normal+0x2a/0x30
>>> [] ext4_lookup+0x136/0x250
>>> [] lookup_slow+0x12d/0x220
>>> [] walk_component+0x1e7/0x310
>>> [] ? path_init+0x4d8/0x520
>>> [] path_lookupat+0x62/0x120
>>> [] ? getname_flags+0x32/0x180
>>> [] filename_lookup+0xa8/0x130
>>> [] ? strncpy_from_user+0x46/0x170
>>> [] ? getname_flags+0x4e/0x180
>>> [] user_path_at_empty+0x31/0x40
>>> [] vfs_fstatat+0x61/0xc0
>>> [] ? __lock_acquire.isra.32+0x1cf/0x8c0
>>> [] SYSC_newstat+0x2e/0x60
>>> [] ? __this_cpu_preempt_check+0x13/0x20
>>> [] SyS_newstat+0x9/0x10
>>> [] do_syscall_64+0x5c/0x170
>>> [] entry_SYSCALL64_slow_path+0x25/0x25
>>>
>>> So this one isn't a btrfs specific problem as I first thought.
>>>
>>> This sometimes reproduces within minutes, sometimes hours, which makes
>>> it a pain to bisect. It only started showing up this merge window
>>> though.
>>
>> Chinner reported the same thing on XFS, I'll look into it asap.
>
> Jens, not sure if you saw the whole thread. This has triggered bad page
> state errors, and also corrupted a btrfs list. It hurts me to say, but
> it might not actually be your fault.

Just went through the block changes again, pretty basic this round and
nothing that touches plugging at all. General memory corruption could
certainly explain it...

-- 
Jens Axboe