Date: Wed, 26 Oct 2016 18:54:21 -0400
From: Chris Mason
To: Linus Torvalds
CC: Dave Jones, Andy Lutomirski, Andy Lutomirski, Jens Axboe, Al Viro, Josef Bacik, David Sterba, linux-btrfs, Linux Kernel, Dave Chinner
Subject: Re: bio linked list corruption.
Message-ID: <20161026225421.GA15247@clm-mbp.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 26, 2016 at 03:07:10PM -0700, Linus Torvalds wrote:
>On Wed, Oct 26, 2016 at 1:00 PM, Chris Mason wrote:
>>
>> Today I turned off every CONFIG_DEBUG_* except for list debugging, and
>> ran dbench 2048:
>>
>> [ 2759.118711] WARNING: CPU: 2 PID: 31039 at lib/list_debug.c:33 __list_add+0xbe/0xd0
>> [ 2759.119652] list_add corruption. prev->next should be next (ffffe8ffffc80308), but was ffffc90000ccfb88. (prev=ffff880128522380).
>> [ 2759.121039] Modules linked in: crc32c_intel i2c_piix4 aesni_intel aes_x86_64 virtio_net glue_helper i2c_core lrw floppy gf128mul serio_raw pcspkr button ablk_helper cryptd sch_fq_codel autofs4 virtio_blk
>> [ 2759.124369] CPU: 2 PID: 31039 Comm: dbench Not tainted 4.9.0-rc1-15246-g4ce9206-dirty #317
>> [ 2759.125077] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014
>> [ 2759.125077]  ffffc9000f6fb868 ffffffff814fe4ff ffffffff8151cb5e ffffc9000f6fb8c8
>> [ 2759.125077]  ffffc9000f6fb8c8 0000000000000000 ffffc9000f6fb8b8 ffffffff81064bbf
>> [ 2759.127444]  ffff880128523680 0000002139968000 ffff880138b7a4a0 ffff880128523540
>> [ 2759.127444] Call Trace:
>> [ 2759.127444]  [] dump_stack+0x53/0x74
>> [ 2759.127444]  [] ? __list_add+0xbe/0xd0
>> [ 2759.127444]  [] __warn+0xff/0x120
>> [ 2759.127444]  [] warn_slowpath_fmt+0x49/0x50
>> [ 2759.127444]  [] __list_add+0xbe/0xd0
>> [ 2759.127444]  [] blk_sq_make_request+0x388/0x580
>
>Ok, that's definitely the same one that Dave started out seeing.
>
>The fact that it is that reliable - two different machines, two very
>different loads (dbench looks nothing like trinity) really makes me
>think that maybe the problem really is in the block plugging after
>all.
>
>It very much does not smell like random stack corruption. It's simply
>not random enough.

Agreed.  I'd feel better if I could trigger this outside of btrfs, even
though Dave Chinner hit something very similar on xfs.  I'll peel off
another test machine for a long XFS run.

>
>And I just noticed something: I originally thought that this is the
>"list_add_tail()" to the plug - which is the only "list_add()" variant
>in that function.
>
>But that never made sense, because the whole "but was" isn't a stack
>address, and "next" in "list_add_tail()" is basically fixed, and would
>have to be the stack.
>
>But I now notice that there's actually another "list_add()" variant
>there, and it's the one from __blk_mq_insert_request() that gets
>inlined into blk_mq_insert_request(), which then gets inlined into
>blk_mq_make_request().
>
>And that actually makes some sense just looking at the offsets too:
>
>     blk_sq_make_request+0x388/0x580
>
>so it's somewhat at the end of blk_sq_make_request(). So it's not unlikely.
>
>And there it makes perfect sense that the "next should be" value is
>*not* on the stack.
>
>Chris, if you built with debug info, you can try
>
>    ./scripts/faddr2line /boot/vmlinux blk_sq_make_request+0x388
>
>to get what line that blk_sq_make_request+0x388 address actually is. I
>think it's the
>
>                list_add_tail(&rq->queuelist, &ctx->rq_list);
>
>in __blk_mq_insert_req_list() (when it's inlined from
>blk_sq_make_request(), "at_head" will be false.
>
>So it smells like "&ctx->rq_list" might be corrupt.

I'm running your current git here, so these line numbers should line up
for you:

blk_sq_make_request+0x388/0x578:
__blk_mq_insert_request at block/blk-mq.c:1049
 (inlined by) blk_mq_merge_queue_io at block/blk-mq.c:1175
 (inlined by) blk_sq_make_request at block/blk-mq.c:1419

The fsync path in the WARN doesn't have any plugs that I can see, so
it's not surprising that we're not in the plugging path.  I'm here:

	if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
		/*
		 * For a SYNC request, send it to the hardware immediately. For
		 * an ASYNC request, just ensure that we run it later on. The
		 * latter allows for merging opportunities and more efficient
		 * dispatching.
		 */

I'll try the debugging patch you sent in the other email.

-chris
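
For anyone following the thread, here is a rough userspace sketch of the
kind of sanity check that produces the "list_add corruption" warning
above.  It is not the code in lib/list_debug.c: the names
sketch_list_add_valid() and sketch_list_add_tail() are made up for
illustration, and unlike the kernel's WARN this version refuses the
insert when the check fails.  For a list_add_tail() onto a list head,
"next" is the head itself and "prev" is the current tail, so a bad
prev->next means the tail entry no longer points back at the head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Hypothetical helper: check that the two neighbours of the insertion
 * point still point at each other before linking a new entry between
 * them.  Returns false and reports the mismatch otherwise.
 */
static bool sketch_list_add_valid(struct list_head *entry,
				  struct list_head *prev,
				  struct list_head *next)
{
	if (next->prev != prev) {
		fprintf(stderr,
			"list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
			(void *)prev, (void *)next->prev, (void *)next);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr,
			"list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
			(void *)next, (void *)prev->next, (void *)prev);
		return false;
	}
	if (entry == prev || entry == next) {
		fprintf(stderr, "list_add double add: entry=%p\n", (void *)entry);
		return false;
	}
	return true;
}

/* Append entry before head, i.e. at the tail; skip the insert if the check fails. */
static bool sketch_list_add_tail(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev = head->prev;
	struct list_head *next = head;

	if (!sketch_list_add_valid(entry, prev, next))
		return false;

	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
	return true;
}
```

Overwriting the tail entry's next pointer with some unrelated address
and then calling sketch_list_add_tail() again takes the "prev->next
should be next" branch, which has the same shape as the warning in the
trace.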