Subject: Re: btrfs bio linked list corruption.
From: Chris Mason
To: Dave Jones, Al Viro, Josef Bacik, David Sterba, Linux Kernel
Date: Wed, 12 Oct 2016 10:42:46 -0400
In-Reply-To: <20161012144012.7vvfehceoykswmun@codemonkey.org.uk>
References: <20161011144507.okg6baqvodn2m2lh@codemonkey.org.uk> <20161012134717.n74tww5eywc7dqp7@codemonkey.org.uk> <20161012144012.7vvfehceoykswmun@codemonkey.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/12/2016 10:40 AM, Dave Jones wrote:
> On Wed, Oct 12, 2016 at 09:47:17AM -0400, Dave Jones wrote:
> > On Tue, Oct 11, 2016 at 11:54:09AM -0400, Chris Mason wrote:
> > >
> > >
> > > On 10/11/2016 10:45 AM, Dave Jones wrote:
> > > > This is from Linus' current tree, with Al's iovec fixups on top.
> > > >
> > > > ------------[ cut here ]------------
> > > > WARNING: CPU: 1 PID: 3673 at lib/list_debug.c:33 __list_add+0x89/0xb0
> > > > list_add corruption. prev->next should be next (ffffe8ffff806648), but was ffffc9000067fcd8. (prev=ffff880503878b80).
> > > > CPU: 1 PID: 3673 Comm: trinity-c0 Not tainted 4.8.0-think+ #13
> > > > ffffc90000d87458 ffffffff8d32007c ffffc90000d874a8 0000000000000000
> > > > ffffc90000d87498 ffffffff8d07a6c1 0000002100000246 ffff88050388e880
> > I hit this again overnight, it's the same trace, the only difference
> > being slightly different addresses in the list pointers:
> >
> > [42572.777196] list_add corruption. prev->next should be next (ffffe8ffff806648), but was ffffc90000647cd8. (prev=ffff880503a0ba00).
> >
> > I'm actually a little surprised that ->next was the same across two
> > reboots on two different kernel builds.
> > That might be a sign this is
> > more repeatable than I'd thought, even if it does take hours of runtime
> > right now to trigger it. I'll try and narrow the scope of what trinity
> > is doing to see if I can make it happen faster.
>
> .. and of course the first thing that happens is a completely different
> btrfs trace..
>
>
> WARNING: CPU: 1 PID: 21706 at fs/btrfs/transaction.c:489 start_transaction+0x40a/0x440 [btrfs]
> CPU: 1 PID: 21706 Comm: trinity-c16 Not tainted 4.8.0-think+ #14
> ffffc900019076a8 ffffffffb731ff3c 0000000000000000 0000000000000000
> ffffc900019076e8 ffffffffb707a6c1 000001e9f5806ce0 ffff8804f74c4d98
> 0000000000000801 ffff880501cfa2a8 000000000000008a 000000000000008a

This isn't even IO. Uuughhhh. We're going to need a fast enough test
that we can bisect.

-chris
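
For anyone comparing the two list_add warnings quoted above, here is a rough sketch that pulls the three pointers out of each line and reports which ones are identical across reports. The regular expression and the helper names (parse_report, stable_fields) are made up for this note and only match the message format shown in this thread; nothing here comes from the kernel tree.

```python
import re

# The two warnings quoted in this thread, copied verbatim.
REPORTS = [
    "list_add corruption. prev->next should be next (ffffe8ffff806648), "
    "but was ffffc9000067fcd8. (prev=ffff880503878b80).",
    "[42572.777196] list_add corruption. prev->next should be next "
    "(ffffe8ffff806648), but was ffffc90000647cd8. (prev=ffff880503a0ba00).",
]

# Hypothetical pattern written for this note; it matches only the
# message format shown above.
PATTERN = re.compile(
    r"list_add corruption\. prev->next should be next \((?P<next>[0-9a-f]+)\), "
    r"but was (?P<was>[0-9a-f]+)\. \(prev=(?P<prev>[0-9a-f]+)\)"
)


def parse_report(line):
    """Return the three pointers from one warning line as integers."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a list_add corruption line: %r" % line)
    return {name: int(value, 16) for name, value in match.groupdict().items()}


def stable_fields(reports):
    """Return the names of the pointers that are identical in every report."""
    parsed = [parse_report(line) for line in reports]
    return sorted(
        name
        for name in parsed[0]
        if all(p[name] == parsed[0][name] for p in parsed)
    )


if __name__ == "__main__":
    print(stable_fields(REPORTS))
```

On the two lines above, only the expected next pointer is the same in both reports; the observed prev->next value and prev itself differ, which matches what Dave describes.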