Subject: Re: bio linked list corruption.
To: Dave Jones, Andy Lutomirski, Linus Torvalds, Jens Axboe, Al Viro, Josef Bacik, David Sterba, linux-btrfs, Linux Kernel
From: Chris Mason
Date: Fri, 21 Oct 2016 16:17:48 -0400
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/21/2016 04:02 PM, Dave Jones wrote:
> On Thu, Oct 20, 2016 at 04:23:32PM -0700, Andy Lutomirski wrote:
> > On Thu, Oct 20, 2016 at 4:03 PM, Dave Jones wrote:
> > > On Thu, Oct 20, 2016 at 04:01:12PM -0700, Andy Lutomirski wrote:
> > > > On Thu, Oct 20, 2016 at 3:50 PM, Dave Jones wrote:
> > > > > On Tue, Oct 18, 2016 at 06:05:57PM -0700, Andy Lutomirski wrote:
> > > > > >
> > > > > > One possible debugging approach would be to change:
> > > > > >
> > > > > > #define NR_CACHED_STACKS 2
> > > > > >
> > > > > > to
> > > > > >
> > > > > > #define NR_CACHED_STACKS 0
> > > > > >
> > > > > > in kernel/fork.c and to set CONFIG_DEBUG_PAGEALLOC=y. The latter will
> > > > > > force an immediate TLB flush after vfree.
> > > > >
> > > > > I can give that idea some runtime, but it sounds like this is a case where
> > > > > we're trying to prove a negative, and that'll just run and run? In which case I
> > > > > might do this when I'm travelling on Sunday.
> > > >
> > > > The idea is that the stack will be free and unmapped immediately upon
> > > > process exit if configured like this, so that bogus stack accesses (by
> > > > the CPU, not DMA) would OOPS immediately.
> > >
> > > oh, misparsed.
> > > ok, I can definitely get behind that idea then.
> > > I'll do that next.
> > >
> >
> > It could be worth trying this, too:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack&id=174531fef4e8
> >
> > It occurred to me that the current code is a little bit fragile.
>
> It's been nearly 24hrs with the above changes, and it's been pretty much
> silent the whole time.
>
> The only thing of note over that time period has been a btrfs lockdep
> warning that's been around for a while, and occasional btrfs checksum
> failures, which I've been seeing for a while, but seem to have gotten
> worse since 4.8.

Meaning you hit them with v4.8 or not?

>
> I'm pretty confident in the disk being ok in this machine, so I think
> the checksum warnings are bogus. Chris suggested they may be the result
> of memory corruption, but there's little else going on.
>
> BTRFS warning (device sda3): csum failed ino 130654 off 0 csum 2566472073 expected csum 3008371513
> BTRFS warning (device sda3): csum failed ino 131057 off 4096 csum 3563910319 expected csum 738595262
> BTRFS warning (device sda3): csum failed ino 131176 off 4096 csum 1344477721 expected csum 441864825
> BTRFS warning (device sda3): csum failed ino 131241 off 245760 csum 3576232181 expected csum 2566472073
> BTRFS warning (device sda3): csum failed ino 131429 off 0 csum 1494450239 expected csum 2646577722
> BTRFS warning (device sda3): csum failed ino 131471 off 0 csum 3949539320 expected csum 3828807800
> BTRFS warning (device sda3): csum failed ino 131471 off 4096 csum 3475108475 expected csum 2566472073
> BTRFS warning (device sda3): csum failed ino 131471 off 958464 csum 142982740 expected csum 2566472073
> BTRFS warning (device sda3): csum failed ino 131471 off 0 csum 3949539320 expected csum 3828807800
> BTRFS warning (device sda3): csum failed ino 131532 off 270336 csum 3138898528 expected csum 2566472073
> BTRFS warning (device sda3): csum failed ino 131532 off 1249280 csum 2169165042 expected csum 2566472073
> BTRFS warning (device sda3): csum failed ino 131649 off 16384 csum 2914965650 expected csum 1425742005
>
> A curious thing: the expected csum 2566472073 turns up a number of times for different inodes, and gets
> differing actual csums each time. I suppose this could be something like a block of all zeros in multiple files,
> but it struck me as surprising.
>
> btrfs people: is there an easy way to map those inodes to a filename? I'm betting those are the
> test files that trinity generates. If so, it might point to a race somewhere.

btrfs inspect inode 130654 mntpoint

-chris
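Andy's suggestion rests on a simple property: a stack that is unmapped as soon as it is freed makes any stray CPU access fault at once, whereas a stack parked in a cache stays mapped and silently absorbs the bad write. The userspace sketch below models that trade-off on Linux, with `mmap`/`munmap` standing in for the kernel's vmap'd stacks and `vfree`, and `nr_cached` playing the role of NR_CACHED_STACKS. It is an analogy only, not the code in kernel/fork.c; `free_stack`, `stale_write_faults`, `MAX_CACHE` and `on_segv` are hypothetical names.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userspace model of a small stack cache; not kernel code. */
#define MAX_CACHE 2

static void *cache[MAX_CACHE];
static int cache_len;
static sigjmp_buf fault_env;

/* SIGSEGV handler: jump back to the probe instead of crashing. */
static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);
}

/* Release a "stack": keep it mapped in the cache if there is room,
 * otherwise unmap it immediately. */
static void free_stack(void *stack, size_t size, int nr_cached)
{
    if (cache_len < nr_cached && cache_len < MAX_CACHE) {
        cache[cache_len++] = stack;   /* still mapped: stale writes go unnoticed */
        return;
    }
    munmap(stack, size);              /* unmapped: stale accesses fault at once */
}

/* Returns 1 if a write through a dangling pointer faulted,
 * 0 if it silently succeeded, -1 if the mapping could not be created. */
int stale_write_faults(int nr_cached)
{
    size_t size = (size_t)sysconf(_SC_PAGESIZE);
    char *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    volatile char *stack = mem;       /* volatile so the stale write is not optimized out */

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    cache_len = 0;                    /* start each trial with an empty cache */
    free_stack(mem, size, nr_cached);

    int faulted = 0;
    if (sigsetjmp(fault_env, 1) == 0)
        stack[0] = 42;                /* bogus write after the stack was "freed" */
    else
        faulted = 1;

    sigaction(SIGSEGV, &old, NULL);   /* restore the previous handler */
    if (!faulted)
        munmap(mem, size);            /* the cached mapping is still live; release it */
    return faulted;
}
```

Here `stale_write_faults(0)` models `#define NR_CACHED_STACKS 0` and reports a fault, while `stale_write_faults(2)` models the default cache of two and lets the stale write through without complaint, which is why the cached configuration can hide this class of bug.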
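Dave's guess that 2566472073 might be the checksum of a block of zeros can be checked offline. Assuming the filesystem uses btrfs's default crc32c data checksum over 4096-byte blocks, and that the warning prints the standard CRC-32C value as an unsigned decimal (both assumptions, not stated in the thread), the sketch below computes CRC-32C of an all-zero 4 KiB block for comparison. `crc32c` and `zero_block_csum` are hypothetical helper names; the bitwise loop is slow but easy to audit.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli): reflected polynomial 0x82F63B78,
 * initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
uint32_t crc32c(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            /* shift right; XOR in the polynomial when the low bit was set */
            crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Checksum of one all-zero 4096-byte block (assumed btrfs sector size). */
uint32_t zero_block_csum(void)
{
    static const unsigned char zeros[4096];   /* static storage is zero-filled */
    return crc32c(zeros, sizeof zeros);
}
```

Printing the result with `printf("%u\n", (unsigned)zero_block_csum())` and comparing it against 2566472073 would show whether the recurring expected checksum simply reflects zero-filled blocks in several files, as Dave suspected.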