Subject: Re: btrfs bio linked list corruption.
To: Dave Jones, Al Viro, Josef Bacik, David Sterba, Linux Kernel
References: <20161011144507.okg6baqvodn2m2lh@codemonkey.org.uk>
 <20161012134717.n74tww5eywc7dqp7@codemonkey.org.uk>
 <20161012144012.7vvfehceoykswmun@codemonkey.org.uk>
 <20161013181622.qpi5puv6ivxvslnf@codemonkey.org.uk>
From: Chris Mason
Message-ID: <7b476728-75f8-e3d3-1261-b9b0d598ed10@fb.com>
Date: Thu, 13 Oct 2016 17:18:46 -0400
In-Reply-To: <20161013181622.qpi5puv6ivxvslnf@codemonkey.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/13/2016 02:16 PM, Dave Jones wrote:
> On Wed, Oct 12, 2016 at 10:42:46AM -0400, Chris Mason wrote:
> > On 10/12/2016 10:40 AM, Dave Jones wrote:
> > > On Wed, Oct 12, 2016 at 09:47:17AM -0400, Dave Jones wrote:
> > > > On Tue, Oct 11, 2016 at 11:54:09AM -0400, Chris Mason wrote:
> > > > >
> > > > > On 10/11/2016 10:45 AM, Dave Jones wrote:
> > > > > > This is from Linus' current tree, with Al's iovec fixups on top.
> > > > > >
> > > > > > ------------[ cut here ]------------
> > > > > > WARNING: CPU: 1 PID: 3673 at lib/list_debug.c:33 __list_add+0x89/0xb0
> > > > > > list_add corruption. prev->next should be next (ffffe8ffff806648), but was ffffc9000067fcd8. (prev=ffff880503878b80).
> > > > > > CPU: 1 PID: 3673 Comm: trinity-c0 Not tainted 4.8.0-think+ #13
> > > > > > ffffc90000d87458 ffffffff8d32007c ffffc90000d874a8 0000000000000000
> > > > > > ffffc90000d87498 ffffffff8d07a6c1 0000002100000246 ffff88050388e880
> > > >
> > > > I hit this again overnight, it's the same trace, the only difference
> > > > being slightly different addresses in the list pointers:
> > > >
> > > > [42572.777196] list_add corruption. prev->next should be next (ffffe8ffff806648), but was ffffc90000647cd8. (prev=ffff880503a0ba00).
> > > >
> > > > I'm actually a little surprised that ->next was the same across two
> > > > reboots on two different kernel builds. That might be a sign this is
> > > > more repeatable than I'd thought, even if it does take hours of runtime
> > > > right now to trigger it.
> > > > I'll try and narrow the scope of what trinity
> > > > is doing to see if I can make it happen faster.
> > >
> > > .. and of course the first thing that happens is a completely different
> > > btrfs trace..
> > >
> > > WARNING: CPU: 1 PID: 21706 at fs/btrfs/transaction.c:489 start_transaction+0x40a/0x440 [btrfs]
> > > CPU: 1 PID: 21706 Comm: trinity-c16 Not tainted 4.8.0-think+ #14
> > > ffffc900019076a8 ffffffffb731ff3c 0000000000000000 0000000000000000
> > > ffffc900019076e8 ffffffffb707a6c1 000001e9f5806ce0 ffff8804f74c4d98
> > > 0000000000000801 ffff880501cfa2a8 000000000000008a 000000000000008a
> > >
> > This isn't even IO. Uuughhhh. We're going to need a fast enough test
> > that we can bisect.
>
> Progress...
> I've found that this combination of syscalls..
>
> ./trinity -C64 -q -l off -a64 --enable-fds=testfile -c fsync -c fsetxattr -c lremovexattr -c pwritev2
>
> hits one of these two bugs in a few minutes runtime.
>
> Just the xattr syscalls + fsync isn't enough, neither is just pwrite + fsync.
> Mix them together though, and something goes awry.
>

Hasn't triggered here yet. I'll leave it running though.

-chris
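
For context on the first warning quoted above: the "prev->next should be next" message comes from a consistency check made before a node is linked between two neighbours. Below is a minimal userspace sketch of that kind of check, loosely modeled on the debug variant of list_add. The names init_list_head and checked_list_add are hypothetical, and this version refuses the insertion where the kernel warns, so treat it as an illustration of the invariant rather than the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Circular doubly linked list node, shaped like the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Hypothetical helper: link new_node between prev and next, but only if the
 * two neighbours still point at each other. Returns false and leaves the
 * list untouched when the invariant is broken.
 */
static bool checked_list_add(struct list_head *new_node,
                             struct list_head *prev,
                             struct list_head *next)
{
    assert(new_node != NULL && prev != NULL && next != NULL);

    if (next->prev != prev) {
        fprintf(stderr,
                "list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        return false;
    }
    if (prev->next != next) {
        /* The case in the quoted trace: prev no longer points at next. */
        fprintf(stderr,
                "list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
                (void *)next, (void *)prev->next, (void *)prev);
        return false;
    }

    next->prev = new_node;
    new_node->next = next;
    new_node->prev = prev;
    prev->next = new_node;
    return true;
}
```

In the reports above, the unexpected prev->next values (ffffc9000067fcd8, ffffc90000647cd8) are the kind of mismatch the second check catches: something else rewrote prev->next, or the node it pointed at went away, between the time the list was last consistent and the attempted insert.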