Subject: Re: bio linked list corruption.
From: Jens Axboe
Date: Wed, 26 Oct 2016 16:52:55 -0600
To: Dave Jones, Linus Torvalds, Chris Mason, Andy Lutomirski, Al Viro, Josef Bacik, David Sterba, linux-btrfs, Linux Kernel, Dave Chinner
In-Reply-To: <20161026224025.mou27kki4bslftli@codemonkey.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/26/2016 04:40 PM, Dave Jones wrote:
> On Wed, Oct 26, 2016 at 03:21:53PM -0700, Linus Torvalds wrote:
>
> > Could you try the attached patch?
> > It adds a couple of sanity tests:
> >
> >  - a number of tests to verify that 'rq->queuelist' isn't already on
> >    some queue when it is added to a queue
> >
> >  - one test to verify that rq->mq_ctx is the same ctx that we have locked.
> >
> > I may be completely full of shit, and this patch may be pure garbage
> > or "obviously will never trigger", but humor me.
>
> I gave it a shot too for shits & giggles.
> This falls out during boot.
>
> [ 9.244030] EXT4-fs (sda4): mounted filesystem with ordered data mode. Opts: (null)
> [ 9.271391] ------------[ cut here ]------------
> [ 9.278420] WARNING: CPU: 0 PID: 1 at block/blk-mq.c:1181 blk_sq_make_request+0x465/0x4a0
> [ 9.285613] CPU: 0 PID: 1 Comm: init Not tainted 4.9.0-rc2-think+ #4

Very odd, don't immediately see how that can happen. For testing, can
you try and add the below patch? Just curious if that fixes the list
corruption. Thing is, I don't see how ->mq_ctx and ctx are different in
this path, but I can debug that on the side.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ddc2eed64771..73b9462aa21f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1165,9 +1165,10 @@ static inline bool hctx_allow_merges(struct blk_mq_hw_ctx *hctx)
 }
 
 static inline bool blk_mq_merge_queue_io(struct blk_mq_hw_ctx *hctx,
-					 struct blk_mq_ctx *ctx,
 					 struct request *rq, struct bio *bio)
 {
+	struct blk_mq_ctx *ctx = rq->mq_ctx;
+
 	if (!hctx_allow_merges(hctx) || !bio_mergeable(bio)) {
 		blk_mq_bio_to_request(rq, bio);
 		spin_lock(&ctx->lock);
@@ -1338,7 +1339,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		goto done;
 	}
 
-	if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
+	if (!blk_mq_merge_queue_io(data.hctx, rq, bio)) {
 		/*
 		 * For a SYNC request, send it to the hardware immediately. For
 		 * an ASYNC request, just ensure that we run it later on. The
@@ -1416,7 +1417,7 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
 		return cookie;
 	}
 
-	if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
+	if (!blk_mq_merge_queue_io(data.hctx, rq, bio)) {
 		/*
 		 * For a SYNC request, send it to the hardware immediately. For
 		 * an ASYNC request, just ensure that we run it later on. The

-- 
Jens Axboe
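
For illustration only, here is a minimal user-space sketch of the two ideas in this thread: refusing to queue a request whose list node is already linked, and taking the context from the request itself instead of from a separately passed pointer. None of this is kernel code. The names list_node, sw_ctx, request_sketch, queue_request and queued_count are hypothetical stand-ins, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a circular doubly linked list node. */
struct list_node {
	struct list_node *next, *prev;
};

/* An initialised node points at itself, which also means "not on any list". */
static void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

/* Link n in front of head, i.e. at the tail of the list. */
static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical software queue context; the real one also carries a lock. */
struct sw_ctx {
	struct list_node rq_list;
};

/* Hypothetical request: remembers which context it was allocated from. */
struct request_sketch {
	struct list_node queuelist;
	struct sw_ctx *mq_ctx;
};

static void request_init(struct request_sketch *rq, struct sw_ctx *ctx)
{
	list_init(&rq->queuelist);
	rq->mq_ctx = ctx;
}

/*
 * Queue rq on the context it belongs to. The context is derived from the
 * request, so the caller cannot hand in a mismatching one. Returns false
 * (and changes nothing) if rq is already linked somewhere.
 */
static bool queue_request(struct request_sketch *rq)
{
	struct sw_ctx *ctx = rq->mq_ctx;

	assert(ctx != NULL);
	if (!list_is_empty(&rq->queuelist))
		return false;	/* sanity check: already on a queue */
	list_add_tail_node(&rq->queuelist, &ctx->rq_list);
	return true;
}

/* Count the requests currently linked on ctx. */
static int queued_count(const struct sw_ctx *ctx)
{
	const struct list_node *pos;
	int n = 0;

	for (pos = ctx->rq_list.next; pos != &ctx->rq_list; pos = pos->next)
		n++;
	return n;
}
```

In this sketch a second queue_request() call on the same request is rejected instead of relinking the node, which is the kind of double add the sanity tests are meant to catch.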