From: Bart Van Assche
To: "tj@kernel.org", "axboe@kernel.dk"
CC: "kernel-team@fb.com", "linux-kernel@vger.kernel.org", "peterz@infradead.org", "osandov@fb.com", "linux-block@vger.kernel.org", "oleg@redhat.com", "hch@lst.de"
Subject: Re: [PATCHSET v2] blk-mq: reimplement timeout handling
Date: Wed, 20 Dec 2017 23:41:02 +0000
Message-ID: <1513813261.2603.36.camel@wdc.com>
References: <20171212190134.535941-1-tj@kernel.org>
In-Reply-To: <20171212190134.535941-1-tj@kernel.org>

On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote:
> Currently, blk-mq timeout path synchronizes against the usual
> issue/completion path using a complex scheme involving atomic
> bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
> rules.  Unfortunately, it contains quite a few holes.

Hello Tejun,

An attempt to run SCSI I/O with this patch series applied resulted in the
following:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: scsi_times_out+0x1c/0x2d0
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 437 Comm: kworker/1:1H Tainted: G        W        4.15.0-rc4-dbg+ #1
Hardware name: Dell Inc. PowerEdge R720/0VWT90, BIOS 2.5.4 01/22/2016
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:scsi_times_out+0x1c/0x2d0
RSP: 0018:ffffc90007ef3d58 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880878eab000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880878eab000
RBP: ffff880878eab1a0 R08: ffffffffffffffff R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
R13: 0000000000000000 R14: ffff88085e4a5ce8 R15: ffff880878e9f848
FS:  0000000000000000(0000) GS:ffff88093f600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001c0f002 CR4: 00000000000606e0
Call Trace:
 blk_mq_terminate_expired+0x36/0x70
 bt_iter+0x43/0x50
 blk_mq_queue_tag_busy_iter+0xee/0x200
 blk_mq_timeout_work+0x186/0x2e0
 process_one_work+0x221/0x6e0
 worker_thread+0x3a/0x390
 kthread+0x11c/0x140
 ret_from_fork+0x24/0x30
RIP: scsi_times_out+0x1c/0x2d0 RSP: ffffc90007ef3d58
CR2: 0000000000000000

(gdb) list *(scsi_times_out+0x1c)
0xffffffff8147adbc is in scsi_times_out (drivers/scsi/scsi_error.c:285).
280      */
281     enum blk_eh_timer_return scsi_times_out(struct request *req)
282     {
283             struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
284             enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
285             struct Scsi_Host *host = scmd->device->host;
286
287             trace_scsi_dispatch_cmd_timeout(scmd);
288             scsi_log_completion(scmd, TIMEOUT_ERROR);
289
(gdb) disas /s scsi_times_out
[ ... ]
283             struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
284             enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
285             struct Scsi_Host *host = scmd->device->host;
   0xffffffff8147adb2 <+18>:    mov    0x1d8(%rdi),%rax
   0xffffffff8147adb9 <+25>:    mov    %rdi,%rbx
   0xffffffff8147adbc <+28>:    mov    (%rax),%r13
   0xffffffff8147adbf <+31>:    nopl   0x0(%rax,%rax,1)

Bart.
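
P.S. Reading the register dump against the disassembly: the faulting
instruction at +28 is "mov (%rax),%r13", and both RAX and CR2 are zero, so
the pointer loaded just before it by "mov 0x1d8(%rdi),%rax" (scmd->device
on line 285) appears to have been NULL when the timeout handler ran. The
user-space sketch below only models that dereference chain. The structures
are simplified stand-ins, not the kernel's layouts, and request_with_pdu,
model_rq_to_pdu() and model_line_285() are hypothetical names made up for
illustration. It is not meant as a fix.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins: only the fields on the faulting path are modelled. */
struct Scsi_Host { int host_no; };

struct scsi_device {
	struct Scsi_Host *host;		/* first member: device->host is a load from offset 0 */
};

struct scsi_cmnd {
	struct scsi_device *device;
};

struct request {
	int tag;
};

/* Hypothetical container: a request with its driver payload (PDU) behind it. */
struct request_with_pdu {
	struct request rq;
	struct scsi_cmnd cmd;
};

enum deref_step {
	DEREF_OK = 0,
	DEREF_NULL_DEVICE = 1,
};

/* Stand-in for blk_mq_rq_to_pdu(): find the payload that follows the request. */
struct scsi_cmnd *model_rq_to_pdu(struct request *rq)
{
	/* Valid because rq is the first member of struct request_with_pdu. */
	return &((struct request_with_pdu *)rq)->cmd;
}

/*
 * Walks req -> scmd -> device -> host like line 285 of scsi_error.c, but
 * reports a NULL device instead of faulting on it.
 */
enum deref_step model_line_285(struct request *req, struct Scsi_Host **host_out)
{
	struct scsi_cmnd *scmd;
	struct scsi_device *sdev;

	assert(req != NULL);
	scmd = model_rq_to_pdu(req);

	sdev = scmd->device;		/* corresponds to: mov 0x1d8(%rdi),%rax */
	if (sdev == NULL)
		return DEREF_NULL_DEVICE;	/* the oops case: RAX == 0 */

	*host_out = sdev->host;		/* corresponds to: mov (%rax),%r13 */
	return DEREF_OK;
}
```

With cmd.device left NULL, model_line_285() returns DEREF_NULL_DEVICE at
the step that corresponds to the faulting load. With a device attached, it
returns DEREF_OK and stores the host pointer.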