Subject: Re: General protection fault with use_blk_mq=1.
From: Paolo Valente
Date: Thu, 29 Mar 2018 06:56:04 +0200
To: Jens Axboe
Cc: "Zephaniah E. Loss-Cutler-Hull", linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org

> On 29 Mar 2018, at 03:02, Jens Axboe wrote:
>
> On 3/28/18 5:03 PM, Zephaniah E. Loss-Cutler-Hull wrote:
>> I am not subscribed to any of the lists on the To list here, please CC
>> me on any replies.
>>
>> I am encountering a fairly consistent crash anywhere from 15 minutes to
>> 12 hours after boot with scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=1.
>>
>> The crash looks like:
>>
>> [ 5466.075993] general protection fault: 0000 [#1] PREEMPT SMP PTI
>> [ 5466.075997] Modules linked in: esp4 xfrm4_mode_tunnel fuse usblp uvcvideo pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables intel_rapl joydev serio_raw wmi_bmof iwldvm iwlwifi shpchp kvm_intel kvm irqbypass autofs4 algif_skcipher nls_iso8859_1 nls_cp437 crc32_pclmul ghash_clmulni_intel
>> [ 5466.076022] CPU: 3 PID: 10573 Comm: pool Tainted: G O 4.15.13-f1-dirty #148
>> [ 5466.076024] Hardware name: Hewlett-Packard HP EliteBook Folio 9470m/18DF, BIOS 68IBD Ver. F.44 05/22/2013
>> [ 5466.076029] RIP: 0010:percpu_counter_add_batch+0x2b/0xb0
>> [ 5466.076031] RSP: 0018:ffffa556c47afb58 EFLAGS: 00010002
>> [ 5466.076033] RAX: ffff95cda87ce018 RBX: ffff95cda87cdb68 RCX: 0000000000000000
>> [ 5466.076034] RDX: 000000003fffffff RSI: ffffffff896495c4 RDI: ffffffff895b2bed
>> [ 5466.076036] RBP: 000000003fffffff R08: 0000000000000000 R09: ffff95cb7d5f8148
>> [ 5466.076037] R10: 0000000000000200 R11: 0000000000000000 R12: 0000000000000001
>> [ 5466.076038] R13: ffff95cda87ce088 R14: ffff95cda6ebd100 R15: ffffa556c47afc58
>> [ 5466.076040] FS: 00007f25f5305700(0000) GS:ffff95cdbeac0000(0000) knlGS:0000000000000000
>> [ 5466.076042] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 5466.076043] CR2: 00007f25e807e0a8 CR3: 00000003ed5a6001 CR4: 00000000001606e0
>> [ 5466.076044] Call Trace:
>> [ 5466.076050] bfqg_stats_update_io_add+0x58/0x100
>> [ 5466.076055] bfq_insert_requests+0xec/0xd80
>> [ 5466.076059] ? blk_rq_append_bio+0x8f/0xa0
>> [ 5466.076061] ? blk_rq_map_user_iov+0xc3/0x1d0
>> [ 5466.076065] blk_mq_sched_insert_request+0xa3/0x130
>> [ 5466.076068] blk_execute_rq+0x3a/0x50
>> [ 5466.076070] sg_io+0x197/0x3e0
>> [ 5466.076073] ? dput+0xca/0x210
>> [ 5466.076077] ? mntput_no_expire+0x11/0x1a0
>> [ 5466.076079] scsi_cmd_ioctl+0x289/0x400
>> [ 5466.076082] ? filename_lookup+0xe1/0x170
>> [ 5466.076085] sd_ioctl+0xc7/0x1a0
>> [ 5466.076088] blkdev_ioctl+0x4d4/0x8c0
>> [ 5466.076091] block_ioctl+0x39/0x40
>> [ 5466.076094] do_vfs_ioctl+0x92/0x5e0
>> [ 5466.076097] ? __fget+0x73/0xc0
>> [ 5466.076099] SyS_ioctl+0x74/0x80
>> [ 5466.076102] do_syscall_64+0x60/0x110
>> [ 5466.076106] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>> [ 5466.076109] RIP: 0033:0x7f25f75fef47
>> [ 5466.076110] RSP: 002b:00007f25f53049a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>> [ 5466.076112] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f25f75fef47
>> [ 5466.076114] RDX: 00007f25f53049b0 RSI: 0000000000002285 RDI: 000000000000000c
>> [ 5466.076115] RBP: 0000000000000010 R08: 00007f25e8007818 R09: 0000000000000200
>> [ 5466.076116] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
>> [ 5466.076118] R13: 0000000000000000 R14: 00007f25f8a6b5e0 R15: 00007f25e80173e0
>> [ 5466.076120] Code: 41 55 49 89 fd bf 01 00 00 00 41 54 49 89 f4 55 89 d5 53 e8 18 e1 bb ff 48 c7 c7 c4 95 64 89 e8 dc e9 fb ff 49 8b 45 20 48 63 d5 <65> 8b 18 48 63 db 4c 01 e3 48 39 d3 7d 0a f7 dd 48 63 ed 48 39
>> [ 5466.076147] RIP: percpu_counter_add_batch+0x2b/0xb0 RSP: ffffa556c47afb58
>> [ 5466.076149] ---[ end trace 8d7eb80aafef4494 ]---
>> [ 5466.670153] note: pool[10573] exited with preempt_count 2
>>
>> (I only have the one instance right this minute as a result of not
>> having remote syslog set up before now.)
>>
>> This is clearly deep in the blk_mq code, and it goes away when I remove
>> the use_blk_mq kernel command line parameters.
>>
>> My next obvious step is to try disabling the loading of the vbox modules.
>>
>> I can include the full dmesg output if it would be helpful.
>>
>> The system is an older HP Ultrabook, and the root partition is sda1 (an
>> SSD) -> a LUKS-encrypted partition -> LVM -> BTRFS.
>>
>> The kernel is a stock 4.15.11; however, I only recently added the blk_mq
>> options, so while I can state that I have seen this on multiple kernels
>> in the 4.15.x series, I have not tested earlier kernels in this
>> configuration.
>>
>> Looking through the code, I'd guess that this is dying inside
>> blkg_rwstat_add, which calls percpu_counter_add_batch, which is what RIP
>> is pointing at.
>
> Leaving the whole thing here for Paolo - it's crashing off insertion of
> a request coming out of SG_IO. Don't think we've seen this BFQ failure
> case before.
>

Actually, we have. It was found and reported by Ming about two and a half
months ago:
https://www.spinics.net/lists/linux-block/msg21422.html

Then it just disappeared with 4.16, and Ming moved on. This forced me to
abandon the problem, as I never succeeded in reproducing it.

Thanks,
Paolo

> You can mitigate this by switching the scsi-mq devices to mq-deadline
> instead.
>
> -- 
> Jens Axboe