Subject: Re: General protection fault with use_blk_mq=1.
From: Paolo Valente
Date: Thu, 29 Mar 2018 07:13:11 +0200
To: Jens Axboe
Cc: "Zephaniah E. Loss-Cutler-Hull", "Zephaniah E. Loss-Cutler-Hull", Linux Kernel Mailing List, linux-block, linux-scsi@vger.kernel.org
Message-Id: <882A26D2-BEB8-4CE3-B132-0DE31BFD5D28@linaro.org>
In-Reply-To: <735c5d75-eacf-8ed2-ba9b-9ff4b0b5290d@kernel.dk>
References: <7d8a9c62-7d3e-879c-5b5b-30707f04553e@aehallh.com> <735c5d75-eacf-8ed2-ba9b-9ff4b0b5290d@kernel.dk>

> On 29 Mar 2018, at 05:22, Jens Axboe wrote:
>
> On 3/28/18 9:13 PM, Zephaniah E. Loss-Cutler-Hull wrote:
>> On 03/28/2018 06:02 PM, Jens Axboe wrote:
>>> On 3/28/18 5:03 PM, Zephaniah E. Loss-Cutler-Hull wrote:
>>>> I am not subscribed to any of the lists on the To list here, please CC
>>>> me on any replies.
>>>>
>>>> I am encountering a fairly consistent crash anywhere from 15 minutes to
>>>> 12 hours after boot with scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=1
>>>>
>>>> The crash looks like:
>>>>
>>
>>>>
>>>> Looking through the code, I'd guess that this is dying inside
>>>> blkg_rwstat_add, which calls percpu_counter_add_batch, which is what RIP
>>>> is pointing at.
>>>
>>> Leaving the whole thing here for Paolo - it's crashing off insertion of
>>> a request coming out of SG_IO. Don't think we've seen this BFQ failure
>>> case before.
>>>
>>> You can mitigate this by switching the scsi-mq devices to mq-deadline
>>> instead.
>>>
>>
>> I'm thinking that I should also be able to mitigate it by disabling
>> CONFIG_DEBUG_BLK_CGROUP.
>>
>> That should remove that entire chunk of code.
>>
>> Of course, that won't help if this is actually a symptom of a bigger
>> problem.
>
> Yes, it's not a given that it will fully mask the issue at hand. But
> turning off BFQ has a much higher chance of working for you.
>
> This time actually CC'ing Paolo.
>

Hi Zephaniah,
if you are actually interested in the benefits of BFQ (low latency,
high responsiveness, fairness, ...) then it may be worth trying what
you yourself suggest: disabling CONFIG_DEBUG_BLK_CGROUP. That is also
worthwhile because this option enables the heavy computation of debug
cgroup statistics, which you probably don't use.

In addition, the outcome of your attempt without
CONFIG_DEBUG_BLK_CGROUP would give us useful bisection information:
- if no failure occurs, then the issue is likely to be confined to
  that debugging code (which, on the bright side, is probably of only
  occasional interest, and only to a handful of developers)
- if the issue still shows up, then we may have new hints on this odd
  failure

Finally, consider that this issue has been reported to disappear as of
4.16 [1] and, as a plus, that the service quality of BFQ got a further
boost in that same release.

Looking forward to your feedback, in case you try BFQ without
CONFIG_DEBUG_BLK_CGROUP,
Paolo

[1] https://www.spinics.net/lists/linux-block/msg21422.html

>
> --
> Jens Axboe