Received: by 10.223.185.116 with SMTP id b49csp4239143wrg; Mon, 26 Feb 2018 13:50:22 -0800 (PST) X-Google-Smtp-Source: AH8x224p+dJ0jr7x1KJVpEVyRWdhQbO0+92f2635gn3ksut0Wa9P4QkXzAeW95zLjAVVpUuyISjU X-Received: by 2002:a17:902:848e:: with SMTP id c14-v6mr12260671plo.139.1519681821936; Mon, 26 Feb 2018 13:50:21 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1519681821; cv=none; d=google.com; s=arc-20160816; b=i+lYj5iepQZTnhNjCNfBm5UI4LQiyXQNNhmNjLu4pDZLIFCCWBqSU5RRIXxQYH/Qi2 qLiPref78ui2gGgHlCcXJFkHXGFNc75rEyhXcTJLSUcE0jwvD6QhDzhUBjhB1AiIHXlu TAxpkOS5hQ6ONhnHx0KgQMZwkNdvRnbVZ+XkN8ef3158gV3otQcbcK/Q/EbYIvNBIysJ hhIuaw43bDKFP9SLNV/ao5NdMWi02It/iQLGfwwqKmaWBI0BF8RidktoySHd5MZ5Wdxi 1bvW+TKxl/zIME/P/STJzvpccq38wxozlKcu5jbMZMLNwecu4RlZqNrhuRhMSgonvtgR affw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:references:cc:to:subject:from:dkim-signature :arc-authentication-results; bh=IVa5JnKMXbalUDa+mlKq2X5OKwIheXgVLKoa30mVlHo=; b=FEzqZSQeK/ZozuyIqG67lGBrgHABfDpcJUJR5NfcYGE71z5jozm1RemOFnpzVkt4Ip QBKoUfEo7VZBz/8epdBphmctt7fgHl8MBWcgfhg/3C/MYsgiZpb8IbZJKyjmobP78fe7 LfSkyDn4mv8dIaz4uFfZhusucSFQb4Kx5NNbFbN35l/Uaxgk0XZ7LhTEvwpC2kgxHBZo dr+0eOTYCh9K2jsvxbn6uuh/Ys3lAz3q24b+Em7Ni65UIa5uof57VVud+Or0yTiIVQVP slDozo8u+orEfQR7zJosb8CovYjVSVipZcTQpob1tc1+V/KntTkhvJ6s3U+QDj9lf/BN Ko+Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=fBIMk1P/; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y2si6028588pgr.167.2018.02.26.13.50.05; Mon, 26 Feb 2018 13:50:21 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=fBIMk1P/; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751990AbeBZVsd (ORCPT + 99 others); Mon, 26 Feb 2018 16:48:33 -0500 Received: from mail-lf0-f68.google.com ([209.85.215.68]:33668 "EHLO mail-lf0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751531AbeBZVs1 (ORCPT ); Mon, 26 Feb 2018 16:48:27 -0500 Received: by mail-lf0-f68.google.com with SMTP id o145so21251973lff.0; Mon, 26 Feb 2018 13:48:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:subject:to:cc:references:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=IVa5JnKMXbalUDa+mlKq2X5OKwIheXgVLKoa30mVlHo=; b=fBIMk1P/e16MD8AZ8Bp0poNNemN9lERvi9S4gj5YMtkOmGVE9l3XH5GeC9sdgglryv 1I2g4xsDknnIZ0L+8BzyIFe2iQlktSj1pAM4o7zPfVf5nguW6fwf1nl/PxrIqxJO2xKx /u2h4F1hUZ45AM+d7Yw1MYDIVqHHgqMdzU7zp8UfXEgG48lhilAlCmYpgM5A79hmsNER ZBMuCXg9MhuPfhn7YQS2YLuKxnKBPByRPkTt7OMZCMp2O9yvEvDt1m/OObmuM4KjRtOq ztEevmIZpe7d+QXPI2afig+mi6LK9x/9CcnScIcVO+BAGErWkw6V4SCz4IUfe0Lim+uQ +7WQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:subject:to:cc:references:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=IVa5JnKMXbalUDa+mlKq2X5OKwIheXgVLKoa30mVlHo=; b=NmYf1TIr0i/1GCLfMR405ubF6/oI+97yzbbfODMRS8uzze3mAgnvY0/VlUDUvgnYHY EHZ+4wSZo81d4wClnIoyc0nvbjWv0YkG5Q1A0zR+ZvqhF11WYrj1GhIg+Q0GQaDUU7HL 2WEv+o77oquPfvJ5N6IGaddMzHRBIFnSj5ywgPUlWk8CZN9ssVq7TvjzISG64Q2FglNA LL0NOHEe1UD6KSU0tNc4qwrmX+DxEAgUXu4Ak6Tj+isPM60KXkVfCZBQIcwnO926lN44 Yk19rYZy0qqiFl2xEgLncq06rhg9V2GF7yvLRWu7TOM5qUubeQlmgZh/FpZKVbqpQJj9 ievw== X-Gm-Message-State: APf1xPAHPINKp4tam4zg6nQsux+1htowXors7zjb4LOKJEabQ70Dt6B6 yv5kLX2pxbHsZVul5GudGgA= X-Received: by 10.25.67.80 with SMTP id m16mr9040872lfj.105.1519681705487; Mon, 26 Feb 2018 13:48:25 -0800 (PST) Received: from [192.168.1.145] (ppp109-252-55-234.pppoe.spdop.ru. [109.252.55.234]) by smtp.googlemail.com with ESMTPSA id q80sm2202792lfb.69.2018.02.26.13.48.23 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2018 13:48:24 -0800 (PST) From: Dmitry Osipenko Subject: Re: [PATCH V15 06/22] mmc: block: Add blk-mq support To: Adrian Hunter , Ulf Hansson Cc: linux-mmc , linux-block , linux-kernel , Bough Chen , Alex Lemberg , Mateusz Nowak , Yuliy Izrailov , Jaehoon Chung , Dong Aisheng , Das Asutosh , Zhangfei Gao , Sahitya Tummala , Harjani Ritesh , Venu Byravarasu , Linus Walleij , Shawn Lin , Bartlomiej Zolnierkiewicz , Christoph Hellwig , Thierry Reding , Krishna Reddy References: <1511962879-24262-1-git-send-email-adrian.hunter@intel.com> <1511962879-24262-7-git-send-email-adrian.hunter@intel.com> <829308a3-3bf6-c173-65fa-e2a0f45f7f61@intel.com> <68886f99-97f5-897a-f754-6f414741bd5a@gmail.com> Message-ID: <22580b82-0257-b156-9f0c-79afa34067e5@gmail.com> Date: Tue, 27 Feb 2018 00:48:23 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: <68886f99-97f5-897a-f754-6f414741bd5a@gmail.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 22.02.2018 20:54, Dmitry Osipenko wrote: > On 22.02.2018 10:42, Adrian Hunter wrote: >> On 21/02/18 22:50, Dmitry Osipenko wrote: >>> On 29.11.2017 16:41, Adrian Hunter wrote: >>>> Define and use a blk-mq queue. Discards and flushes are processed >>>> synchronously, but reads and writes asynchronously. In order to support >>>> slow DMA unmapping, DMA unmapping is not done until after the next request >>>> is started. That means the request is not completed until then. If there is >>>> no next request then the completion is done by queued work. >>> >>> Hello, >>> >>> I'm using (running linux-next and doing some upstream development for) some old >>> NVIDIA Tegra tablet that has built-in (internal) and external MMC's and with the >>> blk-mq being enabled I'm observing a soft lockup. The lockup seems is >>> reproducible quite reliably by running fsck on any MMC partition, sometimes >>> kernels lockups on boot during probing partitions table (weirdly only when both >>> SDHCI's are present, i.e. internal storage enabled in DT and external SD is >>> inserted/enabled) and it also lockups pretty quickly in a case of just a general >>> use. Reverting mmc/ commits up to 1bec43a3b18 ("Remove option not to use >>> blk-mq") and disabling CONFIG_MMC_MQ_DEFAULT makes everything working fine >>> again. There is also a third SDHCI populated with built-in WiFi/Bluetooth SDIO >>> and I'm observing odd MMC timeouts with the blk-mq enabled, disabling >>> CONFIG_MMC_MQ_DEFAULT fixes these timeouts as well. >>> >>> Any thoughts? >> >> SDIO (unless it is a combo card) should be unaffected by changes to the >> block driver. I don't know whether it's a combo card or not. Where I can find info about that? Is it mentioned in sysfs somewhere? Alternatively you may take a brief look at what brcmfmac driver does, maybe it will tell you immediately whether blk-mq affects it or not. And if it's not affected, then it could be that there is some other issue that is masked by a properly working block driver. >> I don't have any ideas. Adding more NVIDIA people. >>> >>> WiFi issue >>> ======================== >>> >>> [ 38.247006] mmc2: Timeout waiting for hardware interrupt. >>> [ 38.247027] brcmfmac: brcmf_escan_timeout: timer expired >>> [ 38.247036] mmc2: sdhci: ============ SDHCI REGISTER DUMP =========== >>> [ 38.247047] mmc2: sdhci: Sys addr: 0x00000000 | Version: 0x00000001 >>> [ 38.247055] mmc2: sdhci: Blk size: 0x00007008 | Blk cnt: 0x00000000 >>> [ 38.247062] mmc2: sdhci: Argument: 0x21000008 | Trn mode: 0x00000013 >>> [ 38.247070] mmc2: sdhci: Present: 0x01d70000 | Host ctl: 0x00000013 >>> [ 38.247077] mmc2: sdhci: Power: 0x00000001 | Blk gap: 0x00000000 >>> [ 38.247084] mmc2: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007 >>> [ 38.247091] mmc2: sdhci: Timeout: 0x0000000e | Int stat: 0x00000000 >>> [ 38.247098] mmc2: sdhci: Int enab: 0x02ff000b | Sig enab: 0x02fc000b >>> [ 38.247105] mmc2: sdhci: AC12 err: 0x00000000 | Slot int: 0x00000000 >>> [ 38.247112] mmc2: sdhci: Caps: 0x61ff30b0 | Caps_1: 0x00000000 >>> [ 38.247119] mmc2: sdhci: Cmd: 0x0000353a | Max curr: 0x00000001 >>> [ 38.247126] mmc2: sdhci: Resp[0]: 0x00001800 | Resp[1]: 0x08002db5 >>> [ 38.247133] mmc2: sdhci: Resp[2]: 0x16da8000 | Resp[3]: 0x00000400 >>> [ 38.247139] mmc2: sdhci: Host ctl2: 0x00000000 >>> [ 38.247146] mmc2: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x17c47200 >>> [ 38.247152] mmc2: sdhci: ============================================ >>> [ 38.247250] brcmfmac: brcmf_sdio_readframes: read 520 bytes from channel 1 >>> failed: -84 >>> [ 38.247274] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK >>> [ 40.807019] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout >>> [ 40.807042] brcmfmac: brcmf_notify_escan_complete: Scan abort failed >>> [ 48.487007] mmc2: Timeout waiting for hardware interrupt. >>> [ 48.487057] mmc2: sdhci: ============ SDHCI REGISTER DUMP =========== >>> [ 48.487096] mmc2: sdhci: Sys addr: 0x00000000 | Version: 0x00000001 >>> [ 48.487128] mmc2: sdhci: Blk size: 0x00007040 | Blk cnt: 0x00000001 >>> [ 48.487160] mmc2: sdhci: Argument: 0x21000040 | Trn mode: 0x00000013 >>> [ 48.487191] mmc2: sdhci: Present: 0x01d70000 | Host ctl: 0x00000013 >>> [ 48.487221] mmc2: sdhci: Power: 0x00000001 | Blk gap: 0x00000000 >>> [ 48.487251] mmc2: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007 >>> [ 48.487281] mmc2: sdhci: Timeout: 0x0000000e | Int stat: 0x00000000 >>> [ 48.487313] mmc2: sdhci: Int enab: 0x02ff000b | Sig enab: 0x02fc000b >>> [ 48.487343] mmc2: sdhci: AC12 err: 0x00000000 | Slot int: 0x00000000 >>> [ 48.487374] mmc2: sdhci: Caps: 0x61ff30b0 | Caps_1: 0x00000000 >>> [ 48.487404] mmc2: sdhci: Cmd: 0x0000353a | Max curr: 0x00000001 >>> [ 48.487435] mmc2: sdhci: Resp[0]: 0x00001000 | Resp[1]: 0x08002db5 >>> [ 48.487466] mmc2: sdhci: Resp[2]: 0x16da8000 | Resp[3]: 0x00000400 >>> [ 48.487493] mmc2: sdhci: Host ctl2: 0x00000000 >>> [ 48.487525] mmc2: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x17c47200 >>> [ 48.487552] mmc2: sdhci: ============================================ >>> [ 48.487749] brcmfmac: brcmf_sdio_readframes: read 480 bytes from channel 1 >>> failed: -84 >>> [ 48.487822] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK >>> >>> >>> Soft lockup issue >>> ======================== >>> >>> # fsck -f /dev/disk/by-uuid/6768309f-3545-49d5-9ac7-d5be24d35ef2 >>> fsck из util-linux 2.30.2 >>> e2fsck 1.43.9 (8-Feb-2018) >>> Проход 1: Проверка inodes, блокs, а также размеров >>> Pass 2: Checking каталог structure >>> Pass 3: Checking каталог connectivity >>> Pass 4: Checking reference counts >>> Pass 5: Checking группа summary information >>> .... >>> >>> [ 125.925436] INFO: task kworker/0:3H:263 blocked for more than 60 seconds. >>> [ 125.925496] Not tainted >>> 4.16.0-rc2-next-20180220-00101-gaefde91c3955-dirty #2041 >>> [ 125.925530] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this >>> message. >>> [ 125.925562] kworker/0:3H D 0 263 2 0x00000000 >>> [ 125.925653] Workqueue: kblockd mmc_blk_mq_complete_work >>> [ 125.925747] [] (__schedule) from [] (schedule+0x60/0xcc) >>> [ 125.925805] [] (schedule) from [] >>> (__mmc_claim_host+0xdc/0x24c) >>> [ 125.925849] [] (__mmc_claim_host) from [] >> >> That claim host should not be there. Here is a fix for that: >> >> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c >> index 908e4db03535..62049f95116b 100644 >> --- a/drivers/mmc/core/mmc_ops.c >> +++ b/drivers/mmc/core/mmc_ops.c >> @@ -932,9 +932,7 @@ static int mmc_read_bkops_status(struct mmc_card *card) >> int err; >> u8 *ext_csd; >> >> - mmc_claim_host(card->host); >> err = mmc_get_ext_csd(card, &ext_csd); >> - mmc_release_host(card->host); >> if (err) >> return err; > > Looks like this patch fixes all the problems. I'll keep testing it for a couple > of days and then report back the final result. Thank you very much. This patch fixes the lockup (and WiFi MMC timeouts), for that: Tested-by: Dmitry Osipenko But still something is wrong... I've been getting occasional EXT4 Ooops's, like the one below, and __wait_on_bit() is always figuring in the stacktrace. It never happened with blk-mq disabled, though it could be a coincidence and actually unrelated to blk-mq patches. [ 6625.992337] Unable to handle kernel NULL pointer dereference at virtual address 0000001c [ 6625.993004] pgd = 00b30c03 [ 6625.993257] [0000001c] *pgd=00000000 [ 6625.993594] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 6625.994022] Modules linked in: [ 6625.994326] CPU: 1 PID: 19355 Comm: dpkg Not tainted 4.16.0-rc2-next-20180220-00095-ge9c9f5689a84-dirty #2090 [ 6625.995078] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree) [ 6625.995595] PC is at dx_probe+0x68/0x684 [ 6625.995947] LR is at __wait_on_bit+0xac/0xc8 [ 6625.996307] pc : [] lr : [] psr: 800f0013 [ 6625.996806] sp : d55e3df0 ip : c0170e88 fp : d55e3e44 [ 6625.997227] r10: d55e3f4c r9 : d55e3e70 r8 : 00000000 [ 6625.997650] r7 : c4e13240 r6 : 00000000 r5 : d657db18 r4 : d55e3e8c [ 6625.998165] r3 : 0000007b r2 : d5830800 r1 : d5831000 r0 : c4e13240 [ 6625.998686] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 6625.999246] Control: 10c5387d Table: 0a63004a DAC: 00000051 [ 6625.999710] Process dpkg (pid: 19355, stack limit = 0x139a48b6) [ 6626.000184] Stack: (0xd55e3df0 to 0xd55e4000) [ 6626.000560] 3de0: 000002e9 d55e3e00 c0c01964 c0278c70 [ 6626.001209] 3e00: d55e3e24 014000c0 c04f3580 c0c01958 d55e3e90 801a001a d55e3e3c 00000012 [ 6626.001854] 3e20: d5830800 00000000 d657db18 c24b0000 d55e3e70 d55e3f4c d55e3edc d55e3e48 [ 6626.002502] 3e40: c033d568 c033b904 00000000 600f0013 c029e640 d6cf3540 ffffe000 00000000 [ 6626.003150] 3e60: 00076e99 d55e3ef4 d55e3e8c d5830800 d409c440 d409c454 00000012 c029e640 [ 6626.003795] 3e80: d55e3ec4 d55e3e90 c02797b4 c4e13240 00000000 00000000 00000000 00000000 [ 6626.004442] 3ea0: 00000000 00000000 00000000 00000000 d409c428 d409c428 d657db18 d409c428 [ 6626.005088] 3ec0: 00000000 c24b0000 ffffff9c d55e3f4c d55e3f14 d55e3ee0 c033d7b0 c033d1b8 [ 6626.005732] 3ee0: c0c01964 c0180050 d55e3f14 d55e3ef8 c029e870 00000000 d409c428 d6546558 [ 6626.006382] 3f00: d55e3f58 00000000 d55e3f34 d55e3f18 c0291f04 c033d764 00000000 00000001 [ 6626.007032] 3f20: 00000000 d55e3f58 d55e3f94 d55e3f38 c0293d70 c0291ea0 d55e3f58 d55e3f4c [ 6626.007679] 3f40: 00000000 0090abb0 d5467800 00000000 d6dd0110 d6546558 f3bc423c 00000012 [ 6626.008326] 3f60: c24b0019 80808080 00000000 015ce1b0 0090abb0 00d8d670 00000028 c01011e4 [ 6626.008971] 3f80: d55e2000 00000000 d55e3fa4 d55e3f98 c0294544 c0293c44 00000000 d55e3fa8 [ 6626.009620] 3fa0: c0101000 c0294530 015ce1b0 0090abb0 0090abb0 000002a8 7d5a8800 7d5a8800 [ 6626.010264] 3fc0: 015ce1b0 0090abb0 00d8d670 00000028 0048eb80 00487344 015eb160 004a6c10 [ 6626.010912] 3fe0: 004a6c8c bede3c0c 0048149d b6ecc6b8 600f0030 0090abb0 00000000 00000000 [ 6626.011577] [] (dx_probe) from [] (ext4_find_entry+0x3bc/0x5ac) [ 6626.012198] [] (ext4_find_entry) from [] (ext4_lookup+0x58/0x1f4) [ 6626.012844] [] (ext4_lookup) from [] (__lookup_hash+0x70/0x9c) [ 6626.013468] [] (__lookup_hash) from [] (do_rmdir+0x138/0x1b8) [ 6626.014071] [] (do_rmdir) from [] (SyS_rmdir+0x20/0x24) [ 6626.014642] [] (SyS_rmdir) from [] (ret_fast_syscall+0x0/0x54) [ 6626.015231] Exception stack(0xd55e3fa8 to 0xd55e3ff0) [ 6626.015656] 3fa0: 015ce1b0 0090abb0 0090abb0 000002a8 7d5a8800 7d5a8800 [ 6626.016302] 3fc0: 015ce1b0 0090abb0 00d8d670 00000028 0048eb80 00487344 015eb160 004a6c10 [ 6626.035930] 3fe0: 004a6c8c bede3c0c 0048149d b6ecc6b8 [ 6626.055341] Code: e1a07000 e5840000 8a000078 e590601c (e5d6301c) [ 6626.075632] ---[ end trace 034f3552437a92bc ]---