Subject: Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
To: Martin Steigerwald
Cc: Linux Kernel Mailing List, Thorsten Leemhuis, Tejun Heo
References: <27165802.vQ9JbjrmvU@merkaba> <3573548.kp1edD77Gq@merkaba> <1e708e58-5ba7-8ce9-2edf-6cc7ba5f80c3@redhat.com> <4373847.u3KAOBid1D@merkaba>
From: Hans de Goede
Date: Sun, 18 Mar 2018 22:34:03 +0100
In-Reply-To: <4373847.u3KAOBid1D@merkaba>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 14-03-18 13:48, Martin Steigerwald wrote:
> Hans de Goede - 14.03.18, 12:05:
>> Hi,
>>
>> On 14-03-18 12:01, Martin Steigerwald wrote:
>>> Hans de Goede - 11.03.18, 15:37:
>>>> Hi Martin,
>>>>
>>>> On 11-03-18 09:20, Martin Steigerwald wrote:
>>>>> Hello.
>>>>>
>>>>> Since 4.16-rc4 (upgraded from 4.15.2 which worked) I have an issue
>>>>> with SMART checks occasionally failing like this:
>>>>>
>>>>> smartd[28017]: Device: /dev/sdb [SAT], is in SLEEP mode, suspending checks
>>>>> udisksd[24408]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/INTEL_SSDSA2CW300G3_[…]: Error updating SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense data returned:#0120000: 0e 09 0c 00 00 00 ff 00 00 00 00 00 00 00 50 00 ..............P.#0120010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................#012 (g-io-error-quark, 0)
>>>>> merkaba udisksd[24408]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/Crucial_CT480M500SSD3_[…]: Error updating SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense data returned:#0120000: 01 00 1d 00 00 00 0e 09 0c 00 00 00 ff 00 00 00 ................#0120010: 00 00 00 00 50 00 00 00 00 00 00 00 00 00 00 00 ....P...........#012 (g-io-error-quark, 0)
>>>>>
>>>>> (Intel SSD is connected via SATA, Crucial via mSATA in a ThinkPad T520)
>>>>>
>>>>> However when I then check manually with smartctl -a | -x | -H the device
>>>>> reports SMART data just fine.
>>>>>
>>>>> As smartd correctly detects that the device is in sleep mode, this may be
>>>>> a userspace issue in udisksd.
>>>>>
>>>>> Also at some boot attempts the boot hangs with a message like "could not
>>>>> connect to lvmetad, scanning manually for devices". I use BTRFS RAID 1
>>>>> on two LVs (each on one of the SSDs), a configuration that requires a
>>>>> manual adaption to InitRAMFS in order to boot (basically vgchange -ay
>>>>> before btrfs device scan).
>>>>>
>>>>> I wonder whether that has to do with the new SATA LPM policy stuff, but
>>>>> as I had issues with
>>>>>
>>>>> 3 => Medium power with Device Initiated PM enabled
>>>>>
>>>>> (machine did not boot, which could also have been caused by me
>>>>> accidentally removing all TCP/IP network support in the kernel with
>>>>> that setting)
>>>>>
>>>>> I set it back to
>>>>>
>>>>> CONFIG_SATA_MOBILE_LPM_POLICY=0
>>>>>
>>>>> (firmware settings)
>>>>
>>>> Right, so at that setting the LPM policy changes are effectively
>>>> disabled and cannot explain your SMART issues.
>>>>
>>>> Still I would like to zoom in on this part of your bug report, because
>>>> for Fedora 28 we are planning to ship with
>>>> CONFIG_SATA_MOBILE_LPM_POLICY=3
>>>> and AFAIK Ubuntu has similar plans.
>>>>
>>>> I suspect that the issues you were seeing with
>>>> CONFIG_SATA_MOBILE_LPM_POLICY=3 were with the Crucial disk? I've
>>>> attached a patch for you to test, which disables LPM for your model
>>>> Crucial SSD (but keeps it on for the Intel disk). If you can confirm
>>>> that with that patch you can run with
>>>> CONFIG_SATA_MOBILE_LPM_POLICY=3 without issues that would be great.
>>>
>>> With 4.16-rc5 with CONFIG_SATA_MOBILE_LPM_POLICY=3 the system successfully
>>> booted three times in a row. So feel free to add tested-by.
>>
>> Thanks.
>>
>> To be clear, you're talking about 4.16-rc5 with the patch I made to
>> blacklist the Crucial disk I assume, not just plain 4.16-rc5, right?
>
> 4.16-rc5 with your
>
> 0001-libata-Apply-NOLPM-quirk-to-Crucial-M500-480GB-SSDs.patch

I was about to submit this upstream and was planning on extending it
to also cover the 960GB version, which led to me doing a quick Google
search.

Judging from the Google results it seems that there are multiple firmware
versions of this SSD out there and I wonder if you are perhaps running
an older version of the firmware.

If you do:

dmesg | grep Crucial_CT480M500

You should see something like this:

ata2.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133

I'm interested in the "MU03" part, what is that in your case?

Note I'm not saying we should not do the NOLPM quirk, but maybe we can
limit it to older firmware.

Regards,

Hans
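
P.S. If it is easier than reading the dmesg output by eye, the firmware
check asked for above could be scripted roughly as below. This is only a
sketch: the regular expression is an assumption derived from the single
sample line quoted in this mail, and the helper name parse_ata_identify is
made up for illustration. Other kernels or devices may format the probe
line differently.

```python
import re
from typing import Optional, Tuple

# Assumed shape of a libata probe line, based only on the sample above:
#   ata2.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133
# A leading dmesg timestamp such as "[    2.1] " is tolerated because we
# use search() rather than match().
ATA_ID_RE = re.compile(
    r"(?P<port>ata\d+\.\d+): ATA-\d+: (?P<model>[^,]+), (?P<firmware>[^,]+), max"
)


def parse_ata_identify(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (port, model, firmware) from one dmesg line, or None."""
    m = ATA_ID_RE.search(line)
    if m is None:
        return None
    return (
        m.group("port"),
        m.group("model").strip(),
        m.group("firmware").strip(),
    )


if __name__ == "__main__":
    sample = "ata2.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133"
    print(parse_ata_identify(sample))
```

Feeding it the lines from "dmesg | grep Crucial_CT480M500" gives the
firmware revision as the third element of the tuple, which is the part I
am interested in.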