Subject: Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
To: Martin Steigerwald
Cc: Linux Kernel Mailing List, Thorsten Leemhuis, Tejun Heo
References: <27165802.vQ9JbjrmvU@merkaba> <4373847.u3KAOBid1D@merkaba> <2906688.e4ghZiFuBA@merkaba>
From: Hans de Goede
Date: Mon, 19 Mar 2018 10:32:56 +0100
In-Reply-To: <2906688.e4ghZiFuBA@merkaba>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 18-03-18 23:06, Martin Steigerwald wrote:
> Hi Hans.
>
> Hans de Goede - 18.03.18, 22:34:
>> On 14-03-18 13:48, Martin Steigerwald wrote:
>>> Hans de Goede - 14.03.18, 12:05:
>>>> Hi,
>>>>
>>>> On 14-03-18 12:01, Martin Steigerwald wrote:
>>>>> Hans de Goede - 11.03.18, 15:37:
>>>>>> Hi Martin,
>>>>>>
>>>>>> On 11-03-18 09:20, Martin Steigerwald wrote:
>>>>>>> Hello.
>>>>>>>
>>>>>>> Since 4.16-rc4 (upgraded from 4.15.2 which worked) I have an issue
>>>>>>> with SMART checks occasionally failing like this:
>>>>>>>
>>>>>>> smartd[28017]: Device: /dev/sdb [SAT], is in SLEEP mode, suspending checks
>>>>>>> udisksd[24408]: Error performing housekeeping for drive
>>>>>>> /org/freedesktop/UDisks2/drives/INTEL_SSDSA2CW300G3_[…]: Error updating
>>>>>>> SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense
>>>>>>> data returned:#0120000: 0e 09 0c 00 00 00 ff 00 00 00 00 00 00 00 50 00 ..............P.
>>>>>>> #0120010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>>>>>>> #012 (g-io-error-quark, 0)
>>>>>>> merkaba udisksd[24408]: Error performing housekeeping for drive
>>>>>>> /org/freedesktop/UDisks2/drives/Crucial_CT480M500SSD3_[…]: Error updating
>>>>>>> SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense
>>>>>>> data returned:#0120000: 01 00 1d 00 00 00 0e 09 0c 00 00 00 ff 00 00 00 ................
>>>>>>> #0120010: 00 00 00 00 50 00 00 00 00 00 00 00 00 00 00 00 ....P...........
>>>>>>> #012 (g-io-error-quark, 0)
>>>>>>>
>>>>>>> (Intel SSD is connected via SATA, Crucial via mSATA in a ThinkPad T520)
>>>>>>>
>>>>>>> However when I then check manually with smartctl -a | -x | -H the device
>>>>>>> reports SMART data just fine.
>>>>>>>
>>>>>>> As smartd correctly detects that device is in sleep mode, this may be
>>>>>>> a userspace issue in udisksd.
>>>>>>>
>>>>>>> Also at some boot attempts the boot hangs with a message like "could not
>>>>>>> connect to lvmetad, scanning manually for devices". I use BTRFS RAID 1
>>>>>>> on two LVs (each on one of the SSDs). A configuration that requires a
>>>>>>> manual adaption to InitRAMFS in order to boot (basically vgchange -ay
>>>>>>> before btrfs device scan).
>>>>>>>
>>>>>>> I wonder whether that has to do with the new SATA LPM policy stuff, but
>>>>>>> as I had issues with
>>>>>>>
>>>>>>> 3 => Medium power with Device Initiated PM enabled
>>>>>>>
>>>>>>> (machine did not boot, which could also have been caused by me
>>>>>>> accidentally removing all TCP/IP network support in the kernel with
>>>>>>> that setting)
>>>>>>>
>>>>>>> I set it back to
>>>>>>>
>>>>>>> CONFIG_SATA_MOBILE_LPM_POLICY=0
>>>>>>>
>>>>>>> (firmware settings)
>>>>>>
>>>>>> Right, so at that setting the LPM policy changes are effectively
>>>>>> disabled and cannot explain your SMART issues.
>>>>>>
>>>>>> Still I would like to zoom in on this part of your bug report, because
>>>>>> for Fedora 28 we are planning to ship with
>>>>>> CONFIG_SATA_MOBILE_LPM_POLICY=3
>>>>>> and AFAIK Ubuntu has similar plans.
>>>>>>
>>>>>> I suspect that the issues you were seeing with
>>>>>> CONFIG_SATA_MOBILE_LPM_POLICY=3 were with the Crucial disk? I've attached
>>>>>> a patch for you to test, which disables LPM for your model Crucial SSD
>>>>>> (but keeps it on for the Intel disk). If you can confirm that with that
>>>>>> patch you can run with
>>>>>> CONFIG_SATA_MOBILE_LPM_POLICY=3 without issues that would be great.
>>>>>
>>>>> With 4.16-rc5 with CONFIG_SATA_MOBILE_LPM_POLICY=3 the system successfully
>>>>> booted three times in a row. So feel free to add tested-by.
>>>>
>>>> Thanks.
>>>>
>>>> To be clear, you're talking about 4.16-rc5 with the patch I made to
>>>> blacklist the Crucial disk I assume, not just plain 4.16-rc5, right?
>>>
>>> 4.16-rc5 with your
>>>
>>> 0001-libata-Apply-NOLPM-quirk-to-Crucial-M500-480GB-SSDs.patch
>>
>> I was about to submit this upstream and was planning on extending it to
>> also cover the 960GB version, which led to me doing a quick google.
>> Judging from the google results it seems that there are multiple firmware
>> versions of this SSD out there and I wonder if you are perhaps running
>> an older version of the firmware. If you do:
>>
>> dmesg | grep Crucial_CT480M500
>>
>> You should see something like this:
>>
>> ata2.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133
>>
>> I'm interested in the "MU03" part, what is that in your case?
>
> Although I never updated the firmware, I do have MU03:
>
> % lsscsi | grep Crucial
> [2:0:0:0] disk ATA Crucial_CT480M50 MU03 /dev/sdb
>
> % dmesg | grep Crucial_CT480M500
> [ 2.424537] ata3.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133

Thanks.

So there is an MU05 update:
www.crucial.com/wcsstore/CrucialSAS/firmware/M500/MU05/crucial-m500-iso-firmware-update-mu05-en.pdf

Which according to its changelog features:

"Improved drive latency performance in applications with SMART polling"

Which is not relevant to the LPM issues you are seeing, but seems relevant
to the other issues you are seeing.

Unfortunately the MU05 update does not seem to specifically address any
LPM issues, so I'm just going to do the blacklist for all 480GB+ models
for now (my experience with other Crucial models is that smaller variants
seem to not suffer from LPM issues).

Regards,

Hans
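
For anyone wanting to check the firmware revision on several machines, here
is a minimal Python sketch, for illustration only. It reads the model and
firmware revision out of a libata probe line like the dmesg output quoted
above, and applies the "M500 of 480GB or larger" rule from the last
paragraph. The function names, the regular expressions and the min_gb
parameter are assumptions made for this example; the kernel quirk itself
matches model strings in libata and does not parse capacities like this.

```python
import re

# Matches libata probe lines such as:
#   ata3.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133
ATA_LINE = re.compile(r"(ata\d+\.\d+): ATA-\d+: ([^,]+), ([^,]+), max")

# Hypothetical helper pattern: capacity digits in an M500 model string.
M500_MODEL = re.compile(r"Crucial_CT(\d+)M500")


def parse_ata_line(line):
    """Return (port, model, firmware), or None if this is not a probe line."""
    m = ATA_LINE.search(line)
    if m is None:
        return None
    port, model, firmware = (s.strip() for s in m.groups())
    return port, model, firmware


def is_large_m500(model, min_gb=480):
    """True if the model string names an M500 of at least min_gb gigabytes."""
    m = M500_MODEL.match(model)
    return m is not None and int(m.group(1)) >= min_gb


if __name__ == "__main__":
    sample = "[ 2.424537] ata3.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133"
    port, model, firmware = parse_ata_line(sample)
    print(port, model, firmware, is_large_m500(model))
```

Feeding it the output of "dmesg | grep Crucial_CT480M500" line by line gives
the firmware revision per port without reading the log by eye.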