From: Martin Steigerwald
To: Hans de Goede
Cc: Linux Kernel Mailing List, Thorsten Leemhuis, Tejun Heo
Subject: Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
Date: Sun, 18 Mar 2018 23:06:02 +0100
Message-ID: <2906688.e4ghZiFuBA@merkaba>
References: <27165802.vQ9JbjrmvU@merkaba> <4373847.u3KAOBid1D@merkaba>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Hans.

Hans de Goede - 18.03.18, 22:34:
> On 14-03-18 13:48, Martin Steigerwald wrote:
> > Hans de Goede - 14.03.18, 12:05:
> >> Hi,
> >>
> >> On 14-03-18 12:01, Martin Steigerwald wrote:
> >>> Hans de Goede - 11.03.18, 15:37:
> >>>> Hi Martin,
> >>>>
> >>>> On 11-03-18 09:20, Martin Steigerwald wrote:
> >>>>> Hello.
> >>>>>
> >>>>> Since 4.16-rc4 (upgraded from 4.15.2 which worked) I have an issue
> >>>>> with SMART checks occasionally failing like this:
> >>>>>
> >>>>> smartd[28017]: Device: /dev/sdb [SAT], is in SLEEP mode, suspending checks
> >>>>> udisksd[24408]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/INTEL_SSDSA2CW300G3_[…]: Error updating SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense data returned:#0120000: 0e 09 0c 00 00 00 ff 00 00 00 00 00 00 00 50 00 ..............P.#0120010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................#012 (g-io-error-quark, 0)
> >>>>> merkaba udisksd[24408]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/Crucial_CT480M500SSD3_[…]: Error updating SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense data returned:#0120000: 01 00 1d 00 00 00 0e 09 0c 00 00 00 ff 00 00 00 ................#0120010: 00 00 00 00 50 00 00 00 00 00 00 00 00 00 00 00 ....P...........#012 (g-io-error-quark, 0)
> >>>>>
> >>>>> (Intel SSD is connected via SATA, Crucial via mSATA in a ThinkPad
> >>>>> T520)
> >>>>>
> >>>>> However when I then check manually with smartctl -a | -x | -H the
> >>>>> device reports SMART data just fine.
> >>>>>
> >>>>> As smartd correctly detects that device is in sleep mode, this may be
> >>>>> a userspace issue in udisksd.
> >>>>>
> >>>>> Also at some boot attempts the boot hangs with a message like "could
> >>>>> not connect to lvmetad, scanning manually for devices". I use BTRFS
> >>>>> RAID 1 on two LVs (each on one of the SSDs). A configuration that
> >>>>> requires a manual adaption to InitRAMFS in order to boot (basically
> >>>>> vgchange -ay before btrfs device scan).
> >>>>>
> >>>>> I wonder whether that has to do with the new SATA LPM policy stuff,
> >>>>> but as I had issues with
> >>>>>
> >>>>> 3 => Medium power with Device Initiated PM enabled
> >>>>>
> >>>>> (machine did not boot, which could also have been caused by me
> >>>>> accidentally removing all TCP/IP network support in the kernel with
> >>>>> that setting)
> >>>>>
> >>>>> I set it back to
> >>>>>
> >>>>> CONFIG_SATA_MOBILE_LPM_POLICY=0
> >>>>>
> >>>>> (firmware settings)
> >>>>
> >>>> Right, so at that setting the LPM policy changes are effectively
> >>>> disabled and cannot explain your SMART issues.
> >>>>
> >>>> Still I would like to zoom in on this part of your bug report, because
> >>>> for Fedora 28 we are planning to ship with
> >>>> CONFIG_SATA_MOBILE_LPM_POLICY=3
> >>>> and AFAIK Ubuntu has similar plans.
> >>>>
> >>>> I suspect that the issue you were seeing with
> >>>> CONFIG_SATA_MOBILE_LPM_POLICY=3 was with the Crucial disk? I've
> >>>> attached a patch for you to test, which disables LPM for your model
> >>>> Crucial SSD (but keeps it on for the Intel disk). If you can confirm
> >>>> that with that patch you can run with
> >>>> CONFIG_SATA_MOBILE_LPM_POLICY=3 without issues that would be great.
> >>>
> >>> With 4.16-rc5 with CONFIG_SATA_MOBILE_LPM_POLICY=3 the system
> >>> successfully booted three times in a row. So feel free to add
> >>> tested-by.
> >>
> >> Thanks.
> >>
> >> To be clear, you're talking about 4.16-rc5 with the patch I made to
> >> blacklist the Crucial disk I assume, not just plain 4.16-rc5, right ?
> >
> > 4.16-rc5 with your
> >
> > 0001-libata-Apply-NOLPM-quirk-to-Crucial-M500-480GB-SSDs.patch
>
> I was about to submit this upstream and was planning on extending it to
> also cover the 960GB version, which led to me doing a quick google.
> Judging from the google results it seems that there are multiple firmware
> versions of this SSD out there and I wonder if you are perhaps running
> an older version of the firmware. If you do:
>
> dmesg | grep Crucial_CT480M500
>
> You should see something like this:
>
> ata2.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133
>
> I'm interested in the "MU03" part, what is that in your case?

Although I never updated the firmware, I do have MU03:

% lsscsi | grep Crucial
[2:0:0:0] disk ATA Crucial_CT480M50 MU03 /dev/sdb
% dmesg | grep Crucial_CT480M500
[ 2.424537] ata3.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133

> Note I'm not saying we should not do the NOLPM quirk, but maybe we
> can limit it to older firmware.

Thanks,
-- 
Martin
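
P.S. If firmware revisions need to be collected from more than one machine, the dmesg check above could be scripted. This is only a minimal sketch, assuming the libata identification line always has the "port: ATA-N: model, firmware, max ..." layout seen in this thread; the helper name parse_ata_id and the regular expression are illustrative and not part of any kernel or distribution tool.

```python
import re

# Assumed layout, taken from the line quoted in this thread:
#   "[ 2.424537] ata3.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133"
# Other devices (for example ATAPI drives) may print a different prefix.
ATA_ID = re.compile(
    r"(?P<port>ata\d+\.\d+): ATA-\d+: (?P<model>[^,]+), (?P<firmware>[^,]+), max "
)


def parse_ata_id(line):
    """Return (port, model, firmware) for a matching dmesg line, else None."""
    m = ATA_ID.search(line)
    if m is None:
        return None
    return m.group("port"), m.group("model").strip(), m.group("firmware").strip()


if __name__ == "__main__":
    sample = "[ 2.424537] ata3.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133"
    print(parse_ata_id(sample))
```

Feeding it the output of "dmesg | grep Crucial_CT480M500" line by line would give the model and firmware strings that a firmware-limited NOLPM quirk would have to match against.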