2016-12-14 12:08:17

by Vasiliy Tolstov

Subject: Samsung SSD 1.92TB PM863 Enterprise 2.5" SATA3 errors with stable 4.4.34

Hi! I have persistent problems with all Samsung SSD drives, such as the
PM863 and EVO 850 Pro.

From time to time the link is reset with the following messages:
[ 2477.973617] ata1: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
[ 2477.975036] ata1: irq_stat 0x00400040, connection status changed
[ 2477.976396] ata1: SError: { HostInt PHYRdyChg 10B8B DevExch }
[ 2477.977766] ata1: hard resetting link
[ 2478.701015] ata1: SATA link down (SStatus 0 SControl 300)
[ 2483.700924] ata1: hard resetting link
[ 2484.020924] ata1: SATA link down (SStatus 0 SControl 300)
[ 2484.022257] ata1: limiting SATA link speed to 1.5 Gbps
[ 2489.020766] ata1: hard resetting link
[ 2489.340828] ata1: SATA link down (SStatus 0 SControl 310)
[ 2489.342158] ata1.00: disabled
[ 2489.343452] ata1: EH complete
[ 2489.344806] ata1.00: detaching (SCSI 0:0:0:0)
[ 2489.347434] sd 0:0:0:0: [sda] Stopping disk
[ 2489.348605] sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 3457.586929] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
[ 3457.588224] ata1: irq_stat 0x00000040, connection status changed
[ 3457.589453] ata1: SError: { CommWake DevExch }
[ 3457.590679] ata1: hard resetting link
[ 3458.312616] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 3461.139831] ata1.00: ATA-9: SAMSUNG MZ7LM1T9HCJM-0E003, GXT3003Q, max UDMA/133
[ 3461.141027] ata1.00: 3750748848 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 3461.142882] ata1.00: configured for UDMA/133
[ 3461.144004] ata1: EH complete
[ 3461.145545] scsi 0:0:0:0: Direct-Access ATA SAMSUNG MZ7LM1T9 003Q PQ: 0 ANSI: 5
[ 3461.147069] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 3461.147082] sd 0:0:0:0: [sda] 3750748848 512-byte logical blocks: (1.92 TB/1.75 TiB)
[ 3461.147649] sd 0:0:0:0: [sda] Write Protect is off
[ 3461.147652] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 3461.147849] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 3461.152457] sd 0:0:0:0: [sda] Attached SCSI removable disk

If I remove the drive and add it again, the message does not appear
for maybe an hour or more. I have tried different servers from HP and
Supermicro, and the error is still present. I have also tried various
disks from this series, and nothing changed.

If I run a heavy workload, such as writing to an ext4 filesystem on
these SSD drives, I get a corrupted ext4 journal and a read-only
filesystem.

My kernel version is 4.4.34.
Maybe some Samsung engineers are present on this mailing list and can
help to solve these errors? Or do I need to use only Intel SSDs for
servers? (Yes, if I use Intel SSDs this error does not happen; this is
not Intel advertising.)

--
Vasiliy Tolstov,
e-mail: [email protected]


2016-12-14 12:22:09

by Johannes Thumshirn

Subject: Re: Samsung SSD 1.92TB PM863 Enterprise 2.5" SATA3 errors with stable 4.4.34

On Wed, Dec 14, 2016 at 03:07:48PM +0300, Vasiliy Tolstov wrote:
> Hi! I have persistent problems with all Samsung SSD drives, such as the
> PM863 and EVO 850 Pro.
>
> From time to time the link is reset with the following messages:
> [ 2477.973617] ata1: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
> [ 2477.975036] ata1: irq_stat 0x00400040, connection status changed
> [ 2477.976396] ata1: SError: { HostInt PHYRdyChg 10B8B DevExch }

Random shot in the dark, have you tried changing the cable?
HostInt: HBA Internal error
PHYRdyChg: PhyRdy signal changed state
10B8B: 10b to 8b decoding error occurred
DevExch: Device presence has changed

This could indicate a problem with the cabling and/or connectors.
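
For reference, the SError names above can be recovered from the raw SErr
value in the log. The following is a minimal Python sketch, assuming the
bit-to-name table below matches the one libata uses when it prints the
SError line. The table is written from memory and is not taken from this
thread, and the function name decode_serr is made up for illustration.

```python
# Bit position -> flag name, as printed in the kernel's "SError: { ... }" line.
# ASSUMPTION: this table mirrors libata's SError decoding; check it against
# the kernel source before relying on it.
SERR_FLAGS = [
    (0, "RecovData"),
    (1, "RecovComm"),
    (8, "UnrecovData"),
    (9, "Persist"),
    (10, "Proto"),
    (11, "HostInt"),
    (16, "PHYRdyChg"),
    (17, "PHYInt"),
    (18, "CommWake"),
    (19, "10B8B"),
    (20, "Dispar"),
    (21, "BadCRC"),
    (22, "Handshk"),
    (23, "LinkSeq"),
    (24, "TrStaTrns"),
    (25, "UnrecFIS"),
    (26, "DevExch"),
]


def decode_serr(value):
    """Return the names of all flags set in a raw SErr register value."""
    return [name for bit, name in SERR_FLAGS if value & (1 << bit)]


# The two SErr values from the report:
print(decode_serr(0x4090800))  # ['HostInt', 'PHYRdyChg', '10B8B', 'DevExch']
print(decode_serr(0x4040000))  # ['CommWake', 'DevExch']
```

Both results agree with the SError lines the kernel printed in the
report, which is a reasonable sanity check on the table.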

Byte,
Johannes

--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2016-12-14 12:48:33

by Greg Kroah-Hartman

Subject: Re: Samsung SSD 1.92TB PM863 Enterprise 2.5" SATA3 errors with stable 4.4.34

On Wed, Dec 14, 2016 at 03:07:48PM +0300, Vasiliy Tolstov wrote:
> Hi! I have persistent problems with all Samsung SSD drives, such as the
> PM863 and EVO 850 Pro.
>
> From time to time the link is reset with the following messages:
> [ 2477.973617] ata1: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
> [ 2477.975036] ata1: irq_stat 0x00400040, connection status changed
> [ 2477.976396] ata1: SError: { HostInt PHYRdyChg 10B8B DevExch }
> [ 2477.977766] ata1: hard resetting link
> [ 2478.701015] ata1: SATA link down (SStatus 0 SControl 300)
> [ 2483.700924] ata1: hard resetting link
> [ 2484.020924] ata1: SATA link down (SStatus 0 SControl 300)
> [ 2484.022257] ata1: limiting SATA link speed to 1.5 Gbps
> [ 2489.020766] ata1: hard resetting link
> [ 2489.340828] ata1: SATA link down (SStatus 0 SControl 310)
> [ 2489.342158] ata1.00: disabled
> [ 2489.343452] ata1: EH complete
> [ 2489.344806] ata1.00: detaching (SCSI 0:0:0:0)
> [ 2489.347434] sd 0:0:0:0: [sda] Stopping disk
> [ 2489.348605] sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [ 3457.586929] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
> [ 3457.588224] ata1: irq_stat 0x00000040, connection status changed
> [ 3457.589453] ata1: SError: { CommWake DevExch }
> [ 3457.590679] ata1: hard resetting link
> [ 3458.312616] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 3461.139831] ata1.00: ATA-9: SAMSUNG MZ7LM1T9HCJM-0E003, GXT3003Q, max UDMA/133
> [ 3461.141027] ata1.00: 3750748848 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
> [ 3461.142882] ata1.00: configured for UDMA/133
> [ 3461.144004] ata1: EH complete
> [ 3461.145545] scsi 0:0:0:0: Direct-Access ATA SAMSUNG MZ7LM1T9 003Q PQ: 0 ANSI: 5
> [ 3461.147069] sd 0:0:0:0: Attached scsi generic sg0 type 0
> [ 3461.147082] sd 0:0:0:0: [sda] 3750748848 512-byte logical blocks: (1.92 TB/1.75 TiB)
> [ 3461.147649] sd 0:0:0:0: [sda] Write Protect is off
> [ 3461.147652] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> [ 3461.147849] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [ 3461.152457] sd 0:0:0:0: [sda] Attached SCSI removable disk
>
> If I remove the drive and add it again, the message does not appear
> for maybe an hour or more. I have tried different servers from HP and
> Supermicro, and the error is still present. I have also tried various
> disks from this series, and nothing changed.
>
> If I run a heavy workload, such as writing to an ext4 filesystem on
> these SSD drives, I get a corrupted ext4 journal and a read-only
> filesystem.
>
> My kernel version is 4.4.34.
> Maybe some Samsung engineers are present on this mailing list and can
> help to solve these errors? Or do I need to use only Intel SSDs for
> servers? (Yes, if I use Intel SSDs this error does not happen; this is
> not Intel advertising.)

Do you also have problems with this on the 4.9 kernel release? We can't
add any changes to 4.4 that are not already in 4.9.

thanks,

greg k-h

2016-12-14 20:19:55

by Vasiliy Tolstov

Subject: Re: Samsung SSD 1.92TB PM863 Enterprise 2.5" SATA3 errors with stable 4.4.34

2016-12-14 15:22 GMT+03:00 Johannes Thumshirn <[email protected]>:
> Random shot in the dark, have you tried changing the cable?
> HostInt: HBA Internal error
> PHYRdyChg: PhyRdy signal changed state
> 10B8B: 10b to 8b decoding error occurred
> DevExch: Device presence has changed
>
> This could indicate a problem with the cabling and/or connectors.


That can happen, but I don't have any cables: all disks are attached to
the motherboard via an internal drive bay. Also, the SSDs are new, and
this problem persists across many servers.

--
Vasiliy Tolstov,
e-mail: [email protected]

2016-12-14 20:20:40

by Vasiliy Tolstov

Subject: Re: Samsung SSD 1.92TB PM863 Enterprise 2.5" SATA3 errors with stable 4.4.34

2016-12-14 15:48 GMT+03:00 Greg KH <[email protected]>:
> Do you also have problems with this on the 4.9 kernel release? We can't
> add any changes to 4.4 that are not already in 4.9.


Thanks, I'll try to create a reproducible fio test on 4.4.x and after
that move to 4.9.

--
Vasiliy Tolstov,
e-mail: [email protected]