2015-03-13 14:48:06

by Adrian Bridgett

Subject: very slow SSD with fsync

Slow as in sync(1) taking 6 seconds on an idle system, and dd-ing from the
raw disk dropping from 600MB/s to less than 1MB/s. iotop shows jbd2 often
waiting a long time, but with _very_ low throughput (KB/s at that). I've
spent a day or so trying everything I can find to identify the cause, but
I'm completely stuck (even with help from a btrfs guru - he eventually
suggested asking here).

Chrome and Dropbox seem to trigger this. I'm just setting up a fairly new
MacBook Pro (I didn't get along with OS X, so I'm returning to Linux). The
system was fine at first, but now seems really slow. Maybe it's just the
software I've installed, but I suspect it's degradation caused by
something else.

The only thing that makes a difference is mounting with barrier=0, but of
course I'd rather not do that. I'm not trying to maximise performance
here, just to eliminate some nasty buglet/incompatibility.

I've tried:
- fstrim
- noncq (just in case)
- disabling power management (tlp daemon, then turning off tweakables in
powertop)
- latest 3.19.1 Ubuntu kernel, and also 3.16
- adding discard to the mount options, then taking it away again and using
just fstrim
- disabling SMART (I saw one report of this on a normal hard disk after
30mins - note that mine is slow immediately)
http://unix.stackexchange.com/questions/165464/calls-to-sync-fsync-slow-down-after-30-minutes-uptime
- using the tracing framework (thanks to Ted for a comment elsewhere), I
traced the slowness to the syncs themselves being very slow - my much
older laptop takes 0.05s... (see the sketch below)
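
Something along these lines shows the per-sync latency (a sketch using
trace-cmd and the stock ext4/jbd2 tracepoints - not my exact invocation,
and it assumes trace-cmd is installed):

# trace-cmd record -e ext4:ext4_sync_file_enter -e ext4:ext4_sync_file_exit \
    -e jbd2:jbd2_start_commit -e jbd2:jbd2_end_commit sleep 10
# trace-cmd report | less

Matching up the enter/exit timestamps per PID gives the per-fsync latency.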

Disk is eCryptfs on LVM (mounted data=ordered) - Ubuntu defaults. I tried
moving Dropbox off eCryptfs - no joy. My friend suggested that I use LUKS
instead, but it seems that I'd still have this issue.

I've been using dd|pv for some tests; hdparm shows similar results (this
set was taken with Thunderbird running, but no Chrome and no Dropbox) -
and the cursor locks up every few seconds while I'm typing this!
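
For reference, the dd|pv test is roughly this (a sketch - the exact sizes
don't matter much):

# dd if=/dev/sda bs=1M count=2048 iflag=direct | pv > /dev/null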

[13:50]~# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
Model Number: APPLE SSD SM0512F
Serial Number: S1K5NYBF266383
Firmware Revision: UXM2JA1Q
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II
Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Used: unknown (minor revision code 0x0039)
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 977105060
Logical Sector size: 512 bytes
Physical Sector size: 4096 bytes
Logical Sector-0 offset: 0 bytes
device size with M = 1024*1024: 477102 MBytes
device size with M = 1000*1000: 500277 MBytes (500 GB)
cache/buffer size = unknown
Nominal Media Rotation Rate: Solid State Device
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Recommended acoustic management value: 128, current value: 0
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
* SET_MAX security extension
Automatic Acoustic Management feature set
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Phy event counters
DMA Setup Auto-Activate optimization
* Software settings preservation
* SET MAX SETPASSWORD/UNLOCK DMA commands
* WRITE BUFFER DMA command
* READ BUFFER DMA command
* Data Set Management TRIM supported (limit 8 blocks)
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
supported: enhanced erase
6min for SECURITY ERASE UNIT. 32min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5002538655584d30
NAA : 5
IEEE OUI : 002538
Unique ID : 655584d30
Integrity word not set (found 0x00bf, expected 0x100a5)

iotop -o -b:
Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
184 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.05 % [kworker/u16:3]
Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 11.81 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
275 be/3 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [jbd2/dm-0-8]
4308 be/4 abridget 0.00 B/s 0.00 B/s 0.00 % 99.99 % thunderbird
Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
184 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.07 % [kworker/u16:3]
Total DISK READ : 0.00 B/s | Total DISK WRITE : 31.33 K/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
184 be/4 root 0.00 B/s 31.33 K/s 0.00 % 0.00 % [kworker/u16:3]
Total DISK READ : 0.00 B/s | Total DISK WRITE : 7.70 K/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
275 be/3 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [jbd2/dm-0-8]
184 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.04 % [kworker/u16:3]
3148 be/4 abridget 0.00 B/s 7.70 K/s 0.00 % 0.00 % upstart --user
Total DISK READ : 0.00 B/s | Total DISK WRITE : 15.74 K/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 70.81 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
4308 be/4 abridget 0.00 B/s 15.74 K/s 0.00 % 0.08 % thunderbird
275 be/3 root 0.00 B/s 0.00 B/s 0.00 % 0.01 % [jbd2/dm-0-8]
Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
184 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.07 % [kworker/u16:3]
Total DISK READ : 0.00 B/s | Total DISK WRITE : 42.50 K/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 42.50 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
275 be/3 root 0.00 B/s 27.04 K/s 0.00 % 99.99 % [jbd2/dm-0-8]
4308 be/4 abridget 0.00 B/s 15.45 K/s 0.00 % 0.07 % thunderbird
Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
184 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.06 % [kworker/u16:3]
Total DISK READ : 0.00 B/s | Total DISK WRITE : 30.87 K/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 212.24 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
275 be/3 root 0.00 B/s 19.29 K/s 0.00 % 99.99 % [jbd2/dm-0-8]
4308 be/4 abridget 0.00 B/s 11.58 K/s 0.00 % 0.00 % thunderbird
Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 23.49 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
4308 be/4 abridget 0.00 B/s 0.00 B/s 0.00 % 25.66 % thunderbird
184 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.05 % [kworker/u16:3]
275 be/3 root 0.00 B/s 0.00 B/s 0.00 % 0.02 % [jbd2/dm-0-8]

And now (with more programs running):
[14:30]~$ sudo mount -o remount,barrier=0 /
[14:33]~$ sudo hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 6548 MB in 2.00 seconds = 3275.19 MB/sec
Timing buffered disk reads: 2352 MB in 3.00 seconds = 783.92 MB/sec

[14:34]~$ sudo mount -o remount,barrier=1 /
[14:34]~$ sudo hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 2 MB in 2.88 seconds = 710.50 kB/sec
Timing buffered disk reads: 20 MB in 5.07 seconds = 3.94 MB/sec

[14:34]~$ sudo mount -o remount,barrier=0 /
[14:35]~$ sudo hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 17316 MB in 2.00 seconds = 8666.69 MB/sec
Timing buffered disk reads: 2368 MB in 3.00 seconds = 789.12 MB/sec



2015-03-13 15:26:02

by Theodore Ts'o

Subject: Re: very slow SSD with fsync

On Fri, Mar 13, 2015 at 02:48:03PM +0000, Adrian Bridgett wrote:
> Slow as in sync(1) taking 6 seconds on an idle system, and dd-ing from the
> raw disk dropping from 600MB/s to less than 1MB/s. iotop shows jbd2 often
> waiting a long time, but with _very_ low throughput (KB/s at that). I've
> spent a day or so trying everything I can find to identify the cause, but
> I'm completely stuck (even with help from a btrfs guru - he eventually
> suggested asking here).

The obvious thing to do is to try using blktrace to see which I/O
operations are triggering the problem. It may just be that the hard
drive's internal FTL metadata is very badly fragmented.
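
For the record, the basic invocation is something like this (adjust the
device name; btrace(8) is just a blktrace | blkparse wrapper):

# blktrace -d /dev/sda -o trace
    ... reproduce the slow sync, then hit ^C ...
# blkparse -i trace | less

or simply "btrace /dev/sda" for live output.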

If so, it could be that the only thing that will fix it is to reset the
entire FTL: do a full backup, do a discard or a secure erase of the
entire SSD, and then reformat and restore from backups.
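
Roughly the standard incantations (a sketch - "Eins" is just a throwaway
password; both paths destroy all data, so only after a verified backup,
and the secure-erase commands will fail while the drive reports itself as
security-frozen, which a suspend/resume cycle sometimes clears):

# blkdiscard /dev/sdX

or

# hdparm --user-master u --security-set-pass Eins /dev/sdX
# hdparm --user-master u --security-erase Eins /dev/sdX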

You might also look for reviews of the SSD model on Tom's Hardware or
AnandTech to see how the SSD fares when stressed in various SSD torture
tests. What model SSD is it, by the way?

- Ted

2015-03-13 19:34:01

by Adrian Bridgett

Subject: Re: very slow SSD with fsync

On 13/03/15 15:25, Theodore Ts'o wrote:
> The obvious thing to do is to try using blktrace to see which I/O
> operations are triggering the problem. It may just be that the hard
> drive's internal FTL metadata is very badly fragmented.
Damn :-( Seems a bit odd, as it's only a few months old (not used much)
and OS X _seems_ fine.

Not quite sure what to look for in blktrace. I see the sync, but the
timestamps in blktrace are near instantaneous vs. the 1.3-2.2 seconds
reported by strace. Running a "find" and watching it stall, blktrace
shows this - which to my amateur eyes shows a 0.5s gap between the read
request being inserted at 1.815589 and issued at 2.373, two completions
being the only occurrence in the middle. Maybe the queue depth was full -
but we have two completions earlier. (Reading the action column: Q =
queued, G = get request, I = inserted, D = issued to the driver, C =
completed; FWS marks a flush+sync write, i.e. the barrier.)


8,0    0      226     1.812363080   271  D  WS 498955904 + 16 (     988) [jbd2/dm-0-8]
8,0    0      227     1.812439052     0  C  WS 498955904 + 16 (   75972) [0]
8,4    0      228     1.812456484   271  A FWS 0 + 0 <- (252,0) 0
8,0    0      229     1.812456838   271  Q FWS [jbd2/dm-0-8]
8,4    0      230     1.812457124   271  A FWS 0 + 0 <- (252,0) 0
8,0    0      231     1.812457237   271  Q FWS [jbd2/dm-0-8]
8,0    0      232     1.812458007   271  G FWS [jbd2/dm-0-8]
8,0    0      233     1.812458478   271  I FWS (     471) [jbd2/dm-0-8]
8,0    0      234     1.812465182   271  G FWS [jbd2/dm-0-8]
8,0    0      235     1.812465316   271  I FWS (     134) [jbd2/dm-0-8]
8,0    0      236     1.815446245     0  C  RM 507106648 + 8 ( 3778637) [0]
8,4    2       11     1.815585457 31072  A  RM 16875872 + 8 <- (252,0) 16873824
8,0    2       12     1.815585633 31072  A  RM 507107680 + 8 <- (8,4) 16875872
8,0    2       13     1.815586124 31072  Q  RM 507107680 + 8 [find]
8,0    2       14     1.815588326 31072  G  RM 507107680 + 8 [find]
8,0    2       15     1.815589269 31072  I  RM 507107680 + 8 (     943) [find]
8,0    0      237     1.830102994     0  C  WS 0 (17739914) [0]
8,0    0      238     2.373458367     0  C  WS 0 [0]
8,0    0      239     2.373469557     0  D  RM 507107680 + 8 (557880288) [swapper/0]
8,4    0      240     2.373486949 15158  A WFS 8724112 + 8 <- (252,0) 8722064
8,0    0      241     2.373487336 15158  A WFS 498955920 + 8 <- (8,4) 8724112
8,0    0      242     2.373488480 15158  Q WFS 498955920 + 8 [kworker/0:2]
8,0    0      243     2.373490158 15158  G WFS 498955920 + 8 [kworker/0:2]
8,0    0      244     2.373491573 15158  I WFS 498955920 + 8 (    1415) [kworker/0:2]
8,0    0      245     2.373491927 15158  D  WS 498955920 + 8 (     354) [kworker/0:2]
8,0    0      246     2.373596570     0  C  WS 498955920 + 8 (  104643) [0]
8,0    0      247     2.375251869     0  C  RM 507107680 + 8 ( 1782312) [0]
8,4    2       16     2.375452116 31072  A  RM 16875072 + 8 <- (252,0) 16873024
8,0    2       17     2.375452283 31072  A  RM 507106880 + 8 <- (8,4) 16875072
8,0    2       18     2.375452776 31072  Q  RM 507106880 + 8 [find]

I'm looking at watch -n0 'cat /proc/diskstats | egrep -v "ram|loop"' and
the I/Os-in-progress column normally sits at 0 or 1 (I have seen it hit
100 _briefly_ and then drop just as quickly).
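
If a different view helps, I can also capture iostat output (sysstat's;
the avgqu-sz and await columns would be the interesting ones):

$ iostat -x 1 sda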

> What model SSD is it, by the way?
It's the built-in Apple MBP 512GB drive - "APPLE SSD SM0512F" - which
apparently means Samsung (I've been reading about issues with their
840 EVO drives...).

Thanks once again (and for all your immense work over the many, many years!)

Adrian

2015-03-15 14:36:37

by Dmitry Monakhov

Subject: Re: very slow SSD with fsync

Adrian Bridgett <[email protected]> writes:

> On 13/03/15 15:25, Theodore Ts'o wrote:
>> The obvious thing to do is to try using blktrace to see which I/O
>> operations are triggering the problem. It may just be that the hard
>> drive's internal FTL metadata is very badly fragmented.
> Damn :-( Seems a bit odd, as it's only a few months old (not used much)
> and OS X _seems_ fine.
>
> Not quite sure what to look for in blktrace. I see the sync, but the
> timestamps in blktrace are near instantaneous vs. the 1.3-2.2 seconds
> reported by strace. Running a "find" and watching it stall, blktrace
> shows this - which to my amateur eyes shows a 0.5s gap between the read
> request being inserted at 1.815589 and issued at 2.373, two completions
> being the only occurrence in the middle. Maybe the queue depth was full -
> but we have two completions earlier. (Reading the action column: Q =
> queued, G = get request, I = inserted, D = issued to the driver, C =
> completed; FWS marks a flush+sync write, i.e. the barrier.)
>
>
> 8,0    0      226     1.812363080   271  D  WS 498955904 + 16 (     988) [jbd2/dm-0-8]
> 8,0    0      227     1.812439052     0  C  WS 498955904 + 16 (   75972) [0]
> 8,4    0      228     1.812456484   271  A FWS 0 + 0 <- (252,0) 0
> 8,0    0      229     1.812456838   271  Q FWS [jbd2/dm-0-8]
> 8,4    0      230     1.812457124   271  A FWS 0 + 0 <- (252,0) 0
> 8,0    0      231     1.812457237   271  Q FWS [jbd2/dm-0-8]
> 8,0    0      232     1.812458007   271  G FWS [jbd2/dm-0-8]
> 8,0    0      233     1.812458478   271  I FWS (     471) [jbd2/dm-0-8]
> 8,0    0      234     1.812465182   271  G FWS [jbd2/dm-0-8]
> 8,0    0      235     1.812465316   271  I FWS (     134) [jbd2/dm-0-8]
> 8,0    0      236     1.815446245     0  C  RM 507106648 + 8 ( 3778637) [0]
> 8,4    2       11     1.815585457 31072  A  RM 16875872 + 8 <- (252,0) 16873824
> 8,0    2       12     1.815585633 31072  A  RM 507107680 + 8 <- (8,4) 16875872
> 8,0    2       13     1.815586124 31072  Q  RM 507107680 + 8 [find]
> 8,0    2       14     1.815588326 31072  G  RM 507107680 + 8 [find]
> 8,0    2       15     1.815589269 31072  I  RM 507107680 + 8 (     943) [find]
> 8,0    0      237     1.830102994     0  C  WS 0 (17739914) [0]
> 8,0    0      238     2.373458367     0  C  WS 0 [0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As far as I can see, it is stuck on the superblock flush completion,
which was issued as request 228. It takes ~500ms to complete. I tend to
agree with Ted - it looks likely that your SSD has problems, probably in
the FTL layer. To check that the filesystem is not part of the issue, you
can try raw I/O:

### Unmount your filesystem first. *This is important - otherwise you may corrupt your FS*
# umount /$YOUR_FS
### Read your superblock, then write it back. The second operation should
### give you the same timings as the superblock commit from jbd2:
# dd if=/dev/$YOUR_DISK of=super_block bs=4k count=4 iflag=direct
# dd if=super_block of=/dev/$YOUR_DISK bs=4k count=4 conv=fsync,notrunc oflag=direct

The superblock is the most frequently rewritten block on the whole disk,
so if the previous filesystem never issued TRIM requests back to the SSD,
its FTL may have become very slow. You should try fstrim(8); it may help:
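
For example (the -v flag reports how many bytes were trimmed; run it once
per mounted filesystem):

# fstrim -v /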

> 8,0    0      239     2.373469557     0  D  RM 507107680 + 8 (557880288) [swapper/0]
> 8,4    0      240     2.373486949 15158  A WFS 8724112 + 8 <- (252,0) 8722064
> 8,0    0      241     2.373487336 15158  A WFS 498955920 + 8 <- (8,4) 8724112
> 8,0    0      242     2.373488480 15158  Q WFS 498955920 + 8 [kworker/0:2]
> 8,0    0      243     2.373490158 15158  G WFS 498955920 + 8 [kworker/0:2]
> 8,0    0      244     2.373491573 15158  I WFS 498955920 + 8 (    1415) [kworker/0:2]
> 8,0    0      245     2.373491927 15158  D  WS 498955920 + 8 (     354) [kworker/0:2]
> 8,0    0      246     2.373596570     0  C  WS 498955920 + 8 (  104643) [0]
> 8,0    0      247     2.375251869     0  C  RM 507107680 + 8 ( 1782312) [0]
> 8,4    2       16     2.375452116 31072  A  RM 16875072 + 8 <- (252,0) 16873024
> 8,0    2       17     2.375452283 31072  A  RM 507106880 + 8 <- (8,4) 16875072
> 8,0    2       18     2.375452776 31072  Q  RM 507106880 + 8 [find]
>
> I'm looking at watch -n0 'cat /proc/diskstats | egrep -v "ram|loop"' and
> the I/Os-in-progress column normally sits at 0 or 1 (I have seen it hit
> 100 _briefly_ and then drop just as quickly).
>
>> What model SSD is it, by the way?
> It's the built-in Apple MBP 512GB drive - "APPLE SSD SM0512F" - which
> apparently means Samsung (I've been reading about issues with their
> 840 EVO drives...).
>
> Thanks once again (and for all your immense work over the many, many years!)
>
> Adrian

