Hi,
I installed my new laptop on Saturday and set up an ext4 filesystem
on my / and /home partitions. Without my doing many file transfers,
I noticed today:
jak@jak-x230:~$ cat /sys/fs/ext4/sdb3/lifetime_write_kbytes
342614039
This is on a 100GB partition. I used fstrim multiple times. I analysed
the increase over some time today and issued an fstrim in between:
2013-09-12 15:53 341.582.779
2013-09-12 15:54 341.582.971
2013-09-12 15:58 341.583.103
2013-09-12 16:01 341.584.095
2013-09-12 16:04 341.584.475
2013-09-12 16:05 341.584.623
<fstrim -v /home => /home/: 1052205056 bytes were trimmed>
2013-09-12 16:07 342.612.167
2013-09-12 16:08 342.612.323
2013-09-12 16:10 342.613.995
2013-09-12 16:11 342.614.039
2013-09-12 16:15 342.614.291
2013-09-12 16:16 342.614.475
So it seems that ext4 counts the trims as writes? I don't know how I could
get 300GB of writes on a 100GB partition -- of which only 8 GB are occupied
-- otherwise.
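A quick arithmetic check on the log above, as a sketch only: the two counter
values and the byte count reported by fstrim are copied from this message,
while the variable names are made up for illustration. It assumes the sysfs
counter is in units of 1024 bytes. If the jump across the fstrim equals the
trimmed size, the trim itself is what moved the counter.

```python
# lifetime_write_kbytes just before and just after the fstrim (from the log).
before_kib = 341_584_623   # 2013-09-12 16:05
after_kib = 342_612_167    # 2013-09-12 16:07

# Size reported by "fstrim -v /home", in bytes.
trimmed_bytes = 1_052_205_056

jump_kib = after_kib - before_kib
trimmed_kib = trimmed_bytes // 1024

print(jump_kib)                 # 1027544
print(trimmed_kib)              # 1027544
print(jump_kib == trimmed_kib)  # True
```

The two figures agree to the kilobyte, which points at accounting rather than
at real data being written.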
My smart values for my SSD are:
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0003 100 100 070 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0003 100 100 000 Pre-fail Always - 0
9 Power_On_Hours 0x0002 100 100 000 Old_age Always - 23
12 Power_Cycle_Count 0x0002 100 100 000 Old_age Always - 37
177 Wear_Leveling_Count 0x0003 100 100 000 Pre-fail Always - 465
178 Used_Rsvd_Blk_Cnt_Chip 0x0003 100 100 000 Pre-fail Always - 0
181 Program_Fail_Cnt_Total 0x0003 100 100 000 Pre-fail Always - 0
182 Erase_Fail_Count_Total 0x0003 100 100 000 Pre-fail Always - 0
187 Reported_Uncorrect 0x0002 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0003 100 100 000 Pre-fail Always - 7
196 Reallocated_Event_Count 0x0003 100 100 000 Pre-fail Always - 0
198 Offline_Uncorrectable 0x0003 100 100 000 Pre-fail Always - 0
199 UDMA_CRC_Error_Count 0x0003 100 100 000 Pre-fail Always - 0
232 Available_Reservd_Space 0x0003 100 100 010 Pre-fail Always - 0
241 Total_LBAs_Written 0x0003 100 100 000 Pre-fail Always - 1494
242 Total_LBAs_Read 0x0003 100 100 000 Pre-fail Always - 1308
Do those still look OK?
Please help,
I don't know what is happening here.
--
Julian Andres Klode - Debian Developer, Ubuntu Member
See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.
On Thu, Sep 12, 2013 at 04:18:56PM +0200, Julian Andres Klode wrote:
> Hi,
>
> I installed my new laptop on Saturday and setup an ext4 filesystem
> on my / and /home partitions. Without me doing much file transfers,
> I noticed today:
Please note that I am not subscribed to the mailing list, so please
keep me in To or CC when answering.
--
Julian Andres Klode - Debian Developer, Ubuntu Member
See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.
On Thu, 2013-09-12 at 16:18 +0200, Julian Andres Klode wrote:
> Hi,
>
> I installed my new laptop on Saturday and setup an ext4 filesystem
> on my / and /home partitions. Without me doing much file transfers,
> I noticed today:
>
> jak@jak-x230:~$ cat /sys/fs/ext4/sdb3/lifetime_write_kbytes
> 342614039
>
> This is on a 100GB partition. I used fstrim multiple times. I analysed
> the increase over some time today and issued an fstrim in between:
<snip>
> So it seems that ext4 counts the trims as writes? I don't know how I could
> get 300GB of writes on a 100GB partition -- of which only 8 GB are occupied
> -- otherwise.
The way fstrim works is that it allocates a temporary file that fills
almost the entire free space on the partition. I believe it does this
with fallocate in order to ensure that space for the file is actually
reserved on disc (but it does not get written to!). It then looks up
where on disc the file's reserved space is, and sends a trim command to
the drive to free that space. Afterwards, it deletes the temporary file.
So what you are seeing means that it's probably just an issue with
the write accounting, where the blocks reserved by the fallocate are
counted as writes.
> My smart values for my SSD are:
>
> SMART Attributes Data Structure revision number: 1
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 241 Total_LBAs_Written 0x0003 100 100 000 Pre-fail Always - 1494
You should be able to confirm this by checking the 'Total_LBAs_Written'
attribute before and after doing the fstrim; it should either not go up,
or go up by only a small amount. Although to be honest, I'm not sure
what this is counting - if that raw value is actually LBAs, that would
only account for 747KiB of writes! I guess it's probably a count of
erase blocks or something - what model is the SSD?
--
Calvin Walton
On Thu, Sep 12, 2013 at 10:54:03AM -0400, Calvin Walton wrote:
> On Thu, 2013-09-12 at 16:18 +0200, Julian Andres Klode wrote:
> > Hi,
> >
> > I installed my new laptop on Saturday and setup an ext4 filesystem
> > on my / and /home partitions. Without me doing much file transfers,
> > I noticed today:
> >
> > jak@jak-x230:~$ cat /sys/fs/ext4/sdb3/lifetime_write_kbytes
> > 342614039
> >
> > This is on a 100GB partition. I used fstrim multiple times. I analysed
> > the increase over some time today and issued an fstrim in between:
> <snip>
> > So it seems that ext4 counts the trims as writes? I don't know how I could
> > get 300GB of writes on a 100GB partition -- of which only 8 GB are occupied
> > -- otherwise.
>
> The way fstrim works is that it allocates a temporary file that fills
> almost the entire free space on the partition. I believe it does this
> with fallocate in order to ensure that space for the file is actually
> reserved on disc (but it does not get written to!). It then looks up
> where on disc the file's reserved space is, and sends a trim command to
> the drive to free that space. Afterwards, it deletes the temporary file.
>
> So what you are seeing means means that it's probably just an issue with
> the write accounting, where the blocks reserved by the fallocate are
> counted as writes.
I can also confirm that using fallocate to allocate a 1G file (and deleting
it afterwards without modifying it in between; with discard enabled [I enabled
this now after the log for testing]) also increases the write number by 1G.
>
> > My smart values for my SSD are:
> >
> > SMART Attributes Data Structure revision number: 1
> > Vendor Specific SMART Attributes with Thresholds:
> > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> > 241 Total_LBAs_Written 0x0003 100 100 000 Pre-fail Always - 1494
>
> You should be able to confirm this by checking the 'Total_LBAs_Written'
> attribute before and after doing the fstrim; it should either not go up,
> or go up only be a small amount. Although to be honest, I'm not sure
> what this is counting - if that raw value is actually LBAs, that would
> only account for 747KiB of writes! I guess it's probably a count of
> erase blocks or something - what model is the SSD?
It was 1493 some time before the trim. The disk is a PLEXTOR PX-128M5.
--
Julian Andres Klode - Debian Developer, Ubuntu Member
See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.
On 9/12/13 9:54 AM, Calvin Walton wrote:
> On Thu, 2013-09-12 at 16:18 +0200, Julian Andres Klode wrote:
>> Hi,
>>
>> I installed my new laptop on Saturday and setup an ext4 filesystem
>> on my / and /home partitions. Without me doing much file transfers,
>> I noticed today:
>>
>> jak@jak-x230:~$ cat /sys/fs/ext4/sdb3/lifetime_write_kbytes
>> 342614039
>>
>> This is on a 100GB partition. I used fstrim multiple times. I analysed
>> the increase over some time today and issued an fstrim in between:
> <snip>
>> So it seems that ext4 counts the trims as writes? I don't know how I could
>> get 300GB of writes on a 100GB partition -- of which only 8 GB are occupied
>> -- otherwise.
>
> The way fstrim works is that it allocates a temporary file that fills
> almost the entire free space on the partition.
No, that's not correct.
> I believe it does this
> with fallocate in order to ensure that space for the file is actually
> reserved on disc (but it does not get written to!). It then looks up
> where on disc the file's reserved space is, and sends a trim command to
> the drive to free that space. Afterwards, it deletes the temporary file.
Nope. ;) strace it and see, it does nothing like this - it calls a special
ioctl to ask the fs to find and issue discards on unused blocks.
# strace -e open,write,fallocate,unlink,ioctl fstrim mnt/
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib64/libc.so.6", O_RDONLY) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
open("mnt/", O_RDONLY) = 3
ioctl(3, 0xc0185879, 0x7fff6ac47d40) = 0 <=== FITRIM ioctl
(old hdparm discard might have done what you say, but that was a hack).
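To see why the request number in the strace output is FITRIM, it can be
decoded by hand. The sketch below assumes the generic Linux ioctl number
layout (as used on x86; a few architectures split the upper bits
differently), and the helper name decode_ioctl is made up for illustration.

```python
def decode_ioctl(request):
    """Split an ioctl request number into its fields (generic layout)."""
    nr = request & 0xFF                     # bits 0-7: command number
    type_char = chr((request >> 8) & 0xFF)  # bits 8-15: type character
    size = (request >> 16) & 0x3FFF         # bits 16-29: argument size in bytes
    direction = (request >> 30) & 0x3       # bits 30-31: 1 = write, 2 = read, 3 = both
    return direction, type_char, nr, size

direction, type_char, nr, size = decode_ioctl(0xC0185879)
print(direction, type_char, nr, size)  # 3 X 121 24
```

That matches the kernel's definition of FITRIM as _IOWR('X', 121, struct
fstrim_range): a read/write ioctl whose argument is three 64-bit fields
(start, len, minlen), i.e. 24 bytes.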
> So what you are seeing means means that it's probably just an issue with
> the write accounting, where the blocks reserved by the fallocate are
> counted as writes.
I also think that it is just accounting, and probably just an error,
which seems to be fixed by now - what kernel are you running?
When you report it in ext4, it calculates it like this:
return snprintf(buf, PAGE_SIZE, "%llu\n",
(unsigned long long)(sbi->s_kbytes_written +
((part_stat_read(sb->s_bdev->bd_part, sectors[1]) -
EXT4_SB(sb)->s_sectors_written_start) >> 1)));
So it counts partition stats in the mix (outside of ext4's accounting).
On I/O completion, we add the bytes "completed" (blk_account_io_completion()),
and it sounds like it's counting trim/discard completions in the mix.
Does /proc/diskstats show a jump for your partition after an fstrim as well?
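As a rough model of the calculation quoted above (not the kernel code itself):
the function below mirrors the arithmetic of the snprintf expression, and the
sample numbers are hypothetical, chosen only to show what happens if a 1 GiB
discard is accounted as written sectors.

```python
SECTOR_SIZE = 512  # the block layer counts in 512-byte sectors

def lifetime_write_kbytes(s_kbytes_written, sectors_written_now, sectors_written_start):
    # Sectors written since mount, shifted right by one to convert
    # 512-byte sectors to KiB, plus the total saved in the superblock.
    return s_kbytes_written + ((sectors_written_now - sectors_written_start) >> 1)

# Hypothetical values: 1,000,000 KiB recorded from earlier mounts, and a
# partition write counter that stood at 5,000 sectors at mount time.
one_gib_in_sectors = (1 << 30) // SECTOR_SIZE
before = lifetime_write_kbytes(1_000_000, 5_000, 5_000)
after = lifetime_write_kbytes(1_000_000, 5_000 + one_gib_in_sectors, 5_000)
print(after - before)  # 1048576
```

Because the second term comes straight from the partition statistics, anything
the block layer books as "sectors written" shows up here, whether or not ext4
wrote any data.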
But what kernel are you running? I don't see it on a 3.11 kernel:
After a fresh mkfs I'm at:
[root@bp-05 tmp]# dumpe2fs -h fsfile | grep Lifetime
dumpe2fs 1.41.12 (17-May-2010)
Lifetime writes: 8135 MB
and then several fstrims don't budge it:
[root@bp-05 tmp]# cat /sys/fs/ext4/loop0/lifetime_write_kbytes
8330683
[root@bp-05 tmp]# fstrim mnt/
[root@bp-05 tmp]# cat /sys/fs/ext4/loop0/lifetime_write_kbytes
8330683
[root@bp-05 tmp]# fstrim mnt/
[root@bp-05 tmp]# cat /sys/fs/ext4/loop0/lifetime_write_kbytes
8330683
-Eric
On Thu, Sep 12, 2013 at 10:54:03AM -0400, Calvin Walton wrote:
> On Thu, 2013-09-12 at 16:18 +0200, Julian Andres Klode wrote:
> > Hi,
> >
> > I installed my new laptop on Saturday and setup an ext4 filesystem
> > on my / and /home partitions. Without me doing much file transfers,
> > I noticed today:
> >
> > jak@jak-x230:~$ cat /sys/fs/ext4/sdb3/lifetime_write_kbytes
> > 342614039
> >
> > This is on a 100GB partition. I used fstrim multiple times. I analysed
> > the increase over some time today and issued an fstrim in between:
> <snip>
> > So it seems that ext4 counts the trims as writes? I don't know how I could
> > get 300GB of writes on a 100GB partition -- of which only 8 GB are occupied
> > -- otherwise.
>
> The way fstrim works is that it allocates a temporary file that fills
> almost the entire free space on the partition. I believe it does this
> with fallocate in order to ensure that space for the file is actually
> reserved on disc (but it does not get written to!). It then looks up
> where on disc the file's reserved space is, and sends a trim command to
> the drive to free that space. Afterwards, it deletes the temporary file.
>
> So what you are seeing means means that it's probably just an issue with
> the write accounting, where the blocks reserved by the fallocate are
> counted as writes.
>
> > My smart values for my SSD are:
> >
> > SMART Attributes Data Structure revision number: 1
> > Vendor Specific SMART Attributes with Thresholds:
> > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> > 241 Total_LBAs_Written 0x0003 100 100 000 Pre-fail Always - 1494
>
> You should be able to confirm this by checking the 'Total_LBAs_Written'
> attribute before and after doing the fstrim; it should either not go up,
> or go up only be a small amount. Although to be honest, I'm not sure
> what this is counting - if that raw value is actually LBAs, that would
> only account for 747KiB of writes! I guess it's probably a count of
> erase blocks or something - what model is the SSD?
According to http://www.plextoramericas.com/index.php/forum/27-ssd/7881-my-m5pro-wear-leveling-count-problem
those are 32 MB blocks. And 177 Wear_Leveling_Count corresponds to 64 MB
blocks.
So Total_LBAs_Written corresponds to 46 GB of writes and Wear_Leveling_Count
corresponds to 29 GB. This seems realistic for 5 days of use with an initial
installation and more than 100MB of writes per hour (roughly 1GB per day).
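The arithmetic behind those figures, as a small sketch. The raw values are
the ones from the SMART output; the 32 MB and 64 MB unit sizes come from the
forum post linked above, and treating "MB" as 2^20 bytes is an assumption.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

total_lbas_written_raw = 1494  # SMART attribute 241
wear_leveling_raw = 465        # SMART attribute 177

# Read as plain 512-byte LBAs, the raw value is implausibly small.
as_lbas_kib = total_lbas_written_raw * 512 / 1024

# Read in the units from the forum post, it looks realistic.
written_gib = total_lbas_written_raw * 32 * MIB / GIB
wear_gib = wear_leveling_raw * 64 * MIB / GIB

print(as_lbas_kib)            # 747.0
print(round(written_gib, 1))  # 46.7
print(round(wear_gib, 1))     # 29.1
```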
--
Julian Andres Klode - Debian Developer, Ubuntu Member
See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.
Ext4 is getting this information from the block layer. Specifically,
it's calling
part_stat_read(bd_part, sectors[WRITE])
to get the statistics information it uses to calculate how many kb
have been written.
I'm not a block layer expert, but it may very well be the case that
TRIMS are being counted as requests. Requests are categorized as
either READS or WRITES, and I don't see any special casing for non r/w
requests in block/blk-core.c.
Regards,
- Ted
On Thu, 12 Sep 2013, Calvin Walton wrote:
> Date: Thu, 12 Sep 2013 10:54:03 -0400
> From: Calvin Walton <[email protected]>
> To: Julian Andres Klode <[email protected]>
> Cc: [email protected]
> Subject: Re: Please help: Is ext4 counting trims as writes,
> or is something killing my SSD?
>
> On Thu, 2013-09-12 at 16:18 +0200, Julian Andres Klode wrote:
> > Hi,
> >
> > I installed my new laptop on Saturday and setup an ext4 filesystem
> > on my / and /home partitions. Without me doing much file transfers,
> > I noticed today:
> >
> > jak@jak-x230:~$ cat /sys/fs/ext4/sdb3/lifetime_write_kbytes
> > 342614039
> >
> > This is on a 100GB partition. I used fstrim multiple times. I analysed
> > the increase over some time today and issued an fstrim in between:
> <snip>
> > So it seems that ext4 counts the trims as writes? I don't know how I could
> > get 300GB of writes on a 100GB partition -- of which only 8 GB are occupied
> > -- otherwise.
>
> The way fstrim works is that it allocates a temporary file that fills
> almost the entire free space on the partition. I believe it does this
> with fallocate in order to ensure that space for the file is actually
> reserved on disc (but it does not get written to!). It then looks up
> where on disc the file's reserved space is, and sends a trim command to
> the drive to free that space. Afterwards, it deletes the temporary file.
As Eric already mentioned, that's not how it works. You're confusing
it with the wiper.sh script, which did exactly that without any support
from the file system.
Fstrim is an entirely different thing, and it requires support from the
file system, which ext4 and ext3 have (as do xfs, btrfs, gfs2, ocfs2 and
possibly more).
What Julian is probably seeing is that in older kernels all DISCARD
requests were accounted as WRITE requests.
This should already be fixed in recent kernels.
-Lukas
On Thu, 2013-09-12 at 10:18 -0500, Eric Sandeen wrote:
> On 9/12/13 9:54 AM, Calvin Walton wrote:
> > On Thu, 2013-09-12 at 16:18 +0200, Julian Andres Klode wrote:
> >> Hi,
> >>
> >> I installed my new laptop on Saturday and setup an ext4 filesystem
> >> on my / and /home partitions. Without me doing much file transfers,
> >> I noticed today:
> >>
> >> jak@jak-x230:~$ cat /sys/fs/ext4/sdb3/lifetime_write_kbytes
> >> 342614039
> >>
> >> This is on a 100GB partition. I used fstrim multiple times. I analysed
> >> the increase over some time today and issued an fstrim in between:
> > <snip>
> >> So it seems that ext4 counts the trims as writes? I don't know how I could
> >> get 300GB of writes on a 100GB partition -- of which only 8 GB are occupied
> >> -- otherwise.
> >
> > The way fstrim works is that it allocates a temporary file that fills
> > almost the entire free space on the partition.
>
> No, that's not correct.
> Nope. ;) strace it and see, it does nothing like this - it calls a special
> ioctl to ask the fs to find and issue discards on unused blocks.
>
> # strace -e open,write,fallocate,unlink,ioctl fstrim mnt/
> open("/etc/ld.so.cache", O_RDONLY) = 3
> open("/lib64/libc.so.6", O_RDONLY) = 3
> open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
> open("mnt/", O_RDONLY) = 3
> ioctl(3, 0xc0185879, 0x7fff6ac47d40) = 0 <=== FITRIM ioctl
>
> (old hdparm discard might have done what you say, but that was a hack).
Alright, you got me there :) To be honest, I got the name 'fstrim'
confused with the old 'wiper.sh' script that used to be the only way to
do this, and did in fact function as I said.
Having this all integrated into the filesystem itself is quite a nice
change for the better - the old way was definitely a hack! I suppose
this was added sometime in the 2011 timeframe?
--
Calvin Walton
On Thu, Sep 12, 2013 at 10:18:11AM -0500, Eric Sandeen wrote:
> On 9/12/13 9:54 AM, Calvin Walton wrote:
> > On Thu, 2013-09-12 at 16:18 +0200, Julian Andres Klode wrote:
> >> Hi,
> >>
> >> I installed my new laptop on Saturday and setup an ext4 filesystem
> >> on my / and /home partitions. Without me doing much file transfers,
> >> I noticed today:
> >>
> >> jak@jak-x230:~$ cat /sys/fs/ext4/sdb3/lifetime_write_kbytes
> >> 342614039
> >>
> >> This is on a 100GB partition. I used fstrim multiple times. I analysed
> >> the increase over some time today and issued an fstrim in between:
> > <snip>
> >> So it seems that ext4 counts the trims as writes? I don't know how I could
> >> get 300GB of writes on a 100GB partition -- of which only 8 GB are occupied
> >> -- otherwise.
> >
> > The way fstrim works is that it allocates a temporary file that fills
> > almost the entire free space on the partition.
>
> No, that's not correct.
>
> > I believe it does this
> > with fallocate in order to ensure that space for the file is actually
> > reserved on disc (but it does not get written to!). It then looks up
> > where on disc the file's reserved space is, and sends a trim command to
> > the drive to free that space. Afterwards, it deletes the temporary file.
>
> Nope. ;) strace it and see, it does nothing like this - it calls a special
> ioctl to ask the fs to find and issue discards on unused blocks.
>
> # strace -e open,write,fallocate,unlink,ioctl fstrim mnt/
> open("/etc/ld.so.cache", O_RDONLY) = 3
> open("/lib64/libc.so.6", O_RDONLY) = 3
> open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
> open("mnt/", O_RDONLY) = 3
> ioctl(3, 0xc0185879, 0x7fff6ac47d40) = 0 <=== FITRIM ioctl
>
> (old hdparm discard might have done what you say, but that was a hack).
>
> > So what you are seeing means means that it's probably just an issue with
> > the write accounting, where the blocks reserved by the fallocate are
> > counted as writes.
>
> I also think that it is just accounting, and probably just an error,
> which seems to be fixed by now - what kernel are you running?
Kernel 3.10.7
>
> When you report it in ext4, it calculates it like this:
>
> return snprintf(buf, PAGE_SIZE, "%llu\n",
> (unsigned long long)(sbi->s_kbytes_written +
> ((part_stat_read(sb->s_bdev->bd_part, sectors[1]) -
> EXT4_SB(sb)->s_sectors_written_start) >> 1)));
>
> so it counts partition stats in the mix (outside of ext4's accounting)
>
> On io completion, we add the bytes "completed" (blk_account_io_completion())
>
> And it sounds like it's counting trim/discard completions in the mix.
>
> does /proc/diskstats show a jump for your partition after an fstrim as well?
>
I created a file using fallocate, deleted it (with discard option set
on the FS), and then sync'ed and got the following changes in sdb3:
jak@jak-x230:~$ diff /tmp/a /tmp/b
diff --git tmp/a tmp/b
index e0370bf..43c2fdd 100644
--- tmp/a
+++ tmp/b
@@ -1,7 +1,7 @@
8 0 sda 1845 2122 15992 15268 6070 313375 3119314 5359680 0 85548 5391508
8 1 sda1 500 0 3970 1104 4106 37774 2840016 1028656 0 29656 1046320
- 8 16 sdb 85114 4486 4281300 36344 143239 111626 282319450 1803288 0 101416 1839608
+ 8 16 sdb 85114 4486 4281300 36344 143300 111658 284417426 1803492 0 101460 1839812
8 17 sdb1 930 992 8152 316 2 0 2 0 0 68 316
8 18 sdb2 72071 3316 3024626 29692 54309 29582 23201808 183432 0 37704 213060
- 8 19 sdb3 11858 175 1246458 6320 88381 82044 259117640 1619624 0 65880 1626200
+ 8 19 sdb3 11858 175 1246458 6320 88442 82076 261215616 1619828 0 65924 1626404
--
Julian Andres Klode - Debian Developer, Ubuntu Member
See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.
On 9/12/13 10:29 AM, Calvin Walton wrote:
...
> Having this all integrated into the filesystem itself is quite a nice
> change for the better - the old way was definitely a hack! I suppose
> this was added sometime in the 2011 timeframe?
commit 367a51a339020ba4d9edb0ce0f21d65bd50b00c9
Author: Lukas Czerner <[email protected]>
Date: Wed Oct 27 21:30:11 2010 -0400
fs: Add FITRIM ioctl
Adds an filesystem independent ioctl to allow implementation of file
system batched discard support.
the userspace tool came around that time as well.
-Eric
On 9/12/13 10:32 AM, Julian Andres Klode wrote:
> On Thu, Sep 12, 2013 at 10:18:11AM -0500, Eric Sandeen wrote:
...
<note, realized that my test on loop might not be valid>
> I created a file using fallocate, deleted it (with discard option set
> on the FS), and then sync'ed and got the following changes in sdb3:
>
> jak@jak-x230:~$ diff /tmp/a /tmp/b
> diff --git tmp/a tmp/b
> index e0370bf..43c2fdd 100644
> --- tmp/a
> +++ tmp/b
> @@ -1,7 +1,7 @@
> 8 0 sda 1845 2122 15992 15268 6070 313375 3119314 5359680 0 85548 5391508
> 8 1 sda1 500 0 3970 1104 4106 37774 2840016 1028656 0 29656 1046320
> - 8 16 sdb 85114 4486 4281300 36344 143239 111626 282319450 1803288 0 101416 1839608
> + 8 16 sdb 85114 4486 4281300 36344 143300 111658 284417426 1803492 0 101460 1839812
> 8 17 sdb1 930 992 8152 316 2 0 2 0 0 68 316
> 8 18 sdb2 72071 3316 3024626 29692 54309 29582 23201808 183432 0 37704 213060
> - 8 19 sdb3 11858 175 1246458 6320 88381 82044 259117640 1619624 0 65880 1626200
> + 8 19 sdb3 11858 175 1246458 6320 88442 82076 261215616 1619828 0 65924 1626404
^^^^^^^^^
Field 7 (after major/minor/device) is the number of sectors written.
Yours moved by almost exactly 1G (2097976 sectors of 512 bytes).
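For anyone who wants to redo that subtraction, here is a small sketch. The
two strings are the sdb3 lines from the diff quoted above; the helper name
sectors_written is made up for illustration, and 512-byte sectors are
assumed, as is usual for /proc/diskstats.

```python
# The sdb3 lines from /proc/diskstats before and after the fallocate/delete/sync.
before_line = "8 19 sdb3 11858 175 1246458 6320 88381 82044 259117640 1619624 0 65880 1626200"
after_line = "8 19 sdb3 11858 175 1246458 6320 88442 82076 261215616 1619828 0 65924 1626404"

def sectors_written(diskstats_line):
    fields = diskstats_line.split()
    # fields[0:3] are major, minor and device name; the 7th statistic
    # after them (index 3 + 6) is the number of sectors written.
    return int(fields[3 + 6])

delta_sectors = sectors_written(after_line) - sectors_written(before_line)
delta_bytes = delta_sectors * 512

print(delta_sectors)            # 2097976
print(delta_bytes / (1 << 30))  # just over 1.0 (GiB)
```

An exact GiB would be 2097152 sectors; the remaining 824 sectors (412 KiB)
are presumably ordinary writes that happened in the same interval.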
So the takeaway is: I think discards *are* included in the stats, but don't worry, it's
not doing IO to your device. It was added here, and it doesn't seem to have changed:
commit c69d48540c201394d08cb4d48b905e001313d9b8
Author: Jens Axboe <[email protected]>
Date: Fri Apr 24 08:12:19 2009 +0200
block: include discard requests in IO accounting
We currently don't do merging on discard requests, but we potentially
could. If we do, then we need to include discard requests in the IO
accounting, or merging would end up decrementing in_flight IO counters
for an IO which never incremented them.
So enable accounting for discard requests.
However, it seems a little odd to me that ext4 feels it necessary to issue
discards on blocks which have been fallocated but not written to, I'll have
to think about that part (doesn't really matter for your case, it's just a
curiosity).
Thanks,
-Eric
On Thu, Sep 12, 2013 at 10:52:38AM -0500, Eric Sandeen wrote:
>
> However, it seems a little odd to me that ext4 feels it necessary to issue
> discards on blocks which have been fallocated but not written to, I'll have
> to think about that part (doesn't really matter for your case, it's just a
> curiosity).
For fstrim, we issue discards based on blocks which are not in use
according to the block allocation bitmap.
It shouldn't matter that we've issued discards on blocks which had been
previously discarded, and in fact, it might help, since sometimes
storage devices only track block usage at large granularities; that is,
a device might only release blocks on thin-provisioned storage
when a full megabyte's worth of blocks is discarded.
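A toy model of the bitmap-driven approach described above, purely for
illustration: this is not ext4's code, the bitmap is a plain Python list,
and free_extents and min_len are hypothetical names (min_len loosely plays
the role of the minlen field in the FITRIM request). The point is that the
scan only looks at which blocks are free now, not at whether they were
discarded before.

```python
def free_extents(bitmap, min_len=1):
    """Yield (start, length) for each run of free (False) blocks."""
    start = None
    for i, in_use in enumerate(bitmap):
        if not in_use and start is None:
            start = i                      # a free run begins here
        elif in_use and start is not None:
            if i - start >= min_len:
                yield (start, i - start)   # run ended; report it if long enough
            start = None
    # Report a free run that extends to the end of the bitmap.
    if start is not None and len(bitmap) - start >= min_len:
        yield (start, len(bitmap) - start)

# True = block in use, False = block free.
bitmap = [True, False, False, True, False, False, False, False, True, False]
print(list(free_extents(bitmap)))             # [(1, 2), (4, 4), (9, 1)]
print(list(free_extents(bitmap, min_len=3)))  # [(4, 4)]
```

Running the scan twice over an unchanged bitmap yields the same extents both
times, which is why repeated fstrim runs re-issue discards for the same
ranges in this model.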
- Ted
On 09/12/2013 11:18 AM, Eric Sandeen wrote:
> On 9/12/13 9:54 AM, Calvin Walton wrote:
>> On Thu, 2013-09-12 at 16:18 +0200, Julian Andres Klode wrote:
>>> Hi,
>>>
>>> I installed my new laptop on Saturday and setup an ext4 filesystem
>>> on my / and /home partitions. Without me doing much file transfers,
>>> I noticed today:
>>>
>>> jak@jak-x230:~$ cat /sys/fs/ext4/sdb3/lifetime_write_kbytes
>>> 342614039
>>>
>>> This is on a 100GB partition. I used fstrim multiple times. I analysed
>>> the increase over some time today and issued an fstrim in between:
>> <snip>
>>> So it seems that ext4 counts the trims as writes? I don't know how I could
>>> get 300GB of writes on a 100GB partition -- of which only 8 GB are occupied
>>> -- otherwise.
>> The way fstrim works is that it allocates a temporary file that fills
>> almost the entire free space on the partition.
> No, that's not correct.
That is how an older tool (from Mark Lord) used to work :)
ric
On 09/12/2013 02:47 PM, Theodore Ts'o wrote:
> On Thu, Sep 12, 2013 at 10:52:38AM -0500, Eric Sandeen wrote:
>> However, it seems a little odd to me that ext4 feels it necessary to issue
>> discards on blocks which have been fallocated but not written to, I'll have
>> to think about that part (doesn't really matter for your case, it's just a
>> curiosity).
> For fstrim, we issue discards based on blocks which are not in use
> according to the block allocation bitmap.
>
> It shouldn't matter that we've issued discard on blocks which had been
> previously discarded, and in fact, it might help, since sometimes
> storage devices only traces block usage on large granularities ---
> that is, it might only releases blocks on a thin provisioned storage
> when a full megabyte worth of blocks are discarded.
>
> - Ted
>
I think re-issuing the trims is the right thing to do, for exactly that
reason. Devices are allowed by the spec to ignore requests that are not aligned
to their needs, so this lets us try to get back in sync.
ric