Here's a problem I can't work out:
I have a filesystem (in a VM) that I know has at least 100MB of
deleted files on it. Doing this in a script:
    mount -o discard /dev/sda1 /mnt
    fstrim /mnt
... does nothing. Also the fstrim is almost instantaneous -- there's
no way it could be scanning the disk.
However, if I start with the same filesystem, mounted with -o discard,
and create and rm large files, while observing the size of the
underlying virtual disk, then discard is obviously working fine. 'rm'
of large files makes the underlying disk shrink.
Any ideas here?
Rich.
kernel: 3.12.5-302.fc20.x86_64
qemu: 1.7.0
virtio-scsi with discard=unmap
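
For readers trying to reproduce the setup, here is a rough sketch of qemu arguments that attach an image through virtio-scsi with discard=unmap. The image name, the ids (scsi0, hd0), format=raw and the helper name qemu_discard_args are illustrative assumptions, not the exact command line used here; cache=writeback is the cache mode mentioned later in the thread.

```shell
# qemu_discard_args IMAGE
# Prints, one per line, qemu arguments that attach IMAGE through
# virtio-scsi with guest discard requests passed down to the image.
# ASSUMPTIONS: the ids "scsi0"/"hd0" and format=raw are placeholders.
qemu_discard_args() {
    printf '%s\n' \
        -device virtio-scsi-pci,id=scsi0 \
        -drive "file=$1,if=none,id=hd0,format=raw,cache=writeback,discard=unmap" \
        -device scsi-hd,drive=hd0,bus=scsi0.0
}

# Show the arguments for a hypothetical image file.
qemu_discard_args fedora-20.img
```

The output can be appended to an existing qemu command line; without discard=unmap on the -drive, qemu ignores the guest's discard requests and the image never shrinks.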
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
On 3/11/14, 4:39 PM, Richard W.M. Jones wrote:
>
> Here's a problem I can't work out:
>
> I have a filesystem (in a VM) that I know has at least 100MB of
> deleted files on it.
Was it mounted with -o discard at the time the files were deleted?
If so, then the trim is already done during the unlink process,
and there's no more work to do.
So that's my first thought, but ...
> Doing this in a script:
>
> mount -o discard /dev/sda1 /mnt
> fstrim /mnt
>
> ... does nothing. Also the fstrim is almost instantaneous -- there's
> no way it could be scanning the disk.
blktrace would be a better tool to find out whether or not discards
are actually getting issued to storage...
And if you strace it what does the ioctl return?
Enabling the trace_ext4_trim_all_free tracepoint might be interesting too.
> However, if I start with the same filesystem, mounted with -o discard,
> and create and rm large files, while observing the size of the
> underlying virtual disk, then discard is obviously working fine. 'rm'
> of large files makes the underlying disk shrink.
>
> Any ideas here?
First of all, I should point out that "-o discard" is not necessary for
fstrim / the FITRIM ioctl to work. The mount option makes ext4 try to
trim as soon as files are unlinked; FITRIM goes looking for free blocks
to trim.
If you're mounting with -o discard, then fstrim should never find any
work to do.
-Eric
[The context of this is trying to get virt-sparsify to work in place
on disks.]
On Tue, Mar 11, 2014 at 04:47:02PM -0500, Eric Sandeen wrote:
> On 3/11/14, 4:39 PM, Richard W.M. Jones wrote:
> >
> > Here's a problem I can't work out:
> >
> > I have a filesystem (in a VM) that I know has at least 100MB of
> > deleted files on it.
>
> Was it mounted with -o discard at the time the files were deleted?
No, it was not.
I know that the original 'rm' command didn't recover any space because
the disk image grew by ~100 MB.
> If so, then the trim is already done during the unlink process,
> and there's no more work to do.
>
> So that's my first thought, but ...
>
> > Doing this in a script:
> >
> > mount -o discard /dev/sda1 /mnt
> > fstrim /mnt
> >
> > ... does nothing. Also the fstrim is almost instantaneous -- there's
> > no way it could be scanning the disk.
>
> blktrace would be a better tool to find out whether or not discards
> are actually getting issued to storage...
>
> And if you strace it what does the ioctl return?
I'll try that in a few minutes.
In the meantime I captured the fstrim -v output:
    fstrim -v /
    /: 124 MiB (130039808 bytes) trimmed
124 MiB is within 25% of the amount of data I would expect to need
trimming.
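
As a quick sanity check, the byte count printed by fstrim -v can be converted by hand. This is only a sketch of the arithmetic; the 4096-byte block size is an assumption, since the thread does not state the filesystem's block size.

```shell
# Byte count as printed by "fstrim -v /" above.
bytes=130039808

# Convert to MiB (integer division, so the fraction is dropped).
mib=$((bytes / 1024 / 1024))

# Convert to filesystem blocks.
# ASSUMPTION: 4096-byte blocks; the thread does not give the block size.
block_size=4096
blocks=$((bytes / block_size))

echo "$bytes bytes = $mib MiB = $blocks blocks of $block_size bytes"
```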
> Enabling the trace_ext4_trim_all_free tracepoint might be interesting too.
That a systemtap thing? It's tricky to get systemtap working in a
virtual machine, but I guess I can try if nothing else works.
> > However, if I start with the same filesystem, mounted with -o discard,
> > and create and rm large files, while observing the size of the
> > underlying virtual disk, then discard is obviously working fine. 'rm'
> > of large files makes the underlying disk shrink.
> >
> > Any ideas here?
>
> first of all, I should point out that "-o discard" is not necessary for
> fstrim / FITRIM ioctl to work. The former tries to trim as soon
> as files are unlinked; FITRIM goes looking for free blocks to trim.
>
> If you're mounting with -o discard, then fstrim should never find any
> work to do.
Useful to know.
I thought I had to use -o discard in order for the ext4 module to send
discard commands at all to the block layer.
Thanks,
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
> > And if you strace it what does the ioctl return?
It seems OK:
stat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/", O_RDONLY) = 0
ioctl(0, FITRIM, 0x7fffbfb2be60) = 0
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org
On 3/11/14, 5:00 PM, Richard W.M. Jones wrote:
> [The context of this is trying to get virt-sparsify to work in place
> on disks.]
>
> On Tue, Mar 11, 2014 at 04:47:02PM -0500, Eric Sandeen wrote:
>> On 3/11/14, 4:39 PM, Richard W.M. Jones wrote:
>>>
>>> Here's a problem I can't work out:
>>>
>>> I have a filesystem (in a VM) that I know has at least 100MB of
>>> deleted files on it.
>>
>> Was it mounted with -o discard at the time the files were deleted?
>
> No, it was not.
Ok, worth asking. :)
> I know that the original 'rm' command didn't recover any space because
> the disk image grew by ~100 MB.
>> If so, then the trim is already done during the unlink process,
>> and there's no more work to do.
>>
>> So that's my first thought, but ...
>>
>>> Doing this in a script:
>>>
>>> mount -o discard /dev/sda1 /mnt
>>> fstrim /mnt
>>>
>>> ... does nothing. Also the fstrim is almost instantaneous -- there's
>>> no way it could be scanning the disk.
>>
>> blktrace would be a better tool to find out whether or not discards
>> are actually getting issued to storage...
blktrace is probably the place to start. Do you see discard
requests? then ext4 is doing its job. If not, we can trace
ext4 to see why it's not issuing them, assuming there really
is work to do.
>> And if you strace it what does the ioctl return?
>
> I'll try that in a few minutes.
>
> In the mean time I captured the fstrim -v output:
>
> fstrim -v /
> /: 124 MiB (130039808 bytes) trimmed
>
> 124 MB is (within 25%) the amount of data I would expect needs to be
> trimmed.
Ok, so it says that it did do what you expected...
>> Enabling the trace_ext4_trim_all_free tracepoint might be interesting too.
>
> That a systemtap thing? It's tricky to get systemtap working in a
> virtual machine, but I guess I can try if nothing else works.
# trace-cmd record -e &
# <run test>
# fg
<ctrl-c>
# trace-cmd report > trace_report.txt
should do it.
>>> However, if I start with the same filesystem, mounted with -o discard,
>>> and create and rm large files, while observing the size of the
>>> underlying virtual disk, then discard is obviously working fine. 'rm'
>>> of large files makes the underlying disk shrink.
(backing up, that's the "-o discard" option at work)
>>> Any ideas here?
>>
>> first of all, I should point out that "-o discard" is not necessary for
>> fstrim / FITRIM ioctl to work. The former tries to trim as soon
>> as files are unlinked; FITRIM goes looking for free blocks to trim.
>>
>> If you're mounting with -o discard, then fstrim should never find any
>> work to do.
>
> Useful to know.
>
> I thought I had to use -o discard in order for the ext4 module to send
> discard commands at all to the block layer.
nope. That just makes it do it every time a block is freed, instead
of in batches via fstrim:
    discard         Controls whether ext4 should issue discard/TRIM
    nodiscard(*)    commands to the underlying block device when
                    blocks are freed.
The FITRIM ioctl works fine without the mount option, and in fact, as I said,
should have no work to do if the mount option is there: every freed block
should get discarded (well, maybe modulo some size thresholds, I don't
remember for sure).
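
To tell which of the two modes a given mount is in, one can look at its options in /proc/mounts. A minimal sketch, assuming a Linux system where ext4 lists "discard" among the mount options when online discard is enabled; the helper names has_discard_opt and mount_opts_for are made up for illustration.

```shell
# has_discard_opt OPTIONS
# Succeeds if the comma-separated option string contains the exact
# option "discard" (so "nodiscard" does not match).
has_discard_opt() {
    case ",$1," in
        *,discard,*) return 0 ;;
        *)           return 1 ;;
    esac
}

# mount_opts_for MOUNTPOINT
# Prints the option field (4th column) of the last /proc/mounts
# entry for MOUNTPOINT, or nothing if it is not mounted.
mount_opts_for() {
    awk -v mp="$1" '$2 == mp { opts = $4 }
                    END { if (opts != "") print opts }' /proc/mounts
}

mp=${1:-/}    # hypothetical default: check the root filesystem
if has_discard_opt "$(mount_opts_for "$mp")"; then
    echo "$mp: online discard is on; fstrim should find little to do"
else
    echo "$mp: no online discard; freed blocks wait for fstrim"
fi
```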
-Eric
On Tue, Mar 11, 2014 at 05:08:19PM -0500, Eric Sandeen wrote:
> blktrace is probably the place to start. Do you see discard
> requests? then ext4 is doing its job. If not, we can trace
> ext4 to see why it's not issuing them, assuming there really
> is work to do.
At the moment I can't get this to work. The script I'm using is:
----------------------------------------------------------------------
set -e
set -x
trace-cmd record -e all -o /tmp/trace &
pid=$!
fstrim /sysroot
kill $pid; sleep 2
trace-cmd report -i /tmp/trace
----------------------------------------------------------------------
However the last trace-cmd gives an error:
trace-cmd: No such file or directory
opening '/tmp/trace'
I'll try again tomorrow morning.
Rich.
On Tue, Mar 11, 2014 at 10:59:32PM +0000, Richard W.M. Jones wrote:
> On Tue, Mar 11, 2014 at 05:08:19PM -0500, Eric Sandeen wrote:
> > blktrace is probably the place to start. Do you see discard
> > requests? then ext4 is doing its job. If not, we can trace
> > ext4 to see why it's not issuing them, assuming there really
> > is work to do.
>
> At the moment I can't get this to work. The script I'm using is:
>
> ----------------------------------------------------------------------
> set -e
> set -x
> trace-cmd record -e all -o /tmp/trace &
> pid=$!
> fstrim /sysroot
> kill $pid; sleep 2
> trace-cmd report -i /tmp/trace
> ----------------------------------------------------------------------
I got it to work by using:
----------------------------------------------------------------------
set -e
set -x
trace-cmd record -e all fstrim /sysroot
trace-cmd report
----------------------------------------------------------------------
The output is absolutely huge and I didn't capture it.
However just the act of doing the tracing *caused* the trim to happen
properly in the underlying disk.
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
On 3/11/14, 6:07 PM, Richard W.M. Jones wrote:
> I got it to work by using:
>
> ----------------------------------------------------------------------
> set -e
> set -x
> trace-cmd record -e all fstrim /sysroot
> trace-cmd report
> ----------------------------------------------------------------------
>
> The output is absolutely huge and I didn't capture it.
that's why I suggested a single tracepoint, rather than every tracepoint
in the kernel... ;)
oh wait, I didn't. :/ argh sorry.
# trace-cmd record -e ext4_trim\*
should do it.
> However just the act of doing the tracing *caused* the trim to happen
> properly in the underlying disk.
that sounds very strange...
-Eric
On Tue, Mar 11, 2014 at 06:09:28PM -0500, Eric Sandeen wrote:
> On 3/11/14, 6:07 PM, Richard W.M. Jones wrote:
>
> > I got it to work by using:
> >
> > ----------------------------------------------------------------------
> > set -e
> > set -x
> > trace-cmd record -e all fstrim /sysroot
> > trace-cmd report
> > ----------------------------------------------------------------------
> >
> > The output is absolutely huge and I didn't capture it.
>
> that's why I suggested a single tracepoint, rather than every tracepoint
> in the kernel... ;)
>
> oh wait, I didn't. :/ argh sorry.
>
> # trace-cmd record -e ext4_trim\*
>
> should do it.
>
> > However just the act of doing the tracing *caused* the trim to happen
> > properly in the underlying disk.
>
> that sounds very strange...
Thanks Eric.
FYI the libguestfs / virt-sparsify patch series that motivates this is
here:
https://www.redhat.com/archives/libguestfs/2014-March/thread.html#00091
Even with the greatly reduced set of traces (see attached), just the
act of tracing seems to have made trimming work properly. The output
file has been trimmed properly from 926 MB to 819 MB:
819M fedora-20.img
926M fedora-20.img.ORIG
However I don't think making tracing the default on all fstrim
operations is going to be a good solution :-(
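
For anyone repeating this measurement: a sparse image has two sizes, and only the allocated one shrinks when discards reach the host. A minimal sketch, assuming GNU coreutils (stat -c, truncate) on the host; the temporary file stands in for a real image such as fedora-20.img, and the function names are made up for illustration.

```shell
# apparent_bytes FILE: the size the guest (or "ls -l") sees.
apparent_bytes() {
    stat -c %s "$1"
}

# allocated_bytes FILE: the space actually backed by host storage.
# GNU stat prints %b (allocated blocks) and %B (bytes per block);
# the shell multiplies the two.
allocated_bytes() {
    echo $(( $(stat -c '%b * %B' "$1") ))
}

# Demonstration on a sparse stand-in file (not a real disk image).
img=$(mktemp)
truncate -s 10M "$img"
echo "apparent:  $(apparent_bytes "$img") bytes"
echo "allocated: $(allocated_bytes "$img") bytes"
rm -f "$img"
```

Comparing allocated_bytes before and after fstrim shows whether the discards reached the image file.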
Rich.
----------------------------------------------------------------------
+ trace-cmd record -e 'ext4_trim*' fstrim /sysroot
/sys/kernel/debug/tracing/events/ext4_trim*/filter
/sys/kernel/debug/tracing/events/*/ext4_trim*/filter
CPU0 data recorded at offset=0x2d0000
4096 bytes in size
/sys/kernel/debug/tracing/events/ext4_trim*/filter
/sys/kernel/debug/tracing/events/*/ext4_trim*/filter
Kernel buffer statistics:
Note: "entries" are the entries left in the kernel ring buffer and are not
recorded in the trace data. They should all be zero.
CPU: 0
entries: 0
overrun: 0
commit overrun: 0
bytes: 3368
oldest event ts: 2.475090
now ts: 2.500652
dropped events: 0
read events: 105
+ trace-cmd report
version = 6
cpus=1
fstrim-189 [000] 2.475090: ext4_trim_all_free: dev 8,3 group 0, start 0, len 32767
fstrim-189 [000] 2.475092: ext4_trim_extent: dev 8,3 group 0, start 10061, len 22707
fstrim-189 [000] 2.475300: ext4_trim_all_free: dev 8,3 group 1, start 0, len 32767
fstrim-189 [000] 2.475301: ext4_trim_extent: dev 8,3 group 1, start 316, len 58
fstrim-189 [000] 2.475413: ext4_trim_extent: dev 8,3 group 1, start 384, len 35
fstrim-189 [000] 2.475501: ext4_trim_extent: dev 8,3 group 1, start 453, len 1
fstrim-189 [000] 2.475569: ext4_trim_extent: dev 8,3 group 1, start 512, len 22
fstrim-189 [000] 2.475648: ext4_trim_extent: dev 8,3 group 1, start 1735, len 473
fstrim-189 [000] 2.475937: ext4_trim_extent: dev 8,3 group 1, start 9225, len 1
fstrim-189 [000] 2.476008: ext4_trim_extent: dev 8,3 group 1, start 9227, len 1
fstrim-189 [000] 2.476076: ext4_trim_extent: dev 8,3 group 1, start 9232, len 496
fstrim-189 [000] 2.476250: ext4_trim_all_free: dev 8,3 group 3, start 0, len 32767
fstrim-189 [000] 2.476251: ext4_trim_extent: dev 8,3 group 3, start 5384, len 8
fstrim-189 [000] 2.476324: ext4_trim_extent: dev 8,3 group 3, start 17748, len 684
fstrim-189 [000] 2.476396: ext4_trim_extent: dev 8,3 group 3, start 30731, len 7
fstrim-189 [000] 2.476471: ext4_trim_extent: dev 8,3 group 3, start 30803, len 1
fstrim-189 [000] 2.476595: ext4_trim_all_free: dev 8,3 group 4, start 0, len 32767
fstrim-189 [000] 2.476596: ext4_trim_extent: dev 8,3 group 4, start 1607, len 57
fstrim-189 [000] 2.476665: ext4_trim_extent: dev 8,3 group 4, start 1798, len 250
fstrim-189 [000] 2.476736: ext4_trim_extent: dev 8,3 group 4, start 17810, len 14
fstrim-189 [000] 2.476809: ext4_trim_extent: dev 8,3 group 4, start 17862, len 14906
fstrim-189 [000] 2.485681: ext4_trim_all_free: dev 8,3 group 5, start 0, len 32767
fstrim-189 [000] 2.485683: ext4_trim_extent: dev 8,3 group 5, start 316, len 32452
fstrim-189 [000] 2.492399: ext4_trim_all_free: dev 8,3 group 6, start 0, len 32767
fstrim-189 [000] 2.492400: ext4_trim_extent: dev 8,3 group 6, start 0, len 32768
fstrim-189 [000] 2.492546: ext4_trim_all_free: dev 8,3 group 7, start 0, len 32767
fstrim-189 [000] 2.492547: ext4_trim_extent: dev 8,3 group 7, start 316, len 32452
fstrim-189 [000] 2.492665: ext4_trim_all_free: dev 8,3 group 8, start 0, len 32767
fstrim-189 [000] 2.492666: ext4_trim_extent: dev 8,3 group 8, start 0, len 32768
fstrim-189 [000] 2.492783: ext4_trim_all_free: dev 8,3 group 9, start 0, len 32767
fstrim-189 [000] 2.492784: ext4_trim_extent: dev 8,3 group 9, start 316, len 32452
fstrim-189 [000] 2.492897: ext4_trim_all_free: dev 8,3 group 10, start 0, len 32767
fstrim-189 [000] 2.492898: ext4_trim_extent: dev 8,3 group 10, start 0, len 32768
fstrim-189 [000] 2.493018: ext4_trim_all_free: dev 8,3 group 11, start 0, len 32767
fstrim-189 [000] 2.493019: ext4_trim_extent: dev 8,3 group 11, start 0, len 32768
fstrim-189 [000] 2.493132: ext4_trim_all_free: dev 8,3 group 12, start 0, len 32767
fstrim-189 [000] 2.493133: ext4_trim_extent: dev 8,3 group 12, start 0, len 32768
fstrim-189 [000] 2.493245: ext4_trim_all_free: dev 8,3 group 13, start 0, len 32767
fstrim-189 [000] 2.493246: ext4_trim_extent: dev 8,3 group 13, start 0, len 32768
fstrim-189 [000] 2.493359: ext4_trim_all_free: dev 8,3 group 14, start 0, len 32767
fstrim-189 [000] 2.493360: ext4_trim_extent: dev 8,3 group 14, start 0, len 32768
fstrim-189 [000] 2.493473: ext4_trim_all_free: dev 8,3 group 15, start 0, len 32767
fstrim-189 [000] 2.493473: ext4_trim_extent: dev 8,3 group 15, start 0, len 32768
fstrim-189 [000] 2.493586: ext4_trim_all_free: dev 8,3 group 16, start 0, len 32767
fstrim-189 [000] 2.493587: ext4_trim_extent: dev 8,3 group 16, start 9816, len 1
fstrim-189 [000] 2.493656: ext4_trim_extent: dev 8,3 group 16, start 9818, len 22950
fstrim-189 [000] 2.493892: ext4_trim_all_free: dev 8,3 group 18, start 0, len 32767
fstrim-189 [000] 2.493892: ext4_trim_extent: dev 8,3 group 18, start 761, len 1
fstrim-189 [000] 2.493972: ext4_trim_extent: dev 8,3 group 18, start 793, len 231
fstrim-189 [000] 2.494085: ext4_trim_extent: dev 8,3 group 18, start 6767, len 1
fstrim-189 [000] 2.494170: ext4_trim_extent: dev 8,3 group 18, start 7694, len 1
fstrim-189 [000] 2.494259: ext4_trim_extent: dev 8,3 group 18, start 7700, len 1
fstrim-189 [000] 2.494347: ext4_trim_extent: dev 8,3 group 18, start 12252, len 1
fstrim-189 [000] 2.494437: ext4_trim_extent: dev 8,3 group 18, start 24218, len 1
fstrim-189 [000] 2.494620: ext4_trim_all_free: dev 8,3 group 19, start 0, len 32767
fstrim-189 [000] 2.494621: ext4_trim_extent: dev 8,3 group 19, start 7693, len 1
fstrim-189 [000] 2.494702: ext4_trim_extent: dev 8,3 group 19, start 7715, len 1
fstrim-189 [000] 2.494791: ext4_trim_extent: dev 8,3 group 19, start 8980, len 108
fstrim-189 [000] 2.494948: ext4_trim_extent: dev 8,3 group 19, start 9147, len 37
fstrim-189 [000] 2.495070: ext4_trim_extent: dev 8,3 group 19, start 9190, len 4
fstrim-189 [000] 2.495157: ext4_trim_extent: dev 8,3 group 19, start 9893, len 349
fstrim-189 [000] 2.495453: ext4_trim_extent: dev 8,3 group 19, start 10245, len 507
fstrim-189 [000] 2.495556: ext4_trim_extent: dev 8,3 group 19, start 10753, len 22015
fstrim-189 [000] 2.495739: ext4_trim_all_free: dev 8,3 group 20, start 0, len 32767
fstrim-189 [000] 2.495739: ext4_trim_extent: dev 8,3 group 20, start 799, len 31969
fstrim-189 [000] 2.495915: ext4_trim_all_free: dev 8,3 group 21, start 0, len 32767
fstrim-189 [000] 2.495916: ext4_trim_extent: dev 8,3 group 21, start 0, len 32768
fstrim-189 [000] 2.496098: ext4_trim_all_free: dev 8,3 group 22, start 0, len 32767
fstrim-189 [000] 2.496099: ext4_trim_extent: dev 8,3 group 22, start 0, len 32768
fstrim-189 [000] 2.496270: ext4_trim_all_free: dev 8,3 group 23, start 0, len 32767
fstrim-189 [000] 2.496271: ext4_trim_extent: dev 8,3 group 23, start 0, len 32768
fstrim-189 [000] 2.496449: ext4_trim_all_free: dev 8,3 group 24, start 0, len 32767
fstrim-189 [000] 2.496450: ext4_trim_extent: dev 8,3 group 24, start 0, len 32768
fstrim-189 [000] 2.496613: ext4_trim_all_free: dev 8,3 group 25, start 0, len 32767
fstrim-189 [000] 2.496614: ext4_trim_extent: dev 8,3 group 25, start 316, len 32452
fstrim-189 [000] 2.496780: ext4_trim_all_free: dev 8,3 group 26, start 0, len 32767
fstrim-189 [000] 2.496781: ext4_trim_extent: dev 8,3 group 26, start 0, len 32768
fstrim-189 [000] 2.496952: ext4_trim_all_free: dev 8,3 group 27, start 0, len 32767
fstrim-189 [000] 2.496953: ext4_trim_extent: dev 8,3 group 27, start 316, len 32452
fstrim-189 [000] 2.497138: ext4_trim_all_free: dev 8,3 group 28, start 0, len 32767
fstrim-189 [000] 2.497139: ext4_trim_extent: dev 8,3 group 28, start 0, len 32768
fstrim-189 [000] 2.497341: ext4_trim_all_free: dev 8,3 group 29, start 0, len 32767
fstrim-189 [000] 2.497342: ext4_trim_extent: dev 8,3 group 29, start 0, len 32768
fstrim-189 [000] 2.497530: ext4_trim_all_free: dev 8,3 group 30, start 0, len 32767
fstrim-189 [000] 2.497531: ext4_trim_extent: dev 8,3 group 30, start 0, len 32768
fstrim-189 [000] 2.497711: ext4_trim_all_free: dev 8,3 group 31, start 0, len 32767
fstrim-189 [000] 2.497712: ext4_trim_extent: dev 8,3 group 31, start 0, len 32768
fstrim-189 [000] 2.497873: ext4_trim_all_free: dev 8,3 group 32, start 0, len 32767
fstrim-189 [000] 2.497874: ext4_trim_extent: dev 8,3 group 32, start 8, len 8
fstrim-189 [000] 2.497960: ext4_trim_extent: dev 8,3 group 32, start 24, len 8
fstrim-189 [000] 2.498037: ext4_trim_extent: dev 8,3 group 32, start 4056, len 28712
fstrim-189 [000] 2.498235: ext4_trim_all_free: dev 8,3 group 33, start 0, len 32767
fstrim-189 [000] 2.498237: ext4_trim_extent: dev 8,3 group 33, start 0, len 32768
fstrim-189 [000] 2.498433: ext4_trim_all_free: dev 8,3 group 34, start 0, len 32767
fstrim-189 [000] 2.498434: ext4_trim_extent: dev 8,3 group 34, start 0, len 32768
fstrim-189 [000] 2.498612: ext4_trim_all_free: dev 8,3 group 35, start 0, len 32767
fstrim-189 [000] 2.498613: ext4_trim_extent: dev 8,3 group 35, start 0, len 32768
fstrim-189 [000] 2.498780: ext4_trim_all_free: dev 8,3 group 36, start 0, len 32767
fstrim-189 [000] 2.498781: ext4_trim_extent: dev 8,3 group 36, start 0, len 32768
fstrim-189 [000] 2.498972: ext4_trim_all_free: dev 8,3 group 37, start 0, len 32767
fstrim-189 [000] 2.498973: ext4_trim_extent: dev 8,3 group 37, start 0, len 32768
fstrim-189 [000] 2.499219: ext4_trim_all_free: dev 8,3 group 38, start 0, len 32767
fstrim-189 [000] 2.499221: ext4_trim_extent: dev 8,3 group 38, start 0, len 32768
fstrim-189 [000] 2.499397: ext4_trim_all_free: dev 8,3 group 39, start 0, len 9471
fstrim-189 [000] 2.499398: ext4_trim_extent: dev 8,3 group 39, start 0, len 9472
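
The trace can be totalled to see roughly how much ext4 asked to discard. A small sketch that sums the "len" field of the ext4_trim_extent lines; the 4096-byte block size is an assumption (it is not stated in the thread), and sum_trimmed_blocks is a made-up helper name.

```shell
# sum_trimmed_blocks: read "trace-cmd report" text on stdin and print
# the total of the "len" values on ext4_trim_extent lines.
# ext4_trim_all_free lines are skipped, since they describe the range
# scanned rather than an extent that was trimmed.
sum_trimmed_blocks() {
    awk '/ext4_trim_extent:/ {
             for (i = 1; i < NF; i++)
                 if ($i == "len") total += $(i + 1)
         }
         END { print total + 0 }'
}

# Three lines copied from the trace above, as sample input.
blocks=$(sum_trimmed_blocks <<'EOF'
fstrim-189 [000] 2.475090: ext4_trim_all_free: dev 8,3 group 0, start 0, len 32767
fstrim-189 [000] 2.475092: ext4_trim_extent: dev 8,3 group 0, start 10061, len 22707
fstrim-189 [000] 2.475301: ext4_trim_extent: dev 8,3 group 1, start 316, len 58
EOF
)

# ASSUMPTION: 4096-byte filesystem blocks.
echo "$blocks blocks, about $((blocks * 4096 / 1024 / 1024)) MiB"
```

On a real run the function would read the saved report, for example: trace-cmd report | sum_trimmed_blocks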
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
On Tue, Mar 11, 2014 at 11:30:47PM +0000, Richard W.M. Jones wrote:
> On Tue, Mar 11, 2014 at 06:09:28PM -0500, Eric Sandeen wrote:
> > On Tue, Mar 11, 2014, Richard W.M. Jones wrote:
> > > However just the act of doing the tracing *caused* the trim to happen
> > > properly in the underlying disk.
> >
> > that sounds very strange...
>
> Thanks Eric.
>
> FYI the libguestfs / virt-sparsify patch series that motivates this is
> here:
>
> https://www.redhat.com/archives/libguestfs/2014-March/thread.html#00091
>
> Even with the greatly reduced set of traces (see attached), just the
> act of tracing seems to have made trimming work properly. The output
> file has been trimmed properly from 926 MB to 819 MB:
I did a bit more testing on this.
It appears we can be sure that the ext4 FITRIM ioctl is sending discard
requests.
However, the trim doesn't reliably take effect in the underlying disk:
    fstrim + blktrace            works reliably
    fstrim + fsync               unreliable, usually fails to trim
    fstrim + sync                unreliable, usually fails to trim
    fstrim + umount              unreliable, usually fails to trim
    fstrim + sleep 10            unreliable, usually fails to trim
    ( fstrim + sleep 10 ) x 3    unreliable, usually fails to trim
    fstrim on its own            unreliable, usually fails to trim
Somewhere, the discard requests are disappearing in the stack (or more
likely, being delayed). blktrace/trace-cmd somehow forces them out.
But fsync/sync/umount/sleep do not. They might be stuck in qemu too ...
Is there any further test I can try here?
Is there a way to force out discard requests?
qemu cache mode is set to writeback.
Rich.
Well, it turns out that the bug is mine.
I had libguestfs trimming the wrong filesystem :-(
Anyway I can report that fstrim works reliably, virt-sparsify now
supports an --in-place option, the world is good.
Rich.
On 12/03/2014 11:17, Richard W.M. Jones wrote:
> Somewhere, the discard requests are disappearing in the stack (or more
> likely, being delayed). blktrace/trace-cmd somehow forces them out.
> But fsync/sync/umount/sleep does not. They might be stuck in qemu too ...
No, this I can be quite sure about. QEMU sends them as soon as they are
received in the SCSI layer. If they were ill-formed, QEMU would fail
them. If they got stuck, sooner or later you'd not be able to do I/O
anymore (there is a queue depth limit) and the guest would start getting
timeouts.
Also, certainly blktrace would have no effect on QEMU.
Paolo
On Wed, Mar 12, 2014 at 07:10:42PM +0100, Paolo Bonzini wrote:
> On 12/03/2014 11:17, Richard W.M. Jones wrote:
> >Somewhere, the discard requests are disappearing in the stack (or more
> >likely, being delayed). blktrace/trace-cmd somehow forces them out.
> >But fsync/sync/umount/sleep does not. They might be stuck in qemu too ...
>
> No, this I can be quite sure about. QEMU sends them as soon as they
> are received in the SCSI layer. If they were ill-formed, QEMU would
> fail them. If they got stuck, sooner or later you'd not be able to
> do I/O anymore (there is a queue depth limit) and the guest would
> start getting timeouts.
>
> Also, certainly blktrace would have no effect on QEMU.
Yup, it was completely a bug at my end. Now that is fixed,
fstrim works perfectly.
Rich.