Hi, is it a known problem how slow Btrfs is with kvm/qemu (meaning
that the image kvm/qemu uses as the hd is on a partition formatted
with Btrfs, not that the fs used by the hd inside the kvm environment
is Btrfs; in fact inside kvm the / partition is formatted with ext3)?
I haven't written down the exact numbers, because I forgot, but while
I was trying to make it work, after I noticed how much longer than
usual it was taking to just install the system, I took a look at iotop
and it was reporting a write speed for the kvm process of approximately
3M/s, while the Btrfs kernel thread had a write speed of approximately
7K/s! Just formatting the partitions during the Debian installation
took minutes. When the actual installation of the distro started I had
to stop it, because it was taking hours! The iotop results made me
think that the problem could be Btrfs, but, to be sure that it wasn't
instead a kvm/qemu problem, I cut/pasted the same virtual hd onto an
ext3 fs and started kvm with the same parameters as before. The
installation of Debian inside kvm this time went smoothly and fast,
like it normally does. I've been using Btrfs for some time now and
while it has never been a speed champion (and I guess it's not supposed
to be one, and I don't even really care that much about it), I've never
had any noticeable performance problem before, and it has always been
quite stable. In this test case, though, it seems to be doing very badly.
cheers
--
Giangiacomo
On 07/11/2010 10:24 PM, Giangiacomo Mariotti wrote:
> Hi, is it a known problem how slow Btrfs is with kvm/qemu [...]
> In this test case, though, it seems to be doing very badly.
>
> cheers
>
Not sure with butter filesystems... but what is the last known-good kernel?
Are you able to bisect?
Justin P. Mattock
12.07.2010 09:24, Giangiacomo Mariotti wrote:
> Hi, is it a known problem how slow Btrfs is with kvm/qemu [...]
> In this test case, though, it seems to be doing very badly.
This looks quite similar to a problem with ext4 and O_SYNC which I
reported earlier but no one cared to answer (or read?) - there:
http://permalink.gmane.org/gmane.linux.file-systems/42758
(sent to qemu-devel and linux-fsdevel lists - Cc'd too). You can
try a few other options, esp. cache=none and re-writing some guest
files to verify.
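For example, something along these lines -- just an illustration, the
image path and device model below are placeholders, only the cache=
part matters:

  # path and if= are placeholders, keep whatever you use now
  kvm -drive file=/images/debian.img,if=virtio,cache=none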
/mjt
On 07/12/2010 12:09 AM, Michael Tokarev wrote:
> 12.07.2010 09:24, Giangiacomo Mariotti wrote:
>> Hi, is it a known problem how slow Btrfs is with kvm/qemu [...]
>> In this test case, though, it seems to be doing very badly.
>
> This looks quite similar to a problem with ext4 and O_SYNC which I
> reported earlier but no one cared to answer (or read?) - there:
> http://permalink.gmane.org/gmane.linux.file-systems/42758
> (sent to qemu-devel and linux-fsdevel lists - Cc'd too). You can
> try a few other options, esp. cache=none and re-writing some guest
> files to verify.
>
> /mjt
>
Cool, a solution... glad to see. No chance at a bisect with this?
(Getting this down to a commit or two makes things easier.)
Justin P. Mattock
On Mon, Jul 12, 2010 at 9:17 AM, Justin P. Mattock
<[email protected]> wrote:
> On 07/12/2010 12:09 AM, Michael Tokarev wrote:
>>
>> This looks quite similar to a problem with ext4 and O_SYNC which I
>> reported earlier but no one cared to answer (or read?) - there:
>> http://permalink.gmane.org/gmane.linux.file-systems/42758
>> (sent to qemu-devel and linux-fsdevel lists - Cc'd too). You can
>> try a few other options, esp. cache=none and re-writing some guest
>> files to verify.
>>
>> /mjt
>
> cool a solution... glad to see... no chance at a bisect with this?
> (getting this down too a commit or two makes things easier)
>
> Justin P. Mattock
>
I didn't even say what kernel version I was using, sorry! Kernel
2.6.34.1 plus the patches in the stable queue for the next stable
release. I tried this some time ago with 2.6.33.x (I don't remember
which version exactly) and it had the same problem, but at the time I
stopped trying, thinking that it was a kvm problem. So basically
there's no known (to me) good version, and no, I can't bisect this,
because this is my production system. Anyway, I suspect this is
reproducible. Am I the only one who has created a virtual hd file on
Btrfs and then used it with kvm/qemu? I mean, it's not a particularly
exotic test case!
--
Giangiacomo
On Mon, Jul 12, 2010 at 9:09 AM, Michael Tokarev <[email protected]> wrote:
>
> This looks quite similar to a problem with ext4 and O_SYNC which I
> reported earlier but no one cared to answer (or read?) - there:
> http://permalink.gmane.org/gmane.linux.file-systems/42758
> (sent to qemu-devel and linux-fsdevel lists - Cc'd too). You can
> try a few other options, esp. cache=none and re-writing some guest
> files to verify.
>
> /mjt
>
Either way, I suspect changing to cache=none wouldn't tell me much,
because if it's as slow as before, it's still unusable, and if instead
it's even slower, well, it'd be even more unusable, so I wouldn't be
able to tell the difference. What I can say for certain is that with
the exact same virtual hd file, same options, same system, but on an
ext3 fs there's no problem at all, while on Btrfs it's not just
slower, it takes ages.
--
Giangiacomo
On Mon, Jul 12, 2010 at 03:34:44PM +0200, Giangiacomo Mariotti wrote:
> On Mon, Jul 12, 2010 at 9:09 AM, Michael Tokarev <[email protected]> wrote:
> >
> > This looks quite similar to a problem with ext4 and O_SYNC which I
> > reported earlier but no one cared to answer (or read?) - there:
> > http://permalink.gmane.org/gmane.linux.file-systems/42758
> > (sent to qemu-devel and linux-fsdevel lists - Cc'd too).  You can
> > try a few other options, esp. cache=none and re-writing some guest
> > files to verify.
> >
> > /mjt
> >
> Either way, changing to cache=none I suspect wouldn't tell me much,
> because if it's as slow as before, it's still unusable and if instead
> it's even slower, well it'd be even more unusable, so I wouldn't be
> able to tell the difference. What I can say for certain is that with
> the exact same virtual hd file, same options, same system, but on an
> ext3 fs there's no problem at all, on a Btrfs is not just slower, it
> takes ages.
>
O_DIRECT support was just introduced recently; please try on the latest
kernel with the normal settings (which IIRC use O_DIRECT), that should
make things suck a lot less. Thanks,
Josef
Giangiacomo Mariotti wrote:
> On Mon, Jul 12, 2010 at 9:09 AM, Michael Tokarev <[email protected]> wrote:
>> This looks quite similar to a problem with ext4 and O_SYNC which I
>> reported earlier but no one cared to answer (or read?) - there:
>> http://permalink.gmane.org/gmane.linux.file-systems/42758
>> (sent to qemu-devel and linux-fsdevel lists - Cc'd too). You can
>> try a few other options, esp. cache=none and re-writing some guest
>> files to verify.
>>
>> /mjt
>>
> Either way, changing to cache=none I suspect wouldn't tell me much,
> because if it's as slow as before, it's still unusable and if instead
> it's even slower, well it'd be even more unusable, so I wouldn't be
> able to tell the difference.
Actually it's not that simple.
> What I can say for certain is that with
> the exact same virtual hd file, same options, same system, but on an
> ext3 fs there's no problem at all, on a Btrfs is not just slower, it
> takes ages.
It is exactly the same with ext4 vs ext3, but only on metadata-intensive
operations (for a qcow2 image). Once you allocate space, it becomes fast,
and _especially_ fast with cache=none. Actually, it looks like O_SYNC
(the default cache mode) is _slower_ on ext4 than O_DIRECT (cache=none).
(And yes, I know O_DIRECT does NOT imply O_SYNC and vice versa.)
/mjt
Josef Bacik wrote:
[]
> O_DIRECT support was just introduced recently, please try on the latest kernel
> with the normal settings (which IIRC uses O_DIRECT), that should make things
> suck alot less. Thanks,
Um. Do you mean it was introduced in Btrfs or in general? :)
Because, well, O_DIRECT has been here and supported since the 2.2 times... ;)
/mjt
On Mon, Jul 12, 2010 at 05:42:04PM +0400, Michael Tokarev wrote:
> Josef Bacik wrote:
> []
> > O_DIRECT support was just introduced recently, please try on the latest kernel
> > with the normal settings (which IIRC uses O_DIRECT), that should make things
> > suck alot less. Thanks,
>
> Um. Do you mean it were introduced in BTRFS or general? :)
>
> Because, wel, O_DIRECT is here and supported since some 2.2 times... ;)
Btrfs obviously.
Josef
On Mon, Jul 12, 2010 at 3:43 PM, Josef Bacik <[email protected]> wrote:
>
> O_DIRECT support was just introduced recently, please try on the latest kernel
> with the normal settings (which IIRC uses O_DIRECT), that should make things
> suck alot less. Thanks,
>
> Josef
>
By "latest kernel" do you mean the current Linus git tree? Because if
instead you're talking about the current stable kernel, that's the one
I used in my test.
--
Giangiacomo
On Mon, Jul 12, 2010 at 10:23:14PM +0200, Giangiacomo Mariotti wrote:
> On Mon, Jul 12, 2010 at 3:43 PM, Josef Bacik <[email protected]> wrote:
> >
> > O_DIRECT support was just introduced recently, please try on the latest kernel
> > with the normal settings (which IIRC uses O_DIRECT), that should make things
> > suck alot less.  Thanks,
> >
> > Josef
> >
> With latest kernel do you mean the current Linus' git tree? Because if
> instead you're talking about the current stable kernel, that's the one
> I used on my test.
>
Yes, Linus' git tree. Thanks,
Josef
On 07/12/2010 08:24 AM, Giangiacomo Mariotti wrote:
> Hi, is it a known problem how slow Btrfs is with kvm/qemu [...]
> In this test case, though, it seems to be doing very badly.
>
>
Btrfs is very slow on sync writes:
$ fio --name=x --directory=/images --rw=randwrite --runtime=300 \
    --size=1G --filesize=1G --bs=4k --ioengine=psync --sync=1 --unlink=1
x: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=psync, iodepth=1
Starting 1 process
x: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [w] [1.3% done] [0K/0K /s] [0/0 iops] [eta 06h:18m:45s]
x: (groupid=0, jobs=1): err= 0: pid=2086
write: io=13,752KB, bw=46,927B/s, iops=11, runt=300078msec
clat (msec): min=33, max=1,711, avg=87.26, stdev=60.00
bw (KB/s) : min= 5, max= 105, per=103.79%, avg=46.70, stdev=15.86
cpu : usr=0.03%, sys=19.55%, ctx=47197, majf=0, minf=94
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
   submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w: total=0/3438, short=0/0
lat (msec): 50=3.40%, 100=75.63%, 250=19.14%, 500=1.40%, 750=0.35%
lat (msec): 1000=0.06%, 2000=0.03%
Run status group 0 (all jobs):
WRITE: io=13,752KB, aggrb=45KB/s, minb=46KB/s, maxb=46KB/s,
mint=300078msec, maxt=300078msec
45KB/s, while 4-5MB/s traffic was actually going to the disk. For every
4KB that the application writes, 400KB+ of metadata is written.
(It's actually worse, since it starts faster than the average and ends
up slower than the average).
For kvm, you can try cache=writeback or cache=unsafe and get better
performance (though still slower than ext*).
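E.g. something like this -- the image path and interface below are
made up, the relevant bit is the cache= setting (cache=unsafe needs a
recent enough qemu):

  qemu-kvm -drive file=/images/guest.img,if=virtio,cache=writeback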
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
On 12.07.2010 15:43, Josef Bacik wrote:
> On Mon, Jul 12, 2010 at 03:34:44PM +0200, Giangiacomo Mariotti wrote:
>> On Mon, Jul 12, 2010 at 9:09 AM, Michael Tokarev <[email protected]> wrote:
>>>
>>> This looks quite similar to a problem with ext4 and O_SYNC which I
>>> reported earlier but no one cared to answer (or read?) - there:
>>> http://permalink.gmane.org/gmane.linux.file-systems/42758
>>> (sent to qemu-devel and linux-fsdevel lists - Cc'd too). You can
>>> try a few other options, esp. cache=none and re-writing some guest
>>> files to verify.
>>>
>>> /mjt
>>>
>> Either way, changing to cache=none I suspect wouldn't tell me much,
>> because if it's as slow as before, it's still unusable and if instead
>> it's even slower, well it'd be even more unusable, so I wouldn't be
>> able to tell the difference. What I can say for certain is that with
>> the exact same virtual hd file, same options, same system, but on an
>> ext3 fs there's no problem at all, on a Btrfs is not just slower, it
>> takes ages.
>>
>
> O_DIRECT support was just introduced recently, please try on the latest kernel
> with the normal settings (which IIRC uses O_DIRECT), that should make things
> suck alot less.
IIUC, he uses the default cache option of qemu, which is
cache=writethrough and maps to O_DSYNC without O_DIRECT. O_DIRECT would
only be used for cache=none.
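A quick way to double-check which flags qemu actually used is to look
at the open flags of the image fd on the host -- the pid lookup and fd
number below are only examples:

  ls -l /proc/$(pidof kvm)/fd          # find the fd that points at the image
  cat /proc/$(pidof kvm)/fdinfo/11     # 11 is just an example fd number

The octal "flags:" field will include O_DIRECT (040000 on x86) only
for cache=none.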
Kevin
On Tue, Jul 13, 2010 at 6:29 AM, Avi Kivity <[email protected]> wrote:
> Btrfs is very slow on sync writes:
>
> 45KB/s, while 4-5MB/s traffic was actually going to the disk. For every 4KB
> that the the application writes, 400KB+ of metadata is written.
>
> (It's actually worse, since it starts faster than the average and ends up
> slower than the average).
>
> For kvm, you can try cache=writeback or cache=unsafe and get better
> performance (though still slower than ext*).
>
Yeah, well, I've already moved the virtual hd file to an ext3
partition, so the problem was actually already "solved" for me before
I posted. I posted the first message just to report the particularly
bad performance of Btrfs in this test case, so that, if not already
known, it could be investigated and hopefully fixed.
By the way, thanks to everyone who answered!
--
Giangiacomo
There are a lot of variables when using qemu.
The most important ones are:
- the cache mode on the device. The default is cache=writethrough,
which is not quite optimal. You generally do want to use cache=none
which uses O_DIRECT in qemu.
- if the backing image is sparse or not.
- if you use barrier - both in the host and the guest.
Below I have a table comparing raw blockdevices, xfs, btrfs, ext4 and
ext3. For ext3 we also compare the default, unsafe barrier=0 version
and the barrier=1 version you should use if you actually care about
your data.
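(For reference: with ext3, barriers are just a mount option, on both
the host and the guest side -- the device below is a placeholder:

  mount -o barrier=1 /dev/sdX /mnt

or the equivalent barrier=1 entry in fstab.)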
The comparison is a simple untar of a Linux 2.6.34 tarball, including a
sync after it. We run this with ext3 in the guest, either using the
default barrier=0, or for the later tests also using barrier=1. It
is done on an OCZ Vertex SSD, which gets reformatted and fully TRIMed
before each test.

As you can see, you generally do want to use cache=none, and every
filesystem is about the same speed for that - except that on XFS you
also really need preallocation. What's interesting is how bad btrfs
is for the default compared to the others, and that for many filesystems
things actually get minimally faster when enabling barriers in the
guest. Things will look very different for barrier-heavy guests; I'll
do another benchmark for those.
                                        bdev       xfs        btrfs      ext4       ext3       ext3 (barrier)

cache=writethrough nobarrier sparse     0m27.183s  0m42.552s  2m28.929s  0m33.749s  0m24.975s  0m37.105s
cache=writethrough nobarrier prealloc   -          0m32.840s  2m28.378s  0m34.233s  -          -

cache=none         nobarrier sparse     0m21.988s  0m49.758s  0m24.819s  0m23.977s  0m22.569s  0m24.938s
cache=none         nobarrier prealloc   -          0m24.464s  0m24.646s  0m24.346s  -          -

cache=none         barrier   sparse     0m21.526s  0m41.158s  0m24.403s  0m23.924s  0m23.040s  0m23.272s
cache=none         barrier   prealloc   -          0m23.944s  0m24.284s  0m23.981s  -          -
On Wed, Jul 14, 2010 at 9:49 PM, Christoph Hellwig <[email protected]> wrote:
> There are a lot of variables when using qemu.
> [...]
>
Very interesting. I haven't had the time to try it again, but now I'm
gonna try some of the cache options and see what gives me the best
results.
--
Giangiacomo
On Wed, Jul 14, 2010 at 03:49:05PM -0400, Christoph Hellwig wrote:
> Below I have a table comparing raw blockdevices, xfs, btrfs, ext4 and
> ext3. For ext3 we also compare the default, unsafe barrier=0 version
> and the barrier=1 version you should use if you actually care about
> your data.
>
> The comparism is a simple untar of a Linux 2.6.34 tarball, including a
> sync after it. We run this with ext3 in the guest, either using the
> default barrier=0, or for the later tests also using barrier=1. It
> is done on an OCZ Vertext SSD, which gets reformatted and fully TRIMed
> before each test.
>
> As you can see you generally do want to use cache=none and every
> filesystem is about the same speed for that - except that on XFS you
> also really need preallocation. What's interesting is how bad btrfs
> is for the default compared to the others, and that for many filesystems
> things actually get minimally faster when enabling barriers in the
> guest.
Christoph,
Thanks so much for running these benchmarks. It's been on my todo
list ever since the original complaint came across on the linux-ext4
list, but I just haven't had time to do the investigation. I wonder
exactly what qemu is doing that is impacting btrfs in particular so
badly. I assume that using the qcow2 format with cache=writethrough,
it's effectively doing lots of file appends which require allocation
(or conversion of uninitialized preallocated blocks to initialized
blocks in the fs metadata) with lots of fsync()'s afterwards.
But when I've run the fs_mark benchmark, writing 10k files each
followed by an fsync, I didn't see results for btrfs that were way out
of line compared to xfs, ext3, ext4, et al. So merely doing a block
allocation and a small write, followed by an fsync, was something that
all file systems did fairly well at. So there must be something
interesting/pathological about what qemu is doing with
cache=writethrough. It might be interesting to understand what is
going on there, either to fix qemu/kvm, or so file systems know that
there's a particular workload that requires some special attention...
- Ted
P.S. I assume since you listed "sparse" that you were using a raw
disk image and not a qcow2 block device image?
On Sat, Jul 17, 2010 at 06:28:06AM -0400, Ted Ts'o wrote:
> Thanks so much for running these benchmarks. It's been on my todo
> list ever since the original complaint came across on the linux-ext4
> list, but I just haven't had time to do the investigation. I wonder
> exactly what qemu is doing which is impact btrfs in particularly so
> badly. I assume that using the qcow2 format with cache=writethrough,
> it's doing lots of effectively file appends whih require allocation
> (or conversion of uninitialized preallocated blocks to initialized
> blocks in the fs metadata) with lots of fsync()'s afterwards.
This is using raw images. So what we're doing there is hole filling.
No explicit fsyncs are done for cache=writethrough. cache=writethrough
translates to using O_DSYNC, which makes every write synchronous, which
these days translates to an implicit ->fsync call on every write.
> P.S. I assume since you listed "sparse" that you were using a raw
> disk and not a qcom2 block device image?
All of these are using raw images. sparse means just doing a truncate
to the image size, preallocated means using fallocate to pre-allocate
the space.
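Roughly the following, as illustrative commands for a 10G raw image:

  truncate -s 10G disk.img     # sparse: just sets the size, no blocks allocated
  fallocate -l 10G disk.img    # preallocated: reserves the blocks up front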
Christoph Hellwig wrote:
> There are a lot of variables when using qemu.
>
> The most important one are:
>
> - the cache mode on the device. The default is cache=writethrough,
> which is not quite optimal. You generally do want to use cache=none
> which uses O_DIRECT in qemu.
> - if the backing image is sparse or not.
> - if you use barrier - both in the host and the guest.
I noticed that when btrfs is mounted with default options, writing
e.g. 10 GB in the KVM guest using a qcow2 image results in 20 GB
written on the host (as measured with "iostat -m -p").
With ext4 (or btrfs mounted with nodatacow), 10 GB write on a guest
produces 10 GB write on the host.
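In case anyone wants to reproduce the nodatacow case, it is just a
mount option -- the device and mount point below are placeholders for
wherever the images live:

  mount -o nodatacow /dev/sdb1 /var/lib/libvirt/images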
--
Tomasz Chmielewski
http://wpkg.org
On Sun, Aug 29, 2010 at 09:34:29PM +0200, Tomasz Chmielewski wrote:
> Christoph Hellwig wrote:
>
>> There are a lot of variables when using qemu.
>>
>> The most important one are:
>>
>> - the cache mode on the device. The default is cache=writethrough,
>> which is not quite optimal. You generally do want to use cache=none
>> which uses O_DIRECT in qemu.
>> - if the backing image is sparse or not.
>> - if you use barrier - both in the host and the guest.
>
> I noticed that when btrfs is mounted with default options, when writing
> i.e. 10 GB on the KVM guest using qcow2 image, 20 GB are written on the
> host (as measured with "iostat -m -p").
>
>
> With ext4 (or btrfs mounted with nodatacow), 10 GB write on a guest
> produces 10 GB write on the host.
>
Whoa, 20 GB? That doesn't sound right; COW should just mean we get quite a
bit of fragmentation, not that we write everything twice. What exactly is
qemu doing? Thanks,
On 8/29/10 17:14, Josef Bacik wrote:
> On Sun, Aug 29, 2010 at 09:34:29PM +0200, Tomasz Chmielewski wrote:
>> Christoph Hellwig wrote:
>>> There are a lot of variables when using qemu.
>>>
>>> The most important one are:
>>>
>>> - the cache mode on the device. The default is cache=writethrough,
>>> which is not quite optimal. You generally do want to use cache=none
>>> which uses O_DIRECT in qemu.
>>> - if the backing image is sparse or not.
>>> - if you use barrier - both in the host and the guest.
>> I noticed that when btrfs is mounted with default options, when writing
>> i.e. 10 GB on the KVM guest using qcow2 image, 20 GB are written on the
>> host (as measured with "iostat -m -p").
>>
>> With ext4 (or btrfs mounted with nodatacow), 10 GB write on a guest
>> produces 10 GB write on the host
> Whoa 20gb? That doesn't sound right, COW should just mean we get quite a bit of
> fragmentation, not write everything twice. What exactly is qemu doing? Thanks,
Make sure you build your file system with "mkfs.btrfs -m single -d
single /dev/whatever". You may well be writing duplicate copies of
everything.
--rich
On Mon, Aug 30, 2010 at 8:59 AM, K. Richard Pixley <[email protected]> wrote:
> On 8/29/10 17:14 , Josef Bacik wrote:
>>
>> On Sun, Aug 29, 2010 at 09:34:29PM +0200, Tomasz Chmielewski wrote:
>>>
>>> Christoph Hellwig wrote:
>>>>
>>>> There are a lot of variables when using qemu.
>>>>
>>>> The most important one are:
>>>>
>>>> - the cache mode on the device. The default is cache=writethrough,
>>>> which is not quite optimal. You generally do want to use cache=none
>>>> which uses O_DIRECT in qemu.
>>>> - if the backing image is sparse or not.
>>>> - if you use barrier - both in the host and the guest.
>>>
>>> I noticed that when btrfs is mounted with default options, when writing
>>> i.e. 10 GB on the KVM guest using qcow2 image, 20 GB are written on the
>>> host (as measured with "iostat -m -p").
>>>
>>> With ext4 (or btrfs mounted with nodatacow), 10 GB write on a guest
>>> produces 10 GB write on the host
>>
>> Whoa 20gb? That doesn't sound right, COW should just mean we get quite a
>> bit of
>> fragmentation, not write everything twice. What exactly is qemu doing?
>> Thanks,
>
> Make sure you build your file system with "mkfs.btrfs -m single -d single
> /dev/whatever". You may well be writing duplicate copies of everything.
>
There is little reason not to use duplicate metadata. Only small
files (less than 2kb) get stored in the tree, so there should be no
worries about images being duplicated without data duplication set at
mkfs time.
On 20100831 14:46, Mike Fedyk wrote:
> There is little reason not to use duplicate metadata. Only small
> files (less than 2kb) get stored in the tree, so there should be no
> worries about images being duplicated without data duplication set at
> mkfs time.
My benchmarks show that for my kinds of data, btrfs is somewhat slower
than ext4 (which is slightly slower than ext3, which is somewhat slower
than ext2) when using the defaults (i.e., duplicate metadata).
It's a hair faster than ext2 (the fastest of the ext family) when
using singleton metadata. And ext2 isn't even crash resistant, while
btrfs has snapshots.
I'm using hardware raid for striping speed. (I tried btrfs striping; it
was close, but not as fast on my hardware.) I want speed, speed, speed.
My data is only vaguely important (continuous builders), but speed is
everything.
While the reason to use singleton metadata may be "little", it dominates
my application. If I were forced to use duplicate metadata, then I'd
still be arguing with my coworkers about whether the speed costs were
worth it to buy snapshot functionality. But the fact that btrfs is
faster AND provides snapshots (and less metadata overhead, bigger
file systems, etc.) makes for an easy sale.
Note that nilfs2 has similar performance, but somewhat different
snapshot characteristics that aren't as useful in my current application.
--rich