2015-06-05 14:05:15

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: Re-2: Strange problems with lseek in qemu-img map

On Wed, Jun 03, 2015 at 03:09:40PM +0000, David Weber wrote:
> > > I then startet a fedora 22 live system and I saw the same problem. It
> > > happens
> > > on both the ramdisk and a ext4 filesystem.
> >
> > "it" == qemu-img map hangs or takes a very long time?
> I never waited for it to complete but I guess it just takes very long.
>
> >
> > Can you post a shell script that reproduces this with a ramdisk? That
> > seems like the easiest way to get people debugging it.
>
> You can use the following commands.
>
> mkfs.ext4 /dev/ram0
> mkdir -p /mnt/tmp
> mount /dev/ram0 /mnt/tmp
> cd /mnt/tmp
> qemu-img create test 500G
> time qemu-img map test
>
> This takes foreover on all my systems.

I experience the same thing on Linux 4.1.0-rc5 with ext4 (4 KB file
system block size, 512B device sector size).

The lseek() calls take *seconds* to complete on a completely empty
(sparse) 500G file.

Here is the perf-top(1) output:

34.08% [kernel] [k] _raw_read_lock
21.11% [kernel] [k] ext4_es_lookup_extent
20.67% [kernel] [k] ext4_es_find_delayed_extent_range
8.68% [kernel] [k] ext4_map_blocks
4.99% [kernel] [k] ext4_llseek
1.90% [kernel] [k] rb_next
0.14% [kernel] [k] __fget

Yikes!

The syscall causing this is just:

lseek(7, 0, SEEK_DATA)

Lukas: I've CCed you because you have helped with ext4 issues in the
past. Not sure if you have time or if this is your area.

Stefan


Attachments:
(No filename) (1.55 kB)
(No filename) (473.00 B)
Download all attachments

2015-06-05 15:14:22

by Lukas Czerner

[permalink] [raw]
Subject: Re: [Qemu-devel] Re-2: Strange problems with lseek in qemu-img map

On Fri, 5 Jun 2015, Stefan Hajnoczi wrote:

> Date: Fri, 5 Jun 2015 15:05:15 +0100
> From: Stefan Hajnoczi <[email protected]>
> To: David Weber <[email protected]>
> Cc: [email protected], [email protected],
> Lukas Czerner <[email protected]>, [email protected]
> Subject: Re: [Qemu-devel] Re-2: Strange problems with lseek in qemu-img map
>
> On Wed, Jun 03, 2015 at 03:09:40PM +0000, David Weber wrote:
> > > > I then startet a fedora 22 live system and I saw the same problem. It
> > > > happens
> > > > on both the ramdisk and a ext4 filesystem.
> > >
> > > "it" == qemu-img map hangs or takes a very long time?
> > I never waited for it to complete but I guess it just takes very long.
> >
> > >
> > > Can you post a shell script that reproduces this with a ramdisk? That
> > > seems like the easiest way to get people debugging it.
> >
> > You can use the following commands.
> >
> > mkfs.ext4 /dev/ram0
> > mkdir -p /mnt/tmp
> > mount /dev/ram0 /mnt/tmp
> > cd /mnt/tmp
> > qemu-img create test 500G
> > time qemu-img map test
> >
> > This takes foreover on all my systems.
>
> I experience the same thing on Linux 4.1.0-rc5 with ext4 (4 KB file
> system block size, 512B device sector size).
>
> The lseek() calls take *seconds* to complete on a completely empty
> (sparse) 500G file.
>
> Here is the perf-top(1) output:
>
> 34.08% [kernel] [k] _raw_read_lock
> 21.11% [kernel] [k] ext4_es_lookup_extent
> 20.67% [kernel] [k] ext4_es_find_delayed_extent_range
> 8.68% [kernel] [k] ext4_map_blocks
> 4.99% [kernel] [k] ext4_llseek
> 1.90% [kernel] [k] rb_next
> 0.14% [kernel] [k] __fget
>
> Yikes!
>
> The syscall causing this is just:
>
> lseek(7, 0, SEEK_DATA)
>
> Lukas: I've CCed you because you have helped with ext4 issues in the
> past. Not sure if you have time or if this is your area.
>
> Stefan

Hi,

this is definitely an issue with extent status tree and I have not seen
it before. I'll look at it first thing next week, but meanwhile I'll
add Zheng Liu who is the author of the extent status tree so he
might be able to help you.

Regards,
-Lukas

2015-06-06 03:53:50

by Wen Congyang

[permalink] [raw]
Subject: Re: Re-2: Strange problems with lseek in qemu-img map

At 2015/6/5 23:14, Luk?? Czerner Wrote:
> On Fri, 5 Jun 2015, Stefan Hajnoczi wrote:
>
>> Date: Fri, 5 Jun 2015 15:05:15 +0100
>> From: Stefan Hajnoczi <[email protected]>
>> To: David Weber <[email protected]>
>> Cc: [email protected], [email protected],
>> Lukas Czerner <[email protected]>, [email protected]
>> Subject: Re: [Qemu-devel] Re-2: Strange problems with lseek in qemu-img map
>>
>> On Wed, Jun 03, 2015 at 03:09:40PM +0000, David Weber wrote:
>>>>> I then startet a fedora 22 live system and I saw the same problem. It
>>>>> happens
>>>>> on both the ramdisk and a ext4 filesystem.
>>>>
>>>> "it" == qemu-img map hangs or takes a very long time?
>>> I never waited for it to complete but I guess it just takes very long.
>>>
>>>>
>>>> Can you post a shell script that reproduces this with a ramdisk? That
>>>> seems like the easiest way to get people debugging it.
>>>
>>> You can use the following commands.
>>>
>>> mkfs.ext4 /dev/ram0
>>> mkdir -p /mnt/tmp
>>> mount /dev/ram0 /mnt/tmp
>>> cd /mnt/tmp
>>> qemu-img create test 500G
>>> time qemu-img map test
>>>
>>> This takes foreover on all my systems.
>>
>> I experience the same thing on Linux 4.1.0-rc5 with ext4 (4 KB file
>> system block size, 512B device sector size).
>>
>> The lseek() calls take *seconds* to complete on a completely empty
>> (sparse) 500G file.
>>
>> Here is the perf-top(1) output:
>>
>> 34.08% [kernel] [k] _raw_read_lock
>> 21.11% [kernel] [k] ext4_es_lookup_extent
>> 20.67% [kernel] [k] ext4_es_find_delayed_extent_range
>> 8.68% [kernel] [k] ext4_map_blocks
>> 4.99% [kernel] [k] ext4_llseek
>> 1.90% [kernel] [k] rb_next
>> 0.14% [kernel] [k] __fget
>>
>> Yikes!
>>
>> The syscall causing this is just:
>>
>> lseek(7, 0, SEEK_DATA)
>>
>> Lukas: I've CCed you because you have helped with ext4 issues in the
>> past. Not sure if you have time or if this is your area.
>>
>> Stefan
>
> Hi,
>
> this is definitely an issue with extent status tree and I have not seen
> it before. I'll look at it first thing next week, but meanwhile I'll
> add Zheng Liu who is the author of the extent status tree so he
> might be able to help you.

It is an known bug, and Dmitry Monakhov tried to fix it:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=14516bb7bb6ffbd49f35389f9ece3b2045ba5815

But this patch introduced another problem, and have been reverted.

Thanks
Wen Congyang

>
> Regards,
> -Lukas
>
>

2015-06-09 10:35:15

by Lukas Czerner

[permalink] [raw]
Subject: Re: Re-2: Strange problems with lseek in qemu-img map

On Sat, 6 Jun 2015, Wen Congyang wrote:

> Date: Sat, 06 Jun 2015 11:53:50 +0800
> From: Wen Congyang <[email protected]>
> To: Luk?? Czerner <[email protected]>, Stefan Hajnoczi <[email protected]>
> Cc: David Weber <[email protected]>, [email protected],
> [email protected], [email protected]
> Subject: Re: [Qemu-devel] Re-2: Strange problems with lseek in qemu-img map
>
> At 2015/6/5 23:14, Luk?? Czerner Wrote:
> > On Fri, 5 Jun 2015, Stefan Hajnoczi wrote:
> >
> > > Date: Fri, 5 Jun 2015 15:05:15 +0100
> > > From: Stefan Hajnoczi <[email protected]>
> > > To: David Weber <[email protected]>
> > > Cc: [email protected], [email protected],
> > > Lukas Czerner <[email protected]>, [email protected]
> > > Subject: Re: [Qemu-devel] Re-2: Strange problems with lseek in qemu-img
> > > map
> > >
> > > On Wed, Jun 03, 2015 at 03:09:40PM +0000, David Weber wrote:
> > > > > > I then startet a fedora 22 live system and I saw the same problem.
> > > > > > It
> > > > > > happens
> > > > > > on both the ramdisk and a ext4 filesystem.
> > > > >
> > > > > "it" == qemu-img map hangs or takes a very long time?
> > > > I never waited for it to complete but I guess it just takes very long.
> > > >
> > > > >
> > > > > Can you post a shell script that reproduces this with a ramdisk? That
> > > > > seems like the easiest way to get people debugging it.
> > > >
> > > > You can use the following commands.
> > > >
> > > > mkfs.ext4 /dev/ram0
> > > > mkdir -p /mnt/tmp
> > > > mount /dev/ram0 /mnt/tmp
> > > > cd /mnt/tmp
> > > > qemu-img create test 500G
> > > > time qemu-img map test
> > > >
> > > > This takes foreover on all my systems.
> > >
> > > I experience the same thing on Linux 4.1.0-rc5 with ext4 (4 KB file
> > > system block size, 512B device sector size).
> > >
> > > The lseek() calls take *seconds* to complete on a completely empty
> > > (sparse) 500G file.
> > >
> > > Here is the perf-top(1) output:
> > >
> > > 34.08% [kernel] [k] _raw_read_lock
> > > 21.11% [kernel] [k] ext4_es_lookup_extent
> > > 20.67% [kernel] [k]
> > > ext4_es_find_delayed_extent_range
> > > 8.68% [kernel] [k] ext4_map_blocks
> > > 4.99% [kernel] [k] ext4_llseek
> > > 1.90% [kernel] [k] rb_next
> > > 0.14% [kernel] [k] __fget
> > >
> > > Yikes!
> > >
> > > The syscall causing this is just:
> > >
> > > lseek(7, 0, SEEK_DATA)
> > >
> > > Lukas: I've CCed you because you have helped with ext4 issues in the
> > > past. Not sure if you have time or if this is your area.
> > >
> > > Stefan
> >
> > Hi,
> >
> > this is definitely an issue with extent status tree and I have not seen
> > it before. I'll look at it first thing next week, but meanwhile I'll
> > add Zheng Liu who is the author of the extent status tree so he
> > might be able to help you.
>
> It is an known bug, and Dmitry Monakhov tried to fix it:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=14516bb7bb6ffbd49f35389f9ece3b2045ba5815
>
> But this patch introduced another problem, and have been reverted.
>
> Thanks
> Wen Congyang

Wow, the code is so broken :) Luckily the fix should be fairly easy,
I am testing it now.

Thanks everyone for pointing this out.

-Lukas

>
> >
> > Regards,
> > -Lukas
> >
> >