On 06/10/2010 04:47 PM, Paul Eggert wrote:
> On 06/09/2010 11:56 PM, jeff.liu wrote:
>
>> Yeah, I just realized that the behaviour I observed is caused by the delay allocation mechanism of
>> the particular FS.
>>
> If the file system is using delayed allocation, then can
> the fiemap ioctl tell us that a file contains a hole (because nothing has been
> allocated there), but read() would tell us that the file contains nonzero data at the same location
> (because it's sitting in a buffer somewhere)? If so, we'd need to do something like invoke
> fdatasync() on the file before issuing the fiemap ioctl, to force allocation; or perhaps
> there's another ioctl that will do the allocation without having to actually do a sync.
>
I guess we'll have to use FIEMAP_FLAG_SYNC.
> There's also the issue of copying from a file at the same time that some other process
> is writing to it, but that is allowed to produce ill-defined behavior. I'm more worried
> about the case where some other process writes to the source file just before 'cp' starts.
>
cp's behavior with active files is undefined. But we know it reads from
offset 0 to MAX. With fiemap it will continue to do the same with the
exception that it will skip reads (and thus writes) depending on the extent
map it gets at the very beginning.
> (Sorry, I haven't had time yet to dive into the proposed change; I'm still trying to understand
> the environment.)
>
> One other thing: Solaris 10 supports lseek with the SEEK_HOLE and SEEK_DATA options, which
> are easier to use and which (as far as I can tell from the manual) shouldn't require anything
> fdatasync-ish. Any objection if I propose support for that too? It is supposed to work
> with ZFS, something I can test here.
>
There is no plan to implement SEEK_HOLE/SEEK_DATA in the kernel.
At most glibc will use fiemap to extend lseek(). BTW, SEEK_HOLE/DATA
also have the same problem with active files.
ccing linux-ext4.
Sunil Mushran wrote:
> On 06/10/2010 04:47 PM, Paul Eggert wrote:
>> On 06/09/2010 11:56 PM, jeff.liu wrote:
>>
>>> Yeah, I just realized that the behaviour I observed is caused by the
>>> delay allocation mechanism of
>>> the particular FS.
>>>
>> If the file system is using delayed allocation, then can
>> the fiemap ioctl tell us that a file contains a hole (because nothing
>> has been
>> allocated there), but read() would tell us that the file contains
>> nonzero data at the same location
>> (because it's sitting in a buffer somewhere)? If so, we'd need to do
>> something like invoke
>> fdatasync() on the file before issuing the fiemap ioctl, to force
>> allocation; or perhaps
>> there's another ioctl that will do the allocation without having to
>> actually do a sync.
>>
>
> I guess we'll have to use FIEMAP_FLAG_SYNC.
Hi Sunil,
Thanks for the comments.
So we can ensure the source file synced before mapping in this way.
Hi Jim and Paul,
How about the tiny patch below?
>From d6d619a169ff68a9a310a69d8089b9fbf83b5f91 Mon Sep 17 00:00:00 2001
From: Jie Liu <[email protected]>
Date: Fri, 11 Jun 2010 16:29:02 +0800
Subject: [PATCH 1/1] copy.c: add FIEMAP_FLAG_SYNC to fiemap ioctl
* src/copy.c (fiemap_copy): Force kernel to sync the source
file before mapping.
Signed-off-by: Jie Liu <[email protected]>
---
src/copy.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/src/copy.c b/src/copy.c
index f149be4..f48c74d 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -191,6 +191,7 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
do
{
fiemap->fm_length = FIEMAP_MAX_OFFSET;
+ fiemap->fm_flags = FIEMAP_FLAG_SYNC;
fiemap->fm_extent_count = count;
/* When ioctl(2) fails, fall back to the normal copy only if it
--
1.5.4.3
Thanks,
-Jeff
>
>> There's also the issue of copying from a file at the same time that
>> some other process
>> is writing to it, but that is allowed to produce ill-defined
>> behavior. I'm more worried
>> about the case where some other process writes to the source file just
>> before 'cp' starts.
>>
>
> cp's behavior with active files is undefined. But we know it reads from
> offset 0 to MAX. With fiemap it will continue to do the same with the
> exception that it will skip reads (and thus writes) depending on the extent
> map it gets at the very beginning.
>
>> (Sorry, I haven't had time yet to dive into the proposed change; I'm
>> still trying to understand
>> the environment.)
>>
>> One other thing: Solaris 10 supports lseek with the SEEK_HOLE and
>> SEEK_DATA options, which
>> are easier to use and which (as far as I can tell from the manual)
>> shouldn't require anything
>> fdatasync-ish. Any objection if I propose support for that too? It
>> is supposed to work
>> with ZFS, something I can test here.
>>
>
> There is no plan to implement SEEK_HOLE/SEEK_DATA in the kernel.
> At most glibc will use fiemap to extend lseek(). BTW, SEEK_HOLE/DATA
> also have the same problem with active files.
>
> ccing linux-ext4.
>
>
>
--
With Windows 7, Microsoft is asserting legal control over your computer and is using this power to
abuse computer users.
jeff.liu wrote:
> Sunil Mushran wrote:
...
>> I guess we'll have to use FIEMAP_FLAG_SYNC.
> Hi Sunil,
>
> Thanks for the comments.
> So we can ensure the source file synced before mapping in this way.
>
> Hi Jim and Paul,
>
> How about the tiny patch below?
...
> Subject: [PATCH 1/1] copy.c: add FIEMAP_FLAG_SYNC to fiemap ioctl
>
> * src/copy.c (fiemap_copy): Force kernel to sync the source
> file before mapping.
>
> Signed-off-by: Jie Liu <[email protected]>
> ---
> src/copy.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/src/copy.c b/src/copy.c
> index f149be4..f48c74d 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -191,6 +191,7 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
> do
> {
> fiemap->fm_length = FIEMAP_MAX_OFFSET;
> + fiemap->fm_flags = FIEMAP_FLAG_SYNC;
> fiemap->fm_extent_count = count;
Thanks to both of you.
That patch looks fine, Jeff.
However, at least with ext4 and fedora 13, I require
this change to pass "make check". Note the new entry
in the "flags" column:
==> ff2 <==
Filesystem type is: ef53
File size of j2 is 2048 (1 block, blocksize 4096)
ext logical physical expected length flags
0 0 0 1 unknown,delalloc,eof
j2: 1 extent found
So I will apply the test-fixing patch first, then your change,
Jeff, and then rebase to the latest on master.
>From 484903dc41246cb3c774f178f695725561b105a0 Mon Sep 17 00:00:00 2001
From: Jim Meyering <[email protected]>
Date: Fri, 11 Jun 2010 14:34:03 +0200
Subject: [PATCH 1/2] tests: accommodate varying filefrag -v "flags" output
* tests/cp/sparse-fiemap: Accommodate values other than "eof"
in the "flags" column of filefrag -v output
---
tests/cp/sparse-fiemap | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index dc0cf60..b6b1103 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -68,10 +68,11 @@ $PERL -e 1 || skip_test_ 'skipping part of this test; you lack perl'
# Extract logical block number and length pairs from filefrag -v output.
# The initial sed is to remove the "eof" from the normally-empty "flags" field.
+# Similarly, remove flags values like "unknown,delalloc,eof".
# That is required when that final extent has no number in the "expected" field.
f()
{
- sed 's/ eof$//' $@ \
+ sed 's/ [a-z,][a-z,]*$//' $@ \
| awk '/^ *[0-9]/ {printf "%d %d ", $2 ,NF < 5 ? $NF : $5 } END {print ""}'
}
--
1.7.1.501.g23b46
jeff.liu wrote:
> Sunil Mushran wrote:
>> On 06/10/2010 04:47 PM, Paul Eggert wrote:
>>> On 06/09/2010 11:56 PM, jeff.liu wrote:
>>>
>>>> Yeah, I just realized that the behaviour I observed is caused by the
>>>> delay allocation mechanism of
>>>> the particular FS.
>>>>
>>> If the file system is using delayed allocation, then can
>>> the fiemap ioctl tell us that a file contains a hole (because nothing
>>> has been
>>> allocated there), but read() would tell us that the file contains
>>> nonzero data at the same location
>>> (because it's sitting in a buffer somewhere)? If so, we'd need to do
>>> something like invoke
>>> fdatasync() on the file before issuing the fiemap ioctl, to force
>>> allocation; or perhaps
>>> there's another ioctl that will do the allocation without having to
>>> actually do a sync.
>>>
>> I guess we'll have to use FIEMAP_FLAG_SYNC.
> Hi Sunil,
>
> Thanks for the comments.
> So we can ensure the source file synced before mapping in this way.
>
> Hi Jim and Paul,
>
> How about the tiny patch below?
I agree that this is needed, thanks.
-Eric
> From d6d619a169ff68a9a310a69d8089b9fbf83b5f91 Mon Sep 17 00:00:00 2001
> From: Jie Liu <[email protected]>
> Date: Fri, 11 Jun 2010 16:29:02 +0800
> Subject: [PATCH 1/1] copy.c: add FIEMAP_FLAG_SYNC to fiemap ioctl
>
> * src/copy.c (fiemap_copy): Force kernel to sync the source
> file before mapping.
>
> Signed-off-by: Jie Liu <[email protected]>
> ---
> src/copy.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/src/copy.c b/src/copy.c
> index f149be4..f48c74d 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -191,6 +191,7 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
> do
> {
> fiemap->fm_length = FIEMAP_MAX_OFFSET;
> + fiemap->fm_flags = FIEMAP_FLAG_SYNC;
> fiemap->fm_extent_count = count;
>
> /* When ioctl(2) fails, fall back to the normal copy only if it
Eric Sandeen wrote:
> jeff.liu wrote:
>> Sunil Mushran wrote:
>>> On 06/10/2010 04:47 PM, Paul Eggert wrote:
>>>> On 06/09/2010 11:56 PM, jeff.liu wrote:
>>>>
>>>>> Yeah, I just realized that the behaviour I observed is caused by the
>>>>> delay allocation mechanism of
>>>>> the particular FS.
>>>>>
>>>> If the file system is using delayed allocation, then can
>>>> the fiemap ioctl tell us that a file contains a hole (because nothing
>>>> has been
>>>> allocated there), but read() would tell us that the file contains
>>>> nonzero data at the same location
>>>> (because it's sitting in a buffer somewhere)? If so, we'd need to do
>>>> something like invoke
>>>> fdatasync() on the file before issuing the fiemap ioctl, to force
>>>> allocation; or perhaps
>>>> there's another ioctl that will do the allocation without having to
>>>> actually do a sync.
>>>>
>>> I guess we'll have to use FIEMAP_FLAG_SYNC.
>> Hi Sunil,
>>
>> Thanks for the comments.
>> So we can ensure the source file synced before mapping in this way.
>>
>> Hi Jim and Paul,
>>
>> How about the tiny patch below?
>
> I agree that this is needed, thanks.
Thanks for your confirming response.
Regards,
-Jeff
>
> -Eric
>
>> From d6d619a169ff68a9a310a69d8089b9fbf83b5f91 Mon Sep 17 00:00:00 2001
>> From: Jie Liu <[email protected]>
>> Date: Fri, 11 Jun 2010 16:29:02 +0800
>> Subject: [PATCH 1/1] copy.c: add FIEMAP_FLAG_SYNC to fiemap ioctl
>>
>> * src/copy.c (fiemap_copy): Force kernel to sync the source
>> file before mapping.
>>
>> Signed-off-by: Jie Liu <[email protected]>
>> ---
>> src/copy.c | 1 +
>> 1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/src/copy.c b/src/copy.c
>> index f149be4..f48c74d 100644
>> --- a/src/copy.c
>> +++ b/src/copy.c
>> @@ -191,6 +191,7 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
>> do
>> {
>> fiemap->fm_length = FIEMAP_MAX_OFFSET;
>> + fiemap->fm_flags = FIEMAP_FLAG_SYNC;
>> fiemap->fm_extent_count = count;
>>
>> /* When ioctl(2) fails, fall back to the normal copy only if it
>
>
>
>
--
With Windows 7, Microsoft is asserting legal control over your computer and is using this power to
abuse computer users.
Sunil Mushran wrote:
> SEEK_HOLE/DATA also have the same problem with active files.
Yes, that's true if 'cp' is copying a file while someone else is writing to it.
But the case we're worried about is when 'cp' starts copying a file immediately
after someone else has finished writing to it but data has not been sent to
disk; in that case SEEK_HOLE/DATA should work just as well as the fiemap ioctl.
On 06/11/2010 01:31 AM, jeff.liu wrote:
> + fiemap->fm_flags = FIEMAP_FLAG_SYNC;
If I'm reading the Linux source code correctly,
this forces all the dirty blocks in the input file to disk, and
forces the kernel to wait until all those blocks actually hit the disk.
We don't need that; there shouldn't be a need to force any
blocks to hit the disk. All that 'cp' needs to know is: please tell me
the next block which might contain nonzero data. Is there some way
that we can get that information without forcing the input file's
blocks to disk?