2008-09-05 18:24:48

by Eric Sandeen

Subject: odd allocation patterns

I was trying various IO patterns to see what the ext4 allocator
might do (as I tend to do every few months ....) :)

On the one hand there are some very interesting, and nice (at least for
some workloads) results:

If I write even, then odd, blocks, in the end it comes out to one
extent - even with an unmount in between:

# for I in `seq 0 2 1024`; do dd if=/dev/zero of=testfile bs=4k count=1 \
    conv=notrunc seek=$I 2>/dev/null; done

(unmount, remount)

# for I in `seq 1 2 1024`; do dd if=/dev/zero of=testfile bs=4k count=1 \
    conv=notrunc seek=$I 2>/dev/null; done
# filefrag testfile
File is stored in extents format
testfile: 1 extent found

As long as the holes eventually get filled in, ending up with
contiguous allocation is pretty nice behavior (if they're never
filled in, it's a little odd).

However, sequential, synchronous writes are doing weird things:

# for I in `seq 1 1024`; do dd if=/dev/zero of=testfile bs=4k count=1 \
    conv=notrunc seek=$I oflag=sync 2>/dev/null; done

# filefrag -v testfile
Checking testfile
Filesystem type is: ef53
Filesystem cylinder groups is approximately 235
File is stored in extents format
Blocksize of file testfile2 is 4096
File size of testfile2 is 4198400 (1025 blocks)
First block: 0
Last block: 45312
Discontinuity: Block 2 is at 44032 (was 43520)
Discontinuity: Block 11 is at 43521 (was 44040)
Discontinuity: Block 15 is at 43066 (was 43524)
Discontinuity: Block 256 is at 44544 (was 43306)
testfile: 5 extents found

not only is it non-contiguous, it's out of order.

Ditto for direct IO:

# for I in `seq 1 1024`; do dd if=/dev/zero of=testfile bs=4k count=1 \
    conv=notrunc seek=$I oflag=direct 2>/dev/null; done

[root@inode test2]# filefrag -v testfile
Checking testfile
Filesystem type is: ef53
Filesystem cylinder groups is approximately 235
File is stored in extents format
Blocksize of file testfile is 4096
File size of testfile is 4198400 (1025 blocks)
First block: 0
Last block: 45312
Discontinuity: Block 2 is at 43525 (was 44041)
Discontinuity: Block 4 is at 44042 (was 43526)
Discontinuity: Block 5 is at 43527 (was 44042)
Discontinuity: Block 15 is at 43306 (was 43536)
Discontinuity: Block 16 is at 43312 (was 43306)
Discontinuity: Block 128 is at 43136 (was 43423)
Discontinuity: Block 256 is at 44544 (was 43263)
testfile: 8 extents found

Interestingly, a backwards synchronous write comes out exactly the same:

[root@inode test2]# for I in `seq 1024 -1 0`; do dd if=/dev/zero \
    of=testfile2 bs=4k count=1 conv=notrunc seek=$I oflag=sync 2>/dev/null; done
[root@inode test2]# filefrag -v testfile
Checking testfile
Filesystem type is: ef53
Filesystem cylinder groups is approximately 235
File is stored in extents format
Blocksize of file testfile is 4096
File size of testfile is 4198400 (1025 blocks)
First block: 0
Last block: 45312
Discontinuity: Block 2 is at 43525 (was 44041)
Discontinuity: Block 4 is at 44042 (was 43526)
Discontinuity: Block 5 is at 43527 (was 44042)
Discontinuity: Block 15 is at 43306 (was 43536)
Discontinuity: Block 16 is at 43312 (was 43306)
Discontinuity: Block 128 is at 43136 (was 43423)
Discontinuity: Block 256 is at 44544 (was 43263)
testfile: 8 extents found

It's not an artifact of filefrag; debugfs shows it as well:

BLOCKS:
(IND):43066, (1):44041, (2-3):43525-43526, (4):44042,
(5-14):43527-43536, (15):43306, (16-127):43312-43423,
(128-255):43136-43263, (256-1024):44544-45312
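
One way to cross-check a debugfs block list like the one above against the filefrag output is to walk it and flag every entry whose first physical block does not directly follow the previous entry's last block. The following is only a rough sketch: the function name list_discontinuities is made up for illustration, and it assumes the "(logical):physical" entry format shown above, skipping non-numeric entries such as (IND).

```shell
#!/bin/sh
# Hypothetical helper (not part of e2fsprogs): reads a debugfs "BLOCKS:"
# list on stdin and prints one line per physical discontinuity.
list_discontinuities() {
    tr ',' '\n' | awk '
        # Data entries look like "(2-3):43525-43526" or "(1):44041".
        # Entries such as "(IND):43066" do not match and are skipped.
        match($0, /\([0-9]+(-[0-9]+)?\):[0-9]+(-[0-9]+)?/) {
            entry = substr($0, RSTART, RLENGTH)
            split(entry, halves, ":")
            gsub(/[()]/, "", halves[1])
            split(halves[1], lrange, "-")   # logical start[-end]
            split(halves[2], prange, "-")   # physical start[-end]
            lstart = lrange[1] + 0
            pstart = prange[1] + 0
            pend = (prange[2] != "") ? prange[2] + 0 : pstart
            if (seen && pstart != prev_end + 1) {
                dir = (pstart < prev_end) ? "backward" : "forward"
                printf "logical %d: physical %d (previous %d), %s jump\n",
                       lstart, pstart, prev_end, dir
            }
            prev_end = pend
            seen = 1
        }'
}

# Feed it the block list from this message.
list_discontinuities <<'EOF'
(IND):43066, (1):44041, (2-3):43525-43526, (4):44042,
(5-14):43527-43536, (15):43306, (16-127):43312-43423,
(128-255):43136-43263, (256-1024):44544-45312
EOF
```

For the list above this reports seven jumps, at the same logical blocks (2, 4, 5, 15, 16, 128, 256) that filefrag flags as discontinuities.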

Not sure what's going on here; I have not started digging yet, but it's
.... odd. With delalloc and buffered (non-synchronous) IO, these all
come out pretty sanely.

-Eric


2008-09-06 06:39:47

by Andreas Dilger

Subject: Re: odd allocation patterns

On Sep 05, 2008 13:24 -0500, Eric Sandeen wrote:
> If I write even, then odd, blocks, in the end it comes out to one
> extent - even with an unmount in between:
>
> # for I in `seq 0 2 1024`; do dd if=/dev/zero of=testfile bs=4k count=1
> conv=notrunc seek=$I 2>/dev/null; done
>
> (unmount, remount)
>
> # for I in `seq 1 2 1024`; do dd if=/dev/zero of=testfile bs=4k count=1
> conv=notrunc seek=$I 2>/dev/null; done
> # filefrag testfile
> File is stored in extents format
> testfile: 1 extent found

Interesting. I'd asked Alex to tune the allocator to locate blocks
with a position relative to the end of the previously-allocated blocks.
I didn't think it would actually work so well :-).

> However, sequential, synchronous writes are doing weird things:
>
> # for I in `seq 1 1024`; do dd if=/dev/zero of=testfile bs=4k count=1
> conv=notrunc seek=$I oflag=sync 2>/dev/null; done
>
> # filefrag -v testfile
> Checking testfile
> Filesystem type is: ef53
> Filesystem cylinder groups is approximately 235
> File is stored in extents format
> Blocksize of file testfile2 is 4096
> File size of testfile2 is 4198400 (1025 blocks)
> First block: 0
> Last block: 45312
> Discontinuity: Block 2 is at 44032 (was 43520)
> Discontinuity: Block 11 is at 43521 (was 44040)
> Discontinuity: Block 15 is at 43066 (was 43524)
> Discontinuity: Block 256 is at 44544 (was 43306)
> testfile: 5 extents found
>
> not only is it non-contiguous, it's out of order.

I agree this is completely strange. The only thing I can think of is
that this is being treated as a "small file" and the blocks are being
packed into the small file preallocation group, and if this is an SMP
system then it is possible there are 2 or more preallocation spaces.
You have 3 processes running (bash, seq, dd), and dd is being run
in a different process (CPU?) for each block.

Can you try running this with a single process? Even running
"dd if=/dev/zero of=testfile bs=4k count=1024 oflag=sync" should
still produce single-block sync writes, without forking each time.
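
A minimal sketch of that single-process run, assuming GNU dd (for oflag=sync) and filefrag from e2fsprogs. The function names and the default path are illustrative only. The file is removed first so that blocks left over from an earlier run cannot influence the result.

```shell
#!/bin/sh
# Hypothetical test helpers; names are made up for this sketch.

# Write 1024 4k blocks with O_SYNC from a single dd process.
write_single_process() {
    f=${1:-testfile}
    rm -f "$f"    # start from a fresh file each time
    dd if=/dev/zero of="$f" bs=4k count=1024 oflag=sync 2>/dev/null
}

# Show the resulting layout if filefrag is installed.
show_layout() {
    f=${1:-testfile}
    if command -v filefrag >/dev/null 2>&1; then
        filefrag -v "$f"
    else
        echo "filefrag not found; skipping layout check" >&2
    fi
}
```

Running write_single_process testfile followed by show_layout testfile on the ext4 mount would show whether the out-of-order allocation still appears when only one process does the writing.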

I agree the allocator probably shouldn't do this, but it isn't exactly
a normal workload. It seems possible that the goal block (the last
block allocated) isn't being taken into account properly? It also
seems possible that if the dd process is moving between CPUs each time
the preallocation group is blocking the allocation of the "next" block?

> Interestingly, a backwards synchronous write comes out exactly the same:

Are you sure you unlinked the file in between? :-)

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.