2005-02-21 04:01:13

by Pete Zaitcev

Subject: Merging fails reading /dev/uba1

Hi, Jens:

I think this question belongs to your domain, but please let me know
if I'm mistaken, so I can pursue this elsewhere.

I encountered a strange performance anomaly. I do the following:

<----- Plug USB key
[root@lembas ~]# time dd if=/dev/uba of=/dev/null bs=10k count=10240
10240+0 records in
10240+0 records out

real 0m22.731s
user 0m0.004s
sys 0m0.345s
[root@lembas ~]#

<----- Remove and replug the USB key
[root@lembas ~]# time dd if=/dev/uba1 of=/dev/null bs=10k count=10240
10240+0 records in
10240+0 records out

real 1m42.622s
user 0m0.005s
sys 0m1.518s
[root@lembas ~]#

So, reading from a partition of the same device is 5 times slower than
reading from the device itself. The question is, why?
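
For reference, the two timings can be turned into throughput figures. This is only a small arithmetic sketch: the function names are made up for illustration, and the inputs are the dd parameters and the "real" times quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Bytes moved by dd: block size times block count. */
double dd_bytes(double bs, double count)
{
    return bs * count;
}

/* Throughput in MiB/s for a run that took `seconds` of wall-clock time. */
double mib_per_sec(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / (1024.0 * 1024.0) / seconds;
}

/* Print both runs side by side, plus the slowdown factor. */
void print_comparison(void)
{
    double bytes = dd_bytes(10.0 * 1024.0, 10240.0); /* bs=10k count=10240 */
    double whole = 22.731;   /* "real" for /dev/uba */
    double part  = 102.622;  /* "real" for /dev/uba1, i.e. 1m42.622s */

    printf("whole device: %.2f MiB/s\n", mib_per_sec(bytes, whole));
    printf("partition:    %.2f MiB/s\n", mib_per_sec(bytes, part));
    printf("slowdown:     %.2fx\n", part / whole);
}
```

Both runs move the same 100 MiB, so the slowdown is simply the ratio of the two wall-clock times, a little over 4.5.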

To the best of my knowledge, this does not occur with SCSI (usb-storage
and sd or sr). This hints strongly that the ub is not doing something
right, but what could that be?

The ub takes the request processing machinery from Carmel exactly. I am
wondering if Carmel (sx8) exhibits any similar performance anomalies
(cc-ing to Jeff)

Additional information:

[root@lembas ~]# cat /proc/version
Linux version 2.6.11-rc4-lem (zaitcev@lembas) (gcc version 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)) #1 Tue Feb 15 23:06:39 PST 2005
[root@lembas ~]# cat /proc/partitions
major minor  #blocks  name

   3     0   39070080 hda
   3     1    5935986 hda1
   3     2    5936017 hda2
   3     3     554242 hda3
   3     4          1 hda4
   3     5   26643771 hda5
 180     0    1024000 uba
 180     1    1023983 uba1
[root@lembas ~]#

Thanks,
-- Pete


2005-02-21 07:51:47

by Jens Axboe

Subject: Re: Merging fails reading /dev/uba1

On Sun, Feb 20 2005, Pete Zaitcev wrote:
> Hi, Jens:
>
> I think this question belongs to your domain, but please let me know
> if I'm mistaken, so I can pursue this elsewhere.
>
> I encountered a strange performance anomaly. I do the following:
>
> <----- Plug USB key
> [root@lembas ~]# time dd if=/dev/uba of=/dev/null bs=10k count=10240
> 10240+0 records in
> 10240+0 records out
>
> real 0m22.731s
> user 0m0.004s
> sys 0m0.345s
> [root@lembas ~]#
>
> <----- Remove and replug the USB key
> [root@lembas ~]# time dd if=/dev/uba1 of=/dev/null bs=10k count=10240
> 10240+0 records in
> 10240+0 records out
>
> real 1m42.622s
> user 0m0.005s
> sys 0m1.518s
> [root@lembas ~]#
>
> So, reading from a partition of the same device is 5 times slower than
> reading from the device itself. The question is, why?
>
> To the best of my knowledge, this does not occur with SCSI (usb-storage
> and sd or sr). This hints strongly that the ub is not doing something
> right, but what could that be?
>
> The ub takes the request processing machinery from Carmel exactly. I am
> wondering if Carmel (sx8) exhibits any similar performance anomalies
> (cc-ing to Jeff)

I can't explain why the replugging slows it down, maybe you were lucky
to get contiguous pages in the first case? As far as I can see, ub
effectively disables merging by setting max hw/phys segment limit of 1.

--
Jens Axboe

2005-02-21 18:24:41

by Pete Zaitcev

Subject: Re: Merging fails reading /dev/uba1

On Mon, 21 Feb 2005 08:51:32 +0100, Jens Axboe wrote:

> > [root@lembas ~]# time dd if=/dev/uba of=/dev/null bs=10k count=10240
> > real 0m22.731s

> > [root@lembas ~]# time dd if=/dev/uba1 of=/dev/null bs=10k count=10240
> > real 1m42.622s

> > So, reading from a partition of the same device is 5 times slower than
> > reading from the device itself. The question is, why?

> I can't explain why the replugging slows it down, maybe you were lucky
> to get contiguous pages in the first case? As far as I can see, ub
> effectively disables merging by setting max hw/phys segment limit of 1.

If you mean physical replugging, it has nothing to do with the issue.
I only mentioned it to show that old pages were purged.

Contiguous pages have nothing to do with it either. I forgot to mention
that in the first case (whole device), all reads are done with length of
4KB, while in the second case (partition), all reads are 512 bytes long.

Basically, the key is reading from a partition or not. It causes the
sub-page sized merging to fail.
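
To put a number on it, here is a rough count of requests for the same 100 MiB transfer. It assumes every read reaches the device as its own request, with no merging, which is what the trace suggests; the helper name is hypothetical.

```c
#include <assert.h>

/* Number of requests needed to move `total_bytes` when every request
 * carries `req_bytes`, rounded up. Assumes no merging at all. */
unsigned long requests_needed(unsigned long long total_bytes,
                              unsigned req_bytes)
{
    assert(req_bytes > 0);
    return (unsigned long)((total_bytes + req_bytes - 1) / req_bytes);
}
```

With 104857600 bytes to read, 4096-byte requests give 25600 round trips and 512-byte requests give 204800, eight times as many.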

This is how the partitioning looks:

[root@lembas zaitcev]# fdisk /dev/uba

Command (m for help): p

Disk /dev/uba: 1048 MB, 1048576000 bytes
64 heads, 32 sectors/track, 1000 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

    Device Boot      Start         End      Blocks   Id  System
/dev/uba1   *           1        1000     1023983+   6  FAT16
Partition 1 has different physical/logical endings:
     phys=(998, 63, 32) logical=(999, 63, 31)

Command (m for help):

It does not look to me as if the partition started from an odd number
of sectors. In fact, it starts from a full number of pages.

The segment number hint was a good one. I can implement a fake s/g
capability easily within the driver, if this is suggested. But before
hacking on that, I'd like to note that I'm surprised how the block
layer is unable to coalesce sector-sized reads within a page. Also,
why does this depend on partitioning? Something is fishy here.

-- Pete

2005-02-21 18:32:13

by Jeff Garzik

Subject: Re: Merging fails reading /dev/uba1

On Mon, Feb 21, 2005 at 10:24:31AM -0800, Pete Zaitcev wrote:
> On Mon, 21 Feb 2005 08:51:32 +0100, Jens Axboe wrote:
>
> > > [root@lembas ~]# time dd if=/dev/uba of=/dev/null bs=10k count=10240
> > > real 0m22.731s
>
> > > [root@lembas ~]# time dd if=/dev/uba1 of=/dev/null bs=10k count=10240
> > > real 1m42.622s
>
> > > So, reading from a partition of the same device is 5 times slower than
> > > reading from the device itself. The question is, why?
>
> > I can't explain why the replugging slows it down, maybe you were lucky
> > to get contiguous pages in the first case? As far as I can see, ub
> > effectively disables merging by setting max hw/phys segment limit of 1.
>
> If you mean physical replugging, it has nothing to do with the issue.
> I only mentioned it to show that old pages were purged.
>
> Contiguous pages have nothing to do with it either. I forgot to mention
> that in the first case (whole device), all reads are done with length of
> 4KB, while in the second case (partition), all reads are 512 bytes long.
>
> Basically, the key is reading from a partition or not. It causes the
> sub-page sized merging to fail.

Does setting the blkdev's block size change things?

Jeff



2005-02-21 20:00:43

by Linus Torvalds

Subject: Re: Merging fails reading /dev/uba1



On Mon, 21 Feb 2005, Pete Zaitcev wrote:
>
> Contiguous pages have nothing to do with it either. I forgot to mention
> that in the first case (whole device), all reads are done with length of
> 4KB, while in the second case (partition), all reads are 512 bytes long.

That's because your partition isn't a full 4kB in size.

So the kernel falls back to 512-byte reads, just because they are the only
kind that _can_ read the last sector.

> Disk /dev/uba: 1048 MB, 1048576000 bytes

Note: this is a nice multiple of 4kB.

> 64 heads, 32 sectors/track, 1000 cylinders
> Units = cylinders of 2048 * 512 = 1048576 bytes
>
> Device Boot Start End Blocks Id System
> /dev/uba1 * 1 1000 1023983+ 6 FAT16

And note how this is _not_ (see the "+" at the end), you've got a
1023983.5 kB partition.

> It does not look to me as if the partition started from an odd number
> of sectors. In fact, it starts from a full number of pages.

But it seems to end in an odd number of sectors.
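
Roughly, the default block size ends up being the largest power of two, capped at the page size, that divides the device size evenly. The sketch below is a simplified model of that idea and not a copy of the kernel source; the function name and the 4 kB page size are assumptions.

```c
#include <assert.h>

#define PAGE_BYTES 4096u  /* assumption: 4 kB pages */

/* Start from the sector size and keep doubling while the device size
 * is still a multiple of the doubled value, stopping at PAGE_BYTES. */
unsigned pick_block_size(unsigned long long dev_bytes, unsigned sector_bytes)
{
    unsigned bsize = sector_bytes;

    /* The sector size must be a power of two for the bit test below. */
    assert(sector_bytes > 0 && (sector_bytes & (sector_bytes - 1)) == 0);

    while (bsize < PAGE_BYTES) {
        if (dev_bytes & bsize)  /* size is not a multiple of 2 * bsize */
            break;
        bsize <<= 1;
    }
    return bsize;
}
```

Fed the whole-disk size of 1048576000 bytes, this model settles on 4096. Fed the partition size of 1023983.5 kB (2047967 sectors of 512 bytes), it stops at 512 right away, because the sector count is odd.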

That said, I'm surprised that the difference in performance is _that_
large. Regardless of whether the disk blocksize is 512 bytes or 4096
bytes, you should be getting IO merging - it might use more CPU time, but
the actual IO should still be done in much larger blocks.

You should be able to try the BLKBSZSET ioctl to set the blocksize by hand
if you want to try it out:

	int size = 4096;
	ioctl(fd, BLKBSZSET, &size);

or similar. Of course, mounting a filesystem on the device tends to do
that (or undo it) for you, ie it will set the blocksize to whatever
blocksize the filesystem wants.

Linus

2005-02-22 00:42:15

by Pete Zaitcev

Subject: Re: Merging fails reading /dev/uba1

On Mon, 21 Feb 2005 12:00:48 -0800 (PST), Linus Torvalds wrote:

> That said, I'm surprised that the difference in performance is _that_
> large. Regardless of whether the disk blocksize is 512 bytes or 4096
> bytes, you should be getting IO merging - it might use more CPU time, but
> the actual IO should still be done in much larger blocks.

I am surprised too. Jens says "ub effectively disables merging by setting
max hw/phys segment limit of 1." But surely this ought not to be a problem
for reads within the same page.

> int size = 4096;
> ioctl(fd, BLKBSZSET, &size);

Thank you for the tip. This works fine, 4KB I/O is restored for dd.
However, I still have this problem with people who use ub to read CF sticks
from their cameras, mounted as FAT or VFAT. I verified that the effect of
this ioctl disappears at mount time, just as you said.

I'll think what I can do about it.

-- Pete

2005-02-22 01:47:40

by Linus Torvalds

Subject: Re: Merging fails reading /dev/uba1



On Mon, 21 Feb 2005, Pete Zaitcev wrote:
>
> I am surprised too. Jens says "ub effectively disables merging by setting
> max hw/phys segment limit of 1." But surely this ought not to be a problem
> for reads within the same page.

Hmm.. Why does it do that anyway? Jens - will merging take place at all
with that setting, even for physically contiguous segments? It appears
not, from the timings.

Anyway, I _think_ the bug is in the BIOVEC_VIRT_MERGEABLE() usage, which
doesn't seem to make much sense. In particular, look at
"ll_merge_requests_fn()", and notice how it first checks whether something
is physically mergeable, but even if it _is_ able to merge physically, it
will still check virtual mergeability too - which makes no sense at all.

If it was physically mergeable, there _is_ no virtual merge. In
particular, a device (or system) that doesn't support virtual merges, or
only supports them on a page boundary, will always _fail_ to virtually
merge within the same page, so it's guaranteed to never merge 512-byte
entries.
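
To make that concrete, here is a toy model of the two tests. It assumes that a virtual merge is only allowed when the end of the first segment and the start of the second both sit on a merge boundary (here a 4 kB page). The function names are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Physically mergeable: the second segment starts exactly where the
 * first one ends. */
int phys_mergeable(uint64_t end_of_first, uint64_t start_of_second)
{
    return end_of_first == start_of_second;
}

/* Virtually mergeable under a boundary-only rule: both addresses must be
 * aligned to `boundary`, which must be a power of two. */
int virt_mergeable(uint64_t end_of_first, uint64_t start_of_second,
                   uint64_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return ((end_of_first | start_of_second) & (boundary - 1)) == 0;
}
```

Two adjacent 512-byte pieces inside one page, say one ending at 0x10200 and the next starting there, pass the physical test but fail the virtual one with a 4096-byte boundary. The virtual test only passes when the join falls exactly on a page boundary such as 0x11000.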

Jens, that just _has_ to be wrong. If a physical merge was possible, we
shouldn't check the virtual merge, we should just return 1.

> > int size = 4096;
> > ioctl(fd, BLKBSZSET, &size);
>
> Thank you for the tip. This works fine, 4KB I/O is restored for dd.
> However, I still have this problem with people who use ub to read CF sticks
> from their cameras, mounted as FAT or VFAT. I verified that the effect of
> this ioctl disappears at mount time, just as you said.

Yes. The FAT filesystem needs to set the buffer size to 512 bytes, since
it will actually act in 512-byte blocks.

> I'll think what I can do about it.

Enabling merging is the thing to do. Why does UB have any merging limits at
all, since USB has to scatter-gather the fragments anyway?

Anyway, I think you can work around the above virtual merge bug (assuming
I'm right, and it _is_ a bug, which Jens may or may not be able to correct me
on, depending on just how deep into baby-diapers he is), by just saying
that UB supports only _one_ physical segment, but can take any number of
virtual segments.

Ie do

	blk_queue_max_hw_segments(q, 100);
	blk_queue_max_phys_segments(q, 1);

which tells the block layer that you don't care about how hard it is to
merge things virtually, but you only ever want _one_ physical segment. (At
which point you will also only really ever get one virtual segment, of
course, but the point is that you'll avoid the bug that says "I can't
merge these two things virtually" when you don't care).
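
As a sanity check on that reasoning, here is a deliberately simplified model of the merge decision for two single-segment requests. It is not the kernel's code: the struct and function names are invented, and it only captures the idea that both segment counts are checked against their own limit.

```c
#include <assert.h>

/* Hypothetical stand-in for the two per-queue segment limits. */
struct seg_limits_model {
    unsigned max_hw_segments;
    unsigned max_phys_segments;
};

/* Decide whether two one-segment requests may be merged.
 * phys_contig: the pieces are physically adjacent.
 * virt_contig: the pieces pass the virtual-merge test. */
int can_merge_pair(const struct seg_limits_model *q,
                   int phys_contig, int virt_contig)
{
    unsigned phys = 2, hw = 2;

    assert(q != 0);

    if (phys_contig)
        phys--;                 /* the two pieces collapse into one */
    if (phys > q->max_phys_segments)
        return 0;

    if (virt_contig)
        hw--;
    if (hw > q->max_hw_segments)
        return 0;               /* rejected even if phys_contig held */

    return 1;
}
```

In this model, limits of 1 and 1 reject two physically adjacent 512-byte pieces that fail the virtual test, while a hw limit of 100 with a phys limit of 1 accepts them and still rejects anything that is not physically contiguous.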

Maybe that works, maybe it doesn't. Give it a try.

Linus