2008-03-20 00:08:43

by David Chinner

[permalink] [raw]
Subject: Buffered I/O to block device very slow and other SCSI issues...

4p ia64, 24GB RAM, 2.6.25-rc3, qla1280, 15krpm scsi disk.

Direct I/O:

dgc@budgie:~/xfstests$ sudo dd if=/dev/zero of=/dev/sdb6 bs=1024k count=1024 oflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 27.8974 s, 38.5 MB/s

Doing approximately 80 512k I/os per second (disk bandwidth).

Buffered I/O:

dgc@budgie:~/xfstests$ sudo dd if=/dev/zero of=/dev/sdb6 bs=1024k count=4096
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 427.872 s, 10.0 MB/s

Which was doing about 200 64k I/Os per second for the entire write.
Interactivity goes to hell, cpu usage is pretty much zero. I can
barely run anything, I can't ctrl-c out of running programs, etc.

It would appear that the block device is being flushed one sector at
a time:

8,22 3 8757 26.699052800 12143 Q W 9770471 + 1 [sync]
8,22 3 8760 26.699058050 12143 Q W 9770472 + 1 [sync]
8,22 3 8763 26.699063300 12143 Q W 9770473 + 1 [sync]
8,22 3 8766 26.699068900 12143 Q W 9770474 + 1 [sync]
8,22 3 8769 26.699074150 12143 Q W 9770475 + 1 [sync]
8,22 3 8772 26.699079400 12143 Q W 9770476 + 1 [sync]
8,22 3 8775 26.699084650 12143 Q W 9770477 + 1 [sync]
8,22 3 8778 26.699090200 12143 Q W 9770478 + 1 [sync]
8,22 3 8781 26.699095600 12143 Q W 9770479 + 1 [sync]
8,22 3 8784 26.699100800 12143 Q W 9770480 + 1 [sync]
8,22 3 8787 26.699106100 12143 Q W 9770481 + 1 [sync]
8,22 3 8790 26.699111700 12143 Q W 9770482 + 1 [sync]
8,22 3 8793 26.699117000 12143 Q W 9770483 + 1 [sync]
8,22 3 8796 26.699122250 12143 Q W 9770484 + 1 [sync]
8,22 3 8799 26.699127500 12143 Q W 9770485 + 1 [sync]
8,22 3 8802 26.699133100 12143 Q W 9770486 + 1 [sync]
8,22 3 8805 26.699138450 12143 Q W 9770487 + 1 [sync]

And it's only combining into 128 sector I/Os. Something wrong
with bio merging limiting the I/os to 128 elements? I'm using BSG
here, so maybe that's a factor.....

I also suspect that CTQ has not been set up correctly on this
kernel, because:

$ cat /sys/block/sdb/device/queue_depth
3
$ ls -l /sys/block/sdb/device/queue_depth
-r--r--r-- 1 root root 0 Mar 20 09:59 /sys/block/sdb/device/queue_depth
$

It appears to be hard coded to 3 and can't be changed....

Hmmm:

$ sudo sdparm -s WCE=1 /dev/sdb
SCSI INQUIRY command failed on /dev/sdb
$

Did I miss some new config option, or are things just generally broken?

Cheers,

Dave.

--
Dave Chinner
Principal Engineer
SGI Australian Software Group


2008-03-20 01:08:23

by Jeremy Higdon

[permalink] [raw]
Subject: Re: Buffered I/O to block device very slow and other SCSI issues...

On Thu, Mar 20, 2008 at 10:16:54AM +1100, David Chinner wrote:
> 4p ia64, 24GB RAM, 2.6.25-rc3, qla1280, 15krpm scsi disk.
>
> Direct I/O:
>
> dgc@budgie:~/xfstests$ sudo dd if=/dev/zero of=/dev/sdb6 bs=1024k count=1024 oflag=direct
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 27.8974 s, 38.5 MB/s
>
> Doing approximately 80 512k I/os per second (disk bandwidth).
>
> Buffered I/O:
>
> dgc@budgie:~/xfstests$ sudo dd if=/dev/zero of=/dev/sdb6 bs=1024k count=4096
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB) copied, 427.872 s, 10.0 MB/s

How big is sdb6? How many '2's do you see in

factor `cat /sys/block/sdb/sdb6/size`

> I also suspect that CTQ has not been set up correctly on this
> kernel, because:
>
> $ cat /sys/block/sdb/device/queue_depth
> 3
> $ ls -l /sys/block/sdb/device/queue_depth
> -r--r--r-- 1 root root 0 Mar 20 09:59 /sys/block/sdb/device/queue_depth
> $
>
> It appears to be hard coded to 3 and can't be changed....

That's a bug in the qla1280 driver. I thought that had gotten fixed.
It's looking at the wrong mailbox register after setting device parameters.

jeremy

2008-03-20 10:21:05

by Andrew Morton

[permalink] [raw]
Subject: Re: Buffered I/O to block device very slow and other SCSI issues...

On Wed, 19 Mar 2008 18:08:07 -0700 Jeremy Higdon <[email protected]> wrote:
>

(cc's added. It matters)

> On Thu, Mar 20, 2008 at 10:16:54AM +1100, David Chinner wrote:
> > 4p ia64, 24GB RAM, 2.6.25-rc3, qla1280, 15krpm scsi disk.
> >
> > Direct I/O:
> >
> > dgc@budgie:~/xfstests$ sudo dd if=/dev/zero of=/dev/sdb6 bs=1024k count=1024 oflag=direct
> > 1024+0 records in
> > 1024+0 records out
> > 1073741824 bytes (1.1 GB) copied, 27.8974 s, 38.5 MB/s
> >
> > Doing approximately 80 512k I/os per second (disk bandwidth).
> >
> > Buffered I/O:
> >
> > dgc@budgie:~/xfstests$ sudo dd if=/dev/zero of=/dev/sdb6 bs=1024k count=4096
> > 4096+0 records in
> > 4096+0 records out
> > 4294967296 bytes (4.3 GB) copied, 427.872 s, 10.0 MB/s
>
> How big is sdb6? How many '2's do you see in
>
> factor `cat /sys/block/sdb/sdb6/size`

There have always been problems with thsi and I'm not sure that anyone
cared enough about buffered writes to blockdevs to get to the bottom of
them.

I assume you aren't running i386 highmem...


> > I also suspect that CTQ has not been set up correctly on this
> > kernel, because:
> >
> > $ cat /sys/block/sdb/device/queue_depth
> > 3
> > $ ls -l /sys/block/sdb/device/queue_depth
> > -r--r--r-- 1 root root 0 Mar 20 09:59 /sys/block/sdb/device/queue_depth
> > $
> >
> > It appears to be hard coded to 3 and can't be changed....
>
> That's a bug in the qla1280 driver. I thought that had gotten fixed.
> It's looking at the wrong mailbox register after setting device parameters.
>

Was there a patch anywhere?

2008-03-20 11:59:22

by Mike Snitzer

[permalink] [raw]
Subject: Re: Buffered I/O to block device very slow and other SCSI issues...

On Thu, Mar 20, 2008 at 6:20 AM, Andrew Morton
<[email protected]> wrote:
> On Wed, 19 Mar 2008 18:08:07 -0700 Jeremy Higdon <[email protected]> wrote:
> >
>
> (cc's added. It matters)
>
>
> > On Thu, Mar 20, 2008 at 10:16:54AM +1100, David Chinner wrote:
> > > 4p ia64, 24GB RAM, 2.6.25-rc3, qla1280, 15krpm scsi disk.
> > >
> > > Direct I/O:
> > >
> > > dgc@budgie:~/xfstests$ sudo dd if=/dev/zero of=/dev/sdb6 bs=1024k count=1024 oflag=direct
> > > 1024+0 records in
> > > 1024+0 records out
> > > 1073741824 bytes (1.1 GB) copied, 27.8974 s, 38.5 MB/s
> > >
> > > Doing approximately 80 512k I/os per second (disk bandwidth).
> > >
> > > Buffered I/O:
> > >
> > > dgc@budgie:~/xfstests$ sudo dd if=/dev/zero of=/dev/sdb6 bs=1024k count=4096
> > > 4096+0 records in
> > > 4096+0 records out
> > > 4294967296 bytes (4.3 GB) copied, 427.872 s, 10.0 MB/s
> >
> > How big is sdb6? How many '2's do you see in
> >
> > factor `cat /sys/block/sdb/sdb6/size`
>
> There have always been problems with thsi and I'm not sure that anyone
> cared enough about buffered writes to blockdevs to get to the bottom of
> them.
>
> I assume you aren't running i386 highmem...

I've experienced the same kind of degradation with buffered IO vs
direct specifically when using Linux partitions. Using the full block
device doesn't create such fragmented IOs.

The problem was reported to the blktrace list some weeks ago by my
coworker (cc'ing Ming):
http://marc.info/?l=linux-btrace&m=120296070516776&w=2
(fyi, Ming forgot to use oflag=sync, this explains the weird results
when doing buffered writes while blktrace'ing)

To summarize a little more (without messing round with partition
alignment), the test system is x86_64 with 4GB, storage is directly
connected via aacraid, 7200 rpm SATA disk.

Using:
dd if=/dev/zero of=/dev/sdhX bs=1M oflag=sync count=4 seek=2
and
dd if=/dev/zero of=/dev/sdhX bs=1M oflag=direct count=4 seek=2

full disk case (sdh):
buffered writes are +8 and being merged to 3 512k requests, 1 8k and 1
504k (27MB/s)
odirect writes are all +512 (35MB/s)

partitioned case: a 3GB sdh1 and ~720GB sdh2.
buffered writes to partition1 are +1 and are merged to 65k requests (10.3MB/s)
buffered writes to partition2 are +2 and are merged to 130k requests (15.2MB/s)
odirect writes to either partition are all +512 (27MB/s)

So it appears partition size matters (at least in this case)?

As you can see performing buffered writes to a partition resulted in
very small requests, much like David reported in his original post (+1
or +2 via blktrace).

This happens with every kernel tried; 2.6.22, 2.6.24, RHEL5U1, etc.
cfq vs deadline doesn't change anything. For partitions, changing
partition alignment to a power of 2 actually hurt!?

Mike

2008-03-23 23:11:30

by Jeremy Higdon

[permalink] [raw]
Subject: Re: Buffered I/O to block device very slow and other SCSI issues...

On Thu, Mar 20, 2008 at 03:20:10AM -0700, Andrew Morton wrote:
> On Wed, 19 Mar 2008 18:08:07 -0700 Jeremy Higdon <[email protected]> wrote:
> > > I also suspect that CTQ has not been set up correctly on this
> > > kernel, because:
> > >
> > > $ cat /sys/block/sdb/device/queue_depth
> > > 3
> > > $ ls -l /sys/block/sdb/device/queue_depth
> > > -r--r--r-- 1 root root 0 Mar 20 09:59 /sys/block/sdb/device/queue_depth
> > > $
> > >
> > > It appears to be hard coded to 3 and can't be changed....
> >
> > That's a bug in the qla1280 driver. I thought that had gotten fixed.
> > It's looking at the wrong mailbox register after setting device parameters.
> >
>
> Was there a patch anywhere?

I thought so, but maybe not. I'll go looking for what happened and then
forward one, but it'll take a couple of weeks, unless I can get Jes to
do it, as I'm going on vacation/holiday for a week.

jeremy