2002-10-10 01:04:10

by Dave Hansen

Subject: Degraded I/O performance, since 2.5.41

When I run a certain large webserver benchmark, I prefer to warm the
pagecache up with the file set first, to cheat a little :) I grep
through 20 different 500MB file sets in parallel to do this. It is a
_lot_ slower in the BK snapshot than in plain 2.5.41.

And, no, these numbers aren't inflated, I have a lot of fast disks. I
_can_ do 50MB/sec :)

A little snippet from vmstat (I cut out the boring columns):
good kernel: 2.5.41: vmstat 4
Cached bi bo in cs us sy id
389280 53284 7 1625 3235 12 88 0
600580 53489 19 1599 3264 11 89 0
813428 53891 0 1587 3256 12 88 0
1027260 54093 0 1609 3239 12 88 0
1241448 54183 0 1611 3251 11 89 0
1454036 53790 0 1618 3267 12 88 0
doing the entire 10GB grep takes 192 seconds.
a dd produces: ~48000 bi/sec

exact same grep operation on kernel: 2.5.41+yesterday's bk: vmstat 4
Cached bi bo in cs us sy id
4855948 9697 1 1408 846 20 80 0
4890464 8745 0 1398 800 18 82 0
4922392 8077 55 1364 676 21 79 0
4959164 9296 1 1399 798 18 82 0
4995936 9315 0 1407 830 19 81 0
5027208 7931 0 1351 638 22 78 0
5066256 9855 9 1416 856 19 81 0
I was too impatient to wait on the greps to complete.
a dd produces: ~37800 bi/sec

So bi/sec goes from 54,000 in 2.5.41 to ~8700 in yesterday's
snapshot, i.e. from around 50MB/sec to about 8MB/sec.

Although vmstat shows 0% idle time, the profilers show lots of idle
time: 98%! I tried both oprofile and readprofile. Is the 2.0.9 vmstat
still broken? I'm using idle=poll, if that makes any difference.

--
Dave Hansen
[email protected]


2002-10-11 00:48:20

by Dave Hansen

Subject: Re: Degraded I/O performance, since 2.5.41

Doug Ledford wrote:
> I try to keep the drivers working
> at a basic level, but until I'm done, benchmarking is pretty much a waste
> of time I think)

Benchmarking is integral to what we're doing right now. We need to
make quick decisions about what is good or bad before the freeze.
This patch makes my machine unusable for anything that isn't in the
pagecache. A simple "make oldconfig" on a cold tree takes minutes to
complete. My grep test got an order of magnitude worse. If we have
to keep this code, can we just make the default queue HUGE for now?
Will that work around it?
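
A rough sketch of what such a driver-side workaround might look like:
the patches later in this thread set the per-device depth with
scsi_adjust_queue_depth() from a slave_attach hook, so a driver could
pin the depth back up along these lines. The function name and the
depth of 64 below are made up for illustration; they are not taken
from this thread.

static int example_slave_attach(Scsi_Device *sdev)
{
	/*
	 * Hypothetical workaround: give every device a deep, untagged
	 * queue instead of the current default of 1.  A real driver
	 * would size this from its own command limits, the way the
	 * ips patch later in the thread does.
	 */
	scsi_adjust_queue_depth(sdev, 0, 64);
	return 0;	/* success, so the device stays online */
}

This would be hooked up through the host template's slave_attach
entry, as the ips.h hunk later in the thread shows.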

A bunch of the AIO people use QLogic cards, which I'm sure are broken
by this as well. I'm going to back this patch out for all the testing
trees I do, and I suggest anyone who cares about I/O on SCSI
(excluding aic7xxx) after 2.5.41 do the same.

--
Dave Hansen
[email protected]

2002-10-11 01:39:53

by Dave Hansen

Subject: Re: Degraded I/O performance, since 2.5.41

From: James Bottomley <[email protected]>
To: Dave Hansen <[email protected]>
Cc: [email protected]
Subject: Re: Degraded I/O performance, since 2.5.41
Date: Thu, 10 Oct 2002 18:09:14 -0700

OK, this patch should fix it. Do your performance numbers for ips improve
again with this?

James


Attachment: tmp.diff

===== drivers/scsi/scsi_scan.c 1.24 vs edited =====
--- 1.24/drivers/scsi/scsi_scan.c Tue Oct 8 15:45:57 2002
+++ edited/drivers/scsi/scsi_scan.c Thu Oct 10 17:40:53 2002
@@ -1477,11 +1477,14 @@
if (sdt->detect)
sdev->attached += (*sdt->detect) (sdev);

- if (sdev->host->hostt->slave_attach != NULL)
+ if (sdev->host->hostt->slave_attach != NULL) {
if (sdev->host->hostt->slave_attach(sdev) != 0) {
printk(KERN_INFO "scsi_add_lun: failed low level driver attach, setting device offline");
sdev->online = FALSE;
}
+ } else if(sdev->host->cmd_per_lun) {
+ scsi_adjust_queue_depth(sdev, 0, sdev->host->cmd_per_lun);
+ }

if (sdevnew != NULL)
*sdevnew = sdev;

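
For readability, here is a plain-C restatement of the hunk above with
comments added; it paraphrases the diff rather than quoting
scsi_scan.c verbatim.

	if (sdev->host->hostt->slave_attach != NULL) {
		/* The low-level driver wants to size its own queues. */
		if (sdev->host->hostt->slave_attach(sdev) != 0) {
			printk(KERN_INFO "scsi_add_lun: failed low level driver attach, setting device offline");
			sdev->online = FALSE;
		}
	} else if (sdev->host->cmd_per_lun) {
		/*
		 * No slave_attach provided: fall back to the host's
		 * cmd_per_lun as an untagged queue depth, instead of
		 * leaving the device at a depth of 1.
		 */
		scsi_adjust_queue_depth(sdev, 0, sdev->host->cmd_per_lun);
	}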



Attachments:
cset-1.704.1.2.txt (38.80 kB)
james_bottomley_scsi_fix.patch (3.56 kB)

2002-10-11 13:43:46

by David Jeffery

Subject: RE: Degraded I/O performance, since 2.5.41

> From: Dave Hansen
> James Bottomley wrote:
> > OK, this patch should fix it. Do your performance numbers for ips
> > improve again with this?
>
> Yes, they are better, but still about 10% below what I was seeing
> before. Thank you for getting this out so quickly. I can do
> reasonable work with this.
>
> Are the ServeRAID people aware of this situation? Do they know that
> their performance could be in the toilet if they don't implement
> queue resizing in the driver?
> --
> Dave Hansen
> [email protected]
>

Dave,

Thank you for adding me to the CC list. I didn't realize how the
queueing work would affect SCSI drivers.

I'm using an older 2.5 kernel so I hadn't seen the performance drop.
I'll update my kernel and get to work on it next week when I have
time.

David Jeffery

2002-10-12 17:40:32

by Mike Anderson

Subject: Re: Degraded I/O performance, since 2.5.41

Sorry if this is a resend; it looked like this did not make it onto the
list on Friday.

Jeffery, David [[email protected]] wrote:
> > Yes, they are better, but still about 10% below what I was seeing
> > before. Thank you for getting this out so quickly. I can do
> > reasonable work with this.

Dave H,

The short-term patch will not bring your queue_depth value back up all
the way. The ips driver's cmd_per_lun value is 16. The old method would
distribute the queue_depth among the devices on the adapter. I believe
your system has a ServeRAID 4M, which means you could have had a
queue_depth as high as 95. This seems high, but from your numbers a value
higher than 16 is most likely needed.
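
To make the arithmetic concrete, here is a small stand-alone sketch of
the old ips_select_queue_depth() distribution that the patch below
removes. The max_cmds values are inferences, not numbers stated in
this thread: 32 is inferred from the 31/15 depths in the sg output
below, and 96 is assumed for the ServeRAID 4M so that a single drive
comes out at 95.

#include <stdio.h>

/*
 * Old ips depth distribution, paraphrased from the '-' lines of the
 * patch below: depth = (max_cmds - 1) / logical_drives, with a floor
 * of max_cmds / 4.
 */
static int old_ips_depth(int max_cmds, int logical_drives)
{
	int min = max_cmds / 4;
	int depth = (max_cmds - 1) / logical_drives;

	return depth < min ? min : depth;
}

int main(void)
{
	printf("test box (max_cmds 32), 1 drive:    %d\n", old_ips_depth(32, 1));  /* 31 */
	printf("test box (max_cmds 32), 2 drives:   %d\n", old_ips_depth(32, 2));  /* 15 */
	printf("ServeRAID 4M (assumed 96), 1 drive: %d\n", old_ips_depth(96, 1));  /* 95 */
	printf("interim fallback (ips cmd_per_lun): 16\n");
	return 0;
}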

You can check your old and new values after booting the kernel, if you
have sg configured in, with:

cat /proc/scsi/sg/device_hdr /proc/scsi/sg/devices

>
> I'm using an older 2.5 kernel so I hadn't seen the performance drop.
> I'll update my kernel and get to work on it next week when I have

David J,
Here is a quick patch for you to review. I tested it and it seems to bring
my system back up to pre-patch queue depth values. I used an enq value to
simulate the past behavior in the attach function. I tested with one and
two logical devices.


Here is some sg output at different kernel versions and logical drive counts.

2.5.40 1 logical drive:
host chan id lun type opens qdepth busy online
0 0 0 0 0 3 31 0 1

2.5.40 2 logical drives:
host chan id lun type opens qdepth busy online
0 0 0 0 0 3 15 0 1
0 0 1 0 0 0 15 0 1

2.5 current 2 logical drives:
host chan id lun type opens qdepth busy online
0 0 0 0 0 3 1 0 1
0 0 1 0 0 0 1 0 1

2.5 current 2 logical drives + patch:
host chan id lun type opens qdepth busy online
0 0 0 0 0 3 15 0 1
0 0 1 0 0 0 7 0 1

2.5 current 2 logical drives + patch post some IO:
host chan id lun type opens qdepth busy online
0 0 0 0 0 3 15 0 1
0 0 1 0 0 0 15 0 1

-andmike
--
Michael Anderson
[email protected]

ips.c | 47 ++++++++++++++++-------------------------------
ips.h | 3 ++-
2 files changed, 18 insertions(+), 32 deletions(-)
------

--- 1.26/drivers/scsi/ips.c Tue Oct 8 00:51:39 2002
+++ edited/drivers/scsi/ips.c Fri Oct 11 13:24:08 2002
@@ -481,7 +481,6 @@
static void ips_free_flash_copperhead(ips_ha_t *ha);
static void ips_get_bios_version(ips_ha_t *, int);
static void ips_identify_controller(ips_ha_t *);
-static void ips_select_queue_depth(struct Scsi_Host *, Scsi_Device *);
static void ips_chkstatus(ips_ha_t *, IPS_STATUS *);
static void ips_enable_int_copperhead(ips_ha_t *);
static void ips_enable_int_copperhead_memio(ips_ha_t *);
@@ -1087,7 +1086,6 @@
sh->n_io_port = io_addr ? 255 : 0;
sh->unique_id = (io_addr) ? io_addr : mem_addr;
sh->irq = irq;
- sh->select_queue_depths = ips_select_queue_depth;
sh->sg_tablesize = sh->hostt->sg_tablesize;
sh->can_queue = sh->hostt->can_queue;
sh->cmd_per_lun = sh->hostt->cmd_per_lun;
@@ -1820,45 +1818,33 @@

/****************************************************************************/
/* */
-/* Routine Name: ips_select_queue_depth */
+/* Routine Name: ips_slave_attach */
/* */
/* Routine Description: */
/* */
-/* Select queue depths for the devices on the contoller */
+/* Configure the device we are attaching to this controller */
/* */
/****************************************************************************/
-static void
-ips_select_queue_depth(struct Scsi_Host *host, Scsi_Device *scsi_devs) {
- Scsi_Device *device;
+int
+ips_slave_attach(Scsi_Device *scsi_dev) {
ips_ha_t *ha;
- int count = 0;
- int min;
+ int queue_depth;
+ int min, per_logical;
+
+ ha = (ips_ha_t *) scsi_dev->host->hostdata;;

- ha = IPS_HA(host);
min = ha->max_cmds / 4;
+ per_logical = ( ha->max_cmds -1 ) / ha->enq->ucLogDriveCount;

- for (device = scsi_devs; device; device = device->next) {
- if (device->host == host) {
- if ((device->channel == 0) && (device->type == 0))
- count++;
- }
+ if ((scsi_dev->channel == 0) && (scsi_dev->type == 0)) {
+ queue_depth = max(per_logical, min);
+ } else {
+ queue_depth = 2;
}

- for (device = scsi_devs; device; device = device->next) {
- if (device->host == host) {
- if ((device->channel == 0) && (device->type == 0)) {
- device->queue_depth = ( ha->max_cmds - 1 ) / count;
- if (device->queue_depth < min)
- device->queue_depth = min;
- }
- else {
- device->queue_depth = 2;
- }
-
- if (device->queue_depth < 2)
- device->queue_depth = 2;
- }
- }
+ scsi_adjust_queue_depth(scsi_dev, 0, queue_depth);
+
+ return 0;
}

/****************************************************************************/
@@ -7407,7 +7393,6 @@
sh->n_io_port = io_addr ? 255 : 0;
sh->unique_id = (io_addr) ? io_addr : mem_addr;
sh->irq = irq;
- sh->select_queue_depths = ips_select_queue_depth;
sh->sg_tablesize = sh->hostt->sg_tablesize;
sh->can_queue = sh->hostt->can_queue;
sh->cmd_per_lun = sh->hostt->cmd_per_lun;
--- 1.9/drivers/scsi/ips.h Tue Oct 8 00:51:39 2002
+++ edited/drivers/scsi/ips.h Fri Oct 11 11:39:14 2002
@@ -62,6 +62,7 @@
extern int ips_biosparam(Disk *, struct block_device *, int *);
extern const char * ips_info(struct Scsi_Host *);
extern void do_ips(int, void *, struct pt_regs *);
+ extern int ips_slave_attach(Scsi_Device *);

/*
* Some handy macros
@@ -481,7 +482,7 @@
eh_host_reset_handler : ips_eh_reset, \
abort : NULL, \
reset : NULL, \
- slave_attach : NULL, \
+ slave_attach : ips_slave_attach, \
bios_param : ips_biosparam, \
can_queue : 0, \
this_id: -1, \