2002-11-06 04:07:01

by Adam Kropelin

[permalink] [raw]
Subject: 2.5.46: ide-cd cdrecord (almost) success report

I felt like making some coasters tonight so I figured it was a good time
to give ide-cd a chance with cdrecord. For the most part, it worked
great! I was stunned, in fact, by how smoothly it went after I upgraded
my cdtools. I was running SMP + preempt and it wrote smoothly at 12x
with < 2% CPU usage the whole time. Buffer capacity never dropped below
98%.

I was almost disappointed at not making any coasters so I decided to
stress it a bit. I tried running 'make -j10 bzImage' on the 2.5.46
kernel tree while it was writing and that also worked! Buffer never
dropped below 96%. (This machine is 2x Xeon 450, 256 MB RAM, BTW.)

Still without coaster I tried one more thing...
'dd if=/dev/zero of=foo bs=1M' in parallel with another burn. That one
did it in. ;) I'm running ext3 and the writeout load totally killed
burn, which isn't surprising. I was asking for it, I know. What happened
when the cdrecord buffer underran was surprising, though: oops below.
Very repeatable. Can supply copious hw details if it helps.

--Adam

Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c02464ad
*pde = 00000000
Oops: 0000
eepro100 mii
CPU: 0
EIP: 0060:[<c02464ad>] Not tainted
EFLAGS: 00010006
EIP is at ide_outsw+0xd/0x20
eax: 00007c00 ebx: 0000f801 ecx: 00007c00 edx: 00000170
esi: 00000000 edi: c0557d74 ebp: 00000000 esp: c04bfeb0
ds: 0068 es: 0068 ss: 0068
Process swapper (pid: 0, threadinfo=c04be000 task=c0491900)
Stack: c0557e20 c02469aa 00000170 00000000 00007c00 0000f801 c0557e20 c0557d74
00000000 c0246a7c c0557e20 00000000 00003e00 0000f800 c13820e0 00000000
c0557e20 c025571c c0557e20 00000000 0000f800 c13ad3a0 c0246a40 0000f800
Call Trace:
[<c02469aa>] ata_output_data+0xba/0xc0
[<c0246a7c>] atapi_output_bytes+0x3c/0x90
[<c025571c>] cdrom_newpc_intr+0x2ac/0x4e0
[<c0246a40>] atapi_output_bytes+0x0/0x90
[<c024cee4>] ide_intr+0x124/0x1e0
[<c0255470>] cdrom_newpc_intr+0x0/0x4e0
[<c010b595>] handle_IRQ_event+0x45/0x70
[<c010b834>] do_IRQ+0xb4/0x150
[<c0107070>] default_idle+0x0/0x50
[<c0107070>] default_idle+0x0/0x50
[<c0109e88>] common_interrupt+0x18/0x20
[<c0107070>] default_idle+0x0/0x50
[<c0107070>] default_idle+0x0/0x50
[<c010709a>] default_idle+0x2a/0x50
[<c010713a>] cpu_idle+0x3a/0x50
[<c0105000>] stext+0x0/0x30

Code: f3 66 6f 5e c3 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 8b
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing



2002-11-06 07:15:56

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.5.46: ide-cd cdrecord (almost) success report

On Tue, Nov 05 2002, Adam Kropelin wrote:
> I felt like making some coasters tonight so I figured it was a good time
> to give ide-cd a chance with cdrecord. For the most part, it worked
> great! I was stunned, in fact, by how smoothly it went after I upgraded
> my cdtools. I was running SMP + preempt and it wrote smoothly at 12x
> with < 2% CPU usage the whole time. Buffer capacity never dropped below
> 98%.
>
> I was almost disappointed at not making any coasters so I decided to
> stress it a bit. I tried running 'make -j10 bzImage' on the 2.5.46
> kernel tree while it was writing and that also worked! Buffer never
> dropped below 96%. (This machine is 2x Xeon 450, 256 MB RAM, BTW.)

That sounds very good so far!

> Still without coaster I tried one more thing...
> 'dd if=/dev/zero of=foo bs=1M' in parallel with another burn. That one
> did it in. ;) I'm running ext3 and the writeout load totally killed
> burn, which isn't surprising. I was asking for it, I know. What happened

Really, this should work. The deadline scheduler should handle this just
fine in fact. Which device is your burner and which device is the hard
drive? It sounds like a bug.

> when the cdrecord buffer underran was surprising, though: oops below.
> Very repeatable. Can supply copious hw details if it helps.

I'll try and reproduce that here, there's been a similar report (same
oops) before. If you can just send me the dmesg output after a boot that
should be fine.

Thanks a lot for testing, it's this kind of testing I need to iron out
the last few bugs!

--
Jens Axboe

2002-11-06 07:38:29

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.5.46: ide-cd cdrecord (almost) success report

On Wed, Nov 06 2002, Jens Axboe wrote:
> > when the cdrecord buffer underran was surprising, though: oops below.
> > Very repeatable. Can supply copious hw details if it helps.
>
> I'll try and reproduce that here, there's been a similar report (same
> oops) before. If you can just send me the dmesg output after a boot that
> should be fine.

Could you reproduce with this patch? I'd like to see the request state
when this happens.

===== drivers/ide/ide-cd.c 1.32 vs edited =====
--- 1.32/drivers/ide/ide-cd.c Sun Nov 3 19:57:35 2002
+++ edited/drivers/ide/ide-cd.c Wed Nov 6 08:44:04 2002
@@ -1722,6 +1722,11 @@
blen = bio_iovec(rq->bio)->bv_len;
}

+ if (!ptr) {
+ blk_dump_rq_flags(rq, "cdrom_newpc_intr");
+ break;
+ }
+
if (blen > thislen)
blen = thislen;


--
Jens Axboe

2002-11-06 15:50:28

by Adam Kropelin

[permalink] [raw]
Subject: Re: 2.5.46: ide-cd cdrecord (almost) success report

On Wed, Nov 06, 2002 at 08:22:23AM +0100, Jens Axboe wrote:
> On Tue, Nov 05 2002, Adam Kropelin wrote:
> > Still without coaster I tried one more thing...
> > 'dd if=/dev/zero of=foo bs=1M' in parallel with another burn. That one
> > did it in. ;) I'm running ext3 and the writeout load totally killed
> > burn, which isn't surprising. I was asking for it, I know. What happened
>
> Really, this should work. The deadline scheduler should handle this just
> fine in fact. Which device is your burner and which device is the hard
> drive? It sounds like a bug.

Hard disk is sdc on onboard AIC7xxx.
Writer is hdc, the only device on the secondary onboard IDE channel.
All other disks (IDE & SCSI) were idle during the test.

Source image for the burn was on sdc3 and a directory on the same
partition was also the target for the 'dd'.

The system was exhibiting the usual spotty ext3 large writeout behavior
as has been discussed on the list before. Basically, vmstat 1 looks
something like this (captured from a different test on a different
machine but shows the same pattern):

0 1 1 120 4464 0 140856 0 0 0 15420 6235 520 0 42 58
0 1 1 120 4456 0 140856 0 0 0 3240 1094 36 0 2 98
1 0 0 120 4428 0 140844 0 0 0 52 1151 70 0 4 96
1 0 0 120 4440 0 141356 0 0 0 4 6810 541 1 42 57
0 0 0 120 4464 0 141320 0 0 0 0 6894 553 1 40 58
0 1 1 120 4396 0 140840 0 0 0 15508 6018 466 0 40 59
0 1 1 120 4388 0 140840 0 0 0 1608 1093 57 0 2 98
0 0 0 120 4404 0 140832 0 0 0 52 2350 165 0 12 87
0 0 0 120 4460 0 141380 0 0 0 4 7040 564 1 42 57
1 0 0 120 4356 0 141372 0 0 0 4 7073 570 1 45 54

> I'll try and reproduce that here, there's been a similar report (same
> oops) before. If you can just send me the dmesg output after a boot that
> should be fine.

Will do when I get home (now +9 hrs or so)...

> Thanks a lot for testing, it's this kind of testing I need to iron out
> the last few bugs!

No sweat. I've got three spindles of blank media left. ;)

--Adam, who remembers buying his first blank CDR for USD 19.95 and is
thankful CD writer testing is no longer so expensive.

2002-11-06 18:05:18

by Patrick Mansfield

[permalink] [raw]
Subject: Re: 2.5.46: ide-cd cdrecord (almost) success report

On Wed, Nov 06, 2002 at 10:56:56AM -0500, Adam Kropelin wrote:
> On Wed, Nov 06, 2002 at 08:22:23AM +0100, Jens Axboe wrote:
> > On Tue, Nov 05 2002, Adam Kropelin wrote:
> > > Still without coaster I tried one more thing...
> > > 'dd if=/dev/zero of=foo bs=1M' in parallel with another burn. That one
> > > did it in. ;) I'm running ext3 and the writeout load totally killed
> > > burn, which isn't surprising. I was asking for it, I know. What happened
> >
> > Really, this should work. The deadline scheduler should handle this just
> > fine in fact. Which device is your burner and which device is the hard
> > drive? It sounds like a bug.
>
> Hard disk is sdc on onboard AIC7xxx.
> Writer is hdc, the only device on the secondary onboard IDE channel.
> All other disks (IDE & SCSI) were idle during the test.

What queue depth is the AIC setting?

SCSI in 2.5.x no longer copies the request, so if you have a queue
depth larger than the allocated requests there might not be
any free requests left for the blk layer to play with.

AIC default queue depth is 253 (with 2.5.46 queue depth can be set to 1
if scsi is all in kernel, or no upper drivers are available when the
host adapter is intialized), it should auto lock it to the right depth,
but right now it does not notify the upper layer of the queue depth
change (no call to scsi_adjust_queue_depth when locking in the new
queue depth). You can modify .config, or pass boot/module options to
lower it.

-- Patrick Mansfield

2002-11-06 18:38:35

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.5.46: ide-cd cdrecord (almost) success report

On Wed, Nov 06 2002, Adam Kropelin wrote:
> On Wed, Nov 06, 2002 at 08:22:23AM +0100, Jens Axboe wrote:
> > On Tue, Nov 05 2002, Adam Kropelin wrote:
> > > Still without coaster I tried one more thing...
> > > 'dd if=/dev/zero of=foo bs=1M' in parallel with another burn. That one
> > > did it in. ;) I'm running ext3 and the writeout load totally killed
> > > burn, which isn't surprising. I was asking for it, I know. What happened
> >
> > Really, this should work. The deadline scheduler should handle this just
> > fine in fact. Which device is your burner and which device is the hard
> > drive? It sounds like a bug.
>
> Hard disk is sdc on onboard AIC7xxx.
> Writer is hdc, the only device on the secondary onboard IDE channel.
> All other disks (IDE & SCSI) were idle during the test.

Ah ok, yes on SCSI it's very easy to starve requests. There's no good
way to control this yet, unfortunately. Please set max number of tags to
2-4 or so, and you should not be able to kill the burn.

You wont see better performance with more tags than 4 anyways, and you
risk god awful latencies. So it's a good choice, regardless.

> > I'll try and reproduce that here, there's been a similar report (same
> > oops) before. If you can just send me the dmesg output after a boot that
> > should be fine.
>
> Will do when I get home (now +9 hrs or so)...

Great, thanks.

--
Jens Axboe

2002-11-06 23:26:52

by Adam Kropelin

[permalink] [raw]
Subject: Re: 2.5.46: ide-cd cdrecord (almost) success report

On Wed, Nov 06, 2002 at 10:11:44AM -0800, Patrick Mansfield wrote:
> On Wed, Nov 06, 2002 at 10:56:56AM -0500, Adam Kropelin wrote:
> > On Wed, Nov 06, 2002 at 08:22:23AM +0100, Jens Axboe wrote:
> > > On Tue, Nov 05 2002, Adam Kropelin wrote:
> > > > Still without coaster I tried one more thing...
> > > > 'dd if=/dev/zero of=foo bs=1M' in parallel with another burn. That one
> > > > did it in. ;) I'm running ext3 and the writeout load totally killed
> > > > burn, which isn't surprising. I was asking for it, I know. What happened
> > >
> > > Really, this should work. The deadline scheduler should handle this just
> > > fine in fact. Which device is your burner and which device is the hard
> > > drive? It sounds like a bug.
> >
> > Hard disk is sdc on onboard AIC7xxx.
> > Writer is hdc, the only device on the secondary onboard IDE channel.
> > All other disks (IDE & SCSI) were idle during the test.
>
> What queue depth is the AIC setting?
>
> SCSI in 2.5.x no longer copies the request, so if you have a queue
> depth larger than the allocated requests there might not be
> any free requests left for the blk layer to play with.
>
> AIC default queue depth is 253 (with 2.5.46 queue depth can be set to 1

Are you talking tcq depth here? Best as I can tell, 2.5.46 defaults to
16. Lowering it to 2 doesn't seem to help.

> queue depth). You can modify .config, or pass boot/module options to
> lower it.

If I've misunderstood you, please clue me in and I'll give it a shot...

--Adam

2002-11-06 23:18:54

by Adam Kropelin

[permalink] [raw]
Subject: Re: 2.5.46: ide-cd cdrecord (almost) success report

On Wed, Nov 06, 2002 at 08:44:57AM +0100, Jens Axboe wrote:
> On Wed, Nov 06 2002, Jens Axboe wrote:
> > > when the cdrecord buffer underran was surprising, though: oops below.
> > > Very repeatable. Can supply copious hw details if it helps.
> >
> > I'll try and reproduce that here, there's been a similar report (same
> > oops) before. If you can just send me the dmesg output after a boot that
> > should be fine.
>
> Could you reproduce with this patch? I'd like to see the request state
> when this happens.

Here you go... This was with max tags locked at 4. I've included some of
the surrounding lines from cdrecord.

--Adam

Track 01: 85 of 437 MB written (fifo 0%) [buf 31%] 3.3x./opt/schily/bin/cdrecord: Input/output error. write_g1: scsi sendcmd: no error
CDB: 2A 00 00 00 AA BE 00 00 1F 00
status: 0x1 (GOOD STATUS)
resid: 63488
cmd finished after 0.011s timeout 40s

write track data: error after 89518080 bytes
cdrom_newpc_intr: dev hdc: flags = REQ_RW REQ_NOMERGE REQ_STARTED REQ_BLOCK_PC REQ_FAILED REQ_QUIET
sector 0, nr/cnr 124/8
bio 00000000, biotail c1cd0860, buffer c968d000, data 00000000, len 63488
cdb: 2a 00 00 00 aa be 00 00 1f 00 00 00 00 00 00 00
hdc: padding 63488 bytes
Sense Bytes: 70 00 00 00 00 00 00 12 00 00 00 00 00 00 00 00 00 00

2002-11-06 23:16:48

by Adam Kropelin

[permalink] [raw]
Subject: Re: 2.5.46: ide-cd cdrecord (almost) success report

On Wed, Nov 06, 2002 at 07:45:08PM +0100, Jens Axboe wrote:
> On Wed, Nov 06 2002, Adam Kropelin wrote:
> > Hard disk is sdc on onboard AIC7xxx.
>
> Ah ok, yes on SCSI it's very easy to starve requests. There's no good
> way to control this yet, unfortunately. Please set max number of tags to
> 2-4 or so, and you should not be able to kill the burn.

Tags set to 2 or 4 seem to make no difference. I'll keep
experimenting...

> > > I'll try and reproduce that here, there's been a similar report (same
> > > oops) before. If you can just send me the dmesg output after a boot that
> > > should be fine.
> >
> > Will do when I get home (now +9 hrs or so)...

dmesg sent privately...

--Adam

2002-11-06 23:46:12

by Patrick Mansfield

[permalink] [raw]
Subject: Re: 2.5.46: ide-cd cdrecord (almost) success report

On Wed, Nov 06, 2002 at 06:33:25PM -0500, Adam Kropelin wrote:
> On Wed, Nov 06, 2002 at 10:11:44AM -0800, Patrick Mansfield wrote:

> > What queue depth is the AIC setting?
> >
> > SCSI in 2.5.x no longer copies the request, so if you have a queue
> > depth larger than the allocated requests there might not be
> > any free requests left for the blk layer to play with.
> >
> > AIC default queue depth is 253 (with 2.5.46 queue depth can be set to 1
>
> Are you talking tcq depth here? Best as I can tell, 2.5.46 defaults to
> 16. Lowering it to 2 doesn't seem to help.

Yes, but not the IDE code (I have know idea what IDE defaults to). If you
have sg its proc interface can dump them, but AIC should default to 253,
on my system with AIC adapter (not the aic7xxx_old adapter, it looks like
it defaults to 32) I typically see:

cat /proc/scsi/sg/device_hdr /proc/scsi/sg/devices
host chan id lun type opens qdepth busy online
0 0 0 0 0 2 253 0 1
0 0 1 0 0 1 253 0 1
0 0 15 0 3 0 2 0 1

The last device does not support tcq (id 15), so it gets a depth of 2.

The AIC driver also prints the depth on init/boot like so:

scsi0:A:0:0: Tagged Queuing enabled. Depth 253

My .config has (default cmd per device):

CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_CMDS_PER_DEVICE=253
CONFIG_AIC7XXX_RESET_DELAY_MS=15000

-- Patrick Mansfield

2002-11-06 23:58:30

by Adam Kropelin

[permalink] [raw]
Subject: Re: 2.5.46: ide-cd cdrecord (almost) success report

On Wed, Nov 06, 2002 at 03:52:35PM -0800, Patrick Mansfield wrote:
> On Wed, Nov 06, 2002 at 06:33:25PM -0500, Adam Kropelin wrote:
> > On Wed, Nov 06, 2002 at 10:11:44AM -0800, Patrick Mansfield wrote:
>
> > > What queue depth is the AIC setting?
> > >
> > > SCSI in 2.5.x no longer copies the request, so if you have a queue
> > > depth larger than the allocated requests there might not be
> > > any free requests left for the blk layer to play with.
> > >
> > > AIC default queue depth is 253 (with 2.5.46 queue depth can be set to 1
> >
> > Are you talking tcq depth here? Best as I can tell, 2.5.46 defaults to
> > 16. Lowering it to 2 doesn't seem to help.
>
> CONFIG_SCSI_AIC7XXX=y
> CONFIG_AIC7XXX_CMDS_PER_DEVICE=253
> CONFIG_AIC7XXX_RESET_DELAY_MS=15000

Ok, that is indeed the setting I was changing. I've been carrying this
same .config with my for $BIGNUM kernel versions so I must have lowered
it to 16 sometime in the past.

Even setting queue depth to 1 didn't affect the cdrecord problem.

--Adam

2002-11-07 14:26:48

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.5.46: ide-cd cdrecord (almost) success report

On Wed, Nov 06 2002, Adam Kropelin wrote:
> On Wed, Nov 06, 2002 at 08:44:57AM +0100, Jens Axboe wrote:
> > On Wed, Nov 06 2002, Jens Axboe wrote:
> > > > when the cdrecord buffer underran was surprising, though: oops below.
> > > > Very repeatable. Can supply copious hw details if it helps.
> > >
> > > I'll try and reproduce that here, there's been a similar report (same
> > > oops) before. If you can just send me the dmesg output after a boot that
> > > should be fine.
> >
> > Could you reproduce with this patch? I'd like to see the request state
> > when this happens.
>
> Here you go... This was with max tags locked at 4. I've included some of
> the surrounding lines from cdrecord.

ok

> --Adam
>
> Track 01: 85 of 437 MB written (fifo 0%) [buf 31%] 3.3x./opt/schily/bin/cdrecord: Input/output error. write_g1: scsi sendcmd: no error
> CDB: 2A 00 00 00 AA BE 00 00 1F 00
> status: 0x1 (GOOD STATUS)
> resid: 63488
> cmd finished after 0.011s timeout 40s
>
> write track data: error after 89518080 bytes
> cdrom_newpc_intr: dev hdc: flags = REQ_RW REQ_NOMERGE REQ_STARTED REQ_BLOCK_PC REQ_FAILED REQ_QUIET
> sector 0, nr/cnr 124/8
> bio 00000000, biotail c1cd0860, buffer c968d000, data 00000000, len 63488
> cdb: 2a 00 00 00 aa be 00 00 1f 00 00 00 00 00 00 00
> hdc: padding 63488 bytes
> Sense Bytes: 70 00 00 00 00 00 00 12 00 00 00 00 00 00 00 00 00 00

>From the ide-cd POV, the best we can do here is just try to continue
safely in case of error. So a slight modification to last patch should
be fine, I think.

At least that is the atapi part of.

===== drivers/ide/ide-cd.c 1.32 vs edited =====
--- 1.32/drivers/ide/ide-cd.c Sun Nov 3 19:57:35 2002
+++ edited/drivers/ide/ide-cd.c Thu Nov 7 15:25:05 2002
@@ -1722,6 +1722,11 @@
blen = bio_iovec(rq->bio)->bv_len;
}

+ if (!ptr) {
+ printk("%s: confused, missing data\n", drive->name);
+ break;
+ }
+
if (blen > thislen)
blen = thislen;

@@ -1741,8 +1746,6 @@
* pad, if necessary
*/
if (len) {
- printk("%s: padding %u bytes\n", drive->name, len);
-
while (len) {
int pad = 0;


--
Jens Axboe