2001-10-25 10:06:57

by Christian Hammers

[permalink] [raw]
Subject: BUG() in asm/pci.h:142 with 2.4.13

Hello

My system crashed several times now with 2.4.11-pre6 and 2.4.13
(pre6 because it was the first one I got that fixed some 2GB RAM memory
allocation bug).

2.4.13 was the easiest one to reproduce: when starting the tape backup
to a HP DDS3/DAT Streamer (C1537A) via a Adaptec SCSI Controller
(Adaptec 7892A in /proc/pci) on a Gigabyte GA-6VTXD Dual Motherboard with
two PIII and 2GB of RAM it crashed immediately with the error attached
below. The machine was under "stresstest-simulation" load at this time.

The tape_backup.pl uses the "mt" and "cpio" commands to access /dev/nst0.

Maybe worth noting is, that the system crashed another time yesterday
after replacing the external SCSI RAID Chassis/Controller (not the
disks in it) and just this moment with another message (see below).

Any help or hints appreciated!
[please keep me Cc'ed as I'm not subscribed to this list]

bye,

-christian-


kernel: kernel BUG at /usr/local/src/kernel/linux-2.4.13/include/asm/pci.h:142!
kernel: invalid operand: 0000
kernel: CPU: 1
kernel: EIP: 0010:[ahc_linux_run_device_queue+899/2144] Not tainted
kernel: EFLAGS: 00010082
kernel: eax: 00000048 ebx: f7bb5650 ecx: c0275a88 edx: 00010071
kernel: esi: c5915a30 edi: 00000000 ebp: c5915a30 esp: e9ae3e14
kernel: ds: 0018 es: 0018 ss: 0018
kernel: Process tape_backup.pl (pid: 4366, stackpage=e9ae3000)
kernel: Stack: c024e100 0000008e f7bbec00 e9ae3e6c 00000000 00000000 f5358de0 0000000e
kernel: f7bbec10 00000007 00000007 401af000 41ffffff 00000004 c5915600 c01b0e09
kernel: f7bbec00 c301fee0 00000202 d35ce1d4 c5915600 f7bbfa20 00000096 c01a5f76
kernel: Call Trace: [ahc_linux_queue+361/424] [scsi_dispatch_cmd+354/632] [scsi_done+0/200] [scsi_request_fn+752/820] [__scsi_insert_special+110/128]
kernel: [scsi_insert_special_req+26/32] [scsi_do_req+284/324] [<f8a8940b>] [<f8a89240>] [<f8a8aad1>] [sys_write+143/196]
kernel: [system_call+51/56]
kernel:
kernel: Code: 0f 0b eb 18 90 83 7e 04 00 75 14 68 90 00 00 00 68 00 e1 24

#
# The output from the other SCSI crash. This came from remote syslogging
# and console.
#

kernel: scsi0:0:0:0: Attempting to queue an ABORT message
kernel: (scsi0:A:0:0): Queuing a recovery SCB
kernel: scsi0:0:0:0: Device is disconnected, re-queuing SCB
kernel: Recovery code sleeping
kernel: (scsi0:A:0:0): Abort Tag Message Sent
kernel: (scsi0:A:0:0): SCB 153 - Abort Tag Completed.
kernel: Recovery SCB completes
kernel: Recovery code awake
kernel: aic7xxx_abort returns 8194
kernel: scsi0:0:0:0: Attempting to queue an ABORT message



Some more debugging help:

mtv-server:/usr/local/src/kernel/linux-2.4.13/include/asm# lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT82C691 [Apollo PRO] (rev c4)
00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598 [Apollo MVP3 AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev
06)
00:07.2 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 1a)
00:07.3 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 1a)
00:07.4 SMBus: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:0a.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone]
(rev 30)
00:0c.0 SCSI storage controller: Adaptec 7892A (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc Rage XL AGP (rev
27)

mtv-server:~$ cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: easyRAID Model: U3 Rev: 0001
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 02 Lun: 00
Vendor: HP Model: C1537A Rev: L708
Type: Sequential-Access ANSI SCSI revision: 02


--
Christian Hammers WESTEND GmbH - Aachen und Dueren Tel 0241/701333-0
[email protected] Internet & Security for Professionals Fax 0241/911879
WESTEND ist CISCO Systems Partner - Premium Certified


2001-10-25 11:20:16

by Jens Axboe

[permalink] [raw]
Subject: Re: BUG() in asm/pci.h:142 with 2.4.13

On Thu, Oct 25 2001, Christian Hammers wrote:
> Hello
>
> My system crashed several times now with 2.4.11-pre6 and 2.4.13
> (pre6 because it was the first one I got that fixed some 2GB RAM memory
> allocation bug).
>
> 2.4.13 was the easiest one to reproduce: when starting the tape backup
> to a HP DDS3/DAT Streamer (C1537A) via a Adaptec SCSI Controller
> (Adaptec 7892A in /proc/pci) on a Gigabyte GA-6VTXD Dual Motherboard with
> two PIII and 2GB of RAM it crashed immediately with the error attached
> below. The machine was under "stresstest-simulation" load at this time.
>
> The tape_backup.pl uses the "mt" and "cpio" commands to access /dev/nst0.
>
> Maybe worth noting is, that the system crashed another time yesterday
> after replacing the external SCSI RAID Chassis/Controller (not the
> disks in it) and just this moment with another message (see below).

Could you try this patch and see if it fixes the pci.h BUG at least?

--
Jens Axboe


Attachments:
(No filename) (964.00 B)
sg-page-1 (327.00 B)
Download all attachments

2001-10-25 17:23:47

by Christian Hammers

[permalink] [raw]
Subject: Re: BUG() in asm/pci.h:142 with 2.4.13

Hello

On Thu, Oct 25, 2001 at 01:11:07PM +0200, Jens Axboe wrote:
> > 2.4.13 was the easiest one to reproduce: when starting the tape backup
> > to a HP DDS3/DAT Streamer (C1537A) via a Adaptec SCSI Controller
> > (Adaptec 7892A in /proc/pci) on a Gigabyte GA-6VTXD Dual Motherboard with
> > two PIII and 2GB of RAM it crashed immediately with the error attached
> > below. The machine was under "stresstest-simulation" load at this time.

> Could you try this patch and see if it fixes the pci.h BUG at least?
This patch did not prevent the crash. Again immediately after rewinding the
tape when it began to write. I'll try now the 2.4.12-ac6... and it works.

> Jens Axboe
bye,

-christian- (happy about Alan having forked the kernel tree once ago..)

--
Christian Hammers WESTEND GmbH - Aachen und Dueren Tel 0241/701333-0
[email protected] Internet & Security for Professionals Fax 0241/911879
WESTEND ist CISCO Systems Partner - Premium Certified

2001-10-25 17:32:47

by Jens Axboe

[permalink] [raw]
Subject: Re: BUG() in asm/pci.h:142 with 2.4.13

On Thu, Oct 25 2001, Christian Hammers wrote:
> Hello
>
> On Thu, Oct 25, 2001 at 01:11:07PM +0200, Jens Axboe wrote:
> > > 2.4.13 was the easiest one to reproduce: when starting the tape backup
> > > to a HP DDS3/DAT Streamer (C1537A) via a Adaptec SCSI Controller
> > > (Adaptec 7892A in /proc/pci) on a Gigabyte GA-6VTXD Dual Motherboard with
> > > two PIII and 2GB of RAM it crashed immediately with the error attached
> > > below. The machine was under "stresstest-simulation" load at this time.
>
> > Could you try this patch and see if it fixes the pci.h BUG at least?
> This patch did not prevent the crash. Again immediately after rewinding the
> tape when it began to write. I'll try now the 2.4.12-ac6... and it works.

Ok, someone else is meddling with the scatterlist then. I'll take a 2nd
look.

--
Jens Axboe

2001-10-25 17:47:08

by Christian Hammers

[permalink] [raw]
Subject: Re: BUG() in asm/pci.h:142 with 2.4.13

Hello

On Thu, Oct 25, 2001 at 07:32:48PM +0200, Jens Axboe wrote:
> > This patch did not prevent the crash. Again immediately after rewinding the
> > tape when it began to write. I'll try now the 2.4.12-ac6... and it works.
> Ok, someone else is meddling with the scatterlist then. I'll take a 2nd
> look.

The 2.4.12-ac6 crashed, too, when I killed the dd and cpio processes
with SIGKILL. I got the extra scsi queue debug information on the console
but it was too much to write down. I now have a serial connection to
another computer and did "ln /dev/ttyS1 /dev/console" in the hope to be
able to save you all kernel output.

-christian-

--
Christian Hammers WESTEND GmbH - Aachen und Dueren Tel 0241/701333-0
[email protected] Internet & Security for Professionals Fax 0241/911879
WESTEND ist CISCO Systems Partner - Premium Certified

2001-10-25 20:10:08

by Christian Hammers

[permalink] [raw]
Subject: Re: BUG() in asm/pci.h:142 with 2.4.13

Hello

Now it crashed again when writing the tape but after writing 1GB to it.
This time I could capture the output of the extra new-queue debugging
output that I enabled in the kernel configuration (this time 2.4.12-ac6).

For the linux-scsi guys: if you need more information please take a look
at my previous posts to linux-kernel (same subject) or contact me directly.

bye,

-christian-

On Thu, Oct 25, 2001 at 12:07:01PM +0200, Christian Hammers wrote:
> 2.4.13 was the easiest one to reproduce: when starting the tape backup
> to a HP DDS3/DAT Streamer (C1537A) via a Adaptec SCSI Controller
> (Adaptec 7892A in /proc/pci) on a Gigabyte GA-6VTXD Dual Motherboard with
> two PIII and 2GB of RAM it crashed immediately with the error attached
> below. The machine was under "stresstest-simulation" load at this time.

#
# console dump via minicom and serial line
#
/USR/SBIN/CRON[10591]: (root) CMD
(/usr/local/maint/watchdog)
kernel: scsi0:0:0:0: Attempting to queue an
ABORT message
kernel: scsi0: Dumping Card State while idle, at
SEQADDR 0x8
kernel: ACCUM = 0x0, SINDEX = 0x20, DINDEX =
0xe4, ARG_2 = 0x0
kernel: HCNT = 0x0
kernel: SCSISEQ = 0x12, SBLKCTL = 0xa
kernel: DFCNTRL = 0x0, DFSTATUS = 0x89
kernel: LASTPHASE = 0x1, SCSISIGI = 0x0,
SXFRCTL0 = 0x80
kernel: SSTAT0 = 0x0, SSTAT1 = 0x8
kernel: SCSIPHASE = 0x0
kernel: STACK == 0x3, 0x108, 0x160, 0x0
kernel: SCB count = 248
kernel: Kernel NEXTQSCB = 62
kernel: Card NEXTQSCB = 62
kernel: QINFIFO entries:
kernel: Waiting Queue entries:
kernel: Disconnected Queue entries: 6:150 12:210
2:178 29:28 23:181 13:61 24:7
kernel: QOUTFIFO entries:
kernel: Sequencer Free SCB List: 22 26 9 4 11 19
10 16 28 1 8 5 27 7 20 31 0 30
25 17 21 3 14 15 18
kernel: Pending list: 150, 210, 178, 28, 181,
61, 7
kernel: Kernel Free SCB list: 32 77 79 149 198
171 223 152 140 105 189 151 78 66
199 88 6 224 138 177 67 84 194 191 23 246 215 24 160 185 225 230 93 174 49
241 110 2 20 147 170 240 33 59
243 99 54 175 19 176 18 192 76 100 190 238 108 8 159 208 207 60 242 217 56
221 1 17 213 92 127 70 162 74 19
7 142 239 196 82 124 29 235 134 232 123 179 218 139 211 117 3 119 57 219
125 122 209 101 44 155 45 39 212 1
28 233 202 158 91 187 46 0 180 182 201 109 118 228 131 12 4 112 229 200 236
173 132 247 97 186 148 55 216 1
33 144 113 231 30 63 37 137 206 156 83 146 135 141 161 64 165 98 35 234 166
81 9 10 214 43 58 111 71 115 10
6 85 183 72 11 204 172 157 130 47 154 188 226 90 220 96 107 27 227 145 40
87 22 94 129 48 205 65 120 73 163
69 26 41 86 103 68 169 53 5 237 167 42 51 195 15 38 80 13 168 21 89 52 16
114 50 193 36 136 75 25 34 95 14
153 203 126 222 116 31 143 104 164 121 102 184 245 244
kernel: DevQ(0:0:0): 0 waiting
kernel: DevQ(0:2:0): 0 waiting
kernel: (scsi0:A:0:0): Queuing a recovery SCB
kernel: scsi0:0:0:0: Device is disconnected,
re-queuing SCB
kernel: Recovery code sleeping
kernel: (scsi0:A:0:0): Abort Tag Message Sent
kernel: (scsi0:A:0:0): SCB 7 - Abort Tag
Completed.
kernel: Recovery SCB completes
kernel: Recovery code awake
kernel: aic7xxx_abort returns 0x2002
sendmail[349]: rejecting connections on daemon
MTA: load average: 83
<80>x?x?<80>x?<80>x<?<80>xx<x?<80><80><80>?x<?x?<80>x?<80>x<??x??x??x<80><80><80><80><80>x???<80><80><80>


--
Christian Hammers WESTEND GmbH - Aachen und Dueren Tel 0241/701333-0
[email protected] Internet & Security for Professionals Fax 0241/911879
WESTEND ist CISCO Systems Partner - Premium Certified

2001-10-26 00:26:00

by David Miller

[permalink] [raw]
Subject: SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13)

From: Jens Axboe <[email protected]>
Date: Thu, 25 Oct 2001 19:32:48 +0200

On Thu, Oct 25 2001, Christian Hammers wrote:
> This patch did not prevent the crash. Again immediately after rewinding the
> tape when it began to write. I'll try now the 2.4.12-ac6... and it works.

Ok, someone else is meddling with the scatterlist then. I'll take a 2nd
look.

Can people try out this patch? I believe this will fix the bug.

--- drivers/scsi/st.c.~1~ Sun Oct 21 02:47:53 2001
+++ drivers/scsi/st.c Thu Oct 25 17:23:45 2001
@@ -3233,6 +3233,7 @@
break;
}
}
+ tb->sg[0].page = NULL;
if (tb->sg[segs].address == NULL) {
kfree(tb);
tb = NULL;
@@ -3264,6 +3265,7 @@
tb = NULL;
break;
}
+ tb->sg[segs].page = NULL;
tb->sg[segs].length = b_size;
got += b_size;
segs++;
@@ -3337,6 +3339,7 @@
normalize_buffer(STbuffer);
return FALSE;
}
+ STbuffer->sg[segs].page = NULL;
STbuffer->sg[segs].length = b_size;
STbuffer->sg_segs += 1;
got += b_size;

2001-10-26 01:24:24

by Jeff V. Merkey

[permalink] [raw]
Subject: Re: SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13)


David,

Is this waht's causing the earlier bug I reported in 2.4.10? If so
where is this patch so I can see if it fixes the problem.

Thanks,

Jeff


On Thu, Oct 25, 2001 at 05:25:41PM -0700, David S. Miller wrote:
> From: Jens Axboe <[email protected]>
> Date: Thu, 25 Oct 2001 19:32:48 +0200
>
> On Thu, Oct 25 2001, Christian Hammers wrote:
> > This patch did not prevent the crash. Again immediately after rewinding the
> > tape when it began to write. I'll try now the 2.4.12-ac6... and it works.
>
> Ok, someone else is meddling with the scatterlist then. I'll take a 2nd
> look.
>
> Can people try out this patch? I believe this will fix the bug.
>
> --- drivers/scsi/st.c.~1~ Sun Oct 21 02:47:53 2001
> +++ drivers/scsi/st.c Thu Oct 25 17:23:45 2001
> @@ -3233,6 +3233,7 @@
> break;
> }
> }
> + tb->sg[0].page = NULL;
> if (tb->sg[segs].address == NULL) {
> kfree(tb);
> tb = NULL;
> @@ -3264,6 +3265,7 @@
> tb = NULL;
> break;
> }
> + tb->sg[segs].page = NULL;
> tb->sg[segs].length = b_size;
> got += b_size;
> segs++;
> @@ -3337,6 +3339,7 @@
> normalize_buffer(STbuffer);
> return FALSE;
> }
> + STbuffer->sg[segs].page = NULL;
> STbuffer->sg[segs].length = b_size;
> STbuffer->sg_segs += 1;
> got += b_size;
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2001-10-26 01:33:37

by Christian Hammers

[permalink] [raw]
Subject: Re: SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13)

Hi

Some addition: the kernel (at least the 2.4.11-pre6 worked well with the
tape streamer before the day I replaced the external RAID chassis (broken
display) with a new one. My personal guess in this case is that the
new RAID has a different firmware and maybe a bug that triggers the crash-
condition whenever a second device (here the scsi tape) tries to use the
bus, too.

Would this scenario fit into your idea of the bug?

bye,

-christian-

On Thu, Oct 25, 2001 at 07:26:48PM -0700, Jeff V. Merkey wrote:
> > Ok, someone else is meddling with the scatterlist then. I'll take a 2nd
> > look.
> >
> > Can people try out this patch? I believe this will fix the bug.
> >
> > --- drivers/scsi/st.c.~1~ Sun Oct 21 02:47:53 2001
> > +++ drivers/scsi/st.c Thu Oct 25 17:23:45 2001
> > @@ -3233,6 +3233,7 @@

--
Christian Hammers WESTEND GmbH - Aachen und Dueren Tel 0241/701333-0
[email protected] Internet & Security for Professionals Fax 0241/911879
WESTEND ist CISCO Systems Partner - Premium Certified

2001-10-26 01:32:34

by David Miller

[permalink] [raw]
Subject: Re: SCSI tape crashes

From: "Jeff V. Merkey" <[email protected]>
Date: Thu, 25 Oct 2001 19:26:48 -0700

Is this waht's causing the earlier bug I reported in 2.4.10? If so
where is this patch so I can see if it fixes the problem.

The patch was in the email, can't you read? :-)
(You even quoted the patch in your reply!)

Anyways, my patch isn't relevant to your problem since the
bug I am fixing only can exist in 2.4.13 and later kernels.
Sorry.


On Thu, Oct 25, 2001 at 05:25:41PM -0700, David S. Miller wrote:
> Can people try out this patch? I believe this will fix the bug.
>
> --- drivers/scsi/st.c.~1~ Sun Oct 21 02:47:53 2001
> +++ drivers/scsi/st.c Thu Oct 25 17:23:45 2001

Franks a lot,
David S. Miller
[email protected]

2001-10-26 02:53:30

by Jeff V. Merkey

[permalink] [raw]
Subject: Re: SCSI tape crashes



David,

Thanks. I'll wait to hear from Kai since it looks like something
related to his code. I did find the patch, but was using mutt,
and you were correct, it was attached at the end of the email.

:-)

Jeff


On Thu, Oct 25, 2001 at 06:32:48PM -0700, David S. Miller wrote:
> From: "Jeff V. Merkey" <[email protected]>
> Date: Thu, 25 Oct 2001 19:26:48 -0700
>
> Is this waht's causing the earlier bug I reported in 2.4.10? If so
> where is this patch so I can see if it fixes the problem.
>
> The patch was in the email, can't you read? :-)
> (You even quoted the patch in your reply!)
>
> Anyways, my patch isn't relevant to your problem since the
> bug I am fixing only can exist in 2.4.13 and later kernels.
> Sorry.
>
>
> On Thu, Oct 25, 2001 at 05:25:41PM -0700, David S. Miller wrote:
> > Can people try out this patch? I believe this will fix the bug.
> >
> > --- drivers/scsi/st.c.~1~ Sun Oct 21 02:47:53 2001
> > +++ drivers/scsi/st.c Thu Oct 25 17:23:45 2001
>
> Franks a lot,
> David S. Miller
> [email protected]

2001-10-28 01:34:07

by Pete Harlan

[permalink] [raw]
Subject: Re: SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13)

On Thu, Oct 25, 2001 at 05:25:41PM -0700, David S. Miller wrote:
> From: Jens Axboe <[email protected]>
> Date: Thu, 25 Oct 2001 19:32:48 +0200
>
> On Thu, Oct 25 2001, Christian Hammers wrote:
> > This patch did not prevent the crash. Again immediately after
> > rewinding the
> > tape when it began to write. I'll try now the 2.4.12-ac6... and
> > it works.
>
> Ok, someone else is meddling with the scatterlist then. I'll take a 2nd
> look.
>
> Can people try out this patch? I believe this will fix the bug.

Yessiree, that fixed the scsi tape lockups we had in 2.4.13.

Many thanks,

--Pete [email protected]


> --- drivers/scsi/st.c.~1~ Sun Oct 21 02:47:53 2001
> +++ drivers/scsi/st.c Thu Oct 25 17:23:45 2001
> @@ -3233,6 +3233,7 @@
> break;
> }
> }
> + tb->sg[0].page = NULL;
> if (tb->sg[segs].address == NULL) {
> kfree(tb);
> tb = NULL;
> @@ -3264,6 +3265,7 @@
> tb = NULL;
> break;
> }
> + tb->sg[segs].page = NULL;
> tb->sg[segs].length = b_size;
> got += b_size;
> segs++;
> @@ -3337,6 +3339,7 @@
> normalize_buffer(STbuffer);
> return FALSE;
> }
> + STbuffer->sg[segs].page = NULL;
> STbuffer->sg[segs].length = b_size;
> STbuffer->sg_segs += 1;
> got += b_size;

2001-10-30 14:25:50

by Christian Hammers

[permalink] [raw]
Subject: Re: BUG() in asm/pci.h:142 with 2.4.13 (cause found!)

Hello

The cause for my problems with crashing kernels when accessing the tape
drive were differences between the original external RAID that belongs
to the machine and a temporarily, nearly equal, RAID that we attached while
the original was send away for repair.
The temp. RAID was Ultra3 160 and /proc/scsi/aic7xxx/0 showed me a cur.
speed of 160 while the "old" and working one only did Ultra2 with 80Mbit/s.

I don't know if it's a hardware incompatibility or if the Linux kernel
drivers cannot handle this specific case.

The problem exists in 2.4.11-pre6, 2.4.13, 2.4.12-ac6 and 2.4.13 with
patched from axboa(?) and D. Miller.

bye,

-christian-


On Thu, Oct 25, 2001 at 12:07:01PM +0200, Christian Hammers wrote:
...
> 2.4.13 was the easiest one to reproduce: when starting the tape backup
> to a HP DDS3/DAT Streamer (C1537A) via a Adaptec SCSI Controller
> (Adaptec 7892A in /proc/pci) on a Gigabyte GA-6VTXD Dual Motherboard with
> two PIII and 2GB of RAM it crashed immediately with the error attached
> below. The machine was under "stresstest-simulation" load at this time.
...
> kernel: kernel BUG at /usr/local/src/kernel/linux-2.4.13/include/asm/pci.h:142!
...
> kernel: scsi0:0:0:0: Attempting to queue an ABORT message
> kernel: (scsi0:A:0:0): Queuing a recovery SCB
> kernel: scsi0:0:0:0: Device is disconnected, re-queuing SCB
...

--
Christian Hammers WESTEND GmbH - Aachen und Dueren Tel 0241/701333-0
[email protected] Internet & Security for Professionals Fax 0241/911879
WESTEND ist CISCO Systems Partner - Premium Certified

2001-10-30 16:57:45

by Dan Maas

[permalink] [raw]
Subject: Re: SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13)

> Can people try out this patch? I believe this will fix the bug.
> + tb->sg[0].page = NULL;
> if (tb->sg[segs].address == NULL) {

For the sake of making this clear to other kernel hackers (I got bitten by
it too) - starting with 2.4.13 you must zero out the fields of struct
scatterlist that you are not using. i.e. it is no longer sufficient to
simply set sg.address and sg.length, because junk might still be present in
the new sg.page field, and pci_map_*() will BUG() if both sg.address and
sg.page are non-zero.

Regards,
Dan


2001-10-31 08:34:45

by Jens Axboe

[permalink] [raw]
Subject: Re: SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13)

On Tue, Oct 30 2001, Dan Maas wrote:
> > Can people try out this patch? I believe this will fix the bug.
> > + tb->sg[0].page = NULL;
> > if (tb->sg[segs].address == NULL) {
>
> For the sake of making this clear to other kernel hackers (I got bitten by
> it too) - starting with 2.4.13 you must zero out the fields of struct
> scatterlist that you are not using. i.e. it is no longer sufficient to
> simply set sg.address and sg.length, because junk might still be present in
> the new sg.page field, and pci_map_*() will BUG() if both sg.address and
> sg.page are non-zero.

True, perhaps we should add a init_sg or something like that.

--
Jens Axboe

2001-10-31 08:53:38

by Jens Axboe

[permalink] [raw]
Subject: Re: SCSI tape crashes

On Wed, Oct 31 2001, David S. Miller wrote:
> From: Jens Axboe <[email protected]>
> Date: Wed, 31 Oct 2001 09:46:50 +0100
>
> How is this different from my init_sg proposal? :-). The above looks
> fine to me, I just think it is important that we take care of this.
>
> My impression was that you wanted to do something like:
>
> sgl = kmalloc();
> init_sg(sgl, nents);
>
> for_each_ent() {
> ... existing code ..
> }
>
> If what you really meant was semantically identicaly to what I have
> proposed, I'm completely in agreement with it. Please submit patches.

Yep, init_sg_entry is probably a better name... I'll hack up and submit
soonish.

--
Jens Axboe