Hi All,
Qlogic SCSI support seems broken on 2.4.0-test11 on a Miata (Digital Personal WorkStation 600au).
When starting up, we get a machine check after initializing the qlogic SCSI code.
Using the Alpha kgdb, we figured out that the code is dying in scsi_wait_req().
Here's the backtrace:
scsi_wait_req (SRpnt=0xfffffc0001f9b480, cmnd=0xfffffc890000a078,
buffer=0x100, bufflen=2, timeout=17891584, retries=6144)
at /usr/src/linux/include/asm/atomic.h:85
(gdb) where
#0 scsi_wait_req (SRpnt=0xfffffc0001f9b480, cmnd=0xfffffc890000a078,
buffer=0x100, bufflen=2, timeout=17891584, retries=6144)
at /usr/src/linux/include/asm/atomic.h:85
#1 0xfffffc00004107f0 in scan_scsis_single (channel=0, dev=41080, lun=0,
max_dev_lun=0xfffffc00001efa30, sparse_lun=0xfffffc00001efa34,
SDpnt2=0xfffffc00001efa38, shpnt=0xfffffc00005ff800,
scsi_result=0xfffffc00001ef930 "\001") at scsi_scan.c:516
#2 0xfffffc0000410548 in scan_scsis (shpnt=0xfffffc00005ff800, hardcoded=1,
hchannel=0, hid=0, hlun=0) at scsi_scan.c:403
#3 0xfffffc0000404f58 in scsi_register_host (tpnt=0xfffffc000058fb80)
at scsi.c:1904
#4 0xfffffc00004dac50 in init_this_scsi_driver ()
#5 0xfffffc00004c2bec in do_initcalls ()
#6 0xfffffc00004c2c6c in do_basic_setup ()
#7 0xfffffc0000310078 in init (unused=0x0) at init/main.c:775
Note: On the working kernels, the two controllers are 0x800 apart, but
on the broken kernels, they are only 0x400. Could the overlap
cause problems?
Working: 2.2.14-6.0: (from 6.2 Redhat)
qlogicisp : new isp1020 revision ID (5)
qlogicisp : new isp1020 revision ID (5)
scsi0 : QLogic ISP1020 SCSI on PCI bus 01 device 20 irq 27 I/O base 0x9000
scsi1 : QLogic ISP1020 SCSI on PCI bus 01 device 48 irq 40 I/O base 0x9800
scsi : 2 hosts.
Vendor: DEC Model: RZ1CB-BA (C) DEC Rev: LYE0
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
Vendor: DEC Model: RZ28D (C) DEC Rev: 0008
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdb at scsi0, channel 0, id 1, lun 0
Vendor: DEC Model: RZ1BB-BA (C) DEC Rev: LYE0
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdc at scsi0, channel 0, id 2, lun 0
scsi : detected 3 SCSI disks total.
Working: vmlinux-2.2.17-4 (from 7.0 Redhat)
qlogicisp : new isp1020 revision ID (5)
qlogicisp : new isp1020 revision ID (5)
DC390: 0 adapters found
scsi0 : QLogic ISP1020 SCSI on PCI bus 01 device 20 irq 27 I/O base 0xa000
scsi1 : QLogic ISP1020 SCSI on PCI bus 01 device 48 irq 40 I/O base 0xa800
scsi : 2 hosts.
Vendor: DEC Model: RZ1CB-BA (C) DEC Rev: LYE0
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
Vendor: DEC Model: RZ28D (C) DEC Rev: 0008
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdb at scsi0, channel 0, id 1, lun 0
Vendor: DEC Model: RZ1BB-BA (C) DEC Rev: LYE0
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdc at scsi0, channel 0, id 2, lun 0
scsi : detected 3 SCSI disks total.
SCSI device sda: hdwr sector= 512 bytes. Sectors= 8380080 [4091 MB] [4.1 GB]
SCSI device sdb: hdwr sector= 512 bytes. Sectors= 4110480 [2007 MB] [2.0 GB]
SCSI device sdc: hdwr sector= 512 bytes. Sectors= 4110480 [2007 MB] [2.0 GB]
Broken 2.4.0-test11: (gcc version 2.96 20000731 (Red Hat Linux 7.0))
SCSI subsystem driver Revision: 1.00
qlogicisp : new isp1020 revision ID (5)
qlogicisp : new isp1020 revision ID (5)
scsi0 : QLogic ISP1020 SCSI on PCI bus 01 device 20 irq 27 I/O base 0xa000
scsi1 : QLogic ISP1020 SCSI on PCI bus 01 device 48 irq 40 I/O base 0xa400
CIA machine check: vector=0x660 pc=0xfffffc0000312644 code=0x813
machine check type: unknown
pc = [<fffffc0000312644>] ra = [<fffffc0000312660>] ps = 0000
v0 = 0000000000000000 t0 = 0000000000000000 t1 = fffffc00005d8b20
t2 = 0000000000000001 t3 = 0000000000000001 t4 = fffffc000057a110
t5 = fffffffffffffc18 t6 = 000000000000451d t7 = fffffc0000520000
a0 = 0000000000000019 a1 = 0000000000000032 a2 = fffffc000035d5cc
a3 = 0000000000000002 a4 = fffffc0000544080 a5 = fffffc000057a110
t8 = 0000000000000000 t9 = 00000000f96329ef t10= 0000000000000000
t11= 0000000000000001 pv = fffffc0000329f80 at = fffffc0000520000
gp = fffffc000059dd88 sp = fffffc0000523fd0
scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 0, lun 0 0
scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 0, lun 0 0
scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 0, lun 0 0
scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 0, lun 0 0
Broken 2.4.0-test11: (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release))
SCSI subsystem driver Revision: 1.00
qlogicisp : new isp1020 revision ID (5)
qlogicisp : new isp1020 revision ID (5)
scsi0 : QLogic ISP1020 SCSI on PCI bus 01 device 20 irq 27 I/O base 0xa000
scsi1 : QLogic ISP1020 SCSI on PCI bus 01 device 48 irq 40 I/O base 0xa400
CIA machine check: vector=0x660 pc=0xfffffc0000312464 code=0x813
machine check type: unknown
pc = [<fffffc0000312464>] ra = [<fffffc0000312480>] ps = 0000
v0 = 0000000000000000 t0 = 0000000000000000 t1 = 0000000000000001
t2 = 0000000000000001 t3 = fffffc0000562850 t4 = fffffc0000562850
t5 = fffffffffffffc18 t6 = fffffc00005613d0 t7 = fffffc0000508000
a0 = 0000000000000019 a1 = 0000000000000032 a2 = fffffc000035b478
a3 = 0000000000000002 a4 = fffffc00003f8880 a5 = 0000000000001800
t8 = 0000000000000000 t9 = 000000001feee829 t10= 0000000000000000
t11= ffff00ff00000012 pv = fffffc0000329f80 at = fffffc000052c080
gp = fffffc0000585ed8 sp = fffffc000050bfd0
scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 0, lun 0 0
scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 0, lun 0 0
scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 0, lun 0 0
scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 0, lun 0 0
Thanks,
--Phil & Bill
Compaq: High Performance Server Division/Benchmark Performance Engineering
---------------- Alpha, The Fastest Processor on Earth --------------------
[email protected] |C|O|M|P|A|Q| [email protected]
------------------- See the results at http://www.spec.org -----------------------
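The overlap question in the note above can be checked with simple range arithmetic. The sketch below is only an illustration: the struct and function names are hypothetical, and the 0x100-byte length per controller is an assumption taken from the "got res[9000:90ff] ... Q Logic ISP1020" line in the dmesg output later in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper type: an I/O port range given by base and length. */
struct io_range {
    unsigned long base;
    unsigned long len;   /* length in bytes; assumed 0x100 for the ISP1020 */
};

/* Two half-open ranges [base, base + len) overlap exactly when each one
 * starts before the other one ends. */
static bool io_ranges_overlap(struct io_range a, struct io_range b)
{
    assert(a.len > 0 && b.len > 0);
    return a.base < b.base + b.len && b.base < a.base + a.len;
}
```

Under that assumption, bases 0xa000 and 0xa400 give [0xa000, 0xa100) and [0xa400, 0xa500), which do not overlap, so the tighter 0x400 spacing alone would not explain the machine check.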
Hi Phil,
Phillip Ezolt wrote:
> Hi All,
>
> Qlogic SCSI support seems broken on 2.4.0-test11 on a Miata (Digital Personal WorkStation 600au).
>
> When starting up, we get a machine check after initializing the qlogic SCSI code.
>
> Using the Alpha kgdb, we figured out that the code is dying in scsi_wait_req().
Wow, I'm impressed! I didn't realize that kgdb worked on Alpha...Were
you using the remote kgdb? (You can answer me offline to save
bandwidth.) This would be a _huge_ help in trying to figure out why my
Wildfire^WGS160 is crashing with the DISCONTIGMEM code that I stole from
Jay and have been hacking on.
Speaking of that system, it has two QLogic adapters in it (both 1040Bs,
like the Miata), and they are working just fine under 2.4.0-test11
(obviously, without my changes ;). It looks like it's probably the
platform code that's busted. I can't remember...are those Pyxis or
CIA? Anyway, could this have something to do with the PCI & PCI bridge
work that Richard and Ivan just submitted?
- Pete
>
> Here's the backtrace:
>
> scsi_wait_req (SRpnt=0xfffffc0001f9b480, cmnd=0xfffffc890000a078,
> buffer=0x100, bufflen=2, timeout=17891584, retries=6144)
> at /usr/src/linux/include/asm/atomic.h:85
> (gdb) where
> #0 scsi_wait_req (SRpnt=0xfffffc0001f9b480, cmnd=0xfffffc890000a078,
> buffer=0x100, bufflen=2, timeout=17891584, retries=6144)
> at /usr/src/linux/include/asm/atomic.h:85
> #1 0xfffffc00004107f0 in scan_scsis_single (channel=0, dev=41080, lun=0,
> max_dev_lun=0xfffffc00001efa30, sparse_lun=0xfffffc00001efa34,
> SDpnt2=0xfffffc00001efa38, shpnt=0xfffffc00005ff800,
> scsi_result=0xfffffc00001ef930 "\001") at scsi_scan.c:516
> #2 0xfffffc0000410548 in scan_scsis (shpnt=0xfffffc00005ff800, hardcoded=1,
> hchannel=0, hid=0, hlun=0) at scsi_scan.c:403
> #3 0xfffffc0000404f58 in scsi_register_host (tpnt=0xfffffc000058fb80)
> at scsi.c:1904
> #4 0xfffffc00004dac50 in init_this_scsi_driver ()
> #5 0xfffffc00004c2bec in do_initcalls ()
> #6 0xfffffc00004c2c6c in do_basic_setup ()
> #7 0xfffffc0000310078 in init (unused=0x0) at init/main.c:775
>
>
> Note: On the working kernels, the two controllers are 0x800 apart, but
> on the broken kernels, they are only 0x400. Could the overlap
> cause problems?
>
>
It's probably due to the semaphore changes that Richard Henderson made
in test11. scsi_wait_req will grab one on entry. Did test10
work for you on Alpha?
Regards,
Torben Mathiasen
> -----Original Message-----
> From: Rival, Frank
> Sent: 30. november 2000 21:37
> To: Ezolt, Phillip
> Cc: [email protected]; [email protected]; [email protected];
> Estabrook, Jay; [email protected]; [email protected];
> [email protected]
> Subject: Re: Alpha SCSI error on 2.4.0-test11
On Thu, Nov 30, 2000 at 03:02:42PM -0500, Phillip Ezolt wrote:
> Qlogic SCSI support seems broken on 2.4.0-test11 on a Miata (Digital Personal WorkStation 600au).
>
> When starting up, we get a machine check after initializing the qlogic SCSI code.
Try test12-pre3; it has the new PCI init code. It works (to some degree)
on an AS1000A with the same QLogic SCSI controller.
Ivan.
Ivan,
We dug a little deeper, and think that we found the problem.
The error occurs if we have more than 1024 MB of memory in the machine.
With less than 1024 MB, the machine behaves correctly.
(This is a 600 MHz Digital Personal Workstation.)
Once again, the 2.2 kernel in RH 7.0 behaves properly.
I'll give test12-pre3 a try and see if it fixes things.
Thanks,
--Phil
On Thu, Nov 30, 2000 at 05:26:58PM -0500, Phillip Ezolt wrote:
> I'll give test12-pre3 a try and see if it fixes things.
test12-pre2 crashes at boot on my DS20. This patch works around the problem,
but I would be _very_ surprised if this is the right fix :) It's obviously not
meant for inclusion.
--- 2.4.0-test12-pre2-alpha/drivers/pci/setup-res.c.~1~ Tue Nov 28 18:40:29 2000
+++ 2.4.0-test12-pre2-alpha/drivers/pci/setup-res.c Wed Nov 29 03:15:45 2000
@@ -148,8 +148,11 @@
continue;
for (list = head; ; list = list->next) {
unsigned long size = 0;
- struct resource_list *ln = list->next;
+ struct resource_list *ln;
+ if (!list)
+ return;
+ ln = list->next;
if (ln)
size = ln->res->end - ln->res->start;
if (r->end - r->start > size) {
I prefer to finish the ASN SMP rework before looking into this.
Andrea
On Thu, Nov 30, 2000 at 11:37:42PM +0100, Andrea Arcangeli wrote:
> test12-pre2 crashes at boot on my DS20. This patch workaround the problem
> but I would be _very_ surprised if this is the right fix :) It's obviously not
> meant for inclusion.
...
> - struct resource_list *ln = list->next;
> + struct resource_list *ln;
>
> + if (!list)
> + return;
> + ln = list->next;
Argh. I believe that crash could happen only if some broken device has an
empty I/O or memory range with the IORESOURCE_[IO,MEM] bit set.
Andrea, could you try this?
Ivan.
--- linux/drivers/pci/setup-res.c~ Thu Nov 30 12:14:31 2000
+++ linux/drivers/pci/setup-res.c Fri Dec 1 13:49:34 2000
@@ -136,6 +136,7 @@ pdev_sort_resources(struct pci_dev *dev,
for (i = 0; i < PCI_NUM_RESOURCES; i++) {
struct resource *r;
struct resource_list *list, *tmp;
+ unsigned long r_size;
/* PCI-PCI bridges may have I/O ports or
memory on the primary bus */
@@ -144,7 +145,9 @@ pdev_sort_resources(struct pci_dev *dev,
continue;
r = &dev->resource[i];
- if (!(r->flags & type_mask) || r->parent)
+ r_size = r->end - r->start;
+
+ if (!(r->flags & type_mask) || !r_size || r->parent)
continue;
for (list = head; ; list = list->next) {
unsigned long size = 0;
@@ -152,7 +155,7 @@ pdev_sort_resources(struct pci_dev *dev,
if (ln)
size = ln->res->end - ln->res->start;
- if (r->end - r->start > size) {
+ if (r_size > size) {
tmp = kmalloc(sizeof(*tmp), GFP_KERNEL);
tmp->next = ln;
tmp->res = r;
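To see why an empty range walks off the end of the list, here is a user-space sketch of the same sorted insert. It is not the kernel code: the type and function names are hypothetical, malloc stands in for kmalloc, and the list is assumed to start at a dummy head node as in the patch context.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct resource / struct resource_list. */
struct res { unsigned long start, end; };
struct res_node { struct res *res; struct res_node *next; };

/* Insert r into the list behind the dummy head, largest size first.
 * Returns 0 if inserted, 1 if skipped as empty, -1 on allocation failure.
 *
 * Without the !r_size check, an empty resource never satisfies
 * "r_size > size" at the tail (0 > 0 is false), so the loop would step to
 * list = NULL and then dereference it. */
static int sort_insert(struct res_node *head, struct res *r)
{
    unsigned long r_size = r->end - r->start;
    struct res_node *list, *tmp;

    assert(head != NULL);
    if (!r_size)
        return 1;

    for (list = head; ; list = list->next) {
        struct res_node *ln = list->next;
        unsigned long size = 0;

        if (ln)
            size = ln->res->end - ln->res->start;
        if (r_size > size) {
            tmp = malloc(sizeof(*tmp));
            if (!tmp)
                return -1;
            tmp->res = r;
            tmp->next = ln;
            list->next = tmp;
            return 0;
        }
    }
}
```

With the size check in place, every resource that reaches the loop has r_size > 0, so the comparison against size = 0 at the tail always succeeds and the loop terminates there at the latest.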
On Fri, Dec 01, 2000 at 02:56:19PM +0300, Ivan Kokshaysky wrote:
> Andrea, could you try this?
That's the right fix, thanks (please send it to Linus).
BTW, here is a preview of the asn SMP race fix for 2.4.x:
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.0-test12-pre2/alpha-ASN-SMP-races-2.4.x-1
I've still left the #ifdef __alpha__ around the context[NR_CPUS] to avoid
breaking other archs, but it should probably be removed: any CPU with
per-CPU ASNs like alpha needs per-CPU per-MM context information to avoid
wasting ASNs when a task migrates between CPUs or with threads.
The ASN race fix for 2.4.x is implemented differently from the previous 2.2.x
version: in 2.4.x I'm avoiding the __cli, so the whole context switch runs with
irqs _enabled_ as usual (unlike in the 2.2.x version). I'm also taking care not
to waste more ASNs than strictly necessary while doing the race check after the
context switch has completed.
And here is a new version of the 2.2.x one (I was clearing every other CPU's
context from activate_context, which wasn't necessary but couldn't hurt, so
it's a minor update):
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.18pre23/alpha-ASN-SMP-races-3
(The cli-less logic could be backported to 2.2.x, but OTOH the cli way looks
simpler and so more appropriate for 2.2.x.)
Andrea
Ivan,
I have tried test12-pre3 with and without your patch; it fails in the same
way either way.
The Qlogic SCSI controller continues to fail if we have >1 gig in the machine.
(It works fine with less.)
Any ideas? (Or patches that I can test... ;-) )
Thanks,
--Phil
Date: Fri, 1 Dec 2000 15:18:42 +0100
From: Andrea Arcangeli <[email protected]>
I'm still left the #ifdef __alpha__ around the context[NR_CPUS] to
avoid breakage of other archs but that should be probably removed:
any CPU with per-CPU ASNs like alpha needs per-CPU per-MM context
information to avoid wasting ASNs when the task migrate CPU or with
threads.
I would instead suggest declaring 'context' to be of an arch-specific
type, much like "thread_struct" is.
For example, I don't need NR_CPUS contexts in the mm_struct on
sparc64, my allocation just works differently, so I shouldn't eat
all the space.
Later,
David S. Miller
[email protected]
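The arch-specific context type suggested above could look roughly like the following. This is a sketch only: every name here (SKETCH_NR_CPUS, SKETCH_ARCH_PER_CPU_ASN, sketch_mm_context_t, sketch_mm, sketch_context_slot) is hypothetical and chosen for illustration, not taken from any kernel tree.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 4   /* hypothetical CPU count for this sketch */

/* Each architecture picks its own context layout. An arch with per-CPU
 * ASNs would build with SKETCH_ARCH_PER_CPU_ASN defined; others keep a
 * single word and pay no per-CPU space cost. */
#ifdef SKETCH_ARCH_PER_CPU_ASN
typedef struct { unsigned long asn[SKETCH_NR_CPUS]; } sketch_mm_context_t;
#else
typedef struct { unsigned long ctx; } sketch_mm_context_t;
#endif

/* Generic code only ever sees the opaque type. */
struct sketch_mm {
    sketch_mm_context_t context;
};

/* Return the context word that applies to the given CPU. */
static unsigned long *sketch_context_slot(struct sketch_mm *mm, unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
#ifdef SKETCH_ARCH_PER_CPU_ASN
    return &mm->context.asn[cpu];
#else
    return &mm->context.ctx;
#endif
}
```

The point of the design is that the generic structure no longer needs an #ifdef __alpha__: the size of the field follows from whichever typedef the architecture supplies.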
On Fri, Dec 01, 2000 at 10:19:44AM -0800, David S. Miller wrote:
> I'm still left the #ifdef __alpha__ around the context[NR_CPUS] to
> avoid breakage of other archs but that should be probably removed:
> any CPU with per-CPU ASNs like alpha needs per-CPU per-MM context
> information to avoid wasting ASNs when the task migrate CPU or with
> threads.
>
> I would instead suggest to declare 'context' to be of an arch-specific
> defined type, much like "thread_struct" is.
I agree; it really should have been that way from the start, because the 4
bytes of context are just a waste for x86* :). I mainly wanted to make sure
other archs were doing the right thing too.
> For example, I don't need NR_CPUS contexts in the mm_struct on
> sparc64, my allocation just works differently, so I shouldn't eat
> all the space.
I think at least mips wants to use per-mm per-cpu context too btw.
Andrea
On Fri, Dec 01, 2000 at 01:30:10PM -0500, Phillip Ezolt wrote:
> Any ideas? (Or patches that I can test... ;-) )
The Miata seems to use the CIA southbridge, so it should set up a 2G IOMMU
direct mapping. So maybe the second window, between 1G and 2G, isn't set up
correctly? Does the qlogic driver work on a Tsunami southbridge?
Andrea
Andrea,
> large 2G. So it's maybe the second window between 1G and 2G that isn't set
> correctly?
Which data structures should I look at? What should I investigate to
verify this?
> Does the qlogic driver works on a tsunami southbridge?
What would I have to do to test this? I have an ES40 and 3 Miatas
at my disposal.
Thanks,
--Phil
On Fri, Dec 01, 2000 at 02:56:43PM -0500, Phillip Ezolt wrote:
> What data structure's would I look at? What should I investigate to
> verify this?
The relevant code is in arch/alpha/kernel/core_cia.c
> What would I have to do to test this? I have an ES40 & 3 miata's
Does the qlogic driver work well on an ES40 with more than 1G of RAM? If
so, the qlogic driver should be OK.
Andrea
On Fri, Dec 01, 2000 at 02:56:43PM -0500, Phillip Ezolt wrote:
> What data structure's would I look at? What should I investigate to
> verify this?
In arch/alpha/kernel/pci_iommu.c, change
#define DEBUG_ALLOC 0
to
#define DEBUG_ALLOC 2
Perhaps this will give us more info.
At first glance, window 1 is being set up properly.
Ivan.
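For readers unfamiliar with this kind of switch, a compile-time debug level typically gates tracing macros along the following lines. This is a user-space sketch, not the contents of pci_iommu.c: the macro and function names are hypothetical, and fprintf stands in for printk.

```c
#include <assert.h>
#include <stdio.h>

#define DEBUG_ALLOC 2   /* 0 = silent, 1 = basic tracing, 2 = verbose */

/* Hypothetical tracing macros: each level compiles to nothing unless the
 * debug level is high enough, so a silent build pays no runtime cost. */
#if DEBUG_ALLOC > 0
#define SKETCH_DBG(...)  fprintf(stderr, __VA_ARGS__)
#else
#define SKETCH_DBG(...)  ((void)0)
#endif

#if DEBUG_ALLOC > 1
#define SKETCH_DBG2(...) fprintf(stderr, __VA_ARGS__)
#else
#define SKETCH_DBG2(...) ((void)0)
#endif

/* Trace one mapping at the verbose level and report the active level. */
static int sketch_log_mapping(unsigned long cpu_addr, unsigned long size,
                              unsigned long bus_addr)
{
    assert(size > 0);
    SKETCH_DBG2("map_single: [%lx,%lx] -> direct %lx\n",
                cpu_addr, size, bus_addr);
    return DEBUG_ALLOC;
}
```

Raising the level from 0 to 2 therefore changes only what is compiled in, which is why a rebuild is needed before the extra messages appear.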
Ivan,
I've recompiled as you have suggested. Any ideas?
Here is my dmesg output:
Linux version 2.4.0-test12 ([email protected]) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #3 Mon Dec 4 02:38:18 EST 2000
Booting GENERIC on Miata using machine vector Miata from SRM
Command line: console=tty0 console=ttyS0,9600 root=/dev/fd0
memcluster 0, usage 1, start 0, end 236
memcluster 1, usage 0, start 236, end 147455
memcluster 2, usage 1, start 147455, end 147456
freeing pages 236:384
freeing pages 754:147455
pci: cia revision 1 (pyxis)
On node 0 totalpages: 147456
zone(0): 147456 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Kernel command line: console=tty0 console=ttyS0,9600 root=/dev/fd0
Using epoch = 1952
Console: colour VGA+ 80x25
Calibrating delay loop... 1191.18 BogoMIPS
Memory: 1155136k/1179640k available (1602k kernel code, 22616k reserved, 515k data, 376k init)
Dentry-cache hash table entries: 262144 (order: 9, 4194304 bytes)
Buffer-cache hash table entries: 65536 (order: 6, 524288 bytes)
Page-cache hash table entries: 262144 (order: 8, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 8, 2097152 bytes)
POSIX conformance testing by UNIFIX
pci: passed tb register update test
pci: passed sg loopback i/o read test
pci: passed tbia test
pci: passed pte write cache snoop test
pci: failed valid tag invalid pte reload test (mcheck; workaround available)
pci: passed pci machine check test
got res[8000:807f] for resource 0 of Digital Equipment Corporation DECchip 21142/43
got res[8080:80ff] for resource 1 of Digital Equipment Corporation DEFPA
got res[8400:840f] for resource 4 of Contaq Microsystems 82c693 (#2)
got res[9000000:97fffff] for resource 1 of Matrox Graphics, Inc. MGA 2064W [Millennium]
got res[9800000:983ffff] for resource 6 of Digital Equipment Corporation DECchip 21142/43
got res[9840000:984ffff] for resource 4 of Contaq Microsystems 82c693 (#3)
got res[9850000:985ffff] for resource 6 of Matrox Graphics, Inc. MGA 2064W [Millennium]
got res[9860000:986ffff] for resource 2 of Digital Equipment Corporation DEFPA
got res[9870000:9873fff] for resource 0 of Matrox Graphics, Inc. MGA 2064W [Millennium]
got res[9874000:9874fff] for resource 0 of Contaq Microsystems 82c693 (#4)
got res[9875000:987507f] for resource 1 of Digital Equipment Corporation DECchip 21142/43
got res[9876000:987607f] for resource 0 of Digital Equipment Corporation DEFPA
got res[9000:90ff] for resource 0 of Q Logic ISP1020
got res[9400:947f] for resource 0 of Digital Equipment Corporation DECchip 21040 [Tulip]
got res[9900000:990ffff] for resource 6 of Q Logic ISP1020
got res[9910000:9910fff] for resource 1 of Q Logic ISP1020
got res[9911000:991107f] for resource 1 of Digital Equipment Corporation DECchip 21040 [Tulip]
PCI: Bus 1, bridge: Digital Equipment Corporation DECchip 21152
IO window: 9000-9fff
MEM window: 09900000-099fffff
PCI enable device: (Digital Equipment Corporation DECchip 21142/43)
cmd reg 0x47
PCI enable device: (Contaq Microsystems 82c693)
cmd reg 0x47
PCI enable device: (Contaq Microsystems 82c693 (#2))
cmd reg 0x45
PCI enable device: (Contaq Microsystems 82c693 (#3))
cmd reg 0x47
PCI enable device: (Contaq Microsystems 82c693 (#4))
cmd reg 0x46
PCI enable device: (Matrox Graphics, Inc. MGA 2064W [Millennium])
cmd reg 0x87
PCI enable device: (Digital Equipment Corporation DEFPA)
cmd reg 0x47
PCI enable device: (Digital Equipment Corporation DECchip 21152)
cmd reg 0x107
PCI enable device: (Q Logic ISP1020)
cmd reg 0x47
PCI enable device: (Digital Equipment Corporation DECchip 21040 [Tulip])
cmd reg 0x47
SMC37c669 Super I/O Controller found @ 0x370
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Starting kswapd v1.8
pty: 256 Unix98 ptys configured
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
CY82C693: IDE controller on PCI bus 00 dev 39
CY82C693: chipset revision 0
CY82C693: not 100% native mode: will probe irqs later
CY82C693U driver v0.34 99-13-12 Andreas S. Krebs ([email protected])
ide0: BM-DMA at 0x8400-0x8407<7>pci_map_single: [fffffc0001910000,1000] -> direct 41910000 from fffffc000031afa8
, BIOS settings: hda:pio, hdb:pio
ide1: BM-DMA at 0x8408-0x840f<7>pci_map_single: [fffffc00001fa000,1000] -> direct 401fa000 from fffffc000031afa8
, BIOS settings: hdc:pio, hdd:pio
hda: TOSHIBA CD-ROM XM-5702B, ATAPI CDROM drive
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 12X CD-ROM drive, 256kB Cache
Uniform CD-ROM driver Revision: 3.11
Floppy drive(s): fd0 is 2.88M
FDC 0 is a post-1991 82077
Serial driver version 5.02 (2000-08-09) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
rtc: Digital UNIX epoch (1952) detected
Real Time Clock Driver v1.10d
Linux Tulip driver version 0.9.11 (November 3, 2000)
eth0: Digital DS21143 Tulip rev 48 at 0x8000, 00:00:F8:76:72:DA, IRQ 24.
eth0: EEPROM default media type Autosense.
eth0: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block.
eth0: Index #1 - Media 10baseT-FD (#4) described by a 21142 Serial PHY (2) block.
eth0: Index #2 - Media 10base2 (#1) described by a 21142 Serial PHY (2) block.
eth0: Index #3 - Media AUI (#2) described by a 21142 Serial PHY (2) block.
eth0: Index #4 - Media MII (#11) described by a 21142 MII PHY (3) block.
eth0: MII transceiver #5 config 2000 status 784b advertising 01e1.
eth1: Digital DC21040 Tulip rev 35 at 0x9400, 08:00:2B:E4:1E:CB, IRQ 44.
SCSI subsystem driver Revision: 1.00
qlogicisp : new isp1020 revision ID (5)
scsi0 : QLogic ISP1020 SCSI on PCI bus 01 device 20 irq 27 I/O base 0x9000
CIA machine check: vector=0x660 pc=0xfffffc0000310764 code=0x813
machine check type: unknown
pc = [<fffffc0000310764>] ra = [<fffffc000032dc3c>] ps = 0000
v0 = 0000000047fe03b8 t0 = fffffc0000310a10 t1 = 0000000000000001
t2 = 0000000000000001 t3 = fffffc0001914000 t4 = fffffc0000562208
t5 = 0000000000000057 t6 = fffffc0000560d88 t7 = fffffc0001914000
a0 = 00000000019143b8 a1 = fffffc0047fe0000 a2 = fffffc000032e304
a3 = fffffffffffffffe a4 = 000000000000000f a5 = 0000000000000000
t8 = 0000000000000000 t9 = 0000000063001812 t10= 0000000000000000
t11= 0000000000000010 pv = fffffc0000310a00 at = fffffc000052c080
gp = fffffc0000585890 sp = fffffc0001917c00
--Phil
Andrea,
> Does the qlogic driver works well on an ES40 with more than 1G of ram? If
> yes then qlogic driver should be ok.
Yes. I have tried it on an ES40 with 16 GB of RAM, and it boots just fine.
From what you say, this appears to be a Miata problem and NOT
a qlogic problem. What next?
--Phil
On Mon, Dec 04, 2000 at 01:53:42PM -0500, Phillip Ezolt wrote:
>
> I've recompiled as you have suggested. Any ideas?
Compile again with the following patches (these are against 2.4.0-test12,
but those in arch/alpha/kernel/core_cia.c should work against test10/11
as well).
Something got lost between 2.2 and 2.4, but it's most likely that
MIATA (because it has 6 DIMM slots) is one of the few CIA and PYXIS
machines that could actually get over 1GB of memory; that's why we
haven't seen this before...
--Jay++
-----------------------------------------------------------------------------
Jay A Estabrook Alpha Engineering - LINUX Project
Compaq Computer Corp. - MRO1-2/K20 (508) 467-2080
200 Forest Street, Marlboro MA 01752 [email protected]
-----------------------------------------------------------------------------
diff -urN old/arch/alpha/kernel/core_cia.c new/arch/alpha/kernel/core_cia.c
--- old/arch/alpha/kernel/core_cia.c Tue Dec 5 10:09:01 2000
+++ new/arch/alpha/kernel/core_cia.c Tue Dec 5 18:45:12 2000
@@ -700,11 +700,11 @@
*(vip)CIA_IOC_PCI_W1_BASE = 0x40000000 | 1;
*(vip)CIA_IOC_PCI_W1_MASK = (0x40000000 - 1) & 0xfff00000;
- *(vip)CIA_IOC_PCI_T1_BASE = 0;
+ *(vip)CIA_IOC_PCI_T1_BASE = 0 >> 2;
*(vip)CIA_IOC_PCI_W2_BASE = 0x80000000 | 1;
*(vip)CIA_IOC_PCI_W2_MASK = (0x40000000 - 1) & 0xfff00000;
- *(vip)CIA_IOC_PCI_T2_BASE = 0x40000000;
+ *(vip)CIA_IOC_PCI_T2_BASE = 0x40000000 >> 2;
*(vip)CIA_IOC_PCI_W3_BASE = 0;
}
diff -urN old/arch/alpha/kernel/pci.c new/arch/alpha/kernel/pci.c
--- old/arch/alpha/kernel/pci.c Tue Dec 5 10:09:01 2000
+++ new/arch/alpha/kernel/pci.c Tue Dec 5 10:20:01 2000
@@ -91,9 +91,15 @@
if (dev->class >> 8 != PCI_CLASS_STORAGE_IDE)
return;
dev->resource[1].start |= 2;
- dev->resource[1].end = dev->resource[1].start;
+ dev->resource[1].end = dev->resource[1].start + 1;
+#ifndef CONFIG_BLK_DEV_IDEPCI
+ /* already claimed by "standard" (ie junk) resources */
+ dev->resource[0].flags &= ~IORESOURCE_IO;
+ dev->resource[1].flags &= ~IORESOURCE_IO;
+#else
pci_claim_resource(dev, 0);
pci_claim_resource(dev, 1);
+#endif
}
static void __init
diff -urN old/drivers/pci/pci.c new/drivers/pci/pci.c
--- old/drivers/pci/pci.c Tue Dec 5 10:09:02 2000
+++ new/drivers/pci/pci.c Tue Dec 5 10:17:32 2000
@@ -540,7 +540,7 @@
static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom)
{
unsigned int pos, reg, next;
- u32 l, sz;
+ u32 l, sz, tmp;
struct resource *res;
for(pos=0; pos<howmany; pos = next) {
-----------------------------------------------------------------------------
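The core_cia.c hunk above suggests why only machines with more than 1 GB were affected. The sketch below models a direct-mapped window under one stated assumption: that the hardware treats the T_BASE register value as the physical base shifted right by 2, which is inferred from the ">> 2" in the patch rather than from chipset documentation. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one direct-mapped PCI window. */
struct sketch_window {
    uint64_t w_base;      /* first PCI bus address covered by the window */
    uint64_t w_size;      /* window length in bytes */
    uint64_t t_base_reg;  /* raw value written to the T_BASE register */
};

/* Encode a physical base address the way the patch writes it. */
static uint64_t sketch_t_base_encode(uint64_t phys_base)
{
    return phys_base >> 2;
}

/* Translate a bus address to a physical address, assuming the hardware
 * undoes the shift when it reads the register. */
static uint64_t sketch_translate(const struct sketch_window *w, uint64_t bus_addr)
{
    assert(bus_addr >= w->w_base && bus_addr - w->w_base < w->w_size);
    return (w->t_base_reg << 2) + (bus_addr - w->w_base);
}
```

In this model window 1 maps to physical 0, and 0 >> 2 is still 0, so the missing shift is harmless there. Window 2 should map to physical 0x40000000; writing that value unshifted would make the model translate bus address 0x80001000 to 0x100001000 instead of 0x40001000, which fits the symptom that DMA only went wrong once memory above 1 GB was in use.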
Jay,
You're a genius. That works like a charm.
Thanks so much!
--Phil
On Tue, 5 Dec 2000, Jay Estabrook wrote:
> On Mon, Dec 04, 2000 at 01:53:42PM -0500, Phillip Ezolt wrote:
> >
> > I've recompiled as you have suggested. Any ideas?
>
> Compile again with the following patches (these are against 2.4.0-test12,
> but those in arch/alpha/kernel/core_cia.c should work against test10/11
> as well).
>
> Something got lost between 2.2 and 2.4, but it's most likely that
> MIATA (because it has 6 DIMM slots) is one of the few CIA and PYXIS
> machines that could actually get over 1GB of memory; that's why we
> haven't seen this before...
>
> --Jay++
>
> -----------------------------------------------------------------------------
> Jay A Estabrook Alpha Engineering - LINUX Project
> Compaq Computer Corp. - MRO1-2/K20 (508) 467-2080
> 200 Forest Street, Marlboro MA 01752 [email protected]
> -----------------------------------------------------------------------------
On Fri, Dec 01, 2000 at 08:14:44PM +0100, Andrea Arcangeli wrote:
> On Fri, Dec 01, 2000 at 10:19:44AM -0800, David S. Miller wrote:
> > I would instead suggest to declare 'context' to be of an arch-specific
> > defined type, much like "thread_struct" is.
>
> I agree, [..]
Here it is:
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.0-test12/alpha-ASN-SMP-races-2.4.x-2
This one breaks all archs but i386 and alpha. If any arch maintainer would like
me to update their arch blindly, implementing the mm_arch structure as an
`unsigned long context' and fixing up the miscompilation, I will do so.
Andrea
   Date: Fri, 15 Dec 2000 16:46:26 +0100
   From: Andrea Arcangeli <[email protected]>

   This one breaks all archs but i386 and alpha. If any arch maintainer would
   like me to update their arch blindly, implementing the mm_arch structure as
   an `unsigned long context' and fixing up the miscompilation, I will do so.

Can you name the mm_struct member "context" still instead of
"mm_arch"? Because many ports will simply:
typedef unsigned long mm_arch_t;
Then all the code changes will make the accesses look less
meaningful. Consider:
if (CTX_VALID(mm->mm_arch))
whereas before the code said:
if (CTX_VALID(mm->context))
which tells the reader a lot more. In fact, retaining the "context" member
name allows most ports to operate with only one change, creating
the asm/mm_arch.h header. You can in fact do this for all ports
which care about MMU tlb contexts (a simple grep such as
egrep -e "m->context" `find . -type f -name "*.[ch]"`
will show you which ports care).
Later,
David S. Miller
[email protected]
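A minimal sketch of the suggestion above, assuming a port whose context is a plain unsigned long. The struct name mm_struct_sketch, the helper mm_has_context, and this definition of CTX_VALID are stand-ins for illustration only; real ports define their own.

```c
#include <stdbool.h>

/* Would live in a per-arch header such as asm/mm_arch.h. */
typedef unsigned long mm_arch_t;

/* Stand-in for the kernel's mm_struct, reduced to the member under
 * discussion. The member keeps its old name; only its type becomes
 * arch-defined. */
struct mm_struct_sketch {
    mm_arch_t context;
};

/* Hypothetical validity check; real ports define CTX_VALID differently. */
#define CTX_VALID(ctx) ((ctx) != 0UL)

/* Existing call sites keep reading mm->context unchanged. */
bool mm_has_context(const struct mm_struct_sketch *mm)
{
    return CTX_VALID(mm->context);
}
```

With this layout the only per-port change is the new header; code that already says mm->context compiles as before.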
On Fri, Dec 15, 2000 at 09:11:31AM -0800, David S. Miller wrote:
> Can you name the mm_struct member "context" [..]
I understood that you were proposing that, but since we are changing it anyway
I preferred to use a generic mm_arch structure (not just a context field) to
have a more generic interface in the long run (maybe some port wants to collect
something other than an MM `context').
> Then all the code changes will make the accesses look less
> meaningful. Consider:
>
> if (CTX_VALID(mm->mm_arch))
>
> whereas before the code said:
>
> if (CTX_VALID(mm->context))
>
> which tells the reader a lot more. [..]
What I propose is to convert the current:
if (CTX_VALID(mm->context))
to
if (CTX_VALID(mm->mm_arch.context))
(that's the same thing I did in the alpha tree, going from mm->context[] to
mm->mm_arch.context[])
I'm aware that this way all ports actively using `mm->context' need to be
changed, but the change is certainly a no-brainer... OK?
Andrea
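The alternative proposed here can be sketched the same way. This assumes an alpha-like port that keeps one context per CPU; SKETCH_NR_CPUS, the struct names, the helper, and this CTX_VALID are hypothetical and only illustrate the shape of the access.

```c
#include <stdbool.h>

#define SKETCH_NR_CPUS 4   /* hypothetical CPU count for the sketch */

/* Per-arch container: a port can add fields besides the context. */
struct mm_arch {
    unsigned long context[SKETCH_NR_CPUS];
};

/* Stand-in for the kernel's mm_struct. */
struct mm_struct_sketch {
    struct mm_arch mm_arch;
};

/* Hypothetical validity check; real ports define CTX_VALID differently. */
#define CTX_VALID(ctx) ((ctx) != 0UL)

/* Call sites change from mm->context[cpu] to mm->mm_arch.context[cpu]. */
bool mm_has_context_on(const struct mm_struct_sketch *mm, int cpu)
{
    return CTX_VALID(mm->mm_arch.context[cpu]);
}
```

Every user of the old member has to be touched, but each change is mechanical: insert mm_arch. before context.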
   Date: Fri, 15 Dec 2000 18:55:28 +0100
   From: Andrea Arcangeli <[email protected]>

   I'm aware that this way all ports actively using `mm->context' need to
   be changed, but the change is certainly a no-brainer... OK?

My problem is that I don't want to typedef it to a structure, because this
will unnecessarily increase the required alignment of the structure
member on some architectures.
Well, if you're willing to do all the fixing up, then I won't argue it
much more. :-)
Later,
David S. Miller
[email protected]
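The alignment concern can be checked on any given ABI with a small probe. This is only a sketch: the struct and function names are invented for the example, and the result depends entirely on the compiler and target.

```c
#include <stddef.h>

/* The context wrapped in a structure, as in the mm_arch proposal. */
struct mm_arch_wrapped {
    unsigned long context;
};

/* Probe structs: the offset of `member` after a single char equals the
 * alignment the ABI requires for that member's type. */
struct probe_scalar  { char pad; unsigned long member; };
struct probe_wrapped { char pad; struct mm_arch_wrapped member; };

/* Alignment required for a bare unsigned long member. */
size_t scalar_member_align(void)
{
    return offsetof(struct probe_scalar, member);
}

/* Alignment required for the same value wrapped in a struct. */
size_t wrapped_member_align(void)
{
    return offsetof(struct probe_wrapped, member);
}
```

On common ABIs such as x86-64 the two values are equal. Whether they differ on a particular port is ABI-specific, which is the case the objection above is about.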