2002-10-16 19:33:33

by Adam Radford

Subject: 2.5.43 aic7xxx segfault sd_synchronize_cache() called after SHT->release()

I think sd_synchronize_cache() is getting called after the SHT->release()
function, which couldn't possibly be right. This causes adaptec, 3ware,
etc. to segfault on rmmod.

See below for adaptec segfault output:

aic7xxx
CPU: 1
EIP: 0060:[<c025918b>] Not tainted
EFLAGS: 00010202
EIP is at put_device+0x7b/0xa0
eax: 00000000 ebx: c8997028 ecx: 00000001 edx: c0465470
esi: c12f4174 edi: c8997000 ebp: 00000000 esp: c5b81ee4
ds: 0068 es: 0068 ss: 0068
Process rmmod (pid: 1085, threadinfo=c5b80000 task=c5d4a800)
Stack: c8997028 c0481e20 c02d0f3a c8997028 c8997028 c0481f3c c8997028 c0481f4c
00000000 c66fe1e8 00000286 c798aa00 c0481e20 c12f4000 c13b5000 c02be9aa
c12f4000 c5b81f30 00000002 00030002 00000002 08072009 c042399f 08071fff
Call Trace:
[<c02d0f3a>] sg_detach+0x20a/0x240
[<c02be9aa>] scsi_unregister_host+0x26a/0x5f0
[<c01418d8>] __alloc_pages+0x88/0x270
[<c892f12a>] exit_this_scsi_driver+0xa/0xc [aic7xxx]
[<c893a740>] driver_template+0x0/0x70 [aic7xxx]
[<c012029e>] free_module+0x1e/0x140
[<c011f3db>] sys_delete_module+0x1db/0x4c0
[<c010787f>] syscall_call+0x7/0xb

Code: 0f 0b 0d 01 86 69 3d c0 8b 83 d4 00 00 00 85 c0 74 04 53 ff
Segmentation fault
[root@localhost boot]#

--
Adam Radford
Software Engineer
3ware, Inc.


2002-10-16 20:17:19

by Patrick Mansfield

Subject: Re: 2.5.43 aic7xxx segfault sd_synchronize_cache() called after SHT->release()

On Wed, Oct 16, 2002 at 12:41:14PM -0700, Adam Radford wrote:
> I think sd_synchronize_cache() is getting called after SHT->release()
> function, which couldn't possibly be right. This causes adaptec, 3ware,
> etc, to segfault on rmmod.
>
> See below for adaptec segfault output:
>
> aic7xxx
> CPU: 1
> EIP: 0060:[<c025918b>] Not tainted
> EFLAGS: 00010202
> EIP is at put_device+0x7b/0xa0
> eax: 00000000 ebx: c8997028 ecx: 00000001 edx: c0465470
> esi: c12f4174 edi: c8997000 ebp: 00000000 esp: c5b81ee4
> ds: 0068 es: 0068 ss: 0068
> Process rmmod (pid: 1085, threadinfo=c5b80000 task=c5d4a800)
> Stack: c8997028 c0481e20 c02d0f3a c8997028 c8997028 c0481f3c c8997028 c0481f4c
> 00000000 c66fe1e8 00000286 c798aa00 c0481e20 c12f4000 c13b5000 c02be9aa
> c12f4000 c5b81f30 00000002 00030002 00000002 08072009 c042399f 08071fff
> Call Trace:
> [<c02d0f3a>] sg_detach+0x20a/0x240
> [<c02be9aa>] scsi_unregister_host+0x26a/0x5f0
> [<c01418d8>] __alloc_pages+0x88/0x270
> [<c892f12a>] exit_this_scsi_driver+0xa/0xc [aic7xxx]
> [<c893a740>] driver_template+0x0/0x70 [aic7xxx]
> [<c012029e>] free_module+0x1e/0x140
> [<c011f3db>] sys_delete_module+0x1db/0x4c0
> [<c010787f>] syscall_call+0x7/0xb
>
> Code: 0f 0b 0d 01 86 69 3d c0 8b 83 d4 00 00 00 85 c0 74 04 53 ff

Are you sure it is not a BUG? This looks just like what Badari reported
yesterday:

kernel BUG at drivers/base/core.c:251!
invalid operand: 0000
qla2200
CPU: 0
EIP: 0060:[<c023eb24>] Not tainted
EFLAGS: 00010202
EIP is at put_device+0x64/0x90
eax: 00000000 ebx: f8a08028 ecx: f8a080c4 edx: 00000001
esi: c3aded54 edi: f8a08000 ebp: 00000003 esp: cb007ee4
ds: 0068 es: 0068 ss: 0068
Process rmmod (pid: 4803, threadinfo=cb006000 task=f62c98c0)
Stack: f8a08028 c0477a40 c02ce533 f8a08028 f8a08028 c0477b5c f8a08028 c0477b6c
00000000 40153f6d 00000286 f68fc000 c0477a40 c3adec00 f4df0000 c02a7a9a
c3adec00 cb007f30 00000002 00030002 00000001 08071002 c041685c 08070ffd
Call Trace:
[<c02ce533>] sg_detach+0x1e3/0x210
[<c02a7a9a>] scsi_unregister_host+0x26a/0x5d0
[<c01f4736>] __generic_copy_to_user+0x56/0x80
[<c013e4e8>] __alloc_pages+0x98/0x270
[<f89e7cba>] exit_this_scsi_driver+0xa/0x10 [qla2200]
[<f8a00360>] driver_template+0x0/0x74 [qla2200]
[<c011ea0e>] free_module+0x1e/0x130
[<c011dc94>] sys_delete_module+0x1b4/0x410
[<c01075e3>] syscall_call+0x7/0xb

I posted a patch to change the put_device() calls to device_unregister().
st.c got fixed in 2.5.43, but these are still not fixed in 2.5.43:

--- linux-2.5.43/drivers/scsi/scsi.c Tue Oct 15 20:28:22 2002
+++ linux-2.5.43-unreg/drivers/scsi/scsi.c Wed Oct 16 12:50:08 2002
@@ -2248,7 +2248,7 @@
 			if (shpnt->hostt->slave_detach)
 				(*shpnt->hostt->slave_detach) (SDpnt);
 			devfs_unregister (SDpnt->de);
-			put_device(&SDpnt->sdev_driverfs_dev);
+			device_unregister(&SDpnt->sdev_driverfs_dev);
 		}
 	}

@@ -2299,7 +2299,7 @@
 		/* Remove the /proc/scsi directory entry */
 		sprintf(name,"%d",shpnt->host_no);
 		remove_proc_entry(name, tpnt->proc_dir);
-		put_device(&shpnt->host_driverfs_dev);
+		device_unregister(&shpnt->host_driverfs_dev);
 		if (tpnt->release)
 			(*tpnt->release) (shpnt);
 		else {
--- linux-2.5.43/drivers/scsi/sg.c Tue Oct 15 20:27:57 2002
+++ linux-2.5.43-unreg/drivers/scsi/sg.c Wed Oct 16 12:50:25 2002
@@ -1611,7 +1611,7 @@
 		sdp->de = NULL;
 		device_remove_file(&sdp->sg_driverfs_dev, &dev_attr_type);
 		device_remove_file(&sdp->sg_driverfs_dev, &dev_attr_kdev);
-		put_device(&sdp->sg_driverfs_dev);
+		device_unregister(&sdp->sg_driverfs_dev);
 		if (NULL == sdp->headfp)
 			vfree((char *) sdp);
 	}
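
To illustrate why swapping put_device() for device_unregister() might
make the BUG go away, here is a small user-space model. It is only a
sketch: the toy_* names, the two-state enum and the bug_hit flag are
all made up for illustration, and it assumes that the check behind the
BUG in put_device() is "the last reference was dropped while the device
was still marked as registered". The real driver-model code in 2.5.43
may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for driver-model state; not the kernel's definitions. */
enum toy_state { TOY_REGISTERED, TOY_GONE };

struct toy_device {
    int refcount;          /* plain int here; the kernel uses an atomic counter */
    enum toy_state state;
    bool released;         /* set when the last reference is dropped cleanly */
    bool bug_hit;          /* set where this model assumes the kernel would BUG() */
};

void toy_register(struct toy_device *dev)
{
    dev->refcount = 1;
    dev->state = TOY_REGISTERED;
    dev->released = false;
    dev->bug_hit = false;
}

/* Drop one reference. Dropping the last one while the device is still
 * marked registered is treated as the error case (assumption). */
void toy_put_device(struct toy_device *dev)
{
    if (--dev->refcount > 0)
        return;
    if (dev->state != TOY_GONE) {
        dev->bug_hit = true;
        return;
    }
    dev->released = true;
}

/* Mark the device as gone first, then drop the reference. */
void toy_device_unregister(struct toy_device *dev)
{
    dev->state = TOY_GONE;
    toy_put_device(dev);
}

/* Teardown as in the unpatched code: a bare put on a registered device. */
bool teardown_with_put_hits_bug(void)
{
    struct toy_device dev;
    toy_register(&dev);
    toy_put_device(&dev);
    return dev.bug_hit && !dev.released;
}

/* Teardown as in the patch: unregister, which also drops the reference. */
bool teardown_with_unregister_is_clean(void)
{
    struct toy_device dev;
    toy_register(&dev);
    toy_device_unregister(&dev);
    return dev.released && !dev.bug_hit;
}
```

In this model both helpers return true: the bare put trips the check,
while the unregister path releases the device cleanly. That mirrors the
one-line change in each hunk of the patch, nothing more.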

2002-10-19 23:00:38

by Bill Davidsen

Subject: NCR adaptor doesn't see devices (was: 2.5.43 aic7xxx segfault)

On Wed, 16 Oct 2002, Patrick Mansfield wrote:

> On Wed, Oct 16, 2002 at 12:41:14PM -0700, Adam Radford wrote:
> > I think sd_synchronize_cache() is getting called after SHT->release()
> > function, which couldn't possibly be right. This causes adaptec, 3ware,
> > etc, to segfault on rmmod.
> >
> > See below for adaptec segfault output:
[ let's not ]
> Are you sure it is not a BUG? This looks just like what Badari reported
> yesterday:
[ more BUG output snipped ]
> I posted a patch to change the put_device() calls to device_unregister(),
> st.c got fixed in 2.5.43, these are still not fixed in 2.5.43:

I got the same type of thing in 2.5.43 and 2.5.43-mm2. I applied the patch
below and the BUG went away. Unfortunately the NCR still doesn't see the
attached devices, normally a CD and tape drive. I pulled the tape drive to
see if that helped; it didn't. All works just fine with 2.4.recent. dmesg
output attached to preserve format.

> --- linux-2.5.43/drivers/scsi/scsi.c Tue Oct 15 20:28:22 2002
> +++ linux-2.5.43-unreg/drivers/scsi/scsi.c Wed Oct 16 12:50:08 2002
> @@ -2248,7 +2248,7 @@
> if (shpnt->hostt->slave_detach)
> (*shpnt->hostt->slave_detach) (SDpnt);
> devfs_unregister (SDpnt->de);
> - put_device(&SDpnt->sdev_driverfs_dev);
> + device_unregister(&SDpnt->sdev_driverfs_dev);
> }
> }
>
> @@ -2299,7 +2299,7 @@
> /* Remove the /proc/scsi directory entry */
> sprintf(name,"%d",shpnt->host_no);
> remove_proc_entry(name, tpnt->proc_dir);
> - put_device(&shpnt->host_driverfs_dev);
> + device_unregister(&shpnt->host_driverfs_dev);
> if (tpnt->release)
> (*tpnt->release) (shpnt);
> else {
> --- linux-2.5.43/drivers/scsi/sg.c Tue Oct 15 20:27:57 2002
> +++ linux-2.5.43-unreg/drivers/scsi/sg.c Wed Oct 16 12:50:25 2002
> @@ -1611,7 +1611,7 @@
> sdp->de = NULL;
> device_remove_file(&sdp->sg_driverfs_dev, &dev_attr_type);
> device_remove_file(&sdp->sg_driverfs_dev, &dev_attr_kdev);
> - put_device(&sdp->sg_driverfs_dev);
> + device_unregister(&sdp->sg_driverfs_dev);
> if (NULL == sdp->headfp)
> vfree((char *) sdp);
> }

--
bill davidsen, CTO TMR Associates, Inc <[email protected]>
Having the feature freeze for Linux 2.5 on Hallow'een is appropriate,
since using 2.5 kernels includes a lot of things jumping out of dark
corners to scare you.


Attachments:
dmesg-2.5.43-mm2p1 (13.94 kB)

2002-10-19 23:07:40

by Mr. James W. Laferriere

Subject: Re: NCR adaptor doesn't see devices (was: 2.5.43 aic7xxx segfault)


Hello Davidsen , I hope the Sym-2 driver is what you are using ?
From the dmesg output I suspect that is not the case . If there
is only the one Symbios/LSI driver I hope it is the Sym-2
version . Hth , JimL

On Sat, 19 Oct 2002, davidsen wrote:
> On Wed, 16 Oct 2002, Patrick Mansfield wrote:
> > On Wed, Oct 16, 2002 at 12:41:14PM -0700, Adam Radford wrote:
> > > I think sd_synchronize_cache() is getting called after SHT->release()
> > > function, which couldn't possibly be right. This causes adaptec, 3ware,
> > > etc, to segfault on rmmod.
> > > See below for adaptec segfault output:
> [ let's not ]
> > Are you sure it is not a BUG? This looks just like what Badari reported
> > yesterday:
> [ more BUG output snipped ]
> > I posted a patch to change the put_device() calls to device_unregister(),
> > st.c got fixed in 2.5.43, these are still not fixed in 2.5.43:
>
> I got the same type of thing in 2.5.43, 43-mm2. Applied the patch below
> and the BUG went away. Unfortunately the NCR still doesn't see the
> attached devices, normally a CD and tape drive. I pulled the tape drive to
> see if that helps, it didn't. All works just fine with 2.4.recent. dmesg
> output attached to preserve format.
--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network Engineer | P.O. Box 854 | Give me Linux |
| [email protected] | Coudersport PA 16915 | only on AXP |
+------------------------------------------------------------------+

2002-10-20 18:28:11

by Bill Davidsen

Subject: Re: NCR adaptor doesn't see devices (was: 2.5.43 aic7xxx segfault)

On Sat, 19 Oct 2002, Mr. James W. Laferriere wrote:

>
> Hello Davidsen , I hope the Sym-2 driver is what you are using ?
> From the dmesg output I suspect that is not the case . If there
> is only the one Symbios/LSI driver I hope it is the Sym-2
> version . Hth , JimL

No, the sym-anything seems to be for the newer chipsets, and not the old
ncr825. I believe I tried 2.5.38 or so with that driver and it couldn't
find a device it liked. I'll try building that module again, but it didn't
work and I thought it might be causing a problem trying.

Also note that the driver inserts and fails twice (see dmesg) which is not
intuitive to me.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-10-20 18:58:57

by Mr. James W. Laferriere

Subject: Re: NCR adaptor doesn't see devices (was: 2.5.43 aic7xxx segfault)


Hello Bill ,

On Sun, 20 Oct 2002, Bill Davidsen wrote:
> No, the sym-anything seems to be for the newer chipsets, and not the old
> ncr825. I believe I tried 2.5.38 or so with that driver and it couldn't
> find a device it liked. I'll try building that module again, but it didn't
> work and I thought it might be causing a problem trying.
Iirc , Gerard said that the Sym-2 is for all chipsets again .
see: linux/drivers/scsi/sym53c8xx_2/Documentation.txt
The ncr53c8xx was the original driver that he produced . Then
came the sym53c8xx version which was NOT for the older chips
supported by the ncr53c8xx.c .

> Also note that the driver inserts and fails twice (see dmesg) which is not
> intuitive to me.
Yes , I noted them below using a grep of your document . It
appears that the SYM53c8xx driver gets loaded THEN the
ncr53c8xx attempts to load & of course conflicts with the
SYM53c8xx . These two drivers can not co-exist without very
special care as to how they get loaded or some such .
I still highly recommend the sym2 driver rather than either of
the two being loaded . But if it won't recognise the drives ...
Hth , JimL

--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network Engineer | P.O. Box 854 | Give me Linux |
| [email protected] | Coudersport PA 16915 | only on AXP |
+------------------------------------------------------------------+

dmesg-2.5.43-mm2p1:227:sym53c8xx: at PCI bus 0, device 9, function 0
dmesg-2.5.43-mm2p1:228:sym53c8xx: not initializing, device not supported
...
dmesg-2.5.43-mm2p1:279:ncr53c8xx: at PCI bus 0, device 9, function 0
dmesg-2.5.43-mm2p1:280:ncr53c8xx: 53c825 detected
dmesg-2.5.43-mm2p1:281:ncr53c825-0: rev 0x2 on pci bus 0 device 9 function 0 irq 9
dmesg-2.5.43-mm2p1:282:ncr53c825-0: ID 7, Fast-10, Parity Checking
dmesg-2.5.43-mm2p1:283:scsi1 : ncr53c8xx-3.4.3b-20010512
dmesg-2.5.43-mm2p1:284:ncr53c825-0-<2,*>: target did not report SYNC.
...
dmesg-2.5.43-mm2p1:293:ncr53c825-0: releasing host resources
dmesg-2.5.43-mm2p1:294:ncr53c825-0: resetting chip
dmesg-2.5.43-mm2p1:295:ncr53c825-0: host resources successfully released
...
dmesg-2.5.43-mm2p1:303:ncr53c8xx: at PCI bus 0, device 9, function 0
dmesg-2.5.43-mm2p1:304:ncr53c8xx: 53c825 detected
dmesg-2.5.43-mm2p1:305:ncr53c825-0: rev 0x2 on pci bus 0 device 9 function 0 irq 9
dmesg-2.5.43-mm2p1:306:ncr53c825-0: ID 7, Fast-10, Parity Checking
dmesg-2.5.43-mm2p1:307:scsi1 : ncr53c8xx-3.4.3b-20010512
dmesg-2.5.43-mm2p1:308:ncr53c825-0-<2,*>: target did not report SYNC.
...
dmesg-2.5.43-mm2p1:311:ncr53c825-0: releasing host resources
dmesg-2.5.43-mm2p1:312:ncr53c825-0: resetting chip
dmesg-2.5.43-mm2p1:313:ncr53c825-0: host resources successfully released

2002-10-21 02:29:13

by Bill Davidsen

Subject: Re: NCR adaptor doesn't see devices (was: 2.5.43 aic7xxx segfault)

On Sun, 20 Oct 2002, Mr. James W. Laferriere wrote:

> On Sun, 20 Oct 2002, Bill Davidsen wrote:
> > No, the sym-anything seems to be for the newer chipsets, and not the old
> > ncr825. I believe I tried 2.5.38 or so with that driver and it couldn't
> > find a device it liked. I'll try building that module again, but it didn't
> > work and I thought it might be causing a problem trying.
> Iirc , Gerard said that the Sym-2 is for all chipsets again .
> see: linux/drivers/scsi/sym53c8xx_2/Documentation.txt
> The ncr53c8xx was the original driver that he produced . Then
> came the sym53c8xx version which was NOT for the older chips
> supported by the ncr53c8xx.c .

First, you are right: using the new sym driver, the card works. But:
1. It doesn't build (2.5.43) as a module, up through -mm3.
2. The ncr module works in 2.4, and should either work or be removed.
   Typically we keep old modules, like the something7,8xx (yes, comma
   in the module name).
3. Building it in makes it load before the ide-scsi module, and changes all
   the device names and assignments. I have enough problems going between
   2.4 and 2.5; I'm afraid of using devfs on top of that. If it works.

> > Also note that the driver inserts and fails twice (see dmesg) which is not
> > intuitive to me.
> Yes , I noted them below using a grep of your document . It
> appears that the SYM53c8xx driver gets loaded THEN the
> ncr53c8xx attempts to load & of course conflicts with the
> SYM53c8xx . These two drivers can not co-exist , Without very
> special care as to how they get loaded or some such .
> I still highly recommend the sym2 driver rather than either of
> the two being loaded . But if it won't recognise the drives ...
> Hth , JimL

Very good catch. I saw the ncr53c8xx and missed the module name; I don't
know how that got turned on.

So I can run as long as I'm very careful what scripts do, and the ncr
module could be fixed (probably trivial) or the sym-2 could be made to work
as a module. I can provide the config and anything else if that proves
hard to replicate. I'm in the habit of building almost everything as a
module, so I hit rather more of these problems than I would like.

Again, thanks for the catch. I'm working, the system uses all devices, and
the swap of the /dev/scd0<=>/dev/scd1 assignment is acceptable for a
development machine. I'll probably change the scripts to use cdrom1..N and
just make the symlinks in rc.local.

I couldn't get netfilter to build as modules the last time I tried. I have
to look at that Wednesday, when I'm back in the office.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.