2002-10-01 22:21:45

by Andrew Morton

Subject: Re: 2.5.39 "Sleeping function called from illegal context at slab.c:1374"

Andreas Boman wrote:
>
> * Andrew Morton wrote:
> > Helge Hafting wrote:
> > >
> > > ..
> > > [<c01146c4>]__might_sleep+0x54/0x60
> > > [<c012dca0>]kmalloc+0x4c/0x130
> > > [<c010b6b2>]sys_ioperm+0x82/0x11c
> > > [<c0106fbb>]syscall_call+0x7/0xb
> > >
> >
> >
> > You up to trying this fix?
> >
>
> This patch on 2.5.40+xfsfix didn't change anything here. My trace
> does look different, but it seems to be related.

It's different.

> I get a similar oops when ide initializes during boot,

It's not an oops. It's just a warning.

>
> Debug: sleeping function called from illegal context at slab.c:1374

See? I added "Debug:" to the message ;)

> Call Trace:
> [__kmem_cache_alloc+255/272]__kmem_cache_alloc+0xff/0x110
> [get_vm_area+38/256]get_vm_area+0x26/0x100
> [__vmalloc+75/304]__vmalloc+0x4b/0x130
> [vmalloc+34/48]vmalloc+0x22/0x30
> [<e08fe502>]sg_init+0x82/0x130 [sg]
> [<e09022c7>].rodata.str1.1+0x23/0x2b0 [sg]
> [<e0903be0>]sg_fops+0x0/0x58 [sg]
> [<e0903b20>]sg_template+0x0/0x94 [sg]

That is known - sg_init() is blatantly calling vmalloc under
write_lock_irqsave().


2002-10-01 23:44:31

by Joaquim Fellmann

Subject: Re: 2.5.39 "Sleeping function called from illegal context at slab.c:1374"

On Wed 02/10/2002 at 00:23, Andrew Morton wrote:


> > Call Trace:
> > [__kmem_cache_alloc+255/272]__kmem_cache_alloc+0xff/0x110
> > [get_vm_area+38/256]get_vm_area+0x26/0x100
> > [__vmalloc+75/304]__vmalloc+0x4b/0x130
> > [vmalloc+34/48]vmalloc+0x22/0x30
> > [<e08fe502>]sg_init+0x82/0x130 [sg]
> > [<e09022c7>].rodata.str1.1+0x23/0x2b0 [sg]
> > [<e0903be0>]sg_fops+0x0/0x58 [sg]
> > [<e0903b20>]sg_template+0x0/0x94 [sg]
>
> That is known - sg_init() is blatantly calling vmalloc under
> write_lock_irqsave().

Hi,

Is this the same problem?
Apparently it's not SCSI-related like the one above:

Debug: sleeping function called from illegal context at slab.c:1374
c72a5f60 c01170b4 c02809c0 c0284df1 0000055e c72dfc80 c01311aa c0284df1
0000055e 00000000 00000400 bffffd24 c72dfc80 c01aebcf c72df960 00000011
c010d142 00000080 000001d0 c72a4000 00000100 bffffd24 bffffc2c 00000000
Call Trace:
[<c01170b4>]__might_sleep+0x54/0x60
[<c01311aa>]kmalloc+0x56/0x214
[<c01aebcf>]capable+0x1b/0x34
[<c010d142>]sys_ioperm+0x82/0x11c
[<c0108a3f>]syscall_call+0x7/0xb


It's from a 2.5.40 pulled from bk.

Regards

--
Joaquim Fellmann


2002-10-02 16:18:59

by Mike Anderson

Subject: Re: 2.5.39 "Sleeping function called from illegal context at slab.c:1374"

Andrew Morton wrote:
> That is known - sg_init() is blatantly calling vmalloc under
> write_lock_irqsave().

I have not seen a patch for this yet.

Douglas Gilbert is on time off and connects when he can, so it may be a
bit until he can address this.

In the interim, the quick patch below should fix the problem and still
provide for safe additions.

I have done only minor testing on 2.5.40 using sg_utils.
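
The delicate part of such a patch is that the lock is dropped around the allocation, so the free slot found earlier may be gone by the time the lock is retaken. Below is a minimal userspace sketch of that unlock, allocate, relock and recheck sequence, under stated assumptions: slots, slots_lock, NSLOTS, find_empty_slot and slot_attach are hypothetical names, and a pthread mutex and malloc() stand in for sg_dev_arr_lock and vmalloc(). It is not the sg.c code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NSLOTS 4  /* hypothetical fixed table size */

static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;
static void *slots[NSLOTS];

/* Caller must hold slots_lock. Returns the first free index, or NSLOTS. */
static int find_empty_slot(void)
{
	int k;

	for (k = 0; k < NSLOTS; k++)
		if (slots[k] == NULL)
			break;
	return k;
}

/* Returns the claimed index, or -1 if the table is full or malloc fails. */
int slot_attach(size_t obj_size)
{
	void *obj = NULL;
	int k;

	assert(obj_size > 0);
	pthread_mutex_lock(&slots_lock);
	for (;;) {
		k = find_empty_slot();
		if (k == NSLOTS || obj != NULL)
			break;  /* table full, or object already allocated */
		/* Drop the lock around the allocation, which may block. */
		pthread_mutex_unlock(&slots_lock);
		obj = malloc(obj_size);
		pthread_mutex_lock(&slots_lock);
		if (obj == NULL)
			break;  /* give up instead of retrying forever */
		/* Loop again: slot k may have been taken while unlocked. */
	}
	if (k == NSLOTS || obj == NULL) {
		pthread_mutex_unlock(&slots_lock);
		free(obj);  /* free(NULL) is a no-op */
		return -1;
	}
	slots[k] = obj;
	pthread_mutex_unlock(&slots_lock);
	return k;
}
```

The slot search is repeated after every relock, so the index that is finally written was verified as empty while the lock was held.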

-andmike
--
Michael Anderson

sg.c | 48 +++++++++++++++++++++++++++++++++---------------
1 files changed, 33 insertions(+), 15 deletions(-)

diff -Nru a/drivers/scsi/sg.c b/drivers/scsi/sg.c
--- a/drivers/scsi/sg.c Wed Oct 2 09:00:41 2002
+++ b/drivers/scsi/sg.c Wed Oct 2 09:00:41 2002
@@ -1354,11 +1354,29 @@
{
static int sg_registered = 0;
unsigned long iflags;
+ int tmp_dev_max;
+ Sg_device **tmp_da;

if ((sg_template.dev_noticed == 0) || sg_dev_arr)
return 0;

+ SCSI_LOG_TIMEOUT(3, printk("sg_init\n"));
+
+ write_lock_irqsave(&sg_dev_arr_lock, iflags);
+ tmp_dev_max = sg_template.dev_noticed + SG_DEV_ARR_LUMP;
+ write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
+
+ tmp_da = (Sg_device **)vmalloc(
+ tmp_dev_max * sizeof(Sg_device *));
+ if (NULL == tmp_da) {
+ printk(KERN_ERR "sg_init: no space for sg_dev_arr\n");
+ return 1;
+ }
+ memset(tmp_da, 0, tmp_dev_max * sizeof (Sg_device *));
write_lock_irqsave(&sg_dev_arr_lock, iflags);
+ sg_template.dev_max = tmp_dev_max;
+ sg_dev_arr = tmp_da;
+
if (!sg_registered) {
if (register_chrdev(SCSI_GENERIC_MAJOR, "sg", &sg_fops)) {
printk(KERN_ERR
@@ -1370,16 +1388,6 @@
sg_registered++;
}

- SCSI_LOG_TIMEOUT(3, printk("sg_init\n"));
- sg_template.dev_max = sg_template.dev_noticed + SG_DEV_ARR_LUMP;
- sg_dev_arr = (Sg_device **)vmalloc(
- sg_template.dev_max * sizeof(Sg_device *));
- if (NULL == sg_dev_arr) {
- printk(KERN_ERR "sg_init: no space for sg_dev_arr\n");
- write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
- return 1;
- }
- memset(sg_dev_arr, 0, sg_template.dev_max * sizeof (Sg_device *));
write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
#ifdef CONFIG_PROC_FS
sg_proc_init();
@@ -1430,7 +1438,7 @@
static int
sg_attach(Scsi_Device * scsidp)
{
- Sg_device *sdp;
+ Sg_device *sdp = NULL;
unsigned long iflags;
int k;

@@ -1439,15 +1447,16 @@
Sg_device **tmp_da;
int tmp_dev_max = sg_template.nr_dev + SG_DEV_ARR_LUMP;

+ write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
tmp_da = (Sg_device **)vmalloc(
tmp_dev_max * sizeof(Sg_device *));
if (NULL == tmp_da) {
scsidp->attached--;
- write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
printk(KERN_ERR
"sg_attach: device array cannot be resized\n");
return 1;
}
+ write_lock_irqsave(&sg_dev_arr_lock, iflags);
memset(tmp_da, 0, tmp_dev_max * sizeof (Sg_device *));
memcpy(tmp_da, sg_dev_arr,
sg_template.dev_max * sizeof (Sg_device *));
@@ -1456,6 +1465,7 @@
sg_template.dev_max = tmp_dev_max;
}

+find_empty_slot:
for (k = 0; k < sg_template.dev_max; k++)
if (!sg_dev_arr[k])
break;
@@ -1467,11 +1477,19 @@
" type=%d, minor number exceed %d\n",
scsidp->host->host_no, scsidp->channel, scsidp->id,
scsidp->lun, scsidp->type, SG_MAX_DEVS_MASK);
+ if (NULL != sdp)
+ vfree((char *) sdp);
return 1;
}
- if (k < sg_template.dev_max)
- sdp = (Sg_device *)vmalloc(sizeof(Sg_device));
- else
+ if (k < sg_template.dev_max) {
+ if (NULL == sdp) {
+ write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
+ sdp = (Sg_device *)vmalloc(sizeof(Sg_device));
+ write_lock_irqsave(&sg_dev_arr_lock, iflags);
+ if (sg_dev_arr[k])
+ goto find_empty_slot;
+ }
+ } else
sdp = NULL;
if (NULL == sdp) {
scsidp->attached--;

2002-10-02 18:55:07

by Andreas Boman

Subject: Re: 2.5.39 "Sleeping function called from illegal context at slab.c:1374"

* Mike Anderson wrote:
> Andrew Morton wrote:
> > That is known - sg_init() is blatantly calling vmalloc under
> > write_lock_irqsave().
>
> I have not seen a patch for this yet.
>
> Douglas Gilbert is on time off and connects when he can, so it may be a
> bit until he can address this.
>
> In the interim, the quick patch below should fix the problem and still
> provide for safe additions.
>
> I have done only minor testing on 2.5.40 using sg_utils.

This seems to have fixed that particular warning, and got me a new one:

Debug: sleeping function called from illegal context at /usr/src/linux-2.5.40/include/asm/semaphore.h:119
debb7e38 debb7e5c c016f69b c0356860 00000077 df354680 e0907020 e0907028
e0907020 debb7e78 c0241e06 e09070bc dfd563c8 e0907020 e0907028 debb6000
debb7e94 c0240348 e0907020 c01f483f e0907020 00000000 00000000 debb7ee4
Call Trace:
[driverfs_create_dir+75/240]driverfs_create_dir+0x4b/0xf0
[device_make_dir+70/144]device_make_dir+0x46/0x90
[device_register+184/368]device_register+0xb8/0x170
[sprintf+31/48]sprintf+0x1f/0x30
[<e08fe86a>]sg_attach+0x23a/0x450 [sg]
[<e09023ce>].rodata.str1.1+0x6a/0x2b0 [sg]
[<e0902387>].rodata.str1.1+0x23/0x2b0 [sg]
[<e0903ca0>]sg_fops+0x0/0x58 [sg]
[<e0903be0>]sg_template+0x0/0x94 [sg]
[scsi_register_device+269/336]scsi_register_device+0x10d/0x150
[<e08fecd3>]init_sg+0x23/0x60 [sg]
[<e0903be0>]sg_template+0x0/0x94 [sg]
[sys_init_module+1311/1648]sys_init_module+0x51f/0x670
[<e08fc060>]E __insmod_sg_O/lib/modules/2.5.40anb4/kernel/drivers/scsi/sg.o_M3D9B4C96_V132392+0x60/0x80 [sg]
[<e0902614>]__ksymtab+0x0/0x28 [sg]
[<e08fc060>]E __insmod_sg_O/lib/modules/2.5.40anb4/kernel/drivers/scsi/sg.o_M3D9B4C96_V132392+0x60/0x80 [sg]
[syscall_call+7/11]syscall_call+0x7/0xb

modprobe cdrom and sr_mod follow without problems, but then when I modprobe
ide-scsi:

scsi1 : SCSI host adapter emulation for IDE ATAPI devices
scsi_eh_offline_sdevs: Device set offline - notready or command retry failedafter error recovery: host1 channel 0 id 0 lun 0
Vendor: Model: Rev:
Type: Direct-Access ANSI SCSI revision: 00
hda: lost interrupt
ide-scsi: (IO,CoD) != (0,1) while issuing a packet command
hda: DMA disabled
hda: ATAPI reset complete

and the box is dead again.

2002-10-02 19:25:32

by Mike Anderson

Subject: Re: 2.5.39 "Sleeping function called from illegal context at slab.c:1374"

Andreas Boman wrote:
> This seems to have fixed that particular warning, and got me a new one:
>
> Debug: sleeping function called from illegal context at /usr/src/linux-2.5.40/include/asm/semaphore.h:119
> debb7e38 debb7e5c c016f69b c0356860 00000077 df354680 e0907020 e0907028
> e0907020 debb7e78 c0241e06 e09070bc dfd563c8 e0907020 e0907028 debb6000
> debb7e94 c0240348 e0907020 c01f483f e0907020 00000000 00000000 debb7ee4
> Call Trace:
> [driverfs_create_dir+75/240]driverfs_create_dir+0x4b/0xf0
> [device_make_dir+70/144]device_make_dir+0x46/0x90
> [device_register+184/368]device_register+0xb8/0x170
> [sprintf+31/48]sprintf+0x1f/0x30
> [<e08fe86a>]sg_attach+0x23a/0x450 [sg]
> [<e09023ce>].rodata.str1.1+0x6a/0x2b0 [sg]
> [<e0902387>].rodata.str1.1+0x23/0x2b0 [sg]
> [<e0903ca0>]sg_fops+0x0/0x58 [sg]
> [<e0903be0>]sg_template+0x0/0x94 [sg]
> [scsi_register_device+269/336]scsi_register_device+0x10d/0x150
> [<e08fecd3>]init_sg+0x23/0x60 [sg]
> [<e0903be0>]sg_template+0x0/0x94 [sg]
> [sys_init_module+1311/1648]sys_init_module+0x51f/0x670
> [<e08fc060>]E __insmod_sg_O/lib/modules/2.5.40anb4/kernel/drivers/scsi/sg.o_M3D9B4C96_V132392+0x60/0x80 [sg]
> [<e0902614>]__ksymtab+0x0/0x28 [sg]
> [<e08fc060>]E __insmod_sg_O/lib/modules/2.5.40anb4/kernel/drivers/scsi/sg.o_M3D9B4C96_V132392+0x60/0x80 [sg]
> [syscall_call+7/11]syscall_call+0x7/0xb
>
Sorry, patmans pointed out the issue to me this morning. I was also
in a hurry this morning and grabbed a .config with CONFIG_PREEMPT off.

I am now running with PREEMPT and have moved the driverfs calls to the
other side of the lock. Give me a few more minutes of testing and I
will resend.
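
Moving the driverfs calls to the other side of the lock amounts to this: update the shared array while holding the lock, release it, and only then make calls that may sleep. A rough userspace sketch follows, with hypothetical names throughout (struct dev, dev_arr, arr_lock, atomic_depth, slow_register, dev_attach). slow_register stands in for a sleeping call such as device_register(), and the per-thread counter imitates the debug check that produced the warning.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_DEVS 8  /* hypothetical array size */

struct dev {
	int minor;
	int registered;
};

static pthread_mutex_t arr_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dev *dev_arr[MAX_DEVS];
static int nr_dev;

/* Per-thread count of "must not sleep" sections currently entered. */
static _Thread_local int atomic_depth;

/* Stand-in for a call that may sleep, such as device_register(). */
static void slow_register(struct dev *d)
{
	assert(atomic_depth == 0);  /* the "sleeping function" check */
	d->registered = 1;
}

/* Returns 0 on success, 1 if slot k is out of range or already taken. */
int dev_attach(struct dev *d, int k)
{
	pthread_mutex_lock(&arr_lock);
	atomic_depth++;
	if (k < 0 || k >= MAX_DEVS || dev_arr[k] != NULL) {
		atomic_depth--;
		pthread_mutex_unlock(&arr_lock);
		return 1;
	}
	/* Shared state is updated only while the lock is held. */
	d->minor = k;
	dev_arr[k] = d;
	nr_dev++;
	atomic_depth--;
	pthread_mutex_unlock(&arr_lock);

	/* Calls that may sleep happen after the lock is released. */
	slow_register(d);
	return 0;
}
```

One consequence of this ordering is that the entry is visible in the array before registration has finished, so readers of the array have to tolerate a device that is not yet registered.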

-andmike
--
Michael Anderson

2002-10-02 21:16:50

by Mike Anderson

Subject: Re: 2.5.39 "Sleeping function called from illegal context at slab.c:1374"

Andreas,
Here is the updated patch.

-andmike
--
Michael Anderson

sg.c | 59 ++++++++++++++++++++++++++++++++++++++++-------------------
1 files changed, 40 insertions(+), 19 deletions(-)

diff -Nru a/drivers/scsi/sg.c b/drivers/scsi/sg.c
--- a/drivers/scsi/sg.c Wed Oct 2 12:19:48 2002
+++ b/drivers/scsi/sg.c Wed Oct 2 12:19:48 2002
@@ -1354,32 +1354,42 @@
{
static int sg_registered = 0;
unsigned long iflags;
+ int tmp_dev_max;
+ Sg_device **tmp_da;

if ((sg_template.dev_noticed == 0) || sg_dev_arr)
return 0;

+ SCSI_LOG_TIMEOUT(3, printk("sg_init\n"));
+
write_lock_irqsave(&sg_dev_arr_lock, iflags);
+ tmp_dev_max = sg_template.dev_noticed + SG_DEV_ARR_LUMP;
+ write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
+
+ tmp_da = (Sg_device **)vmalloc(
+ tmp_dev_max * sizeof(Sg_device *));
+ if (NULL == tmp_da) {
+ printk(KERN_ERR "sg_init: no space for sg_dev_arr\n");
+ return 1;
+ }
+ write_lock_irqsave(&sg_dev_arr_lock, iflags);
+
if (!sg_registered) {
if (register_chrdev(SCSI_GENERIC_MAJOR, "sg", &sg_fops)) {
printk(KERN_ERR
"Unable to get major %d for generic SCSI device\n",
SCSI_GENERIC_MAJOR);
write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
+ vfree((char *) tmp_da);
return 1;
}
sg_registered++;
}

- SCSI_LOG_TIMEOUT(3, printk("sg_init\n"));
- sg_template.dev_max = sg_template.dev_noticed + SG_DEV_ARR_LUMP;
- sg_dev_arr = (Sg_device **)vmalloc(
- sg_template.dev_max * sizeof(Sg_device *));
- if (NULL == sg_dev_arr) {
- printk(KERN_ERR "sg_init: no space for sg_dev_arr\n");
- write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
- return 1;
- }
- memset(sg_dev_arr, 0, sg_template.dev_max * sizeof (Sg_device *));
+ memset(tmp_da, 0, tmp_dev_max * sizeof (Sg_device *));
+ sg_template.dev_max = tmp_dev_max;
+ sg_dev_arr = tmp_da;
+
write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
#ifdef CONFIG_PROC_FS
sg_proc_init();
@@ -1430,7 +1440,7 @@
static int
sg_attach(Scsi_Device * scsidp)
{
- Sg_device *sdp;
+ Sg_device *sdp = NULL;
unsigned long iflags;
int k;

@@ -1439,15 +1449,16 @@
Sg_device **tmp_da;
int tmp_dev_max = sg_template.nr_dev + SG_DEV_ARR_LUMP;

+ write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
tmp_da = (Sg_device **)vmalloc(
tmp_dev_max * sizeof(Sg_device *));
if (NULL == tmp_da) {
scsidp->attached--;
- write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
printk(KERN_ERR
"sg_attach: device array cannot be resized\n");
return 1;
}
+ write_lock_irqsave(&sg_dev_arr_lock, iflags);
memset(tmp_da, 0, tmp_dev_max * sizeof (Sg_device *));
memcpy(tmp_da, sg_dev_arr,
sg_template.dev_max * sizeof (Sg_device *));
@@ -1456,6 +1467,7 @@
sg_template.dev_max = tmp_dev_max;
}

+find_empty_slot:
for (k = 0; k < sg_template.dev_max; k++)
if (!sg_dev_arr[k])
break;
@@ -1467,11 +1479,19 @@
" type=%d, minor number exceed %d\n",
scsidp->host->host_no, scsidp->channel, scsidp->id,
scsidp->lun, scsidp->type, SG_MAX_DEVS_MASK);
+ if (NULL != sdp)
+ vfree((char *) sdp);
return 1;
}
- if (k < sg_template.dev_max)
- sdp = (Sg_device *)vmalloc(sizeof(Sg_device));
- else
+ if (k < sg_template.dev_max) {
+ if (NULL == sdp) {
+ write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
+ sdp = (Sg_device *)vmalloc(sizeof(Sg_device));
+ write_lock_irqsave(&sg_dev_arr_lock, iflags);
+ if (sg_dev_arr[k])
+ goto find_empty_slot;
+ }
+ } else
sdp = NULL;
if (NULL == sdp) {
scsidp->attached--;
@@ -1498,17 +1518,18 @@
scsidp->sdev_driverfs_dev.name);
sdp->sg_driverfs_dev.parent = &scsidp->sdev_driverfs_dev;
sdp->sg_driverfs_dev.bus = &scsi_driverfs_bus_type;
+
+ sg_template.nr_dev++;
+ sg_dev_arr[k] = sdp;
+ write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
+
device_register(&sdp->sg_driverfs_dev);
device_create_file(&sdp->sg_driverfs_dev, &dev_attr_type);
device_create_file(&sdp->sg_driverfs_dev, &dev_attr_kdev);
-
sdp->de = devfs_register(scsidp->de, "generic", DEVFS_FL_DEFAULT,
SCSI_GENERIC_MAJOR, k,
S_IFCHR | S_IRUSR | S_IWUSR | S_IRGRP,
&sg_fops, sdp);
- sg_template.nr_dev++;
- sg_dev_arr[k] = sdp;
- write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
switch (scsidp->type) {
case TYPE_DISK:
case TYPE_MOD:

2002-10-03 00:04:15

by Andreas Boman

Subject: Re: 2.5.39 "Sleeping function called from illegal context at slab.c:1374"

* Mike Anderson wrote:
> Andreas,
> Here is the updated patch.
>

Yep, no more warnings on modprobe sg. Unfortunately the box still hangs after
modprobe ide-scsi:

scsi1 : SCSI host adapter emulation for IDE ATAPI devices
scsi_eh_offline_sdevs: Device set offline - notready or command retry failedafter error recovery: host1 channel 0 id 0 lun 0
Vendor: Model: Rev:
Type: Direct-Access ANSI SCSI revision: 00
hda: lost interrupt
ide-scsi: CoD != 0 in idescsi_pc_intr
hda: DMA disabled
hda: ATAPI reset complete


andreas

2002-10-03 17:11:08

by Mike Anderson

Subject: Re: 2.5.39 "Sleeping function called from illegal context at slab.c:1374"

Andreas,
I noticed a problem with the scsi_error.c update that I made to
2.5.40. There is a typo in the TUR (TEST UNIT READY) check in the
error handling. I tested the patch yesterday and it recovers better
from faults.

I do not know if it will fix your problem, but it might be worth
a try.

I will send it to you in a bit; I am in the process of rolling it
up together with other patches that depend on it.


Andreas Boman wrote:
> * Mike Anderson wrote:
> > Andreas,
> > Here is the updated patch.
> >
>
> Yep, no more warnings on modprobe sg. Unfortunately the box still hangs after
> modprobe ide-scsi:
>
> scsi1 : SCSI host adapter emulation for IDE ATAPI devices
> scsi_eh_offline_sdevs: Device set offline - notready or command retry failedafter error recovery: host1 channel 0 id 0 lun 0
> Vendor: Model: Rev:
> Type: Direct-Access ANSI SCSI revision: 00
> hda: lost interrupt
> ide-scsi: CoD != 0 in idescsi_pc_intr
> hda: DMA disabled
> hda: ATAPI reset complete
>
>
> andreas
-andmike
--
Michael Anderson