2003-03-13 18:18:32

by Oleg Drokin

[permalink] [raw]
Subject: dpt_i2o.c memleak/incorrectness

Hello!

There is something strange going on in drivers/scsi/dpt_i2o.c in both
2.4 and 2.5. adpt_i2o_reset_hba() function allocates 4 bytes
for "status" stuff, then tries to reset controller, then
if timeout on first reset stage is reached, frees "status" and returns,
otherwise it proceeds to monitor "status" (which is modified by hardware
now, btw), and if timeout is reached, just exits.
On the first thought I just thought it is trivial memleak that can be
fixed with patch below, but after some more thining I just thought
"what if after some time controller awakes and modifies status,
but it is already allocated for other purposes", scary eh?
So may be we shold not free those four bytes on timeout at all just
for safeness reasons?

Found with help of smatch + enhanced unfree script.

Bye,
Oleg

===== drivers/scsi/dpt_i2o.c 1.9 vs edited =====
--- 1.9/drivers/scsi/dpt_i2o.c Wed Jan 8 18:26:13 2003
+++ edited/drivers/scsi/dpt_i2o.c Thu Mar 13 21:17:10 2003
@@ -1336,6 +1336,7 @@
}
if(time_after(jiffies,timeout)){
printk(KERN_ERR "%s:Timeout waiting for IOP Reset.\n",pHba->name);
+ kfree(status);
return -ETIMEDOUT;
}
} while (m == EMPTY_QUEUE);


2003-03-13 18:25:29

by Alan

[permalink] [raw]
Subject: Re: dpt_i2o.c memleak/incorrectness

On Thu, 2003-03-13 at 18:28, Oleg Drokin wrote:
> Hello!
>
> There is something strange going on in drivers/scsi/dpt_i2o.c in both
> 2.4 and 2.5. adpt_i2o_reset_hba() function allocates 4 bytes
> for "status" stuff, then tries to reset controller, then
> if timeout on first reset stage is reached, frees "status" and returns,
> otherwise it proceeds to monitor "status" (which is modified by hardware
> now, btw), and if timeout is reached, just exits.

Correctly - I2O does the same thing in this case. Its just better to
throw a few bytes away than risk corruption

2003-03-13 18:31:23

by Oleg Drokin

[permalink] [raw]
Subject: Re: dpt_i2o.c fix for possibly memory corruption on reset timeout

Hello!

On Thu, Mar 13, 2003 at 07:44:23PM +0000, Alan Cox wrote:
> > if timeout on first reset stage is reached, frees "status" and returns,
> > otherwise it proceeds to monitor "status" (which is modified by hardware
> > now, btw), and if timeout is reached, just exits.
> Correctly - I2O does the same thing in this case. Its just better to
> throw a few bytes away than risk corruption

Ok, so please consider applying this patch instead (appies to both
2.4 and 2.5)

Bye,
Oleg

===== drivers/scsi/dpt_i2o.c 1.9 vs edited =====
--- 1.9/drivers/scsi/dpt_i2o.c Wed Jan 8 18:26:13 2003
+++ edited/drivers/scsi/dpt_i2o.c Thu Mar 13 21:39:07 2003
@@ -1318,7 +1318,9 @@
while(*status == 0){
if(time_after(jiffies,timeout)){
printk(KERN_WARNING"%s: IOP Reset Timeout\n",pHba->name);
- kfree(status);
+ /* We loose 4 bytes of "status" here, but we cannot
+ free these because controller may awake and corrupt
+ those bytes at any time */
return -ETIMEDOUT;
}
rmb();
@@ -1336,6 +1338,9 @@
}
if(time_after(jiffies,timeout)){
printk(KERN_ERR "%s:Timeout waiting for IOP Reset.\n",pHba->name);
+ /* We loose 4 bytes of "status" here, but we cannot
+ free these because controller may awake and corrupt
+ those bytes at any time */
return -ETIMEDOUT;
}
} while (m == EMPTY_QUEUE);

2003-03-13 18:43:15

by Randy.Dunlap

[permalink] [raw]
Subject: Re: dpt_i2o.c fix for possibly memory corruption on reset timeout

On Thu, 13 Mar 2003 21:41:07 +0300 Oleg Drokin <[email protected]> wrote:

| Ok, so please consider applying this patch instead (appies to both
| 2.4 and 2.5)

| + /* We loose 4 bytes of "status" here, but we cannot
lose

| @@ -1336,6 +1338,9 @@
| + /* We loose 4 bytes of "status" here, but we cannot
lose

--
~Randy

2003-03-13 18:48:32

by Bryan Andersen

[permalink] [raw]
Subject: Re: dpt_i2o.c memleak/incorrectness

>> There is something strange going on in drivers/scsi/dpt_i2o.c in both
>> 2.4 and 2.5. adpt_i2o_reset_hba() function allocates 4 bytes
>> for "status" stuff, then tries to reset controller, then
>> if timeout on first reset stage is reached, frees "status" and returns,
>> otherwise it proceeds to monitor "status" (which is modified by hardware
>> now, btw), and if timeout is reached, just exits.
>
> Correctly - I2O does the same thing in this case. Its just better to
> throw a few bytes away than risk corruption

Better document it in the comments or it will get "corrected" by some
mem leak detector. If possible try to use a static for the pointer to
the status block, but that may not work. Re-enterant code and multi CPU
situations likely won't allow for that. Also it might not be worth the
effort to properly determin if it is safe to use only one location.

- Bryan

2003-03-13 18:46:40

by Oleg Drokin

[permalink] [raw]
Subject: Re: dpt_i2o.c fix for possibly memory corruption on reset timeout

Hello!

On Thu, Mar 13, 2003 at 10:51:25AM -0800, Randy.Dunlap wrote:
> | Ok, so please consider applying this patch instead (appies to both
> | 2.4 and 2.5)

Ok, here's the one with spelling fix from Randy ;)

Bye,
Oleg

===== drivers/scsi/dpt_i2o.c 1.9 vs edited =====
--- 1.9/drivers/scsi/dpt_i2o.c Wed Jan 8 18:26:13 2003
+++ edited/drivers/scsi/dpt_i2o.c Thu Mar 13 21:55:08 2003
@@ -1318,7 +1318,9 @@
while(*status == 0){
if(time_after(jiffies,timeout)){
printk(KERN_WARNING"%s: IOP Reset Timeout\n",pHba->name);
- kfree(status);
+ /* We lose 4 bytes of "status" here, but we cannot
+ free these because controller may awake and corrupt
+ those bytes at any time */
return -ETIMEDOUT;
}
rmb();
@@ -1336,6 +1338,9 @@
}
if(time_after(jiffies,timeout)){
printk(KERN_ERR "%s:Timeout waiting for IOP Reset.\n",pHba->name);
+ /* We lose 4 bytes of "status" here, but we cannot
+ free these because controller may awake and corrupt
+ those bytes at any time */
return -ETIMEDOUT;
}
} while (m == EMPTY_QUEUE);

2003-03-13 19:28:40

by Oleg Drokin

[permalink] [raw]
Subject: Now i2o_core.c memleak/incorrectness?

Hello!

On Thu, Mar 13, 2003 at 07:44:23PM +0000, Alan Cox wrote:
> > There is something strange going on in drivers/scsi/dpt_i2o.c in both
> > 2.4 and 2.5. adpt_i2o_reset_hba() function allocates 4 bytes
> > for "status" stuff, then tries to reset controller, then
> > if timeout on first reset stage is reached, frees "status" and returns,
> > otherwise it proceeds to monitor "status" (which is modified by hardware
> > now, btw), and if timeout is reached, just exits.
> Correctly - I2O does the same thing in this case. Its just better to
> throw a few bytes away than risk corruption

Well, it seems that i2o does not always follow this rule.
Also i2o_init_outbound_q() seems not free this "status" thing if everything
went ok, is this intentional?
Or perhaps something like this patch is needed?

Bye,
Oleg

===== drivers/message/i2o/i2o_core.c 1.12 vs edited =====
--- 1.12/drivers/message/i2o/i2o_core.c Tue Aug 6 18:42:18 2002
+++ edited/drivers/message/i2o/i2o_core.c Thu Mar 13 22:36:40 2003
@@ -2217,7 +2217,7 @@
else
printk(KERN_ERR "%s: Outbound queue initialize timeout.\n",
c->name);
- kfree(status);
+ // Better leak this for safety: kfree(status);
return -ETIMEDOUT;
}
schedule();
@@ -2231,6 +2231,7 @@
return -ETIMEDOUT;
}

+ kfree(status);
return 0;
}

2003-03-13 23:23:47

by Alan

[permalink] [raw]
Subject: Re: Now i2o_core.c memleak/incorrectness?

On Thu, 2003-03-13 at 19:38, Oleg Drokin wrote:
> Well, it seems that i2o does not always follow this rule.
> Also i2o_init_outbound_q() seems not free this "status" thing if everything
> went ok, is this intentional?
> Or perhaps something like this patch is needed?

I'll take a look. I'm doing a bunch of i2o cleanup in 2.5 right now

2003-03-14 09:21:23

by Denis Vlasenko

[permalink] [raw]
Subject: Re: dpt_i2o.c fix for possibly memory corruption on reset timeout

On 13 March 2003 20:56, Oleg Drokin wrote:
> Hello!
>
> On Thu, Mar 13, 2003 at 10:51:25AM -0800, Randy.Dunlap wrote:
> > | Ok, so please consider applying this patch instead (appies to
> > | both 2.4 and 2.5)
>
> Ok, here's the one with spelling fix from Randy ;)
>
> Bye,
> Oleg
>
> ===== drivers/scsi/dpt_i2o.c 1.9 vs edited =====
> --- 1.9/drivers/scsi/dpt_i2o.c Wed Jan 8 18:26:13 2003
> +++ edited/drivers/scsi/dpt_i2o.c Thu Mar 13 21:55:08 2003
> @@ -1318,7 +1318,9 @@
> while(*status == 0){
> if(time_after(jiffies,timeout)){
> printk(KERN_WARNING"%s: IOP Reset Timeout\n",pHba->name);
> - kfree(status);
> + /* We lose 4 bytes of "status" here, but we cannot
> + free these because controller may awake and corrupt
> + those bytes at any time */

I'd leave kfree() inside the comment - less effort for those
operating under -ENOCAFFEINE condition

/* do NOT do kfree(status): we lose ....

I don't like the whole idea that mem leak is tolerable.

Can we just add a 4 byte scratch pad status to
struct _adpt_hba? Let it scribble there...
--
vda

2003-03-14 11:52:11

by Jörn Engel

[permalink] [raw]
Subject: Re: dpt_i2o.c fix for possibly memory corruption on reset timeout

On Fri, 14 March 2003 11:18:49 +0200, Denis Vlasenko wrote:
>
> I don't like the whole idea that mem leak is tolerable.
>
> Can we just add a 4 byte scratch pad status to
> struct _adpt_hba? Let it scribble there...

I've had the same idea, but there might be some pitfalls around.
The problem boils down to the users of the buffer. Is it per-device,
per-process, per-whatever?

Once this is known I'm all for putting the buffer into a per-whatever
data structure. But someone has to understand the driver first. :)

J?rn

--
Everything should be made as simple as possible, but not simpler.
-- Albert Einstein

2003-03-14 13:01:00

by Alan

[permalink] [raw]
Subject: Re: dpt_i2o.c fix for possibly memory corruption on reset timeout

On Fri, 2003-03-14 at 09:18, Denis Vlasenko wrote:
> I don't like the whole idea that mem leak is tolerable.
>
> Can we just add a 4 byte scratch pad status to
> struct _adpt_hba? Let it scribble there...

Its 4 bytes (+ slab overhead), its far safer if this happens to say
its gone forever. Its owned by the I2O controller now and it never
gave it back

2003-03-14 13:29:26

by Jörn Engel

[permalink] [raw]
Subject: Re: dpt_i2o.c fix for possibly memory corruption on reset timeout

On Fri, 14 March 2003 14:19:58 +0000, Alan Cox wrote:
> On Fri, 2003-03-14 at 09:18, Denis Vlasenko wrote:
> > I don't like the whole idea that mem leak is tolerable.
> >
> > Can we just add a 4 byte scratch pad status to
> > struct _adpt_hba? Let it scribble there...
>
> Its 4 bytes (+ slab overhead), its far safer if this happens to say
> its gone forever. Its owned by the I2O controller now and it never
> gave it back

How about an (optional) counter then? If you can show that this case
is hit zero times during operation, noone will complain. On the other
hand, if some hardware hits this problem 1000+ times, we have a good
reason to find another solution.

I'd volunteer to create the patch, if the idea is accepted.

J?rn

--
Mundie uses a textbook tactic of manipulation: start with some
reasonable talk, and lead the audience to an unreasonable conclusion.
-- Bruce Perens

2003-03-14 13:36:58

by Oleg Drokin

[permalink] [raw]
Subject: Re: dpt_i2o.c fix for possibly memory corruption on reset timeout

Hello!

On Fri, Mar 14, 2003 at 02:39:42PM +0100, Joern Engel wrote:
> > > Can we just add a 4 byte scratch pad status to
> > > struct _adpt_hba? Let it scribble there...
> > Its 4 bytes (+ slab overhead), its far safer if this happens to say
> > its gone forever. Its owned by the I2O controller now and it never
> > gave it back
> How about an (optional) counter then? If you can show that this case
> is hit zero times during operation, noone will complain. On the other
> hand, if some hardware hits this problem 1000+ times, we have a good
> reason to find another solution.
> I'd volunteer to create the patch, if the idea is accepted.

Well, if some hardware would do so, then users would go here and complain
about kernel being noisy on certain hardware. (message is printed each
time this happens). Have not happened so far.

Bye,
Oleg

2003-03-14 14:07:35

by Alan

[permalink] [raw]
Subject: Re: dpt_i2o.c fix for possibly memory corruption on reset timeout

On Fri, 2003-03-14 at 13:43, Oleg Drokin wrote:
> Well, if some hardware would do so, then users would go here and complain
> about kernel being noisy on certain hardware. (message is printed each
> time this happens). Have not happened so far.

Its run once per controller on board setup, and if it fails that controller
is history until reboot

2003-03-15 18:09:43

by Horst H. von Brand

[permalink] [raw]
Subject: Re: dpt_i2o.c memleak/incorrectness

Bryan Andersen <[email protected]> dijo:
> >> There is something strange going on in drivers/scsi/dpt_i2o.c in both
> >> 2.4 and 2.5. adpt_i2o_reset_hba() function allocates 4 bytes
> >> for "status" stuff, then tries to reset controller, then
> >> if timeout on first reset stage is reached, frees "status" and returns,
> >> otherwise it proceeds to monitor "status" (which is modified by hardware
> >> now, btw), and if timeout is reached, just exits.
> >
> > Correctly - I2O does the same thing in this case. Its just better to
> > throw a few bytes away than risk corruption
>
> Better document it in the comments or it will get "corrected" by some
> mem leak detector. If possible try to use a static for the pointer to
> the status block, but that may not work. Re-enterant code and multi CPU
> situations likely won't allow for that. Also it might not be worth the
> effort to properly determin if it is safe to use only one location.

A device owned area, perhaps? To set up an area that can be messed around
without proper locking will get you problems down the line.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513