2006-01-27 10:10:17

by Chase Venters

Subject: More information on scsi_cmd_cache leak... (bisect)

Greetings,
Just a quick recap - there are at least 4 reports of 2.6.15 users
experiencing severe slab leaks with scsi_cmd_cache. It seems that a few of us
have a board (Asus P5GDC-V Deluxe) in common. We seem to have raid in common.
After dealing with this leak for a while, I decided to do some dancing around
with git bisect. I've landed on a possible point of regression:

commit: a9701a30470856408d08657eb1bd7ae29a146190
[PATCH] md: support BIO_RW_BARRIER for md/raid1

I spent about an hour and a half reading through the patch, trying to see if
I could make sense of what might be wrong. The result (after I dug into the
code to make a change I foolishly thought made sense) was a hung kernel.
This is important because when I rebooted into the kernel that had been
giving me trouble, it started an md resync and I'm now watching (at least
during this resync) the slab usage for scsi_cmd_cache stay sane:

turbotaz ~ # cat /proc/slabinfo | grep scsi_cmd_cache
scsi_cmd_cache 30 30 384 10 1 : tunables 54 27 8 :
slabdata 3 3 0

I guess I'm going to have to wait for this resync to finish and see if the
slab leak starts back up.
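
In the meantime I'm keeping an eye on it with a rough one-liner (just the
slab line plus the resync progress):

watch -n 60 'grep scsi_cmd_cache /proc/slabinfo; grep -A 2 md /proc/mdstat'
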
Other data: I've done an mdadm --stop on my raid1 /boot partition, leaving
only my raid10 / partition active. The slab continues to leak long after md0
(raid1) is stopped. Boot is also never mounted except when explicitly needed.
As I was going through bisect iterations, the cache on good kernels would
report no more than 9 objects right after boot (granted, I built the bisect
kernels with support for basically nothing but my USB keyboard and SATA). The
bad kernels would vary from 350 on up to somewhere around 600 right after
boot.
All bisect kernels were built from the same .config, and "make clean; make
mrproper" was executed between each compile. (Do I really need to do that?
How does git handle timestamps? I assumed I didn't really need to but did it
to be thorough).
I've attached my bisect log in case anyone wants to trace my steps, as well
as the config I was using to build the bisect kernels.
I can't rule Neil's patch out as being inherently responsible, but I'm sure
lots of people are using it now, so perhaps issuing these barriers in
md_super_write is having an interaction elsewhere?
I thought super_written_barrier looked funny in terms of how bio_put is
called on the original bio before the barrier bio is sent to super_written
(looked backwards), but altering it caused the hang. I guess I'm not quite
there yet in understanding block IO :P

Thanks,
Chase


Attachments:
bisect-log (2.32 kB)
bisect-config (28.89 kB)

2006-01-27 11:12:23

by NeilBrown

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

On Friday January 27, [email protected] wrote:
> Greetings,
> Just a quick recap - there are at least 4 reports of 2.6.15 users
> experiencing severe slab leaks with scsi_cmd_cache. It seems that a few of us
> have a board (Asus P5GDC-V Deluxe) in common. We seem to have raid in common.
> After dealing with this leak for a while, I decided to do some dancing around
> with git bisect. I've landed on a possible point of regression:
>
> commit: a9701a30470856408d08657eb1bd7ae29a146190
> [PATCH] md: support BIO_RW_BARRIER for md/raid1
>
> I spent about an hour and a half reading through the patch, trying to see if
> I could make sense of what might be wrong. The result (after I dug into the
> code to make a change I foolishly thought made sense) was a hung kernel.
> This is important because when I rebooted into the kernel that had been
> giving me trouble, it started an md resync and I'm now watching (at least
> during this resync) the slab usage for scsi_cmd_cache stay sane:
>
> turbotaz ~ # cat /proc/slabinfo | grep scsi_cmd_cache
> scsi_cmd_cache 30 30 384 10 1 : tunables 54 27 8 :
> slabdata 3 3 0
>

This suggests that the problem happens when a BIO_RW_BARRIER write is
sent to the device. With this patch, md flags all superblock writes
as BIO_RW_BARRIER. However, md is not likely to update the superblock very often
during a resync.

There is a (rough) count of the number of superblock writes in the
"Events" counter which "mdadm -D" will display.
You could try collecting 'Events' counter together with the
'active_objs' count from /proc/slabinfo and graph the pairs - see if
they are linear.
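
Something like this would do for collecting the pairs (a rough sketch;
/dev/md1 and the one-minute interval are only placeholders):

while true; do
	e=$(mdadm -D /dev/md1 | awk '/Events/ {print $NF}')
	a=$(awk '/^scsi_cmd_cache/ {print $2}' /proc/slabinfo)
	echo "$e $a"
	sleep 60
done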

I believe a BIO_RW_BARRIER is likely to send some sort of 'flush'
command to the device, and the driver for your particular device may
well be losing a scsi_cmd_cache allocation when doing that, but I leave
that to someone who knows more about that code.

Good detective work!

NeilBrown

2006-01-27 11:21:56

by Jens Axboe

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

On Fri, Jan 27 2006, Neil Brown wrote:
> On Friday January 27, [email protected] wrote:
> > Greetings,
> > Just a quick recap - there are at least 4 reports of 2.6.15 users
> > experiencing severe slab leaks with scsi_cmd_cache. It seems that a few of us
> > have a board (Asus P5GDC-V Deluxe) in common. We seem to have raid in common.
> > After dealing with this leak for a while, I decided to do some dancing around
> > with git bisect. I've landed on a possible point of regression:
> >
> > commit: a9701a30470856408d08657eb1bd7ae29a146190
> > [PATCH] md: support BIO_RW_BARRIER for md/raid1
> >
> > I spent about an hour and a half reading through the patch, trying to see if
> > I could make sense of what might be wrong. The result (after I dug into the
> > code to make a change I foolishly thought made sense) was a hung kernel.
> > This is important because when I rebooted into the kernel that had been
> > giving me trouble, it started an md resync and I'm now watching (at least
> > during this resync) the slab usage for scsi_cmd_cache stay sane:
> >
> > turbotaz ~ # cat /proc/slabinfo | grep scsi_cmd_cache
> > scsi_cmd_cache 30 30 384 10 1 : tunables 54 27 8 :
> > slabdata 3 3 0
> >
>
> This suggests that the problem happens when a BIO_RW_BARRIER write is
> sent to the device. With this patch, md flags all superblock writes
> as BIO_RW_BARRIER. However, md is not likely to update the superblock very often
> during a resync.
>
> There is a (rough) count of the number of superblock writes in the
> "Events" counter which "mdadm -D" will display.
> You could try collecting 'Events' counter together with the
> 'active_objs' count from /proc/slabinfo and graph the pairs - see if
> they are linear.
>
> I believe a BIO_RW_BARRIER is likely to send some sort of 'flush'
> command to the device, and the driver for your particular device may
> well be losing a scsi_cmd_cache allocation when doing that, but I leave
> that to someone who knows more about that code.

I already checked up on that since I suspected barriers initially. The
path there for scsi is sd.c:sd_issue_flush() which looks pretty straight
forward. In the end it goes through the block layer and gets back to the
SCSI layer as a regular REQ_BLOCK_PC request.

--
Jens Axboe

2006-01-27 11:26:32

by Jens Axboe

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

On Fri, Jan 27 2006, Jens Axboe wrote:
> On Fri, Jan 27 2006, Neil Brown wrote:
> > On Friday January 27, [email protected] wrote:
> > > Greetings,
> > > Just a quick recap - there are at least 4 reports of 2.6.15 users
> > > experiencing severe slab leaks with scsi_cmd_cache. It seems that a few of us
> > > have a board (Asus P5GDC-V Deluxe) in common. We seem to have raid in common.
> > > After dealing with this leak for a while, I decided to do some dancing around
> > > with git bisect. I've landed on a possible point of regression:
> > >
> > > commit: a9701a30470856408d08657eb1bd7ae29a146190
> > > [PATCH] md: support BIO_RW_BARRIER for md/raid1
> > >
> > > I spent about an hour and a half reading through the patch, trying to see if
> > > I could make sense of what might be wrong. The result (after I dug into the
> > > code to make a change I foolishly thought made sense) was a hung kernel.
> > > This is important because when I rebooted into the kernel that had been
> > > giving me trouble, it started an md resync and I'm now watching (at least
> > > during this resync) the slab usage for scsi_cmd_cache stay sane:
> > >
> > > turbotaz ~ # cat /proc/slabinfo | grep scsi_cmd_cache
> > > scsi_cmd_cache 30 30 384 10 1 : tunables 54 27 8 :
> > > slabdata 3 3 0
> > >
> >
> > This suggests that the problem happens when a BIO_RW_BARRIER write is
> > sent to the device. With this patch, md flags all superblock writes
> > as BIO_RW_BARRIER. However, md is not likely to update the superblock very often
> > during a resync.
> >
> > There is a (rough) count of the number of superblock writes in the
> > "Events" counter which "mdadm -D" will display.
> > You could try collecting 'Events' counter together with the
> > 'active_objs' count from /proc/slabinfo and graph the pairs - see if
> > they are linear.
> >
> > I believe a BIO_RW_BARRIER is likely to send some sort of 'flush'
> > command to the device, and the driver for your particular device may
> > well be losing a scsi_cmd_cache allocation when doing that, but I leave
> > that to someone who knows more about that code.
>
> I already checked up on that since I suspected barriers initially. The
> path there for scsi is sd.c:sd_issue_flush() which looks pretty straight
> forward. In the end it goes through the block layer and gets back to the
> SCSI layer as a regular REQ_BLOCK_PC request.

Sorry, that was for the ->issue_flush() that md also does but did before
the barrier addition as well. Most of the barrier handling is done in
the block layer, but it could show leaks in SCSI of course. FWIW, I
tested barriers with and without md on SCSI here a few days ago and
didn't see any leaks at all.

Chase, can you post full dmesg again? I don't have it, thanks.

--
Jens Axboe

2006-01-27 13:16:09

by Alexey Dobriyan

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

On Fri, Jan 27, 2006 at 04:09:44AM -0600, Chase Venters wrote:
> All bisect kernels were built from the same .config, and "make clean; make
> mrproper" was executed between each compile. (Do I really need to do that?

git bisect {good,bad}
make

should be enough.

The first bisect steps usually result in a full rebuild because some
often-included header is patched.
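
A typical session then looks something like this (a sketch; v2.6.14 good
and v2.6.15 bad, as in your bisect):

git bisect start
git bisect bad v2.6.15
git bisect good v2.6.14
# then at each step: build, boot, check /proc/slabinfo, and mark the result
make
git bisect good     # or: git bisect bad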

> How does git handle timestamps? I assumed I didn't really need to but did it
> to be thorough).

2006-01-27 15:21:04

by Chase Venters

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

On Friday 27 January 2006 05:28, Jens Axboe wrote:
> Sorry, that was for the ->issue_flush() that md also does but did before
> the barrier addition as well. Most of the barrier handling is done in
> the block layer, but it could show leaks in SCSI of course. FWIW, I
> tested barriers with and without md on SCSI here a few days ago and
> didn't see any leaks at all.
>
> Chase, can you post full dmesg again? I don't have it, thanks.

Attached.

Thanks,
Chase


Attachments:
dmesg (31.08 kB)

2006-01-27 18:41:45

by Ariel

Subject: Re: More information on scsi_cmd_cache leak... (bisect)


On Fri, 27 Jan 2006, Chase Venters wrote:

> After dealing with this leak for a while, I decided to do some dancing around
> with git bisect. I've landed on a possible point of regression:
>
> commit: a9701a30470856408d08657eb1bd7ae29a146190
> [PATCH] md: support BIO_RW_BARRIER for md/raid1

I can confirm that it only leaks with raid!

I rebooted with my raid5 root, read only, and it didn't leak. As soon as I
remount,rw it started leaking. Go back to ro and it stopped (although it
didn't clean up the old leaks). Tried my raid1 /boot and same thing - rw
leaks, ro doesn't. But, it only leaks on activity.

I then tried a regular lvm mount (with root ro), and no leaks!

What's interesting is that the mount was ro NOT the md (which can be set
ro independently). So it looks like it only leaks if you write to the md
device, and that's why setting the mount ro stopped the leaks.
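
For reference, the two knobs are independent; something like (device names
are only examples from my setup):

mount -o remount,ro /        # filesystem read-only: no writes, so no md superblock updates
mount -o remount,rw /        # leak resumes once writes start again
mdadm --readonly /dev/md0    # the md device itself can also be flipped read-only, separately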

-Ariel

2006-01-27 18:53:15

by Ariel

Subject: Re: More information on scsi_cmd_cache leak... (bisect)


I found a patch called slab-leak-detector.patch and applied it (manually,
since it's for a slightly older kernel).

Here are the results (after leaking about 1MB):

c03741ba <__scsi_get_command+0x29/0x73>

is the leaker, with 294 of them, versus 4 of:

c03743fd <scsi_setup_command_freelist+0xb0/0x101>

and 21 of:

fffffffe <0xfffffffe>

I'm not sure how helpful that result is though, since I guess we already
knew it was scsi, and from the other email, that it's also some
interaction with md.
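
If anyone wants the exact source lines behind those addresses, something
like this against a vmlinux with debug info for the same kernel should do:

addr2line -f -e vmlinux c03741ba c03743fd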

-Ariel

2006-01-27 18:58:07

by Chase Venters

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

On Fri, 27 Jan 2006, Ariel wrote:
>
> On Fri, 27 Jan 2006, Chase Venters wrote:
>
>> After dealing with this leak for a while, I decided to do some
>> dancing around
>> with git bisect. I've landed on a possible point of regression:
>>
>> commit: a9701a30470856408d08657eb1bd7ae29a146190
>> [PATCH] md: support BIO_RW_BARRIER for md/raid1
>
> I can confirm that it only leaks with raid!
>
> I rebooted with my raid5 root, read only, and it didn't leak. As soon as I
> remount,rw it started leaking. Go back to ro and it stopped (although it
> didn't clean up the old leaks). Tried my raid1 /boot and same thing - rw
> leaks, ro doesn't. But, it only leaks on activity.
>
> I then tried a regular lvm mount (with root ro), and no leaks!
>
> What's interesting is that the mount was ro NOT the md (which can be set ro
> independently). So it looks like it only leaks if you write to the md device,
> and that's why setting the mount ro stopped the leaks.

Yeah, if the mount is ro, md won't have any reason to write the
superblock any more, which means it won't be sending out bios with
barriers any more.

I'm in the middle of a crash course on block IO and SATA, but hopefully
some more skillful devs will beat me to the punch :)

> -Ariel

Cheers,
Chase

2006-01-27 19:06:45

by Mike Christie

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

Jens Axboe wrote:
> On Fri, Jan 27 2006, Jens Axboe wrote:
>
>>On Fri, Jan 27 2006, Neil Brown wrote:
>>
>>>On Friday January 27, [email protected] wrote:
>>>
>>>>Greetings,
>>>> Just a quick recap - there are at least 4 reports of 2.6.15 users
>>>>experiencing severe slab leaks with scsi_cmd_cache. It seems that a few of us
>>>>have a board (Asus P5GDC-V Deluxe) in common. We seem to have raid in common.
>>>> After dealing with this leak for a while, I decided to do some dancing around
>>>>with git bisect. I've landed on a possible point of regression:
>>>>
>>>>commit: a9701a30470856408d08657eb1bd7ae29a146190
>>>>[PATCH] md: support BIO_RW_BARRIER for md/raid1
>>>>
>>>> I spent about an hour and a half reading through the patch, trying to see if
>>>>I could make sense of what might be wrong. The result (after I dug into the
>>>>code to make a change I foolishly thought made sense) was a hung kernel.
>>>> This is important because when I rebooted into the kernel that had been
>>>>giving me trouble, it started an md resync and I'm now watching (at least
>>>>during this resync) the slab usage for scsi_cmd_cache stay sane:
>>>>
>>>>turbotaz ~ # cat /proc/slabinfo | grep scsi_cmd_cache
>>>>scsi_cmd_cache 30 30 384 10 1 : tunables 54 27 8 :
>>>>slabdata 3 3 0
>>>>
>>>
>>>This suggests that the problem happens when a BIO_RW_BARRIER write is
>>>sent to the device. With this patch, md flags all superblock writes
>>>as BIO_RW_BARRIER. However, md is not likely to update the superblock very often
>>>during a resync.
>>>
>>>There is a (rough) count of the number of superblock writes in the
>>>"Events" counter which "mdadm -D" will display.
>>>You could try collecting 'Events' counter together with the
>>>'active_objs' count from /proc/slabinfo and graph the pairs - see if
>>>they are linear.
>>>
>>>I believe a BIO_RW_BARRIER is likely to send some sort of 'flush'
>>>command to the device, and the driver for your particular device may
>>>well be losing a scsi_cmd_cache allocation when doing that, but I leave
>>>that to someone who knows more about that code.
>>
>>I already checked up on that since I suspected barriers initially. The
>>path there for scsi is sd.c:sd_issue_flush() which looks pretty straight
>>forward. In the end it goes through the block layer and gets back to the
>>SCSI layer as a regular REQ_BLOCK_PC request.
>
>
> Sorry, that was for the ->issue_flush() that md also does but did before
> the barrier addition as well. Most of the barrier handling is done in
> the block layer, but it could show leaks in SCSI of course. FWIW, I
> tested barriers with and without md on SCSI here a few days ago and
> didn't see any leaks at all.
>

It does not have anything to do with this in scsi_io_completion does it?

if (blk_complete_barrier_rq(q, req, good_bytes >> 9))
return;

For that case the scsi_cmnd does not get freed. Does it come back around
again and get released from a different path?

2006-01-27 19:16:06

by Jens Axboe

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

On Fri, Jan 27 2006, Mike Christie wrote:
> Jens Axboe wrote:
> >On Fri, Jan 27 2006, Jens Axboe wrote:
> >
> >>On Fri, Jan 27 2006, Neil Brown wrote:
> >>
> >>>On Friday January 27, [email protected] wrote:
> >>>
> >>>>Greetings,
> >>>> Just a quick recap - there are at least 4 reports of 2.6.15 users
> >>>>experiencing severe slab leaks with scsi_cmd_cache. It seems that a few
> >>>>of us have a board (Asus P5GDC-V Deluxe) in common. We seem to have
> >>>>raid in common. After dealing with this leak for a while, I decided
> >>>> to do some dancing around with git bisect. I've landed on a possible
> >>>>point of regression:
> >>>>
> >>>>commit: a9701a30470856408d08657eb1bd7ae29a146190
> >>>>[PATCH] md: support BIO_RW_BARRIER for md/raid1
> >>>>
> >>>> I spent about an hour and a half reading through the patch, trying
> >>>> to see if I could make sense of what might be wrong. The result (after
> >>>>I dug into the code to make a change I foolishly thought made sense)
> >>>>was a hung kernel.
> >>>> This is important because when I rebooted into the kernel that had
> >>>> been giving me trouble, it started an md resync and I'm now watching
> >>>>(at least during this resync) the slab usage for scsi_cmd_cache stay
> >>>>sane:
> >>>>
> >>>>turbotaz ~ # cat /proc/slabinfo | grep scsi_cmd_cache
> >>>>scsi_cmd_cache 30 30 384 10 1 : tunables 54 27
> >>>>8 : slabdata 3 3 0
> >>>>
> >>>
> >>>This suggests that the problem happens when a BIO_RW_BARRIER write is
> >>>sent to the device. With this patch, md flags all superblock writes
> >>>as BIO_RW_BARRIER. However, md is not likely to update the superblock very
> >>>often during a resync.
> >>>
> >>>There is a (rough) count of the number of superblock writes in the
> >>>"Events" counter which "mdadm -D" will display.
> >>>You could try collecting 'Events' counter together with the
> >>>'active_objs' count from /proc/slabinfo and graph the pairs - see if
> >>>they are linear.
> >>>
> >>>I believe a BIO_RW_BARRIER is likely to send some sort of 'flush'
> >>>command to the device, and the driver for your particular device may
> >>>well be losing a scsi_cmd_cache allocation when doing that, but I leave
> >>>that to someone who knows more about that code.
> >>
> >>I already checked up on that since I suspected barriers initially. The
> >>path there for scsi is sd.c:sd_issue_flush() which looks pretty straight
> >>forward. In the end it goes through the block layer and gets back to the
> >>SCSI layer as a regular REQ_BLOCK_PC request.
> >
> >
> >Sorry, that was for the ->issue_flush() that md also does but did before
> >the barrier addition as well. Most of the barrier handling is done in
> >the block layer, but it could show leaks in SCSI of course. FWIW, I
> >tested barriers with and without md on SCSI here a few days ago and
> >didn't see any leaks at all.
> >
>
> It does not have anything to do with this in scsi_io_completion does it?
>
> if (blk_complete_barrier_rq(q, req, good_bytes >> 9))
> return;
>
> For that case the scsi_cmnd does not get freed. Does it come back around
> again and get released from a different path?

Certainly smells fishy. Unfortunately I cannot take a look at this until
Monday :/

But adding some tracing there might be really interesting. Since we are
not seeing bio and/or req leaks, this does look very promising.

--
Jens Axboe

2006-01-27 19:21:05

by James Bottomley

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

On Fri, 2006-01-27 at 13:06 -0600, Mike Christie wrote:
> It does not have anything to do with this in scsi_io_completion does it?
>
> if (blk_complete_barrier_rq(q, req, good_bytes >> 9))
> return;
>
> For that case the scsi_cmnd does not get freed. Does it come back around
> again and get released from a different path?

It looks such a likely candidate, doesn't it. Unfortunately, Tejun Heo
removed that code around 6 Jan (in [BLOCK] update SCSI to use new
blk_ordered for barriers), so if it is that, then the latest kernels
should now not be leaking.

However, all the available evidence does seem to point to the write
barrier enforcement. I'll take another look over those code paths.

James


2006-01-27 19:29:04

by Jens Axboe

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

On Fri, Jan 27 2006, James Bottomley wrote:
> On Fri, 2006-01-27 at 13:06 -0600, Mike Christie wrote:
> > It does not have anything to do with this in scsi_io_completion does it?
> >
> > if (blk_complete_barrier_rq(q, req, good_bytes >> 9))
> > return;
> >
> > For that case the scsi_cmnd does not get freed. Does it come back around
> > again and get released from a different path?
>
> It looks such a likely candidate, doesn't it. Unfortunately, Tejun Heo
> removed that code around 6 Jan (in [BLOCK] update SCSI to use new
> blk_ordered for barriers), so if it is that, then the latest kernels
> should now not be leaking.

Ah I thought so, seems my memory wasn't totally shot (don't have the
sources with me).

> However, all the available evidence does seem to point to the write
> barrier enforcement. I'll take another look over those code paths.

The fact that it only happens with raid is very odd, though.

--
Jens Axboe

2006-01-27 19:46:25

by Mike Christie

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

James Bottomley wrote:
> On Fri, 2006-01-27 at 13:06 -0600, Mike Christie wrote:
>
>>It does not have anything to do with this in scsi_io_completion does it?
>>
>> if (blk_complete_barrier_rq(q, req, good_bytes >> 9))
>> return;
>>
>>For that case the scsi_cmnd does not get freed. Does it come back around
>>again and get released from a different path?
>
>
> It looks such a likely candidate, doesn't it. Unfortunately, Tejun Heo
> removed that code around 6 Jan (in [BLOCK] update SCSI to use new
> blk_ordered for barriers), so if it is that, then the latest kernels
> should now not be leaking.
>

Oh, I thought the reports were for 2.6.15 and below which has that
scsi_io_completion test. Have there been reports for this with
2.6.16-rc1 too?

2006-01-27 19:48:43

by Jens Axboe

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

On Fri, Jan 27 2006, Mike Christie wrote:
> James Bottomley wrote:
> >On Fri, 2006-01-27 at 13:06 -0600, Mike Christie wrote:
> >
> >>It does not have anything to do with this in scsi_io_completion does it?
> >>
> >> if (blk_complete_barrier_rq(q, req, good_bytes >> 9))
> >> return;
> >>
> >>For that case the scsi_cmnd does not get freed. Does it come back around
> >>again and get released from a different path?
> >
> >
> >It looks such a likely candidate, doesn't it. Unfortunately, Tejun Heo
> >removed that code around 6 Jan (in [BLOCK] update SCSI to use new
> >blk_ordered for barriers), so if it is that, then the latest kernels
> >should now not be leaking.
> >
>
> Oh, I thought the reports were for 2.6.15 and below which has that
> scsi_io_completion test. Have there been reports for this with
> 2.6.16-rc1 too?

The reports of leaks are only with > 2.6.15, not with 2.6.15.

--
Jens Axboe

2006-01-27 19:53:40

by Chase Venters

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

On Fri, 27 Jan 2006, Jens Axboe wrote:

> On Fri, Jan 27 2006, Mike Christie wrote:
>> James Bottomley wrote:
>>> On Fri, 2006-01-27 at 13:06 -0600, Mike Christie wrote:
>>>
>>>> It does not have anything to do with this in scsi_io_completion does it?
>>>>
>>>> if (blk_complete_barrier_rq(q, req, good_bytes >> 9))
>>>> return;
>>>>
>>>> For that case the scsi_cmnd does not get freed. Does it come back around
>>>> again and get released from a different path?
>>>
>>>
>>> It looks such a likely candidate, doesn't it. Unfortunately, Tejun Heo
>>> removed that code around 6 Jan (in [BLOCK] update SCSI to use new
>>> blk_ordered for barriers), so if it is that, then the latest kernels
>>> should now not be leaking.
>>>
>>
>> Oh, I thought the reports were for 2.6.15 and below which has that
>> scsi_io_completion test. Have there been reports for this with
>> 2.6.16-rc1 too?
>
> The reports of leaks are only with > 2.6.15, not with 2.6.15.
>

Correction... my leak is with 2.6.15. I discovered it originally in an
NVIDIA-tainted, sk98lin-patched 2.6.15, but my bisect was stock 2.6.15
(bad) to 2.6.14 (good) in Linus's tree, sans any tainting or
modifications.

I haven't actually tried building the latest Linus kernel from git. I'll
do a pull and give it a try when I get home.

Cheers,
Chase

2006-01-27 20:03:00

by Ariel

Subject: Re: More information on scsi_cmd_cache leak... (bisect)



On Fri, 27 Jan 2006, Chase Venters wrote:

> On Fri, 27 Jan 2006, Jens Axboe wrote:

>> The reports of leaks are only with > 2.6.15, not with 2.6.15.

> Correction... my leak is with 2.6.15.

Mine is also 2.6.15. Stock with debian patches.

In fact I believe ALL the reports are from 2.6.15.

-Ariel

2006-01-27 20:05:23

by Jens Axboe

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

On Fri, Jan 27 2006, Chase Venters wrote:
> On Fri, 27 Jan 2006, Jens Axboe wrote:
>
> >On Fri, Jan 27 2006, Mike Christie wrote:
> >>James Bottomley wrote:
> >>>On Fri, 2006-01-27 at 13:06 -0600, Mike Christie wrote:
> >>>
> >>>>It does not have anything to do with this in scsi_io_completion does it?
> >>>>
> >>>> if (blk_complete_barrier_rq(q, req, good_bytes >> 9))
> >>>> return;
> >>>>
> >>>>For that case the scsi_cmnd does not get freed. Does it come back around
> >>>>again and get released from a different path?
> >>>
> >>>
> >>>It looks such a likely candidate, doesn't it. Unfortunately, Tejun Heo
> >>>removed that code around 6 Jan (in [BLOCK] update SCSI to use new
> >>>blk_ordered for barriers), so if it is that, then the latest kernels
> >>>should now not be leaking.
> >>>
> >>
> >>Oh, I thought the reports were for 2.6.15 and below which has that
> >>scsi_io_completion test. Have there been reports for this with
> >>2.6.16-rc1 too?
> >
> >The reports of leaks are only with > 2.6.15, not with 2.6.15.
> >
>
> Correction... my leak is with 2.6.15. I discovered it originally in an
> NVIDIA-tainted, sk98lin-patched 2.6.15, but my bisect was stock 2.6.15
> (bad) to 2.6.14 (good) in Linus's tree, sans any tainting or
> modifications.
>
> I haven't actually tried building the latest Linus kernel from git. I'll
> do a pull and give it a try when I get home.

Ah, so the raid barrier stuff predates 2.6.15, I didn't think it did.
Can you try 2.6.16-rc1 at least? If this is the blk_complete_barrier()
leak, then it's not so interesting.

As a workaround, please try (in 2.6.15) to set ordered_flush to 0 in
the scsi host template for your sata driver.
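
If you are not sure where that lives for your driver, a quick grep over a
2.6.15 tree should point at whichever host template currently sets it to 1:

grep -rn ordered_flush drivers/scsi include/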

--
Jens Axboe

2006-01-27 20:05:51

by Jens Axboe

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

On Fri, Jan 27 2006, [email protected] wrote:
>
>
> On Fri, 27 Jan 2006, Chase Venters wrote:
>
> >On Fri, 27 Jan 2006, Jens Axboe wrote:
>
> >>The reports of leaks are only with > 2.6.15, not with 2.6.15.
>
> >Correction... my leak is with 2.6.15.
>
> Mine is also 2.6.15. Stock with debian patches.
>
> In fact I believe ALL the reports are from 2.6.15.

Hmm, so does it happen in 2.6.16-rc1 or not? And try the suggestion I
made in the other mail: edit the sata driver for your device and set
ordered_flush to 0 instead of 1.

--
Jens Axboe

2006-01-27 21:07:40

by NeilBrown

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

On Friday January 27, [email protected] wrote:
>
> On Fri, 27 Jan 2006, Chase Venters wrote:
>
> > After dealing with this leak for a while, I decided to do some dancing around
> > with git bisect. I've landed on a possible point of regression:
> >
> > commit: a9701a30470856408d08657eb1bd7ae29a146190
> > [PATCH] md: support BIO_RW_BARRIER for md/raid1
>
> I can confirm that it only leaks with raid!
>
> I rebooted with my raid5 root, read only, and it didn't leak. As soon as I
> remount,rw it started leaking. Go back to ro and it stopped (although it
> didn't clean up the old leaks). Tried my raid1 /boot and same thing - rw
> leaks, ro doesn't. But, it only leaks on activity.
>
> I then tried a regular lvm mount (with root ro), and no leaks!

It might be interesting, but probably not particularly helpful, to
create an ext3 filesystem on this device and mount it with the
barrier=1 mount option.
That should send BIO_RW_BARRIER requests to the device just like md
does.
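
i.e. something like this on a scratch partition (the device name is only an
example, and mkfs will of course destroy whatever is on it):

mkfs.ext3 /dev/sdX1
mount -t ext3 -o barrier=1 /dev/sdX1 /mnt
dd if=/dev/zero of=/mnt/testfile bs=1M count=100
sync
# then watch scsi_cmd_cache in /proc/slabinfo while the writes go on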

NeilBrown

2006-01-27 22:50:53

by Tim Morley

Subject: Re: More information on scsi_cmd_cache leak... (bisect)

Jens Axboe writes ("Re: More information on scsi_cmd_cache leak... (bisect)"):
> On Fri, Jan 27 2006, [email protected] wrote:
> > Mine is also 2.6.15. Stock with debian patches.
> >
> > In fact I believe ALL the reports are from 2.6.15.
>
> Hmm so does it happen in 2.6.16-rc1 or not? And try the suggestion I
> made in the other, edit the sata driver for your device and set
> ordered_flush to 0 instead of 1.

FYI I've had the problem with 2.6.15 and 2.6.15.1. I'm using raid1 and
raid5 on 4 sata drives with an ICH7 controller on an ASUS P5LD2-VM.

I've just got a 2.6.16-rc1 built and it seems that it fixes it! My
scsi_cmd_cache is sitting happily at 10 allocations and not moving.

Tim Morley