2016-04-14 00:29:54

by Valdis Klētnieks

Subject: linux-next crash during very early boot

I'm seeing my laptop crash/wedge up/something during very early
boot - before it can write anything to the console. Nothing in pstore,
need to hold down the power button for 6 seconds and reboot.

git bisect points at:

commit 7a6bacb133752beacb76775797fd550417e9d3a2
Author: Joonsoo Kim <[email protected]>
Date: Thu Apr 7 13:59:39 2016 +1000

mm/slab: factor out kmem_cache_node initialization code

It can be reused on other place, so factor out it. Following patch will
use it.


Not sure what the problem is - the logic *looks* ok at first read. The
patch *does* remove a spin_lock_irq() - but I find it difficult to
believe that with it gone, my laptop is able to hit the race condition
the spinlock protects against *every single boot*.

The only other thing I see is that n->free_limit used to be assigned
every time, and now it's only assigned at initial creation.


2016-04-14 01:34:49

by Joonsoo Kim

Subject: Re: linux-next crash during very early boot

On Wed, Apr 13, 2016 at 08:29:46PM -0400, Valdis Kletnieks wrote:
> I'm seeing my laptop crash/wedge up/something during very early
> boot - before it can write anything to the console. Nothing in pstore,
> need to hold down the power button for 6 seconds and reboot.
>
> git bisect points at:
>
> commit 7a6bacb133752beacb76775797fd550417e9d3a2
> Author: Joonsoo Kim <[email protected]>
> Date: Thu Apr 7 13:59:39 2016 +1000
>
> mm/slab: factor out kmem_cache_node initialization code
>
> It can be reused on other place, so factor out it. Following patch will
> use it.
>
>
> Not sure what the problem is - the logic *looks* ok at first read. The
> patch *does* remove a spin_lock_irq() - but I find it difficult to
> believe that with it gone, my laptop is able to hit the race condition
> the spinlock protects against *every single boot*.
>
> The only other thing I see is that n->free_limit used to be assigned
> every time, and now it's only assigned at initial creation.

Hello,

My fault. It should be assigned every time. Please test the patch below.
I will send it with a proper Signed-off-by after you confirm the problem disappears.
Thanks for the report and analysis!

Thanks.

---------------->8-----------------
diff --git a/mm/slab.c b/mm/slab.c
index 13e74aa..59dd94a 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -856,8 +856,14 @@ static int init_cache_node(struct kmem_cache *cachep, int node, gfp_t gfp)
 	 * node has not already allocated this
 	 */
 	n = get_node(cachep, node);
-	if (n)
+	if (n) {
+		spin_lock_irq(&n->list_lock);
+		n->free_limit = (1 + nr_cpus_node(node)) * cachep->batchcount +
+				cachep->num;
+		spin_unlock_irq(&n->list_lock);
+
 		return 0;
+	}
 
 	n = kmalloc_node(sizeof(struct kmem_cache_node), gfp, node);
 	if (!n)

2016-04-14 19:22:39

by Valdis Klētnieks

Subject: Re: linux-next crash during very early boot

On Thu, 14 Apr 2016 10:35:47 +0900, Joonsoo Kim said:

> My fault. It should be assigned every time. Please test the patch below.
> I will send it with a proper Signed-off-by after you confirm the problem disappears.
> Thanks for the report and analysis!

Still bombs out, sorry. Will do more debugging this evening if I have
a chance - will follow up tomorrow morning US time....


2016-04-15 01:25:23

by Joonsoo Kim

Subject: Re: linux-next crash during very early boot

2016-04-15 4:22 GMT+09:00 <[email protected]>:
> On Thu, 14 Apr 2016 10:35:47 +0900, Joonsoo Kim said:
>
>> My fault. It should be assigned every time. Please test the patch below.
>> I will send it with a proper Signed-off-by after you confirm the problem disappears.
>> Thanks for the report and analysis!
>
> Still bombs out, sorry. Will do more debugging this evening if I have
> a chance - will follow up tomorrow morning US time....

Hmm... could you also apply the patch at the link below?
There is another issue caused by my series, and the fix for it is there.

https://lkml.org/lkml/2016/4/10/703

Thanks.

2016-04-15 14:10:42

by Valdis Klētnieks

Subject: Re: linux-next crash during very early boot

On Thu, 14 Apr 2016 10:35:47 +0900, Joonsoo Kim said:
> On Wed, Apr 13, 2016 at 08:29:46PM -0400, Valdis Kletnieks wrote:
> > I'm seeing my laptop crash/wedge up/something during very early
> > boot - before it can write anything to the console. Nothing in pstore,
> > need to hold down the power button for 6 seconds and reboot.
> >
> > git bisect points at:
> >
> > commit 7a6bacb133752beacb76775797fd550417e9d3a2
> > Author: Joonsoo Kim <[email protected]>
> > Date: Thu Apr 7 13:59:39 2016 +1000
> >
> > mm/slab: factor out kmem_cache_node initialization code
> >
> > It can be reused on other place, so factor out it. Following patch will
> > use it.
> >
> >
> > Not sure what the problem is - the logic *looks* ok at first read. The
> > patch *does* remove a spin_lock_irq() - but I find it difficult to
> > believe that with it gone, my laptop is able to hit the race condition
> > the spinlock protects against *every single boot*.
> >
> > The only other thing I see is that n->free_limit used to be assigned
> > every time, and now it's only assigned at initial creation.
>
> Hello,
>
> My fault. It should be assigned every time. Please test the patch below.
> I will send it with a proper Signed-off-by after you confirm the problem disappears.
> Thanks for the report and analysis!

Following up - I verified that it was your patch series and not a bad bisect
by starting with a clean next-20160413 and reverting that series - and the
resulting kernel boots fine.

Will take a closer look at your fix patch and figure out what's still changed
afterwards - there's obviously some small semantic change that actually
matters, but we're not spotting it yet...


2016-04-20 08:10:41

by Joonsoo Kim

Subject: Re: linux-next crash during very early boot

On Fri, Apr 15, 2016 at 10:10:33AM -0400, [email protected] wrote:
> On Thu, 14 Apr 2016 10:35:47 +0900, Joonsoo Kim said:
> > On Wed, Apr 13, 2016 at 08:29:46PM -0400, Valdis Kletnieks wrote:
> > > I'm seeing my laptop crash/wedge up/something during very early
> > > boot - before it can write anything to the console. Nothing in pstore,
> > > need to hold down the power button for 6 seconds and reboot.
> > >
> > > git bisect points at:
> > >
> > > commit 7a6bacb133752beacb76775797fd550417e9d3a2
> > > Author: Joonsoo Kim <[email protected]>
> > > Date: Thu Apr 7 13:59:39 2016 +1000
> > >
> > > mm/slab: factor out kmem_cache_node initialization code
> > >
> > > It can be reused on other place, so factor out it. Following patch will
> > > use it.
> > >
> > >
> > > Not sure what the problem is - the logic *looks* ok at first read. The
> > > patch *does* remove a spin_lock_irq() - but I find it difficult to
> > > believe that with it gone, my laptop is able to hit the race condition
> > > the spinlock protects against *every single boot*.
> > >
> > > The only other thing I see is that n->free_limit used to be assigned
> > > every time, and now it's only assigned at initial creation.
> >
> > Hello,
> >
> > My fault. It should be assigned every time. Please test the patch below.
> > I will send it with a proper Signed-off-by after you confirm the problem disappears.
> > Thanks for the report and analysis!
>
> Following up - I verified that it was your patch series and not a bad bisect
> by starting with a clean next-20160413 and reverting that series - and the
> resulting kernel boots fine.
>
> Will take a closer look at your fix patch and figure out what's still changed
> afterwards - there's obviously some small semantic change that actually
> matters, but we're not spotting it yet...

Hello,

Did you try testing the patch at the following link on top of my fix for "mm/slab:
factor out kmem_cache_node initialization code"?

https://lkml.org/lkml/2016/4/10/703

I mentioned it in another thread, but you didn't reply to it, so I'm
curious.

Thanks.

2016-04-21 03:14:36

by Valdis Klētnieks

Subject: Re: linux-next crash during very early boot

On Fri, 15 Apr 2016 10:10:33 -0400, [email protected] said:
> On Thu, 14 Apr 2016 10:35:47 +0900, Joonsoo Kim said:
> > On Wed, Apr 13, 2016 at 08:29:46PM -0400, Valdis Kletnieks wrote:
> > > I'm seeing my laptop crash/wedge up/something during very early
> > > boot - before it can write anything to the console. Nothing in pstore,
> > > need to hold down the power button for 6 seconds and reboot.
> > >
> > > git bisect points at:
> > >
> > > commit 7a6bacb133752beacb76775797fd550417e9d3a2
> > > Author: Joonsoo Kim <[email protected]>
> > > Date: Thu Apr 7 13:59:39 2016 +1000
> > >
> > > mm/slab: factor out kmem_cache_node initialization code
> > >
> > > It can be reused on other place, so factor out it. Following patch will
> > > use it.
> > >
> > >
> > > Not sure what the problem is - the logic *looks* ok at first read. The
> > > patch *does* remove a spin_lock_irq() - but I find it difficult to
> > > believe that with it gone, my laptop is able to hit the race condition
> > > the spinlock protects against *every single boot*.
> > >
> > > The only other thing I see is that n->free_limit used to be assigned
> > > every time, and now it's only assigned at initial creation.
> >
> > Hello,
> >
> > My fault. It should be assigned every time. Please test the patch below.
> > I will send it with a proper Signed-off-by after you confirm the problem disappears.
> > Thanks for the report and analysis!
>
> Following up - I verified that it was your patch series and not a bad bisect
> by starting with a clean next-20160413 and reverting that series - and the
> resulting kernel boots fine.

Following up some more - next-20160420 seems to work just fine, even with
no sign in 'git log -- mm/slab.c' of the fix-patch....

I'm obviously having a very bad run of "things that go bump in the night" with
kernels lately - this makes 3 different "makes no sense" things I've posted
in the last 6 hours... :)

