2010-07-02 23:42:29

by Moffett, Kyle D

Subject: [P2020] "Processor 1 is stuck" (introduced by 8b27f0b61)

Hello,

I'm working on a new board port to a P2020-based system (e500v2) and I appear to be hitting a regression which causes the second core to fail to come up at boot with a "Processor 1 is stuck" message.

In the successful case (my board support patches on top of v2.6.32):
> smp_85xx_kick_cpu: kick CPU #1
> smp_85xx_kick_cpu: cpu-release-addr: 0x7ffff280
> smp_85xx_kick_cpu: got virt addr: 0xf1014280
> waited 1 msecs for CPU #1
> Processor 1 found.
> Brought up 2 CPUs

In the failing case (with board support patches on top of either 8b27f0b61 or v2.6.34):
> smp_85xx_kick_cpu: kick CPU #1
> smp_85xx_kick_cpu: cpu-release-addr: 0x7ffff280
> smp_85xx_kick_cpu: got virt addr: 0xf1014280
> waited 1 msecs for CPU #1
> [...5 second delay here...]
> Processor 1 is stuck.
> Brought up 1 CPUs

I believe I've bisected the bug to this commit:

> Commit: 8b27f0b61db57f5555fc2d3fc95c3ea9fd1a9d6c
> Author: Kumar Gala <[email protected]>
>
> powerpc/fsl-booke: Rework TLB CAM code
> * Bump'd # of CAM entries to 64 to support e500mc
> * Make the code handle MAS7 properly
> * Use pr_cont instead of creating a string as we go

If I revert 8b27f0b61 on top of v2.6.34 (I fixed the conflicts by deleting the extra hunks), both CPUs come up properly. My current board support files can be browsed via gitweb or cloned via smart-http or natively from here:
http://opensource.exmeritus.com/git/hww-1u-1a/linux.git
git://opensource.exmeritus.com/hww-1u-1a/linux.git

The "latest-v2.6.34" branch is based on v2.6.34 and does not work (Processor 1 is stuck); the "latest-v2.6.32" branch is based on v2.6.32.15 and works correctly.

Our U-Boot port can be browsed here:
http://opensource.exmeritus.com/git/hww-1u-1a/u-boot.git
git://opensource.exmeritus.com/hww-1u-1a/u-boot.git

Any help you can provide would be greatly appreciated.

Cheers,
Kyle Moffett


2010-07-03 00:20:18

by Moffett, Kyle D

Subject: Re: [P2020] "Processor 1 is stuck" (introduced by 8b27f0b61)

On Jul 02, 2010, at 19:30, Moffett, Kyle D wrote:
> I'm working on a new board port to a P2020-based system (e500v2) and I appear to be hitting a regression which causes the second core to fail to come up at boot with a "Processor 1 is stuck" message.
>
> [...]
>
> If I revert 8b27f0b61 on top of v2.6.34 (I fixed the conflicts by deleting the extra hunks), both CPUs come up properly. My current board support files can be browsed via gitweb or cloned via smart-http or natively from here:
> http://opensource.exmeritus.com/git/hww-1u-1a/linux.git
> git://opensource.exmeritus.com/hww-1u-1a/linux.git

With a little bit more debugging, I was able to strip the patch down to a partial revert (on top of v2.6.34) which seems to fix the bug here. I pushed that patch out to the "latest-v2.6.34" branch of linux.git as commit 1214341.

Cheers,
Kyle Moffett

2010-07-03 14:26:13

by Kyle Moffett

Subject: Re: [P2020] "Processor 1 is stuck" (introduced by 8b27f0b61)

On Fri, Jul 2, 2010 at 20:20, Moffett, Kyle D <[email protected]> wrote:
> On Jul 02, 2010, at 19:30, Moffett, Kyle D wrote:
>> I'm working on a new board port to a P2020-based system (e500v2) and I appear to be hitting a regression which causes the second core to fail to come up at boot with a "Processor 1 is stuck" message.
>>
>> [...]
>>
>> If I revert 8b27f0b61 on top of v2.6.34 (I fixed the conflicts by deleting the extra hunks), both CPUs come up properly.  My current board support files can be browsed via gitweb or cloned via smart-http or natively from here:
>>  http://opensource.exmeritus.com/git/hww-1u-1a/linux.git
>>  git://opensource.exmeritus.com/hww-1u-1a/linux.git
>
> With a little bit more debugging, I was able to strip the patch down to a partial revert (on top of v2.6.34) which seems to fix the bug here.  I pushed that patch out to the "latest-v2.6.34" branch of linux.git as commit 1214341.

Aha!

I just found Kumar's upstream commit
78f622377f7d31d988db350a43c5689dd5f31876 (which seems to have been
sent to the stable trees as well), which converts loadcam_entry() back
into an assembly function because the C version breaks when ftrace is
enabled. I believe my config has ftrace on, which would explain the
problem.
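
A quick way to confirm whether a given kernel .config has ftrace enabled is sketched below. It is only an illustration: it assumes the usual Kconfig symbol names (CONFIG_FTRACE, CONFIG_FUNCTION_TRACER, CONFIG_DYNAMIC_FTRACE), and the function name and sample config text are hypothetical, not taken from the board tree above.

```python
import re

# Kconfig symbols assumed to indicate that ftrace is built in
# (an assumption for this sketch, not taken from the thread).
FTRACE_SYMBOLS = ("CONFIG_FTRACE", "CONFIG_FUNCTION_TRACER", "CONFIG_DYNAMIC_FTRACE")


def enabled_ftrace_options(config_text):
    """Return the ftrace-related symbols that are set to 'y' in .config text."""
    enabled = []
    for line in config_text.splitlines():
        # Enabled options look like "CONFIG_FOO=y"; disabled ones appear as
        # "# CONFIG_FOO is not set" and are skipped by this pattern.
        match = re.match(r"^(CONFIG_\w+)=y$", line.strip())
        if match and match.group(1) in FTRACE_SYMBOLS:
            enabled.append(match.group(1))
    return enabled


if __name__ == "__main__":
    # Hypothetical sample; in practice read the text from the build's .config.
    sample = (
        "CONFIG_SMP=y\n"
        "CONFIG_FTRACE=y\n"
        "CONFIG_FUNCTION_TRACER=y\n"
        "# CONFIG_DYNAMIC_FTRACE is not set\n"
    )
    print(enabled_ftrace_options(sample))
```

If the returned list is non-empty for the failing build, that would be consistent with the ftrace explanation, though it does not prove it on its own.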

Cheers,
Kyle Moffett