2018-01-09 13:33:12

by Gaurav Kohli

Subject: Query: Crash is coming during /prod/PID/stat and do_exit of same task

Hi,

We are seeing a crash in do_task_stat() while it reads the user stack
pointer. The same task appears to have already completed its do_exit()
call, so this looks like a race between the two:

Below is the crash trace:
[49750.534377] Kernel BUG at ffffff8e7a4c53a8 [verbose debug info unavailable]
[49750.534394] task: ffffffe7b4475580 task.stack: ffffffe7a5f0c000
[49750.534400] PC is at do_task_stat+0x740/0x908
[49750.534402] LR is at do_task_stat+0xa4/0x908
[49750.534403] pc : [<ffffff8e7a4c53a8>] lr : [<ffffff8e7a4c4d0c>] pstate: 80400145
[49750.534404] sp : ffffffe7a5f0fbd0

and here is stack trace on that core:

-000|user_stack_pointer(inline)
-000|do_task_stat(
    |    m = 0xFFFFFFE7A5CD7380,
    |    ns = 0xFFFFFF8E7C43C748,
    |  ?,
    |    task = 0xFFFFFFE80D8C2280,
    |  ?)
    |  tty_pgrp = 0
    |  ppid = 2084696064
    |  sid = 0
    |  mm = 0xFFFFFFE7B4424140
    |  tcomm = (84, 9, 71, 122, 142, 255, 255, 255, 48, 253, 240, 165, 231, 255, 255, 255)
    |  flags = 18446743969119403392
-001|proc_tgid_stat(
    |    m = 0xFFFFFFE7A5CD7380,
    |  ?,

The task fields below show that the process had completed its do_exit() call:
struct task_struct.flags -x 0xFFFFFFE80D8C2280
  flags = 0x40870c

crash_64> struct task_struct.exit_code -x 0xFFFFFFE80D8C2280
  exit_code = 0x6

   struct task_struct.state -x 0xFFFFFFE80D8C2280
  state = 0x40

Both patches are present in our build:
fs/proc: report eip/esp in /prod/PID/stat for coredumping

Also, task.flags already has PF_DUMPCORE set, since the task received
SIGABRT.

Regards
Gaurav


-- Qualcomm India Private Limited, on behalf of Qualcomm Innovation
Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation
Collaborative Project.


2018-01-10 05:20:22

by Alexey Dobriyan

Subject: Re: Query: Crash is coming during /prod/PID/stat and do_exit of same task

> We are seeing crash in do_task_stat while accessing stack pointer, It
> seems same task has already completed do_exit call.
> So it seems a race between them:

Please, post exact kernel version and struct task_struct::usage if you
still have that kernel core (or even full task_struct)

2018-01-15 10:04:51

by Gaurav Kohli

Subject: Re: Query: Crash is coming during /prod/PID/stat and do_exit of same task

Hi John, Ingo,

Since we are still seeing this race between do_task_stat() and do_exit()
of the same task, could we add a stricter check to the code below, for
the case where the stack pointer is NULL?

                if (permitted && (task->flags & PF_DUMPCORE)) {
                        eip = KSTK_EIP(task);
                        esp = KSTK_ESP(task);
                }

Regards
Gaurav



2018-01-15 11:02:28

by John Ogness

Subject: Re: Query: Crash is coming during /prod/PID/stat and do_exit of same task

Hello Gaurav.

On 2018-01-09, Kohli, Gaurav wrote:
> We are seeing crash in do_task_stat while accessing stack pointer, It
> seems same task has already completed do_exit call.
> So it seems a race between them:
>
> Below are task stats which shows , process completed the do_exit call:
> struct task_struct.flags -x 0xFFFFFFE80D8C2280
>   flags = 0x40870c
>
> crash_64> struct task_struct.exit_code -x 0xFFFFFFE80D8C2280
>   exit_code = 0x6
>
>    struct task_struct.state -x 0xFFFFFFE80D8C2280
>   state = 0x40

I am confused why this task is in the TASK_PARKED state. What kind of
task is this?

> In our build both patches are there ,
> fs/proc: report eip/esp in /prod/PID/stat for coredumping
>
> and also  task.state has already set PF_DUMPCORE as it got the sigabrt
> signal.

John Ogness

2018-01-15 12:30:21

by Gaurav Kohli

Subject: Re: Query: Crash is coming during /prod/PID/stat and do_exit of same task

On 1/15/2018 4:32 PM, John Ogness wrote:

> Hello Gaurav.
>
> On 2018-01-09, Kohli, Gaurav wrote:
>> We are seeing crash in do_task_stat while accessing stack pointer, It
>> seems same task has already completed do_exit call.
>> So it seems a race between them:
>>
>> Below are task stats which shows , process completed the do_exit call:
>> struct task_struct.flags -x 0xFFFFFFE80D8C2280
>>   flags = 0x40870c
>>
>> crash_64> struct task_struct.exit_code -x 0xFFFFFFE80D8C2280
>>   exit_code = 0x6
>>
>>    struct task_struct.state -x 0xFFFFFFE80D8C2280
>>   state = 0x40
> I am confused why this task is in the TASK_PARKED state. What kind of
> task is this?

Hi John,

This is an Android HAL-layer service. Just before the bug I also see many
services exiting in the logs, although nothing for this pid, 6807:

.452202:   <2> init: starting service 'limits-hal-1-0'...
 49749.460039:   <2> init: property_set("ro.boottime.limits-hal-1-0", "61591320967789") failed: property already set
 49749.607496:   <6> sh (2422): drop_caches: 3
 49750.281635:   <6> sh (2422): drop_caches: 3
 49750.533853:   <2> init: Untracked pid 6811 exited with status 0

Why it is parked is not clear, since the task's state has already been
updated.

Regards

Gaurav


2018-01-16 05:36:53

by Gaurav Kohli

Subject: Re: Query: Crash is coming during /prod/PID/stat and do_exit of same task

On 1/10/2018 10:50 AM, Alexey Dobriyan wrote:

>> We are seeing crash in do_task_stat while accessing stack pointer, It
>> seems same task has already completed do_exit call.
>> So it seems a race between them:
> Please, post exact kernel version and struct task_struct::usage if you
> still have that kernel core (or even full task_struct)

Hi Alexey,

We are working on 4.9.65. Please find the usage value and the other
task_struct values below, and let me know if any other data is required.

crash_64> struct task_struct.usage -x  0xFFFFFFE80D8C2280
  usage = {
    counter = 0x4
  }

struct task_struct.flags -x 0xFFFFFFE80D8C2280
  flags = 0x40870c

crash_64> struct task_struct.exit_code -x 0xFFFFFFE80D8C2280
  exit_code = 0x6

struct task_struct.state -x 0xFFFFFFE80D8C2280
  state = 0x40
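
For readers decoding the values above by hand, here is a minimal userspace
sketch. It assumes the PF_* and TASK_DEAD values as they appear in
include/linux/sched.h of a 4.9-era kernel; the numbering of task states
changed in later kernels, where 0x40 may instead read as TASK_PARKED, so
please check the constants against the exact tree in use. The helper names
decode_pf_flags() and state_is_dead() are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed values, as in include/linux/sched.h of a 4.9-era kernel. */
#define PF_EXITING     0x00000004UL  /* getting shut down */
#define PF_EXITPIDONE  0x00000008UL  /* pi exit done on shut down */
#define PF_DUMPCORE    0x00000200UL  /* dumped core */
#define PF_SIGNALED    0x00000400UL  /* killed by a signal */
#define TASK_DEAD      64L           /* assumed 4.9 value of the state bit */

struct flag_name {
	unsigned long bit;
	const char *name;
};

static const struct flag_name pf_names[] = {
	{ PF_EXITING,    "PF_EXITING" },
	{ PF_EXITPIDONE, "PF_EXITPIDONE" },
	{ PF_DUMPCORE,   "PF_DUMPCORE" },
	{ PF_SIGNALED,   "PF_SIGNALED" },
};

/*
 * Hypothetical helper: write the names of the known PF_* bits set in
 * 'flags' into 'buf', separated by '|'. Returns how many known bits
 * were set. Bits not listed in pf_names are ignored.
 */
static int decode_pf_flags(unsigned long flags, char *buf, size_t len)
{
	int n = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(pf_names) / sizeof(pf_names[0]); i++) {
		if (!(flags & pf_names[i].bit))
			continue;
		if (n++)
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, pf_names[i].name, len - strlen(buf) - 1);
	}
	return n;
}

/* Hypothetical helper: true if the assumed TASK_DEAD bit is set. */
static int state_is_dead(long state)
{
	return (state & TASK_DEAD) != 0;
}
```

Under these assumptions, flags = 0x40870c has PF_EXITING, PF_EXITPIDONE,
PF_DUMPCORE and PF_SIGNALED set, and state = 0x40 has the TASK_DEAD bit
set, which would fit a task that has finished do_exit() after SIGABRT
(exit_code = 0x6).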


Please find below crash stack:

-000|user_stack_pointer(inline)
-000|do_task_stat(
    |    m = 0xFFFFFFE7A5CD7380,
    |    ns = 0xFFFFFF8E7C43C748,
    |  ?,
    |    task = 0xFFFFFFE80D8C2280,
    |  ?)
    |  tty_pgrp = 0
    |  ppid = 2084696064
    |  sid = 0
    |  mm = 0xFFFFFFE7B4424140
    |  tcomm = (84, 9, 71, 122, 142, 255, 255, 255, 48, 253, 240, 165, 231, 255, 255, 255)
    |  flags = 18446743969119403392
-001|proc_tgid_stat(
    |    m = 0xFFFFFFE7A5CD7380,
    |  ?,
    |  ?,
    |  ?)
-002|atomic_sub_return(inline)

Regards
Gaurav


2018-01-16 07:20:14

by Alexey Dobriyan

Subject: Re: Query: Crash is coming during /prod/PID/stat and do_exit of same task

On Tue, Jan 16, 2018 at 11:06:47AM +0530, Kohli, Gaurav wrote:
> On 1/10/2018 10:50 AM, Alexey Dobriyan wrote:
>
> >> We are seeing crash in do_task_stat while accessing stack pointer, It
> >> seems same task has already completed do_exit call.
> >> So it seems a race between them:
> > Please, post exact kernel version and struct task_struct::usage if you
> > still have that kernel core (or even full task_struct)
>
> Hi Alexey,
>
> We are working on 4.9.65 and Please find below usage value and other task_struct value,
> please let me know if some other data required as well.

Kernel stacks live their own lives nowadays, the code needs try_get_task_stack().

2018-01-16 09:45:05

by Gaurav Kohli

Subject: Re: Query: Crash is coming during /prod/PID/stat and do_exit of same task

On 1/16/2018 12:50 PM, Alexey Dobriyan wrote:

> On Tue, Jan 16, 2018 at 11:06:47AM +0530, Kohli, Gaurav wrote:
>> On 1/10/2018 10:50 AM, Alexey Dobriyan wrote:
>>
>>>> We are seeing crash in do_task_stat while accessing stack pointer, It
>>>> seems same task has already completed do_exit call.
>>>> So it seems a race between them:
>>> Please, post exact kernel version and struct task_struct::usage if you
>>> still have that kernel core (or even full task_struct)
>> Hi Alexey,
>>
>> We are working on 4.9.65 and Please find below usage value and other task_struct value,
>> please let me know if some other data required as well.
> Kernel stacks live their own lives nowadays, the code needs try_get_task_stack().
>
Hi Alexey,

Yes, agreed. We need a check like the one below:

                if (permitted && (task->flags & PF_DUMPCORE)) {
                        /* pin the stack while reading it, then drop the reference */
                        if (try_get_task_stack(task)) {
                                eip = KSTK_EIP(task);
                                esp = KSTK_ESP(task);
                                put_task_stack(task);
                        }
                }

Alternatively, could we check whether the task is on its exit path by
testing a flag such as PF_EXITING?
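
To illustrate why a reference is being suggested rather than only a flag
test, here is a minimal userspace model of the get/put pattern. It is a
sketch, not kernel code: struct model_task and the model_* helpers are
invented for this example and only imitate the idea behind
try_get_task_stack() and put_task_stack(), namely that a reader takes a
reference only if the count has not already dropped to zero, and drops it
when done. A flag test on its own is not atomic with the later stack
access, so the stack could presumably still go away between the test and
the read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task whose stack has its own lifetime. */
struct model_task {
	atomic_int stack_refcount;  /* 0 means the stack is already gone */
	unsigned long saved_sp;     /* stands in for data kept on the stack */
};

/* Take a reference only if the count has not already reached zero. */
static bool model_try_get_stack(struct model_task *t)
{
	int old = atomic_load(&t->stack_refcount);

	while (old != 0) {
		/* On failure 'old' is reloaded and the zero check repeats. */
		if (atomic_compare_exchange_weak(&t->stack_refcount,
						 &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference taken by model_try_get_stack(). */
static void model_put_stack(struct model_task *t)
{
	int old = atomic_fetch_sub(&t->stack_refcount, 1);

	assert(old > 0);
}

/*
 * Reader side: return the saved stack pointer, or 0 when the stack is
 * already gone, without ever touching it unpinned.
 */
static unsigned long model_read_sp(struct model_task *t)
{
	unsigned long sp = 0;

	if (model_try_get_stack(t)) {
		sp = t->saved_sp;
		model_put_stack(t);
	}
	return sp;
}
```

In this model a reader that finds the count at zero simply reports 0,
which would match leaving eip and esp at their zero defaults in the
snippet discussed above.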

Regards

Gaurav
