2015-07-16 05:57:19

by Zumeng Chen

Subject: BUG: perf error on syscalls for powerpc64.

Hi All,

Commit 1028ccf5 changed the declaration of sys_call_table from a pointer to
an array of unsigned long. I don't think that is correct, and here is why:

sys_call_table is defined as a label in assembler, so in C it should be
declared as a pointer to the array rather than as the array itself, which is
what 1028ccf5 does. If we declare it as an array, arch_syscall_addr() returns
the address of sys_call_table[], whereas what it actually needs is the content
of sys_call_table[]. As a result, 'perf list' ignores all syscalls, because
find_syscall_meta() returns NULL in init_ftrace_syscalls() due to the wrong
value from arch_syscall_addr().

Did I miss something, or has GCC started doing something new here?

Cheers,
Zumeng


2015-07-16 09:04:34

by Michael Ellerman

Subject: Re: BUG: perf error on syscalls for powerpc64.

On Thu, 2015-07-16 at 13:57 +0800, Zumeng Chen wrote:
> Hi All,
>
> 1028ccf5 did a change for sys_call_table from a pointer to an array of
> unsigned long, I think it's not proper, here is my reason:
>
> sys_call_table defined as a label in assembler should be pointer array
> rather than an array as described in 1028ccf5. If we defined it as an
> array, then arch_syscall_addr will return the address of sys_call_table[],
> actually the content of sys_call_table[] is demanded by arch_syscall_addr.
> so 'perf list' will ignore all syscalls since find_syscall_meta will
> return null
> in init_ftrace_syscalls because of the wrong arch_syscall_addr.
>
> Did I miss something, or Gcc compiler has done something newer ?

Hi Zumeng,

It works for me with the code as it is in mainline.

I don't quite follow your explanation, so if you're seeing a bug please send
some information about what you're actually seeing. And include the disassembly
of arch_syscall_addr() and your compiler version etc.

cheers

2015-07-17 01:27:13

by Zumeng Chen

Subject: Re: BUG: perf error on syscalls for powerpc64.

On 2015-07-16 17:04, Michael Ellerman wrote:
> On Thu, 2015-07-16 at 13:57 +0800, Zumeng Chen wrote:
>> Hi All,
>>
>> 1028ccf5 did a change for sys_call_table from a pointer to an array of
>> unsigned long, I think it's not proper, here is my reason:
>>
>> sys_call_table defined as a label in assembler should be pointer array
>> rather than an array as described in 1028ccf5. If we defined it as an
>> array, then arch_syscall_addr will return the address of sys_call_table[],
>> actually the content of sys_call_table[] is demanded by arch_syscall_addr.
>> so 'perf list' will ignore all syscalls since find_syscall_meta will
>> return null
>> in init_ftrace_syscalls because of the wrong arch_syscall_addr.
>>
>> Did I miss something, or Gcc compiler has done something newer ?
> Hi Zumeng,
>
> It works for me with the code as it is in mainline.
>
> I don't quite follow your explanation, so if you're seeing a bug please send
> some information about what you're actually seeing. And include the disassembly
> of arch_syscall_addr() and your compiler version etc.

Hi Michael,

Yeah, that was not a good explanation. I'll try to explain in more detail this time:

1. However we declare sys_call_table at the C level, at the assembly level it
is actually a pointer to sys_call_table rather than sys_call_table itself.

arch/powerpc/kernel/systbl.S
47 .globl sys_call_table <--- see here
48 sys_call_table:

So if you want to declare sys_call_table as an array, then I think it's very
clear what we'll get when we do sys_call_table[i].

2. Differences in the disassembly of arch_syscall_addr() with and without 1028ccf5
================================================

*) With 1028ccf5
---------------------
Dump of assembler code for function arch_syscall_addr:
522 {
523 return (unsigned long)sys_call_table[nr];
0xc000000000df53d4 <+0>: addis r10,r2,-13
0xc000000000df53d8 <+4>: addi r9,r10,3488
0xc000000000df53dc <+8>: rldicr r3,r3,3,60
524 }
0xc000000000df53e0 <+12>: ldx r3,r9,r3
0xc000000000df53e4 <+16>: blr


*) Without 1028ccf5
---------------------------
Dump of assembler code for function arch_syscall_addr:
522 {
523 return (unsigned long)sys_call_table[nr];
0xc000000000df53d0 <+0>: addis r10,r2,-13
0xc000000000df53d4 <+4>: addi r9,r10,3488
0xc000000000df53d8 <+8>: rldicr r3,r3,3,60
0xc000000000df53dc <+12>: ld r9,0(r9) <------ only this is different
524 }
0xc000000000df53e0 <+16>: ldx r3,r9,r3
0xc000000000df53e4 <+20>: blr
End of assembler dump.

3. What I have seen in the 3.14.x kernel
======================
As far as I can tell, this part is no different in the 4.x kernels.

*) With 1028ccf5

'perf list | grep -i syscall' prints nothing.


*) Without 1028ccf5
root@localhost:~# perf list|grep -i syscall
syscalls:sys_enter_socket [Tracepoint event]
syscalls:sys_exit_socket [Tracepoint event]
syscalls:sys_enter_socketpair [Tracepoint event]
syscalls:sys_exit_socketpair [Tracepoint event]
syscalls:sys_enter_bind [Tracepoint event]
syscalls:sys_exit_bind [Tracepoint event]
syscalls:sys_enter_listen [Tracepoint event]
syscalls:sys_exit_listen [Tracepoint event]
... ...

Cheers,
Zumeng

>
> cheers
>
>

2015-07-17 01:52:19

by Sukadev Bhattiprolu

Subject: Re: BUG: perf error on syscalls for powerpc64.

Zumeng Chen [[email protected]] wrote:
| 3. What I have seen in 3.14.x kernel,
| ======================
| And so far, no more difference to 4.x kernel from me about this part if
| I'm right.
|
| *) With 1028ccf5
|
| perf list|grep -i syscall got me nothing.
|
|
| *) Without 1028ccf5
| root@localhost:~# perf list|grep -i syscall
| syscalls:sys_enter_socket [Tracepoint event]
| syscalls:sys_exit_socket [Tracepoint event]
| syscalls:sys_enter_socketpair [Tracepoint event]
| syscalls:sys_exit_socketpair [Tracepoint event]
| syscalls:sys_enter_bind [Tracepoint event]
| syscalls:sys_exit_bind [Tracepoint event]
| syscalls:sys_enter_listen [Tracepoint event]
| syscalls:sys_exit_listen [Tracepoint event]
| ... ...

Are you seeing this on a big-endian or a little-endian system?

IIRC, I saw the opposite behavior on an LE system a few months ago,
i.e. without 1028ccf5, 'perf list | grep syscall' failed.

Applying 1028ccf5 seemed to fix it.

Sukadev

2015-07-17 02:00:49

by Ian Munsie

Subject: Re: BUG: perf error on syscalls for powerpc64.

Excerpts from Sukadev Bhattiprolu's message of 2015-07-17 11:51:04 +1000:
> Are you seeing this on big-endian or little-endian system?
>
> IIRC, I saw the opposite behavior on an LE system a few months ago.
> i.e. without 1028ccf5, 'perf list|grep syscall' failed.
>
> Applying 1028ccf5, seemed to fix it.

You could be on to something there - IIRC the ABI was changed for LE to
remove the dot symbols. Might be worth testing on both.

Cheers,
-Ian

2015-07-17 04:07:26

by Michael Ellerman

Subject: Re: BUG: perf error on syscalls for powerpc64.

On Fri, 2015-07-17 at 09:27 +0800, Zumeng Chen wrote:
> On 2015-07-16 17:04, Michael Ellerman wrote:
> > On Thu, 2015-07-16 at 13:57 +0800, Zumeng Chen wrote:
> >> Hi All,
> >>
> >> 1028ccf5 did a change for sys_call_table from a pointer to an array of
> >> unsigned long, I think it's not proper, here is my reason:
> >>
> >> sys_call_table defined as a label in assembler should be pointer array
> >> rather than an array as described in 1028ccf5. If we defined it as an
> >> array, then arch_syscall_addr will return the address of sys_call_table[],
> >> actually the content of sys_call_table[] is demanded by arch_syscall_addr.
> >> so 'perf list' will ignore all syscalls since find_syscall_meta will
> >> return null
> >> in init_ftrace_syscalls because of the wrong arch_syscall_addr.
> >>
> >> Did I miss something, or Gcc compiler has done something newer ?
> > Hi Zumeng,
> >
> > It works for me with the code as it is in mainline.
> >
> > I don't quite follow your explanation, so if you're seeing a bug please send
> > some information about what you're actually seeing. And include the disassembly
> > of arch_syscall_addr() and your compiler version etc.
>
> Hi Michael,

Hi Zumeng,

> Yeah, it seems it was not a good explanation, I'll explain more this time:
>
> 1. Whatever we exclaim sys_call_table in C level, actually it is a pointer
> to sys_call_table rather than sys_call_table self in assemble level.

No it's not a pointer.

A pointer is a location in memory that contains the address of another location
in memory.

> arch/powerpc/kernel/systbl.S
> 47 .globl sys_call_table <--- see here
> 48 sys_call_table:

Which gives us a .o that looks like:

0000000000000000 <sys_call_table>:
0: R_PPC64_ADDR64 sys_restart_syscall
8: R_PPC64_ADDR64 sys_restart_syscall
10: R_PPC64_ADDR64 sys_exit
18: R_PPC64_ADDR64 sys_exit

ie. at the location in memory called sys_call_table we have *the contents of
the syscall table*.

We do not have *the address* of the syscall table.
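To make the distinction concrete, here is a minimal sketch in C. The names demo_table and demo_table_ptr and the entry values are made up for illustration and are not kernel symbols. An array symbol names the storage that holds the entries, while a pointer symbol names a separate 8-byte location that holds the address of that storage, so indexing through it conceptually needs one extra load.

```c
#include <assert.h>

#define DEMO_NR 4

/* Hypothetical stand-in for the table emitted by systbl.S: the label marks
 * the first entry, so the symbol's address is the table itself. */
static const unsigned long demo_table[DEMO_NR] = {
	0x1000, 0x1000, 0x2000, 0x2000
};

/* Hypothetical stand-in for a pointer variable such as SYS_CALL_TABLE:
 * a single word holding the address of the table. */
static const unsigned long *const demo_table_ptr = demo_table;

/* Array access: index directly from the symbol's address. */
unsigned long lookup_via_array(unsigned int nr)
{
	assert(nr < DEMO_NR);
	return demo_table[nr];
}

/* Pointer access: read the pointer first, then index through it. */
unsigned long lookup_via_pointer(unsigned int nr)
{
	assert(nr < DEMO_NR);
	return demo_table_ptr[nr];
}
```

Both functions return the same entry only because the declaration matches what is really stored at each symbol. Declaring an array symbol as a pointer (or the reverse) makes the compiler emit the wrong number of loads.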

You can also see in the System.map:

c000000000bb0798 R sys_call_table
c000000000bb1e58 r cache_type_info

ie. sys_call_table occupies 5824 bytes. If it was a pointer it would only
occupy 8 bytes.

Compare to SYS_CALL_TABLE, which *is* a pointer.

c000000001172bf8 d SYS_CALL_TABLE
c000000001172c00 d exception_marker

Note, 8 bytes.
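The two sizes follow from subtracting adjacent System.map addresses. A small sketch of that arithmetic is below. The helper names symbol_span and span_in_slots are hypothetical, and the result is only an upper bound on a symbol's size, since it assumes the next symbol in the sorted map follows without padding.

```c
#include <assert.h>
#include <stdint.h>

/* Distance from a symbol to the next symbol in a sorted System.map.
 * Assumption: nothing but this symbol lives in between. */
uint64_t symbol_span(uint64_t addr, uint64_t next_addr)
{
	assert(next_addr >= addr);
	return next_addr - addr;
}

/* Number of 8-byte (doubleword) slots that fit in a span. */
uint64_t span_in_slots(uint64_t span)
{
	return span / 8;
}
```

With the addresses quoted above, symbol_span(0xc000000000bb0798, 0xc000000000bb1e58) gives 5824 bytes (728 doubleword slots) for sys_call_table, and symbol_span(0xc000000001172bf8, 0xc000000001172c00) gives 8 bytes for SYS_CALL_TABLE.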


Finally if you look at a running system using xmon:

0:mon> d $sys_call_table
c0000000008f0798 c0000000000a85a0 c0000000000a85a0 |................|
c0000000008f07a8 c000000000099b40 c000000000099b40 |.......@.......@|

0:mon> la c0000000000a85a0
c0000000000a85a0: .sys_restart_syscall+0x0/0x40
0:mon> la c000000000099b40
c000000000099b40: .SyS_exit+0x0/0x20

0:mon> d $SYS_CALL_TABLE
c000000000ec68f8 c0000000008f0798 7265677368657265 |........regshere|
^
this is the address of sys_call_table


As another example, see hcall_real_table, which is basically identical, and is
also declared as an array in C.


> 3. What I have seen in 3.14.x kernel,
> ======================
> And so far, no more difference to 4.x kernel from me about this part if
> I'm right.
>
> *) With 1028ccf5
>
> perf list|grep -i syscall got me nothing.
>
>
> *) Without 1028ccf5
> root@localhost:~# perf list|grep -i syscall
> syscalls:sys_enter_socket [Tracepoint event]
> syscalls:sys_exit_socket [Tracepoint event]
> syscalls:sys_enter_socketpair [Tracepoint event]
> syscalls:sys_exit_socketpair [Tracepoint event]
> syscalls:sys_enter_bind [Tracepoint event]
> syscalls:sys_exit_bind [Tracepoint event]
> syscalls:sys_enter_listen [Tracepoint event]
> syscalls:sys_exit_listen [Tracepoint event]
> ... ...

I don't know why that's happening.

Please just test 4.2-rc2 for now, so that there are not too many variables.

Assuming you have CONFIG_FTRACE_SYSCALLS=y, you can see the tracepoints in
debugfs with:

$ ls -la /sys/kernel/debug/tracing/events/syscalls
total 0
drwxr-xr-x 596 root root 0 Jul 17 13:11 .
drwxr-xr-x 45 root root 0 Jul 17 13:11 ..
-rw-r--r-- 1 root root 0 Jul 17 13:33 enable
-rw-r--r-- 1 root root 0 Jul 17 13:11 filter
drwxr-xr-x 2 root root 0 Jul 17 13:11 sys_enter_accept
drwxr-xr-x 2 root root 0 Jul 17 13:11 sys_enter_accept4
drwxr-xr-x 2 root root 0 Jul 17 13:11 sys_enter_access
drwxr-xr-x 2 root root 0 Jul 17 13:11 sys_enter_add_key
...


cheers


2015-07-17 05:31:16

by Zumeng Chen

Subject: Re: BUG: perf error on syscalls for powerpc64.

On 2015-07-17 12:07, Michael Ellerman wrote:
> On Fri, 2015-07-17 at 09:27 +0800, Zumeng Chen wrote:
>> On 2015-07-16 17:04, Michael Ellerman wrote:
>>> On Thu, 2015-07-16 at 13:57 +0800, Zumeng Chen wrote:
>>>> Hi All,
>>>>
>>>> 1028ccf5 did a change for sys_call_table from a pointer to an array of
>>>> unsigned long, I think it's not proper, here is my reason:
>>>>
>>>> sys_call_table defined as a label in assembler should be pointer array
>>>> rather than an array as described in 1028ccf5. If we defined it as an
>>>> array, then arch_syscall_addr will return the address of sys_call_table[],
>>>> actually the content of sys_call_table[] is demanded by arch_syscall_addr.
>>>> so 'perf list' will ignore all syscalls since find_syscall_meta will
>>>> return null
>>>> in init_ftrace_syscalls because of the wrong arch_syscall_addr.
>>>>
>>>> Did I miss something, or Gcc compiler has done something newer ?
>>> Hi Zumeng,
>>>
>>> It works for me with the code as it is in mainline.
>>>
>>> I don't quite follow your explanation, so if you're seeing a bug please send
>>> some information about what you're actually seeing. And include the disassembly
>>> of arch_syscall_addr() and your compiler version etc.
>> Hi Michael,
> Hi Zumeng,
>
>> Yeah, it seems it was not a good explanation, I'll explain more this time:
>>
>> 1. Whatever we exclaim sys_call_table in C level, actually it is a pointer
>> to sys_call_table rather than sys_call_table self in assemble level.
> No it's not a pointer.

Then what is the second one in the following:

zchen@pek-yocto-build2:$ cat System.map |grep sys_call_table
c000000000009590 T .sys_call_table <----- this is the real sys_call_table
c0000000014e1b48 D sys_call_table <----- this is what arch_syscall_addr should refer to

The c0000000014e1b48[0] = c000000000009590

>
> A pointer is a location in memory that contains the address of another location
> in memory.

Yeah, this definition is right.

>
>> arch/powerpc/kernel/systbl.S
>> 47 .globl sys_call_table <--- see here
>> 48 sys_call_table:
> Which gives us a .o that looks like:
>
> 0000000000000000 <sys_call_table>:
> 0: R_PPC64_ADDR64 sys_restart_syscall
> 8: R_PPC64_ADDR64 sys_restart_syscall
> 10: R_PPC64_ADDR64 sys_exit
> 18: R_PPC64_ADDR64 sys_exit
>
> ie. at the location in memory called sys_call_table we have *the contents of
> the syscall table*.
>
> We do not have *the address* of the syscall table.
>
> You can also see in the System.map:
>
> c000000000bb0798 R sys_call_table
> c000000000bb1e58 r cache_type_info

Please refer to `cat System.map` above

>
> ie. sys_call_table occupies 5824 bytes. If it was a pointer it would only
> occupy 8 bytes.
>
> Compare to SYS_CALL_TABLE, which *is* a pointer.
>
> c000000001172bf8 d SYS_CALL_TABLE
> c000000001172c00 d exception_marker
>
> Note, 8 bytes.
>
>
> Finally if you look at a running system using xmon:
>
> 0:mon> d $sys_call_table
> c0000000008f0798 c0000000000a85a0 c0000000000a85a0 |................|
> c0000000008f07a8 c000000000099b40 c000000000099b40 |.......@.......@|

That is the right sys_call_table, but it is not what I'm talking about. What I'm
talking about is that the declaration of sys_call_table introduced by that commit
leads to the following result:

sys_call_table[0]= 0xc0000000014e1b48[0] = c000000000009590 <---- only this one is correct: it is the start address of sys_call_table
sys_call_table[1]= 0xc0000000014e1b48[1] = c0000000015b0da8
sys_call_table[2]= 0xc0000000014e1b48[2] = 0
sys_call_table[3]= 0xc0000000014e1b48[3] = c000000000de0984
sys_call_table[4]= 0xc0000000014e1b48[4] = c0000000015b0da8
sys_call_table[5]= 0xc0000000014e1b48[5] = 0

This is definitely not what we want, is that right?


>
> 0:mon> la c0000000000a85a0
> c0000000000a85a0: .sys_restart_syscall+0x0/0x40
> 0:mon> la c000000000099b40
> c000000000099b40: .SyS_exit+0x0/0x20
>
> 0:mon> d $SYS_CALL_TABLE
> c000000000ec68f8 c0000000008f0798 7265677368657265 |........regshere|
> ^
> this is the address of sys_call_table
>
>
> As another example, see hcall_real_table, which is basically identical, and is
> also declared as an array in C.
>
>
>> 3. What I have seen in 3.14.x kernel,
>> ======================
>> And so far, no more difference to 4.x kernel from me about this part if
>> I'm right.
>>
>> *) With 1028ccf5
>>
>> perf list|grep -i syscall got me nothing.
>>
>>
>> *) Without 1028ccf5
>> root@localhost:~# perf list|grep -i syscall
>> syscalls:sys_enter_socket [Tracepoint event]
>> syscalls:sys_exit_socket [Tracepoint event]
>> syscalls:sys_enter_socketpair [Tracepoint event]
>> syscalls:sys_exit_socketpair [Tracepoint event]
>> syscalls:sys_enter_bind [Tracepoint event]
>> syscalls:sys_exit_bind [Tracepoint event]
>> syscalls:sys_enter_listen [Tracepoint event]
>> syscalls:sys_exit_listen [Tracepoint event]
>> ... ...
> I don't know why that's happening.
>
> Please just test 4.2-rc2 for now, so that there are not too many variables.

Yeah, maybe right.

>
> Assuming you have CONFIG_FTRACE_SYSCALLS=y, you can see the tracepoints in

Absolutely

Cheers,
Zumeng

> debugfs with:
>
> $ ls -la /sys/kernel/debug/tracing/events/syscalls
> total 0
> drwxr-xr-x 596 root root 0 Jul 17 13:11 .
> drwxr-xr-x 45 root root 0 Jul 17 13:11 ..
> -rw-r--r-- 1 root root 0 Jul 17 13:33 enable
> -rw-r--r-- 1 root root 0 Jul 17 13:11 filter
> drwxr-xr-x 2 root root 0 Jul 17 13:11 sys_enter_accept
> drwxr-xr-x 2 root root 0 Jul 17 13:11 sys_enter_accept4
> drwxr-xr-x 2 root root 0 Jul 17 13:11 sys_enter_access
> drwxr-xr-x 2 root root 0 Jul 17 13:11 sys_enter_add_key
> ...
>
>
> cheers
>
>
>
> _______________________________________________
> Linuxppc-dev mailing list
> [email protected]
> https://lists.ozlabs.org/listinfo/linuxppc-dev

2015-07-17 05:33:58

by Zumeng Chen

Subject: Re: BUG: perf error on syscalls for powerpc64.

On 2015-07-17 09:51, Sukadev Bhattiprolu wrote:
> Zumeng Chen [[email protected]] wrote:
> | 3. What I have seen in 3.14.x kernel,
> | ======================
> | And so far, no more difference to 4.x kernel from me about this part if
> | I'm right.
> |
> | *) With 1028ccf5
> |
> | perf list|grep -i syscall got me nothing.
> |
> |
> | *) Without 1028ccf5
> | root@localhost:~# perf list|grep -i syscall
> | syscalls:sys_enter_socket [Tracepoint event]
> | syscalls:sys_exit_socket [Tracepoint event]
> | syscalls:sys_enter_socketpair [Tracepoint event]
> | syscalls:sys_exit_socketpair [Tracepoint event]
> | syscalls:sys_enter_bind [Tracepoint event]
> | syscalls:sys_exit_bind [Tracepoint event]
> | syscalls:sys_enter_listen [Tracepoint event]
> | syscalls:sys_exit_listen [Tracepoint event]
> | ... ...
>
> Are you seeing this on big-endian or little-endian system?

Big-endian.

>
> IIRC, I saw the opposite behavior on an LE system a few months ago.
> i.e. without 1028ccf5, 'perf list|grep syscall' failed.

I wonder if this has anything to do with the bug.

Cheers,
Zumeng

>
> Applying 1028ccf5, seemed to fix it.
>
> Sukadev
>

2015-07-18 02:00:46

by Zumeng Chen

Subject: Re: BUG: perf error on syscalls for powerpc64.

On 2015-07-17 09:59, Ian Munsie wrote:
> Excerpts from Sukadev Bhattiprolu's message of 2015-07-17 11:51:04 +1000:
>> Are you seeing this on big-endian or little-endian system?
>>
>> IIRC, I saw the opposite behavior on an LE system a few months ago.
>> i.e. without 1028ccf5, 'perf list|grep syscall' failed.
>>
>> Applying 1028ccf5, seemed to fix it.
> You could be on to something there - IIRC the ABI was changed for LE to
> remove the dot symbols. Might be worth testing on both.

Yeah, thanks for the hint, Ian. It must be the dot symbols, so I believe it's
fine in 4.x. Thanks to Michael for his patience as well.

Cheers,
Zumeng
>
> Cheers,
> -Ian
>

2015-07-21 06:40:24

by Michael Ellerman

Subject: Re: BUG: perf error on syscalls for powerpc64.

On Fri, 2015-07-17 at 13:28 +0800, Zumeng Chen wrote:
> On 2015-07-17 12:07, Michael Ellerman wrote:
> > On Fri, 2015-07-17 at 09:27 +0800, Zumeng Chen wrote:
> >> On 2015-07-16 17:04, Michael Ellerman wrote:
> >>> On Thu, 2015-07-16 at 13:57 +0800, Zumeng Chen wrote:
> >>>> Hi All,
> >>>>
> >>>> 1028ccf5 did a change for sys_call_table from a pointer to an array of
> >>>> unsigned long, I think it's not proper, here is my reason:
> >>>>
> >>>> sys_call_table defined as a label in assembler should be pointer array
> >>>> rather than an array as described in 1028ccf5. If we defined it as an
> >>>> array, then arch_syscall_addr will return the address of sys_call_table[],
> >>>> actually the content of sys_call_table[] is demanded by arch_syscall_addr.
> >>>> so 'perf list' will ignore all syscalls since find_syscall_meta will
> >>>> return null
> >>>> in init_ftrace_syscalls because of the wrong arch_syscall_addr.
> >>>>
> >>>> Did I miss something, or Gcc compiler has done something newer ?
> >>> Hi Zumeng,
> >>>
> >>> It works for me with the code as it is in mainline.
> >>>
> >>> I don't quite follow your explanation, so if you're seeing a bug please send
> >>> some information about what you're actually seeing. And include the disassembly
> >>> of arch_syscall_addr() and your compiler version etc.
> >> Hi Michael,
> > Hi Zumeng,
> >
> >> Yeah, it seems it was not a good explanation, I'll explain more this time:
> >>
> >> 1. Whatever we exclaim sys_call_table in C level, actually it is a pointer
> >> to sys_call_table rather than sys_call_table self in assemble level.
> > No it's not a pointer.
>
> Then what is the second one in the following:

It's a function descriptor.

> zchen@pek-yocto-build2:$ cat System.map |grep sys_call_table
> c000000000009590 T .sys_call_table <-----this is a real sys_call_table.
> c0000000014e1b48 D sys_call_table <-----this should be referred by
> arch_syscall_addr
>
> The c0000000014e1b48[0] = c000000000009590

That is from 3.14 isn't it?

In 3.14 we had in systbl.S:

_GLOBAL(sys_call_table)
#include <asm/systbl.h>

And _GLOBAL was:

#define _GLOBAL(name) \
.type name,@function; \
.globl name; \
name:


Which means sys_call_table was being declared as a function, which is
completely wrong.

On big endian when you declare a function "foo" you get two symbols, ".foo" at
the address you declare the symbol and "foo" which is somewhere else and
contains three pointers, the first of which is to ".foo".

So at address "foo" you have a pointer to ".foo", which happens to be what
you'd expect if "foo" was a pointer to ".foo".
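A rough model of that layout in C may help. This is only a sketch: the three-word descriptor is represented as a plain array, and the names DESC_ENTRY, DESC_TOC, DESC_ENV, read_as_pointer and read_as_array are hypothetical. It shows why reading "foo" as a pointer happens to reach the data at ".foo", while reading "foo" as an array returns the descriptor's own words instead.

```c
#include <assert.h>
#include <stdint.h>

/* Word layout of a big-endian (dot-symbol ABI) function descriptor:
 * entry point (".foo"), TOC base, environment pointer. */
enum { DESC_ENTRY, DESC_TOC, DESC_ENV, DESC_WORDS };

/* Treat "foo" as a pointer: load the first word of the descriptor,
 * then index through it. This lands in the data at ".foo". */
uintptr_t read_as_pointer(const uintptr_t desc[DESC_WORDS], unsigned int nr)
{
	const uintptr_t *table = (const uintptr_t *)desc[DESC_ENTRY];
	return table[nr];
}

/* Treat "foo" as an array: index the descriptor's own words.
 * Only word 0 is related to the table; it holds the table's address. */
uintptr_t read_as_array(const uintptr_t desc[DESC_WORDS], unsigned int nr)
{
	assert(nr < DESC_WORDS);
	return desc[nr];
}
```

Under this model, the array reading yields the table's address, then the TOC value, then the environment word. That seems consistent with the sys_call_table[0..2] dump quoted earlier in the thread, where only entry 0 pointed at .sys_call_table.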

Anton fixed this in 3.16:

https://git.kernel.org/torvalds/c/c857c43b34ec


But that had the side-effect of breaking the usage of sys_call_table in C.

cheers

2015-07-21 23:01:05

by Zumeng Chen

Subject: Re: BUG: perf error on syscalls for powerpc64.

On 2015-07-21 14:40, Michael Ellerman wrote:
> On Fri, 2015-07-17 at 13:28 +0800, Zumeng Chen wrote:
>> On 2015-07-17 12:07, Michael Ellerman wrote:
>>> On Fri, 2015-07-17 at 09:27 +0800, Zumeng Chen wrote:
>>>> On 2015-07-16 17:04, Michael Ellerman wrote:
>>>>> On Thu, 2015-07-16 at 13:57 +0800, Zumeng Chen wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> 1028ccf5 did a change for sys_call_table from a pointer to an array of
>>>>>> unsigned long, I think it's not proper, here is my reason:
>>>>>>
>>>>>> sys_call_table defined as a label in assembler should be pointer array
>>>>>> rather than an array as described in 1028ccf5. If we defined it as an
>>>>>> array, then arch_syscall_addr will return the address of sys_call_table[],
>>>>>> actually the content of sys_call_table[] is demanded by arch_syscall_addr.
>>>>>> so 'perf list' will ignore all syscalls since find_syscall_meta will
>>>>>> return null
>>>>>> in init_ftrace_syscalls because of the wrong arch_syscall_addr.
>>>>>>
>>>>>> Did I miss something, or Gcc compiler has done something newer ?
>>>>> Hi Zumeng,
>>>>>
>>>>> It works for me with the code as it is in mainline.
>>>>>
>>>>> I don't quite follow your explanation, so if you're seeing a bug please send
>>>>> some information about what you're actually seeing. And include the disassembly
>>>>> of arch_syscall_addr() and your compiler version etc.
>>>> Hi Michael,
>>> Hi Zumeng,
>>>
>>>> Yeah, it seems it was not a good explanation, I'll explain more this time:
>>>>
>>>> 1. Whatever we exclaim sys_call_table in C level, actually it is a pointer
>>>> to sys_call_table rather than sys_call_table self in assemble level.
>>> No it's not a pointer.
>> Then what is the second one in the following:
> It's a function descriptor.
>
>> zchen@pek-yocto-build2:$ cat System.map |grep sys_call_table
>> c000000000009590 T .sys_call_table <-----this is a real sys_call_table.
>> c0000000014e1b48 D sys_call_table <-----this should be referred by
>> arch_syscall_addr
>>
>> The c0000000014e1b48[0] = c000000000009590
> That is from 3.14 isn't it?
>
> In 3.14 we had in systbl.S:
>
> 46 _GLOBAL(sys_call_table)
> 47 #include <asm/systbl.h>
>
> And _GLOBAL was:
>
> 46 #define _GLOBAL(name) \
> 47 .type name,@function; \
> 48 .globl name; \
> 49 name:
>
>
> Which means sys_call_table was being declared as a function, which is
> completely wrong.
>
> On big endian when you declare a function "foo" you get two symbols, ".foo" at
> the address you declare the symbol and "foo" which is somewhere else and
> contains three pointers, the first of which is to ".foo".
>
> So at address "foo" you have a pointer to ".foo", which happens to be what
> you'd expect if "foo" was a pointer to ".foo".
>
> Anton fixed this in 3.16:
>
> https://git.kernel.org/torvalds/c/c857c43b34ec
>
>
> But that had the side-effect of breaking the usage of sys_call_table in C.

Yeah, good to know, thanks Michael again.

Cheers,
Zumeng

>
> cheers
>
>