2008-10-03 10:59:07

by Stephane Eranian

[permalink] [raw]
Subject: Re: perfmon3 interface overview

David,

On Fri, Oct 3, 2008 at 8:17 AM, David Gibson
<[email protected]> wrote:
>
> > The new API now has 8 system calls in its fully-featured
> > version. Many data structures shared with user level have been
> > abandoned in favor of explicit syscall parameters. Each syscall has
>
> Ah, excellent, big structures passed through an interface are often a
> pain to deal with, so I'm glad these are going.

Thanks.

>
>
> I, at least, like the changes that have been made here. I do have a
> few suggestions for further simplifications along the same lines
> though, see below.
>
> [snip]
> > II) programming the registers
> >
> > With v2.81:
> > int pfm_write_pmcs(int fd, pfarg_pmc_t *pmds, int n);
> > int pfm_write_pmds(int fd, pfarg_pmd_t *pmcs, int n);
> > int pfm_read_pmds(int fd, parg_pmd_t *pmds, int n);
> >
> > With v3.0:
> > int pfm_write_pmrs(int fd, int flags, pfarg_pmr_t *pmrs, int n);
> > int pfm_write_pmrs(int fd, int flags, pfarg_pmr_t *pmrs, int n,
> > pfarg_pmd_attr_t *pmas);
> >
> > int pfm_read_pmrs(int fd, int flags, pfarg_pmr_t *pmrs, int n);
> > int pfm_read_pmrs(int fd, int flags, parg_pmr_t *pmrs, int n,
> > pfarg_pmd_attr_t *pmas);
>
> I would suggest adding 'type' and 'size' (int) parameters, and folding
> the pmrs and pmas arrays into one freeform array. 'size' gives the
> size in bytes of the array elements, 'type' selects the structure of
> each array element. This means:
> - if you ever have some sort of registers to access that are
> neither PMCs nor PMDs, you can access those two, using a new type
> value.
> - if a new method of operation wants to supply different
> information for PMCs or PMDs, you can define a new structure and type
> value (one obvious example would be a minimal (num,value) only
> structure for minimum latency on simple applications).
> - an explicit size value means things can marshal / copy /
> whatnot the parameters without having to know what size each type
> implies - the code will stay valid when new types / structures are
> defined.

So you are suggesting something along the lines:

int pfm_read_pmrs(int fd, int flags, int type, void *tab, size_t sz);
int pfm_write_pmrs(int fd, int flags, int type, void *tab, size_t sz);

I have already introduced a type flag (PFM_RWFL_PMD, PFM_RWFL_PMC).
Why separate the type into its own parameter?

As for the freeform array, isn't that what people do not like because
of void *
and thus weak type checking?

I will look at switching to size instead of count. I think it does make sense.

>
> [snip]
> > III) attaching and detaching
> >
> > With v2.81:
> > int pfm_load_context(int fd, pfarg_load_t *load);
> > int pfm_unload_context(int fd);
> >
> > With v3.0:
> > int pfm_attach_session(int fd, int flags, int target);
> > int pfm_detach_session(int fd, int flags);
>
> Couldn't you get rid of one more syscall here by making detach a
> special case of attach with a special "null" value for target, or a
> special flag?


We could combine the two and use the flag field to indicate attach/detach.
The target is not a pointer but an int. Some people suggested I use an
unsigned long instead. In anycase, we could not use 0 to indicate "detach"
because CPU0 is a valid choice for system-wide. Thus we would have to
pick another value to mean "nothing", e.g, -1.

> IV) starting and stopping
>
> With v2.81:
> int pfm_start(int fd, pfarg_start_t *st);
> int pfm_stop(int fd);
> int pfm_restart(int fd);
>
> With v3.0:
> int pfm_start_session(int fd, int flags);
> int pfm_stop_session(int fd, int flags);

> Likewise, couldn't you cut this down by one more syscall by making it
> int pfm_set_session_state(int fd, int flags);
> and having a 'RUNNING' flag, which selects start or stop behaviour?

That one we can certainly do. That's a good idea.

Thanks for your feedback, keep it coming.


2008-10-03 11:12:20

by Stephane Eranian

[permalink] [raw]
Subject: Re: perfmon3 interface overview

David,
>> [snip]
>> > III) attaching and detaching
>> >
>> > With v2.81:
>> > int pfm_load_context(int fd, pfarg_load_t *load);
>> > int pfm_unload_context(int fd);
>> >
>> > With v3.0:
>> > int pfm_attach_session(int fd, int flags, int target);
>> > int pfm_detach_session(int fd, int flags);
>>
>> Couldn't you get rid of one more syscall here by making detach a
>> special case of attach with a special "null" value for target, or a
>> special flag?
>
>
> We could combine the two and use the flag field to indicate attach/detach.
> The target is not a pointer but an int. Some people suggested I use an
> unsigned long instead. In anycase, we could not use 0 to indicate "detach"
> because CPU0 is a valid choice for system-wide. Thus we would have to
> pick another value to mean "nothing", e.g, -1.
>
> > IV) starting and stopping
> >
> > With v2.81:
> > int pfm_start(int fd, pfarg_start_t *st);
> > int pfm_stop(int fd);
> > int pfm_restart(int fd);
> >
> > With v3.0:
> > int pfm_start_session(int fd, int flags);
> > int pfm_stop_session(int fd, int flags);
>
>> Likewise, couldn't you cut this down by one more syscall by making it
>> int pfm_set_session_state(int fd, int flags);
>> and having a 'RUNNING' flag, which selects start or stop behaviour?
>
> That one we can certainly do. That's a good idea.

Some more thoughts on this.

If we wanted to go even further, we could combine start/stop, attach/detach
into a single syscall:

int pfm_control_session(int fd, int flags, int target);
With flags:
PFM_CTFL_START : start monitoring
PFM_CTFL_STOP : stop monitoring
PFM_CTFL_RESTART: resume after overflow notification

PFM_CTFL_ATTACH: attach to thread or cpu designated by 'target'
PFM_CTFL_DETACH: detach session

But then, this is a form of ioctl() which people don't like....

2008-10-04 06:06:51

by David Gibson

[permalink] [raw]
Subject: Re: perfmon3 interface overview

On Fri, Oct 03, 2008 at 01:12:09PM +0200, stephane eranian wrote:
> David,
> >> [snip]
> >> > III) attaching and detaching
> >> >
> >> > With v2.81:
> >> > int pfm_load_context(int fd, pfarg_load_t *load);
> >> > int pfm_unload_context(int fd);
> >> >
> >> > With v3.0:
> >> > int pfm_attach_session(int fd, int flags, int target);
> >> > int pfm_detach_session(int fd, int flags);
> >>
> >> Couldn't you get rid of one more syscall here by making detach a
> >> special case of attach with a special "null" value for target, or a
> >> special flag?
> >
> >
> > We could combine the two and use the flag field to indicate attach/detach.
> > The target is not a pointer but an int. Some people suggested I use an
> > unsigned long instead. In anycase, we could not use 0 to indicate "detach"
> > because CPU0 is a valid choice for system-wide. Thus we would have to
> > pick another value to mean "nothing", e.g, -1.
> >
> > > IV) starting and stopping
> > >
> > > With v2.81:
> > > int pfm_start(int fd, pfarg_start_t *st);
> > > int pfm_stop(int fd);
> > > int pfm_restart(int fd);
> > >
> > > With v3.0:
> > > int pfm_start_session(int fd, int flags);
> > > int pfm_stop_session(int fd, int flags);
> >
> >> Likewise, couldn't you cut this down by one more syscall by making it
> >> int pfm_set_session_state(int fd, int flags);
> >> and having a 'RUNNING' flag, which selects start or stop behaviour?
> >
> > That one we can certainly do. That's a good idea.
>
> Some more thoughts on this.
>
> If we wanted to go even further, we could combine start/stop, attach/detach
> into a single syscall:

Well, you could. But the attach/detach take a parameter which
start/stop don't, making it a less obvious merge to make.

> int pfm_control_session(int fd, int flags, int target);
> With flags:
> PFM_CTFL_START : start monitoring
> PFM_CTFL_STOP : stop monitoring
> PFM_CTFL_RESTART: resume after overflow notification
>
> PFM_CTFL_ATTACH: attach to thread or cpu designated by 'target'
> PFM_CTFL_DETACH: detach session
>
> But then, this is a form of ioctl() which people don't like....

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

2008-10-04 07:20:17

by Stephane Eranian

[permalink] [raw]
Subject: Re: perfmon3 interface overview

David,

On Sat, Oct 4, 2008 at 8:05 AM, David Gibson
<[email protected]> wrote:
>>
>> If we wanted to go even further, we could combine start/stop, attach/detach
>> into a single syscall:
>
> Well, you could. But the attach/detach take a parameter which
> start/stop don't, making it a less obvious merge to make.
>
Unless you make the 3rd argument optional and hide this in a
user library. This is how this is handled for pfm_create_session()
for instance. The library would define this as follows:

int pfm_control_session(int fd, int flags, ...);

Based upon flags, it would use va_arg() to get to the 3rd argument and
pass it to the syscall which implements the version below. A dummy value
would be passed with the flags is not equal to PFM_CTFL_ATTACH.

>> int pfm_control_session(int fd, int flags, int target);
>> With flags:
>> PFM_CTFL_START : start monitoring
>> PFM_CTFL_STOP : stop monitoring
>> PFM_CTFL_RESTART: resume after overflow notification
>>
>> PFM_CTFL_ATTACH: attach to thread or cpu designated by 'target'
>> PFM_CTFL_DETACH: detach session
>>

2008-10-05 05:53:55

by David Gibson

[permalink] [raw]
Subject: Re: perfmon3 interface overview

On Fri, Oct 03, 2008 at 12:58:12PM +0200, stephane eranian wrote:
> David,
>
> On Fri, Oct 3, 2008 at 8:17 AM, David Gibson
> <[email protected]> wrote:
> >
> > > The new API now has 8 system calls in its fully-featured
> > > version. Many data structures shared with user level have been
> > > abandoned in favor of explicit syscall parameters. Each syscall has
> >
> > Ah, excellent, big structures passed through an interface are often a
> > pain to deal with, so I'm glad these are going.

[snip]
> So you are suggesting something along the lines:
>
> int pfm_read_pmrs(int fd, int flags, int type, void *tab, size_t sz);
> int pfm_write_pmrs(int fd, int flags, int type, void *tab, size_t sz);

Uh.. maybe.. there are actually several possible variants all of which
would meet the general idea I had in mind.

> I have already introduced a type flag (PFM_RWFL_PMD, PFM_RWFL_PMC).
> Why separate the type into its own parameter?

Combining the type into the flags is certainly a possibility. I was
just a bit worried that if you eventually have enough actual flag
flags, in addition to the type values, you might start running out of
bits.

> As for the freeform array, isn't that what people do not like because
> of void *
> and thus weak type checking?

Yeah; this is an interesting tradeoff of flexibility versus
predictability. Actually, what I originally had in mind was
seperately passing both 'n' and a per-element size. That provides a
bit more defined structure to the void * - it must be an array of n
identical, fixed-size elements. The internal structure of each
element is defined only by type, but it is assumed to contain no
pointers to further chained structures (i.e. its safe for wrapper code
to do shallow copies of the array).

> I will look at switching to size instead of count. I think it does
> make sense.

Yeah. As I said I was originally thinking of keeping the 'n'
parameter, and adding an element size parameter. But just having an
overall size is also a possibility - it gives less definition to the
internal structure but at least things can marshal or copy the whole
array parameter without having to know its internals.

[snip]
> > > III) attaching and detaching
> > >
> > > With v2.81:
> > > int pfm_load_context(int fd, pfarg_load_t *load);
> > > int pfm_unload_context(int fd);
> > >
> > > With v3.0:
> > > int pfm_attach_session(int fd, int flags, int target);
> > > int pfm_detach_session(int fd, int flags);
> >
> > Couldn't you get rid of one more syscall here by making detach a
> > special case of attach with a special "null" value for target, or a
> > special flag?
>
> We could combine the two and use the flag field to indicate attach/detach.
> The target is not a pointer but an int. Some people suggested I use an
> unsigned long instead. In anycase, we could not use 0 to indicate "detach"
> because CPU0 is a valid choice for system-wide. Thus we would have to
> pick another value to mean "nothing", e.g, -1.

Sorry, I didn't express myself well. I realise it's an integer, and
didn't mean an actual NULL, but as you say a special integer value
with a function similar to NULL. -1 is the most obvious choice.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

2008-10-05 05:54:17

by David Gibson

[permalink] [raw]
Subject: Re: perfmon3 interface overview

On Sat, Oct 04, 2008 at 09:20:00AM +0200, stephane eranian wrote:
> David,
>
> On Sat, Oct 4, 2008 at 8:05 AM, David Gibson
> <[email protected]> wrote:
> >>
> >> If we wanted to go even further, we could combine start/stop, attach/detach
> >> into a single syscall:
> >
> > Well, you could. But the attach/detach take a parameter which
> > start/stop don't, making it a less obvious merge to make.
> >
> Unless you make the 3rd argument optional and hide this in a
> user library. This is how this is handled for pfm_create_session()
> for instance. The library would define this as follows:
>
> int pfm_control_session(int fd, int flags, ...);
>
> Based upon flags, it would use va_arg() to get to the 3rd argument and
> pass it to the syscall which implements the version below. A dummy value
> would be passed with the flags is not equal to PFM_CTFL_ATTACH.

Yeah, that's what I'm saying. You could do it, but it feels like
you're forcing it, at least to me. As you say it starts to look a bit
like ioctl() again.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

2008-10-05 21:23:26

by Stephane Eranian

[permalink] [raw]
Subject: Re: perfmon3 interface overview

David,

On Sun, Oct 5, 2008 at 7:53 AM, David Gibson
<[email protected]> wrote:
> [snip]
>> So you are suggesting something along the lines:
>>
>> int pfm_read_pmrs(int fd, int flags, int type, void *tab, size_t sz);
>> int pfm_write_pmrs(int fd, int flags, int type, void *tab, size_t sz);
>
> Uh.. maybe.. there are actually several possible variants all of which
> would meet the general idea I had in mind.
>
>> I have already introduced a type flag (PFM_RWFL_PMD, PFM_RWFL_PMC).
>> Why separate the type into its own parameter?
>
> Combining the type into the flags is certainly a possibility. I was
> just a bit worried that if you eventually have enough actual flag
> flags, in addition to the type values, you might start running out of
> bits.
>
I see, so you were just decoupling to double the number of available bits.
I guess we have a minimum of 32 bits. It seems overkill to support up to
32 'types'. We could force flags to uint64_t. But then we would have to do
the same for all other syscalls to be consistent. Defining flags as long
won't work because of 32-bit vs. 64-bit ABI compatibility issues.




>> As for the freeform array, isn't that what people do not like because
>> of void *
>> and thus weak type checking?
>
> Yeah; this is an interesting tradeoff of flexibility versus
> predictability. Actually, what I originally had in mind was
> seperately passing both 'n' and a per-element size. That provides a
> bit more defined structure to the void * - it must be an array of n
> identical, fixed-size elements. The internal structure of each
> element is defined only by type, but it is assumed to contain no
> pointers to further chained structures (i.e. its safe for wrapper code
> to do shallow copies of the array).
>

Ok, that reminds me of the API of calloc(). It certainly forces you to
think it terms of 'array of elements'. Yet it can be perverted very easily
with the number of elements at 1.

>> I will look at switching to size instead of count. I think it does
>> make sense.
>
> Yeah. As I said I was originally thinking of keeping the 'n'
> parameter, and adding an element size parameter. But just having an
> overall size is also a possibility - it gives less definition to the
> internal structure but at least things can marshal or copy the whole
> array parameter without having to know its internals.
>
Yes, that it true. It also simplifies the checking on size. With count
we had to check for multiply overflow because internall we had to
check the size anyway.

> [snip]
>> > > III) attaching and detaching
>> > >
>> > > With v2.81:
>> > > int pfm_load_context(int fd, pfarg_load_t *load);
>> > > int pfm_unload_context(int fd);
>> > >
>> > > With v3.0:
>> > > int pfm_attach_session(int fd, int flags, int target);
>> > > int pfm_detach_session(int fd, int flags);
>> >
>> > Couldn't you get rid of one more syscall here by making detach a
>> > special case of attach with a special "null" value for target, or a
>> > special flag?
>>
>> We could combine the two and use the flag field to indicate attach/detach.
>> The target is not a pointer but an int. Some people suggested I use an
>> unsigned long instead. In anycase, we could not use 0 to indicate "detach"
>> because CPU0 is a valid choice for system-wide. Thus we would have to
>> pick another value to mean "nothing", e.g, -1.
>
> Sorry, I didn't express myself well. I realise it's an integer, and
> didn't mean an actual NULL, but as you say a special integer value
> with a function similar to NULL. -1 is the most obvious choice.
>
Yes, -1 would work.

If I summarize our discussion. It seems we can define the API as follows:

int pfm_create_session(int fd, uint64_t flags, pfarg_sinfo_t *sif,
[ char *smpl_name, void *smpl_arg, size_t arg_size]);
int pfm_read_pmrs(int fd, uint64_t flags, void *tab, size_t sz);
int pfm_write_pmrs(int fd, uint64_t flags, void *tab, size_t sz);
int pfm_attach_session(int fd, uint64_t flags, int target); /*
attach, detach with target=-1 */
int pfm_control_session(int fd, uint64_t flags); /* for start/stop */
int pfm_control_sets(int fd, uint64_t flags, void *sets, size_t sz);

Does this look reasonable?

2008-10-07 04:02:51

by David Gibson

[permalink] [raw]
Subject: Re: perfmon3 interface overview

On Sun, Oct 05, 2008 at 11:23:15PM +0200, stephane eranian wrote:
> David,
>
> On Sun, Oct 5, 2008 at 7:53 AM, David Gibson
> <[email protected]> wrote:
> > [snip]
> >> So you are suggesting something along the lines:
> >>
> >> int pfm_read_pmrs(int fd, int flags, int type, void *tab, size_t sz);
> >> int pfm_write_pmrs(int fd, int flags, int type, void *tab, size_t sz);
> >
> > Uh.. maybe.. there are actually several possible variants all of which
> > would meet the general idea I had in mind.
> >
> >> I have already introduced a type flag (PFM_RWFL_PMD, PFM_RWFL_PMC).
> >> Why separate the type into its own parameter?
> >
> > Combining the type into the flags is certainly a possibility. I was
> > just a bit worried that if you eventually have enough actual flag
> > flags, in addition to the type values, you might start running out of
> > bits.
> >
> I see, so you were just decoupling to double the number of available bits.
> I guess we have a minimum of 32 bits. It seems overkill to support up to
> 32 'types'. We could force flags to uint64_t. But then we would have to do
> the same for all other syscalls to be consistent. Defining flags as long
> won't work because of 32-bit vs. 64-bit ABI compatibility issues.

I guess so. I guess there's also the possibility that you might later
have some flags that can apply to many of the syscalls, and it would
be nice to have them in a standard bit position for all calls. That
might make bit allocation a bit interesting if you're squeezing type
into the same field.

My opinion is not really strong either way though.

> >> As for the freeform array, isn't that what people do not like because
> >> of void *
> >> and thus weak type checking?
> >
> > Yeah; this is an interesting tradeoff of flexibility versus
> > predictability. Actually, what I originally had in mind was
> > seperately passing both 'n' and a per-element size. That provides a
> > bit more defined structure to the void * - it must be an array of n
> > identical, fixed-size elements. The internal structure of each
> > element is defined only by type, but it is assumed to contain no
> > pointers to further chained structures (i.e. its safe for wrapper code
> > to do shallow copies of the array).
>
> Ok, that reminds me of the API of calloc(). It certainly forces you to
> think it terms of 'array of elements'. Yet it can be perverted very easily
> with the number of elements at 1.

That's true. Like I say, it's a tradeoff.

> >> I will look at switching to size instead of count. I think it does
> >> make sense.
> >
> > Yeah. As I said I was originally thinking of keeping the 'n'
> > parameter, and adding an element size parameter. But just having an
> > overall size is also a possibility - it gives less definition to the
> > internal structure but at least things can marshal or copy the whole
> > array parameter without having to know its internals.
> >
> Yes, that it true. It also simplifies the checking on size. With count
> we had to check for multiply overflow because internall we had to
> check the size anyway.

That's a nice extra bonus, yes.

> > [snip]
> >> > > III) attaching and detaching
> >> > >
> >> > > With v2.81:
> >> > > int pfm_load_context(int fd, pfarg_load_t *load);
> >> > > int pfm_unload_context(int fd);
> >> > >
> >> > > With v3.0:
> >> > > int pfm_attach_session(int fd, int flags, int target);
> >> > > int pfm_detach_session(int fd, int flags);
> >> >
> >> > Couldn't you get rid of one more syscall here by making detach a
> >> > special case of attach with a special "null" value for target, or a
> >> > special flag?
> >>
> >> We could combine the two and use the flag field to indicate attach/detach.
> >> The target is not a pointer but an int. Some people suggested I use an
> >> unsigned long instead. In anycase, we could not use 0 to indicate "detach"
> >> because CPU0 is a valid choice for system-wide. Thus we would have to
> >> pick another value to mean "nothing", e.g, -1.
> >
> > Sorry, I didn't express myself well. I realise it's an integer, and
> > didn't mean an actual NULL, but as you say a special integer value
> > with a function similar to NULL. -1 is the most obvious choice.
> >
> Yes, -1 would work.
>
> If I summarize our discussion. It seems we can define the API as follows:
>
> int pfm_create_session(int fd, uint64_t flags, pfarg_sinfo_t *sif,
> [ char *smpl_name, void *smpl_arg, size_t arg_size]);
> int pfm_read_pmrs(int fd, uint64_t flags, void *tab, size_t sz);
> int pfm_write_pmrs(int fd, uint64_t flags, void *tab, size_t sz);
> int pfm_attach_session(int fd, uint64_t flags, int target); /*
> attach, detach with target=-1 */
> int pfm_control_session(int fd, uint64_t flags); /* for start/stop */

Unless you have other functions in mind for this, I'd suggest a name
that's a little less generic here. pfm_set_running() maybe.

> int pfm_control_sets(int fd, uint64_t flags, void *sets, size_t
> sz);

Hrm. This now has identical signature to the {read,write}_pmrs. I
wonder if these could be sensibly combined. Maybe have
pfm_get_info(), pfm_set_info() which can get/set control structures
for several different namespaces: PMCs PMDs and now event sets.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

2008-10-07 09:24:45

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [perfmon2] perfmon3 interface overview

On Sunday 05 October 2008, stephane eranian wrote:
> If I summarize our discussion. It seems we can define the API as follows:
>
> ? ? int pfm_create_session(int fd, uint64_t flags, pfarg_sinfo_t *sif,
> [ char *smpl_name, void *smpl_arg, size_t arg_size]);
> ? ? int pfm_read_pmrs(int fd, uint64_t flags, void *tab, size_t sz);
> ? ? int pfm_write_pmrs(int fd, uint64_t flags, void *tab, size_t sz);
> ? ? int pfm_attach_session(int fd, uint64_t flags, int target); ? /*
> attach, detach with target=-1 */
> ? ? int pfm_control_session(int fd, uint64_t flags); ? /* for start/stop */
> ? ? int pfm_control_sets(int fd, uint64_t flags, void *sets, size_t sz);


There are two problems with uint64_t arguments to system calls:

1. You have to mandate the use of stdint.h before including the
perfmon header file. This is a minor problem as long as all callers
go through libpfm, but you can avoid it entirely by using the kernel
__u32 instead of uint32_t style types everywhere in your interface.

2. The calling conventions for passing 64 bit values on 32 bit
architecture are complex, every architecture enforces different
rules here. On the syscall interface, bettwe always use native
types like 'unsigned long' instead of '__u64'. This is different
from the case where you pass a pointer to data, in which case
a '__u64 *' is much preferred over a 'unsigned long *'.

I did not understand the reason for going to a 64 bit flags parameter,
but I think that you would do everyone a favour if you can instead
use two 32-bit flags or a unsigned long flags parameter.

Arnd <><

2008-10-07 09:47:09

by Stephane Eranian

[permalink] [raw]
Subject: Re: perfmon3 interface overview

David,

On Tue, Oct 7, 2008 at 5:56 AM, David Gibson
<[email protected]> wrote:
> On Sun, Oct 05, 2008 at 11:23:15PM +0200, stephane eranian wrote:
>> I see, so you were just decoupling to double the number of available bits.
>> I guess we have a minimum of 32 bits. It seems overkill to support up to
>> 32 'types'. We could force flags to uint64_t. But then we would have to do
>> the same for all other syscalls to be consistent. Defining flags as long
>> won't work because of 32-bit vs. 64-bit ABI compatibility issues.
>
> I guess so. I guess there's also the possibility that you might later
> have some flags that can apply to many of the syscalls, and it would
> be nice to have them in a standard bit position for all calls. That
> might make bit allocation a bit interesting if you're squeezing type
> into the same field.
>
I can see that for subsets of the calls, e.g., read and write of registers.
It is not clear to me what kind of 'service' would apply to all calls.

As Corey suggested yesteday, the flags for read and write operations
are not just single bit. A portion of the 64 bits, 8 in his proposal, are used
to encode the 'type' of register (PMD, PMC, PMD_ATTR). That simplifies
checking for valid bits in this portion of flags.

OTOH, if I go back to your original proposal:
int pfm_write_pmrs(int fd, int flags, int type, void *v, size_t);

The 'type' field would have the same encoding as proposed by Corey.
32 bits would be plentyful for types. Flags could remain at 32-bits
across the board, thereby avoiding the problems outlined by Arnd
in his follow-up message. Separating the type also makes it look
less like an ioctl(). So I think we should go with that proposal.

[snip]
>>
>> If I summarize our discussion. It seems we can define the API as follows:
>>
>> int pfm_create_session(int fd, uint64_t flags, pfarg_sinfo_t *sif,
>> [ char *smpl_name, void *smpl_arg, size_t arg_size]);
>> int pfm_read_pmrs(int fd, uint64_t flags, void *tab, size_t sz);
>> int pfm_write_pmrs(int fd, uint64_t flags, void *tab, size_t sz);
>> int pfm_attach_session(int fd, uint64_t flags, int target); /*
>> attach, detach with target=-1 */
>> int pfm_control_session(int fd, uint64_t flags); /* for start/stop */
>
> Unless you have other functions in mind for this, I'd suggest a name
> that's a little less generic here. pfm_set_running() maybe.
>
What about pfm_set_state() or pfm_set_session_state()?

>> int pfm_control_sets(int fd, uint64_t flags, void *sets, size_t
>> sz);
>
> Hrm. This now has identical signature to the {read,write}_pmrs. I
> wonder if these could be sensibly combined. Maybe have
> pfm_get_info(), pfm_set_info() which can get/set control structures
> for several different namespaces: PMCs PMDs and now event sets.
>
I think this would be too generic as it would be mixing very different
data types. This is not the case for pfm_write_pmrs() and pfm_read_pmrs()
because you are always passing 'register descriptions'.

There is a fundamental difference between pfm_control_sets() and
pfm_write_pmrs()/pfm_read_pmrs(). For the latter, the 'action' is hardcoded
in the name of the syscall, i.e., read or write, what varies is whether or not
we are passing PMD or PMC. For the former, the 'action', i.e., create new
set or get info about a set, is hardcoded in the flags, and it determines the
type of structure passed in the third argument. In that sense,
pfm_control_sets()
has more the look and feel of ioctl(), but as David points out in his follow-up
message, the difference is that we have size and that provides a basis for
sanity checking in the kernel, unlike ioctl() which only has 'flags'.

In summary, here is the current proposal:

int pfm_create_session(int fd, int flags, pfarg_sinfo_t *sif,
[ char *smpl_name,
void *smpl_arg,
size_t arg_size]);

int pfm_read_pmrs(int fd, int flags, int type, void *tab, size_t sz);

int pfm_write_pmrs(int fd, int flags, int type, void *tab, size_t sz);

int pfm_attach_session(int fd, int flags, int target);
/* detach with target=-1 */

int pfm_set_session_state(int fd, int flags);
/* for start/stop */

int pfm_control_sets(int fd, int flags, void *sets, size_t sz);

2008-10-08 00:54:23

by David Gibson

[permalink] [raw]
Subject: Re: perfmon3 interface overview

On Tue, Oct 07, 2008 at 11:46:53AM +0200, stephane eranian wrote:
> David,
>
> On Tue, Oct 7, 2008 at 5:56 AM, David Gibson
> <[email protected]> wrote:
> > On Sun, Oct 05, 2008 at 11:23:15PM +0200, stephane eranian wrote:
> >> I see, so you were just decoupling to double the number of available bits.
> >> I guess we have a minimum of 32 bits. It seems overkill to support up to
> >> 32 'types'. We could force flags to uint64_t. But then we would have to do
> >> the same for all other syscalls to be consistent. Defining flags as long
> >> won't work because of 32-bit vs. 64-bit ABI compatibility issues.
> >
> > I guess so. I guess there's also the possibility that you might later
> > have some flags that can apply to many of the syscalls, and it would
> > be nice to have them in a standard bit position for all calls. That
> > might make bit allocation a bit interesting if you're squeezing type
> > into the same field.
> >
> I can see that for subsets of the calls, e.g., read and write of registers.
> It is not clear to me what kind of 'service' would apply to all calls.
>
> As Corey suggested yesteday, the flags for read and write operations
> are not just single bit. A portion of the 64 bits, 8 in his proposal, are used
> to encode the 'type' of register (PMD, PMC, PMD_ATTR). That simplifies
> checking for valid bits in this portion of flags.

Yes, that was always what I had in mind if flags and type were
combined.

> OTOH, if I go back to your original proposal:
> int pfm_write_pmrs(int fd, int flags, int type, void *v, size_t);
>
> The 'type' field would have the same encoding as proposed by Corey.
> 32 bits would be plentyful for types. Flags could remain at 32-bits
> across the board, thereby avoiding the problems outlined by Arnd
> in his follow-up message. Separating the type also makes it look
> less like an ioctl(). So I think we should go with that proposal.

Works for me.

> [snip]
> >>
> >> If I summarize our discussion. It seems we can define the API as follows:
> >>
> >> int pfm_create_session(int fd, uint64_t flags, pfarg_sinfo_t *sif,
> >> [ char *smpl_name, void *smpl_arg, size_t arg_size]);
> >> int pfm_read_pmrs(int fd, uint64_t flags, void *tab, size_t sz);
> >> int pfm_write_pmrs(int fd, uint64_t flags, void *tab, size_t sz);
> >> int pfm_attach_session(int fd, uint64_t flags, int target); /*
> >> attach, detach with target=-1 */
> >> int pfm_control_session(int fd, uint64_t flags); /* for start/stop */
> >
> > Unless you have other functions in mind for this, I'd suggest a name
> > that's a little less generic here. pfm_set_running() maybe.
> >
> What about pfm_set_state() or pfm_set_session_state()?

I guess so.

> >> int pfm_control_sets(int fd, uint64_t flags, void *sets, size_t
> >> sz);
> >
> > Hrm. This now has identical signature to the {read,write}_pmrs. I
> > wonder if these could be sensibly combined. Maybe have
> > pfm_get_info(), pfm_set_info() which can get/set control structures
> > for several different namespaces: PMCs PMDs and now event sets.
> >
> I think this would be too generic as it would be mixing very different
> data types. This is not the case for pfm_write_pmrs() and pfm_read_pmrs()
> because you are always passing 'register descriptions'.
>
> There is a fundamental difference between pfm_control_sets() and
> pfm_write_pmrs()/pfm_read_pmrs(). For the latter, the 'action' is hardcoded
> in the name of the syscall, i.e., read or write, what varies is whether or not
> we are passing PMD or PMC. For the former, the 'action', i.e., create new
> set or get info about a set, is hardcoded in the flags, and it determines the

Uh.. I was assuming if combined we'd drop the selecting flag from
control_sets() and instead make the pfm_read_stuff() call be the
get_info() equivalent when used on the eventsets space and
pfm_write_stuff() be the create/modify sets call.

The fact that the structure is different for read/write in this case
is a bit funny. In fact it would probably be best to have different
'type' fields for read and write too (EVTSET_INFOm only usable for
reads , and EVTSET_CONTROL only for writes, maybe).


I'm not dead-set on combining these calls, but it seems to me we've
already just about made the read/write calls into a read/write array
of control structures in some namespace call, which pretty much is
what the event set call is too.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

2008-10-08 10:09:28

by Stephane Eranian

[permalink] [raw]
Subject: Re: perfmon3 interface overview

David,


On Wed, Oct 8, 2008 at 2:53 AM, David Gibson
<[email protected]> wrote:
David,

>> >> int pfm_control_sets(int fd, uint64_t flags, void *sets, size_t
>> >> sz);
>> >
>> > Hrm. This now has identical signature to the {read,write}_pmrs. I
>> > wonder if these could be sensibly combined. Maybe have
>> > pfm_get_info(), pfm_set_info() which can get/set control structures
>> > for several different namespaces: PMCs PMDs and now event sets.
>> >
>> I think this would be too generic as it would be mixing very different
>> data types. This is not the case for pfm_write_pmrs() and pfm_read_pmrs()
>> because you are always passing 'register descriptions'.
>>
>> There is a fundamental difference between pfm_control_sets() and
>> pfm_write_pmrs()/pfm_read_pmrs(). For the latter, the 'action' is hardcoded
>> in the name of the syscall, i.e., read or write, what varies is whether or not
>> we are passing PMD or PMC. For the former, the 'action', i.e., create new
>> set or get info about a set, is hardcoded in the flags, and it determines the
>
> Uh.. I was assuming if combined we'd drop the selecting flag from
> control_sets() and instead make the pfm_read_stuff() call be the
> get_info() equivalent when used on the eventsets space and
> pfm_write_stuff() be the create/modify sets call.
>
There is something I'd like o clarify about event sets. People may be
wondering why we need a separate call to create new event sets. Why
not create the sets when the session is created, e.g., 'create a
session with 5 sets'.

There are several reasons for not doing this. Each set has some attributes which
describes how and when to switch to another set. Switching can be done based
on timeout or based on the number of overflows of counters, each chosen on a
per set basis, i.e,, it is possible to combine overflow and timeout switching.
Furthermore, the timeout is defined per set and can thus vary from one
set to the
other. Overflow switching is controlled by a threshold passed as an
attribute to the
PMD registers.

If we were to create sets with the sessions, it would be more than a matter of
passing the number of sets. We would actually have to pass the set attributes
(pfarg_set_desc_t) and a size parameter.

We could do this in v3.0 but it would have to work incrementally as initially we
won't support event sets in mainline. The call would look as follows:

int pfm_create_session(int flags, pfarg_sinfo_t *sif);
int pfm_create_session(int flags, pfarg_sinfo_t *sif, char *smpl_name,
void *smpl_arg, size_t arg_size);
int pfm_create_session(int flags, pfarg_sinfo_t *sif, char *smpl_name,
void *smpl_arg, size_t arg_size, pfarg_set_desc_t *s, size_t sz);

I think this call is a bit too complicated. It does a lot of things.
Furthermore, you lose the ability to add sets on the fly. Deleting may
be possible via another call.

So my inclination is to keep this as a separate call. By default, one
set is created (set0), with no switching attribute. It is possible
to change the attributes by recreating the set. That means, unless you
need more sets, you have nothing to do.


> The fact that the structure is different for read/write in this case
> is a bit funny. In fact it would probably be best to have different
> 'type' fields for read and write too (EVTSET_INFOm only usable for
> reads , and EVTSET_CONTROL only for writes, maybe).
>
>
> I'm not dead-set on combining these calls, but it seems to me we've
> already just about made the read/write calls into a read/write array
> of control structures in some namespace call, which pretty much is
> what the event set call is too.
>
I see, you are pushing it a bit further here. If I understand, you're saying:

int pfm_read_session(int fd, int flags, int type, void *t, size_t sz);
int pfm_write_session(int fd, int flags, int type, void *t, size_t sz);

Where type could be:
PFM_RWFL_PMD : write simplified pmds registers (pfarg_pmr_t)
PFM_RWFL_PMD_ATTR: read or write pmds registers with extended attributes.
PFM_RWFL_PMC : write pmcs register (pfarg_pmr_t), note:
this is for writing only.
PFM_RWFL_SETINFO : read set information (pfarg_set_info_t). Only
supported by pfm_read_session
PFM_RWFL_CREATSET: create new event sets (pfarg_set_desc_t). Only
supported by pfm_write_session

I am a bit worried this is too generic, especially for creating sets.
Yet it offers the advantage of being
very compact and extensible. I'd like to combine pfarg_set_info_t and
pfarg_set_desc_t if possible, even
if some fields are used in only one of the calls.

In summary, here is the current proposal:

int pfm_create_session(int flags, pfarg_sinfo_t *sif,
[char *smpl_name,
void *smpl_arg,
size_t arg_sz]);

int pfm_read_session(int fd, int flags, int type, void *t, size_t sz);

int pfm_write_session(int fd, int flags, int type, void *t, size_t sz);

int pfm_attach_session((int fd, int flags, int target); /* target =
-1 for detach */

int pfm_set_state_session(int fd, int flags);

No other calls are necessary to support event sets and multiplexing.