2008-11-12 10:10:34

by Stephane Eranian

[permalink] [raw]
Subject: Re: debugctl msr

Markus,

On Wed, Nov 12, 2008 at 8:09 AM, Markus Metzger
<[email protected]> wrote:
> stephane eranian wrote:
>
>> For perfmon, PEBS can be used for both per-thread and per-cpu. With
>> perfmon, the allocation/initialization of the buffer is separated from its
>> activation. In other words, allocation/initialization may be done before
>> you
>> actually have to write the DS_AREA MSR. Allocation/initialization may not
>> necessary be done on the cpu you want to measure in per-cpu mode. The
>> logic of DS is different, ds_request_pebs() allocates and immediately
>> writes
>> DS_AREA. I cannot use that.
>
> DS_AREA contains a pointer to the actual configuration struct.
> The idea is that once I allocated the configuration struct, I can write
> the pointer to it into DS_AREA. It won't be used unless I turn on a feature
> that uses DS.
> So, the flow is:
> 1. allocate configuration struct
> 2. write DS_AREA

What if you are allocating the buffer for another task. For instance,
a tool is attaching
to a running process and wants to use PEBS. The tool allocates and
initializes the
PEBS buffer and then attaches it to the task to monitor (which is
stopped). When the
task is scheduled again, it picks up the PEBS buffer, and DS_AREA is written to.

> 3. configure BTS/PEBS
> 4. enable BTS/PEBS

enable PEBS consists in writing a PMU MSR. Perfmon takes care of that.

>
>
>> Concerning memory allocation of the buffer, the current kernel module
>> exporting PEBS lumps together the memory for the DS_AREA + PEBS
>> buffer. The entire region is then remapped to user space via mmap().
>> That implies that the memory region needs to be page-aligned and multiple
>> of page size. If I wanted to keep this, it is not clear to me how I could
>> use
>> your API to simply pass the addresses.
>
> Higher layers are not meant to know about DS configuration details. I admit
> that
> the interface is still very near to the actual h/w and that the developers
> of
> higher layers do of course know DS details. But they should not care, for
> example, that DS configuration requires memory allocation.
>
> The DS configuration is shared between PEBS and BTS users.
> Assuming there is already a BTS user for that thread or cpu. What should
> ds.c
> do with the memory you allocated for the DS configuration? It must have
> already
> allocated one to satisfy the BTS user.
>
> Besides, I would not expose this configuration to user-space. If the user
> were
> to modify the configuration, he could bring down the system. The PEBS/BTS
> buffers should be save, since they are write-only from h/w perspective.
>
The PEBS buffer (and thus DS_AREA) are remapped read-only.

There is another reason why the DS_AREA is exposed. This is the only way for
tools to see the current position in the PEBS buffer. They may want to poll on
that position index. The PEBS buffer is not necessarily full. What if
the session
terminates with a partial buffer. There must be a way for the tool to figure out
where the last sample is. By exposing DS read-only the index is always available
and always guaranteed current. Without this, the kernel would have to
extract the
index (pebs_get_index) and store it somewhere so the user can see it. This copy
could be tirggered by a PEBS buffer overflow or a stop of monitoring. Sampling
buffers formats currently do not have a stop callback but this can be
added easily.


2008-11-12 10:59:56

by Metzger, Markus T

[permalink] [raw]
Subject: RE: debugctl msr

>-----Original Message-----
>From: stephane eranian [mailto:[email protected]]
>Sent: Mittwoch, 12. November 2008 11:10
>To: Markus Metzger

>>> For perfmon, PEBS can be used for both per-thread and per-cpu. With
>>> perfmon, the allocation/initialization of the buffer is
>separated from its
>>> activation. In other words, allocation/initialization may
>be done before
>>> you
>>> actually have to write the DS_AREA MSR.
>Allocation/initialization may not
>>> necessary be done on the cpu you want to measure in per-cpu
>mode. The
>>> logic of DS is different, ds_request_pebs() allocates and
>immediately
>>> writes
>>> DS_AREA. I cannot use that.
>>
>> DS_AREA contains a pointer to the actual configuration struct.
>> The idea is that once I allocated the configuration struct,
>I can write
>> the pointer to it into DS_AREA. It won't be used unless I
>turn on a feature
>> that uses DS.
>> So, the flow is:
>> 1. allocate configuration struct
>> 2. write DS_AREA
>
>What if you are allocating the buffer for another task. For instance,
>a tool is attaching
>to a running process and wants to use PEBS. The tool allocates and
>initializes the
>PEBS buffer and then attaches it to the task to monitor (which is
>stopped). When the
>task is scheduled again, it picks up the PEBS buffer, and
>DS_AREA is written to.

That tool should request DS for the task it intends to attach its
PEBS buffer to. Who knows, maybe PEBS is already in use for that task
by someone else. Or maybe there is a system wide BTS or PEBS session
and the per-task request would need to be rejected (until we can
support this scenario).



>There is another reason why the DS_AREA is exposed. This is
>the only way for
>tools to see the current position in the PEBS buffer. They may
>want to poll on
>that position index. The PEBS buffer is not necessarily full. What if
>the session
>terminates with a partial buffer. There must be a way for the
>tool to figure out
>where the last sample is. By exposing DS read-only the index
>is always available
>and always guaranteed current. Without this, the kernel would have to
>extract the
>index (pebs_get_index) and store it somewhere so the user can
>see it. This copy
>could be tirggered by a PEBS buffer overflow or a stop of
>monitoring. Sampling
>buffers formats currently do not have a stop callback but this can be
>added easily.

With the current interface, you would need to call ds_get_pebs_index().

The rfc patch I sent out some weeks ago in the scope of multiplexing
uses the approach you described - at least in-kernel.
Upon ds_request_pebs(), you would get a const struct pebs_tracer* that
contains a const view of the DS configuration.


In any case, I think the DS configuration needs to be allocated by ds.c
since we have two independent users that need to share a single
configuration.

Would it help if we changed the DS interface as proposed in that rfc
patch (without all the multiplexing stuff)?


regards,
markus.
---------------------------------------------------------------------
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

2008-11-13 14:50:44

by Stephane Eranian

[permalink] [raw]
Subject: Re: debugctl msr

Markus,

I looked at the ds.c source code some more. I think the problem when
trying to use it with perfmon is because you are assuming that by the
time you allocate the buffer you know the task it is going to be attached to.
With perfmon this assumption is always wrong. Perfmon decouples each
step:
1- allocation of the memory for the buffer
2- initialization of the buffer
3- binding of the buffer to its target thread or CPU
4- saving/restoring of PMU/DS/PEBS registers

The idea is that you can prepare your measurement/buffer and then
attach to a task.
Should you be following across fork/pthread_create, you could prepare in advance
a pool of ready-to-go perfmon sessions, all you'd have to do then is
attach + start.

In ds.c:ds_request_pebs(), you assume that if task == NULL, then this
is a per-cpu session.
If you could decouple that by passing a type parameter for instance
and if you were to add
a ds_attach_pebs() then I think I could use the interface.


On Wed, Nov 12, 2008 at 11:59 AM, Metzger, Markus T
<[email protected]> wrote:
>>-----Original Message-----
>>From: stephane eranian [mailto:[email protected]]
>>Sent: Mittwoch, 12. November 2008 11:10
>>To: Markus Metzger
>
>>>> For perfmon, PEBS can be used for both per-thread and per-cpu. With
>>>> perfmon, the allocation/initialization of the buffer is
>>separated from its
>>>> activation. In other words, allocation/initialization may
>>be done before
>>>> you
>>>> actually have to write the DS_AREA MSR.
>>Allocation/initialization may not
>>>> necessary be done on the cpu you want to measure in per-cpu
>>mode. The
>>>> logic of DS is different, ds_request_pebs() allocates and
>>immediately
>>>> writes
>>>> DS_AREA. I cannot use that.
>>>
>>> DS_AREA contains a pointer to the actual configuration struct.
>>> The idea is that once I allocated the configuration struct,
>>I can write
>>> the pointer to it into DS_AREA. It won't be used unless I
>>turn on a feature
>>> that uses DS.
>>> So, the flow is:
>>> 1. allocate configuration struct
>>> 2. write DS_AREA
>>
>>What if you are allocating the buffer for another task. For instance,
>>a tool is attaching
>>to a running process and wants to use PEBS. The tool allocates and
>>initializes the
>>PEBS buffer and then attaches it to the task to monitor (which is
>>stopped). When the
>>task is scheduled again, it picks up the PEBS buffer, and
>>DS_AREA is written to.
>
> That tool should request DS for the task it intends to attach its
> PEBS buffer to. Who knows, maybe PEBS is already in use for that task
> by someone else. Or maybe there is a system wide BTS or PEBS session
> and the per-task request would need to be rejected (until we can
> support this scenario).
>
>
>
>>There is another reason why the DS_AREA is exposed. This is
>>the only way for
>>tools to see the current position in the PEBS buffer. They may
>>want to poll on
>>that position index. The PEBS buffer is not necessarily full. What if
>>the session
>>terminates with a partial buffer. There must be a way for the
>>tool to figure out
>>where the last sample is. By exposing DS read-only the index
>>is always available
>>and always guaranteed current. Without this, the kernel would have to
>>extract the
>>index (pebs_get_index) and store it somewhere so the user can
>>see it. This copy
>>could be tirggered by a PEBS buffer overflow or a stop of
>>monitoring. Sampling
>>buffers formats currently do not have a stop callback but this can be
>>added easily.
>
> With the current interface, you would need to call ds_get_pebs_index().
>
> The rfc patch I sent out some weeks ago in the scope of multiplexing
> uses the approach you described - at least in-kernel.
> Upon ds_request_pebs(), you would get a const struct pebs_tracer* that
> contains a const view of the DS configuration.
>
>
> In any case, I think the DS configuration needs to be allocated by ds.c
> since we have two independent users that need to share a single
> configuration.
>
> Would it help if we changed the DS interface as proposed in that rfc
> patch (without all the multiplexing stuff)?
>
>
> regards,
> markus.
> ---------------------------------------------------------------------
> Intel GmbH
> Dornacher Strasse 1
> 85622 Feldkirchen/Muenchen Germany
> Sitz der Gesellschaft: Feldkirchen bei Muenchen
> Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
> Registergericht: Muenchen HRB 47456 Ust.-IdNr.
> VAT Registration No.: DE129385895
> Citibank Frankfurt (BLZ 502 109 00) 600119052
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
>

2008-11-14 14:41:50

by Metzger, Markus T

[permalink] [raw]
Subject: RE: debugctl msr

>-----Original Message-----
>From: stephane eranian [mailto:[email protected]]
>Sent: Donnerstag, 13. November 2008 15:51
>To: Metzger, Markus T

>I looked at the ds.c source code some more. I think the problem when
>trying to use it with perfmon is because you are assuming that by the
>time you allocate the buffer you know the task it is going to
>be attached to.
>With perfmon this assumption is always wrong. Perfmon decouples each
>step:
> 1- allocation of the memory for the buffer
> 2- initialization of the buffer
> 3- binding of the buffer to its target thread or CPU
> 4- saving/restoring of PMU/DS/PEBS registers
>
>The idea is that you can prepare your measurement/buffer and then
>attach to a task.
>Should you be following across fork/pthread_create, you could
>prepare in advance
>a pool of ready-to-go perfmon sessions, all you'd have to do then is
>attach + start.
>
>In ds.c:ds_request_pebs(), you assume that if task == NULL, then this
>is a per-cpu session.
>If you could decouple that by passing a type parameter for instance
>and if you were to add
>a ds_attach_pebs() then I think I could use the interface.

Let's assume that there is a BTS user who is also following fork.
That user is also preparing a set of pre-allocated buffers and he would
want to call ds_attach_bts() to attach his buffers.

ds_attach_*() would need to merge the configurations of the BTS and the
PEBS user. We thus cannot pre-configure the DS configurations as such.

Let's further assume that there is a second PEBS user who wants to trace
every task and thus calls ds_request_pebs() somewhere in do_fork().

ds_attach_*() would need to make sure that PEBS/BTS is not already in
use.

In the end, I'm afraid we're reimplementing ds_request_*().


You may still pre-allocate PEBS buffers. When following fork, you would
call ds_request_pebs(), provide the pre-allocated buffer, and handle
possible errors.

regards,
markus.

>
>
>On Wed, Nov 12, 2008 at 11:59 AM, Metzger, Markus T
><[email protected]> wrote:
>>>-----Original Message-----
>>>From: stephane eranian [mailto:[email protected]]
>>>Sent: Mittwoch, 12. November 2008 11:10
>>>To: Markus Metzger
>>
>>>>> For perfmon, PEBS can be used for both per-thread and
>per-cpu. With
>>>>> perfmon, the allocation/initialization of the buffer is
>>>separated from its
>>>>> activation. In other words, allocation/initialization may
>>>be done before
>>>>> you
>>>>> actually have to write the DS_AREA MSR.
>>>Allocation/initialization may not
>>>>> necessary be done on the cpu you want to measure in per-cpu
>>>mode. The
>>>>> logic of DS is different, ds_request_pebs() allocates and
>>>immediately
>>>>> writes
>>>>> DS_AREA. I cannot use that.
>>>>
>>>> DS_AREA contains a pointer to the actual configuration struct.
>>>> The idea is that once I allocated the configuration struct,
>>>I can write
>>>> the pointer to it into DS_AREA. It won't be used unless I
>>>turn on a feature
>>>> that uses DS.
>>>> So, the flow is:
>>>> 1. allocate configuration struct
>>>> 2. write DS_AREA
>>>
>>>What if you are allocating the buffer for another task. For instance,
>>>a tool is attaching
>>>to a running process and wants to use PEBS. The tool allocates and
>>>initializes the
>>>PEBS buffer and then attaches it to the task to monitor (which is
>>>stopped). When the
>>>task is scheduled again, it picks up the PEBS buffer, and
>>>DS_AREA is written to.
>>
>> That tool should request DS for the task it intends to attach its
>> PEBS buffer to. Who knows, maybe PEBS is already in use for that task
>> by someone else. Or maybe there is a system wide BTS or PEBS session
>> and the per-task request would need to be rejected (until we can
>> support this scenario).
>>
>>
>>
>>>There is another reason why the DS_AREA is exposed. This is
>>>the only way for
>>>tools to see the current position in the PEBS buffer. They may
>>>want to poll on
>>>that position index. The PEBS buffer is not necessarily full. What if
>>>the session
>>>terminates with a partial buffer. There must be a way for the
>>>tool to figure out
>>>where the last sample is. By exposing DS read-only the index
>>>is always available
>>>and always guaranteed current. Without this, the kernel would have to
>>>extract the
>>>index (pebs_get_index) and store it somewhere so the user can
>>>see it. This copy
>>>could be tirggered by a PEBS buffer overflow or a stop of
>>>monitoring. Sampling
>>>buffers formats currently do not have a stop callback but this can be
>>>added easily.
>>
>> With the current interface, you would need to call
>ds_get_pebs_index().
>>
>> The rfc patch I sent out some weeks ago in the scope of multiplexing
>> uses the approach you described - at least in-kernel.
>> Upon ds_request_pebs(), you would get a const struct
>pebs_tracer* that
>> contains a const view of the DS configuration.
>>
>>
>> In any case, I think the DS configuration needs to be
>allocated by ds.c
>> since we have two independent users that need to share a single
>> configuration.
>>
>> Would it help if we changed the DS interface as proposed in that rfc
>> patch (without all the multiplexing stuff)?
>>
>>
>> regards,
>> markus.
>> ---------------------------------------------------------------------
>> Intel GmbH
>> Dornacher Strasse 1
>> 85622 Feldkirchen/Muenchen Germany
>> Sitz der Gesellschaft: Feldkirchen bei Muenchen
>> Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
>> Registergericht: Muenchen HRB 47456 Ust.-IdNr.
>> VAT Registration No.: DE129385895
>> Citibank Frankfurt (BLZ 502 109 00) 600119052
>>
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.
>>
>>
>
---------------------------------------------------------------------
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

2008-11-14 21:10:46

by Stephane Eranian

[permalink] [raw]
Subject: Re: debugctl msr

Markus,


I am trying another approach for connecting perfmon to ds.c. I have
added new callbacks
to perfmon to allow me to invoke ds_request_pebs, once I know the
target thread or CPU.
That seems to work, however I ran into a spinlock problem is ds_request():

static int ds_request(struct task_struct *task, void *base, size_t size,
ds_ovfl_callback_t ovfl, enum ds_qualifier qual)
{
struct ds_context *context;
unsigned long buffer, adj;
const unsigned long alignment = (1 << 3);
int error = 0;

if (!ds_cfg.sizeof_ds)
return -EOPNOTSUPP;

/* we require some space to do alignment adjustments below */
if (size < (alignment + ds_cfg.sizeof_rec[qual]))
return -EINVAL;

/* buffer overflow notification is not yet implemented */
if (ovfl)
return -EOPNOTSUPP;


spin_lock(&ds_lock);

>> if (!check_tracer(task))
>> return -EPERM;

You need to spin_unlock() before returning here!

2008-11-15 10:01:48

by Markus Metzger

[permalink] [raw]
Subject: Re: debugctl msr

On Fri, 2008-11-14 at 22:10 +0100, stephane eranian wrote:
> spin_lock(&ds_lock);
>
> >> if (!check_tracer(task))
> >> return -EPERM;
>
> You need to spin_unlock() before returning here!

Thanks,

I sent out a patch to fix that problem.

regards,
markus.

2008-11-18 22:01:15

by Stephane Eranian

[permalink] [raw]
Subject: Re: debugctl msr

Markus,

I think your definition for ds_cfg_64 is wrong. On Core, the PEBS
record ALWAYS includes r8-r15 even on
32 bits (zero filled). Also the DS_AREA has 9 fields, not 8.
Consequently, I think the structure should be defined
as follows:

static const struct ds_configuration ds_cfg_64 = {
.sizeof_ds = 8 * 9,
.sizeof_field = 8,
.sizeof_rec[ds_bts] = 8 * 3,
.sizeof_rec[ds_pebs] = 8 * 18
};

Do you agree?

Thanks.

2008-11-19 12:15:06

by Metzger, Markus T

[permalink] [raw]
Subject: RE: debugctl msr

>-----Original Message-----
>From: stephane eranian [mailto:[email protected]]
>Sent: Dienstag, 18. November 2008 23:01
>To: Markus Metzger
>Cc: Metzger, Markus T; Ingo Molnar; Andi Kleen; Andrew Morton;
>[email protected]
>Subject: Re: debugctl msr
>

>I think your definition for ds_cfg_64 is wrong. On Core, the PEBS
>record ALWAYS includes r8-r15 even on
>32 bits (zero filled). Also the DS_AREA has 9 fields, not 8.
>Consequently, I think the structure should be defined
>as follows:
>
>static const struct ds_configuration ds_cfg_64 = {
> .sizeof_ds = 8 * 9,

You are right that there are 9 fields.
However, the 9th field is double the size of the other fields and the entire structure is padded.
The original code should read 8 * 12 to reflect that.

> .sizeof_field = 8,
> .sizeof_rec[ds_bts] = 8 * 3,
> .sizeof_rec[ds_pebs] = 8 * 18

You're right about the PEBS size.

I will send a patch some time this week.

thanks,
markus.
---------------------------------------------------------------------
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

2008-11-19 12:59:41

by Stephane Eranian

[permalink] [raw]
Subject: Re: debugctl msr

Markus,


You are right about the reserved field, it was missing from my code
but that was harmless.

I had to hack ds.c some more to make forward progress with PEBS. First
of all my PEBS code is
in a kernel module, so all PEBS functions have to be exported.
Furhtermore, I need a
ds_get_pebs_thres() and ds_set_pebs_thres() calls.

But the one key problem is ds_validate_access(). I had to disable this
function. The problem is that
with perfmon there can be several threads who need access to the PEBS
fields. Take the simple
case of a tool attaching to a running process to attach the monitoring
harness. The tool's thread
is called ds_request_pebs(), so it is marked as the owner. But on PEBS
buffer full overflow, the
monitored thread is the current thread and it needs to know the
position in the buffer, therefore
it calls ds_get_pebs_index(). With perfmon a session is identified by
a file descriptor, any thread
with access to the file descriptor can access the session's data and
thus may need PEBS access.
For instance, if the monitoring tool is multi-threaded, then any
thread can access the PEBS data.


On Wed, Nov 19, 2008 at 1:14 PM, Metzger, Markus T
<[email protected]> wrote:
>>-----Original Message-----
>>From: stephane eranian [mailto:[email protected]]
>>Sent: Dienstag, 18. November 2008 23:01
>>To: Markus Metzger
>>Cc: Metzger, Markus T; Ingo Molnar; Andi Kleen; Andrew Morton;
>>[email protected]
>>Subject: Re: debugctl msr
>>
>
>>I think your definition for ds_cfg_64 is wrong. On Core, the PEBS
>>record ALWAYS includes r8-r15 even on
>>32 bits (zero filled). Also the DS_AREA has 9 fields, not 8.
>>Consequently, I think the structure should be defined
>>as follows:
>>
>>static const struct ds_configuration ds_cfg_64 = {
>> .sizeof_ds = 8 * 9,
>
> You are right that there are 9 fields.
> However, the 9th field is double the size of the other fields and the entire structure is padded.
> The original code should read 8 * 12 to reflect that.
>
>> .sizeof_field = 8,
>> .sizeof_rec[ds_bts] = 8 * 3,
>> .sizeof_rec[ds_pebs] = 8 * 18
>
> You're right about the PEBS size.
>
> I will send a patch some time this week.
>
> thanks,
> markus.
> ---------------------------------------------------------------------
> Intel GmbH
> Dornacher Strasse 1
> 85622 Feldkirchen/Muenchen Germany
> Sitz der Gesellschaft: Feldkirchen bei Muenchen
> Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
> Registergericht: Muenchen HRB 47456 Ust.-IdNr.
> VAT Registration No.: DE129385895
> Citibank Frankfurt (BLZ 502 109 00) 600119052
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
>

2008-11-19 15:47:54

by Metzger, Markus T

[permalink] [raw]
Subject: RE: debugctl msr

>-----Original Message-----
>From: stephane eranian [mailto:[email protected]]
>Sent: Mittwoch, 19. November 2008 14:00
>To: Metzger, Markus T
>Cc: Markus Metzger; Ingo Molnar; Andi Kleen; Andrew Morton;
>[email protected]
>Subject: Re: debugctl msr
>

>I had to hack ds.c some more to make forward progress with PEBS. First
>of all my PEBS code is
>in a kernel module, so all PEBS functions have to be exported.
>Furhtermore, I need a
>ds_get_pebs_thres() and ds_set_pebs_thres() calls.

I'm having a deja vu. We had this discussion before. You reported those
issues and I fixed them. Same for the PEBS size; and Andi Kleen asked
to exclude ds.c from the build instead of guarding the .c file and to
use the mm semaphore in ds_allocate_buffer().

That thread ended in the multiplexing discussion and my fixes never got in.
I'll send a patch covering those problems next week.


Regarding access to the interrupt threshold, we never completed
our discussion.
If we look towards multiplexing, ds.c has to handle interrupts and
copy the trace to the various users - at least for BTS.
I will pull some of the BTS handling from ptrace into ds.c - ds.c needs
to be able to disable BTS recording for overflow handling and we would
not want this knowledge in two different places.

I would add those functions now to proceed with the perfmon2 adaptation
and later remove them again and put overflow handling into ds.c

I'll send a separate patch for those.


>But the one key problem is ds_validate_access(). I had to disable this
>function.

I agree that this is too ptrace centric.

It works fine for the ptrace bts extenstion since ptrace has similar
restrictions, already.

A better choice would be to return an opaque handle.


regards,
markus.
---------------------------------------------------------------------
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

2008-11-19 17:13:25

by Stephane Eranian

[permalink] [raw]
Subject: Re: debugctl msr

Markus,

Speaking of locking, I also ran into another issue with ds_lock.
Perfmon sessions each have a spinlock for access serialization, but to
prevent from PMU and timers interrupts, interrupts are masked. Thus,
when perfmon
calls ds.c, interrupts are masked. That means that we lock/unlock ds_lock
with interrupts disabled. The lock checker triggered when I ran a simple perfmon
session and warned of possible lock inversion. Suppose you are coming from the
ptrace code into ds. You grab ds_lock, but the same process is also running
a perfmon session with PEBS and a counter overflows, you get into
the PMU interrupt handler which may call into ds.c and try to grab the ds_lock.
For that reason, I think you should use a
spin_lock_irqsave/spin_unlock_irqrestore
pairs to protect your ds context.

I found another issue with ds_release(). You need to skip freeing the
buffer when it
is NULL, i.e., was already allocated by caller of ds_request_pebs().

I have attached a diff for the ds.c interface. It disables
ds_validate_access(), export
the PEBS functions to modules, fixes ds_release().

As for handling the interrupt is ds.c, not clear how this could work
with current perfmon.
I don't know how this work on the BTS side. On the PMU side, that is not because
I am using PEBS, that I don't also use other counters as well. Longer
term, I think, there
needs to be a lower-level PMU interrupt service where you would
register a callback
on PMU interrupts. It would be used by NMI watchdog, perfmon,
Oprofile, ds.c. Callbacks
would be chained, just like what happens with the NMI interrupt
handler. Such service would
go hand in hand with a centralized PMU MSR allocator which would
provide safe sharing of
the PMU registers between different subsystems (there is a primitive
version of this already
today in perfctr-watchdog.c).



On Wed, Nov 19, 2008 at 4:47 PM, Metzger, Markus T
<[email protected]> wrote:
>>-----Original Message-----
>>From: stephane eranian [mailto:[email protected]]
>>Sent: Mittwoch, 19. November 2008 14:00
>>To: Metzger, Markus T
>>Cc: Markus Metzger; Ingo Molnar; Andi Kleen; Andrew Morton;
>>[email protected]
>>Subject: Re: debugctl msr
>>
>
>>I had to hack ds.c some more to make forward progress with PEBS. First
>>of all my PEBS code is
>>in a kernel module, so all PEBS functions have to be exported.
>>Furhtermore, I need a
>>ds_get_pebs_thres() and ds_set_pebs_thres() calls.
>
> I'm having a deja vu. We had this discussion before. You reported those
> issues and I fixed them. Same for the PEBS size; and Andi Kleen asked
> to exclude ds.c from the build instead of guarding the .c file and to
> use the mm semaphore in ds_allocate_buffer().
>
> That thread ended in the multiplexing discussion and my fixes never got in.
> I'll send a patch covering those problems next week.
>
>
> Regarding access to the interrupt threshold, we never completed
> our discussion.
> If we look towards multiplexing, ds.c has to handle interrupts and
> copy the trace to the various users - at least for BTS.
> I will pull some of the BTS handling from ptrace into ds.c - ds.c needs
> to be able to disable BTS recording for overflow handling and we would
> not want this knowledge in two different places.
>
> I would add those functions now to proceed with the perfmon2 adaptation
> and later remove them again and put overflow handling into ds.c
>
> I'll send a separate patch for those.
>
>
>>But the one key problem is ds_validate_access(). I had to disable this
>>function.
>
> I agree that this is too ptrace centric.
>
> It works fine for the ptrace bts extenstion since ptrace has similar
> restrictions, already.
>
> A better choice would be to return an opaque handle.
>
>
> regards,
> markus.
> ---------------------------------------------------------------------
> Intel GmbH
> Dornacher Strasse 1
> 85622 Feldkirchen/Muenchen Germany
> Sitz der Gesellschaft: Feldkirchen bei Muenchen
> Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
> Registergericht: Muenchen HRB 47456 Ust.-IdNr.
> VAT Registration No.: DE129385895
> Citibank Frankfurt (BLZ 502 109 00) 600119052
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
>


Attachments:
(No filename) (4.33 kB)
ds.diff (3.97 kB)
Download all attachments

2008-11-19 18:27:58

by Markus Metzger

[permalink] [raw]
Subject: Re: debugctl msr

On Wed, 2008-11-19 at 18:13 +0100, stephane eranian wrote:

> Speaking of locking, I also ran into another issue with ds_lock.
> Perfmon sessions each have a spinlock for access serialization, but to
> prevent from PMU and timers interrupts, interrupts are masked. Thus,
> when perfmon
> calls ds.c, interrupts are masked. That means that we lock/unlock ds_lock
> with interrupts disabled. The lock checker triggered when I ran a simple perfmon
> session and warned of possible lock inversion. Suppose you are coming from the
> ptrace code into ds. You grab ds_lock, but the same process is also running
> a perfmon session with PEBS and a counter overflows, you get into
> the PMU interrupt handler which may call into ds.c and try to grab the ds_lock.
> For that reason, I think you should use a
> spin_lock_irqsave/spin_unlock_irqrestore
> pairs to protect your ds context.

OK. So far, there was no user that called ds_*() with interrupts
disabled.


> I found another issue with ds_release(). You need to skip freeing the
> buffer when it
> is NULL, i.e., was already allocated by caller of ds_request_pebs().

ds_release() is not robust with respect to double release, if that's
what you mean. Is that desirable?

For a single ds_release() call matching a corresponding successful
ds_request() call, the buffer is freed if and only if it had been
allocated by ds.c.

Kfree() itself handles NULL pointers and scripts/checkpatch.pl warns on
a check for NULL around a kfree() call.


> I have attached a diff for the ds.c interface. It disables
> ds_validate_access(), export
> the PEBS functions to modules, fixes ds_release().
>


> As for handling the interrupt is ds.c, not clear how this could work
> with current perfmon.
> I don't know how this work on the BTS side. On the PMU side, that is not because
> I am using PEBS, that I don't also use other counters as well. Longer
> term, I think, there
> needs to be a lower-level PMU interrupt service where you would
> register a callback
> on PMU interrupts. It would be used by NMI watchdog, perfmon,
> Oprofile, ds.c.

That's even preferable to having the interrupt code itself in ds.c

The point I was trying to make is that buffer overflows should not be
handled on higher levels (i.e. users of ds.c). That's why I am so
reluctant to expose the interrupt threshold in the ds.c interface.


regards,
markus.

2008-11-19 19:21:07

by Stephane Eranian

[permalink] [raw]
Subject: Re: debugctl msr

Markus,


On Wed, Nov 19, 2008 at 7:27 PM, Markus Metzger
<[email protected]> wrote:
> On Wed, 2008-11-19 at 18:13 +0100, stephane eranian wrote:
>
>> I found another issue with ds_release(). You need to skip freeing the
>> buffer when it
>> is NULL, i.e., was already allocated by caller of ds_request_pebs().
>
> ds_release() is not robust with respect to double release, if that's
> what you mean. Is that desirable?
>
I don't think so.

> For a single ds_release() call matching a corresponding successful
> ds_request() call, the buffer is freed if and only if it had been
> allocated by ds.c.
>
> Kfree() itself handles NULL pointers and scripts/checkpatch.pl warns on
> a check for NULL around a kfree() call.
>
Yes, I have narrowed this down to the following lines:
current->mm->total_vm -= context->pages[qual];
current->mm->locked_vm -= context->pages[qual];

I think this is again related to the problem of which thread call
ds_release(). In my test
case, this is the monitored thread as it exits. By the time it gets
there current->mm is NULL.

>> As for handling the interrupt is ds.c, not clear how this could work
>> with current perfmon.
>> I don't know how this work on the BTS side. On the PMU side, that is not because
>> I am using PEBS, that I don't also use other counters as well. Longer
>> term, I think, there
>> needs to be a lower-level PMU interrupt service where you would
>> register a callback
>> on PMU interrupts. It would be used by NMI watchdog, perfmon,
>> Oprofile, ds.c.
>
> That's even preferable to having the interrupt code itself in ds.c
>
Yes!

> The point I was trying to make is that buffer overflows should not be
> handled on higher levels (i.e. users of ds.c). That's why I am so
> reluctant to expose the interrupt threshold in the ds.c interface.
>
But the threshold is a characteristic of the buffer, not the interrupt handler.
Depending on the tool, it may be interesting to set the threshold earlier than
at the end of the buffer.

2008-11-19 20:59:46

by Stephane Eranian

[permalink] [raw]
Subject: Re: debugctl msr

Markus,


On Wed, Nov 19, 2008 at 8:20 PM, stephane eranian
<[email protected]> wrote:
>
>> For a single ds_release() call matching a corresponding successful
>> ds_request() call, the buffer is freed if and only if it had been
>> allocated by ds.c.
>>
>> Kfree() itself handles NULL pointers and scripts/checkpatch.pl warns on
>> a check for NULL around a kfree() call.
>>
> Yes, I have narrowed this down to the following lines:
> current->mm->total_vm -= context->pages[qual];
> current->mm->locked_vm -= context->pages[qual];

To follow-up on this, the other issue with this code is that you
should not decrease
those two mm fields if the buffer was not allocated by ds.c. So I
think, the modification
I sent you in my patch is actually valid (just for another reason).

2008-11-19 22:26:33

by Markus Metzger

[permalink] [raw]
Subject: Re: debugctl msr

On Wed, 2008-11-19 at 20:20 +0100, stephane eranian wrote:

> Yes, I have narrowed this down to the following lines:
> current->mm->total_vm -= context->pages[qual];
> current->mm->locked_vm -= context->pages[qual];
>
> I think this is again related to the problem of which thread call
> ds_release(). In my test
> case, this is the monitored thread as it exits. By the time it gets
> there current->mm is NULL.

Yes, this is again the ptrace-ness of the approach. The entire code
assumes that there is one tracer task that controls another traced task.

You're right, though, that I should only do the memory accounting
if the buffer had been allocated by ds.c.
That's a plain bug. Perfmon2 is the first user that uses its own
buffers.


> > The point I was trying to make is that buffer overflows should not be
> > handled on higher levels (i.e. users of ds.c). That's why I am so
> > reluctant to expose the interrupt threshold in the ds.c interface.
> >
> But the threshold is a characteristic of the buffer, not the interrupt handler.
> Depending on the tool, it may be interesting to set the threshold earlier than
> at the end of the buffer.

Good point.

Would you want to change the threshold or would it be OK if this became
another parameter to ds_request()?

regards,
markus.

2008-11-20 21:19:31

by Stephane Eranian

[permalink] [raw]
Subject: Re: debugctl msr

Markus,

I found a couple more bugs with ds.c:

- in ds_put_context(), you need to mark task->thread.ds_ctx = NULL or
this_system_context = NULL when you are
done, otherwise a subsequent session on the same task or CPU will
think there is already a context allocated but
the pointer will be stale.

- in ptrace.c:ptrace_disable(), you systematically invoke ds_release()
without checking if TIF_BTS_TRACE_TS
is set. That causes extraneous calls to ds_release() which messes up
the reference counting if PEBS is in use.


I ran into those issues trying to make perfmon work. So far, I have
gotten per-thread PEBS to work. Per-cpu is still
crashing the machine.


On Wed, Nov 19, 2008 at 11:26 PM, Markus Metzger
<[email protected]> wrote:
> On Wed, 2008-11-19 at 20:20 +0100, stephane eranian wrote:
>
>> Yes, I have narrowed this down to the following lines:
>> current->mm->total_vm -= context->pages[qual];
>> current->mm->locked_vm -= context->pages[qual];
>>
>> I think this is again related to the problem of which thread call
>> ds_release(). In my test
>> case, this is the monitored thread as it exits. By the time it gets
>> there current->mm is NULL.
>
> Yes, this is again the ptrace-ness of the approach. The entire code
> assumes that there is one tracer task that controls another traced task.
>
> You're right, though, that I should only do the memory accounting
> if the buffer had been allocated by ds.c.
> That's a plain bug. Perfmon2 is the first user that uses its own
> buffers.
>
>
>> > The point I was trying to make is that buffer overflows should not be
>> > handled on higher levels (i.e. users of ds.c). That's why I am so
>> > reluctant to expose the interrupt threshold in the ds.c interface.
>> >
>> But the threshold is a characteristic of the buffer, not the interrupt handler.
>> Depending on the tool, it may be interesting to set the threshold earlier than
>> at the end of the buffer.
>
> Good point.
>
> Would you want to change the threshold or would it be OK if this became
> another parameter to ds_request()?
>
> regards,
> markus.
>
>

2008-11-21 08:22:46

by Metzger, Markus T

[permalink] [raw]
Subject: RE: debugctl msr

>-----Original Message-----
>From: stephane eranian [mailto:[email protected]]
>Sent: Donnerstag, 20. November 2008 22:19
>To: Markus Metzger

>- in ds_put_context(), you need to mark task->thread.ds_ctx = NULL or
>this_system_context = NULL when you are
> done, otherwise a subsequent session on the same task or CPU will
>think there is already a context allocated but
> the pointer will be stale.

This should happen in:
*(context->this) = NULL;

Did you see a context without owners?


>- in ptrace.c:ptrace_disable(), you systematically invoke ds_release()
>without checking if TIF_BTS_TRACE_TS
> is set. That causes extraneous calls to ds_release() which messes up
>the reference counting if PEBS is in use.

That TIF only triggers the recording of scheduling timestamps.

That ds_release_bts() call should:
- get the context
- fail the ownership validation in ds_validate()
- put the context

If ds_validate() is disabled, it would put the context twice.


I am currently working on a patch to replace the validation
mechanism by using a handle returned from ds_request().

I also added the interrupt threshold to the parameters of ds_request().
It will be fix for one tracing session, if that's OK with you.


thanks and regards,
markus.
---------------------------------------------------------------------
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

2008-11-21 08:48:06

by Stephane Eranian

[permalink] [raw]
Subject: Re: debugctl msr

Markus,

On Fri, Nov 21, 2008 at 9:22 AM, Metzger, Markus T
<[email protected]> wrote:
>>-----Original Message-----
>>From: stephane eranian [mailto:[email protected]]
>>Sent: Donnerstag, 20. November 2008 22:19
>>To: Markus Metzger
>
>>- in ds_put_context(), you need to mark task->thread.ds_ctx = NULL or
>>this_system_context = NULL when you are
>> done, otherwise a subsequent session on the same task or CPU will
>>think there is already a context allocated but
>> the pointer will be stale.
>
> This should happen in:
> *(context->this) = NULL;
>
Ah, I missed that. I admit this code is hard to follow. Maybe it would easier to
explicitly use the actual fields.
>
>>- in ptrace.c:ptrace_disable(), you systematically invoke ds_release()
>>without checking if TIF_BTS_TRACE_TS
>> is set. That causes extraneous calls to ds_release() which messes up
>>the reference counting if PEBS is in use.
>
> That TIF only triggers the recording of scheduling timestamps.
>
> That ds_release_bts() call should:
> - get the context
> - fail the ownership validation in ds_validate()
> - put the context
>
> If ds_validate() is disabled, it would put the context twice.
>
Yes, that was my case.


>
> I am currently working on a patch to replace the validation
> mechanism by using a handle returned from ds_request().
>
Yes, that's the solution.

> I also added the interrupt threshold to the parameters of ds_request().
> It will be fix for one tracing session, if that's OK with you.
>
Yes.

I found the reason why in system-wide my kernel was crashing.
It was due to the way you are writing DS_AREA_MSR in
ds_allocate_context():

if (!task || (task == current))
wrmsr(MSR_IA32_DS_AREA, (unsigned long)context->ds, 0);


This does not work in 64-bit, you truncate the DS address. You must use wrmsrl()
and let the macro break it down in two halves.

I am still seeing an issue whereby I have a buffer with > 100 entries
but I get an interrupt
or each sample, which is wrong. I am investigating why this is. There
should only be one
interrupt per buffer full. This use to work with my old ways of handling PEBS.

Thanks.

2008-11-21 08:59:18

by Metzger, Markus T

[permalink] [raw]
Subject: RE: debugctl msr

>-----Original Message-----
>From: stephane eranian [mailto:[email protected]]
>Sent: Freitag, 21. November 2008 09:48
>To: Metzger, Markus T

>> If ds_validate() is disabled, it would put the context twice.
>>
>Yes, that was my case.

The code heavily relies on ds_validate().
I would expect a lot of things to break down if ds_validate() were
disabled.


>I found the reason why in system-wide my kernel was crashing.
>It was due to the way you are writing DS_AREA_MSR in
>ds_allocate_context():
>
> if (!task || (task == current))
> wrmsr(MSR_IA32_DS_AREA, (unsigned
>long)context->ds, 0);


Thanks. I have a fix in my sandbox.


I won't be able to send the patches out this week. Especially the
interface change to introduce the handle is quite big. I should have
them early next week.


regards,
markus.
---------------------------------------------------------------------
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

2008-11-21 13:38:55

by Stephane Eranian

[permalink] [raw]
Subject: Re: debugctl msr

Markus,

On Fri, Nov 21, 2008 at 9:22 AM, Metzger, Markus T
<[email protected]> wrote:
> I also added the interrupt threshold to the parameters of ds_request().
> It will be fix for one tracing session, if that's OK with you.
>
I still need a call to query the threshold.

2008-11-21 15:27:36

by Stephane Eranian

[permalink] [raw]
Subject: Re: debugctl msr

Markus,

Ok, I finally got everything working. The interrupt issue was due to a
bug in the perfmon code.

But I found another bug in ds.c.

The absolute maximum MUST be at buffer+size+1 but the threshold can be anywhere
between the start of the buffer and buffer+size. You had it set past
the end of the buffer (at +1),
therefore PEBS was not generating any interrupts.

On Fri, Nov 21, 2008 at 2:38 PM, stephane eranian
<[email protected]> wrote:
> Markus,
>
> On Fri, Nov 21, 2008 at 9:22 AM, Metzger, Markus T
> <[email protected]> wrote:
>> I also added the interrupt threshold to the parameters of ds_request().
>> It will be fix for one tracing session, if that's OK with you.
>>
> I still need a call to query the threshold.
>

2008-11-21 16:11:19

by Metzger, Markus T

[permalink] [raw]
Subject: RE: debugctl msr

>-----Original Message-----
>From: stephane eranian [mailto:[email protected]]
>Sent: Freitag, 21. November 2008 16:27
>To: Metzger, Markus T
>Cc: Markus Metzger; Ingo Molnar; Andi Kleen; Andrew Morton;
>[email protected]
>Subject: Re: debugctl msr

>The absolute maximum MUST be at buffer+size+1 but the
>threshold can be anywhere
>between the start of the buffer and buffer+size. You had it set past
>the end of the buffer (at +1),
>therefore PEBS was not generating any interrupts.

The default is cyclic mode, which requires the interrupt threshold
to be outside of the buffer.

A request for interrupt mode should have got you -EOPNOTSUPP.

Absolute max should be one byte beyond the buffer, which
is base + size.


regards,
markus.
---------------------------------------------------------------------
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

2008-11-21 16:33:22

by Stephane Eranian

[permalink] [raw]
Subject: Re: debugctl msr

On Fri, Nov 21, 2008 at 5:10 PM, Metzger, Markus T
<[email protected]> wrote:
>>-----Original Message-----
>>From: stephane eranian [mailto:[email protected]]
>>Sent: Freitag, 21. November 2008 16:27
>>To: Metzger, Markus T
>>Cc: Markus Metzger; Ingo Molnar; Andi Kleen; Andrew Morton;
>>[email protected]
>>Subject: Re: debugctl msr
>
>>The absolute maximum MUST be at buffer+size+1 but the
>>threshold can be anywhere
>>between the start of the buffer and buffer+size. You had it set past
>>the end of the buffer (at +1),
>>therefore PEBS was not generating any interrupts.
>
> The default is cyclic mode, which requires the interrupt threshold
> to be outside of the buffer.
>
That's not what I need.

> A request for interrupt mode should have got you -EOPNOTSUPP.

I did not try that.

>
> Absolute max should be one byte beyond the buffer, which
> is base + size.
>
My bad ;-<

2008-11-21 22:47:49

by Stephane Eranian

[permalink] [raw]
Subject: Re: debugctl msr

Markus,

Perfmon does allocate its own buffer for PEBS.
That buffer contains: a perfmon header + PEBS buffer

Keep in mind that the PEBS buffer is actually remapped to user space to avoid
copying records around.

With the current code base, ds.c can accept a pre-allocated buffer.
However, it does
some adjustments for alignment and size (multiple of PEBS record
size). That means
that the start and end of the buffer could be different from the area
passed to ds_request_pebs().

To parse the buffer, a monitoring tools needs a start position and the
number of samples. The start
position can be expressed as an offset from the beginning of the
perfmon buffer. The ds.c interface
currently only returns indexes for current position, threshold, and
end. Those indexes are good enough
to derived the number of samples. But I would need one call to return
the byte offset from the original
buffer, or the actual address (which perfmon could then convert back
to an offset from its buffer).

Thanks.

On Fri, Nov 21, 2008 at 5:33 PM, stephane eranian
<[email protected]> wrote:
> On Fri, Nov 21, 2008 at 5:10 PM, Metzger, Markus T
> <[email protected]> wrote:
>>>-----Original Message-----
>>>From: stephane eranian [mailto:[email protected]]
>>>Sent: Freitag, 21. November 2008 16:27
>>>To: Metzger, Markus T
>>>Cc: Markus Metzger; Ingo Molnar; Andi Kleen; Andrew Morton;
>>>[email protected]
>>>Subject: Re: debugctl msr
>>
>>>The absolute maximum MUST be at buffer+size+1 but the
>>>threshold can be anywhere
>>>between the start of the buffer and buffer+size. You had it set past
>>>the end of the buffer (at +1),
>>>therefore PEBS was not generating any interrupts.
>>
>> The default is cyclic mode, which requires the interrupt threshold
>> to be outside of the buffer.
>>
> That's not what I need.
>
>> A request for interrupt mode should have got you -EOPNOTSUPP.
>
> I did not try that.
>
>>
>> Absolute max should be one byte beyond the buffer, which
>> is base + size.
>>
> My bad ;-<
>

2008-11-22 09:51:56

by Markus Metzger

[permalink] [raw]
Subject: Re: debugctl msr

On Fri, 2008-11-21 at 23:47 +0100, stephane eranian wrote:


> With the current code base, ds.c can accept a pre-allocated buffer.
> However, it does
> some adjustments for alignment and size (multiple of PEBS record
> size). That means
> that the start and end of the buffer could be different from the area
> passed to ds_request_pebs().
>
> To parse the buffer, a monitoring tools needs a start position and the
> number of samples. The start
> position can be expressed as an offset from the beginning of the
> perfmon buffer. The ds.c interface
> currently only returns indexes for current position, threshold, and
> end. Those indexes are good enough
> to derived the number of samples. But I would need one call to return
> the byte offset from the original
> buffer, or the actual address (which perfmon could then convert back
> to an offset from its buffer).

ds_access() returns the pointer into the raw buffer at a given index.
For index 0, this is the beginning of the buffer.


I plan to replace the various access functions with a single one that
returns a const pointer to the configuration.

Something like:

struct ds_config {
size_t number_of_records;
size_t size_of_a_single_record;
void *base;
void *max;
void *index;
void *threshold;
};

const struct ds_config *ds_config_pebs(struct pebs_tracer *tracer);

For BTS, I would add functions to work on this struct and translate raw
entries into an architecture-independent format.
I'm not sure whether this would make sense for PEBS, as well.

Would you be OK with such a change?


thanks and regards,
markus.

2008-11-23 22:31:51

by Stephane Eranian

[permalink] [raw]
Subject: Re: debugctl msr

Markus,

On Sat, Nov 22, 2008 at 10:51 AM, Markus Metzger
<[email protected]> wrote:
>
> ds_access() returns the pointer into the raw buffer at a given index.
> For index 0, this is the beginning of the buffer.

Ok, I missed that call.
>
>
> I plan to replace the various access functions with a single one that
> returns a const pointer to the configuration.
>
> Something like:
>
> struct ds_config {
> size_t number_of_records;
> size_t size_of_a_single_record;
> void *base;
> void *max;
> void *index;
> void *threshold;
> };
>
> const struct ds_config *ds_config_pebs(struct pebs_tracer *tracer);
> Would you be OK with such a change?
>
Looks good.
There are two calls on the critical path for perfmon : get_index and
get_threshold.
So we want to make sure it is as fast as possible.