On Fri, Oct 30, 2015 at 07:26:35PM -0400, Waiman Long wrote:
> This patch enables the accumulation of kicking and waiting related
> PV qspinlock statistics when the new QUEUED_LOCK_STAT configuration
> option is selected. It also enables the collection of data that
> allows us to calculate the kicking and wakeup latencies, which
> depend heavily on the CPUs being used.
>
> The statistical counters are per-cpu variables to minimize the
> performance overhead of updating them. These counters are exported
> via the sysfs filesystem under the /sys/kernel/qlockstat directory.
> When the corresponding sysfs files are read, the required summation
> and computation are performed.
Why did you switch to sysfs? You can create custom debugfs files too.
> @@ -259,7 +275,7 @@ static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
> if (READ_ONCE(pn->state) == vcpu_hashed)
> lp = (struct qspinlock **)1;
>
> - for (;;) {
> + for (;; waitcnt++) {
> for (loop = SPIN_THRESHOLD; loop; loop--) {
> if (!READ_ONCE(l->locked))
> return;
Did you check that this goes away when !STAT?
> +/*
> + * Return the average kick latency (ns) = pv_latency_kick/pv_kick_unlock
> + */
> +static ssize_t
> +kick_latency_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
> +{
> + int cpu;
> + u64 latencies = 0, kicks = 0;
> +
> + for_each_online_cpu(cpu) {
I think you need for_each_possible_cpu(), otherwise the results will
change with hotplug operations.
> + kicks += per_cpu(qstats[qstat_pv_kick_unlock], cpu);
> + latencies += per_cpu(qstats[qstat_pv_latency_kick], cpu);
> + }
> +
> + /* Rounded to the nearest ns */
> + return sprintf(buf, "%llu\n", kicks ? (latencies + kicks/2)/kicks : 0);
> +}
On 11/02/2015 11:40 AM, Peter Zijlstra wrote:
> On Fri, Oct 30, 2015 at 07:26:35PM -0400, Waiman Long wrote:
>> This patch enables the accumulation of kicking and waiting related
>> PV qspinlock statistics when the new QUEUED_LOCK_STAT configuration
>> option is selected. It also enables the collection of data that
>> allows us to calculate the kicking and wakeup latencies, which
>> depend heavily on the CPUs being used.
>>
>> The statistical counters are per-cpu variables to minimize the
>> performance overhead of updating them. These counters are exported
>> via the sysfs filesystem under the /sys/kernel/qlockstat directory.
>> When the corresponding sysfs files are read, the required summation
>> and computation are performed.
> Why did you switch to sysfs? You can create custom debugfs files too.
I was not aware of that capability. So you mean using
debugfs_create_file() with custom file_operations, right? That doesn't
seem any easier than using sysfs, but I can switch if you think
debugfs is the better choice.
>
>> @@ -259,7 +275,7 @@ static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
>> if (READ_ONCE(pn->state) == vcpu_hashed)
>> lp = (struct qspinlock **)1;
>>
>> - for (;;) {
>> + for (;; waitcnt++) {
>> for (loop = SPIN_THRESHOLD; loop; loop--) {
>> if (!READ_ONCE(l->locked))
>> return;
> Did you check that this goes away when !STAT?
Yes, the increment code goes away when !STAT. I have added a comment
explaining that.
>
>> +/*
>> + * Return the average kick latency (ns) = pv_latency_kick/pv_kick_unlock
>> + */
>> +static ssize_t
>> +kick_latency_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
>> +{
>> + int cpu;
>> + u64 latencies = 0, kicks = 0;
>> +
>> + for_each_online_cpu(cpu) {
> I think you need for_each_possible_cpu(), otherwise the results will
> change with hotplug operations.
Right, I will make the change.
Cheers,
Longman
On Thu, Nov 05, 2015 at 11:29:29AM -0500, Waiman Long wrote:
> On 11/02/2015 11:40 AM, Peter Zijlstra wrote:
> >On Fri, Oct 30, 2015 at 07:26:35PM -0400, Waiman Long wrote:
> >>This patch enables the accumulation of kicking and waiting related
> >>PV qspinlock statistics when the new QUEUED_LOCK_STAT configuration
> >>option is selected. It also enables the collection of data that
> >>allows us to calculate the kicking and wakeup latencies, which
> >>depend heavily on the CPUs being used.
> >>
> >>The statistical counters are per-cpu variables to minimize the
> >>performance overhead of updating them. These counters are exported
> >>via the sysfs filesystem under the /sys/kernel/qlockstat directory.
> >>When the corresponding sysfs files are read, the required summation
> >>and computation are performed.
> >Why did you switch to sysfs? You can create custom debugfs files too.
>
> I was not aware of that capability. So you mean using
> debugfs_create_file() with custom file_operations, right?
Yep.
> That doesn't seem any easier than using sysfs, but I can switch if
> you think debugfs is the better choice.
Mostly I just wanted to point out that it was possible; you need not
change to sysfs because debugfs lacks the capability.
But now that you ask, I think debugfs might be the better place; such
statistics (and the proposed CONFIG symbol) are purely for debug
purposes, right?
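For reference, the custom-file route Peter mentions would look roughly like the sketch below. This is kernel-only code written against the debugfs/seq_file APIs of that era; the qstat array and index names are taken from the quoted patch, while the open/fops scaffolding and the "qlockstat" directory name are illustrative and untested:

```c
/* Hypothetical debugfs version of kick_latency_show() using a custom
 * file_operations via single_open()/seq_file. */
static int kick_latency_show(struct seq_file *m, void *v)
{
	int cpu;
	u64 latencies = 0, kicks = 0;

	/* possible, not online: results stay stable across hotplug */
	for_each_possible_cpu(cpu) {
		kicks += per_cpu(qstats[qstat_pv_kick_unlock], cpu);
		latencies += per_cpu(qstats[qstat_pv_latency_kick], cpu);
	}
	/* Rounded to the nearest ns */
	seq_printf(m, "%llu\n", kicks ? (latencies + kicks / 2) / kicks : 0);
	return 0;
}

static int kick_latency_open(struct inode *inode, struct file *file)
{
	return single_open(file, kick_latency_show, NULL);
}

static const struct file_operations kick_latency_fops = {
	.owner		= THIS_MODULE,
	.open		= kick_latency_open,
	.read		= seq_read,
	.llseek		= seq_lseek,
	.release	= single_release,
};

static int __init qstat_debugfs_init(void)
{
	struct dentry *dir = debugfs_create_dir("qlockstat", NULL);

	if (!dir)
		return -ENOMEM;
	debugfs_create_file("kick_latency", 0400, dir, NULL,
			    &kick_latency_fops);
	return 0;
}
fs_initcall(qstat_debugfs_init);
```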
On 11/05/2015 11:43 AM, Peter Zijlstra wrote:
> On Thu, Nov 05, 2015 at 11:29:29AM -0500, Waiman Long wrote:
>> On 11/02/2015 11:40 AM, Peter Zijlstra wrote:
>>> On Fri, Oct 30, 2015 at 07:26:35PM -0400, Waiman Long wrote:
>>>> This patch enables the accumulation of kicking and waiting related
>>>> PV qspinlock statistics when the new QUEUED_LOCK_STAT configuration
>>>> option is selected. It also enables the collection of data that
>>>> allows us to calculate the kicking and wakeup latencies, which
>>>> depend heavily on the CPUs being used.
>>>>
>>>> The statistical counters are per-cpu variables to minimize the
>>>> performance overhead of updating them. These counters are exported
>>>> via the sysfs filesystem under the /sys/kernel/qlockstat directory.
>>>> When the corresponding sysfs files are read, the required summation
>>>> and computation are performed.
>>> Why did you switch to sysfs? You can create custom debugfs files too.
>> I was not aware of that capability. So you mean using
>> debugfs_create_file() with custom file_operations, right?
> Yep.
>
>> That doesn't seem any easier than using sysfs, but I can switch if
>> you think debugfs is the better choice.
> Mostly I just wanted to point out that it was possible; you need not
> change to sysfs because debugfs lacks the capability.
>
> But now that you ask, I think debugfs might be the better place;
> such statistics (and the proposed CONFIG symbol) are purely for
> debug purposes, right?
Davidlohr had asked me to use per-cpu counters to reduce the
performance overhead so that they can be used in production systems.
That is another reason why I moved to sysfs.
BTW, do you have comments on the other patches in the series? I would
like to collect all the comments before I update the series.
Cheers,
Longman
On Thu, Nov 05, 2015 at 11:59:21AM -0500, Waiman Long wrote:
> >Mostly I just wanted to point out that it was possible; you need not
> >change to sysfs because debugfs lacks the capability.
> >
> >But now that you ask, I think debugfs might be the better place;
> >such statistics (and the proposed CONFIG symbol) are purely for
> >debug purposes, right?
>
> Davidlohr had asked me to use per-cpu counters to reduce the
> performance overhead so that they can be used in production systems.
> That is another reason why I moved to sysfs.
Yes, the per-cpu thing certainly makes sense. But as said, that does not
require you to move to sysfs.
> BTW, do you have comments on the other patches in the series? I
> would like to collect all the comments before I update the series.
I still have to look at the last two patches; I've sadly not had time
for that yet.
On 11/05/2015 12:09 PM, Peter Zijlstra wrote:
> On Thu, Nov 05, 2015 at 11:59:21AM -0500, Waiman Long wrote:
>>> Mostly I just wanted to point out that it was possible; you need not
>>> change to sysfs because debugfs lacks the capability.
>>>
>>> But now that you ask, I think debugfs might be the better place;
>>> such statistics (and the proposed CONFIG symbol) are purely for
>>> debug purposes, right?
>> Davidlohr had asked me to use per-cpu counters to reduce the
>> performance overhead so that they can be used in production systems.
>> That is another reason why I moved to sysfs.
> Yes, the per-cpu thing certainly makes sense. But as said, that does not
> require you to move to sysfs.
>
>> BTW, do you have comments on the other patches in the series? I
>> would like to collect all the comments before I update the series.
> I still have to look at the last two patches; I've sadly not had
> time for that yet.
That is what I thought. Just let me know when you are done with the
review, and I will update the patches and send out a new series.
Cheers,
Longman