2019-04-12 00:58:22

by Linhaifeng

[permalink] [raw]
Subject: perf tools:Is there any tools to found out the max latency by irq or cpu idle

Hi,
I have a single thread application like this:

While (1) {
    start = rdtsc();
    sqrt (1024);     
  end = rdtsc();
  cycles = end – start;
  printf("cycles: %d-%02d-%02d %02d:%02d:%02d: %lu\n",
      1900+timeinfo->tm_year, 1+timeinfo->tm_mon, timeinfo->tm_mday, timeinfo->tm_hour, timeinfo->tm_min, timeinfo->tm_sec,
      cycles);
}
It print the cycles of sqrt every second and run with taskset –c 1 ./sqrt.
The result of test is:

sqrt 2019-04-10 23:53:50: 43968
sqrt 2019-04-10 23:53:51: 44060
sqrt 2019-04-10 23:53:52: 49012
sqrt 2019-04-10 23:53:53: 38172
sqrt 2019-04-10 23:53:54: 131081408
sqrt 2019-04-10 23:53:55: 43600
sqrt 2019-04-10 23:53:56: 46704
sqrt 2019-04-10 23:53:57: 46880
sqrt 2019-04-10 23:53:58: 44332
……
sqrt 2019-04-10 02:17:15: 131081408
……
sqrt 2019-04-10 04:40:35: 131081408
……

Every 2hour23min there would be a large cycles. I use perf sched not found any sched_switch events.

L2GW_2680:/home/fsp/zn # perf sched record -C 6-11 -o perf.sched
^C[ perf record: Woken up 64 times to write data ]
[ perf record: Captured and wrote 204.878 MB perf.sched (1911189 samples) ]

L2GW_2680:/home/fsp/zn # perf sched latency -i perf.sched

-----------------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------
  TOTAL:                |      0.000 ms |        0 |
---------------------------------------------------



Is there any other tools of perf to found out the max latency by irq or cpu idle ?



2019-04-12 14:07:23

by William Cohen

[permalink] [raw]
Subject: Re: perf tools:Is there any tools to found out the max latency by irq or cpu idle

On 4/11/19 8:57 PM, Linhaifeng wrote:
> Hi,
> I have a single thread application like this:
>
> While (1) {
>     start = rdtsc();
>     sqrt (1024);     
>   end = rdtsc();
>   cycles = end – start;
>   printf("cycles: %d-%02d-%02d %02d:%02d:%02d: %lu\n",
>       1900+timeinfo->tm_year, 1+timeinfo->tm_mon, timeinfo->tm_mday, timeinfo->tm_hour, timeinfo->tm_min, timeinfo->tm_sec,
>       cycles);
> }
> It print the cycles of sqrt every second and run with taskset –c 1 ./sqrt.
> The result of test is:
>
> sqrt 2019-04-10 23:53:50: 43968
> sqrt 2019-04-10 23:53:51: 44060
> sqrt 2019-04-10 23:53:52: 49012
> sqrt 2019-04-10 23:53:53: 38172
> sqrt 2019-04-10 23:53:54: 131081408
> sqrt 2019-04-10 23:53:55: 43600
> sqrt 2019-04-10 23:53:56: 46704
> sqrt 2019-04-10 23:53:57: 46880
> sqrt 2019-04-10 23:53:58: 44332
> ……
> sqrt 2019-04-10 02:17:15: 131081408
> ……
> sqrt 2019-04-10 04:40:35: 131081408
> ……
>
> Every 2hour23min there would be a large cycles. I use perf sched not found any sched_switch events.

Hi,

The fact that it is the same value 131081408 every 2 hours 23 minutes looks suspect. One would expect some variation in the counts. It looks like there is some rollover or overflow issue. It would be helpful to print out the start and end values to see what is going on with the raw rdstc values. Maybe the thread is being moved between processors and the TSC are out of sync. What particular processor model was this running on? Was this running on physical hardware or inside a kvm guest?

According to the Intel 64 and IA-32 Architecture Software Devloper's Manual Volume 3 (325384-sdm-vol-3abcd.pdf):

The RDTSC instruction reads the time-stamp counter and is guaranteed to return a monotonically increasing
unique value whenever executed, except for a 64-bit counter wraparound. Intel guarantees that the time-stamp
counter will not wraparound within 10 years after being reset. The period for counter wrap is longer for Pentium 4,
Intel Xeon, P6 family, and Pentium processors.

-Will Cohen


>
> L2GW_2680:/home/fsp/zn # perf sched record -C 6-11 -o perf.sched
> ^C[ perf record: Woken up 64 times to write data ]
> [ perf record: Captured and wrote 204.878 MB perf.sched (1911189 samples) ]
>
> L2GW_2680:/home/fsp/zn # perf sched latency -i perf.sched
>
> -----------------------------------------------------------------------------------------------------------------
>   Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
> -----------------------------------------------------------------------------------------------------------------
> -----------------------------------------------------------------------------------------------------------------
>   TOTAL:                |      0.000 ms |        0 |
> ---------------------------------------------------
>
>
>
> Is there any other tools of perf to found out the max latency by irq or cpu idle ?
>
>

2019-04-13 06:02:19

by Linhaifeng

[permalink] [raw]
Subject: RE: perf tools:Is there any tools to found out the max latency by irq or cpu idle

Sorry, the value 131081408 is just for example. Actually the result is like this:
sqrt 2019-04-10 23:53:50: 43968
sqrt 2019-04-10 23:53:51: 44060
sqrt 2019-04-10 23:53:52: 49012
sqrt 2019-04-10 23:53:53: 38172
sqrt 2019-04-10 23:53:54: 131081408
sqrt 2019-04-10 23:53:55: 43600
sqrt 2019-04-10 23:53:56: 46704
sqrt 2019-04-10 23:53:57: 46880
sqrt 2019-04-10 23:53:58: 44332
……
sqrt 2019-04-10 02:17:15: 136345876
……
sqrt 2019-04-10 04:40:35: 136335644
……

-----Original Message-----
From: William Cohen [mailto:[email protected]]
Sent: 2019年4月12日 22:06
To: Linhaifeng <[email protected]>; [email protected]; [email protected]
Subject: Re: perf tools:Is there any tools to found out the max latency by irq or cpu idle

On 4/11/19 8:57 PM, Linhaifeng wrote:
> Hi,
> I have a single thread application like this:
>
> While (1) {
>     start = rdtsc();
>     sqrt (1024);     
>   end = rdtsc();
>   cycles = end – start;
>   printf("cycles: %d-%02d-%02d %02d:%02d:%02d: %lu\n",
>       1900+timeinfo->tm_year, 1+timeinfo->tm_mon, timeinfo->tm_mday, timeinfo->tm_hour, timeinfo->tm_min, timeinfo->tm_sec,
>       cycles);
> }
> It print the cycles of sqrt every second and run with taskset –c 1 ./sqrt.
> The result of test is:
>
> sqrt 2019-04-10 23:53:50: 43968
> sqrt 2019-04-10 23:53:51: 44060
> sqrt 2019-04-10 23:53:52: 49012
> sqrt 2019-04-10 23:53:53: 38172
> sqrt 2019-04-10 23:53:54: 131081408
> sqrt 2019-04-10 23:53:55: 43600
> sqrt 2019-04-10 23:53:56: 46704
> sqrt 2019-04-10 23:53:57: 46880
> sqrt 2019-04-10 23:53:58: 44332
> ……
> sqrt 2019-04-10 02:17:15: 131081408
> ……
> sqrt 2019-04-10 04:40:35: 131081408
> ……
>
> Every 2hour23min there would be a large cycles. I use perf sched not found any sched_switch events.

Hi,

The fact that it is the same value 131081408 every 2 hours 23 minutes looks suspect. One would expect some variation in the counts. It looks like there is some rollover or overflow issue. It would be helpful to print out the start and end values to see what is going on with the raw rdstc values. Maybe the thread is being moved between processors and the TSC are out of sync. What particular processor model was this running on? Was this running on physical hardware or inside a kvm guest?

According to the Intel 64 and IA-32 Architecture Software Devloper's Manual Volume 3 (325384-sdm-vol-3abcd.pdf):

The RDTSC instruction reads the time-stamp counter and is guaranteed to return a monotonically increasing unique value whenever executed, except for a 64-bit counter wraparound. Intel guarantees that the time-stamp counter will not wraparound within 10 years after being reset. The period for counter wrap is longer for Pentium 4, Intel Xeon, P6 family, and Pentium processors.

-Will Cohen


>
> L2GW_2680:/home/fsp/zn # perf sched record -C 6-11 -o perf.sched ^C[
> perf record: Woken up 64 times to write data ] [ perf record: Captured
> and wrote 204.878 MB perf.sched (1911189 samples) ]
>
> L2GW_2680:/home/fsp/zn # perf sched latency -i perf.sched
>
> ----------------------------------------------------------------------
> -------------------------------------------
>   Task                  |   Runtime ms  | Switches | Average delay ms
> | Maximum delay ms | Maximum delay at       |
> ----------------------------------------------------------------------
> -------------------------------------------
> ----------------------------------------------------------------------
> -------------------------------------------
>   TOTAL:                |      0.000 ms |        0 |
> ---------------------------------------------------
>
>
>
> Is there any other tools of perf to found out the max latency by irq or cpu idle ?
>
>

2019-04-14 00:10:50

by William Cohen

[permalink] [raw]
Subject: Re: perf tools:Is there any tools to found out the max latency by irq or cpu idle

On 4/13/19 2:01 AM, Linhaifeng wrote:
> Sorry, the value 131081408 is just for example. Actually the result is like this:
> sqrt 2019-04-10 23:53:50: 43968
> sqrt 2019-04-10 23:53:51: 44060
> sqrt 2019-04-10 23:53:52: 49012
> sqrt 2019-04-10 23:53:53: 38172
> sqrt 2019-04-10 23:53:54: 131081408
> sqrt 2019-04-10 23:53:55: 43600
> sqrt 2019-04-10 23:53:56: 46704
> sqrt 2019-04-10 23:53:57: 46880
> sqrt 2019-04-10 23:53:58: 44332
> ……
> sqrt 2019-04-10 02:17:15: 136345876
> ……
> sqrt 2019-04-10 04:40:35: 136335644
> ……

Hi,

Still it would be interesting to look at the two raw values used to compute the delta to better diagnose what is going on.

-Will
>
> -----Original Message-----
> From: William Cohen [mailto:[email protected]]
> Sent: 2019年4月12日 22:06
> To: Linhaifeng <[email protected]>; [email protected]; [email protected]
> Subject: Re: perf tools:Is there any tools to found out the max latency by irq or cpu idle
>
> On 4/11/19 8:57 PM, Linhaifeng wrote:
>> Hi,
>> I have a single thread application like this:
>>
>> While (1) {
>>     start = rdtsc();
>>     sqrt (1024);     
>>   end = rdtsc();
>>   cycles = end – start;
>>   printf("cycles: %d-%02d-%02d %02d:%02d:%02d: %lu\n",
>>       1900+timeinfo->tm_year, 1+timeinfo->tm_mon, timeinfo->tm_mday, timeinfo->tm_hour, timeinfo->tm_min, timeinfo->tm_sec,
>>       cycles);
>> }
>> It print the cycles of sqrt every second and run with taskset –c 1 ./sqrt.
>> The result of test is:
>>
>> sqrt 2019-04-10 23:53:50: 43968
>> sqrt 2019-04-10 23:53:51: 44060
>> sqrt 2019-04-10 23:53:52: 49012
>> sqrt 2019-04-10 23:53:53: 38172
>> sqrt 2019-04-10 23:53:54: 131081408
>> sqrt 2019-04-10 23:53:55: 43600
>> sqrt 2019-04-10 23:53:56: 46704
>> sqrt 2019-04-10 23:53:57: 46880
>> sqrt 2019-04-10 23:53:58: 44332
>> ……
>> sqrt 2019-04-10 02:17:15: 131081408
>> ……
>> sqrt 2019-04-10 04:40:35: 131081408
>> ……
>>
>> Every 2hour23min there would be a large cycles. I use perf sched not found any sched_switch events.
>
> Hi,
>
> The fact that it is the same value 131081408 every 2 hours 23 minutes looks suspect. One would expect some variation in the counts. It looks like there is some rollover or overflow issue. It would be helpful to print out the start and end values to see what is going on with the raw rdstc values. Maybe the thread is being moved between processors and the TSC are out of sync. What particular processor model was this running on? Was this running on physical hardware or inside a kvm guest?
>
> According to the Intel 64 and IA-32 Architecture Software Devloper's Manual Volume 3 (325384-sdm-vol-3abcd.pdf):
>
> The RDTSC instruction reads the time-stamp counter and is guaranteed to return a monotonically increasing unique value whenever executed, except for a 64-bit counter wraparound. Intel guarantees that the time-stamp counter will not wraparound within 10 years after being reset. The period for counter wrap is longer for Pentium 4, Intel Xeon, P6 family, and Pentium processors.
>
> -Will Cohen
>
>
>>
>> L2GW_2680:/home/fsp/zn # perf sched record -C 6-11 -o perf.sched ^C[
>> perf record: Woken up 64 times to write data ] [ perf record: Captured
>> and wrote 204.878 MB perf.sched (1911189 samples) ]
>>
>> L2GW_2680:/home/fsp/zn # perf sched latency -i perf.sched
>>
>> ----------------------------------------------------------------------
>> -------------------------------------------
>>   Task                  |   Runtime ms  | Switches | Average delay ms
>> | Maximum delay ms | Maximum delay at       |
>> ----------------------------------------------------------------------
>> -------------------------------------------
>> ----------------------------------------------------------------------
>> -------------------------------------------
>>   TOTAL:                |      0.000 ms |        0 |
>> ---------------------------------------------------
>>
>>
>>
>> Is there any other tools of perf to found out the max latency by irq or cpu idle ?
>>
>>
>

2019-04-15 12:54:26

by Paul A. Clarke

[permalink] [raw]
Subject: Re: perf tools:Is there any tools to found out the max latency by irq or cpu idle

On 4/13/19 1:01 AM, Linhaifeng wrote:
> Sorry, the value 131081408 is just for example. Actually the result is like this:
> sqrt 2019-04-10 23:53:50: 43968
> sqrt 2019-04-10 23:53:51: 44060
> sqrt 2019-04-10 23:53:52: 49012
> sqrt 2019-04-10 23:53:53: 38172
> sqrt 2019-04-10 23:53:54: 131081408
> sqrt 2019-04-10 23:53:55: 43600
> sqrt 2019-04-10 23:53:56: 46704
> sqrt 2019-04-10 23:53:57: 46880
> sqrt 2019-04-10 23:53:58: 44332
> ……
> sqrt 2019-04-10 02:17:15: 136345876
> ……
> sqrt 2019-04-10 04:40:35: 136335644
> ……

Given the periodicity, I'm wondering if it could be System Management Interrupts (SMIs)?

If so, some BIOSes have settings to mitigate, otherwise, there's not much that can be done, per my understanding.

PC

> -----Original Message-----
> From: William Cohen [mailto:[email protected]]
> Sent: 2019年4月12日 22:06
> To: Linhaifeng <[email protected]>; [email protected]; [email protected]
> Subject: Re: perf tools:Is there any tools to found out the max latency by irq or cpu idle
>
> On 4/11/19 8:57 PM, Linhaifeng wrote:
>> Hi,
>> I have a single thread application like this:
>>
>> While (1) {
>>     start = rdtsc();
>>     sqrt (1024);     
>>   end = rdtsc();
>>   cycles = end – start;
>>   printf("cycles: %d-%02d-%02d %02d:%02d:%02d: %lu\n",
>>       1900+timeinfo->tm_year, 1+timeinfo->tm_mon, timeinfo->tm_mday, timeinfo->tm_hour, timeinfo->tm_min, timeinfo->tm_sec,
>>       cycles);
>> }
>> It print the cycles of sqrt every second and run with taskset –c 1 ./sqrt.
>> The result of test is:
>>
>> sqrt 2019-04-10 23:53:50: 43968
>> sqrt 2019-04-10 23:53:51: 44060
>> sqrt 2019-04-10 23:53:52: 49012
>> sqrt 2019-04-10 23:53:53: 38172
>> sqrt 2019-04-10 23:53:54: 131081408
>> sqrt 2019-04-10 23:53:55: 43600
>> sqrt 2019-04-10 23:53:56: 46704
>> sqrt 2019-04-10 23:53:57: 46880
>> sqrt 2019-04-10 23:53:58: 44332
>> ……
>> sqrt 2019-04-10 02:17:15: 131081408
>> ……
>> sqrt 2019-04-10 04:40:35: 131081408
>> ……
>>
>> Every 2hour23min there would be a large cycles. I use perf sched not found any sched_switch events.
>
> Hi,
>
> The fact that it is the same value 131081408 every 2 hours 23 minutes looks suspect. One would expect some variation in the counts. It looks like there is some rollover or overflow issue. It would be helpful to print out the start and end values to see what is going on with the raw rdstc values. Maybe the thread is being moved between processors and the TSC are out of sync. What particular processor model was this running on? Was this running on physical hardware or inside a kvm guest?
>
> According to the Intel 64 and IA-32 Architecture Software Devloper's Manual Volume 3 (325384-sdm-vol-3abcd.pdf):
>
> The RDTSC instruction reads the time-stamp counter and is guaranteed to return a monotonically increasing unique value whenever executed, except for a 64-bit counter wraparound. Intel guarantees that the time-stamp counter will not wraparound within 10 years after being reset. The period for counter wrap is longer for Pentium 4, Intel Xeon, P6 family, and Pentium processors.
>
> -Will Cohen
>
>
>>
>> L2GW_2680:/home/fsp/zn # perf sched record -C 6-11 -o perf.sched ^C[
>> perf record: Woken up 64 times to write data ] [ perf record: Captured
>> and wrote 204.878 MB perf.sched (1911189 samples) ]
>>
>> L2GW_2680:/home/fsp/zn # perf sched latency -i perf.sched
>>
>> ----------------------------------------------------------------------
>> -------------------------------------------
>>   Task                  |   Runtime ms  | Switches | Average delay ms
>> | Maximum delay ms | Maximum delay at       |
>> ----------------------------------------------------------------------
>> -------------------------------------------
>> ----------------------------------------------------------------------
>> -------------------------------------------
>>   TOTAL:                |      0.000 ms |        0 |
>> ---------------------------------------------------
>>
>>
>>
>> Is there any other tools of perf to found out the max latency by irq or cpu idle ?
>>
>>
>

2019-04-15 18:01:20

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: perf tools:Is there any tools to found out the max latency by irq or cpu idle

Em Mon, Apr 15, 2019 at 07:52:59AM -0500, Paul Clarke escreveu:
> On 4/13/19 1:01 AM, Linhaifeng wrote:
> > Sorry, the value 131081408 is just for example. Actually the result is like this:
> > sqrt 2019-04-10 23:53:50: 43968
> > sqrt 2019-04-10 23:53:51: 44060
> > sqrt 2019-04-10 23:53:52: 49012
> > sqrt 2019-04-10 23:53:53: 38172
> > sqrt 2019-04-10 23:53:54: 131081408
> > sqrt 2019-04-10 23:53:55: 43600
> > sqrt 2019-04-10 23:53:56: 46704
> > sqrt 2019-04-10 23:53:57: 46880
> > sqrt 2019-04-10 23:53:58: 44332
> > ……
> > sqrt 2019-04-10 02:17:15: 136345876
> > ……
> > sqrt 2019-04-10 04:40:35: 136335644
> > ……
>
> Given the periodicity, I'm wondering if it could be System Management Interrupts (SMIs)?
>
> If so, some BIOSes have settings to mitigate, otherwise, there's not much that can be done, per my understanding.

[root@quaco perf]# perf list smi

List of pre-defined events (to be used in -e):

msr/smi/ [Kernel PMU event]

sunrpc:xprt_transmit [Tracepoint event]
tcp:tcp_retransmit_skb [Tracepoint event]
tcp:tcp_retransmit_synack [Tracepoint event]

sdt_qemu:qmp_block_job_dismiss [SDT event]
sdt_qemu:qmp_job_dismiss [SDT event]

Metric Groups:

[root@quaco perf]#

Here, using it:

[root@quaco perf]# perf stat -I 1000 -e msr/smi/
# time counts unit events
1.000919819 0 msr/smi/
2.001689950 0 msr/smi/
3.002266988 0 msr/smi/
4.003049439 16 msr/smi/
5.003676064 0 msr/smi/
6.004420613 16 msr/smi/
^C 6.609405666 0 msr/smi/

[root@quaco perf]#


In this Lenovo T480s I generate SMIs by pressing the mute button :-)

- Arnaldo

2019-04-16 01:10:27

by Linhaifeng

[permalink] [raw]
Subject: RE: perf tools:Is there any tools to found out the max latency by irq or cpu idle

Hi,

Thank you every much!
I found MSI in my system too and there are some errors found in bios log. Replace the memorys can fix it.


L2GW_2680:/home/fsp/ycb # perf stat -C 1-11,14-25,29-39,42-52 --smi-cost
^C
Performance counter stats for 'CPU(s) 1-11,14-25,29-39,42-52':

SMI cycles% SMI#
-0.2% 90

4472.483413712 seconds time elapsed


-----Original Message-----
From: Arnaldo Carvalho de Melo [mailto:[email protected]]
Sent: 2019年4月16日 1:59
To: Paul Clarke <[email protected]>
Cc: Linhaifeng <[email protected]>; [email protected]; [email protected]
Subject: Re: perf tools:Is there any tools to found out the max latency by irq or cpu idle

Em Mon, Apr 15, 2019 at 07:52:59AM -0500, Paul Clarke escreveu:
> On 4/13/19 1:01 AM, Linhaifeng wrote:
> > Sorry, the value 131081408 is just for example. Actually the result is like this:
> > sqrt 2019-04-10 23:53:50: 43968
> > sqrt 2019-04-10 23:53:51: 44060
> > sqrt 2019-04-10 23:53:52: 49012
> > sqrt 2019-04-10 23:53:53: 38172
> > sqrt 2019-04-10 23:53:54: 131081408 sqrt 2019-04-10 23:53:55:
> > 43600 sqrt 2019-04-10 23:53:56: 46704 sqrt 2019-04-10 23:53:57:
> > 46880 sqrt 2019-04-10 23:53:58: 44332 …… sqrt 2019-04-10
> > 02:17:15: 136345876 …… sqrt 2019-04-10 04:40:35: 136335644 ……
>
> Given the periodicity, I'm wondering if it could be System Management Interrupts (SMIs)?
>
> If so, some BIOSes have settings to mitigate, otherwise, there's not much that can be done, per my understanding.

[root@quaco perf]# perf list smi

List of pre-defined events (to be used in -e):

msr/smi/ [Kernel PMU event]

sunrpc:xprt_transmit [Tracepoint event]
tcp:tcp_retransmit_skb [Tracepoint event]
tcp:tcp_retransmit_synack [Tracepoint event]

sdt_qemu:qmp_block_job_dismiss [SDT event]
sdt_qemu:qmp_job_dismiss [SDT event]

Metric Groups:

[root@quaco perf]#

Here, using it:

[root@quaco perf]# perf stat -I 1000 -e msr/smi/
# time counts unit events
1.000919819 0 msr/smi/
2.001689950 0 msr/smi/
3.002266988 0 msr/smi/
4.003049439 16 msr/smi/
5.003676064 0 msr/smi/
6.004420613 16 msr/smi/
^C 6.609405666 0 msr/smi/

[root@quaco perf]#


In this Lenovo T480s I generate SMIs by pressing the mute button :-)

- Arnaldo

2019-04-16 01:36:33

by Linhaifeng

[permalink] [raw]
Subject: RE: perf tools:Is there any tools to found out the max latency by irq or cpu idle

Hi, PC

Thank you every much!

-----Original Message-----
From: Paul Clarke [mailto:[email protected]]
Sent: 2019年4月15日 20:53
To: Linhaifeng <[email protected]>; [email protected]; [email protected]
Subject: Re: perf tools:Is there any tools to found out the max latency by irq or cpu idle

On 4/13/19 1:01 AM, Linhaifeng wrote:
> Sorry, the value 131081408 is just for example. Actually the result is like this:
> sqrt 2019-04-10 23:53:50: 43968
> sqrt 2019-04-10 23:53:51: 44060
> sqrt 2019-04-10 23:53:52: 49012
> sqrt 2019-04-10 23:53:53: 38172
> sqrt 2019-04-10 23:53:54: 131081408
> sqrt 2019-04-10 23:53:55: 43600
> sqrt 2019-04-10 23:53:56: 46704
> sqrt 2019-04-10 23:53:57: 46880
> sqrt 2019-04-10 23:53:58: 44332
> ……
> sqrt 2019-04-10 02:17:15: 136345876
> ……
> sqrt 2019-04-10 04:40:35: 136335644
> ……

Given the periodicity, I'm wondering if it could be System Management Interrupts (SMIs)?

If so, some BIOSes have settings to mitigate, otherwise, there's not much that can be done, per my understanding.

PC

> -----Original Message-----
> From: William Cohen [mailto:[email protected]]
> Sent: 2019年4月12日 22:06
> To: Linhaifeng <[email protected]>;
> [email protected]; [email protected]
> Subject: Re: perf tools:Is there any tools to found out the max
> latency by irq or cpu idle
>
> On 4/11/19 8:57 PM, Linhaifeng wrote:
>> Hi,
>> I have a single thread application like this:
>>
>> While (1) {
>>     start = rdtsc();
>>     sqrt (1024);     
>>   end = rdtsc();
>>   cycles = end – start;
>>   printf("cycles: %d-%02d-%02d %02d:%02d:%02d: %lu\n",
>>       1900+timeinfo->tm_year, 1+timeinfo->tm_mon, timeinfo->tm_mday, timeinfo->tm_hour, timeinfo->tm_min, timeinfo->tm_sec,
>>       cycles);
>> }
>> It print the cycles of sqrt every second and run with taskset –c 1 ./sqrt.
>> The result of test is:
>>
>> sqrt 2019-04-10 23:53:50: 43968
>> sqrt 2019-04-10 23:53:51: 44060
>> sqrt 2019-04-10 23:53:52: 49012
>> sqrt 2019-04-10 23:53:53: 38172
>> sqrt 2019-04-10 23:53:54: 131081408
>> sqrt 2019-04-10 23:53:55: 43600
>> sqrt 2019-04-10 23:53:56: 46704
>> sqrt 2019-04-10 23:53:57: 46880
>> sqrt 2019-04-10 23:53:58: 44332
>> ……
>> sqrt 2019-04-10 02:17:15: 131081408
>> ……
>> sqrt 2019-04-10 04:40:35: 131081408
>> ……
>>
>> Every 2hour23min there would be a large cycles. I use perf sched not found any sched_switch events.
>
> Hi,
>
> The fact that it is the same value 131081408 every 2 hours 23 minutes looks suspect. One would expect some variation in the counts. It looks like there is some rollover or overflow issue. It would be helpful to print out the start and end values to see what is going on with the raw rdstc values. Maybe the thread is being moved between processors and the TSC are out of sync. What particular processor model was this running on? Was this running on physical hardware or inside a kvm guest?
>
> According to the Intel 64 and IA-32 Architecture Software Devloper's Manual Volume 3 (325384-sdm-vol-3abcd.pdf):
>
> The RDTSC instruction reads the time-stamp counter and is guaranteed to return a monotonically increasing unique value whenever executed, except for a 64-bit counter wraparound. Intel guarantees that the time-stamp counter will not wraparound within 10 years after being reset. The period for counter wrap is longer for Pentium 4, Intel Xeon, P6 family, and Pentium processors.
>
> -Will Cohen
>
>
>>
>> L2GW_2680:/home/fsp/zn # perf sched record -C 6-11 -o perf.sched ^C[
>> perf record: Woken up 64 times to write data ] [ perf record:
>> Captured and wrote 204.878 MB perf.sched (1911189 samples) ]
>>
>> L2GW_2680:/home/fsp/zn # perf sched latency -i perf.sched
>>
>> ---------------------------------------------------------------------
>> -
>> -------------------------------------------
>>   Task                  |   Runtime ms  | Switches | Average delay ms
>> | Maximum delay ms | Maximum delay at       |
>> ---------------------------------------------------------------------
>> -
>> -------------------------------------------
>> ---------------------------------------------------------------------
>> -
>> -------------------------------------------
>>   TOTAL:                |      0.000 ms |        0 |
>> ---------------------------------------------------
>>
>>
>>
>> Is there any other tools of perf to found out the max latency by irq or cpu idle ?
>>
>>
>