2012-06-12 12:57:07

by Luming Yu

[permalink] [raw]
Subject: What is the right practice to get new code upstream( was Fwd: [patch] a simple hardware detector for latency as well as throughput ver. 0.1.0)

Hi All,

I must have forgotten to cc the key people. Sorry for the noise again.
I need to know what the right practice is to get your attention and
have a new tool like this one accepted upstream.

About the tool:

It's unique, simple, and valuable, but it's still a starting point.

The tool is based on jcm's hardware latency testing tool in the RT
tree, which detects SMI-caused problems.
Basically, it's a testing tool that measures the latency and bandwidth
of raw hardware instructions and component functions, exclusively in
stop_machine context. It's a kernel module that can be run separately
from kernel boot. The goal is to expose hardware/BIOS-caused problems
in 2 minutes. To me, measuring the intrinsic behavior of the hardware
on which we build our software is always a better idea than blindly
looking for data in documents. In the current version of the tool, we
have a basic sampling facility and a TSC test ready for x86. I plan to
add more tests to this tool to enrich our tool set in Linux.

Any inputs are appreciated. :-)

Thanks for your time.
/l

---------- Forwarded message ----------
From: Luming Yu <[email protected]>
Date: Tue, Jun 12, 2012 at 11:42 AM
Subject: Fwd: [patch] a simple hardware detector for latency as well
as throughput ver. 0.1.0
To: LKML <[email protected]>


Hello everyone,

I'm trying to push a new tool upstream. I'd like to hear from you
about the best practice for getting this done.

Thanks,
Luming


---------- Forwarded message ----------
From: Luming Yu <[email protected]>
Date: Mon, Jun 11, 2012 at 9:59 PM
Subject: Fwd: [patch] a simple hardware detector for latency as well
as throughput ver. 0.1.0
To: [email protected]
Cc: Andrew Morton <[email protected]>, [email protected],
[email protected]


Hi,

I'd like to know if the patch looks good for linux-next, so it can
find its way upstream in 3.6.

Thanks and regards,
Luming


---------- Forwarded message ----------
From: Luming Yu <[email protected]>
Date: Wed, May 30, 2012 at 7:47 AM
Subject: Fwd: [patch] a simple hardware detector for latency as well
as throughput ver. 0.1.0
To: Andrew Morton <[email protected]>
Cc: [email protected]


Hello akpm,

I'd like to push the patch upstream, but I'm not sure whether jcm has
the extra bandwidth, although he is also interested in having the tool
upstream. So I'd like to ping you to check whether there is any chance
of queueing it up in your tree first. I will enhance it further after
it's upstream.

Thanks,
Luming



---------- Forwarded message ----------
From: Luming Yu <[email protected]>
Date: Tue, May 29, 2012 at 4:37 AM
Subject: [patch] a simple hardware detector for latency as well as
throughput ver. 0.1.0
To: [email protected]
Cc: [email protected]


Hi Jon,

The patch is the first step toward testing some basic hardware
functions like the TSC, to help people understand whether there is any
hardware latency or throughput problem exposed on bare metal, left
behind by the BIOS, or caused by SMM interference. Currently the patch
tests hardware features (TSC, frequency, and rdrand, which is a new
instruction for getting random numbers) in stop_machine context. I
will add more after this first step gets merged, for those who want to
play directly with new hardware functions.

I suppose I can add your Signed-off-by, as the code is derived from
your hwlat_detector.

I'm also asking whether you are going to queue it up somewhere that
can be pulled into 3.5.

Of course, I will update the patch based on any comments that you
think must be addressed for the 3.5 merge.

Thanks,
Luming


Signed-off-by: Luming Yu <[email protected]>


 Kconfig   |    7
 Makefile  |    2
 hw_test.c |  954 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 963 insertions(+)


2012-06-12 14:42:44

by Jimmy Thrasibule

[permalink] [raw]
Subject: Re: What is the right practice to get new code upstream( was Fwd: [patch] a simple hardware detector for latency as well as throughput ver. 0.1.0)

On Tue, 2012-06-12 at 20:57 +0800, Luming Yu wrote:
> Hi All,
>
> I must have forgotten to cc the key people. Sorry for the noise again.
> I need to know what the right practice is to get your attention and
> have a new tool like this one accepted upstream.
It might be good to have a look at the following:

http://www.kernel.org/doc/Documentation/SubmittingPatches
http://www.kernel.org/doc/Documentation/SubmitChecklist
http://www.linuxfoundation.org/content/how-participate-linux-community

Regards,
Jimmy

2012-06-12 14:57:23

by Luming Yu

[permalink] [raw]
Subject: Re: What is the right practice to get new code upstream( was Fwd: [patch] a simple hardware detector for latency as well as throughput ver. 0.1.0)

On Tue, Jun 12, 2012 at 10:42 PM, Jimmy Thrasibule
<[email protected]> wrote:
> On Tue, 2012-06-12 at 20:57 +0800, Luming Yu wrote:
>> Hi All,
>>
>> I must have forgotten to cc the key people. Sorry for the noise again.
>> I need to know what the right practice is to get your attention and
>> have a new tool like this one accepted upstream.
> It might be good to have a look at the following:
>
> http://www.kernel.org/doc/Documentation/SubmittingPatches
> http://www.kernel.org/doc/Documentation/SubmitChecklist
> http://www.linuxfoundation.org/content/how-participate-linux-community

Thanks for these pointers; they are useful.
My intention is to push the initial working version into -mm, then
continuously improve it bit by bit. The first version works at a
minimal level as a tool, so I requested that it be merged into
linux-next or -mm as the first step toward finding its way upstream.
Any advice on this?

>
> Regards,
> Jimmy
>

2012-06-13 22:20:48

by Andrew Morton

[permalink] [raw]
Subject: Re: What is the right practice to get new code upstream( was Fwd: [patch] a simple hardware detector for latency as well as throughput ver. 0.1.0)

On Tue, 12 Jun 2012 20:57:02 +0800
Luming Yu <[email protected]> wrote:

> I need to know what the right practice is to get your attention and
> have a new tool like this one accepted upstream.

Seems that you have some good feedback from Arnd to be looking at. I'm
usually the guy for mysterious misc stuff such as this, so please cc me
on future revisions.

The name "hw_test" and "HW_TEST" is too vague. The topic "testing
hardware" is very broad, and this module only touches a small fraction
of it, so please think up a far more specific name.

2012-06-14 09:25:48

by Luming Yu

[permalink] [raw]
Subject: Re: What is the right practice to get new code upstream( was Fwd: [patch] a simple hardware detector for latency as well as throughput ver. 0.1.0)

On Wed, Jun 13, 2012 at 6:20 PM, Andrew Morton
<[email protected]> wrote:
> On Tue, 12 Jun 2012 20:57:02 +0800
> Luming Yu <[email protected]> wrote:
>
>> I need to know what the right practice is to get your attention and
>> have a new tool like this one accepted upstream.
>
> Seems that you have some good feedback from Arnd to be looking at.  I'm
> usually the guy for mysterious misc stuff such as this, so please cc me
> on future revisions.

Andrew, thanks a lot :-) The community is really helpful once you find
the right people for the right things.

>
> The name "hw_test" and "HW_TEST" is too vague.  The topic "testing
> hardware" is very broad, and this module only touches a small fraction
> of it, so please think up a far more specific name.
>

I'm working on version 2 of the tool, which would be renamed to
cpu_latency_test, or perhaps simply misc_latency_test?

thanks!!! /l

2012-06-14 10:05:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: What is the right practice to get new code upstream( was Fwd: [patch] a simple hardware detector for latency as well as throughput ver. 0.1.0)

On Tue, 2012-06-12 at 20:57 +0800, Luming Yu wrote:
> The goal is to expose hardware/BIOS-caused problems in 2 minutes. To
> me, measuring the intrinsic behavior of the hardware on which we build
> our software is always a better idea than blindly looking for data in
> documents. In the current version of the tool, we have a basic
> sampling facility and a TSC test ready for x86. I plan to add more
> tests to this tool to enrich our tool set in Linux.
>
>
There's SMI damage that occurs on much longer periods than 2 minutes.

Also, you can't really do stop_machine for 2 minutes and expect the
system to survive.

Furthermore, I think esp. on more recent chips there's better ways of
doing it.

For Intel there's a IA32_DEBUGCTL.FREEZE_WHILE_SMM_EN [bit 14], if you
program a PMU event that ticks at the same rate as the TSC and enable
the FREEZE_WHILE_SMM stuff, any drift observed between that and the TSC
is time lost to SMM. It also has MSR_SMI_COUNT [MSR 34H] which counts
the number of SMIs.

For AMD there's only event 02Bh, which is SMIs Received. I'm not sure it
has anything like the FREEZE or if the event is modifiable to count the
cycles in SMI.

2012-06-14 14:15:22

by Luming Yu

[permalink] [raw]
Subject: Re: What is the right practice to get new code upstream( was Fwd: [patch] a simple hardware detector for latency as well as throughput ver. 0.1.0)

On Thu, Jun 14, 2012 at 6:04 PM, Peter Zijlstra <[email protected]> wrote:
> On Tue, 2012-06-12 at 20:57 +0800, Luming Yu wrote:
>> The goal is to expose hardware/BIOS-caused problems in 2 minutes. To
>> me, measuring the intrinsic behavior of the hardware on which we build
>> our software is always a better idea than blindly looking for data in
>> documents. In the current version of the tool, we have a basic
>> sampling facility and a TSC test ready for x86. I plan to add more
>> tests to this tool to enrich our tool set in Linux.
>>
>>
> There's SMI damage that occurs on much longer periods than 2 minutes.

Right, it's a problem if we run the tool while no SMI is being
triggered. My rough idea is to come up with some way to scan the
system. I'm not entirely sure, but 2 minutes could be sufficient to
finish such a scan on a normal laptop. The question is how we scan. We
may need to adjust the 2-minute goal based on the method of the scan
and the size of the machine being scanned.

>
> Also, you can't really do stop_machine for 2 minutes and expect the
> system to survive.

By design, for the TSC measurement the test thread is supposed to, by
default, spend sample_width/sample_window of the CPU cycles on
sampling in stop_machine context. The remaining CPU cycles are yielded
to other threads via msleep_interruptible. The default sample_width is
500us and the default sample_window is 1ms.

>
> Furthermore, I think esp. on more recent chips there's better ways of
> doing it.

Right, the reference to PMU counts will make the tool more useful.

>
> For Intel there's a IA32_DEBUGCTL.FREEZE_WHILE_SMM_EN [bit 14], if you
> program a PMU event that ticks at the same rate as the TSC and enable
> the FREEZE_WHILE_SMM stuff, any drift observed between that and the TSC
> is time lost to SMM. It also has MSR_SMI_COUNT [MSR 34H] which counts
> the number of SMIs.
>
> For AMD there's only event 02Bh, which is SMIs Received. I'm not sure it
> has anything like the FREEZE or if the event is modifiable to count the
> cycles in SMI.

It's on the to-do list for version 0.2 of the tool.

Thanks!!!

2012-06-21 13:29

by Robert Richter

[permalink] [raw]

Subject: Re: What is the right practice to get new code upstream( was Fwd: [patch] a simple hardware detector for latency as well as throughput ver. 0.1.0)

On 14.06.12 12:04:56, Peter Zijlstra wrote:
> For AMD there's only event 02Bh, which is SMIs Received. I'm not sure it
> has anything like the FREEZE or if the event is modifiable to count the
> cycles in SMI.

Peter, which use cases do you have in mind? Is it to root-cause
latencies? Or just to see what happens on the system and how long it
spends in SMM? On current systems, counting SMI cycles seems not
to be possible.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

2012-06-21 14:44:15

by Peter Zijlstra

[permalink] [raw]
Subject: Re: What is the right practice to get new code upstream( was Fwd: [patch] a simple hardware detector for latency as well as throughput ver. 0.1.0)

On Thu, 2012-06-21 at 15:29 +0200, Robert Richter wrote:
> On 14.06.12 12:04:56, Peter Zijlstra wrote:
> > For AMD there's only event 02Bh, which is SMIs Received. I'm not sure it
> > has anything like the FREEZE or if the event is modifiable to count the
> > cycles in SMI.
>
> Peter, which use cases do you have in mind? Is it to root-cause
> latencies? Or just to see what happens on the system and how long it
> spends in SMM? On current systems, counting SMI cycles seems not
> to be possible.

Yeah exactly. So we can whack vendors over the head with hard evidence
their BIOS is utter shite.

So what we do now is disable interrupts, run a tight TSC read loop, and
report failure when we see a big delta.

Now some 'creative' BIOS people thought it would be a good idea to
save/restore the TSC over the SMI, which avoids detection. It also
completely wrecks TSC sync across cores.

But this SMI feature^Wfailure-add is a real problem for -rt: we've seen
SMIs that go well above a ms in duration, which of course completely
wreck the system.

IIRC the worst tglx ever encountered was 0.5s or so.

So ideally the PMU would have 2 events, one counting SMIs and one
counting cycles in SMM. Both should ignore any and all FREEZE_IN_SMM
bits, if such a thing exists. The hardware should also hard-fail if
such a counter is fiddled with from SMM context.

This would give us the capability to log exactly when and for how long
the system is taken from us and makes it impossible to 'fix' from SMM.