2003-07-08 22:30:54

by Nakajima, Jun

Subject: [PATCH] idle using PNI monitor/mwait

Hi Linus,

Attached is a patch that enables PNI (Prescott New Instructions)
monitor/mwait in the kernel idle loop (the opcodes are now public).
Basically, MWAIT is similar to hlt, but it avoids the IPI otherwise needed
to wake up the waiting processor: a write (by another processor) to the
address range specified by MONITOR wakes up the processor waiting on MWAIT.

Please apply.

Thanks,
Jun

----------------
diff -ur /build/orig/linux-2.5.74/arch/i386/kernel/cpu/intel.c linux-2.5.74/arch/i386/kernel/cpu/intel.c
--- /build/orig/linux-2.5.74/arch/i386/kernel/cpu/intel.c	2003-07-02 13:43:55.000000000 -0700
+++ linux-2.5.74/arch/i386/kernel/cpu/intel.c	2003-07-08 09:18:28.000000000 -0700
@@ -13,6 +13,7 @@

static int disable_P4_HT __initdata = 0;
extern int trap_init_f00f_bug(void);
+extern void select_idle_routine(const struct cpuinfo_x86 *c);

#ifdef CONFIG_X86_INTEL_USERCOPY
/*
@@ -172,7 +173,7 @@
}
#endif

-
+ select_idle_routine(c);
if (c->cpuid_level > 1) {
/* supports eax=2 call */
int i, j, n;
diff -ur /build/orig/linux-2.5.74/arch/i386/kernel/process.c linux-2.5.74/arch/i386/kernel/process.c
--- /build/orig/linux-2.5.74/arch/i386/kernel/process.c	2003-07-02 13:38:40.000000000 -0700
+++ linux-2.5.74/arch/i386/kernel/process.c	2003-07-08 11:52:42.000000000 -0700
@@ -148,11 +148,56 @@
}
}

+/*
+ * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
+ * which can obviate IPI to trigger checking of need_resched.
+ * We execute MONITOR against need_resched and enter optimized wait state
+ * through MWAIT. Whenever someone changes need_resched, we would be woken
+ * up from MWAIT (without an IPI).
+ */
+static void mwait_idle (void)
+{
+	local_irq_enable();
+
+	if (!need_resched()) {
+		set_thread_flag(TIF_POLLING_NRFLAG);
+		do {
+			__monitor((void *)&current_thread_info()->flags, 0, 0);
+			if (need_resched())
+				break;
+			__mwait(0, 0);
+		} while (!need_resched());
+		clear_thread_flag(TIF_POLLING_NRFLAG);
+	}
+}
+
+void __init select_idle_routine(const struct cpuinfo_x86 *c)
+{
+	if (cpu_has(c, X86_FEATURE_MWAIT)) {
+		printk("Monitor/Mwait feature present.\n");
+		/*
+		 * Skip, if setup has overridden idle.
+		 * Also, take care of system with asymmetric CPUs.
+		 * Use, mwait_idle only if all cpus support it.
+		 * If not, we fallback to default_idle()
+		 */
+		if (!pm_idle) {
+			pm_idle = mwait_idle;
+		}
+		return;
+	}
+	pm_idle = default_idle;
+	return;
+}
+
static int __init idle_setup (char *str)
{
if (!strncmp(str, "poll", 4)) {
printk("using polling idle threads.\n");
pm_idle = poll_idle;
+ } else if (!strncmp(str, "halt", 4)) {
+ printk("using halt in idle threads.\n");
+ pm_idle = default_idle;
}

return 1;
diff -ur /build/orig/linux-2.5.74/include/asm-i386/cpufeature.h linux-2.5.74/include/asm-i386/cpufeature.h
--- /build/orig/linux-2.5.74/include/asm-i386/cpufeature.h	2003-07-02 13:51:50.000000000 -0700
+++ linux-2.5.74/include/asm-i386/cpufeature.h	2003-07-08 09:18:28.000000000 -0700
@@ -71,6 +71,8 @@

/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
#define X86_FEATURE_EST (4*32+ 7) /* Enhanced SpeedStep */
+#define X86_FEATURE_MWAIT (4*32+ 3) /* Monitor/Mwait support */
+

/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
#define X86_FEATURE_XSTORE (5*32+ 2) /* on-CPU RNG present (xstore insn) */
diff -ur /build/orig/linux-2.5.74/include/asm-i386/processor.h linux-2.5.74/include/asm-i386/processor.h
--- /build/orig/linux-2.5.74/include/asm-i386/processor.h	2003-07-02 13:40:24.000000000 -0700
+++ linux-2.5.74/include/asm-i386/processor.h	2003-07-08 09:18:28.000000000 -0700
@@ -272,6 +272,22 @@
#define pc98 0
#endif

+static __inline__ void __monitor(const void *eax, unsigned long ecx,
+		unsigned long edx)
+{
+	/* "monitor %eax,%ecx,%edx;" */
+	asm volatile(
+		".byte 0x0f,0x01,0xc8;"
+		: :"a" (eax), "c" (ecx), "d"(edx));
+}
+
+static __inline__ void __mwait(unsigned long eax, unsigned long ecx)
+{
+	/* "mwait %eax,%ecx;" */
+	asm volatile(
+		".byte 0x0f,0x01,0xc9;"
+		: :"a" (eax), "c" (ecx));
+}

/* from system description table in BIOS. Mostly for MCA use, but
others may find it useful. */



Attachments:
mwait-2.5.74.patch (3.72 kB)

2003-07-08 23:19:57

by Linus Torvalds

Subject: Re: [PATCH] idle using PNI monitor/mwait


On Tue, 8 Jul 2003, Nakajima, Jun wrote:
>
> Attached is a patch that enables PNI (Prescott New Instructions)
> monitor/mwait in kernel idle (opcodes are now public). Basically MWAIT
> is similar to hlt, but you can avoid IPI to wake up the processor
> waiting. A write (by another processor) to the address range specified
> by MONITOR would wake up the processor waiting on MWAIT.

How about spinlocks? Does it make sense to make the contention code use
mwait too, or are the latencies too high? Not that we have a lot of
high-contention locks any more, so maybe it doesn't much matter.

Also, wasn't there some flag to set the "mwait" granularity? I don't see
anything like that in the patch..

Linus


2003-07-09 00:21:20

by Nakajima, Jun

Subject: RE: [PATCH] idle using PNI monitor/mwait

That's right. If we have a lot of high-contention locks in the kernel,
we need to fix the code first, to get benefits for the other
architectures.

"mwait" granularity (64-byte, for example) is given by the cpuid
instruction, and we did not use it because 1) it's unlikely that the
other fields of the task structure are modified while it's idle, and 2) the
processor needs to check the flag after mwait anyway, since other break
events can wake it up spuriously (i.e. mwait is only a hint).

Jun

> -----Original Message-----
> From: Linus Torvalds [mailto:[email protected]]
> Sent: Tuesday, July 08, 2003 4:34 PM
> To: Nakajima, Jun
> Cc: [email protected]; Saxena, Sunil; Mallick, Asit K;
> Pallipadi, Venkatesh
> Subject: Re: [PATCH] idle using PNI monitor/mwait
>
>
> On Tue, 8 Jul 2003, Nakajima, Jun wrote:
> >
> > Attached is a patch that enables PNI (Prescott New Instructions)
> > monitor/mwait in kernel idle (opcodes are now public). Basically MWAIT
> > is similar to hlt, but you can avoid IPI to wake up the processor
> > waiting. A write (by another processor) to the address range specified
> > by MONITOR would wake up the processor waiting on MWAIT.
>
> How about spinlocks? Does it make sense to make the contention code use
> mwait too, or are the latencies too high? Not that we have a lot of
> high-contention locks any more, so maybe it doesn't much matter.
>
> Also, wasn't there some flag to set the "mwait" granularity? I don't see
> anything like that in the patch..
>
> Linus

2003-07-09 06:38:16

by Zwane Mwaikambo

Subject: RE: [PATCH] idle using PNI monitor/mwait

On Tue, 8 Jul 2003, Nakajima, Jun wrote:

> That's right. If we have a lot of high-contention locks in the kernel,
> we need to fix the code first, to get benefits for the other
> architectures.
>
> "mwait" granularity (64-byte, for example) is given by the cpuid
> instruction, and we did not use it because 1) it's unlikely that the
> other fields of the task structure are modified when it's idle, 2) the
> processor needs to check the flag after mwait anyway, to avoid waking up
> with a false signal caused by other break events (i.e. mwait is a hint).

It could still be very handy for polling loops of the form:

while (!ready)
__asm__ ("pause;");

Jun, would there be any thermal advantages over using poll and pause?

Thanks,
Zwane
--
function.linuxpower.ca

2003-07-09 10:47:57

by Alan

Subject: Re: [PATCH] idle using PNI monitor/mwait

On Tue, 2003-07-08 at 22:23, Nakajima, Jun wrote:
> Hi Linus,
>
> Attached is a patch that enables PNI (Prescott New Instructions)
> monitor/mwait in kernel idle (opcodes are now public). Basically MWAIT
> is similar to hlt, but you can avoid IPI to wake up the processor
> waiting. A write (by another processor) to the address range specified
> by MONITOR would wake up the processor waiting on MWAIT.

Is mwait dependent on cached CPU memory and the cache exclusivity logic,
or directly on the processor? In other words, can I use mwait in future
to wait for DMA to hit a given location? I'm mostly thinking about
debugging uses.

2003-07-09 16:25:15

by Mallick, Asit K

Subject: RE: [PATCH] idle using PNI monitor/mwait

Alan,
MWAIT does not depend directly on the processor; any bus-master write to
the monitored range will wake up the MWAIT. So your example will also work.
Thanks,
Asit


> -----Original Message-----
> From: Alan Cox [mailto:[email protected]]
> Sent: Wednesday, July 09, 2003 4:00 AM
> To: Nakajima, Jun
> Cc: Linus Torvalds; Linux Kernel Mailing List; Saxena, Sunil;
> Mallick, Asit K; Pallipadi, Venkatesh
> Subject: Re: [PATCH] idle using PNI monitor/mwait
>
>
> On Tue, 2003-07-08 at 22:23, Nakajima, Jun wrote:
> > Hi Linus,
> >
> > Attached is a patch that enables PNI (Prescott New Instructions)
> > monitor/mwait in kernel idle (opcodes are now public). Basically MWAIT
> > is similar to hlt, but you can avoid IPI to wake up the processor
> > waiting. A write (by another processor) to the address range specified
> > by MONITOR would wake up the processor waiting on MWAIT.
>
> Is mwait dependant on cached cpu memory and the cache exclusivity logic
> or directly on the processor. In other words can I use mwait in future
> to wait for DMA to hit a given location ? - Im mostly thinking about
> debugging uses
>
>

2003-07-09 16:47:11

by Mallick, Asit K

Subject: RE: [PATCH] idle using PNI monitor/mwait

Linus,

We are analyzing the performance of using mwait in contention code. We
do not have all the data yet, and will let you know the benefit of using
mwait in contention code.
Thanks,
Asit


> -----Original Message-----
> From: Nakajima, Jun
> Sent: Tuesday, July 08, 2003 5:36 PM
> To: 'Linus Torvalds'
> Cc: [email protected]; Saxena, Sunil; Mallick,
> Asit K; Pallipadi, Venkatesh
> Subject: RE: [PATCH] idle using PNI monitor/mwait
>
>
> That's right. If we have a lot of high-contention locks in
> the kernel, we need to fix the code first, to get benefits
> for the other architectures.
>
> "mwait" granularity (64-byte, for example) is given by the
> cpuid instruction, and we did not use it because 1) it's
> unlikely that the other fields of the task structure are
> modified when it's idle, 2) the processor needs to check the
> flag after mwait anyway, to avoid waking up with a false
> signal caused by other break events (i.e. mwait is a hint).
>
> Jun
>

2003-07-10 01:02:58

by Saxena, Sunil

Subject: RE: [PATCH] idle using PNI monitor/mwait

There may be thermal advantages, but, as with "pause", they would be
implementation-specific.

Thanks
Sunil

-----Original Message-----
From: Zwane Mwaikambo [mailto:[email protected]]
Sent: Tuesday, July 08, 2003 11:42 PM
To: Nakajima, Jun
Cc: Linus Torvalds; [email protected]; Saxena, Sunil;
Mallick, Asit K; Pallipadi, Venkatesh
Subject: RE: [PATCH] idle using PNI monitor/mwait

On Tue, 8 Jul 2003, Nakajima, Jun wrote:

> That's right. If we have a lot of high-contention locks in the kernel,
> we need to fix the code first, to get benefits for the other
> architectures.
>
> "mwait" granularity (64-byte, for example) is given by the cpuid
> instruction, and we did not use it because 1) it's unlikely that the
> other fields of the task structure are modified when it's idle, 2) the
> processor needs to check the flag after mwait anyway, to avoid waking up
> with a false signal caused by other break events (i.e. mwait is a hint).

It could still be very handy for polling loops of the form;

while (!ready)
__asm__ ("pause;");

Jun would there be any thermal advantages over using poll and pause ?

Thanks,
Zwane
--
function.linuxpower.ca