2014-10-08 09:04:32

by Chris Wilson

Subject: i915.ko WC writes are slow after ea8596bb2d8d379


I ran into a problem on a Sandybridge i5-2500s whilst measuring the
performance of GTT write-combining access. I found subsequent runs were
about 10-40x slower than the first. For example,

igt/gem_gtt_speed:

Time to read 16k through a GTT map: 325.285µs
Time to write 16k through a GTT map: 4.729µs
Time to clear 16k through a GTT map: 4.584µs
Time to clear 16k through a cached GTT map: 1.342µs

on the second run became:

Time to read 16k through a GTT map: 332.148µs
Time to write 16k through a GTT map: 209.411µs
Time to clear 16k through a GTT map: 56.460µs
Time to clear 16k through a cached GTT map: 50.897µs

Naively I would say that we lost the wc on our ioremap.
/sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
runs.

A bisection pointed to

commit ea8596bb2d8d37957f3e92db9511c50801689180
Author: Masami Hiramatsu <[email protected]>
Date: Thu Jul 18 20:47:53 2013 +0900

kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions

of which the active ingredient was just

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b32ebf9..f4001e0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP

 config HAVE_TEXT_POKE_SMP
 	bool
-	select STOP_MACHINE if SMP

 config X86_DEV_DMA_OPS
 	bool

and adding that back into the current build, e.g.

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3632743..48a8a69 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -87,6 +87,7 @@ config X86
 	select HAVE_USER_RETURN_NOTIFIER
 	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
 	select HAVE_ARCH_JUMP_LABEL
+	select STOP_MACHINE
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
 	select SPARSE_IRQ
 	select GENERIC_FIND_FIRST_BIT

fixes the regression.

For the record, this kernel build doesn't use modules, which seems relevant
in light of ea8596bb2 "fixes a Kconfig dependency issue on STOP_MACHINE
in the case of CONFIG_SMP && !CONFIG_MODULE_UNLOAD".
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
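
To put numbers on the "10-40x" estimate, the two runs quoted above can be compared directly. This is only a sketch for the reader, not part of the original report; the dictionary and function names are made up for illustration, and the timings are copied from the gem_gtt_speed output in this message.

```python
# Timings in microseconds, copied from the two gem_gtt_speed runs above.
first_run = {
    "read": 325.285,
    "write": 4.729,
    "clear": 4.584,
    "clear (cached)": 1.342,
}
second_run = {
    "read": 332.148,
    "write": 209.411,
    "clear": 56.460,
    "clear (cached)": 50.897,
}


def slowdown(before, after):
    """Return after/before for every test that appears in both runs."""
    return {name: after[name] / before[name] for name in before if name in after}


ratios = slowdown(first_run, second_run)
for name, ratio in ratios.items():
    print(f"{name:15s} {ratio:6.1f}x")
```

Reads are essentially unchanged between the runs, while the write and clear paths are the ones that regress.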


2014-10-08 10:11:07

by Chuck Ebbert

Subject: Re: i915.ko WC writes are slow after ea8596bb2d8d379

On Wed, 8 Oct 2014 10:03:36 +0100
Chris Wilson <[email protected]> wrote:

>
> I ran into a problem on a Sandybridge i5-2500s whilst measuring the
> performance of GTT write-combining access. I found subsequent runs were
> about 10-40x slower than the first. For example,
>
> igt/gem_gtt_speed:
>
> Time to read 16k through a GTT map: 325.285µs
> Time to write 16k through a GTT map: 4.729µs
> Time to clear 16k through a GTT map: 4.584µs
> Time to clear 16k through a cached GTT map: 1.342µs
>
> on the second run became:
>
> Time to read 16k through a GTT map: 332.148µs
> Time to write 16k through a GTT map: 209.411µs
> Time to clear 16k through a GTT map: 56.460µs
> Time to clear 16k through a cached GTT map: 50.897µs
>
> Naively I would say that we lost the wc on our ioremap.
> /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> runs.
>
> A bisection pointed to
>
> commit ea8596bb2d8d37957f3e92db9511c50801689180
> Author: Masami Hiramatsu <[email protected]>
> Date: Thu Jul 18 20:47:53 2013 +0900
>
> kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions
>
> of which the active ingredient was just
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b32ebf9..f4001e0 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
>
> config HAVE_TEXT_POKE_SMP
> bool
> - select STOP_MACHINE if SMP
>
> config X86_DEV_DMA_OPS
> bool
>
> and adding that back into the current build, e.g.

Hmm, set_mtrr() uses stop_machine(). I wonder if your MTRRs are out of
sync and your results depend on which CPU the test runs on?

>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 3632743..48a8a69 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -87,6 +87,7 @@ config X86
> select HAVE_USER_RETURN_NOTIFIER
> select ARCH_BINFMT_ELF_RANDOMIZE_PIE
> select HAVE_ARCH_JUMP_LABEL
> + select STOP_MACHINE
> select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
> select SPARSE_IRQ
> select GENERIC_FIND_FIRST_BIT
>
> fixes the regression.
>
> For the record, this kernel build doesn't use modules, which seems relevant
> in light of ea8596bb2 "fixes a Kconfig dependency issue on STOP_MACHINE
> in the case of CONFIG_SMP && !CONFIG_MODULE_UNLOAD".

2014-10-08 17:47:12

by Chuck Ebbert

Subject: Re: i915.ko WC writes are slow after ea8596bb2d8d379

On Wed, 8 Oct 2014 10:03:36 +0100
Chris Wilson <[email protected]> wrote:

> and adding that back into the current build, e.g.
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 3632743..48a8a69 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -87,6 +87,7 @@ config X86
> select HAVE_USER_RETURN_NOTIFIER
> select ARCH_BINFMT_ELF_RANDOMIZE_PIE
> select HAVE_ARCH_JUMP_LABEL
> + select STOP_MACHINE
> select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
> select SPARSE_IRQ
> select GENERIC_FIND_FIRST_BIT
>
> fixes the regression.
>

Looking closer at this, it seems most configs work by accident,
because they have MOD_UNLOAD and/or HOTPLUG_CPU enabled. I take it
you disabled both of those? stop_machine() is called from all kinds
of places and almost none of them make sure STOP_MACHINE is selected.

$ find -name Kconf\* | xargs grep STOP_MACHINE
./init/Kconfig:config STOP_MACHINE

All these places use stop_machine():

mm/page_alloc.c, line 3886
drivers/xen/manage.c, line 130
drivers/char/hw_random/intel-rng.c, line 373
arch/powerpc/mm/numa.c:
line 1616
line 1623
arch/powerpc/platforms/powernv/subcore.c, line 324
arch/arm/kernel/kprobes.c, line 165
arch/arm/kernel/patch.c:
line 64
line 71
arch/s390/kernel/jump_label.c, line 61
arch/s390/kernel/kprobes.c:
line 311
line 320
arch/s390/kernel/time.c:
line 820
line 1590
arch/x86/kernel/cpu/mtrr/main.c, line 231
arch/arm64/kernel/insn.c, line 181
kernel/time/timekeeping.c, line 892
kernel/trace/ftrace.c, line 2219
kernel/module.c:
line 770
line 1861
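
The same survey can be repeated on any tree with a short script. The following is a rough sketch only: the function name scan_tree is hypothetical, it looks at *.c files and files whose name starts with "Kconf" (mirroring the find command above), and it will also report the definition of stop_machine() itself, not just callers.

```python
import os
import re

# Matches a call (or definition) of stop_machine(), but not
# stop_machine_from_inactive_cpu() or __stop_machine().
CALL_RE = re.compile(r"\bstop_machine\s*\(")
# Matches any mention of the STOP_MACHINE Kconfig symbol.
KCONFIG_RE = re.compile(r"\bSTOP_MACHINE\b")


def scan_tree(root):
    """Return (callers, kconfigs) as sorted lists of (relative path, line number)."""
    callers, kconfigs = [], []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            is_kconfig = name.startswith("Kconf")
            if not (is_kconfig or name.endswith(".c")):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, errors="replace") as f:
                for lineno, line in enumerate(f, 1):
                    if is_kconfig and KCONFIG_RE.search(line):
                        kconfigs.append((rel, lineno))
                    elif not is_kconfig and CALL_RE.search(line):
                        callers.append((rel, lineno))
    return sorted(callers), sorted(kconfigs)
```

Calling scan_tree(".") from the top of a kernel checkout gives both lists; a long callers list next to a short kconfigs list is the mismatch described above.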

2014-10-08 19:50:46

by Chris Wilson

Subject: Re: i915.ko WC writes are slow after ea8596bb2d8d379

On Wed, Oct 08, 2014 at 05:10:59AM -0500, Chuck Ebbert wrote:
> On Wed, 8 Oct 2014 10:03:36 +0100
> Chris Wilson <[email protected]> wrote:
>
> >
> > I ran into a problem on a Sandybridge i5-2500s whilst measuring the
> > performance of GTT write-combining access. I found subsequent runs were
> > about 10-40x slower than the first. For example,
> >
> > igt/gem_gtt_speed:
> >
> > Time to read 16k through a GTT map: 325.285µs
> > Time to write 16k through a GTT map: 4.729µs
> > Time to clear 16k through a GTT map: 4.584µs
> > Time to clear 16k through a cached GTT map: 1.342µs
> >
> > on the second run became:
> >
> > Time to read 16k through a GTT map: 332.148µs
> > Time to write 16k through a GTT map: 209.411µs
> > Time to clear 16k through a GTT map: 56.460µs
> > Time to clear 16k through a cached GTT map: 50.897µs
> >
> > Naively I would say that we lost the wc on our ioremap.
> > /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> > runs.
> >
> > A bisection pointed to
> >
> > commit ea8596bb2d8d37957f3e92db9511c50801689180
> > Author: Masami Hiramatsu <[email protected]>
> > Date: Thu Jul 18 20:47:53 2013 +0900
> >
> > kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions
> >
> > of which the active ingredient was just
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index b32ebf9..f4001e0 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
> >
> > config HAVE_TEXT_POKE_SMP
> > bool
> > - select STOP_MACHINE if SMP
> >
> > config X86_DEV_DMA_OPS
> > bool
> >
> > and adding that back into the current build, e.g.
>
> Hmm, set_mtrr() uses stop_machine(). I wonder if your MTRRs are out of
> sync and your results depend on which CPU the test runs on?

Indeed, this appears to be the explanation. (And here I thought PAT
superseded mtrrs - i915.ko stopped trying to assign an mtrr for its
GTT quite a while ago.)

Replacing the stop_machine there with on_each_cpu does the trick:

diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index f961de9..c0e37d5 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -151,7 +151,7 @@ struct set_mtrr_data {
  *
  * Returns nothing.
  */
-static int mtrr_rendezvous_handler(void *info)
+static void mtrr_rendezvous_handler(void *info)
 {
 	struct set_mtrr_data *data = info;

@@ -174,7 +174,6 @@ static int mtrr_rendezvous_handler(void *info)
 	} else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
 		mtrr_if->set_all();
 	}
-	return 0;
 }

 static inline int types_compatible(mtrr_type type1, mtrr_type type2)
@@ -228,7 +227,7 @@ set_mtrr(unsigned int reg, unsigned long base, unsigned long size, mtrr_type typ
 		.smp_type = type
 	};

-	stop_machine(mtrr_rendezvous_handler, &data, cpu_online_mask);
+	on_each_cpu_mask(cpu_online_mask, mtrr_rendezvous_handler, &data, true);
 }

 static void set_mtrr_from_inactive_cpu(unsigned int reg, unsigned long base,
@@ -240,8 +239,7 @@ static void set_mtrr_from_inactive_cpu(unsigned int reg, unsigned long base,
 		.smp_type = type
 	};

-	stop_machine_from_inactive_cpu(mtrr_rendezvous_handler, &data,
-				       cpu_callout_mask);
+	on_each_cpu_mask(cpu_callout_mask, mtrr_rendezvous_handler, &data, true);
 }

 /**

--
Chris Wilson, Intel Open Source Technology Centre

2014-10-08 21:37:26

by H. Peter Anvin

Subject: Re: i915.ko WC writes are slow after ea8596bb2d8d379

On 10/08/2014 12:49 PM, Chris Wilson wrote:
>
> Indeed, this appears to be the explanation. (And here I thought PAT
> superseded mtrrs - i915.ko stopped trying to assign an mtrr for its
> GTT quite a while ago.)
>
> Replacing the stop_machine there with on_each_cpu does the trick:
>

It should, but there seem to be quite a few drivers which still muck
with MTRRs. However, i915 is not one of them, it calls
io_mapping_create_wc() followed by arch_phys_wc_add(), so I'm wondering
what the heck is going on here.

> Naively I would say that we lost the wc on our ioremap.
> /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> runs.

Could you tell me what the above looks like?

-hpa



Subject: Re: i915.ko WC writes are slow after ea8596bb2d8d379

(2014/10/09 2:47), Chuck Ebbert wrote:
> On Wed, 8 Oct 2014 10:03:36 +0100
> Chris Wilson <[email protected]> wrote:
>
>> and adding that back into the current build, e.g.
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 3632743..48a8a69 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -87,6 +87,7 @@ config X86
>> select HAVE_USER_RETURN_NOTIFIER
>> select ARCH_BINFMT_ELF_RANDOMIZE_PIE
>> select HAVE_ARCH_JUMP_LABEL
>> + select STOP_MACHINE
>> select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>> select SPARSE_IRQ
>> select GENERIC_FIND_FIRST_BIT
>>
>> fixes the regression.
>>
>
> Looking closer at this, it seems most configs work by accident,
> because they have MOD_UNLOAD and/or HOTPLUG_CPU enabled. I take it
> you disabled both of those? stop_machine() is called from all kinds
> of places and almost none of them make sure STOP_MACHINE is selected.

I guess most of them expect that stop_machine() is not a configurable
feature...
If any of them requires stop_machine(), it should enable it in its
Kconfig entry (including ftrace, kprobes).

> $ find -name Kconf\* | xargs grep STOP_MACHINE
> ./init/Kconfig:config STOP_MACHINE
>
> All these places use stop_machine():
>
> mm/page_alloc.c, line 3886
> drivers/xen/manage.c, line 130
> drivers/char/hw_random/intel-rng.c, line 373
> arch/powerpc/mm/numa.c:
> line 1616
> line 1623
> arch/powerpc/platforms/powernv/subcore.c, line 324
> arch/arm/kernel/kprobes.c, line 165
> arch/arm/kernel/patch.c:
> line 64
> line 71
> arch/s390/kernel/jump_label.c, line 61
> arch/s390/kernel/kprobes.c:
> line 311
> line 320
> arch/s390/kernel/time.c:
> line 820
> line 1590
> arch/x86/kernel/cpu/mtrr/main.c, line 231
> arch/arm64/kernel/insn.c, line 181
> kernel/time/timekeeping.c, line 892
> kernel/trace/ftrace.c, line 2219
> kernel/module.c:
> line 770
> line 1861
>

BTW, with the series of patches I sent, the last two (in kernel/module.c) can be removed.
https://lkml.org/lkml/2014/8/25/142

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2014-10-09 06:54:32

by Chris Wilson

Subject: Re: i915.ko WC writes are slow after ea8596bb2d8d379

On Wed, Oct 08, 2014 at 02:36:49PM -0700, H. Peter Anvin wrote:
> On 10/08/2014 12:49 PM, Chris Wilson wrote:
> >
> > Indeed, this appears to be the explanation. (And here I thought PAT
> > superseded mtrrs - i915.ko stopped trying to assign an mtrr for its
> > GTT quite a while ago.)
> >
> > Replacing the stop_machine there with on_each_cpu does the trick:
> >
>
> It should, but there seem to be quite a few drivers which still muck
> with MTRRs. However, i915 is not one of them, it calls
> io_mapping_create_wc() followed by arch_phys_wc_add(), so I'm wondering
> what the heck is going on here.

This system also has a radeon GPU. Disabling it (not building in the
module) makes no difference to the wc speed.

> > Naively I would say that we lost the wc on our ioremap.
> > /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> > runs.
>
> Could you tell me what the above looks like?

# cat /sys/kernel/debug/x86/pat_memtype_list
PAT memtype list:
write-back @ 0x8cf34000-0x8cf43000
write-back @ 0x8cf4d000-0x8cf4e000
write-back @ 0x8cf4d000-0x8cf50000
write-back @ 0x8cf50000-0x8cf51000
write-back @ 0x8cf51000-0x8cf52000
write-back @ 0x8cf52000-0x8cf53000
write-back @ 0x8cf53000-0x8cf55000
write-back @ 0x8cf55000-0x8cf56000
write-back @ 0x8cf9d000-0x8cf9e000
write-back @ 0x8cf9f000-0x8cfa0000
write-back @ 0x8cffc000-0x8cffd000
uncached-minus @ 0x8fc00000-0x8fe00000
write-combining @ 0x8fe00000-0x90000000
uncached-minus @ 0x90220000-0x90240000
uncached-minus @ 0x90300000-0x90320000
uncached-minus @ 0x90340000-0x90341000
uncached-minus @ 0x90380000-0x90381000
write-combining @ 0xa0000000-0xc0000000
write-combining @ 0xa0139000-0xa0159000
write-combining @ 0xa0159000-0xa0179000
write-combining @ 0xa0179000-0xa0199000
write-combining @ 0xc0040000-0xc025e000
write-combining @ 0xc025e000-0xc045e000
write-combining @ 0xc045e000-0xc045f000
write-combining @ 0xc045f000-0xc075f000
uncached-minus @ 0xf8000000-0xfc000000
uncached-minus @ 0xfed00000-0xfed01000
uncached-minus @ 0xfed10000-0xfed16000
uncached-minus @ 0xfed1f000-0xfed20000

(identical for good/bad runs)

# cat /proc/mtrr
reg00: base=0x000000000 ( 0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 256MB, count=1: write-back
reg02: base=0x08e000000 ( 2272MB), size= 32MB, count=1: uncachable
reg03: base=0x08d000000 ( 2256MB), size= 16MB, count=1: uncachable
reg04: base=0x100000000 ( 4096MB), size= 2048MB, count=1: write-back
reg05: base=0x170000000 ( 5888MB), size= 256MB, count=1: uncachable
reg06: base=0x16f000000 ( 5872MB), size= 16MB, count=1: uncachable
reg07: base=0x16e800000 ( 5864MB), size= 8MB, count=1: uncachable
reg08: base=0x16e600000 ( 5862MB), size= 2MB, count=1: uncachable

# cat /proc/iomem
00000000-00000fff : reserved
00001000-0009bbff : System RAM
0009bc00-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000cdfff : Video ROM
000d0000-000d3fff : PCI Bus 0000:00
000d4000-000d7fff : PCI Bus 0000:00
000d8000-000dbfff : PCI Bus 0000:00
000dc000-000dffff : PCI Bus 0000:00
000e0000-000fffff : reserved
000e0000-000e3fff : PCI Bus 0000:00
000e4000-000e7fff : PCI Bus 0000:00
000f0000-000fffff : System ROM
00100000-1fffffff : System RAM
01000000-0161981b : Kernel code
0161981c-01ca20ff : Kernel data
01dac000-01e2dfff : Kernel bss
20000000-201fffff : reserved
20000000-201fffff : pnp 00:05
20200000-3fffffff : System RAM
40000000-401fffff : reserved
40000000-401fffff : pnp 00:05
40200000-8ccd2fff : System RAM
8ccd3000-8cd66fff : reserved
8cd67000-8cfe6fff : ACPI Non-volatile Storage
8cfe7000-8cffefff : ACPI Tables
8cfff000-8cffffff : System RAM
8d000000-8f9fffff : reserved
8da00000-8f9fffff : Graphics Stolen Memory
8fa00000-feafffff : PCI Bus 0000:00
8fa00000-8fa00fff : pnp 00:03
8fc00000-8fffffff : 0000:00:02.0
90000000-900fffff : PCI Bus 0000:04
90000000-900fffff : PCI Bus 0000:05
90000000-90003fff : 0000:05:00.0
90010000-900107ff : 0000:05:00.0
90100000-901fffff : PCI Bus 0000:03
90100000-90101fff : 0000:03:00.0
90200000-902fffff : PCI Bus 0000:01
90200000-9021ffff : 0000:01:00.0
90220000-9023ffff : 0000:01:00.0
90240000-90243fff : 0000:01:00.1
90300000-9031ffff : 0000:00:19.0
90300000-9031ffff : e1000e
90330000-903300ff : 0000:00:1f.3
90340000-903407ff : 0000:00:1f.2
90340000-903407ff : ahci
90350000-903503ff : 0000:00:1d.0
90360000-90363fff : 0000:00:1b.0
90370000-903703ff : 0000:00:1a.0
90380000-90380fff : 0000:00:19.0
90380000-90380fff : e1000e
90390000-90390fff : 0000:00:16.3
903a0000-903a000f : 0000:00:16.0
a0000000-bfffffff : 0000:00:02.0
c0000000-cfffffff : PCI Bus 0000:01
c0000000-cfffffff : 0000:01:00.0
f8000000-fbffffff : PCI MMCONFIG 0000 [bus 00-3f]
f8000000-fbffffff : reserved
f8000000-fbffffff : pnp 00:03
fec00000-fec00fff : reserved
fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
fed00000-fed003ff : PNP0103:00
fed10000-fed13fff : reserved
fed18000-fed19fff : reserved
fed18000-fed18fff : pnp 00:03
fed19000-fed19fff : pnp 00:03
fed1c000-fed1ffff : reserved
fed1c000-fed1ffff : pnp 00:03
fed20000-fed3ffff : pnp 00:03
fed40000-fed44fff : PCI Bus 0000:00
fed45000-fed8ffff : pnp 00:03
fed90000-fed93fff : pnp 00:03
fee00000-fee00fff : Local APIC
fee00000-fee00fff : reserved
ff000000-ffffffff : INT0800:00
ff980000-ffbfffff : reserved
ffd80000-ffffffff : reserved
100000000-16e5fffff : System RAM
16e600000-16fffffff : RAM buffer

# lspci -vv -s 0:0:2
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 2210
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 26
Region 0: Memory at 8fc00000 (64-bit, non-prefetchable) [size=4M]
Region 2: Memory at a0000000 (64-bit, prefetchable) [size=512M]
Region 4: I/O ports at 3000 [size=64]
Expansion ROM at <unassigned> [disabled]
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee0f00c Data: 41b1
Capabilities: [d0] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a4] PCI Advanced Features
AFCap: TP+ FLR+
AFCtrl: FLR-
AFStatus: TP-
Kernel driver in use: i915

--
Chris Wilson, Intel Open Source Technology Centre
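
One way to read the /proc/mtrr dump above is to ask which variable-range entries, if any, cover a given physical address such as the start of the GTT aperture (Region 2 at 0xa0000000). The sketch below is an illustration only: parse_mtrr and covering are hypothetical helper names, the text is copied from this message, and it reflects what the kernel reports rather than what each CPU actually has programmed.

```python
import re

# /proc/mtrr contents copied from the message above.
PROC_MTRR = """\
reg00: base=0x000000000 ( 0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 256MB, count=1: write-back
reg02: base=0x08e000000 ( 2272MB), size= 32MB, count=1: uncachable
reg03: base=0x08d000000 ( 2256MB), size= 16MB, count=1: uncachable
reg04: base=0x100000000 ( 4096MB), size= 2048MB, count=1: write-back
reg05: base=0x170000000 ( 5888MB), size= 256MB, count=1: uncachable
reg06: base=0x16f000000 ( 5872MB), size= 16MB, count=1: uncachable
reg07: base=0x16e800000 ( 5864MB), size= 8MB, count=1: uncachable
reg08: base=0x16e600000 ( 5862MB), size= 2MB, count=1: uncachable
"""

LINE_RE = re.compile(
    r"reg(\d+): base=0x([0-9a-f]+) \(\s*\d+MB\), size=\s*(\d+)MB, count=\d+: (\S+)"
)


def parse_mtrr(text):
    """Return a list of (reg, start, end, type) with end exclusive."""
    entries = []
    for m in LINE_RE.finditer(text):
        start = int(m.group(2), 16)
        size = int(m.group(3)) << 20  # sizes are printed in MB
        entries.append((int(m.group(1)), start, start + size, m.group(4)))
    return entries


def covering(entries, addr):
    """Return (reg, type) for every entry whose range contains addr."""
    return [(reg, mtype) for reg, start, end, mtype in entries if start <= addr < end]


entries = parse_mtrr(PROC_MTRR)
print(covering(entries, 0xA0000000))  # start of the GTT aperture
print(covering(entries, 0x8E000000))  # inside the reserved range below the PCI window
```

With this dump, no variable-range entry covers 0xa0000000, so only the MTRR default type and the PAT entry apply to the aperture.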

2014-10-09 12:52:51

by Chuck Ebbert

Subject: Re: i915.ko WC writes are slow after ea8596bb2d8d379

On Thu, 9 Oct 2014 07:53:31 +0100
Chris Wilson <[email protected]> wrote:

> # cat /proc/mtrr
> reg00: base=0x000000000 ( 0MB), size= 2048MB, count=1: write-back
> reg01: base=0x080000000 ( 2048MB), size= 256MB, count=1: write-back
> reg02: base=0x08e000000 ( 2272MB), size= 32MB, count=1: uncachable
> reg03: base=0x08d000000 ( 2256MB), size= 16MB, count=1: uncachable
> reg04: base=0x100000000 ( 4096MB), size= 2048MB, count=1: write-back
> reg05: base=0x170000000 ( 5888MB), size= 256MB, count=1: uncachable
> reg06: base=0x16f000000 ( 5872MB), size= 16MB, count=1: uncachable
> reg07: base=0x16e800000 ( 5864MB), size= 8MB, count=1: uncachable
> reg08: base=0x16e600000 ( 5862MB), size= 2MB, count=1: uncachable
>

Well that's what the kernel thinks is in every CPU.
Could you try installing x86info and running "x86info --mtrr
--all-cpus" while running the broken kernel?

2014-10-09 13:01:42

by Chris Wilson

Subject: Re: i915.ko WC writes are slow after ea8596bb2d8d379

On Thu, Oct 09, 2014 at 07:44:16AM -0500, Chuck Ebbert wrote:
> Could you try installing x86info and running "x86info --mtrr
> --all-cpus" while running the broken kernel?

# /opt/xorg/src/intel-gpu-tools/tests/gem_gtt_speed
IGT-Version: 1.8-g32a0308 (x86_64) (Linux: 3.17.0+ x86_64)
Time to read 16k through a GTT map: 318.643µs
Time to write 16k through a GTT map: 203.103µs
Time to clear 16k through a GTT map: 53.098µs
Time to clear 16k through a cached GTT map: 49.925µs

(i.e. bad kernel)

# x86info --mtrr --all-cpus
x86info v1.30. Dave Jones 2001-2011
Feedback to <[email protected]>.

Found 4 CPUs.
CPU #1:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Unknown model.
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 0x1, vcnt field: 0x0a (10))
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase field:0x000000 type field: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase field:0x080000 type field: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase field:0x08e000 type field: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask field:0xffe000 valid flag: 1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase field:0x08d000 type field: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase field:0x100000 type field: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase field:0x170000 type field: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase field:0x16f000 type field: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase field:0x16e800 type field: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask field:0xfff800 valid flag: 1)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag: 0x1, mtrr flag: 0x1, type field: 0x00 (uncacheable))

--------------------------------------------------------------------------

CPU #2:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Unknown model.
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 0x1, vcnt field: 0x0a (10))
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase field:0x000000 type field: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase field:0x080000 type field: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase field:0x08e000 type field: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask field:0xffe000 valid flag: 1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase field:0x08d000 type field: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase field:0x100000 type field: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase field:0x170000 type field: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase field:0x16f000 type field: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase field:0x16e800 type field: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask field:0xfff800 valid flag: 1)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag: 0x1, mtrr flag: 0x1, type field: 0x00 (uncacheable))

--------------------------------------------------------------------------
CPU #3:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Unknown model.
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 0x1, vcnt field: 0x0a (10))
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase field:0x000000 type field: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase field:0x080000 type field: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase field:0x08e000 type field: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask field:0xffe000 valid flag: 1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase field:0x08d000 type field: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase field:0x100000 type field: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase field:0x170000 type field: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase field:0x16f000 type field: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase field:0x16e800 type field: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask field:0xfff800 valid flag: 1)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag: 0x1, mtrr flag: 0x1, type field: 0x00 (uncacheable))

--------------------------------------------------------------------------
CPU #4:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Unknown model.
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 0x1, vcnt field: 0x0a (10))
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase field:0x000000 type field: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase field:0x080000 type field: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase field:0x08e000 type field: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask field:0xffe000 valid flag: 1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase field:0x08d000 type field: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase field:0x100000 type field: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase field:0x170000 type field: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase field:0x16f000 type field: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase field:0x16e800 type field: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask field:0xfff800 valid flag: 1)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag: 0x1, mtrr flag: 0x1, type field: 0x00 (uncacheable))

--------------------------------------------------------------------------
Total processor threads: 4
This system has 1 dual-core processor with hyper-threading (2 threads per core) running at an estimated 3.30GHz

--
Chris Wilson, Intel Open Source Technology Centre
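
The raw register values in the dump above can be cross-checked against /proc/mtrr by decoding them by hand. The following is a sketch under stated assumptions: the function names are hypothetical, a 36-bit physical address width is assumed (inferred from the 0xf leading nibble of the masks), and the size calculation only holds for contiguous masks.

```python
PHYS_ADDR_BITS = 36  # assumption: inferred from the mask values in the dump


def decode_mtrrcap(value):
    """Split IA32_MTRRCAP into the fields x86info prints."""
    return {
        "vcnt": value & 0xFF,            # number of variable-range MTRR pairs
        "fix": bool(value & (1 << 8)),   # fixed-range MTRRs supported
        "wc": bool(value & (1 << 10)),   # write-combining type supported
        "smrr": bool(value & (1 << 11)), # SMRR supported
    }


def decode_variable_mtrr(phys_base, phys_mask, addr_bits=PHYS_ADDR_BITS):
    """Decode one MTRRphysBase/MTRRphysMask pair into base, size and type."""
    all_bits = (1 << addr_bits) - 1
    addr_mask = all_bits & ~0xFFF          # address fields start at bit 12
    mask = phys_mask & addr_mask
    size = (~mask & all_bits) + 1          # valid for contiguous masks only
    return {
        "valid": bool(phys_mask & (1 << 11)),
        "type": phys_base & 0xFF,          # 0x06 write-back, 0x00 uncacheable
        "base": phys_base & addr_mask,
        "size_mb": size >> 20,
    }


cap = decode_mtrrcap(0x0000000000000D0A)
pair0 = decode_variable_mtrr(0x0000000000000006, 0x0000000F80000800)
pair7 = decode_variable_mtrr(0x000000016E800000, 0x0000000FFF800800)
print(cap)
print(pair0)
print(pair7)
```

Pair 0 decodes to a 2048MB write-back range at 0 and pair 7 to an 8MB uncacheable range at 0x16e800000, matching reg00 and reg07 in /proc/mtrr. MTRRcap reports a vcnt of 10, while the dump lists only pairs 0 through 7.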

2014-10-09 14:46:51

by Chuck Ebbert

Subject: Re: i915.ko WC writes are slow after ea8596bb2d8d379

On Thu, 9 Oct 2014 14:00:47 +0100
Chris Wilson <[email protected]> wrote:

> On Thu, Oct 09, 2014 at 07:44:16AM -0500, Chuck Ebbert wrote:
> > Could you try installing x86info and running "x86info --mtrr
> > --all-cpus" while running the broken kernel?
>
> # /opt/xorg/src/intel-gpu-tools/tests/gem_gtt_speed
> IGT-Version: 1.8-g32a0308 (x86_64) (Linux: 3.17.0+ x86_64)
> Time to read 16k through a GTT map: 318.643µs
> Time to write 16k through a GTT map: 203.103µs
> Time to clear 16k through a GTT map: 53.098µs
> Time to clear 16k through a cached GTT map: 49.925µs
>
> (i.e. bad kernel)
>
> # x86info --mtrr --all-cpus
> x86info v1.30. Dave Jones 2001-2011
> Feedback to <[email protected]>.
>
> Found 4 CPUs.
> CPU #1:
> Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
> Type: 0 (Original OEM)
> CPU Model (x86info's best guess): Unknown model.
> Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz
>
> MTRR registers:
> MTRRcap (0xfe): 0x0000000000000d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 0x1, vcnt field: 0x0a (10))
> MTRRphysBase0 (0x200): 0x0000000000000006 (physbase field:0x000000 type field: 0x06 (write-back))
> MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
> MTRRphysBase1 (0x202): 0x0000000080000006 (physbase field:0x080000 type field: 0x06 (write-back))
> MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
> MTRRphysBase2 (0x204): 0x000000008e000000 (physbase field:0x08e000 type field: 0x00 (uncacheable))
> MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask field:0xffe000 valid flag: 1)
> MTRRphysBase3 (0x206): 0x000000008d000000 (physbase field:0x08d000 type field: 0x00 (uncacheable))
> MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
> MTRRphysBase4 (0x208): 0x0000000100000006 (physbase field:0x100000 type field: 0x06 (write-back))
> MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask field:0xf80000 valid flag: 1)
> MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase field:0x170000 type field: 0x00 (uncacheable))
> MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask field:0xff0000 valid flag: 1)
> MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase field:0x16f000 type field: 0x00 (uncacheable))
> MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask field:0xfff000 valid flag: 1)
> MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase field:0x16e800 type field: 0x00 (uncacheable))
> MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask field:0xfff800 valid flag: 1)
> MTRRfix64K_00000 (0x250): 0x0606060606060606
> MTRRfix16K_80000 (0x258): 0x0606060606060606
> MTRRfix16K_A0000 (0x259): 0x0000000000000000
> MTRRfix4K_C8000 (0x269): 0x0505050505050505
> MTRRfix4K_D0000 0x26a: 0x0000000000000000
> MTRRfix4K_D8000 0x26b: 0x0000000000000000
> MTRRfix4K_E0000 0x26c: 0x0000000000000000
> MTRRfix4K_E8000 0x26d: 0x0505050505050505
> MTRRfix4K_F0000 0x26e: 0x0505050505050505
> MTRRfix4K_F8000 0x26f: 0x0505050505050505
> MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag: 0x1, mtrr flag: 0x1, type field: 0x00 (uncacheable))
>

<snip>

Well they're all the same.

Hmm, x86info is not dumping all the variable MTRRs. You have 10, but
it only prints the first 8. I don't know if it will show anything
different, but can you try fixing it with this patch?

--- a/mtrr.c
+++ b/mtrr.c
@@ -75,19 +75,23 @@
printf("0x%016llx\n", val);
}

-static void decode_mtrrcap(int cpu, int msr)
+unsigned int decode_mtrrcap(int cpu, int msr)
{
unsigned long long val;
+ unsigned int vcnt = 0;
int ret;

ret = mtrr_value(cpu,msr,&val);
if (ret) {
+ vcnt = (unsigned int)(val & IA32_MTRRCAP_VCNT);
printf("0x%016llx ", val);
printf("(smrr flag: 0x%01x, ",(unsigned int) (val & IA32_MTRRCAP_SMRR) >> 11 );
printf("wc flag: 0x%01x, ",(unsigned int) (val&IA32_MTRRCAP_WC) >> 10);
printf("fix flag: 0x%01x, ",(unsigned int) (val&IA32_MTRRCAP_FIX) >> 8);
- printf("vcnt field: 0x%02x (%d))\n",(unsigned int) (val&IA32_MTRRCAP_VCNT) , (int) (val&IA32_MTRRCAP_VCNT));
+ printf("vcnt field: 0x%02x (%u))\n", vcnt, vcnt);
}
+
+ return vcnt;
}

static void decode_mtrr_deftype(int cpu, int msr)
@@ -142,7 +146,7 @@
void dump_mtrrs(struct cpudata *cpu)
{
unsigned long long val = 0;
- unsigned int i;
+ unsigned int i, vcnt;

if (!(cpu->flags_edx & (X86_FEATURE_MTRR)))
return;
@@ -157,11 +161,11 @@
printf("MTRR registers:\n");

printf("MTRRcap (0xfe): ");
- decode_mtrrcap(cpu->number, 0xfe);
+ vcnt = decode_mtrrcap(cpu->number, 0xfe);

set_max_phy_addr(cpu);

- for (i = 0; i < 16; i+=2) {
+ for (i = 0; i < 2 * vcnt; i += 2) {
printf("MTRRphysBase%u (0x%x): ", i/2, (unsigned int) 0x200+i);
decode_mtrr_physbase(cpu->number, 0x200+i);
printf("MTRRphysMask%u (0x%x): ", i/2, (unsigned int) 0x201+i);

2014-10-09 15:15:46

by Chris Wilson

Subject: Re: i915.ko WC writes are slow after ea8596bb2d8d379

On Thu, Oct 09, 2014 at 09:46:37AM -0500, Chuck Ebbert wrote:
> Well they're all the same.
>
> Hmm, x86info is not dumping all the variable MTRRs. You have 10, but
> it only prints the first 8. I don't know if it will show anything
> different, but can you try fixing it with this patch?

Source (https://github.com/dankamongmen/x86info) was slightly different,
but I followed the drift.

tl;dr: MTRRs 8 and 9 appear to be identical on all CPUs as well.

$ sudo ./x86info --mtrr --all-cpus
x86info v1.31pre
Found 4 CPUs.
CPU #1:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Core i7 (SandyBridge)
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a wc:1 fix:1 vcnt:10
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase:0x000000 type: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase:0x080000 type: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase:0x08e000 type: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask:0xffe000 valid:1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase:0x08d000 type: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase:0x100000 type: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase:0x170000 type: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase:0x16f000 type: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase:0x16e800 type: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask:0xfff800 valid:1)
MTRRphysBase8 (0x210): 0x000000016e600000 (physbase:0x16e600 type: 0x00 (uncacheable))
MTRRphysMask8 (0x211): 0x0000000fffe00800 (physmask:0xfffe00 valid:1)
MTRRphysBase9 (0x212): 0x0000000000000000 (physbase:0x000000 type: 0x00 (uncacheable))
MTRRphysMask9 (0x213): 0x0000000000000000 (physmask:0x000000 valid:0)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag:1 enable flag:1 default type:0x00 (uncacheable))

--------------------------------------------------------------------------
CPU #2:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Core i7 (SandyBridge)
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a wc:1 fix:1 vcnt:10
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase:0x000000 type: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase:0x080000 type: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase:0x08e000 type: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask:0xffe000 valid:1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase:0x08d000 type: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase:0x100000 type: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase:0x170000 type: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase:0x16f000 type: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase:0x16e800 type: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask:0xfff800 valid:1)
MTRRphysBase8 (0x210): 0x000000016e600000 (physbase:0x16e600 type: 0x00 (uncacheable))
MTRRphysMask8 (0x211): 0x0000000fffe00800 (physmask:0xfffe00 valid:1)
MTRRphysBase9 (0x212): 0x0000000000000000 (physbase:0x000000 type: 0x00 (uncacheable))
MTRRphysMask9 (0x213): 0x0000000000000000 (physmask:0x000000 valid:0)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag:1 enable flag:1 default type:0x00 (uncacheable))

--------------------------------------------------------------------------
CPU #3:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Core i7 (SandyBridge)
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a wc:1 fix:1 vcnt:10
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase:0x000000 type: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase:0x080000 type: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase:0x08e000 type: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask:0xffe000 valid:1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase:0x08d000 type: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase:0x100000 type: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase:0x170000 type: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase:0x16f000 type: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase:0x16e800 type: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask:0xfff800 valid:1)
MTRRphysBase8 (0x210): 0x000000016e600000 (physbase:0x16e600 type: 0x00 (uncacheable))
MTRRphysMask8 (0x211): 0x0000000fffe00800 (physmask:0xfffe00 valid:1)
MTRRphysBase9 (0x212): 0x0000000000000000 (physbase:0x000000 type: 0x00 (uncacheable))
MTRRphysMask9 (0x213): 0x0000000000000000 (physmask:0x000000 valid:0)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag:1 enable flag:1 default type:0x00 (uncacheable))

--------------------------------------------------------------------------
CPU #4:
Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Core i7 (SandyBridge)
Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz

MTRR registers:
MTRRcap (0xfe): 0x0000000000000d0a wc:1 fix:1 vcnt:10
MTRRphysBase0 (0x200): 0x0000000000000006 (physbase:0x000000 type: 0x06 (write-back))
MTRRphysMask0 (0x201): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase1 (0x202): 0x0000000080000006 (physbase:0x080000 type: 0x06 (write-back))
MTRRphysMask1 (0x203): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase2 (0x204): 0x000000008e000000 (physbase:0x08e000 type: 0x00 (uncacheable))
MTRRphysMask2 (0x205): 0x0000000ffe000800 (physmask:0xffe000 valid:1)
MTRRphysBase3 (0x206): 0x000000008d000000 (physbase:0x08d000 type: 0x00 (uncacheable))
MTRRphysMask3 (0x207): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase4 (0x208): 0x0000000100000006 (physbase:0x100000 type: 0x06 (write-back))
MTRRphysMask4 (0x209): 0x0000000f80000800 (physmask:0xf80000 valid:1)
MTRRphysBase5 (0x20a): 0x0000000170000000 (physbase:0x170000 type: 0x00 (uncacheable))
MTRRphysMask5 (0x20b): 0x0000000ff0000800 (physmask:0xff0000 valid:1)
MTRRphysBase6 (0x20c): 0x000000016f000000 (physbase:0x16f000 type: 0x00 (uncacheable))
MTRRphysMask6 (0x20d): 0x0000000fff000800 (physmask:0xfff000 valid:1)
MTRRphysBase7 (0x20e): 0x000000016e800000 (physbase:0x16e800 type: 0x00 (uncacheable))
MTRRphysMask7 (0x20f): 0x0000000fff800800 (physmask:0xfff800 valid:1)
MTRRphysBase8 (0x210): 0x000000016e600000 (physbase:0x16e600 type: 0x00 (uncacheable))
MTRRphysMask8 (0x211): 0x0000000fffe00800 (physmask:0xfffe00 valid:1)
MTRRphysBase9 (0x212): 0x0000000000000000 (physbase:0x000000 type: 0x00 (uncacheable))
MTRRphysMask9 (0x213): 0x0000000000000000 (physmask:0x000000 valid:0)
MTRRfix64K_00000 (0x250): 0x0606060606060606
MTRRfix16K_80000 (0x258): 0x0606060606060606
MTRRfix16K_A0000 (0x259): 0x0000000000000000
MTRRfix4K_C8000 (0x269): 0x0505050505050505
MTRRfix4K_D0000 0x26a: 0x0000000000000000
MTRRfix4K_D8000 0x26b: 0x0000000000000000
MTRRfix4K_E0000 0x26c: 0x0000000000000000
MTRRfix4K_E8000 0x26d: 0x0505050505050505
MTRRfix4K_F0000 0x26e: 0x0505050505050505
MTRRfix4K_F8000 0x26f: 0x0505050505050505
MTRRdefType (0x2ff): 0x0000000000000c00 (fixed-range flag:1 enable flag:1 default type:0x00 (uncacheable))

--------------------------------------------------------------------------
Total processor threads: 4
This system has 1 dual-core processor with hyper-threading (2 threads per core) running at an estimated 3.30GHz

--
Chris Wilson, Intel Open Source Technology Centre

2015-11-18 14:50:05

by Chris Wilson

Subject: Re: i915.ko WC writes are slow after ea8596bb2d8d379

On Wed, Oct 08, 2014 at 05:10:59AM -0500, Chuck Ebbert wrote:
> On Wed, 8 Oct 2014 10:03:36 +0100
> Chris Wilson <[email protected]> wrote:
>
> >
> > I ran into a problem on a Sandybridge i5-2500s whilst measuring the
> > performance of GTT write-combining access. I found subsequent runs were
> > about 10-40x slower than the first. For example,
> >
> > igt/gem_gtt_speed:
> >
> > Time to read 16k through a GTT map: 325.285µs
> > Time to write 16k through a GTT map: 4.729µs
> > Time to clear 16k through a GTT map: 4.584µs
> > Time to clear 16k through a cached GTT map: 1.342µs
> >
> > on the second run became:
> >
> > Time to read 16k through a GTT map: 332.148µs
> > Time to write 16k through a GTT map: 209.411µs
> > Time to clear 16k through a GTT map: 56.460µs
> > Time to clear 16k through a cached GTT map: 50.897µs
> >
> > Naively I would say that we lost the wc on our ioremap.
> > /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> > runs.
> >
> > A bisection pointed to
> >
> > commit ea8596bb2d8d37957f3e92db9511c50801689180
> > Author: Masami Hiramatsu <[email protected]>
> > Date: Thu Jul 18 20:47:53 2013 +0900
> >
> > kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions
> >
> > of which the active ingredient was just
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index b32ebf9..f4001e0 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
> >
> > config HAVE_TEXT_POKE_SMP
> > bool
> > - select STOP_MACHINE if SMP
> >
> > config X86_DEV_DMA_OPS
> > bool
> >
> > and adding that back into the current build, e.g.
>
> Hmm, set_mtrr() uses stop_machine(). I wonder if your MTRRs are out of
> sync and your results depend on which CPU the test runs on?

(From the other reply, it did and is still required).

I have run into other issues where stop_machine() only runs an
irq-disabled callback on the local CPU, as opposed to halting all CPUs
and running the callback on every one of them.

My understanding is that the root cause of the issue is:

diff --git a/init/Kconfig b/init/Kconfig
index af09b4f..8235e0b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1993,8 +1993,7 @@ config INIT_ALL_POSSIBLE

config STOP_MACHINE
bool
- default y
- depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
+ default y if SMP || HOTPLUG_CPU
help
Need stop_machine() primitive.

Although

diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index d2abbdb..ff4f029 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -97,7 +97,7 @@ static inline int try_stop_cpus(const struct cpumask *cpumask,
* grabbing every spinlock (and more). So the "read" side to such a
* lock is anything which disables preemption.
*/
-#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
+#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)

/**
* stop_machine: freeze the machine on all CPUs and run this function
@@ -128,7 +128,7 @@ int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
int stop_machine_from_inactive_cpu(int (*fn)(void *), void *data,
const struct cpumask *cpus);

-#else /* CONFIG_STOP_MACHINE && CONFIG_SMP */
+#else /* CONFIG_SMP */

static inline int __stop_machine(int (*fn)(void *), void *data,
const struct cpumask *cpus)
@@ -153,5 +153,5 @@ static inline int stop_machine_from_inactive_cpu(int (*fn)(void *), void *data,
return __stop_machine(fn, data, cpus);
}

-#endif /* CONFIG_STOP_MACHINE && CONFIG_SMP */
+#endif /* CONFIG_SMP || CONFIG_HOTPLUG_CPU */
#endif /* _LINUX_STOP_MACHINE */
diff --git a/init/Kconfig b/init/Kconfig
index af09b4f..44600a8 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1991,13 +1991,6 @@ config INIT_ALL_POSSIBLE
it was better to provide this option than to break all the archs
and have several arch maintainers pursuing me down dark alleys.

-config STOP_MACHINE
- bool
- default y
- depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
- help
- Need stop_machine() primitive.
-
source "block/Kconfig"

config PREEMPT_NOTIFIERS
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index fd643d8..2dd1f306 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -513,7 +513,7 @@ static int __init cpu_stop_init(void)
}
early_initcall(cpu_stop_init);

-#ifdef CONFIG_STOP_MACHINE
+#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)

int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
{
@@ -613,4 +613,4 @@ int stop_machine_from_inactive_cpu(int (*fn)(void *), void *data,
return ret ?: done.ret;
}

-#endif /* CONFIG_STOP_MACHINE */
+#endif /* CONFIG_SMP || CONFIG_HOTPLUG_CPU */

may be more apt.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2015-11-18 15:57:56

by Andy Lutomirski

Subject: Re: i915.ko WC writes are slow after ea8596bb2d8d379

On Wed, Nov 18, 2015 at 6:48 AM, Chris Wilson <[email protected]> wrote:
> Although
>
> diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
> index d2abbdb..ff4f029 100644
> --- a/include/linux/stop_machine.h
> +++ b/include/linux/stop_machine.h
> @@ -97,7 +97,7 @@ static inline int try_stop_cpus(const struct cpumask *cpumask,
> * grabbing every spinlock (and more). So the "read" side to such a
> * lock is anything which disables preemption.
> */
> -#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
> +#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)

[...]


This seems much better. Having a set of stop_machine functions around
that don't work depending on config seems dangerous.

--Andy

2015-11-19 08:14:29

by Ingo Molnar

Subject: Re: i915.ko WC writes are slow after ea8596bb2d8d379


* Andy Lutomirski <[email protected]> wrote:

> On Wed, Nov 18, 2015 at 6:48 AM, Chris Wilson <[email protected]> wrote:
> > Although
> >
> > diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
> > index d2abbdb..ff4f029 100644
> > --- a/include/linux/stop_machine.h
> > +++ b/include/linux/stop_machine.h
> > @@ -97,7 +97,7 @@ static inline int try_stop_cpus(const struct cpumask *cpumask,
> > * grabbing every spinlock (and more). So the "read" side to such a
> > * lock is anything which disables preemption.
> > */
> > -#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
> > +#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)
>
> [...]
>
> This seems much better. Having a set of stop_machine functions around
> that don't work depending on config seems dangerous.

Agreed.

Acked-by: Ingo Molnar <[email protected]>

Thanks,

Ingo

2015-11-19 08:16:09

by Ingo Molnar

Subject: Re: i915.ko WC writes are slow after ea8596bb2d8d379


* Chris Wilson <[email protected]> wrote:

> > > A bisection pointed to
> > >
> > > commit ea8596bb2d8d37957f3e92db9511c50801689180
> > > Author: Masami Hiramatsu <[email protected]>
> > > Date: Thu Jul 18 20:47:53 2013 +0900
> > >
> > > kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions
> > >
> > > of which the active ingredient was just
> > >
> > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > index b32ebf9..f4001e0 100644
> > > --- a/arch/x86/Kconfig
> > > +++ b/arch/x86/Kconfig
> > > @@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
> > >
> > > config HAVE_TEXT_POKE_SMP
> > > bool
> > > - select STOP_MACHINE if SMP

Ouch...

This is certainly an educative example of how pure 'code removal' patches can have
unintended side effects.

Is there a full fix patch available, and is anyone pushing that to Linus?

Thanks,

Ingo

2015-11-19 10:05:30

by Chris Wilson

Subject: [PATCH] kernel: Remove stop_machine() Kconfig dependency

Currently the full stop_machine() routine is only enabled on SMP if
module unloading is enabled, or if the CPUs are hotpluggable. This leads
to configurations where stop_machine() is broken as it will then only
run the callback on the local CPU with irqs disabled, and not stop the
other CPUs or run the callback on them. For example, this breaks MTRR
setup on x86 in certain configs since

commit ea8596bb2d8d37957f3e92db9511c50801689180
Author: Masami Hiramatsu <[email protected]>
Date: Thu Jul 18 20:47:53 2013 +0900

kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions

as the MTRR is only established on the boot CPU.

This patch removes the Kconfig option for STOP_MACHINE and uses the SMP
and HOTPLUG_CPU config options to compile the correct stop_machine() for
the architecture, removing the false dependency on MODULE_UNLOAD in the
process.

Link: https://lkml.org/lkml/2014/10/8/124
References: https://bugs.freedesktop.org/show_bug.cgi?id=84794
Signed-off-by: Chris Wilson <[email protected]>
Preemptively-Acked-by: Ingo Molnar <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Pranith Kumar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Iulia Manda <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Chuck Ebbert <[email protected]>
---
include/linux/stop_machine.h | 6 +++---
init/Kconfig | 7 -------
kernel/stop_machine.c | 4 ++--
3 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 0adedca24c5b..0e1b1540597a 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -99,7 +99,7 @@ static inline int try_stop_cpus(const struct cpumask *cpumask,
* grabbing every spinlock (and more). So the "read" side to such a
* lock is anything which disables preemption.
*/
-#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
+#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)

/**
* stop_machine: freeze the machine on all CPUs and run this function
@@ -118,7 +118,7 @@ int stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus);

int stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data,
const struct cpumask *cpus);
-#else /* CONFIG_STOP_MACHINE && CONFIG_SMP */
+#else /* CONFIG_SMP || CONFIG_HOTPLUG_CPU */

static inline int stop_machine(cpu_stop_fn_t fn, void *data,
const struct cpumask *cpus)
@@ -137,5 +137,5 @@ static inline int stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data,
return stop_machine(fn, data, cpus);
}

-#endif /* CONFIG_STOP_MACHINE && CONFIG_SMP */
+#endif /* CONFIG_SMP || CONFIG_HOTPLUG_CPU */
#endif /* _LINUX_STOP_MACHINE */
diff --git a/init/Kconfig b/init/Kconfig
index c24b6f767bf0..235c7a2c0d20 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2030,13 +2030,6 @@ config INIT_ALL_POSSIBLE
it was better to provide this option than to break all the archs
and have several arch maintainers pursuing me down dark alleys.

-config STOP_MACHINE
- bool
- default y
- depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
- help
- Need stop_machine() primitive.
-
source "block/Kconfig"

config PREEMPT_NOTIFIERS
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 867bc20e1ef1..a3bbaee77c58 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -531,7 +531,7 @@ static int __init cpu_stop_init(void)
}
early_initcall(cpu_stop_init);

-#ifdef CONFIG_STOP_MACHINE
+#if defined(CONFIG_SMP) || defined(CONFIG_HOTPLUG_CPU)

static int __stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus)
{
@@ -631,4 +631,4 @@ int stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data,
return ret ?: done.ret;
}

-#endif /* CONFIG_STOP_MACHINE */
+#endif /* CONFIG_SMP || CONFIG_HOTPLUG_CPU */
--
2.6.2