2013-07-30 19:15:07

by Ilari Stenroth

Subject: arch/x86/kernel/cpu/intel.c needs an update for Haswell?

Hi,

Does somebody know why arch/x86/kernel/cpu/intel.c has
tlb_flushall_shift detection logic for Ivy Bridge CPU family but not for
Haswell? Maybe intel_cacheinfo.c needs to be checked for Haswell updates
too.

Regards,
Ilari Stenroth


2013-07-30 19:35:33

by Borislav Petkov

Subject: Re: arch/x86/kernel/cpu/intel.c needs an update for Haswell?

On Tue, Jul 30, 2013 at 09:50:49PM +0300, Ilari Stenroth wrote:
> Does somebody know why arch/x86/kernel/cpu/intel.c has
> tlb_flushall_shift detection logic for Ivy Bridge CPU family but not
> for Haswell? Maybe intel_cacheinfo.c needs to be checked for Haswell
> updates too.

Because someone needs to sit down and write it. Oh, and more
importantly, test it on real hardware.

:-)

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-30 19:44:16

by Ilari Stenroth

Subject: Re: arch/x86/kernel/cpu/intel.c needs an update for Haswell?

On 30.7.2013 22.35, Borislav Petkov wrote:
> On Tue, Jul 30, 2013 at 09:50:49PM +0300, Ilari Stenroth wrote:
>> Does somebody know why arch/x86/kernel/cpu/intel.c has
>> tlb_flushall_shift detection logic for Ivy Bridge CPU family but not
>> for Haswell? Maybe intel_cacheinfo.c needs to be checked for Haswell
>> updates too.
>
> Because someone needs to sit down and write it. Oh, and more
> importantly, test it on real hardware.
>
> :-)
>

Right :-) Can volunteer to test, only once I get a motherboard bug
fixed. It runs only one core. Poor Supermicro X10SLH-F thinks Xeon
E3-1265Lv3 has 1C2T :-/

Regs,
Ilari Stenroth

2013-07-30 19:54:06

by Borislav Petkov

Subject: Re: arch/x86/kernel/cpu/intel.c needs an update for Haswell?

On Tue, Jul 30, 2013 at 10:44:02PM +0300, Ilari Stenroth wrote:
> On 30.7.2013 22.35, Borislav Petkov wrote:
> > On Tue, Jul 30, 2013 at 09:50:49PM +0300, Ilari Stenroth wrote:
> >> Does somebody know why arch/x86/kernel/cpu/intel.c has
> >> tlb_flushall_shift detection logic for Ivy Bridge CPU family but not
> >> for Haswell? Maybe intel_cacheinfo.c needs to be checked for Haswell
> >> updates too.
> >
> > Because someone needs to sit down and write it. Oh, and more
> > importantly, test it on real hardware.
> >
> > :-)
> >
> Right :-) Can volunteer to test, only once I get a motherboard bug
> fixed. It runs only one core. Poor Supermicro X10SLH-F thinks Xeon
> E3-1265Lv3 has 1C2T :-/

Yeah, if I had to guess, I'd say the highest probability is for patches
about it to be coming from Alex. :)

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-08-01 08:54:55

by Alex Shi

Subject: Re: arch/x86/kernel/cpu/intel.c needs an update for Haswell?

On 07/31/2013 03:54 AM, Borislav Petkov wrote:
> On Tue, Jul 30, 2013 at 10:44:02PM +0300, Ilari Stenroth wrote:
>> On 30.7.2013 22.35, Borislav Petkov wrote:
>>> On Tue, Jul 30, 2013 at 09:50:49PM +0300, Ilari Stenroth wrote:
>>>> Does somebody know why arch/x86/kernel/cpu/intel.c has
>>>> tlb_flushall_shift detection logic for Ivy Bridge CPU family but not
>>>> for Haswell? Maybe intel_cacheinfo.c needs to be checked for Haswell
>>>> updates too.
>>>
>>> Because someone needs to sit down and write it. Oh, and more
>>> importantly, test it on real hardware.
>>>
>>> :-)
>>>
>> Right :-) Can volunteer to test, only once I get a motherboard bug
>> fixed. It runs only one core. Poor Supermicro X10SLH-F thinks Xeon
>> E3-1265Lv3 has 1C2T :-/
>
> Yeah, if I had to guess, I'd say the highest probability is for patches
> about it to be coming from Alex. :)
>

Just borrowed a Haswell laptop and ran the munmap case on it. :)
The CPU is 2 cores * HT. The test shows tlb_flushall_shift = 1 has the best
performance.

tlb_flushall_shift is 1
=============== t = 2
munmap use 243ms 14889ns/time, memory access uses 336949 times/thread/ms, cost 2ns/time
munmap use 152ms 18662ns/time, memory access uses 336561 times/thread/ms, cost 2ns/time
munmap use 60ms 14835ns/time, memory access uses 198710 times/thread/ms, cost 5ns/time
munmap use 41ms 20030ns/time, memory access uses 208748 times/thread/ms, cost 4ns/time
munmap use 21ms 20995ns/time, memory access uses 191849 times/thread/ms, cost 5ns/time
munmap use 21ms 41909ns/time, memory access uses 296545 times/thread/ms, cost 3ns/time
=============== t = 4
munmap use 468ms 14287ns/time, memory access uses 72088 times/thread/ms, cost 13ns/time
munmap use 286ms 17488ns/time, memory access uses 65232 times/thread/ms, cost 15ns/time
munmap use 210ms 25746ns/time, memory access uses 97080 times/thread/ms, cost 10ns/time
munmap use 66ms 16138ns/time, memory access uses 56450 times/thread/ms, cost 17ns/time
munmap use 51ms 25323ns/time, memory access uses 41930 times/thread/ms, cost 23ns/time
munmap use 44ms 43599ns/time, memory access uses 53031 times/thread/ms, cost 18ns/time
munmap use 28ms 56011ns/time, memory access uses 36889 times/thread/ms, cost 27ns/time
=============== t = 8
munmap use 2429ms 74138ns/time, memory access uses 42202 times/thread/ms, cost 23ns/time
munmap use 1079ms 65880ns/time, memory access uses 41497 times/thread/ms, cost 24ns/time
munmap use 623ms 76108ns/time, memory access uses 47844 times/thread/ms, cost 20ns/time
munmap use 387ms 94619ns/time, memory access uses 34652 times/thread/ms, cost 28ns/time
munmap use 90ms 44180ns/time, memory access uses 26498 times/thread/ms, cost 37ns/time
munmap use 49ms 47903ns/time, memory access uses 33863 times/thread/ms, cost 29ns/time
munmap use 26ms 51164ns/time, memory access uses 31491 times/thread/ms, cost 31ns/time

tlb_flushall_shift is -1
=============== t = 2
munmap use 418ms 12766ns/time, memory access uses 124215 times/thread/ms, cost 8ns/time
munmap use 184ms 11271ns/time, memory access uses 36519 times/thread/ms, cost 27ns/time
munmap use 116ms 14177ns/time, memory access uses 112472 times/thread/ms, cost 8ns/time
munmap use 66ms 16347ns/time, memory access uses 137546 times/thread/ms, cost 7ns/time
munmap use 43ms 21087ns/time, memory access uses 47053 times/thread/ms, cost 21ns/time
munmap use 31ms 30787ns/time, memory access uses 202638 times/thread/ms, cost 4ns/time
munmap use 22ms 43187ns/time, memory access uses 255272 times/thread/ms, cost 3ns/time
=============== t = 4
munmap use 572ms 17483ns/time, memory access uses 54936 times/thread/ms, cost 18ns/time
munmap use 481ms 29360ns/time, memory access uses 71397 times/thread/ms, cost 14ns/time
munmap use 168ms 20575ns/time, memory access uses 59827 times/thread/ms, cost 16ns/time
munmap use 73ms 18062ns/time, memory access uses 34687 times/thread/ms, cost 28ns/time
munmap use 42ms 20581ns/time, memory access uses 48571 times/thread/ms, cost 20ns/time
munmap use 46ms 45261ns/time, memory access uses 43408 times/thread/ms, cost 23ns/time
munmap use 21ms 41828ns/time, memory access uses 49751 times/thread/ms, cost 20ns/time
=============== t = 8
munmap use 1761ms 53756ns/time, memory access uses 40636 times/thread/ms, cost 24ns/time
munmap use 238ms 14541ns/time, memory access uses 19968 times/thread/ms, cost 50ns/time
munmap use 262ms 31988ns/time, memory access uses 31964 times/thread/ms, cost 31ns/time
munmap use 127ms 31086ns/time, memory access uses 35674 times/thread/ms, cost 28ns/time
munmap use 73ms 35764ns/time, memory access uses 23482 times/thread/ms, cost 42ns/time
munmap use 59ms 58406ns/time, memory access uses 36680 times/thread/ms, cost 27ns/time
munmap use 20ms 40608ns/time, memory access uses 26733 times/thread/ms, cost 37ns/time

------
From 1322ea9e17ad4d9e49e2d93cfc04805368e28273 Mon Sep 17 00:00:00 2001
From: Alex Shi <[email protected]>
Date: Thu, 1 Aug 2013 16:30:23 +0800
Subject: [PATCH 2/2] tlb/tlb_flushall_shift: add haswell tlb_flush_shift

Tested on i5 4350U with munmap case, https://lkml.org/lkml/2012/5/17/59
The best performance is at tlb_flushall_shift = 1.
The balance point is 256 entries.

Signed-off-by: Alex Shi <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 9a4bc51..ac9b83a 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -627,6 +627,7 @@ static void intel_tlb_flushall_shift_set(struct cpuinfo_x86 *c)
tlb_flushall_shift = 5;
break;
case 0x63a: /* Ivybridge */
+ case 0x645: /* Haswell */
tlb_flushall_shift = 1;
break;
case 0x63e: /* Ivybridge EP */
--
1.7.12

--
Thanks
Alex

2013-08-01 09:09:19

by Alex Shi

Subject: Re: arch/x86/kernel/cpu/intel.c needs an update for Haswell?

On 08/01/2013 04:53 PM, Alex Shi wrote:
> ------
> From 1322ea9e17ad4d9e49e2d93cfc04805368e28273 Mon Sep 17 00:00:00 2001
> From: Alex Shi <[email protected]>
> Date: Thu, 1 Aug 2013 16:30:23 +0800
> Subject: [PATCH 2/2] tlb/tlb_flushall_shift: add haswell tlb_flush_shift
>
> Tested on i5 4350U with munmap case, https://lkml.org/lkml/2012/5/17/59
> The best performance is at tlb_flushall_shift = 1.
> The balance point is 256 entries.

Before the above patch, I also added a tlb_flushall_shift value for the IVB EP
CPU. Testing shows the best performance at 2. The box has 12 cores * HT * 2S.

test command:
for t in 12 24 48 96; do
    echo "=============== t = $t "
    for i in 8 16 32 64 128 256 512; do
        sudo ./munmap -t "$t" -n "$i"
    done
done

Detailed results follow:
tlb_flushall_shift = 2;
=============== t = 12
munmap use 516ms 15768ns/time, memory access uses 120232 times/thread/ms, cost 8ns/time
munmap use 297ms 18157ns/time, memory access uses 114378 times/thread/ms, cost 8ns/time
munmap use 175ms 21371ns/time, memory access uses 96932 times/thread/ms, cost 10ns/time
munmap use 115ms 28270ns/time, memory access uses 100961 times/thread/ms, cost 9ns/time
munmap use 90ms 44421ns/time, memory access uses 91293 times/thread/ms, cost 10ns/time
munmap use 28ms 27384ns/time, memory access uses 100032 times/thread/ms, cost 9ns/time
munmap use 20ms 40723ns/time, memory access uses 114393 times/thread/ms, cost 8ns/time
=============== t = 24
munmap use 700ms 21380ns/time, memory access uses 119336 times/thread/ms, cost 8ns/time
munmap use 398ms 24338ns/time, memory access uses 78586 times/thread/ms, cost 12ns/time
munmap use 215ms 26264ns/time, memory access uses 83551 times/thread/ms, cost 11ns/time
munmap use 148ms 36289ns/time, memory access uses 61251 times/thread/ms, cost 16ns/time
munmap use 117ms 57573ns/time, memory access uses 83114 times/thread/ms, cost 12ns/time
munmap use 34ms 33767ns/time, memory access uses 82493 times/thread/ms, cost 12ns/time
munmap use 25ms 50686ns/time, memory access uses 68961 times/thread/ms, cost 14ns/time
=============== t = 48
munmap use 1250ms 38153ns/time, memory access uses 35963 times/thread/ms, cost 27ns/time
munmap use 582ms 35563ns/time, memory access uses 34776 times/thread/ms, cost 28ns/time
munmap use 348ms 42544ns/time, memory access uses 33767 times/thread/ms, cost 29ns/time
munmap use 200ms 49034ns/time, memory access uses 31150 times/thread/ms, cost 32ns/time
munmap use 140ms 68527ns/time, memory access uses 28236 times/thread/ms, cost 35ns/time
munmap use 44ms 43445ns/time, memory access uses 33564 times/thread/ms, cost 29ns/time
munmap use 27ms 54053ns/time, memory access uses 34163 times/thread/ms, cost 29ns/time
=============== t = 96
munmap use 5189ms 158378ns/time, memory access uses 17812 times/thread/ms, cost 56ns/time
munmap use 1236ms 75476ns/time, memory access uses 17563 times/thread/ms, cost 56ns/time
munmap use 628ms 76755ns/time, memory access uses 16746 times/thread/ms, cost 59ns/time
munmap use 319ms 77978ns/time, memory access uses 15956 times/thread/ms, cost 62ns/time
munmap use 258ms 126385ns/time, memory access uses 15307 times/thread/ms, cost 65ns/time
munmap use 130ms 127057ns/time, memory access uses 16644 times/thread/ms, cost 60ns/time
munmap use 31ms 61663ns/time, memory access uses 14797 times/thread/ms, cost 67ns/time

tlb_flushall_shift = -1; // always flush the whole TLB, in every scenario
=============== t = 12
munmap use 485ms 14815ns/time, memory access uses 96048 times/thread/ms, cost 10ns/time
munmap use 232ms 14167ns/time, memory access uses 83143 times/thread/ms, cost 12ns/time
munmap use 133ms 16252ns/time, memory access uses 96413 times/thread/ms, cost 10ns/time
munmap use 67ms 16489ns/time, memory access uses 86718 times/thread/ms, cost 11ns/time
munmap use 46ms 22943ns/time, memory access uses 105914 times/thread/ms, cost 9ns/time
munmap use 29ms 28740ns/time, memory access uses 92108 times/thread/ms, cost 10ns/time
munmap use 20ms 40128ns/time, memory access uses 110841 times/thread/ms, cost 9ns/time
=============== t = 24
munmap use 590ms 18022ns/time, memory access uses 81828 times/thread/ms, cost 12ns/time
munmap use 336ms 20526ns/time, memory access uses 80119 times/thread/ms, cost 12ns/time
munmap use 189ms 23125ns/time, memory access uses 48884 times/thread/ms, cost 20ns/time
munmap use 104ms 25607ns/time, memory access uses 83410 times/thread/ms, cost 11ns/time
munmap use 54ms 26795ns/time, memory access uses 49105 times/thread/ms, cost 20ns/time
munmap use 29ms 29079ns/time, memory access uses 94668 times/thread/ms, cost 10ns/time
munmap use 25ms 49228ns/time, memory access uses 80346 times/thread/ms, cost 12ns/time
=============== t = 48
munmap use 1000ms 30541ns/time, memory access uses 35379 times/thread/ms, cost 28ns/time
munmap use 540ms 33010ns/time, memory access uses 32934 times/thread/ms, cost 30ns/time
munmap use 326ms 39891ns/time, memory access uses 32601 times/thread/ms, cost 30ns/time
munmap use 143ms 35140ns/time, memory access uses 32842 times/thread/ms, cost 30ns/time
munmap use 91ms 44713ns/time, memory access uses 32021 times/thread/ms, cost 31ns/time
munmap use 44ms 43337ns/time, memory access uses 32962 times/thread/ms, cost 30ns/time
munmap use 29ms 56936ns/time, memory access uses 31399 times/thread/ms, cost 31ns/time
=============== t = 96
munmap use 4551ms 138892ns/time, memory access uses 17208 times/thread/ms, cost 58ns/time
munmap use 776ms 47383ns/time, memory access uses 16560 times/thread/ms, cost 60ns/time
munmap use 513ms 62707ns/time, memory access uses 16478 times/thread/ms, cost 60ns/time
munmap use 184ms 45111ns/time, memory access uses 16368 times/thread/ms, cost 61ns/time
munmap use 205ms 100519ns/time, memory access uses 16631 times/thread/ms, cost 60ns/time
munmap use 47ms 46059ns/time, memory access uses 15144 times/thread/ms, cost 66ns/time
munmap use 34ms 66474ns/time, memory access uses 13951 times/thread/ms, cost 71ns/time


-----------
From 6fb21a9ce475cfc6c7c39bdfd3d9422be24cdb74 Mon Sep 17 00:00:00 2001
From: Alex Shi <[email protected]>
Date: Wed, 31 Jul 2013 16:28:42 +0800
Subject: [PATCH 1/2] x86/tlb_flushall_shift: add Ivybridge EP CPU support

Tested with munmap.c on an Ivybridge EP 2S machine. The best shift value is
2, which means that when fewer than 64 TLB entries need flushing, single
invlpg has a performance benefit on this machine.
The test case comes from: https://lkml.org/lkml/2012/5/17/59
Results show a performance increase of about 5% to 30%.

Signed-off-by: Alex Shi <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index ec72995..9a4bc51 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -629,6 +629,9 @@ static void intel_tlb_flushall_shift_set(struct cpuinfo_x86 *c)
case 0x63a: /* Ivybridge */
tlb_flushall_shift = 1;
break;
+ case 0x63e: /* Ivybridge EP */
+ tlb_flushall_shift = 2;
+ break;
default:
tlb_flushall_shift = 6;
}
--
1.7.12

2013-08-05 02:48:48

by Alex Shi

Subject: Re: arch/x86/kernel/cpu/intel.c needs an update for Haswell?

On 08/01/2013 05:07 PM, Alex Shi wrote:
> From 6fb21a9ce475cfc6c7c39bdfd3d9422be24cdb74 Mon Sep 17 00:00:00 2001
> From: Alex Shi <[email protected]>
> Date: Wed, 31 Jul 2013 16:28:42 +0800
> Subject: [PATCH 1/2] x86/tlb_flushall_shift: add Ivybridge EP CPU support
>
> Tested with munmap.c on an Ivybridge EP 2S machine. The best shift value is
> 2, which means that when fewer than 64 TLB entries need flushing, single
> invlpg has a performance benefit on this machine.
> The test case comes from: https://lkml.org/lkml/2012/5/17/59
> Results show a performance increase of about 5% to 30%.
>
> Signed-off-by: Alex Shi <[email protected]>
> ---

Any comments are appreciated!

--
Thanks
Alex

2013-08-05 02:49:53

by Alex Shi

Subject: Re: arch/x86/kernel/cpu/intel.c needs an update for Haswell?

On 08/01/2013 04:53 PM, Alex Shi wrote:
> From 1322ea9e17ad4d9e49e2d93cfc04805368e28273 Mon Sep 17 00:00:00 2001
> From: Alex Shi <[email protected]>
> Date: Thu, 1 Aug 2013 16:30:23 +0800
> Subject: [PATCH 2/2] tlb/tlb_flushall_shift: add haswell tlb_flush_shift
>
> Tested on i5 4350U with munmap case, https://lkml.org/lkml/2012/5/17/59
> The best performance is at tlb_flushall_shift = 1.
> The balance point is 256 entries.
>
> Signed-off-by: Alex Shi <[email protected]>


Any comments on this, Peter? :)
> ---
> arch/x86/kernel/cpu/intel.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 9a4bc51..ac9b83a 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -627,6 +627,7 @@ static void intel_tlb_flushall_shift_set(struct cpuinfo_x86 *c)
> tlb_flushall_shift = 5;
> break;
> case 0x63a: /* Ivybridge */
> + case 0x645: /* Haswell */
> tlb_flushall_shift = 1;
> break;
> case 0x63e: /* Ivybridge EP */


--
Thanks
Alex

2013-08-05 02:59:28

by H. Peter Anvin

Subject: Re: arch/x86/kernel/cpu/intel.c needs an update for Haswell?

On 08/04/2013 07:48 PM, Alex Shi wrote:
>
> Any comments on this, Peter? :)
>

Sounds like you have done the measurements, so it seems fine.

-hpa

2013-08-17 20:53:46

by Ilari Stenroth

Subject: Re: arch/x86/kernel/cpu/intel.c needs an update for Haswell?

On 5.8.2013 5.59, H. Peter Anvin wrote:
> On 08/04/2013 07:48 PM, Alex Shi wrote:
>>
>> Any comments on this, Peter? :)
>>
>
> Sounds like you have done the measurements, so it seems fine.
>
> -hpa
>

I've been using these patches. No complaints.
Should the patches get pulled to linux-next tree for wider testing?

--
Ilari Stenroth