2011-03-03 11:03:15

by Yasunori Goto

[permalink] [raw]
Subject: Strange minor page fault repeats when SPECjbb2005 is executed


Hello.

I found an abnormal kernel behavior when I executed SPECjbb2005.
It is that minor page fault repeats even though page table entry is correct.

If you know something about cause of this phenomenon, please let me know.
If nobody knows, please reproduce it and find why it is happen.


The details of this problem are the followings.

When SPECjbb2005 is executed, it sometimes stop displaying message of
progress suddenly.
-------------
:
:
116.160: [GC 406278K->193826K(2071552K), 0.0117750 secs]
116.342: [GC 406626K->194122K(2071616K), 0.0118000 secs]
116.522: [GC
^
It stop showing message from here.
--------------------

This condition keeps for some time, then SPECjbb2005 finishes.
The stop time is from about 5 seconds to an hours. It may tend to be longer
on bigger box, but I'm not sure.

In addition, the stop condition is sometimes released when I executed other
unrelated command.



When this phenomenon happen, I saw that too many minor page faults occur
on the java process.

Here is output of /proc/<pid>/stat of the java process.

Usually, the minor fault value is under 1 million on my current environment
like the following.
------------------
1744 (java) S 1742 1742 1721 34816 1742 4202496 968826 0 35834 0 218734 ...
^^^^^^
minor fault value

------------------

But, when SPECjbb2005 stops with this phenomenon, this value increases
sharply.
----

2065 (java) S 2063 2063 1721 34816 2063 4202496 157573157 0 1 0 .....
^^^^^^^^^
2065 (java) S 2063 2063 1721 34816 2063 4202496 231388697 0 1 0 ......
^^^^^^^^^

2065 (java) S 2063 2063 1721 34816 2063 4202496 438851940 0 1 0 ....
^^^^^^^^^

2065 (java) S 2063 2063 1721 34816 2063 4202496 524209252 0 1 0 ...
^^^^^^^^^
-----

I checked page table entry status at handle_mm_fault by debugfs,
then even ptes are correct, page fault repeated on same virtual address
and same cpus. And I think page table entry looks correct.

In this log, cpu4 and 6 repeat page faults.
----
handle_mm_fault jiffies64=4295160616 cpu=4 address=40019a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=55171067
handle_mm_fault jiffies64=4295160616 cpu=6 address=40003a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=551ef067
handle_mm_fault jiffies64=4295160616 cpu=6 address=40003a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=551ef067
handle_mm_fault jiffies64=4295160616 cpu=4 address=40019a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=55171067
handle_mm_fault jiffies64=4295160616 cpu=4 address=40019a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=55171067
:
:
------------
(Here is the patch to display the above log by debugfs.)

---------------
Index: linux-2.6.38-rc5/mm/memory.c
===================================================================
--- linux-2.6.38-rc5.orig/mm/memory.c
+++ linux-2.6.38-rc5/mm/memory.c
@@ -3333,6 +3333,8 @@ int handle_mm_fault(struct mm_struct *mm
*/
pte = pte_offset_map(pmd, address);

+ pr_debug("%s jiffies64=%lld cpu=%d address=%08lx pmdval=%016llx ptehigh=%08lx ptelow=%08lx\n",__func__, get_jiffies_64(), smp_processor_id(), address, pmd->pmd, pte->pte_high, pte->pte_low);
+
return handle_pte_fault(mm, vma, address, pte, pmd, flags);
}

---------------

I confirmed this phenomenon is reproduced on 2.6.31 and 2.6.38-rc5
of x86 kernel, and I heard this phenomenon doesn't occur on
x86-64 kernel from another engineer who found this problem first.

In addition, this phenomenon occurred on 4 boxes, so I think the cause
is not hardware malfunction.


Here is the run.sh of SPECjbb2005 when it is reproduced on my current box.
----------
date
echo $CLASSPATH
CLASSPATH=./jbb.jar:./jbb_no_precompile.jar:./check.jar:./reporter.jar:$CLASSPATH
echo $CLASSPATH
export CLASSPATH

/usr/bin/java -XX:+UseParallelGC -ms2032m -mx2032m -XX:-AlwaysPreTouch -verbosegc
spec.jbb.JBBmain -propfile SPECjbb.props_WH1-8

-----------


Though I'll continue to chase the cause of this problem, please help to solve this...

Thanks.

--
Yasunori Goto


2011-03-03 16:50:24

by Rik van Riel

[permalink] [raw]
Subject: Re: Strange minor page fault repeats when SPECjbb2005 is executed

On 03/03/2011 06:01 AM, Yasunori Goto wrote:

> In this log, cpu4 and 6 repeat page faults.
> ----
> handle_mm_fault jiffies64=4295160616 cpu=4 address=40019a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=55171067
> handle_mm_fault jiffies64=4295160616 cpu=6 address=40003a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=551ef067
> handle_mm_fault jiffies64=4295160616 cpu=6 address=40003a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=551ef067
> handle_mm_fault jiffies64=4295160616 cpu=4 address=40019a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=55171067
> handle_mm_fault jiffies64=4295160616 cpu=4 address=40019a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=55171067

> I confirmed this phenomenon is reproduced on 2.6.31 and 2.6.38-rc5
> of x86 kernel, and I heard this phenomenon doesn't occur on
> x86-64 kernel from another engineer who found this problem first.
>
> In addition, this phenomenon occurred on 4 boxes, so I think the cause
> is not hardware malfunction.

On what CPU model(s) does this happen?

Obviously the PTE is present and allows read, write and
execute accesses, so the PTE should not cause any faults.

That leaves the TLB. It looks almost like the CPU keeps
re-faulting on a (old?) TLB entry, possibly with wrong
permissions, and does not re-load it from the PTE.

I know this "should not happen" on x86, but I cannot think
of an alternative explanation right now. Can you try flushing
the TLB entry in question from handle_pte_fault?

It looks like the code already does this for write faults, but
maybe the garbage collection code uses PROT_NONE a lot and is
running into this issue with a read or exec fault?

It would be good to print the fault flags as well in your debug
print, so we know what kind of fault is being repeated...

2011-03-04 02:53:14

by Yasunori Goto

[permalink] [raw]
Subject: Re: Strange minor page fault repeats when SPECjbb2005 is executed

Thank you for your response.

> On 03/03/2011 06:01 AM, Yasunori Goto wrote:
>
> > In this log, cpu4 and 6 repeat page faults.
> > ----
> > handle_mm_fault jiffies64=4295160616 cpu=4 address=40019a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=55171067
> > handle_mm_fault jiffies64=4295160616 cpu=6 address=40003a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=551ef067
> > handle_mm_fault jiffies64=4295160616 cpu=6 address=40003a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=551ef067
> > handle_mm_fault jiffies64=4295160616 cpu=4 address=40019a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=55171067
> > handle_mm_fault jiffies64=4295160616 cpu=4 address=40019a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=55171067
>
> > I confirmed this phenomenon is reproduced on 2.6.31 and 2.6.38-rc5
> > of x86 kernel, and I heard this phenomenon doesn't occur on
> > x86-64 kernel from another engineer who found this problem first.
> >
> > In addition, this phenomenon occurred on 4 boxes, so I think the cause
> > is not hardware malfunction.
>
> On what CPU model(s) does this happen?

I'll attach cpuinfo in my current box which can reproduce this problem.
(1 socket, 4 cores, 8 logical cpus.)

And summary of other 3 boxes are here. (Unfortunately, I can't use them now.)

1) Intel(R) Xeon(R) CPU X5680 @ 3.33GHz
2 socket,
# of core -> 12
# of logical cpus -> 24

2) Intel(R) Xeon(R) CPU X7560 @ 2.27GHz
2 socket
# of coer -> 8
# of logical cpus -> 16

3) Intel(R) Xeon(TM) CPU 3.80GHz
1 socket
# of core 1
# of logical cpus -> 2

(The stop time was very short on 3rd box. It was about 5 seconds.
But it was sometimes 1 hours on 1st box.)


>
> Obviously the PTE is present and allows read, write and
> execute accesses, so the PTE should not cause any faults.
>
> That leaves the TLB. It looks almost like the CPU keeps
> re-faulting on a (old?) TLB entry, possibly with wrong
> permissions, and does not re-load it from the PTE.
>
> I know this "should not happen" on x86, but I cannot think
> of an alternative explanation right now. Can you try flushing
> the TLB entry in question from handle_pte_fault?

I inserted __flush_tlb_one(address) between my debug print and calling
handle_pte_fault(). But, this phenomenon is reproduced.

>
> It looks like the code already does this for write faults, but
> maybe the garbage collection code uses PROT_NONE a lot and is
> running into this issue with a read or exec fault?
>
> It would be good to print the fault flags as well in your debug
> print, so we know what kind of fault is being repeated...

I did it. The flags was 9 (FALUT_FLAG_ALLOW_RETRY and FAULT_FLAG_WRITE).

-----
handle_mm_fault jiffies64=4295117679 cpu=7 address=40001788 pmdval=0000000070845067 ptehigh=00000000 ptelow=54cb1067 flags=9
handle_mm_fault jiffies64=4295117679 cpu=2 address=40007788 pmdval=0000000070845067 ptehigh=00000000 ptelow=54d3b067 flags=9
handle_mm_fault jiffies64=4295117679 cpu=6 address=40000000 pmdval=0000000070845067 ptehigh=00000000 ptelow=54cb0067 flags=9
handle_mm_fault jiffies64=4295117679 cpu=4 address=40002788 pmdval=0000000070845067 ptehigh=00000000 ptelow=54d22067 flags=9
handle_mm_fault jiffies64=4295117679 cpu=7 address=40001788 pmdval=0000000070845067 ptehigh=00000000 ptelow=54cb1067 flags=9
handle_mm_fault jiffies64=4295117679 cpu=2 address=40007788 pmdval=0000000070845067 ptehigh=00000000 ptelow=54d3b067 flags=9
handle_mm_fault jiffies64=4295117679 cpu=6 address=40000000 pmdval=0000000070845067 ptehigh=00000000 ptelow=54cb0067 flags=9
handle_mm_fault jiffies64=4295117679 cpu=4 address=40002788 pmdval=0000000070845067 ptehigh=00000000 ptelow=54d22067 flags=9
handle_mm_fault jiffies64=4295117679 cpu=7 address=40001788 pmdval=0000000070845067 ptehigh=00000000 ptelow=54cb1067 flags=9
handle_mm_fault jiffies64=4295117679 cpu=2 address=40007788 pmdval=0000000070845067 ptehigh=00000000 ptelow=54d3b067 flags=9
-----
These outputs repeated many times....


Thanks.

-----------
/proc/cpuinfo

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5640 @ 2.67GHz
stepping : 1
cpu MHz : 1596.000
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5333.20
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5640 @ 2.67GHz
stepping : 1
cpu MHz : 1596.000
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 1
cpu cores : 4
apicid : 2
initial apicid : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5333.15
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5640 @ 2.67GHz
stepping : 1
cpu MHz : 1596.000
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 9
cpu cores : 4
apicid : 18
initial apicid : 18
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5333.15
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5640 @ 2.67GHz
stepping : 1
cpu MHz : 1596.000
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 10
cpu cores : 4
apicid : 20
initial apicid : 20
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5333.15
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5640 @ 2.67GHz
stepping : 1
cpu MHz : 1596.000
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5333.16
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5640 @ 2.67GHz
stepping : 1
cpu MHz : 1596.000
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 1
cpu cores : 4
apicid : 3
initial apicid : 3
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5333.15
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5640 @ 2.67GHz
stepping : 1
cpu MHz : 1596.000
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 9
cpu cores : 4
apicid : 19
initial apicid : 19
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5333.15
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5640 @ 2.67GHz
stepping : 1
cpu MHz : 1596.000
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 10
cpu cores : 4
apicid : 21
initial apicid : 21
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5333.15
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:


--
Yasunori Goto