zap_pte_range loops from @addr to @end. In the middle, if it runs out of
batching slots, TLB entries need to be flushed for @start to @interim,
NOT @interim to @end.
Since the ARC port doesn't use page free batching, I can't test it myself, but
this seems like the right thing to do.
Observed this when working on a fix for the issue at thread:
http://www.spinics.net/lists/linux-arch/msg21736.html
Signed-off-by: Vineet Gupta <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected] <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Max Filippov <[email protected]>
---
mm/memory.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 6dc1882..d9d5fd9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1110,6 +1110,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
spinlock_t *ptl;
pte_t *start_pte;
pte_t *pte;
+ unsigned long range_start = addr;
again:
init_rss_vec(rss);
@@ -1215,12 +1216,14 @@ again:
force_flush = 0;
#ifdef HAVE_GENERIC_MMU_GATHER
- tlb->start = addr;
- tlb->end = end;
+ tlb->start = range_start;
+ tlb->end = addr;
#endif
tlb_flush_mmu(tlb);
- if (addr != end)
+ if (addr != end) {
+ range_start = addr;
goto again;
+ }
}
return addr;
--
1.7.10.4
On Wed, May 29, 2013 at 01:56:13PM +0100, Vineet Gupta wrote:
> zap_pte_range loops from @addr to @end. In the middle, if it runs out of
> batching slots, TLB entries needs to be flushed for @start to @interim,
> NOT @interim to @end.
>
> Since ARC port doesn't use page free batching I can't test it myself but
> this seems like the right thing to do.
> Observed this when working on a fix for the issue at thread:
> http://www.spinics.net/lists/linux-arch/msg21736.html
>
> Signed-off-by: Vineet Gupta <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: [email protected]
> Cc: [email protected] <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Max Filippov <[email protected]>
> ---
> mm/memory.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 6dc1882..d9d5fd9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1110,6 +1110,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> spinlock_t *ptl;
> pte_t *start_pte;
> pte_t *pte;
> + unsigned long range_start = addr;
>
> again:
> init_rss_vec(rss);
> @@ -1215,12 +1216,14 @@ again:
> force_flush = 0;
>
> #ifdef HAVE_GENERIC_MMU_GATHER
> - tlb->start = addr;
> - tlb->end = end;
> + tlb->start = range_start;
> + tlb->end = addr;
> #endif
> tlb_flush_mmu(tlb);
> - if (addr != end)
> + if (addr != end) {
> + range_start = addr;
> goto again;
> + }
> }
Isn't this code only run if force_flush != 0? force_flush is set to
!__tlb_remove_page() and this function always returns 1 on (generic TLB)
UP since tlb_fast_mode() is 1. There is no batching on UP with the
generic TLB code.
--
Catalin
On 05/29/2013 07:33 PM, Catalin Marinas wrote:
> On Wed, May 29, 2013 at 01:56:13PM +0100, Vineet Gupta wrote:
>> zap_pte_range loops from @addr to @end. In the middle, if it runs out of
>> batching slots, TLB entries needs to be flushed for @start to @interim,
>> NOT @interim to @end.
>>
>> Since ARC port doesn't use page free batching I can't test it myself but
>> this seems like the right thing to do.
>> Observed this when working on a fix for the issue at thread:
>> http://www.spinics.net/lists/linux-arch/msg21736.html
>>
>> Signed-off-by: Vineet Gupta <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Cc: Hugh Dickins <[email protected]>
>> Cc: Rik van Riel <[email protected]>
>> Cc: David Rientjes <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected] <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Max Filippov <[email protected]>
>> ---
>> mm/memory.c | 9 ++++++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 6dc1882..d9d5fd9 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -1110,6 +1110,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>> spinlock_t *ptl;
>> pte_t *start_pte;
>> pte_t *pte;
>> + unsigned long range_start = addr;
>>
>> again:
>> init_rss_vec(rss);
>> @@ -1215,12 +1216,14 @@ again:
>> force_flush = 0;
>>
>> #ifdef HAVE_GENERIC_MMU_GATHER
>> - tlb->start = addr;
>> - tlb->end = end;
>> + tlb->start = range_start;
>> + tlb->end = addr;
>> #endif
>> tlb_flush_mmu(tlb);
>> - if (addr != end)
>> + if (addr != end) {
>> + range_start = addr;
>> goto again;
>> + }
>> }
> Isn't this code only run if force_flush != 0? force_flush is set to
> !__tlb_remove_page() and this function always returns 1 on (generic TLB)
> UP since tlb_fast_mode() is 1. There is no batching on UP with the
> generic TLB code.
Correct! That's why the changelog says I couldn't test it on the ARC port itself :-)
However, based on the other discussion (Max's TLB/PTE inconsistency), as I started
writing code to reuse this block to flush the TLB even for the non-forced case, I
realized that what it is doing is incorrect and won't work for general flushing.
Ignoring all other threads, do we agree that the existing code, if used in any
situation, is semantically incorrect?
-Vineet
On Wed, May 29, 2013 at 03:08:37PM +0100, Vineet Gupta wrote:
> On 05/29/2013 07:33 PM, Catalin Marinas wrote:
> > On Wed, May 29, 2013 at 01:56:13PM +0100, Vineet Gupta wrote:
> >> zap_pte_range loops from @addr to @end. In the middle, if it runs out of
> >> batching slots, TLB entries needs to be flushed for @start to @interim,
> >> NOT @interim to @end.
> >>
> >> Since ARC port doesn't use page free batching I can't test it myself but
> >> this seems like the right thing to do.
> >> Observed this when working on a fix for the issue at thread:
> >> http://www.spinics.net/lists/linux-arch/msg21736.html
> >>
> >> Signed-off-by: Vineet Gupta <[email protected]>
> >> Cc: Andrew Morton <[email protected]>
> >> Cc: Mel Gorman <[email protected]>
> >> Cc: Hugh Dickins <[email protected]>
> >> Cc: Rik van Riel <[email protected]>
> >> Cc: David Rientjes <[email protected]>
> >> Cc: Peter Zijlstra <[email protected]>
> >> Cc: [email protected]
> >> Cc: [email protected] <[email protected]>
> >> Cc: Catalin Marinas <[email protected]>
> >> Cc: Max Filippov <[email protected]>
> >> ---
> >> mm/memory.c | 9 ++++++---
> >> 1 file changed, 6 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index 6dc1882..d9d5fd9 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -1110,6 +1110,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> >> spinlock_t *ptl;
> >> pte_t *start_pte;
> >> pte_t *pte;
> >> + unsigned long range_start = addr;
> >>
> >> again:
> >> init_rss_vec(rss);
> >> @@ -1215,12 +1216,14 @@ again:
> >> force_flush = 0;
> >>
> >> #ifdef HAVE_GENERIC_MMU_GATHER
> >> - tlb->start = addr;
> >> - tlb->end = end;
> >> + tlb->start = range_start;
> >> + tlb->end = addr;
> >> #endif
> >> tlb_flush_mmu(tlb);
> >> - if (addr != end)
> >> + if (addr != end) {
> >> + range_start = addr;
> >> goto again;
> >> + }
> >> }
> > Isn't this code only run if force_flush != 0? force_flush is set to
> > !__tlb_remove_page() and this function always returns 1 on (generic TLB)
> > UP since tlb_fast_mode() is 1. There is no batching on UP with the
> > generic TLB code.
>
> Correct ! That's why the changelog says I couldn't test it on ARC port itself :-)
>
> However based on the other discussion (Max's TLB/PTE inconsistency), as I started
> writing code to reuse this block to flush the TLB even for non forced case, I
> realized that what this is doing is incorrect and won't work for the general flushing.
An alternative would be to make sure the above block is always called
when tlb_fast_mode():
diff --git a/mm/memory.c b/mm/memory.c
index 6dc1882..f8b1f30 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1211,7 +1211,7 @@ again:
* the PTE lock to avoid doing the potential expensive TLB invalidate
* and page-free while holding it.
*/
- if (force_flush) {
+ if (force_flush || tlb_fast_mode(tlb)) {
force_flush = 0;
#ifdef HAVE_GENERIC_MMU_GATHER
> Ignoring all other threads, do we agree that the exiting code - if used in any
> situations is incorrect semantically ?
It is incorrect unless there are requirements for
arch_leave_lazy_mmu_mode() to handle the TLB invalidation (it doesn't
look like it's widely implemented though).
--
Catalin
On 05/29/2013 07:59 PM, Catalin Marinas wrote:
> On Wed, May 29, 2013 at 03:08:37PM +0100, Vineet Gupta wrote:
>> On 05/29/2013 07:33 PM, Catalin Marinas wrote:
>>> On Wed, May 29, 2013 at 01:56:13PM +0100, Vineet Gupta wrote:
>>>> zap_pte_range loops from @addr to @end. In the middle, if it runs out of
>>>> batching slots, TLB entries needs to be flushed for @start to @interim,
>>>> NOT @interim to @end.
>>>>
>>>> Since ARC port doesn't use page free batching I can't test it myself but
>>>> this seems like the right thing to do.
>>>> Observed this when working on a fix for the issue at thread:
>>>> http://www.spinics.net/lists/linux-arch/msg21736.html
>>>>
>>>> Signed-off-by: Vineet Gupta <[email protected]>
>>>> Cc: Andrew Morton <[email protected]>
>>>> Cc: Mel Gorman <[email protected]>
>>>> Cc: Hugh Dickins <[email protected]>
>>>> Cc: Rik van Riel <[email protected]>
>>>> Cc: David Rientjes <[email protected]>
>>>> Cc: Peter Zijlstra <[email protected]>
>>>> Cc: [email protected]
>>>> Cc: [email protected] <[email protected]>
>>>> Cc: Catalin Marinas <[email protected]>
>>>> Cc: Max Filippov <[email protected]>
>>>> ---
>>>> mm/memory.c | 9 ++++++---
>>>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index 6dc1882..d9d5fd9 100644
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -1110,6 +1110,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>>> spinlock_t *ptl;
>>>> pte_t *start_pte;
>>>> pte_t *pte;
>>>> + unsigned long range_start = addr;
>>>>
>>>> again:
>>>> init_rss_vec(rss);
>>>> @@ -1215,12 +1216,14 @@ again:
>>>> force_flush = 0;
>>>>
>>>> #ifdef HAVE_GENERIC_MMU_GATHER
>>>> - tlb->start = addr;
>>>> - tlb->end = end;
>>>> + tlb->start = range_start;
>>>> + tlb->end = addr;
>>>> #endif
>>>> tlb_flush_mmu(tlb);
>>>> - if (addr != end)
>>>> + if (addr != end) {
>>>> + range_start = addr;
>>>> goto again;
>>>> + }
>>>> }
>>> Isn't this code only run if force_flush != 0? force_flush is set to
>>> !__tlb_remove_page() and this function always returns 1 on (generic TLB)
>>> UP since tlb_fast_mode() is 1. There is no batching on UP with the
>>> generic TLB code.
>> Correct ! That's why the changelog says I couldn't test it on ARC port itself :-)
>>
>> However based on the other discussion (Max's TLB/PTE inconsistency), as I started
>> writing code to reuse this block to flush the TLB even for non forced case, I
>> realized that what this is doing is incorrect and won't work for the general flushing.
> An alternative would be to make sure the above block is always called
> when tlb_fast_mode():
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 6dc1882..f8b1f30 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1211,7 +1211,7 @@ again:
> * the PTE lock to avoid doing the potential expensive TLB invalidate
> * and page-free while holding it.
> */
> - if (force_flush) {
> + if (force_flush || tlb_fast_mode(tlb)) {
> force_flush = 0;
I agree with the tlb_fast_mode() addition (to solve Max's issue). The problem, however,
is that when we hit this at the end of the loop, @addr is already pointing to @end, so
the range flush gets start = end, which is not what we intended.
>> Ignoring all other threads, do we agree that the exiting code - if used in any
>> situations is incorrect semantically ?
> It is incorrect unless there are requirements for
> arch_leave_lazy_mmu_mode() to handle the TLB invalidation (it doesn't
> look like it's widely implemented though).
This patch is preparatory and independent of Max's issue. It fixes just the
forced flush case for whoever uses it right now (of course UP + generic TLB doesn't).
Thx,
-Vineet
On Wed, May 29, 2013 at 03:36:02PM +0100, Vineet Gupta wrote:
> On 05/29/2013 07:59 PM, Catalin Marinas wrote:
> > On Wed, May 29, 2013 at 03:08:37PM +0100, Vineet Gupta wrote:
> >> On 05/29/2013 07:33 PM, Catalin Marinas wrote:
> >>> On Wed, May 29, 2013 at 01:56:13PM +0100, Vineet Gupta wrote:
> >>>> zap_pte_range loops from @addr to @end. In the middle, if it runs out of
> >>>> batching slots, TLB entries needs to be flushed for @start to @interim,
> >>>> NOT @interim to @end.
> >>>>
> >>>> Since ARC port doesn't use page free batching I can't test it myself but
> >>>> this seems like the right thing to do.
> >>>> Observed this when working on a fix for the issue at thread:
> >>>> http://www.spinics.net/lists/linux-arch/msg21736.html
> >>>>
> >>>> Signed-off-by: Vineet Gupta <[email protected]>
> >>>> Cc: Andrew Morton <[email protected]>
> >>>> Cc: Mel Gorman <[email protected]>
> >>>> Cc: Hugh Dickins <[email protected]>
> >>>> Cc: Rik van Riel <[email protected]>
> >>>> Cc: David Rientjes <[email protected]>
> >>>> Cc: Peter Zijlstra <[email protected]>
> >>>> Cc: [email protected]
> >>>> Cc: [email protected] <[email protected]>
> >>>> Cc: Catalin Marinas <[email protected]>
> >>>> Cc: Max Filippov <[email protected]>
> >>>> ---
> >>>> mm/memory.c | 9 ++++++---
> >>>> 1 file changed, 6 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/mm/memory.c b/mm/memory.c
> >>>> index 6dc1882..d9d5fd9 100644
> >>>> --- a/mm/memory.c
> >>>> +++ b/mm/memory.c
> >>>> @@ -1110,6 +1110,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> >>>> spinlock_t *ptl;
> >>>> pte_t *start_pte;
> >>>> pte_t *pte;
> >>>> + unsigned long range_start = addr;
> >>>>
> >>>> again:
> >>>> init_rss_vec(rss);
> >>>> @@ -1215,12 +1216,14 @@ again:
> >>>> force_flush = 0;
> >>>>
> >>>> #ifdef HAVE_GENERIC_MMU_GATHER
> >>>> - tlb->start = addr;
> >>>> - tlb->end = end;
> >>>> + tlb->start = range_start;
> >>>> + tlb->end = addr;
> >>>> #endif
> >>>> tlb_flush_mmu(tlb);
> >>>> - if (addr != end)
> >>>> + if (addr != end) {
> >>>> + range_start = addr;
> >>>> goto again;
> >>>> + }
> >>>> }
> >>> Isn't this code only run if force_flush != 0? force_flush is set to
> >>> !__tlb_remove_page() and this function always returns 1 on (generic TLB)
> >>> UP since tlb_fast_mode() is 1. There is no batching on UP with the
> >>> generic TLB code.
> >> Correct ! That's why the changelog says I couldn't test it on ARC port itself :-)
> >>
> >> However based on the other discussion (Max's TLB/PTE inconsistency), as I started
> >> writing code to reuse this block to flush the TLB even for non forced case, I
> >> realized that what this is doing is incorrect and won't work for the general flushing.
> > An alternative would be to make sure the above block is always called
> > when tlb_fast_mode():
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 6dc1882..f8b1f30 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1211,7 +1211,7 @@ again:
> > * the PTE lock to avoid doing the potential expensive TLB invalidate
> > * and page-free while holding it.
> > */
> > - if (force_flush) {
> > + if (force_flush || tlb_fast_mode(tlb)) {
> > force_flush = 0;
>
> I agree with tlb_fast_mode() addition (to solve Max's issue). The problem however
> is that when we hit this at the end of loop - @addr is already pointing to @end so
> range flush gets start = end - not what we really intended.
OK. So for this part your patch looks fine.
Acked-by: Catalin Marinas <[email protected]>
From: Vineet Gupta <[email protected]>
Date: Wed, 29 May 2013 18:26:13 +0530
> zap_pte_range loops from @addr to @end. In the middle, if it runs out of
> batching slots, TLB entries needs to be flushed for @start to @interim,
> NOT @interim to @end.
>
> Since ARC port doesn't use page free batching I can't test it myself but
> this seems like the right thing to do.
> Observed this when working on a fix for the issue at thread:
> http://www.spinics.net/lists/linux-arch/msg21736.html
>
> Signed-off-by: Vineet Gupta <[email protected]>
As this bug can cause pretty serious memory corruption, I'd like to
see this submitted to -stable.
Thanks!
On Mon, 29 Jul 2013 16:41:06 -0700 (PDT) David Miller <[email protected]> wrote:
> From: Vineet Gupta <[email protected]>
> Date: Wed, 29 May 2013 18:26:13 +0530
>
> > zap_pte_range loops from @addr to @end. In the middle, if it runs out of
> > batching slots, TLB entries needs to be flushed for @start to @interim,
> > NOT @interim to @end.
> >
> > Since ARC port doesn't use page free batching I can't test it myself but
> > this seems like the right thing to do.
> > Observed this when working on a fix for the issue at thread:
> > http://www.spinics.net/lists/linux-arch/msg21736.html
> >
> > Signed-off-by: Vineet Gupta <[email protected]>
>
> As this bug can cause pretty serious memory corruption, I'd like to
> see this submitted to -stable.
Greg, e6c495a96ce02574e765d5140039a64c8d4e8c9e from mainline, please.
On Mon, Jul 29, 2013 at 04:46:58PM -0700, Andrew Morton wrote:
> On Mon, 29 Jul 2013 16:41:06 -0700 (PDT) David Miller <[email protected]> wrote:
>
> > From: Vineet Gupta <[email protected]>
> > Date: Wed, 29 May 2013 18:26:13 +0530
> >
> > > zap_pte_range loops from @addr to @end. In the middle, if it runs out of
> > > batching slots, TLB entries needs to be flushed for @start to @interim,
> > > NOT @interim to @end.
> > >
> > > Since ARC port doesn't use page free batching I can't test it myself but
> > > this seems like the right thing to do.
> > > Observed this when working on a fix for the issue at thread:
> > > http://www.spinics.net/lists/linux-arch/msg21736.html
> > >
> > > Signed-off-by: Vineet Gupta <[email protected]>
> >
> > As this bug can cause pretty serious memory corruption, I'd like to
> > see this submitted to -stable.
>
> Greg, e6c495a96ce02574e765d5140039a64c8d4e8c9e from mainline, please.
Now applied to 3.10-stable, thanks.
greg k-h