2018-11-30 14:07:41

by Peter Zijlstra

Subject: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

Hi,

Yesterday Tom reported a CPA bug triggered by the AMDGPU team.

It turns out that with commit:

a7295fd53c39 ("x86/mm/cpa: Use flush_tlb_kernel_range()")

I misread the cpa array code and messed up the TLB invalidations for it. These
patches (hopefully) fix the issue while also shrinking the CPA code again.

Tom, would you be so kind as to test again? These patches are significantly
different from what I sent you yesterday.

---
arch/x86/mm/mm_internal.h | 2 +
arch/x86/mm/pageattr.c | 167 ++++++++++++++++++++--------------------------
arch/x86/mm/tlb.c | 4 +-
3 files changed, 79 insertions(+), 94 deletions(-)



2018-11-30 14:54:30

by StDenis, Tom

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

Hi Peter,

Unfortunately I can't apply this on top of our drm-next; the first patch
fails.

Alex: could we rebase again at some point?

Tom

On 2018-11-30 8:44 a.m., Peter Zijlstra wrote:
> Hi,
>
> Yesterday Tom reported a CPA bug triggered by the AMDGPU team.
>
> It turns out that with commit:
>
> a7295fd53c39 ("x86/mm/cpa: Use flush_tlb_kernel_range()")
>
> I misread the cpa array code and messed up the TLB invalidations for it. These
> patches (hopefully) fix the issue while also shrinking the CPA code again.
>
> Tom, would you be so kind as to test again? These patches are significantly
> different from what I sent you yesterday.
>
> ---
> arch/x86/mm/mm_internal.h | 2 +
> arch/x86/mm/pageattr.c | 167 ++++++++++++++++++++--------------------------
> arch/x86/mm/tlb.c | 4 +-
> 3 files changed, 79 insertions(+), 94 deletions(-)
>

2018-11-30 15:10:40

by StDenis, Tom

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

On 2018-11-30 10:07 a.m., Deucher, Alexander wrote:
> Sure, but it might be a week or so.  For now can you test against Linus
> master?  It should be close enough.

I need the bulk move from our drm-next merge (which isn't yet
upstream) to trigger the bug though.

I can try to cherry pick it on top of master.

Tom

>
>
> Alex
>
> ------------------------------------------------------------------------
> *From:* StDenis, Tom
> *Sent:* Friday, November 30, 2018 9:52:26 AM
> *To:* Peter Zijlstra; [email protected]; [email protected]
> *Cc:* [email protected]; [email protected]; Deucher, Alexander
> *Subject:* Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation
> Hi Peter,
>
> Unfortunately I can't apply this on top of our drm-next the first patch
> fails.
>
> Alex: could we rebase again at some point?
>
> Tom
>
> On 2018-11-30 8:44 a.m., Peter Zijlstra wrote:
>> Hi,
>>
>> Yesterday Tom reported a CPA bug triggered by the AMDGPU team.
>>
>> It turns out that with commit:
>>
>>    a7295fd53c39 ("x86/mm/cpa: Use flush_tlb_kernel_range()")
>>
>> I misread the cpa array code and messed up the TLB invalidations for it. These
>> patches (hopefully) fix the issue while also shrinking the CPA code again.
>>
>> Tom, would you be so kind as to test again? These patches are significantly
>> different from what I sent you yesterday.
>>
>> ---
>> arch/x86/mm/mm_internal.h |   2 +
>> arch/x86/mm/pageattr.c    | 167 ++++++++++++++++++++--------------------------
>> arch/x86/mm/tlb.c         |   4 +-
>> 3 files changed, 79 insertions(+), 94 deletions(-)
>>
>

2018-11-30 15:11:41

by Peter Zijlstra

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

On Fri, Nov 30, 2018 at 02:52:26PM +0000, StDenis, Tom wrote:
> Hi Peter,
>
> Unfortunately I can't apply this on top of our drm-next the first patch
> fails.

Against what tree would you like the patches? Rebasing should not be
hard, I think.

2018-11-30 15:13:37

by StDenis, Tom

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

On 2018-11-30 10:09 a.m., Peter Zijlstra wrote:
> On Fri, Nov 30, 2018 at 02:52:26PM +0000, StDenis, Tom wrote:
>> Hi Peter,
>>
>> Unfortunately I can't apply this on top of our drm-next the first patch
>> fails.
>
> Against what tree would you like the patches? rebasing should not be
> hard I think.
>

Actually, never mind: the AMDGPU patches I need are already upstream. I
was mistaken :-)

I'll try it out shortly.

Tom

2018-11-30 15:16:59

by StDenis, Tom

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

On 2018-11-30 10:09 a.m., Peter Zijlstra wrote:
> On Fri, Nov 30, 2018 at 02:52:26PM +0000, StDenis, Tom wrote:
>> Hi Peter,
>>
>> Unfortunately I can't apply this on top of our drm-next the first patch
>> fails.
>
> Against what tree would you like the patches? rebasing should not be
> hard I think.
>

Actually I just tried applying against the tip of master and got the
same errors...

[root@carrizo linux]# git apply \[PATCH\ 1_4\]\ x86_mm_cpa\:\ Add\
__cpa_addr\(\)\ helper\ -\ Peter\ Zijlstra\ \<[email protected]\>\ -\
2018-11-30\ 0844.eml
error: patch failed: arch/x86/mm/pageattr.c:228
error: arch/x86/mm/pageattr.c: patch does not apply


Any ideas?

Tom

2018-11-30 15:26:32

by Peter Zijlstra

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

On Fri, Nov 30, 2018 at 03:14:30PM +0000, StDenis, Tom wrote:
> On 2018-11-30 10:09 a.m., Peter Zijlstra wrote:
> > On Fri, Nov 30, 2018 at 02:52:26PM +0000, StDenis, Tom wrote:
> >> Hi Peter,
> >>
> >> Unfortunately I can't apply this on top of our drm-next the first patch
> >> fails.
> >
> > Against what tree would you like the patches? rebasing should not be
> > hard I think.
> >
>
> Actually I just tried applying against the tip of master and got the
> same errors...
>
> [root@carrizo linux]# git apply \[PATCH\ 1_4\]\ x86_mm_cpa\:\ Add\
> __cpa_addr\(\)\ helper\ -\ Peter\ Zijlstra\ \<[email protected]\>\ -\
> 2018-11-30\ 0844.eml
> error: patch failed: arch/x86/mm/pageattr.c:228
> error: arch/x86/mm/pageattr.c: patch does not apply
>
>
> Any ideas?

Hurm.. no. They apply cleanly to Linus' tree here.

linux-2.6$ git describe
v4.20-rc4-156-g94f371cb7394
linux-2.6$ quilt push 4
Applying patch patches/peterz-cpa-addr.patch
patching file arch/x86/mm/pageattr.c
Applying patch patches/peterz-cpa-fix-flush_array.patch
patching file arch/x86/mm/mm_internal.h
patching file arch/x86/mm/pageattr.c
patching file arch/x86/mm/tlb.c
Applying patch patches/peterz-cpa-fold-cpa_flush.patch
patching file arch/x86/mm/pageattr.c
Applying patch patches/peterz-cpa-clflush_opt.patch
patching file arch/x86/mm/pageattr.c
Now at patch patches/peterz-cpa-clflush_opt.patch

Weird.

2018-11-30 15:29:16

by StDenis, Tom

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

On 2018-11-30 10:23 a.m., Peter Zijlstra wrote:
> On Fri, Nov 30, 2018 at 03:14:30PM +0000, StDenis, Tom wrote:
>> On 2018-11-30 10:09 a.m., Peter Zijlstra wrote:
>>> On Fri, Nov 30, 2018 at 02:52:26PM +0000, StDenis, Tom wrote:
>>>> Hi Peter,
>>>>
>>>> Unfortunately I can't apply this on top of our drm-next the first patch
>>>> fails.
>>>
>>> Against what tree would you like the patches? rebasing should not be
>>> hard I think.
>>>
>>
>> Actually I just tried applying against the tip of master and got the
>> same errors...
>>
>> [root@carrizo linux]# git apply \[PATCH\ 1_4\]\ x86_mm_cpa\:\ Add\
>> __cpa_addr\(\)\ helper\ -\ Peter\ Zijlstra\ \<[email protected]\>\ -\
>> 2018-11-30\ 0844.eml
>> error: patch failed: arch/x86/mm/pageattr.c:228
>> error: arch/x86/mm/pageattr.c: patch does not apply
>>
>>
>> Any ideas?
>
> Hurm.. no. They apply cleanly to Linus' tree here.
>
> linux-2.6$ git describe
> v4.20-rc4-156-g94f371cb7394
> linux-2.6$ quilt push 4
> Applying patch patches/peterz-cpa-addr.patch
> patching file arch/x86/mm/pageattr.c
> Applying patch patches/peterz-cpa-fix-flush_array.patch
> patching file arch/x86/mm/mm_internal.h
> patching file arch/x86/mm/pageattr.c
> patching file arch/x86/mm/tlb.c
> Applying patch patches/peterz-cpa-fold-cpa_flush.patch
> patching file arch/x86/mm/pageattr.c
> Applying patch patches/peterz-cpa-clflush_opt.patch
> patching file arch/x86/mm/pageattr.c
> Now at patch patches/peterz-cpa-clflush_opt.patch
>
> Weird.
>

I can apply the patch you attached, but the inline patches just don't
apply. It could be that my IMAP client (Thunderbird) mangled them, but I've
applied patches this way before. Could you attach them instead, please?

Tom

2018-11-30 15:33:14

by Peter Zijlstra

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

On Fri, Nov 30, 2018 at 04:23:47PM +0100, Peter Zijlstra wrote:

> Hurm.. no. They apply cleanly to Linus' tree here.
>
> linux-2.6$ git describe
> v4.20-rc4-156-g94f371cb7394
> linux-2.6$ quilt push 4
> Applying patch patches/peterz-cpa-addr.patch
> patching file arch/x86/mm/pageattr.c
> Applying patch patches/peterz-cpa-fix-flush_array.patch
> patching file arch/x86/mm/mm_internal.h
> patching file arch/x86/mm/pageattr.c
> patching file arch/x86/mm/tlb.c
> Applying patch patches/peterz-cpa-fold-cpa_flush.patch
> patching file arch/x86/mm/pageattr.c
> Applying patch patches/peterz-cpa-clflush_opt.patch
> patching file arch/x86/mm/pageattr.c
> Now at patch patches/peterz-cpa-clflush_opt.patch
>
> Weird.

I pushed them out to:

git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/mm

I hope that works; I'm out for a few hours, but should check on email
again tonight.

2018-11-30 16:23:48

by StDenis, Tom

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

On 2018-11-30 10:31 a.m., Peter Zijlstra wrote:
> On Fri, Nov 30, 2018 at 04:23:47PM +0100, Peter Zijlstra wrote:
>
>> Hurm.. no. They apply cleanly to Linus' tree here.
>>
>> linux-2.6$ git describe
>> v4.20-rc4-156-g94f371cb7394
>> linux-2.6$ quilt push 4
>> Applying patch patches/peterz-cpa-addr.patch
>> patching file arch/x86/mm/pageattr.c
>> Applying patch patches/peterz-cpa-fix-flush_array.patch
>> patching file arch/x86/mm/mm_internal.h
>> patching file arch/x86/mm/pageattr.c
>> patching file arch/x86/mm/tlb.c
>> Applying patch patches/peterz-cpa-fold-cpa_flush.patch
>> patching file arch/x86/mm/pageattr.c
>> Applying patch patches/peterz-cpa-clflush_opt.patch
>> patching file arch/x86/mm/pageattr.c
>> Now at patch patches/peterz-cpa-clflush_opt.patch
>>
>> Weird.
>
> I pushed them out to:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/mm
>
> I hope that works; I'm out for a few hours, but should check on email
> again tonight.
>

NAK: I get a failure in TTM on init with your x86/mm branch (see attached
dmesg).

This builds an RC2 kernel btw whereas we were building an RC3 kernel
which is about 974 commits behind the tip of our drm-next and about 850
commits behind the last drm-next merge from Dave.

Tom


Attachments:
carrizo_dmesg.log (66.56 kB)

2018-11-30 17:48:59

by Peter Zijlstra

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

On Fri, Nov 30, 2018 at 04:19:46PM +0000, StDenis, Tom wrote:
> On 2018-11-30 10:31 a.m., Peter Zijlstra wrote:

> > I pushed them out to:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/mm
> >
> > I hope that works; I'm out for a few hours, but should check on email
> > again tonight.
> >
>
> NAK I get a failure in TTM on init with your x86/mm branch (see attached
> dmesg).

*sigh*, it's been one of those days. Ok, I'll go write some cpa
selftests or something so that I have code that uses this stuff.

2018-11-30 17:50:55

by StDenis, Tom

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

On 2018-11-30 12:48 p.m., Peter Zijlstra wrote:
> On Fri, Nov 30, 2018 at 04:19:46PM +0000, StDenis, Tom wrote:
>> On 2018-11-30 10:31 a.m., Peter Zijlstra wrote:
>
>>> I pushed them out to:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/mm
>>>
>>> I hope that works; I'm out for a few hours, but should check on email
>>> again tonight.
>>>
>>
>> NAK I get a failure in TTM on init with your x86/mm branch (see attached
>> dmesg).
>
> *sigh*, it's been one of those days. Ok, I'll go write some cpa
> selftests or something so that I have code that uses this stuff.
>

Well, the TTM crash could be completely unrelated; the problem is that your
x86/mm branch is not up to date with master and doesn't include drm fixes.

Tom

2018-11-30 17:51:26

by Peter Zijlstra

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

On Fri, Nov 30, 2018 at 03:27:02PM +0000, StDenis, Tom wrote:
> I can apply the patch you attached but the inline patches just don't
> apply. Could be my imap client (thunderbird) mangled them but I've
> applied patches this way before. could you attach them instead please?

That's arguably a bug in Thunderbird; but there's already upstream quilt
changes (that I used to have before Debian helpfully updated my quilt
package) that should remedy this as well.

It seems some MUAs get horribly confused about the
"Content-Disposition: inline; filename=$patch" header quilt-mail adds.

I've once again removed that from my local copy; hopefully the next time
Debian updates that package it will actually be with a new enough
version to also include those changes :/

2018-11-30 18:08:39

by Peter Zijlstra

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

On Fri, Nov 30, 2018 at 05:49:34PM +0000, StDenis, Tom wrote:
> On 2018-11-30 12:48 p.m., Peter Zijlstra wrote:
> > On Fri, Nov 30, 2018 at 04:19:46PM +0000, StDenis, Tom wrote:
> >> On 2018-11-30 10:31 a.m., Peter Zijlstra wrote:
> >
> >>> I pushed them out to:
> >>>
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/mm
> >>>
> >>> I hope that works; I'm out for a few hours, but should check on email
> >>> again tonight.
> >>>
> >>
> >> NAK I get a failure in TTM on init with your x86/mm branch (see attached
> >> dmesg).
> >
> > *sigh*, it's been one of those days. Ok, I'll go write some cpa
> > selftests or something so that I have code that uses this stuff.
> >
>
> Well the ttm crash could be completely unrelated the problem is your
> x86/mm branch is not up to date with master and doesn't include drm fixes.

Well, it crashes right in the middle of the CPA code, and I'm having a
horrible day, so I'm thinking I screwed up rather than anything else.

Also, some level of selftests would be good to have in any case I
figure.

2018-12-03 15:42:02

by Peter Zijlstra

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

On Fri, Nov 30, 2018 at 04:19:46PM +0000, StDenis, Tom wrote:
> NAK I get a failure in TTM on init with your x86/mm branch (see attached
> dmesg).

So the good news is that with some additional self-tests I can trivially
reproduce this. The bad news is that an otherwise straightforward
cleanup seems to make CPA horribly mad at me.

And since we're somewhat late in the release cycle, I suppose we should
do the simple thing first, and then I can try and figure out this CPA
mess later.

So how about this relatively simple partial revert to sort the problem?

---
Subject: x86/mm/cpa: Fix cpa_flush_array() TLB invalidation

In commit:

a7295fd53c39 ("x86/mm/cpa: Use flush_tlb_kernel_range()")

I misread the cpa array code and incorrectly used
flush_tlb_kernel_range(), resulting in missing TLB flushes and
consequent failures.

Instead do a full invalidate in this case -- for now.

Fixes: a7295fd53c39 ("x86/mm/cpa: Use flush_tlb_kernel_range()")
Reported-by: "StDenis, Tom" <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/mm/pageattr.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index bac35001d896..61bc7d1800d7 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -285,20 +285,16 @@ static void cpa_flush_all(unsigned long cache)
 	on_each_cpu(__cpa_flush_all, (void *) cache, 1);
 }

-static bool __cpa_flush_range(unsigned long start, int numpages, int cache)
+static bool __inv_flush_all(int cache)
 {
 	BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);

-	WARN_ON(PAGE_ALIGN(start) != start);
-
 	if (cache && !static_cpu_has(X86_FEATURE_CLFLUSH)) {
 		cpa_flush_all(cache);
 		return true;
 	}

-	flush_tlb_kernel_range(start, start + PAGE_SIZE * numpages);
-
-	return !cache;
+	return false;
 }

 static void cpa_flush_range(unsigned long start, int numpages, int cache)
@@ -306,7 +302,14 @@ static void cpa_flush_range(unsigned long start, int numpages, int cache)
 	unsigned int i, level;
 	unsigned long addr;

-	if (__cpa_flush_range(start, numpages, cache))
+	WARN_ON(PAGE_ALIGN(start) != start);
+
+	if (__inv_flush_all(cache))
+		return;
+
+	flush_tlb_kernel_range(start, start + PAGE_SIZE * numpages);
+
+	if (!cache)
 		return;

 	/*
@@ -332,7 +335,12 @@ static void cpa_flush_array(unsigned long baddr, unsigned long *start,
 {
 	unsigned int i, level;

-	if (__cpa_flush_range(baddr, numpages, cache))
+	if (__inv_flush_all(cache))
+		return;
+
+	flush_tlb_all();
+
+	if (!cache)
 		return;

 	/*

2018-12-03 19:27:37

by StDenis, Tom

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

Hi Peter,

After updating my UMDs (mesa/etc) over the weekend I cannot reproduce
the bug to begin with. I'll try jumping directly to the intersection
and see if I can reproduce the fault there; otherwise I'll have to
roll back my UMDs.

Hopefully I can test this tomorrow.

Tom

On 2018-12-03 10:41 a.m., Peter Zijlstra wrote:
> On Fri, Nov 30, 2018 at 04:19:46PM +0000, StDenis, Tom wrote:
>> NAK I get a failure in TTM on init with your x86/mm branch (see attached
>> dmesg).
>
> So the good news is that with some additional self-tests I can trivially
> reproduce this. The bad news is that an otherwise straight forward
> cleanup seems to make CPA horribly mad at me.
>
> And since we're somewhat late in the release cycle, I suppose we should
> do the simple thing first, and then I can try and figure out this CPA
> mess later.
>
> So how about this relatively simple partial revert to sort the problem.
>
> ---
> Subject: x86/mm/cpa: Fix cpa_flush_array() TLB invalidation
>
> In commit:
>
> a7295fd53c39 ("x86/mm/cpa: Use flush_tlb_kernel_range()")
>
> I misread the cpa array code and incorrectly used
> tlb_flush_kernel_range(), resulting in missing TLB flushes and
> consequent failures.
>
> Instead do a full invalidate in this case -- for now.
>
> Fixes: a7295fd53c39 ("x86/mm/cpa: Use flush_tlb_kernel_range()")
> Reported-by: "StDenis, Tom" <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> arch/x86/mm/pageattr.c | 24 ++++++++++++++++--------
> 1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index bac35001d896..61bc7d1800d7 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -285,20 +285,16 @@ static void cpa_flush_all(unsigned long cache)
> on_each_cpu(__cpa_flush_all, (void *) cache, 1);
> }
>
> -static bool __cpa_flush_range(unsigned long start, int numpages, int cache)
> +static bool __inv_flush_all(int cache)
> {
> BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
>
> - WARN_ON(PAGE_ALIGN(start) != start);
> -
> if (cache && !static_cpu_has(X86_FEATURE_CLFLUSH)) {
> cpa_flush_all(cache);
> return true;
> }
>
> - flush_tlb_kernel_range(start, start + PAGE_SIZE * numpages);
> -
> - return !cache;
> + return false;
> }
>
> static void cpa_flush_range(unsigned long start, int numpages, int cache)
> @@ -306,7 +302,14 @@ static void cpa_flush_range(unsigned long start, int numpages, int cache)
> unsigned int i, level;
> unsigned long addr;
>
> - if (__cpa_flush_range(start, numpages, cache))
> + WARN_ON(PAGE_ALIGN(start) != start);
> +
> + if (__inv_flush_all(cache))
> + return;
> +
> + flush_tlb_kernel_range(start, start + PAGE_SIZE * numpages);
> +
> + if (!cache)
> return;
>
> /*
> @@ -332,7 +335,12 @@ static void cpa_flush_array(unsigned long baddr, unsigned long *start,
> {
> unsigned int i, level;
>
> - if (__cpa_flush_range(baddr, numpages, cache))
> + if (__inv_flush_all(cache))
> + return;
> +
> + flush_tlb_all();
> +
> + if (!cache)
> return;
>
> /*
>

2018-12-05 16:29:45

by StDenis, Tom

Subject: Re: [PATCH 0/4] x86/mm/cpa: Fix cpa-array TLB invalidation

Hi Peter,

The good news is that I got our OpenGL test running on your x86/mm branch.
The commit a2b4306c50b5de2ca955cd73ac57c2ac6426ee15 (current tip of
x86/mm) is good. For sanity I jumped back and found that commit
a2aa52ab16efbee40ad118ebac4a5e438f5b43ee doesn't work.

Thanks,
Tom



On 2018-12-03 2:26 p.m., Tom St Denis wrote:
> Hi Peter,
>
> After updating my UMDs (mesa/etc) over the weekend I cannot reproduce
> the bug to begin with.  I'll try jumping directly to the intersection
> and see if I can reproduce the fault there otherwise I'll have to
> rollback my umds.
>
> Hopefully I can test this tomorrow.
>
> Tom
>
> On 2018-12-03 10:41 a.m., Peter Zijlstra wrote:
>> On Fri, Nov 30, 2018 at 04:19:46PM +0000, StDenis, Tom wrote:
>>> NAK I get a failure in TTM on init with your x86/mm branch (see attached
>>> dmesg).
>>
>> So the good news is that with some additional self-tests I can trivially
>> reproduce this. The bad news is that an otherwise straight forward
>> cleanup seems to make CPA horribly mad at me.
>>
>> And since we're somewhat late in the release cycle, I suppose we should
>> do the simple thing first, and then I can try and figure out this CPA
>> mess later.
>>
>> So how about this relatively simple partial revert to sort the problem.
>>
>> ---
>> Subject: x86/mm/cpa: Fix cpa_flush_array() TLB invalidation
>>
>> In commit:
>>
>>    a7295fd53c39 ("x86/mm/cpa: Use flush_tlb_kernel_range()")
>>
>> I misread the cpa array code and incorrectly used
>> tlb_flush_kernel_range(), resulting in missing TLB flushes and
>> consequent failures.
>>
>> Instead do a full invalidate in this case -- for now.
>>
>> Fixes: a7295fd53c39 ("x86/mm/cpa: Use flush_tlb_kernel_range()")
>> Reported-by: "StDenis, Tom" <[email protected]>
>> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
>> ---
>>   arch/x86/mm/pageattr.c | 24 ++++++++++++++++--------
>>   1 file changed, 16 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
>> index bac35001d896..61bc7d1800d7 100644
>> --- a/arch/x86/mm/pageattr.c
>> +++ b/arch/x86/mm/pageattr.c
>> @@ -285,20 +285,16 @@ static void cpa_flush_all(unsigned long cache)
>>       on_each_cpu(__cpa_flush_all, (void *) cache, 1);
>>   }
>> -static bool __cpa_flush_range(unsigned long start, int numpages, int
>> cache)
>> +static bool __inv_flush_all(int cache)
>>   {
>>       BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
>> -    WARN_ON(PAGE_ALIGN(start) != start);
>> -
>>       if (cache && !static_cpu_has(X86_FEATURE_CLFLUSH)) {
>>           cpa_flush_all(cache);
>>           return true;
>>       }
>> -    flush_tlb_kernel_range(start, start + PAGE_SIZE * numpages);
>> -
>> -    return !cache;
>> +    return false;
>>   }
>>   static void cpa_flush_range(unsigned long start, int numpages, int
>> cache)
>> @@ -306,7 +302,14 @@ static void cpa_flush_range(unsigned long start,
>> int numpages, int cache)
>>       unsigned int i, level;
>>       unsigned long addr;
>> -    if (__cpa_flush_range(start, numpages, cache))
>> +    WARN_ON(PAGE_ALIGN(start) != start);
>> +
>> +    if (__inv_flush_all(cache))
>> +        return;
>> +
>> +    flush_tlb_kernel_range(start, start + PAGE_SIZE * numpages);
>> +
>> +    if (!cache)
>>           return;
>>       /*
>> @@ -332,7 +335,12 @@ static void cpa_flush_array(unsigned long baddr,
>> unsigned long *start,
>>   {
>>       unsigned int i, level;
>> -    if (__cpa_flush_range(baddr, numpages, cache))
>> +    if (__inv_flush_all(cache))
>> +        return;
>> +
>> +    flush_tlb_all();
>> +
>> +    if (!cache)
>>           return;
>>       /*
>>
>