LinuxLists.cc - Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

2013-08-14 17:40:45

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

[Let's CC some more people]

On Wed 14-08-13 18:36:53, Ben Tebulin wrote:
> Hello Michal, Johannes, Balbir, Kamezawa and Mailing lists!

Hi,

> Since v3.7.2 on two independent machines a very specific Git repository
> fails in 9/10 cases on git-fsck due to SHA1/memory failures. This
> only occurs on a very specific repository and can be reproduced stably
> on two independent laptops. Git mailing list ran out of ideas and for me
> this looks like some very exotic kernel issue.
>
> After a _very long session of rebooting and bisecting_ the Linux kernel
> (fortunately I had an SSD and ccache!) I was able to pinpoint the cause
> to the following patch:
>
> *"mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT"*
> 787f7301074ccd07a3e82236ca41eefd245f4e07 linux stable [1]
> 53a59fc67f97374758e63a9c785891ec62324c81 upstream commit [2]

Thanks for bisecting this up!

I will look into this but I find it really strange. The patch only
limits the number of batched pages to be freed. This might happen even
without the patch, albeit less likely, when a new batch cannot be
allocated.
That being said, I do not see anything obviously wrong with the patch
itself. Maybe we are not flushing those pages properly in some corner
case which doesn't trigger normally. I will have to look at it but I
really think this just exhibits a subtle bug in batch pages freeing.

I have no objection to revert the patch for now until we find out what
is really going on.

> More details are available in my previous discussion on the Git mailing list:
>
> http://thread.gmane.org/gmane.comp.version-control.git/231872
>
> Never had any hardware/stability issues _at all_ with these machines.
> Only one repo out of 112 is affected. It's a git-svn clone and even
> recreated copies out of svn do trigger the same failure.
>
> I was able to bisect this error to this very specific commit.
> Furthermore: Reverting this commit in 3.9.11 still solves the error.
>
> I assume this is a regression of the Linux kernel (not Git) and would
> kindly ask you to revert the aforementioned commits.
>
> Thanks!
> - Ben
>
>
> I'm not subscribed - please CC me.
>
> [1] https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=787f7301074ccd07a3e82236ca41eefd245f4e07
> [2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=53a59fc67f97374758e63a9c785891ec62324c81
>

--
Michal Hocko
SUSE Labs

2013-08-14 17:58:11

by Michal Hocko

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

[Forgot to add Peter]

On Wed 14-08-13 19:40:39, Michal Hocko wrote:
> [...]

--
Michal Hocko
SUSE Labs

2013-08-14 18:03:36

by Linus Torvalds

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

On Wed, Aug 14, 2013 at 10:40 AM, Michal Hocko <[email protected]> wrote:
>>
>> After a _very long session of rebooting and bisecting_ the Linux kernel
>> (fortunately I had an SSD and ccache!) I was able to pinpoint the cause
>> to the following patch:
>>
>> *"mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT"*
>> 787f7301074ccd07a3e82236ca41eefd245f4e07 linux stable [1]
>> 53a59fc67f97374758e63a9c785891ec62324c81 upstream commit [2]
>
> Thanks for bisecting this up!
>
> I will look into this but I find it really strange.

We had a TLB invalidation bug in the case when we ran out of page
slots (and limiting the mmu_gather batching basically forced an early
case of that).

It was fixed in commit e6c495a96ce02574e765d5140039a64c8d4e8c9e ("mm:
fix the TLB range flushed when __tlb_remove_page() runs out of
slots"), and that doesn't seem to have been marked for stable
(probably because the commit message makes everybody reading it think
it's limited to ARC).
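To make the failure mode concrete, here is a toy model in C, not the kernel's code: the names (`toy_tlb`, `toy_unmap`, `fix_range`) and the tiny page count are invented for illustration, and the buggy case is simplified to a mid-loop flush whose range is still empty. It assumes queued pages are freed right after each flush, so any page still cached at that moment stands in for a stale TLB entry.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NPAGES 16UL

/* Toy TLB: cached[p] stays true until page p's translation is invalidated. */
struct toy_tlb {
	bool cached[TOY_NPAGES];
};

static void toy_tlb_init(struct toy_tlb *tlb)
{
	for (unsigned long p = 0; p < TOY_NPAGES; p++)
		tlb->cached[p] = true;
}

/* Invalidate the page range [start, end). */
static void toy_flush_range(struct toy_tlb *tlb, unsigned long start,
			    unsigned long end)
{
	for (unsigned long p = start; p < end && p < TOY_NPAGES; p++)
		tlb->cached[p] = false;
}

/*
 * Unmap pages [start, end), running out of slots every `batch` pages.
 * At that point the queued pages are flushed and then freed. With
 * fix_range == false the mid-loop flush still uses an empty range, as
 * if the range were only filled in after the loop. Returns how many
 * pages were freed while a stale translation was still cached.
 */
static int toy_unmap(struct toy_tlb *tlb, unsigned long start,
		     unsigned long end, unsigned long batch, bool fix_range)
{
	unsigned long range_start = start, range_end = start; /* empty */
	unsigned long queued = 0;
	int stale = 0;

	assert(batch > 0 && start <= end && end <= TOY_NPAGES);
	for (unsigned long addr = start; addr < end; addr++) {
		if (++queued < batch)
			continue;
		if (fix_range)
			range_end = addr + 1;   /* cover what was queued so far */
		toy_flush_range(tlb, range_start, range_end);
		for (unsigned long p = addr + 1 - queued; p <= addr; p++)
			if (tlb->cached[p])     /* freed, but still reachable */
				stale++;
		if (fix_range)
			range_start = addr + 1; /* the rest starts here */
		queued = 0;
	}
	toy_flush_range(tlb, range_start, end); /* final flush */
	return stale;
}
```

With 16 pages and a batch of 4, the unfixed variant reports all 16 pages as freed while still cached, and setting the range end before each mid-loop flush brings that to 0.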

Ben, can you try back-porting that commit from mainline and see if
that fixes things?

Linus

2013-08-14 18:28:13

by Michal Hocko

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

On Wed 14-08-13 11:03:32, Linus Torvalds wrote:
> On Wed, Aug 14, 2013 at 10:40 AM, Michal Hocko <[email protected]> wrote:
> >>
> >> After a _very long session of rebooting and bisecting_ the Linux kernel
> >> (fortunately I had an SSD and ccache!) I was able to pinpoint the cause
> >> to the following patch:
> >>
> >> *"mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT"*
> >> 787f7301074ccd07a3e82236ca41eefd245f4e07 linux stable [1]
> >> 53a59fc67f97374758e63a9c785891ec62324c81 upstream commit [2]
> >
> > Thanks for bisecting this up!
> >
> > I will look into this but I find it really strange.
>
> We had a TLB invalidation bug in the case when we ran out of page
> slots (and limiting the mmu_gather batching basically forced an early
> case of that).
>
> It was fixed in commit e6c495a96ce02574e765d5140039a64c8d4e8c9e ("mm:
> fix the TLB range flushed when __tlb_remove_page() runs out of
> slots"),

OK, that would suggest the issue was introduced by 597e1c35
(mm/mmu_gather: enable tlb flush range in generic mmu_gather) in 3.6
rather than 3.7, where Ben started seeing the issue, but this definitely
smells like a bug that would be amplified by the bisected patch.

Thanks for pointing this out, Linus!

> and that doesn't seem to have been marked for stable
> (probably because the commit message makes everybody reading it think
> it's limited to ARC).
>
> Ben, can you try back-porting that commit from mainline and see if
> that fixes things?
>
> Linus

--
Michal Hocko
SUSE Labs

2013-08-14 18:35:25

by Linus Torvalds

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

On Wed, Aug 14, 2013 at 11:28 AM, Michal Hocko <[email protected]> wrote:
>
> OK, that would suggest the issue was introduced by 597e1c35
> (mm/mmu_gather: enable tlb flush range in generic mmu_gather) in 3.6
> rather than 3.7, where Ben started seeing the issue, but this definitely
> smells like a bug that would be amplified by the bisected patch.

Yes, the bug was originally introduced in 597e1c35, but in practice it
never happened, because the force_flush case would not ever really
trigger unless __get_free_pages(GFP_NOWAIT) returned NULL.

Which is *very* rare.

So the commit that Ben bisected things down to wasn't the one that
really introduced the bug, but it was the one that made
tlb_next_batch() much more likely to return failure, which in turn
made it much easier to *expose* the bug.
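A rough model of the mechanism described above, again a sketch with made-up names and sizes rather than the kernel's `tlb_next_batch()`: a new batch is refused either when allocation fails or when an assumed cap on the number of batches is reached, and every refusal takes the flush-in-the-middle path.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BATCH_SLOTS 4UL /* pages per batch, tiny for illustration */

struct toy_batcher {
	unsigned long nr;          /* pages in the active batch */
	unsigned long active;      /* 1-based index of the active batch */
	unsigned long allocated;   /* batches allocated so far */
	unsigned long max_batches; /* 0 = no cap */
	bool alloc_fails;          /* models the page allocation returning NULL */
	unsigned long forced_flushes;
};

static void toy_batcher_init(struct toy_batcher *b,
			     unsigned long max_batches, bool alloc_fails)
{
	*b = (struct toy_batcher){ .active = 1, .allocated = 1,
				   .max_batches = max_batches,
				   .alloc_fails = alloc_fails };
}

/* Move to the next batch; false means "no room, flush first". */
static bool toy_next_batch(struct toy_batcher *b)
{
	if (b->active < b->allocated) {        /* reuse an existing batch */
		b->active++;
	} else {
		if (b->max_batches && b->allocated == b->max_batches)
			return false;               /* the cap */
		if (b->alloc_fails)
			return false;               /* the rare allocation failure */
		b->allocated++;
		b->active++;
	}
	b->nr = 0;
	return true;
}

/* Queue one page, taking the "flush in the middle" path when needed. */
static void toy_remove_page(struct toy_batcher *b)
{
	if (b->nr == TOY_BATCH_SLOTS && !toy_next_batch(b)) {
		b->forced_flushes++;
		b->active = 1;   /* the flush empties every batch */
		b->nr = 0;
	}
	assert(b->nr < TOY_BATCH_SLOTS);
	b->nr++;
}
```

For 100 pages with 4 slots per batch, the uncapped model never forces a flush, a cap of 2 batches forces 12, and an always-failing allocation forces 24, so a cap turns the rare path into a routine one.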

NOTE! I still absolutely want Ben to actually test that fix (ie
backport commit e6c495a96ce0 to his tree), because without testing
this is all just theoretical, and there might be other things hiding
here. But it makes sense to me, and I think this already-known bug
explains the symptoms.

Linus

2013-08-15 09:25:19

by Ben Tebulin

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

Am 14.08.2013 20:35, schrieb Linus Torvalds:
> Yes, the bug was originally introduced in 597e1c35, but in practice it
> never happened, [...]
>
> NOTE! I still absolutely want Ben to actually test that fix (ie
> backport commit e6c495a96ce0 to his tree), because without testing
> this is all just theoretical, and there might be other things hiding
> here.[..]

I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
Unfortunately this does _not resolve_ my issue (too good to be true) :-(

2013-08-15 12:02:34

by Linus Torvalds

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

On Thu, Aug 15, 2013 at 2:25 AM, Ben Tebulin <[email protected]> wrote:
>
> I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
> Unfortunately this does _not resolve_ my issue (too good to be true) :-(

Ho humm. I've found at least one other bug, but that one only affects
hugepages. Do you perhaps have transparent hugepages enabled? But even
then it looks quite unlikely.

I'll think about this some more. I'm not happy with how that
particular whole TLB flushing hack was done, but I need to sleep on
this.

Linus

2013-08-15 12:37:28

by Ben Tebulin

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

Am 15.08.2013 14:02, schrieb Linus Torvalds:
>> I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
>> Unfortunately this does _not resolve_ my issue (too good to be true) :-(
> Ho humm. I've found at least one other bug, but that one only affects
> hugepages. Do you perhaps have transparent hugepages enabled?

I was using the Ubuntu mainline Kernel config:

ben@n179 ~/p/linux.git> cat .config | grep TRANSPARENT_HUG
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y

> I'll think about this some more. I'm not happy with how that
> particular whole TLB flushing hack was done, but I need to sleep on
> this.

Thanks!

Being an end user having only a very limited understanding of the
internals behind this issue, I really appreciate any support I receive
from people who do. :-)

2013-08-15 13:40:34

by Michal Hocko

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

On Thu 15-08-13 05:02:31, Linus Torvalds wrote:
> On Thu, Aug 15, 2013 at 2:25 AM, Ben Tebulin <[email protected]> wrote:
> >
> > I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
> > Unfortunately this does _not resolve_ my issue (too good to be true) :-(
>
> Ho humm. I've found at least one other bug, but that one only affects
> hugepages. Do you perhaps have transparent hugepages enabled? But even
> then it looks quite unlikely.

__unmap_hugepage_range is hugetlb not THP if you had that one in mind.
And yes, it doesn't set the range which sounds buggy.

> I'll think about this some more. I'm not happy with how that
> particular whole TLB flushing hack was done, but I need to sleep on
> this.

I am looking into it as well, but there are high prio things which
preempt me a lot :/

Thanks for looking into it.
--
Michal Hocko
SUSE Labs

2013-08-15 14:46:05

by Michal Hocko

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

On Thu 15-08-13 15:40:31, Michal Hocko wrote:
> On Thu 15-08-13 05:02:31, Linus Torvalds wrote:
> > On Thu, Aug 15, 2013 at 2:25 AM, Ben Tebulin <[email protected]> wrote:
> > >
> > > I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
> > > Unfortunately this does _not resolve_ my issue (too good to be true) :-(
> >
> > Ho humm. I've found at least one other bug, but that one only affects
> > hugepages. Do you perhaps have transparent hugepages enabled? But even
> > then it looks quite unlikely.
>
> __unmap_hugepage_range is hugetlb not THP if you had that one in mind.
> And yes, it doesn't set the range which sounds buggy.

Or, did you mean tlb_remove_page called from zap_huge_pmd? That one
should be safe as tlb_remove_pmd_tlb_entry sets need_flush and that
means that the full range is flushed.

--
Michal Hocko
SUSE Labs

2013-08-15 14:53:35

by Michal Hocko

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

On Thu 15-08-13 16:46:00, Michal Hocko wrote:
> On Thu 15-08-13 15:40:31, Michal Hocko wrote:
> > On Thu 15-08-13 05:02:31, Linus Torvalds wrote:
> > > On Thu, Aug 15, 2013 at 2:25 AM, Ben Tebulin <[email protected]> wrote:
> > > >
> > > > I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
> > > > Unfortunately this does _not resolve_ my issue (too good to be true) :-(
> > >
> > > Ho humm. I've found at least one other bug, but that one only affects
> > > hugepages. Do you perhaps have transparent hugepages enabled? But even
> > > then it looks quite unlikely.
> >
> > __unmap_hugepage_range is hugetlb not THP if you had that one in mind.
> > And yes, it doesn't set the range which sounds buggy.
>
> Or, did you mean tlb_remove_page called from zap_huge_pmd? That one
> should be safe as tlb_remove_pmd_tlb_entry sets need_flush and that
> means that the full range is flushed.

Dohh... But we need need_flush_all and that is not set here. So this
really looks buggy.

--
Michal Hocko
SUSE Labs

2013-08-15 15:14:20

by Michal Hocko

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

On Thu 15-08-13 16:53:32, Michal Hocko wrote:
> On Thu 15-08-13 16:46:00, Michal Hocko wrote:
> > On Thu 15-08-13 15:40:31, Michal Hocko wrote:
> > > On Thu 15-08-13 05:02:31, Linus Torvalds wrote:
> > > > On Thu, Aug 15, 2013 at 2:25 AM, Ben Tebulin <[email protected]> wrote:
> > > > >
> > > > > I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
> > > > > Unfortunately this does _not resolve_ my issue (too good to be true) :-(
> > > >
> > > > Ho humm. I've found at least one other bug, but that one only affects
> > > > hugepages. Do you perhaps have transparent hugepages enabled? But even
> > > > then it looks quite unlikely.
> > >
> > > __unmap_hugepage_range is hugetlb not THP if you had that one in mind.
> > > And yes, it doesn't set the range which sounds buggy.
> >
> > Or, did you mean tlb_remove_page called from zap_huge_pmd? That one
> > should be safe as tlb_remove_pmd_tlb_entry sets need_flush and that
> > means that the full range is flushed.
>
> Dohh... But we need need_flush_all and that is not set here. So this
> really looks buggy.

This is a really dumb attempt to fix this but maybe it is worth trying
to confirm we are really seeing this problem. It still flushes too much
potentially but I am not sure how to find out the proper start...
Will think about it more.
---
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a92012a..a16f452 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1381,7 +1381,11 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
VM_BUG_ON(!PageHead(page));
tlb->mm->nr_ptes--;
spin_unlock(&tlb->mm->page_table_lock);
- tlb_remove_page(tlb, page);
+ if (!__tlb_remove_page(tlb, page)) {
+ tlb->start = 0;
+ tlb->end = addr + HPAGE_SIZE;
+ tlb_flush_mmu(tlb);
+ }
}
pte_free(tlb->mm, pgtable);
ret = 1;
--
Michal Hocko
SUSE Labs

2013-08-15 18:00:09

by Linus Torvalds

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

On Thu, Aug 15, 2013 at 5:02 AM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Aug 15, 2013 at 2:25 AM, Ben Tebulin <[email protected]> wrote:
>>
>> I just cherry-picked e6c495a96ce0 into 3.9.11 and 3.7.10.
>> Unfortunately this does _not resolve_ my issue (too good to be true) :-(
>
> Ho humm. I've found at least one other bug, but that one only affects
> hugepages. Do you perhaps have transparent hugepages enabled? But even
> then it looks quite unlikely.
>
> I'll think about this some more. I'm not happy with how that
> particular whole TLB flushing hack was done, but I need to sleep on
> this.

Ok, so I've slept on it, and here's my current thinking.

The bugs in the TLB handling were all about missing or confused
updates to the TLB range, and the thing is, they were missing or
confused because you had to do really confusing things, and remember
to set the range properly.

And that is because the interface is horrible.

This patch tries to fix the interface instead of trying to patch up
the individual places that *should* set the range some particular way.
Sadly, that means that I had to change the calling convention for
"tlb_gather_mmu()", so the patch is larger than I'd like. But it's all
very straightforward:

(1) instead of passing "fullmm" to tlb_gather_mmu(), pass the
start/end address.

A range of 0 to ~0ul implies "fullmm", and we calculate that with
"!(start | (end+1))"

(2) Because access to start/end now becomes an internal API, the
patch makes *all* TLB gather implementations do this.

So I added start/end fields to the tlb_gather structure as necessary.

Note that some architectures already had "start_range/end_range"
values, and I left those alone (because the new start/end might work a
bit differently), but it's very possible that those could be removed,
and they'd just use the "generic" start/end values. I'm cc'ing the
arch list to see what the reaction to this all is.

(3) I removed all the other games with start/end, because now
start/end is _always_ valid.

Notably, if any caller of "tlb_flush_mmu()" forgets to update the
start/end fields (like I think the hugetlb case did), it is no longer
a bug. The start/end will have been set up by the initialization of
TLB gather, so we're all good.

(4) The ONE exception to (3) is the zap_pte_range() case in
mm/memory.c, which used to do all the special start/end games, and now
instead just updates start/end to be the "chunk" it just worked on
before flushing the TLB, and the "rest of the area" afterwards.
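Points (1) and (3) can be sketched with a cut-down gather structure. This is an illustration under assumptions, not the patch itself: `toy_gather` and its `flushed_*` fields are hypothetical, and only the `!(start | (end+1))` expression is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct mmu_gather; only the fields discussed here. */
struct toy_gather {
	unsigned long start, end;        /* valid from initialization on */
	bool fullmm;
	unsigned long flushed_start, flushed_end; /* last range flushed */
};

/* Point (1): the caller passes the range; "fullmm" is derived from it. */
static void toy_gather_init(struct toy_gather *tlb,
			    unsigned long start, unsigned long end)
{
	assert(start <= end);
	tlb->start = start;
	tlb->end = end;
	/* 0..~0UL means the whole address space: end + 1 wraps to 0. */
	tlb->fullmm = !(start | (end + 1));
	assert(tlb->fullmm == (start == 0 && end == ~0UL));
	tlb->flushed_start = tlb->flushed_end = 0;
}

/* Point (3): a flush always uses the stored range. A caller that never
 * narrows start/end gets a larger flush, never a missing one. */
static void toy_gather_flush(struct toy_gather *tlb)
{
	tlb->flushed_start = tlb->start;
	tlb->flushed_end = tlb->end;
}
```

The design choice is that forgetting to update the range now errs toward flushing too much rather than too little.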

Even that special (4) case is simpler now, imho, exactly because
start/end is a valid range at all points (it used to be that it wasn't
a valid range the first time, since it wasn't set up initially). So
now that code in case (4) makes more sense, but more importantly, now
it should be just an optimization - we *could* have dropped all the
start/end updates, but then we'd just ask the TLB to be flushed for
the whole original range every time.
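The claim that the start/end updates in case (4) are only an optimization can be checked in a toy model (hypothetical `toy_zap`, page numbers instead of addresses). Both variants cover every queued page at each flush, which the assertions check; they differ only in how much they ask to flush.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Unmap pages [start, end), running out of slots every `batch` pages.
 * The gather range [lo, hi) is valid from the start. Returns the total
 * number of pages named in all flushes, a rough proxy for flush cost.
 */
static unsigned long toy_zap(unsigned long start, unsigned long end,
			     unsigned long batch, bool narrow_range)
{
	unsigned long lo = start, hi = end;   /* always a valid range */
	unsigned long pending = start;        /* first page not yet flushed */
	unsigned long cost = 0;

	assert(batch > 0 && start <= end);
	for (unsigned long addr = start; addr < end; ) {
		unsigned long next = (end - addr > batch) ? addr + batch : end;

		/* ... pages [addr, next) are unmapped and queued here ... */
		if (next < end) {                 /* out of slots: flush now */
			if (narrow_range)
				hi = next;                /* the chunk just worked on */
			assert(lo <= pending && next <= hi); /* queued pages covered */
			cost += hi - lo;
			pending = next;
			if (narrow_range) {
				lo = next;                /* the rest of the area */
				hi = end;
			}
		}
		addr = next;
	}
	assert(lo <= pending && end <= hi);   /* final flush covers the tail */
	cost += hi - lo;
	return cost;
}
```

For 16 pages flushed every 4, narrowing names 16 pages in total across all flushes, while always flushing the whole original range names 64.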

Anyway. I've booted this, and I'm writing this email with a kernel
running this, BUT:

- I have not compile-tested anything but x86-64, so the non-generic
TLB gather changes are all just done blindly. They are very
straightforward, but still..

- I have no idea whether this will fix the problem Ben sees, but I
feel happier about the code, because now any place that forgets to set
up start/end will work just fine, because they are always valid. Ben,
please test. I'm worried that the problem you see is something even
more fundamentally wrong with the whole "oops, must flush in the
middle" logic, but I'm _hoping_ this fixes it.

- This patch is against current git, so to apply you need to have
that commit e6c495a96ce0 cherry-picked to older kernels first. But
other than that I don't think this code has changed, so it should
apply cleanly.

Comments? Especially s390, ARM, ia64, sh and um that I edited blindly...

Linus

Attachments:

patch.diff (10.64 kB)

2013-08-15 18:42:29

by Linus Torvalds

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

On Thu, Aug 15, 2013 at 11:29 AM, Bjørn Mork <[email protected]> wrote:
> Linus Torvalds <[email protected]> writes:
>
>> Comments? Especially s390, ARM, ia64, sh and um that I edited blindly...
>
> I can see that :-) You have a couple of "unsigned logn"s here.

Just checking that you guys are awake.

Good job. You passed.

Linus

2013-08-15 18:54:44

by Bjørn Mork

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please revert 53a59fc67!

Linus Torvalds <[email protected]> writes:

> Comments? Especially s390, ARM, ia64, sh and um that I edited blindly...

I can see that :-) You have a couple of "unsigned logn"s here.

Bjørn

> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -35,6 +35,7 @@ struct mmu_gather {
> struct mm_struct *mm;
> unsigned int fullmm;
> struct vm_area_struct *vma;
> + unsigned long start, end;
> unsigned long range_start;
> unsigned long range_end;
> unsigned int nr;
> @@ -97,10 +98,12 @@ static inline void tlb_flush_mmu(struct mmu_gather *tlb)
> }
>
> static inline void
> -tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int fullmm)
> +tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned logn end)

[..]

> diff --git a/arch/sh/include/asm/tlb.h b/arch/sh/include/asm/tlb.h
> index e61d43d9f689..47745b255721 100644
> --- a/arch/sh/include/asm/tlb.h
> +++ b/arch/sh/include/asm/tlb.h
> @@ -36,10 +36,12 @@ static inline void init_tlb_gather(struct mmu_gather *tlb)
> }
>
> static inline void
> -tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_mm_flush)
> +tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned logn end)

2013-08-15 23:06:00

by Ben Tebulin

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)

Am 15.08.2013 20:00, schrieb Linus Torvalds:
> Ok, so I've slept on it, and here's my current thinking.
> [...]

Many thoughts which, as a user, I am unable to follow ;-)

> This patch tries to fix the interface instead of trying to patch up
> the individual places that *should* set the range some particular way
> [...]
> This patch is against current git, so to apply you need to have
> that commit e6c495a96ce0 cherry-picked to older kernels first.

I took a shot based on 3.9.11 + e6c495a96ce0. The reason I don't
simply use the current git master is that, for some reason, my
linux-image-*.deb packages have grown to 750MB and larger since 3.10.y,
and I have no clue at all why or what to do about it.

The patch failed to apply. Due to my outstanding incompetence I resorted to
applying it onto master, cherry-picking that back and trying to resolve
the remaining conflicts correctly.

> - I have no idea whether this will fix the problem Ben sees, but I
> feel happier about the code, because now any place that forgets to set
> up start/end will work just fine, because they are always valid.

Simpler code? Resilient API? Happy people? Great!

> Ben, please test. I'm worried that the problem you see is something
> even more fundamentally wrong with the whole "oops, must flush in the
> middle" logic, but I'm _hoping_ this fixes it.

It's gone.

Really!

I git-fsck'ed successfully around 30 times in a row.
And even all the other things still seem to work ;-)

Honestly I have to confess that I'm deeply impressed how this finally
worked out: I just threw a particular, innocent-looking commit hash and
nothing more into the round. And while still being unsure if this might
be a plain user-space issue, only 24h later I received an 11 kB
kernel patch (with blatant typos in it !1! *g* ) apparently solving my
issue.

/me happy now, too! :)

- Ben

2013-08-16 00:33:46

by Linus Torvalds

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)

On Thu, Aug 15, 2013 at 4:05 PM, Ben Tebulin <[email protected]> wrote:
>
>> Ben, please test. I'm worried that the problem you see is something
>> even more fundamentally wrong with the whole "oops, must flush in the
>> middle" logic, but I'm _hoping_ this fixes it.
>
> It's gone.
>
> Really!
>
> I git-fsck'ed successfully around 30 times in a row.
> And even all the other things still seem to work ;-)

Goodie. I think I'm just going to commit it (with the speling fixes
for other architectures) asap. It's bigger than I'd like, but it's a
lot simpler than the alternatives of trying to figure out exactly
which call chain got things wrong with the previous confusing model.

Thanks for bisecting and testing.

> Honestly I have to confess that I'm deeply impressed how this finally
> worked out: I just threw a particular, innocent-looking commit hash and
> nothing more into the round.

Being able to bisect the exact commit that introduced the bad behavior
is a *very* powerful debugging aid, and in fact the smaller and more
innocent-looking the bisected commit is, the easier it generally is to
then say "ok, it must be related to this one particular issue". So the
bisection really pinpointed the area. After that it was just a matter
of reading the source code and seeing what looked suspicious.
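The search `git bisect` drives is a binary search over history. A minimal sketch, assuming a linear history and a test that gives the same answer every time; with an intermittent failure like this one (9 runs out of 10), each step would need repeated runs before marking a commit good. `first_bad_commit` is a hypothetical helper, not part of Git.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * bad_at[i] is the (assumed deterministic) test result for commit i of a
 * linear history; bad_at[0] must be good and bad_at[n - 1] bad, and once
 * a commit is bad every later one is assumed bad too.
 * Returns the index of the first bad commit and counts test runs.
 */
static size_t first_bad_commit(const bool *bad_at, size_t n, unsigned *tests)
{
	size_t good = 0, bad = n - 1;

	assert(n >= 2 && !bad_at[good] && bad_at[bad]);
	*tests = 0;
	while (bad - good > 1) {           /* invariant: good is good, bad is bad */
		size_t mid = good + (bad - good) / 2;

		++*tests;
		if (bad_at[mid])
			bad = mid;
		else
			good = mid;
	}
	return bad;
}
```

Each test halves the remaining range, so 1000 commits need at most 10 test runs.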

I'll probably delay committing it until tomorrow, in the hope that
somebody using one of the other architectures will at least ack that
it compiles. I'm re-attaching the patch (with the two "logn" -> "long"
fixes) just to encourage that. Hint hint, everybody..

Linus

Attachments:

patch.diff (10.64 kB)

2013-08-16 06:22:35

by Stephen Rothwell

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)

Hi Linus,

On Thu, 15 Aug 2013 17:33:28 -0700 Linus Torvalds <[email protected]> wrote:
>
> I'll probably delay committing it until tomorrow, in the hope that
> somebody using one of the other architectures will at least ack that
> it compiles. I'm re-attaching the patch (with the two "logn" -> "long"
> fixes) just to encourage that. Hint hint, everybody..

I built all the (major) PowerPC defconfigs, allnoconfig and allmodconfig
and they built as well as they did before this patch (i.e. some failed
for other reasons). I have not done any boot testing on PowerPC.

--
Cheers,
Stephen Rothwell [email protected]

Attachments:

(No filename) (653.00 B)
(No filename) (836.00 B)

2013-08-16 07:55:57

by Richard Weinberger

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)

On Fri, Aug 16, 2013 at 2:33 AM, Linus Torvalds
<[email protected]> wrote:
> I'll probably delay committing it until tomorrow, in the hope that
> somebody using one of the other architectures will at least ack that
> it compiles. I'm re-attaching the patch (with the two "logn" -> "long"
> fixes) just to encourage that. Hint hint, everybody..

/me tested arch/um, so far everything looks good. :-)

--
Thanks,
//richard

2013-08-16 11:00:36

by Michal Hocko

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)

On Thu 15-08-13 17:33:28, Linus Torvalds wrote:
> On Thu, Aug 15, 2013 at 4:05 PM, Ben Tebulin <[email protected]> wrote:
> >
> >> Ben, please test. I'm worried that the problem you see is something
> >> even more fundamentally wrong with the whole "oops, must flush in the
> >> middle" logic, but I'm _hoping_ this fixes it.
> >
> > It's gone.
> >
> > Really!
> >
> > I git-fsck'ed successfully around 30 times in a row.
> > And even all the other things still seem to work ;-)
>
> Goodie. I think I'm just going to commit it (with the speling fixes
> for other architectures) asap. It's bigger than I'd like, but it's a
> lot simpler than the alternatives of trying to figure out exactly
> which call chain got things wrong with the previous confusing model.

I was thinking about teaching __tlb_remove_page to update the range
automatically from the given address.
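Michal's alternative could look roughly like this sketch, where the page-removal step records each page's address so the flush range is always exactly what was queued. Everything here is hypothetical (`toy_remove_page_at`, a 4 KiB page size); it is not an existing kernel interface.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* Toy range that starts empty and grows with every page removed. */
struct toy_auto_range {
	unsigned long start, end;   /* start >= end means "nothing to flush" */
};

static void toy_auto_range_init(struct toy_auto_range *r)
{
	r->start = ~0UL;
	r->end = 0;
}

/* Hypothetical page-removal hook that also records the page's address. */
static void toy_remove_page_at(struct toy_auto_range *r, unsigned long addr)
{
	assert(addr % TOY_PAGE_SIZE == 0);
	if (addr < r->start)
		r->start = addr;
	if (addr + TOY_PAGE_SIZE > r->end)
		r->end = addr + TOY_PAGE_SIZE;
}
```

The trade-off is that every caller must pass the right address, whereas Linus's approach starts from a valid range and can only over-flush.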

But your patch looks good to me as well.

Feel free to add
Reviewed-by: Michal Hocko <[email protected]>

Thanks!
--
Michal Hocko
SUSE Labs

2013-08-16 11:29:02

by Peter Zijlstra

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)

On Fri, Aug 16, 2013 at 01:00:31PM +0200, Michal Hocko wrote:

> I was thinking about teaching __tlb_remove_page to update the range
> automatically from the given address.

The mmu_gather unification stuff I had did it differently still:

http://permalink.gmane.org/gmane.linux.kernel.mm/81287

That said, I do like Linus' approach. The only thing I haven't
considered is whether it does the right thing for tile and mips-r4k, which have
'special' rules for VM_HUGETLB, although I don't think it changes those
archs enough to break anything.

I should find some time to finally finish that series :/

2013-08-17 00:09:34

by Tony Luck

Subject: Re: [Bug] Reproducible data corruption on i5-3340M: Please continue your great work! :-)

On Thu, Aug 15, 2013 at 5:33 PM, Linus Torvalds
<[email protected]> wrote:
> I'll probably delay committing it until tomorrow, in the hope that
> somebody using one of the other architectures will at least ack that
> it compiles. I'm re-attaching the patch (with the two "logn" -> "long"
> fixes) just to encourage that. Hint hint, everybody..

I see I'm too late to supply an Ack for the commit, because it is already in.
But just for completeness sake - all my ia64 configs build OK, and the couple
that get boot tested still appear to be working too.

-Tony