2006-12-16 16:16:29

by Martin Michlmayr

Subject: Recent mm changes leading to filesystem corruption?

Debian recently applied a number of mm changes that went into 2.6.19
to their 2.6.18 kernel for LSB 3.1 compliance (msync() had problems
before). Since then, some filesystem corruption has been observed
which can be traced back to these mm changes. Is anyone aware of
problems with these patches?

The patches that were applied are:

- mm: tracking shared dirty pages
- mm: balance dirty pages
- mm: optimize the new mprotect() code a bit
- mm: small cleanup of install_page()
- mm: fixup do_wp_page()
- mm: msync() cleanup

With these applied to 2.6.18, the Debian installer on a slow ARM
system fails because a program segfaults due to filesystem corruption
(see http://bugs.debian.org/401980). This problem also occurs if you
apply only the "mm: tracking shared dirty pages" patch to 2.6.18 from
the series of patches listed above.
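Judging by its title, the patch singled out above concerns pages dirtied through shared writable mappings, which is also the path msync() works on. As a rough illustration only, a minimal userspace round-trip check of that path might look like the sketch below. The function name `shared_mmap_roundtrip`, the default size and the byte pattern are hypothetical choices, not anything from the bug reports. Because the read-back goes through the page cache, this can only catch coherence problems between the mapping and read(); it cannot detect corruption that only appears after writeback to disk.

```python
import mmap
import os
import tempfile


def shared_mmap_roundtrip(size: int = 1 << 20) -> bool:
    """Hypothetical check: dirty a file through a MAP_SHARED mapping,
    msync() it, then compare what plain read() returns."""
    # Non-trivial pattern so misplaced or stale pages are noticeable.
    pattern = bytes(i % 251 for i in range(size))
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)  # give the file its final length first
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as mm:
            mm[:] = pattern  # dirty every page via the shared mapping
            mm.flush()       # calls msync(MS_SYNC) on POSIX systems
        # Read the file back through the ordinary read() path.
        os.lseek(fd, 0, os.SEEK_SET)
        data = b""
        while len(data) < size:
            chunk = os.read(fd, size - len(data))
            if not chunk:
                break
            data += chunk
        return data == pattern
    finally:
        os.close(fd)
        os.unlink(path)
```

On a healthy Linux system, calling `shared_mmap_roundtrip()` should return True. A stress test closer to the reported failures would repeat this under memory pressure on a small machine and verify the data after the pages have been written back.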

Another problem has been reported related to libtorrent: according to
http://bugs.debian.org/402707, someone also saw this with non-Debian
2.6.19, but obviously it's hard to say whether the bugs are really
related.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=394392;msg=24 shows
some dmesg messages, but again it's not 100% clear it's the same bug.

Has anyone else seen problems, or is anyone aware of a fix to the
patches listed above that I've missed? It's possible the problem only
shows up on slow systems. (The corruption is reproducible on a slow
NSLU2 ARM system with 32 MB of RAM, but it doesn't happen on a faster
ARM box with more RAM.)
--
Martin Michlmayr
http://www.cyrius.com/


2006-12-16 18:37:38

by Hugh Dickins

Subject: Re: Recent mm changes leading to filesystem corruption?

On Sat, 16 Dec 2006, Martin Michlmayr wrote:

> Debian recently applied a number of mm changes that went into 2.6.19
> to their 2.6.18 kernel for LSB 3.1 compliance (msync() had problems
> before). Since then, some filesystem corruption has been observed
> which can be traced back to these mm changes. Is anyone aware of
> problems with these patches?

Very disturbing. I'm not aware of any problem with them, and we
surely wouldn't have released 2.6.19 with any known-corrupting patches
in. There are some doubts about 2.6.19 itself in the links below: were
it not for those, I'd suspect a mismerge of the pieces into 2.6.18,
perhaps a hidden dependency on something else. I'll ponder a little,
but let's CC linux-mm in case someone there has an idea.

Hugh


2006-12-16 18:44:58

by Martin Michlmayr

Subject: Re: Recent mm changes leading to filesystem corruption?

* Hugh Dickins <[email protected]> [2006-12-16 18:20]:
> Very disturbing. I'm not aware of any problem with them, and we
> surely wouldn't have released 2.6.19 with any known-corrupting patches
> in. There are some doubts about 2.6.19 itself in the links below: were
> it not for those, I'd suspect a mismerge of the pieces into 2.6.18,
> perhaps a hidden dependency on something else. I'll ponder a little,
> but let's CC linux-mm in case someone there has an idea.

Do you think http://article.gmane.org/gmane.linux.kernel/473710 might
be related?
--
Martin Michlmayr
http://www.cyrius.com/

2006-12-16 19:06:46

by Hugh Dickins

Subject: Re: Recent mm changes leading to filesystem corruption?

On Sat, 16 Dec 2006, Martin Michlmayr wrote:
> * Hugh Dickins <[email protected]> [2006-12-16 18:20]:
> > Very disturbing. I'm not aware of any problem with them, and we
> > surely wouldn't have released 2.6.19 with any known-corrupting patches
> > in. There are some doubts about 2.6.19 itself in the links below: were
> > it not for those, I'd suspect a mismerge of the pieces into 2.6.18,
> > perhaps a hidden dependency on something else. I'll ponder a little,
> > but let's CC linux-mm in case someone there has an idea.
>
> Do you think http://article.gmane.org/gmane.linux.kernel/473710 might
> be related?

Sounds like it. Let's CC Jan Kara on your other thread;
he seems to have delved into it a little.

Hugh

2006-12-16 20:55:33

by Peter Zijlstra

Subject: Re: Recent mm changes leading to filesystem corruption?

On Sat, 2006-12-16 at 16:50 +0100, Martin Michlmayr wrote:
> Debian recently applied a number of mm changes that went into 2.6.19
> to their 2.6.18 kernel for LSB 3.1 compliance (msync() had problems
> before). Since then, some filesystem corruption has been observed
> which can be traced back to these mm changes. Is anyone aware of
> problems with these patches?

As Hugh said, no, we were not.

> The patches that were applied are:
>
> - mm: tracking shared dirty pages
> - mm: balance dirty pages
> - mm: optimize the new mprotect() code a bit
> - mm: small cleanup of install_page()
> - mm: fixup do_wp_page()
> - mm: msync() cleanup
>
> With these applied to 2.6.18, the Debian installer on a slow ARM
> system fails because a program segfaults due to filesystem corruption
> (see http://bugs.debian.org/401980). This problem also occurs if you
> apply only the "mm: tracking shared dirty pages" patch to 2.6.18 from
> the series of patches listed above.

This made me think of a blog entry by DaveM from some time ago:
http://vger.kernel.org/~davem/cgi-bin/blog.cgi/2006/06/09

> Another problem has been reported related to libtorrent: according to
> http://bugs.debian.org/402707, someone also saw this with non-Debian
> 2.6.19, but obviously it's hard to say whether the bugs are really
> related.
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=394392;msg=24 shows
> some dmesg messages, but again it's not 100% clear it's the same bug.
>
> Has anyone else seen problems, or is anyone aware of a fix to the
> patches listed above that I've missed? It's possible the problem only
> shows up on slow systems. (The corruption is reproducible on a slow
> NSLU2 ARM system with 32 MB of RAM, but it doesn't happen on a faster
> ARM box with more RAM.)

What is not clear from all these reports is which architectures this is
seen on. I suspect some of them are i686, which, together with the
explicit mention of ARM, would make it a cross-platform issue.



2006-12-16 21:23:57

by Martin Michlmayr

Subject: Re: Recent mm changes leading to filesystem corruption?

* Peter Zijlstra <[email protected]> [2006-12-16 21:55]:
> What is not clear from all these reports is which architectures this is
> seen on. I suspect some of them are i686, which, together with the
> explicit mention of ARM, would make it a cross-platform issue.

Problems have been seen on at least x86, x86_64 and ARM.
--
Martin Michlmayr
[email protected]