Message-ID: <516E3383.5060105@gmail.com>
Date: Wed, 17 Apr 2013 13:30:43 +0800
From: Simon Jeons
To: Andi Kleen
CC: Mitsuhiro Tanino, linux-kernel, linux-mm
Subject: Re: [RFC Patch 0/2] mm: Add parameters to make kernel behavior at memory error on dirty cache selectable
References: <51662D5B.3050001@hitachi.com> <20130411134915.GH16732@two.firstfloor.org>
In-Reply-To: <20130411134915.GH16732@two.firstfloor.org>

On 04/11/2013 09:49 PM, Andi Kleen wrote:
>> As a result, if the dirty cache includes user data, the data is lost,
>> and data corruption occurs if an application uses old data.

Hi Andi,

Could you send me a link to your mce test case?

> The application cannot use old data, the kernel code kills it if it
> would do that. And if it's IO data there is an EIO triggered.
>
> iirc the only concern in the past was that the application may miss
> the asynchronous EIO because it's cleared on any fd access.
>
> This is a general problem not specific to memory error handling,
> as these asynchronous IO errors can happen due to other reason
> (bad disk etc.)
>
> If you're really concerned about this case I think the solution
> is to make the EIO more sticky so that there is a higher chance
> than it gets returned. This will make your data much more safe,
> as it will cover all kinds of IO errors, not just the obscure memory
> errors.
>
> Or maybe have a panic knob on any IO error for any case if you don't
> trust your application to check IO syscalls. But I would rather
> have better EIO reporting than just giving up like this.
>
> The problem of tying it just to any dirty data for memory errors
> is that most anonymous data is dirty and it doesn't have this problem
> at all (because the signals handle this and they cannot be lost)
>
> And that is a far more common case than this relatively unlikely
> case of dirty IO data.
>
> So just doing it for "dirty" is not the right knob.
>
> Basically I'm saying if you worry about unreliable IO error reporting
> fix IO error reporting, don't add random unnecessary panics to
> the memory error handling.
>
> BTW my suspicion is that if you approach this from a data driven
> perspective: that is measure how much such dirty data is typically
> around in comparison to other data it will be unlikely. Such
> a study can be done with the "page-types" program in tools/vm
>
> -Andi
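
For readers following the "check IO syscalls" point in the quoted text, here is a minimal user-space sketch of what that checking might look like. It is only an illustration: the helper name write_durably is hypothetical and not from this thread, and the sketch says nothing about how the kernel reports or clears EIO. It simply checks write, fsync, and close, and hands the first failure back to the caller.

```python
import errno
import os


def write_durably(path, data):
    """Write data to path, checking every I/O call.

    Returns None on success, or the errno of the first call that failed.
    (Hypothetical helper, for illustration only.)
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    except OSError as exc:
        return exc.errno

    err = None
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # A write-back error (such as EIO) may only surface here.
        os.fsync(fd)
    except OSError as exc:
        err = exc.errno
    finally:
        try:
            os.close(fd)
        except OSError as exc:
            # Keep the earliest error; close can also report one.
            if err is None:
                err = exc.errno
    return err
```

A caller would compare the result against errno.EIO (or treat any non-None value as a failure) rather than assume the data reached stable storage.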