Date: Tue, 17 May 2011 10:46:22 +0200
From: Ingo Molnar
To: Huang Ying
Cc: Len Brown, linux-kernel@vger.kernel.org, Andi Kleen, Tony Luck,
	linux-acpi@vger.kernel.org, Andi Kleen, Wu Fengguang, Andrew Morton,
	Linus Torvalds, Peter Zijlstra, Borislav Petkov
Subject: Re: [PATCH 5/9] HWPoison: add memory_failure_queue()
Message-ID: <20110517084622.GE22093@elte.hu>
References: <1305619719-7480-1-git-send-email-ying.huang@intel.com>
	<1305619719-7480-6-git-send-email-ying.huang@intel.com>
In-Reply-To: <1305619719-7480-6-git-send-email-ying.huang@intel.com>

* Huang Ying wrote:

> memory_failure() is the entry point for HWPoison memory error
> recovery.  It must be called in process context.  But commonly
> hardware memory errors are notified via MCE or NMI, so some delayed
> execution mechanism must be used.  In MCE handler, a work queue + ring
> buffer mechanism is used.
>
> In addition to MCE, now APEI (ACPI Platform Error Interface) GHES
> (Generic Hardware Error Source) can be used to report memory errors
> too.  To add support to APEI GHES memory recovery, a mechanism similar
> to that of MCE is implemented.  memory_failure_queue() is the new
> entry point that can be called in IRQ context.  The next step is to
> make MCE handler uses this interface too.
>
> Signed-off-by: Huang Ying
> Cc: Andi Kleen
> Cc: Wu Fengguang
> Cc: Andrew Morton
> ---
>  include/linux/mm.h  |    1
>  mm/memory-failure.c |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 93 insertions(+)

I have to say i disagree with how this is designed and how this is exposed to
user-space - and i pointed this out before.

It's up to Len whether you muck up drivers/acpi/ but here you are patching mm/
again ...

I just had a quick look into the current affairs of mm/memory-inject.c and it
has become an *even* nastier collection of hacks since the last time i
commented on its uglies.

Special hack upon special hack, totally disorganized code, special-purpose,
partly ioctl driven opaque information extraction to user-space using the
erst-dbg device interface. We have all the maintenance overhead and little of
the gains from hw error event features...

In this patch you add:

+struct memory_failure_entry {
+	unsigned long pfn;
+	int trapno;
+	int flags;
+};

Instead of exposing this event to other users who might be interested in these
events - such as the RAS daemon under development by Boris.

We have a proper framework (ring-buffer, NMI execution, etc.) for reporting
events, why are you not using (and extending) it instead of creating this
nasty looking, isolated, ACPI specific low level feature?

Thanks,

	Ingo