Date: Mon, 26 Aug 2013 18:28:33 -0400
From: Dave Jones
To: Hugh Dickins
Cc: Linus Torvalds, Cyrill Gorcunov, Hillf Danton, Linux-MM, Linux Kernel
Subject: Re: unused swap offset / bad page map.
Message-ID: <20130826222833.GA24320@redhat.com>
References: <20130821204901.GA19802@redhat.com> <20130823032127.GA5098@redhat.com> <20130823035344.GB5098@redhat.com> <20130826190757.GB27768@redhat.com>

On Mon, Aug 26, 2013 at 03:08:45PM -0700, Hugh Dickins wrote:

> > That said, google does find "swap_free: Unused swap offset entry"
> > reports from over the years. Most of them seem to be single-bit
> > errors, though (ie when the entry is 00000100 or similar I'm more
> > inclined to blame a bit error
>
> Yes, historically they have usually represented either single-bit
> errors, or corruption of page tables by other kernel data. The
> swap subsystem discovers it, but it's rarely an error of swap.

Just to rule out bad hardware, I've seen this on two systems
(admittedly the exact same spec, but still..)

> So I don't care for Dave's suggestion much earlier in this thread,
> that swapoff should fail with -EINVAL if there has been a bad page
> taint: that doesn't necessarily interfere with swapoff at all.
>
> And besides, swapoff is killable: yes, if counts go wrong, it
> can cycle around endlessly, but it checks for signal_pending()
> each time around the loop.

It might be killable, but if I've done /sbin/reboot, and the kernel
dies in sys_swapoff because of the corruption, I won't get a chance to
kill it, because at that point the shutdown process has killed my
shell, sshd, and just about everything else. It means a grumpy walk to
the other side of the house to prod a reset button.

So yeah, it might not be a mergeable thing, but at least while
bisecting it's pretty much a must-have.

> I just did a quick diff of 3.11-rc7/mm against 3.10, and here's
> a line in mremap which worries me. That set_pte_at() is operating
> on anything that isn't pte_none(), so the pte_mksoft_dirty() looks
> prone to corrupt a swap entry.
>
> I've not tried matching up bits with Dave's reports, and just going
> into a meeting now, but this patch looks worth a try: probably Cyrill
> can improve it meanwhile to what he actually wants there (I'm
> surprised anything special is needed for just moving a pte).
>
> Hugh
>
> --- 3.11-rc7/mm/mremap.c 2013-07-14 17:10:16.640003652 -0700
> +++ linux/mm/mremap.c 2013-08-26 14:46:14.460027627 -0700
> @@ -126,7 +126,7 @@ static void move_ptes(struct vm_area_str
>  			continue;
>  		pte = ptep_get_and_clear(mm, old_addr, old_pte);
>  		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
> -		set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
> +		set_pte_at(mm, new_addr, new_pte, pte);
>  	}

I'll give this a shot once I'm done with the bisect.

	Dave
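
For readers following the thread, Hugh's worry about applying
pte_mksoft_dirty() to a pte that is not pte_none() but also not present
can be shown with a small userspace toy. This is only a sketch under
loud assumptions: every toy_* name and every bit position below is
invented for illustration and is not the real x86 pte or swap-entry
layout. The only point is that a flag bit defined for present ptes may
fall inside the offset field of a swap entry, so setting it
unconditionally changes which swap slot the entry names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy pte model. All bit positions are ASSUMPTIONS picked for this
 * illustration; they do not match any real architecture.
 */
typedef uint64_t toy_pte_t;

#define TOY_PRESENT        (1ULL << 0)   /* set when the page is mapped */
#define TOY_SOFT_DIRTY     (1ULL << 11)  /* flag intended for present ptes */
#define TOY_SWP_TYPE_SHIFT 1             /* swap type lives in bits 1..5 */
#define TOY_SWP_TYPE_MASK  0x1fULL
#define TOY_SWP_OFF_SHIFT  6             /* swap offset lives in bits 6.. */

/* Build a non-present pte that encodes a swap (type, offset) pair. */
static toy_pte_t toy_mk_swap_pte(unsigned type, uint64_t offset)
{
    return ((type & TOY_SWP_TYPE_MASK) << TOY_SWP_TYPE_SHIFT) |
           (offset << TOY_SWP_OFF_SHIFT);
}

static unsigned toy_swp_type(toy_pte_t pte)
{
    return (unsigned)((pte >> TOY_SWP_TYPE_SHIFT) & TOY_SWP_TYPE_MASK);
}

static uint64_t toy_swp_offset(toy_pte_t pte)
{
    return pte >> TOY_SWP_OFF_SHIFT;
}

/* Hypothetical analogue of the 3.11-rc7 line: mark soft-dirty on any pte. */
static toy_pte_t toy_move_buggy(toy_pte_t pte)
{
    return pte | TOY_SOFT_DIRTY;
}

/* Hypothetical analogue of the proposed patch: move the pte unchanged. */
static toy_pte_t toy_move_plain(toy_pte_t pte)
{
    return pte;
}

/* A swap entry survives the plain move but not the buggy one. */
static void toy_check(void)
{
    toy_pte_t swap = toy_mk_swap_pte(1, 0x100);

    assert((swap & TOY_PRESENT) == 0);
    assert(toy_swp_offset(toy_move_plain(swap)) == 0x100);
    assert(toy_swp_offset(toy_move_buggy(swap)) != 0x100);
}
```

In this toy layout the flag lands on bit 5 of the offset field, so the
moved entry names a different swap slot while its type is left intact.
That is the general shape of fault that would later surface as an
unused swap offset report. Whether the real 3.11-rc7 bits line up with
the reports is exactly what Hugh says he has not yet checked.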