Date: Wed, 3 Oct 2007 17:55:10 +0200
From: "Torsten Kaiser"
To: "Tejun Heo"
Cc: "Jeff Garzik", linux-kernel@vger.kernel.org, akpm@linux-foundation.org, "Matt Mackall"
Subject: Re: sata_sil24 broken since 2.6.23-rc4-mm1

[CC added
to author of the bad patch]

Short recap:
Since 2.6.23-rc4-mm1, all mm kernels randomly fail one of the two drives
on my Silicon Image 3132. The failure happens when my initramfs tries to
start the RAID that is on these drives.

The first error libata throws is:
Oct 3 16:56:46 treogen [ 63.320000] ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Oct 3 16:56:46 treogen [ 63.320000] ata2.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
Oct 3 16:56:46 treogen [ 63.320000]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 3 16:56:46 treogen [ 63.320000] ata2.00: status: {DRDY }

Resetting the SATA link fails, and the drive is no longer reachable until
a reboot.

I then bisected the mm patches from 2.6.23-rc4-mm1, with the following
result:

On 10/3/07, Torsten Kaiser wrote:
> I'm now finished with bisecting, still 2 patches, but I don't want to
> spend another two hours waiting...
>
> And the winners are: (from broken-out patchset of 2.6.23-rc4-mm1)
> maps2-simplify-interdependence-of-proc-pid-maps-and-smaps.patch
> maps2-move-clear_refs-code-to-task_mmuc.patch
>
> Before these patches I have never seen the bug, with these I only got
> two good boots when trying to recreate the problem. But even the
> kernels that did one good boot failed on the second try.

The simplify patch just seems to move some code around, but I see a real
change in the other one: it removes clear_refs_smap() from
fs/proc/task_mmu.c by moving its code to a new function. During the
move, the main for-loop from clear_refs_smap() was changed:

old:
	for (vma = mm->mmap; vma; vma = vma->vm_next)
		if (vma->vm_mm && !is_vm_hugetlb_page(vma))
			walk_page_range(vma->vm_mm, vma->vm_start, vma->vm_end,
					&clear_refs_walk, vma);

new:
	for (vma = mm->mmap; vma; vma = vma->vm_next)
		if (!is_vm_hugetlb_page(vma))
			walk_page_range(mm, vma->vm_start, vma->vm_end,
					&clear_refs_walk, vma);

walk_page_range() is no longer called on vma->vm_mm, but on mm directly.
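To make the difference between the two loops concrete, here is a small userspace model. It is only a sketch: struct mm_model, struct vma_model, struct walk_log and record_walk() are hypothetical stand-ins, not the kernel's definitions, and the is_hugetlb field merely stands in for is_vm_hugetlb_page(). The model shows that the new loop also drops the vma->vm_mm check, so it would walk a VMA whose vm_mm is NULL or differs from mm. Whether such a VMA can exist in practice, and whether any of this relates to the sata_sil24 failure, is not established here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, NOT the real kernel structures. */
struct vma_model;

struct mm_model {
	struct vma_model *mmap;      /* head of the VMA list */
};

struct vma_model {
	struct mm_model *vm_mm;      /* owning address space, may be NULL in this model */
	int is_hugetlb;              /* stands in for is_vm_hugetlb_page(vma) */
	struct vma_model *vm_next;
};

struct walk_log {
	int calls;                   /* number of simulated walk_page_range() calls */
	int foreign;                 /* calls where the walked mm != vma->vm_mm */
};

/* Records a simulated walk instead of touching any page tables. */
void record_walk(struct mm_model *walked, struct vma_model *vma,
		 struct walk_log *log)
{
	log->calls++;
	if (walked != vma->vm_mm)
		log->foreign++;
}

/* Models the old loop: skips VMAs without vm_mm, walks vma->vm_mm. */
void clear_refs_old(struct mm_model *mm, struct walk_log *log)
{
	struct vma_model *vma;

	assert(mm != NULL);
	for (vma = mm->mmap; vma; vma = vma->vm_next)
		if (vma->vm_mm && !vma->is_hugetlb)
			record_walk(vma->vm_mm, vma, log);
}

/* Models the new loop: no vm_mm check, always walks the caller's mm. */
void clear_refs_new(struct mm_model *mm, struct walk_log *log)
{
	struct vma_model *vma;

	assert(mm != NULL);
	for (vma = mm->mmap; vma; vma = vma->vm_next)
		if (!vma->is_hugetlb)
			record_walk(mm, vma, log);
}
```

For a list in which every non-hugetlb VMA has vm_mm == mm, both functions record the same walks; they only diverge when some VMA has a NULL or different vm_mm.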
I don't know how this can kill the sata_sil24-driver, but at least it looks suspicious. As I'm not really a kernel hacker, I defer this question to the ones that are.

Torsten