Message-ID: <477D02E0.5040301@rtr.ca>
Date: Thu, 03 Jan 2008 10:44:32 -0500
From: Mark Lord
To: Robert Hancock
Cc: Mark Lord, Allen Martin, Jeff Garzik, Tejun Heo, Gabor Gombas,
    linux-kernel@vger.kernel.org, linux-ide@vger.kernel.org,
    Kuan Luo, Peer Chen
Subject: Re: sata_nv + ADMA + Samsung disk problem
References: <20070808120804.GB5257@boogie.lpds.sztaki.hu>
    <20080101164416.GA29574@boogie.lpds.sztaki.hu>
    <477B0429.7040909@gmail.com> <477B0CFD.1030603@shaw.ca>
    <477BDEA5.8040701@garzik.org> <477C2A99.9010208@shaw.ca>
    <477C61D3.30009@rtr.ca> <477C6A85.9020607@shaw.ca>
In-Reply-To: <477C6A85.9020607@shaw.ca>

Robert Hancock wrote:
> Mark Lord wrote:
>> Robert Hancock wrote:
>> ..
>>> From some of the traces I took previously (posted on LKML as
>>> "sata_nv ADMA controller lockup investigation" way back in Feb 07),
>>> what seems to occur is that when the second command is issued very
>>> rapidly (within less than 20 microseconds, or potentially longer)
>>> after the previous command's completion, the ADMA status changes from
>>> 0x500 (STOPPED and IDLE) to 0x400 (just IDLE) as it typically does,
>>> but then it sticks there, no interrupt is ever raised, and CPB
>>> response flags remain at 0.
>> ..
>>
>> Assuming that NVidia got their ADMA core logic from Pacific Digital
>> (the inventors), then it may have some of the same bugs as the original.
>>
>> One of those bugs is that the aGO trigger is sampled in a "racey" way,
>> such that it sometimes may miss a recent addition to the ring.
>>
>> The *only* way to guarantee things with the original Pacific Digital core
>> was to (1) always retrigger aGO for a full ring scan with each new
>> addition,
>> and (2) poll periodically (every half second or so) rather than relying
>> exclusively on the IRQ actually working..
>>
>> Dunno about the NVidia version.
>
> Theirs works rather differently - the GO bit is there, but there's
> another append register which is used to tell the controller that a new
> tag has been added to the CPB list.
..

The PacDigi core uses a "search count" register for that purpose,
but the buggy nature of the core required that it always be set
to "2 * ring_size" to ensure nothing got missed.

Here's some comments from the original ADMA driver.
Maybe something from here might help with the NV stuff, too.

    // There is a chance that the chip will skip over a CPB if a SERVICE interrupt
    // occurs while it's reading the CPB header.  This won't cause us to get
    // stuck anywhere, but it might slow down execution of the new CPB if
    // it has to wait for the next time we hit aGO.  So.. Dxxx/Dxxx suggest
    // that all we need to do is tell the chip to do two passes around the ring
    // from an aGO instead of one pass, so that it will find the "missed" CPB
    // on the second pass.  This isn't as bad as it first looks.
    //
    writew(channel->num_cpbs * 2, &adma_regs->cpb_search_count);

Or again, the NV stuff may be completely different (?).
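
For illustration only, here is a minimal user-space sketch of why a search count of twice the ring size recovers a CPB that the chip skipped on its first pass. It is a simulation under stated assumptions, not driver code: RING_SIZE, struct sim_ring, sim_scan() and sim_pending() are hypothetical names, and the "skip a slot once" behaviour is only a stand-in for the race described in the comment above.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8u   /* hypothetical ring size, not taken from any real driver */

/* Hypothetical model of a CPB ring; real hardware state looks nothing like this. */
struct sim_ring {
    bool ready[RING_SIZE];  /* CPB queued by the driver */
    bool done[RING_SIZE];   /* CPB picked up by the simulated chip */
    int  skip_slot;         /* slot the chip misses once; -1 for none */
};

/*
 * Simulated ring scan after a GO trigger: visit search_count slots starting
 * at 'start', wrapping around the ring.  The first time the scan reaches
 * skip_slot it passes over it, modelling the missed CPB.  Returns the number
 * of CPBs picked up.
 */
static unsigned sim_scan(struct sim_ring *r, unsigned start, unsigned search_count)
{
    unsigned picked = 0;

    assert(start < RING_SIZE);
    for (unsigned i = 0; i < search_count; i++) {
        unsigned slot = (start + i) % RING_SIZE;

        if (!r->ready[slot] || r->done[slot])
            continue;
        if (r->skip_slot == (int)slot) {
            r->skip_slot = -1;      /* missed once; visible on the next visit */
            continue;
        }
        r->done[slot] = true;
        picked++;
    }
    return picked;
}

/* Count CPBs that were queued but never picked up. */
static unsigned sim_pending(const struct sim_ring *r)
{
    unsigned n = 0;

    for (unsigned i = 0; i < RING_SIZE; i++)
        if (r->ready[i] && !r->done[i])
            n++;
    return n;
}
```

With one CPB queued in the slot that gets skipped, sim_scan(&ring, 0, RING_SIZE) leaves it pending until the next trigger, while sim_scan(&ring, 0, 2 * RING_SIZE) picks it up on the second pass. Whether anything similar applies to the NV append register is an open question.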