Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S262212AbVDFOQ1 (ORCPT ); Wed, 6 Apr 2005 10:16:27 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S262213AbVDFOQ1 (ORCPT ); Wed, 6 Apr 2005 10:16:27 -0400 Received: from 167.imtp.Ilyichevsk.Odessa.UA ([195.66.192.167]:1160 "HELO port.imtp.ilyichevsk.odessa.ua") by vger.kernel.org with SMTP id S262212AbVDFOQV (ORCPT ); Wed, 6 Apr 2005 10:16:21 -0400 From: Denis Vlasenko To: linux-os@analogic.com, Dave Korn Subject: Re: [BUG mm] "fixed" i386 memcpy inlining buggy Date: Wed, 6 Apr 2005 17:16:07 +0300 User-Agent: KMail/1.5.4 Cc: Denis Vlasenko , Christophe Saout , Andrew Morton , Jan Hubicka , Gerold Jury , jakub@redhat.com, Linux Kernel Mailing List , gcc@gcc.gnu.org References: In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset="koi8-r" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200504061716.07895.vda@port.imtp.ilyichevsk.odessa.ua> Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1778 Lines: 54 On Wednesday 06 April 2005 16:18, Richard B. Johnson wrote: > > Attached is inline ix86 memcpy() plus test code that tests its > corner-cases. The in-line code makes no jumps, but uses longword > copies, word copies and any spare byte copy. It works at all > offsets, doesn't require alignment but would work fastest if > both source and destination were longword aligned. Yours is: "shr $1, %%ecx\n" \ "pushf\n" \ "shr $1, %%ecx\n" \ "pushf\n" \ <=== not needed "rep\n" \ "movsl\n" \ "popf\n" \ <=== not needed "adcl %%ecx, %%ecx\n" \ "rep\n" \ "movsw\n" \ "popf\n" \ "adcl %%ecx, %%ecx\n" \ "rep\n" \ "movsb\n" \ You struggle too much for that movsw. -mm one (which happen to be mine) is: "movl %ecx,%4" "shr $2,%ecx" "rep ; movsl" "movl %4,%%ecx" "andl $3,%%ecx" "jz 1ft" /* pay 2 byte penalty for a chance to skip microcoded rep */ "rep ; movsb" "1:" and I can still drop that jz. It is there just to have a chance to skip rep movsb, it was measured to be slow enough to matter. rep movs are a bit slow to start, on small blocks it is measurable. However, maybe it is even better without jz, need to benchmark 'cold path' (i.e. where branch predictor have no data to predict it) somehow. -- vda - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/