Subject: RE: [PATCH] x86: Run checksumming in parallel accross multiple alu's
Date: Wed, 30 Oct 2013 10:27:30 -0000
From: "David Laight"
To: "Doug Ledford", "Neil Horman"
Cc: "Ingo Molnar", "Eric Dumazet"
In-Reply-To: <201310300525.r9U5Pdqo014902@ib.usersys.redhat.com>
References: 20131029202644.GB32389@localhost.localdomain <201310300525.r9U5Pdqo014902@ib.usersys.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> The parallel ALU design of this patch seems OK at first glance, but it means
> that two parallel operations are both trying to set/clear both the overflow
> and carry flags of the EFLAGS register of the *CPU* (not the ALU). So, either
> some CPU in the past had a set of overflow/carry flags per ALU and did some
> sort of magic to make sure that the last state of those flags across multiple
> ALUs that might have been used in parallelizing work were always in the CPU's
> logical EFLAGS register, or the CPU has a buggy microcode that allowed two
> ALUs to operate on data at the same time in situations where they would
> potentially stomp on the carry/overflow flags of the other ALUs operations.

IIRC x86 CPUs treat the (arithmetic) flags register as a single entity.
So an instruction that only changes some of the flags is dependent
on any previous instruction that changes any flags.
OTOH if the instruction writes all of the flags then it doesn't
have to wait for the earlier instruction to complete.

This is problematic for the ADC chain in the IP checksum, since each
ADC consumes the carry produced by the one before it.
I did once try to use the SSE instructions to sum 16bit
fields into multiple 32bit registers.

	David
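
For illustration only, the idea of summing 16bit fields into multiple 32bit
accumulators can be sketched in portable C. This is not the SSE code mentioned
above and not the kernel's implementation; the names fold_sum, csum16_lanes
and MAX_WORDS are hypothetical, and the four scalar accumulators merely stand
in for the lanes of a vector register. Because each accumulator is wider than
the data being added, no carry flag is involved and the four additions in the
loop body do not depend on one another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit: 65536 additions of at most 0xffff each cannot
 * overflow a 32-bit accumulator, even if they all land in one lane. */
#define MAX_WORDS 65536u

/* Fold a wide sum down to 16 bits, adding the carries back in
 * (ones'-complement addition, as used by the IP checksum). */
static uint16_t fold_sum(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Sum 16-bit words into four independent 32-bit accumulators.
 * Returns the folded ones'-complement sum (not inverted). */
static uint16_t csum16_lanes(const uint16_t *words, size_t nwords)
{
    uint32_t acc[4] = { 0, 0, 0, 0 };
    size_t i = 0;

    assert(nwords <= MAX_WORDS);

    /* Main loop: four additions with no dependency between them. */
    for (; i + 4 <= nwords; i += 4) {
        acc[0] += words[i];
        acc[1] += words[i + 1];
        acc[2] += words[i + 2];
        acc[3] += words[i + 3];
    }

    /* Tail: up to three leftover words go into the first lane. */
    for (; i < nwords; i++)
        acc[0] += words[i];

    /* Combine the lanes in a 64-bit total, then fold the carries. */
    return fold_sum((uint64_t)acc[0] + acc[1] + acc[2] + acc[3]);
}
```

The carries are only accounted for once, in the final fold, which is why the
input length has to be bounded; a real implementation would need to fold
periodically or use wider accumulators for larger buffers.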