Date: Sun, 5 Mar 2017 10:50:59 +0100
From: Borislav Petkov
To: hpa@zytor.com
Cc: Logan Gunthorpe, Thomas Gleixner, Ingo Molnar, Tony Luck, Al Viro,
    the arch/x86 maintainers, Linux Kernel Mailing List
Subject: Re: Question Regarding ERMS memcpy
Message-ID: <20170305095059.l4od2yjqm5yxx6ln@pd.tnic>
References: <20170304224341.zfp4fl37ypt57amg@pd.tnic>
    <5CCEF10D-5647-4503-A398-0681DF2C8847@zytor.com>
    <20170305001447.kcxignj3nsq35vci@pd.tnic>
    <20170305003349.6kgq4ovj7ipezfxu@pd.tnic>

On Sat, Mar 04, 2017 at 04:56:38PM -0800, hpa@zytor.com wrote:
> That's what the -march= and -mtune= option do!

How does that even help with a distro kernel built with -mtune=generic ?

gcc can't possibly know which targets that kernel is going to be
booted on. So it probably does some universally optimal things, like in
the dmi_scan_machine() case:

        memcpy_fromio(buf, p, 32);

turns into:

        .loc 3 219 0
        movl    $8, %ecx        #, tmp79
        movq    %rax, %rsi      # p, p
        movq    %rsp, %rdi      #, tmp77
        rep movsl

Apparently it thinks it is fine to do 8*4-byte MOVS. But why not
4*8-byte MOVS?

That's half the loops.

[ It is a whole different story what the machine actually does underneath. It
  being a half cacheline probably doesn't help and it really does the separate
  MOVs but then it would be cheaper if it did 4 8-byte ones. ]

One thing's for sure - both variants are certainly cheaper than CALLing
a memcpy variant.

What we probably should try to do, though, is simply patch in the body
of REP; MOVSQ or REP; MOVSB into the call sites and only have a call to
memcpy_orig() because that last one is fat.

I remember we did talk about it at some point but don't remember why we
didn't do it.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 
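
For illustration only, here is a userspace sketch of the two variants compared above: the same copy done as 4-byte elements (REP MOVSL, what gcc emitted) and as 8-byte elements (REP MOVSQ). The helper names copy_rep_movsl() and copy_rep_movsq() are hypothetical and are not kernel APIs, the memcpy() fallback for non-x86-64 builds is an assumption added so the sketch compiles elsewhere, and nothing here shows the call-site patching itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a kernel API: copy len bytes as len/8
 * 8-byte elements, i.e. 4 iterations for a 32-byte buffer.
 */
static inline void copy_rep_movsq(void *dst, const void *src, size_t len)
{
	size_t count = len / 8;

	assert(len % 8 == 0);
#if defined(__GNUC__) && defined(__x86_64__)
	/* RDI = destination, RSI = source, RCX = element count. */
	__asm__ volatile("rep movsq"
			 : "+D" (dst), "+S" (src), "+c" (count)
			 :
			 : "memory");
#else
	/* Assumed fallback so the sketch builds on other targets. */
	(void)count;
	memcpy(dst, src, len);
#endif
}

/*
 * Same copy as len/4 4-byte elements, matching the REP MOVSL that
 * gcc generated: 8 iterations for a 32-byte buffer.
 */
static inline void copy_rep_movsl(void *dst, const void *src, size_t len)
{
	size_t count = len / 4;

	assert(len % 4 == 0);
#if defined(__GNUC__) && defined(__x86_64__)
	__asm__ volatile("rep movsl"
			 : "+D" (dst), "+S" (src), "+c" (count)
			 :
			 : "memory");
#else
	/* Assumed fallback so the sketch builds on other targets. */
	(void)count;
	memcpy(dst, src, len);
#endif
}
```

With a 32-byte buffer, copy_rep_movsq(buf, p, 32) loads a count of 4 where copy_rep_movsl(buf, p, 32) loads 8. Whether that difference is measurable depends on what the machine does underneath, as noted in the bracketed aside.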