From: Linus Torvalds
Date: Fri, 7 Oct 2016 09:44:15 -0700
Subject: Re: [GIT PULL] MD update for 4.9
To: doug@easyco.com
Cc: Shaohua Li, Linux Kernel Mailing List, linux-raid, Neil Brown
References: <20161007003850.GA7197@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 6, 2016 at 10:39 PM, Doug Dumitru wrote:
>
> There is another thread in [linux-raid] discussing pre-fetches in the
> raid-6 AVX2 code.  My testing implies that the prefetch distance is
> too short.  In your new AVX512 code, it looks like there are 24
> instructions, each with latencies of 1, between the prefetch and the
> actual memory load.  I don't have a AVX512 CPU to try this on, but the
> prefetch might do better at a bigger distance.  If I am not mistaken,
> it takes a lot longer than 24 clocks to fetch 4 cache lines.

We have basically never had a case where prefetches were actually a
good idea.

If the hardware doesn't do prefetching on its own (partly with just
physical memory patterns in the memory controller, partly just with
aggressive OoO), software isn't going to be able to improve on the
situation in general.

SW prefetching is a broken concept. You can make big differences for
very specific microarchitectures (usually the broken shit ones are the
ones that show it best), but in the general case it's pretty much
always a lost cause.

We've had real cases where prefetching just then made things worse on
other hardware. So just don't do it. It's benchmarketing for specific
hardware, it's not worth worrying about in the bigger picture. You'll
find people spend a lot of time tuning things for their particular
hardware, and it not helping at all on anything else. Waste of time.

Life is too short (and software is too complex) to try to work around
broken microarchitectures with sw prefetching.

              Linus
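
For readers unfamiliar with the term, the "prefetch distance" in the quoted
question is how far ahead of the current load the prefetch hint is issued.
The sketch below is only an illustration and is not the raid-6 AVX2/AVX512
code under discussion: the function name xor_bytes, the 64-byte CACHE_LINE
value, and the dist_lines parameter are all assumptions made up for this
example. It relies on the GCC/Clang builtin __builtin_prefetch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/*
 * XOR together all bytes of buf (hypothetical helper, for illustration).
 * dist_lines is the prefetch distance in cache lines; 0 means no
 * software prefetch is issued at all.
 */
static uint8_t xor_bytes(const uint8_t *buf, size_t len, size_t dist_lines)
{
    uint8_t acc = 0;
    size_t ahead = dist_lines * CACHE_LINE;  /* distance in bytes */

    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        /*
         * Issue one hint per CACHE_LINE bytes of progress, and only
         * while the prefetch target still lies inside the buffer.
         */
        if (ahead != 0 && i % CACHE_LINE == 0 && ahead < len - i)
            __builtin_prefetch(buf + i + ahead, 0 /* read */, 0 /* low locality */);
        acc ^= buf[i];
    }
    return acc;
}
```

The result is the same for every value of dist_lines, since the prefetch is
only a hint. Whether any particular distance is faster than dist_lines == 0
depends on the microarchitecture, which is the objection raised in the reply
above.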