Date: Tue, 20 Mar 2018 09:26:51 +0100
From: Ingo Molnar
To: Thomas Gleixner
Cc: David Laight, 'Rahul Lakkireddy', "x86@kernel.org",
    "linux-kernel@vger.kernel.org", "netdev@vger.kernel.org",
    "mingo@redhat.com", "hpa@zytor.com", "davem@davemloft.net",
    "akpm@linux-foundation.org", "torvalds@linux-foundation.org",
    "ganeshgr@chelsio.com", "nirranjan@chelsio.com", "indranil@chelsio.com",
    Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Fenghua Yu, Eric Biggers
Subject: Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access
Message-ID: <20180320082651.jmxvvii2xvmpyr2s@gmail.com>
References: <7f0ddb3678814c7bab180714437795e0@AcuMS.aculab.com>
    <7f8d811e79284a78a763f4852984eb3f@AcuMS.aculab.com>

* Thomas Gleixner wrote:

> > Useful also for code that needs AVX-like registers to do things like CRCs.
>
> x86/crypto/ has a lot of AVX optimized code.

Yeah, that's true, but the crypto code is processing fundamentally bigger
blocks of data, which amortizes the cost of using kernel_fpu_begin()/_end().

kernel_fpu_begin()/_end() is a pretty heavy operation because it does a full
FPU save/restore via the XSAVE[S] and XRSTOR[S] instructions, which can easily
copy a thousand bytes around! So kernel_fpu_begin()/_end() is probably a
non-starter for something small, like a single 256-bit or 512-bit word access.

But there's actually a new thing in modern kernels: we got rid of (most of)
the lazy FPU save/restore code, and our new x86 FPU model is very "direct",
with no FPU faults taken normally.

So assuming the target driver will only load on modern FPUs I *think* it
should actually be possible to do something like (pseudocode):

	vmovdqa %ymm0, 40(%rsp)
	vmovdqa %ymm1, 80(%rsp)

	...
	# use ymm0 and ymm1
	...

	vmovdqa 80(%rsp), %ymm1
	vmovdqa 40(%rsp), %ymm0

... without using the heavy XSAVE/XRSTOR instructions.

Note that preemption probably still needs to be disabled and possibly there
are other details as well, but there should be no 'heavy' FPU operations.

I think this should still preserve all user-space FPU state and shouldn't muck
up any 'weird' user-space FPU state (such as pending exceptions, legacy x87
running code, NaN registers or weird FPU control word settings) we might have
interrupted either.

But I could be wrong, it should be checked whether this sequence is safe.
Worst-case we might have to save/restore the FPU control and tag words - but
those operations should still be much faster than a full XSAVE/XRSTOR pair.

So I do think we could do more in this area to improve driver performance, if
the code is correct and if there are actual benchmarks showing real benefits.

Thanks,

	Ingo
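
For illustration only, the spill/use/reload sequence in the pseudocode above can be tried out in user space with GNU C inline asm. This is a rough sketch, not kernel code and not part of the proposed patch: the names blk256, copy256_ymm0(), copy256() and copy256_selftest() are made up for this example, it touches only ymm0, and it cannot show the preemption or FPU control/tag word concerns raised above. One detail it does make concrete: vmovdqa faults on an address that is not 32-byte aligned, so the spill slot is explicitly aligned here, whereas the 40(%rsp)/80(%rsp) offsets in the pseudocode would need either aligned slots or vmovdqu.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 256-bit block type, used only by this sketch. */
struct blk256 { uint8_t b[32]; };

#if defined(__x86_64__) && defined(__GNUC__)
#define HAVE_YMM_SKETCH 1

/*
 * Copy 32 bytes through ymm0 without clobbering the caller's ymm0:
 * spill ymm0 to an aligned stack slot, use it, then reload it.
 */
__attribute__((target("avx")))
static void copy256_ymm0(struct blk256 *dst, const struct blk256 *src)
{
	struct blk256 slot __attribute__((aligned(32)));

	/* vmovdqa requires a 32-byte-aligned memory operand. */
	assert(((uintptr_t)&slot & 31) == 0);

	__asm__ volatile(
		"vmovdqa %%ymm0, %[slot]\n\t"	/* save the live ymm0 */
		"vmovdqu %[in], %%ymm0\n\t"	/* one 256-bit load */
		"vmovdqu %%ymm0, %[out]\n\t"	/* one 256-bit store */
		"vmovdqa %[slot], %%ymm0\n\t"	/* restore ymm0 */
		: [slot] "=m" (slot), [out] "=m" (*dst)
		: [in] "m" (*src));
}
#endif

/* Use the ymm0 path when the CPU and OS support AVX, else plain memcpy(). */
static void copy256(struct blk256 *dst, const struct blk256 *src)
{
#ifdef HAVE_YMM_SKETCH
	if (__builtin_cpu_supports("avx")) {
		copy256_ymm0(dst, src);
		return;
	}
#endif
	memcpy(dst, src, sizeof(*dst));
}

/* Returns 0 if a 32-byte pattern survives the copy, -1 otherwise. */
static int copy256_selftest(void)
{
	struct blk256 src, dst;
	size_t i;

	for (i = 0; i < sizeof(src.b); i++)
		src.b[i] = (uint8_t)(i * 7 + 1);
	memset(&dst, 0, sizeof(dst));

	copy256(&dst, &src);
	return memcmp(&dst, &src, sizeof(dst)) == 0 ? 0 : -1;
}
```

Calling copy256_selftest() from a small main() is enough to exercise it. Whether the same pattern is safe inside the kernel is exactly the open question above, and it would still need the benchmarks mentioned there before being worth pursuing.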