Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp5234903ybx; Sun, 10 Nov 2019 09:20:16 -0800 (PST) X-Google-Smtp-Source: APXvYqxqJeBLAHFPkS2egbn/gVSPdb66BH8NVQDS59REUcbwrqZDuDVO/6l03/LqtrSjMyNd6YEO X-Received: by 2002:a17:906:945:: with SMTP id j5mr19302118ejd.211.1573406416516; Sun, 10 Nov 2019 09:20:16 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1573406416; cv=none; d=google.com; s=arc-20160816; b=KiEmKlCvNdmabI3vme5WoNFXpdbpKod6EnKnpEaEBhUQUUo85OKFUOKGg8do9harZP KzfIC4QLxpBiIfg9bwljRJzhSbYMr+rtz99YAQatuYAVdhi0p1AzmX+NTaR/wkfeB3Gd JzEFoCPNyuMc2Jju1Bdoj6ewzZLvPtIjKLvLHbk67l1malz/8GsXY9WE67P6ZBpdpTaw 0RXVHEHsKOR5Fd8U/m6mp/CqHLqXo8h2mvho0dL0tT2W+aPKLShI8uUGRBgvrJpDjk5e JgeQFKBCSnmX8JA0SU07c1WJ/5E5THSf1mBrQGXPmKBajBshqw1iBwxfWaBO/EZ/cQ7M JjDw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=xnLS2eoi1pQL/SWWKJ+ZWGhkul+SYeaNHpkX1daZWRw=; b=Fx55bYjDb6SXgH7wcJ8LDRfTvtBGhv6bf5i6pLQD4I14q728ppN7Qsb1sVFCGdBzBY z9SyTGhdqoZfNTFvXn/RoGxurSDDqhSpYW3E/+ACsOboUgqRAWUenRuM2RIsoRhNeJ9Q 7old7TOR9001A4HW8f8vg6hyhHcg5qhYpFlyxB6cbleitDxIJxqZdySEWcuOBD/Jbiv7 ztKjtNTndPDn3tC73pgIZBppVn3cH0+p01YYFYRsDZPPb52DBNuggDa27n980w29tNc8 Qd32v27oiggitLO4jQVNbCbYh2bpYE1LU22/SC14HZ5D6Rt0l9fPkMNOAu/QcSxhciI/ XuEA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h14si10119602edj.225.2019.11.10.09.19.53; Sun, 10 Nov 2019 09:20:16 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726959AbfKJRRn (ORCPT + 99 others); Sun, 10 Nov 2019 12:17:43 -0500 Received: from mail-pl1-f196.google.com ([209.85.214.196]:38567 "EHLO mail-pl1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726651AbfKJRRm (ORCPT ); Sun, 10 Nov 2019 12:17:42 -0500 Received: by mail-pl1-f196.google.com with SMTP id w8so6673680plq.5 for ; Sun, 10 Nov 2019 09:17:42 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=xnLS2eoi1pQL/SWWKJ+ZWGhkul+SYeaNHpkX1daZWRw=; b=X9+2Kt+qYQTswByTEVn6L/1xhqVUslGS06tm6iiJbognavsZNQxs//aQwDBdNWbOCs /jfmzsRWrc/1zGQ+RUF8J1E+nSpfL52tbIwdQNEMGRl6yAIds78D2UHEz27vT0EeaPln j+vf4oypO0yGiixzuSPID5aGYoCyfXxCT49bjWC+PA9khbADleY7fY0qD8wqs15bb5bp S++2SahFh0WcsvgWCQM4rR1TJnyRMc1h//RZ8AGIIjLvxvravfa2BfK88H0AOTCsvecw eJ+1vFFXHJhO3HpSDS8ytxswX4/LeSchganWi7o2EmRkSRzc581WbzPtFNPonYgcvxzl DRJg== X-Gm-Message-State: APjAAAVQhK3RjJBNUpwdaYNW/C+/PtsiOCGCzG2vdl8vkxhfiHiR/XMq HDtUgFkloQ1n2fhUWwphOVbUzg== X-Received: by 2002:a17:902:5a41:: with SMTP id f1mr22082396plm.168.1573406261481; Sun, 10 Nov 2019 09:17:41 -0800 (PST) Received: from ?IPv6:2601:646:c200:1ef2:3602:86ff:fef6:e86b? ([2601:646:c200:1ef2:3602:86ff:fef6:e86b]) by smtp.googlemail.com with ESMTPSA id h8sm17859893pjp.1.2019.11.10.09.17.39 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 10 Nov 2019 09:17:40 -0800 (PST) Subject: Re: [patch 5/9] x86/ioport: Reduce ioperm impact for sane usage further To: Ingo Molnar , Linus Torvalds Cc: Thomas Gleixner , LKML , the arch/x86 maintainers , Stephen Hemminger , Willy Tarreau , Juergen Gross , Sean Christopherson , "H. Peter Anvin" References: <20191106193459.581614484@linutronix.de> <20191106202806.241007755@linutronix.de> <20191107082541.GF30739@gmail.com> From: Andy Lutomirski Message-ID: <54ae3c3a-f49a-f5cc-d174-de6e64afe899@kernel.org> Date: Sun, 10 Nov 2019 09:17:39 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.0 MIME-Version: 1.0 In-Reply-To: <20191107082541.GF30739@gmail.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/7/19 12:25 AM, Ingo Molnar wrote: > > * Linus Torvalds wrote: > >> On Wed, Nov 6, 2019 at 12:57 PM Thomas Gleixner wrote: >>> >>> Calculate both the position of the first zero bit and the last zero bit to >>> limit the range which needs to be copied. This does not solve the problem >>> when the previous tasked had only byte 0 cleared and the next one has only >>> byte 65535 cleared, but trying to solve that would be too complex and >>> heavyweight for the context switch path. As the ioperm() usage is very rare >>> the case which is optimized is the single task/process which uses ioperm(). >> >> Hmm. >> >> I may read this patch wrong, but from what I can tell, if we really >> have just one process with an io bitmap, we're doing unnecessary >> copies. >> >> If we really have just one process that has an iobitmap, I think we >> could just keep the bitmap of that process entirely unchanged. Then, >> when we switch away from it, we set the io_bitmap_base to an invalid >> base outside the TSS segment, and when we switch back, we set it back >> to the valid one. No actual bitmap copies at all. >> >> So I think that rather than the "begin/end offset" games, we should >> perhaps have a "what was the last process that used the IO bitmap for >> this TSS" pointer (and, I think, some sequence counter, so that when >> the process updates its bitmap, it invalidates that case)? >> >> Of course, you can do *nboth*, but if we really think that the common >> case is "one special process", then I think the begin/end offset is >> useless, but a "last bitmap process" would be very useful. >> >> Am I missing something? > > In fact on SMP systems this would result in a very nice optimization: > pretty quickly *all* TSS's would be populated with that single task's > bitmap, and it would persist even across migrations from CPU to CPU. > > I'd love to get rid of the offset caching and bit scanning games as well > - it doesn't really help in a number of common scenarios and it > complicates this non-trivial piece of code a *LOT* - and we probably > don't really have the natural testing density of this code anymore to > find any regressions quickly. I think we should not over-optimize this. I am all for penalizing ioperm() and iopl() users as much as is convenient for us. There is simply no legitimate use case. Sorry, DPDK, but "virtio-legacy sucks, let's optimize the crap out of something that is slow anyway and use iopl()" is not a good excuse. Just use the %*!7 syscall to write to /sys/.../resource0 and suck up the probably negligible performance hit. And tell your customers to upgrade their hypervisors. And quite kvetching about performance of the control place on an old software-emulated NIC while you're at it. For the TLB case, it's worth tracking who last used which ASID and whether it's still up to date, since *everyone* uses the MMU. For ioperm, I don't really believe this is worth it. > > So intuitively I'd suggest we gravitate towards the simplest > implementation, with a good caching optimization for the single-task > case. I agree with the first bit, but caching on an SMP system is necessarily subtle. Some kind of invalidation is needed. > > I.e. the model I'm suggesting is that if a task uses ioperm() or iopl() > then it should have a bitmap from that point on until exit(), even if > it's all zeroes or all ones. Most applications that are using those > primitives really need it all the time and are using just a few ioports, > so all the tracking doesn't help much anyway. > > On a related note, another simplification would be that in principle we > could also use just a single bitmap and emulate iopl() as an ioperm(all) > or ioperm(none) calls. Yeah, it's not fully ABI compatible for mixed > ioperm()/iopl() uses, but is that ABI actually being relied on in > practice? > Let's please keep the ABI. Or rather, let's attempt to eventually remove the ABI, but let's not change it in the mean time please.