Received: by 2002:a25:d7c1:0:0:0:0:0 with SMTP id o184csp3963487ybg; Mon, 28 Oct 2019 23:33:18 -0700 (PDT) X-Google-Smtp-Source: APXvYqwEYmTkfs9tC3W2iCiCaI9qo7tNjVrtCpHJaNGfocyj7DIEOW8nbnfLM2SchNxOIeJTrVY0 X-Received: by 2002:a50:91f6:: with SMTP id h51mr23444037eda.99.1572330798439; Mon, 28 Oct 2019 23:33:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1572330798; cv=none; d=google.com; s=arc-20160816; b=sI4nSxYAoBU1Q9DNQc0OylRJWMge+g+GoWgSwEdgoTqgRitweNgq7PanWtpg7Ks/Ty 5/drsEPuuFJHFNiZafYYqlrO3j20l1goHc4R7sOtAYhsmkXlVykl+rOt2mzhimWSd2zn m6MKbmCi/Ty/xHFGGwyQSfbrlQP+u+HWmjjPIjcL+Db0rilpBzt+sevqm1JgdT+H6Vjj 9j6/zFJnGQYc8w5O4uQ5smWj3eD6i+2dyJJA7WxhiIc3t5HVITu0OZXhFGAWhR+HBFrf wzGsj8t52U5DvqoMVxMrIb49NloRB/4re6C269/TbB0/CQ8YZn7TCsOFbuP2UCQA+9P2 Kiyg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=eOtKPSKzNlKj91XYEKv2H9qykBcHBY+lBz0pU5gPiuo=; b=P+r59Eh4y/DKIvMhIU48VSmiOScTvQj1dbNWN3nsBmgIlivTevXp1vekw16VsvNZYt RW3ZoSnHoZUuadKIkiv2wL4iqcRnH85yjdaur9fY17571IvdbQcJKZ8Zwi+C/xHMw95u N/sacnMywg7Oenb5yX8emXqBdeQoBqrv3p7ypYgM9AXUGWWVH1d6UDO/nSy91bYK5+Wv MUdJ2wQdt8m1VrxtOKP4jq/9twXcuingAsJb0VD9LIbo3EEmwdq0nIn8gKl7vAaAuGLO hgcqEhbYTWEy7HAGDCEsMywTQ3hhYfLmfg1FyRuKPrkQ2ZENZGhM0gRlz5H2Udapnz4u UClg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=sHQ5BW+4; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 14si7966833eja.294.2019.10.28.23.32.55; Mon, 28 Oct 2019 23:33:18 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=sHQ5BW+4; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2404256AbfJ1SBC (ORCPT + 99 others); Mon, 28 Oct 2019 14:01:02 -0400 Received: from mail.kernel.org ([198.145.29.99]:42404 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2404245AbfJ1SBB (ORCPT ); Mon, 28 Oct 2019 14:01:01 -0400 Received: from mail-wr1-f46.google.com (mail-wr1-f46.google.com [209.85.221.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 5B75C20862 for ; Mon, 28 Oct 2019 18:01:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1572285660; bh=jYHTq2fhIHq1FOGacNdIHCM7MFR7pavyDtJAQBFVzZE=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=sHQ5BW+4IbC5n3E0yGSonfPNMLwwnaEyifNaFo6GjGnbmAiIscG86OpPWOXR5I7kf Ep5LUXW/3jqk8p75rF6+VTgdRwfsjxidebVeo0ID2shPpfIGVuPOHAkM597TiF1+qK f96abHy/p7boIglCDvJVWbePMCwf684Vnx0io62Y= Received: by mail-wr1-f46.google.com with SMTP id w18so10913623wrt.3 for ; Mon, 28 Oct 2019 11:01:00 -0700 (PDT) X-Gm-Message-State: APjAAAUMjqONxv3aY4JK21xjZlBC6pKEc1eNUzDiOUpLoBVESVRhnPcI dMbkwPddfAj+9I+tijgoghZa5HQ6+1D1caNxBZBk0A== X-Received: by 2002:a5d:4d0f:: with SMTP id z15mr15475859wrt.195.1572285658852; Mon, 28 Oct 2019 11:00:58 -0700 (PDT) MIME-Version: 1.0 References: <20191025064225.GA22917@1wt.eu> <20191028094253.054fbf9c@hermes.lan> In-Reply-To: <20191028094253.054fbf9c@hermes.lan> From: Andy Lutomirski Date: Mon, 28 Oct 2019 11:00:47 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [dpdk-dev] Please stop using iopl() in DPDK To: Stephen Hemminger Cc: Willy Tarreau , Andy Lutomirski , dev@dpdk.org, Thomas Gleixner , Peter Zijlstra , LKML Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On Oct 28, 2019, at 10:43 AM, Stephen Hemminger wrote: > > =EF=BB=BFOn Fri, 25 Oct 2019 08:42:25 +0200 > Willy Tarreau wrote: > >> Hi Andy, >> >>> On Thu, Oct 24, 2019 at 09:45:56PM -0700, Andy Lutomirski wrote: >>> Hi all- >>> >>> Supporting iopl() in the Linux kernel is becoming a maintainability >>> problem. As far as I know, DPDK is the only major modern user of >>> iopl(). >>> >>> After doing some research, DPDK uses direct io port access for only a >>> single purpose: accessing legacy virtio configuration structures. >>> These structures are mapped in IO space in BAR 0 on legacy virtio >>> devices. >>> >>> There are at least three ways you could avoid using iopl(). Here they >>> are in rough order of quality in my opinion: >> (...) >> >> I'm just wondering, why wouldn't we introduce a sys_ioport() syscall >> to perform I/Os in the kernel without having to play at all with iopl()/ >> ioperm() ? That would alleviate the need for these large port maps. >> Applications that use outb/inb() usually don't need extreme speeds. >> Each time I had to use them, it was to access a watchdog, a sensor, a >> fan, control a front panel LED, or read/write to NVRAM. Some userland >> drivers possibly don't need much more, and very likely run with >> privileges turned on all the time, so replacing their inb()/outb() calls >> would mostly be a matter of redefining them using a macro to use the >> syscall instead. >> >> I'd see an API more or less like this : >> >> int ioport(int op, u16 port, long val, long *ret); >> >> would take values such as INB,INW,INL to fill *, OUTB,OUTW,OUL >> to read from , possibly ORB,ORW,ORL to read, or with , write >> back and return previous value to , ANDB/W/L, XORB/W/L to do the >> same with and/xor, and maybe a TEST operation to just validate support >> at start time and replace ioperm/iopl so that subsequent calls do not >> need to check for errors. Applications could then replace : >> >> ioperm() with ioport(TEST,port,0,0) >> iopl() with ioport(TEST,0,0,0) >> outb() with ioport(OUTB,port,val,0) >> inb() with ({ char val;ioport(INB,port,0,&val);val;}) >> >> ... and so on. >> >> And then ioperm/iopl can easily be dropped. >> >> Maybe I'm overlooking something ? >> Willy > > DPDK does not want to system calls. It kills performance. > With pure user mode access it can reach > 10 Million Packets/sec > with a system call per packet that drops to 1 Million Packets/sec. If you are getting 10 MPPS with an OUT per packet, I=E2=80=99ll buy you a whole case of beer. I=E2=80=99m suggesting that, on virtio-legacy, you benchmark the performanc= e hit of using a syscall to ring the doorbell. Right now, you're doing an OUT instruction that traps to the hypervisor, probably gets emulated, and goes out to whatever host-side driver is running. The cost of doing that is going to be quite high, especially on older machines. I'm guessing that adding a syscall to the mix won't make much difference. --Andy