From: Andy Lutomirski
Date: Fri, 25 Oct 2019 17:27:22 -0700
Subject: Re: [dpdk-dev] Please stop using iopl() in DPDK
To: Stephen Hemminger
Cc: Andy Lutomirski, dev@dpdk.org, Thomas Gleixner, Peter Zijlstra, LKML
In-Reply-To: <20191025091310.05770edc@hermes.lan>

> On Oct 25, 2019, at 9:13 AM, Stephen Hemminger wrote:
>
> On Thu, 24 Oct 2019 21:45:56 -0700
> Andy Lutomirski wrote:
>
>> Hi all-
>>
>> Supporting iopl() in the Linux kernel is becoming a maintainability
>> problem.  As far as I know, DPDK is the only major modern user of
>> iopl().
>>
>> After doing some research, DPDK uses direct io port access for only a
>> single purpose: accessing legacy virtio configuration structures.
>> These structures are mapped in IO space in BAR 0 on legacy virtio
>> devices.
>
> Yes. Legacy virtio seems to have been designed without consideration
> of how to use it in userspace. Xen, Vmware and Hyper-V all use memory
> as a doorbell mechanism which is easier to use from userspace.
>
>
>> There are at least three ways you could avoid using iopl().  Here they
>> are in rough order of quality in my opinion:
>>
>> 1. Change pci_uio_ioport_read() and pci_uio_ioport_write() to use
>> read() and write() on resource0 in sysfs.
>
> The cost of entering the kernel for a doorbell mechanism is too
> expensive and would kill performance.
>
>
>> 2. Use the alternative access mechanism in the virtio legacy spec:
>> there is a way to access all of these structures via configuration
>> space.
>
> There is no way to use memory doorbell on older versions of virtio.
> Users want to run DPDK on old stuff like RHEL6 and even older
> kernel forks. There are even use cases where virtio is used for
> a non-Linux host; such as GCP.
>
>
>> 3. Use ioperm() instead of iopl().
>
> Ioperm has the wrong thread semantics. All DPDK applications have
> multiple threads and the initialization logic needs to work even
> if the thread is started later; threads can also be started by
> the user application.
>
> Iopl applies to whole process so this is not an issue.

This is not true.  ioperm() and iopl() have identical thread
semantics.  I think what you’re seeing is that you can set iopl(3)
early without knowing which port range to request.  You could
alternatively set ioperm() early and ask for a very wide range.

In principle, we could make ioperm() be per thread, but I’m not sure
we should add that kind of complexity to support a mostly obsolete use
case like this.  There's actually an argument to be made that per-mm
ioperm would be easier to handle in the kernel than per-task due to
the vagaries of KPTI.

All this being said, what are the actual performance implications of
write() to /sys/.../resource0?  Off the top of my head, I would guess
that the actual OUTB or OUTL instruction itself is incredibly slow due
to being trapped and emulated and that virtio-legacy hypervisors aren't
particularly fast to begin with and that, as a result, the write()
might not actually matter that much.

>
>>
>>
>> We are considering changes to the kernel that will potentially harm
>> the performance of any program that uses iopl(3) -- in particular,
>> context switches will become more expensive, and the scheduler might
>> need to explicitly penalize such programs to ensure fairness.  Using
>> ioperm() already hurts performance, and the proposed changes to iopl()
>> will make it even worse.  Alternatively, the kernel could drop iopl()
>> support entirely.  I will certainly make a change to allow
>> distributions to remove iopl() support entirely from their kernels,
>> and I expect that distributions will do this.
>>
>> Please fix DPDK.
>
> Please fix virtio.

Done, with the new version of virtio :)
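
For reference, option 1 above (read() and write() on resource0 in sysfs)
could look roughly like the sketch below.  It is only an illustration,
not DPDK's actual code: the helper names ioport_open(), ioport_read()
and ioport_write() are made up, and the device path in the comment is a
placeholder.  It assumes that the sysfs resourceN file of an I/O port BAR
accepts 1-, 2- or 4-byte accesses at an offset relative to the start of
the BAR.

```c
#define _XOPEN_SOURCE 700   /* expose pread()/pwrite() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the sysfs file backing an I/O port BAR, e.g.
 * "/sys/bus/pci/devices/<domain:bus:dev.fn>/resource0" (placeholder path).
 * Returns a file descriptor, or -1 with errno set.
 */
int ioport_open(const char *resource_path)
{
    return open(resource_path, O_RDWR);
}

/* Port accesses are assumed to be limited to byte, word and dword. */
static int valid_width(size_t width)
{
    return width == 1 || width == 2 || width == 4;
}

/*
 * Read `width` bytes at `offset` (relative to the BAR start) into `val`.
 * Returns 0 on success, -1 with errno set on failure.
 */
int ioport_read(int fd, off_t offset, void *val, size_t width)
{
    assert(val != NULL);
    if (!valid_width(width)) {
        errno = EINVAL;
        return -1;
    }
    ssize_t n = pread(fd, val, width, offset);
    if (n < 0)
        return -1;              /* errno already set by pread() */
    if ((size_t)n != width) {
        errno = EIO;            /* short read: treat as an I/O error */
        return -1;
    }
    return 0;
}

/*
 * Write `width` bytes from `val` at `offset` (relative to the BAR start).
 * Returns 0 on success, -1 with errno set on failure.
 */
int ioport_write(int fd, off_t offset, const void *val, size_t width)
{
    assert(val != NULL);
    if (!valid_width(width)) {
        errno = EINVAL;
        return -1;
    }
    ssize_t n = pwrite(fd, val, width, offset);
    if (n < 0)
        return -1;              /* errno already set by pwrite() */
    if ((size_t)n != width) {
        errno = EIO;            /* short write: treat as an I/O error */
        return -1;
    }
    return 0;
}
```

The descriptor would be opened once at init time and could be shared by
all threads, since pread() and pwrite() carry the offset with each call
and keep no shared file position.  Every access is a system call, which
is exactly the cost being debated in this thread.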