Hi Linus,
This is a new tree; I had a conflict with your latest tree due to some
trivial cleanups you merged. I've added the fix for CVE-2008-3831, which
is unembargoed.
Please pull the 'drm-next' branch from
ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git drm-next
This contains two patches outside the DRM git tree that add exports for GEM
functionality while we await Nick Piggin's vmap and shmem changes.
This update contains the following.
1. CVE-2008-3831 - fix for a memset on arbitrary memory via i915 on G33
hardware and above.
2. Intel GEM memory manager.
3. Opregion support for Intel integrated chipsets.
4. Updated radeon driver with support for new chipsets + bugfixes.
5. vblank reworking to save power and interrupts on intel/radeon GPUs.
Dave.
arch/x86/mm/highmem_32.c | 1 +
drivers/gpu/drm/Kconfig | 3 +-
drivers/gpu/drm/Makefile | 5 +-
drivers/gpu/drm/drm_agpsupport.c | 52 +-
drivers/gpu/drm/drm_cache.c | 69 +
drivers/gpu/drm/drm_drv.c | 6 +
drivers/gpu/drm/drm_fops.c | 8 +-
drivers/gpu/drm/drm_gem.c | 421 ++++++
drivers/gpu/drm/drm_irq.c | 464 +++++-
drivers/gpu/drm/drm_memory.c | 2 +
drivers/gpu/drm/drm_mm.c | 5 +-
drivers/gpu/drm/drm_proc.c | 135 ++-
drivers/gpu/drm/drm_stub.c | 11 +-
drivers/gpu/drm/drm_sysfs.c | 2 +-
drivers/gpu/drm/i915/Makefile | 7 +-
drivers/gpu/drm/i915/i915_dma.c | 332 +++--
drivers/gpu/drm/i915/i915_drv.c | 476 +------
drivers/gpu/drm/i915/i915_drv.h | 1180 +++++-----------
drivers/gpu/drm/i915/i915_gem.c | 2558 ++++++++++++++++++++++++++++++++
drivers/gpu/drm/i915/i915_gem_debug.c | 201 +++
drivers/gpu/drm/i915/i915_gem_proc.c | 292 ++++
drivers/gpu/drm/i915/i915_gem_tiling.c | 257 ++++
drivers/gpu/drm/i915/i915_irq.c | 514 +++++--
drivers/gpu/drm/i915/i915_opregion.c | 371 +++++
drivers/gpu/drm/i915/i915_reg.h | 1417 ++++++++++++++++++
drivers/gpu/drm/i915/i915_suspend.c | 509 +++++++
drivers/gpu/drm/mga/mga_drv.c | 29 +-
drivers/gpu/drm/mga/mga_drv.h | 6 +-
drivers/gpu/drm/mga/mga_irq.c | 74 +-
drivers/gpu/drm/mga/mga_state.c | 2 +-
drivers/gpu/drm/r128/r128_drv.c | 29 +-
drivers/gpu/drm/r128/r128_drv.h | 11 +-
drivers/gpu/drm/r128/r128_irq.c | 55 +-
drivers/gpu/drm/r128/r128_state.c | 2 +-
drivers/gpu/drm/radeon/radeon_cp.c | 53 +-
drivers/gpu/drm/radeon/radeon_drv.c | 32 +-
drivers/gpu/drm/radeon/radeon_drv.h | 57 +-
drivers/gpu/drm/radeon/radeon_irq.c | 268 +++--
drivers/gpu/drm/radeon/radeon_state.c | 2 +-
drivers/gpu/drm/sis/sis_mm.c | 10 +-
drivers/gpu/drm/via/via_drv.c | 26 +-
drivers/gpu/drm/via/via_drv.h | 16 +-
drivers/gpu/drm/via/via_irq.c | 105 +-
drivers/gpu/drm/via/via_mm.c | 3 +-
include/drm/drm.h | 63 +-
include/drm/drmP.h | 249 +++-
include/drm/drm_pciids.h | 54 +-
include/drm/i915_drm.h | 333 +++++
mm/shmem.c | 1 +
49 files changed, 8814 insertions(+), 1964 deletions(-)
create mode 100644 drivers/gpu/drm/drm_cache.c
create mode 100644 drivers/gpu/drm/drm_gem.c
create mode 100644 drivers/gpu/drm/i915/i915_gem.c
create mode 100644 drivers/gpu/drm/i915/i915_gem_debug.c
create mode 100644 drivers/gpu/drm/i915/i915_gem_proc.c
create mode 100644 drivers/gpu/drm/i915/i915_gem_tiling.c
create mode 100644 drivers/gpu/drm/i915/i915_opregion.c
create mode 100644 drivers/gpu/drm/i915/i915_reg.h
create mode 100644 drivers/gpu/drm/i915/i915_suspend.c
commit 4b40893918203ee1a1f6a114316c2a19c072e9bd
Author: Matthias Hopf <[email protected]>
Date: Sat Oct 18 07:18:05 2008 +1000
drm/i915: fix ioremap of a user address for non-root (CVE-2008-3831)
Olaf Kirch noticed that the i915_set_status_page() function of the i915
kernel driver calls ioremap with an address offset that is supplied by
userspace via ioctl. The function zeroes the mapped memory via memset
and tells the hardware about the address. Turns out that access to that
ioctl is not restricted to root so users could probably exploit that to
do nasty things. We haven't tried to write actual exploit code though.
It only affects the Intel G33 series and newer.
Signed-off-by: Dave Airlie <[email protected]>
commit 9e0b97e37fddaf5419d8af24362015ab684eff7e
Author: Dave Airlie <[email protected]>
Date: Fri Oct 17 09:29:14 2008 +1000
drm: make CONFIG_DRM depend on CONFIG_SHMEM.
This can be removed later when DRM doesn't depend on shmem.
Signed-off-by: Dave Airlie <[email protected]>
commit edc6f389f6ae9cb7621270a8ddbb1892bd8df125
Author: Alex Deucher <[email protected]>
Date: Fri Oct 17 09:21:45 2008 +1000
radeon: fix PCI bus mastering support enables.
Someone noticed these registers moved around for later chips,
so we redo the codepaths per-chip. PCIE chips don't appear to
require explicit enables.
Signed-off-by: Dave Airlie <[email protected]>
commit b2ceddfa52cbeb244b90096f1e8d3e9f7e0ce299
Author: Alex Deucher <[email protected]>
Date: Fri Oct 17 09:19:33 2008 +1000
radeon: add RS400 family support.
This adds support for the RS400 family of IGPs for Intel CPUs.
Signed-off-by: Dave Airlie <[email protected]>
commit f0738e92403466d45cfb5008da668260c77fff4b
Author: Alex Deucher <[email protected]>
Date: Thu Oct 16 17:12:02 2008 +1000
drm/radeon: add support for RS740 IGP chipsets.
This adds support for the HS2100 IGP chipset.
Signed-off-by: Dave Airlie <[email protected]>
commit b612eda98e4b4bae4c98a863f039bc89425f9039
Author: Eric Anholt <[email protected]>
Date: Wed Oct 15 00:05:58 2008 -0700
i915: GM45 has GM965-style MCH setup.
Fixes tiling swizzling mode failures that manifest in glReadPixels().
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 6dbe2772d6af067845bab57be490c302f4490ac7
Author: Keith Packard <[email protected]>
Date: Tue Oct 14 21:41:13 2008 -0700
i915: Don't run retire work handler while suspended
At leavevt and lastclose time, cancel any pending retire work handler
invocation, and keep the retire work handler from requeuing itself if it is
currently running.
This patch restructures i915_gem_idle to perform all of these tasks instead
of having both leavevt and lastclose call a sequence of functions.
Signed-off-by: Keith Packard <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit ba1eb1d825fdef40f69871caf8e5842d00efbbc5
Author: Keith Packard <[email protected]>
Date: Tue Oct 14 19:55:10 2008 -0700
i915: Map status page cached for chips with GTT-based HWS location.
This should improve performance by avoiding uncached reads by the CPU (the
point of having a status page), and may improve stability. This patch only
affects G33, GM45 and G45 chips as those are the only ones using GTT-based
HWS mappings.
Signed-off-by: Keith Packard <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 50aa253d820ad4577e2231202f2c8fd89f9dc4e6
Author: Keith Packard <[email protected]>
Date: Tue Oct 14 17:20:35 2008 -0700
i915: Fix up ring initialization to cover G45 oddities
G45 appears quite sensitive to ring initialization register writes,
sometimes leaving the HEAD register with the START register contents. Check
to make sure HEAD is reset correctly when START is written, and fix it up,
screaming loudly.
Signed-off-by: Keith Packard <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 0cdad7e88a23910a911a3339ff2d70f8f952d7b8
Author: Keith Packard <[email protected]>
Date: Tue Oct 14 17:19:38 2008 -0700
i915: Use non-reserved status page index for breadcrumb
Dwords 0 through 0x1f are reserved for use by the hardware. Move the GEM
breadcrumb from 0x10 to 0x20 to keep out of this area.
Signed-off-by: Keith Packard <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 630681d9a5314e6cf53d144f7f58b7c19862a7d3
Author: Eric Anholt <[email protected]>
Date: Mon Oct 6 15:14:12 2008 -0700
drm: Increment dev_priv->irq_received so i915_gem_interrupts count works.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 9bfbd5cb72c9edb8504a4a7a0aa89cdb2fcb4845
Author: Jesse Barnes <[email protected]>
Date: Mon Sep 15 15:00:33 2008 -0700
drm: kill drm_device->irq
Like the last patch but adds a macro to get at the irq value instead of
dereferencing pdev directly. Should make things easier for the BSD guys and
if we ever support non-PCI devices.
Signed-off-by: Jesse Barnes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit e0f0754ff6128570dcf38032f5bfb1f195e5bbee
Author: Dave Airlie <[email protected]>
Date: Tue Oct 7 13:41:49 2008 +1000
drm: wbinvd is cache coherent.
doing an ipi for the wbinvd case isn't necessary.
Signed-off-by: Dave Airlie <[email protected]>
commit e7d22bc3cb57126196c4f475d4e55aa44e151784
Author: Dave Airlie <[email protected]>
Date: Tue Oct 7 13:40:36 2008 +1000
i915: add missing return in error path.
Signed-off-by: Dave Airlie <[email protected]>
commit 2bdf00b22154023ac312481583603f4724eb1401
Author: Dave Airlie <[email protected]>
Date: Tue Oct 7 13:40:10 2008 +1000
i915: fixup permissions on gem ioctls.
init/entervt/leavevt should be root-only master ioctls.
Signed-off-by: Dave Airlie <[email protected]>
commit 3043c60c485ad694392d3f71bd7ef9f5c5f7cfdd
Author: Eric Anholt <[email protected]>
Date: Thu Oct 2 12:24:47 2008 -0700
drm: Clean up many sparse warnings in i915.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit bd88ee4c1b1c8fc8b78a0ba7b6235d230cea0d05
Author: Eric Anholt <[email protected]>
Date: Tue Sep 23 14:50:57 2008 -0700
drm: Use ioremap_wc in i915_driver instead of ioremap, since we always want WC.
Fixes failure to map the ringbuffer when PAT tells us we don't get to do
uncached on something that's already mapped WC, or something along those lines.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 28af0a2767412937e8424364a8ece9b230bdbc83
Author: Eric Anholt <[email protected]>
Date: Mon Sep 15 13:13:34 2008 -0700
drm: G33-class hardware has a newer 965-style MCH (no DCC register).
Fixes bad software fallback rendering in Mesa in dual-channel configurations.
d9a2470012588dc5313a5ac8bb2f03575af00e99
Signed-off-by: Dave Airlie <[email protected]>
commit 4f481ed22ec0d412336a13dc4477f6d0f3688882
Author: Eric Anholt <[email protected]>
Date: Wed Sep 10 14:22:49 2008 -0700
drm: Avoid oops in GEM execbuffers with bad arguments.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit d4e7b898c12b2b458323243abdd4a215f8f8f090
Author: Eric Anholt <[email protected]>
Date: Tue Sep 9 11:40:34 2008 -0700
DRM: Return -EBADF on bad object in flink, and return curent name if it exists.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit dbb19d302baa8ba518599701914f600935fc53fa
Author: Kristian Høgsberg <[email protected]>
Date: Wed Aug 20 11:04:27 2008 -0400
i915 gem: install and uninstall irq handler in entervt and leavevt ioctls.
Signed-off-by: Kristian Høgsberg <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit c99b058f132388a666544d293392d52d1def6b12
Author: Kristian Høgsberg <[email protected]>
Date: Wed Aug 20 11:20:13 2008 -0400
i915: Make use of sarea_priv conditional.
We fail ioctls that depend on the sarea_priv with EINVAL.
Signed-off-by: Kristian Høgsberg <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 546b0974c39657017407c86fe79811100b60700d
Author: Eric Anholt <[email protected]>
Date: Mon Sep 1 16:45:29 2008 -0700
i915: Use struct_mutex to protect ring in GEM mode.
In the conversion for GEM, we had stopped using the hardware lock to protect
ring usage, since it was all internal to the DRM now. However, some paths
weren't converted to using struct_mutex to prevent multiple threads from
concurrently working on the ring, in particular between the vblank swap handler
and ioctls.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit ed4c9c4acf948b42b138747fcb8843ecb1a24ce4
Author: Kristian Høgsberg <[email protected]>
Date: Wed Aug 20 11:08:52 2008 -0400
i915: Add chip set ID param.
Signed-off-by: Kristian Høgsberg <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 673a394b1e3b69be886ff24abfd6df97c52e8d08
Author: Eric Anholt <[email protected]>
Date: Wed Jul 30 12:06:12 2008 -0700
drm: Add GEM ("graphics execution manager") to i915 driver.
GEM allows the creation of persistent buffer objects accessible by the
graphics device through new ioctls for managing execution of commands on the
device. The userland API is almost entirely driver-specific to ensure that
any driver building on this model can easily map the interface to individual
driver requirements.
GEM is used by the 2d driver for managing its internal state allocations and
will be used for pixmap storage to reduce memory consumption and enable
zero-copy GLX_EXT_texture_from_pixmap, and in the 3d driver is used to enable
GL_EXT_framebuffer_object and GL_ARB_pixel_buffer_object.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit d1d8c925b71dd6753bf438f9e14a9e5c5183bcc6
Author: Eric Anholt <[email protected]>
Date: Thu Aug 21 12:53:33 2008 -0700
Export kmap_atomic_pfn for DRM-GEM.
The driver would like to map IO space directly for copying data in when
appropriate, to avoid CPU cache flushing for streaming writes.
kmap_atomic_pfn lets us avoid IPIs associated with ioremap for this process.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 395e0ddc44005ced5e4fed9bfc2e4bdf63d37627
Author: Keith Packard <[email protected]>
Date: Fri Jun 20 00:08:06 2008 -0700
Export shmem_file_setup for DRM-GEM
GEM needs to create shmem files to back buffer objects. Though currently
creation of files for objects could have been driven from userland, the
modesetting work will require allocation of buffer objects before userland
is running, for boot-time message display.
Signed-off-by: Eric Anholt <[email protected]>
Cc: Nick Piggin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 0a3e67a4caac273a3bfc4ced3da364830b1ab241
Author: Jesse Barnes <[email protected]>
Date: Tue Sep 30 12:14:26 2008 -0700
drm: Rework vblank-wait handling to allow interrupt reduction.
Previously, drivers supporting vblank interrupt waits would run the interrupt
all the time, or all the time that any 3d client was running, preventing the
CPU from sleeping for long when the system was otherwise idle. Now, interrupts
are disabled any time that no client is waiting on a vblank event. The new
method uses vblank counters on the chipsets when the interrupts are turned
off, rather than counting interrupts, so that we can continue to present
accurate vblank numbers.
Co-author: Michel Dänzer <[email protected]>
Signed-off-by: Jesse Barnes <[email protected]>
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 2df68b439fcb97a4c55f81516206ef4ee325e28d
Author: David Howells <[email protected]>
Date: Tue Sep 2 11:03:14 2008 +1000
drm/cred: wrap task credential accesses in the drm driver.
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <[email protected]>
Reviewed-by: James Morris <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: David Airlie <[email protected]>
commit b9bfdfe6703eb089839d48316a79c84924a3c335
Author: Jesse Barnes <[email protected]>
Date: Mon Aug 25 15:16:19 2008 -0700
new chip name is GM45
Author: Zhenyu Wang <[email protected]>
i915: official name for GM45 chipset
Signed-off-by: Zhenyu Wang <[email protected]>
Acked-by: Jesse Barnes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 317c35d1446f68b34d4de4e1100fc01680bd4877
Author: Jesse Barnes <[email protected]>
Date: Mon Aug 25 15:11:06 2008 -0700
separate i915 suspend/resume functions into their own file
[Patch against drm-next. Consider this a trial balloon for our new Linux
development model.]
This is a big chunk of code. Separating it out makes it easier to change
without churn on the main i915_drv.c file (and there will be churn as we
fix bugs and add things like kernel mode setting). Also makes it easier
to share this file with BSD.
Signed-off-by: Jesse Barnes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 6b79d521e07aae155303a992245abb539974dbaa
Author: Dave Airlie <[email protected]>
Date: Tue Sep 2 10:10:16 2008 +1000
radeon: fix writeback across suspend/resume.
Make writeback not get disabled on resume.
Signed-off-by: Dave Airlie <[email protected]>
commit 38eda21189b414b1520ea7aa0e71220796f7008f
Author: Dave Airlie <[email protected]>
Date: Tue Sep 2 10:06:06 2008 +1000
drm: fix sysfs error path.
Pointed out by Roel Kluin on dri-devel.
Signed-off-by: Dave Airlie <[email protected]>
commit dfcf96d09cff63c9aaa8e7c98bbc71e5073b1377
Author: Adrian Bunk <[email protected]>
Date: Sun Aug 24 17:11:22 2008 +1000
FB_SIS=m, DRM_SIS=y is not a legal configuration.
Reported-by: Randy Dunlap <[email protected]>
Signed-off-by: Adrian Bunk <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 8ee1c3db9075cb3211352e737e0feb98fd733b20
Author: Matthew Garrett <[email protected]>
Date: Tue Aug 5 19:37:25 2008 +0100
Add Intel ACPI IGD OpRegion support
This adds the support necessary for allowing ACPI backlight control to
work on some newer Intel-based graphics systems. Tested on Thinkpad T61
and HP 2510p hardware.
Signed-off-by: Matthew Garrett <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 398c9cb20b5c6c5d1313912b937d653a46fec578
Author: Keith Packard <[email protected]>
Date: Wed Jul 30 13:03:43 2008 -0700
i915: Initialize hardware status page at device load when possible.
Some chips were unstable with repeated setup/teardown of the hardware status
page.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit d3a6d4467ca44833bcb4ba1893a7aeaae939e4d5
Author: Keith Packard <[email protected]>
Date: Wed Jul 30 12:21:20 2008 -0700
i915: Track progress inside of batchbuffers for determining wedgedness.
This avoids early termination for long-running commands.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit ed4cb4142b242d8090d3811d5eb4abf6aa985bc8
Author: Eric Anholt <[email protected]>
Date: Tue Jul 29 12:10:39 2008 -0700
i915: Add support for MSI and interrupt mitigation.
Previous attempts at interrupt mitigation had been foiled by i915_wait_irq's
failure to update the sarea seqno value when the status page indicated that
the seqno had already been passed. MSI support has been seen to cut CPU
costs by up to 40% in some workloads by avoiding other expensive interrupt
handlers for frequent graphics interrupts.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 585fb111348f7cdc30c6a1b903987612ddeafb23
Author: Jesse Barnes <[email protected]>
Date: Tue Jul 29 11:54:06 2008 -0700
i915: Use more consistent names for regs, and store them in a separate file.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 962d4fd7273e144ae003ddb90138ae4b80567c70
Author: Keith Packard <[email protected]>
Date: Wed Jul 30 12:36:08 2008 -0700
i915: Ignore X server provided mmio address
It is already correctly detected by the kernel for use in suspend/resume.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 0790d5e148c0747499742a3c09ba5f1c07f9ed0d
Author: Keith Packard <[email protected]>
Date: Wed Jul 30 12:28:47 2008 -0700
i915: remove settable use_mi_batchbuffer_start
The driver can know what hardware requires MI_BATCH_BUFFER vs
MI_BATCH_BUFFER_START; there's no reason to let user mode configure this.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 48f185d0e0f3adde81117ead074e5e6ec5548449
Author: David Howells <[email protected]>
Date: Wed Jul 30 12:29:38 2008 -0700
SiS DRM: fix a pointer cast warning
Fix a pointer cast warning in the SIS DRM code.
This was introduced in patch ce65a44de07f73ceda1749812b75086b7add408d.
Signed-off-by: David Howells <[email protected]>
Cc: Dave Airlie <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit 6bb9e4bff5c6fd908d907222108ef5650d77972f
Author: David Howells <[email protected]>
Date: Wed Jul 30 12:29:37 2008 -0700
SiS DRM: fix the memory allocator if the SIS FB is built as a module
Fix the SIS DRM memory allocator if the SIS FB built as a module. The SIS DRM
code initialises the mm allocation hooks, but _only_ if the SIS FB is not
built as a module because it depends on CONFIG_FB_SIS, and that's unset if the
SIS FB is not built in. It must check CONFIG_FB_SIS_MODULE as well.
Signed-off-by: David Howells <[email protected]>
Cc: Dave Airlie <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
commit ddb7f4cb819fb6b9df261ce4c80b3c6f4852620d
Author: Carlos R. Mafra <[email protected]>
Date: Wed Jul 30 12:29:37 2008 -0700
drm: remove #define's for non-linux systems
There is no point in considering FreeBSD et al. in the linux kernel
source code.
Signed-off-by: Carlos R. Mafra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
On Fri, 17 Oct 2008, Dave Airlie wrote:
>
> Please pull the 'drm-next' branch from
> ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git drm-next
Grr.
This whole merge series has been full of people sending me UNTESTED CRAP.
So what's the excuse _this_ time for adding all these stupid warnings to
my build log? Did nobody test this?
drivers/gpu/drm/drm_proc.c: In function ‘drm_gem_one_name_info’:
drivers/gpu/drm/drm_proc.c:525: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘size_t’
drivers/gpu/drm/drm_proc.c:533: warning: format ‘%9d’ expects type ‘int’, but argument 4 has type ‘size_t’
drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_gem_gtt_pwrite’:
drivers/gpu/drm/i915/i915_gem.c:184: warning: unused variable ‘vaddr_atomic’
and I wonder how many other warnings got added that I never even noticed
because I don't build them?
And yes, it's not just warnings. One of these is horribly bad code:
nid->len += sprintf(&nid->buf[nid->len],
"%6d%9d%8d%9d\n",
obj->name, obj->size,
atomic_read(&obj->handlecount.refcount),
atomic_read(&obj->refcount.refcount));
where it's just wrong to use the field width as a separator. Because if
the counts are big enough, the separator suddenly goes away!
So that format string should be
"%6d %8zd %7d %8d\n"
instead. Which gives the same output when you don't overflow, and doesn't
have the overflow bug when you do.
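A minimal stand-alone sketch (made-up sizes, plain ints rather than the
size_t used in the kernel code) of how the two format strings diverge once
a count overflows its field width:

#include <stdio.h>

int main(void)
{
	/* hypothetical name, size, handlecount, refcount values */
	printf("%6d%9d%8d%9d\n", 3, 4096, 1, 2);          /* columns separated */
	printf("%6d %8d %7d %8d\n", 3, 4096, 1, 2);       /* same output, explicit spaces */

	/* a size wider than its 9-character field */
	printf("%6d%9d%8d%9d\n", 3, 1234567890, 1, 2);    /* name and size run together */
	printf("%6d %8d %7d %8d\n", 3, 1234567890, 1, 2); /* still separated */
	return 0;
}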
As to that "vaddr_atomic" thing, the warning would have been avoided if
you had just cleanly split up the optimistic fast case.
IOW, write cleaner code, and the warning just goes away on its own.
Something like the appended. UNTESTED!
Hmm?
I really wish people were more careful, and took more pride in trying to
write readable code, with small modular functions instead. And move those
variables down to the block they are needed in.
Anyway, I pulled the thing, but _please_ test this cleanup and send it
back to me if it passes your testing. Ok?
Linus
---
drivers/gpu/drm/drm_proc.c | 4 +-
drivers/gpu/drm/i915/i915_gem.c | 59 +++++++++++++++++++++++---------------
2 files changed, 38 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/drm_proc.c b/drivers/gpu/drm/drm_proc.c
index d490db4..ae73b7f 100644
--- a/drivers/gpu/drm/drm_proc.c
+++ b/drivers/gpu/drm/drm_proc.c
@@ -522,12 +522,12 @@ static int drm_gem_one_name_info(int id, void *ptr, void *data)
struct drm_gem_object *obj = ptr;
struct drm_gem_name_info_data *nid = data;
- DRM_INFO("name %d size %d\n", obj->name, obj->size);
+ DRM_INFO("name %d size %zd\n", obj->name, obj->size);
if (nid->eof)
return 0;
nid->len += sprintf(&nid->buf[nid->len],
- "%6d%9d%8d%9d\n",
+ "%6d %8zd %7d %8d\n",
obj->name, obj->size,
atomic_read(&obj->handlecount.refcount),
atomic_read(&obj->refcount.refcount));
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9ac73dd..b8c8b2e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -171,6 +171,36 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
return 0;
}
+/*
+ * Try to write quickly with an atomic kmap. Return true on success.
+ *
+ * If this fails (which includes a partial write), we'll redo the whole
+ * thing with the slow version.
+ *
+ * This is a workaround for the low performance of iounmap (approximate
+ * 10% cpu cost on normal 3D workloads). kmap_atomic on HIGHMEM kernels
+ * happens to let us map card memory without taking IPIs. When the vmap
+ * rework lands we should be able to dump this hack.
+ */
+static inline int fast_user_write(unsigned long pfn, char __user *user_data, int l)
+{
+#ifdef CONFIG_HIGHMEM
+ unsigned long unwritten;
+ char *vaddr_atomic;
+
+ vaddr_atomic = kmap_atomic_pfn(pfn, KM_USER0);
+#if WATCH_PWRITE
+ DRM_INFO("pwrite i %d o %d l %d pfn %ld vaddr %p\n",
+ i, o, l, pfn, vaddr_atomic);
+#endif
+ unwritten = __copy_from_user_inatomic_nocache(vaddr_atomic + o, user_data, l);
+ kunmap_atomic(vaddr_atomic, KM_USER0);
+ return !unwritten;
+#else
+ return 1;
+#endif
+}
+
static int
i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
struct drm_i915_gem_pwrite *args,
@@ -180,12 +210,7 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
ssize_t remain;
loff_t offset;
char __user *user_data;
- char __iomem *vaddr;
- char *vaddr_atomic;
- int i, o, l;
int ret = 0;
- unsigned long pfn;
- unsigned long unwritten;
user_data = (char __user *) (uintptr_t) args->data_ptr;
remain = args->size;
@@ -209,6 +234,9 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
obj_priv->dirty = 1;
while (remain > 0) {
+ unsigned long pfn;
+ int i, o, l;
+
/* Operation in this page
*
* i = page number
@@ -223,25 +251,10 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
pfn = (dev->agp->base >> PAGE_SHIFT) + i;
-#ifdef CONFIG_HIGHMEM
- /* This is a workaround for the low performance of iounmap
- * (approximate 10% cpu cost on normal 3D workloads).
- * kmap_atomic on HIGHMEM kernels happens to let us map card
- * memory without taking IPIs. When the vmap rework lands
- * we should be able to dump this hack.
- */
- vaddr_atomic = kmap_atomic_pfn(pfn, KM_USER0);
-#if WATCH_PWRITE
- DRM_INFO("pwrite i %d o %d l %d pfn %ld vaddr %p\n",
- i, o, l, pfn, vaddr_atomic);
-#endif
- unwritten = __copy_from_user_inatomic_nocache(vaddr_atomic + o,
- user_data, l);
- kunmap_atomic(vaddr_atomic, KM_USER0);
+ if (!fast_user_write(pfn, user_data, l)) {
+ unsigned long unwritten;
+ char __iomem *vaddr;
- if (unwritten)
-#endif /* CONFIG_HIGHMEM */
- {
vaddr = ioremap_wc(pfn << PAGE_SHIFT, PAGE_SIZE);
#if WATCH_PWRITE
DRM_INFO("pwrite slow i %d o %d l %d "
On Saturday 18 October 2008 08:29, Dave Airlie wrote:
> Hi Linus,
>
> This is a new tree, I had a conflict with your latest tree due to some
> trivial cleanups you merged. I've added the fix for CVE-2008-3831 which is
> unembargoed.
>
> Please pull the 'drm-next' branch from
> ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git
> drm-next
>
> This contains two patches outside the DRM git tree to add exports for GEM
> functionality while we await Nick Piggins vmap and shmem changes.
OK. I was hoping that kmap_atomic_pfn thing would *never* see the light of
mainline ;) Because hopefully the vmap improvements should be OK for
2.6.28 as well...
But it's already there. OK, if vmap patches go upstream, then it can
be switched over and the export removed before the release...
Thanks,
Nick
On Fri, 2008-10-17 at 15:43 -0700, Linus Torvalds wrote:
>
> On Fri, 17 Oct 2008, Dave Airlie wrote:
> >
> > Please pull the 'drm-next' branch from
> > ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git drm-next
>
> Grr.
>
> This whole merge series has been full of people sending me UNTESTED CRAP.
>
> So what's the excuse _this_ time for adding all these stupid warnings to
> my build log? Did nobody test this?
>
> drivers/gpu/drm/drm_proc.c: In function ‘drm_gem_one_name_info’:
> drivers/gpu/drm/drm_proc.c:525: warning: format ‘%d’ expects type ‘int’, but argument 3 has type ‘size_t’
> drivers/gpu/drm/drm_proc.c:533: warning: format ‘%9d’ expects type ‘int’, but argument 4 has type ‘size_t’
> drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_gem_gtt_pwrite’:
> drivers/gpu/drm/i915/i915_gem.c:184: warning: unused variable ‘vaddr_atomic’
>
> and I wonder how many other warnings got added that I never even noticed
> because I don't build them?
>
> And yes, it's not just warnings. One of thse is horribly bad code:
>
> nid->len += sprintf(&nid->buf[nid->len],
> "%6d%9d%8d%9d\n",
> obj->name, obj->size,
> atomic_read(&obj->handlecount.refcount),
> atomic_read(&obj->refcount.refcount));
>
> where it's just wrong to use the field width as a separator. Because if
> the counts are big enough, the separator suddenly goes away!
>
> So that format string should be
>
> "%6d %8zd %7d %8d\n"
>
> instead. Which gives the same output when you don't overflow, and doesn't
> have the overflow bug when you do.
>
> As to that "vaddr_atomic" thing, the warning would have been avoided if
> you had just cleanly split up the optimistic fast case.
>
> IOW, write cleaner code, and the warning just goes away on its own.
> Something like the appended. UNTESTED!
Yeah, none of us are on x86-64, so we missed those warnings in testing.
The change looks pretty good except for s/return 1/return 0/. We wanted
to pull the i_wish_it_was_ioremap_atomic() hack out so that we could use
it for relocation updates as well (which aren't copy_from_user), and
I've started writing that patch. Expect something pushed at you soon.
--
Eric Anholt
[email protected] [email protected]
On Fri, 17 Oct 2008, Eric Anholt wrote:
>
> Yeah, none of us are on x86-64, so we missed those warnings in testing.
Really? None of you use any modern CPU's, or you're _all_ running 32-bit
distros even though your cpu's could support 64-bit ones?
I would suggest at least _somebody_ in the intel graphics team try to get
with the times.. I realize that Otellini was saying "Nobody needs 64-bit
on the desktop" a few years ago, but he was full of sh*t then, and it's
certainly not remotely true now.
It's not being disloyal to your CEO, really. I'm pretty sure nobody will
be fired just for ignoring that whole "640kB^H^H^H^H^H32-bits should be
enough for everybody" idiocy.
> The change looks pretty good except for s/return 1/return 0/.
oops, yes. Thinko. I reversed the meaning of the return value: at first I
just returned "unwritten", but decided that that was a really ugly
interface, and then when I prettied that up, I didn't fix the !HIGHMEM
case. And I obviously never _tested_ it.
Linus
On Fri, 2008-10-17 at 19:47 -0700, Linus Torvalds wrote:
> Really? None of you use any modern CPU's, or you're _all_ running 32-bit
> distros even though your cpu's could support 64-bit ones?
We're lazy, perhaps even lazier than yourself. Given that the whole goal
is to essentially ignore the CPU and get our code running on the GPU,
it's hard to get excited about the kind of kernel we're running on the
CPU.
We've got a bunch of test boxes that run 64-bits, unfortunately, the
people doing builds there appear not to care about warnings. That will
get fixed.
I've also got this new G45 box here; perhaps it's time to enter the 21st
century and run it in native 64-bit mode. It's scary though; I've been
writing window systems for 32-bit machines for thirty years now.
--
[email protected]
Keith Packard wrote:
> On Fri, 2008-10-17 at 19:47 -0700, Linus Torvalds wrote:
>
>> Really? None of you use any modern CPU's, or you're _all_ running 32-bit
>> distros even though your cpu's could support 64-bit ones?
>
> We're lazy, perhaps even lazier than yourself. Given that the whole goal
> is to essentially ignore the CPU and get our code running on the GPU,
> it's hard to get excited about the kind of kernel we're running on the
> CPU.
>
> We've got a bunch of test boxes that run 64-bits, unfortunately, the
> people doing builds there appear not to care about warnings. That will
> get fixed.
Indeed. I have been running 64 bit builds for quite a while now, and
merely ignored the warnings as "not my problem." In the future, I shall
make it my problem.
~ C.
--
~ Corbin Simpson
<[email protected]>
On Fri, 2008-10-17 at 19:47 -0700, Linus Torvalds wrote:
>
> On Fri, 17 Oct 2008, Eric Anholt wrote:
> >
> > Yeah, none of us are on x86-64, so we missed those warnings in testing.
>
> Really? None of you use any modern CPU's, or you're _all_ running 32-bit
> distros even though your cpu's could support 64-bit ones?
>
> I would suggest at least _somebody_ in the intel graphics team try to get
> with the times.. I realize that Otellini was saying "Nobody needs 64-bit
> on the desktop" a few years ago, but he was full of sh*t then, and it's
> certainly not remotely true now.
>
> It's not being disloyal to your CEO, really. I'm pretty sure nobody will
> be fired just for ignoring that whole "640kB^H^H^H^H^H32-bits should be
> enough for everybody" idiocy.
Writing 3D drivers means running 3D games. Running 3D games
unfortunately means running a lot of 32-bit userland as the fun stuff is
binary-only. So I stick to a 32-bit system, because past experience
trying to run both on the same system has been misery.
--
Eric Anholt
[email protected] [email protected]
2008/10/18 Linus Torvalds <[email protected]>:
>
>
> On Fri, 17 Oct 2008, Dave Airlie wrote:
>>
>> Please pull the 'drm-next' branch from
>> ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git drm-next
>
> Grr.
>
> This whole merge series has been full of people sending me UNTESTED CRAP.
>
> So what's the excuse _this_ time for adding all these stupid warnings to
> my build log? Did nobody test this?
The code has been tested on 32 and 64 bit in lots of places; however, I
don't generally trawl the Fedora kernel build logs, and with the amount of
warnings we have there, new ones just get lost in the mists.
So it's not untested on 64-bit, it's just that nobody who cared enough saw
the compiler warnings. If linux-next had been going for the past few weeks
someone would have spotted them; I just got the report yesterday from
linux-next after I fed the push request.
Perhaps we all need access to the Ingo compile farm.
Dave.
On Sat, 2008-10-18 at 12:37 +1100, Nick Piggin wrote:
> OK. I was hoping that kmap_atomic_pfn thing *never* see the light of
> mainline ;) Because hopefully the vmap improvements should be OK for
> 2.6.28 as well...
Eric and I chatted with Venki Pallipadi this week and decided what we
really need is an official API for getting at pfns in our device BAR,
which is what we're (ab)using kmap_atomic_pfn for. This ties in with
some PAT issues as well, where we want to assert that all mappings of
our BAR are in WC mode. The basic plan is to have four new functions
(yes, I'm making up names here):
struct io_mapping *io_reserve_pci_resource(struct pci_dev *dev,
int bar,
int prot);
void io_mapping_free(struct io_mapping *mapping);
void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn);
void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn);
This would have the additional effect of making all other mappings of
this BAR be required to match the specific prot, both providing
protection against broken drivers, and providing support for old X
servers which would directly map this BAR from user space.
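To make the intent concrete, a hypothetical caller of those four proposed
functions (the names are the made-up ones from this mail; the BAR number,
the prot value and the error handling are further assumptions):

#include <linux/pci.h>

/* Sketch only: io_reserve_pci_resource() and friends are proposed names,
 * not an existing API. */
static struct io_mapping *gtt_mapping;

static int example_reserve_aperture(struct pci_dev *pdev)
{
	/* assume BAR 2 is the GTT aperture; '1' stands in for a WC prot value */
	gtt_mapping = io_reserve_pci_resource(pdev, 2, 1);
	return gtt_mapping ? 0 : -ENOMEM;
}

static void example_write_dword(unsigned long pfn, unsigned int byte_off, u32 val)
{
	/* short-lived, per-CPU atomic mapping of a single aperture page */
	void *vaddr = io_map_atomic(gtt_mapping, pfn);

	*(volatile u32 *)((char *)vaddr + byte_off) = val;
	io_unmap_atomic(gtt_mapping, pfn);
}

static void example_release_aperture(void)
{
	io_mapping_free(gtt_mapping);
}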
> But it's already there. OK, if vmap patches go upstream, then it can
> be switched over and the export removed before the release...
I'd say we should leave things alone for .28 and work on making an
official IO mapping API available.
--
[email protected]
On Sat, 18 Oct 2008, Keith Packard wrote:
>
> The basic plan is to have four new functions (yes, I'm making up names
> here):
>
> struct io_mapping *io_reserve_pci_resource(struct pci_dev *dev,
> int bar,
> int prot);
> void io_mapping_free(struct io_mapping *mapping);
>
> void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn);
> void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn);
The important thing is that mappings need to be per-CPU, so the above may
work, but only if it's designed so that "io_reserve_pci_resource()" will
actually reserve space for 'nr_possible_cpu' page mappings, and then the
"io_[un]map_atomic()" functions do per-CPU mappings.
Anything else is a disaster, because anything else implies TLB shootdown.
And quite frankly, even so, we'd possibly still be _better_ off with just
exposing the "kmap_atomic_pfn()" functionality even so. Because quite
frankly, your "io_reserve_pci_resource()" infrastructure is going to
inevitably be more complex and slower than the rather efficient
kmap_atomic_pfn() thing we have.
[ The *non-atomic* kmap() functions are fairly high-overhead, in that they
want to keep track of cached mappings and remember page addresses etc.
So those are the ones we don't want to support for non-HIGHMEM setups.
But the atomic kmaps are pretty simple, and really only need some
trivial FIXMAP support. We could easily extend it for x86-64, methinks,
and do it for x86-32 even when we don't do HIGHMEM.
Ingo? ]
One small detail: we currently have "kmap_atomic_pfn()" and
"kmap_atomic_prot()", and we really should make the fundamental core
operation be "kmap_atomic_pfn_prot()", and have everything be done in
terms of that. Looking at it, it also looks like kmap_atomic_prot() is
actually incorrect right now, and doesn't do a "prot" thing for
non-highmem pages, but just returns "page_address(page);"
Linus
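A rough sketch of that last suggestion, with assumed signatures modelled on
the existing kmap_atomic_pfn()/kmap_atomic_prot() prototypes:

/* Sketch only: kmap_atomic_pfn_prot() does not exist; it is the proposed
 * core operation, and both existing helpers become thin wrappers. */
void *kmap_atomic_pfn_prot(unsigned long pfn, enum km_type type, pgprot_t prot);

static inline void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
{
	return kmap_atomic_pfn_prot(pfn, type, kmap_prot);
}

static inline void *kmap_atomic_prot(struct page *page, enum km_type type,
				     pgprot_t prot)
{
	/* honour 'prot' even for lowmem pages, instead of just
	 * returning page_address(page) */
	return kmap_atomic_pfn_prot(page_to_pfn(page), type, prot);
}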
Linus Torvalds wrote:
> On Sat, 18 Oct 2008, Keith Packard wrote:
>
>> The basic plan is to have four new functions (yes, I'm making up names
>> here):
>>
>> struct io_mapping *io_reserve_pci_resource(struct pci_dev *dev,
>> int bar,
>> int prot);
>> void io_mapping_free(struct io_mapping *mapping);
>>
>> void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn);
>> void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn);
>>
>
> The important thing is that mappings need to be per-CPU, so the above may
> work, but only if it's designed so that "io_reserve_pci_resource()" will
> actually reserve space for 'nr_possible_cpu' page mappings, and then the
> "io_[un]map_atomic()" functions do per-CPU mappings.
>
> Anything else is a disaster, because anything else implies TLB shootdown.
>
> And quite frankly, even so, we'd possibly still be _better_ off with just
> exposing the "kmap_atomic_pfn()" functionality even so. Because quite
> frankly, your "io_reserve_pci_resource()" infrastructure is going to
> inevitably be more complex and slower than the rather efficient
> kmap_atomic_pfn() thing we have.
>
> [ The *non-atomic* kmap() functions are fairly high-overhead, in that they
> want to keep track of cached mappings and remember page addresses etc.
> So those are the ones we don't want to support for non-HIGHMEM setups.
>
> But the atomic kmaps are pretty simple, and really only need some
> trivial FIXMAP support. We could easily extend it for x86-64, methinks,
> and do it for x86-32 even when we don't do HIGHMEM.
>
> Ingo? ]
>
> One small detail: our we currently have "kmap_atomic_pfn()" and
> "kmap_atomic_prot()", and we really should maek the fundamental core
> operation be "kmap_atomic_pfn_prot()", and have everything be done in
> terms of that. Looking at it, it also looks like kmap_atomic_prot() is
> actually incorrect right now, and doesn't do a "prot" thing for
> non-highmem pages, but just returns "page_address(page);"
>
Actually, a "kmap_atomic_prot_pfn()" has been lurking in the drm repos
for some time now, but hasn't been suggested for upstream. It was
intended for drivers that require quick in-kernel patching of
write-combined io and highmem pages. The latter is a common situation
for PCIE graphics devices with their own MMU, so IMHO an exported
kmap_atomic_pfn_prot() would be a big help in such cases.
/Thomas
> Linus
>
On Sat, 2008-10-18 at 12:31 -0700, Linus Torvalds wrote:
> The important thing is that mappings need to be per-CPU, so the above may
> work, but only if it's designed so that "io_reserve_pci_resource()" will
> actually reserve space for 'nr_possible_cpu' page mappings, and then the
> "io_[un]map_atomic()" functions do per-CPU mappings.
Yes, of course. The goal is to make it look exactly like
kmap_atomic_pfn, but work on non-memory pages even on non-HIGHMEM
configs for x86 (we have to use ioremap on those systems today, which is
a fairly significant performance problem).
> And quite frankly, even so, we'd possibly still be _better_ off with just
> exposing the "kmap_atomic_pfn()" functionality even so. Because quite
> frankly, your "io_reserve_pci_resource()" infrastructure is going to
> inevitably be more complex and slower than the rather efficient
> kmap_atomic_pfn() thing we have.
I want it to work just like kmap_atomic_pfn; Venki wanted some way to
"know" that no-one else was mapping the pages in non-WC mode. This
seemed like a reasonable compromise; the 'mapping' object exists solely
to hold the desired prot value to keep all PTEs consistent across the
whole system.
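One possible shape for that 'mapping' object, purely as a sketch (all field
names here are assumptions):

struct io_mapping {
	resource_size_t	base;	/* bus address of the reserved BAR range */
	unsigned long	size;	/* length of the range */
	pgprot_t	prot;	/* the one prot (e.g. WC) every mapping must use */
};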
> One small detail: our we currently have "kmap_atomic_pfn()" and
> "kmap_atomic_prot()", and we really should maek the fundamental core
> operation be "kmap_atomic_pfn_prot()", and have everything be done in
> terms of that. Looking at it, it also looks like kmap_atomic_prot() is
> actually incorrect right now, and doesn't do a "prot" thing for
> non-highmem pages, but just returns "page_address(page);"
Fortunately, we use kmap_atomic_pfn only for our BAR, which is not a
non-highmem page.
Right now, we're counting on having our BAR covered by an MTRR register
so that we get WC access here. I'd like to have the API expose this so
that the driver will work on systems which don't have a spare MTRR for
the graphics aperture.
--
[email protected]
* Linus Torvalds <[email protected]> wrote:
> [ The *non-atomic* kmap() functions are fairly high-overhead, in that they
> want to keep track of cached mappings and remember page addresses etc.
> So those are the ones we don't want to support for non-HIGHMEM setups.
>
> But the atomic kmaps are pretty simple, and really only need some
> trivial FIXMAP support. We could easily extend it for x86-64, methinks,
> and do it for x86-32 even when we don't do HIGHMEM.
>
> Ingo? ]
agreed, and there's certainly no resistance from the x86 architecture
side to extend our mapping APIs in such directions.
But i think the direction of the new GEM code is subtly wrong here,
because it tries to manage memory even on 64-bit systems. IMO it should
just map the _whole_ graphics aperture (non-cached) and be done with it.
There's no faster method at managing pages than the CPU doing a TLB fill
from pagetables.
The only real API need i see is on 32-bit: with a 1GB or 2GB graphics
aperture we just cannot map that permanently, so kmap_atomic() is a
necessity. We can certainly extend that to non-highmem as well.
But this should be an ad-hoc transitionary thing for 32-bit, and on
64-bit we really should not be using any form of kmap.
Especially with large vertex buffers or textures, mapping a lot of pages
via kmap is not going to be trivial overhead - even if INVLPG is faster
than a full TLB flush, it's still on the order of 100-200 cycles - and
with a lot of pages that mounts up quickly. And if i understood your
workload correctly you want to do tens of thousand of map/unmap/remap
events per frame generated - depending on the type of the 3D app/engine.
Or am i missing something subtle? Why do you want the overhead of kmap
on 64-bit?
Ingo
On Sat, 2008-10-18 at 22:37 +0200, Ingo Molnar wrote:
> But i think the direction of the new GEM code is subtly wrong here,
> because it tries to manage memory even on 64-bit systems. IMO it should
> just map the _whole_ graphics aperture (non-cached) and be done with it.
> There's no faster method at managing pages than the CPU doing a TLB fill
> from pagetables.
Yeah, we're stuck thinking that we "can't" map the aperture because it's
too large, but with a 64-bit kernel, we should be able to keep it mapped
permanently.
Of course, the io_reserve_pci_resource and io_map_atomic functions could
do precisely that, as kmap_atomic does on non-HIGHMEM systems today.
> The only real API need i see is on 32-bit: with a 1GB or 2GB graphics
> aperture we just cannot map that permanently, so kmap_atomic() is a
> necessity. We can certainly extend that to non-highmem as well.
Yes, this is where exposing an io-specific atomic mapping function will
remain necessary for some time.
> And if i understood your
> workload correctly you want to do tens of thousand of map/unmap/remap
> events per frame generated - depending on the type of the 3D app/engine.
Yeah, data transfer from CPU to GPU is through a pwrite interface, and
we perform the transfer within the kernel using map/unmap operations on
the aperture as those are WC and hence do not require clflush.
> Or am i missing something subtle? Why do you want the overhead of kmap
> on 64-bit?
We don't, but I think it would be nice to have a common API that works
across all 32-bit configurations as well as 64-bit systems.
--
[email protected]
* Keith Packard <[email protected]> wrote:
> On Sat, 2008-10-18 at 22:37 +0200, Ingo Molnar wrote:
>
> > But i think the direction of the new GEM code is subtly wrong here,
> > because it tries to manage memory even on 64-bit systems. IMO it
> > should just map the _whole_ graphics aperture (non-cached) and be
> > done with it. There's no faster method at managing pages than the
> > CPU doing a TLB fill from pagetables.
>
> Yeah, we're stuck thinking that we "can't" map the aperture because
> it's too large, but with a 64-bit kernel, we should be able to keep it
> mapped permanently.
>
> Of course, the io_reserve_pci_resource and io_map_atomic functions
> could do precisely that, as kmap_atomic does on non-HIGHMEM systems
> today.
okay, so basically what we need is a shared API that does per page
kmap_atomic on 32-bit, and just an ioremap() on 64-bit. I had the
impression that you were suggesting to extend kmap_atomic() to 64-bit -
which would be wrong.
So, in terms of the 4 APIs you suggest:
struct io_mapping *io_reserve_pci_resource(struct pci_dev *dev,
int bar,
int prot);
void io_mapping_free(struct io_mapping *mapping);
void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn);
void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn);
here is what we'd do on 64-bit:
- io_reserve_pci_resource() would just do an ioremap(), and would save
the ioremap-ed memory into struct io_mapping.
- io_mapping_free() does the iounmap()
- io_map_atomic(): just arithmetics, returns mapping->base + pfn - no
TLB activities at all.
- io_unmap_atomic(): NOP.
it's as fast as it gets: zero overhead in essence. Note that it's also
shared between all CPUs and there's no aliasing trouble.
And we could make it even faster: if you think we could even use 2MB
TLBs for the _linear_ ioremap()s here, hm? There's plenty of address
space on 64-bit so we can align to 2MB just fine - and aperture sizes
are 2MB sized anyway.
Or we could go one step further and install these aperture mappings into
the _kernel linear_ address space. That would be even faster, because
we'd have a constant offset. We have the (2MB mappings aware) mechanism
for that already. (Yinghai Cc:-ed - he did a lot of great work to
generalize this area.)
(In fact if we installed it into the linear kernel address space, and if
the aperture is 1GB aligned, we will automatically use gbpages for it.
Were Intel to support gbpages in the future ;-)
the _real_ remapping in a graphics aperture happens on the GPU level
anyway, you manage an in-RAM GPU pagetable that just works like an
IOMMU, correct?
on 32-bit we'd have what you use in the GEM code today:
- io_reserve_pci_resource(): a NOP in essence
- io_mapping_free(): a NOP
- io_map_atomic(): does a kmap_atomic(pfn)
- io_unmap_atomic(): does a kunmap_atomic(pfn)
so on 32-bit we have the INVLPG TLB overhead and preemption restrictions
- but we knew that. We'd have to allow atomic_kmap() on non-highmem as
well but that's fair.
Mind sending patches for this? :-)
Ingo
On Sat, Oct 18, 2008 at 6:32 PM, Ingo Molnar <[email protected]> wrote:
>
> * Keith Packard <[email protected]> wrote:
>
>> On Sat, 2008-10-18 at 22:37 +0200, Ingo Molnar wrote:
>>
>> > But i think the direction of the new GEM code is subtly wrong here,
>> > because it tries to manage memory even on 64-bit systems. IMO it
>> > should just map the _whole_ graphics aperture (non-cached) and be
>> > done with it. There's no faster method at managing pages than the
>> > CPU doing a TLB fill from pagetables.
>>
>> Yeah, we're stuck thinking that we "can't" map the aperture because
>> it's too large, but with a 64-bit kernel, we should be able to keep it
>> mapped permanently.
>>
>> Of course, the io_reserve_pci_resource and io_map_atomic functions
>> could do precisely that, as kmap_atomic does on non-HIGHMEM systems
>> today.
>
> okay, so basically what we need is a shared API that does per page
> kmap_atomic on 32-bit, and just an ioremap() on 64-bit. I had the
> impression that you were suggesting to extend kmap_atomic() to 64-bit -
> which would be wrong.
Is it possible to use a segment register to map the whole aperture on
32b? A segment register might allow common code on 64b/32b by
eliminating the need to move the mapping window around.
>
> So, in terms of the 4 APIs you suggest:
>
> struct io_mapping *io_reserve_pci_resource(struct pci_dev *dev,
> int bar,
> int prot);
> void io_mapping_free(struct io_mapping *mapping);
>
> void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn);
> void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn);
>
> here is what we'd do on 64-bit:
>
> - io_reserve_pci_resource() would just do an ioremap(), and would save
> the ioremap-ed memory into struct io_mapping.
>
> - io_mapping_free() does the iounmap()
>
> - io_map_atomic(): just arithmetics, returns mapping->base + pfn - no
> TLB activities at all.
>
> - io_unmap_atomic(): NOP.
>
> it's as fast as it gets: zero overhead in essence. Note that it's also
> shared between all CPUs and there's no aliasing trouble.
>
> And we could make it even faster: if you think we could even use 2MB
> TLBs for the _linear_ ioremap()s here, hm? There's plenty of address
> space on 64-bit so we can align to 2MB just fine - and aperture sizes
> are 2MB sized anyway.
>
> Or we could go one step further and install these aperture mappings into
> the _kernel linear_ address space. That would be even faster, because
> we'd have a constant offset. We have the (2MB mappings aware) mechanism
> for that already. (Yinghai Cc:-ed - he did a lot of great work to
> generalize this area.)
>
> (In fact if we installed it into the linear kernel address space, and if
> the aperture is 1GB aligned, we will automatically use gbpages for it.
> Were Intel to support gbpages in the future ;-)
>
> the _real_ remapping in a graphics aperture happens on the GPU level
> anyway, you manage an in-RAM GPU pagetable that just works like an
> IOMMU, correct?
>
> on 32-bit we'd have what you use in the GEM code today:
>
> - io_reserve_pci_resource(): a NOP in essence
>
> - io_mapping_free(): a NOP
>
> - io_map_atomic(): does a kmap_atomic(pfn)
>
> - io_unmap_atomic(): does a kunmap_atomic(pfn)
>
> so on 32-bit we have the INVLPG TLB overhead and preemption restrictions
> - but we knew that. We'd have to allow atomic_kmap() on non-highmem as
> well but that's fair.
>
> Mind sending patches for this? :-)
>
> Ingo
--
Jon Smirl
[email protected]
On Sat, 18 Oct 2008, Jon Smirl wrote:
>
> Is it possible to use a segment register to map the whole aperture on
> 32b?
No. Segment registers don't extend the virtual address space, they can
only limit visibility into the one single 32-bit one.
IOW, segment registers don't actually extend addressing in any way, they
only limit it. There's a reason why people don't use them (except as a
strange base register for things like per-cpu or per-thread variables, and
that is not to extend the address space, but to avoid wasting precious
_useful_ registers on that).
Linus
On Sun, 2008-10-19 at 00:32 +0200, Ingo Molnar wrote:
> the _real_ remapping in a graphics aperture happens on the GPU level
> anyway, you manage an in-RAM GPU pagetable that just works like an
> IOMMU, correct?
Yes, a one-level linear MMU which uses BIOS-reserved memory.
So, at least for a prototype, on 64-bit we can just use ioremap_wc and
hold the mapping while the driver is open? Is there any huge benefit to
using the kernel mapping?
> so on 32-bit we have the INVLPG TLB overhead and preemption restrictions
> - but we knew that. We'd have to allow atomic_kmap() on non-highmem as
> well but that's fair.
Yes, the non-highmem case is currently in fairly bad shape.
> Mind sending patches for this? :-)
I've got Venki lined up to do this work for me; once we're happy enough
with the API. In particular, the non-highmem 32-bit case seems a bit
tricky.
Also, does anyone have a better set of names for this stuff?
io_reserve_pci_mapping seems fairly ugly to me.
--
[email protected]
On Sat, 18 Oct 2008 17:38:11 -0700
Keith Packard <[email protected]> wrote:
> I've got Venki lined up to do this work for me; once we're happy
> enough with the API. In particular, the non-highmem 32-bit case seems
> a bit tricky.
>
> Also, does anyone have a better set of names for this stuff?
> io_reserve_pci_mapping seems fairly ugly to me.
>
how about pci_resource_force_caching(pci_dev, barnr, cachetype)?
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On Sat, 2008-10-18 at 18:06 -0700, Arjan van de Ven wrote:
> how about pci_resource_force_caching(pci_dev, barnr, cachetype)?
and then 'pci_map_atomic'? Not sure this makes sense in the pci
namespace.
--
[email protected]
On Sunday 19 October 2008 06:31, Linus Torvalds wrote:
> On Sat, 18 Oct 2008, Keith Packard wrote:
> > The basic plan is to have four new functions (yes, I'm making up names
> > here):
> >
> > struct io_mapping *io_reserve_pci_resource(struct pci_dev *dev,
> > int bar,
> > int prot);
> > void io_mapping_free(struct io_mapping *mapping);
> >
> > void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn);
> > void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn);
>
> The important thing is that mappings need to be per-CPU, so the above may
> work, but only if it's designed so that "io_reserve_pci_resource()" will
> actually reserve space for 'nr_possible_cpu' page mappings, and then the
> "io_[un]map_atomic()" functions do per-CPU mappings.
>
> Anything else is a disaster, because anything else implies TLB shootdown.
TLB shootdown need only be implied if the behaviour required is to
unmap the virtual address *and* cause any other CPU that subsequently
touches it to fault.
For kva, that would be a bug anyway (use after free). The only thing
it implies is that a TLB shootdown happens at some point before the
address gets reused.
Still, it's always going to be faster than a global mapping, if done
properly. I was thinking about doing a vmap_atomic thing generically
in the vmap layer... why exactly do we need the FIXMAP stuff for it?
On Sun, 2008-10-19 at 00:32 +0200, Ingo Molnar wrote:
> Mind sending patches for this? :-)
Here's what these functions would look like as wrappers on top of the
existing APIs:
/* x86_64 style */
static inline struct io_reserve *
iomap_reserve(unsigned long base, unsigned long size)
{
return (struct io_reserve *) ioremap_wc(base, size);
}
static inline void *
iomap_atomic(struct io_reserve *reserve, unsigned long offset)
{
return ((char *) reserve) + offset;
}
static inline void
iounmap_atomic(void *vaddr)
{
}
static inline void
iomap_unreserve(struct io_reserve *reserve)
{
iounmap(reserve);
}
/* HIGHMEM style */
#if 0
static inline struct io_reserve *
iomap_reserve(unsigned long base, unsigned long size)
{
return (struct io_reserve *) base;
}
static inline void *
iomap_atomic(struct io_reserve *reserve, unsigned long offset)
{
offset += (unsigned long) reserve;
return kmap_atomic_pfn(offset >> PAGE_SHIFT, KM_USER0);
}
static inline void
iounmap_atomic(void *vaddr)
{
kunmap_atomic(vaddr, KM_USER0);
}
static inline void
iomap_unreserve(struct io_reserve *reserve)
{
}
#endif
/* 32-bit non-HIGHMEM style */
#if 0
static inline struct io_reserve *
iomap_reserve(unsigned long base, unsigned long size)
{
return NULL;
}
static inline void *
iomap_atomic(struct io_reserve *reserve, unsigned long offset)
{
return NULL;
}
static inline void
iounmap_atomic(void *vaddr)
{
}
static inline void
iomap_unreserve(struct io_reserve *reserve)
{
}
#endif
>
> Ingo
--
[email protected]
On Sat, Oct 18, 2008 at 3:32 PM, Ingo Molnar <[email protected]> wrote:
>
> (In fact if we installed it into the linear kernel address space, and if
> the aperture is 1GB aligned, we will automatically use gbpages for it.
> Were Intel to support gbpages in the future ;-)
>
we could expand init_memory_mapping() to support a direct map for them
with the needed prot, or use init_memory_mapping() + set_memory_wb() directly.
but that is only for 64-bit x86.
also, someone is talking about having 6 PCIe display adapters on a 64-bit
system, and every card will have 4G of RAM.
32-bit could use the fixmap for a 1G or 2G mapping.
YH
On Sat, 2008-10-18 at 21:14 -0700, Keith Packard wrote:
> On Sun, 2008-10-19 at 00:32 +0200, Ingo Molnar wrote:
>
> > Mind sending patches for this? :-)
Here's a patch for the i915 driver that includes the new API. Tested on
x86_32+HIGHMEM and x86_64. I stuck a new 'io_reserve.h' header in the
i915 directory for this patch, but it should go elsewhere.
The new APIs are:
io_reserve_create_wc
io_reserve_free
io_reserve_map_atomic_wc
io_reserve_unmap_atomic
io_reserve_map_wc
io_reserve_unmap
I added the non-atomic variants at Eric's suggestion so that we can use
the direct map on x86_64, avoiding any use of ioremap at run-time. I
think the resulting code looks quite a bit cleaner now. Also, one
benchmark I tried ran about 18 times faster in 64-bit mode than the
original code.
From 2f6b125cb586a0671a2b9c22aadb03fcafdf99ab Mon Sep 17 00:00:00 2001
From: Keith Packard <[email protected]>
Date: Sat, 18 Oct 2008 22:59:58 -0700
Subject: [PATCH] [drm/i915] Create new 'io_reserve' API for mapping GTT pages
The io_reserve API abstracts away the operations necessary to construct
mappings for our GTT aperture, providing atomic and non-atomic mappings for
GTT pages that work efficiently on x86_64 and x86_32+HIGHMEM configurations.
This eliminates the in-driver abuse of the kmap_atomic_pfn function and
improves GEM performance on the x86_64 architecture.
Signed-off-by: Keith Packard <[email protected]>
---
drivers/gpu/drm/i915/i915_drv.h | 3 +
drivers/gpu/drm/i915/i915_gem.c | 71 ++++++++++++----------
drivers/gpu/drm/i915/i915_irq.c | 4 +-
drivers/gpu/drm/i915/io_reserve.h | 122 +++++++++++++++++++++++++++++++++++++
4 files changed, 165 insertions(+), 35 deletions(-)
create mode 100644 drivers/gpu/drm/i915/io_reserve.h
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f20ffe1..c99b9ea 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -31,6 +31,7 @@
#define _I915_DRV_H_
#include "i915_reg.h"
+#include "io_reserve.h"
/* General customization:
*/
@@ -246,6 +247,8 @@ typedef struct drm_i915_private {
struct {
struct drm_mm gtt_space;
+ struct io_reserve *io_reserve;
+
/**
* List of objects currently involved in rendering from the
* ringbuffer.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9255088..cd9e489 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -177,14 +177,14 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
struct drm_file *file_priv)
{
struct drm_i915_gem_object *obj_priv = obj->driver_private;
+ drm_i915_private_t *dev_priv = dev->dev_private;
ssize_t remain;
- loff_t offset;
+ loff_t offset, base;
char __user *user_data;
char __iomem *vaddr;
char *vaddr_atomic;
- int i, o, l;
+ int o, l;
int ret = 0;
- unsigned long pfn;
unsigned long unwritten;
user_data = (char __user *) (uintptr_t) args->data_ptr;
@@ -211,42 +211,41 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
while (remain > 0) {
/* Operation in this page
*
- * i = page number
+ * base = page offset within aperture
* o = offset within page
* l = bytes to copy
*/
- i = offset >> PAGE_SHIFT;
+ base = (offset & ~(PAGE_SIZE-1));
o = offset & (PAGE_SIZE-1);
l = remain;
if ((o + l) > PAGE_SIZE)
l = PAGE_SIZE - o;
- pfn = (dev->agp->base >> PAGE_SHIFT) + i;
-
-#ifdef CONFIG_HIGHMEM
/* This is a workaround for the low performance of iounmap
* (approximate 10% cpu cost on normal 3D workloads).
- * kmap_atomic on HIGHMEM kernels happens to let us map card
- * memory without taking IPIs. When the vmap rework lands
- * we should be able to dump this hack.
+ * io_reserve_map_atomic_wc maps card memory
+ * without taking IPIs.
+ */
+ vaddr_atomic = io_reserve_map_atomic_wc(dev_priv->mm.io_reserve,
+ base);
+ if (vaddr_atomic) {
+ unwritten = __copy_from_user_inatomic_nocache(vaddr_atomic + o,
+ user_data, l);
+ io_reserve_unmap_atomic(vaddr_atomic);
+ } else
+ unwritten = l;
+
+ /* If we get a fault while copying data, then (presumably) our
+ * source page isn't available. In this case, use the
+ * non-atomic __copy_from_user function
*/
- vaddr_atomic = kmap_atomic_pfn(pfn, KM_USER0);
-#if WATCH_PWRITE
- DRM_INFO("pwrite i %d o %d l %d pfn %ld vaddr %p\n",
- i, o, l, pfn, vaddr_atomic);
-#endif
- unwritten = __copy_from_user_inatomic_nocache(vaddr_atomic + o,
- user_data, l);
- kunmap_atomic(vaddr_atomic, KM_USER0);
-
if (unwritten)
-#endif /* CONFIG_HIGHMEM */
{
- vaddr = ioremap_wc(pfn << PAGE_SHIFT, PAGE_SIZE);
+ vaddr = io_reserve_map_wc(dev_priv->mm.io_reserve, base);
#if WATCH_PWRITE
- DRM_INFO("pwrite slow i %d o %d l %d "
- "pfn %ld vaddr %p\n",
- i, o, l, pfn, vaddr);
+ DRM_INFO("pwrite slow base %ld o %d l %d "
+ "vaddr %p\n",
+ base, o, l, vaddr);
#endif
if (vaddr == NULL) {
ret = -EFAULT;
@@ -256,7 +255,7 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
#if WATCH_PWRITE
DRM_INFO("unwritten %ld\n", unwritten);
#endif
- iounmap(vaddr);
+ io_reserve_unmap(vaddr);
if (unwritten) {
ret = -EFAULT;
goto fail;
@@ -1489,6 +1488,7 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
struct drm_i915_gem_exec_object *entry)
{
struct drm_device *dev = obj->dev;
+ drm_i915_private_t *dev_priv = dev->dev_private;
struct drm_i915_gem_relocation_entry reloc;
struct drm_i915_gem_relocation_entry __user *relocs;
struct drm_i915_gem_object *obj_priv = obj->driver_private;
@@ -1621,12 +1621,11 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
(last_reloc_offset & ~(PAGE_SIZE - 1)) !=
(reloc_offset & ~(PAGE_SIZE - 1))) {
if (reloc_page != NULL)
- iounmap(reloc_page);
+ io_reserve_unmap(reloc_page);
- reloc_page = ioremap_wc(dev->agp->base +
- (reloc_offset &
- ~(PAGE_SIZE - 1)),
- PAGE_SIZE);
+ reloc_page = io_reserve_map_wc(dev_priv->mm.io_reserve,
+ (reloc_offset &
+ ~(PAGE_SIZE - 1)));
last_reloc_offset = reloc_offset;
if (reloc_page == NULL) {
drm_gem_object_unreference(target_obj);
@@ -1636,7 +1635,7 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
}
reloc_entry = (uint32_t __iomem *)(reloc_page +
- (reloc_offset & (PAGE_SIZE - 1)));
+ (reloc_offset & (PAGE_SIZE - 1)));
reloc_val = target_obj_priv->gtt_offset + reloc.delta;
#if WATCH_BUF
@@ -1661,7 +1660,7 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
}
if (reloc_page != NULL)
- iounmap(reloc_page);
+ io_reserve_unmap(reloc_page);
#if WATCH_BUF
if (0)
@@ -2504,6 +2503,10 @@ i915_gem_entervt_ioctl(struct drm_device *dev, void *data,
if (ret != 0)
return ret;
+ dev_priv->mm.io_reserve = io_reserve_create_wc(dev->agp->base,
+ dev->agp->agp_info.aper_size
+ * 1024 * 1024);
+
mutex_lock(&dev->struct_mutex);
BUG_ON(!list_empty(&dev_priv->mm.active_list));
BUG_ON(!list_empty(&dev_priv->mm.flushing_list));
@@ -2521,11 +2524,13 @@ int
i915_gem_leavevt_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv)
{
+ drm_i915_private_t *dev_priv = dev->dev_private;
int ret;
ret = i915_gem_idle(dev);
drm_irq_uninstall(dev);
+ io_reserve_free(dev_priv->mm.io_reserve);
return ret;
}
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index ce866ac..de8e084 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -784,8 +784,8 @@ int i915_vblank_swap(struct drm_device *dev, void *data,
if (dev_priv->swaps_pending >= 10) {
DRM_DEBUG("Too many swaps queued\n");
DRM_DEBUG(" pipe 0: %d pipe 1: %d\n",
- drm_vblank_count(dev, 0);
- drm_vblank_count(dev, 1);
+ drm_vblank_count(dev, 0),
+ drm_vblank_count(dev, 1));
list_for_each(list, &dev_priv->vbl_swaps.head) {
vbl_old = list_entry(list, drm_i915_vbl_swap_t, head);
diff --git a/drivers/gpu/drm/i915/io_reserve.h b/drivers/gpu/drm/i915/io_reserve.h
new file mode 100644
index 0000000..4e90a36
--- /dev/null
+++ b/drivers/gpu/drm/i915/io_reserve.h
@@ -0,0 +1,122 @@
+/*
+ * Copyright © 2008 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ * Keith Packard <[email protected]>
+ *
+ */
+/* x86_64 style */
+
+#ifndef _IO_RESERVE_H_
+#define _IO_RESERVE_H_
+
+/* this struct isn't actually defined anywhere */
+struct io_reserve;
+
+#ifdef CONFIG_X86_64
+
+/* Create the io_reserve object*/
+static inline struct io_reserve *
+io_reserve_create_wc(unsigned long base, unsigned long size)
+{
+ return (struct io_reserve *) ioremap_wc(base, size);
+}
+
+static inline void
+io_reserve_free(struct io_reserve *reserve)
+{
+ iounmap(reserve);
+}
+
+/* Atomic map/unmap */
+static inline void *
+io_reserve_map_atomic_wc(struct io_reserve *reserve, unsigned long offset)
+{
+ return ((char *) reserve) + offset;
+}
+
+static inline void
+io_reserve_unmap_atomic(void *vaddr)
+{
+}
+
+/* Non-atomic map/unmap */
+static inline void *
+io_reserve_map_wc(struct io_reserve *reserve, unsigned long offset)
+{
+ return ((char *) reserve) + offset;
+}
+
+static inline void
+io_reserve_unmap(void *vaddr)
+{
+}
+
+#endif /* CONFIG_X86_64 */
+
+#ifdef CONFIG_X86_32
+static inline struct io_reserve *
+io_reserve_create_wc(unsigned long base, unsigned long size)
+{
+ return (struct io_reserve *) base;
+}
+
+static inline void
+io_reserve_free(struct io_reserve *reserve)
+{
+}
+
+/* Atomic map/unmap */
+static inline void *
+io_reserve_map_atomic_wc(struct io_reserve *reserve, unsigned long offset)
+{
+#ifdef CONFIG_HIGHMEM
+ offset += (unsigned long) reserve;
+ return kmap_atomic_pfn(offset >> PAGE_SHIFT, KM_USER0);
+#else
+ return NULL;
+#endif
+}
+
+static inline void
+io_reserve_unmap_atomic(void *vaddr)
+{
+#ifdef CONFIG_HIGHMEM
+ kunmap_atomic(vaddr, KM_USER0);
+#endif
+}
+
+static inline void *
+io_reserve_map_wc(struct io_reserve *reserve, unsigned long offset)
+{
+ offset += (unsigned long) reserve;
+ return ioremap_wc(offset, PAGE_SIZE);
+}
+
+static inline void
+io_reserve_unmap(void *vaddr)
+{
+ iounmap(vaddr);
+}
+#endif /* CONFIG_X86_32 */
+
+#endif /* _IO_RESERVE_H_ */
--
1.5.6.5
--
[email protected]
On Sat, 2008-10-18 at 00:49 -0700, Eric Anholt wrote:
> Writing 3D drivers means running 3D games. Running 3D games
> unfortunately means running a lot of 32-bit userland as the fun stuff is
> binary-only. So I stick to a 32-bit system, because past experience
> trying to run both on the same system has been misery.
You can run 32bit user-space on a 64bit kernel just fine.
Any breakage, if anything, you find will need to get fixed anyway.
Also, one would hope Intel is poking these game writers to migrate to
64bit Linux ;-)
* Keith Packard <[email protected]> wrote:
> On Sat, 2008-10-18 at 21:14 -0700, Keith Packard wrote:
> > On Sun, 2008-10-19 at 00:32 +0200, Ingo Molnar wrote:
> >
> > > Mind sending patches for this? :-)
>
> Here's a patch for the i915 driver that includes the new API. Tested
> on x86_32+HIGHMEM and x86_64. I stuck a new 'io_reserve.h' header in
> the i915 directory for this patch, but it should go elsewhere.
>
> The new APIs are:
>
> io_reserve_create_wc
> io_reserve_free
> io_reserve_map_atomic_wc
> io_reserve_unmap_atomic
> io_reserve_map_wc
> io_reserve_unmap
very nice!
I think we need a somewhat different abstraction though.
Firstly, regarding drivers/gpu/drm/i915/io_reserve.h, that needs to move
to generic code.
Secondly, wouldn't the right abstraction be to attach this functionality
to 'struct resource'? [or at least create a second struct that embeds
struct resource]
this abstraction is definitely not a PCI thing and not a
detached-from-everything thing, it's an IO resource thing. We could make
it a property of struct resource:
struct resource {
resource_size_t start;
resource_size_t end;
const char *name;
unsigned long flags;
struct resource *parent, *sibling, *child;
+ void *mapping;
};
The APIs would be:
int io_resource_init_mapping(struct resource *res);
void io_resource_free_mapping(struct resource *res);
void * io_resource_map(struct resource *res, pfn_t pfn, unsigned long offset);
void io_resource_unmap(struct resource *res, void *kaddr);
Note how simple and consistent it all gets: IO resources already know
their physical location and their size limits. Being able to cache an
ioremap in a mapping [and being able to use atomic kmaps on 32-bit] is a
relatively simple and natural extension to the concept.
i think that would be quite acceptable - and the APIs could just
transparently work on it. This would also allow the PCI code to
automatically unmap any cached mappings from resources, when the driver
deinitializes.
Linus, Jesse, what do you think?
i think we need to finalize the API names and their abstraction level,
and then could even merge those APIs into v2.6.28 on a fast path, to
enable you to use it. It does not interact with anything else so it
should be safe to do.
(i'd not suggest to merge the i915 bits just yet - but that's obviously
not my call.)
Ingo
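To make the shape of this proposal concrete, here is a minimal, hypothetical
sketch of how a driver might use the io_resource_* calls named above. None of
these functions exist at this point in the thread; the pfn/offset semantics
are assumed (and are questioned further down), and the error returns are
guesses.

#include <linux/ioport.h>
#include <linux/uaccess.h>

/* Hypothetical fragment against the proposed io_resource_* API.
 * Assumes io_resource_init_mapping(res) was called once at driver init
 * and io_resource_free_mapping(res) is called on teardown; 'res' would
 * typically be the BAR's struct resource, e.g. &pdev->resource[bar].
 */
static int write_aperture_page(struct resource *res, unsigned long offset,
			       const char __user *user_data, size_t len)
{
	/* offset is assumed to be a byte offset into the BAR, page aligned */
	void *vaddr = io_resource_map(res, 0 /* pfn: semantics unclear */, offset);

	if (vaddr == NULL)
		return -ENOMEM;
	if (copy_from_user(vaddr, user_data, len)) {
		io_resource_unmap(res, vaddr);
		return -EFAULT;
	}
	io_resource_unmap(res, vaddr);
	return 0;
}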
On Sun, 19 Oct 2008 19:53:20 +0200
Ingo Molnar <[email protected]> wrote:
>
> struct resource {
> resource_size_t start;
> resource_size_t end;
> const char *name;
> unsigned long flags;
> struct resource *parent, *sibling, *child;
> + void *mapping;
> };
>
> The APIs would be:
>
> int io_resource_init_mapping(struct resource *res);
> void io_resource_free_mapping(struct resource *res);
> void * io_resource_map(struct resource *res, pfn_t pfn, unsigned
> long offset); void io_resource_unmap(struct resource *res, void
> *kaddr);
>
> Note how simple and consistent it all gets: IO resources already know
> their physical location and their size limits. Being able to cache an
> ioremap in a mapping [and being able to use atomic kmaps on 32-bit]
> is a relatively simple and natural extension to the concept.
and making a simple wrapper around this that turns "struct pci_dev,
barnr" into a resource would make sense too, but yes.
We need one more
io_resource_force_cachability(struct resource, cachetype)
or maybe only
io_resource_force_writecombine(struct resource)
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On Sun, 2008-10-19 at 19:53 +0200, Ingo Molnar wrote:
> * Keith Packard <[email protected]> wrote:
>
> > On Sat, 2008-10-18 at 21:14 -0700, Keith Packard wrote:
> > > On Sun, 2008-10-19 at 00:32 +0200, Ingo Molnar wrote:
> > >
> > > > Mind sending patches for this? :-)
> >
> > Here's a patch for the i915 driver that includes the new API. Tested
> > on x86_32+HIGHMEM and x86_64. I stuck a new 'io_reserve.h' header in
> > the i915 directory for this patch, but it should go elsewhere.
> >
> > The new APIs are:
> >
> > io_reserve_create_wc
> > io_reserve_free
> > io_reserve_map_atomic_wc
> > io_reserve_unmap_atomic
> > io_reserve_map_wc
> > io_reserve_unmap
>
> very nice!
>
> I think we need a somewhat different abstraction though.
>
> Firstly, regarding drivers/gpu/drm/i915/io_reserve.h, that needs to move
> to generic code.
>
> Secondly, wouldnt the right abstraction be to attach this functionality
> to 'struct resource' ? [or at least create a second struct that embedds
> struct resource]
>
> this abstraction is definitely not a PCI thing and not a
> detached-from-everything thing, it's an IO resource thing. We could make
> it a property of struct resource:
>
> struct resource {
> resource_size_t start;
> resource_size_t end;
> const char *name;
> unsigned long flags;
> struct resource *parent, *sibling, *child;
> + void *mapping;
> };
>
> The APIs would be:
>
> int io_resource_init_mapping(struct resource *res);
> void io_resource_free_mapping(struct resource *res);
> void * io_resource_map(struct resource *res, pfn_t pfn, unsigned long offset);
> void io_resource_unmap(struct resource *res, void *kaddr);
>
> Note how simple and consistent it all gets: IO resources already know
> their physical location and their size limits. Being able to cache an
> ioremap in a mapping [and being able to use atomic kmaps on 32-bit] is a
> relatively simple and natural extension to the concept.
>
> i think that would be quite acceptable - and the APIs could just
> transparently work on it. This would also allow the PCI code to
> automatically unmap any cached mappings from resources, when the driver
> deinitializes.
>
> Linus, Jesse, what do you think?
>
> i think we need to finalize the API names and their abstraction level,
> and then could even merge those APIs into v2.6.28 on a fast path, to
> enable you to use it. It does not interact with anything else so it
> should be safe to do.
This API needs the cacheability control, which I don't see in it
currently. In the past we've been relying on an MTRR over the GTT
resulting in all of our UC- mappings getting us the correct WC behavior,
but now there aren't enough MTRRs to go around and graphics loses out
(at about a 20% CPU cost as a conservative estimate). The primary goal
of the new API is to let us eat PAT costs up front so we don't have to
at runtime.
Second, we need to know when we're doing a mapping whether we're
affected by atomic scheduling restrictions. Right now our plan has been
to try doing page-by-page
io_map_atomic_wc()/copy_from_user_inatomic()/io_unmap_atomic(), and if
we fail at that at some point (map returns NULL or we get a partial
completion from copy_from_user_inatomic) then fall back to io_map_wc()
and copy_from_user() the whole thing at once. That gets us good
performance on both x86 with highmem and x86-64, and not too shabby
performance on x86 non-highmem.
Also, while it's rare, there have been graphics cards (looking at you,
S3) where BARs were expensive for some reason and they stuffed both the
framebuffer and registers into one PCI BAR, where you want the FB to be
WC and the registers to be UC. Not sure if they would be supportable
with this API or not. And if it's not, I'm not sure how much we care to
design for them, but it's something to potentially consider.
Finally, I'm confused by the pfn and offset args to io_resource_map,
when I expected something parallel to ioremap but with our resource arg
added.
--
Eric Anholt
[email protected] [email protected]
On Sun, 2008-10-19 at 19:53 +0200, Ingo Molnar wrote:
> Note how simple and consistent it all gets: IO resources already know
> their physical location and their size limits. Being able to cache an
> ioremap in a mapping [and being able to use atomic kmaps on 32-bit] is a
> relatively simple and natural extension to the concept.
I'm not sure I see any value in caching mappings here; we're mostly
interested in copying lots of data onto the card and so we use a lot of
different mappings; atomic mappings are easy to use, and efficient
enough.
Also, if we're using a resource, I'd like to just use byte offsets and
not pfns; the resource presumably knows the base address.
> i think we need to finalize the API names and their abstraction level,
> and then could even merge those APIs into v2.6.28 on a fast path, to
> enable you to use it. It does not interact with anything else so it
> should be safe to do.
Let's figure out the API we want, then figure out whether it makes sense
to stick it into 2.6.28 or whether we just want to wait for .29. I don't
mind rewriting the driver for the next release.
I will probably use something like the io_reserve stuff for .28 though;
it makes the code easier to read and gives us good performance on 64 bit
kernels.
--
[email protected]
On Sun, 2008-10-19 at 19:52 +0200, Peter Zijlstra wrote:
> On Sat, 2008-10-18 at 00:49 -0700, Eric Anholt wrote:
>
> > Writing 3D drivers means running 3D games. Running 3D games
> > unfortunately means running a lot of 32-bit userland as the fun stuff is
> > binary-only. So I stick to a 32-bit system, because past experience
> > trying to run both on the same system has been misery.
>
> You can run 32bit user-space on a 64bit kernel just fine.
> Any breakage, if anything, you find will need to get fixed anyway.
>
I'd like to second this. 32bit on 64bit works fine. I've been
maintaining separate multilib versions of the Gentoo libdrm/Mesa ebuilds
for 32bit/64bit. It is important that the 32bit and 64bit graphics
stacks are synchronised, rather than relying on external compatibility
libraries as Gentoo does currently. Failing to do so does lead to
instability and/or failure.
> Also, one would hope Intel is poking these game writers to migrate to
> 64bit Linux ;-)
>
Indeed! :)
* Keith Packard <[email protected]> wrote:
> On Sun, 2008-10-19 at 00:32 +0200, Ingo Molnar wrote:
>
> > the _real_ remapping in a graphics aperture happens on the GPU level
> > anyway, you manage an in-RAM GPU pagetable that just works like an
> > IOMMU, correct?
>
> Yes, a one-level linear MMU which uses BIOS-reserved memory.
>
> So, at least for a prototype, on 64-bit we can just use ioremap_wc and
> hold the mapping while the driver is open? Is there any huge benefit
> to using the kernel mapping?
correct. There's no benefit to using the kernel mapping - and we can add
2MB pages support to ioremap just fine and then you'll benefit from it
automatically.
Ingo
* Ingo Molnar <[email protected]> wrote:
> very nice!
>
> I think we need a somewhat different abstraction though.
>
> Firstly, regarding drivers/gpu/drm/i915/io_reserve.h, that needs to
> move to generic code.
>
> Secondly, wouldnt the right abstraction be to attach this
> functionality to 'struct resource' ? [or at least create a second
> struct that embedds struct resource]
>
> this abstraction is definitely not a PCI thing and not a
> detached-from-everything thing, it's an IO resource thing. We could
> make it a property of struct resource:
>
> struct resource {
> resource_size_t start;
> resource_size_t end;
> const char *name;
> unsigned long flags;
> struct resource *parent, *sibling, *child;
> + void *mapping;
> };
>
> The APIs would be:
>
> int io_resource_init_mapping(struct resource *res);
> void io_resource_free_mapping(struct resource *res);
> void * io_resource_map(struct resource *res, pfn_t pfn, unsigned long offset);
> void io_resource_unmap(struct resource *res, void *kaddr);
>
> Note how simple and consistent it all gets: IO resources already know
> their physical location and their size limits. Being able to cache an
> ioremap in a mapping [and being able to use atomic kmaps on 32-bit] is
> a relatively simple and natural extension to the concept.
>
> i think that would be quite acceptable - and the APIs could just
> transparently work on it. This would also allow the PCI code to
> automatically unmap any cached mappings from resources, when the
> driver deinitializes.
>
> Linus, Jesse, what do you think?
the downside would be that we'd attach a runtime property to the
IORESOURCE_MEM resource tree - which is a fairly static thing right now,
after the point where we finalize the resource tree. (modulo
device/bridge hotplug variances)
Another downside is that we might not want to map the whole thing. I.e.
the structure of the IO memory space we want to map by drivers might be
different from how it looks in the resource tree.
the concept of introducing resource->mapping does not feel _that_ wrong
though and has a couple of upsides: it could act as a natural mapping
type serializer for example and drivers wouldn't have to explicitly
manage ioremap results - they could just use the resource descriptor
directly and "read" and "write" to/from it. readl/writel could be
extended to operate on the resource descriptor transparently, getting
rid of a source of resource mismatches and overmapping, etc. etc. We
could even safety check IO space accesses this way.
and we'd get rid of the complication that your APIs introduced, the need
to introduce a separate io_mapping type, etc.
Dunno, i might be missing some obvious downside why this wasn't done like
that until now.
Ingo
* Eric Anholt <[email protected]> wrote:
> > The APIs would be:
> >
> > int io_resource_init_mapping(struct resource *res);
> > void io_resource_free_mapping(struct resource *res);
> > void * io_resource_map(struct resource *res, pfn_t pfn, unsigned long offset);
> > void io_resource_unmap(struct resource *res, void *kaddr);
> >
> > Note how simple and consistent it all gets: IO resources already
> > know their physical location and their size limits. Being able to
> > cache an ioremap in a mapping [and being able to use atomic kmaps on
> > 32-bit] is a relatively simple and natural extension to the concept.
> >
> > i think that would be quite acceptable - and the APIs could just
> > transparently work on it. This would also allow the PCI code to
> > automatically unmap any cached mappings from resources, when the
> > driver deinitializes.
> >
> > Linus, Jesse, what do you think?
> >
> > i think we need to finalize the API names and their abstraction
> > level, and then could even merge those APIs into v2.6.28 on a fast
> > path, to enable you to use it. It does not interact with anything
> > else so it should be safe to do.
>
> This API needs the cacheability control, which I don't see in it
> currently. [...]
yes, these two should do the trick:
int io_resource_init_mapping_wc(struct resource *res);
int io_resource_init_mapping_wb(struct resource *res);
> Second, we need to know when we're doing a mapping whether we're
> affected by atomic scheduling restrictions. Right now our plan has
> been to try doing page-by-page
> io_map_atomic_wc()/copy_from_user_inatomic()/io_unmap_atomic(), and if
> we fail at that at some point (map returns NULL or we get a partial
> completion from copy_from_user_inatomic) then fall back to io_map_wc()
> and copy_from_user() the whole thing at once. That gets us good
> performance on both x86 with highmem and x86-64, and not too shabby
> performance on x86 non-highmem.
that gets ugly very fast. I think we should not use atomic kmaps but
NR_CPUS _fixmaps_ with a per CPU array of mutexes (this is basically
atomic kmaps but without the preemption restrictions). We could
take/drop the mutex and statistically you'll stay on the same CPU and
won't ever contend on that lock in practice.
> Also, while it's rare, there have been graphics cards (looking at you,
> S3) where BARs were expensive for some reason and they stuffed both
> the framebuffer and registers into one PCI BAR, where you want the FB
> to be WC and the registers to be UC. Not sure if they would be
> supportable with this API or not. And if it's not, I'm not sure how
> much we care to design for them, but it's something to potentially
> consider.
yes, this is a weakness of this API - you cannot mix multiple
cacheability domains within the same BAR.
and that can happen on non-graphics as well: some storage controller
that has regular control registers in one portion of the BAR, which all
need to be consistently accessed via UC and properly POST-ed - while it
could also have some large mailbox structure at the end of the BAR,
which could be mapped either cacheable or perhaps WC.
So ... i guess we can go back to the io_mapping API proposed by Keith,
but not make it atomic kmap based but fixmap + mutex based - for good
32-bit performance. (and the fixmap would not be used on 64-bit at all)
> Finally, I'm confused by the pfn and offset args to io_resource_map,
> when I expected something parallel to ioremap but with our resource
> arg added.
ok.
Ingo
* Keith Packard <[email protected]> wrote:
> On Sun, 2008-10-19 at 19:53 +0200, Ingo Molnar wrote:
>
> > Note how simple and consistent it all gets: IO resources already
> > know their physical location and their size limits. Being able to
> > cache an ioremap in a mapping [and being able to use atomic kmaps on
> > 32-bit] is a relatively simple and natural extension to the concept.
>
> I'm not sure I see any value in caching mappings here; we're mostly
> interested in copying lots of data onto the card and so we use a lot
> of different mappings; atomic mappings are easy to use, and efficient
> enough.
yes but note that by caching the whole mapping on 64-bit we get
everything we want: trivially lockless, works from any CPU, can be
preempted at will, and there are no ugly INVLPG flushes anywhere.
you'll even get 2MB mappings automatically, if the BAR is aligned and
sized correctly.
32-bit we should handle as well but not design for it.
Ingo
* Ingo Molnar <[email protected]> wrote:
> that gets ugly very fast. I think we should not use atomic kmaps but
> NR_CPUS _fixmaps_ with a per CPU array of mutexes (this is basically
> atomic kmaps but without the preemption restrictions). We could
> take/drop the mutex and statistically you'll stay on the same CPU and
> wont ever contend on that lock in practice.
peterz pointed out that we need to be careful with the TLBs in that
case.
I think it's solvable: a small non-default special-case in switch_to()
would invlpg any pending such page. (and no two such mappings could be
held at the same time)
So as long as we don't leave the CPU we won't ever do TLB flushes, and the
flush will be local even if we do.
Ingo
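A rough, conceptual sketch of the fixmap-plus-per-CPU-mutex idea above, just
to make the locking shape clearer; this is not working kernel code.
FIX_IO_WC_BEGIN is an invented fixmap range with one slot per possible CPU,
and the lazy switch_to() invlpg hook mentioned above is not shown.

#include <linux/mutex.h>
#include <linux/smp.h>
#include <asm/fixmap.h>

/* one slot lock per possible CPU; each entry mutex_init()'d at boot (not shown) */
static struct mutex io_slot_lock[NR_CPUS];

static void *io_map_slot_wc(unsigned long pfn, pgprot_t prot)
{
	int cpu = raw_smp_processor_id();
	enum fixed_addresses idx = FIX_IO_WC_BEGIN + cpu;

	/* A migration after this point is fine: the mutex, not preemption,
	 * owns the slot, and statistically we stay on the same CPU so the
	 * lock is practically uncontended.
	 */
	mutex_lock(&io_slot_lock[cpu]);
	__set_fixmap(idx, pfn << PAGE_SHIFT, prot);
	return (void *) fix_to_virt(idx);
}

static void io_unmap_slot_wc(void *vaddr)
{
	enum fixed_addresses idx = virt_to_fix((unsigned long) vaddr);

	/* The stale TLB entry is left in place; the switch_to() hook would
	 * invlpg it if the task migrates while a slot is still held.
	 */
	mutex_unlock(&io_slot_lock[idx - FIX_IO_WC_BEGIN]);
}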
On Sat, 18 Oct 2008, Eric Anholt wrote:
>
> > It's not being disloyal to your CEO, really. I'm pretty sure nobody will
> > be fired just for ignoring that whole "640kB^H^H^H^H^H32-bits should be
> > enough for everybody" idiocy.
>
> Writing 3D drivers means running 3D games. Running 3D games
> unfortunately means running a lot of 32-bit userland as the fun stuff is
> binary-only.
That's not a very good reason, though. We're supposed to be perfectly
binary compatible, so it would actually be *better* if you did the testing
using a 64-bit kernel.
Sure, don't do it on _all_ machines, but running 32-bit user land is
definitely not a valid reason to avoid a 64-bit kernel. Quite the reverse.
We'd like to see more coverage.
Linus
On Friday, October 17, 2008 7:10 pm Eric Anholt wrote:
> On Fri, 2008-10-17 at 15:43 -0700, Linus Torvalds wrote:
> > On Fri, 17 Oct 2008, Dave Airlie wrote:
> > So what's the excuse _this_ time for adding all these stupid warnings to
> > my build log? Did nobody test this?
> >
> > drivers/gpu/drm/drm_proc.c: In function ‘drm_gem_one_name_info’:
> > drivers/gpu/drm/drm_proc.c:525: warning: format ‘%d’ expects type
> > ‘int’, but argument 3 has type ‘size_t’ drivers/gpu/drm/drm_proc.c:533:
> > warning: format ‘%9d’ expects type ‘int’, but argument 4 has type
> > ‘size_t’ drivers/gpu/drm/i915/i915_gem.c: In function
> > ‘i915_gem_gtt_pwrite’: drivers/gpu/drm/i915/i915_gem.c:184: warning:
> > unused variable ‘vaddr_atomic’
>
> Yeah, none of us are on x86-64, so we missed those warnings in testing.
Actually, I'm on x86_64 pretty much exclusively and saw these warnings last
week. But I didn't send a fix (yet); sorry.
That said, this code was far from untested, even though it did contain a few
compile warnings, so I think Linus's complaint about UNTESTED CRAP was at
least half wrong.
--
Jesse Barnes, Intel Open Source Technology Center
On Mon, 2008-10-20 at 13:58 +0200, Ingo Molnar wrote:
> yes but note that by caching the whole mapping on 64-bit we get
> everything we want: trivially lockless, works from any CPU, can be
> preempted at will, and there are no ugly INVLPG flushes anywhere.
I was assuming that on 64-bit, the map would be created at driver init
time and be left in place until the driver closed; if that's what you
mean by 'caching', then yes, we should cache the map.
> 32-bit we should handle as well but not design for it.
As long as we get kmap_atomic-like performance, and we get to simplify
our code, I'm up for it.
--
[email protected]
* Keith Packard <[email protected]> wrote:
> On Mon, 2008-10-20 at 13:58 +0200, Ingo Molnar wrote:
>
> > yes but note that by caching the whole mapping on 64-bit we get
> > everything we want: trivially lockless, works from any CPU, can be
> > preempted at will, and there are no ugly INVLPG flushes anywhere.
>
> I was assuming that on 64-bit, the map would be created at driver init
> time and be left in place until the driver closed; if that's what you
> mean by 'caching', then yes, we should cache the map.
correct.
> > 32-bit we should handle as well but not design for it.
>
> As long as we get kmap_atomic-like performance, and we get to simplify
> our code, I'm up for it.
okay. So ... mind sending your io_mapping patch as a generic facility?
It looks all good to me in its present form, except that it should live
in include/linux/io.h, not in the drivers/gpu/drm/i915/io_reserve.h file
:-)
also, please send at least two patches, so that we can look at (and
possibly merge) the generic facility in isolation.
Ingo
Switch the i915 device aperture mapping to the io-mapping interface, taking
advantage of the cleaner API to extend it across all of the mapping uses,
including both pwrite and relocation updates.
This dramatically improves performance on 64-bit kernels which were using
the same slow path as 32-bit non-HIGHMEM kernels prior to this patch.
Signed-off-by: Keith Packard <[email protected]>
---
drivers/gpu/drm/i915/i915_drv.h | 3 +
drivers/gpu/drm/i915/i915_gem.c | 81 ++++++++++++++++-----------------------
2 files changed, 36 insertions(+), 48 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f20ffe1..8ca5fbc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -31,6 +31,7 @@
#define _I915_DRV_H_
#include "i915_reg.h"
+#include <linux/io-mapping.h>
/* General customization:
*/
@@ -246,6 +247,8 @@ typedef struct drm_i915_private {
struct {
struct drm_mm gtt_space;
+ struct io_mapping *io_mapping;
+
/**
* List of objects currently involved in rendering from the
* ringbuffer.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9255088..d38b052 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -177,14 +177,14 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
struct drm_file *file_priv)
{
struct drm_i915_gem_object *obj_priv = obj->driver_private;
+ drm_i915_private_t *dev_priv = dev->dev_private;
ssize_t remain;
- loff_t offset;
+ loff_t offset, base;
char __user *user_data;
char __iomem *vaddr;
char *vaddr_atomic;
- int i, o, l;
+ int o, l;
int ret = 0;
- unsigned long pfn;
unsigned long unwritten;
user_data = (char __user *) (uintptr_t) args->data_ptr;
@@ -211,42 +211,38 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
while (remain > 0) {
/* Operation in this page
*
- * i = page number
+ * base = page offset within aperture
* o = offset within page
* l = bytes to copy
*/
- i = offset >> PAGE_SHIFT;
+ base = (offset & ~(PAGE_SIZE-1));
o = offset & (PAGE_SIZE-1);
l = remain;
if ((o + l) > PAGE_SIZE)
l = PAGE_SIZE - o;
- pfn = (dev->agp->base >> PAGE_SHIFT) + i;
-
-#ifdef CONFIG_HIGHMEM
/* This is a workaround for the low performance of iounmap
* (approximate 10% cpu cost on normal 3D workloads).
- * kmap_atomic on HIGHMEM kernels happens to let us map card
- * memory without taking IPIs. When the vmap rework lands
- * we should be able to dump this hack.
+ * io_mapping_map_atomic_wc maps card memory
+ * without taking IPIs.
*/
- vaddr_atomic = kmap_atomic_pfn(pfn, KM_USER0);
-#if WATCH_PWRITE
- DRM_INFO("pwrite i %d o %d l %d pfn %ld vaddr %p\n",
- i, o, l, pfn, vaddr_atomic);
-#endif
+ vaddr_atomic = io_mapping_map_atomic_wc(dev_priv->mm.io_mapping,
+ base);
unwritten = __copy_from_user_inatomic_nocache(vaddr_atomic + o,
user_data, l);
- kunmap_atomic(vaddr_atomic, KM_USER0);
-
+ io_mapping_unmap_atomic(vaddr_atomic);
+
+ /* If we get a fault while copying data, then (presumably) our
+ * source page isn't available. In this case, use the
+ * non-atomic __copy_from_user function
+ */
if (unwritten)
-#endif /* CONFIG_HIGHMEM */
{
- vaddr = ioremap_wc(pfn << PAGE_SHIFT, PAGE_SIZE);
+ vaddr = io_mapping_map_wc(dev_priv->mm.io_mapping, base);
#if WATCH_PWRITE
- DRM_INFO("pwrite slow i %d o %d l %d "
- "pfn %ld vaddr %p\n",
- i, o, l, pfn, vaddr);
+ DRM_INFO("pwrite slow base %ld o %d l %d "
+ "vaddr %p\n",
+ base, o, l, vaddr);
#endif
if (vaddr == NULL) {
ret = -EFAULT;
@@ -256,7 +252,7 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
#if WATCH_PWRITE
DRM_INFO("unwritten %ld\n", unwritten);
#endif
- iounmap(vaddr);
+ io_mapping_unmap(vaddr);
if (unwritten) {
ret = -EFAULT;
goto fail;
@@ -1489,12 +1485,12 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
struct drm_i915_gem_exec_object *entry)
{
struct drm_device *dev = obj->dev;
+ drm_i915_private_t *dev_priv = dev->dev_private;
struct drm_i915_gem_relocation_entry reloc;
struct drm_i915_gem_relocation_entry __user *relocs;
struct drm_i915_gem_object *obj_priv = obj->driver_private;
int i, ret;
- uint32_t last_reloc_offset = -1;
- void __iomem *reloc_page = NULL;
+ void __iomem *reloc_page;
/* Choose the GTT offset for our buffer and put it there. */
ret = i915_gem_object_pin(obj, (uint32_t) entry->alignment);
@@ -1617,26 +1613,11 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
* perform.
*/
reloc_offset = obj_priv->gtt_offset + reloc.offset;
- if (reloc_page == NULL ||
- (last_reloc_offset & ~(PAGE_SIZE - 1)) !=
- (reloc_offset & ~(PAGE_SIZE - 1))) {
- if (reloc_page != NULL)
- iounmap(reloc_page);
-
- reloc_page = ioremap_wc(dev->agp->base +
- (reloc_offset &
- ~(PAGE_SIZE - 1)),
- PAGE_SIZE);
- last_reloc_offset = reloc_offset;
- if (reloc_page == NULL) {
- drm_gem_object_unreference(target_obj);
- i915_gem_object_unpin(obj);
- return -ENOMEM;
- }
- }
-
+ reloc_page = io_mapping_map_atomic_wc(dev_priv->mm.io_mapping,
+ (reloc_offset &
+ ~(PAGE_SIZE - 1)));
reloc_entry = (uint32_t __iomem *)(reloc_page +
- (reloc_offset & (PAGE_SIZE - 1)));
+ (reloc_offset & (PAGE_SIZE - 1)));
reloc_val = target_obj_priv->gtt_offset + reloc.delta;
#if WATCH_BUF
@@ -1645,6 +1626,7 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
readl(reloc_entry), reloc_val);
#endif
writel(reloc_val, reloc_entry);
+ io_mapping_unmap_atomic(reloc_page);
/* Write the updated presumed offset for this entry back out
* to the user.
@@ -1660,9 +1642,6 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
drm_gem_object_unreference(target_obj);
}
- if (reloc_page != NULL)
- iounmap(reloc_page);
-
#if WATCH_BUF
if (0)
i915_gem_dump_object(obj, 128, __func__, ~0);
@@ -2504,6 +2483,10 @@ i915_gem_entervt_ioctl(struct drm_device *dev, void *data,
if (ret != 0)
return ret;
+ dev_priv->mm.io_mapping = io_mapping_create_wc(dev->agp->base,
+ dev->agp->agp_info.aper_size
+ * 1024 * 1024);
+
mutex_lock(&dev->struct_mutex);
BUG_ON(!list_empty(&dev_priv->mm.active_list));
BUG_ON(!list_empty(&dev_priv->mm.flushing_list));
@@ -2521,11 +2504,13 @@ int
i915_gem_leavevt_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv)
{
+ drm_i915_private_t *dev_priv = dev->dev_private;
int ret;
ret = i915_gem_idle(dev);
drm_irq_uninstall(dev);
+ io_mapping_free(dev_priv->mm.io_mapping);
return ret;
}
--
1.5.6.5
Graphics devices have large PCI apertures which would consume a significant
fraction of a 32-bit address space if mapped during driver initialization.
Using ioremap at runtime is impractical as it is too slow. This new set of
interfaces uses atomic mappings on 32-bit processors and a large static
mapping on 64-bit processors to provide reasonable 32-bit performance and
optimal 64-bit performance.
The current implementation sits atop the existing CONFIG_HIGHMEM kmap_atomic
mechanism for 32-bit processors when present. When absent, it just uses
ioremap, which remains horribly inefficient. Fixing non-HIGHMEM 32-bit
kernels to provide per-CPU mappings ala HIGHMEM would resolve that
performance issue.
Signed-off-by: Keith Packard <[email protected]>
---
Documentation/io-mapping.txt | 84 ++++++++++++++++++++++++++++
include/linux/io-mapping.h | 125 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 209 insertions(+), 0 deletions(-)
create mode 100644 Documentation/io-mapping.txt
create mode 100644 include/linux/io-mapping.h
diff --git a/Documentation/io-mapping.txt b/Documentation/io-mapping.txt
new file mode 100644
index 0000000..ebf6dc5
--- /dev/null
+++ b/Documentation/io-mapping.txt
@@ -0,0 +1,84 @@
+The io_mapping functions in linux/io.h provide an abstraction for
+efficiently mapping small regions of an io device to the CPU. The initial
+usage is to support the large graphics aperture on 32-bit processors where
+ioremap_wc cannot be used to statically map the entire aperture to the CPU
+as it would consume too much of the kernel address space.
+
+A mapping object is created during driver initialization using
+
+ struct io_mapping *
+ io_mapping_create_wc(unsigned long base, unsigned long size)
+
+ 'base' is the bus address of the region to be made
+ mappable, while 'size' indicates how large a mapping region to
+ enable. Both are in bytes.
+
+ This _wc variant provides a mapping which may only be used
+ with the io_mapping_map_atomic_wc or io_mapping_map_wc.
+
+With this mapping object, individual pages can be mapped either atomically
+or not, depending on the necessary scheduling environment. Of course, atomic
+maps are more efficient:
+
+ void *
+ io_mapping_map_atomic_wc(struct io_mapping *mapping, unsigned long offset)
+
+ 'offset' is the offset within the defined mapping region.
+ Accessing addresses beyond the region specified in the
+ creation function yields undefined results. Using an offset
+ which is not page aligned yields an undefined result. The
+ return value points to a single page in CPU address space.
+
+ This _wc variant returns a write-combining map to the
+ page and may only be used with mappings created by io_mapping_create_wc.
+
+ Note that the task may not sleep while holding this page
+ mapped.
+
+ void
+ io_mapping_unmap_atomic(void *vaddr)
+
+ 'vaddr' must be the value returned by the last
+ io_mapping_map_atomic_wc call. This unmaps the specified
+ page, and allows the task to sleep once again.
+
+If you need to sleep while holding a mapping, you can use the non-atomic
+variant, although it may be significantly slower:
+
+ void *
+ io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset)
+
+ This works like io_mapping_map_atomic_wc except it allows
+ the task to sleep while holding the page mapped.
+
+ void
+ io_mapping_unmap(void *vaddr)
+
+ This works like io_mapping_unmap_atomic, except it is used
+ for pages mapped with io_mapping_map_wc.
+
+At driver close time, the io_mapping object must be freed:
+
+ void
+ io_mapping_free(struct io_mapping *mapping)
+
+Current Implementation:
+
+The initial implementation of these functions uses existing mapping
+mechanisms and so provides only an abstraction layer and no new
+functionality.
+
+On 64-bit processors, io_mapping_create_wc calls ioremap_wc for the whole
+range, creating a permanent kernel-visible mapping to the resource. The
+map_atomic and map functions add the requested offset to the base of the
+virtual address returned by ioremap_wc.
+
+On 32-bit processors with HIGHMEM defined, io_mapping_map_atomic_wc uses
+kmap_atomic_pfn to map the specified page in an atomic fashion;
+kmap_atomic_pfn isn't really supposed to be used with device pages, but it
+provides an efficient mapping for this usage.
+
+On 32-bit processors without HIGHMEM defined, io_mapping_map_atomic_wc and
+io_mapping_map_wc both use ioremap_wc, a terribly inefficient function which
+performs an IPI to inform all processors about the new mapping. This results
+in a significant performance penalty.
diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h
new file mode 100644
index 0000000..dcc24d5
--- /dev/null
+++ b/include/linux/io-mapping.h
@@ -0,0 +1,125 @@
+/*
+ * Copyright © 2008 Keith Packard <[email protected]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _LINUX_IO_MAPPING_H
+#define _LINUX_IO_MAPPING_H
+
+#include <linux/types.h>
+#include <asm/io.h>
+#include <asm/page.h>
+#include <linux/highmem.h>
+
+/*
+ * The io_mapping mechanism provides an abstraction for mapping
+ * individual pages from an io device to the CPU in an efficient fashion.
+ *
+ * See Documentation/io-mapping.txt
+ */
+
+/* this struct isn't actually defined anywhere */
+struct io_mapping;
+
+#ifdef CONFIG_X86_64
+
+/* Create the io_mapping object*/
+static inline struct io_mapping *
+io_mapping_create_wc(unsigned long base, unsigned long size)
+{
+ return (struct io_mapping *) ioremap_wc(base, size);
+}
+
+static inline void
+io_mapping_free(struct io_mapping *mapping)
+{
+ iounmap(mapping);
+}
+
+/* Atomic map/unmap */
+static inline void *
+io_mapping_map_atomic_wc(struct io_mapping *mapping, unsigned long offset)
+{
+ return ((char *) mapping) + offset;
+}
+
+static inline void
+io_mapping_unmap_atomic(void *vaddr)
+{
+}
+
+/* Non-atomic map/unmap */
+static inline void *
+io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset)
+{
+ return ((char *) mapping) + offset;
+}
+
+static inline void
+io_mapping_unmap(void *vaddr)
+{
+}
+
+#endif /* CONFIG_X86_64 */
+
+#ifdef CONFIG_X86_32
+static inline struct io_mapping *
+io_mapping_create_wc(unsigned long base, unsigned long size)
+{
+ return (struct io_mapping *) base;
+}
+
+static inline void
+io_mapping_free(struct io_mapping *mapping)
+{
+}
+
+/* Atomic map/unmap */
+static inline void *
+io_mapping_map_atomic_wc(struct io_mapping *mapping, unsigned long offset)
+{
+ offset += (unsigned long) mapping;
+#ifdef CONFIG_HIGHMEM
+ return kmap_atomic_pfn(offset >> PAGE_SHIFT, KM_USER0);
+#else
+ return ioremap_wc(offset, PAGE_SIZE);
+#endif
+}
+
+static inline void
+io_mapping_unmap_atomic(void *vaddr)
+{
+#ifdef CONFIG_HIGHMEM
+ kunmap_atomic(vaddr, KM_USER0);
+#else
+ iounmap(vaddr);
+#endif
+}
+
+static inline void *
+io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset)
+{
+ offset += (unsigned long) mapping;
+ return ioremap_wc(offset, PAGE_SIZE);
+}
+
+static inline void
+io_mapping_unmap(void *vaddr)
+{
+ iounmap(vaddr);
+}
+#endif /* CONFIG_X86_32 */
+
+#endif /* _LINUX_IO_MAPPING_H */
--
1.5.6.5
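For reference, a minimal usage sketch of the io_mapping interface added above,
as a hypothetical driver fragment; APERTURE_BASE and APERTURE_SIZE stand in
for the real BAR address and size, and a real driver would create the mapping
once at init rather than per call.

#include <linux/io-mapping.h>
#include <linux/uaccess.h>

static int example_write_page(const char __user *user_data, size_t len,
			      unsigned long page_offset)
{
	struct io_mapping *map;
	void *vaddr;
	int ret = 0;

	map = io_mapping_create_wc(APERTURE_BASE, APERTURE_SIZE);
	if (map == NULL)
		return -ENOMEM;

	/* atomic fast path: no sleeping between map and unmap */
	vaddr = io_mapping_map_atomic_wc(map, page_offset);
	if (vaddr == NULL) {
		ret = -EFAULT;
	} else {
		if (__copy_from_user_inatomic_nocache(vaddr, user_data, len))
			ret = -EFAULT;	/* a real caller would retry via io_mapping_map_wc() */
		io_mapping_unmap_atomic(vaddr);
	}

	io_mapping_free(map);
	return ret;
}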
> okay. So ... mind sending your io_mapping patch as a generic facility?
> It looks all good to me in its present form, except that it should live
> in include/linux/io.h, not in the drivers/gpu/drm/i915/io_reserve.h file
> :-)
The first patch in this series (assuming I'm driving git-send-email
correctly) adds the io_mapping API. I ended up creating a new
linux/io_mapping.h file as the kernel init code uses io.h and got very angry
when I tried to include linux/highmem.h from that. I'm afraid I gave up at
that point and just moved the code to a new file.
The second patch switches the drm/i915 driver to the new API. Performance
improvements on 64-bit kernels are impressive as we were using the slow path
before and now get to take advantage of 64-bit wonderfulness.
* Keith Packard <[email protected]> wrote:
> > okay. So ... mind sending your io_mapping patch as a generic
> > facility? It looks all good to me in its present form, except that
> > it should live in include/linux/io.h, not in the
> > drivers/gpu/drm/i915/io_reserve.h file
> > :-)
>
> The first patch in this series (assuming I'm driving git-send-email
> correctly) adds the io_mapping API. I ended up creating a new
> linux/io_mapping.h file as the kernel init code uses io.h and got very
> angry when I tried to include linux/highmem.h from that. I'm afraid I
> gave up at that point and just moved the code to a new file.
ah ... good call, i missed that mess. linux/io.h is indeed dependency
laden and it's best to keep new facilities separated anyway.
> The second patch switches the drm/i915 driver to the new API.
> Performance improvements on 64-bit kernels are impressive as we were
> using the slow path before and now get to take advantage of 64-bit
> wonderfulness.
heh, cool - the wonders of 64-bit x86 :-)
Any ballpark-figure numbers you can share with us?
Ingo
On Thu, 2008-10-23 at 10:05 +0200, Ingo Molnar wrote:
> Any ballpark-figure numbers you can share with us?
For one quake-3 based game we use for performance regression checking,
64-bit kernels run about 18 times faster now. That's the difference
between using a zero-cost dynamic mapping and using ioremap_wc for each
page.
--
[email protected]
We just ran some numbers on a box with PAT enabled and broken MTRRs.
Finally we have a test platform for the difference between kmap_atomic
and kmap_atomic_prot. Using regular kmap_atomic on this platform, we get
UC access to the graphics device; sending data from the CPU is a bit
slow. Adding kmap_atomic_prot_pfn and specifying WC access raises that
by a fairly significant factor, taking our CPU utilization for
copy_from_user from 40% to 2%.
Here's a patch which adds kmap_atomic_prot_pfn to the kernel for this
usage. When we add native io-mapping support instead of sitting on top
of kmap, we can remove this function.
I've reworked the io_mapping patches to use this function as well, along
with cleaning up the i915 code along the lines of Linus' current
version. I'll post those if this patch looks acceptable.
From e3f633dcb36889fa85ea06cca339072df3c44ae0 Mon Sep 17 00:00:00 2001
From: Keith Packard <[email protected]>
Date: Thu, 23 Oct 2008 11:53:45 -0700
Subject: [PATCH] Add kmap_atomic_prot_pfn
kmap_atomic_prot_pfn is a mixture of kmap_atomic_prot and kmap_atomic_pfn,
accepting both a pfn instead of a page and an explicit protection value.
Signed-off-by: Keith Packard <[email protected]>
---
arch/x86/mm/highmem_32.c | 19 +++++++++++++++++++
include/asm-x86/highmem.h | 1 +
include/linux/highmem.h | 1 +
3 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index bcc079c..91ae5e8 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -139,6 +139,25 @@ void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
}
EXPORT_SYMBOL_GPL(kmap_atomic_pfn); /* temporarily in use by i915 GEM until vmap */
+/* This is the same as kmap_atomic_prot() but can map memory that doesn't
+ * have a struct page associated with it.
+ */
+void *kmap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot)
+{
+ enum fixed_addresses idx;
+ unsigned long vaddr;
+
+ pagefault_disable();
+
+ idx = type + KM_TYPE_NR*smp_processor_id();
+ vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+ set_pte(kmap_pte-idx, pfn_pte(pfn, prot));
+ arch_flush_lazy_mmu_mode();
+
+ return (void*) vaddr;
+}
+EXPORT_SYMBOL_GPL(kmap_atomic_prot_pfn); /* temporarily in use by i915 GEM until vmap */
+
struct page *kmap_atomic_to_page(void *ptr)
{
unsigned long idx, vaddr = (unsigned long)ptr;
diff --git a/include/asm-x86/highmem.h b/include/asm-x86/highmem.h
index bc3f6a2..a1f8f8c 100644
--- a/include/asm-x86/highmem.h
+++ b/include/asm-x86/highmem.h
@@ -66,6 +66,7 @@ void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot);
void *kmap_atomic(struct page *page, enum km_type type);
void kunmap_atomic(void *kvaddr, enum km_type type);
void *kmap_atomic_pfn(unsigned long pfn, enum km_type type);
+void *kmap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot);
struct page *kmap_atomic_to_page(void *ptr);
#ifndef CONFIG_PARAVIRT
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 7dcbc82..7fbee2c 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -55,6 +55,7 @@ static inline void *kmap_atomic(struct page *page, enum km_type idx)
#define kunmap_atomic(addr, idx) do { pagefault_enable(); } while (0)
#define kmap_atomic_pfn(pfn, idx) kmap_atomic(pfn_to_page(pfn), (idx))
+#define kmap_atomic_prot_pfn(pfn, idx, prot) kmap_atomic_prot(pfn_to_page(pfn), (idx), (prot))
#define kmap_atomic_to_page(ptr) virt_to_page(ptr)
#define kmap_flush_unused() do {} while(0)
--
1.5.6.5
--
[email protected]
On Thu, 23 Oct 2008 13:22:12 -0700
Keith Packard <[email protected]> wrote:
> We just ran some numbers on a box with PAT enabled and broken MTRRs.
> Finally we have a test platform for the difference between kmap_atomic
> and kmap_atomic_prot. Using regular kmap_atomic on this platform, we get
> UC access to the graphics device; sending data from the CPU is a bit
> slow. Adding kmap_atomic_prot_pfn and specifying WC access raises that
> by a fairly significant factor, taking our CPU utilization for
> copy_from_user from 40% to 2%.
>
> Here's a patch which adds kmap_atomic_prot_pfn to the kernel for this
> usage. When we add native io-mapping support instead of sitting on top
> of kmap, we can remove this function.
>
> I've reworked the io_mapping patches to use this function as well, along
> with cleaning up the i915 code along the lines of Linus' current
> version. I'll post those if this patch looks acceptable.
>
> From e3f633dcb36889fa85ea06cca339072df3c44ae0 Mon Sep 17 00:00:00 2001
> From: Keith Packard <[email protected]>
> Date: Thu, 23 Oct 2008 11:53:45 -0700
> Subject: [PATCH] Add kmap_atomic_prot_pfn
>
> kmap_atomic_prot_pfn is a mixture of kmap_atomic_prot and kmap_atomic_pfn,
> accepting both a pfn instead of a page and an explicit protection value.
>
> Signed-off-by: Keith Packard <[email protected]>
> ---
> arch/x86/mm/highmem_32.c | 19 +++++++++++++++++++
> include/asm-x86/highmem.h | 1 +
> include/linux/highmem.h | 1 +
> 3 files changed, 21 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
> index bcc079c..91ae5e8 100644
> --- a/arch/x86/mm/highmem_32.c
> +++ b/arch/x86/mm/highmem_32.c
> @@ -139,6 +139,25 @@ void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
> }
> EXPORT_SYMBOL_GPL(kmap_atomic_pfn); /* temporarily in use by i915 GEM until vmap */
>
> +/* This is the same as kmap_atomic_prot() but can map memory that doesn't
> + * have a struct page associated with it.
> + */
> +void *kmap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot)
> +{
> + enum fixed_addresses idx;
> + unsigned long vaddr;
> +
> + pagefault_disable();
> +
> + idx = type + KM_TYPE_NR*smp_processor_id();
> + vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
> + set_pte(kmap_pte-idx, pfn_pte(pfn, prot));
> + arch_flush_lazy_mmu_mode();
> +
> + return (void*) vaddr;
> +}
> +EXPORT_SYMBOL_GPL(kmap_atomic_prot_pfn); /* temporarily in use by i915 GEM until vmap */
I guess one could reimplement kmap_atomic_pfn() to call this. Sometime.
> struct page *kmap_atomic_to_page(void *ptr)
> {
> unsigned long idx, vaddr = (unsigned long)ptr;
> diff --git a/include/asm-x86/highmem.h b/include/asm-x86/highmem.h
> index bc3f6a2..a1f8f8c 100644
> --- a/include/asm-x86/highmem.h
> +++ b/include/asm-x86/highmem.h
> @@ -66,6 +66,7 @@ void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot);
> void *kmap_atomic(struct page *page, enum km_type type);
> void kunmap_atomic(void *kvaddr, enum km_type type);
> void *kmap_atomic_pfn(unsigned long pfn, enum km_type type);
> +void *kmap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot);
Given that all highmem-implementing architectures must use the same
declaration here, we might as well put it into include/linux/highmem.h.
Although that goes against current mistakes^Wcode.
Does powerpc32 still implement highmem? It seems that way. You broke
it, no?
On Thu, 2008-10-23 at 13:38 -0700, Andrew Morton wrote:
> I guess one could reimplement kmap_atomic_pfn() to call this. Sometime.
The goal is to stop needing this function fairly soon and replace it
with a 'real' io-mapping implementation for 32-bit processors.
> Given that all highmem-implementing architectures must use the same
> declaration here, we might as well put it into include/linux/highmem.h.
> Although that goes against current mistakes^Wcode.
I'd hate to break with a long tradition.
> Does powerpc32 still implement highmem? It seems that way. You broke
> it, no?
Powerpc32 doesn't have kmap_atomic_pfn either. Seems like the set of
HIGHMEM functions is not uniform across architectures.
--
[email protected]
On Thu, 23 Oct 2008, Andrew Morton wrote:
>
> Given that all highmem-implementing architectures must use the same
> declaration here, we might as well put it into include/linux/highmem.h.
> Although that goes against current mistakes^Wcode.
>
> Does powerpc32 still implement highmem? It seems that way. You broke
> it, no?
This really shouldn't be in highmem.h AT ALL.
The whole point of that function has absolutely nothing to do with
highmem, and it *must* be useful on non-highmem configurations to be
appropriate.
So I'd much rather create a new <linux/kmap.h> or something. Or just
expose this from <asm/fixmap.h> or something. Let's not confuse this
with highmem, even if the implementation _historically_ was due to that.
Linus
On Thu, 2008-10-23 at 14:24 -0700, Linus Torvalds wrote:
> The whole point of that function has absolutely nothing to do with
> highmem, and it *must* be useful on non-highmem configurations to be
> appropriate.
Right, I'd just like my io_mapping_map_atomic_wc to be able to rapidly
map random pages from my PCI BAR in WC mode. The code in your tree uses
kmap_atomic_pfn, which does the mapping, but uses the wrong prot bits. On
a system with MTRR failure, this all ends badly, with a factor-of-20
performance drop for copying data from the CPU to the aperture.
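For illustration, here's a minimal sketch of how a driver would use the
proposed io_mapping calls (function names as in the io-mapping patch posted
in this thread; the aperture/offset/buffer names are made up for the
example):

#include <linux/errno.h>
#include <linux/io-mapping.h>
#include <linux/string.h>

static int example_pwrite(unsigned long aperture_base,
			  unsigned long aperture_size,
			  unsigned long page_offset,
			  const void *src, size_t bytes)
{
	struct io_mapping *map;
	void *vaddr;

	/* driver init: make the whole (large) aperture mappable without
	 * permanently eating kernel address space on 32-bit */
	map = io_mapping_create_wc(aperture_base, aperture_size);
	if (!map)
		return -ENOMEM;

	/* per access: atomically map one page in WC mode, write, unmap;
	 * no sleeping while the page is mapped */
	vaddr = io_mapping_map_atomic_wc(map, page_offset);
	memcpy(vaddr, src, bytes);
	io_mapping_unmap_atomic(vaddr);

	/* driver teardown */
	io_mapping_free(map);
	return 0;
}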
> So I'd much rather create a new <linux/kmap.h> or something. Or just
> expose this from <asm/fixmap.h> or something. Let's not confuse this
> with highmem, even if the implementation _historically_ was due to that.
Sure, we readily admit that we're abusing the highmem API. So we wrapped
that abusive code in a pretty package that can be re-implemented however
you desire.
How (and when) would you like to see this fixed?
--
[email protected]
On Thu, 23 Oct 2008, Keith Packard wrote:
>
> > So I'd much rather create a new <linux/kmap.h> or something. Or just
> > expose this from <asm/fixmap.h> or something. Let's not confuse this
> > with highmem, even if the implementation _historically_ was due to that.
>
> Sure, we readily admit that we're abusing the highmem API. So we wrapped
> that abusive code in a pretty package that can be re-implemented however
> you desire.
>
> How (and when) would you like to see this fixed?
I'm not entirely sure who wants to own up to owning that particular part
of code, and is willing to make kmap_atomic_prot_pfn() also work in the
absence of CONFIG_HIGHMEM.
I suspect it's Ingo, but I also suspect that he'd like to see a tested
patch by somebody who actually _uses_ this code.
So I would suspect that if you guys actually write a patch, and make sure
that it works on x86-32 even _without_ CONFIG_HIGHMEM, and send it to
Ingo, good things will happen.
As to the "without CONFIG_HIGHMEM" part: making all of this work even
without HIGHMEM should literally be a matter of:
- remove the '#ifdef CONFIG_HIGHMEM' in <asm/fixmap_32.h>, so that we
have fixmap entries for the temporary kernel mappings regardless (ie
FIX_KMAP_BEGIN and FIX_KMAP_END).
- move the code for the atomic accesses that use those fixmap entries
into a file that gets compiled regardless of CONFIG_HIGHMEM. Or at
least just kmap_atomic_prot_pfn() and kunmap_atomic().
and it probably should all work automatically. The kmap_atomic() stuff
really should be almost totally independent of CONFIG_HIGHMEM, since it's
really much more closely related to fixmaps than HIGHMEM.
I guess there may be some debugging code that depends on HIGHMEM (eg that
whole testing for whether a page is a highmem page or not), so it might be
a _bit_ more than just moving code around, but I really didn't look
closer.
Then, there's the issue of 64-bit, and just mapping everything there, and
the interface to that. I liked the trivial extension to "struct resource"
to have a "cached mapping" pointer. So if we can just make it pass
resources around and get a page that way (and not even need kmap() on
64-bit architectures), that would be good.
It's too late for -rc1 (which I'm planning on cutting within the hour),
and if it ends up being complex, I guess that it means this will be a
2.6.29 issue.
But if it really is just a matter of mostly just some trivial code
movement and both the gfx and x86 people are all happy, I'll happily take
it for -rc2, and we can just leave this all behind us.
Linus
On Thu, 2008-10-23 at 14:24 -0700, Linus Torvalds wrote:
> The whole point of that function has absolutely nothing to do with
> highmem, and it *must* be useful on non-highmem configurations to be
> appropriate.
>
> So I'd much rather create a new <linux/kmap.h> or something. Or just
> expose this from <asm/fixmap.h> or something. Let's not confuse this
> with highmem, even if the implementation _historically_ was due to that.
Well, on powerpc, we just went (or rather, Kumar just went) through the
hoops of implementing fixmap and then kmap on top of it... just because
we wanted kmap_atomic functionality on non-highmem platforms :-) (and
the fixmap approach has some other interesting features).
So yes, I agree. Typically very useful for any 32-bit processor with a
memory mapped PCI Express config space since it's something like 256M of
virtual space to map it all iirc.
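(For reference, that figure adds up if you assume the standard PCI Express
extended config layout: 256 buses x 32 devices x 8 functions x 4KB of config
space per function = 256MB of mmconfig area to map.)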
Cheers,
Ben.
On Thu, 2008-10-23 at 19:48 -0700, Linus Torvalds wrote:
> Then, there's the issue of 64-bit, and just mapping everything there, and
> the interface to that. I liked the trivial extension to "struct resource"
> to have a "cached mapping" pointer. So if we can just make it pass
> resources around and get a page that way (and not even need kmap() on
> 64-bit architectures), that would be good.
I'm not much of a fan of carrying a mapping with a struct resource because if
we do that we should probably also refcount the mapping, and then there
is the whole question of mappings with different attributes, etc etc...
Cheers,
Ben.
On Thu, 2008-10-23 at 19:48 -0700, Linus Torvalds wrote:
> I'm not entirely sure who wants to own up to owning that particular part
> of code, and is willing to make kmap_atomic_prot_pfn() also work in the
> absence of CONFIG_HIGHMEM.
All of the kmap_atomic functions *do* work without CONFIG_HIGHMEM, they
just don't do what we want in this case. Without knowing the history, it
seems fairly clear that the kmap functions are designed to map physical
memory pages into the kernel virtual address space. On small 32-bit
systems, that's trivial, you just use the direct map (as one does on
64-bit systems). The magic fixmap entries make this work with
CONFIG_HIGHMEM as well.
So, I fear touching the kmap API as it appears to have a specific and
useful purpose, irrespective of the memory size the kernel is configured
for.
What I've proposed is that we create a new io-space specific set of
fixmap APIs. On CONFIG_HIGHMEM, they'd just use the existing kmap_atomic
mechanism, but on small 32-bit systems, we'd enable the fixmaps and have
some for that environment as well.
> So I would suspect that if you guys actually write a patch, and make sure
> that it works on x86-32 even _without_ CONFIG_HIGHMEM, and send it to
> Ingo, good things will happen.
Ok, we can give this a try.
> and it probably should all work automatically. The kmap_atomic() stuff
> really should be almost totally independent of CONFIG_HIGHMEM, since it's
> really much more closely related to fixmaps than HIGHMEM.
As above, I think kmap_atomic should be left alone as a way of quickly
mapping memory pages. There are users of both kmap_atomic_prot (xen)
and kmap_atomic_pfn (crash dumps).
> I guess there may be some debugging code that depends on HIGHMEM (eg that
> whole testing for whether a page is a highmem page or not), so it might be
> a _bit_ more than just moving code around, but I really didn't look
> closer.
>
> Then, there's the issue of 64-bit, and just mapping everything there, and
> the interface to that. I liked the trivial extension to "struct resource"
> to have a "cached mapping" pointer. So if we can just make it pass
> resources around and get a page that way (and not even need kmap() on
> 64-bit architectures), that would be good.
The io_mapping API I proposed does precisely this. On 32-bit systems, it
uses kmap_atomic for each page access while on 64-bit systems it uses
ioremap_wc at device init time and then just uses an offset for each
page access.
Hiding this detail behind an API leaves the driver code independent of
this particular choice.
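As a rough sketch of the 64-bit flavour described above (just the idea, not
the actual patch): the mapping object is simply the ioremap_wc() result, and
each per-page "map" is plain pointer arithmetic:

#include <linux/io.h>

struct io_mapping;

static inline struct io_mapping *
io_mapping_create_wc(unsigned long base, unsigned long size)
{
	/* one ioremap_wc() at device init; cheap in 64-bit address space */
	return (__force struct io_mapping *) ioremap_wc(base, size);
}

static inline void *
io_mapping_map_atomic_wc(struct io_mapping *mapping, unsigned long offset)
{
	/* zero-cost: just an offset from the init-time mapping */
	return ((__force char *) mapping) + offset;
}

static inline void
io_mapping_unmap_atomic(void *vaddr)
{
	/* nothing to undo; the ioremap_wc() mapping stays until io_mapping_free() */
}

static inline void
io_mapping_free(struct io_mapping *mapping)
{
	iounmap((__force void __iomem *) mapping);
}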
> It's too late for -rc1 (which I'm planning on cutting within the hour),
> and if it ends up being complex, I guess that it means this will be a
> 2.6.29 issue.
If we do end up pushing this out to 2.6.29, I'd like to see
kmap_atomic_prot_pfn in place as a stop-gap so that PAT performance on
32-bit systems is reasonable. I don't think too many people are running
desktop systems without CONFIG_HIGHMEM at this point, and if so, we can
just suggest that perhaps they change their configuration.
> But if it really is just a matter of mostly just some trivial code
> movement and both the gfx and x86 people are all happy, I'll happily take
> it for -rc2, and we can just leave this all behind us.
I'll try to get something working in the next day or so and see how it
looks. My plan at this point is to create a new API for 32-bit systems:
void *io_map_atomic_wc(unsigned long pfn);
void io_unmap_atomic(void *addr);
With this, I can switch my existing io_mapping API over to an
io-specific interface and implement those using the fixmap code.
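In other words, something along these lines (just a sketch using the proposed
names above; the patch that follows later in the thread ended up calling these
iomap_atomic_prot_pfn() and iounmap_atomic() instead):

/* 32-bit: the io_mapping is just the base bus address, and every access
 * goes through a temporary fixmap-based WC mapping of a single page. */
static inline void *
io_mapping_map_atomic_wc(struct io_mapping *mapping, unsigned long offset)
{
	unsigned long phys = (unsigned long) mapping + offset;

	return io_map_atomic_wc(phys >> PAGE_SHIFT);
}

static inline void
io_mapping_unmap_atomic(void *vaddr)
{
	io_unmap_atomic(vaddr);
}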
--
[email protected]
On Thu, 23 Oct 2008 00:14:46 -0700 Keith Packard wrote:
> Documentation/io-mapping.txt | 84 ++++++++++++++++++++++++++++
> include/linux/io-mapping.h | 125 ++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 209 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/io-mapping.txt
> create mode 100644 include/linux/io-mapping.h
>
> diff --git a/Documentation/io-mapping.txt b/Documentation/io-mapping.txt
> new file mode 100644
> index 0000000..ebf6dc5
> --- /dev/null
> +++ b/Documentation/io-mapping.txt
> @@ -0,0 +1,84 @@
> +The io_mapping functions in linux/io.h provide an abstraction for
io-mapping.h ?
> +efficiently mapping small regions of an io device to the CPU. The initial
IO or I/O, please
> +usage is to support the large graphics aperture on 32-bit processors where
> +ioremap_wc cannot be used to statically map the entire aperture to the CPU
> +as it would consume too much of the kernel address space.
> +
> +A mapping object is created during driver initialization using
> +
> + struct io_mapping *
> + io_mapping_create_wc(unsigned long base, unsigned long size)
> +
> + 'base' is the bus address of the region to be made
> + mappable, while 'size' indicates how large a mapping region to
> + enable. Both are in bytes.
> +
> + This _wc variant provides a mapping which may only be used
> + with the io_mapping_map_atomic_wc or io_mapping_map_wc.
> +
> +With this mapping object, individual pages can be mapped either atomically
> +or not, depending on the necessary scheduling environment. Of course, atomic
> +maps are more efficient:
> +
> + void *
> + io_mapping_map_atomic_wc(struct io_mapping *mapping, unsigned long offset)
> +
> + 'offset' is the offset within the defined mapping region.
> + Accessing addresses beyond the region specified in the
> + creation function yields undefined results. Using an offset
> + which is not page aligned yields an undefined result. The
> + return value points to a single page in CPU address space.
> +
> + This _wc variant returns a write-combining map to the
> + page and may only be used with
with <TBD>...
> +
> + Note that the task may not sleep while holding this page
> + mapped.
> +
> + void
> + io_mapping_unmap_atomic(void *vaddr)
> +
> + 'vaddr' must be the value returned by the last
> + io_mapping_map_atomic_wc call. This unmaps the specified
> + page, and allows the task to sleep once again.
s/,//
> +
> +If you need to sleep while holding the lock, you can use the non-atomic
> +variant, although they may be significantly slower;
s/;/./
> +
> + void *
> + io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset)
> +
> + This works like io_mapping_map_atomic_wc except it allows
> + the task to sleep while holding the page mapped.
> +
> + void
> + io_mapping_unmap(void *vaddr)
> +
> + This works like io_mapping_unmap_atomic, except it is used
> + for pages mapped with io_mapping_map_wc.
> +
> +At driver close time, the io_mapping object must be freed:
> +
> + void
> + io_mapping_free(struct io_mapping *mapping)
> +
> +Current Implementation:
> +
> +The initial implementation of these functions use existing mapping
uses
> +mechanisms and so provide only an abstraction layer and no new
provides
> +functionality.
> +
> +On 64-bit processors, io_mapping_create_wc calls ioremap_wc for the whole
> +range, creating a permanent kernel-visible mapping to the resource. The
> +map_atomic and map functions add the requested offset to the base of the
> +virtual address returned by ioremap_wc.
> +
> +On 32-bit processors with HIGHMEM defined, io_mapping_map_atomic_wc uses
> +kmap_atomic_pfn to map the specified page in an atomic fashion;
> +kmap_atomic_pfn isn't really supposed to be used with device pages, but it
> +provides an efficient mapping for this usage.
> +
> +On 32-bit processors without HIGHMEM defined, io_mapping_map_atomic_wc and
> +io_mapping_map_wc both use ioremap_wc, a terribly inefficient function which
> +performs an IPI to inform all processors about the new mapping. This results
> +in a significant performance penalty.
And I wish you could lose that horrible (non-Linux kernel) style of function
return type on a separate line.
---
~Randy
On Fri, 2008-10-24 at 14:24 +1100, Benjamin Herrenschmidt wrote:
> I'm not much of a fan of carrying a mapping with a struct resource because if
> we do that we should probably also refcount the mapping, and then there
> is the whole question of mappings with different attributes, etc etc...
I'm fine with sticking the mapping in a separate structure; it's just
the return from ioremap_wc on 64-bit systems, and nothing at all on
32-bit systems. I don't really see the advantage of moving it into the
resource, especially as we may need different access modes (UC vs WC)
for different portions of a single BAR.
We may want to add some bounds and access mode information to this
structure to check for broken drivers.
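A purely hypothetical sketch of what such a structure could carry (none of
these fields are in any posted patch, they just illustrate the idea):

struct io_mapping {
	void __iomem	*iomem;	/* ioremap_wc() result on 64-bit, unused on 32-bit */
	unsigned long	base;	/* bus address of the window */
	unsigned long	size;	/* length in bytes, for bounds-checking broken drivers */
	pgprot_t	prot;	/* access mode for this window (WC vs UC) */
};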
--
[email protected]
On Thu, 2008-10-23 at 19:48 -0700, Linus Torvalds wrote:
> So I would suspect that if you guys actually write a patch, and make sure
> that it works on x86-32 even _without_ CONFIG_HIGHMEM, and send it to
> Ingo, good things will happen.
Something like the following (yes, I know, this is missing the
include/asm-x86 rename)?
From e7921809c72f940295311cfa6c300d5234ac96c1 Mon Sep 17 00:00:00 2001
From: Keith Packard <[email protected]>
Date: Thu, 23 Oct 2008 23:17:40 -0700
Subject: [PATCH] [x86_32] Add io_map_atomic using fixmaps
This steals the code used for CONFIG_HIGHMEM memory mappings except that
it's designed for dynamic io resource mapping. These fixmaps are available
even with CONFIG_HIGHMEM turned off.
Signed-off-by: Keith Packard <[email protected]>
---
arch/x86/mm/Makefile | 2 +-
arch/x86/mm/init_32.c | 3 +-
arch/x86/mm/iomap_32.c | 59 +++++++++++++++++++++++++++++++++++++++++++
include/asm-x86/fixmap.h | 4 +++
include/asm-x86/fixmap_32.h | 4 ---
include/asm-x86/highmem.h | 8 +++---
include/asm-x86/iomap.h | 30 ++++++++++++++++++++++
include/linux/io-mapping.h | 15 +++-------
8 files changed, 104 insertions(+), 21 deletions(-)
create mode 100644 arch/x86/mm/iomap_32.c
create mode 100644 include/asm-x86/iomap.h
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 59f89b4..fea4565 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -1,7 +1,7 @@
obj-y := init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \
pat.o pgtable.o gup.o
-obj-$(CONFIG_X86_32) += pgtable_32.o
+obj-$(CONFIG_X86_32) += pgtable_32.o iomap_32.o
obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
obj-$(CONFIG_X86_PTDUMP) += dump_pagetables.o
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 8396868..c483f42 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -334,7 +334,6 @@ int devmem_is_allowed(unsigned long pagenr)
return 0;
}
-#ifdef CONFIG_HIGHMEM
pte_t *kmap_pte;
pgprot_t kmap_prot;
@@ -357,6 +356,7 @@ static void __init kmap_init(void)
kmap_prot = PAGE_KERNEL;
}
+#ifdef CONFIG_HIGHMEM
static void __init permanent_kmaps_init(pgd_t *pgd_base)
{
unsigned long vaddr;
@@ -436,7 +436,6 @@ static void __init set_highmem_pages_init(void)
#endif /* !CONFIG_NUMA */
#else
-# define kmap_init() do { } while (0)
# define permanent_kmaps_init(pgd_base) do { } while (0)
# define set_highmem_pages_init() do { } while (0)
#endif /* CONFIG_HIGHMEM */
diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
new file mode 100644
index 0000000..c559599
--- /dev/null
+++ b/arch/x86/mm/iomap_32.c
@@ -0,0 +1,59 @@
+/*
+ * Copyright © 2008 Keith Packard <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <asm/iomap.h>
+#include <linux/module.h>
+
+/* Map 'pfn' using fixed map 'type' and protections 'prot'
+ */
+void *
+iomap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot)
+{
+ enum fixed_addresses idx;
+ unsigned long vaddr;
+
+ pagefault_disable();
+
+ idx = type + KM_TYPE_NR*smp_processor_id();
+ vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+ set_pte(kmap_pte-idx, pfn_pte(pfn, prot));
+ arch_flush_lazy_mmu_mode();
+
+ return (void*) vaddr;
+}
+EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
+
+void
+iounmap_atomic(void *kvaddr, enum km_type type)
+{
+ unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
+ enum fixed_addresses idx = type + KM_TYPE_NR*smp_processor_id();
+
+ /*
+ * Force other mappings to Oops if they'll try to access this pte
+ * without first remap it. Keeping stale mappings around is a bad idea
+ * also, in case the page changes cacheability attributes or becomes
+ * a protected page in a hypervisor.
+ */
+ if (vaddr == __fix_to_virt(FIX_KMAP_BEGIN+idx))
+ kpte_clear_flush(kmap_pte-idx, vaddr);
+
+ arch_flush_lazy_mmu_mode();
+ pagefault_enable();
+}
+EXPORT_SYMBOL_GPL(iounmap_atomic);
diff --git a/include/asm-x86/fixmap.h b/include/asm-x86/fixmap.h
index 78e33a1..a8b4379 100644
--- a/include/asm-x86/fixmap.h
+++ b/include/asm-x86/fixmap.h
@@ -9,6 +9,10 @@
extern int fixmaps_set;
+extern pte_t *kmap_pte;
+extern pgprot_t kmap_prot;
+extern pte_t *pkmap_page_table;
+
void __native_set_fixmap(enum fixed_addresses idx, pte_t pte);
void native_set_fixmap(enum fixed_addresses idx,
unsigned long phys, pgprot_t flags);
diff --git a/include/asm-x86/fixmap_32.h b/include/asm-x86/fixmap_32.h
index 8844002..97c2976 100644
--- a/include/asm-x86/fixmap_32.h
+++ b/include/asm-x86/fixmap_32.h
@@ -28,10 +28,8 @@ extern unsigned long __FIXADDR_TOP;
#include <asm/acpi.h>
#include <asm/apicdef.h>
#include <asm/page.h>
-#ifdef CONFIG_HIGHMEM
#include <linux/threads.h>
#include <asm/kmap_types.h>
-#endif
/*
* Here we define all the compile-time 'special' virtual
@@ -75,10 +73,8 @@ enum fixed_addresses {
#ifdef CONFIG_X86_CYCLONE_TIMER
FIX_CYCLONE_TIMER, /*cyclone timer register*/
#endif
-#ifdef CONFIG_HIGHMEM
FIX_KMAP_BEGIN, /* reserved pte's for temporary kernel mappings */
FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1,
-#endif
#ifdef CONFIG_PCI_MMCONFIG
FIX_PCIE_MCFG,
#endif
diff --git a/include/asm-x86/highmem.h b/include/asm-x86/highmem.h
index a1f8f8c..d728928 100644
--- a/include/asm-x86/highmem.h
+++ b/include/asm-x86/highmem.h
@@ -4,6 +4,9 @@
* Used in CONFIG_HIGHMEM systems for memory pages which
* are not addressable by direct kernel virtual addresses.
*
+ * Used in other 32-bit systems for io pages which are not
+ * in the direct kernel virtual map
+ *
* Copyright (C) 1999 Gerhard Wichert, Siemens AG
* [email protected]
*
@@ -25,14 +28,11 @@
#include <asm/kmap_types.h>
#include <asm/tlbflush.h>
#include <asm/paravirt.h>
+#include <asm/fixmap.h>
/* declarations for highmem.c */
extern unsigned long highstart_pfn, highend_pfn;
-extern pte_t *kmap_pte;
-extern pgprot_t kmap_prot;
-extern pte_t *pkmap_page_table;
-
/*
* Right now we initialize only a single pte table. It can be extended
* easily, subsequent pte tables have to be allocated in one physical
diff --git a/include/asm-x86/iomap.h b/include/asm-x86/iomap.h
new file mode 100644
index 0000000..33b8180
--- /dev/null
+++ b/include/asm-x86/iomap.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright © 2008 Keith Packard <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+#include <asm/cacheflush.h>
+#include <asm/pgtable.h>
+#include <asm/tlbflush.h>
+
+void *
+iomap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot);
+
+void
+iounmap_atomic(void *kvaddr, enum km_type type);
diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h
index bd1dc4f..a7e0d98 100644
--- a/include/linux/io-mapping.h
+++ b/include/linux/io-mapping.h
@@ -75,6 +75,9 @@ io_mapping_unmap(void *vaddr)
#endif /* CONFIG_X86_64 */
#ifdef CONFIG_X86_32
+
+#include <asm/iomap.h>
+
static inline struct io_mapping *
io_mapping_create_wc(unsigned long base, unsigned long size)
{
@@ -91,22 +94,14 @@ static inline void *
io_mapping_map_atomic_wc(struct io_mapping *mapping, unsigned long offset)
{
offset += (unsigned long) mapping;
-#ifdef CONFIG_HIGHMEM
- return kmap_atomic_prot_pfn(offset >> PAGE_SHIFT, KM_USER0,
+ return iomap_atomic_prot_pfn(offset >> PAGE_SHIFT, KM_USER0,
__pgprot(__PAGE_KERNEL_WC));
-#else
- return ioremap_wc(offset, PAGE_SIZE);
-#endif
}
static inline void
io_mapping_unmap_atomic(void *vaddr)
{
-#ifdef CONFIG_HIGHMEM
- kunmap_atomic(vaddr, KM_USER0);
-#else
- iounmap(vaddr);
-#endif
+ iounmap_atomic(vaddr, KM_USER0);
}
static inline void *
--
1.5.6.5
--
[email protected]
On Thu, 2008-10-23 at 21:49 -0700, Randy Dunlap wrote:
(A bunch of helpful edits for the io-mapping.txt file)
Thanks! They're in my tree and will get included in the next version of
this patch.
--
[email protected]
Keith,
What you actually are doing here is claiming copyright on code that
other people have written, and tightening the export restrictions.
kmap_atomic_prot_pfn() appeared long ago in drm git with identical code
and purpose, but with different authors, and iounmap_atomic is identical
to kunmap_atomic.
Pls fix.
/Thomas
Keith Packard wrote:
>
> diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
> new file mode 100644
> index 0000000..c559599
> --- /dev/null
> +++ b/arch/x86/mm/iomap_32.c
> @@ -0,0 +1,59 @@
> +/*
> + * Copyright © 2008 Keith Packard <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, write to the Free Software Foundation, Inc.,
> + * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
> + */
> +
> +#include <asm/iomap.h>
> +#include <linux/module.h>
> +
> +/* Map 'pfn' using fixed map 'type' and protections 'prot'
> + */
> +void *
> +iomap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot)
> +{
> + enum fixed_addresses idx;
> + unsigned long vaddr;
> +
> + pagefault_disable();
> +
> + idx = type + KM_TYPE_NR*smp_processor_id();
> + vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
> + set_pte(kmap_pte-idx, pfn_pte(pfn, prot));
> + arch_flush_lazy_mmu_mode();
> +
> + return (void*) vaddr;
> +}
> +EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
> +
> +void
> +iounmap_atomic(void *kvaddr, enum km_type type)
> +{
> + unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
> + enum fixed_addresses idx = type + KM_TYPE_NR*smp_processor_id();
> +
> + /*
> + * Force other mappings to Oops if they'll try to access this pte
> + * without first remap it. Keeping stale mappings around is a bad idea
> + * also, in case the page changes cacheability attributes or becomes
> + * a protected page in a hypervisor.
> + */
> + if (vaddr == __fix_to_virt(FIX_KMAP_BEGIN+idx))
> + kpte_clear_flush(kmap_pte-idx, vaddr);
> +
> + arch_flush_lazy_mmu_mode();
> + pagefault_enable();
> +}
> +EXPORT_SYMBOL_GPL(iounmap_atomic);
>
* Thomas Hellström <[email protected]> wrote:
> Keith,
>
> What you actually are doing here is claiming copyright on code that
> other people have written, and tightening the export restrictions.
> kmap_atomic_prot_pfn() appeared long ago in drm git with identical
> code and purpose, but with different authors, and iounmap_atomic is
> identical to kunmap_atomic.
>> +EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
you want to use this facility in a binary-only driver?
Ingo
* Linus Torvalds <[email protected]> wrote:
> On Thu, 23 Oct 2008, Keith Packard wrote:
> >
> > > So I'd much rather create a new <linux/kmap.h> or something. Or just
> > > expose this from <asm/fixmap.h> or something. Let's not confuse this
> > > with highmem, even if the implementation _historically_ was due to that.
> >
> > Sure, we readily admit that we're abusing the highmem API. So we wrapped
> > that abusive code in a pretty package that can be re-implemented however
> > you desire.
> >
> > How (and when) would you like to see this fixed?
>
> I'm not entirely sure who wants to own up to owning that particular
> part of code, and is willing to make kmap_atomic_prot_pfn() also work
> in the absence of CONFIG_HIGHMEM.
yeah, that would be the x86 maintainers. (whoever they are)
> I suspect it's Ingo, but I also suspect that he'd like to see a tested
> patch by somebody who actually _uses_ this code.
yeah. I already asked Keith yesterday to send one coherent bundle of
patches and i think we've got all the code sent already, you also pulled
the latest DRM tree - so it all just needs sorting out and testing.
[ I can possibly test the end result with a bleeding-edge Xorg which
supports the new DRI APIs. (Whether X comes up fine is a regular,
necessary and 'fun' component of the x86 regression testing we do
anyway. We also tend to notice highmem breakages promptly.) ]
> So I would suspect that if you guys actually write a patch, and make
> sure that it works on x86-32 even _without_ CONFIG_HIGHMEM, and send
> it to Ingo, good things will happen.
>
> As to the "without CONFIG_HIGHMEM" part: making all of this work even
> without HIGHMEM should literally be a matter of:
>
> - remove the '#ifdef CONFIG_HIGHMEM' in <asm/fixmap_32.h>, so that we
> have fixmap entries for the temporary kernel mappings regardless (ie
> FIX_KMAP_BEGIN and FIX_KMAP_END).
>
> - move the code for the atomic accesses that use those fixmap entries
> into a file that gets compiled regardless of CONFIG_HIGHMEM. Or at
> least just kmap_atomic_prot_pfn() and kunmap_atomic().
>
> and it probably should all work automatically. The kmap_atomic() stuff
> really should be almost totally independent of CONFIG_HIGHMEM, since
> it's really much more closely related to fixmaps than HIGHMEM.
yeah.
> I guess there may be some debugging code that depends on HIGHMEM (eg
> that whole testing for whether a page is a highmem page or not), so it
> might be a _bit_ more than just moving code around, but I really
> didn't look closer.
>
> Then, there's the issue of 64-bit, and just mapping everything there,
> and the interface to that. I liked the trivial extension to "struct
> resource" to have a "cached mapping" pointer. So if we can just make
> it pass resources around and get a page that way (and not even need
> kmap() on 64-bit architectures), that would be good.
hm, the thing that i find the weakest aspect of that (despite having
suggested this approach) is that the structure and the granularity of
the resource tree is really controlled by the hardware environment. We
try to map the hardware circumstances accurately at bootstrap / device
discovery time, and keep it rather static from that point on. (modulo
hotplug and dynamic BAR sizing)
And this static property of the resource tree is _good_ IMO, because we
can think about it as a hardware property - not something tweaked by the
kernel. (the kernel does tweak it when need to be or when the hardware
asks for it, but it's more of an exception)
That means that if a driver wants to map just a portion of a BAR,
(because the hardware itself compresses various different pieces of
functionality into the same BAR), this abstraction becomes a limitation
on usage.
And it might even be _impossible_ to use the simplest form of
resource->mem_cache facility with certain types of hardware: say there's
a cacheable and an uncacheable window within the same BAR - and we'd map
the whole thing as cacheable. The CPU is then entitled to (and will most
likely) prefetch into the uncacheable region as well, causing hw
lockups, etc. [In this thread it was claimed that S3 chips have such
properties.]
And tweaking this abstraction to cover those cases, for the ioremap to
not be a full mapping of the resource looks a bit hacky/limiting as
well:
- we'd either have to add 'size', 'offset' properties to the window we
cache in the struct resource. (and possibly an array. yuck.)
- or we'd have to say "dont use this facility with such quirky hardware
then".
- or we'd have to say "split up the struct resource into two pieces
artificially", when the driver starts using the resource - which
violates the current rather static nature of the resource tree.
Maybe i'm overcomplicating it and maybe this last option isnt all that
bad after all: as the 'combined' resource in such cases _is_ artificial
- and i915+ does not have such problematic BARs to begin with. (keeping
separate BARs for the framebuffer is the sane thing to do anyway)
OTOH, Keith's io-mapping API does look pretty natural too - it wraps
facilities that are already available to drivers, into a coherent unit.
A driver has to be aware of it anyway, and drivers have to store their
ioremap results in the private device structure currently anyway, so it
fits nicely into current ioremap practices and is a gradual extension of
it.
Discovering the resource at the driver level might be a bit complicated
(right now there's no need for any 3D driver to even know about struct
resource - they just use the PCI APIs) and then using it dynamically
brings up various lifetime questions.
Hm? Right now i'm leaning slightly towards Keith's code - but no strong
feelings. (Assuming that the atomic-kmap facilities are separated out
cleanly and are made independent of any highmem connections - but that
is just a shuffle-code-around-and-test exercise)
> It's too late for -rc1 (which I'm planning on cutting within the
> hour), and if it ends up being complex, I guess that it means this
> will be a 2.6.29 issue.
>
> But if it really is just a matter of mostly just some trivial code
> movement and both the gfx and x86 people are all happy, I'll happily
> take it for -rc2, and we can just leave this all behind us.
ok. I'm pretty optimistic about it because the risk factor seems
manageable: only one driver is affected by the changes - and that
driver's authors wrote this new core kernel facility: which is the best
of all combinations. We can test the x86 side highmem impact just fine.
Plus this portion of the current upstream code in the DRM driver is
rather ugly to begin with, so in my book this is a fair cleanup/bugfix
as well.
Ingo
Ingo Molnar wrote:
> * Thomas Hellström <[email protected]> wrote:
>
>
>> Keith,
>>
>> What you actually are doing here is claiming copyright on code that
>> other people have written, and tightening the export restrictions.
>> kmap_atomic_prot_pfn() appeared long ago in drm git with identical
>> code and purpose, but with different authors, and iounmap_atomic is
>> identical to kunmap_atomic.
>>
>
>
>>> +EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
>>>
>
> you want to use this facility in a binary-only driver?
>
> Ingo
>
At this point I have no such use for it, no.
The original user was an MIT-style licensed driver.
/Thomas
* Thomas Hellström <[email protected]> wrote:
> Ingo Molnar wrote:
>> * Thomas Hellström <[email protected]> wrote:
>>
>>
>>> Keith,
>>>
>>> What you actually are doing here is claiming copyright on code that
>>> other people have written, and tightening the export restrictions.
>>> kmap_atomic_prot_pfn() appeared long ago in drm git with identical
>>> code and purpose, but with different authors, and iounmap_atomic is
>>> identical to kunmap_atomic.
>>>
>>
>>
>>>> +EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
>>>>
>>
>> you want to use this facility in a binary-only driver?
>>
>> Ingo
>>
> At this point I have no such use for it, no.
> The original user was an MIT-style licensed driver.
okay, then just to put this question to rest: i wrote the original
32-bit highmem code ~10 years ago. I wrote the first version of fixmap
support - in fact i coined the term. I wrote the first version of the
atomic-kmap facility as well.
All of that code is licensed under the GPLv2. So if anyone wants to make
any copyright claims about highmem/kmap/fixmap derivative works,
consider it in that light.
Regarding this new API variant that Keith wrote: it would be silly and
dangerous to export it anywhere but to in-kernel drivers. The API
disables preemption on 32-bit and rummages deep in the guts of the
kernel as well, uses up a precious resource (fixmap slots), etc. It's
internal and we eventually might want to deprecate forms of it and
concentrate on the good 64-bit performance side.
Ingo
Ingo Molnar wrote:
> * Thomas Hellström <[email protected]> wrote:
>
>
>> Ingo Molnar wrote:
>>
>>> * Thomas Hellström <[email protected]> wrote:
>>>
>>>
>>>
>>>> Keith,
>>>>
>>>> What you actually are doing here is claiming copyright on code that
>>>> other people have written, and tightening the export restrictions.
>>>> kmap_atomic_prot_pfn() appeared long ago in drm git with identical
>>>> code and purpose, but with different authors, and iounmap_atomic is
>>>> identical to kunmap_atomic.
>>>>
>>>>
>>>
>>>
>>>>> +EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
>>>>>
>>>>>
>>> you want to use this facility in a binary-only driver?
>>>
>>> Ingo
>>>
>>>
>> At this point I have no such use for it, no.
>> The original user was an MIT-style licensed driver.
>>
>
> okay, then just to put this question to rest: i wrote the original
> 32-bit highmem code ~10 years ago. I wrote the first version of fixmap
> support - in fact i coined the term. I wrote the first version of the
> atomic-kmap facility as well.
>
> All of that code is licensed under the GPLv2. So if anyone wants to make
> any copyright claims about highmem/kmap/fixmap derivative works,
> consider it in that light.
>
I fully acknowledge that. The fact that others wrote almost all of the
code is the reason why there are no additional copyright claims in the
file of the original kmap_atomic_prot_pfn() implementation.
What I consider very bad form is someone taking other people's
contributions, be they code or ideas, no matter how small, claiming
copyright on them and restricting the original usage. That's really the
point here. Frankly, from my own usage point of view I don't really care
about the specific case (whether they are exported GPL or not), but with
the current form of this patch, I'm basically no longer allowed to use
the code currently in DRM git without Keith's permission.
> Regarding this new API variant that Keith wrote: it would be silly and
> dangerous to export it anywhere but to in-kernel drivers. The API
> disables preemption on 32-bit and rummages deep in the guts of the
> kernel as well, uses up a precious resource (fixmap slots), etc. It's
> internal and we eventually might want to deprecate forms of it and
> concentrate on the good 64-bit performance side.
>
>
These are all good arguments to reserve this API for in-kernel drivers.
There are other reasons why it should be available to out-of-tree drivers:
* The API enables a fast facility that will be extremely important for
graphics drivers in the future, probably not only for in-kernel drivers.
Particularly as future graphics drivers will want to get fast
write-combined kernel mappings also on highmem pages, not only on io pages.
This latter point actually suggests keeping the form kmap_atomic_prot_pfn
instead of iomap_atomic_prot_pfn, and making it permanent, unless the same
goals can be achieved by the VMAP work Nick Piggin is suggesting.
* Wouldn't the use of k[un]map_atomic(), following your argumentation, be
equally dangerous to export non-GPL?
Now, considering these pros and cons, one might still come to the
conclusion that for stability reasons it is best to keep this API for
in-kernel drivers. I don't really know enough about the affected kernel
internals to be able to judge. I just think it's important that all
the facts are considered.
/Thomas
On Thu, 23 Oct 2008, Keith Packard wrote:
>
> I'm fine with sticking the mapping in a separate structure; it's just
> the return from ioremap_wc on 64-bit systems, and nothing at all on
> 32-bit systems.
Actually, on 32-bit, the 'prot' should be there, as should the starting
physical page. Otherwise the two interfaces would be very odd, and you'd
have to repeat those arguments in all callers (ie both in "prepare" and
in the actual "access").
Linus
On Fri, 2008-10-24 at 07:53 -0700, Linus Torvalds wrote:
> Actually, on 32-bit, the 'prot' should be there, as should the starting
> physical page. Otherwise the two interfaces would be very odd, and you'd
> have to repeat those arguments in all callers (ie both in "prepare" and
> in the actual "access").
I suppose. What I did instead was create _wc versions of both the
prepare and access functions to eliminate the need for additional data.
Either way is fine with me; I took the route which didn't require an
additional allocation.
--
[email protected]
On Fri, 2008-10-24 at 09:33 +0200, Thomas Hellström wrote:
> Keith,
>
> What you actually are doing here is claiming copyright on code that
> other people have written, and tightening the export restrictions.
> kmap_atomic_prot_pfn() appeared long ago in drm git with identical code
> and purpose, but with different authors, and iounmap_atomic is identical
> to kunmap_atomic.
Yeah, I just stuck my usual license header on it and didn't think about
authorship. I'll fix that, once we figure out what the appropriate name
is.
But, as this code is clearly a trivial adaptation of the existing kernel
code, it should carry a GPLv2 license. I'm also not particular as to the
EXPORT restriction, I was just following the EXPORT advice given for the
other newly exposed kernel symbols we're using.
--
[email protected]
Keith Packard wrote:
> On Fri, 2008-10-24 at 09:33 +0200, Thomas Hellström wrote:
>
>> Keith,
>>
>> What you actually are doing here is claiming copyright on code that
>> other people have written, and tightening the export restrictions.
>> kmap_atomic_prot_pfn() appeared long ago in drm git with identical code
>> and purpose, but with different authors, and iounmap_atomic is identical
>> to kunmap_atomic.
>>
>
> Yeah, I just stuck my usual license header on it and didn't think about
> authorship. I'll fix that, once we figure out what the appropriate name
> is.
>
> But, as this code is clearly a trivial adaptation of the existing kernel
> code, it should carry a GPLv2 license. I'm also not particular as to the
> EXPORT restriction, I was just following the EXPORT advice given for the
> other newly exposed kernel symbols we're using.
>
>
Thanks, Keith.
If there is a chance that the people who do want the EXPORT_SYMBOL_GPL
restriction are OK with going with just EXPORT_SYMBOL(), I guess that would
fit best, considering what's already exported and doable in drivers today.
/Thomas
On Fri, Oct 24, 2008 at 1:39 AM, Keith Packard <[email protected]> wrote:
> On Thu, 2008-10-23 at 10:05 +0200, Ingo Molnar wrote:
>
>> Any ballpark-figure numbers you can share with us?
>
> For one quake-3 based game we use for performance regression checking,
> 64-bit kernels run about 18 times faster now. That's the difference
> between using a zero-cost dynamic mapping and using ioremap_wc for each
> page.
>
> --
> [email protected]
>
So I've put these patches into Fedora rawhide kernel to test, and
glxgears on my 945G hw went from 85fps to 380fps, clearly we would want
these patches upstream sooner rather than later.
Or we at least need something in the i915 driver to get the speed
under GEM back up.
I'm happy to ship these patches in F10 GA if that makes any
difference, so I'd really like them upstream asap.
Dave.
* Dave Airlie <[email protected]> wrote:
> On Fri, Oct 24, 2008 at 1:39 AM, Keith Packard <[email protected]> wrote:
> > On Thu, 2008-10-23 at 10:05 +0200, Ingo Molnar wrote:
> >
> >> Any ballpark-figure numbers you can share with us?
> >
> > For one quake-3 based game we use for performance regression checking,
> > 64-bit kernels run about 18 times faster now. That's the difference
> > between using a zero-cost dynamic mapping and using ioremap_wc for each
> > page.
> >
> > --
> > [email protected]
> >
>
> So I've put these patches into Fedora rawhide kernel to test, and
> glxgears on my 945G hw went from 85fps to 380fps, clearly we would
> want these patches upstream sooner rather than later.
yep, it's all lined up already in tip/core/resources, and got
massively tested over the past few days. Will send a pull request to
Linus tomorrow-ish - we need one final cleanup patch and then it's
green to go.
Ingo
On Mon, 3 Nov 2008, Dave Airlie wrote:
>
> So I've put these patches into Fedora rawhide kernel to test, and
> glxgears on my 945G hw went from 85fps to 380fps, clearly we would want
> these patches upstream sooner rather than later.
I'm inclined to agree. Not that I think 380fps sounds very impressive (I
get 850+ fps with _software_ rendering, for chrissake), but because 85 fps
is a joke, and clearly without this setup there's not even any point in
trying to do any other optimizations.
And if we're looking at the patches being in Fedora anyway, there really
is no reason not to merge them. Even if it's out of the merge window.
Ingo?
Linus
* Linus Torvalds <[email protected]> wrote:
>
>
> On Mon, 3 Nov 2008, Dave Airlie wrote:
> >
> > So I've put these patches into Fedora rawhide kernel to test, and
> > glxgears on my 945G hw went from 85fps to 380fps, clearly we would want
> > these patches upstream sooner rather than later.
>
> I'm inclined to agree. Not that I think 380fps sounds very
> impressive (I get 850+ fps with _software_ rendering, for chrissake),
> but because 85 fps is a joke, and clearly without this setup there's
> not even any point in trying to do any other optimizations.
>
> And if we're looking at the patches being in Fedora anyway, there
> really is no reason not to merge them. Even if it's out of the merge
> window.
>
> Ingo?
there was one pending item: i was waiting for the x86 #ifdef to be
cleaned up out of include/linux/io-mapping.h, but that is trivial with
no impact to any object code and can thus perhaps wait - would be nice
to get this in early in this -rc.
Please pull the latest io-mappings-for-linus git tree from:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git io-mappings-for-linus
Thanks,
Ingo
------------------>
Keith Packard (3):
x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps
resources: add io-mapping functions to dynamically map large device apertures
i915: use io-mapping interfaces instead of a variety of mapping kludges
Documentation/io-mapping.txt | 76 +++++++++++++++++
arch/x86/include/asm/fixmap.h | 4 +
arch/x86/include/asm/fixmap_32.h | 4 -
arch/x86/include/asm/highmem.h | 5 +-
arch/x86/mm/Makefile | 2 +-
arch/x86/mm/init_32.c | 3 +-
arch/x86/mm/iomap_32.c | 59 +++++++++++++
drivers/gpu/drm/i915/i915_drv.h | 3 +
drivers/gpu/drm/i915/i915_gem.c | 174 +++++++++++++++++--------------------
include/asm-x86/iomap.h | 30 +++++++
include/linux/io-mapping.h | 118 ++++++++++++++++++++++++++
11 files changed, 373 insertions(+), 105 deletions(-)
create mode 100644 Documentation/io-mapping.txt
create mode 100644 arch/x86/mm/iomap_32.c
create mode 100644 include/asm-x86/iomap.h
create mode 100644 include/linux/io-mapping.h
diff --git a/Documentation/io-mapping.txt b/Documentation/io-mapping.txt
new file mode 100644
index 0000000..cd2f726
--- /dev/null
+++ b/Documentation/io-mapping.txt
@@ -0,0 +1,76 @@
+The io_mapping functions in linux/io-mapping.h provide an abstraction for
+efficiently mapping small regions of an I/O device to the CPU. The initial
+usage is to support the large graphics aperture on 32-bit processors where
+ioremap_wc cannot be used to statically map the entire aperture to the CPU
+as it would consume too much of the kernel address space.
+
+A mapping object is created during driver initialization using
+
+ struct io_mapping *io_mapping_create_wc(unsigned long base,
+ unsigned long size)
+
+ 'base' is the bus address of the region to be made
+ mappable, while 'size' indicates how large a mapping region to
+ enable. Both are in bytes.
+
+ This _wc variant provides a mapping which may only be used
+ with the io_mapping_map_atomic_wc or io_mapping_map_wc.
+
+With this mapping object, individual pages can be mapped either atomically
+or not, depending on the necessary scheduling environment. Of course, atomic
+maps are more efficient:
+
+ void *io_mapping_map_atomic_wc(struct io_mapping *mapping,
+ unsigned long offset)
+
+ 'offset' is the offset within the defined mapping region.
+ Accessing addresses beyond the region specified in the
+ creation function yields undefined results. Using an offset
+ which is not page aligned yields an undefined result. The
+ return value points to a single page in CPU address space.
+
+ This _wc variant returns a write-combining map to the
+ page and may only be used with mappings created by
+ io_mapping_create_wc
+
+ Note that the task may not sleep while holding this page
+ mapped.
+
+ void io_mapping_unmap_atomic(void *vaddr)
+
+ 'vaddr' must be the value returned by the last
+ io_mapping_map_atomic_wc call. This unmaps the specified
+ page and allows the task to sleep once again.
+
+If you need to sleep while holding the lock, you can use the non-atomic
+variant, although they may be significantly slower.
+
+ void *io_mapping_map_wc(struct io_mapping *mapping,
+ unsigned long offset)
+
+ This works like io_mapping_map_atomic_wc except it allows
+ the task to sleep while holding the page mapped.
+
+ void io_mapping_unmap(void *vaddr)
+
+ This works like io_mapping_unmap_atomic, except it is used
+ for pages mapped with io_mapping_map_wc.
+
+At driver close time, the io_mapping object must be freed:
+
+ void io_mapping_free(struct io_mapping *mapping)
+
+Current Implementation:
+
+The initial implementation of these functions uses existing mapping
+mechanisms and so provides only an abstraction layer and no new
+functionality.
+
+On 64-bit processors, io_mapping_create_wc calls ioremap_wc for the whole
+range, creating a permanent kernel-visible mapping to the resource. The
+map_atomic and map functions add the requested offset to the base of the
+virtual address returned by ioremap_wc.
+
+On 32-bit processors, io_mapping_map_atomic_wc uses io_map_atomic_prot_pfn,
+which uses the fixmaps to get us a mapping to a page using an atomic fashion.
+For io_mapping_map_wc, ioremap_wc() is used to get a mapping of the region.
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 8668a94..23696d4 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -9,6 +9,10 @@
extern int fixmaps_set;
+extern pte_t *kmap_pte;
+extern pgprot_t kmap_prot;
+extern pte_t *pkmap_page_table;
+
void __native_set_fixmap(enum fixed_addresses idx, pte_t pte);
void native_set_fixmap(enum fixed_addresses idx,
unsigned long phys, pgprot_t flags);
diff --git a/arch/x86/include/asm/fixmap_32.h b/arch/x86/include/asm/fixmap_32.h
index 09f29ab..c7115c1 100644
--- a/arch/x86/include/asm/fixmap_32.h
+++ b/arch/x86/include/asm/fixmap_32.h
@@ -28,10 +28,8 @@ extern unsigned long __FIXADDR_TOP;
#include <asm/acpi.h>
#include <asm/apicdef.h>
#include <asm/page.h>
-#ifdef CONFIG_HIGHMEM
#include <linux/threads.h>
#include <asm/kmap_types.h>
-#endif
/*
* Here we define all the compile-time 'special' virtual
@@ -75,10 +73,8 @@ enum fixed_addresses {
#ifdef CONFIG_X86_CYCLONE_TIMER
FIX_CYCLONE_TIMER, /*cyclone timer register*/
#endif
-#ifdef CONFIG_HIGHMEM
FIX_KMAP_BEGIN, /* reserved pte's for temporary kernel mappings */
FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1,
-#endif
#ifdef CONFIG_PCI_MMCONFIG
FIX_PCIE_MCFG,
#endif
diff --git a/arch/x86/include/asm/highmem.h b/arch/x86/include/asm/highmem.h
index a3b3b7c..bf9276b 100644
--- a/arch/x86/include/asm/highmem.h
+++ b/arch/x86/include/asm/highmem.h
@@ -25,14 +25,11 @@
#include <asm/kmap_types.h>
#include <asm/tlbflush.h>
#include <asm/paravirt.h>
+#include <asm/fixmap.h>
/* declarations for highmem.c */
extern unsigned long highstart_pfn, highend_pfn;
-extern pte_t *kmap_pte;
-extern pgprot_t kmap_prot;
-extern pte_t *pkmap_page_table;
-
/*
* Right now we initialize only a single pte table. It can be extended
* easily, subsequent pte tables have to be allocated in one physical
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 59f89b4..fea4565 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -1,7 +1,7 @@
obj-y := init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \
pat.o pgtable.o gup.o
-obj-$(CONFIG_X86_32) += pgtable_32.o
+obj-$(CONFIG_X86_32) += pgtable_32.o iomap_32.o
obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
obj-$(CONFIG_X86_PTDUMP) += dump_pagetables.o
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 8396868..c483f42 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -334,7 +334,6 @@ int devmem_is_allowed(unsigned long pagenr)
return 0;
}
-#ifdef CONFIG_HIGHMEM
pte_t *kmap_pte;
pgprot_t kmap_prot;
@@ -357,6 +356,7 @@ static void __init kmap_init(void)
kmap_prot = PAGE_KERNEL;
}
+#ifdef CONFIG_HIGHMEM
static void __init permanent_kmaps_init(pgd_t *pgd_base)
{
unsigned long vaddr;
@@ -436,7 +436,6 @@ static void __init set_highmem_pages_init(void)
#endif /* !CONFIG_NUMA */
#else
-# define kmap_init() do { } while (0)
# define permanent_kmaps_init(pgd_base) do { } while (0)
# define set_highmem_pages_init() do { } while (0)
#endif /* CONFIG_HIGHMEM */
diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
new file mode 100644
index 0000000..d0151d8
--- /dev/null
+++ b/arch/x86/mm/iomap_32.c
@@ -0,0 +1,59 @@
+/*
+ * Copyright © 2008 Ingo Molnar
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <asm/iomap.h>
+#include <linux/module.h>
+
+/* Map 'pfn' using fixed map 'type' and protections 'prot'
+ */
+void *
+iomap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot)
+{
+ enum fixed_addresses idx;
+ unsigned long vaddr;
+
+ pagefault_disable();
+
+ idx = type + KM_TYPE_NR*smp_processor_id();
+ vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+ set_pte(kmap_pte-idx, pfn_pte(pfn, prot));
+ arch_flush_lazy_mmu_mode();
+
+ return (void*) vaddr;
+}
+EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
+
+void
+iounmap_atomic(void *kvaddr, enum km_type type)
+{
+ unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
+ enum fixed_addresses idx = type + KM_TYPE_NR*smp_processor_id();
+
+ /*
+ * Force other mappings to Oops if they try to access this pte
+ * without first remapping it. Keeping stale mappings around is also
+ * a bad idea, in case the page changes cacheability attributes or
+ * becomes a protected page in a hypervisor.
+ */
+ if (vaddr == __fix_to_virt(FIX_KMAP_BEGIN+idx))
+ kpte_clear_flush(kmap_pte-idx, vaddr);
+
+ arch_flush_lazy_mmu_mode();
+ pagefault_enable();
+}
+EXPORT_SYMBOL_GPL(iounmap_atomic);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f20ffe1..126b2f1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -31,6 +31,7 @@
#define _I915_DRV_H_
#include "i915_reg.h"
+#include <linux/io-mapping.h>
/* General customization:
*/
@@ -246,6 +247,8 @@ typedef struct drm_i915_private {
struct {
struct drm_mm gtt_space;
+ struct io_mapping *gtt_mapping;
+
/**
* List of objects currently involved in rendering from the
* ringbuffer.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 17ae330..61183b9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -171,35 +171,50 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
return 0;
}
-/*
- * Try to write quickly with an atomic kmap. Return true on success.
- *
- * If this fails (which includes a partial write), we'll redo the whole
- * thing with the slow version.
- *
- * This is a workaround for the low performance of iounmap (approximate
- * 10% cpu cost on normal 3D workloads). kmap_atomic on HIGHMEM kernels
- * happens to let us map card memory without taking IPIs. When the vmap
- * rework lands we should be able to dump this hack.
+/* This is the fast write path which cannot handle
+ * page faults in the source data
*/
-static inline int fast_user_write(unsigned long pfn, char __user *user_data,
- int l, int o)
+
+static inline int
+fast_user_write(struct io_mapping *mapping,
+ loff_t page_base, int page_offset,
+ char __user *user_data,
+ int length)
{
-#ifdef CONFIG_HIGHMEM
- unsigned long unwritten;
char *vaddr_atomic;
+ unsigned long unwritten;
- vaddr_atomic = kmap_atomic_pfn(pfn, KM_USER0);
-#if WATCH_PWRITE
- DRM_INFO("pwrite i %d o %d l %d pfn %ld vaddr %p\n",
- i, o, l, pfn, vaddr_atomic);
-#endif
- unwritten = __copy_from_user_inatomic_nocache(vaddr_atomic + o, user_data, l);
- kunmap_atomic(vaddr_atomic, KM_USER0);
- return !unwritten;
-#else
+ vaddr_atomic = io_mapping_map_atomic_wc(mapping, page_base);
+ unwritten = __copy_from_user_inatomic_nocache(vaddr_atomic + page_offset,
+ user_data, length);
+ io_mapping_unmap_atomic(vaddr_atomic);
+ if (unwritten)
+ return -EFAULT;
+ return 0;
+}
+
+/* Here's the write path which can sleep for
+ * page faults
+ */
+
+static inline int
+slow_user_write(struct io_mapping *mapping,
+ loff_t page_base, int page_offset,
+ char __user *user_data,
+ int length)
+{
+ char __iomem *vaddr;
+ unsigned long unwritten;
+
+ vaddr = io_mapping_map_wc(mapping, page_base);
+ if (vaddr == NULL)
+ return -EFAULT;
+ unwritten = __copy_from_user(vaddr + page_offset,
+ user_data, length);
+ io_mapping_unmap(vaddr);
+ if (unwritten)
+ return -EFAULT;
return 0;
-#endif
}
static int
@@ -208,10 +223,12 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
struct drm_file *file_priv)
{
struct drm_i915_gem_object *obj_priv = obj->driver_private;
+ drm_i915_private_t *dev_priv = dev->dev_private;
ssize_t remain;
- loff_t offset;
+ loff_t offset, page_base;
char __user *user_data;
- int ret = 0;
+ int page_offset, page_length;
+ int ret;
user_data = (char __user *) (uintptr_t) args->data_ptr;
remain = args->size;
@@ -235,57 +252,37 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
obj_priv->dirty = 1;
while (remain > 0) {
- unsigned long pfn;
- int i, o, l;
-
/* Operation in this page
*
- * i = page number
- * o = offset within page
- * l = bytes to copy
+ * page_base = page offset within aperture
+ * page_offset = offset within page
+ * page_length = bytes to copy for this page
*/
- i = offset >> PAGE_SHIFT;
- o = offset & (PAGE_SIZE-1);
- l = remain;
- if ((o + l) > PAGE_SIZE)
- l = PAGE_SIZE - o;
-
- pfn = (dev->agp->base >> PAGE_SHIFT) + i;
-
- if (!fast_user_write(pfn, user_data, l, o)) {
- unsigned long unwritten;
- char __iomem *vaddr;
-
- vaddr = ioremap_wc(pfn << PAGE_SHIFT, PAGE_SIZE);
-#if WATCH_PWRITE
- DRM_INFO("pwrite slow i %d o %d l %d "
- "pfn %ld vaddr %p\n",
- i, o, l, pfn, vaddr);
-#endif
- if (vaddr == NULL) {
- ret = -EFAULT;
- goto fail;
- }
- unwritten = __copy_from_user(vaddr + o, user_data, l);
-#if WATCH_PWRITE
- DRM_INFO("unwritten %ld\n", unwritten);
-#endif
- iounmap(vaddr);
- if (unwritten) {
- ret = -EFAULT;
+ page_base = (offset & ~(PAGE_SIZE-1));
+ page_offset = offset & (PAGE_SIZE-1);
+ page_length = remain;
+ if ((page_offset + remain) > PAGE_SIZE)
+ page_length = PAGE_SIZE - page_offset;
+
+ ret = fast_user_write (dev_priv->mm.gtt_mapping, page_base,
+ page_offset, user_data, page_length);
+
+ /* If we get a fault while copying data, then (presumably) our
+ * source page isn't available. In this case, use the
+ * non-atomic function
+ */
+ if (ret) {
+ ret = slow_user_write (dev_priv->mm.gtt_mapping,
+ page_base, page_offset,
+ user_data, page_length);
+ if (ret)
goto fail;
- }
}
- remain -= l;
- user_data += l;
- offset += l;
+ remain -= page_length;
+ user_data += page_length;
+ offset += page_length;
}
-#if WATCH_PWRITE && 1
- i915_gem_clflush_object(obj);
- i915_gem_dump_object(obj, args->offset + args->size, __func__, ~0);
- i915_gem_clflush_object(obj);
-#endif
fail:
i915_gem_object_unpin(obj);
@@ -1503,12 +1500,12 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
struct drm_i915_gem_exec_object *entry)
{
struct drm_device *dev = obj->dev;
+ drm_i915_private_t *dev_priv = dev->dev_private;
struct drm_i915_gem_relocation_entry reloc;
struct drm_i915_gem_relocation_entry __user *relocs;
struct drm_i915_gem_object *obj_priv = obj->driver_private;
int i, ret;
- uint32_t last_reloc_offset = -1;
- void __iomem *reloc_page = NULL;
+ void __iomem *reloc_page;
/* Choose the GTT offset for our buffer and put it there. */
ret = i915_gem_object_pin(obj, (uint32_t) entry->alignment);
@@ -1631,26 +1628,11 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
* perform.
*/
reloc_offset = obj_priv->gtt_offset + reloc.offset;
- if (reloc_page == NULL ||
- (last_reloc_offset & ~(PAGE_SIZE - 1)) !=
- (reloc_offset & ~(PAGE_SIZE - 1))) {
- if (reloc_page != NULL)
- iounmap(reloc_page);
-
- reloc_page = ioremap_wc(dev->agp->base +
- (reloc_offset &
- ~(PAGE_SIZE - 1)),
- PAGE_SIZE);
- last_reloc_offset = reloc_offset;
- if (reloc_page == NULL) {
- drm_gem_object_unreference(target_obj);
- i915_gem_object_unpin(obj);
- return -ENOMEM;
- }
- }
-
+ reloc_page = io_mapping_map_atomic_wc(dev_priv->mm.gtt_mapping,
+ (reloc_offset &
+ ~(PAGE_SIZE - 1)));
reloc_entry = (uint32_t __iomem *)(reloc_page +
- (reloc_offset & (PAGE_SIZE - 1)));
+ (reloc_offset & (PAGE_SIZE - 1)));
reloc_val = target_obj_priv->gtt_offset + reloc.delta;
#if WATCH_BUF
@@ -1659,6 +1641,7 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
readl(reloc_entry), reloc_val);
#endif
writel(reloc_val, reloc_entry);
+ io_mapping_unmap_atomic(reloc_page);
/* Write the updated presumed offset for this entry back out
* to the user.
@@ -1674,9 +1657,6 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
drm_gem_object_unreference(target_obj);
}
- if (reloc_page != NULL)
- iounmap(reloc_page);
-
#if WATCH_BUF
if (0)
i915_gem_dump_object(obj, 128, __func__, ~0);
@@ -2518,6 +2498,10 @@ i915_gem_entervt_ioctl(struct drm_device *dev, void *data,
if (ret != 0)
return ret;
+ dev_priv->mm.gtt_mapping = io_mapping_create_wc(dev->agp->base,
+ dev->agp->agp_info.aper_size
+ * 1024 * 1024);
+
mutex_lock(&dev->struct_mutex);
BUG_ON(!list_empty(&dev_priv->mm.active_list));
BUG_ON(!list_empty(&dev_priv->mm.flushing_list));
@@ -2535,11 +2519,13 @@ int
i915_gem_leavevt_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv)
{
+ drm_i915_private_t *dev_priv = dev->dev_private;
int ret;
ret = i915_gem_idle(dev);
drm_irq_uninstall(dev);
+ io_mapping_free(dev_priv->mm.gtt_mapping);
return ret;
}
diff --git a/include/asm-x86/iomap.h b/include/asm-x86/iomap.h
new file mode 100644
index 0000000..c1f0628
--- /dev/null
+++ b/include/asm-x86/iomap.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright © 2008 Ingo Molnar
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+#include <asm/cacheflush.h>
+#include <asm/pgtable.h>
+#include <asm/tlbflush.h>
+
+void *
+iomap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot);
+
+void
+iounmap_atomic(void *kvaddr, enum km_type type);
diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h
new file mode 100644
index 0000000..1b56699
--- /dev/null
+++ b/include/linux/io-mapping.h
@@ -0,0 +1,118 @@
+/*
+ * Copyright © 2008 Keith Packard <[email protected]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _LINUX_IO_MAPPING_H
+#define _LINUX_IO_MAPPING_H
+
+#include <linux/types.h>
+#include <asm/io.h>
+#include <asm/page.h>
+#include <asm/iomap.h>
+
+/*
+ * The io_mapping mechanism provides an abstraction for mapping
+ * individual pages from an io device to the CPU in an efficient fashion.
+ *
+ * See Documentation/io-mapping.txt
+ */
+
+/* this struct isn't actually defined anywhere */
+struct io_mapping;
+
+#ifdef CONFIG_X86_64
+
+/* Create the io_mapping object*/
+static inline struct io_mapping *
+io_mapping_create_wc(unsigned long base, unsigned long size)
+{
+ return (struct io_mapping *) ioremap_wc(base, size);
+}
+
+static inline void
+io_mapping_free(struct io_mapping *mapping)
+{
+ iounmap(mapping);
+}
+
+/* Atomic map/unmap */
+static inline void *
+io_mapping_map_atomic_wc(struct io_mapping *mapping, unsigned long offset)
+{
+ return ((char *) mapping) + offset;
+}
+
+static inline void
+io_mapping_unmap_atomic(void *vaddr)
+{
+}
+
+/* Non-atomic map/unmap */
+static inline void *
+io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset)
+{
+ return ((char *) mapping) + offset;
+}
+
+static inline void
+io_mapping_unmap(void *vaddr)
+{
+}
+
+#endif /* CONFIG_X86_64 */
+
+#ifdef CONFIG_X86_32
+static inline struct io_mapping *
+io_mapping_create_wc(unsigned long base, unsigned long size)
+{
+ return (struct io_mapping *) base;
+}
+
+static inline void
+io_mapping_free(struct io_mapping *mapping)
+{
+}
+
+/* Atomic map/unmap */
+static inline void *
+io_mapping_map_atomic_wc(struct io_mapping *mapping, unsigned long offset)
+{
+ offset += (unsigned long) mapping;
+ return iomap_atomic_prot_pfn(offset >> PAGE_SHIFT, KM_USER0,
+ __pgprot(__PAGE_KERNEL_WC));
+}
+
+static inline void
+io_mapping_unmap_atomic(void *vaddr)
+{
+ iounmap_atomic(vaddr, KM_USER0);
+}
+
+static inline void *
+io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset)
+{
+ offset += (unsigned long) mapping;
+ return ioremap_wc(offset, PAGE_SIZE);
+}
+
+static inline void
+io_mapping_unmap(void *vaddr)
+{
+ iounmap(vaddr);
+}
+#endif /* CONFIG_X86_32 */
+
+#endif /* _LINUX_IO_MAPPING_H */
* Ingo Molnar <[email protected]> wrote:
> there was one pending item: i was waiting for the x86 #ifdef to be
> cleaned up out of include/linux/io-mapping.h, but that is trivial
> with no impact to any object code and can thus perhaps wait - would
> be nice to get this in early in this -rc.
These cleanups can now be found in the second tree below.
It can be pulled cleanly on top of the other tree I just posted, but it has
had slightly less testing: the cleanup commits are fresh, though there
should be no difference in functionality.
Please pull the latest io-mappings-for-linus-2 git tree from:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git io-mappings-for-linus-2
Thanks,
Ingo
------------------>
Keith Packard (5):
x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps
resources: add io-mapping functions to dynamically map large device apertures
i915: use io-mapping interfaces instead of a variety of mapping kludges
io mapping: improve documentation
io mapping: clean up #ifdefs
Documentation/io-mapping.txt | 82 ++++++++++++++++++
arch/x86/Kconfig | 4 +
arch/x86/include/asm/fixmap.h | 4 +
arch/x86/include/asm/fixmap_32.h | 4 -
arch/x86/include/asm/highmem.h | 5 +-
arch/x86/mm/Makefile | 2 +-
arch/x86/mm/init_32.c | 3 +-
arch/x86/mm/iomap_32.c | 59 +++++++++++++
drivers/gpu/drm/i915/i915_drv.h | 3 +
drivers/gpu/drm/i915/i915_gem.c | 174 +++++++++++++++++--------------------
include/asm-x86/iomap.h | 30 +++++++
include/linux/io-mapping.h | 125 +++++++++++++++++++++++++++
12 files changed, 390 insertions(+), 105 deletions(-)
create mode 100644 Documentation/io-mapping.txt
create mode 100644 arch/x86/mm/iomap_32.c
create mode 100644 include/asm-x86/iomap.h
create mode 100644 include/linux/io-mapping.h
diff --git a/Documentation/io-mapping.txt b/Documentation/io-mapping.txt
new file mode 100644
index 0000000..473e43b
--- /dev/null
+++ b/Documentation/io-mapping.txt
@@ -0,0 +1,82 @@
+The io_mapping functions in linux/io-mapping.h provide an abstraction for
+efficiently mapping small regions of an I/O device to the CPU. The initial
+usage is to support the large graphics aperture on 32-bit processors where
+ioremap_wc cannot be used to statically map the entire aperture to the CPU
+as it would consume too much of the kernel address space.
+
+A mapping object is created during driver initialization using
+
+ struct io_mapping *io_mapping_create_wc(unsigned long base,
+ unsigned long size)
+
+ 'base' is the bus address of the region to be made
+ mappable, while 'size' indicates how large a mapping region to
+ enable. Both are in bytes.
+
+ This _wc variant provides a mapping which may only be used
+ with the io_mapping_map_atomic_wc or io_mapping_map_wc.
+
+With this mapping object, individual pages can be mapped either atomically
+or not, depending on the necessary scheduling environment. Of course, atomic
+maps are more efficient:
+
+ void *io_mapping_map_atomic_wc(struct io_mapping *mapping,
+ unsigned long offset)
+
+ 'offset' is the offset within the defined mapping region.
+ Accessing addresses beyond the region specified in the
+ creation function yields undefined results. Using an offset
+ which is not page aligned yields an undefined result. The
+ return value points to a single page in CPU address space.
+
+ This _wc variant returns a write-combining map to the
+ page and may only be used with mappings created by
+ io_mapping_create_wc
+
+ Note that the task may not sleep while holding this page
+ mapped.
+
+ void io_mapping_unmap_atomic(void *vaddr)
+
+ 'vaddr' must be the value returned by the last
+ io_mapping_map_atomic_wc call. This unmaps the specified
+ page and allows the task to sleep once again.
+
+If you need to sleep while the page is mapped, you can use the non-atomic
+variant, although it may be significantly slower.
+
+ void *io_mapping_map_wc(struct io_mapping *mapping,
+ unsigned long offset)
+
+ This works like io_mapping_map_atomic_wc except it allows
+ the task to sleep while holding the page mapped.
+
+ void io_mapping_unmap(void *vaddr)
+
+ This works like io_mapping_unmap_atomic, except it is used
+ for pages mapped with io_mapping_map_wc.
+
+At driver close time, the io_mapping object must be freed:
+
+ void io_mapping_free(struct io_mapping *mapping)
+
+Current Implementation:
+
+The initial implementation of these functions uses existing mapping
+mechanisms and so provides only an abstraction layer and no new
+functionality.
+
+On 64-bit processors, io_mapping_create_wc calls ioremap_wc for the whole
+range, creating a permanent kernel-visible mapping to the resource. The
+map_atomic and map functions add the requested offset to the base of the
+virtual address returned by ioremap_wc.
+
+On 32-bit processors with HIGHMEM defined, io_mapping_map_atomic_wc uses
+kmap_atomic_pfn to map the specified page in an atomic fashion;
+kmap_atomic_pfn isn't really supposed to be used with device pages, but it
+provides an efficient mapping for this usage.
+
+On 32-bit processors without HIGHMEM defined, io_mapping_map_atomic_wc and
+io_mapping_map_wc both use ioremap_wc, a terribly inefficient function which
+performs an IPI to inform all processors about the new mapping. This results
+in a significant performance penalty.
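As a quick illustration of the interface described above (a sketch only, not
part of the patch; 'pdev', the BAR index and the offset variables are
placeholders), a driver might use it roughly like this:

	struct io_mapping *map;
	void *vaddr;

	/* init: create a mapping object for the aperture behind BAR 2 */
	map = io_mapping_create_wc(pci_resource_start(pdev, 2),
				   pci_resource_len(pdev, 2));
	if (map == NULL)
		return -ENOMEM;

	/* hot path: atomically map one page, write one dword, unmap */
	vaddr = io_mapping_map_atomic_wc(map, page_offset); /* page aligned */
	writel(value, (u32 __iomem *)vaddr + dword_index);
	io_mapping_unmap_atomic(vaddr);

	/* teardown */
	io_mapping_free(map);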
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6f20718..e60c59b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1894,6 +1894,10 @@ config SYSVIPC_COMPAT
endmenu
+config HAVE_ATOMIC_IOMAP
+ def_bool y
+ depends on X86_32
+
source "net/Kconfig"
source "drivers/Kconfig"
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 8668a94..23696d4 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -9,6 +9,10 @@
extern int fixmaps_set;
+extern pte_t *kmap_pte;
+extern pgprot_t kmap_prot;
+extern pte_t *pkmap_page_table;
+
void __native_set_fixmap(enum fixed_addresses idx, pte_t pte);
void native_set_fixmap(enum fixed_addresses idx,
unsigned long phys, pgprot_t flags);
diff --git a/arch/x86/include/asm/fixmap_32.h b/arch/x86/include/asm/fixmap_32.h
index 09f29ab..c7115c1 100644
--- a/arch/x86/include/asm/fixmap_32.h
+++ b/arch/x86/include/asm/fixmap_32.h
@@ -28,10 +28,8 @@ extern unsigned long __FIXADDR_TOP;
#include <asm/acpi.h>
#include <asm/apicdef.h>
#include <asm/page.h>
-#ifdef CONFIG_HIGHMEM
#include <linux/threads.h>
#include <asm/kmap_types.h>
-#endif
/*
* Here we define all the compile-time 'special' virtual
@@ -75,10 +73,8 @@ enum fixed_addresses {
#ifdef CONFIG_X86_CYCLONE_TIMER
FIX_CYCLONE_TIMER, /*cyclone timer register*/
#endif
-#ifdef CONFIG_HIGHMEM
FIX_KMAP_BEGIN, /* reserved pte's for temporary kernel mappings */
FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1,
-#endif
#ifdef CONFIG_PCI_MMCONFIG
FIX_PCIE_MCFG,
#endif
diff --git a/arch/x86/include/asm/highmem.h b/arch/x86/include/asm/highmem.h
index a3b3b7c..bf9276b 100644
--- a/arch/x86/include/asm/highmem.h
+++ b/arch/x86/include/asm/highmem.h
@@ -25,14 +25,11 @@
#include <asm/kmap_types.h>
#include <asm/tlbflush.h>
#include <asm/paravirt.h>
+#include <asm/fixmap.h>
/* declarations for highmem.c */
extern unsigned long highstart_pfn, highend_pfn;
-extern pte_t *kmap_pte;
-extern pgprot_t kmap_prot;
-extern pte_t *pkmap_page_table;
-
/*
* Right now we initialize only a single pte table. It can be extended
* easily, subsequent pte tables have to be allocated in one physical
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 59f89b4..fea4565 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -1,7 +1,7 @@
obj-y := init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \
pat.o pgtable.o gup.o
-obj-$(CONFIG_X86_32) += pgtable_32.o
+obj-$(CONFIG_X86_32) += pgtable_32.o iomap_32.o
obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
obj-$(CONFIG_X86_PTDUMP) += dump_pagetables.o
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 8396868..c483f42 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -334,7 +334,6 @@ int devmem_is_allowed(unsigned long pagenr)
return 0;
}
-#ifdef CONFIG_HIGHMEM
pte_t *kmap_pte;
pgprot_t kmap_prot;
@@ -357,6 +356,7 @@ static void __init kmap_init(void)
kmap_prot = PAGE_KERNEL;
}
+#ifdef CONFIG_HIGHMEM
static void __init permanent_kmaps_init(pgd_t *pgd_base)
{
unsigned long vaddr;
@@ -436,7 +436,6 @@ static void __init set_highmem_pages_init(void)
#endif /* !CONFIG_NUMA */
#else
-# define kmap_init() do { } while (0)
# define permanent_kmaps_init(pgd_base) do { } while (0)
# define set_highmem_pages_init() do { } while (0)
#endif /* CONFIG_HIGHMEM */
diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
new file mode 100644
index 0000000..d0151d8
--- /dev/null
+++ b/arch/x86/mm/iomap_32.c
@@ -0,0 +1,59 @@
+/*
+ * Copyright © 2008 Ingo Molnar
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <asm/iomap.h>
+#include <linux/module.h>
+
+/* Map 'pfn' using fixed map 'type' and protections 'prot'
+ */
+void *
+iomap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot)
+{
+ enum fixed_addresses idx;
+ unsigned long vaddr;
+
+ pagefault_disable();
+
+ idx = type + KM_TYPE_NR*smp_processor_id();
+ vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+ set_pte(kmap_pte-idx, pfn_pte(pfn, prot));
+ arch_flush_lazy_mmu_mode();
+
+ return (void*) vaddr;
+}
+EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
+
+void
+iounmap_atomic(void *kvaddr, enum km_type type)
+{
+ unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
+ enum fixed_addresses idx = type + KM_TYPE_NR*smp_processor_id();
+
+ /*
+ * Force other mappings to Oops if they try to access this pte
+ * without first remapping it. Keeping stale mappings around is also
+ * a bad idea, in case the page changes cacheability attributes or
+ * becomes a protected page in a hypervisor.
+ */
+ if (vaddr == __fix_to_virt(FIX_KMAP_BEGIN+idx))
+ kpte_clear_flush(kmap_pte-idx, vaddr);
+
+ arch_flush_lazy_mmu_mode();
+ pagefault_enable();
+}
+EXPORT_SYMBOL_GPL(iounmap_atomic);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f20ffe1..126b2f1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -31,6 +31,7 @@
#define _I915_DRV_H_
#include "i915_reg.h"
+#include <linux/io-mapping.h>
/* General customization:
*/
@@ -246,6 +247,8 @@ typedef struct drm_i915_private {
struct {
struct drm_mm gtt_space;
+ struct io_mapping *gtt_mapping;
+
/**
* List of objects currently involved in rendering from the
* ringbuffer.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 17ae330..61183b9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -171,35 +171,50 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
return 0;
}
-/*
- * Try to write quickly with an atomic kmap. Return true on success.
- *
- * If this fails (which includes a partial write), we'll redo the whole
- * thing with the slow version.
- *
- * This is a workaround for the low performance of iounmap (approximate
- * 10% cpu cost on normal 3D workloads). kmap_atomic on HIGHMEM kernels
- * happens to let us map card memory without taking IPIs. When the vmap
- * rework lands we should be able to dump this hack.
+/* This is the fast write path which cannot handle
+ * page faults in the source data
*/
-static inline int fast_user_write(unsigned long pfn, char __user *user_data,
- int l, int o)
+
+static inline int
+fast_user_write(struct io_mapping *mapping,
+ loff_t page_base, int page_offset,
+ char __user *user_data,
+ int length)
{
-#ifdef CONFIG_HIGHMEM
- unsigned long unwritten;
char *vaddr_atomic;
+ unsigned long unwritten;
- vaddr_atomic = kmap_atomic_pfn(pfn, KM_USER0);
-#if WATCH_PWRITE
- DRM_INFO("pwrite i %d o %d l %d pfn %ld vaddr %p\n",
- i, o, l, pfn, vaddr_atomic);
-#endif
- unwritten = __copy_from_user_inatomic_nocache(vaddr_atomic + o, user_data, l);
- kunmap_atomic(vaddr_atomic, KM_USER0);
- return !unwritten;
-#else
+ vaddr_atomic = io_mapping_map_atomic_wc(mapping, page_base);
+ unwritten = __copy_from_user_inatomic_nocache(vaddr_atomic + page_offset,
+ user_data, length);
+ io_mapping_unmap_atomic(vaddr_atomic);
+ if (unwritten)
+ return -EFAULT;
+ return 0;
+}
+
+/* Here's the write path which can sleep for
+ * page faults
+ */
+
+static inline int
+slow_user_write(struct io_mapping *mapping,
+ loff_t page_base, int page_offset,
+ char __user *user_data,
+ int length)
+{
+ char __iomem *vaddr;
+ unsigned long unwritten;
+
+ vaddr = io_mapping_map_wc(mapping, page_base);
+ if (vaddr == NULL)
+ return -EFAULT;
+ unwritten = __copy_from_user(vaddr + page_offset,
+ user_data, length);
+ io_mapping_unmap(vaddr);
+ if (unwritten)
+ return -EFAULT;
return 0;
-#endif
}
static int
@@ -208,10 +223,12 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
struct drm_file *file_priv)
{
struct drm_i915_gem_object *obj_priv = obj->driver_private;
+ drm_i915_private_t *dev_priv = dev->dev_private;
ssize_t remain;
- loff_t offset;
+ loff_t offset, page_base;
char __user *user_data;
- int ret = 0;
+ int page_offset, page_length;
+ int ret;
user_data = (char __user *) (uintptr_t) args->data_ptr;
remain = args->size;
@@ -235,57 +252,37 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
obj_priv->dirty = 1;
while (remain > 0) {
- unsigned long pfn;
- int i, o, l;
-
/* Operation in this page
*
- * i = page number
- * o = offset within page
- * l = bytes to copy
+ * page_base = page offset within aperture
+ * page_offset = offset within page
+ * page_length = bytes to copy for this page
*/
- i = offset >> PAGE_SHIFT;
- o = offset & (PAGE_SIZE-1);
- l = remain;
- if ((o + l) > PAGE_SIZE)
- l = PAGE_SIZE - o;
-
- pfn = (dev->agp->base >> PAGE_SHIFT) + i;
-
- if (!fast_user_write(pfn, user_data, l, o)) {
- unsigned long unwritten;
- char __iomem *vaddr;
-
- vaddr = ioremap_wc(pfn << PAGE_SHIFT, PAGE_SIZE);
-#if WATCH_PWRITE
- DRM_INFO("pwrite slow i %d o %d l %d "
- "pfn %ld vaddr %p\n",
- i, o, l, pfn, vaddr);
-#endif
- if (vaddr == NULL) {
- ret = -EFAULT;
- goto fail;
- }
- unwritten = __copy_from_user(vaddr + o, user_data, l);
-#if WATCH_PWRITE
- DRM_INFO("unwritten %ld\n", unwritten);
-#endif
- iounmap(vaddr);
- if (unwritten) {
- ret = -EFAULT;
+ page_base = (offset & ~(PAGE_SIZE-1));
+ page_offset = offset & (PAGE_SIZE-1);
+ page_length = remain;
+ if ((page_offset + remain) > PAGE_SIZE)
+ page_length = PAGE_SIZE - page_offset;
+
+ ret = fast_user_write (dev_priv->mm.gtt_mapping, page_base,
+ page_offset, user_data, page_length);
+
+ /* If we get a fault while copying data, then (presumably) our
+ * source page isn't available. In this case, use the
+ * non-atomic function
+ */
+ if (ret) {
+ ret = slow_user_write (dev_priv->mm.gtt_mapping,
+ page_base, page_offset,
+ user_data, page_length);
+ if (ret)
goto fail;
- }
}
- remain -= l;
- user_data += l;
- offset += l;
+ remain -= page_length;
+ user_data += page_length;
+ offset += page_length;
}
-#if WATCH_PWRITE && 1
- i915_gem_clflush_object(obj);
- i915_gem_dump_object(obj, args->offset + args->size, __func__, ~0);
- i915_gem_clflush_object(obj);
-#endif
fail:
i915_gem_object_unpin(obj);
@@ -1503,12 +1500,12 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
struct drm_i915_gem_exec_object *entry)
{
struct drm_device *dev = obj->dev;
+ drm_i915_private_t *dev_priv = dev->dev_private;
struct drm_i915_gem_relocation_entry reloc;
struct drm_i915_gem_relocation_entry __user *relocs;
struct drm_i915_gem_object *obj_priv = obj->driver_private;
int i, ret;
- uint32_t last_reloc_offset = -1;
- void __iomem *reloc_page = NULL;
+ void __iomem *reloc_page;
/* Choose the GTT offset for our buffer and put it there. */
ret = i915_gem_object_pin(obj, (uint32_t) entry->alignment);
@@ -1631,26 +1628,11 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
* perform.
*/
reloc_offset = obj_priv->gtt_offset + reloc.offset;
- if (reloc_page == NULL ||
- (last_reloc_offset & ~(PAGE_SIZE - 1)) !=
- (reloc_offset & ~(PAGE_SIZE - 1))) {
- if (reloc_page != NULL)
- iounmap(reloc_page);
-
- reloc_page = ioremap_wc(dev->agp->base +
- (reloc_offset &
- ~(PAGE_SIZE - 1)),
- PAGE_SIZE);
- last_reloc_offset = reloc_offset;
- if (reloc_page == NULL) {
- drm_gem_object_unreference(target_obj);
- i915_gem_object_unpin(obj);
- return -ENOMEM;
- }
- }
-
+ reloc_page = io_mapping_map_atomic_wc(dev_priv->mm.gtt_mapping,
+ (reloc_offset &
+ ~(PAGE_SIZE - 1)));
reloc_entry = (uint32_t __iomem *)(reloc_page +
- (reloc_offset & (PAGE_SIZE - 1)));
+ (reloc_offset & (PAGE_SIZE - 1)));
reloc_val = target_obj_priv->gtt_offset + reloc.delta;
#if WATCH_BUF
@@ -1659,6 +1641,7 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
readl(reloc_entry), reloc_val);
#endif
writel(reloc_val, reloc_entry);
+ io_mapping_unmap_atomic(reloc_page);
/* Write the updated presumed offset for this entry back out
* to the user.
@@ -1674,9 +1657,6 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
drm_gem_object_unreference(target_obj);
}
- if (reloc_page != NULL)
- iounmap(reloc_page);
-
#if WATCH_BUF
if (0)
i915_gem_dump_object(obj, 128, __func__, ~0);
@@ -2518,6 +2498,10 @@ i915_gem_entervt_ioctl(struct drm_device *dev, void *data,
if (ret != 0)
return ret;
+ dev_priv->mm.gtt_mapping = io_mapping_create_wc(dev->agp->base,
+ dev->agp->agp_info.aper_size
+ * 1024 * 1024);
+
mutex_lock(&dev->struct_mutex);
BUG_ON(!list_empty(&dev_priv->mm.active_list));
BUG_ON(!list_empty(&dev_priv->mm.flushing_list));
@@ -2535,11 +2519,13 @@ int
i915_gem_leavevt_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv)
{
+ drm_i915_private_t *dev_priv = dev->dev_private;
int ret;
ret = i915_gem_idle(dev);
drm_irq_uninstall(dev);
+ io_mapping_free(dev_priv->mm.gtt_mapping);
return ret;
}
diff --git a/include/asm-x86/iomap.h b/include/asm-x86/iomap.h
new file mode 100644
index 0000000..c1f0628
--- /dev/null
+++ b/include/asm-x86/iomap.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright © 2008 Ingo Molnar
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+#include <asm/cacheflush.h>
+#include <asm/pgtable.h>
+#include <asm/tlbflush.h>
+
+void *
+iomap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot);
+
+void
+iounmap_atomic(void *kvaddr, enum km_type type);
diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h
new file mode 100644
index 0000000..82df317
--- /dev/null
+++ b/include/linux/io-mapping.h
@@ -0,0 +1,125 @@
+/*
+ * Copyright © 2008 Keith Packard <[email protected]>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _LINUX_IO_MAPPING_H
+#define _LINUX_IO_MAPPING_H
+
+#include <linux/types.h>
+#include <asm/io.h>
+#include <asm/page.h>
+#include <asm/iomap.h>
+
+/*
+ * The io_mapping mechanism provides an abstraction for mapping
+ * individual pages from an io device to the CPU in an efficient fashion.
+ *
+ * See Documentation/io-mapping.txt
+ */
+
+/* this struct isn't actually defined anywhere */
+struct io_mapping;
+
+#ifdef CONFIG_HAVE_ATOMIC_IOMAP
+
+/*
+ * For small address space machines, mapping large objects
+ * into the kernel virtual space isn't practical. Where
+ * available, use fixmap support to dynamically map pages
+ * of the object at run time.
+ */
+
+static inline struct io_mapping *
+io_mapping_create_wc(unsigned long base, unsigned long size)
+{
+ return (struct io_mapping *) base;
+}
+
+static inline void
+io_mapping_free(struct io_mapping *mapping)
+{
+}
+
+/* Atomic map/unmap */
+static inline void *
+io_mapping_map_atomic_wc(struct io_mapping *mapping, unsigned long offset)
+{
+ offset += (unsigned long) mapping;
+ return iomap_atomic_prot_pfn(offset >> PAGE_SHIFT, KM_USER0,
+ __pgprot(__PAGE_KERNEL_WC));
+}
+
+static inline void
+io_mapping_unmap_atomic(void *vaddr)
+{
+ iounmap_atomic(vaddr, KM_USER0);
+}
+
+static inline void *
+io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset)
+{
+ offset += (unsigned long) mapping;
+ return ioremap_wc(offset, PAGE_SIZE);
+}
+
+static inline void
+io_mapping_unmap(void *vaddr)
+{
+ iounmap(vaddr);
+}
+
+#else
+
+/* Create the io_mapping object*/
+static inline struct io_mapping *
+io_mapping_create_wc(unsigned long base, unsigned long size)
+{
+ return (struct io_mapping *) ioremap_wc(base, size);
+}
+
+static inline void
+io_mapping_free(struct io_mapping *mapping)
+{
+ iounmap(mapping);
+}
+
+/* Atomic map/unmap */
+static inline void *
+io_mapping_map_atomic_wc(struct io_mapping *mapping, unsigned long offset)
+{
+ return ((char *) mapping) + offset;
+}
+
+static inline void
+io_mapping_unmap_atomic(void *vaddr)
+{
+}
+
+/* Non-atomic map/unmap */
+static inline void *
+io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset)
+{
+ return ((char *) mapping) + offset;
+}
+
+static inline void
+io_mapping_unmap(void *vaddr)
+{
+}
+
+#endif /* HAVE_ATOMIC_IOMAP */
+
+#endif /* _LINUX_IO_MAPPING_H */
Having looked at this some, I have one, tiny little suggestion:
> +With this mapping object, individual pages can be mapped either atomically
> +or not, depending on the necessary scheduling environment. Of course, atomic
> +maps are more efficient:
> +
> + void *io_mapping_map_atomic_wc(struct io_mapping *mapping,
> + unsigned long offset)
Should the documentation for this function (perhaps the
certainly-forthcoming kerneldoc comments :) mention loudly that this
function uses KM_USER0? This isn't kmap(), and doesn't look much like
it; someday some developer might get an ugly surprise when they try to
use that slot simultaneously for something else.
jon
* Jonathan Corbet <[email protected]> wrote:
> Having looked at this some, I have one, tiny little suggestion:
>
> > +With this mapping object, individual pages can be mapped either atomically
> > +or not, depending on the necessary scheduling environment. Of course, atomic
> > +maps are more efficient:
> > +
> > + void *io_mapping_map_atomic_wc(struct io_mapping *mapping,
> > + unsigned long offset)
>
> Should the documentation for this function (perhaps the
> certainly-forthcoming kerneldoc comments :) mention loudly that this
> function uses KM_USER0? This isn't kmap(), and doesn't look much
> like it; someday some developer might get an ugly surprise when they
> try to use that slot simultaneously for something else.
definitely worth a comment.
Ingo
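Something like the following kerneldoc sketch would capture that point (the
wording here is only illustrative; it is not from any posted patch):

	/**
	 * io_mapping_map_atomic_wc - atomically map one page of an io_mapping
	 * @mapping: io_mapping returned by io_mapping_create_wc()
	 * @offset:  page-aligned byte offset into the mapping
	 *
	 * On 32-bit this mapping is built on the KM_USER0 fixmap slot, so the
	 * caller must not use KM_USER0 (e.g. via kmap_atomic()) while the page
	 * is mapped, and must not sleep before io_mapping_unmap_atomic().
	 */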