2015-06-25 01:24:48

by Luis Chamberlain

Subject: [PATCH v8 0/9] pci: add pci_iomap_wc() and pci_ioremap_wc_bar()

From: "Luis R. Rodriguez" <[email protected]>

Boris,

This patchset is part of the long-running set of series that removes direct
use of MTRR and converts drivers over to PAT interfaces when available [0].
Other than this series there is only one more pending series for that effort:
the atyfb device driver specific changes, which no one has replied to for over
a month and which I'll soon repost in the hope that Andrew might pick them up.

The patches in this series were originally split into two series, but I've
combined them now that all Acks have been collected and they are all related.
Tomi has provided his Acked-by for all device driver changes. Bjorn originally
reviewed this series and was comfortable with all the code except for the use
of EXPORT_SYMBOL_GPL(), despite new clarifications of how we can use this for
new symbols and our preference for it on new PAT interfaces [1]. Even so, Bjorn
has clarified he's comfortable with this going in through another maintainer,
Arnd in particular [2]. The v7 series was posted addressed to Arnd. Arnd
provided his Acked-by for all PCI and devres changes but noted he's on parental
leave and not taking any patches for arm-soc or asm-generic until he's back at
work in around 3 months [2], so he suggested I find another maintainer to take
these.

This v8 is unmodified except for the devres commit. Since those routines are
not yet used by any device driver, I've skipped exporting the symbols for now,
but did note that if they are ever exported it must be with
EXPORT_SYMBOL_GPL(). Once a driver upstream needs them we can export them.

Although I had test-compiled this before, just to be safe I went ahead and
successfully test-compiled this set with allmodconfig, especially since I've
now removed the exports for the devres routines. Please let me know if these
might be able to go through you or if there are any questions. I will note that
the recent discussion with Benjamin over the v7 series concluded that the idea
we were both alluding to, automating the WC effects for devices instead, seems
a bit too idealistic for PCI / PCIe for now, but perhaps we should at least
consider it in the future for userspace mmap() calls [4].

[0] http://lkml.kernel.org/r/CAB=NE6UgtdSoBsA=8+ueYRAZHDnWUSmQAoHhAaefqudBrSY7Zw@mail.gmail.com
[1] http://lkml.kernel.org/r/CAErSpo4sHA-f83X1nW2QdLT9GdubFXCQ7UEJmSfFc5GBjj8FSA@mail.gmail.com
[2] http://lkml.kernel.org/r/CAErSpo7CNH1WpgqJCEU8EtxiFNp_PiQ3cBwnKiWQpUaD-fd4YA@mail.gmail.com
[3] http://lkml.kernel.org/r/[email protected]

Luis R. Rodriguez (9):
pci: add pci_ioremap_wc_bar()
video: fbdev: i740fb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
video: fbdev: kyrofb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
video: fbdev: gxt4500: use pci_ioremap_wc_bar() for framebuffer
PCI: Add pci_iomap_wc() variants
lib: devres: add pcim_iomap_wc() variants
video: fbdev: arkfb: use arch_phys_wc_add() and pci_iomap_wc()
video: fbdev: s3fb: use arch_phys_wc_add() and pci_iomap_wc()
video: fbdev: vt8623fb: use arch_phys_wc_add() and pci_iomap_wc()

drivers/pci/pci.c | 14 ++++++++
drivers/video/fbdev/arkfb.c | 36 +++----------------
drivers/video/fbdev/gxt4500.c | 2 +-
drivers/video/fbdev/i740fb.c | 35 ++++--------------
drivers/video/fbdev/kyro/fbdev.c | 33 ++++++-----------
drivers/video/fbdev/s3fb.c | 35 ++++--------------
drivers/video/fbdev/vt8623fb.c | 31 ++++------------
include/asm-generic/pci_iomap.h | 14 ++++++++
include/linux/pci.h | 3 ++
include/video/kyro.h | 4 +--
lib/devres.c | 76 ++++++++++++++++++++++++++++++++++++++++
lib/pci_iomap.c | 61 ++++++++++++++++++++++++++++++++
12 files changed, 204 insertions(+), 140 deletions(-)

--
2.3.2.209.gd67f9d5.dirty


2015-06-25 01:26:57

by Luis Chamberlain

Subject: [PATCH v8 1/9] pci: add pci_ioremap_wc_bar()

From: "Luis R. Rodriguez" <[email protected]>

This lets drivers take advantage of PAT when available. It
should help with converting video drivers over to ioremap_wc(),
which in turn helps with the goal of eventually using
_PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on
ioremap_nocache() (see commit de33c442e titled "x86 PAT: fix
performance drop for glx, use UC minus for ioremap(),
ioremap_nocache() and pci_mmap_page_range()").

Cc: Toshi Kani <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Suresh Siddha <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Antonino Daplas <[email protected]>
Cc: Jean-Christophe Plagniol-Villard <[email protected]>
Cc: Tomi Valkeinen <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: [email protected]
Cc: [email protected]
Acked-by: Arnd Bergmann <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
drivers/pci/pci.c | 14 ++++++++++++++
include/linux/pci.h | 1 +
2 files changed, 15 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 0008c95..fdae37b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -138,6 +138,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
return ioremap_nocache(res->start, resource_size(res));
}
EXPORT_SYMBOL_GPL(pci_ioremap_bar);
+
+void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
+{
+ /*
+ * Make sure the BAR is actually a memory resource, not an IO resource
+ */
+ if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) {
+ WARN_ON(1);
+ return NULL;
+ }
+ return ioremap_wc(pci_resource_start(pdev, bar),
+ pci_resource_len(pdev, bar));
+}
+EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar);
#endif

#define PCI_FIND_CAP_TTL 48
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c0dd4ab..1193975 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1657,6 +1657,7 @@ static inline void pci_mmcfg_late_init(void) { }
int pci_ext_cfg_avail(void);

void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
+void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar);

#ifdef CONFIG_PCI_IOV
int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
--
2.3.2.209.gd67f9d5.dirty
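
As an illustration of the guard in pci_ioremap_wc_bar(), here is a minimal
userspace sketch, not kernel code. The model_* names are hypothetical and exist
only for this example; the flag values are redefined locally so the sketch
builds outside the kernel. It only shows that a memory BAR yields a
write-combining mapping while an I/O port BAR is refused.

```c
/* Local copies of the resource type bits, for this userspace model only. */
#define MODEL_IORESOURCE_IO  0x00000100UL
#define MODEL_IORESOURCE_MEM 0x00000200UL

/* What the model reports instead of returning a real __iomem pointer. */
enum model_map_result {
	MODEL_MAP_FAILED,           /* the helper would return NULL */
	MODEL_MAP_WRITE_COMBINING,  /* the helper would call ioremap_wc() */
};

/* Hypothetical stand-in for one PCI BAR. */
struct model_bar {
	unsigned long start;
	unsigned long len;
	unsigned long flags;
};

/*
 * Mirrors the decision made by pci_ioremap_wc_bar() in the patch above:
 * only memory BARs are mapped. The kernel helper additionally fires
 * WARN_ON(1) on the failure path, which this model leaves out.
 */
static enum model_map_result model_ioremap_wc_bar(const struct model_bar *bar)
{
	if (!(bar->flags & MODEL_IORESOURCE_MEM))
		return MODEL_MAP_FAILED;
	return MODEL_MAP_WRITE_COMBINING;
}
```

A caller would pass a framebuffer-style memory BAR and expect
MODEL_MAP_WRITE_COMBINING, and would get MODEL_MAP_FAILED for an I/O port BAR,
which is why drivers using the real helper check its return value for NULL.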

2015-06-25 01:29:10

by Luis Chamberlain

Subject: [PATCH v8 2/9] video: fbdev: i740fb: use arch_phys_wc_add() and pci_ioremap_wc_bar()

From: "Luis R. Rodriguez" <[email protected]>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit
de33c442e titled "x86 PAT: fix performance drop for glx,
use UC minus for ioremap(), ioremap_nocache() and
pci_mmap_page_range()")

The conversion is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdef'ery and to remove redundant things which
arch_phys_wc_add() already handles, such as a verbose message
when MTRR fails and doing nothing when we didn't get
an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Jingoo Han <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Rob Clark <[email protected]>
Cc: Benoit Taine <[email protected]>
Cc: Suresh Siddha <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Antonino Daplas <[email protected]>
Cc: Jean-Christophe Plagniol-Villard <[email protected]>
Cc: Tomi Valkeinen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Acked-by: Tomi Valkeinen <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
drivers/video/fbdev/i740fb.c | 35 ++++++-----------------------------
1 file changed, 6 insertions(+), 29 deletions(-)

diff --git a/drivers/video/fbdev/i740fb.c b/drivers/video/fbdev/i740fb.c
index a2b4204..452e116 100644
--- a/drivers/video/fbdev/i740fb.c
+++ b/drivers/video/fbdev/i740fb.c
@@ -27,24 +27,15 @@
#include <linux/console.h>
#include <video/vga.h>

-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
#include "i740_reg.h"

static char *mode_option;
-
-#ifdef CONFIG_MTRR
static int mtrr = 1;
-#endif

struct i740fb_par {
unsigned char __iomem *regs;
bool has_sgram;
-#ifdef CONFIG_MTRR
- int mtrr_reg;
-#endif
+ int wc_cookie;
bool ddc_registered;
struct i2c_adapter ddc_adapter;
struct i2c_algo_bit_data ddc_algo;
@@ -1040,7 +1031,7 @@ static int i740fb_probe(struct pci_dev *dev, const struct pci_device_id *ent)
goto err_request_regions;
}

- info->screen_base = pci_ioremap_bar(dev, 0);
+ info->screen_base = pci_ioremap_wc_bar(dev, 0);
if (!info->screen_base) {
dev_err(info->device, "error remapping base\n");
ret = -ENOMEM;
@@ -1144,13 +1135,9 @@ static int i740fb_probe(struct pci_dev *dev, const struct pci_device_id *ent)

fb_info(info, "%s frame buffer device\n", info->fix.id);
pci_set_drvdata(dev, info);
-#ifdef CONFIG_MTRR
- if (mtrr) {
- par->mtrr_reg = -1;
- par->mtrr_reg = mtrr_add(info->fix.smem_start,
- info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
- }
-#endif
+ if (mtrr)
+ par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+ info->fix.smem_len);
return 0;

err_reg_framebuffer:
@@ -1177,13 +1164,7 @@ static void i740fb_remove(struct pci_dev *dev)

if (info) {
struct i740fb_par *par = info->par;
-
-#ifdef CONFIG_MTRR
- if (par->mtrr_reg >= 0) {
- mtrr_del(par->mtrr_reg, 0, 0);
- par->mtrr_reg = -1;
- }
-#endif
+ arch_phys_wc_del(par->wc_cookie);
unregister_framebuffer(info);
fb_dealloc_cmap(&info->cmap);
if (par->ddc_registered)
@@ -1287,10 +1268,8 @@ static int __init i740fb_setup(char *options)
while ((opt = strsep(&options, ",")) != NULL) {
if (!*opt)
continue;
-#ifdef CONFIG_MTRR
else if (!strncmp(opt, "mtrr:", 5))
mtrr = simple_strtoul(opt + 5, NULL, 0);
-#endif
else
mode_option = opt;
}
@@ -1327,7 +1306,5 @@ MODULE_DESCRIPTION("fbdev driver for Intel740");
module_param(mode_option, charp, 0444);
MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");

-#ifdef CONFIG_MTRR
module_param(mtrr, int, 0444);
MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
--
2.3.2.209.gd67f9d5.dirty
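
The reason the remove path above can call arch_phys_wc_del() without any check
can be sketched with a small userspace model. This is only an approximation of
the cookie convention as I understand it, not the x86 implementation: the
model_* names, the offset value and the choice of -ENOSPC are assumptions made
for this example.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed offset that turns an MTRR register number into a positive cookie. */
#define MODEL_MTRR_TO_PHYS_WC_OFFSET 1000

/* Hypothetical description of the machine the driver is running on. */
struct model_cpu {
	bool pat_enabled;     /* WC can be had through page attributes alone */
	int free_mtrr;        /* next free MTRR register, or -1 if exhausted */
	int mtrr_del_calls;   /* counts how often an MTRR was actually released */
};

/* Model of the add side: returns 0, a positive cookie, or a negative errno. */
static int model_phys_wc_add(struct model_cpu *cpu)
{
	if (cpu->pat_enabled)
		return 0;            /* nothing to do, the WC mapping is enough */
	if (cpu->free_mtrr < 0)
		return -ENOSPC;      /* assumed error code when no MTRR is left */
	return cpu->free_mtrr + MODEL_MTRR_TO_PHYS_WC_OFFSET;
}

/* Model of the delete side: only positive cookies refer to a real MTRR. */
static void model_phys_wc_del(struct model_cpu *cpu, int cookie)
{
	if (cookie >= 1)
		cpu->mtrr_del_calls++;
}
```

Under these assumptions a zero cookie (PAT in use, or the "mtrr" option turned
off with a zero-initialized wc_cookie) and a negative cookie (MTRR setup
failed) both make the delete a no-op, so the driver no longer needs its own
"did we get an MTRR" bookkeeping.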

2015-06-25 01:31:21

by Luis Chamberlain

Subject: [PATCH v8 3/9] video: fbdev: kyrofb: use arch_phys_wc_add() and pci_ioremap_wc_bar()

From: "Luis R. Rodriguez" <[email protected]>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit
de33c442e titled "x86 PAT: fix performance drop for glx,
use UC minus for ioremap(), ioremap_nocache() and
pci_mmap_page_range()")

The conversion is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdef'ery and to remove redundant things which
arch_phys_wc_add() already handles, such as a verbose message
when MTRR fails and doing nothing when we didn't get
an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Jingoo Han <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Laurent Pinchart <[email protected]>
Cc: Suresh Siddha <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Antonino Daplas <[email protected]>
Cc: Jean-Christophe Plagniol-Villard <[email protected]>
Cc: Tomi Valkeinen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Acked-by: Tomi Valkeinen <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
drivers/video/fbdev/kyro/fbdev.c | 33 +++++++++++----------------------
include/video/kyro.h | 4 +---
2 files changed, 12 insertions(+), 25 deletions(-)

diff --git a/drivers/video/fbdev/kyro/fbdev.c b/drivers/video/fbdev/kyro/fbdev.c
index 65041e1..5bb0153 100644
--- a/drivers/video/fbdev/kyro/fbdev.c
+++ b/drivers/video/fbdev/kyro/fbdev.c
@@ -22,9 +22,6 @@
#include <linux/pci.h>
#include <asm/io.h>
#include <linux/uaccess.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif

#include <video/kyro.h>

@@ -84,9 +81,7 @@ static device_info_t deviceInfo;
static char *mode_option = NULL;
static int nopan = 0;
static int nowrap = 1;
-#ifdef CONFIG_MTRR
static int nomtrr = 0;
-#endif

/* PCI driver prototypes */
static int kyrofb_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
@@ -570,10 +565,8 @@ static int __init kyrofb_setup(char *options)
nopan = 1;
} else if (strcmp(this_opt, "nowrap") == 0) {
nowrap = 1;
-#ifdef CONFIG_MTRR
} else if (strcmp(this_opt, "nomtrr") == 0) {
nomtrr = 1;
-#endif
} else {
mode_option = this_opt;
}
@@ -691,17 +684,16 @@ static int kyrofb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)

currentpar->regbase = deviceInfo.pSTGReg =
ioremap_nocache(kyro_fix.mmio_start, kyro_fix.mmio_len);
+ if (!currentpar->regbase)
+ goto out_free_fb;

- info->screen_base = ioremap_nocache(kyro_fix.smem_start,
- kyro_fix.smem_len);
+ info->screen_base = pci_ioremap_wc_bar(pdev, 0);
+ if (!info->screen_base)
+ goto out_unmap_regs;

-#ifdef CONFIG_MTRR
if (!nomtrr)
- currentpar->mtrr_handle =
- mtrr_add(kyro_fix.smem_start,
- kyro_fix.smem_len,
- MTRR_TYPE_WRCOMB, 1);
-#endif
+ currentpar->wc_cookie = arch_phys_wc_add(kyro_fix.smem_start,
+ kyro_fix.smem_len);

kyro_fix.ypanstep = nopan ? 0 : 1;
kyro_fix.ywrapstep = nowrap ? 0 : 1;
@@ -745,8 +737,10 @@ static int kyrofb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
return 0;

out_unmap:
- iounmap(currentpar->regbase);
iounmap(info->screen_base);
+out_unmap_regs:
+ iounmap(currentpar->regbase);
+out_free_fb:
framebuffer_release(info);

return -EINVAL;
@@ -770,12 +764,7 @@ static void kyrofb_remove(struct pci_dev *pdev)
iounmap(info->screen_base);
iounmap(par->regbase);

-#ifdef CONFIG_MTRR
- if (par->mtrr_handle)
- mtrr_del(par->mtrr_handle,
- info->fix.smem_start,
- info->fix.smem_len);
-#endif
+ arch_phys_wc_del(par->wc_cookie);

unregister_framebuffer(info);
framebuffer_release(info);
diff --git a/include/video/kyro.h b/include/video/kyro.h
index c563968..b958c2e 100644
--- a/include/video/kyro.h
+++ b/include/video/kyro.h
@@ -35,9 +35,7 @@ struct kyrofb_info {
/* Useful to hold depth here for Linux */
u8 PIXDEPTH;

-#ifdef CONFIG_MTRR
- int mtrr_handle;
-#endif
+ int wc_cookie;
};

extern int kyro_dev_init(void);
--
2.3.2.209.gd67f9d5.dirty

2015-06-25 01:33:34

by Luis Chamberlain

Subject: [PATCH v8 4/9] video: fbdev: gxt4500: use pci_ioremap_wc_bar() for framebuffer

From: "Luis R. Rodriguez" <[email protected]>

The driver doesn't use mtrr_add() or arch_phys_wc_add(), but
since we know the framebuffer is already isolated in its own
ioremap(), we can take advantage of write combining for
performance where possible.

In this case there are a few motivations for this:

a) Take advantage of PAT when available

b) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit
de33c442e titled "x86 PAT: fix performance drop for glx,
use UC minus for ioremap(), ioremap_nocache() and
pci_mmap_page_range()")

Cc: Laurent Pinchart <[email protected]>
Cc: Rob Clark <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Suresh Siddha <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Antonino Daplas <[email protected]>
Cc: Jean-Christophe Plagniol-Villard <[email protected]>
Cc: Tomi Valkeinen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Acked-by: Tomi Valkeinen <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
drivers/video/fbdev/gxt4500.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/gxt4500.c b/drivers/video/fbdev/gxt4500.c
index 135d78a..f19133a 100644
--- a/drivers/video/fbdev/gxt4500.c
+++ b/drivers/video/fbdev/gxt4500.c
@@ -662,7 +662,7 @@ static int gxt4500_probe(struct pci_dev *pdev, const struct pci_device_id *ent)

info->fix.smem_start = fb_phys;
info->fix.smem_len = pci_resource_len(pdev, 1);
- info->screen_base = pci_ioremap_bar(pdev, 1);
+ info->screen_base = pci_ioremap_wc_bar(pdev, 1);
if (!info->screen_base) {
dev_err(&pdev->dev, "gxt4500: cannot map framebuffer\n");
goto err_unmap_regs;
--
2.3.2.209.gd67f9d5.dirty

2015-06-25 01:35:44

by Luis Chamberlain

Subject: [PATCH v8 5/9] PCI: Add pci_iomap_wc() variants

From: "Luis R. Rodriguez" <[email protected]>

PCI BARs tell us whether prefetching is safe, but they don't say anything
about write combining (WC). WC changes ordering rules and allows writes to
be collapsed, so it's not safe in general to use it on a prefetchable
region.

Add pci_iomap_wc() and pci_iomap_wc_range() so drivers can take advantage
of write combining when they know it's safe.

On architectures that don't fully support WC, e.g., x86 without PAT,
drivers for legacy framebuffers may get some of the benefit by using
arch_phys_wc_add() in addition to pci_iomap_wc(). But arch_phys_wc_add()
is unreliable and should be avoided in general. On x86, it uses MTRRs,
which are limited in number and size, so the results will vary based on
driver loading order.

The goals of adding pci_iomap_wc() are to:

- Give drivers an architecture-independent way to use WC so they can stop
using interfaces like mtrr_add() (on x86, pci_iomap_wc() uses
PAT when available)

- Move toward using _PAGE_CACHE_MODE_UC, not _PAGE_CACHE_MODE_UC_MINUS,
on x86 on ioremap_nocache() (see de33c442ed2a ("x86 PAT: fix
performance drop for glx, use UC minus for ioremap(), ioremap_nocache()
and pci_mmap_page_range()"))

Link: http://lkml.kernel.org/r/[email protected]
Original-posting: http://lkml.kernel.org/r/[email protected]
Cc: Toshi Kani <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Suresh Siddha <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Antonino Daplas <[email protected]>
Cc: Jean-Christophe Plagniol-Villard <[email protected]>
Cc: Tomi Valkeinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: [email protected]
Cc: Stefan Bader <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Roger Pau Monné <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Acked-by: Arnd Bergmann <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
include/asm-generic/pci_iomap.h | 14 ++++++++++
lib/pci_iomap.c | 61 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 75 insertions(+)

diff --git a/include/asm-generic/pci_iomap.h b/include/asm-generic/pci_iomap.h
index 7389c87..b1e17fc 100644
--- a/include/asm-generic/pci_iomap.h
+++ b/include/asm-generic/pci_iomap.h
@@ -15,9 +15,13 @@ struct pci_dev;
#ifdef CONFIG_PCI
/* Create a virtual mapping cookie for a PCI BAR (memory or IO) */
extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
+extern void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max);
extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
unsigned long offset,
unsigned long maxlen);
+extern void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar,
+ unsigned long offset,
+ unsigned long maxlen);
/* Create a virtual mapping cookie for a port on a given PCI device.
* Do not call this directly, it exists to make it easier for architectures
* to override */
@@ -34,12 +38,22 @@ static inline void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned lon
return NULL;
}

+static inline void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max)
+{
+ return NULL;
+}
static inline void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
unsigned long offset,
unsigned long maxlen)
{
return NULL;
}
+static inline void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar,
+ unsigned long offset,
+ unsigned long maxlen)
+{
+ return NULL;
+}
#endif

#endif /* __ASM_GENERIC_IO_H */
diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
index bcce5f1..9604dcb 100644
--- a/lib/pci_iomap.c
+++ b/lib/pci_iomap.c
@@ -52,6 +52,46 @@ void __iomem *pci_iomap_range(struct pci_dev *dev,
EXPORT_SYMBOL(pci_iomap_range);

/**
+ * pci_iomap_wc_range - create a virtual WC mapping cookie for a PCI BAR
+ * @dev: PCI device that owns the BAR
+ * @bar: BAR number
+ * @offset: map memory at the given offset in BAR
+ * @maxlen: max length of the memory to map
+ *
+ * Using this function you will get a __iomem address to your device BAR.
+ * You can access it using ioread*() and iowrite*(). These functions hide
+ * the details if this is a MMIO or PIO address space and will just do what
+ * you expect from them in the correct way. When possible write combining
+ * is used.
+ *
+ * @maxlen specifies the maximum length to map. If you want to get access to
+ * the complete BAR from offset to the end, pass %0 here.
+ * */
+void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
+ int bar,
+ unsigned long offset,
+ unsigned long maxlen)
+{
+ resource_size_t start = pci_resource_start(dev, bar);
+ resource_size_t len = pci_resource_len(dev, bar);
+ unsigned long flags = pci_resource_flags(dev, bar);
+
+ if (len <= offset || !start)
+ return NULL;
+ len -= offset;
+ start += offset;
+ if (maxlen && len > maxlen)
+ len = maxlen;
+ if (flags & IORESOURCE_IO)
+ return NULL;
+ if (flags & IORESOURCE_MEM)
+ return ioremap_wc(start, len);
+ /* What? */
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(pci_iomap_wc_range);
+
+/**
* pci_iomap - create a virtual mapping cookie for a PCI BAR
* @dev: PCI device that owns the BAR
* @bar: BAR number
@@ -70,4 +110,25 @@ void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
return pci_iomap_range(dev, bar, 0, maxlen);
}
EXPORT_SYMBOL(pci_iomap);
+
+/**
+ * pci_iomap_wc - create a virtual WC mapping cookie for a PCI BAR
+ * @dev: PCI device that owns the BAR
+ * @bar: BAR number
+ * @maxlen: length of the memory to map
+ *
+ * Using this function you will get a __iomem address to your device BAR.
+ * You can access it using ioread*() and iowrite*(). These functions hide
+ * the details if this is a MMIO or PIO address space and will just do what
+ * you expect from them in the correct way. When possible write combining
+ * is used.
+ *
+ * @maxlen specifies the maximum length to map. If you want to get access to
+ * the complete BAR without checking for its length first, pass %0 here.
+ * */
+void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long maxlen)
+{
+ return pci_iomap_wc_range(dev, bar, 0, maxlen);
+}
+EXPORT_SYMBOL_GPL(pci_iomap_wc);
#endif /* CONFIG_PCI */
--
2.3.2.209.gd67f9d5.dirty
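
The range arithmetic in pci_iomap_wc_range() (offset handling, the meaning of
a zero maxlen, and the refusal of I/O port BARs) can be followed with a small
userspace sketch. This is a model for illustration only: the model_* names are
hypothetical, the flag values are redefined locally, and instead of calling
ioremap_wc() it just reports the physical range that would be mapped.

```c
#include <stdbool.h>

/* Local copies of the resource type bits, for this userspace model only. */
#define MODEL_IORESOURCE_IO  0x00000100UL
#define MODEL_IORESOURCE_MEM 0x00000200UL

/* Physical range that would be handed to ioremap_wc(). */
struct model_range {
	unsigned long start;
	unsigned long len;
};

/*
 * Follows the same steps as pci_iomap_wc_range() in the patch above.
 * Returns true and fills *out when a WC mapping would be attempted,
 * false when the kernel function would return NULL.
 */
static bool model_wc_range(unsigned long bar_start, unsigned long bar_len,
			   unsigned long flags, unsigned long offset,
			   unsigned long maxlen, struct model_range *out)
{
	/* An offset at or past the end of the BAR, or an unassigned BAR, fails. */
	if (bar_len <= offset || !bar_start)
		return false;
	bar_len -= offset;
	bar_start += offset;
	/* maxlen == 0 means "everything from offset to the end of the BAR". */
	if (maxlen && bar_len > maxlen)
		bar_len = maxlen;
	/* Write combining is only offered for memory BARs. */
	if (flags & MODEL_IORESOURCE_IO)
		return false;
	if (!(flags & MODEL_IORESOURCE_MEM))
		return false;
	out->start = bar_start;
	out->len = bar_len;
	return true;
}
```

With an 8 MiB memory BAR, for example, a zero offset and zero maxlen cover the
whole BAR, while an offset close to the end yields only the remaining bytes
even if maxlen asks for more.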

2015-06-25 01:37:56

by Luis Chamberlain

Subject: [PATCH v8 6/9] lib: devres: add pcim_iomap_wc() variants

From: "Luis R. Rodriguez" <[email protected]>

Now that we have pci_iomap_wc(), add the respective
devres helpers. These go unexported for now, but
note that should they later be exported, it
must be done with EXPORT_SYMBOL_GPL().

Cc: Toshi Kani <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Suresh Siddha <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Antonino Daplas <[email protected]>
Cc: Jean-Christophe Plagniol-Villard <[email protected]>
Cc: Tomi Valkeinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: [email protected]
Cc: Stefan Bader <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Roger Pau Monné <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Acked-by: Arnd Bergmann <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
include/linux/pci.h | 2 ++
lib/devres.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 78 insertions(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1193975..5ff15c1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1609,9 +1609,11 @@ static inline void pci_dev_specific_enable_acs(struct pci_dev *dev) { }
#endif

void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen);
+void __iomem *pcim_iomap_wc(struct pci_dev *pdev, int bar, unsigned long maxlen);
void pcim_iounmap(struct pci_dev *pdev, void __iomem *addr);
void __iomem * const *pcim_iomap_table(struct pci_dev *pdev);
int pcim_iomap_regions(struct pci_dev *pdev, int mask, const char *name);
+int pcim_iomap_wc_regions(struct pci_dev *pdev, int mask, const char *name);
int pcim_iomap_regions_request_all(struct pci_dev *pdev, int mask,
const char *name);
void pcim_iounmap_regions(struct pci_dev *pdev, int mask);
diff --git a/lib/devres.c b/lib/devres.c
index fbe2aac..38acc53 100644
--- a/lib/devres.c
+++ b/lib/devres.c
@@ -304,6 +304,29 @@ void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen)
EXPORT_SYMBOL(pcim_iomap);

/**
+ * pcim_iomap_wc - Managed pci_iomap_wc()
+ * @pdev: PCI device to iomap for
+ * @bar: BAR to iomap
+ * @maxlen: Maximum length of iomap
+ *
+ * Managed pci_iomap_wc(). Map is automatically unmapped on driver
+ * detach.
+ */
+void __iomem *pcim_iomap_wc(struct pci_dev *pdev, int bar, unsigned long maxlen)
+{
+ void __iomem **tbl;
+
+ BUG_ON(bar >= PCIM_IOMAP_MAX);
+
+ tbl = (void __iomem **)pcim_iomap_table(pdev);
+ if (!tbl || tbl[bar]) /* duplicate mappings not allowed */
+ return NULL;
+
+ tbl[bar] = pci_iomap_wc(pdev, bar, maxlen);
+ return tbl[bar];
+}
+
+/**
* pcim_iounmap - Managed pci_iounmap()
* @pdev: PCI device to iounmap for
* @addr: Address to unmap
@@ -383,6 +406,59 @@ int pcim_iomap_regions(struct pci_dev *pdev, int mask, const char *name)
EXPORT_SYMBOL(pcim_iomap_regions);

/**
+ * pcim_iomap_wc_regions - Request and iomap PCI BARs with write-combining
+ * @pdev: PCI device to map IO resources for
+ * @mask: Mask of BARs to request and iomap
+ * @name: Name used when requesting regions
+ *
+ * Request and iomap regions specified by @mask with a preference for
+ * write-combining.
+ */
+int pcim_iomap_wc_regions(struct pci_dev *pdev, int mask, const char *name)
+{
+ void __iomem * const *iomap;
+ int i, rc;
+
+ iomap = pcim_iomap_table(pdev);
+ if (!iomap)
+ return -ENOMEM;
+
+ for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+ unsigned long len;
+
+ if (!(mask & (1 << i)))
+ continue;
+
+ rc = -EINVAL;
+ len = pci_resource_len(pdev, i);
+ if (!len)
+ goto err_inval;
+
+ rc = pci_request_region(pdev, i, name);
+ if (rc)
+ goto err_inval;
+
+ rc = -ENOMEM;
+ if (!pcim_iomap_wc(pdev, i, 0))
+ goto err_region;
+ }
+
+ return 0;
+
+ err_region:
+ pci_release_region(pdev, i);
+ err_inval:
+ while (--i >= 0) {
+ if (!(mask & (1 << i)))
+ continue;
+ pcim_iounmap(pdev, iomap[i]);
+ pci_release_region(pdev, i);
+ }
+
+ return rc;
+}
+
+/**
* pcim_iomap_regions_request_all - Request all BARs and iomap specified ones
* @pdev: PCI device to map IO resources for
* @mask: Mask of BARs to iomap
--
2.3.2.209.gd67f9d5.dirty
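
The error unwinding in pcim_iomap_wc_regions() is easy to misread, so here is
a userspace sketch of the same control flow. It is a simplified model under
stated assumptions: the model_* names and the six-BAR limit are hypothetical,
region requests never fail in the model, and mapping failures are injected
through a flag rather than coming from a real pcim_iomap_wc() call.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_BARS 6	/* hypothetical; stands in for DEVICE_COUNT_RESOURCE */

/* Hypothetical device state tracked by the model. */
struct model_dev {
	unsigned long len[MODEL_NR_BARS];  /* 0 means the BAR is not implemented */
	bool map_fails[MODEL_NR_BARS];     /* inject a mapping failure for a BAR */
	bool requested[MODEL_NR_BARS];     /* region currently requested */
	bool mapped[MODEL_NR_BARS];        /* BAR currently mapped */
};

/* Same loop and labels as pcim_iomap_wc_regions() in the patch above. */
static int model_iomap_wc_regions(struct model_dev *dev, int mask)
{
	int i, rc;

	for (i = 0; i < MODEL_NR_BARS; i++) {
		if (!(mask & (1 << i)))
			continue;

		rc = -EINVAL;
		if (!dev->len[i])
			goto err_inval;

		dev->requested[i] = true;   /* stands in for pci_request_region() */

		rc = -ENOMEM;
		if (dev->map_fails[i])
			goto err_region;
		dev->mapped[i] = true;      /* stands in for pcim_iomap_wc() */
	}

	return 0;

err_region:
	/* The failing BAR was requested but never mapped. */
	dev->requested[i] = false;
err_inval:
	/* Walk back over every earlier BAR in the mask and undo both steps. */
	while (--i >= 0) {
		if (!(mask & (1 << i)))
			continue;
		dev->mapped[i] = false;
		dev->requested[i] = false;
	}

	return rc;
}
```

The point of the two labels is that a failure at BAR i releases only what BAR
i itself had acquired, and then falls through to undo every earlier BAR
selected by the mask, so the call is all-or-nothing.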

2015-06-25 01:40:01

by Luis Chamberlain

Subject: [PATCH v8 7/9] video: fbdev: arkfb: use arch_phys_wc_add() and pci_iomap_wc()

From: "Luis R. Rodriguez" <[email protected]>

Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available, in order to
take advantage of that also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away, MTRR is architecture specific and on
x86 its replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit
de33c442e titled "x86 PAT: fix performance drop for glx,
use UC minus for ioremap(), ioremap_nocache() and
pci_mmap_page_range()")

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
when MTRR fails and doing nothing when we didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Laurent Pinchart <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: "Lad, Prabhakar" <[email protected]>
Cc: Suresh Siddha <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Antonino Daplas <[email protected]>
Cc: Jean-Christophe Plagniol-Villard <[email protected]>
Cc: Tomi Valkeinen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Acked-by: Tomi Valkeinen <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
drivers/video/fbdev/arkfb.c | 36 +++++-------------------------------
1 file changed, 5 insertions(+), 31 deletions(-)

diff --git a/drivers/video/fbdev/arkfb.c b/drivers/video/fbdev/arkfb.c
index b305a1e..6a317de 100644
--- a/drivers/video/fbdev/arkfb.c
+++ b/drivers/video/fbdev/arkfb.c
@@ -26,13 +26,9 @@
#include <linux/console.h> /* Why should fb driver call console functions? because console_lock() */
#include <video/vga.h>

-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
struct arkfb_info {
int mclk_freq;
- int mtrr_reg;
+ int wc_cookie;

struct dac_info *dac;
struct vgastate state;
@@ -102,10 +98,6 @@ static const struct svga_timing_regs ark_timing_regs = {

static char *mode_option = "640x480-8@60";

-#ifdef CONFIG_MTRR
-static int mtrr = 1;
-#endif
-
MODULE_AUTHOR("(c) 2007 Ondrej Zajicek <[email protected]>");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("fbdev driver for ARK 2000PV");
@@ -115,11 +107,6 @@ MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
module_param_named(mode, mode_option, charp, 0444);
MODULE_PARM_DESC(mode, "Default video mode ('640x480-8@60', etc) (deprecated)");

-#ifdef CONFIG_MTRR
-module_param(mtrr, int, 0444);
-MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
-
static int threshold = 4;

module_param(threshold, int, 0644);
@@ -1002,7 +989,7 @@ static int ark_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
info->fix.smem_len = pci_resource_len(dev, 0);

/* Map physical IO memory address into kernel space */
- info->screen_base = pci_iomap(dev, 0, 0);
+ info->screen_base = pci_iomap_wc(dev, 0, 0);
if (! info->screen_base) {
rc = -ENOMEM;
dev_err(info->device, "iomap for framebuffer failed\n");
@@ -1057,14 +1044,8 @@ static int ark_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)

/* Record a reference to the driver data */
pci_set_drvdata(dev, info);
-
-#ifdef CONFIG_MTRR
- if (mtrr) {
- par->mtrr_reg = -1;
- par->mtrr_reg = mtrr_add(info->fix.smem_start, info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
- }
-#endif
-
+ par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+ info->fix.smem_len);
return 0;

/* Error handling */
@@ -1092,14 +1073,7 @@ static void ark_pci_remove(struct pci_dev *dev)

if (info) {
struct arkfb_info *par = info->par;
-
-#ifdef CONFIG_MTRR
- if (par->mtrr_reg >= 0) {
- mtrr_del(par->mtrr_reg, 0, 0);
- par->mtrr_reg = -1;
- }
-#endif
-
+ arch_phys_wc_del(par->wc_cookie);
dac_release(par->dac);
unregister_framebuffer(info);
fb_dealloc_cmap(&info->cmap);
--
2.3.2.209.gd67f9d5.dirty
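
The arkfb conversion above drops the old "mtrr_reg >= 0" guard and calls arch_phys_wc_del() unconditionally. The following userspace sketch illustrates why that can be safe, assuming the cookie convention the commit message hints at ("doing nothing when we didn't get an MTRR"): a zero or negative cookie means there is nothing to undo. Everything here (struct wc_model, MODEL_WC_OFFSET, model_wc_add(), model_wc_del()) is a made-up model, not the kernel implementation.

```c
#include <assert.h>

#define MODEL_WC_OFFSET 1000	/* hypothetical base for positive cookies */

/* Hypothetical model of the platform's write-combining resources. */
struct wc_model {
	int pat_enabled;	/* non-zero: WC comes from the mapping itself */
	int mtrrs_free;		/* MTRR slots still available in the model */
	int mtrrs_used;		/* MTRR slots currently handed out */
};

/*
 * Returns 0 when no MTRR is needed, a negative value on failure,
 * or a positive cookie when a slot was taken.
 */
static int model_wc_add(struct wc_model *m)
{
	assert(m);

	if (m->pat_enabled)
		return 0;	/* the WC mapping alone is enough */
	if (m->mtrrs_free == 0)
		return -1;	/* no slot left; the caller may ignore this */

	m->mtrrs_free--;
	m->mtrrs_used++;
	return MODEL_WC_OFFSET + m->mtrrs_used;
}

/* Safe to call with any cookie: only positive cookies release a slot. */
static void model_wc_del(struct wc_model *m, int cookie)
{
	assert(m);

	if (cookie < 1)
		return;		/* nothing was added, so nothing to undo */

	m->mtrrs_used--;
	m->mtrrs_free++;
}
```

Under that assumption the driver can store whatever the add call returned in wc_cookie and pass it straight to the delete call in its remove path, with no #ifdef or sign check of its own.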

2015-06-25 01:42:17

by Luis Chamberlain

Subject: [PATCH v8 8/9] video: fbdev: s3fb: use arch_phys_wc_add() and pci_iomap_wc()

From: "Luis R. Rodriguez" <[email protected]>

This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available; to take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit
de33c442e titled "x86 PAT: fix performance drop for glx,
use UC minus for ioremap(), ioremap_nocache() and
pci_mmap_page_range()")

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
when MTRR fails and doing nothing when we didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Jean-Christophe Plagniol-Villard <[email protected]>
Cc: Tomi Valkeinen <[email protected]>
Cc: Jingoo Han <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: "Lad, Prabhakar" <[email protected]>
Cc: Rickard Strandqvist <[email protected]>
Cc: Suresh Siddha <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Antonino Daplas <[email protected]>
Cc: [email protected]
Cc: [email protected]
Acked-by: Tomi Valkeinen <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
drivers/video/fbdev/s3fb.c | 35 ++++++-----------------------------
1 file changed, 6 insertions(+), 29 deletions(-)

diff --git a/drivers/video/fbdev/s3fb.c b/drivers/video/fbdev/s3fb.c
index f0ae61a..13b1090 100644
--- a/drivers/video/fbdev/s3fb.c
+++ b/drivers/video/fbdev/s3fb.c
@@ -28,13 +28,9 @@
#include <linux/i2c.h>
#include <linux/i2c-algo-bit.h>

-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
struct s3fb_info {
int chip, rev, mclk_freq;
- int mtrr_reg;
+ int wc_cookie;
struct vgastate state;
struct mutex open_lock;
unsigned int ref_count;
@@ -154,11 +150,7 @@ static const struct svga_timing_regs s3_timing_regs = {


static char *mode_option;
-
-#ifdef CONFIG_MTRR
static int mtrr = 1;
-#endif
-
static int fasttext = 1;


@@ -170,11 +162,8 @@ module_param(mode_option, charp, 0444);
MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
module_param_named(mode, mode_option, charp, 0444);
MODULE_PARM_DESC(mode, "Default video mode ('640x480-8@60', etc) (deprecated)");
-
-#ifdef CONFIG_MTRR
module_param(mtrr, int, 0444);
MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif

module_param(fasttext, int, 0644);
MODULE_PARM_DESC(fasttext, "Enable S3 fast text mode (1=enable, 0=disable, default=1)");
@@ -1168,7 +1157,7 @@ static int s3_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
info->fix.smem_len = pci_resource_len(dev, 0);

/* Map physical IO memory address into kernel space */
- info->screen_base = pci_iomap(dev, 0, 0);
+ info->screen_base = pci_iomap_wc(dev, 0, 0);
if (! info->screen_base) {
rc = -ENOMEM;
dev_err(info->device, "iomap for framebuffer failed\n");
@@ -1365,12 +1354,9 @@ static int s3_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
/* Record a reference to the driver data */
pci_set_drvdata(dev, info);

-#ifdef CONFIG_MTRR
- if (mtrr) {
- par->mtrr_reg = -1;
- par->mtrr_reg = mtrr_add(info->fix.smem_start, info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
- }
-#endif
+ if (mtrr)
+ par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+ info->fix.smem_len);

return 0;

@@ -1405,14 +1391,7 @@ static void s3_pci_remove(struct pci_dev *dev)

if (info) {
par = info->par;
-
-#ifdef CONFIG_MTRR
- if (par->mtrr_reg >= 0) {
- mtrr_del(par->mtrr_reg, 0, 0);
- par->mtrr_reg = -1;
- }
-#endif
-
+ arch_phys_wc_del(par->wc_cookie);
unregister_framebuffer(info);
fb_dealloc_cmap(&info->cmap);

@@ -1551,10 +1530,8 @@ static int __init s3fb_setup(char *options)

if (!*opt)
continue;
-#ifdef CONFIG_MTRR
else if (!strncmp(opt, "mtrr:", 5))
mtrr = simple_strtoul(opt + 5, NULL, 0);
-#endif
else if (!strncmp(opt, "fasttext:", 9))
fasttext = simple_strtoul(opt + 9, NULL, 0);
else
--
2.3.2.209.gd67f9d5.dirty

2015-06-25 01:44:33

by Luis Chamberlain

Subject: [PATCH v8 9/9] video: fbdev: vt8623fb: use arch_phys_wc_add() and pci_iomap_wc()

From: "Luis R. Rodriguez" <[email protected]>

This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available; to take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit
de33c442e titled "x86 PAT: fix performance drop for glx,
use UC minus for ioremap(), ioremap_nocache() and
pci_mmap_page_range()")

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
when MTRR fails and doing nothing when we didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Rob Clark <[email protected]>
Cc: Laurent Pinchart <[email protected]>
Cc: Jingoo Han <[email protected]>
Cc: "Lad, Prabhakar" <[email protected]>
Cc: Suresh Siddha <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Antonino Daplas <[email protected]>
Cc: Jean-Christophe Plagniol-Villard <[email protected]>
Cc: Tomi Valkeinen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Acked-by: Tomi Valkeinen <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
drivers/video/fbdev/vt8623fb.c | 31 ++++++-------------------------
1 file changed, 6 insertions(+), 25 deletions(-)

diff --git a/drivers/video/fbdev/vt8623fb.c b/drivers/video/fbdev/vt8623fb.c
index ea7f056..60f24828 100644
--- a/drivers/video/fbdev/vt8623fb.c
+++ b/drivers/video/fbdev/vt8623fb.c
@@ -26,13 +26,9 @@
#include <linux/console.h> /* Why should fb driver call console functions? because console_lock() */
#include <video/vga.h>

-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
struct vt8623fb_info {
char __iomem *mmio_base;
- int mtrr_reg;
+ int wc_cookie;
struct vgastate state;
struct mutex open_lock;
unsigned int ref_count;
@@ -99,10 +95,7 @@ static struct svga_timing_regs vt8623_timing_regs = {
/* Module parameters */

static char *mode_option = "640x480-8@60";
-
-#ifdef CONFIG_MTRR
static int mtrr = 1;
-#endif

MODULE_AUTHOR("(c) 2006 Ondrej Zajicek <[email protected]>");
MODULE_LICENSE("GPL");
@@ -112,11 +105,8 @@ module_param(mode_option, charp, 0644);
MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
module_param_named(mode, mode_option, charp, 0);
MODULE_PARM_DESC(mode, "Default video mode e.g. '648x480-8@60' (deprecated)");
-
-#ifdef CONFIG_MTRR
module_param(mtrr, int, 0444);
MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif


/* ------------------------------------------------------------------------- */
@@ -710,7 +700,7 @@ static int vt8623_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
info->fix.mmio_len = pci_resource_len(dev, 1);

/* Map physical IO memory address into kernel space */
- info->screen_base = pci_iomap(dev, 0, 0);
+ info->screen_base = pci_iomap_wc(dev, 0, 0);
if (! info->screen_base) {
rc = -ENOMEM;
dev_err(info->device, "iomap for framebuffer failed\n");
@@ -781,12 +771,9 @@ static int vt8623_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
/* Record a reference to the driver data */
pci_set_drvdata(dev, info);

-#ifdef CONFIG_MTRR
- if (mtrr) {
- par->mtrr_reg = -1;
- par->mtrr_reg = mtrr_add(info->fix.smem_start, info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
- }
-#endif
+ if (mtrr)
+ par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+ info->fix.smem_len);

return 0;

@@ -816,13 +803,7 @@ static void vt8623_pci_remove(struct pci_dev *dev)
if (info) {
struct vt8623fb_info *par = info->par;

-#ifdef CONFIG_MTRR
- if (par->mtrr_reg >= 0) {
- mtrr_del(par->mtrr_reg, 0, 0);
- par->mtrr_reg = -1;
- }
-#endif
-
+ arch_phys_wc_del(par->wc_cookie);
unregister_framebuffer(info);
fb_dealloc_cmap(&info->cmap);

--
2.3.2.209.gd67f9d5.dirty

2015-06-25 15:09:46

by Borislav Petkov

Subject: Re: [PATCH v8 5/9] PCI: Add pci_iomap_wc() variants

On Wed, Jun 24, 2015 at 06:22:18PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <[email protected]>
>
> PCI BARs tell us whether prefetching is safe, but they don't say anything
> about write combining (WC). WC changes ordering rules and allows writes to
> be collapsed, so it's not safe in general to use it on a prefetchable
> region.
>
> Add pci_iomap_wc() and pci_iomap_wc_range() so drivers can take advantage
> of write combining when they know it's safe.
>
> On architectures that don't fully support WC, e.g., x86 without PAT,
> drivers for legacy framebuffers may get some of the benefit by using
> arch_phys_wc_add() in addition to pci_iomap_wc(). But arch_phys_wc_add()
> is unreliable and should be avoided in general. On x86, it uses MTRRs,
> which are limited in number and size, so the results will vary based on
> driver loading order.
>
> The goals of adding pci_iomap_wc() are to:
>
> - Give drivers an architecture-independent way to use WC so they can stop
> using interfaces like mtrr_add() (on x86, pci_iomap_wc() uses
> PAT when available)
>
> - Move toward using _PAGE_CACHE_MODE_UC, not _PAGE_CACHE_MODE_UC_MINUS,
> on x86 on ioremap_nocache() (see de33c442ed2a ("x86 PAT: fix
> performance drop for glx, use UC minus for ioremap(), ioremap_nocache()
> and pci_mmap_page_range()")

...

> diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
> index bcce5f1..9604dcb 100644
> --- a/lib/pci_iomap.c
> +++ b/lib/pci_iomap.c
> @@ -52,6 +52,46 @@ void __iomem *pci_iomap_range(struct pci_dev *dev,
> EXPORT_SYMBOL(pci_iomap_range);
>
> /**
> + * pci_iomap_wc_range - create a virtual WC mapping cookie for a PCI BAR
> + * @dev: PCI device that owns the BAR
> + * @bar: BAR number
> + * @offset: map memory at the given offset in BAR
> + * @maxlen: max length of the memory to map
> + *
> + * Using this function you will get a __iomem address to your device BAR.
> + * You can access it using ioread*() and iowrite*(). These functions hide
> + * the details if this is a MMIO or PIO address space and will just do what
> + * you expect from them in the correct way. When possible write combining
> + * is used.
> + *
> + * @maxlen specifies the maximum length to map. If you want to get access to
> + * the complete BAR from offset to the end, pass %0 here.
> + * */
> +void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
> + int bar,
> + unsigned long offset,
> + unsigned long maxlen)
> +{
> + resource_size_t start = pci_resource_start(dev, bar);
> + resource_size_t len = pci_resource_len(dev, bar);
> + unsigned long flags = pci_resource_flags(dev, bar);
> +
> + if (len <= offset || !start)
> + return NULL;
> + len -= offset;
> + start += offset;
> + if (maxlen && len > maxlen)
> + len = maxlen;
> + if (flags & IORESOURCE_IO)
> + return NULL;

I've moved this check to the beginning of the function so that we bail
out before doing the computations above it.
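
As a rough illustration of that reordering, here is a userspace sketch of the range computation with the port I/O check hoisted to the top. It is not the actual lib/pci_iomap.c change: struct fake_bar, struct fake_range, the FAKE_IORESOURCE_* flags and wc_range_to_map() are made-up names, and the sketch only models the lines quoted above, not whatever follows them in the real function.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_IORESOURCE_IO	0x100u	/* hypothetical stand-in for IORESOURCE_IO */
#define FAKE_IORESOURCE_MEM	0x200u	/* hypothetical stand-in for a memory BAR */

/* Hypothetical description of one BAR. */
struct fake_bar {
	uint64_t start;
	uint64_t len;
	unsigned long flags;
};

/* The physical range that would be handed to the WC mapping call. */
struct fake_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Returns 1 and fills *out when a range would be mapped, 0 in the
 * cases where the quoted kernel code returns NULL.
 */
static int wc_range_to_map(const struct fake_bar *bar, unsigned long offset,
			   unsigned long maxlen, struct fake_range *out)
{
	assert(bar && out);

	/* Port I/O can never be write-combined: bail out before any math. */
	if (bar->flags & FAKE_IORESOURCE_IO)
		return 0;

	if (bar->len <= offset || !bar->start)
		return 0;

	out->start = bar->start + offset;
	out->len = bar->len - offset;
	if (maxlen && out->len > maxlen)
		out->len = maxlen;	/* maxlen == 0 means "to the end of the BAR" */

	return 1;
}
```

The result is the same as with the original ordering; the early return only skips the offset and length arithmetic for BARs that will be rejected anyway.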

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-06-25 15:41:05

by Borislav Petkov

Subject: Re: [PATCH v8 6/9] lib: devres: add pcim_iomap_wc() variants

On Wed, Jun 24, 2015 at 06:22:19PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <[email protected]>
>
> Now that we have pci_iomap_wc() add the respective
> devres helpers. These go unexported for now but
> note that should they later be exported this
> must go with EXPORT_SYMBOL_GPL().

Do I see it correctly, those are not used in this patchset?

If so, then let's keep this patch in the bag and pick it up only when
those functions have users.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-06-25 15:52:23

by Luis Chamberlain

Subject: Re: [PATCH v8 6/9] lib: devres: add pcim_iomap_wc() variants

On Thu, Jun 25, 2015 at 8:40 AM, Borislav Petkov <[email protected]> wrote:
> On Wed, Jun 24, 2015 at 06:22:19PM -0700, Luis R. Rodriguez wrote:
>> From: "Luis R. Rodriguez" <[email protected]>
>>
>> Now that we have pci_iomap_wc() add the respective
>> devres helpers. These go unexported for now but
>> note that should they later be exported this
>> must go with EXPORT_SYMBOL_GPL().
>
> Do I see it correctly, those are not used in this patchset?

That's correct. It was a preemptive implementation of devres pci wc APIs.

> If so, then let's keep this patch in the bag and pick it up only when
> those functions have users.

OK!

Luis

2015-06-25 15:54:04

by Luis Chamberlain

Subject: Re: [PATCH v8 5/9] PCI: Add pci_iomap_wc() variants

On Thu, Jun 25, 2015 at 8:09 AM, Borislav Petkov <[email protected]> wrote:
>> +void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
>> + int bar,
>> + unsigned long offset,
>> + unsigned long maxlen)
>> +{
>> + resource_size_t start = pci_resource_start(dev, bar);
>> + resource_size_t len = pci_resource_len(dev, bar);
>> + unsigned long flags = pci_resource_flags(dev, bar);
>> +
>> + if (len <= offset || !start)
>> + return NULL;
>> + len -= offset;
>> + start += offset;
>> + if (maxlen && len > maxlen)
>> + len = maxlen;
>> + if (flags & IORESOURCE_IO)
>> + return NULL;
>
> I've moved this check to the beginning of the function so that we bail
> out before doing the computations above it.

That indeed looks like a good optimization.

Luis

2015-06-25 20:47:28

by Borislav Petkov

Subject: Re: [PATCH v8 0/9] pci: add pci_iomap_wc() and pci_ioremap_wc_bar()

On Wed, Jun 24, 2015 at 06:22:13PM -0700, Luis R. Rodriguez wrote:
> Luis R. Rodriguez (9):
> pci: add pci_ioremap_wc_bar()
> video: fbdev: i740fb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
> video: fbdev: kyrofb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
> video: fbdev: gxt4500: use pci_ioremap_wc_bar() for framebuffer
> PCI: Add pci_iomap_wc() variants
> lib: devres: add pcim_iomap_wc() variants
> video: fbdev: arkfb: use arch_phys_wc_add() and pci_iomap_wc()
> video: fbdev: s3fb: use arch_phys_wc_add() and pci_iomap_wc()
> video: fbdev: vt8623fb: use arch_phys_wc_add() and pci_iomap_wc()
>
> drivers/pci/pci.c | 14 ++++++++
> drivers/video/fbdev/arkfb.c | 36 +++----------------
> drivers/video/fbdev/gxt4500.c | 2 +-
> drivers/video/fbdev/i740fb.c | 35 ++++--------------
> drivers/video/fbdev/kyro/fbdev.c | 33 ++++++-----------
> drivers/video/fbdev/s3fb.c | 35 ++++--------------
> drivers/video/fbdev/vt8623fb.c | 31 ++++------------
> include/asm-generic/pci_iomap.h | 14 ++++++++
> include/linux/pci.h | 3 ++
> include/video/kyro.h | 4 +--
> lib/devres.c | 76 ++++++++++++++++++++++++++++++++++++++++
> lib/pci_iomap.c | 61 ++++++++++++++++++++++++++++++++
> 12 files changed, 204 insertions(+), 140 deletions(-)

Took those, modulo the devres one.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-06-26 02:35:59

by Benjamin Herrenschmidt

Subject: Re: [PATCH v8 0/9] pci: add pci_iomap_wc() and pci_ioremap_wc_bar()

On Wed, 2015-06-24 at 18:22 -0700, Luis R. Rodriguez wrote:
> Although I had test compiled this before just to be safe I went ahead and
> successfully test-compiled this set with allmodconfig, specially since I've now
> removed the exports for the devres routines. Please let me know if these might
> be able to go through you or if there are any questions. I will note the recent
> discussion with Benjamin over the v7 series concluded that the ideas we both
> were alluding to, on automating instead the WC effects for devices seems a bit
> too idealistic for PCI / PCIE for now, but perhaps we should at least consider
> this in the future for userspace mmap() calls [4].

So I've been trying to figure out how to make this practically work for us (powerpc).

writel() will never write combine for us, it uses too heavy barriers.

writel_relaxed() today is identical to writel() but we can change it.

The problem is that switching to G=0 mappings (which is what provides us with write
combining) also architecturally enables prefetch and speculative loads... and again
architecturally (the implementations may differ), kills the effect of the lightweight
io barrier eieio which we would have to use in readl_relaxed() and writel_relaxed()
to provide their normal semantics.

So it boils down to: Can we modify the documentation of readl_relaxed() and writel_relaxed()
to define them as being even further relaxed when using a "wc" mapping ?

Otherwise, the only way out I see for us on powerpc is to bias massively writel_relaxed()
against readl_relaxed() by putting heavy barriers around the load in the latter so we can
keep them completely out of the former and still enable wc.

Ben.

2015-07-07 16:15:18

by Luis Chamberlain

Subject: Re: [PATCH v8 0/9] pci: add pci_iomap_wc() and pci_ioremap_wc_bar()

On Fri, Jun 26, 2015 at 12:12:06PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2015-06-24 at 18:22 -0700, Luis R. Rodriguez wrote:
> > Although I had test compiled this before just to be safe I went ahead and
> > successfully test-compiled this set with allmodconfig, specially since I've now
> > removed the exports for the devres routines. Please let me know if these might
> > be able to go through you or if there are any questions. I will note the recent
> > discussion with Benjamin over the v7 series concluded that the ideas we both
> > were alluding to, on automating instead the WC effects for devices seems a bit
> > too idealistic for PCI / PCIE for now, but perhaps we should at least consider
> > this in the future for userspace mmap() calls [4].
>
> So I've been trying to figure out how to make this practically work for us (powerpc).
>
> writel() will never write combine for us, it uses too heavy barriers.
>
> writel_relaxed() today is identical to writel() but we can change it.
>
> The problem is that switching to G=0 mappings (which is what provides us with write
> combining) also architecturally enables prefetch and speculative loads... and again
> architecturally (the implementations may differ), kills the effect of the lightweight
> io barrier eieio which we would have to use in readl_relaxed() and writel_relaxed()
> to provide their normal semantics.
>
> So it boils down to: Can we modify the documentation of readl_relaxed() and writel_relaxed()
> to define them as being even further relaxed when using a "wc" mapping ?
>
> Otherwise, the only way out I see for us on powerpc is to bias massively writel_relaxed()
> against readl_relaxed() by putting heavy barriers around the load in the latter so we can
> keep them completely out of the former and still enable wc.

That depends on whether you are then also implying its use for the ioremap_wc()
area, and on whether we've made sure we've looked at all other possibilities to avoid
this. Rather than replying here, though: there is a large, more general ioremap()
semantics discussion ongoing on another thread which is far ahead of this one. Mind
following up there? It seems the party is there:

http://lkml.kernel.org/r/[email protected]

Luis