2009-01-11 14:40:27

by Ingo Molnar

Subject: [git pull] x86 fixes


Linus,

Please pull the latest x86-fixes-for-linus git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git x86-fixes-for-linus

out-of-topic modifications in x86-fixes-for-linus:
--------------------------------------------------
include/asm-generic/pgtable.h # e104ba3: x86 PAT: change track_pfn_vma_new
mm/memory.c # e104ba3: x86 PAT: change track_pfn_vma_new
# e61304a: x86 PAT: remove PFNMAP type on tr

Thanks,

Ingo

------------------>
Andi Kleen (2):
x86: hpet: allow force enable on ICH10 HPET
x86: avoid theoretical vmalloc fault loop

Jaswinder Singh Rajput (1):
x86: fix mpparse.c build error on latest git

Kyle McMartin (1):
x86, mtrr: fix types used in userspace exported header

Suresh Siddha (1):
x86, pat: fix reserve_memtype() for legacy 1MB range

[email protected] (6):
x86 PAT: remove PFNMAP type on track_pfn_vma_new() error
x86 PAT: consolidate old memtype new memtype check into a function
x86 PAT: change track_pfn_vma_new to take pgprot_t pointer param
x86 PAT: return compatible mapping to remap_pfn_range callers
x86 PAT: ioremap_wc should take resource_size_t parameter
x86 PAT: remove CPA WARN_ON for zero pte


arch/x86/include/asm/io.h | 2 +-
arch/x86/include/asm/mtrr.h | 10 ++--
arch/x86/include/asm/pgtable.h | 19 +++++++
arch/x86/kernel/mpparse.c | 1 +
arch/x86/kernel/quirks.c | 3 +-
arch/x86/mm/fault.c | 2 +-
arch/x86/mm/ioremap.c | 2 +-
arch/x86/mm/pageattr.c | 10 ++--
arch/x86/mm/pat.c | 109 +++++++++++++++++++++++++++------------
arch/x86/pci/i386.c | 12 +----
include/asm-generic/pgtable.h | 4 +-
mm/memory.c | 15 ++++--
12 files changed, 125 insertions(+), 64 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 05cfed4..bdbb4b9 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -91,7 +91,7 @@ extern void unxlate_dev_mem_ptr(unsigned long phys, void *addr);

extern int ioremap_change_attr(unsigned long vaddr, unsigned long size,
unsigned long prot_val);
-extern void __iomem *ioremap_wc(unsigned long offset, unsigned long size);
+extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size);

/*
* early_ioremap() and early_iounmap() are for temporary early boot-time
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index cb988aa..14080d2 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -58,15 +58,15 @@ struct mtrr_gentry {
#endif /* !__i386__ */

struct mtrr_var_range {
- u32 base_lo;
- u32 base_hi;
- u32 mask_lo;
- u32 mask_hi;
+ __u32 base_lo;
+ __u32 base_hi;
+ __u32 mask_lo;
+ __u32 mask_hi;
};

/* In the Intel processor's MTRR interface, the MTRR type is always held in
an 8 bit field: */
-typedef u8 mtrr_type;
+typedef __u8 mtrr_type;

#define MTRR_NUM_FIXED_RANGES 88
#define MTRR_MAX_VAR_RANGES 256
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 83e69f4..06bbcbd 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -341,6 +341,25 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)

#define canon_pgprot(p) __pgprot(pgprot_val(p) & __supported_pte_mask)

+static inline int is_new_memtype_allowed(unsigned long flags,
+ unsigned long new_flags)
+{
+ /*
+ * Certain new memtypes are not allowed with certain
+ * requested memtype:
+ * - request is uncached, return cannot be write-back
+ * - request is write-combine, return cannot be write-back
+ */
+ if ((flags == _PAGE_CACHE_UC_MINUS &&
+ new_flags == _PAGE_CACHE_WB) ||
+ (flags == _PAGE_CACHE_WC &&
+ new_flags == _PAGE_CACHE_WB)) {
+ return 0;
+ }
+
+ return 1;
+}
+
#ifndef __ASSEMBLY__
/* Indicate that x86 has its own track and untrack pfn vma functions */
#define __HAVE_PFNMAP_TRACKING
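The new is_new_memtype_allowed() helper comes down to a two-row rejection table. The sketch below restates it in plain C so the accepted and rejected (requested, granted) pairs can be checked on their own. The `enum cache_type` values are hypothetical stand-ins: the kernel compares `_PAGE_CACHE_*` PTE bit encodings, not a small enum.

```c
/* Hypothetical stand-ins for the kernel's _PAGE_CACHE_* values; the
 * real constants are PTE bit encodings, not a small enum. */
enum cache_type { CACHE_WB, CACHE_WC, CACHE_UC_MINUS, CACHE_UC };

/* Same decision as is_new_memtype_allowed() in the patch: when the
 * caller asked for uncached-minus or write-combining, an existing
 * write-back type must not be handed back instead.
 * Returns 1 if the granted type is acceptable, 0 otherwise. */
static int new_memtype_allowed(enum cache_type requested,
                               enum cache_type granted)
{
    if ((requested == CACHE_UC_MINUS && granted == CACHE_WB) ||
        (requested == CACHE_WC && granted == CACHE_WB))
        return 0;
    return 1;
}
```

Only a write-back grant is ever refused. An uncached-minus request answered with write-combining passes. The comment removed from arch/x86/pci/i386.c listed that pair as forbidden, but the code under it never tested for it, so the consolidation keeps the existing behavior.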
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index c0601c2..a649a4c 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -27,6 +27,7 @@
#include <asm/e820.h>
#include <asm/trampoline.h>
#include <asm/setup.h>
+#include <asm/smp.h>

#include <mach_apic.h>
#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index 309949e..697d1b7 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -172,7 +172,8 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH8_4,
ich_force_enable_hpet);
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH9_7,
ich_force_enable_hpet);
-
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x3a16, /* ICH10 */
+ ich_force_enable_hpet);

static struct pci_dev *cached_dev;

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9e268b6..90dfae5 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -534,7 +534,7 @@ static int vmalloc_fault(unsigned long address)
happen within a race in page table update. In the later
case just flush. */

- pgd = pgd_offset(current->mm ?: &init_mm, address);
+ pgd = pgd_offset(current->active_mm, address);
pgd_ref = pgd_offset_k(address);
if (pgd_none(*pgd_ref))
return -1;
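The following toy model shows why active_mm is the right pgd to sync. It assumes the usual lazy-TLB arrangement: a kernel thread has mm == NULL and keeps running on the page tables of the address space it borrowed, which is recorded in active_mm. Under that assumption, the old expression makes such a thread sync init_mm's pgd. That pgd is not the one loaded on the CPU, so the retried access could fault again. The structures are illustrative only and much simpler than the kernel's mm_struct and task_struct.

```c
#include <stddef.h>

/* Toy address space: in the kernel the root is mm_struct->pgd;
 * here it is only a label. */
struct mm { const char *pgd; };

/* Kernel threads have mm == NULL and run on a borrowed active_mm;
 * for user tasks mm == active_mm. */
struct task { struct mm *mm; struct mm *active_mm; };

static struct mm init_mm = { "init_mm pgd" };

/* Old choice (written without the GNU "?:" shorthand): kernel
 * threads fall back to init_mm, whose tables may not be loaded. */
static struct mm *vmalloc_sync_target_old(const struct task *t)
{
    return t->mm ? t->mm : &init_mm;
}

/* New choice: the address space whose page tables are actually live. */
static struct mm *vmalloc_sync_target_new(const struct task *t)
{
    return t->active_mm;
}
```

For a user task, both versions pick the same address space. They differ only for kernel threads running on a borrowed mm.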
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index bd85d42..2ddb1e7 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -367,7 +367,7 @@ EXPORT_SYMBOL(ioremap_nocache);
*
* Must be freed with iounmap.
*/
-void __iomem *ioremap_wc(unsigned long phys_addr, unsigned long size)
+void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
{
if (pat_enabled)
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_WC,
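The ioremap_wc() prototype change matters on 32-bit kernels with 64-bit physical addressing (PAE). There, unsigned long is 32 bits wide but resource_size_t can be 64 bits. This sketch models that case with fixed-width types. `ulong32` and the two `map_wc_*` functions are hypothetical names, and the functions simply return the address they received.

```c
#include <stdint.h>

/* Model of a 32-bit PAE kernel: unsigned long is 32 bits wide while
 * physical addresses (resource_size_t) need 64. Hypothetical names. */
typedef uint32_t ulong32;
typedef uint64_t resource_size_t;

/* Old prototype: the address is implicitly narrowed to 32 bits. */
static uint64_t map_wc_old(ulong32 phys_addr)
{
    return phys_addr;
}

/* New prototype: the full physical address reaches the mapping code. */
static uint64_t map_wc_new(resource_size_t phys_addr)
{
    return phys_addr;
}
```

Consider a BAR placed above 4 GiB, for example at 0x1d0000000. The old prototype passes it through as 0xd0000000, which is a different physical address.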
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index e89d248..4cf30de 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -555,10 +555,12 @@ repeat:
if (!pte_val(old_pte)) {
if (!primary)
return 0;
- WARN(1, KERN_WARNING "CPA: called for zero pte. "
- "vaddr = %lx cpa->vaddr = %lx\n", address,
- *cpa->vaddr);
- return -EINVAL;
+
+ /*
+ * Special error value returned, indicating that the mapping
+ * did not exist at this address.
+ */
+ return -EFAULT;
}

if (level == PG_LEVEL_4K) {
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 85cbd3c..ec8cd49 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -333,11 +333,20 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
req_type & _PAGE_CACHE_MASK);
}

- is_range_ram = pagerange_is_ram(start, end);
- if (is_range_ram == 1)
- return reserve_ram_pages_type(start, end, req_type, new_type);
- else if (is_range_ram < 0)
- return -EINVAL;
+ /*
+ * For legacy reasons, some parts of the physical address range in the
+ * legacy 1MB region are treated as non-RAM (even when listed as RAM in
+ * the e820 tables). So we will track the memory attributes of this
+ * legacy 1MB region using the linear memtype_list always.
+ */
+ if (end >= ISA_END_ADDRESS) {
+ is_range_ram = pagerange_is_ram(start, end);
+ if (is_range_ram == 1)
+ return reserve_ram_pages_type(start, end, req_type,
+ new_type);
+ else if (is_range_ram < 0)
+ return -EINVAL;
+ }

new = kmalloc(sizeof(struct memtype), GFP_KERNEL);
if (!new)
@@ -505,6 +514,35 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
}
#endif /* CONFIG_STRICT_DEVMEM */

+/*
+ * Change the memory type for the physical address range in kernel identity
+ * mapping space if that range is a part of identity map.
+ */
+static int kernel_map_sync_memtype(u64 base, unsigned long size,
+ unsigned long flags)
+{
+ unsigned long id_sz;
+ int ret;
+
+ if (!pat_enabled || base >= __pa(high_memory))
+ return 0;
+
+ id_sz = (__pa(high_memory) < base + size) ?
+ __pa(high_memory) - base :
+ size;
+
+ ret = ioremap_change_attr((unsigned long)__va(base), id_sz, flags);
+ /*
+ * -EFAULT return means that the addr was not valid and did not have
+ * any identity mapping. That case is a success for
+ * kernel_map_sync_memtype.
+ */
+ if (ret == -EFAULT)
+ ret = 0;
+
+ return ret;
+}
+
int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t *vma_prot)
{
@@ -555,9 +593,7 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
if (retval < 0)
return 0;

- if (((pfn < max_low_pfn_mapped) ||
- (pfn >= (1UL<<(32 - PAGE_SHIFT)) && pfn < max_pfn_mapped)) &&
- ioremap_change_attr((unsigned long)__va(offset), size, flags) < 0) {
+ if (kernel_map_sync_memtype(offset, size, flags)) {
free_memtype(offset, offset + size);
printk(KERN_INFO
"%s:%d /dev/mem ioremap_change_attr failed %s for %Lx-%Lx\n",
@@ -601,12 +637,13 @@ void unmap_devmem(unsigned long pfn, unsigned long size, pgprot_t vma_prot)
* Reserved non RAM regions only and after successful reserve_memtype,
* this func also keeps identity mapping (if any) in sync with this new prot.
*/
-static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t vma_prot)
+static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
+ int strict_prot)
{
int is_ram = 0;
- int id_sz, ret;
+ int ret;
unsigned long flags;
- unsigned long want_flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK);
+ unsigned long want_flags = (pgprot_val(*vma_prot) & _PAGE_CACHE_MASK);

is_ram = pagerange_is_ram(paddr, paddr + size);

@@ -625,26 +662,27 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t vma_prot)
return ret;

if (flags != want_flags) {
- free_memtype(paddr, paddr + size);
- printk(KERN_ERR
- "%s:%d map pfn expected mapping type %s for %Lx-%Lx, got %s\n",
- current->comm, current->pid,
- cattr_name(want_flags),
- (unsigned long long)paddr,
- (unsigned long long)(paddr + size),
- cattr_name(flags));
- return -EINVAL;
+ if (strict_prot || !is_new_memtype_allowed(want_flags, flags)) {
+ free_memtype(paddr, paddr + size);
+ printk(KERN_ERR "%s:%d map pfn expected mapping type %s"
+ " for %Lx-%Lx, got %s\n",
+ current->comm, current->pid,
+ cattr_name(want_flags),
+ (unsigned long long)paddr,
+ (unsigned long long)(paddr + size),
+ cattr_name(flags));
+ return -EINVAL;
+ }
+ /*
+ * We allow returning different type than the one requested in
+ * non strict case.
+ */
+ *vma_prot = __pgprot((pgprot_val(*vma_prot) &
+ (~_PAGE_CACHE_MASK)) |
+ flags);
}

- /* Need to keep identity mapping in sync */
- if (paddr >= __pa(high_memory))
- return 0;
-
- id_sz = (__pa(high_memory) < paddr + size) ?
- __pa(high_memory) - paddr :
- size;
-
- if (ioremap_change_attr((unsigned long)__va(paddr), id_sz, flags) < 0) {
+ if (kernel_map_sync_memtype(paddr, size, flags)) {
free_memtype(paddr, paddr + size);
printk(KERN_ERR
"%s:%d reserve_pfn_range ioremap_change_attr failed %s "
@@ -689,6 +727,7 @@ int track_pfn_vma_copy(struct vm_area_struct *vma)
unsigned long vma_start = vma->vm_start;
unsigned long vma_end = vma->vm_end;
unsigned long vma_size = vma_end - vma_start;
+ pgprot_t pgprot;

if (!pat_enabled)
return 0;
@@ -702,7 +741,8 @@ int track_pfn_vma_copy(struct vm_area_struct *vma)
WARN_ON_ONCE(1);
return -EINVAL;
}
- return reserve_pfn_range(paddr, vma_size, __pgprot(prot));
+ pgprot = __pgprot(prot);
+ return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
}

/* reserve entire vma page by page, using pfn and prot from pte */
@@ -710,7 +750,8 @@ int track_pfn_vma_copy(struct vm_area_struct *vma)
if (follow_phys(vma, vma_start + i, 0, &prot, &paddr))
continue;

- retval = reserve_pfn_range(paddr, PAGE_SIZE, __pgprot(prot));
+ pgprot = __pgprot(prot);
+ retval = reserve_pfn_range(paddr, PAGE_SIZE, &pgprot, 1);
if (retval)
goto cleanup_ret;
}
@@ -741,7 +782,7 @@ cleanup_ret:
* Note that this function can be called with caller trying to map only a
* subrange/page inside the vma.
*/
-int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
unsigned long pfn, unsigned long size)
{
int retval = 0;
@@ -758,14 +799,14 @@ int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
if (is_linear_pfn_mapping(vma)) {
/* reserve the whole chunk starting from vm_pgoff */
paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
- return reserve_pfn_range(paddr, vma_size, prot);
+ return reserve_pfn_range(paddr, vma_size, prot, 0);
}

/* reserve page by page using pfn and size */
base_paddr = (resource_size_t)pfn << PAGE_SHIFT;
for (i = 0; i < size; i += PAGE_SIZE) {
paddr = base_paddr + i;
- retval = reserve_pfn_range(paddr, PAGE_SIZE, prot);
+ retval = reserve_pfn_range(paddr, PAGE_SIZE, prot, 0);
if (retval)
goto cleanup_ret;
}
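kernel_map_sync_memtype() only touches the part of the range covered by the kernel identity map, meaning everything below __pa(high_memory). It also treats the new -EFAULT from __change_page_attr() as "nothing mapped here" rather than as an error. The userspace sketch below follows that control flow. `high_mem_pa` stands in for `__pa(high_memory)`. `change_attr_fn` and `fake_change_attr` are hypothetical hooks in place of ioremap_change_attr(). The `pat_enabled` check is omitted.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical hook in place of ioremap_change_attr(). */
typedef int (*change_attr_fn)(uint64_t base, uint64_t size);

/* Mirrors kernel_map_sync_memtype(): high_mem_pa plays __pa(high_memory),
 * the first physical address not covered by the identity map. */
static int sync_identity_map(uint64_t base, uint64_t size,
                             uint64_t high_mem_pa, change_attr_fn change)
{
    uint64_t id_sz;
    int ret;

    if (base >= high_mem_pa)
        return 0;               /* nothing of the range is identity-mapped */

    /* Clamp the size so the update stops at the end of the identity map. */
    id_sz = (high_mem_pa < base + size) ? high_mem_pa - base : size;

    ret = change(base, id_sz);
    if (ret == -EFAULT)         /* no mapping there: treat as success */
        ret = 0;
    return ret;
}

/* Example hook: records the size it was asked to change and reports,
 * like the patched CPA code does for a zero pte, that nothing was mapped. */
static uint64_t last_size;

static int fake_change_attr(uint64_t base, uint64_t size)
{
    (void)base;
    last_size = size;
    return -EFAULT;
}
```

A range that straddles the end of the identity map is trimmed. A range entirely above it is skipped without calling the hook.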
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index f884740..5ead808 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -314,17 +314,7 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
return retval;

if (flags != new_flags) {
- /*
- * Do not fallback to certain memory types with certain
- * requested type:
- * - request is uncached, return cannot be write-back
- * - request is uncached, return cannot be write-combine
- * - request is write-combine, return cannot be write-back
- */
- if ((flags == _PAGE_CACHE_UC_MINUS &&
- (new_flags == _PAGE_CACHE_WB)) ||
- (flags == _PAGE_CACHE_WC &&
- new_flags == _PAGE_CACHE_WB)) {
+ if (!is_new_memtype_allowed(flags, new_flags)) {
free_memtype(addr, addr+len);
return -EINVAL;
}
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 72ebe91..8e6d0ca 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -301,7 +301,7 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm,
* track_pfn_vma_new is called when a _new_ pfn mapping is being established
* for physical range indicated by pfn and size.
*/
-static inline int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+static inline int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
unsigned long pfn, unsigned long size)
{
return 0;
@@ -332,7 +332,7 @@ static inline void untrack_pfn_vma(struct vm_area_struct *vma,
{
}
#else
-extern int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+extern int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
unsigned long pfn, unsigned long size);
extern int track_pfn_vma_copy(struct vm_area_struct *vma);
extern void untrack_pfn_vma(struct vm_area_struct *vma, unsigned long pfn,
diff --git a/mm/memory.c b/mm/memory.c
index e009ce8..238fb8e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1511,6 +1511,7 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn)
{
int ret;
+ pgprot_t pgprot = vma->vm_page_prot;
/*
* Technically, architectures with pte_special can avoid all these
* restrictions (same for remap_pfn_range). However we would like
@@ -1525,10 +1526,10 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,

if (addr < vma->vm_start || addr >= vma->vm_end)
return -EFAULT;
- if (track_pfn_vma_new(vma, vma->vm_page_prot, pfn, PAGE_SIZE))
+ if (track_pfn_vma_new(vma, &pgprot, pfn, PAGE_SIZE))
return -EINVAL;

- ret = insert_pfn(vma, addr, pfn, vma->vm_page_prot);
+ ret = insert_pfn(vma, addr, pfn, pgprot);

if (ret)
untrack_pfn_vma(vma, pfn, PAGE_SIZE);
@@ -1671,9 +1672,15 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,

vma->vm_flags |= VM_IO | VM_RESERVED | VM_PFNMAP;

- err = track_pfn_vma_new(vma, prot, pfn, PAGE_ALIGN(size));
- if (err)
+ err = track_pfn_vma_new(vma, &prot, pfn, PAGE_ALIGN(size));
+ if (err) {
+ /*
+ * To indicate that track_pfn related cleanup is not
+ * needed from higher level routine calling unmap_vmas
+ */
+ vma->vm_flags &= ~(VM_IO | VM_RESERVED | VM_PFNMAP);
return -EINVAL;
+ }

BUG_ON(addr >= end);
pfn -= addr >> PAGE_SHIFT;
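After these patches, track_pfn_vma_new() takes a pgprot_t pointer so that reserve_pfn_range() can hand a compatible memory type back to the caller. remap_pfn_range() and vm_insert_pfn() pass strict_prot = 0, while track_pfn_vma_copy() passes 1. Below is a simplified model of that reconciliation step. It assumes a made-up two-bit cache field: `CACHE_MASK`, the `CACHE_*` values, and `prot_t` (a stand-in for pgprot_t) are all hypothetical. The compatibility rule is the one from is_new_memtype_allowed().

```c
#include <errno.h>

/* Hypothetical two-bit cache field; the kernel uses _PAGE_CACHE_MASK
 * and _PAGE_CACHE_* PTE bits with different values. */
#define CACHE_MASK      0x3UL
#define CACHE_WB        0x0UL
#define CACHE_WC        0x1UL
#define CACHE_UC_MINUS  0x2UL

/* Stand-in for pgprot_t: other protection bits live above CACHE_MASK. */
typedef struct { unsigned long val; } prot_t;

/* Same rule as is_new_memtype_allowed(): never downgrade a WC or
 * UC- request to write-back. */
static int memtype_compatible(unsigned long want, unsigned long got)
{
    return !((want == CACHE_UC_MINUS || want == CACHE_WC) && got == CACHE_WB);
}

/* Models the type check in reserve_pfn_range(): 'granted' is the type
 * the memtype tracker returned. On a non-strict mismatch that is still
 * compatible, the granted type is written back through *prot. */
static int reconcile_prot(prot_t *prot, unsigned long granted, int strict)
{
    unsigned long want = prot->val & CACHE_MASK;

    if (granted != want) {
        if (strict || !memtype_compatible(want, granted))
            return -EINVAL;   /* the kernel also calls free_memtype() here */
        prot->val = (prot->val & ~CACHE_MASK) | granted;
    }
    return 0;
}
```

Suppose strict is 0 and an uncached-minus request is answered with write-combining. The call succeeds, and the caller's protection is rewritten to write-combining, which is what vm_insert_pfn() then passes to insert_pfn(). With strict set to 1, the same pair fails with -EINVAL and the protection is left unchanged.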


2009-01-11 16:45:35

by Torsten Kaiser

Subject: Re: [git pull] x86 fixes

On Sun, Jan 11, 2009 at 3:39 PM, Ingo Molnar <[email protected]> wrote:
>
> Linus,
>
> Please pull the latest x86-fixes-for-linus git tree from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git x86-fixes-for-linus
>
> out-of-topic modifications in x86-fixes-for-linus:
> --------------------------------------------------
> include/asm-generic/pgtable.h # e104ba3: x86 PAT: change track_pfn_vma_new
> mm/memory.c # e104ba3: x86 PAT: change track_pfn_vma_new
> # e61304a: x86 PAT: remove PFNMAP type on tr
>
> Thanks,
>
> Ingo
>
> ------------------>
> Andi Kleen (2):
> x86: hpet: allow force enable on ICH10 HPET
> x86: avoid theoretical vmalloc fault loop
>
> Jaswinder Singh Rajput (1):
> x86: fix mpparse.c build error on latest git
>
> Kyle McMartin (1):
> x86, mtrr: fix types used in userspace exported header
>
> Suresh Siddha (1):
> x86, pat: fix reserve_memtype() for legacy 1MB range
>
> [email protected] (6):
> x86 PAT: remove PFNMAP type on track_pfn_vma_new() error
> x86 PAT: consolidate old memtype new memtype check into a function
> x86 PAT: change track_pfn_vma_new to take pgprot_t pointer param
> x86 PAT: return compatible mapping to remap_pfn_range callers
> x86 PAT: ioremap_wc should take resource_size_t parameter
> x86 PAT: remove CPA WARN_ON for zero pte

Something is (very) wrong with one(?) of these patches.

After upgrading from 2.6.28 to 2.6.29-rc1 I lost direct rendering.
Each time I tried to start a program that uses DRM, I got this in the
syslog and the program fell back to Mesa software rendering:
Jan 11 13:32:31 treogen [ 77.167977] X:3280 map pfn expected mapping type uncached-minus for e0000000-e7ff8000, got write-combining
Jan 11 13:32:31 treogen [ 77.173620] X:3280 freeing invalid memtype e0000000-e7ff8000
Jan 11 13:34:51 treogen [ 217.861668] glxinfo:3492 map pfn expected mapping type uncached-minus for e0000000-e7ff8000, got write-combining
Jan 11 13:34:51 treogen [ 217.867220] glxinfo:3492 freeing invalid memtype e0000000-e7ff8000
Jan 11 13:35:23 treogen [ 249.771043] glxinfo:3494 map pfn expected mapping type uncached-minus for e0000000-e7ff8000, got write-combining
Jan 11 13:35:23 treogen [ 249.776589] glxinfo:3494 freeing invalid memtype e0000000-e7ff8000

Otherwise 2.6.29-rc1 worked for me. Even booting with 'fastboot' did
not result in any problems, but it did cut the in-kernel-time down
from ~12 sec to ~6 sec.

Hoping to fix this memtype problem, I applied the patch from the pull
request to 29-rc1 and rebooted. Now the system completely locks up
when X is trying to start.
Via serial console I got this Oops:
[ 79.500149] BUG: unable to handle kernel NULL pointer dereference at 0000000000000003
[ 79.509240] IP: [<0000000000000003>] 0x3
[ 79.510002] PGD 0
[ 79.510002] Oops: 0010 [#1] SMP
[ 79.510002] last sysfs file: /sys/devices/pci0000:00/0000:00:0f.0/0000:01:00.0/enable
[ 79.510002] CPU 0
[ 79.510002] Modules linked in: w83792d tuner tea5767 tda8290 tuner_xc2028 xc5000 tda9887 tuner_simple tuner_types mt20xx tea5761 tvaudio msp3400 bttv ir_common v4l2_common videodev v4l1_compat v4l2_compat_ioctl32 usbhid videobuf_dma_sg videobuf_core hid btcx_risc tveeprom sg pata_amd
[ 79.510002] Pid: 0, comm: swapper Not tainted 2.6.29-rc1 #2
[ 79.510002] RIP: 0010:[<0000000000000003>] [<0000000000000003>] 0x3
[ 79.510002] RSP: 0018:ffffffff809a8b18 EFLAGS: 00010002
[ 79.510002] RAX: 0000000000000001 RBX: ffffffff00000000 RCX: 0000000000000000
[ 79.510002] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff809a8ca8
[ 79.510002] RBP: ffffffff809a8b18 R08: 0000000000000001 R09: 0000000000000100
[ 79.510002] R10: ffffffff8026af40 R11: 00000000000068d8 R12: 0000000000000000
[ 79.510002] R13: ffff88007e4fd700 R14: ffff880028018d00 R15: ffffffff809a8aa8
[ 79.510002] FS: 00007ff217e406f0(0000) GS:ffffffff809b1040(0000) knlGS:0000000000000000
[ 79.510002] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 79.510002] CR2: 0000000000000003 CR3: 0000000000201000 CR4: 00000000000006e0
[ 79.510002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 79.510002] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[ 79.510002] Process swapper (pid: 0, threadinfo ffffffff8087e000, task ffffffff807de360)
[ 79.510002] Stack:
[ 79.510002] ffffffff809a8b68 ffffffff802389d7 0000000000000000 ffffffff809a8b60
[ 79.510002] 0000000000000082 ffffffff8022a7a8 0000000000000000 0000000000000001
[ 79.510002] 0000000000000060 ffffffff807de360 ffffffff809a8b78 ffffffff80238b7d
[ 79.510002] Call Trace:
[ 79.510002] Call Trace:
[ 79.510002] <IRQ> <0> [<ffffffff802389d7>] try_to_wake_up+0x137/0x2d0
[ 79.510002] [<ffffffff8022a7a8>] ? do_page_fault+0x368/0x970
[ 79.510002] [<ffffffff80238b7d>] default_wake_function+0xd/0x10
[ 79.510002] [<ffffffff8025a751>] autoremove_wake_function+0x11/0x40
[ 79.510002] [<ffffffff804cc70f>] ? ata_scsi_qc_complete+0x1df/0x4c0
[ 79.510002] [<ffffffff8065d1ef>] ? _spin_unlock_irqrestore+0x2f/0x40
[ 79.510002] [<ffffffff8026b02c>] ? generic_smp_call_function_interrupt+0xec/0x100
[ 79.510002] [<ffffffff8065cddd>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 79.510002] [<ffffffff8026af40>] ? generic_smp_call_function_interrupt+0x0/0x100
[ 79.510002] [<ffffffff8026b02c>] ? generic_smp_call_function_interrupt+0xec/0x100
[ 79.510002] [<ffffffff8065d54f>] ? page_fault+0x1f/0x30
[ 79.510002] [<ffffffff8026b02c>] ? generic_smp_call_function_interrupt+0xec/0x100
[ 79.510002] [<ffffffff8026af40>] ? generic_smp_call_function_interrupt+0x0/0x100
[ 79.510002] [<ffffffff8024402c>] ? warn_slowpath+0x4c/0x130
[ 79.510002] [<ffffffff804b8f85>] ? scsi_next_command+0x45/0x60
[ 79.510002] [<ffffffff804b9bd6>] ? scsi_io_completion+0x376/0x4e0
[ 79.510002] [<ffffffff804b2f6c>] ? scsi_finish_command+0xac/0xe0
[ 79.510002] [<ffffffff804b9e08>] ? scsi_softirq_done+0xb8/0x140
[ 79.510002] [<ffffffff8025d360>] ? __remove_hrtimer+0x40/0xa0
[ 79.510002] [<ffffffff8026b02c>] ? generic_smp_call_function_interrupt+0xec/0x100
[ 79.510002] [<ffffffff8021e54f>] ? smp_call_function_interrupt+0x1f/0x30
[ 79.510002] [<ffffffff8020c863>] ? call_function_interrupt+0x13/0x20
[ 79.510002] <EOI> <0>Code: Bad RIP value.
[ 79.510002] RIP [<0000000000000003>] 0x3
[ 79.510002] RSP <ffffffff809a8b18>
[ 79.510002] CR2: 0000000000000003
[ 79.510002] ---[ end trace 99e686e29f771a49 ]---
[ 79.510002] Kernel panic - not syncing: Fatal exception in interrupt
[ 79.510002] ------------[ cut here ]------------

last sysfs file: /sys/devices/pci0000:00/0000:00:0f.0/0000:01:00.0/enable

lspci -t:
-[0000:00]-+-00.0
[snip]
+-0f.0-[0000:01]--+-00.0
| \-00.1
lspci:
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
01:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60 [Radeon X300 (PCIE)]
01:00.1 Display controller: ATI Technologies Inc RV370 [Radeon X300SE]

Userspace is xorg-server-1.5.3 with mesa-7.3_rc1 and xf86-video-ati-6.9.0.
With 2.6.28 this combination works for accelerated direct rendering.
PAT was enabled on 2.6.28 and both vanilla 2.6.29-rc1 and the patched -rc1.

Just ask if you need more information or if you have a patch to try.

Torsten

2009-01-11 18:18:33

by Ingo Molnar

Subject: Re: [git pull] x86 fixes


* Torsten Kaiser <[email protected]> wrote:

> On Sun, Jan 11, 2009 at 3:39 PM, Ingo Molnar <[email protected]> wrote:
>
> Something is (very) wrong with one(?) of these patches.
>
> After upgrading from 2.6.28 to 2.6.29-rc1 I lost direct rendering.
> Each time I tried to start a program that uses DRM, I got this in the
> syslog and the program fell back to Mesa software rendering:
> Jan 11 13:32:31 treogen [ 77.167977] X:3280 map pfn expected mapping type uncached-minus for e0000000-e7ff8000, got write-combining
> Jan 11 13:32:31 treogen [ 77.173620] X:3280 freeing invalid memtype e0000000-e7ff8000
> Jan 11 13:34:51 treogen [ 217.861668] glxinfo:3492 map pfn expected mapping type uncached-minus for e0000000-e7ff8000, got write-combining
> Jan 11 13:34:51 treogen [ 217.867220] glxinfo:3492 freeing invalid memtype e0000000-e7ff8000
> Jan 11 13:35:23 treogen [ 249.771043] glxinfo:3494 map pfn expected mapping type uncached-minus for e0000000-e7ff8000, got write-combining
> Jan 11 13:35:23 treogen [ 249.776589] glxinfo:3494 freeing invalid memtype e0000000-e7ff8000
>
> Otherwise 2.6.29-rc1 worked for me. Even booting with 'fastboot' did
> not result in any problems, but it did cut the in-kernel-time down
> from ~12 sec to ~6 sec.
>
> Hoping to fix this memtype problem, I applied the patch from the pull
> request to 29-rc1 and rebooted. Now the system completely locks up
> when X is trying to start.
> Via serial console I got this Oops:
> [ 79.500149] BUG: unable to handle kernel NULL pointer dereference at 0000000000000003
> [ 79.509240] IP: [<0000000000000003>] 0x3
> [ 79.510002] PGD 0
> [ 79.510002] Oops: 0010 [#1] SMP
> [ 79.510002] last sysfs file: /sys/devices/pci0000:00/0000:00:0f.0/0000:01:00.0/enable
> [ 79.510002] CPU 0
> [ 79.510002] Modules linked in: w83792d tuner tea5767 tda8290 tuner_xc2028 xc5000 tda9887 tuner_simple tuner_types mt20xx tea5761 tvaudio msp3400 bttv ir_common v4l2_common videodev v4l1_compat v4l2_compat_ioctl32 usbhid videobuf_dma_sg videobuf_core hid btcx_risc tveeprom sg pata_amd
> [ 79.510002] Pid: 0, comm: swapper Not tainted 2.6.29-rc1 #2
> [ 79.510002] RIP: 0010:[<0000000000000003>] [<0000000000000003>] 0x3
> [ 79.510002] RSP: 0018:ffffffff809a8b18 EFLAGS: 00010002
> [ 79.510002] RAX: 0000000000000001 RBX: ffffffff00000000 RCX: 0000000000000000
> [ 79.510002] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff809a8ca8
> [ 79.510002] RBP: ffffffff809a8b18 R08: 0000000000000001 R09: 0000000000000100
> [ 79.510002] R10: ffffffff8026af40 R11: 00000000000068d8 R12: 0000000000000000
> [ 79.510002] R13: ffff88007e4fd700 R14: ffff880028018d00 R15: ffffffff809a8aa8
> [ 79.510002] FS: 00007ff217e406f0(0000) GS:ffffffff809b1040(0000) knlGS:0000000000000000
> [ 79.510002] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 79.510002] CR2: 0000000000000003 CR3: 0000000000201000 CR4: 00000000000006e0
> [ 79.510002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 79.510002] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
> [ 79.510002] Process swapper (pid: 0, threadinfo ffffffff8087e000, task ffffffff807de360)
> [ 79.510002] Stack:
> [ 79.510002] ffffffff809a8b68 ffffffff802389d7 0000000000000000 ffffffff809a8b60
> [ 79.510002] 0000000000000082 ffffffff8022a7a8 0000000000000000 0000000000000001
> [ 79.510002] 0000000000000060 ffffffff807de360 ffffffff809a8b78 ffffffff80238b7d
> [ 79.510002] Call Trace:
> [ 79.510002] Call Trace:
> [ 79.510002] <IRQ> <0> [<ffffffff802389d7>] try_to_wake_up+0x137/0x2d0
> [ 79.510002] [<ffffffff8022a7a8>] ? do_page_fault+0x368/0x970
> [ 79.510002] [<ffffffff80238b7d>] default_wake_function+0xd/0x10
> [ 79.510002] [<ffffffff8025a751>] autoremove_wake_function+0x11/0x40
> [ 79.510002] [<ffffffff804cc70f>] ? ata_scsi_qc_complete+0x1df/0x4c0
> [ 79.510002] [<ffffffff8065d1ef>] ? _spin_unlock_irqrestore+0x2f/0x40
> [ 79.510002] [<ffffffff8026b02c>] ? generic_smp_call_function_interrupt+0xec/0x100
> [ 79.510002] [<ffffffff8065cddd>] ? trace_hardirqs_off_thunk+0x3a/0x6c
> [ 79.510002] [<ffffffff8026af40>] ? generic_smp_call_function_interrupt+0x0/0x100
> [ 79.510002] [<ffffffff8026b02c>] ? generic_smp_call_function_interrupt+0xec/0x100
> [ 79.510002] [<ffffffff8065d54f>] ? page_fault+0x1f/0x30
> [ 79.510002] [<ffffffff8026b02c>] ? generic_smp_call_function_interrupt+0xec/0x100
> [ 79.510002] [<ffffffff8026af40>] ? generic_smp_call_function_interrupt+0x0/0x100
> [ 79.510002] [<ffffffff8024402c>] ? warn_slowpath+0x4c/0x130
> [ 79.510002] [<ffffffff804b8f85>] ? scsi_next_command+0x45/0x60
> [ 79.510002] [<ffffffff804b9bd6>] ? scsi_io_completion+0x376/0x4e0
> [ 79.510002] [<ffffffff804b2f6c>] ? scsi_finish_command+0xac/0xe0
> [ 79.510002] [<ffffffff804b9e08>] ? scsi_softirq_done+0xb8/0x140
> [ 79.510002] [<ffffffff8025d360>] ? __remove_hrtimer+0x40/0xa0
> [ 79.510002] [<ffffffff8026b02c>] ? generic_smp_call_function_interrupt+0xec/0x100
> [ 79.510002] [<ffffffff8021e54f>] ? smp_call_function_interrupt+0x1f/0x30
> [ 79.510002] [<ffffffff8020c863>] ? call_function_interrupt+0x13/0x20
> [ 79.510002] <EOI> <0>Code: Bad RIP value.
> [ 79.510002] RIP [<0000000000000003>] 0x3
> [ 79.510002] RSP <ffffffff809a8b18>
> [ 79.510002] CR2: 0000000000000003
> [ 79.510002] ---[ end trace 99e686e29f771a49 ]---
> [ 79.510002] Kernel panic - not syncing: Fatal exception in interrupt
> [ 79.510002] ------------[ cut here ]------------

hm, that looks like a really nasty crash. Linus, you might want to defer this
pull. The PAT folks are Cc:-ed.

Ingo

2009-01-12 18:21:11

by Pallipadi, Venkatesh

Subject: RE: [git pull] x86 fixes



>-----Original Message-----
>From: [email protected]
>[mailto:[email protected]] On Behalf Of Torsten Kaiser
>Sent: Sunday, January 11, 2009 8:45 AM
>To: Ingo Molnar
>Cc: Linus Torvalds; [email protected]; Andrew Morton; Thomas Gleixner; H. Peter Anvin
>Subject: Re: [git pull] x86 fixes
>
>On Sun, Jan 11, 2009 at 3:39 PM, Ingo Molnar <[email protected]> wrote:
>
>Something is (very) wrong with one(?) of these patches.
>
>After upgrading from 2.6.28 to 2.6.29-rc1 I lost direct rendering.
>Each time I tried to start a program that uses DRM, I got this in the
>syslog and the program fell back to Mesa software rendering:
>Jan 11 13:32:31 treogen [ 77.167977] X:3280 map pfn expected mapping type uncached-minus for e0000000-e7ff8000, got write-combining
>Jan 11 13:32:31 treogen [ 77.173620] X:3280 freeing invalid memtype e0000000-e7ff8000
>Jan 11 13:34:51 treogen [ 217.861668] glxinfo:3492 map pfn expected mapping type uncached-minus for e0000000-e7ff8000, got write-combining
>Jan 11 13:34:51 treogen [ 217.867220] glxinfo:3492 freeing invalid memtype e0000000-e7ff8000
>Jan 11 13:35:23 treogen [ 249.771043] glxinfo:3494 map pfn expected mapping type uncached-minus for e0000000-e7ff8000, got write-combining
>Jan 11 13:35:23 treogen [ 249.776589] glxinfo:3494 freeing invalid memtype e0000000-e7ff8000
>
>Otherwise 2.6.29-rc1 worked for me. Even booting with 'fastboot' did
>not result in any problems, but it did cut the in-kernel-time down
>from ~12 sec to ~6 sec.
>
>Hoping to fix this memtype problem I applied the patch from the pull
>request to 29-rc1 and rebooted. Now the system completely locks up
>when X is trying to start.
>Via serial console I got this Oops:
>[ 79.500149] BUG: unable to handle kernel NULL pointer dereference at 0000000000000003
>[ 79.509240] IP: [<0000000000000003>] 0x3
>[ 79.510002] PGD 0
>[ 79.510002] Oops: 0010 [#1] SMP
>[ 79.510002] last sysfs file: /sys/devices/pci0000:00/0000:00:0f.0/0000:01:00.0/enable
>[ 79.510002] CPU 0
>[ 79.510002] Modules linked in: w83792d tuner tea5767 tda8290 tuner_xc2028 xc5000 tda9887 tuner_simple tuner_types mt20xx tea5761 tvaudio msp3400 bttv ir_common v4l2_common videodev v4l1_compat v4l2_compat_ioctl32 usbhid videobuf_dma_sg videobuf_core hid btcx_risc tveeprom sg pata_amd
>[ 79.510002] Pid: 0, comm: swapper Not tainted 2.6.29-rc1 #2
>[ 79.510002] RIP: 0010:[<0000000000000003>] [<0000000000000003>] 0x3
>[ 79.510002] RSP: 0018:ffffffff809a8b18 EFLAGS: 00010002
>[ 79.510002] RAX: 0000000000000001 RBX: ffffffff00000000 RCX: 0000000000000000
>[ 79.510002] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff809a8ca8
>[ 79.510002] RBP: ffffffff809a8b18 R08: 0000000000000001 R09: 0000000000000100
>[ 79.510002] R10: ffffffff8026af40 R11: 00000000000068d8 R12: 0000000000000000
>[ 79.510002] R13: ffff88007e4fd700 R14: ffff880028018d00 R15: ffffffff809a8aa8
>[ 79.510002] FS: 00007ff217e406f0(0000) GS:ffffffff809b1040(0000) knlGS:0000000000000000
>[ 79.510002] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>[ 79.510002] CR2: 0000000000000003 CR3: 0000000000201000 CR4: 00000000000006e0
>[ 79.510002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>[ 79.510002] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
>[ 79.510002] Process swapper (pid: 0, threadinfo ffffffff8087e000, task ffffffff807de360)
>[ 79.510002] Stack:
>[ 79.510002] ffffffff809a8b68 ffffffff802389d7 0000000000000000 ffffffff809a8b60
>[ 79.510002] 0000000000000082 ffffffff8022a7a8 0000000000000000 0000000000000001
>[ 79.510002] 0000000000000060 ffffffff807de360 ffffffff809a8b78 ffffffff80238b7d
>[ 79.510002] Call Trace:
>[ 79.510002] Call Trace:
>[ 79.510002] <IRQ> <0> [<ffffffff802389d7>] try_to_wake_up+0x137/0x2d0
>[ 79.510002] [<ffffffff8022a7a8>] ? do_page_fault+0x368/0x970
>[ 79.510002] [<ffffffff80238b7d>] default_wake_function+0xd/0x10
>[ 79.510002] [<ffffffff8025a751>] autoremove_wake_function+0x11/0x40
>[ 79.510002] [<ffffffff804cc70f>] ? ata_scsi_qc_complete+0x1df/0x4c0
>[ 79.510002] [<ffffffff8065d1ef>] ? _spin_unlock_irqrestore+0x2f/0x40
>[ 79.510002] [<ffffffff8026b02c>] ? generic_smp_call_function_interrupt+0xec/0x100
>[ 79.510002] [<ffffffff8065cddd>] ? trace_hardirqs_off_thunk+0x3a/0x6c
>[ 79.510002] [<ffffffff8026af40>] ? generic_smp_call_function_interrupt+0x0/0x100
>[ 79.510002] [<ffffffff8026b02c>] ? generic_smp_call_function_interrupt+0xec/0x100
>[ 79.510002] [<ffffffff8065d54f>] ? page_fault+0x1f/0x30
>[ 79.510002] [<ffffffff8026b02c>] ? generic_smp_call_function_interrupt+0xec/0x100
>[ 79.510002] [<ffffffff8026af40>] ? generic_smp_call_function_interrupt+0x0/0x100
>[ 79.510002] [<ffffffff8024402c>] ? warn_slowpath+0x4c/0x130
>[ 79.510002] [<ffffffff804b8f85>] ? scsi_next_command+0x45/0x60
>[ 79.510002] [<ffffffff804b9bd6>] ? scsi_io_completion+0x376/0x4e0
>[ 79.510002] [<ffffffff804b2f6c>] ? scsi_finish_command+0xac/0xe0
>[ 79.510002] [<ffffffff804b9e08>] ? scsi_softirq_done+0xb8/0x140
>[ 79.510002] [<ffffffff8025d360>] ? __remove_hrtimer+0x40/0xa0
>[ 79.510002] [<ffffffff8026b02c>] ? generic_smp_call_function_interrupt+0xec/0x100
>[ 79.510002] [<ffffffff8021e54f>] ? smp_call_function_interrupt+0x1f/0x30
>[ 79.510002] [<ffffffff8020c863>] ? call_function_interrupt+0x13/0x20
>[ 79.510002] <EOI> <0>Code: Bad RIP value.
>[ 79.510002] RIP [<0000000000000003>] 0x3
>[ 79.510002] RSP <ffffffff809a8b18>
>[ 79.510002] CR2: 0000000000000003
>[ 79.510002] ---[ end trace 99e686e29f771a49 ]---
>[ 79.510002] Kernel panic - not syncing: Fatal exception in interrupt
>[ 79.510002] ------------[ cut here ]------------
>
>last sysfs file: /sys/devices/pci0000:00/0000:00:0f.0/0000:01:00.0/enable
>
>lspci -t:
>-[0000:00]-+-00.0
> [snip]
> +-0f.0-[0000:01]--+-00.0
> | \-00.1
>lspci:
>00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
>01:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60 [Radeon X300 (PCIE)]
>01:00.1 Display controller: ATI Technologies Inc RV370 [Radeon X300SE]
>
>Userspace is xorg-server-1.5.3 with mesa-7.3_rc1 and
>xf86-video-ati-6.9.0.
>With 2.6.28 this combination works for accelerated direct rendering.
>PAT was enabled on 2.6.28 and both vanilla 2.6.29-rc1 and the
>patched -rc1.
>
>Just ask if you need more information or if you have a patch to try.
>

Torsten,

I don't seem to be able to reproduce this failure on my test systems.
What distribution are you using here? Can you send me the kernel config that you used?

Thanks,
Venki

2009-01-12 19:02:21

by Torsten Kaiser

Subject: Re: [git pull] x86 fixes

On Mon, Jan 12, 2009 at 7:17 PM, Pallipadi, Venkatesh
<[email protected]> wrote:
> Torsten,
>
> I don't seem to be able to reproduce this failure on my test systems.
> What distribution are you using here? Can you send me the kernel config that you used?

I'm using Gentoo, the compiler is:
gcc (Gentoo 4.3.2-r2 p1.5, pie-10.1.5) 4.3.2

The system has 2x 2218 Opterons with 4GB of RAM, so it's a NUMA system
with 2 nodes.
What might be important is that I switched to the new TREE_RCU:
# CONFIG_CLASSIC_RCU is not set
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_FANOUT=4
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_PREEMPT_RCU_TRACE is not set

The rest of the .config is attached. I used the same .config both for
vanilla 2.6.29-rc1, which worked apart from the DRM trouble that others
have also reported, and for the version patched with these fixes.

HTH

Torsten


Attachments:
(No filename) (5.18 kB)
config.txt (59.02 kB)

2009-01-12 19:19:49

by Pallipadi, Venkatesh

Subject: Re: [git pull] x86 fixes

On Mon, Jan 12, 2009 at 11:01:57AM -0800, Torsten Kaiser wrote:
> On Mon, Jan 12, 2009 at 7:17 PM, Pallipadi, Venkatesh
> <[email protected]> wrote:
> >
> > I don't seem to be able to reproduce this failure on my test systems.
> > What distribution are you using here? Can you send me the kernel config that you used?
>
> I'm using Gentoo, the compiler is:
> gcc (Gentoo 4.3.2-r2 p1.5, pie-10.1.5) 4.3.2
>
> The system has 2x 2218 Opterons with 4GB of RAM, so it's a NUMA system
> with 2 nodes.
> What might be important is that I switched to the new TREE_RCU:
> # CONFIG_CLASSIC_RCU is not set
> CONFIG_TREE_RCU=y
> # CONFIG_PREEMPT_RCU is not set
> # CONFIG_RCU_TRACE is not set
> CONFIG_RCU_FANOUT=4
> # CONFIG_RCU_FANOUT_EXACT is not set
> # CONFIG_TREE_RCU_TRACE is not set
> # CONFIG_PREEMPT_RCU_TRACE is not set
>
> The rest of the .config is attached. I used the same .config both for
> vanilla 2.6.29-rc1, which worked apart from the DRM trouble that others
> have also reported, and for the version patched with these fixes.
>

I will try with this config. Meanwhile, can you try the single patch below
over 2.6.29-rc1 and see whether you still see the failure? This patch
fixes the DRM issue that you had seen and does not include the other fixes and
cleanups that were in the patch series. If you still see the failure, can you
send me the full boot log from the crash?

Thanks,
Venki

Signed-off-by: Venkatesh Pallipadi <[email protected]>

---
arch/x86/mm/pat.c | 50 ++++++++++++++++++++++++++++++++++----------------
mm/memory.c | 15 +++++++++++----
2 files changed, 45 insertions(+), 20 deletions(-)

Index: linux-2.6/arch/x86/mm/pat.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/pat.c 2009-01-12 10:45:03.000000000 -0800
+++ linux-2.6/arch/x86/mm/pat.c 2009-01-12 11:06:43.000000000 -0800
@@ -601,12 +601,13 @@ void unmap_devmem(unsigned long pfn, uns
* Reserved non RAM regions only and after successful reserve_memtype,
* this func also keeps identity mapping (if any) in sync with this new prot.
*/
-static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t vma_prot)
+static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
+ int strict_prot)
{
int is_ram = 0;
int id_sz, ret;
unsigned long flags;
- unsigned long want_flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK);
+ unsigned long want_flags = (pgprot_val(*vma_prot) & _PAGE_CACHE_MASK);

is_ram = pagerange_is_ram(paddr, paddr + size);

@@ -625,15 +626,29 @@ static int reserve_pfn_range(u64 paddr,
return ret;

if (flags != want_flags) {
- free_memtype(paddr, paddr + size);
- printk(KERN_ERR
- "%s:%d map pfn expected mapping type %s for %Lx-%Lx, got %s\n",
- current->comm, current->pid,
- cattr_name(want_flags),
- (unsigned long long)paddr,
- (unsigned long long)(paddr + size),
- cattr_name(flags));
- return -EINVAL;
+ if (strict_prot ||
+ (want_flags == _PAGE_CACHE_UC_MINUS &&
+ flags == _PAGE_CACHE_WB) ||
+ (want_flags == _PAGE_CACHE_WC &&
+ flags == _PAGE_CACHE_WB)) {
+ free_memtype(paddr, paddr + size);
+ printk(KERN_ERR "%s:%d map pfn expected mapping type %s"
+ " for %Lx-%Lx, got %s\n",
+ current->comm, current->pid,
+ cattr_name(want_flags),
+ (unsigned long long)paddr,
+ (unsigned long long)(paddr + size),
+ cattr_name(flags));
+ return -EINVAL;
+ }
+ /*
+ * We allow returning different type than the one requested in
+ * non strict case.
+ */
+ *vma_prot = __pgprot((pgprot_val(*vma_prot) &
+ (~_PAGE_CACHE_MASK)) |
+ flags);
+
}

/* Need to keep identity mapping in sync */
@@ -689,6 +704,7 @@ int track_pfn_vma_copy(struct vm_area_st
unsigned long vma_start = vma->vm_start;
unsigned long vma_end = vma->vm_end;
unsigned long vma_size = vma_end - vma_start;
+ pgprot_t pgprot;

if (!pat_enabled)
return 0;
@@ -702,7 +718,8 @@ int track_pfn_vma_copy(struct vm_area_st
WARN_ON_ONCE(1);
return -EINVAL;
}
- return reserve_pfn_range(paddr, vma_size, __pgprot(prot));
+ pgprot = __pgprot(prot);
+ return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
}

/* reserve entire vma page by page, using pfn and prot from pte */
@@ -710,7 +727,8 @@ int track_pfn_vma_copy(struct vm_area_st
if (follow_phys(vma, vma_start + i, 0, &prot, &paddr))
continue;

- retval = reserve_pfn_range(paddr, PAGE_SIZE, __pgprot(prot));
+ pgprot = __pgprot(prot);
+ retval = reserve_pfn_range(paddr, PAGE_SIZE, &pgprot, 1);
if (retval)
goto cleanup_ret;
}
@@ -741,7 +759,7 @@ cleanup_ret:
* Note that this function can be called with caller trying to map only a
* subrange/page inside the vma.
*/
-int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
unsigned long pfn, unsigned long size)
{
int retval = 0;
@@ -758,14 +776,14 @@ int track_pfn_vma_new(struct vm_area_str
if (is_linear_pfn_mapping(vma)) {
/* reserve the whole chunk starting from vm_pgoff */
paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
- return reserve_pfn_range(paddr, vma_size, prot);
+ return reserve_pfn_range(paddr, vma_size, prot, 0);
}

/* reserve page by page using pfn and size */
base_paddr = (resource_size_t)pfn << PAGE_SHIFT;
for (i = 0; i < size; i += PAGE_SIZE) {
paddr = base_paddr + i;
- retval = reserve_pfn_range(paddr, PAGE_SIZE, prot);
+ retval = reserve_pfn_range(paddr, PAGE_SIZE, prot, 0);
if (retval)
goto cleanup_ret;
}
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c 2009-01-12 10:45:03.000000000 -0800
+++ linux-2.6/mm/memory.c 2009-01-12 10:59:30.000000000 -0800
@@ -1511,6 +1511,7 @@ int vm_insert_pfn(struct vm_area_struct
unsigned long pfn)
{
int ret;
+ pgprot_t pgprot = vma->vm_page_prot;
/*
* Technically, architectures with pte_special can avoid all these
* restrictions (same for remap_pfn_range). However we would like
@@ -1525,10 +1526,10 @@ int vm_insert_pfn(struct vm_area_struct

if (addr < vma->vm_start || addr >= vma->vm_end)
return -EFAULT;
- if (track_pfn_vma_new(vma, vma->vm_page_prot, pfn, PAGE_SIZE))
+ if (track_pfn_vma_new(vma, &pgprot, pfn, PAGE_SIZE))
return -EINVAL;

- ret = insert_pfn(vma, addr, pfn, vma->vm_page_prot);
+ ret = insert_pfn(vma, addr, pfn, pgprot);

if (ret)
untrack_pfn_vma(vma, pfn, PAGE_SIZE);
@@ -1671,9 +1672,15 @@ int remap_pfn_range(struct vm_area_struc

vma->vm_flags |= VM_IO | VM_RESERVED | VM_PFNMAP;

- err = track_pfn_vma_new(vma, prot, pfn, PAGE_ALIGN(size));
- if (err)
+ err = track_pfn_vma_new(vma, &prot, pfn, PAGE_ALIGN(size));
+ if (err) {
+ /*
+ * To indicate that track_pfn related cleanup is not
+ * needed from higher level routine calling unmap_vmas
+ */
+ vma->vm_flags &= ~(VM_IO | VM_RESERVED | VM_PFNMAP);
return -EINVAL;
+ }

BUG_ON(addr >= end);
pfn -= addr >> PAGE_SHIFT;

2009-01-12 19:29:39

by Pallipadi, Venkatesh

Subject: Re: [git pull] x86 fixes

On Mon, Jan 12, 2009 at 11:19:35AM -0800, Pallipadi, Venkatesh wrote:
> On Mon, Jan 12, 2009 at 11:01:57AM -0800, Torsten Kaiser wrote:
> > On Mon, Jan 12, 2009 at 7:17 PM, Pallipadi, Venkatesh
> > <[email protected]> wrote:
> > >
> > > I don't seem to be able to reproduce this failure on my test systems.
> > > What distribution are you using here? Can you send me the kernel config that you used?
> >
> > I'm using Gentoo, the compiler is:
> > gcc (Gentoo 4.3.2-r2 p1.5, pie-10.1.5) 4.3.2
> >
> > The system has 2x 2218 Opterons with 4GB of RAM, so it's a NUMA system
> > with 2 nodes.
> > What might be important is that I switched to the new TREE_RCU:
> > # CONFIG_CLASSIC_RCU is not set
> > CONFIG_TREE_RCU=y
> > # CONFIG_PREEMPT_RCU is not set
> > # CONFIG_RCU_TRACE is not set
> > CONFIG_RCU_FANOUT=4
> > # CONFIG_RCU_FANOUT_EXACT is not set
> > # CONFIG_TREE_RCU_TRACE is not set
> > # CONFIG_PREEMPT_RCU_TRACE is not set
> >
> > The rest of the .config is attached. I used the same .config both for
> > vanilla 2.6.29-rc1, which worked apart from the DRM trouble that others
> > have also reported, and for the version patched with these fixes.
> >
>
> I will try with this config. Meanwhile, can you try the single patch below
> over 2.6.29-rc1 and see whether you still see the failure? This patch
> fixes the DRM issue that you had seen and does not include the other fixes and
> cleanups that were in the patch series. If you still see the failure, can you
> send me the full boot log from the crash?
>

Oops. I left one file out of the earlier test patch. Below is the
updated test patch, which applies against 29-rc1.

Thanks,
Venki

Signed-off-by: Venkatesh Pallipadi <[email protected]>

---

Index: linux-2.6/arch/x86/mm/pat.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/pat.c 2009-01-12 10:45:03.000000000 -0800
+++ linux-2.6/arch/x86/mm/pat.c 2009-01-12 11:06:43.000000000 -0800
@@ -601,12 +601,13 @@ void unmap_devmem(unsigned long pfn, uns
* Reserved non RAM regions only and after successful reserve_memtype,
* this func also keeps identity mapping (if any) in sync with this new prot.
*/
-static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t vma_prot)
+static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
+ int strict_prot)
{
int is_ram = 0;
int id_sz, ret;
unsigned long flags;
- unsigned long want_flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK);
+ unsigned long want_flags = (pgprot_val(*vma_prot) & _PAGE_CACHE_MASK);

is_ram = pagerange_is_ram(paddr, paddr + size);

@@ -625,15 +626,29 @@ static int reserve_pfn_range(u64 paddr,
return ret;

if (flags != want_flags) {
- free_memtype(paddr, paddr + size);
- printk(KERN_ERR
- "%s:%d map pfn expected mapping type %s for %Lx-%Lx, got %s\n",
- current->comm, current->pid,
- cattr_name(want_flags),
- (unsigned long long)paddr,
- (unsigned long long)(paddr + size),
- cattr_name(flags));
- return -EINVAL;
+ if (strict_prot ||
+ (want_flags == _PAGE_CACHE_UC_MINUS &&
+ flags == _PAGE_CACHE_WB) ||
+ (want_flags == _PAGE_CACHE_WC &&
+ flags == _PAGE_CACHE_WB)) {
+ free_memtype(paddr, paddr + size);
+ printk(KERN_ERR "%s:%d map pfn expected mapping type %s"
+ " for %Lx-%Lx, got %s\n",
+ current->comm, current->pid,
+ cattr_name(want_flags),
+ (unsigned long long)paddr,
+ (unsigned long long)(paddr + size),
+ cattr_name(flags));
+ return -EINVAL;
+ }
+ /*
+ * We allow returning different type than the one requested in
+ * non strict case.
+ */
+ *vma_prot = __pgprot((pgprot_val(*vma_prot) &
+ (~_PAGE_CACHE_MASK)) |
+ flags);
+
}

/* Need to keep identity mapping in sync */
@@ -689,6 +704,7 @@ int track_pfn_vma_copy(struct vm_area_st
unsigned long vma_start = vma->vm_start;
unsigned long vma_end = vma->vm_end;
unsigned long vma_size = vma_end - vma_start;
+ pgprot_t pgprot;

if (!pat_enabled)
return 0;
@@ -702,7 +718,8 @@ int track_pfn_vma_copy(struct vm_area_st
WARN_ON_ONCE(1);
return -EINVAL;
}
- return reserve_pfn_range(paddr, vma_size, __pgprot(prot));
+ pgprot = __pgprot(prot);
+ return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
}

/* reserve entire vma page by page, using pfn and prot from pte */
@@ -710,7 +727,8 @@ int track_pfn_vma_copy(struct vm_area_st
if (follow_phys(vma, vma_start + i, 0, &prot, &paddr))
continue;

- retval = reserve_pfn_range(paddr, PAGE_SIZE, __pgprot(prot));
+ pgprot = __pgprot(prot);
+ retval = reserve_pfn_range(paddr, PAGE_SIZE, &pgprot, 1);
if (retval)
goto cleanup_ret;
}
@@ -741,7 +759,7 @@ cleanup_ret:
* Note that this function can be called with caller trying to map only a
* subrange/page inside the vma.
*/
-int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
unsigned long pfn, unsigned long size)
{
int retval = 0;
@@ -758,14 +776,14 @@ int track_pfn_vma_new(struct vm_area_str
if (is_linear_pfn_mapping(vma)) {
/* reserve the whole chunk starting from vm_pgoff */
paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
- return reserve_pfn_range(paddr, vma_size, prot);
+ return reserve_pfn_range(paddr, vma_size, prot, 0);
}

/* reserve page by page using pfn and size */
base_paddr = (resource_size_t)pfn << PAGE_SHIFT;
for (i = 0; i < size; i += PAGE_SIZE) {
paddr = base_paddr + i;
- retval = reserve_pfn_range(paddr, PAGE_SIZE, prot);
+ retval = reserve_pfn_range(paddr, PAGE_SIZE, prot, 0);
if (retval)
goto cleanup_ret;
}
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c 2009-01-12 10:45:03.000000000 -0800
+++ linux-2.6/mm/memory.c 2009-01-12 10:59:30.000000000 -0800
@@ -1511,6 +1511,7 @@ int vm_insert_pfn(struct vm_area_struct
unsigned long pfn)
{
int ret;
+ pgprot_t pgprot = vma->vm_page_prot;
/*
* Technically, architectures with pte_special can avoid all these
* restrictions (same for remap_pfn_range). However we would like
@@ -1525,10 +1526,10 @@ int vm_insert_pfn(struct vm_area_struct

if (addr < vma->vm_start || addr >= vma->vm_end)
return -EFAULT;
- if (track_pfn_vma_new(vma, vma->vm_page_prot, pfn, PAGE_SIZE))
+ if (track_pfn_vma_new(vma, &pgprot, pfn, PAGE_SIZE))
return -EINVAL;

- ret = insert_pfn(vma, addr, pfn, vma->vm_page_prot);
+ ret = insert_pfn(vma, addr, pfn, pgprot);

if (ret)
untrack_pfn_vma(vma, pfn, PAGE_SIZE);
@@ -1671,9 +1672,15 @@ int remap_pfn_range(struct vm_area_struc

vma->vm_flags |= VM_IO | VM_RESERVED | VM_PFNMAP;

- err = track_pfn_vma_new(vma, prot, pfn, PAGE_ALIGN(size));
- if (err)
+ err = track_pfn_vma_new(vma, &prot, pfn, PAGE_ALIGN(size));
+ if (err) {
+ /*
+ * To indicate that track_pfn related cleanup is not
+ * needed from higher level routine calling unmap_vmas
+ */
+ vma->vm_flags &= ~(VM_IO | VM_RESERVED | VM_PFNMAP);
return -EINVAL;
+ }

BUG_ON(addr >= end);
pfn -= addr >> PAGE_SHIFT;
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 72ebe91..8e6d0ca 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -301,7 +301,7 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm,
* track_pfn_vma_new is called when a _new_ pfn mapping is being established
* for physical range indicated by pfn and size.
*/
-static inline int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+static inline int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
unsigned long pfn, unsigned long size)
{
return 0;
@@ -332,7 +332,7 @@ static inline void untrack_pfn_vma(struct vm_area_struct *vma,
{
}
#else
-extern int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+extern int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
unsigned long pfn, unsigned long size);
extern int track_pfn_vma_copy(struct vm_area_struct *vma);
extern void untrack_pfn_vma(struct vm_area_struct *vma, unsigned long pfn,

2009-01-12 19:48:17

by Linus Torvalds

Subject: Re: [git pull] x86 fixes



On Mon, 12 Jan 2009, Pallipadi, Venkatesh wrote:
> + if (strict_prot ||
> + (want_flags == _PAGE_CACHE_UC_MINUS &&
> + flags == _PAGE_CACHE_WB) ||
> + (want_flags == _PAGE_CACHE_WC &&
> + flags == _PAGE_CACHE_WB)) {

Please don't write code like this.

Do it as an inline function that returns true/false and has comments on
what the hell is going on.

If a conditional doesn't fit on one line, it should generally be
abstracted away into a readable function where the name explains what it
does conceptually.

Linus

2009-01-12 19:54:52

by Pallipadi, Venkatesh

Subject: Re: [git pull] x86 fixes

On Mon, Jan 12, 2009 at 11:47:13AM -0800, Linus Torvalds wrote:
>
>
> On Mon, 12 Jan 2009, Pallipadi, Venkatesh wrote:
> > + if (strict_prot ||
> > + (want_flags == _PAGE_CACHE_UC_MINUS &&
> > + flags == _PAGE_CACHE_WB) ||
> > + (want_flags == _PAGE_CACHE_WC &&
> > + flags == _PAGE_CACHE_WB)) {
>
> Please don't write code like this.
>
> Do it as an inline function that returns true/false and has comments on
> what the hell is going on.
>
> If a conditional doesn't fit on one line, it should generally be
> abstracted away into a readable function where the name explains what it
> does conceptually.
>

Yes. The actual patch that is lined up in the tip fixes branch does put this
in a macro, is_new_memtype_allowed(), shared by the two callers and commented.
I wanted to keep the changes in this test patch small, since it is only meant
to root-cause this particular crash, and ended up with the code above.

Thanks,
Venki

2009-01-12 20:05:42

by Torsten Kaiser

Subject: Re: [git pull] x86 fixes

On Mon, Jan 12, 2009 at 8:29 PM, Pallipadi, Venkatesh
<[email protected]> wrote:
> Oops. I left one file out of the earlier test patch. Below is the
> updated test patch, which applies against 29-rc1.
>
> Thanks,
> Venki
>
> Signed-off-by: Venkatesh Pallipadi <[email protected]>

Tested-by: Torsten Kaiser <[email protected]>

The system boots normally and glxgears is accelerated again.

> ---
>
> Index: linux-2.6/arch/x86/mm/pat.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/pat.c 2009-01-12 10:45:03.000000000 -0800
> +++ linux-2.6/arch/x86/mm/pat.c 2009-01-12 11:06:43.000000000 -0800
> @@ -601,12 +601,13 @@ void unmap_devmem(unsigned long pfn, uns
> * Reserved non RAM regions only and after successful reserve_memtype,
> * this func also keeps identity mapping (if any) in sync with this new prot.
> */
> -static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t vma_prot)
> +static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
> + int strict_prot)
> {
> int is_ram = 0;
> int id_sz, ret;
> unsigned long flags;
> - unsigned long want_flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK);
> + unsigned long want_flags = (pgprot_val(*vma_prot) & _PAGE_CACHE_MASK);
>
> is_ram = pagerange_is_ram(paddr, paddr + size);
>
> @@ -625,15 +626,29 @@ static int reserve_pfn_range(u64 paddr,
> return ret;
>
> if (flags != want_flags) {
> - free_memtype(paddr, paddr + size);
> - printk(KERN_ERR
> - "%s:%d map pfn expected mapping type %s for %Lx-%Lx, got %s\n",
> - current->comm, current->pid,
> - cattr_name(want_flags),
> - (unsigned long long)paddr,
> - (unsigned long long)(paddr + size),
> - cattr_name(flags));
> - return -EINVAL;
> + if (strict_prot ||
> + (want_flags == _PAGE_CACHE_UC_MINUS &&
> + flags == _PAGE_CACHE_WB) ||
> + (want_flags == _PAGE_CACHE_WC &&
> + flags == _PAGE_CACHE_WB)) {
> + free_memtype(paddr, paddr + size);
> + printk(KERN_ERR "%s:%d map pfn expected mapping type %s"
> + " for %Lx-%Lx, got %s\n",
> + current->comm, current->pid,
> + cattr_name(want_flags),
> + (unsigned long long)paddr,
> + (unsigned long long)(paddr + size),
> + cattr_name(flags));
> + return -EINVAL;
> + }
> + /*
> + * We allow returning different type than the one requested in
> + * non strict case.
> + */
> + *vma_prot = __pgprot((pgprot_val(*vma_prot) &
> + (~_PAGE_CACHE_MASK)) |
> + flags);
> +
> }
>
> /* Need to keep identity mapping in sync */
> @@ -689,6 +704,7 @@ int track_pfn_vma_copy(struct vm_area_st
> unsigned long vma_start = vma->vm_start;
> unsigned long vma_end = vma->vm_end;
> unsigned long vma_size = vma_end - vma_start;
> + pgprot_t pgprot;
>
> if (!pat_enabled)
> return 0;
> @@ -702,7 +718,8 @@ int track_pfn_vma_copy(struct vm_area_st
> WARN_ON_ONCE(1);
> return -EINVAL;
> }
> - return reserve_pfn_range(paddr, vma_size, __pgprot(prot));
> + pgprot = __pgprot(prot);
> + return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
> }
>
> /* reserve entire vma page by page, using pfn and prot from pte */
> @@ -710,7 +727,8 @@ int track_pfn_vma_copy(struct vm_area_st
> if (follow_phys(vma, vma_start + i, 0, &prot, &paddr))
> continue;
>
> - retval = reserve_pfn_range(paddr, PAGE_SIZE, __pgprot(prot));
> + pgprot = __pgprot(prot);
> + retval = reserve_pfn_range(paddr, PAGE_SIZE, &pgprot, 1);
> if (retval)
> goto cleanup_ret;
> }
> @@ -741,7 +759,7 @@ cleanup_ret:
> * Note that this function can be called with caller trying to map only a
> * subrange/page inside the vma.
> */
> -int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
> +int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
> unsigned long pfn, unsigned long size)
> {
> int retval = 0;
> @@ -758,14 +776,14 @@ int track_pfn_vma_new(struct vm_area_str
> if (is_linear_pfn_mapping(vma)) {
> /* reserve the whole chunk starting from vm_pgoff */
> paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
> - return reserve_pfn_range(paddr, vma_size, prot);
> + return reserve_pfn_range(paddr, vma_size, prot, 0);
> }
>
> /* reserve page by page using pfn and size */
> base_paddr = (resource_size_t)pfn << PAGE_SHIFT;
> for (i = 0; i < size; i += PAGE_SIZE) {
> paddr = base_paddr + i;
> - retval = reserve_pfn_range(paddr, PAGE_SIZE, prot);
> + retval = reserve_pfn_range(paddr, PAGE_SIZE, prot, 0);
> if (retval)
> goto cleanup_ret;
> }
> Index: linux-2.6/mm/memory.c
> ===================================================================
> --- linux-2.6.orig/mm/memory.c 2009-01-12 10:45:03.000000000 -0800
> +++ linux-2.6/mm/memory.c 2009-01-12 10:59:30.000000000 -0800
> @@ -1511,6 +1511,7 @@ int vm_insert_pfn(struct vm_area_struct
> unsigned long pfn)
> {
> int ret;
> + pgprot_t pgprot = vma->vm_page_prot;
> /*
> * Technically, architectures with pte_special can avoid all these
> * restrictions (same for remap_pfn_range). However we would like
> @@ -1525,10 +1526,10 @@ int vm_insert_pfn(struct vm_area_struct
>
> if (addr < vma->vm_start || addr >= vma->vm_end)
> return -EFAULT;
> - if (track_pfn_vma_new(vma, vma->vm_page_prot, pfn, PAGE_SIZE))
> + if (track_pfn_vma_new(vma, &pgprot, pfn, PAGE_SIZE))
> return -EINVAL;
>
> - ret = insert_pfn(vma, addr, pfn, vma->vm_page_prot);
> + ret = insert_pfn(vma, addr, pfn, pgprot);
>
> if (ret)
> untrack_pfn_vma(vma, pfn, PAGE_SIZE);
> @@ -1671,9 +1672,15 @@ int remap_pfn_range(struct vm_area_struc
>
> vma->vm_flags |= VM_IO | VM_RESERVED | VM_PFNMAP;
>
> - err = track_pfn_vma_new(vma, prot, pfn, PAGE_ALIGN(size));
> - if (err)
> + err = track_pfn_vma_new(vma, &prot, pfn, PAGE_ALIGN(size));
> + if (err) {
> + /*
> + * To indicate that track_pfn related cleanup is not
> + * needed from higher level routine calling unmap_vmas
> + */
> + vma->vm_flags &= ~(VM_IO | VM_RESERVED | VM_PFNMAP);
> return -EINVAL;
> + }
>
> BUG_ON(addr >= end);
> pfn -= addr >> PAGE_SHIFT;
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index 72ebe91..8e6d0ca 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -301,7 +301,7 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm,
> * track_pfn_vma_new is called when a _new_ pfn mapping is being established
> * for physical range indicated by pfn and size.
> */
> -static inline int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
> +static inline int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
> unsigned long pfn, unsigned long size)
> {
> return 0;
> @@ -332,7 +332,7 @@ static inline void untrack_pfn_vma(struct vm_area_struct *vma,
> {
> }
> #else
> -extern int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
> +extern int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
> unsigned long pfn, unsigned long size);
> extern int track_pfn_vma_copy(struct vm_area_struct *vma);
> extern void untrack_pfn_vma(struct vm_area_struct *vma, unsigned long pfn,
>

2009-01-12 20:39:21

by Ingo Molnar

Subject: Re: [git pull] x86 fixes


* Pallipadi, Venkatesh <[email protected]> wrote:

> On Mon, Jan 12, 2009 at 11:47:13AM -0800, Linus Torvalds wrote:
> >
> >
> > On Mon, 12 Jan 2009, Pallipadi, Venkatesh wrote:
> > > + if (strict_prot ||
> > > + (want_flags == _PAGE_CACHE_UC_MINUS &&
> > > + flags == _PAGE_CACHE_WB) ||
> > > + (want_flags == _PAGE_CACHE_WC &&
> > > + flags == _PAGE_CACHE_WB)) {
> >
> > Please don't write code like this.
> >
> > Do it as an inline function that returns true/false and has comments on
> > what the hell is going on.
> >
> > If a conditional doesn't fit on one line, it should generally be
> > abstracted away into a readable function where the name explains what it
> > does conceptually.
> >
>
> Yes. The actual patch that is lined up in tip fixes indeed has this as a
> macro sharing this code with 2 callers and comment about this
> (is_new_memtype_allowed()). I wanted to keep the changes smaller in this
> test patch, which is just to root cause this particular crash and ended
> up with above code.

here are those 7 tip/x86/pat commits below, with changelogs.

Ingo

----------------------->
commit 4fa1489d2a74c1e3c6231f449d73ce46131523ae
Author: Suresh Siddha <[email protected]>
Date: Fri Jan 9 14:35:20 2009 -0800

x86, pat: fix reserve_memtype() for legacy 1MB range

Thierry Vignaud reported:
> http://bugzilla.kernel.org/show_bug.cgi?id=12372
>
> On P4 with an SiS motherboard (video card is a SiS 651)
> X server fails to start with error:
> xf86MapVidMem: Could not mmap framebuffer (0x00000000,0x2000) (Invalid
> argument)

Here X is trying to map the first 8KB of memory using /dev/mem. The existing
code treats 0-4KB as non-RAM and 4KB-8KB as RAM, and recent code changes do
not allow a single mapping to cover memory with different attributes (RAM
and non-RAM) at the same time.

Fix this by treating the legacy first 1MB region as special: always track
attribute requests within this region using the linear linked list,
regardless of whether the range is RAM, non-RAM or mixed.
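A minimal user-space sketch of the routing decision this patch introduces. It assumes pagerange_is_ram() reports 1 for an all-RAM range, 0 for no RAM and a negative value for a mixed range. LEGACY_END, enum tracker and pick_tracker() are hypothetical names used for illustration only, not kernel API.

```c
#include <stdint.h>

/* Assumed to match ISA_END_ADDRESS: the end of the legacy 1MB region. */
#define LEGACY_END 0x100000ULL

/* Hypothetical labels for where reserve_memtype() would track a request. */
enum tracker {
	TRACK_LINEAR_LIST,	/* generic memtype_list entry */
	TRACK_RAM_PAGES,	/* per-page RAM tracking (reserve_ram_pages_type) */
	TRACK_REJECT		/* mixed RAM/non-RAM range: -EINVAL */
};

/*
 * end:          exclusive end of the requested physical range
 * is_range_ram: assumed tri-state result, 1 = all RAM, 0 = no RAM, <0 = mixed
 */
static enum tracker pick_tracker(uint64_t end, int is_range_ram)
{
	/* Requests ending below 1MB skip the RAM check entirely. */
	if (end < LEGACY_END)
		return TRACK_LINEAR_LIST;
	if (is_range_ram == 1)
		return TRACK_RAM_PAGES;
	if (is_range_ram < 0)
		return TRACK_REJECT;
	return TRACK_LINEAR_LIST;
}
```

Under this rule the X server's 0x0-0x2000 request lands in the linear list even though its first page is non-RAM and its second is RAM; before the patch the mixed result would have been rejected.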

Reported-and-tested-by: Thierry Vignaud <[email protected]>
Signed-off-by: Suresh Siddha <[email protected]>
Signed-off-by: Venkatesh Pallipadi <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 160c42d..ec8cd49 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -333,11 +333,20 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
req_type & _PAGE_CACHE_MASK);
}

- is_range_ram = pagerange_is_ram(start, end);
- if (is_range_ram == 1)
- return reserve_ram_pages_type(start, end, req_type, new_type);
- else if (is_range_ram < 0)
- return -EINVAL;
+ /*
+ * For legacy reasons, some parts of the physical address range in the
+ * legacy 1MB region is treated as non-RAM (even when listed as RAM in
+ * the e820 tables). So we will track the memory attributes of this
+ * legacy 1MB region using the linear memtype_list always.
+ */
+ if (end >= ISA_END_ADDRESS) {
+ is_range_ram = pagerange_is_ram(start, end);
+ if (is_range_ram == 1)
+ return reserve_ram_pages_type(start, end, req_type,
+ new_type);
+ else if (is_range_ram < 0)
+ return -EINVAL;
+ }

new = kmalloc(sizeof(struct memtype), GFP_KERNEL);
if (!new)

commit 895252ccb3050383e1dcf2c2536065e346c2fa14
Author: [email protected] <[email protected]>
Date: Fri Jan 9 16:13:14 2009 -0800

x86 PAT: remove CPA WARN_ON for zero pte

Impact: reduce scope of debug check - avoid warnings

The logic that decides whether an identity mapping exists, based on
high_memory or max_low_pfn_mapped/max_pfn_mapped, is incomplete: memory
within that range may still be unmapped if there is an unusable hole in
e820.

Specifically, on my test system I started seeing these warnings with
tools like hwinfo and acpidump trying to map the ACPI region.

[ 27.400018] ------------[ cut here ]------------
[ 27.400344] WARNING: at /home/venkip/src/linus/linux-2.6/arch/x86/mm/pageattr.c:560 __change_page_attr_set_clr+0xf3/0x8b8()
[ 27.400821] Hardware name: X7DB8
[ 27.401070] CPA: called for zero pte. vaddr = ffff8800cff6a000 cpa->vaddr = ffff8800cff6a000
[ 27.401569] Modules linked in:
[ 27.401882] Pid: 4913, comm: dmidecode Not tainted 2.6.28-05716-gfe0bdec #586
[ 27.402141] Call Trace:
[ 27.402488] [<ffffffff80237c21>] warn_slowpath+0xd3/0x10f
[ 27.402749] [<ffffffff80274ade>] ? find_get_page+0xb3/0xc9
[ 27.403028] [<ffffffff80274a2b>] ? find_get_page+0x0/0xc9
[ 27.403333] [<ffffffff80226425>] __change_page_attr_set_clr+0xf3/0x8b8
[ 27.403628] [<ffffffff8028ec99>] ? __purge_vmap_area_lazy+0x192/0x1a1
[ 27.403883] [<ffffffff8028eb52>] ? __purge_vmap_area_lazy+0x4b/0x1a1
[ 27.404172] [<ffffffff80290268>] ? vm_unmap_aliases+0x1ab/0x1bb
[ 27.404512] [<ffffffff80290105>] ? vm_unmap_aliases+0x48/0x1bb
[ 27.404766] [<ffffffff80226d28>] change_page_attr_set_clr+0x13e/0x2e6
[ 27.405026] [<ffffffff80698fa7>] ? _spin_unlock+0x26/0x2a
[ 27.405292] [<ffffffff80227e6a>] ? reserve_memtype+0x19b/0x4e3
[ 27.405590] [<ffffffff80226ffd>] _set_memory_wb+0x22/0x24
[ 27.405844] [<ffffffff80225d28>] ioremap_change_attr+0x26/0x28
[ 27.406097] [<ffffffff80228355>] reserve_pfn_range+0x1a3/0x235
[ 27.406427] [<ffffffff80228430>] track_pfn_vma_new+0x49/0xb3
[ 27.406686] [<ffffffff80286c46>] remap_pfn_range+0x94/0x32c
[ 27.406940] [<ffffffff8022878d>] ? phys_mem_access_prot_allowed+0xb5/0x1a8
[ 27.407209] [<ffffffff803e9bf4>] mmap_mem+0x75/0x9d
[ 27.407523] [<ffffffff8028b3b4>] mmap_region+0x2cf/0x53e
[ 27.407776] [<ffffffff8028b8cc>] do_mmap_pgoff+0x2a9/0x30d
[ 27.408034] [<ffffffff8020f4a4>] sys_mmap+0x92/0xce
[ 27.408339] [<ffffffff8020b65b>] system_call_fastpath+0x16/0x1b
[ 27.408614] ---[ end trace 4b16ad70c09a602d ]---
[ 27.408871] dmidecode:4913 reserve_pfn_range ioremap_change_attr failed write-back for cff6a000-cff6b000

This is with track_pfn_vma_new trying to keep the identity map in sync.
According to e820, the address cff6a000 lies in an ACPI region (ACPI NVS):

[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
[ 0.000000] BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 00000000cff60000 (usable)
[ 0.000000] BIOS-e820: 00000000cff60000 - 00000000cff69000 (ACPI data)
[ 0.000000] BIOS-e820: 00000000cff69000 - 00000000cff80000 (ACPI NVS)
[ 0.000000] BIOS-e820: 00000000cff80000 - 00000000d0000000 (reserved)
[ 0.000000] BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
[ 0.000000] BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[ 0.000000] BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: 0000000100000000 - 0000000230000000 (usable)

It is also not covered by the identity map, as init_memory_mapping shows:

[ 0.000000] init_memory_mapping: 0000000000000000-00000000cff60000
[ 0.000000] init_memory_mapping: 0000000100000000-0000000230000000

We could add logic to check for this, but there can also be other holes in
the identity map when e820 has 1GB-aligned reserved space.

This patch handles it by removing the WARN_ON and returning a specific
error value (-EFAULT) to indicate that the address does not have any
identity mapping.

The code that keeps the identity map in sync can ignore this error,
while other callers of CPA still get an error here.
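A user-space sketch of the two pieces of this sync logic: clamping the range to the part below high_memory, and treating -EFAULT ("no identity mapping here") as success. identity_map_len() and filter_sync_result() are hypothetical helpers modelled on the kernel_map_sync_memtype() hunk below; the high parameter stands in for __pa(high_memory).

```c
#include <errno.h>
#include <stdint.h>

/*
 * Length of [base, base + size) that lies below high, the assumed end of
 * the kernel identity map; 0 if the range starts at or above it.
 */
static uint64_t identity_map_len(uint64_t base, uint64_t size, uint64_t high)
{
	if (base >= high)
		return 0;
	return (high < base + size) ? high - base : size;
}

/*
 * -EFAULT from the attribute change means the address had no identity
 * mapping (for example an e820 hole), so there was nothing to keep in
 * sync; any other error is passed through to the caller.
 */
static int filter_sync_result(int cpa_ret)
{
	return (cpa_ret == -EFAULT) ? 0 : cpa_ret;
}
```

Assuming the identity map on this box ends at 0x230000000, the high_memory check alone still asks for 0x1000 bytes at cff6a000 to be changed. That is why the attribute change reaches a zero pte, and why the -EFAULT filtering is needed rather than a tighter bounds check.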

Signed-off-by: Venkatesh Pallipadi <[email protected]>
Signed-off-by: Suresh Siddha <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index e89d248..4cf30de 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -555,10 +555,12 @@ repeat:
if (!pte_val(old_pte)) {
if (!primary)
return 0;
- WARN(1, KERN_WARNING "CPA: called for zero pte. "
- "vaddr = %lx cpa->vaddr = %lx\n", address,
- *cpa->vaddr);
- return -EINVAL;
+
+ /*
+ * Special error value returned, indicating that the mapping
+ * did not exist at this address.
+ */
+ return -EFAULT;
}

if (level == PG_LEVEL_4K) {
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 8b08fb9..160c42d 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -505,6 +505,35 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
}
#endif /* CONFIG_STRICT_DEVMEM */

+/*
+ * Change the memory type for the physial address range in kernel identity
+ * mapping space if that range is a part of identity map.
+ */
+static int kernel_map_sync_memtype(u64 base, unsigned long size,
+ unsigned long flags)
+{
+ unsigned long id_sz;
+ int ret;
+
+ if (!pat_enabled || base >= __pa(high_memory))
+ return 0;
+
+ id_sz = (__pa(high_memory) < base + size) ?
+ __pa(high_memory) - base :
+ size;
+
+ ret = ioremap_change_attr((unsigned long)__va(base), id_sz, flags);
+ /*
+ * -EFAULT return means that the addr was not valid and did not have
+ * any identity mapping. That case is a success for
+ * kernel_map_sync_memtype.
+ */
+ if (ret == -EFAULT)
+ ret = 0;
+
+ return ret;
+}
+
int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t *vma_prot)
{
@@ -555,9 +584,7 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
if (retval < 0)
return 0;

- if (((pfn < max_low_pfn_mapped) ||
- (pfn >= (1UL<<(32 - PAGE_SHIFT)) && pfn < max_pfn_mapped)) &&
- ioremap_change_attr((unsigned long)__va(offset), size, flags) < 0) {
+ if (kernel_map_sync_memtype(offset, size, flags)) {
free_memtype(offset, offset + size);
printk(KERN_INFO
"%s:%d /dev/mem ioremap_change_attr failed %s for %Lx-%Lx\n",
@@ -605,7 +632,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
int strict_prot)
{
int is_ram = 0;
- int id_sz, ret;
+ int ret;
unsigned long flags;
unsigned long want_flags = (pgprot_val(*vma_prot) & _PAGE_CACHE_MASK);

@@ -646,15 +673,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
flags);
}

- /* Need to keep identity mapping in sync */
- if (paddr >= __pa(high_memory))
- return 0;
-
- id_sz = (__pa(high_memory) < paddr + size) ?
- __pa(high_memory) - paddr :
- size;
-
- if (ioremap_change_attr((unsigned long)__va(paddr), id_sz, flags) < 0) {
+ if (kernel_map_sync_memtype(paddr, size, flags)) {
free_memtype(paddr, paddr + size);
printk(KERN_ERR
"%s:%d reserve_pfn_range ioremap_change_attr failed %s "

commit 838b120c59b530ba58cc0197d208d08455733472
Author: [email protected] <[email protected]>
Date: Fri Jan 9 16:13:13 2009 -0800

x86 PAT: ioremap_wc should take resource_size_t parameter

Impact: fix/extend ioremap_wc() beyond 4GB aperture on 32-bit

ioremap_wc() was taking an unsigned long parameter, whereas it should take
a 64-bit resource_size_t parameter like the other ioremap variants.
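A small illustration of why the parameter type matters. It assumes a 32-bit kernel where unsigned long is 32 bits and resource_size_t is 64 bits (as with PAE). uint32_t and uint64_t stand in for those kernel types so the sketch behaves the same on any host, and pass_as_ulong32() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Models passing a 64-bit physical address through a parameter declared
 * as a 32-bit unsigned long: the upper 32 bits are silently dropped.
 */
static uint32_t pass_as_ulong32(uint64_t phys_addr)
{
	return (uint32_t)phys_addr;
}
```

A hypothetical BAR at 0x1d0000000 (above 4GB) would arrive as 0xd0000000, so the old prototype would map the wrong physical address. A resource_size_t parameter keeps all 64 bits.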

Signed-off-by: Venkatesh Pallipadi <[email protected]>
Signed-off-by: Suresh Siddha <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 05cfed4..bdbb4b9 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -91,7 +91,7 @@ extern void unxlate_dev_mem_ptr(unsigned long phys, void *addr);

extern int ioremap_change_attr(unsigned long vaddr, unsigned long size,
unsigned long prot_val);
-extern void __iomem *ioremap_wc(unsigned long offset, unsigned long size);
+extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size);

/*
* early_ioremap() and early_iounmap() are for temporary early boot-time
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index bd85d42..2ddb1e7 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -367,7 +367,7 @@ EXPORT_SYMBOL(ioremap_nocache);
*
* Must be freed with iounmap.
*/
-void __iomem *ioremap_wc(unsigned long phys_addr, unsigned long size)
+void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
{
if (pat_enabled)
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_WC,

commit 283c81fe6568202db345649e874d2a0f29dc5a84
Author: [email protected] <[email protected]>
Date: Fri Jan 9 16:13:12 2009 -0800

x86 PAT: return compatible mapping to remap_pfn_range callers

Impact: avoid warning message, potentially solve 3D performance regression

Change the x86 PAT code to return a compatible memtype if the exact memtype
that was requested in remap_pfn_range and friends is not available due to
some conflict.

This is done by returning the compatible type in the pgprot parameter of
track_pfn_vma_new(); the caller then uses that memtype for the page table.

Note that track_pfn_vma_copy(), which is basically called during fork, gets
the prot from the existing page table and should not have any conflict.
Hence we use a strict memtype check there and do not allow compatible memtypes.

This patch fixes the bug reported here:

http://marc.info/?l=linux-kernel&m=123108883716357&w=2

Specifically the error message:

X:5010 map pfn expected mapping type write-back for d0000000-d0101000,
got write-combining

Should go away.
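A user-space sketch of the strict versus non-strict handling described above. The cache-attribute values assume the x86 layout of this era (PWT = bit 3, PCD = bit 4). resolve_memtype(), memtype_allowed() and the CACHE_* names are hypothetical stand-ins for reserve_pfn_range(), is_new_memtype_allowed() and the _PAGE_CACHE_* constants.

```c
#include <errno.h>

/* Assumed encodings in the PWT/PCD bits of a page protection value. */
#define CACHE_MASK	0x18UL	/* PCD | PWT */
#define CACHE_WB	0x00UL	/* write-back */
#define CACHE_WC	0x08UL	/* write-combining (PWT) */
#define CACHE_UC_MINUS	0x10UL	/* uncached-minus (PCD) */

/* An uncached or write-combining request must not fall back to write-back. */
static int memtype_allowed(unsigned long want, unsigned long got)
{
	if ((want == CACHE_UC_MINUS || want == CACHE_WC) && got == CACHE_WB)
		return 0;
	return 1;
}

/*
 * prot:   requested protection; on success its cache bits may be rewritten
 * got:    memtype actually granted by the conflict check
 * strict: nonzero for the fork path (track_pfn_vma_copy), zero otherwise
 */
static int resolve_memtype(unsigned long *prot, unsigned long got, int strict)
{
	unsigned long want = *prot & CACHE_MASK;

	if (got == want)
		return 0;
	if (strict || !memtype_allowed(want, got))
		return -EINVAL;
	/* Non-strict: keep the other protection bits, adopt the granted type. */
	*prot = (*prot & ~CACHE_MASK) | got;
	return 0;
}
```

The first test case mirrors the report above. A write-back request over a range already tracked as write-combining now succeeds, and the caller installs write-combining PTEs instead of failing. The strict path still refuses the same request.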

Reported-and-bisected-by: Kevin Winchester <[email protected]>
Signed-off-by: Venkatesh Pallipadi <[email protected]>
Signed-off-by: Suresh Siddha <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index f88ac80..8b08fb9 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -601,12 +601,13 @@ void unmap_devmem(unsigned long pfn, unsigned long size, pgprot_t vma_prot)
* Reserved non RAM regions only and after successful reserve_memtype,
* this func also keeps identity mapping (if any) in sync with this new prot.
*/
-static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t vma_prot)
+static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
+ int strict_prot)
{
int is_ram = 0;
int id_sz, ret;
unsigned long flags;
- unsigned long want_flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK);
+ unsigned long want_flags = (pgprot_val(*vma_prot) & _PAGE_CACHE_MASK);

is_ram = pagerange_is_ram(paddr, paddr + size);

@@ -625,15 +626,24 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t vma_prot)
return ret;

if (flags != want_flags) {
- free_memtype(paddr, paddr + size);
- printk(KERN_ERR
- "%s:%d map pfn expected mapping type %s for %Lx-%Lx, got %s\n",
- current->comm, current->pid,
- cattr_name(want_flags),
- (unsigned long long)paddr,
- (unsigned long long)(paddr + size),
- cattr_name(flags));
- return -EINVAL;
+ if (strict_prot || !is_new_memtype_allowed(want_flags, flags)) {
+ free_memtype(paddr, paddr + size);
+ printk(KERN_ERR "%s:%d map pfn expected mapping type %s"
+ " for %Lx-%Lx, got %s\n",
+ current->comm, current->pid,
+ cattr_name(want_flags),
+ (unsigned long long)paddr,
+ (unsigned long long)(paddr + size),
+ cattr_name(flags));
+ return -EINVAL;
+ }
+ /*
+ * We allow returning different type than the one requested in
+ * non strict case.
+ */
+ *vma_prot = __pgprot((pgprot_val(*vma_prot) &
+ (~_PAGE_CACHE_MASK)) |
+ flags);
}

/* Need to keep identity mapping in sync */
@@ -689,6 +699,7 @@ int track_pfn_vma_copy(struct vm_area_struct *vma)
unsigned long vma_start = vma->vm_start;
unsigned long vma_end = vma->vm_end;
unsigned long vma_size = vma_end - vma_start;
+ pgprot_t pgprot;

if (!pat_enabled)
return 0;
@@ -702,7 +713,8 @@ int track_pfn_vma_copy(struct vm_area_struct *vma)
WARN_ON_ONCE(1);
return -EINVAL;
}
- return reserve_pfn_range(paddr, vma_size, __pgprot(prot));
+ pgprot = __pgprot(prot);
+ return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
}

/* reserve entire vma page by page, using pfn and prot from pte */
@@ -710,7 +722,8 @@ int track_pfn_vma_copy(struct vm_area_struct *vma)
if (follow_phys(vma, vma_start + i, 0, &prot, &paddr))
continue;

- retval = reserve_pfn_range(paddr, PAGE_SIZE, __pgprot(prot));
+ pgprot = __pgprot(prot);
+ retval = reserve_pfn_range(paddr, PAGE_SIZE, &pgprot, 1);
if (retval)
goto cleanup_ret;
}
@@ -758,14 +771,14 @@ int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
if (is_linear_pfn_mapping(vma)) {
/* reserve the whole chunk starting from vm_pgoff */
paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
- return reserve_pfn_range(paddr, vma_size, *prot);
+ return reserve_pfn_range(paddr, vma_size, prot, 0);
}

/* reserve page by page using pfn and size */
base_paddr = (resource_size_t)pfn << PAGE_SHIFT;
for (i = 0; i < size; i += PAGE_SIZE) {
paddr = base_paddr + i;
- retval = reserve_pfn_range(paddr, PAGE_SIZE, *prot);
+ retval = reserve_pfn_range(paddr, PAGE_SIZE, prot, 0);
if (retval)
goto cleanup_ret;
}

commit dfed11010f7b2d994444bcd83ec4cc7e80d7d030
Author: [email protected] <[email protected]>
Date: Fri Jan 9 16:13:11 2009 -0800

x86 PAT: change track_pfn_vma_new to take pgprot_t pointer param

Impact: cleanup

Change the protection parameter of track_pfn_vma_new() into a pgprot_t pointer.
A subsequent patch changes the x86 PAT handling to return a compatible
memtype in pgprot_t if the requested one cannot be allowed due to conflicts.
No functionality change in this patch.

Signed-off-by: Venkatesh Pallipadi <[email protected]>
Signed-off-by: Suresh Siddha <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 85cbd3c..f88ac80 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -741,7 +741,7 @@ cleanup_ret:
* Note that this function can be called with caller trying to map only a
* subrange/page inside the vma.
*/
-int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
unsigned long pfn, unsigned long size)
{
int retval = 0;
@@ -758,14 +758,14 @@ int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
if (is_linear_pfn_mapping(vma)) {
/* reserve the whole chunk starting from vm_pgoff */
paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
- return reserve_pfn_range(paddr, vma_size, prot);
+ return reserve_pfn_range(paddr, vma_size, *prot);
}

/* reserve page by page using pfn and size */
base_paddr = (resource_size_t)pfn << PAGE_SHIFT;
for (i = 0; i < size; i += PAGE_SIZE) {
paddr = base_paddr + i;
- retval = reserve_pfn_range(paddr, PAGE_SIZE, prot);
+ retval = reserve_pfn_range(paddr, PAGE_SIZE, *prot);
if (retval)
goto cleanup_ret;
}
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 72ebe91..8e6d0ca 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -301,7 +301,7 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm,
* track_pfn_vma_new is called when a _new_ pfn mapping is being established
* for physical range indicated by pfn and size.
*/
-static inline int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+static inline int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
unsigned long pfn, unsigned long size)
{
return 0;
@@ -332,7 +332,7 @@ static inline void untrack_pfn_vma(struct vm_area_struct *vma,
{
}
#else
-extern int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+extern int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
unsigned long pfn, unsigned long size);
extern int track_pfn_vma_copy(struct vm_area_struct *vma);
extern void untrack_pfn_vma(struct vm_area_struct *vma, unsigned long pfn,
diff --git a/mm/memory.c b/mm/memory.c
index 61c7e1e..238fb8e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1511,6 +1511,7 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn)
{
int ret;
+ pgprot_t pgprot = vma->vm_page_prot;
/*
* Technically, architectures with pte_special can avoid all these
* restrictions (same for remap_pfn_range). However we would like
@@ -1525,10 +1526,10 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,

if (addr < vma->vm_start || addr >= vma->vm_end)
return -EFAULT;
- if (track_pfn_vma_new(vma, vma->vm_page_prot, pfn, PAGE_SIZE))
+ if (track_pfn_vma_new(vma, &pgprot, pfn, PAGE_SIZE))
return -EINVAL;

- ret = insert_pfn(vma, addr, pfn, vma->vm_page_prot);
+ ret = insert_pfn(vma, addr, pfn, pgprot);

if (ret)
untrack_pfn_vma(vma, pfn, PAGE_SIZE);
@@ -1671,7 +1672,7 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,

vma->vm_flags |= VM_IO | VM_RESERVED | VM_PFNMAP;

- err = track_pfn_vma_new(vma, prot, pfn, PAGE_ALIGN(size));
+ err = track_pfn_vma_new(vma, &prot, pfn, PAGE_ALIGN(size));
if (err) {
/*
* To indicate that track_pfn related cleanup is not

commit a8eae3321ea94fe06c6a76b48cc6a082116b1784
Author: [email protected] <[email protected]>
Date: Fri Jan 9 16:13:10 2009 -0800

x86 PAT: consolidate old memtype new memtype check into a function

Impact: cleanup

Move the check of whether a new memtype is allowed for a given requested
memtype into a header so that it can be shared by other users. A subsequent
patch uses it in pat.c in the remap_pfn_range() code path. No functionality
change in this patch.

Signed-off-by: Venkatesh Pallipadi <[email protected]>
Signed-off-by: Suresh Siddha <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 83e69f4..06bbcbd 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -341,6 +341,25 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)

#define canon_pgprot(p) __pgprot(pgprot_val(p) & __supported_pte_mask)

+static inline int is_new_memtype_allowed(unsigned long flags,
+ unsigned long new_flags)
+{
+ /*
+ * Certain new memtypes are not allowed with certain
+ * requested memtype:
+ * - request is uncached, return cannot be write-back
+ * - request is write-combine, return cannot be write-back
+ */
+ if ((flags == _PAGE_CACHE_UC_MINUS &&
+ new_flags == _PAGE_CACHE_WB) ||
+ (flags == _PAGE_CACHE_WC &&
+ new_flags == _PAGE_CACHE_WB)) {
+ return 0;
+ }
+
+ return 1;
+}
+
#ifndef __ASSEMBLY__
/* Indicate that x86 has its own track and untrack pfn vma functions */
#define __HAVE_PFNMAP_TRACKING
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index f884740..5ead808 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -314,17 +314,7 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
return retval;

if (flags != new_flags) {
- /*
- * Do not fallback to certain memory types with certain
- * requested type:
- * - request is uncached, return cannot be write-back
- * - request is uncached, return cannot be write-combine
- * - request is write-combine, return cannot be write-back
- */
- if ((flags == _PAGE_CACHE_UC_MINUS &&
- (new_flags == _PAGE_CACHE_WB)) ||
- (flags == _PAGE_CACHE_WC &&
- new_flags == _PAGE_CACHE_WB)) {
+ if (!is_new_memtype_allowed(flags, new_flags)) {
free_memtype(addr, addr+len);
return -EINVAL;
}

commit 18d82ebde7e40bf67c84b505a12be26133a89932
Author: [email protected] <[email protected]>
Date: Fri Jan 9 16:13:09 2009 -0800

x86 PAT: remove PFNMAP type on track_pfn_vma_new() error

Impact: fix (harmless) double-free of memtype entries and avoid warning

On track_pfn_vma_new() failure, reset the vm_flags so that no second cleanup
happens when upper-level routines call unmap_vmas().

This patch fixes part of the bug reported here:

http://marc.info/?l=linux-kernel&m=123108883716357&w=2

Specifically the error message:

X:5010 freeing invalid memtype d0000000-d0101000

is due to multiple frees on the error path, and will not happen with the patch below.
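A toy model of the double-cleanup problem. It assumes teardown untracks any mapping whose flags still carry the PFN-map bit. struct toy_vma, the TOY_VM_* values, toy_remap() and toy_teardown() are hypothetical; only the idea of clearing VM_IO | VM_RESERVED | VM_PFNMAP on failure comes from the patch.

```c
#include <errno.h>

/* Hypothetical flag values; only the names echo VM_IO, VM_RESERVED, VM_PFNMAP. */
#define TOY_VM_IO	0x1UL
#define TOY_VM_RESERVED	0x2UL
#define TOY_VM_PFNMAP	0x4UL
#define TOY_VM_MASK	(TOY_VM_IO | TOY_VM_RESERVED | TOY_VM_PFNMAP)

struct toy_vma {
	unsigned long flags;
	int tracked;	/* live memtype reservations; negative means a double free */
};

/* track_ok models whether track_pfn_vma_new() succeeded. */
static int toy_remap(struct toy_vma *vma, int track_ok)
{
	vma->flags |= TOY_VM_MASK;
	if (!track_ok) {
		/* The failed tracker already released its reservations, so
		 * clear the flags to keep teardown from freeing them again. */
		vma->flags &= ~TOY_VM_MASK;
		return -EINVAL;
	}
	vma->tracked++;
	return 0;
}

/* Models unmap_vmas(): untrack only mappings still marked as PFN maps. */
static void toy_teardown(struct toy_vma *vma)
{
	if (vma->flags & TOY_VM_PFNMAP)
		vma->tracked--;
}
```

Without the flag reset in the failure branch, toy_teardown() would drive tracked to -1 after a failed remap, which is the toy analogue of the "freeing invalid memtype" message.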

Signed-off-by: Venkatesh Pallipadi <[email protected]>
Signed-off-by: Suresh Siddha <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/mm/memory.c b/mm/memory.c
index e009ce8..61c7e1e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1672,8 +1672,14 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
vma->vm_flags |= VM_IO | VM_RESERVED | VM_PFNMAP;

err = track_pfn_vma_new(vma, prot, pfn, PAGE_ALIGN(size));
- if (err)
+ if (err) {
+ /*
+ * To indicate that track_pfn related cleanup is not
+ * needed from higher level routine calling unmap_vmas
+ */
+ vma->vm_flags &= ~(VM_IO | VM_RESERVED | VM_PFNMAP);
return -EINVAL;
+ }

BUG_ON(addr >= end);
pfn -= addr >> PAGE_SHIFT;

2009-01-12 20:41:28

by Ingo Molnar

Subject: Re: [git pull] x86 fixes


* Torsten Kaiser <[email protected]> wrote:

> On Mon, Jan 12, 2009 at 8:29 PM, Pallipadi, Venkatesh
> <[email protected]> wrote:
> > oops. I missed out one file in the earlier test patch. Below is the
> > updated test patch that will go against 29-rc1.
> >
> > Thanks,
> > Venki
> >
> > Signed-off-by: Venkatesh Pallipadi <[email protected]>
>
> Tested-by: Torsten Kaiser <[email protected]>
>
> The system boots normal and glxgears is accelerated again.

Could you try the tree below as well please?

It's functionally the same as the patch you just tried - with a few
cleanups. (If you again get a crash then we know that it's the difference
between this version and the patch you just tried that causes the crash.)

You can git-pull the URI below into v2.6.29-rc1.

Ingo

---------------------->
Please pull the latest x86/pat git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git x86/pat


out-of-topic modifications in x86/pat:
--------------------------------------
include/asm-generic/pgtable.h # dfed110: x86 PAT: change track_pfn_vma_new
mm/memory.c # dfed110: x86 PAT: change track_pfn_vma_new
# 18d82eb: x86 PAT: remove PFNMAP type on tr

Thanks,

Ingo

------------------>
Suresh Siddha (1):
x86, pat: fix reserve_memtype() for legacy 1MB range

[email protected] (6):
x86 PAT: remove PFNMAP type on track_pfn_vma_new() error
x86 PAT: consolidate old memtype new memtype check into a function
x86 PAT: change track_pfn_vma_new to take pgprot_t pointer param
x86 PAT: return compatible mapping to remap_pfn_range callers
x86 PAT: ioremap_wc should take resource_size_t parameter
x86 PAT: remove CPA WARN_ON for zero pte


arch/x86/include/asm/io.h | 2 +-
arch/x86/include/asm/pgtable.h | 19 +++++++
arch/x86/mm/ioremap.c | 2 +-
arch/x86/mm/pageattr.c | 10 ++--
arch/x86/mm/pat.c | 109 +++++++++++++++++++++++++++------------
arch/x86/pci/i386.c | 12 +----
include/asm-generic/pgtable.h | 4 +-
mm/memory.c | 15 ++++--
8 files changed, 116 insertions(+), 57 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 05cfed4..bdbb4b9 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -91,7 +91,7 @@ extern void unxlate_dev_mem_ptr(unsigned long phys, void *addr);

extern int ioremap_change_attr(unsigned long vaddr, unsigned long size,
unsigned long prot_val);
-extern void __iomem *ioremap_wc(unsigned long offset, unsigned long size);
+extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size);

/*
* early_ioremap() and early_iounmap() are for temporary early boot-time
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 83e69f4..06bbcbd 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -341,6 +341,25 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)

#define canon_pgprot(p) __pgprot(pgprot_val(p) & __supported_pte_mask)

+static inline int is_new_memtype_allowed(unsigned long flags,
+ unsigned long new_flags)
+{
+ /*
+ * Certain new memtypes are not allowed with certain
+ * requested memtype:
+ * - request is uncached, return cannot be write-back
+ * - request is write-combine, return cannot be write-back
+ */
+ if ((flags == _PAGE_CACHE_UC_MINUS &&
+ new_flags == _PAGE_CACHE_WB) ||
+ (flags == _PAGE_CACHE_WC &&
+ new_flags == _PAGE_CACHE_WB)) {
+ return 0;
+ }
+
+ return 1;
+}
+
#ifndef __ASSEMBLY__
/* Indicate that x86 has its own track and untrack pfn vma functions */
#define __HAVE_PFNMAP_TRACKING
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index bd85d42..2ddb1e7 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -367,7 +367,7 @@ EXPORT_SYMBOL(ioremap_nocache);
*
* Must be freed with iounmap.
*/
-void __iomem *ioremap_wc(unsigned long phys_addr, unsigned long size)
+void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
{
if (pat_enabled)
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_WC,
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index e89d248..4cf30de 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -555,10 +555,12 @@ repeat:
if (!pte_val(old_pte)) {
if (!primary)
return 0;
- WARN(1, KERN_WARNING "CPA: called for zero pte. "
- "vaddr = %lx cpa->vaddr = %lx\n", address,
- *cpa->vaddr);
- return -EINVAL;
+
+ /*
+ * Special error value returned, indicating that the mapping
+ * did not exist at this address.
+ */
+ return -EFAULT;
}

if (level == PG_LEVEL_4K) {
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 85cbd3c..ec8cd49 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -333,11 +333,20 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
req_type & _PAGE_CACHE_MASK);
}

- is_range_ram = pagerange_is_ram(start, end);
- if (is_range_ram == 1)
- return reserve_ram_pages_type(start, end, req_type, new_type);
- else if (is_range_ram < 0)
- return -EINVAL;
+ /*
+ * For legacy reasons, some parts of the physical address range in the
+ * legacy 1MB region is treated as non-RAM (even when listed as RAM in
+ * the e820 tables). So we will track the memory attributes of this
+ * legacy 1MB region using the linear memtype_list always.
+ */
+ if (end >= ISA_END_ADDRESS) {
+ is_range_ram = pagerange_is_ram(start, end);
+ if (is_range_ram == 1)
+ return reserve_ram_pages_type(start, end, req_type,
+ new_type);
+ else if (is_range_ram < 0)
+ return -EINVAL;
+ }

new = kmalloc(sizeof(struct memtype), GFP_KERNEL);
if (!new)
@@ -505,6 +514,35 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
}
#endif /* CONFIG_STRICT_DEVMEM */

+/*
+ * Change the memory type for the physial address range in kernel identity
+ * mapping space if that range is a part of identity map.
+ */
+static int kernel_map_sync_memtype(u64 base, unsigned long size,
+ unsigned long flags)
+{
+ unsigned long id_sz;
+ int ret;
+
+ if (!pat_enabled || base >= __pa(high_memory))
+ return 0;
+
+ id_sz = (__pa(high_memory) < base + size) ?
+ __pa(high_memory) - base :
+ size;
+
+ ret = ioremap_change_attr((unsigned long)__va(base), id_sz, flags);
+ /*
+ * -EFAULT return means that the addr was not valid and did not have
+ * any identity mapping. That case is a success for
+ * kernel_map_sync_memtype.
+ */
+ if (ret == -EFAULT)
+ ret = 0;
+
+ return ret;
+}
+
int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t *vma_prot)
{
@@ -555,9 +593,7 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
if (retval < 0)
return 0;

- if (((pfn < max_low_pfn_mapped) ||
- (pfn >= (1UL<<(32 - PAGE_SHIFT)) && pfn < max_pfn_mapped)) &&
- ioremap_change_attr((unsigned long)__va(offset), size, flags) < 0) {
+ if (kernel_map_sync_memtype(offset, size, flags)) {
free_memtype(offset, offset + size);
printk(KERN_INFO
"%s:%d /dev/mem ioremap_change_attr failed %s for %Lx-%Lx\n",
@@ -601,12 +637,13 @@ void unmap_devmem(unsigned long pfn, unsigned long size, pgprot_t vma_prot)
* Reserved non RAM regions only and after successful reserve_memtype,
* this func also keeps identity mapping (if any) in sync with this new prot.
*/
-static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t vma_prot)
+static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
+ int strict_prot)
{
int is_ram = 0;
- int id_sz, ret;
+ int ret;
unsigned long flags;
- unsigned long want_flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK);
+ unsigned long want_flags = (pgprot_val(*vma_prot) & _PAGE_CACHE_MASK);

is_ram = pagerange_is_ram(paddr, paddr + size);

@@ -625,26 +662,27 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t vma_prot)
return ret;

if (flags != want_flags) {
- free_memtype(paddr, paddr + size);
- printk(KERN_ERR
- "%s:%d map pfn expected mapping type %s for %Lx-%Lx, got %s\n",
- current->comm, current->pid,
- cattr_name(want_flags),
- (unsigned long long)paddr,
- (unsigned long long)(paddr + size),
- cattr_name(flags));
- return -EINVAL;
+ if (strict_prot || !is_new_memtype_allowed(want_flags, flags)) {
+ free_memtype(paddr, paddr + size);
+ printk(KERN_ERR "%s:%d map pfn expected mapping type %s"
+ " for %Lx-%Lx, got %s\n",
+ current->comm, current->pid,
+ cattr_name(want_flags),
+ (unsigned long long)paddr,
+ (unsigned long long)(paddr + size),
+ cattr_name(flags));
+ return -EINVAL;
+ }
+ /*
+ * We allow returning different type than the one requested in
+ * non strict case.
+ */
+ *vma_prot = __pgprot((pgprot_val(*vma_prot) &
+ (~_PAGE_CACHE_MASK)) |
+ flags);
}

- /* Need to keep identity mapping in sync */
- if (paddr >= __pa(high_memory))
- return 0;
-
- id_sz = (__pa(high_memory) < paddr + size) ?
- __pa(high_memory) - paddr :
- size;
-
- if (ioremap_change_attr((unsigned long)__va(paddr), id_sz, flags) < 0) {
+ if (kernel_map_sync_memtype(paddr, size, flags)) {
free_memtype(paddr, paddr + size);
printk(KERN_ERR
"%s:%d reserve_pfn_range ioremap_change_attr failed %s "
@@ -689,6 +727,7 @@ int track_pfn_vma_copy(struct vm_area_struct *vma)
unsigned long vma_start = vma->vm_start;
unsigned long vma_end = vma->vm_end;
unsigned long vma_size = vma_end - vma_start;
+ pgprot_t pgprot;

if (!pat_enabled)
return 0;
@@ -702,7 +741,8 @@ int track_pfn_vma_copy(struct vm_area_struct *vma)
WARN_ON_ONCE(1);
return -EINVAL;
}
- return reserve_pfn_range(paddr, vma_size, __pgprot(prot));
+ pgprot = __pgprot(prot);
+ return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
}

/* reserve entire vma page by page, using pfn and prot from pte */
@@ -710,7 +750,8 @@ int track_pfn_vma_copy(struct vm_area_struct *vma)
if (follow_phys(vma, vma_start + i, 0, &prot, &paddr))
continue;

- retval = reserve_pfn_range(paddr, PAGE_SIZE, __pgprot(prot));
+ pgprot = __pgprot(prot);
+ retval = reserve_pfn_range(paddr, PAGE_SIZE, &pgprot, 1);
if (retval)
goto cleanup_ret;
}
@@ -741,7 +782,7 @@ cleanup_ret:
* Note that this function can be called with caller trying to map only a
* subrange/page inside the vma.
*/
-int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
unsigned long pfn, unsigned long size)
{
int retval = 0;
@@ -758,14 +799,14 @@ int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
if (is_linear_pfn_mapping(vma)) {
/* reserve the whole chunk starting from vm_pgoff */
paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
- return reserve_pfn_range(paddr, vma_size, prot);
+ return reserve_pfn_range(paddr, vma_size, prot, 0);
}

/* reserve page by page using pfn and size */
base_paddr = (resource_size_t)pfn << PAGE_SHIFT;
for (i = 0; i < size; i += PAGE_SIZE) {
paddr = base_paddr + i;
- retval = reserve_pfn_range(paddr, PAGE_SIZE, prot);
+ retval = reserve_pfn_range(paddr, PAGE_SIZE, prot, 0);
if (retval)
goto cleanup_ret;
}
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index f884740..5ead808 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -314,17 +314,7 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
return retval;

if (flags != new_flags) {
- /*
- * Do not fallback to certain memory types with certain
- * requested type:
- * - request is uncached, return cannot be write-back
- * - request is uncached, return cannot be write-combine
- * - request is write-combine, return cannot be write-back
- */
- if ((flags == _PAGE_CACHE_UC_MINUS &&
- (new_flags == _PAGE_CACHE_WB)) ||
- (flags == _PAGE_CACHE_WC &&
- new_flags == _PAGE_CACHE_WB)) {
+ if (!is_new_memtype_allowed(flags, new_flags)) {
free_memtype(addr, addr+len);
return -EINVAL;
}
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 72ebe91..8e6d0ca 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -301,7 +301,7 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm,
* track_pfn_vma_new is called when a _new_ pfn mapping is being established
* for physical range indicated by pfn and size.
*/
-static inline int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+static inline int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
unsigned long pfn, unsigned long size)
{
return 0;
@@ -332,7 +332,7 @@ static inline void untrack_pfn_vma(struct vm_area_struct *vma,
{
}
#else
-extern int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t prot,
+extern int track_pfn_vma_new(struct vm_area_struct *vma, pgprot_t *prot,
unsigned long pfn, unsigned long size);
extern int track_pfn_vma_copy(struct vm_area_struct *vma);
extern void untrack_pfn_vma(struct vm_area_struct *vma, unsigned long pfn,
diff --git a/mm/memory.c b/mm/memory.c
index e009ce8..238fb8e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1511,6 +1511,7 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn)
{
int ret;
+ pgprot_t pgprot = vma->vm_page_prot;
/*
* Technically, architectures with pte_special can avoid all these
* restrictions (same for remap_pfn_range). However we would like
@@ -1525,10 +1526,10 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,

if (addr < vma->vm_start || addr >= vma->vm_end)
return -EFAULT;
- if (track_pfn_vma_new(vma, vma->vm_page_prot, pfn, PAGE_SIZE))
+ if (track_pfn_vma_new(vma, &pgprot, pfn, PAGE_SIZE))
return -EINVAL;

- ret = insert_pfn(vma, addr, pfn, vma->vm_page_prot);
+ ret = insert_pfn(vma, addr, pfn, pgprot);

if (ret)
untrack_pfn_vma(vma, pfn, PAGE_SIZE);
@@ -1671,9 +1672,15 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,

vma->vm_flags |= VM_IO | VM_RESERVED | VM_PFNMAP;

- err = track_pfn_vma_new(vma, prot, pfn, PAGE_ALIGN(size));
- if (err)
+ err = track_pfn_vma_new(vma, &prot, pfn, PAGE_ALIGN(size));
+ if (err) {
+ /*
+ * To indicate that track_pfn related cleanup is not
+ * needed from higher level routine calling unmap_vmas
+ */
+ vma->vm_flags &= ~(VM_IO | VM_RESERVED | VM_PFNMAP);
return -EINVAL;
+ }

BUG_ON(addr >= end);
pfn -= addr >> PAGE_SHIFT;
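The patch above gives reserve_pfn_range() two post-reservation steps, modeled in userspace below. First, when reserve_memtype() hands back a different but compatible memtype, only the cache bits of the pgprot are rewritten through the pointer. This applies to non-strict callers, which reach it via track_pfn_vma_new() (remap_pfn_range, vm_insert_pfn). A strict caller (track_pfn_vma_copy) still fails. Second, kernel_map_sync_memtype() only touches the part of the range below __pa(high_memory).

This is a sketch. resolve_prot() and identity_map_len() are hypothetical helpers, pgprot_t is a simplified stand-in, and the bit values are assumptions modeled on the x86 PWT/PCD encoding.

```c
#include <errno.h>

/* Assumed cache-attribute encodings (x86 PWT = bit 3, PCD = bit 4). */
#define _PAGE_CACHE_WB       0x00UL
#define _PAGE_CACHE_WC       0x08UL
#define _PAGE_CACHE_UC_MINUS 0x10UL
#define _PAGE_CACHE_MASK     0x18UL

/* Simplified stand-in for the kernel's pgprot_t wrapper. */
typedef struct { unsigned long pgprot; } pgprot_t;

/* Same rule as is_new_memtype_allowed(): UC- or WC may not become WB. */
static int is_new_memtype_allowed(unsigned long flags, unsigned long new_flags)
{
	if (new_flags != _PAGE_CACHE_WB)
		return 1;
	return flags != _PAGE_CACHE_UC_MINUS && flags != _PAGE_CACHE_WC;
}

/*
 * Hypothetical helper: 'got' is the memtype reserve_memtype() returned.
 * On a compatible mismatch in the non-strict case, *prot is updated.
 */
static int resolve_prot(pgprot_t *prot, unsigned long got, int strict_prot)
{
	unsigned long want = prot->pgprot & _PAGE_CACHE_MASK;

	if (got == want)
		return 0;
	if (strict_prot || !is_new_memtype_allowed(want, got))
		return -EINVAL;
	/* Keep all non-cache bits, replace only the cache attribute. */
	prot->pgprot = (prot->pgprot & ~_PAGE_CACHE_MASK) | got;
	return 0;
}

/*
 * Hypothetical helper mirroring the id_sz computation: the length of
 * [base, base + size) lying below 'high' (standing in for
 * __pa(high_memory)); 0 means there is no identity mapping to sync.
 */
static unsigned long long identity_map_len(unsigned long long base,
					   unsigned long long size,
					   unsigned long long high)
{
	if (base >= high)
		return 0;
	return (high < base + size) ? high - base : size;
}
```

In the usage below, 0x63 stands for some unrelated pte bits that must survive the rewrite.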

2009-01-12 20:53:30

by Ingo Molnar

Subject: Re: [git pull] x86 fixes


* Linus Torvalds <[email protected]> wrote:

>
>
> On Mon, 12 Jan 2009, Pallipadi, Venkatesh wrote:
> > + if (strict_prot ||
> > + (want_flags == _PAGE_CACHE_UC_MINUS &&
> > + flags == _PAGE_CACHE_WB) ||
> > + (want_flags == _PAGE_CACHE_WC &&
> > + flags == _PAGE_CACHE_WB)) {
>
> Please don't write code like this.
>
> Do it as an inline function that returns true/false and has comments on
> what the hell is going on.

I have asked Venki to do a minimal 'combo' patch that isolates just the
functional changes. (it is otherwise identical to Venki's PAT changes.)

The reason why we wanted to re-test the functional changes was that
Torsten's crash looks very weird: double Call Trace line, a crash in the
scsi/ata code, showing the after-effects of some sort of memory corruption
there.

A connection to the x86-fixes patchset did not seem impossible (one theory
would be: cache aliasing problems causing memory corruption), but
it was all quite weird nevertheless. So we wanted an isolated repeat test
for just the functional changes.

The 7 patches lined up for you (but quarantined from x86/urgent for now,
until the crash Torsten got is investigated) introduce the above condition
cleanly, as:

+static inline int is_new_memtype_allowed(unsigned long flags,
+ unsigned long new_flags)
+{
+ /*
+ * Certain new memtypes are not allowed with certain
+ * requested memtype:
+ * - request is uncached, return cannot be write-back
+ * - request is write-combine, return cannot be write-back
+ */
+ if ((flags == _PAGE_CACHE_UC_MINUS &&
+ new_flags == _PAGE_CACHE_WB) ||
+ (flags == _PAGE_CACHE_WC &&
+ new_flags == _PAGE_CACHE_WB)) {
+ return 0;
+ }
+
+ return 1;
+}

Ingo

2009-01-12 21:04:19

by Harvey Harrison

Subject: Re: [git pull] x86 fixes

On Mon, 2009-01-12 at 21:52 +0100, Ingo Molnar wrote:
> * Linus Torvalds <[email protected]> wrote:

> +static inline int is_new_memtype_allowed(unsigned long flags,
> + unsigned long new_flags)
> +{
> + /*
> + * Certain new memtypes are not allowed with certain
> + * requested memtype:
> + * - request is uncached, return cannot be write-back
> + * - request is write-combine, return cannot be write-back
> + */
> + if ((flags == _PAGE_CACHE_UC_MINUS &&
> + new_flags == _PAGE_CACHE_WB) ||
> + (flags == _PAGE_CACHE_WC &&
> + new_flags == _PAGE_CACHE_WB)) {
> + return 0;
> + }

if ((flags == _PAGE_CACHE_UC_MINUS || flags == _PAGE_CACHE_WC) &&
(new_flags == _PAGE_CACHE_WB))

might be a bit neater perhaps.

Harvey

2009-01-12 21:12:58

by Ingo Molnar

Subject: Re: [git pull] x86 fixes


* Harvey Harrison <[email protected]> wrote:

> On Mon, 2009-01-12 at 21:52 +0100, Ingo Molnar wrote:
> > * Linus Torvalds <[email protected]> wrote:
>
> > +static inline int is_new_memtype_allowed(unsigned long flags,
> > + unsigned long new_flags)
> > +{
> > + /*
> > + * Certain new memtypes are not allowed with certain
> > + * requested memtype:
> > + * - request is uncached, return cannot be write-back
> > + * - request is write-combine, return cannot be write-back
> > + */
> > + if ((flags == _PAGE_CACHE_UC_MINUS &&
> > + new_flags == _PAGE_CACHE_WB) ||
> > + (flags == _PAGE_CACHE_WC &&
> > + new_flags == _PAGE_CACHE_WB)) {
> > + return 0;
> > + }
>
> if ((flags == _PAGE_CACHE_UC_MINUS || flags == _PAGE_CACHE_WC) &&
> (new_flags == _PAGE_CACHE_WB))
>
> might be a bit neater perhaps.

indeed. The most readable one is probably:

static inline int
is_new_memtype_allowed(unsigned long flags, unsigned long new_flags)
{
/*
* Certain new memtypes are not allowed with certain
* requested memtype:
* - request is uncached, return cannot be write-back
* - request is write-combine, return cannot be write-back
*/

if (new_flags != _PAGE_CACHE_WB)
return 1;

if (flags == _PAGE_CACHE_UC_MINUS)
return 0;
if (flags == _PAGE_CACHE_WC)
return 0;

return 1;
}
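This thread spells the condition three ways: the original pair test, Harvey's factored form, and the early-return form above. The userspace sketch below compares all three over every pair of cache types to check that they agree. The _PAGE_CACHE_* values are assumptions modeled on the x86 PWT/PCD bit encoding, and the function names are hypothetical. Only the comparison logic comes from the thread.

```c
#include <assert.h>

/* Assumed cache-attribute encodings (x86 PWT = bit 3, PCD = bit 4). */
#define _PAGE_CACHE_WB       0x00UL
#define _PAGE_CACHE_WC       0x08UL /* PWT */
#define _PAGE_CACHE_UC_MINUS 0x10UL /* PCD */
#define _PAGE_CACHE_UC       0x18UL /* PCD | PWT */

/* Form 1: the two explicit (request, result) pairs from the patch. */
static int allowed_orig(unsigned long flags, unsigned long new_flags)
{
	if ((flags == _PAGE_CACHE_UC_MINUS && new_flags == _PAGE_CACHE_WB) ||
	    (flags == _PAGE_CACHE_WC && new_flags == _PAGE_CACHE_WB))
		return 0;
	return 1;
}

/* Form 2: the factored condition suggested by Harvey. */
static int allowed_factored(unsigned long flags, unsigned long new_flags)
{
	if ((flags == _PAGE_CACHE_UC_MINUS || flags == _PAGE_CACHE_WC) &&
	    new_flags == _PAGE_CACHE_WB)
		return 0;
	return 1;
}

/* Form 3: the early-return version. */
static int allowed_early_return(unsigned long flags, unsigned long new_flags)
{
	if (new_flags != _PAGE_CACHE_WB)
		return 1;
	if (flags == _PAGE_CACHE_UC_MINUS)
		return 0;
	if (flags == _PAGE_CACHE_WC)
		return 0;
	return 1;
}

/* Number of (request, result) pairs on which the three forms disagree. */
static int count_mismatches(void)
{
	static const unsigned long types[] = {
		_PAGE_CACHE_WB, _PAGE_CACHE_WC,
		_PAGE_CACHE_UC_MINUS, _PAGE_CACHE_UC,
	};
	int i, j, mismatches = 0;

	for (i = 0; i < 4; i++) {
		for (j = 0; j < 4; j++) {
			int a = allowed_orig(types[i], types[j]);

			if (a != allowed_factored(types[i], types[j]) ||
			    a != allowed_early_return(types[i], types[j]))
				mismatches++;
		}
	}
	return mismatches;
}
```

Since all three return the same value on all 16 pairs, the choice between them is purely about readability.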

Ingo

2009-01-12 21:50:18

by Torsten Kaiser

Subject: Re: [git pull] x86 fixes

On Mon, Jan 12, 2009 at 9:40 PM, Ingo Molnar <[email protected]> wrote:
>
> * Torsten Kaiser <[email protected]> wrote:
>
>> On Mon, Jan 12, 2009 at 8:29 PM, Pallipadi, Venkatesh
>> <[email protected]> wrote:
>> > oops. I missed out one file in the earlier test patch. Below is the
>> > updated test patch that will go against 29-rc1.
>> >
>> > Thanks,
>> > Venki
>> >
>> > Signed-off-by: Venkatesh Pallipadi <[email protected]>
>>
>> Tested-by: Torsten Kaiser <[email protected]>
>>
>> The system boots normally and glxgears is accelerated again.
>
> Could you try the tree below as well please?

Before I read this mail, I had already tried the tree you sent to Linus as
a pull request.
That worked without a crash, but as expected the DRM-related error was
still there.

> It's functionally the same as the patch you just tried - with a few
> cleanups. (If you again get a crash then we know that it's the difference
> between this version and the patch you just tried that causes the crash.)
>
> You can git-pull the URI below into v2.6.29-rc1.
>
> Ingo
>
> ---------------------->
> Please pull the latest x86/pat git tree from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git x86/pat

pulled && built, here is the result:
[ 76.170171] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 76.178376] IP: [<(null)>] (null)
[ 76.180010] PGD 0
[ 76.180010] Oops: 0010 [#1] SMP
[ 76.180010] last sysfs file: /sys/devices/pci0000:00/0000:00:0d.0/0000:02:00.0/irq
[ 76.180010] CPU 0
[ 76.180010] Modules linked in: w83792d tuner tea5767 tda8290 tuner_xc2028 xc5000 tda9887 tuner_simple tuner_types mt20xx tea5761 tvaudio msp3400 bttv ir_common v4l2_common videodev usbhid v4l1_compat hid v4l2_compat_ioctl32 videobuf_dma_sg videobuf_core btcx_risc sg pata_amd tveeprom
[ 76.180010] Pid: 0, comm: swapper Not tainted 2.6.29-rc1-ingo-00008-g4fa1489 #1
[ 76.180010] RIP: 0010:[<0000000000000000>] [<(null)>] (null)
[ 76.180010] RSP: 0018:ffffffff809a8938 EFLAGS: 00010092
[ 76.180010] RAX: 0000000000000020 RBX: 0000000000000000 RCX: 00000000000003ff
[ 76.180010] RDX: 0000000000000020 RSI: 0000000000000400 RDI: 0000000000000020
[ 76.180010] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 76.180010] R10: ffffffff80a00320 R11: 0000000000000000 R12: 0000000000000000
[ 76.180010] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 76.180010] FS: 00007fa7b3ecf740(0000) GS:ffffffff809b1040(0000) knlGS:0000000000000000
[ 76.180010] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 76.180010] CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
[ 76.180010] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 76.180010] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 76.180010] Process swapper (pid: 0, threadinfo ffffffff8087e000, task ffffffff807de360)
[ 76.180010] Stack:
[ 76.180010] ffff88007e5c8da0 0000000000000000 0000000000000000 0000000400000000
[ 76.180010] 0000000000000092 00000000803f158f ffffffff809a8a28 ffffffff803f184a
[ 76.180010] 0000000000000000 ffffffff809a89a8 0000000000000000 00000000ffffffff
[ 76.180010] Call Trace:
[ 76.180010] <IRQ> <0> [<ffffffff803f184a>] ? number+0x2aa/0x2d0
[ 76.180010] [<ffffffff8023a518>] ? enqueue_task_fair+0x188/0x2c0
[ 76.180010] [<ffffffff8065a258>] ? printk+0x67/0x6f
[ 76.180010] [<ffffffff804cc70f>] ? ata_scsi_qc_complete+0x1df/0x4c0
[ 76.180010] [<ffffffff8022a267>] ? is_prefetch+0xa7/0x280
[ 76.180010] [<ffffffff8024426e>] ? oops_enter+0xe/0x10
[ 76.180010] [<ffffffff8020fd3b>] ? oops_begin+0x8b/0xa0
[ 76.180010] [<ffffffff8022a7b9>] ? do_page_fault+0x379/0x980
[ 76.180010] [<ffffffff803f24d1>] ? vsnprintf+0x351/0xbb0
[ 76.180010] [<ffffffff8065cddd>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 76.180010] [<ffffffff8065d54f>] ? page_fault+0x1f/0x30
[ 76.180010] [<ffffffff8023a51c>] ? enqueue_task_fair+0x18c/0x2c0
[ 76.180010] [<ffffffff8023a518>] ? enqueue_task_fair+0x188/0x2c0
[ 76.180010] [<ffffffff80234d90>] ? enqueue_task+0x50/0x60
[ 76.180010] [<ffffffff80234ea2>] ? activate_task+0x22/0x30
[ 76.180010] [<ffffffff80238ae2>] ? try_to_wake_up+0x232/0x2d0
[ 76.180010] [<ffffffff80238b8d>] ? default_wake_function+0xd/0x10
[ 76.180010] [<ffffffff8025a761>] ? autoremove_wake_function+0x11/0x40
[ 76.180010] [<ffffffff80261b59>] ? getnstimeofday+0x59/0xe0
[ 76.180010] [<ffffffff80235842>] ? __wake_up_common+0x52/0x80
[ 76.180010] [<ffffffff802367a3>] ? __wake_up+0x43/0x70
[ 76.180010] [<ffffffff80256c70>] ? delayed_work_timer_fn+0x0/0x40
[ 76.180010] [<ffffffff80256c4c>] ? __queue_work+0x6c/0x90
[ 76.180010] [<ffffffff80256cad>] ? delayed_work_timer_fn+0x3d/0x40
[ 76.180010] [<ffffffff80210f20>] ? update_vsyscall+0xd0/0xe0
[ 76.180010] [<ffffffff8026202f>] ? update_wall_time+0x3ff/0x520
[ 76.180010] [<ffffffff8021b230>] ? post_set+0x20/0x40
[ 76.180010] [<ffffffff8021b6de>] ? generic_set_mtrr+0x11e/0x140
[ 76.180010] [<ffffffff80219457>] ? ipi_handler+0x47/0xb0
[ 76.180010] [<ffffffff8026afa0>] ? generic_smp_call_function_interrupt+0x50/0x100
[ 76.180010] [<ffffffff8021e54f>] ? smp_call_function_interrupt+0x1f/0x30
[ 76.180010] [<ffffffff8020c863>] ? call_function_interrupt+0x13/0x20
[ 76.180010] <EOI> <0>Code: Bad RIP value.
[ 76.180010] RIP [<(null)>] (null)
[ 76.180010] RSP <ffffffff809a8938>
[ 76.180010] CR2: 0000000000000000
[ 76.180010] ---[ end trace 4da1e896c873962a ]---
[ 76.180010] Kernel panic - not syncing: Fatal exception in interrupt
[ 76.180010] ------------[ cut here ]------------
[ 76.180010] WARNING: at kernel/smp.c:299 smp_call_function_many+0x1e9/0x250()
[ 76.180010] Hardware name: KFN5-D SLI
[ 76.180010] Modules linked in: w83792d tuner tea5767 tda8290 tuner_xc2028 xc5000 tda9887 tuner_simple tuner_types mt20xx tea5761 tvaudio msp3400 bttv ir_common v4l2_common videodev usbhid v4l1_compat hid v4l2_compat_ioctl32 videobuf_dma_sg videobuf_core btcx_risc sg pata_amd tveeprom
[ 76.180010] Pid: 0, comm: swapper Tainted: G D 2.6.29-rc1-ingo-00008-g4fa1489 #1
[ 76.180010] Call Trace:
[ 76.180010] <IRQ> [<ffffffff802440c0>] warn_slowpath+0xd0/0x130
[ 76.180010] [<ffffffff8065d1ef>] ? _spin_unlock_irqrestore+0x2f/0x40
[ 76.180010] [<ffffffff8024496d>] ? release_console_sem+0x1dd/0x230
[ 76.180010] [<ffffffff8026adc9>] smp_call_function_many+0x1e9/0x250
[ 76.180010] [<ffffffff80213570>] ? stop_this_cpu+0x0/0x30
[ 76.180010] [<ffffffff8024496d>] ? release_console_sem+0x1dd/0x230
[ 76.180010] [<ffffffff8026ae50>] smp_call_function+0x20/0x30
[ 76.180010] [<ffffffff8021e4c0>] native_smp_send_stop+0x30/0x70
[ 76.180010] [<ffffffff8065a134>] panic+0xa8/0x165
[ 76.180010] [<ffffffff8065d1ef>] ? _spin_unlock_irqrestore+0x2f/0x40
[ 76.180010] [<ffffffff8024496d>] ? release_console_sem+0x1dd/0x230
[ 76.180010] [<ffffffff80244c95>] ? console_unblank+0x75/0x90
[ 76.180010] [<ffffffff8020fca3>] oops_end+0x93/0xa0
[ 76.180010] [<ffffffff8022a864>] do_page_fault+0x424/0x980
[ 76.180010] [<ffffffff8065cddd>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 76.180010] [<ffffffff803f184a>] ? number+0x2aa/0x2d0
[ 76.180010] [<ffffffff8023a518>] ? enqueue_task_fair+0x188/0x2c0
[ 76.180010] [<ffffffff8065a258>] ? printk+0x67/0x6f
[ 76.180010] [<ffffffff804cc70f>] ? ata_scsi_qc_complete+0x1df/0x4c0
[ 76.180010] [<ffffffff8022a267>] ? is_prefetch+0xa7/0x280
[ 76.180010] [<ffffffff8024426e>] ? oops_enter+0xe/0x10
[ 76.180010] [<ffffffff8020fd3b>] ? oops_begin+0x8b/0xa0
[ 76.180010] [<ffffffff8022a7b9>] ? do_page_fault+0x379/0x980
[ 76.180010] [<ffffffff803f24d1>] ? vsnprintf+0x351/0xbb0
[ 76.180010] [<ffffffff8065cddd>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 76.180010] [<ffffffff8065d54f>] ? page_fault+0x1f/0x30
[ 76.180010] [<ffffffff8023a51c>] ? enqueue_task_fair+0x18c/0x2c0
[ 76.180010] [<ffffffff8023a518>] ? enqueue_task_fair+0x188/0x2c0
[ 76.180010] [<ffffffff80234d90>] ? enqueue_task+0x50/0x60
[ 76.180010] [<ffffffff80234ea2>] ? activate_task+0x22/0x30
[ 76.180010] [<ffffffff80238ae2>] ? try_to_wake_up+0x232/0x2d0
[ 76.180010] [<ffffffff80238b8d>] ? default_wake_function+0xd/0x10
[ 76.180010] [<ffffffff8025a761>] ? autoremove_wake_function+0x11/0x40
[ 76.180010] [<ffffffff80261b59>] ? getnstimeofday+0x59/0xe0
[ 76.180010] [<ffffffff80235842>] ? __wake_up_common+0x52/0x80
[ 76.180010] [<ffffffff802367a3>] ? __wake_up+0x43/0x70
[ 76.180010] [<ffffffff80256c70>] ? delayed_work_timer_fn+0x0/0x40
[ 76.180010] [<ffffffff80256c4c>] ? __queue_work+0x6c/0x90
[ 76.180010] [<ffffffff80256cad>] ? delayed_work_timer_fn+0x3d/0x40
[ 76.180010] [<ffffffff80210f20>] ? update_vsyscall+0xd0/0xe0
[ 76.180010] [<ffffffff8026202f>] ? update_wall_time+0x3ff/0x520
[ 76.180010] [<ffffffff8021b230>] ? post_set+0x20/0x40
[ 76.180010] [<ffffffff8021b6de>] ? generic_set_mtrr+0x11e/0x140
[ 76.180010] [<ffffffff80219457>] ? ipi_handler+0x47/0xb0
[ 76.180010] [<ffffffff8026afa0>] ? generic_smp_call_function_interrupt+0x50/0x100
[ 76.180010] [<ffffffff8021e54f>] ? smp_call_function_interrupt+0x1f/0x30
[ 76.180010] [<ffffffff8020c863>] ? call_function_interrupt+0x13/0x20
[ 76.180010] <EOI> <4>---[ end trace 4da1e896c873962b ]---
[ 86.211282] INFO: RCU detected CPU 1 stall (t=1000 jiffies)
[ 86.211282] Pid: 3278, comm: X Tainted: G D W 2.6.29-rc1-ingo-00008-g4fa1489 #1
[ 86.211282] Call Trace:
[ 86.211282] <IRQ> [<ffffffff80277cdb>] __rcu_pending+0x7b/0x2c0
[ 86.211282] [<ffffffff80277f4e>] rcu_pending+0x2e/0x70
[ 86.211282] [<ffffffff8024ebee>] update_process_times+0x3e/0x70
[ 86.211282] [<ffffffff8026610d>] tick_sched_timer+0x6d/0xc0
[ 86.211282] [<ffffffff8025d42f>] __run_hrtimer+0x5f/0x130
[ 86.211282] [<ffffffff8025dcf5>] hrtimer_interrupt+0xa5/0x120
[ 86.211282] [<ffffffff8021f7b3>] smp_apic_timer_interrupt+0x83/0xc0
[ 86.211282] [<ffffffff8020c6e3>] apic_timer_interrupt+0x13/0x20
[ 86.211282] <EOI>

This time not even the keyboard LEDs were blinking; the system was
completely dead.

HTH

Torsten

2009-01-12 21:55:30

by Torsten Kaiser

Subject: Re: [git pull] x86 fixes

On Mon, Jan 12, 2009 at 9:52 PM, Ingo Molnar <[email protected]> wrote:
> The reason why we wanted to re-test the functional changes was that
> Torsten's crash looks very weird: double Call Trace line, a crash in the
> scsi/ata code, showing the after-effects of some sort of memory corruption
> there.

The double Call Trace: line was a copy&paste error on my part. It's not
there in the original oops.

Sorry for that...

Torsten

2009-01-12 22:03:52

by Ingo Molnar

Subject: Re: [git pull] x86 fixes


* Torsten Kaiser <[email protected]> wrote:

> On Mon, Jan 12, 2009 at 9:52 PM, Ingo Molnar <[email protected]> wrote:
> > The reason why we wanted to re-test the functional changes was that
> > Torsten's crash looks very weird: double Call Trace line, a crash in the
> > scsi/ata code, showing the after-effects of some sort of memory corruption
> > there.
>
> The double Call Trace: line was a copy&paste error on my part. It's not
> there in the original oops.
>
> Sorry for that...

ah, ok - that's fine.

I was just wondering whether it was two CPUs crashing at once and
producing an overlap - or something like that. (Although typically in that
case we don't get such nice line duplication; we get the totally garbled
output of the two oopses superimposed.)

It's just that when an oops looks weird we have to look at every small
detail, to be able to imagine the unimaginable.

Bugs you cannot even imagine are usually the toughest nuts, as the process
of debugging tends to narrow imagination: it often involves repetitive
automatisms which do not help in expanding your thoughts to cover
tricky, unusual bugs.

If an oops looks difficult, there's a way out of that trap: co-debug in
duos if you can - two people rarely get unimaginative about the very
same detail. (Or put it aside until the next morning, to flush
out the invisible, temporary mental dead-ends you have built
subconsciously and which are blocking you from reaching the real
solution.)

Ingo

2009-01-12 22:14:32

by Ingo Molnar

Subject: Re: [git pull] x86 fixes


* Torsten Kaiser <[email protected]> wrote:

> On Mon, Jan 12, 2009 at 9:40 PM, Ingo Molnar <[email protected]> wrote:
> >
> > * Torsten Kaiser <[email protected]> wrote:
> >
> >> On Mon, Jan 12, 2009 at 8:29 PM, Pallipadi, Venkatesh
> >> <[email protected]> wrote:
> >> > oops. I missed out one file in the earlier test patch. Below is the
> >> > updated test patch that will go against 29-rc1.
> >> >
> >> > Thanks,
> >> > Venki
> >> >
> >> > Signed-off-by: Venkatesh Pallipadi <[email protected]>
> >>
> >> Tested-by: Torsten Kaiser <[email protected]>
> >>
> >> The system boots normally and glxgears is accelerated again.
> >
> > Could you try the tree below as well please?
>
> Before I read this mail, I had already tried the tree you sent to Linus as a
> pull request. That worked without a crash, but as expected the DRM-related
> error was still there.

Do you mean today's x86-fixes pull request to Linus? That would be the
expected behavior: I separated out the PAT fixes from that tree to be able
to progress with those other fixes - while the PAT angle is investigated.

Neither your crash log nor the review of the PAT patches revealed a
smoking gun (to me at least), but your crash obviously happened, and it
happened right after you pulled the x86-fixes tree.

> pulled && built, here is the result:
> [ 76.170171] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 76.178376] IP: [<(null)>] (null)

thanks, that's really helpful!

Below is the delta from the minimal patch you tried earlier today, to the
full clean patchset.

In all likelihood, if you apply Venki's patch (which you tested earlier
today, and which did not crash and gave back 3D performance to you), and
then apply the patch below, you'll get the same crash again.

So the bug is in the diff below. My first guess would be:

-extern void __iomem *ioremap_wc(unsigned long offset, unsigned long size);
+extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size);

we widened the physical address argument from 32 bits (4G) to 64 bits on
32-bit systems. If there's a width problem somewhere along the road, we can
mess the pagetables up real big.
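Why the width matters: before this change, ioremap_wc() took the physical address as unsigned long. That type is 32 bits on a 32-bit kernel, so an address above 4G would be silently cut down to its low 32 bits. Such addresses can occur on configurations where resource_size_t is 64 bits wide. The sketch below shows that conversion. through_32bit_param() is a hypothetical stand-in for any 32-bit hop left on the path, and the example address is made up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of passing a 64-bit physical address through a
 * parameter that is only 32 bits wide (as 'unsigned long' is on 32-bit x86).
 */
static uint32_t through_32bit_param(uint64_t phys_addr)
{
	/* Conversion to an unsigned 32-bit type keeps only the low 32 bits. */
	return (uint32_t)phys_addr;
}
```

An address such as 0x1fe000000 comes out as 0xfe000000, so the mapping would silently land on a completely different physical range below 4G.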

the other possibility would be this hunk:

- is_range_ram = pagerange_is_ram(start, end);
- if (is_range_ram == 1)
- return reserve_ram_pages_type(start, end, req_type, new_type);
- else if (is_range_ram < 0)
- return -EINVAL;
+ /*
+ * For legacy reasons, some parts of the physical address range in the
+ * legacy 1MB region is treated as non-RAM (even when listed as RAM in
+ * the e820 tables). So we will track the memory attributes of this
+ * legacy 1MB region using the linear memtype_list always.
+ */
+ if (end >= ISA_END_ADDRESS) {
+ is_range_ram = pagerange_is_ram(start, end);
+ if (is_range_ram == 1)
+ return reserve_ram_pages_type(start, end, req_type,
+ new_type);
+ else if (is_range_ram < 0)
+ return -EINVAL;
+ }

That is this patch's effect:

4fa1489: x86, pat: fix reserve_memtype() for legacy 1MB range
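The effect of 4fa1489 on reserve_memtype() boils down to a routing decision. A range whose end is below ISA_END_ADDRESS (1MB) always goes to the linear memtype list, and only other ranges consult pagerange_is_ram(). Below is a minimal model. pick_path() and the enum are hypothetical, and is_ram stands in for the pagerange_is_ram() result: 1 means all RAM, and a negative value is rejected with -EINVAL, assumed here to mean a mix of RAM and non-RAM.

```c
#include <assert.h>
#include <stdint.h>

#define ISA_END_ADDRESS 0x100000ULL /* end of the legacy 1MB region */

/* Hypothetical labels for where reserve_memtype() sends a request. */
enum memtype_path { PATH_LINEAR_LIST, PATH_RAM_PAGES, PATH_EINVAL };

/* Mirrors the 'if (end >= ISA_END_ADDRESS)' block from the patch. */
static enum memtype_path pick_path(uint64_t end, int is_ram)
{
	if (end >= ISA_END_ADDRESS) {
		if (is_ram == 1)
			return PATH_RAM_PAGES;   /* reserve_ram_pages_type() */
		else if (is_ram < 0)
			return PATH_EINVAL;      /* return -EINVAL */
	}
	return PATH_LINEAR_LIST;         /* kmalloc'ed struct memtype */
}
```

Note that a range inside the first 1MB never reaches the RAM check, even when e820 lists it as RAM. That is exactly the behavior change a revert of 4fa1489 would undo.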

if you have more testing capacity, could you please try tip/master again:

http://people.redhat.com/mingo/tip.git/README

In all likelihood it will crash for you (it has the PAT fixes included).
Then type this:

git revert 4fa1489

Does that solve the crash and give you good 3D performance again?

Ingo

-------------->
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 05cfed4..bdbb4b9 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -91,7 +91,7 @@ extern void unxlate_dev_mem_ptr(unsigned long phys, void *addr);

extern int ioremap_change_attr(unsigned long vaddr, unsigned long size,
unsigned long prot_val);
-extern void __iomem *ioremap_wc(unsigned long offset, unsigned long size);
+extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size);

/*
* early_ioremap() and early_iounmap() are for temporary early boot-time
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 83e69f4..06bbcbd 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -341,6 +341,25 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)

#define canon_pgprot(p) __pgprot(pgprot_val(p) & __supported_pte_mask)

+static inline int is_new_memtype_allowed(unsigned long flags,
+ unsigned long new_flags)
+{
+ /*
+ * Certain new memtypes are not allowed with certain
+ * requested memtype:
+ * - request is uncached, return cannot be write-back
+ * - request is write-combine, return cannot be write-back
+ */
+ if ((flags == _PAGE_CACHE_UC_MINUS &&
+ new_flags == _PAGE_CACHE_WB) ||
+ (flags == _PAGE_CACHE_WC &&
+ new_flags == _PAGE_CACHE_WB)) {
+ return 0;
+ }
+
+ return 1;
+}
+
#ifndef __ASSEMBLY__
/* Indicate that x86 has its own track and untrack pfn vma functions */
#define __HAVE_PFNMAP_TRACKING
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index bd85d42..2ddb1e7 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -367,7 +367,7 @@ EXPORT_SYMBOL(ioremap_nocache);
*
* Must be freed with iounmap.
*/
-void __iomem *ioremap_wc(unsigned long phys_addr, unsigned long size)
+void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
{
if (pat_enabled)
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_WC,
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index e89d248..4cf30de 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -555,10 +555,12 @@ repeat:
if (!pte_val(old_pte)) {
if (!primary)
return 0;
- WARN(1, KERN_WARNING "CPA: called for zero pte. "
- "vaddr = %lx cpa->vaddr = %lx\n", address,
- *cpa->vaddr);
- return -EINVAL;
+
+ /*
+ * Special error value returned, indicating that the mapping
+ * did not exist at this address.
+ */
+ return -EFAULT;
}

if (level == PG_LEVEL_4K) {
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 472d8ef..ec8cd49 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -333,11 +333,20 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
req_type & _PAGE_CACHE_MASK);
}

- is_range_ram = pagerange_is_ram(start, end);
- if (is_range_ram == 1)
- return reserve_ram_pages_type(start, end, req_type, new_type);
- else if (is_range_ram < 0)
- return -EINVAL;
+ /*
+ * For legacy reasons, some parts of the physical address range in the
+ * legacy 1MB region is treated as non-RAM (even when listed as RAM in
+ * the e820 tables). So we will track the memory attributes of this
+ * legacy 1MB region using the linear memtype_list always.
+ */
+ if (end >= ISA_END_ADDRESS) {
+ is_range_ram = pagerange_is_ram(start, end);
+ if (is_range_ram == 1)
+ return reserve_ram_pages_type(start, end, req_type,
+ new_type);
+ else if (is_range_ram < 0)
+ return -EINVAL;
+ }

new = kmalloc(sizeof(struct memtype), GFP_KERNEL);
if (!new)
@@ -505,6 +514,35 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
}
#endif /* CONFIG_STRICT_DEVMEM */

+/*
+ * Change the memory type for the physial address range in kernel identity
+ * mapping space if that range is a part of identity map.
+ */
+static int kernel_map_sync_memtype(u64 base, unsigned long size,
+ unsigned long flags)
+{
+ unsigned long id_sz;
+ int ret;
+
+ if (!pat_enabled || base >= __pa(high_memory))
+ return 0;
+
+ id_sz = (__pa(high_memory) < base + size) ?
+ __pa(high_memory) - base :
+ size;
+
+ ret = ioremap_change_attr((unsigned long)__va(base), id_sz, flags);
+ /*
+ * -EFAULT return means that the addr was not valid and did not have
+ * any identity mapping. That case is a success for
+ * kernel_map_sync_memtype.
+ */
+ if (ret == -EFAULT)
+ ret = 0;
+
+ return ret;
+}
+
int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t *vma_prot)
{
@@ -555,9 +593,7 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
if (retval < 0)
return 0;

- if (((pfn < max_low_pfn_mapped) ||
- (pfn >= (1UL<<(32 - PAGE_SHIFT)) && pfn < max_pfn_mapped)) &&
- ioremap_change_attr((unsigned long)__va(offset), size, flags) < 0) {
+ if (kernel_map_sync_memtype(offset, size, flags)) {
free_memtype(offset, offset + size);
printk(KERN_INFO
"%s:%d /dev/mem ioremap_change_attr failed %s for %Lx-%Lx\n",
@@ -605,7 +641,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
int strict_prot)
{
int is_ram = 0;
- int id_sz, ret;
+ int ret;
unsigned long flags;
unsigned long want_flags = (pgprot_val(*vma_prot) & _PAGE_CACHE_MASK);

@@ -626,11 +662,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
return ret;

if (flags != want_flags) {
- if (strict_prot ||
- (want_flags == _PAGE_CACHE_UC_MINUS &&
- flags == _PAGE_CACHE_WB) ||
- (want_flags == _PAGE_CACHE_WC &&
- flags == _PAGE_CACHE_WB)) {
+ if (strict_prot || !is_new_memtype_allowed(want_flags, flags)) {
free_memtype(paddr, paddr + size);
printk(KERN_ERR "%s:%d map pfn expected mapping type %s"
" for %Lx-%Lx, got %s\n",
@@ -648,18 +680,9 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
*vma_prot = __pgprot((pgprot_val(*vma_prot) &
(~_PAGE_CACHE_MASK)) |
flags);
-
}

- /* Need to keep identity mapping in sync */
- if (paddr >= __pa(high_memory))
- return 0;
-
- id_sz = (__pa(high_memory) < paddr + size) ?
- __pa(high_memory) - paddr :
- size;
-
- if (ioremap_change_attr((unsigned long)__va(paddr), id_sz, flags) < 0) {
+ if (kernel_map_sync_memtype(paddr, size, flags)) {
free_memtype(paddr, paddr + size);
printk(KERN_ERR
"%s:%d reserve_pfn_range ioremap_change_attr failed %s "
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index f884740..5ead808 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -314,17 +314,7 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
return retval;

if (flags != new_flags) {
- /*
- * Do not fallback to certain memory types with certain
- * requested type:
- * - request is uncached, return cannot be write-back
- * - request is uncached, return cannot be write-combine
- * - request is write-combine, return cannot be write-back
- */
- if ((flags == _PAGE_CACHE_UC_MINUS &&
- (new_flags == _PAGE_CACHE_WB)) ||
- (flags == _PAGE_CACHE_WC &&
- new_flags == _PAGE_CACHE_WB)) {
+ if (!is_new_memtype_allowed(flags, new_flags)) {
free_memtype(addr, addr+len);
return -EINVAL;
}
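Two of the helpers this series factors out can be modelled outside the kernel. The C sketch below is a user-space approximation under stated assumptions, not the code from the patches: the enum values stand in for the _PAGE_CACHE_* flags, high_memory_pa stands in for __pa(high_memory), and all function names are hypothetical. memtype_fallback_allowed() mirrors the two conditions removed from pci_mmap_page_range() and reserve_pfn_range(): a UC- or WC request must not be granted WB. The removed i386.c comment also lists "uncached must not become write-combine", but the removed condition never checked that, so the sketch follows the code rather than the comment. identity_map_span() mirrors the size clamp in kernel_map_sync_memtype(). sync_result() mirrors how that function treats -EFAULT, which the pageattr.c hunk now returns for a zero pte instead of warning, as success. The pat_enabled early return is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's _PAGE_CACHE_* values. */
enum cache_type { CACHE_WB, CACHE_WC, CACHE_UC_MINUS, CACHE_UC };

/*
 * Models the consolidated old/new memtype check: falling back to
 * write-back is refused when the caller asked for UC- or WC.
 */
static bool memtype_fallback_allowed(enum cache_type requested,
                                     enum cache_type granted)
{
    if (requested == CACHE_UC_MINUS && granted == CACHE_WB)
        return false;
    if (requested == CACHE_WC && granted == CACHE_WB)
        return false;
    return true;
}

/*
 * Models the clamp in kernel_map_sync_memtype(): only the part of
 * [base, base + size) below high_memory_pa is identity mapped.
 * Returns 0 when no identity-map update is needed.
 */
static uint64_t identity_map_span(uint64_t base, uint64_t size,
                                  uint64_t high_memory_pa)
{
    if (base >= high_memory_pa)
        return 0;
    return (high_memory_pa < base + size) ? high_memory_pa - base : size;
}

/*
 * Models the error mapping: -EFAULT means "no identity mapping at
 * this address", which counts as success; other errors pass through.
 */
static int sync_result(int change_attr_ret)
{
    return change_attr_ret == -EFAULT ? 0 : change_attr_ret;
}
```

For example, with high_memory_pa = 0x100000, a 0x2000-byte range starting at 0xff000 is clamped to its first 0x1000 bytes, and a range starting at or above 0x100000 needs no identity-map update at all.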

2009-01-12 22:16:27

by Ingo Molnar

[permalink] [raw]
Subject: Re: [git pull] x86 fixes


Linus,

* Torsten Kaiser <[email protected]> wrote:

> pulled && built, here is the result:
> [ 76.170171] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 76.178376] IP: [<(null)>] (null)

this test result from Torsten establishes it beyond doubt that the current
x86-fixes-for-linus lineup is safe to pull.

We'll work with Toralf to pin down the PAT crash as well, and will send
those fixes once they work fine on Toralf's box.

Ingo

2009-01-13 19:20:39

by Torsten Kaiser

[permalink] [raw]
Subject: Re: [git pull] x86 fixes

On Mon, Jan 12, 2009 at 11:13 PM, Ingo Molnar <[email protected]> wrote:
> * Torsten Kaiser <[email protected]> wrote:
>> On Mon, Jan 12, 2009 at 9:40 PM, Ingo Molnar <[email protected]> wrote:
>> > * Torsten Kaiser <[email protected]> wrote:
>> >> On Mon, Jan 12, 2009 at 8:29 PM, Pallipadi, Venkatesh
>> >> <[email protected]> wrote:
>> >> > oops. I missed out one file in the earlier test patch. Below is the
>> >> > updated test patch that will go against 29-rc1.
>> >> >
>> >> > Thanks,
>> >> > Venki
>> >> >
>> >> > Signed-off-by: Venkatesh Pallipadi <[email protected]>
>> >>
>> >> Tested-by: Torsten Kaiser <[email protected]>
>> >>
>> >> The system boots normally and glxgears is accelerated again.
>> >
>> > Could you try the tree below as well please?
>>
>> Before I read this mail, I had already tried the tree you sent to Linus as a
>> pull request. That worked without a crash, but as expected the DRM-related
>> error was still there.
>
> Do you mean today's x86-fixes pull request to Linus?

Yes, ...

> That would be the
> expected behavior: i separated out the PAT fixes from that tree to be able
> to progress with those other fixes - while the PAT angle is investigated.

... I did see that. I tested DRM just to make sure that a) I got a
kernel without the fix, as I expected, and b) the tree did not
trigger any other unhappiness.

But as I wrote yesterday: that tree did not crash, and the DRM problem
was already present in -rc1.

> Neither your crash log nor the review of the PAT patches revealed a
> smoking gun (to me at least), but your crash obviously happened, and it
> happened right after you pulled the x86-fixes tree.
>
>> pulled && built, here is the result:
>> [ 76.170171] BUG: unable to handle kernel NULL pointer dereference at (null)
>> [ 76.178376] IP: [<(null)>] (null)
>
> thanks, that's really helpful!
>
> Below is the delta from the minimal patch you tried earlier today, to the
> full clean patchset.
>
> In all likelihood, if you apply Venki's patch (which you tested earlier
> today, and which did not crash and gave back 3D performance to you), and
> then apply the patch below, you'll get the same crash again.

That crash was with just your tree, without the DRM fix from
Venki. In the crashing case it's not important anyway, because the
system crashed during X startup, so I never even got a chance to run
any DRM program. ;-P

> So the bug is in the diff below. My first guess would be:
>
> -extern void __iomem *ioremap_wc(unsigned long offset, unsigned long size);
> +extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size);
>
> we extended 4G to 64-bits on 32-bit systems. If there's a width problem
> somewhere along the road we can mess the pagetables up real big.

I'm on x86_64, so it should be 64bit anyway. But I will not claim to
know the current sizes of resource_size_t or unsigned long. ;)

But I do have 4GB RAM and part of it is remapped beyond the 32-bit limit.
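The width concern can be illustrated in user space. On a 32-bit kernel unsigned long is 32 bits wide, so a physical address above 4 GB passed through the old ioremap_wc(unsigned long, ...) prototype would silently lose its upper bits; on x86_64, as Torsten notes, no truncation occurs. The sketch below assumes resource_size_t is 64 bits (as with 64-bit resources enabled) and models the two argument types with uint32_t and uint64_t; the function names and the example address are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old prototype on a 32-bit kernel: unsigned long modelled as uint32_t. */
static uint32_t old_ioremap_wc_arg(uint64_t phys_addr)
{
    return (uint32_t)phys_addr; /* bits 32..63 are silently dropped */
}

/* New prototype, assuming resource_size_t is 64 bits (uint64_t here). */
static uint64_t new_ioremap_wc_arg(uint64_t phys_addr)
{
    return phys_addr;
}

/* True when passing the address through a 32-bit unsigned long is lossless. */
static bool survives_32bit_ulong(uint64_t phys_addr)
{
    return (uint64_t)old_ioremap_wc_arg(phys_addr) == phys_addr;
}
```

With the made-up address 0x1d0000000, the 32-bit path yields 0xd0000000, a different physical address below 4 GB; mapping that instead of the intended range is the kind of mix-up that could corrupt page tables.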

> the other possibility would be this hunk:
>
> - is_range_ram = pagerange_is_ram(start, end);
> - if (is_range_ram == 1)
> - return reserve_ram_pages_type(start, end, req_type, new_type);
> - else if (is_range_ram < 0)
> - return -EINVAL;
> + /*
> + * For legacy reasons, some parts of the physical address range in the
> + * legacy 1MB region is treated as non-RAM (even when listed as RAM in
> + * the e820 tables). So we will track the memory attributes of this
> + * legacy 1MB region using the linear memtype_list always.
> + */
> + if (end >= ISA_END_ADDRESS) {
> + is_range_ram = pagerange_is_ram(start, end);
> + if (is_range_ram == 1)
> + return reserve_ram_pages_type(start, end, req_type,
> + new_type);
> + else if (is_range_ram < 0)
> + return -EINVAL;
> + }
>
> That is this patch's effect:
>
> 4fa1489: x86, pat: fix reserve_memtype() for legacy 1MB range
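The routing this hunk introduces in reserve_memtype() can be modelled as a small decision function. In this sketch the enum and function names are hypothetical, and the is_ram argument stands in for the value pagerange_is_ram() would return. As the diff shows, 1 selects RAM page tracking, a negative value is rejected with -EINVAL, and anything else falls through to the linear memtype_list. ISA_END_ADDRESS is the end of the legacy 1MB region (0x100000 on x86).

```c
#include <assert.h>
#include <stdint.h>

#define ISA_END_ADDRESS 0x100000ULL /* end of the legacy 1MB region */

/* Hypothetical labels for where reserve_memtype() records a range. */
enum memtype_route {
    ROUTE_RAM_PAGES,   /* reserve_ram_pages_type() */
    ROUTE_LINEAR_LIST, /* kmalloc'ed entry on memtype_list */
    ROUTE_REJECT       /* -EINVAL */
};

/*
 * is_ram models pagerange_is_ram(start, end). It is ignored when the
 * range ends below ISA_END_ADDRESS, mirroring the new check, so legacy
 * 1MB ranges always land on the linear list.
 */
static enum memtype_route route_memtype(uint64_t start, uint64_t end,
                                        int is_ram)
{
    (void)start; /* only the end address matters for this decision */
    if (end >= ISA_END_ADDRESS) {
        if (is_ram == 1)
            return ROUTE_RAM_PAGES;
        else if (is_ram < 0)
            return ROUTE_REJECT;
    }
    return ROUTE_LINEAR_LIST;
}
```

For instance, route_memtype(0x0, 0x1000, 1) now returns ROUTE_LINEAR_LIST; before 4fa1489 a range below 1MB that pagerange_is_ram() reported as RAM would have gone to reserve_ram_pages_type() instead.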

Reverted that patch and booted => still crashes, but in yet another strange way:
[ 93.160112] int3: 0000 [#1] SMP
[ 93.164076] last sysfs file: /sys/devices/pci0000:00/0000:00:0f.0/0000:01:00.0/enable
[ 93.170009] CPU 0
[ 93.170009] Modules linked in: w83792d tuner tea5767 tda8290 tuner_xc2028 xc5000 tda9887 tuner_simple tuner_types mt20xx tea5761 tvaudio msp3400 bttv ir_common v4l2_common videodev v4l1_compat v4l2_compat_ioctl32 videobuf_dma_sg videobuf_core usbhid btcx_risc hid tveeprom pata_amd sg
[ 93.170009] Pid: 0, comm: swapper Not tainted 2.6.29-rc1-ingo-00009-geae2f18 #2
[ 93.170009] RIP: 0010:[<ffffffff8099ecc1>] [<ffffffff8099ecc1>] per_cpu__rcu_bh_data+0x1/0xc0
[ 93.170009] RSP: 0018:ffffffff809a8ed8 EFLAGS: 00000286
[ 93.170009] RAX: ffff88011ddf1930 RBX: ffffffff809a8ed0 RCX: ffffffff80a008c8
[ 93.170009] RDX: 00000000000003fc RSI: ffff880028014c00 RDI: ffffffff807e9440
[ 93.170009] RBP: 000000000000000a R08: ffff880028013180 R09: 0000000000000000
[ 93.170009] R10: ffffffff8087fe58 R11: 0000000000000001 R12: ffffffff80261b39
[ 93.170009] R13: 0000000000000100 R14: 000000000000000a R15: ffffffff8099ecc0
[ 93.170009] FS: 00007f2d71cf56f0(0000) GS:ffffffff809b1040(0000) knlGS:0000000000000000
[ 93.170009] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 93.170009] CR2: 00007f2d7185a920 CR3: 0000000000201000 CR4: 00000000000006e0
[ 93.170009] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 93.170009] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 93.170009] Process swapper (pid: 0, threadinfo ffffffff8087e000, task ffffffff807de360)
[ 93.170009] Stack:
[ 93.170009] ffffffff809a8ef8 ffffffff80277c12 000000000000000a 0000000000000040
[ 93.170009] ffffffff809a8f38 ffffffff809a8f10 ffffffff8021b230 ffffffff809a8f50
[ 93.170009] ffffffff8021b6de 00000000000e0000 ffff88007c407af8 0000000000000086
[ 93.170009] Call Trace:
[ 93.170009] <IRQ> <0> [<ffffffff80277c12>] ? rcu_process_callbacks+0x32/0x60
[ 93.170009] [<ffffffff8021b230>] ? post_set+0x20/0x40
[ 93.170009] [<ffffffff8021b6de>] ? generic_set_mtrr+0x11e/0x140
[ 93.170009] [<ffffffff80219457>] ? ipi_handler+0x47/0xb0
[ 93.170009] [<ffffffff8026af80>] ? generic_smp_call_function_interrupt+0x50/0x100
[ 93.170009] [<ffffffff8021e54f>] ? smp_call_function_interrupt+0x1f/0x30
[ 93.170009] [<ffffffff8020c863>] ? call_function_interrupt+0x13/0x20
[ 93.170009] <EOI> <0>Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc <cc> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[ 93.170009] RIP [<ffffffff8099ecc1>] per_cpu__rcu_bh_data+0x1/0xc0
[ 93.170009] RSP <ffffffff809a8ed8>
[ 93.181327] ---[ end trace e7dd93fe22e9ffa7 ]---
[ 93.181327] Kernel panic - not syncing: Fatal exception in interrupt
[ 93.172531] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 93.172531] IP: [<ffffffff8026af53>] generic_smp_call_function_interrupt+0x23/0x100
[ 93.172531] PGD 11b918067 PUD 11b83e067 PMD 0
[ 93.172531] Oops: 0000 [#2] SMP
[ 93.172531] last sysfs file: /sys/devices/pci0000:00/0000:00:0f.0/0000:01:00.0/enable
[ 93.172531] CPU 2
[ 93.172531] Modules linked in: w83792d tuner tea5767 tda8290 tuner_xc2028 xc5000 tda9887 tuner_simple tuner_types mt20xx tea5761 tvaudio msp3400 bttv ir_common v4l2_common videodev v4l1_compat v4l2_compat_ioctl32 videobuf_dma_sg videobuf_core usbhid btcx_risc hid tveeprom pata_amd sg
[ 93.172531] Pid: 3283, comm: X Tainted: G D 2.6.29-rc1-ingo-00009-geae2f18 #2
[ 93.172531] RIP: 0010:[<ffffffff8026af53>] [<ffffffff8026af53>] generic_smp_call_function_interrupt+0x23/0x100
[ 93.172531] RSP: 0018:ffff88011f127f80 EFLAGS: 00010046
[ 93.172531] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88007f13ab80
[ 93.172531] RDX: ffffffff809a2d00 RSI: 0000000000000000 RDI: 0000000000000002
[ 93.172531] RBP: ffff88011f127fa0 R08: 0000000000000000 R09: ffff88011e40f780
[ 93.172531] R10: ffff88007c407e48 R11: 0000000000000000 R12: ffff88011ddf1ee0
[ 93.172531] R13: 0000000000000000 R14: 0000000000000002 R15: ffff88011e59a780
[ 93.172531] FS: 00007f3267f8e6f0(0000) GS:ffff88011f0de000(0000) knlGS:0000000000000000
[ 93.172531] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 93.172531] CR2: 0000000000000000 CR3: 000000011b9b7000 CR4: 00000000000006e0
[ 93.172531] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 93.172531] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 93.172531] Process X (pid: 3283, threadinfo ffff88007c406000, task ffff88007d145700)
[ 93.172531] Stack:
[ 93.172531] ffff88011e59a780 ffff88007e09c3d8 0000000000000000 ffff88007e09c3d8
[ 93.172531] ffff88011f127fb0 ffffffff8021e54f ffff88007c407c80 ffffffff8020c863 <EOI>
[ 93.172531] 841f0ffffffcebe9 ff02680000000000 02e850ec8348ffff 00011b8de8fffff1
[ 93.172531] Call Trace:
[ 93.172531] <IRQ> <0> [<ffffffff8021e54f>] smp_call_function_interrupt+0x1f/0x30
[ 93.172531] [<ffffffff8020c863>] call_function_interrupt+0x13/0x20
[ 93.172531] <EOI> <0>Code: e8 d3 0a 05 00 c9 c3 90 55 48 89 e5 41 56 65 44 8b 34 25 24 00 00 00 41 55 41 54 53 48 8b 1d 55 df 57 00 eb 06 0f 1f 00 48 8b 1b <48> 8b 03 48 81 fb a0 8e 7e 80 0f 18 08 0f 84 9a 00 00 00 4c 8d
[ 93.172531] RIP [<ffffffff8026af53>] generic_smp_call_function_interrupt+0x23/0x100
[ 93.172531] RSP <ffff88011f127f80>
[ 93.172531] CR2: 0000000000000000
[ 93.172531] ---[ end trace e7dd93fe22e9ffa8 ]---
[ 93.172531] Kernel panic - not syncing: Fatal exception in interrupt
[ 93.172531] ------------[ cut here ]------------
[ 93.172531] WARNING: at kernel/smp.c:299 smp_call_function_many+0x1e9/0x250()
[ 93.172531] Hardware name: KFN5-D SLI
[ 93.172531] Modules linked in: w83792d tuner tea5767 tda8290 tuner_xc2028 xc5000 tda9887 tuner_simple tuner_types mt20xx tea5761 tvaudio msp3400 bttv ir_common v4l2_common videodev v4l1_compat v4l2_compat_ioctl32 videobuf_dma_sg videobuf_core usbhid btcx_risc hid tveeprom pata_amd sg
[ 93.172531] Pid: 3283, comm: X Tainted: G D 2.6.29-rc1-ingo-00009-geae2f18 #2
[ 93.172531] Call Trace:
[ 93.172531] <IRQ> [<ffffffff802440a0>] warn_slowpath+0xd0/0x130
[ 93.172531] [<ffffffff8065d1cf>] ? _spin_unlock_irqrestore+0x2f/0x40
[ 93.172531] [<ffffffff8024494d>] ? release_console_sem+0x1dd/0x230
[ 93.172531] [<ffffffff8026ada9>] smp_call_function_many+0x1e9/0x250
[ 93.172531] [<ffffffff80213570>] ? stop_this_cpu+0x0/0x30
[ 93.172531] [<ffffffff8024494d>] ? release_console_sem+0x1dd/0x230
[ 93.172531] [<ffffffff8026ae30>] smp_call_function+0x20/0x30
[ 93.172531] [<ffffffff8021e4c0>] native_smp_send_stop+0x30/0x70
[ 93.172531] [<ffffffff8065a114>] panic+0xa8/0x165
[ 93.172531] [<ffffffff8065d1cf>] ? _spin_unlock_irqrestore+0x2f/0x40
[ 93.172531] [<ffffffff8024494d>] ? release_console_sem+0x1dd/0x230
[ 93.172531] [<ffffffff80244c75>] ? console_unblank+0x75/0x90
[ 93.172531] [<ffffffff8020fca3>] oops_end+0x93/0xa0
[ 93.172531] [<ffffffff8022a864>] do_page_fault+0x424/0x980
[ 93.172531] [<ffffffff80261b39>] ? getnstimeofday+0x59/0xe0
[ 93.172531] [<ffffffff8065cdbd>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 93.172531] [<ffffffff8065d52f>] page_fault+0x1f/0x30
[ 93.172531] [<ffffffff8026af53>] ? generic_smp_call_function_interrupt+0x23/0x100
[ 93.172531] [<ffffffff8021e54f>] smp_call_function_interrupt+0x1f/0x30
[ 93.172531] [<ffffffff8020c863>] call_function_interrupt+0x13/0x20
[ 93.172531] <EOI> <4>---[ end trace e7dd93fe22e9ffa9 ]---
[ 93.172531] ------------[ cut here ]------------
[ 93.172531] WARNING: at kernel/smp.c:220 smp_call_function_single+0xa7/0x110()
[ 93.172531] Hardware name: KFN5-D SLI
[ 93.172531] Modules linked in: w83792d tuner tea5767 tda8290 tuner_xc2028 xc5000 tda9887 tuner_simple tuner_types mt20xx tea5761 tvaudio msp3400 bttv ir_common v4l2_common videodev v4l1_compat v4l2_compat_ioctl32 videobuf_dma_sg videobuf_core usbhid btcx_risc hid tveeprom pata_amd sg
[ 93.172531] Pid: 3283, comm: X Tainted: G D W 2.6.29-rc1-ingo-00009-geae2f18 #2
[ 93.172531] Call Trace:
[ 93.172531] <IRQ> [<ffffffff802440a0>] warn_slowpath+0xd0/0x130
[ 93.172531] [<ffffffff8065a063>] ? dump_stack+0x72/0x7b
[ 93.172531] [<ffffffff8026ba97>] ? print_modules+0x57/0xb0
[ 93.172531] [<ffffffff802440ba>] ? warn_slowpath+0xea/0x130
[ 93.172531] [<ffffffff8065d1cf>] ? _spin_unlock_irqrestore+0x2f/0x40
[ 93.172531] [<ffffffff8024494d>] ? release_console_sem+0x1dd/0x230
[ 93.172531] [<ffffffff8026ab57>] smp_call_function_single+0xa7/0x110
[ 93.172531] [<ffffffff8026ad7a>] smp_call_function_many+0x1ba/0x250
[ 93.172531] [<ffffffff80213570>] ? stop_this_cpu+0x0/0x30
[ 93.172531] [<ffffffff8024494d>] ? release_console_sem+0x1dd/0x230
[ 93.172531] [<ffffffff8026ae30>] smp_call_function+0x20/0x30
[ 93.172531] [<ffffffff8021e4c0>] native_smp_send_stop+0x30/0x70
[ 93.172531] [<ffffffff8065a114>] panic+0xa8/0x165
[ 93.172531] [<ffffffff8065d1cf>] ? _spin_unlock_irqrestore+0x2f/0x40
[ 93.172531] [<ffffffff8024494d>] ? release_console_sem+0x1dd/0x230
[ 93.172531] [<ffffffff80244c75>] ? console_unblank+0x75/0x90
[ 93.172531] [<ffffffff8020fca3>] oops_end+0x93/0xa0
[ 93.172531] [<ffffffff8022a864>] do_page_fault+0x424/0x980
[ 93.172531] [<ffffffff80261b39>] ? getnstimeofday+0x59/0xe0
[ 93.172531] [<ffffffff8065cdbd>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 93.172531] [<ffffffff8065d52f>] page_fault+0x1f/0x30
[ 93.172531] [<ffffffff8026af53>] ? generic_smp_call_function_interrupt+0x23/0x100
[ 93.172531] [<ffffffff8021e54f>] smp_call_function_interrupt+0x1f/0x30
[ 93.172531] [<ffffffff8020c863>] call_function_interrupt+0x13/0x20
[ 93.172531] <EOI> <4>---[ end trace e7dd93fe22e9ffaa ]---

Similar additional warnings were also present on the very first crash,
which, just like this one, left the keyboard LEDs blinking.
I did not post them for the first crash because I suspected that
these WARNINGs were only triggered because the first Oops had messed
something up.

> if you have more testing capacity, could you please try tip/master again:

I will see if I find time to test tip/master later...

> http://people.redhat.com/mingo/tip.git/README
>
> in all likelihood it will crash for you (it has the PAT fixes included).
> Then type this:
>
> git revert 4fa1489
>
> Does that solve the crash and give you good 3D performance again?

Reverting 4fa1489 did not help.
Output from git log from the tree I tested:
eae2f1895569e51a97f359759826519f7e0f2a61 Revert "x86, pat: fix reserve_memtype() for legacy 1MB range"
4fa1489d2a74c1e3c6231f449d73ce46131523ae x86, pat: fix reserve_memtype() for legacy 1MB range
895252ccb3050383e1dcf2c2536065e346c2fa14 x86 PAT: remove CPA WARN_ON for zero pte
838b120c59b530ba58cc0197d208d08455733472 x86 PAT: ioremap_wc should take resource_size_t parameter
283c81fe6568202db345649e874d2a0f29dc5a84 x86 PAT: return compatible mapping to remap_pfn_range callers
dfed11010f7b2d994444bcd83ec4cc7e80d7d030 x86 PAT: change track_pfn_vma_new to take pgprot_t pointer param
a8eae3321ea94fe06c6a76b48cc6a082116b1784 x86 PAT: consolidate old memtype new memtype check into a function
18d82ebde7e40bf67c84b505a12be26133a89932 x86 PAT: remove PFNMAP type on track_pfn_vma_new() error
ae04d1401577bb63151480a053057de58b8e10bb powerpc: Fix cpufreq drivers after cpufreq core changes
c59765042f53a79a7a65585042ff463b69cb248c Linux 2.6.29-rc1

I could not test the 3D performance, as X kept killing the system on startup. ;)
But as already written: the fix from Venkatesh alone fixed 3D
for me and did not cause any crashes.

Torsten