2009-06-28 16:10:07

by Arjan van de Ven

Subject: kerneloops.org report for the week

Few "highlights" this week
* mem_cgroup_add_lru_list (rank 2) is a high rising issue;
it's list corruption, question is why this is new
* rank 13 (memcmp in the raid code) is also new
* the warning in get_free_pages that has been discussed on lkml is dropping
from the ranks again


This week, a total of 15273 oopses and warnings have been reported,
compared to 13384 reports in the previous week.


Rank 1: i915_gem_set_tiling (warning)
Reported 2754 times (8928 total reports)
[gem] Failure in the tiling code
This warning was last seen in version 2.6.31-rc0-git15, and first seen in 2.6.29-rc2.
More info: http://www.kerneloops.org/searchweek.php?search=i915_gem_set_tiling

Rank 2: mem_cgroup_add_lru_list (warn)
Reported 1554 times (1622 total reports)
List corruption in the VM code
This oops was last seen in version 2.6.30-git19, and first seen in 2.6.29.
More info: http://www.kerneloops.org/searchweek.php?search=mem_cgroup_add_lru_list

Rank 3: getnstimeofday (warning)
Reported 1319 times (4893 total reports)
[suspend resume] getnstimeofday() is called before timekeeping is resumed
This oops was last seen in version 2.6.30, and first seen in 2.6.24.
More info: http://www.kerneloops.org/searchweek.php?search=getnstimeofday

Rank 4: iwl_set_dynamic_key (warning)
Reported 825 times (16074 total reports)
"no space for new kew"
This warning was last seen in version 2.6.30, and first seen in 2.6.27.
More info: http://www.kerneloops.org/searchweek.php?search=iwl_set_dynamic_key

Rank 5: ct_vm_map (warning)
Reported 773 times (2362 total reports)
Fedora: Bug in the creative XFI driver backport
This warning was last seen in version 2.6.29.5, and first seen in 2.6.29-rc4-git1.
More info: http://www.kerneloops.org/searchweek.php?search=ct_vm_map

Rank 6: parport_device_proc_unregister (warning)
Reported 766 times (3660 total reports)
Alan has a fix for this
This warning was last seen in version 2.6.29.4, and first seen in 2.6.27.
More info: http://www.kerneloops.org/searchweek.php?search=parport_device_proc_unregister

Rank 7: hres_timers_resume (warning)
Reported 763 times (2368 total reports)
[suspend resume] hres_timers_resume() is incorrectly called with interrupts on
This warning was last seen in version 2.6.30, and first seen in 2.6.24.7.
More info: http://www.kerneloops.org/searchweek.php?search=hres_timers_resume

Rank 8: generic_get_mtrr (warning)
Reported 544 times (2061 total reports)
BIOS bug where the MTRRs are not set up correctly
This warning was last seen in version 2.6.30, and first seen in 2.6.25.3.
More info: http://www.kerneloops.org/searchweek.php?search=generic_get_mtrr

Rank 9: dmar_table_init (warning)
Reported 346 times (1206 total reports)
BIOS bug exposed via a WARN_ON
This warning was last seen in version 2.6.31-rc0-git18, and first seen in 2.6.29.
More info: http://www.kerneloops.org/searchweek.php?search=dmar_table_init

Rank 10: minstrel_get_rate (warning)
Reported 228 times (796 total reports)
Issue with the ath5k driver
This warning was last seen in version 2.6.30-rc3, and first seen in 2.6.27.12.
More info: http://www.kerneloops.org/searchweek.php?search=minstrel_get_rate

Rank 11: dev_watchdog(r8169) (oops)
Reported 206 times (9616 total reports)
This oops was last seen in version 2.6.30, and first seen in 2.6.26.6.
More info: http://www.kerneloops.org/searchweek.php?search=dev_watchdog(r8169)

Rank 12: ath_get_rate (warning)
Reported 145 times (1051 total reports)
This warning was last seen in version 2.6.30, and first seen in 2.6.26.
More info: http://www.kerneloops.org/searchweek.php?search=ath_get_rate

Rank 13: memcmp (oops)
Reported 131 times (273 total reports)
memcmp trips up in the raid code
This oops was last seen in version 2.6.29.4, and first seen in 2.6.21-rc3.
More info: http://www.kerneloops.org/searchweek.php?search=memcmp


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org


2009-06-29 03:18:47

by Ingo Molnar

Subject: Re: kerneloops.org report for the week


* Arjan van de Ven <[email protected]> wrote:

> Few "highlights" this week
> * mem_cgroup_add_lru_list (rank 2) is a high rising issue;
> it's list corruption, question is why this is new
> * rank 13 (memcmp in the raid code) is also new
> * the warning in get_free_pages that has been discussed on lkml is dropping
> from the ranks again
>
>
> This week, a total of 15273 oopses and warnings have been reported,
> compared to 13384 reports in the previous week.
>
>
> Rank 2: mem_cgroup_add_lru_list (warn)
> Reported 1554 times (1622 total reports)
> List corruption in the VM code
> This oops was last seen in version 2.6.30-git19, and first seen in 2.6.29.
> More info: http://www.kerneloops.org/searchweek.php?search=mem_cgroup_add_lru_list

At least one list corruption bug was fixed by:

cb4cbcf: mm: fix incorrect page removal from LRU

> Rank 3: getnstimeofday (warning)
> Reported 1319 times (4893 total reports)
> [suspend resume] getnstimeofday() is called before timekeeping is resumed
> This oops was last seen in version 2.6.30, and first seen in 2.6.24.
> More info: http://www.kerneloops.org/searchweek.php?search=getnstimeofday

Probably caused by some buggy driver callback?

> Rank 7: hres_timers_resume (warning)
> Reported 763 times (2368 total reports)
> [suspend resume] hres_timers_resume() is incorrectly called with interrupts on
> This warning was last seen in version 2.6.30, and first seen in 2.6.24.7.
> More info: http://www.kerneloops.org/searchweek.php?search=hres_timers_resume

This is probably a driver incorrectly enabling irqs in a resume
callback. This should be easier and more specific to debug with the
lockdep based annotation i suggested for the suspend code in various
mails.

> Rank 8: generic_get_mtrr (warning)
> Reported 544 times (2061 total reports)
> BIOS bug where the MTRRs are not set up correctly
> This warning was last seen in version 2.6.30, and first seen in 2.6.25.3.
> More info: http://www.kerneloops.org/searchweek.php?search=generic_get_mtrr

I think this calls for enabling the x86 MTRR sanitizer by default -
500 out of 15000 reports suggests a significant proportion of Linux
systems is affected by MTRR setup problems.

I.e. we should change:

config MTRR_SANITIZER_ENABLE_DEFAULT
	int "MTRR cleanup enable value (0-1)"
	range 0 1
	default "0"

To 'default "1"'. Any objections?

If the MTRR sanitizer is enabled then i think the above warning in
generic_get_mtrr() should never trigger.

Ingo

2009-06-29 09:05:21

by Andi Kleen

Subject: Re: kerneloops.org report for the week

Ingo Molnar <[email protected]> writes:
>
>> Rank 8: generic_get_mtrr (warning)
>> Reported 544 times (2061 total reports)
>> BIOS bug where the MTRRs are not set up correctly
>> This warning was last seen in version 2.6.30, and first seen in 2.6.25.3.
>> More info: http://www.kerneloops.org/searchweek.php?search=generic_get_mtrr
>
> I think this calls for enabling the x86 MTRR sanitizer by default -
> 500 out of 15000 reports suggests a significant proportion of Linux
> systems is affected by MTRR setup problems.

The question is if the MTRRs are really wrong. The generic.c code checks
against the maximum address space reported by the CPU, but the BIOS might
only set up the address space that is actually used. The latter
is not really wrong; it's not needed to set MTRRs for nonexistent
mappings.

It might be better to double check against the max e820 mapping too
or drop that check.
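
To make the suggestion above concrete, here is a rough userspace model of
such a relaxed check. It is only a sketch under stated assumptions, not the
code in generic.c: masks are treated as plain byte-granular 64-bit values
rather than the page-shifted values the kernel works with, and all helper
names (addr_bits_needed, mask_complete_up_to, mtrr_mask_plausible) are
hypothetical. The idea is to accept a mask if it is complete up to the CPU's
physical address width, or, failing that, up to the highest address that
e820 reports.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: address bits needed to reach every byte below 'end'. */
static unsigned int addr_bits_needed(uint64_t end)
{
    unsigned int bits = 0;
    uint64_t last = end ? end - 1 : 0;

    while (last) {
        bits++;
        last >>= 1;
    }
    return bits;
}

/*
 * Hypothetical helper: true if 'mask' has every bit set from its lowest
 * set bit up to bit (width - 1). Bits at or above 'width' are ignored.
 */
static bool mask_complete_up_to(uint64_t mask, unsigned int width)
{
    uint64_t limit, lowest, expected;

    if (width == 0 || width > 63)
        return false;
    limit = 1ULL << width;
    mask &= limit - 1;
    if (mask == 0)
        return false;
    lowest = mask & (~mask + 1);            /* isolate the lowest set bit */
    expected = (limit - 1) & ~(lowest - 1); /* all bits from lowest to width-1 */
    return mask == expected;
}

/*
 * Assumed policy: a mask is plausible if it is complete for the CPU's
 * address width, or at least for the address range that e820 reports.
 */
static bool mtrr_mask_plausible(uint64_t mask, unsigned int cpu_bits,
                                uint64_t e820_end)
{
    return mask_complete_up_to(mask, cpu_bits) ||
           mask_complete_up_to(mask, addr_bits_needed(e820_end));
}
```

For example, on a CPU reporting 36 physical address bits with e820 ending
at 4 GiB, a mask of 0xC0000000 fails the CPU-width test but passes the
e820-based one, so mtrr_mask_plausible() accepts it. A mask with a hole,
such as 0xA0000000, is still rejected.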

-Andi

--
[email protected] -- Speaking for myself only.