2007-10-09 20:54:43

by Linus Torvalds

Subject: Linux 2.6.23


Finally.

Yeah, it got delayed, not because of any huge issues, but because of
various bugfixes trickling in and causing me to reset my "release clock"
all the time. But it's out there now, and hopefully better for the wait.

Not a whole lot of changes since -rc9, although there's a few updates to
mips, sparc64 and blackfin in there. Ignoring those arch updates, there's
basically a number of mostly one-liners (mostly in drivers, but there's
some networking fixes and some VFS/VM fixes there too).

Shortlog and diffstat appended (both relative to -rc9, of course - the
full log from 2.6.22 is on kernel.org as usual).

I want this to be what people look at for a few days, but expect the x86
merge to go ahead after that. So far, all indications are still that it's
going to be all smooth sailing, but hey, those indicators seem to always
say that, and only after the fact do people notice any problems ;)

Linus

---
Akinobu Mita (1):
[SPARC64]: check fork_idle() error

Al Viro (1):
fix bogus reporting of signals by audit

Alexey Dobriyan (2):
Move kasprintf.o to obj-y
[ROSE]: Fix rose.ko oops on unload

Alexey Kuznetsov (1):
[SFQ]: Remove artificial limitation for queue limit.

Andrew Morton (1):
binfmt_flat: checkpatch fixing minimum support for the blackfin relocations

Anton Blanchard (2):
[POWERPC] Fix xics set_affinity code
Fix timer_stats printout of events/sec

Attila Kinali (1):
Add manufacturer and card id of teltonica pcmcia modems

Ben Dooks (2):
[ARM] 4597/2: OSIRIS: ensure CPLD0 is preserved after suspend
[ARM] 4598/2: OSIRIS: Ensure we do not get nRSTOUT during suspend

Benjamin Herrenschmidt (1):
Fix non-terminated PCI match table in PowerMac IDE

Bernd Schmidt (1):
Binfmt_flat: Add minimum support for the Blackfin relocations

Brian Haley (1):
[IPv6]: Fix ICMPv6 redirect handling with target multicast address

Bryan Wu (1):
Blackfin arch: add some missing syscall

Dale Farnsworth (1):
mv643xx_eth: Do not modify struct netdev tx_queue_len

David S. Miller (8):
[SPARC]: Fix EBUS use of uninitialized variable.
[SPARC64]: Fix put_user() calls in binfmt_aout32.c
[SPARC64]: Fix missing load-twin usage in Niagara-1 memcpy.
[SPARC64]: Don't use in/local regs for ldx/stx data in N1 memcpy.
[SPARC64]: Fix domain-services port probing.
[SPARC64]: VIO device addition log message level is too high.
[SPARC64]: Temporary workaround for PCI-E slot on T1000.
[SPARC64]: Fix 'niu' complex IRQ probing.

Dmitry Torokhov (1):
Driver core: fix SYSF_DEPRECATED breakage for nested classdevs

Eric Dumazet (1):
[TCP]: secure_tcp_sequence_number() should not use a too fast clock

FUJITA Tomonori (1):
[SCSI] megaraid_old: fix READ_CAPACITY

Florian Fainelli (2):
[MIPS] Alchemy: Fix USB initialization.
[MIPS] Au1000: set the PCI controller IO base

Francois Romieu (1):
r8169: revert part of 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2

Giuseppe Sacco (2):
[MIPS] IP32: Enable PCI bridges
[MIPS] IP32: Fix fatal typo in address computation.

Hugh Dickins (1):
Fix sys_remap_file_pages BUG at highmem.c:15!

Ilpo Järvinen (1):
[TCP]: Fix fastpath_cnt_hint when GSO skb is partially ACKed

Ingo Molnar (1):
sched: fix profile=sleep

Jeff Garzik (2):
aic94xx: fix DMA data direction for SMP requests
sata_mv: correct S/G table limits

Jeremy Fitzhardinge (1):
xen: disable split pte locks for now

Jiri Slaby (1):
Ata: pata_marvell, use ioread* for iomap-ped memory

Joe Perches (1):
bcm43xx: Correct printk with PFX before KERN_

John W. Linville (1):
[IEEE80211]: avoid integer underflow for runt rx frames

Karsten Keil (1):
ISDN: Fix data access out of array bounds

Kyle McMartin (1):
Revert "intel_agp: fix stolen mem range on G33"

Linus Torvalds (3):
VT_WAITACTIVE: Avoid returning EINTR when not necessary
Don't do load-average calculations at even 5-second intervals
Linux 2.6.23

Maarten Bressers (1):
Correct Makefile rule for generating custom keymap

Maciej W. Rozycki (1):
[MIPS] pg-r4k.c: Fix a typo in an R4600 v2 erratum workaround

Michael Hennerich (2):
Blackfin arch: gpio pinmux and resource allocation API required by BF537 on chip ethernet mac driver
Blackfin arch: fix PORT_J BUG for BF537/6 EMAC driver reported by Kalle Pokki <[email protected]>

Olof Johansson (1):
libata: fix for sata_mv >64KB DMA segments

Pavel Machek (1):
sysrq docs: document sequence that actually works

Peter Korsgaard (1):
dm9601: Fix receive MTU

Peter Zijlstra (2):
lockstat: documentation
mm: set_page_dirty_balance() vs ->page_mkwrite()

Rafal Bilski (1):
Longhaul: add auto enabled "revid_errata" option

Ralf Baechle (2):
[MIPS] Type proof reimplementation of cmpxchg.
[MIPS] Terminally fix local_{dec,sub}_if_positive

Richard Knutsson (1):
softmac: Fix compiler-warning

Ron Mercer (2):
qla3xxx: bugfix: Add memory barrier before accessing rx completion.
qla3xxx: bugfix: Fix VLAN rx completion handling.

Scott Thompson (1):
drivers/ata/pata_ixp4xx_cf.c: ioremap return code check

Serge Belyshev (1):
Remove unnecessary cast in prefetch()

Stefan Richter (1):
firewire: point to migration document

Stephen Hemminger (2):
sky2: jumbo frame regression fix
[PKT_SCHED] cls_u32: error code isn't been propogated properly

Sunil Mushran (1):
ocfs2: Unlock mutex in local alloc failure case

Tejun Heo (1):
ata_piix: add another TECRA M3 entry to broken suspend list

Trond Myklebust (1):
NLM: Fix a memory leak in nlmsvc_testlock

Yan Zheng (3):
fix VM_CAN_NONLINEAR check in sys_remap_file_pages
fix page release issue in filemap_fault
AIO: fix cleanup in io_submit_one(...)

---
Documentation/lockstat.txt | 120 +++++++
Documentation/sysrq.txt | 2 +-
Makefile | 2 +-
arch/arm/mach-s3c2440/mach-osiris.c | 18 +
arch/blackfin/kernel/bfin_gpio.c | 285 ++++++++++++++--
arch/blackfin/mach-common/entry.S | 23 +-
arch/i386/kernel/cpu/cpufreq/longhaul.c | 60 ++++-
arch/mips/au1000/common/pci.c | 1 +
arch/mips/au1000/mtx-1/board_setup.c | 4 +-
arch/mips/au1000/pb1000/board_setup.c | 6 +-
arch/mips/au1000/pb1100/board_setup.c | 4 +-
arch/mips/au1000/pb1500/board_setup.c | 6 +-
arch/mips/mm/pg-r4k.c | 2 +-
arch/mips/pci/ops-mace.c | 21 +-
arch/powerpc/platforms/pseries/xics.c | 2 +-
arch/sparc/kernel/ebus.c | 2 +
arch/sparc64/kernel/binfmt_aout32.c | 4 +-
arch/sparc64/kernel/ebus.c | 5 +-
arch/sparc64/kernel/pci_common.c | 4 +-
arch/sparc64/kernel/prom.c | 3 +-
arch/sparc64/kernel/smp.c | 2 +
arch/sparc64/kernel/vio.c | 29 ++-
arch/sparc64/lib/NGcopy_from_user.S | 8 +-
arch/sparc64/lib/NGcopy_to_user.S | 8 +-
arch/sparc64/lib/NGmemcpy.S | 371 ++++++++++++---------
drivers/ata/ata_piix.c | 7 +
drivers/ata/pata_ixp4xx_cf.c | 3 +
drivers/ata/pata_marvell.c | 4 +-
drivers/ata/sata_mv.c | 35 ++-
drivers/base/core.c | 10 +-
drivers/char/Makefile | 2 +-
drivers/char/agp/intel-agp.c | 5 -
drivers/char/random.c | 10 +-
drivers/char/vt_ioctl.c | 4 +-
drivers/firewire/Kconfig | 3 +-
drivers/ide/ppc/pmac.c | 1 +
drivers/isdn/i4l/isdn_common.c | 5 +-
drivers/net/mv643xx_eth.c | 1 -
drivers/net/qla3xxx.c | 7 +
drivers/net/r8169.c | 16 +-
drivers/net/sky2.c | 3 -
drivers/net/usb/dm9601.c | 2 +-
drivers/net/wireless/bcm43xx/bcm43xx_wx.c | 2 +-
drivers/scsi/aic94xx/aic94xx_task.c | 4 +-
drivers/scsi/megaraid.c | 8 +
drivers/serial/serial_cs.c | 1 +
fs/aio.c | 2 +-
fs/binfmt_flat.c | 6 +-
fs/lockd/svclock.c | 4 +-
fs/ocfs2/localalloc.c | 4 +-
include/asm-blackfin/mach-bf533/bfin_serial_5xx.h | 11 +-
include/asm-blackfin/mach-bf537/bfin_serial_5xx.h | 23 +-
include/asm-blackfin/mach-bf537/portmux.h | 35 ++-
include/asm-blackfin/mach-bf561/bfin_serial_5xx.h | 11 +-
include/asm-blackfin/portmux.h | 55 +++
include/asm-blackfin/unistd.h | 56 +++-
include/asm-h8300/flat.h | 3 +-
include/asm-m32r/flat.h | 3 +-
include/asm-m68knommu/flat.h | 3 +-
include/asm-mips/cmpxchg.h | 107 ++++++
include/asm-mips/local.h | 69 +----
include/asm-mips/system.h | 261 +---------------
include/asm-sh/flat.h | 3 +-
include/asm-v850/flat.h | 4 +-
include/asm-x86_64/processor.h | 2 +-
include/linux/sched.h | 2 +-
include/linux/writeback.h | 2 +-
include/net/rose.h | 2 +-
kernel/sched_fair.c | 10 +
kernel/signal.c | 22 +-
kernel/time/timer_stats.c | 5 +-
lib/Kconfig.debug | 2 +
lib/Makefile | 4 +-
mm/Kconfig | 1 +
mm/filemap.c | 1 +
mm/fremap.c | 2 +-
mm/memory.c | 23 +-
mm/page-writeback.c | 4 +-
net/ieee80211/ieee80211_rx.c | 6 +
net/ieee80211/softmac/ieee80211softmac_wx.c | 2 +-
net/ipv4/tcp_input.c | 3 +
net/ipv6/ndisc.c | 9 +-
net/rose/rose_loopback.c | 4 +-
net/rose/rose_route.c | 15 +-
net/sched/cls_u32.c | 2 +-
net/sched/sch_sfq.c | 47 ++-
86 files changed, 1254 insertions(+), 701 deletions(-)
create mode 100644 Documentation/lockstat.txt
create mode 100644 include/asm-mips/cmpxchg.h


2007-10-10 06:12:31

by Nicholas Miell

Subject: Re: Linux 2.6.23

On Tue, 2007-10-09 at 13:54 -0700, Linus Torvalds wrote:
> Finally.
>
> Yeah, it got delayed, not because of any huge issues, but because of
> various bugfixes trickling in and causing me to reset my "release clock"
> all the time. But it's out there now, and hopefully better for the wait.
>
> Not a whole lot of changes since -rc9, although there's a few updates to
> mips, sparc64 and blackfin in there. Ignoring those arch updates, there's
> basically a number of mostly one-liners (mostly in drivers, but there's
> some networking fixes and some VFS/VM fixes there too).
>
> Shortlog and diffstat appended (both relative to -rc9, of course - the
> full log from 2.6.22 is on kernel.org as usual).
>
> I want this to be what people look at for a few days, but expect the x86
> merge to go ahead after that. So far, all indications are still that it's
> going to be all smooth sailing, but hey, those indicators seem to always
> say that, and only after the fact do people notice any problems ;)
>
> Linus

Does CFS still generate the following sysbench graphs with 2.6.23, or
did that get fixed?

http://people.freebsd.org/~kris/scaling/linux-pgsql.png
http://people.freebsd.org/~kris/scaling/linux-mysql.png

(There's also some interesting FreeBSD vs. Linux graphs in
http://people.freebsd.org/~kris/scaling/Scalability%20Update.pdf , but
AFAIK those comparisons are more indicative of glibc malloc performance
than Linux performance.)

--
Nicholas Miell <[email protected]>

2007-10-10 08:16:36

by René Rebe

Subject: Re: Linux 2.6.23

Hi Linus et al.,

2.6.23 does not build with my usual .config on x86_64 and gcc-4.2.1:

In file included from fs/drop_caches.c:8:
include/linux/mm.h:1210: warning: 'struct super_block' declared inside parameter list
include/linux/mm.h:1210: warning: its scope is only this definition or declaration, which is probably not what you want
fs/drop_caches.c:17: error: conflicting types for 'drop_pagecache_sb'
include/linux/mm.h:1210: error: previous declaration of 'drop_pagecache_sb' was here
fs/drop_caches.c:28: error: conflicting types for 'drop_pagecache_sb'
include/linux/mm.h:1210: error: previous declaration of 'drop_pagecache_sb' was here

A little forward declaration fixes this:

--- linux-2.6.23/include/linux/mm.h.vanilla 2007-10-10 09:28:33.000000000 +0200
+++ linux-2.6.23/include/linux/mm.h 2007-10-10 09:30:23.000000000 +0200
@@ -1207,6 +1207,7 @@
void __user *, size_t *, loff_t *);
unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
unsigned long lru_pages);
+struct super_block;
extern void drop_pagecache_sb(struct super_block *);
void drop_pagecache(void);
void drop_slab(void);

You probably end up fixing it some other way, but as I do not know this
file inside out I just wanted to drop a note.

Yours,
René Rebe

On Tuesday 09 October 2007 22:54:30 Linus Torvalds wrote:

> Finally.
>
> Yeah, it got delayed, not because of any huge issues, but because of
> various bugfixes trickling in and causing me to reset my "release clock"
> all the time. But it's out there now, and hopefully better for the wait.
>
> Not a whole lot of changes since -rc9, although there's a few updates to
> mips, sparc64 and blackfin in there. Ignoring those arch updates, there's
> basically a number of mostly one-liners (mostly in drivers, but there's
> some networking fixes and some VFS/VM fixes there too).
>
> Shortlog and diffstat appended (both relative to -rc9, of course - the
> full log from 2.6.22 is on kernel.org as usual).
>
> I want this to be what people look at for a few days, but expect the x86
> merge to go ahead after that. So far, all indications are still that it's
> going to be all smooth sailing, but hey, those indicators seem to always
> say that, and only after the fact do people notice any problems ;)
>
> Linus

--
René Rebe - ExactCODE GmbH - Europe, Germany, Berlin
http://exactcode.de | http://t2-project.org | http://rene.rebe.name

2007-10-10 08:37:32

by Alexey Dobriyan

Subject: Re: Linux 2.6.23

On 10/10/07, René Rebe <[email protected]> wrote:
> 2.6.23 does not build with my usual .config on x86_64 and gcc-4.2.1:
>
> In file included from fs/drop_caches.c:8:
> include/linux/mm.h:1210: warning: 'struct super_block' declared inside
> parameter list

> --- linux-2.6.23/include/linux/mm.h.vanilla
> +++ linux-2.6.23/include/linux/mm.h

> +struct super_block;
> extern void drop_pagecache_sb(struct super_block *);
> void drop_pagecache(void);
> void drop_slab(void);
>
> You probably end up fixing it some other way, but as I do not know this
> file inside out I just wanted to drop a note.

You have some strange vanilla kernel. 2.6.23 doesn't have this prototype.

2007-10-10 09:12:41

by Michael Tokarev

Subject: Re: Linux 2.6.23

Alexey Dobriyan wrote:
> On 10/10/07, René Rebe <[email protected]> wrote:
>> 2.6.23 does not build with my usual .config on x86_64 and gcc-4.2.1:
>>
>> In file included from fs/drop_caches.c:8:
>> include/linux/mm.h:1210: warning: 'struct super_block' declared inside
>> parameter list
>
>> --- linux-2.6.23/include/linux/mm.h.vanilla
>> +++ linux-2.6.23/include/linux/mm.h
>
>> +struct super_block;
>> extern void drop_pagecache_sb(struct super_block *);
>> void drop_pagecache(void);
>> void drop_slab(void);
>>
>> You probably end up fixing it some other way, but as I do not know this
>> file inside out I just wanted to drop a note.
>
> You have some strange vanilla kernel. 2.6.23 doesn't have this prototype.

The same happens here as well.

-rw-rw-r-- 1 mjt mjt 45488158 Oct 9 20:48 linux-2.6.23.tar.bz2
2cc2fd4d521dc5d7cfce0d8a9d1b3472 linux-2.6.23.tar.bz2

(timestamp is in UTC) Downloaded yesterday, 3 hours after the announcement,
from http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2 .

/mjt

2007-10-10 10:15:22

by Ingo Molnar

Subject: Re: Linux 2.6.23


* Nicholas Miell <[email protected]> wrote:

> Does CFS still generate the following sysbench graphs with 2.6.23, or
> did that get fixed?
>
> http://people.freebsd.org/~kris/scaling/linux-pgsql.png
> http://people.freebsd.org/~kris/scaling/linux-mysql.png

as far as my testsystem goes, v2.6.23 beats v2.6.22.9 in sysbench:

http://redhat.com/~mingo/misc/sysbench.jpg

As you can see in the graph, v2.6.23 schedules much more consistently
too. [ v2.6.22 has a small (but potentially statistically insignificant)
edge at 4-6 clients, and CFS has a slightly better peak (which is
statistically insignificant). ]

( Config is at http://redhat.com/~mingo/misc/config, system is Core2Duo
1.83 GHz, mysql-5.0.45, glibc-2.6. Nothing fancy either in the config
nor in the setup - everything is pretty close to the defaults. )

i'm aware of a 2.6.21 vs. 2.6.23 sysbench regression report, and it
apparently got resolved after various changes to the test environment:

http://jeffr-tech.livejournal.com/10103.html

" [<CFS>] has virtually no dropoff and performs better under load than
the default 2.6.21 scheduler. " (paraphrased)

(The new link you posted, just a few hours after the release of v2.6.23,
has not been reported to lkml before AFAICS - when did you become aware
of it? If you learned about it before v2.6.23 it might have been useful
to report it to the v2.6.23 regression list.)

At a quick glance there are no .configs or other testing details at or
around that URL that i could use to reproduce their result precisely, so
at least a minimal bugreport would be nice.

In any case, here are a few general comments about sysbench numbers:

Sysbench is a pretty 'batched' workload: it benefits most from batchy
scheduling: the client doing as much work as it can, then server doing
as much work as it can - and so on. The longer the client can work the
more cache-efficient the workload is. Any round-trip to the server due
to pesky preemption only blows up the cache footprint of the workload
and gives lower throughput.

This kind of workload would probably run best on DOS or Windows 3.11,
with no preemptive scheduling done at all. In other words: run both
mysqld and the client as SCHED_FIFO to get the best performance out of
it. So in that sense the workload is a bit similar to dbench.

The other thing is that mysqld does _tons_ of sys_time() calls, so GTOD
differences between .22 and .23 might cause extra overhead - especially
with 8 CPUs/cores. Does the sys_time() scalability patch below improve
sysbench performance for you? (i'm not sure about psqld)
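Before trying the patch, one way to see whether time() is a measurable cost on a given box is a rough user-space microbenchmark. This is a sketch under stated assumptions: time_call_cost_ns() is a hypothetical helper, and depending on libc and architecture time() may be served from a vDSO/vsyscall page rather than sys_time(), in which case this loop never reaches the kernel path the patch changes.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#include <assert.h>
#include <time.h>

/* Nanoseconds between two CLOCK_MONOTONIC readings. */
static double elapsed_ns(const struct timespec *start,
			 const struct timespec *end)
{
	return (double)(end->tv_sec - start->tv_sec) * 1e9 +
	       (double)(end->tv_nsec - start->tv_nsec);
}

/*
 * Hypothetical helper: average wall-clock cost of one time(NULL) call,
 * measured over `iterations` back-to-back calls on the calling thread.
 */
double time_call_cost_ns(long iterations)
{
	struct timespec start, end;
	volatile time_t sink = 0;	/* keeps the calls from being optimized out */

	assert(iterations > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < iterations; i++)
		sink ^= time(NULL);
	clock_gettime(CLOCK_MONOTONIC, &end);
	(void)sink;
	return elapsed_ns(&start, &end) / (double)iterations;
}
```

Printing `time_call_cost_ns(10000000L)` from a small main() under each kernel gives a per-call figure to compare. It only measures single-threaded cost; the 8-core scalability question would need one copy running per core.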

If it's indeed due to batched vs. well-spread-out scheduling behavior
(which is possible), there are a few things you could do to make
scheduling more batched:

1) start the DB daemon up as SCHED_BATCH:

schedtool -B -e service mysqld restart

(and do the same with the client-side commands as well)

or:

schedtool -B $$

to mark the parent shell as SCHED_BATCH - then start up the DB and
start the client workload. (All other tasks not started from this
shell will still be SCHED_OTHER, so only your mysql workload will be
affected.) For example "beagled" already runs under SCHED_BATCH by
default.

SCHED_BATCH will cause the scheduler to batch up the workload more.
You basically tell the scheduler: "this workload really wants
throughput above all", and the scheduler takes that hint and acts
upon it. (it's still not as drastic as SCHED_FIFO, it's somewhere
between SCHED_OTHER and SCHED_FIFO, in terms of batching. Start up
your DB and your client as SCHED_FIFO via "schedtool -F -p 10 ..." to
establish the best-case batching win.)

2) check out the v22 CFS backport patch which has the latest & greatest
scheduler code, from http://people.redhat.com/mingo/cfs-scheduler/ .
Does performance go up for you with it? It's somewhat less
preemption-eager, which might as well make the crucial difference for
sysbench.

3) if it's enabled, disable CONFIG_PREEMPT=y. CONFIG_PREEMPT can cause
unwanted overscheduling and cache-thrashing under overload.
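The schedtool invocations in item 1 come down to a scheduling-policy change. As a hedged sketch (assuming Linux and glibc; make_self_batch() and make_self_fifo() are illustrative names, not part of schedtool), the same can be done from C with the standard sched_setscheduler() call. The policy is inherited across fork() and preserved across execve(), which is why marking the parent shell is enough to cover the DB and client started from it.

```c
#define _GNU_SOURCE		/* SCHED_BATCH is a Linux extension */
#include <assert.h>
#include <sched.h>

/*
 * Roughly what "schedtool -B $$" does, but for the calling thread:
 * switch it to SCHED_BATCH.  Non-realtime policies require a static
 * priority of 0.  Returns 0 on success, -1 with errno set on failure.
 */
int make_self_batch(void)
{
	struct sched_param param = { .sched_priority = 0 };

	return sched_setscheduler(0, SCHED_BATCH, &param);
}

/*
 * Best-case comparison, like "schedtool -F -p 10": SCHED_FIFO at the
 * given realtime priority.  Normally needs root (CAP_SYS_NICE), and a
 * runaway SCHED_FIFO task can starve the rest of the system.
 */
int make_self_fifo(int prio)
{
	struct sched_param param = { .sched_priority = prio };

	assert(prio >= sched_get_priority_min(SCHED_FIFO) &&
	       prio <= sched_get_priority_max(SCHED_FIFO));
	return sched_setscheduler(0, SCHED_FIFO, &param);
}
```

A small wrapper that calls make_self_batch() and then exec()s mysqld or the sysbench client would mirror `schedtool -B -e ...`; make_self_fifo(10) is only meant for establishing the best-case number on a dedicated test box.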

hope this helps, and i'm definitely interested in more feedback about
this,

Ingo

Index: linux/kernel/time.c
===================================================================
--- linux.orig/kernel/time.c
+++ linux/kernel/time.c
@@ -57,11 +57,7 @@ EXPORT_SYMBOL(sys_tz);
*/
asmlinkage long sys_time(time_t __user * tloc)
{
- time_t i;
- struct timespec tv;
-
- getnstimeofday(&tv);
- i = tv.tv_sec;
+ time_t i = get_seconds();

if (tloc) {
if (put_user(i,tloc))
Index: linux/kernel/time/timekeeping.c
===================================================================
--- linux.orig/kernel/time/timekeeping.c
+++ linux/kernel/time/timekeeping.c
@@ -49,19 +49,12 @@ struct timespec wall_to_monotonic __attr
static unsigned long total_sleep_time; /* seconds */
EXPORT_SYMBOL(xtime);

-
-#ifdef CONFIG_NO_HZ
static struct timespec xtime_cache __attribute__ ((aligned (16)));
static inline void update_xtime_cache(u64 nsec)
{
xtime_cache = xtime;
timespec_add_ns(&xtime_cache, nsec);
}
-#else
-#define xtime_cache xtime
-/* We do *not* want to evaluate the argument for this case */
-#define update_xtime_cache(n) do { } while (0)
-#endif

static struct clocksource *clock; /* pointer to current clocksource */
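The timekeeping.c hunk makes xtime_cache (xtime advanced by the nanoseconds accumulated so far) a real variable on every configuration, not only under CONFIG_NO_HZ. Below is a user-space model of that bookkeeping, with loud assumptions: the model_* names are hypothetical, the arithmetic is a re-implementation rather than the kernel's timespec_add_ns(), and the idea that get_seconds() simply returns the cached seconds field is our assumption about this tree. It illustrates why the patched sys_time() should be cheap: it reads a cached value instead of reading the clocksource the way getnstimeofday() does.

```c
#define _POSIX_C_SOURCE 200809L	/* for struct timespec */
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define MODEL_NSEC_PER_SEC 1000000000ULL

/* Re-implementation (not kernel code): add ns to a normalized timespec. */
static void model_timespec_add_ns(struct timespec *ts, uint64_t ns)
{
	uint64_t total;

	assert(ts->tv_nsec >= 0 && (uint64_t)ts->tv_nsec < MODEL_NSEC_PER_SEC);
	total = (uint64_t)ts->tv_nsec + ns;
	ts->tv_sec += (time_t)(total / MODEL_NSEC_PER_SEC);
	ts->tv_nsec = (long)(total % MODEL_NSEC_PER_SEC);
}

/* Stand-in for xtime_cache. */
static struct timespec model_xtime_cache;

/* Mirrors update_xtime_cache(): cache = xtime advanced by nsec. */
void model_update_xtime_cache(struct timespec xtime, uint64_t nsec)
{
	model_xtime_cache = xtime;
	model_timespec_add_ns(&model_xtime_cache, nsec);
}

/* Assumed get_seconds() behaviour: cached seconds, no clock read. */
time_t model_get_seconds(void)
{
	return model_xtime_cache.tv_sec;
}
```

For example, a base of 1 s + 600000000 ns advanced by 1500000000 ns normalizes to 3 s + 100000000 ns, so the model's get_seconds() returns 3. The cost of keeping the cache current is paid once per timekeeping update rather than on every time() call.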

2007-10-10 10:36:42

by Alexey Dobriyan

Subject: Re: Linux 2.6.23

On 10/10/07, Michael Tokarev <[email protected]> wrote:
> Alexey Dobriyan wrote:
> > On 10/10/07, René Rebe <[email protected]> wrote:
> >> 2.6.23 does not build with my usual .config on x86_64 and gcc-4.2.1:
> >>
> >> In file included from fs/drop_caches.c:8:
> >> include/linux/mm.h:1210: warning: 'struct super_block' declared inside
> >> parameter list
> >
> >> --- linux-2.6.23/include/linux/mm.h.vanilla
> >> +++ linux-2.6.23/include/linux/mm.h
> >
> >> +struct super_block;
> >> extern void drop_pagecache_sb(struct super_block *);
> >> void drop_pagecache(void);
> >> void drop_slab(void);
> >>
> >> You probably end up fixing it some other way, but as I do not know this
> >> file inside out I just wanted to drop a note.
> >
> > You have some strange vanilla kernel. 2.6.23 doesn't have this prototype.
>
> The same happens here as well.
>
> -rw-rw-r-- 1 mjt mjt 45488158 Oct 9 20:48 linux-2.6.23.tar.bz2
> 2cc2fd4d521dc5d7cfce0d8a9d1b3472 linux-2.6.23.tar.bz2
>
> (timestamp is in UTC) Downloaded yesterday, 3 hours after the announcement,
> from http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2 .

Strange. Same size, same md5, no super_block in mm.h, though

2007-10-10 10:53:17

by Jan Engelhardt

Subject: Re: Linux 2.6.23


On Oct 10 2007 14:36, Alexey Dobriyan wrote:
>> >> --- linux-2.6.23/include/linux/mm.h.vanilla
>> >> +++ linux-2.6.23/include/linux/mm.h
>> >
>> >> +struct super_block;
>> >> extern void drop_pagecache_sb(struct super_block *);
>> >> void drop_pagecache(void);
>> >> void drop_slab(void);
>> >>
>> >> You probably end up fixing it some other way, but as I do not know this
>> >> file inside out I just wanted to drop a note.
>> >
>> > You have some strange vanilla kernel. 2.6.23 doesn't have this prototype.
>>
>> The same happens here as well.
>>
>> -rw-rw-r-- 1 mjt mjt 45488158 Oct 9 20:48 linux-2.6.23.tar.bz2
>> 2cc2fd4d521dc5d7cfce0d8a9d1b3472 linux-2.6.23.tar.bz2
>>
>> (timestamp is in UTC) Downloaded yesterday, 3 hours after the announcement,
>> from http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2 .
>
>Strange. Same size, same md5, no super_block in mm.h, though

Does someone still have the broken tarball?

There has not been any drop_pagecache_sb anytime between 2.6.23-rc1
and 2.6.23. drop_pagecache_sb reminds me of reiser4, too.

2007-10-10 11:13:19

by Michael Tokarev

Subject: Re: Linux 2.6.23

Jan Engelhardt wrote:
> On Oct 10 2007 14:36, Alexey Dobriyan wrote:
>>>>> --- linux-2.6.23/include/linux/mm.h.vanilla
>>>>> +++ linux-2.6.23/include/linux/mm.h
>>>>> +struct super_block;
>>>>> extern void drop_pagecache_sb(struct super_block *);
>>>>> void drop_pagecache(void);
>>>>> void drop_slab(void);
>>>>>
>>>>> You probably end up fixing it some other way, but as I do not know this
>>>>> file inside out I just wanted to drop a note.
>>>> You have some strange vanilla kernel. 2.6.23 doesn't have this prototype.
>>> The same happens here as well.
>>>
>>> -rw-rw-r-- 1 mjt mjt 45488158 Oct 9 20:48 linux-2.6.23.tar.bz2
>>> 2cc2fd4d521dc5d7cfce0d8a9d1b3472 linux-2.6.23.tar.bz2
>>>
>>> (timestamp is in UTC) Downloaded yesterday, 3 hours after the announcement,
>>> from http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2 .
>> Strange. Same size, same md5, no super_block in mm.h, though
>
> Does someone still have the broken tarball?
>
> There has not been any drop_pagecache_sb anytime between 2.6.23-rc1
> and 2.6.23. drop_pagecache_sb reminds me of reiser4, too.

ghhrm. That's nonsense. I found where that struct super_block came
from -- it's from the unionfs patches for 2.6.22, which I forgot to
update for 2.6.23 (I just dropped the new kernel tarball into my
build directory together with the other patches and ran my usual build
procedure). It's definitely a false alarm - the tarball is
fine.

/mjt

2007-10-10 19:14:28

by Ingo Molnar

Subject: Re: Linux 2.6.23


* René Rebe <[email protected]> wrote:

> Hi Linus et al.,
>
> 2.6.23 does not build with my usual .config on x86_64 and gcc-4.2.1:

i know about 4 (low-impact, cornercase) build breakages for 2.6.23-final
on x86:

- an uncommon embedded config combination: if CONFIG_EMBEDDED=y and
CONFIG_BLOCK is unset. (a normally useless combination)

- an uncommon V4L config combination: mixed-modular-built-in driver V4L
config variation. (CONFIG_VIDEO_SAA7146=y and CONFIG_VIDEO_BUF=m)

- an uncommon MTD config combination (normal systems do not need
CONFIG_MTD configured)

- an uncommon CONFIG_USB_NET_CDC_SUBSET config combination (normal
systems should never hit that)

[ furthermore there are a few driver-firmware build options that break
and which are not correctly made dependent on !PREVENT_FIRMWARE_BUILD.
Again, this is not something one would normally configure. ]

your superblock build failure would be a new and so far unknown build
breakage variant - please send the .config you used, and double-check
that it's indeed a vanilla 2.6.23 tree.

Ingo

2007-10-10 19:26:52

by Michael Tokarev

Subject: Re: Linux 2.6.23

Ingo Molnar wrote:
> * René Rebe <[email protected]> wrote:
>
>> Hi Linus et al.,
>>
>> 2.6.23 does not build with my usual .config on x86_64 and gcc-4.2.1:
[]
> your superblock build failure would be a new and so far unknown build
> breakage variant - please send the .config you used, and double-check
> that it's indeed a vanilla 2.6.23 tree.

It's not a vanilla 2.6.23. Vanilla 2.6.23 does not contain the lines it
complains about (struct super_block isn't mentioned in mm.h at all).
It's some external patch that used to work with 2.6.22 but needs to be
updated for 2.6.23 - in my case it was unionfs.

/mjt

2007-10-10 20:04:22

by Andi Kleen

Subject: Re: Linux 2.6.23

Ingo Molnar <[email protected]> writes:

> your superblock build failure would be a new and so far unknown build
> breakage variant - please send the .config you used, and double-check
> that it's indeed a vanilla 2.6.23 tree.

It is not -- my 2.6.23 tree doesn't have the prototype that broke
the build for him.

-Andi

2007-10-10 23:27:18

by Krzysztof Halasa

Subject: Re: Linux 2.6.23

Ingo Molnar <[email protected]> writes:

> - an uncommon embedded config combination: if CONFIG_EMBEDDED=y and
> CONFIG_BLOCK is unset. (a normally useless combination)

Uncommon but far from useless - may be pure initramfs-based.
--
Krzysztof Halasa

2007-10-11 01:20:56

by Nicholas Miell

Subject: Re: Linux 2.6.23

On Wed, 2007-10-10 at 12:14 +0200, Ingo Molnar wrote:
> * Nicholas Miell <[email protected]> wrote:
>
> > Does CFS still generate the following sysbench graphs with 2.6.23, or
> > did that get fixed?
> >
> > http://people.freebsd.org/~kris/scaling/linux-pgsql.png
> > http://people.freebsd.org/~kris/scaling/linux-mysql.png
>
> as far as my testsystem goes, v2.6.23 beats v2.6.22.9 in sysbench:
>
> http://redhat.com/~mingo/misc/sysbench.jpg

That's nice to know. Note that I'm not actually involved in any of these
tests, just a somewhat interested bystander.

>
> As you can see in the graph, v2.6.23 schedules much more consistently
> too. [ v2.6.22 has a small (but potentially statistically insignificant)
> edge at 4-6 clients, and CFS has a slightly better peak (which is
> statistically insignificant). ]
>
> ( Config is at http://redhat.com/~mingo/misc/config, system is Core2Duo
> 1.83 GHz, mysql-5.0.45, glibc-2.6. Nothing fancy either in the config
> nor in the setup - everything is pretty close to the defaults. )
>
> i'm aware of a 2.6.21 vs. 2.6.23 sysbench regression report, and it
> apparently got resolved after various changes to the test environment:
>
> http://jeffr-tech.livejournal.com/10103.html
>
> " [<CFS>] has virtually no dropoff and performs better under load than
> the default 2.6.21 scheduler. " (paraphrased)
>
> (The new link you posted, just a few hours after the release of v2.6.23,
> has not been reported to lkml before AFAICS - when did you become aware
> of it? If you learned about it before v2.6.23 it might have been useful
> to report it to the v2.6.23 regression list.)

According to my IRC logs, Jeffr pasted the URL at Oct 09 22:53:56 PDT.
He says he tried to contact you early in CFS's development, but got no
reply.

> At a quick glance there are no .configs or other testing details at or
> around that URL that i could use to reproduce their result precisely, so
> at least a minimal bugreport would be nice.
>

AFAICT, the configuration is described in
http://people.freebsd.org/~kris/scaling/mysql.html


--
Nicholas Miell <[email protected]>

2007-10-11 02:35:34

by Yanmin Zhang

Subject: Re: Linux 2.6.23

On Wed, 2007-10-10 at 12:14 +0200, Ingo Molnar wrote:
> * Nicholas Miell <[email protected]> wrote:
>
> > Does CFS still generate the following sysbench graphs with 2.6.23, or
> > did that get fixed?
> >
> > http://people.freebsd.org/~kris/scaling/linux-pgsql.png
> > http://people.freebsd.org/~kris/scaling/linux-mysql.png
I have also reproduced the same issue on a couple of machines.

>
> as far as my testsystem goes, v2.6.23 beats v2.6.22.9 in sysbench:
>
> http://redhat.com/~mingo/misc/sysbench.jpg
>
> As you can see in the graph, v2.6.23 schedules much more consistently
> too. [ v2.6.22 has a small (but potentially statistically insignificant)
> edge at 4-6 clients, and CFS has a slightly better peak (which is
> statistically insignificant). ]
>
> ( Config is at http://redhat.com/~mingo/misc/config, system is Core2Duo
> 1.83 GHz, mysql-5.0.45, glibc-2.6. Nothing fancy either in the config
> nor in the setup - everything is pretty close to the defaults. )
I used the Fedora Core 8 Test2 distribution, whose glibc-2.6.90-13 already
fixes the old malloc scalability issue. The machine has 2 physical 2.66 GHz
quad-core processors, 8 cores in total. The regression is about 28%.


>
> i'm aware of a 2.6.21 vs. 2.6.23 sysbench regression report, and it
> apparently got resolved after various changes to the test environment:
>
> http://jeffr-tech.livejournal.com/10103.html
>
> " [<CFS>] has virtually no dropoff and performs better under load than
> the default 2.6.21 scheduler. " (paraphrased)
>
> (The new link you posted, just a few hours after the release of v2.6.23,
> has not been reported to lkml before AFAICS - when did you become aware
> of it? If you learned about it before v2.6.23 it might have been useful
> to report it to the v2.6.23 regression list.)
I tested 2.6.22 and all the 2.6.23-rc kernels. Every 2.6.23-rc kernel shows
the same regression, and the results are stable from run to run.

> At a quick glance there are no .configs or other testing details at or
> around that URL that i could use to reproduce their result precisely, so
> at least a minimal bugreport would be nice.
Command line used for the test:
# sysbench --test=oltp --mysql-user=root --mysql-db=mysql --max-time=120 \
    --max-requests=0 --oltp-read-only=on --num-threads=16 run

> In any case, here are a few general comments about sysbench numbers:
>
> Sysbench is a pretty 'batched' workload: it benefits most from batchy
> scheduling: the client doing as much work as it can, then server doing
> as much work as it can - and so on. The longer the client can work the
> more cache-efficient the workload is. Any round-trip to the server due
> to pesky preemption only blows up the cache footprint of the workload
> and gives lower throughput.
>
> This kind of workload would probably run best on DOS or Windows 3.11,
> with no preemptive scheduling done at all. In other words: run both
> mysqld and the client as SCHED_FIFO to get the best performance out of
> it. So in that sense the workload is a bit similar to dbench.
>
> The other thing is that mysqld does _tons_ of sys_time() calls, so GTOD
> differences between .22 and .23 might cause extra overhead - especially
> with 8 CPUs/cores. Does the sys_time() scalability patch below improve
> sysbench performance for you? (i'm not sure about psqld)
>
> If it's indeed due to batched vs. well-spread-out scheduling behavior
> (which is possible), there are a few things you could do to make
> scheduling more batched:
>
> 1) start the DB daemon up as SCHED_BATCH:
>
> schedtool -B -e service mysqld restart
>
> (and do the same with the client-side commands as well)
>
> or:
>
> schedtool -B $$
>
> to mark the parent shell as SCHED_BATCH - then start up the DB and
> start the client workload. (All other tasks not started from this
> shell will still be SCHED_OTHER, so only your mysql workload will be
> affected.) For example "beagled" already runs under SCHED_BATCH by
> default.
>
> SCHED_BATCH will cause the scheduler to batch up the workload more.
> You basically tell the scheduler: "this workload really wants
> throughput above all", and the scheduler takes that hint and acts
> upon it. (it's still not as drastic as SCHED_FIFO, it's somewhere
> between SCHED_OTHER and SCHED_FIFO, in terms of batching. Start up
> your DB and your client as SCHED_FIFO via "schedtool -F -p 10 ..." to
> establish the best-case batching win.)
>
> 2) check out the v22 CFS backport patch which has the latest & greatest
> scheduler code, from http://people.redhat.com/mingo/cfs-scheduler/ .
> Does performance go up for you with it? It's somewhat less
> preemption-eager, which might well make the crucial difference for
> sysbench.
>
> 3) if it's enabled, disable CONFIG_PREEMPT=y. CONFIG_PREEMPT can cause
> unwanted overscheduling and cache-thrashing under overload.
Below is the preemption-related part of my kernel config:

CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
# CONFIG_NUMA is not set


-yanmin

2007-10-11 13:32:57

by Ingo Molnar

Subject: Re: Linux 2.6.23


* Zhang, Yanmin wrote:

> > ( Config is at http://redhat.com/~mingo/misc/config, system is Core2Duo
> > 1.83 GHz, mysql-5.0.45, glibc-2.6. Nothing fancy either in the config
> > nor in the setup - everything is pretty close to the defaults. )
>
> I used the FedoraCore 8 Test2 distribution, whose glibc-2.6.90-13 already
> includes the fix for the old malloc scalability issue. The CPUs are 2.66 GHz
> quad-cores, 2 physical processors, 8 cores in total. The regression is about 28%.

thanks for confirming this! I've updated glibc and mysql and now i can
reproduce something similar. (I have a theory about the reason for this
regression, and i'm working on a test-patch.)

Ingo

2007-10-12 01:48:21

by Nick Piggin

Subject: Re: Linux 2.6.23

On Wednesday 10 October 2007 20:14, Ingo Molnar wrote:
> * Nicholas Miell wrote:
> > Does CFS still generate the following sysbench graphs with 2.6.23, or
> > did that get fixed?
> >
> > http://people.freebsd.org/~kris/scaling/linux-pgsql.png
> > http://people.freebsd.org/~kris/scaling/linux-mysql.png
>
> as far as my testsystem goes, v2.6.23 beats v2.6.22.9 in sysbench:
>
> http://redhat.com/~mingo/misc/sysbench.jpg
>
> As you can see in the graph, v2.6.23 schedules much more consistently
> too. [ v2.6.22 has a small (but potentially statistically insignificant)
> edge at 4-6 clients, and CFS has a slightly better peak (which is
> statistically insignificant). ]
>
> ( Config is at http://redhat.com/~mingo/misc/config, system is Core2Duo
> 1.83 GHz, mysql-5.0.45, glibc-2.6. Nothing fancy either in the config
> nor in the setup - everything is pretty close to the defaults. )
>
> i'm aware of a 2.6.21 vs. 2.6.23 sysbench regression report, and it
> apparently got resolved after various changes to the test environment:
>
> http://jeffr-tech.livejournal.com/10103.html
>
> " [<CFS>] has virtually no dropoff and performs better under load than
> the default 2.6.21 scheduler. " (paraphrased)

;) I think you snipped the important bit:

"the peak is terrible but it has virtually no dropoff and performs
better under load than the default 2.6.21 scheduler." (verbatim)

The dropoff under load was due to trivially avoidable mmap_sem
contention in the kernel and glibc (and not-very-scalable mysql
heap locking) rather than to anything the scheduler specifically
was doing wrong, I think. Once the scheduler started preempting
threads that were holding locks, performance would tank; exactly
when that point was reached, and what happened afterwards, was
probably just luck.

2007-10-12 05:46:36

by Ingo Molnar

Subject: Re: Linux 2.6.23


* Nick Piggin wrote:

> ;) I think you snipped the important bit:
>
> "the peak is terrible but it has virtually no dropoff and performs
> better under load than the default 2.6.21 scheduler." (verbatim)

hm, i understood that peak remark to be in reference to FreeBSD's
scheduler (which the FreeBSD guys are primarily interested in
obviously), not v2.6.21 - but i could be wrong.

In any case, there is indeed a regression with sysbench and a low number
of threads, and it's being fixed. The peak got improved visibly in
sched-devel:

http://people.redhat.com/mingo/misc/sysbench-sched-devel.jpg

but there is still some peak regression left, i'm testing a patch for
that.

Ingo

2007-10-12 06:47:51

by Nick Piggin

Subject: Re: Linux 2.6.23

On Friday 12 October 2007 15:46, Ingo Molnar wrote:
> * Nick Piggin wrote:
> > ;) I think you snipped the important bit:
> >
> > "the peak is terrible but it has virtually no dropoff and performs
> > better under load than the default 2.6.21 scheduler." (verbatim)
>
> hm, i understood that peak remark to be in reference to FreeBSD's
> scheduler (which the FreeBSD guys are primarily interested in
> obviously), not v2.6.21 - but i could be wrong.

I think the Linux peak has always been roughly as good as the best
FreeBSD results (e.g. http://people.freebsd.org/~jeff/sysbench.png).
Obviously in that graph, Linux sucks because of the malloc/mmap_sem
issue. It also shows what he is calling the terrible CFS peak, I
guess.

In my own tests, after that was fixed, Linux's peak got even a bit
higher, so that's the benchmark for performance.


> In any case, there is indeed a regression with sysbench and a low number
> of threads, and it's being fixed. The peak got improved visibly in
> sched-devel:
>
> http://people.redhat.com/mingo/misc/sysbench-sched-devel.jpg
>
> but there is still some peak regression left, i'm testing a patch for
> that.

OK good. Once that's fixed, we'll hopefully be competitive with
FreeBSD again in this test :)

2007-10-12 12:14:52

by Bill Davidsen

Subject: Re: Linux 2.6.23

Ingo Molnar wrote:
> * Nick Piggin wrote:
>
>> ;) I think you snipped the important bit:
>>
>> "the peak is terrible but it has virtually no dropoff and performs
>> better under load than the default 2.6.21 scheduler." (verbatim)
>
> hm, i understood that peak remark to be in reference to FreeBSD's
> scheduler (which the FreeBSD guys are primarily interested in
> obviously), not v2.6.21 - but i could be wrong.
>
> In any case, there is indeed a regression with sysbench and a low number
> of threads, and it's being fixed. The peak got improved visibly in
> sched-devel:
>
> http://people.redhat.com/mingo/misc/sysbench-sched-devel.jpg
>
> but there is still some peak regression left, i'm testing a patch for
> that.
>
There's one important bit missing from that graph: the
2.6.23-SCHED_BATCH values. Without them we can't tell how much of the
improvement comes from sched-devel and how much from SCHED_BATCH. Clearly
2.6.23 is better than any 2.6.22.x in this test; the locking issues seem
to dominate that difference to the point that nothing else would be
informative.

This weekend I have to build kernels for various machines, so I
intend to run some of the builds under SCHED_BATCH and some under the
default policy. If I find anything interesting I'll report back.

--
Bill Davidsen
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot