2003-05-11 05:46:32

by Martin J. Bligh

[permalink] [raw]
Subject: 2.5.69-mjb1

The patchset contains mainly scalability and NUMA stuff, and anything
else that stops things from irritating me. It's meant to be pretty stable,
not so much a testing ground for new stuff.

I'd be very interested in feedback from anyone willing to test on any
platform, however large or small.

ftp://ftp.kernel.org/pub/linux/kernel/people/mbligh/2.5.69/patch-2.5.69-mjb1.bz2

additional patches that can be applied if desired:

(these two form the qlogic feral driver)
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.67/2.5.67-mm1/broken-out/linux-isp.patch
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.67/2.5.67-mm1/broken-out/isp-update-1.patch

Since 2.5.68-mjb3 (~ = changed, + = added, - = dropped)

Notes:

Now in Linus' tree:
- node_balance Martin J. Bligh
Turn on node balance - it's not overagressive anymore
- sisfix (equivalent fix) Martin J. Bligh
Fix warning & bug in sis900 driver
- diskstats Rick Lindsley
Make disk stats available in /proc so they scale to lotsa disks.
- lost_tick_fix John Stultz
Lost tick stuff
- dentry_stat_fix Maneesh
corrects the dentry_stat.nr_unused calculation
- follow_hugetlb_page Bill Irwin
Fix follow_hugetlb_page()
- hugetlb_mem_enough Bill Irwin
fixes an overflow when there is more than 4GB of hugetlb
- aio_fix Badari Pulavarty
Fix something in AIO
- oom_locking Bill Irwin / Robert Love
Fix OOM locking


Dropped until I get an updated version because I'm too idle to merge them ;-)
- separate_pmd Dave Hansen
Separate kernel pmd per process
- banana_split Dave Hansen
Provide non PMD aligned splits with PAE mode (eg 3.5:0.5)


New:
+ fsaio_vs_aio_fix Janet Morgan (?)
Fix up a fistfight between AIO and fsAIO.


Pending:
Hyperthreaded scheduler (Ingo Molnar)
scheduler callers profiling (Anton or Bill Hartner)
Child runs first (akpm)
Kexec
e1000 fixes
Update the lost timer ticks code

Present in this patch:

early_printk Dave Hansen et al.
Allow printk before console_init

confighz Andrew Morton / Dave Hansen
Make HZ a config option of 100 Hz or 1000 Hz

config_page_offset Dave Hansen / Andrea
Make PAGE_OFFSET a config option

numameminfo Martin Bligh / Keith Mannthey
Expose NUMA meminfo information under /proc/meminfo.numa

schedstat Rick Lindsley
Provide stats about the scheduler under /proc/schedstat

schedstat2 Rick Lindsley
Provide more stats about the scheduler under /proc/schedstat

schedstat-scripts Rick Lindsley
Provide some scripts for schedstat analysis under scripts/

sched_tunables Robert Love
Provide tunable parameters for the scheduler (+ NUMA scheduler)

irq_affinity Martin J. Bligh
Workaround for irq_affinity on clustered apic mode systems (eg x440)

partial_objrmap Dave McCracken
Object based rmap for filebacked pages.

objrmap_fix Dave McCracken
Fix detection of anon pages

objrmap_fixes Dave McCracken / Hugh Dickins
Fix up some mapped sizing bugs in objrmap

objrmap_mapcount Dave McCracken
Fix up some mapped sizing bugs in objrmap

kgdb Andrew Morton
The older version of kgdb, synched with 2.5.54-mm1

kprobes Vamsi Krishna S
Add kernel probes hooks to the kernel

thread_info_cleanup (4K stacks pt 1) Dave Hansen / Ben LaHaise
Prep work to reduce kernel stacks to 4K

interrupt_stacks (4K stacks pt 2) Dave Hansen / Ben LaHaise
Create a per-cpu interrupt stack.

stack_usage_check (4K stacks pt 3) Dave Hansen / Ben LaHaise
Check for kernel stack overflows.

4k_stack (4K stacks pt 4) Dave Hansen
Config option to reduce kernel stacks to 4K

fix_kgdb Dave Hansen
Fix interaction between kgdb and 4K stacks

stacks_from_slab William Lee Irwin
Take kernel stacks from the slab cache, not page allocation.

thread_under_page William Lee Irwin
Fix THREAD_SIZE < PAGE_SIZE case

lkcd LKCD team
Linux kernel crash dump support

percpu_loadavg Martin J. Bligh
Provide per-cpu loadaverages, and real load averages

spinlock_inlining Andrew Morton
Inline spinlocks for profiling. Made into a ugly config option by me.

summit_pcimap Matt Dobson
Provide pci bus -> node mapping for x440

reiserfs_dio Mingming Cao
DIO for Reiserfs

sched_interactive Ingo Molnar
Bugfix for interactive scheduler

kgdb_cleanup Martin J. Bligh
Stop kgdb renaming schedule to do_schedule when it's not even enabled

acenic_fix Martin J. Bligh
Fix warning in acenic driver

local_balance_exec Martin J. Bligh
Modify balance_exec to use node-local queues when idle

membind Matt Dobson
NUMA memory binding API

tcp_speedup Martin J. Bligh
Speedup TCP (avoid double copy) as suggested by Linus

early_printk_fix Keith Mannthey
Fix commandline parsing in early printk (yay!)

disable preempt Martin J. Bligh
I broke preempt somehow, temporarily disable it to stop accidents

sched_idle Martin J. Bligh
Call load_balance with proper idle flag (pointed out by John Hawkes)

ppc64 fixes Anton Blanchard
Various PPC64 fixes / updates

numameminfo fix Martin J. Bligh
Correct /proc/meminfo.numa for zholes_size.

more_async Andrew Morton
Make MS_ASYNC more async

config_debug Martin J. Bligh
Make '-g' for the kernel a config option

akpm_bear_pit Andrew Morton
Add a printk for some buffer error I was hitting

32bit_dev_t Andries Brouwer
Make dev_t 32 bit

dynamic_hd_struct Badari Pulavarty
Allocate hd_structs dynamically

devfs_fixup Badari Pulavarty ?
Fix some random thing in devfs

iosched_hashes Badari Pulavarty
Twiddle with the iosched hash tables for fun & profit

lotsa_sds Badari Pulavarty
Create some insane number of sds

per_node_idt Zwane Mwaikambo
Per node IDT so we can do silly numbers of IO-APICs on NUMA-Q

tulip_warning Dave Hansen
Fix silly warning in tulip driver for PAE mode

warn_e1000 Dave Hansen
Kill e1000 warning

pidmaps_nodepages Dave Hansen
Provide per-pid node mem usage info

config_numasched Dave Hansen
Turn NUMA scheduler into a config option

lockmeter_tytso Ted Tso
Fix lockmeter

aiofix2 Mingming Cao
fixed a bug in ioctx_alloc()

config_irqbal Keith Mannthey
Make irqbalance a config option

fs_aio_1_retry Suparna Bhattacharya
Filesystem aio. Chapter 1

fs_aio_2_read Suparna Bhattacharya
Filesystem aio. Chapter 2

fs_aio_3_write Suparna Bhattacharya
Filesystem aio. Chapter 3

fs_aio_4_down_wq Suparna Bhattacharya
Filesystem aio. Chapter 4

fs_aio_5_wrdown_wq Suparna Bhattacharya
Filesystem aio. Chapter 5

fs_aio_6_bread_wq Suparna Bhattacharya
Filesystem aio. Chapter 6

fs_aio_7_ext2getblk_wq Suparna Bhattacharya
Filesystem aio. Chapter 7

fs_aio_8_down_wq-ppc64 Suparna Bhattacharya
Filesystem aio. Chapter 8

fs_aio_9_down_wq-x86_64 Suparna Bhattacharya
Filesystem aio. Chapter 9

pae_vmalloc Dave Hansen
Stop PAE mode from piddling with PGD entries from vmalloc.

-mjb Martin J. Bligh
Add a tag to the makefile


2003-05-11 13:29:30

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: 2.5.69-mjb1

Any particular reason why CONFIG_PREEMPT is commented out?

Zwane
--
function.linuxpower.ca

2003-05-11 15:14:39

by Martin J. Bligh

[permalink] [raw]
Subject: Re: 2.5.69-mjb1

> Any particular reason why CONFIG_PREEMPT is commented out?

Yeah, 'cause it's broken ;-) If someone can figure out what broke it,
(something in my tree) I'll turn it back on. Else it stays disabled
as a footguard for poor innocents ;-)

M.

2003-05-12 13:17:02

by William Lee Irwin III

[permalink] [raw]
Subject: Re: 2.5.69-mjb1

On Sat, May 10, 2003 at 08:44:09PM -0700, Martin J. Bligh wrote:
> thread_info_cleanup (4K stacks pt 1) Dave Hansen / Ben LaHaise
> Prep work to reduce kernel stacks to 4K
> interrupt_stacks (4K stacks pt 2) Dave Hansen / Ben LaHaise
> Create a per-cpu interrupt stack.
> stack_usage_check (4K stacks pt 3) Dave Hansen / Ben LaHaise
> Check for kernel stack overflows.
> 4k_stack (4K stacks pt 4) Dave Hansen
> Config option to reduce kernel stacks to 4K


diff -urpN linux-2.5.69/include/asm-i386/processor.h kstk-2.5.69-1/include/asm-i386/processor.h
--- linux-2.5.69/include/asm-i386/processor.h 2003-05-04 16:53:00.000000000 -0700
+++ kstk-2.5.69-1/include/asm-i386/processor.h 2003-05-12 06:05:39.000000000 -0700
@@ -470,8 +470,8 @@ extern int kernel_thread(int (*fn)(void
extern unsigned long thread_saved_pc(struct task_struct *tsk);

unsigned long get_wchan(struct task_struct *p);
-#define KSTK_EIP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1019])
-#define KSTK_ESP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1022])
+#define KSTK_EIP(task) (((unsigned long *)(task)->thread_info)[THREAD_SIZE/sizeof(unsigned long) - 5])
+#define KSTK_ESP(task) (((unsigned long *)(task)->thread_info)[THREAD_SIZE/sizeof(unsigned long) - 2])

struct microcode {
unsigned int hdrver;

2003-05-12 14:42:36

by Martin J. Bligh

[permalink] [raw]
Subject: Re: 2.5.69-mjb1

--On Monday, May 12, 2003 06:29:39 -0700 William Lee Irwin III <[email protected]> wrote:

> On Sat, May 10, 2003 at 08:44:09PM -0700, Martin J. Bligh wrote:
>> thread_info_cleanup (4K stacks pt 1) Dave Hansen / Ben LaHaise
>> Prep work to reduce kernel stacks to 4K
>> interrupt_stacks (4K stacks pt 2) Dave Hansen / Ben LaHaise
>> Create a per-cpu interrupt stack.
>> stack_usage_check (4K stacks pt 3) Dave Hansen / Ben LaHaise
>> Check for kernel stack overflows.
>> 4k_stack (4K stacks pt 4) Dave Hansen
>> Config option to reduce kernel stacks to 4K
>
>
> diff -urpN linux-2.5.69/include/asm-i386/processor.h kstk-2.5.69-1/include/asm-i386/processor.h
> --- linux-2.5.69/include/asm-i386/processor.h 2003-05-04 16:53:00.000000000 -0700
> +++ kstk-2.5.69-1/include/asm-i386/processor.h 2003-05-12 06:05:39.000000000 -0700
> @@ -470,8 +470,8 @@ extern int kernel_thread(int (*fn)(void
> extern unsigned long thread_saved_pc(struct task_struct *tsk);
>
> unsigned long get_wchan(struct task_struct *p);
> -#define KSTK_EIP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1019])
> -#define KSTK_ESP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1022])
> +#define KSTK_EIP(task) (((unsigned long *)(task)->thread_info)[THREAD_SIZE/sizeof(unsigned long) - 5])
> +#define KSTK_ESP(task) (((unsigned long *)(task)->thread_info)[THREAD_SIZE/sizeof(unsigned long) - 2])
>
> struct microcode {
> unsigned int hdrver;

Can I get some sort of vague explanation please? ;-)

M.

2003-05-12 14:50:34

by William Lee Irwin III

[permalink] [raw]
Subject: Re: 2.5.69-mjb1

On Mon, May 12, 2003 at 05:40:55AM -0700, Martin J. Bligh wrote:
> Can I get some sort of vague explanation please? ;-)

How obvious does it have to be?

Those are trying to fish out the 2nd and 5th words from the top of the
stack. Magic numbers stopped working; symbolic constants save the day.


-- wli

2003-05-12 15:00:25

by Dave Hansen

[permalink] [raw]
Subject: Re: 2.5.69-mjb1

William Lee Irwin III wrote:
> On Mon, May 12, 2003 at 05:40:55AM -0700, Martin J. Bligh wrote:
>
>>Can I get some sort of vague explanation please? ;-)
>
> How obvious does it have to be?

More obvious than that :)

> Those are trying to fish out the 2nd and 5th words from the top of the
> stack. Magic numbers stopped working; symbolic constants save the day.

I guess I'm not understanding _why_ they're guaranteed to be there.

--
Dave Hansen
[email protected]

2003-05-12 14:54:28

by Dave Hansen

[permalink] [raw]
Subject: Re: 2.5.69-mjb1

Martin J. Bligh wrote:
> --On Monday, May 12, 2003 06:29:39 -0700 William Lee Irwin III <[email protected]> wrote:
>
>
>>On Sat, May 10, 2003 at 08:44:09PM -0700, Martin J. Bligh wrote:
>>
>>>thread_info_cleanup (4K stacks pt 1) Dave Hansen / Ben LaHaise
>>> Prep work to reduce kernel stacks to 4K
>>>interrupt_stacks (4K stacks pt 2) Dave Hansen / Ben LaHaise
>>> Create a per-cpu interrupt stack.
>>>stack_usage_check (4K stacks pt 3) Dave Hansen / Ben LaHaise
>>> Check for kernel stack overflows.
>>>4k_stack (4K stacks pt 4) Dave Hansen
>>> Config option to reduce kernel stacks to 4K
>>
>>
>>diff -urpN linux-2.5.69/include/asm-i386/processor.h kstk-2.5.69-1/include/asm-i386/processor.h
>>--- linux-2.5.69/include/asm-i386/processor.h 2003-05-04 16:53:00.000000000 -0700
>>+++ kstk-2.5.69-1/include/asm-i386/processor.h 2003-05-12 06:05:39.000000000 -0700
>>@@ -470,8 +470,8 @@ extern int kernel_thread(int (*fn)(void
>> extern unsigned long thread_saved_pc(struct task_struct *tsk);
>>
>> unsigned long get_wchan(struct task_struct *p);
>>-#define KSTK_EIP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1019])
>>-#define KSTK_ESP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1022])
>>+#define KSTK_EIP(task) (((unsigned long *)(task)->thread_info)[THREAD_SIZE/sizeof(unsigned long) - 5])
>>+#define KSTK_ESP(task) (((unsigned long *)(task)->thread_info)[THREAD_SIZE/sizeof(unsigned long) - 2])
>>
>> struct microcode {
>> unsigned int hdrver;
>
> Can I get some sort of vague explanation please? ;-)

Wow, that's intuitive :)

They're trying to access the variables that have been pushed onto the
top of the stack. The thread_info field points to the bottom of the
kernel's stack (no matter how big it is). I don't know where the -5 and
-2 come from. It needs a big, fat stinking comment.

--
Dave Hansen
[email protected]

2003-05-12 15:09:11

by Martin J. Bligh

[permalink] [raw]
Subject: Re: 2.5.69-mjb1

> Me
>>> Can I get some sort of vague explanation please? ;-)
>
> Bill
> How obvious does it have to be?

More obvious than that, if I've never looked at it before ;-)

> Dave
> They're trying to access the variables that have been pushed onto the
> top of the stack. The thread_info field points to the bottom of the
> kernel's stack (no matter how big it is). I don't know where the -5 and
> -2 come from. It needs a big, fat stinking comment.

OK, so maybe I'm still asleep, but I don't see why the hardcoded
magic constant (grrr) is 4096 in mainline, when the stacksize is 8K.
Presumably the 1019*4 makes up the rest of it? Maybe the real question
is what the hell was whoever wrote that in the first place smoking ? ;-)
Why on earth would you skip halfway through the stack with one stupid
magic constant, and then the rest of the way with another?

Perhaps I'm just making the mistake of assuming the existing code was
sane ... I was just uncomfortable because I didn't understand why it was
like that before, I guess.

> Dave
> I don't know where the -5 and
> -2 come from. It needs a big, fat stinking comment.
>
> Bill
> Those are trying to fish out the 2nd and 5th words from the top of the
> stack. Magic numbers stopped working; symbolic constants save the day.

Right, I see what you're doing now.

But would be nice (as Dave said) if those magic numbers were no longer
magic numbers (as you did for the other part of it), if that's possible,
or commented. Not that you haven't vastly improved it already ;-)

Thanks,

M.

2003-05-12 15:24:00

by Dave Hansen

[permalink] [raw]
Subject: Re: 2.5.69-mjb1

Martin J. Bligh wrote:
> OK, so maybe I'm still asleep, but I don't see why the hardcoded
> magic constant (grrr) is 4096 in mainline, when the stacksize is 8K.
> Presumably the 1019*4 makes up the rest of it? Maybe the real question
> is what the hell was whoever wrote that in the first place smoking ? ;-)
> Why on earth would you skip halfway through the stack with one stupid
> magic constant, and then the rest of the way with another?

You can go ask the author:

http://linus.bkbits.net:8080/linux-2.5/diffs/include/asm-i386/[email protected]?nav=index.html|src/|src/include|src/include/asm-i386|hist/include/asm-i386/processor.h


--
Dave Hansen
[email protected]

2003-05-12 15:44:52

by Martin J. Bligh

[permalink] [raw]
Subject: Re: 2.5.69-mjb1



--On Monday, May 12, 2003 08:34:13 -0700 Dave Hansen <[email protected]> wrote:

> Martin J. Bligh wrote:
>> OK, so maybe I'm still asleep, but I don't see why the hardcoded
>> magic constant (grrr) is 4096 in mainline, when the stacksize is 8K.
>> Presumably the 1019*4 makes up the rest of it? Maybe the real question
>> is what the hell was whoever wrote that in the first place smoking ? ;-)
>> Why on earth would you skip halfway through the stack with one stupid
>> magic constant, and then the rest of the way with another?
>
> You can go ask the author:
>
> http://linus.bkbits.net:8080/linux-2.5/diffs/include/asm-i386/[email protected]?nav=index.html|src/|src/include|src/include/asm-i386|hist/include/asm-i386/processor.h

-#define KSTK_EIP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)))[1019])
-#define KSTK_ESP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)))[1022])
...
+#define KSTK_EIP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1019])
+#define KSTK_ESP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1022])

Nope, not his fault, really.

M.

2003-05-12 20:39:03

by Adrian Bunk

[permalink] [raw]
Subject: 2.5.69-mjb1: undefined reference to `blk_queue_empty'

<-- snip -->

...
gcc -Wp,-MD,drivers/dump/.dump_blockdev.o.d -D__KERNEL__ -Iinclude
-Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -pipe
-mpreferred-stack-boundary=2 -march=k6 -Iinclude/asm-i386/mach-default
-fomit-frame-pointer -nostdinc -iwithprefix include -DKBUILD_BASENAME=dump_blockdev
\-DKBUILD_MODNAME=dump_blockdev -c -o drivers/dump/dump_blockdev.o
drivers/dump/dump_blockdev.c
drivers/dump/dump_blockdev.c: In function `dump_block_silence':
drivers/dump/dump_blockdev.c:264: warning: implicit declaration of function `blk_queue_empty'
...
386/oprofile/built-in.o net/built-in.o --end-group -o .tmp_vmlinux1
drivers/built-in.o(.text+0x77edaf): In function `dump_block_silence':
: undefined reference to `blk_queue_empty'
...
make: *** [.tmp_vmlinux1] Error 1

<-- snip -->

This is the only occurence of blk_queue_empty in the whole kernel tree.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2003-05-13 01:11:10

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [Lse-tech] Re: 2.5.69-mjb1

On Mon, May 12, 2003 at 08:05:15AM -0700, Dave Hansen wrote:
> Wow, that's intuitive :)
> They're trying to access the variables that have been pushed onto the
> top of the stack. The thread_info field points to the bottom of the
> kernel's stack (no matter how big it is). I don't know where the -5 and
> -2 come from. It needs a big, fat stinking comment.

I'm not 100% convinced it DTRT on modern kernels. I vaguely wonder if
the following would be more appropriate. Shame the typedef isn't there
yet; the _struct suffix is an eyesore.


-- wli


diff -prauN linux-2.5.69/include/asm-i386/processor.h kstk-2.5.69-1/include/asm-i386/processor.h
--- linux-2.5.69/include/asm-i386/processor.h 2003-05-04 16:53:00.000000000 -0700
+++ kstk-2.5.69-1/include/asm-i386/processor.h 2003-05-12 14:02:13.000000000 -0700
@@ -96,9 +96,9 @@ extern struct cpuinfo_x86 cpu_data[];

extern char ignore_fpu_irq;

-extern void identify_cpu(struct cpuinfo_x86 *);
-extern void print_cpu_info(struct cpuinfo_x86 *);
-extern void dodgy_tsc(void);
+void identify_cpu(struct cpuinfo_x86 *);
+void print_cpu_info(struct cpuinfo_x86 *);
+void dodgy_tsc(void);

/*
* EFLAGS bits
@@ -457,21 +457,21 @@ struct task_struct;
struct mm_struct;

/* Free all resources held by a thread. */
-extern void release_thread(struct task_struct *);
+void release_thread(struct task_struct *);

/* Prepare to copy thread state - unlazy all lazy status */
-extern void prepare_to_copy(struct task_struct *tsk);
+void prepare_to_copy(struct task_struct *);

/*
* create a kernel thread without removing it from tasklists
*/
-extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
+int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);

-extern unsigned long thread_saved_pc(struct task_struct *tsk);
+unsigned long thread_saved_pc(struct task_struct *);

-unsigned long get_wchan(struct task_struct *p);
-#define KSTK_EIP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1019])
-#define KSTK_ESP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1022])
+unsigned long get_wchan(struct task_struct *);
+#define KSTK_EIP(task) ((task)->thread.eip)
+#define KSTK_ESP(task) ((task)->thread.esp)

struct microcode {
unsigned int hdrver;

2003-05-13 05:43:03

by Martin J. Bligh

[permalink] [raw]
Subject: Re: [Lse-tech] Re: 2.5.69-mjb1

>> Wow, that's intuitive :)
>> They're trying to access the variables that have been pushed onto the
>> top of the stack. The thread_info field points to the bottom of the
>> kernel's stack (no matter how big it is). I don't know where the -5 and
>> -2 come from. It needs a big, fat stinking comment.
>
> I'm not 100% convinced it DTRT on modern kernels. I vaguely wonder if
> the following would be more appropriate. Shame the typedef isn't there
> yet; the _struct suffix is an eyesore.


So are the new bits of the patch related to the KSTK_E* bit?
They don't seem to be ... however, this bit looks really good:

> -#define KSTK_EIP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1019])
> -#define KSTK_ESP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1022])
> +#define KSTK_EIP(task) ((task)->thread.eip)
> +#define KSTK_ESP(task) ((task)->thread.esp)

Can I assume it's tested, or does it need someone to do that?

Thanks,

M.

2003-05-13 05:56:11

by Martin J. Bligh

[permalink] [raw]
Subject: Re: 2.5.69-mjb1: undefined reference to `blk_queue_empty'



--Adrian Bunk <[email protected]> wrote (on Monday, May 12, 2003 22:51:40 +0200):

> <-- snip -->
>
> ...
> gcc -Wp,-MD,drivers/dump/.dump_blockdev.o.d -D__KERNEL__ -Iinclude
> -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -pipe
> -mpreferred-stack-boundary=2 -march=k6 -Iinclude/asm-i386/mach-default
> -fomit-frame-pointer -nostdinc -iwithprefix include -DKBUILD_BASENAME=dump_blockdev
> \-DKBUILD_MODNAME=dump_blockdev -c -o drivers/dump/dump_blockdev.o
> drivers/dump/dump_blockdev.c
> drivers/dump/dump_blockdev.c: In function `dump_block_silence':
> drivers/dump/dump_blockdev.c:264: warning: implicit declaration of function `blk_queue_empty'
> ...
> 386/oprofile/built-in.o net/built-in.o --end-group -o .tmp_vmlinux1
> drivers/built-in.o(.text+0x77edaf): In function `dump_block_silence':
> : undefined reference to `blk_queue_empty'
> ...
> make: *** [.tmp_vmlinux1] Error 1
>
> <-- snip -->
>
> This is the only occurence of blk_queue_empty in the whole kernel tree.

Thanks Adrian ... this is LKCD stuff, maybe Suparna / Bharata can fix it?
Looks like it disappeared in 2.5.67 or so.

M.

2003-05-13 06:15:00

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [Lse-tech] Re: 2.5.69-mjb1

At some point in the past, someone wrote:
>> I'm not 100% convinced it DTRT on modern kernels. I vaguely wonder if
>> the following would be more appropriate. Shame the typedef isn't there
>> yet; the _struct suffix is an eyesore.

On Mon, May 12, 2003 at 08:41:22PM -0700, Martin J. Bligh wrote:
> So are the new bits of the patch related to the KSTK_E* bit?
> They don't seem to be ... however, this bit looks really good:

Random incidental cleanups.

At some point in the past, someone wrote:
>> -#define KSTK_EIP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1019])
>> -#define KSTK_ESP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1022])
>> +#define KSTK_EIP(task) ((task)->thread.eip)
>> +#define KSTK_ESP(task) ((task)->thread.esp)

On Mon, May 12, 2003 at 08:41:22PM -0700, Martin J. Bligh wrote:
> Can I assume it's tested, or does it need someone to do that?

Not tested. AFAICT it's returning random garbage somewhere near the
beginning of the kernel stack now but haven't checked the answers
generated at runtime to see if they look valid. This at least vaguely
resembles the intent of the code, but it might actually want
(task)->thread.esp0 or other such nonsense and so on now that I think
about it some more.

At the very least it looks plausible and doesn't perform accesses that
aren't actually on the stack as they're supposed to be when you shrink
the stack to 4KB. AIUI those kinds of out-of-bounds accesses to
potentially freed memory are oopsable with some debugging patches like
manfred's that unmaps freed pages.


-- wli

2003-05-13 06:29:41

by Andi Kleen

[permalink] [raw]
Subject: Re: [Lse-tech] Re: 2.5.69-mjb1

> > -#define KSTK_EIP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1019])
> > -#define KSTK_ESP(tsk) (((unsigned long *)(4096+(unsigned long)(tsk)->thread_info))[1022])
> > +#define KSTK_EIP(task) ((task)->thread.eip)
> > +#define KSTK_ESP(task) ((task)->thread.esp)
>
> Can I assume it's tested, or does it need someone to do that?

It's broken. It will report the kernel values, not the user values like the previous
version and also get it wrong for the current task

(KSTK_* is really a misnomer, it should be USTK_*. WCHAN is handled by a different
mechanism)

Something like this should work (untested)

#define KSTK_PTREGS(tsk) \
((struct pt_regs *)((unsigned long)((tsk)->thread_info) + THREAD_SIZE) - 1)
#define KSTK_EIP(tsk) (KSTK_PTREGS(tsk)->eip)
#define KSTK_ESP(tsk) (KSTK_PTREGS(tsk)->esp)

-Andi

2003-05-13 07:14:13

by Bharata B Rao

[permalink] [raw]
Subject: Re: 2.5.69-mjb1: undefined reference to `blk_queue_empty'

On Mon, May 12, 2003 at 08:51:05PM -0700, Martin J. Bligh wrote:
>
> --Adrian Bunk <[email protected]> wrote (on Monday, May 12, 2003 22:51:40 +0200):
>
> > <-- snip -->
> >
> > ...
> > gcc -Wp,-MD,drivers/dump/.dump_blockdev.o.d -D__KERNEL__ -Iinclude
> > -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -pipe
> > -mpreferred-stack-boundary=2 -march=k6 -Iinclude/asm-i386/mach-default
> > -fomit-frame-pointer -nostdinc -iwithprefix include -DKBUILD_BASENAME=dump_blockdev
> > \-DKBUILD_MODNAME=dump_blockdev -c -o drivers/dump/dump_blockdev.o
> > drivers/dump/dump_blockdev.c
> > drivers/dump/dump_blockdev.c: In function `dump_block_silence':
> > drivers/dump/dump_blockdev.c:264: warning: implicit declaration of function `blk_queue_empty'
> > ...
> > 386/oprofile/built-in.o net/built-in.o --end-group -o .tmp_vmlinux1
> > drivers/built-in.o(.text+0x77edaf): In function `dump_block_silence':
> > : undefined reference to `blk_queue_empty'
> > ...
> > make: *** [.tmp_vmlinux1] Error 1
> >
> > <-- snip -->
> >
> > This is the only occurence of blk_queue_empty in the whole kernel tree.
>
> Thanks Adrian ... this is LKCD stuff, maybe Suparna / Bharata can fix it?
> Looks like it disappeared in 2.5.67 or so.
>

Martin,

I have already sent you a fix for this. Anyway here it is again.

--- linux-2.5.69/drivers/dump/dump_blockdev.c.orig Tue May 13 12:30:49 2003
+++ linux-2.5.69/drivers/dump/dump_blockdev.c Tue May 13 12:34:09 2003
@@ -261,7 +261,7 @@

/* For now we assume we have the device to ourselves */
/* Just a quick sanity check */
- if (!blk_queue_empty(bdev_get_queue(dump_bdev->bdev))) {
+ if (elv_next_request(bdev_get_queue(dump_bdev->bdev))) {
/* i/o in flight - safer to quit */
return -EBUSY;
}

Regards,
Bharata.

2003-05-13 16:04:39

by Martin J. Bligh

[permalink] [raw]
Subject: Re: 2.5.69-mjb1: undefined reference to `blk_queue_empty'

> I have already sent you a fix for this. Anyway here it is again.

Oops, I must have dropped it - thanks, I'll stick it in the next release.

> --- linux-2.5.69/drivers/dump/dump_blockdev.c.orig Tue May 13 12:30:49 2003
> +++ linux-2.5.69/drivers/dump/dump_blockdev.c Tue May 13 12:34:09 2003
> @@ -261,7 +261,7 @@
>
> /* For now we assume we have the device to ourselves */
> /* Just a quick sanity check */
> - if (!blk_queue_empty(bdev_get_queue(dump_bdev->bdev))) {
> + if (elv_next_request(bdev_get_queue(dump_bdev->bdev))) {
> /* i/o in flight - safer to quit */
> return -EBUSY;
> }

2003-05-13 18:06:15

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.5.69-mjb1: undefined reference to `blk_queue_empty'

On Tue, May 13 2003, Martin J. Bligh wrote:
> > I have already sent you a fix for this. Anyway here it is again.
>
> Oops, I must have dropped it - thanks, I'll stick it in the next release.
>
> > --- linux-2.5.69/drivers/dump/dump_blockdev.c.orig Tue May 13 12:30:49 2003
> > +++ linux-2.5.69/drivers/dump/dump_blockdev.c Tue May 13 12:34:09 2003
> > @@ -261,7 +261,7 @@
> >
> > /* For now we assume we have the device to ourselves */
> > /* Just a quick sanity check */
> > - if (!blk_queue_empty(bdev_get_queue(dump_bdev->bdev))) {
> > + if (elv_next_request(bdev_get_queue(dump_bdev->bdev))) {
> > /* i/o in flight - safer to quit */
> > return -EBUSY;
> > }

this looks horribly racy (of the io scheduler internals corrupting
kind), I don't see you holding the queue lock here. some io schedulers
do non-significant amount of work inside they next_request functions,
moving from back-end lists to dispatch queue.

--
Jens Axboe

2003-05-14 08:07:03

by Bharata B Rao

[permalink] [raw]
Subject: Re: 2.5.69-mjb1: undefined reference to `blk_queue_empty'

On Tue, May 13, 2003 at 08:11:55PM +0200, Jens Axboe wrote:
> > >
> > > /* For now we assume we have the device to ourselves */
> > > /* Just a quick sanity check */
> > > - if (!blk_queue_empty(bdev_get_queue(dump_bdev->bdev))) {
> > > + if (elv_next_request(bdev_get_queue(dump_bdev->bdev))) {
> > > /* i/o in flight - safer to quit */
> > > return -EBUSY;
> > > }
>
> this looks horribly racy (of the io scheduler internals corrupting
> kind), I don't see you holding the queue lock here. some io schedulers
> do non-significant amount of work inside they next_request functions,
> moving from back-end lists to dispatch queue.
>

Jens,

All we want to do here is to check if there are requests in the
queue. Hence thinking of using elv_queue_empty(). Do you think
we still need to acquire queue lock for this ? This code will be
run when we have stopped everything else in other cpus by putting
them into spin.

--- 2569+mjb1/drivers/dump/dump_blockdev.c.orig Wed May 14 13:23:36 2003
+++ 2569+mjb1/drivers/dump/dump_blockdev.c Wed May 14 13:24:58 2003
@@ -258,10 +258,11 @@
dump_block_silence(struct dump_dev *dev)
{
struct dump_blockdev *dump_bdev = DUMP_BDEV(dev);
+ struct request_queue *q = bdev_get_queue(dump_bdev->bdev);

/* For now we assume we have the device to ourselves */
/* Just a quick sanity check */
- if (!blk_queue_empty(bdev_get_queue(dump_bdev->bdev))) {
+ if (!elv_queue_empty(q)) {
/* i/o in flight - safer to quit */
return -EBUSY;
}

Regards,
Bharata.

2003-05-14 08:23:02

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.5.69-mjb1: undefined reference to `blk_queue_empty'

On Wed, May 14 2003, Bharata B Rao wrote:
> On Tue, May 13, 2003 at 08:11:55PM +0200, Jens Axboe wrote:
> > > >
> > > > /* For now we assume we have the device to ourselves */
> > > > /* Just a quick sanity check */
> > > > - if (!blk_queue_empty(bdev_get_queue(dump_bdev->bdev))) {
> > > > + if (elv_next_request(bdev_get_queue(dump_bdev->bdev))) {
> > > > /* i/o in flight - safer to quit */
> > > > return -EBUSY;
> > > > }
> >
> > this looks horribly racy (of the io scheduler internals corrupting
> > kind), I don't see you holding the queue lock here. some io schedulers
> > do non-significant amount of work inside they next_request functions,
> > moving from back-end lists to dispatch queue.
> >
>
> Jens,
>
> All we want to do here is to check if there are requests in the
> queue. Hence thinking of using elv_queue_empty(). Do you think
> we still need to acquire queue lock for this ? This code will be
> run when we have stopped everything else in other cpus by putting
> them into spin.

That really has to be locked down as well. For your purpose, I think the
use of elv_queue_empty() is much better even though it really is an
internal function. The problem mainly comes from AS, that can have non
empty queue but still return NULL in elv_next_request().

But yes, it needs to be locked. If you have pinned the other CPUs, then
I suppose it should work. But it's still a violation of the locking
rules, and one would get in trouble dropping the queue lock from the io
scheduler elevator_queue_empty_fn. No one does that currently, but... So
please take the lock.

--
Jens Axboe

2003-05-15 04:04:09

by Bharata B Rao

[permalink] [raw]
Subject: Re: 2.5.69-mjb1: undefined reference to `blk_queue_empty'

On Wed, May 14, 2003 at 10:32:24AM +0200, Jens Axboe wrote:
>
> That really has to be locked down as well. For your purpose, I think the
> use of elv_queue_empty() is much better even though it really is an
> internal function. The problem mainly comes from AS, that can have non
> empty queue but still return NULL in elv_next_request().
>
> But yes, it needs to be locked. If you have pinned the other CPUs, then
> I suppose it should work. But it's still a violation of the locking
> rules, and one would get in trouble dropping the queue lock from the io
> scheduler elevator_queue_empty_fn. No one does that currently, but... So
> please take the lock.
>

Ok, Now we try to acquire the lock and refuse to dump if we don't get
the lock.

--- 2569+mjb1/drivers/dump/dump_blockdev.c.orig Wed May 14 13:23:36 2003
+++ 2569+mjb1/drivers/dump/dump_blockdev.c Thu May 15 09:26:12 2003
@@ -258,10 +258,19 @@
dump_block_silence(struct dump_dev *dev)
{
struct dump_blockdev *dump_bdev = DUMP_BDEV(dev);
+ struct request_queue *q = bdev_get_queue(dump_bdev->bdev);
+ int ret;
+
+ /* If we can't get request queue lock, refuse to take the dump */
+ if (!spin_trylock(q->queue_lock))
+ return -EBUSY;
+
+ ret = elv_queue_empty(q);
+ spin_unlock(q->queue_lock);

/* For now we assume we have the device to ourselves */
/* Just a quick sanity check */
- if (!blk_queue_empty(bdev_get_queue(dump_bdev->bdev))) {
+ if (!ret) {
/* i/o in flight - safer to quit */
return -EBUSY;
}

Regards,
Bharata.

2003-05-15 07:20:30

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.5.69-mjb1: undefined reference to `blk_queue_empty'

On Thu, May 15 2003, Bharata B Rao wrote:
> On Wed, May 14, 2003 at 10:32:24AM +0200, Jens Axboe wrote:
> >
> > That really has to be locked down as well. For your purpose, I think the
> > use of elv_queue_empty() is much better even though it really is an
> > internal function. The problem mainly comes from AS, that can have non
> > empty queue but still return NULL in elv_next_request().
> >
> > But yes, it needs to be locked. If you have pinned the other CPUs, then
> > I suppose it should work. But it's still a violation of the locking
> > rules, and one would get in trouble dropping the queue lock from the io
> > scheduler elevator_queue_empty_fn. No one does that currently, but... So
> > please take the lock.
> >
>
> Ok, Now we try to acquire the lock and refuse to dump if we don't get
> the lock.
>
> --- 2569+mjb1/drivers/dump/dump_blockdev.c.orig Wed May 14 13:23:36 2003
> +++ 2569+mjb1/drivers/dump/dump_blockdev.c Thu May 15 09:26:12 2003
> @@ -258,10 +258,19 @@
> dump_block_silence(struct dump_dev *dev)
> {
> struct dump_blockdev *dump_bdev = DUMP_BDEV(dev);
> + struct request_queue *q = bdev_get_queue(dump_bdev->bdev);
> + int ret;
> +
> + /* If we can't get request queue lock, refuse to take the dump */
> + if (!spin_trylock(q->queue_lock))
> + return -EBUSY;
> +
> + ret = elv_queue_empty(q);
> + spin_unlock(q->queue_lock);
>
> /* For now we assume we have the device to ourselves */
> /* Just a quick sanity check */
> - if (!blk_queue_empty(bdev_get_queue(dump_bdev->bdev))) {
> + if (!ret) {
> /* i/o in flight - safer to quit */
> return -EBUSY;
> }

Are interrupts already disabled at this point? If yes, then it looks
fine.

--
Jens Axboe

2003-05-15 09:15:56

by Bharata B Rao

[permalink] [raw]
Subject: Re: 2.5.69-mjb1: undefined reference to `blk_queue_empty'

On Thu, May 15, 2003 at 09:29:37AM +0200, Jens Axboe wrote:
> >
> > --- 2569+mjb1/drivers/dump/dump_blockdev.c.orig Wed May 14 13:23:36 2003
> > +++ 2569+mjb1/drivers/dump/dump_blockdev.c Thu May 15 09:26:12 2003
> > @@ -258,10 +258,19 @@
> > dump_block_silence(struct dump_dev *dev)
> > {
> > struct dump_blockdev *dump_bdev = DUMP_BDEV(dev);
> > + struct request_queue *q = bdev_get_queue(dump_bdev->bdev);
> > + int ret;
> > +
> > + /* If we can't get request queue lock, refuse to take the dump */
> > + if (!spin_trylock(q->queue_lock))
> > + return -EBUSY;
> > +
> > + ret = elv_queue_empty(q);
> > + spin_unlock(q->queue_lock);
> >
> > /* For now we assume we have the device to ourselves */
> > /* Just a quick sanity check */
> > - if (!blk_queue_empty(bdev_get_queue(dump_bdev->bdev))) {
> > + if (!ret) {
> > /* i/o in flight - safer to quit */
> > return -EBUSY;
> > }
>
> Are interrupts already disabled at this point? If yes, then it looks
> fine.
>

Yes, interrupts are disabled at this point.

Martin, Could you please take this in, while I push this change to lkcd cvs.

Regards,
Bharata.

2003-05-15 14:57:45

by Martin J. Bligh

[permalink] [raw]
Subject: Re: 2.5.69-mjb1: undefined reference to `blk_queue_empty'

>> Are interrupts already disabled at this point? If yes, then it looks
>> fine.
>>
>
> Yes, interrupts are disabled at this point.
>
> Martin, Could you please take this in, while I push this change to lkcd cvs.

Yup, will add it to next release.