This may be the last update for a week (unless there's a quick bug to
fix before next morning :). I wanted to ship async-io and largepage
support now, even though they're not well tested yet, so I can possibly
get feedback while I'm away.
If you need to use it in production and want to be completely safe,
you should back out (patch -R) these three patches in the order below
(then it will certainly be as stable as any previous -aa):
9910_shm-largepage-1.gz
9900_aio-API-x86-2
9900_aio-2.gz
But even the patches above cannot introduce instability unless you actively use
those features (see below for how to enable largepage support), so at worst
it could be a local DoS security problem if something oopses in aio or
similar.
However, for any long-term permanent installation you should at least
back out:
9900_aio-API-x86-2
for API reasons, in case those syscall numbers get assigned to different
functions in 2.5. I guess the syscall numbers are basically settled
after Ben's latest 2.5 patch, but just in case.
URL:
http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.19rc4aa1.gz
http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.19rc4aa1/
diff between 2.4.19rc3aa4 and 2.4.19rc4aa1:
Only in 2.4.19rc3aa4: 00_d_unhash-race-1
Only in 2.4.19rc3aa4: 00_mmx_xmm-init-2
Only in 2.4.19rc3aa4: 00_vm86-3
Only in 2.4.19rc3aa4: 00_vm86-drop-v86mode-dead-thread-var-1
Only in 2.4.19rc3aa4: 00_vm86-pagetablelock-1
Merged in mainline.
Only in 2.4.19rc4aa1: 00_disable-reada-1
Fix failure of bread against reada by disabling reada
(from Andrew Morton).
Only in 2.4.19rc3aa4: 00_extraversion-3
Only in 2.4.19rc4aa1: 00_extraversion-4
Only in 2.4.19rc3aa4: 9900_aio-1.gz
Only in 2.4.19rc4aa1: 9900_aio-2.gz
Merge the new cancellation API from Ben, drop the POLL and READX
functionality (apparently it was experimental). The pipe callbacks
don't implement the new cancellation API yet: they don't overwrite
the io_event structure, so I temporarily disabled the copy_to_user of
that structure in sys_io_cancel (so it's not fully compliant yet), since it
would otherwise expose uninitialized kernel stack to userspace (will be
fixed ASAP).
Didn't merge the sys_getevents_abs modification, so the call still takes
the timeout as argument. I still prefer that for the lower overhead in
the timeout case, even though it has a larger window for going out of sync
with the time of day (a window with userspace in between, where context
switches cannot be disabled). Waiting for a final judgment for 2.5 before
making any change here.
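For illustration only, the out-of-sync window mentioned above can be sketched with plain arithmetic. This is not the aio code: the helper names are hypothetical and all times are assumed to be nanoseconds on a single clock. With a relative timeout the deadline is fixed only when the call is entered, so any delay before entry pushes the wake-up later; with an absolute deadline the same delay is subtracted from the wait.

```c
#include <stdint.h>

/* Hypothetical helpers; all times are nanoseconds on one clock (assumption). */

/* Relative timeout: the deadline is only known once the call is entered. */
static int64_t deadline_from_relative(int64_t entry_time_ns, int64_t timeout_ns)
{
    return entry_time_ns + timeout_ns;
}

/* Absolute deadline: time left at entry, clamped to zero if already past. */
static int64_t remaining_ns(int64_t deadline_ns, int64_t entry_time_ns)
{
    int64_t left = deadline_ns - entry_time_ns;
    return left > 0 ? left : 0;
}
```

If the caller wanted to wake at t=1500 and is delayed by 200ns before entering the call, the relative form yields a deadline of 1700 while the absolute form yields a remaining wait of 300.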
Only in 2.4.19rc3aa4: 9900_aio-API-x86-1
Only in 2.4.19rc4aa1: 9900_aio-API-x86-2
Sync with the latest syscall numbering in Ben's proposed patch and
drop sys_io_wait entirely.
Only in 2.4.19rc4aa1: 9910_shm-largepage-1.gz
Merge largepage support for shared memory from Ingo Molnar.
Dropped from it all the kernel ABI changes, like the unregistered
MAP_BIGPAGE 0x40 parameter to the mmap syscall: 0x40 can mean
a completely different thing in 2.5, so the original version of the patch
wasn't backwards compatible. This one is fully backwards (or better,
"forward") binary compatible because it's API-less (well, almost: you
will get -EINVAL if you attempt a MAP_PRIVATE in /dev/shm and
stuff like that, _but_ only after enabling the support via sysctl).
Completely untested, though there's no need to worry until/unless you run
"echo 1 >/proc/sys/kernel/shm-use-bigpages". You also need to specify
at boot how much memory to reserve for largepages, with bigpages=1g or
similar (same memparse syntax as mem=).
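As a rough guide to what a value like bigpages=1g means, here is a userspace approximation of the k/m/g suffix handling. It is a sketch, not the kernel's memparse: the name parse_size is hypothetical, and it assumes each suffix is a power of 1024, as with mem=.

```c
#include <stdlib.h>

/* Userspace sketch of K/M/G suffix parsing; parse_size is a hypothetical name. */
static unsigned long long parse_size(const char *s)
{
    char *end;
    unsigned long long v = strtoull(s, &end, 0);

    switch (*end) {
    case 'G': case 'g':
        v <<= 10;
        /* fall through */
    case 'M': case 'm':
        v <<= 10;
        /* fall through */
    case 'K': case 'k':
        v <<= 10;
        break;
    default:
        break;          /* no suffix: plain byte count */
    }
    return v;
}
```

Under these assumptions, parse_size("1g") gives 1073741824 bytes and parse_size("512M") gives 536870912 bytes.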
After the largepage shm support is enabled, the kernel will attempt to
back all shm segments with largepages. Largepages don't apply
to file-backed mappings or anonymous mappings, just to shared memory
from mmap("/dev/zero", MAP_SHARED), /dev/shm or shmget/shmat.
In particular the shmget API will preallocate all pages (minor note: not the
pagetables, page faults will still happen!) before returning from the
syscall (this could be changed very easily with a one-liner, but I guess
databases prefer the pages to be preallocated). All shm segments backed
by largepages are VM_LOCKED (unpageable to swap).
Andrea
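Because the support is described as API-less, an application should not need any new flag. As a minimal sketch, an ordinary SysV segment like the one below is the kind of allocation that, per the description above, would be backed by largepages once the sysctl is enabled and memory has been reserved with bigpages=. The function name shm_roundtrip is hypothetical and nothing here is specific to the patch.

```c
#include <stddef.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a private SysV segment, touch every byte, then remove it.
 * Returns 0 on success, -1 on any failure. Hypothetical helper. */
static int shm_roundtrip(size_t bytes)
{
    if (bytes == 0)
        return -1;

    int id = shmget(IPC_PRIVATE, bytes, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    void *addr = shmat(id, NULL, 0);
    if (addr == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);   /* do not leak the segment */
        return -1;
    }

    unsigned char *p = addr;
    memset(p, 0xaa, bytes);           /* faults the mapping in */
    int ok = (p[bytes - 1] == 0xaa);

    shmdt(addr);
    shmctl(id, IPC_RMID, NULL);
    return ok ? 0 : -1;
}
```

The code is the same whether or not the sysctl is set; only the backing pages would differ.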
On Thu, Aug 01, 2002 at 07:51:24AM +0200, Andrea Arcangeli wrote:
> This may be the last update for a week (unless there's a quick bug to
> fix before next morning :). I wanted to ship async-io and largepage
I would like to thank Randy Hron for reproducing this problem so
quickly with the ltp testsuite:
>>EIP; 80132cc2 <shmem_writepage+22/130> <=====
here is the fix:
--- 2.4.19rc4aa1/include/linux/mm.h.~1~ Thu Aug 1 07:15:54 2002
+++ 2.4.19rc4aa1/include/linux/mm.h Thu Aug 1 16:13:56 2002
@@ -296,8 +296,8 @@ typedef struct page {
 #define PG_checked 12 /* kill me in 2.5.<early>. */
 #define PG_arch_1 13
 #define PG_reserved 14
-#define PG_bigpage 15
 #define PG_launder 15 /* written out by VM pressure.. */
+#define PG_bigpage 16
 
 /* Make it prettier to test the above... */
 #define UnlockPage(page) unlock_page(page)
new rc4aa2 with this single fix is coming, if anybody else found any
other problem please let me know ASAP :), thanks.
Andrea
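The one-line fix above moves PG_bigpage off a bit number that PG_launder was already using. A small userspace sketch of why two flags sharing bit 15 is a problem follows; the helper names are hypothetical, and it does not model how the oops in shmem_writepage was actually reached.

```c
#include <stdbool.h>

/* Bit numbers taken from the diff above; the helpers are illustrative only. */
enum {
    PG_LAUNDER_BIT   = 15,
    PG_BIGPAGE_OLD   = 15,  /* before the fix: same bit as PG_launder */
    PG_BIGPAGE_FIXED = 16   /* after the fix: its own bit */
};

static unsigned long set_flag(unsigned long flags, int bit)
{
    return flags | (1UL << bit);
}

static bool test_flag(unsigned long flags, int bit)
{
    return ((flags >> bit) & 1UL) != 0;
}
```

With the old numbering, marking a page as a bigpage also makes a test for PG_launder succeed; with the fixed numbering the two tests are independent.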
On 20020801 Andrea Arcangeli wrote:
> On Thu, Aug 01, 2002 at 07:51:24AM +0200, Andrea Arcangeli wrote:
> > This may be the last update for a week (unless there's a quick bug to
> > fix before next morning :). I wanted to ship async-io and largepage
>
> I would like to thank Randy Hron for reproducing this problem so
> quickly with the ltp testsuite:
>
> >>EIP; 80132cc2 <shmem_writepage+22/130> <=====
>
Can be related to this (which I get on every shm related op, like a pipe in
bzip2 -cd | patch -p1 ):
kernel BUG at page_alloc.c:98!
invalid operand: 0000 2.4.19-rc5-jam0 #1 SMP Thu Aug 1 12:28:09 CEST 2002
CPU: 0
EIP: 0010:[__free_pages_ok+87/752] Tainted: P
EIP: 0010:[<8013d227>] Tainted: P
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00210286
eax: 00000000 ebx: 81128b10 ecx: 81128b10 edx: 00000000
esi: 8631fe74 edi: 00000000 ebp: 00000000 esp: 874e1f08
ds: 0018 es: 0018 ss: 0018
Process bonobo-moniker- (pid: 2103, stackpage=874e1000)
Stack: 00000000 8631fee4 8631fee8 00000115 00000000 00000000 8631fdc0 00000115
00000115 00000000 8631fdc0 80141c8b 874e1f3c 81128b10 2b331000 00000000
00000000 00001000 80141d6b 86fcb660 86fcb680 874e1f60 00000115 00000eeb
Call Trace: [do_shmem_file_read+299/432] [shmem_file_read+91/128] [sys_read+150/272] [system_call+51/56]
Call Trace: [<80141c8b>] [<80141d6b>] [<801454b6>] [<80108e4b>]
Code: 0f 0b 62 00 7d 55 27 80 8b 0d 10 50 32 80 89 d8 29 c8 69 c0
>>EIP; 8013d227 <__free_pages_ok+57/2f0> <=====
>>ebx; 81128b10 <_end+ddd614/22c4b04>
>>ecx; 81128b10 <_end+ddd614/22c4b04>
Trace; 80141c8b <do_shmem_file_read+12b/1b0>
Trace; 80141d6b <shmem_file_read+5b/80>
Trace; 801454b6 <sys_read+96/110>
Trace; 80108e4b <system_call+33/38>
Code; 8013d227 <__free_pages_ok+57/2f0>
00000000 <_EIP>:
Code; 8013d227 <__free_pages_ok+57/2f0> <=====
0: 0f 0b ud2a <=====
Code; 8013d229 <__free_pages_ok+59/2f0>
2: 62 00 bound %eax,(%eax)
Code; 8013d22b <__free_pages_ok+5b/2f0>
4: 7d 55 jge 5b <_EIP+0x5b> 8013d282 <__free_pages_ok+b2/2f0>
Code; 8013d22d <__free_pages_ok+5d/2f0>
6: 27 daa
Code; 8013d22e <__free_pages_ok+5e/2f0>
7: 80 8b 0d 10 50 32 80 orb $0x80,0x3250100d(%ebx)
Code; 8013d235 <__free_pages_ok+65/2f0>
e: 89 d8 mov %ebx,%eax
Code; 8013d237 <__free_pages_ok+67/2f0>
10: 29 c8 sub %ecx,%eax
Code; 8013d239 <__free_pages_ok+69/2f0>
12: 69 c0 00 00 00 00 imul $0x0,%eax,%eax
--
J.A. Magallon \ Software is like sex:
junk.able.es \ It's better when it's free
Mandrake Linux release 9.0 (Cooker) for i586
Linux 2.4.19-rc4-jam0 (gcc 3.2 (Mandrake Linux 9.0 3.2-0.2mdk))
On Thu, Aug 01, 2002 at 04:17:03PM +0200, Andrea Arcangeli wrote:
> new rc4aa2 with this single fix is coming, if anybody else found any
> other problem please let me know ASAP :), thanks.
why don't you merge up to -rc5?
On Thu, Aug 01, 2002 at 06:41:41PM +0400, Sergey S. Kostyliov wrote:
> On Thursday 01 August 2002 18:30, Christoph Hellwig wrote:
> > On Thu, Aug 01, 2002 at 04:17:03PM +0200, Andrea Arcangeli wrote:
> > > new rc4aa2 with this single fix is coming, if anybody else found any
> > > other problem please let me know ASAP :), thanks.
> >
> > why don't you merge up to -rc5?
>
> I think because diff between rc4 and rc5 is already in 2.4.19rc4aa1
It isn't.
On Thursday 01 August 2002 18:30, Christoph Hellwig wrote:
> On Thu, Aug 01, 2002 at 04:17:03PM +0200, Andrea Arcangeli wrote:
> > new rc4aa2 with this single fix is coming, if anybody else found any
> > other problem please let me know ASAP :), thanks.
>
> why don't you merge up to -rc5?
I think because diff between rc4 and rc5 is already in 2.4.19rc4aa1
--
Best regards,
Sergey S. Kostyliov
Public PGP key: http://sysadminday.org.ru/rathamahata.asc
On 20020801 Christoph Hellwig wrote:
> On Thu, Aug 01, 2002 at 06:41:41PM +0400, Sergey S. Kostyliov wrote:
> > On Thursday 01 August 2002 18:30, Christoph Hellwig wrote:
> > > On Thu, Aug 01, 2002 at 04:17:03PM +0200, Andrea Arcangeli wrote:
> > > > new rc4aa2 with this single fix is coming, if anybody else found any
> > > > other problem please let me know ASAP :), thanks.
> > >
> > > why don't you merge up to -rc5?
> >
> > > I think because diff between rc4 and rc5 is already in 2.4.19rc4aa1
>
> It isn't.
If Andrea is in a hurry, and you can live with an 'unofficial' version,
I already did it.
--
J.A. Magallon \ Software is like sex:
junk.able.es \ It's better when it's free
Mandrake Linux release 9.0 (Cooker) for i586
Linux 2.4.19-rc4-jam0 (gcc 3.2 (Mandrake Linux 9.0 3.2-0.2mdk))
On 20020801 Christoph Hellwig wrote:
> On Thu, Aug 01, 2002 at 04:17:03PM +0200, Andrea Arcangeli wrote:
> > new rc4aa2 with this single fix is coming, if anybody else found any
> > other problem please let me know ASAP :), thanks.
>
> why don't you merge up to -rc5?
>
Until the official version, here is -rc5-aa0-2: rc4-aa1 ported to rc5 and
with the bigpage fix
http://giga.cps.unizar.es/~magallon/linux/kernel/2.4.19-rc5-jam0/00-rc5-aa0-2.bz2
--
J.A. Magallon \ Software is like sex:
junk.able.es \ It's better when it's free
Mandrake Linux release 9.0 (Cooker) for i586
Linux 2.4.19-rc5-jam0 (gcc 3.2 (Mandrake Linux 9.0 3.2-0.2mdk))
On Thu, Aug 01, 2002 at 04:26:23PM +0200, J.A. Magallon wrote:
>
> On 20020801 Andrea Arcangeli wrote:
> > On Thu, Aug 01, 2002 at 07:51:24AM +0200, Andrea Arcangeli wrote:
> > > This may be the last update for a week (unless there's a quick bug to
> > > fix before next morning :). I wanted to ship async-io and largepage
> >
> > I would like to thank Randy Hron for reproducing this problem so
> > quickly with the ltp testsuite:
> >
> > >>EIP; 80132cc2 <shmem_writepage+22/130> <=====
> >
>
> Can be related to this (which I get on every shm related op, like a pipe in
> bzip2 -cd | patch -p1 ):
>
> kernel BUG at page_alloc.c:98!
> invalid operand: 0000 2.4.19-rc5-jam0 #1 SMP Thu Aug 1 12:28:09 CEST 2002
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I love it :)
> CPU: 0
> EIP: 0010:[__free_pages_ok+87/752] Tainted: P
> EIP: 0010:[<8013d227>] Tainted: P
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00210286
> eax: 00000000 ebx: 81128b10 ecx: 81128b10 edx: 00000000
> esi: 8631fe74 edi: 00000000 ebp: 00000000 esp: 874e1f08
> ds: 0018 es: 0018 ss: 0018
> Process bonobo-moniker- (pid: 2103, stackpage=874e1000)
> Stack: 00000000 8631fee4 8631fee8 00000115 00000000 00000000 8631fdc0 00000115
> 00000115 00000000 8631fdc0 80141c8b 874e1f3c 81128b10 2b331000 00000000
> 00000000 00001000 80141d6b 86fcb660 86fcb680 874e1f60 00000115 00000eeb
> Call Trace: [do_shmem_file_read+299/432] [shmem_file_read+91/128] [sys_read+150/272] [system_call+51/56]
> Call Trace: [<80141c8b>] [<80141d6b>] [<801454b6>] [<80108e4b>]
> Code: 0f 0b 62 00 7d 55 27 80 8b 0d 10 50 32 80 89 d8 29 c8 69 c0
>
>
> >>EIP; 8013d227 <__free_pages_ok+57/2f0> <=====
this is another issue, here is the fix:
--- 2.4.19rc5aa1/mm/shmem.c.~1~ Thu Aug 1 17:04:45 2002
+++ 2.4.19rc5aa1/mm/shmem.c Thu Aug 1 19:56:38 2002
@@ -1178,8 +1178,6 @@ static void do_shmem_file_read(struct fi
 __free_page(page);
 else
 page_cache_release(page);
-
- page_cache_release(page);
 }
 
 *ppos = ((loff_t) index << page_shift) + offset;
Andrea
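The fix above removes a second, unbalanced page_cache_release. As a toy model only (hypothetical names, not the kernel's page refcounting), this shows how one extra release drops a reference that someone else still holds:

```c
/* Toy reference-counted object; toy_page and its helpers are hypothetical. */
struct toy_page {
    int count;   /* outstanding references */
    int freed;   /* set once the count reaches zero */
};

static void toy_get(struct toy_page *p)
{
    p->count++;
}

static void toy_put(struct toy_page *p)
{
    if (--p->count == 0)
        p->freed = 1;
}

/* Models a read path that takes one reference and releases it.
 * With extra_release set, it releases once more, like the removed line.
 * Returns the reference count left afterwards. */
static int read_path(struct toy_page *p, int extra_release)
{
    toy_get(p);        /* reference held for the duration of the read */
    toy_put(p);        /* matching release */
    if (extra_release)
        toy_put(p);    /* unbalanced release: steals someone else's reference */
    return p->count;
}
```

Starting from a count of 1 held elsewhere, the balanced path leaves the count at 1, while the path with the extra release drives it to 0 and marks the object freed although the other holder still expects it to exist.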