2003-09-12 19:46:27

by Marcelo Tosatti

Subject: Linux 2.4.23-pre4


Hello,

Here goes -pre4, which contains a networking update, an IA64 update, a PPC
update, a USB update and a bunch of knfsd fixes, amongst others.

It also finally merges the most important parts of the -aa VM. Those changes
fix some OOM deadlocks and give us better per-zone balancing and better
reclaiming. The OOM killer has been removed.

I've been using it on my 256MB desktop: performance feels much better, but
it needs extensive testing, so please help.


Summary of changes from v2.4.23-pre3 to v2.4.23-pre4
============================================

<adsharma:unix-os.sc.intel.com>:
o ia64: IA-32 compatibility patch: FP denormal handling

<alex.williamson:hp.com>:
o ia64: Correct NR_CPUS/cpu_online test order in CMC/CPE polling

<bjorn.helgaas:hp.com>:
o ia64: Remove partial semtimedop32 stuff from upstream
o ia64: Merge to newer ACPI CA
o ia64: sys_ia32.c needs linux/quotacompat.h
o ia64: tlb.c whitespace cleanup to follow 2.5
o ia64: make cpu_relax() a barrier to be consistent with 2.5
o ia64: kernel/acpi.c: Whitespace changes to follow 2.5
o ia64: MCA: pass GP *physical address* to SAL
o ia64: minor bugfixes and whitespace cleanup to follow 2.5
o ia64: MCA: Find correct offset of OEM data (from Keith Owens)
o ia64: sal.h: Backport spelling and other trivial changes from 2.5
o ia64: Comment changes to fix "correctable" usage
o ia64: Fix check for binutils that supports "hint" instructions
o ia64: Update configs for upstream changes
o ia64: Use ARRAY_SIZE(), fix formatting, remove static initializers to zero
o ia64 unwind: (unw_access_ar): initialize struct pt_regs *pt before using it to get AR_CSD & AR_SSD
o ia64: Update defconfig to new generic config
o ia64: initialize bootmem early for acpi_table_init()
o ia64: Use $(CC), not $(AS), when checking for "hint @pause" support in binutils
o ia64: Clarify ACPI available_cpus handling
o ia64: TRIVIAL: Remove extraneous '`'
o ia64: minstate.h: whitespace changes to reduce diffs with 2.5
o ia64: Fix minstate comments
o ia64: fix SAVE_RESET so OS INIT handler works again
o ia64: Remove AIC7XXX driver from ski defconfig
o 2.4 HCDP early printk support

<chas:nrl.navy.mil>:
o [ATM]: In atm_getaddr() do not copy_to_user() with locks held

<daniel:deadlock.et.tudelft.nl>:
o Implement LCD display support in atyfb driver

<eric:lammerts.org>:
o fix current->user->processes leak in reparent_to_init()

<erikj:subway.americas.sgi.com>:
o ia64: 9/3/2003 SGI update

<erlend-a:us.his.no>:
o [CRYPTO]: Add alg. type to /proc/crypto output

<joris:struyve.be>:
o unusual_devs.h entry

<karlis:mt.lv>:
o [BRIDGE]: kfree --> kfree_skb

<marcelo:logos.cnet>:
o Mehmet Ceyran/Alan Cox: Longer i810_audio.c retries
o aa VM merge: Per-zone watermark changes, add lower_zone_reserve_ratio
o aa VM merge: page reclaiming logic changes: Kills oom killer
o aa VM merge: Page accounting helpers changes
o aa VM merge: tunables
o aa VM merge: Kill PF_MEMDIE
o aa VM merge: Fixup page reclaiming changes patch
o Changed EXTRAVERSION to -pre4
o Cset exclude: [email protected]|ChangeSet|20030912113656|10550

<matthewc:cse.unsw.edu.au>:
o smpboot.c, acpi.c

Alan Cox:
o Fix ymfpci oops

Alex Williamson:
o ia64: Use PAL_HALT_LIGHT in cpu_idle
o ia64: New CMC/CPE polling
o ia64: Update to CMC/CPE polling
o ia64: Rename SAL_CALL_SAFE to SAL_CALL_REENTRANT

Arjan van de Ven:
o LSB compliance fix in mprotect

Arun Sharma:
o ia64: translate F_GETLK64/F_SETLK64 to F_GETLK/F_SETLK
o ia64: fix memory leak in sys32_execve path

Chas Williams:
o [ATM]: If clip isn't a module don't __MOD_DEC_USE_COUNT()
o [ATM]: #define'ing pci_pool_create() breaks CONFIG_MODVERSION
o [ATM]: Backport lane/mpoa module locking cleanup from 2.6.x

David Mosberger:
o ia64: handle_fpu_swa() scaling fix
o ia64: Backtraces of all processes on INIT, warning cleanup

Greg Kroah-Hartman:
o USB: fix data toggle problem for pl2303 driver
o USB: update usb-serial.h with spelling fixes and get and set functions
o USB: backport some pl2303 B0 fixes
o USB: fix oops when yanking a usb-serial device from the system with the port still opened
o USB: fix copy_from_user call in acm.c
o USB: fix copy_from_user call in aiptek.c
o USB: fix copy_to_user call in uhci-debug.h
o USB: fix copy_to_user call in mdc800 driver
o USB: remove duplicated copy_from_user call in stv680 driver
o USB: fix copy_to_user calls in vicam driver

Harald Welte:
o [NETFILTER]: NAT range calculation fix

Jack Steiner:
o ia64: discontig/NUMA support
o ia64: Add ia64_imva() and a few more ia64_tpa() uses
o ia64: add support for non-identity mapped kernels
o ia64: remove some SN1 remnants, add a bit more SN2 support

Jean Tourrilhes:
o wireless extension update: 802.11a/802.11g fixes

Jens Axboe:
o Add NEC iStorage to SCSI blacklist

Keith M. Wesolowski:
o [SPARC32]: Ignore btfixups in .text.exit

Keith Owens:
o ia64: Clean up several warnings (no functional change)
o ia64: Correct typo in UNW_DPRINT() call
o ia64: Fix more UNW_DPRINT() typos
o ia64: Delete some generated ia64 files that were being left by make mrproper

Marc-Christian Petersen:
o Fixup 'make xconfig' problem caused by fetchop Config.in change

Martin Hicks:
o ia64: max user stack size of main thread configurable via RLIMIT_STACK

Matthew Wilcox:
o ia64: return PCI domain for pci_controller_num()

Neil Brown:
o knfsd: Lock client list while detaching locks
o knfsd: Set d_op when creating a parent directory during nfsd fh->dentry conversion
o knfsd: lockd fails to purge blocked NLM_LOCKs
o Fix typo in umem.c
o knfsd: Make sure nfs/tcp socket only gets closed
o knfsd: Change name of a #define in nfsd to match 2.6
o knfsd: Make sure nfsd replies from the address the request was sent to

Oleg Drokin:
o [2.4] Rocketport driver compile fix

Paul Fulghum:
o synclink update
o synclinkmp update
o synclink_cs update
o n_hdlc update
o synclink drivers fixup

Paul Mackerras:
o PPC32: Handle single-stepped emulated instructions correctly
o PPC32: Fix for highmem on machines with 64-bit PTEs (e.g. PPC440)
o PPC32: Simplify VMALLOC_START, make it just a variable
o PPC32: Fix a typo in the PPC 440GP support
o PPC32: Fix a bug where TLB entries didn't get execute permission on 40x

Ralf Bächle:
o avoid glibc conflict

Seth Rohit:
o ia64: use "hint @pause" in cpu_relax() and spinlock contention
o ia64: patch to use >256MB purges
o ia64: Restructure pt_regs and optimize syscall path
o ia64: Correct .unwabi for PT_REGS_SAVES (should be "3, 'i'")

Stephen Hemminger:
o [BRIDGE]: Clear hw checksum flags when bridging

Stéphane Eranian:
o ia64: Fix perfmon usage of rum/srsm and sum/ssm

Tom Rini:
o PPC32: Add Magic SysRq support to MPC8260 platforms
o PPC32: Minor bootwrapper fixups

Tony Luck:
o ia64: cleaning up the INIT code (Backported from 2.5 by Bjorn Helgaas)
o ia64: Trim granules correctly in efi_memmap_walk()


2003-09-12 21:44:24

by Mike Fedyk

Subject: aa VM updates merged was: Linux 2.4.23-pre4

On Fri, Sep 12, 2003 at 04:48:50PM -0300, Marcelo Tosatti wrote:
> It also finally merges the most important parts of the -aa VM. Those changes
> fix some OOM deadlocks and give us better per-zone balancing and better
> reclaiming. The OOM killer has been removed.
>
> I've been using it on my 256MB desktop: performance feels much better, but
> it needs extensive testing, so please help.

...

> <marcelo:logos.cnet>:
> o aa VM merge: Per-zone watermark changes, add lower_zone_reserve_ratio
> o aa VM merge: page reclaiming logic changes: Kills oom killer
> o aa VM merge: Page accounting helpers changes
> o aa VM merge: tunables
> o aa VM merge: Kill PF_MEMDIE
> o aa VM merge: Fixup page reclaiming changes patch

Great. I will be going back to 2.4 to help test this for you.

Now we await the flamewar about removing the OOM killer... :-/

2003-09-13 21:41:44

by Willy Tarreau

Subject: Re: Linux 2.4.23-pre4

On Fri, Sep 12, 2003 at 04:48:50PM -0300, Marcelo Tosatti wrote:
>
> Hello,
>
> Here goes -pre4, which contains a networking update, an IA64 update, a PPC
> update, a USB update and a bunch of knfsd fixes, amongst others.

Seems good here.

However, I tried to compile it on alpha with gcc-3.3.1. It failed on xor.h
because the whole asm block there is a single multi-line string literal,
which gcc 3.3 no longer accepts.

Looking through the list archives, I found this (old) patch to 2.5.44-ac3 which
applies cleanly to 2.4.23-pre4. Would you mind applying it, please?

Cheers,
Willy
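
For anyone wondering why quoting every line fixes the build: adjacent string
literals in C are joined by the compiler into one string, so an asm body
written as one quoted line per instruction, each ending in an explicit \n,
hands the assembler the same text as the old multi-line literal did. Below is
a minimal sketch of that idea. It stores the text in an ordinary string
instead of passing it to asm(), so it compiles on any architecture; the names
demo_asm_text and count_lines are made up for illustration and do not appear
in xor.h.

```c
#include <stddef.h>

/*
 * Old form, which gcc 3.3 rejects because the literal spans several lines:
 *
 *     asm("
 *         .text
 *         .align 3
 *         ret
 *     ");
 *
 * New form: one literal per source line. The compiler concatenates the
 * adjacent literals, and the explicit \n keeps one instruction per line
 * in the text that would be handed to the assembler.
 */
static const char demo_asm_text[] =
	"    .text       \n"
	"    .align 3    \n"
	"    ret         \n";

/* Count newline-terminated lines in the joined string (illustrative helper). */
static size_t count_lines(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++)
		if (*s == '\n')
			n++;
	return n;
}
```

Calling count_lines(demo_asm_text) counts the three newline-terminated lines
of the joined string, one per quoted source line.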


>From [email protected] Tue Oct 29 00:11:51 2002
Return-Path: <[email protected]>
Received: from vax.home.local (vax [10.2.1.2])
by alpha.home.local (8.12.4/8.12.1) with ESMTP id g9SNBnn4022263
for <[email protected]>; Tue, 29 Oct 2002 00:11:49 +0100
Received: from vger.kernel.org (vger.kernel.org [209.116.70.75])
by vax.home.local (8.12.2/8.12.1) with ESMTP id g9SNAZtq026134
for <[email protected]>; Tue, 29 Oct 2002 00:11:00 +0100 (CET)
Received: ([email protected]) by vger.kernel.org via listexpand
id <S261352AbSJ1Wjd>; Mon, 28 Oct 2002 17:39:33 -0500
Received: ([email protected]) by vger.kernel.org
id <S261492AbSJ1Wjc>; Mon, 28 Oct 2002 17:39:32 -0500
Received: from p50829418.dip.t-dialin.net ([80.130.148.24]:2309 "EHLO
Marvin.DL8BCU.ampr.org") by vger.kernel.org with ESMTP
id <S261352AbSJ1WjG>; Mon, 28 Oct 2002 17:39:06 -0500
Received: from dl8bcu.de (th@localhost [127.0.0.1])
by Marvin.DL8BCU.ampr.org (8.12.1/8.12.1) with ESMTP id g9SMjMVK002631;
Mon, 28 Oct 2002 22:45:23 GMT
Received: (from th@localhost)
by dl8bcu.de (8.12.1/8.12.1/Submit) id g9SMjLmI002630;
Mon, 28 Oct 2002 22:45:21 GMT
X-Authentication-Warning: Marvin.borg.net: th set sender to [email protected] using -f
Date: Mon, 28 Oct 2002 22:45:21 +0000
From: Thorsten Kranzkowski <[email protected]>
To: Alan Cox <[email protected]>
Cc: linux-kernel mailing list <[email protected]>
Subject: [patch] asm-alpha/xor.h compile failure (2.5.44-ac5)
Message-ID: <[email protected]>
Reply-To: [email protected]
Mail-Followup-To: Alan Cox <[email protected]>,
linux-kernel mailing list <[email protected]>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
Sender: [email protected]
Precedence: bulk
X-Mailing-List: [email protected]
Status: RO
Content-Length: 34048
Lines: 1615

Hi,
gcc 3.3 complains about the missing quotes:


In file included from drivers/md/xor.c:23:
include/asm/xor.h: At top level:
include/asm/xor.h:36: error: request for member `text' in something not a structure or union
include/asm/xor.h:37: error: parse error before numeric constant
include/asm/xor.h:62: error: syntax error at '#' token
include/asm/xor.h:87:17: invalid suffix "b" on integer constant
include/asm/xor.h:119: error: syntax error at '#' token
include/asm/xor.h:120: error: syntax error at '#' token
include/asm/xor.h:121: error: syntax error at '#' token


Please apply.

Thorsten
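
For readers who do not read Alpha assembly, the first routine touched by the
patch, xor_alpha_2, can be approximated in portable C as shown here. This is
only a sketch of what the instructions appear to compute, not the kernel's
code: the function name xor_2_sketch is hypothetical, the hand-scheduled
instruction ordering is ignored, and the register-to-argument mapping ($16 as
the byte count, $17 and $18 as the two buffers) is an assumption based on the
usual Alpha calling convention.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * XOR the buffer p2 into p1, 64 bytes (eight quadwords) per iteration.
 * Like the assembly loop, the body always runs at least once, so 'bytes'
 * is assumed to be a non-zero multiple of 64.
 */
static void xor_2_sketch(unsigned long bytes, uint64_t *p1, const uint64_t *p2)
{
	long lines = (long)(bytes >> 6);	/* srl $16, 6, $16 */

	do {
		size_t i;

		for (i = 0; i < 8; i++)		/* eight ldq / xor / stq groups */
			p1[i] ^= p2[i];

		p1 += 8;			/* addq $17,64,$17 */
		p2 += 8;			/* addq $18,64,$18 */
	} while (--lines > 0);			/* subq $16,1,$16 ; bgt $16,2b */
}
```

The xor_alpha_3, _4 and _5 variants follow the same pattern with more source
buffers, and the prefetch variants add loads into $31, whose results are
discarded, to pull upcoming cache lines in early.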


diff -ur linux-2.5.44-ac3/include/asm-alpha/xor.h linux-2.5.44-ac3-ds20/include/asm-alpha/xor.h
--- linux-2.5.44-ac3/include/asm-alpha/xor.h Sat Oct 19 04:02:28 2002
+++ linux-2.5.44-ac3-ds20/include/asm-alpha/xor.h Fri Oct 25 18:06:00 2002
@@ -32,794 +32,794 @@
unsigned long *, unsigned long *,
unsigned long *, unsigned long *);

-asm("
- .text
- .align 3
- .ent xor_alpha_2
-xor_alpha_2:
- .prologue 0
- srl $16, 6, $16
- .align 4
-2:
- ldq $0,0($17)
- ldq $1,0($18)
- ldq $2,8($17)
- ldq $3,8($18)
-
- ldq $4,16($17)
- ldq $5,16($18)
- ldq $6,24($17)
- ldq $7,24($18)
-
- ldq $19,32($17)
- ldq $20,32($18)
- ldq $21,40($17)
- ldq $22,40($18)
-
- ldq $23,48($17)
- ldq $24,48($18)
- ldq $25,56($17)
- xor $0,$1,$0 # 7 cycles from $1 load
-
- ldq $27,56($18)
- xor $2,$3,$2
- stq $0,0($17)
- xor $4,$5,$4
-
- stq $2,8($17)
- xor $6,$7,$6
- stq $4,16($17)
- xor $19,$20,$19
-
- stq $6,24($17)
- xor $21,$22,$21
- stq $19,32($17)
- xor $23,$24,$23
-
- stq $21,40($17)
- xor $25,$27,$25
- stq $23,48($17)
- subq $16,1,$16
-
- stq $25,56($17)
- addq $17,64,$17
- addq $18,64,$18
- bgt $16,2b
-
- ret
- .end xor_alpha_2
-
- .align 3
- .ent xor_alpha_3
-xor_alpha_3:
- .prologue 0
- srl $16, 6, $16
- .align 4
-3:
- ldq $0,0($17)
- ldq $1,0($18)
- ldq $2,0($19)
- ldq $3,8($17)
-
- ldq $4,8($18)
- ldq $6,16($17)
- ldq $7,16($18)
- ldq $21,24($17)
-
- ldq $22,24($18)
- ldq $24,32($17)
- ldq $25,32($18)
- ldq $5,8($19)
-
- ldq $20,16($19)
- ldq $23,24($19)
- ldq $27,32($19)
- nop
-
- xor $0,$1,$1 # 8 cycles from $0 load
- xor $3,$4,$4 # 6 cycles from $4 load
- xor $6,$7,$7 # 6 cycles from $7 load
- xor $21,$22,$22 # 5 cycles from $22 load
-
- xor $1,$2,$2 # 9 cycles from $2 load
- xor $24,$25,$25 # 5 cycles from $25 load
- stq $2,0($17)
- xor $4,$5,$5 # 6 cycles from $5 load
-
- stq $5,8($17)
- xor $7,$20,$20 # 7 cycles from $20 load
- stq $20,16($17)
- xor $22,$23,$23 # 7 cycles from $23 load
-
- stq $23,24($17)
- xor $25,$27,$27 # 7 cycles from $27 load
- stq $27,32($17)
- nop
-
- ldq $0,40($17)
- ldq $1,40($18)
- ldq $3,48($17)
- ldq $4,48($18)
-
- ldq $6,56($17)
- ldq $7,56($18)
- ldq $2,40($19)
- ldq $5,48($19)
-
- ldq $20,56($19)
- xor $0,$1,$1 # 4 cycles from $1 load
- xor $3,$4,$4 # 5 cycles from $4 load
- xor $6,$7,$7 # 5 cycles from $7 load
-
- xor $1,$2,$2 # 4 cycles from $2 load
- xor $4,$5,$5 # 5 cycles from $5 load
- stq $2,40($17)
- xor $7,$20,$20 # 4 cycles from $20 load
-
- stq $5,48($17)
- subq $16,1,$16
- stq $20,56($17)
- addq $19,64,$19
-
- addq $18,64,$18
- addq $17,64,$17
- bgt $16,3b
- ret
- .end xor_alpha_3
-
- .align 3
- .ent xor_alpha_4
-xor_alpha_4:
- .prologue 0
- srl $16, 6, $16
- .align 4
-4:
- ldq $0,0($17)
- ldq $1,0($18)
- ldq $2,0($19)
- ldq $3,0($20)
-
- ldq $4,8($17)
- ldq $5,8($18)
- ldq $6,8($19)
- ldq $7,8($20)
-
- ldq $21,16($17)
- ldq $22,16($18)
- ldq $23,16($19)
- ldq $24,16($20)
-
- ldq $25,24($17)
- xor $0,$1,$1 # 6 cycles from $1 load
- ldq $27,24($18)
- xor $2,$3,$3 # 6 cycles from $3 load
-
- ldq $0,24($19)
- xor $1,$3,$3
- ldq $1,24($20)
- xor $4,$5,$5 # 7 cycles from $5 load
-
- stq $3,0($17)
- xor $6,$7,$7
- xor $21,$22,$22 # 7 cycles from $22 load
- xor $5,$7,$7
-
- stq $7,8($17)
- xor $23,$24,$24 # 7 cycles from $24 load
- ldq $2,32($17)
- xor $22,$24,$24
-
- ldq $3,32($18)
- ldq $4,32($19)
- ldq $5,32($20)
- xor $25,$27,$27 # 8 cycles from $27 load
-
- ldq $6,40($17)
- ldq $7,40($18)
- ldq $21,40($19)
- ldq $22,40($20)
-
- stq $24,16($17)
- xor $0,$1,$1 # 9 cycles from $1 load
- xor $2,$3,$3 # 5 cycles from $3 load
- xor $27,$1,$1
-
- stq $1,24($17)
- xor $4,$5,$5 # 5 cycles from $5 load
- ldq $23,48($17)
- ldq $24,48($18)
-
- ldq $25,48($19)
- xor $3,$5,$5
- ldq $27,48($20)
- ldq $0,56($17)
-
- ldq $1,56($18)
- ldq $2,56($19)
- xor $6,$7,$7 # 8 cycles from $6 load
- ldq $3,56($20)
-
- stq $5,32($17)
- xor $21,$22,$22 # 8 cycles from $22 load
- xor $7,$22,$22
- xor $23,$24,$24 # 5 cycles from $24 load
-
- stq $22,40($17)
- xor $25,$27,$27 # 5 cycles from $27 load
- xor $24,$27,$27
- xor $0,$1,$1 # 5 cycles from $1 load
-
- stq $27,48($17)
- xor $2,$3,$3 # 4 cycles from $3 load
- xor $1,$3,$3
- subq $16,1,$16
-
- stq $3,56($17)
- addq $20,64,$20
- addq $19,64,$19
- addq $18,64,$18
-
- addq $17,64,$17
- bgt $16,4b
- ret
- .end xor_alpha_4
-
- .align 3
- .ent xor_alpha_5
-xor_alpha_5:
- .prologue 0
- srl $16, 6, $16
- .align 4
-5:
- ldq $0,0($17)
- ldq $1,0($18)
- ldq $2,0($19)
- ldq $3,0($20)
-
- ldq $4,0($21)
- ldq $5,8($17)
- ldq $6,8($18)
- ldq $7,8($19)
-
- ldq $22,8($20)
- ldq $23,8($21)
- ldq $24,16($17)
- ldq $25,16($18)
-
- ldq $27,16($19)
- xor $0,$1,$1 # 6 cycles from $1 load
- ldq $28,16($20)
- xor $2,$3,$3 # 6 cycles from $3 load
-
- ldq $0,16($21)
- xor $1,$3,$3
- ldq $1,24($17)
- xor $3,$4,$4 # 7 cycles from $4 load
-
- stq $4,0($17)
- xor $5,$6,$6 # 7 cycles from $6 load
- xor $7,$22,$22 # 7 cycles from $22 load
- xor $6,$23,$23 # 7 cycles from $23 load
-
- ldq $2,24($18)
- xor $22,$23,$23
- ldq $3,24($19)
- xor $24,$25,$25 # 8 cycles from $25 load
-
- stq $23,8($17)
- xor $25,$27,$27 # 8 cycles from $27 load
- ldq $4,24($20)
- xor $28,$0,$0 # 7 cycles from $0 load
-
- ldq $5,24($21)
- xor $27,$0,$0
- ldq $6,32($17)
- ldq $7,32($18)
-
- stq $0,16($17)
- xor $1,$2,$2 # 6 cycles from $2 load
- ldq $22,32($19)
- xor $3,$4,$4 # 4 cycles from $4 load
-
- ldq $23,32($20)
- xor $2,$4,$4
- ldq $24,32($21)
- ldq $25,40($17)
-
- ldq $27,40($18)
- ldq $28,40($19)
- ldq $0,40($20)
- xor $4,$5,$5 # 7 cycles from $5 load
-
- stq $5,24($17)
- xor $6,$7,$7 # 7 cycles from $7 load
- ldq $1,40($21)
- ldq $2,48($17)
-
- ldq $3,48($18)
- xor $7,$22,$22 # 7 cycles from $22 load
- ldq $4,48($19)
- xor $23,$24,$24 # 6 cycles from $24 load
-
- ldq $5,48($20)
- xor $22,$24,$24
- ldq $6,48($21)
- xor $25,$27,$27 # 7 cycles from $27 load
-
- stq $24,32($17)
- xor $27,$28,$28 # 8 cycles from $28 load
- ldq $7,56($17)
- xor $0,$1,$1 # 6 cycles from $1 load
-
- ldq $22,56($18)
- ldq $23,56($19)
- ldq $24,56($20)
- ldq $25,56($21)
-
- xor $28,$1,$1
- xor $2,$3,$3 # 9 cycles from $3 load
- xor $3,$4,$4 # 9 cycles from $4 load
- xor $5,$6,$6 # 8 cycles from $6 load
-
- stq $1,40($17)
- xor $4,$6,$6
- xor $7,$22,$22 # 7 cycles from $22 load
- xor $23,$24,$24 # 6 cycles from $24 load
-
- stq $6,48($17)
- xor $22,$24,$24
- subq $16,1,$16
- xor $24,$25,$25 # 8 cycles from $25 load
-
- stq $25,56($17)
- addq $21,64,$21
- addq $20,64,$20
- addq $19,64,$19
-
- addq $18,64,$18
- addq $17,64,$17
- bgt $16,5b
- ret
- .end xor_alpha_5
-
- .align 3
- .ent xor_alpha_prefetch_2
-xor_alpha_prefetch_2:
- .prologue 0
- srl $16, 6, $16
-
- ldq $31, 0($17)
- ldq $31, 0($18)
-
- ldq $31, 64($17)
- ldq $31, 64($18)
-
- ldq $31, 128($17)
- ldq $31, 128($18)
-
- ldq $31, 192($17)
- ldq $31, 192($18)
- .align 4
-2:
- ldq $0,0($17)
- ldq $1,0($18)
- ldq $2,8($17)
- ldq $3,8($18)
-
- ldq $4,16($17)
- ldq $5,16($18)
- ldq $6,24($17)
- ldq $7,24($18)
-
- ldq $19,32($17)
- ldq $20,32($18)
- ldq $21,40($17)
- ldq $22,40($18)
-
- ldq $23,48($17)
- ldq $24,48($18)
- ldq $25,56($17)
- ldq $27,56($18)
-
- ldq $31,256($17)
- xor $0,$1,$0 # 8 cycles from $1 load
- ldq $31,256($18)
- xor $2,$3,$2
-
- stq $0,0($17)
- xor $4,$5,$4
- stq $2,8($17)
- xor $6,$7,$6
-
- stq $4,16($17)
- xor $19,$20,$19
- stq $6,24($17)
- xor $21,$22,$21
-
- stq $19,32($17)
- xor $23,$24,$23
- stq $21,40($17)
- xor $25,$27,$25
-
- stq $23,48($17)
- subq $16,1,$16
- stq $25,56($17)
- addq $17,64,$17
-
- addq $18,64,$18
- bgt $16,2b
- ret
- .end xor_alpha_prefetch_2
-
- .align 3
- .ent xor_alpha_prefetch_3
-xor_alpha_prefetch_3:
- .prologue 0
- srl $16, 6, $16
-
- ldq $31, 0($17)
- ldq $31, 0($18)
- ldq $31, 0($19)
-
- ldq $31, 64($17)
- ldq $31, 64($18)
- ldq $31, 64($19)
-
- ldq $31, 128($17)
- ldq $31, 128($18)
- ldq $31, 128($19)
-
- ldq $31, 192($17)
- ldq $31, 192($18)
- ldq $31, 192($19)
- .align 4
-3:
- ldq $0,0($17)
- ldq $1,0($18)
- ldq $2,0($19)
- ldq $3,8($17)
-
- ldq $4,8($18)
- ldq $6,16($17)
- ldq $7,16($18)
- ldq $21,24($17)
-
- ldq $22,24($18)
- ldq $24,32($17)
- ldq $25,32($18)
- ldq $5,8($19)
-
- ldq $20,16($19)
- ldq $23,24($19)
- ldq $27,32($19)
- nop
-
- xor $0,$1,$1 # 8 cycles from $0 load
- xor $3,$4,$4 # 7 cycles from $4 load
- xor $6,$7,$7 # 6 cycles from $7 load
- xor $21,$22,$22 # 5 cycles from $22 load
-
- xor $1,$2,$2 # 9 cycles from $2 load
- xor $24,$25,$25 # 5 cycles from $25 load
- stq $2,0($17)
- xor $4,$5,$5 # 6 cycles from $5 load
-
- stq $5,8($17)
- xor $7,$20,$20 # 7 cycles from $20 load
- stq $20,16($17)
- xor $22,$23,$23 # 7 cycles from $23 load
-
- stq $23,24($17)
- xor $25,$27,$27 # 7 cycles from $27 load
- stq $27,32($17)
- nop
-
- ldq $0,40($17)
- ldq $1,40($18)
- ldq $3,48($17)
- ldq $4,48($18)
-
- ldq $6,56($17)
- ldq $7,56($18)
- ldq $2,40($19)
- ldq $5,48($19)
-
- ldq $20,56($19)
- ldq $31,256($17)
- ldq $31,256($18)
- ldq $31,256($19)
-
- xor $0,$1,$1 # 6 cycles from $1 load
- xor $3,$4,$4 # 5 cycles from $4 load
- xor $6,$7,$7 # 5 cycles from $7 load
- xor $1,$2,$2 # 4 cycles from $2 load
-
- xor $4,$5,$5 # 5 cycles from $5 load
- xor $7,$20,$20 # 4 cycles from $20 load
- stq $2,40($17)
- subq $16,1,$16
-
- stq $5,48($17)
- addq $19,64,$19
- stq $20,56($17)
- addq $18,64,$18
-
- addq $17,64,$17
- bgt $16,3b
- ret
- .end xor_alpha_prefetch_3
-
- .align 3
- .ent xor_alpha_prefetch_4
-xor_alpha_prefetch_4:
- .prologue 0
- srl $16, 6, $16
-
- ldq $31, 0($17)
- ldq $31, 0($18)
- ldq $31, 0($19)
- ldq $31, 0($20)
-
- ldq $31, 64($17)
- ldq $31, 64($18)
- ldq $31, 64($19)
- ldq $31, 64($20)
-
- ldq $31, 128($17)
- ldq $31, 128($18)
- ldq $31, 128($19)
- ldq $31, 128($20)
-
- ldq $31, 192($17)
- ldq $31, 192($18)
- ldq $31, 192($19)
- ldq $31, 192($20)
- .align 4
-4:
- ldq $0,0($17)
- ldq $1,0($18)
- ldq $2,0($19)
- ldq $3,0($20)
-
- ldq $4,8($17)
- ldq $5,8($18)
- ldq $6,8($19)
- ldq $7,8($20)
-
- ldq $21,16($17)
- ldq $22,16($18)
- ldq $23,16($19)
- ldq $24,16($20)
-
- ldq $25,24($17)
- xor $0,$1,$1 # 6 cycles from $1 load
- ldq $27,24($18)
- xor $2,$3,$3 # 6 cycles from $3 load
-
- ldq $0,24($19)
- xor $1,$3,$3
- ldq $1,24($20)
- xor $4,$5,$5 # 7 cycles from $5 load
-
- stq $3,0($17)
- xor $6,$7,$7
- xor $21,$22,$22 # 7 cycles from $22 load
- xor $5,$7,$7
-
- stq $7,8($17)
- xor $23,$24,$24 # 7 cycles from $24 load
- ldq $2,32($17)
- xor $22,$24,$24
-
- ldq $3,32($18)
- ldq $4,32($19)
- ldq $5,32($20)
- xor $25,$27,$27 # 8 cycles from $27 load
-
- ldq $6,40($17)
- ldq $7,40($18)
- ldq $21,40($19)
- ldq $22,40($20)
-
- stq $24,16($17)
- xor $0,$1,$1 # 9 cycles from $1 load
- xor $2,$3,$3 # 5 cycles from $3 load
- xor $27,$1,$1
-
- stq $1,24($17)
- xor $4,$5,$5 # 5 cycles from $5 load
- ldq $23,48($17)
- xor $3,$5,$5
-
- ldq $24,48($18)
- ldq $25,48($19)
- ldq $27,48($20)
- ldq $0,56($17)
-
- ldq $1,56($18)
- ldq $2,56($19)
- ldq $3,56($20)
- xor $6,$7,$7 # 8 cycles from $6 load
-
- ldq $31,256($17)
- xor $21,$22,$22 # 8 cycles from $22 load
- ldq $31,256($18)
- xor $7,$22,$22
-
- ldq $31,256($19)
- xor $23,$24,$24 # 6 cycles from $24 load
- ldq $31,256($20)
- xor $25,$27,$27 # 6 cycles from $27 load
-
- stq $5,32($17)
- xor $24,$27,$27
- xor $0,$1,$1 # 7 cycles from $1 load
- xor $2,$3,$3 # 6 cycles from $3 load
-
- stq $22,40($17)
- xor $1,$3,$3
- stq $27,48($17)
- subq $16,1,$16
-
- stq $3,56($17)
- addq $20,64,$20
- addq $19,64,$19
- addq $18,64,$18
-
- addq $17,64,$17
- bgt $16,4b
- ret
- .end xor_alpha_prefetch_4
-
- .align 3
- .ent xor_alpha_prefetch_5
-xor_alpha_prefetch_5:
- .prologue 0
- srl $16, 6, $16
-
- ldq $31, 0($17)
- ldq $31, 0($18)
- ldq $31, 0($19)
- ldq $31, 0($20)
- ldq $31, 0($21)
-
- ldq $31, 64($17)
- ldq $31, 64($18)
- ldq $31, 64($19)
- ldq $31, 64($20)
- ldq $31, 64($21)
-
- ldq $31, 128($17)
- ldq $31, 128($18)
- ldq $31, 128($19)
- ldq $31, 128($20)
- ldq $31, 128($21)
-
- ldq $31, 192($17)
- ldq $31, 192($18)
- ldq $31, 192($19)
- ldq $31, 192($20)
- ldq $31, 192($21)
- .align 4
-5:
- ldq $0,0($17)
- ldq $1,0($18)
- ldq $2,0($19)
- ldq $3,0($20)
-
- ldq $4,0($21)
- ldq $5,8($17)
- ldq $6,8($18)
- ldq $7,8($19)
-
- ldq $22,8($20)
- ldq $23,8($21)
- ldq $24,16($17)
- ldq $25,16($18)
-
- ldq $27,16($19)
- xor $0,$1,$1 # 6 cycles from $1 load
- ldq $28,16($20)
- xor $2,$3,$3 # 6 cycles from $3 load
-
- ldq $0,16($21)
- xor $1,$3,$3
- ldq $1,24($17)
- xor $3,$4,$4 # 7 cycles from $4 load
-
- stq $4,0($17)
- xor $5,$6,$6 # 7 cycles from $6 load
- xor $7,$22,$22 # 7 cycles from $22 load
- xor $6,$23,$23 # 7 cycles from $23 load
-
- ldq $2,24($18)
- xor $22,$23,$23
- ldq $3,24($19)
- xor $24,$25,$25 # 8 cycles from $25 load
-
- stq $23,8($17)
- xor $25,$27,$27 # 8 cycles from $27 load
- ldq $4,24($20)
- xor $28,$0,$0 # 7 cycles from $0 load
-
- ldq $5,24($21)
- xor $27,$0,$0
- ldq $6,32($17)
- ldq $7,32($18)
-
- stq $0,16($17)
- xor $1,$2,$2 # 6 cycles from $2 load
- ldq $22,32($19)
- xor $3,$4,$4 # 4 cycles from $4 load
-
- ldq $23,32($20)
- xor $2,$4,$4
- ldq $24,32($21)
- ldq $25,40($17)
-
- ldq $27,40($18)
- ldq $28,40($19)
- ldq $0,40($20)
- xor $4,$5,$5 # 7 cycles from $5 load
-
- stq $5,24($17)
- xor $6,$7,$7 # 7 cycles from $7 load
- ldq $1,40($21)
- ldq $2,48($17)
-
- ldq $3,48($18)
- xor $7,$22,$22 # 7 cycles from $22 load
- ldq $4,48($19)
- xor $23,$24,$24 # 6 cycles from $24 load
-
- ldq $5,48($20)
- xor $22,$24,$24
- ldq $6,48($21)
- xor $25,$27,$27 # 7 cycles from $27 load
-
- stq $24,32($17)
- xor $27,$28,$28 # 8 cycles from $28 load
- ldq $7,56($17)
- xor $0,$1,$1 # 6 cycles from $1 load
-
- ldq $22,56($18)
- ldq $23,56($19)
- ldq $24,56($20)
- ldq $25,56($21)
-
- ldq $31,256($17)
- xor $28,$1,$1
- ldq $31,256($18)
- xor $2,$3,$3 # 9 cycles from $3 load
-
- ldq $31,256($19)
- xor $3,$4,$4 # 9 cycles from $4 load
- ldq $31,256($20)
- xor $5,$6,$6 # 8 cycles from $6 load
-
- stq $1,40($17)
- xor $4,$6,$6
- xor $7,$22,$22 # 7 cycles from $22 load
- xor $23,$24,$24 # 6 cycles from $24 load
-
- stq $6,48($17)
- xor $22,$24,$24
- ldq $31,256($21)
- xor $24,$25,$25 # 8 cycles from $25 load
-
- stq $25,56($17)
- subq $16,1,$16
- addq $21,64,$21
- addq $20,64,$20
-
- addq $19,64,$19
- addq $18,64,$18
- addq $17,64,$17
- bgt $16,5b
-
- ret
- .end xor_alpha_prefetch_5
-");
+asm(
+" .text \n"
+" .align 3 \n"
+" .ent xor_alpha_2 \n"
+"xor_alpha_2: \n"
+" .prologue 0 \n"
+" srl $16, 6, $16 \n"
+" .align 4 \n"
+"2: \n"
+" ldq $0,0($17) \n"
+" ldq $1,0($18) \n"
+" ldq $2,8($17) \n"
+" ldq $3,8($18) \n"
+" \n"
+" ldq $4,16($17) \n"
+" ldq $5,16($18) \n"
+" ldq $6,24($17) \n"
+" ldq $7,24($18) \n"
+" \n"
+" ldq $19,32($17) \n"
+" ldq $20,32($18) \n"
+" ldq $21,40($17) \n"
+" ldq $22,40($18) \n"
+" \n"
+" ldq $23,48($17) \n"
+" ldq $24,48($18) \n"
+" ldq $25,56($17) \n"
+" xor $0,$1,$0 # 7 cycles from $1 load \n"
+" \n"
+" ldq $27,56($18) \n"
+" xor $2,$3,$2 \n"
+" stq $0,0($17) \n"
+" xor $4,$5,$4 \n"
+" \n"
+" stq $2,8($17) \n"
+" xor $6,$7,$6 \n"
+" stq $4,16($17) \n"
+" xor $19,$20,$19 \n"
+" \n"
+" stq $6,24($17) \n"
+" xor $21,$22,$21 \n"
+" stq $19,32($17) \n"
+" xor $23,$24,$23 \n"
+" \n"
+" stq $21,40($17) \n"
+" xor $25,$27,$25 \n"
+" stq $23,48($17) \n"
+" subq $16,1,$16 \n"
+" \n"
+" stq $25,56($17) \n"
+" addq $17,64,$17 \n"
+" addq $18,64,$18 \n"
+" bgt $16,2b \n"
+" \n"
+" ret \n"
+" .end xor_alpha_2 \n"
+" \n"
+" .align 3 \n"
+" .ent xor_alpha_3 \n"
+"xor_alpha_3: \n"
+" .prologue 0 \n"
+" srl $16, 6, $16 \n"
+" .align 4 \n"
+"3: \n"
+" ldq $0,0($17) \n"
+" ldq $1,0($18) \n"
+" ldq $2,0($19) \n"
+" ldq $3,8($17) \n"
+" \n"
+" ldq $4,8($18) \n"
+" ldq $6,16($17) \n"
+" ldq $7,16($18) \n"
+" ldq $21,24($17) \n"
+" \n"
+" ldq $22,24($18) \n"
+" ldq $24,32($17) \n"
+" ldq $25,32($18) \n"
+" ldq $5,8($19) \n"
+" \n"
+" ldq $20,16($19) \n"
+" ldq $23,24($19) \n"
+" ldq $27,32($19) \n"
+" nop \n"
+" \n"
+" xor $0,$1,$1 # 8 cycles from $0 load \n"
+" xor $3,$4,$4 # 6 cycles from $4 load \n"
+" xor $6,$7,$7 # 6 cycles from $7 load \n"
+" xor $21,$22,$22 # 5 cycles from $22 load \n"
+" \n"
+" xor $1,$2,$2 # 9 cycles from $2 load \n"
+" xor $24,$25,$25 # 5 cycles from $25 load \n"
+" stq $2,0($17) \n"
+" xor $4,$5,$5 # 6 cycles from $5 load \n"
+" \n"
+" stq $5,8($17) \n"
+" xor $7,$20,$20 # 7 cycles from $20 load \n"
+" stq $20,16($17) \n"
+" xor $22,$23,$23 # 7 cycles from $23 load \n"
+" \n"
+" stq $23,24($17) \n"
+" xor $25,$27,$27 # 7 cycles from $27 load \n"
+" stq $27,32($17) \n"
+" nop \n"
+" \n"
+" ldq $0,40($17) \n"
+" ldq $1,40($18) \n"
+" ldq $3,48($17) \n"
+" ldq $4,48($18) \n"
+" \n"
+" ldq $6,56($17) \n"
+" ldq $7,56($18) \n"
+" ldq $2,40($19) \n"
+" ldq $5,48($19) \n"
+" \n"
+" ldq $20,56($19) \n"
+" xor $0,$1,$1 # 4 cycles from $1 load \n"
+" xor $3,$4,$4 # 5 cycles from $4 load \n"
+" xor $6,$7,$7 # 5 cycles from $7 load \n"
+" \n"
+" xor $1,$2,$2 # 4 cycles from $2 load \n"
+" xor $4,$5,$5 # 5 cycles from $5 load \n"
+" stq $2,40($17) \n"
+" xor $7,$20,$20 # 4 cycles from $20 load \n"
+" \n"
+" stq $5,48($17) \n"
+" subq $16,1,$16 \n"
+" stq $20,56($17) \n"
+" addq $19,64,$19 \n"
+" \n"
+" addq $18,64,$18 \n"
+" addq $17,64,$17 \n"
+" bgt $16,3b \n"
+" ret \n"
+" .end xor_alpha_3 \n"
+" \n"
+" .align 3 \n"
+" .ent xor_alpha_4 \n"
+"xor_alpha_4: \n"
+" .prologue 0 \n"
+" srl $16, 6, $16 \n"
+" .align 4 \n"
+"4: \n"
+" ldq $0,0($17) \n"
+" ldq $1,0($18) \n"
+" ldq $2,0($19) \n"
+" ldq $3,0($20) \n"
+" \n"
+" ldq $4,8($17) \n"
+" ldq $5,8($18) \n"
+" ldq $6,8($19) \n"
+" ldq $7,8($20) \n"
+" \n"
+" ldq $21,16($17) \n"
+" ldq $22,16($18) \n"
+" ldq $23,16($19) \n"
+" ldq $24,16($20) \n"
+" \n"
+" ldq $25,24($17) \n"
+" xor $0,$1,$1 # 6 cycles from $1 load \n"
+" ldq $27,24($18) \n"
+" xor $2,$3,$3 # 6 cycles from $3 load \n"
+" \n"
+" ldq $0,24($19) \n"
+" xor $1,$3,$3 \n"
+" ldq $1,24($20) \n"
+" xor $4,$5,$5 # 7 cycles from $5 load \n"
+" \n"
+" stq $3,0($17) \n"
+" xor $6,$7,$7 \n"
+" xor $21,$22,$22 # 7 cycles from $22 load \n"
+" xor $5,$7,$7 \n"
+" \n"
+" stq $7,8($17) \n"
+" xor $23,$24,$24 # 7 cycles from $24 load \n"
+" ldq $2,32($17) \n"
+" xor $22,$24,$24 \n"
+" \n"
+" ldq $3,32($18) \n"
+" ldq $4,32($19) \n"
+" ldq $5,32($20) \n"
+" xor $25,$27,$27 # 8 cycles from $27 load \n"
+" \n"
+" ldq $6,40($17) \n"
+" ldq $7,40($18) \n"
+" ldq $21,40($19) \n"
+" ldq $22,40($20) \n"
+" \n"
+" stq $24,16($17) \n"
+" xor $0,$1,$1 # 9 cycles from $1 load \n"
+" xor $2,$3,$3 # 5 cycles from $3 load \n"
+" xor $27,$1,$1 \n"
+" \n"
+" stq $1,24($17) \n"
+" xor $4,$5,$5 # 5 cycles from $5 load \n"
+" ldq $23,48($17) \n"
+" ldq $24,48($18) \n"
+" \n"
+" ldq $25,48($19) \n"
+" xor $3,$5,$5 \n"
+" ldq $27,48($20) \n"
+" ldq $0,56($17) \n"
+" \n"
+" ldq $1,56($18) \n"
+" ldq $2,56($19) \n"
+" xor $6,$7,$7 # 8 cycles from $6 load \n"
+" ldq $3,56($20) \n"
+" \n"
+" stq $5,32($17) \n"
+" xor $21,$22,$22 # 8 cycles from $22 load \n"
+" xor $7,$22,$22 \n"
+" xor $23,$24,$24 # 5 cycles from $24 load \n"
+" \n"
+" stq $22,40($17) \n"
+" xor $25,$27,$27 # 5 cycles from $27 load \n"
+" xor $24,$27,$27 \n"
+" xor $0,$1,$1 # 5 cycles from $1 load \n"
+" \n"
+" stq $27,48($17) \n"
+" xor $2,$3,$3 # 4 cycles from $3 load \n"
+" xor $1,$3,$3 \n"
+" subq $16,1,$16 \n"
+" \n"
+" stq $3,56($17) \n"
+" addq $20,64,$20 \n"
+" addq $19,64,$19 \n"
+" addq $18,64,$18 \n"
+" \n"
+" addq $17,64,$17 \n"
+" bgt $16,4b \n"
+" ret \n"
+" .end xor_alpha_4 \n"
+" \n"
+" .align 3 \n"
+" .ent xor_alpha_5 \n"
+"xor_alpha_5: \n"
+" .prologue 0 \n"
+" srl $16, 6, $16 \n"
+" .align 4 \n"
+"5: \n"
+" ldq $0,0($17) \n"
+" ldq $1,0($18) \n"
+" ldq $2,0($19) \n"
+" ldq $3,0($20) \n"
+" \n"
+" ldq $4,0($21) \n"
+" ldq $5,8($17) \n"
+" ldq $6,8($18) \n"
+" ldq $7,8($19) \n"
+" \n"
+" ldq $22,8($20) \n"
+" ldq $23,8($21) \n"
+" ldq $24,16($17) \n"
+" ldq $25,16($18) \n"
+" \n"
+" ldq $27,16($19) \n"
+" xor $0,$1,$1 # 6 cycles from $1 load \n"
+" ldq $28,16($20) \n"
+" xor $2,$3,$3 # 6 cycles from $3 load \n"
+" \n"
+" ldq $0,16($21) \n"
+" xor $1,$3,$3 \n"
+" ldq $1,24($17) \n"
+" xor $3,$4,$4 # 7 cycles from $4 load \n"
+" \n"
+" stq $4,0($17) \n"
+" xor $5,$6,$6 # 7 cycles from $6 load \n"
+" xor $7,$22,$22 # 7 cycles from $22 load \n"
+" xor $6,$23,$23 # 7 cycles from $23 load \n"
+" \n"
+" ldq $2,24($18) \n"
+" xor $22,$23,$23 \n"
+" ldq $3,24($19) \n"
+" xor $24,$25,$25 # 8 cycles from $25 load \n"
+" \n"
+" stq $23,8($17) \n"
+" xor $25,$27,$27 # 8 cycles from $27 load \n"
+" ldq $4,24($20) \n"
+" xor $28,$0,$0 # 7 cycles from $0 load \n"
+" \n"
+" ldq $5,24($21) \n"
+" xor $27,$0,$0 \n"
+" ldq $6,32($17) \n"
+" ldq $7,32($18) \n"
+" \n"
+" stq $0,16($17) \n"
+" xor $1,$2,$2 # 6 cycles from $2 load \n"
+" ldq $22,32($19) \n"
+" xor $3,$4,$4 # 4 cycles from $4 load \n"
+" \n"
+" ldq $23,32($20) \n"
+" xor $2,$4,$4 \n"
+" ldq $24,32($21) \n"
+" ldq $25,40($17) \n"
+" \n"
+" ldq $27,40($18) \n"
+" ldq $28,40($19) \n"
+" ldq $0,40($20) \n"
+" xor $4,$5,$5 # 7 cycles from $5 load \n"
+" \n"
+" stq $5,24($17) \n"
+" xor $6,$7,$7 # 7 cycles from $7 load \n"
+" ldq $1,40($21) \n"
+" ldq $2,48($17) \n"
+" \n"
+" ldq $3,48($18) \n"
+" xor $7,$22,$22 # 7 cycles from $22 load \n"
+" ldq $4,48($19) \n"
+" xor $23,$24,$24 # 6 cycles from $24 load \n"
+" \n"
+" ldq $5,48($20) \n"
+" xor $22,$24,$24 \n"
+" ldq $6,48($21) \n"
+" xor $25,$27,$27 # 7 cycles from $27 load \n"
+" \n"
+" stq $24,32($17) \n"
+" xor $27,$28,$28 # 8 cycles from $28 load \n"
+" ldq $7,56($17) \n"
+" xor $0,$1,$1 # 6 cycles from $1 load \n"
+" \n"
+" ldq $22,56($18) \n"
+" ldq $23,56($19) \n"
+" ldq $24,56($20) \n"
+" ldq $25,56($21) \n"
+" \n"
+" xor $28,$1,$1 \n"
+" xor $2,$3,$3 # 9 cycles from $3 load \n"
+" xor $3,$4,$4 # 9 cycles from $4 load \n"
+" xor $5,$6,$6 # 8 cycles from $6 load \n"
+" \n"
+" stq $1,40($17) \n"
+" xor $4,$6,$6 \n"
+" xor $7,$22,$22 # 7 cycles from $22 load \n"
+" xor $23,$24,$24 # 6 cycles from $24 load \n"
+" \n"
+" stq $6,48($17) \n"
+" xor $22,$24,$24 \n"
+" subq $16,1,$16 \n"
+" xor $24,$25,$25 # 8 cycles from $25 load \n"
+" \n"
+" stq $25,56($17) \n"
+" addq $21,64,$21 \n"
+" addq $20,64,$20 \n"
+" addq $19,64,$19 \n"
+" \n"
+" addq $18,64,$18 \n"
+" addq $17,64,$17 \n"
+" bgt $16,5b \n"
+" ret \n"
+" .end xor_alpha_5 \n"
+" \n"
+" .align 3 \n"
+" .ent xor_alpha_prefetch_2 \n"
+"xor_alpha_prefetch_2: \n"
+" .prologue 0 \n"
+" srl $16, 6, $16 \n"
+" \n"
+" ldq $31, 0($17) \n"
+" ldq $31, 0($18) \n"
+" \n"
+" ldq $31, 64($17) \n"
+" ldq $31, 64($18) \n"
+" \n"
+" ldq $31, 128($17) \n"
+" ldq $31, 128($18) \n"
+" \n"
+" ldq $31, 192($17) \n"
+" ldq $31, 192($18) \n"
+" .align 4 \n"
+"2: \n"
+" ldq $0,0($17) \n"
+" ldq $1,0($18) \n"
+" ldq $2,8($17) \n"
+" ldq $3,8($18) \n"
+" \n"
+" ldq $4,16($17) \n"
+" ldq $5,16($18) \n"
+" ldq $6,24($17) \n"
+" ldq $7,24($18) \n"
+" \n"
+" ldq $19,32($17) \n"
+" ldq $20,32($18) \n"
+" ldq $21,40($17) \n"
+" ldq $22,40($18) \n"
+" \n"
+" ldq $23,48($17) \n"
+" ldq $24,48($18) \n"
+" ldq $25,56($17) \n"
+" ldq $27,56($18) \n"
+" \n"
+" ldq $31,256($17) \n"
+" xor $0,$1,$0 # 8 cycles from $1 load \n"
+" ldq $31,256($18) \n"
+" xor $2,$3,$2 \n"
+" \n"
+" stq $0,0($17) \n"
+" xor $4,$5,$4 \n"
+" stq $2,8($17) \n"
+" xor $6,$7,$6 \n"
+" \n"
+" stq $4,16($17) \n"
+" xor $19,$20,$19 \n"
+" stq $6,24($17) \n"
+" xor $21,$22,$21 \n"
+" \n"
+" stq $19,32($17) \n"
+" xor $23,$24,$23 \n"
+" stq $21,40($17) \n"
+" xor $25,$27,$25 \n"
+" \n"
+" stq $23,48($17) \n"
+" subq $16,1,$16 \n"
+" stq $25,56($17) \n"
+" addq $17,64,$17 \n"
+" \n"
+" addq $18,64,$18 \n"
+" bgt $16,2b \n"
+" ret \n"
+" .end xor_alpha_prefetch_2 \n"
+" \n"
+" .align 3 \n"
+" .ent xor_alpha_prefetch_3 \n"
+"xor_alpha_prefetch_3: \n"
+" .prologue 0 \n"
+" srl $16, 6, $16 \n"
+" \n"
+" ldq $31, 0($17) \n"
+" ldq $31, 0($18) \n"
+" ldq $31, 0($19) \n"
+" \n"
+" ldq $31, 64($17) \n"
+" ldq $31, 64($18) \n"
+" ldq $31, 64($19) \n"
+" \n"
+" ldq $31, 128($17) \n"
+" ldq $31, 128($18) \n"
+" ldq $31, 128($19) \n"
+" \n"
+" ldq $31, 192($17) \n"
+" ldq $31, 192($18) \n"
+" ldq $31, 192($19) \n"
+" .align 4 \n"
+"3: \n"
+" ldq $0,0($17) \n"
+" ldq $1,0($18) \n"
+" ldq $2,0($19) \n"
+" ldq $3,8($17) \n"
+" \n"
+" ldq $4,8($18) \n"
+" ldq $6,16($17) \n"
+" ldq $7,16($18) \n"
+" ldq $21,24($17) \n"
+" \n"
+" ldq $22,24($18) \n"
+" ldq $24,32($17) \n"
+" ldq $25,32($18) \n"
+" ldq $5,8($19) \n"
+" \n"
+" ldq $20,16($19) \n"
+" ldq $23,24($19) \n"
+" ldq $27,32($19) \n"
+" nop \n"
+" \n"
+" xor $0,$1,$1 # 8 cycles from $0 load \n"
+" xor $3,$4,$4 # 7 cycles from $4 load \n"
+" xor $6,$7,$7 # 6 cycles from $7 load \n"
+" xor $21,$22,$22 # 5 cycles from $22 load \n"
+" \n"
+" xor $1,$2,$2 # 9 cycles from $2 load \n"
+" xor $24,$25,$25 # 5 cycles from $25 load \n"
+" stq $2,0($17) \n"
+" xor $4,$5,$5 # 6 cycles from $5 load \n"
+" \n"
+" stq $5,8($17) \n"
+" xor $7,$20,$20 # 7 cycles from $20 load \n"
+" stq $20,16($17) \n"
+" xor $22,$23,$23 # 7 cycles from $23 load \n"
+" \n"
+" stq $23,24($17) \n"
+" xor $25,$27,$27 # 7 cycles from $27 load \n"
+" stq $27,32($17) \n"
+" nop \n"
+" \n"
+" ldq $0,40($17) \n"
+" ldq $1,40($18) \n"
+" ldq $3,48($17) \n"
+" ldq $4,48($18) \n"
+" \n"
+" ldq $6,56($17) \n"
+" ldq $7,56($18) \n"
+" ldq $2,40($19) \n"
+" ldq $5,48($19) \n"
+" \n"
+" ldq $20,56($19) \n"
+" ldq $31,256($17) \n"
+" ldq $31,256($18) \n"
+" ldq $31,256($19) \n"
+" \n"
+" xor $0,$1,$1 # 6 cycles from $1 load \n"
+" xor $3,$4,$4 # 5 cycles from $4 load \n"
+" xor $6,$7,$7 # 5 cycles from $7 load \n"
+" xor $1,$2,$2 # 4 cycles from $2 load \n"
+" \n"
+" xor $4,$5,$5 # 5 cycles from $5 load \n"
+" xor $7,$20,$20 # 4 cycles from $20 load \n"
+" stq $2,40($17) \n"
+" subq $16,1,$16 \n"
+" \n"
+" stq $5,48($17) \n"
+" addq $19,64,$19 \n"
+" stq $20,56($17) \n"
+" addq $18,64,$18 \n"
+" \n"
+" addq $17,64,$17 \n"
+" bgt $16,3b \n"
+" ret \n"
+" .end xor_alpha_prefetch_3 \n"
+" \n"
+" .align 3 \n"
+" .ent xor_alpha_prefetch_4 \n"
+"xor_alpha_prefetch_4: \n"
+" .prologue 0 \n"
+" srl $16, 6, $16 \n"
+" \n"
+" ldq $31, 0($17) \n"
+" ldq $31, 0($18) \n"
+" ldq $31, 0($19) \n"
+" ldq $31, 0($20) \n"
+" \n"
+" ldq $31, 64($17) \n"
+" ldq $31, 64($18) \n"
+" ldq $31, 64($19) \n"
+" ldq $31, 64($20) \n"
+" \n"
+" ldq $31, 128($17) \n"
+" ldq $31, 128($18) \n"
+" ldq $31, 128($19) \n"
+" ldq $31, 128($20) \n"
+" \n"
+" ldq $31, 192($17) \n"
+" ldq $31, 192($18) \n"
+" ldq $31, 192($19) \n"
+" ldq $31, 192($20) \n"
+" .align 4 \n"
+"4: \n"
+" ldq $0,0($17) \n"
+" ldq $1,0($18) \n"
+" ldq $2,0($19) \n"
+" ldq $3,0($20) \n"
+" \n"
+" ldq $4,8($17) \n"
+" ldq $5,8($18) \n"
+" ldq $6,8($19) \n"
+" ldq $7,8($20) \n"
+" \n"
+" ldq $21,16($17) \n"
+" ldq $22,16($18) \n"
+" ldq $23,16($19) \n"
+" ldq $24,16($20) \n"
+" \n"
+" ldq $25,24($17) \n"
+" xor $0,$1,$1 # 6 cycles from $1 load \n"
+" ldq $27,24($18) \n"
+" xor $2,$3,$3 # 6 cycles from $3 load \n"
+" \n"
+" ldq $0,24($19) \n"
+" xor $1,$3,$3 \n"
+" ldq $1,24($20) \n"
+" xor $4,$5,$5 # 7 cycles from $5 load \n"
+" \n"
+" stq $3,0($17) \n"
+" xor $6,$7,$7 \n"
+" xor $21,$22,$22 # 7 cycles from $22 load \n"
+" xor $5,$7,$7 \n"
+" \n"
+" stq $7,8($17) \n"
+" xor $23,$24,$24 # 7 cycles from $24 load \n"
+" ldq $2,32($17) \n"
+" xor $22,$24,$24 \n"
+" \n"
+" ldq $3,32($18) \n"
+" ldq $4,32($19) \n"
+" ldq $5,32($20) \n"
+" xor $25,$27,$27 # 8 cycles from $27 load \n"
+" \n"
+" ldq $6,40($17) \n"
+" ldq $7,40($18) \n"
+" ldq $21,40($19) \n"
+" ldq $22,40($20) \n"
+" \n"
+" stq $24,16($17) \n"
+" xor $0,$1,$1 # 9 cycles from $1 load \n"
+" xor $2,$3,$3 # 5 cycles from $3 load \n"
+" xor $27,$1,$1 \n"
+" \n"
+" stq $1,24($17) \n"
+" xor $4,$5,$5 # 5 cycles from $5 load \n"
+" ldq $23,48($17) \n"
+" xor $3,$5,$5 \n"
+" \n"
+" ldq $24,48($18) \n"
+" ldq $25,48($19) \n"
+" ldq $27,48($20) \n"
+" ldq $0,56($17) \n"
+" \n"
+" ldq $1,56($18) \n"
+" ldq $2,56($19) \n"
+" ldq $3,56($20) \n"
+" xor $6,$7,$7 # 8 cycles from $6 load \n"
+" \n"
+" ldq $31,256($17) \n"
+" xor $21,$22,$22 # 8 cycles from $22 load \n"
+" ldq $31,256($18) \n"
+" xor $7,$22,$22 \n"
+" \n"
+" ldq $31,256($19) \n"
+" xor $23,$24,$24 # 6 cycles from $24 load \n"
+" ldq $31,256($20) \n"
+" xor $25,$27,$27 # 6 cycles from $27 load \n"
+" \n"
+" stq $5,32($17) \n"
+" xor $24,$27,$27 \n"
+" xor $0,$1,$1 # 7 cycles from $1 load \n"
+" xor $2,$3,$3 # 6 cycles from $3 load \n"
+" \n"
+" stq $22,40($17) \n"
+" xor $1,$3,$3 \n"
+" stq $27,48($17) \n"
+" subq $16,1,$16 \n"
+" \n"
+" stq $3,56($17) \n"
+" addq $20,64,$20 \n"
+" addq $19,64,$19 \n"
+" addq $18,64,$18 \n"
+" \n"
+" addq $17,64,$17 \n"
+" bgt $16,4b \n"
+" ret \n"
+" .end xor_alpha_prefetch_4 \n"
+" \n"
+" .align 3 \n"
+" .ent xor_alpha_prefetch_5 \n"
+"xor_alpha_prefetch_5: \n"
+" .prologue 0 \n"
+" srl $16, 6, $16 \n"
+" \n"
+" ldq $31, 0($17) \n"
+" ldq $31, 0($18) \n"
+" ldq $31, 0($19) \n"
+" ldq $31, 0($20) \n"
+" ldq $31, 0($21) \n"
+" \n"
+" ldq $31, 64($17) \n"
+" ldq $31, 64($18) \n"
+" ldq $31, 64($19) \n"
+" ldq $31, 64($20) \n"
+" ldq $31, 64($21) \n"
+" \n"
+" ldq $31, 128($17) \n"
+" ldq $31, 128($18) \n"
+" ldq $31, 128($19) \n"
+" ldq $31, 128($20) \n"
+" ldq $31, 128($21) \n"
+" \n"
+" ldq $31, 192($17) \n"
+" ldq $31, 192($18) \n"
+" ldq $31, 192($19) \n"
+" ldq $31, 192($20) \n"
+" ldq $31, 192($21) \n"
+" .align 4 \n"
+"5: \n"
+" ldq $0,0($17) \n"
+" ldq $1,0($18) \n"
+" ldq $2,0($19) \n"
+" ldq $3,0($20) \n"
+" \n"
+" ldq $4,0($21) \n"
+" ldq $5,8($17) \n"
+" ldq $6,8($18) \n"
+" ldq $7,8($19) \n"
+" \n"
+" ldq $22,8($20) \n"
+" ldq $23,8($21) \n"
+" ldq $24,16($17) \n"
+" ldq $25,16($18) \n"
+" \n"
+" ldq $27,16($19) \n"
+" xor $0,$1,$1 # 6 cycles from $1 load \n"
+" ldq $28,16($20) \n"
+" xor $2,$3,$3 # 6 cycles from $3 load \n"
+" \n"
+" ldq $0,16($21) \n"
+" xor $1,$3,$3 \n"
+" ldq $1,24($17) \n"
+" xor $3,$4,$4 # 7 cycles from $4 load \n"
+" \n"
+" stq $4,0($17) \n"
+" xor $5,$6,$6 # 7 cycles from $6 load \n"
+" xor $7,$22,$22 # 7 cycles from $22 load \n"
+" xor $6,$23,$23 # 7 cycles from $23 load \n"
+" \n"
+" ldq $2,24($18) \n"
+" xor $22,$23,$23 \n"
+" ldq $3,24($19) \n"
+" xor $24,$25,$25 # 8 cycles from $25 load \n"
+" \n"
+" stq $23,8($17) \n"
+" xor $25,$27,$27 # 8 cycles from $27 load \n"
+" ldq $4,24($20) \n"
+" xor $28,$0,$0 # 7 cycles from $0 load \n"
+" \n"
+" ldq $5,24($21) \n"
+" xor $27,$0,$0 \n"
+" ldq $6,32($17) \n"
+" ldq $7,32($18) \n"
+" \n"
+" stq $0,16($17) \n"
+" xor $1,$2,$2 # 6 cycles from $2 load \n"
+" ldq $22,32($19) \n"
+" xor $3,$4,$4 # 4 cycles from $4 load \n"
+" \n"
+" ldq $23,32($20) \n"
+" xor $2,$4,$4 \n"
+" ldq $24,32($21) \n"
+" ldq $25,40($17) \n"
+" \n"
+" ldq $27,40($18) \n"
+" ldq $28,40($19) \n"
+" ldq $0,40($20) \n"
+" xor $4,$5,$5 # 7 cycles from $5 load \n"
+" \n"
+" stq $5,24($17) \n"
+" xor $6,$7,$7 # 7 cycles from $7 load \n"
+" ldq $1,40($21) \n"
+" ldq $2,48($17) \n"
+" \n"
+" ldq $3,48($18) \n"
+" xor $7,$22,$22 # 7 cycles from $22 load \n"
+" ldq $4,48($19) \n"
+" xor $23,$24,$24 # 6 cycles from $24 load \n"
+" \n"
+" ldq $5,48($20) \n"
+" xor $22,$24,$24 \n"
+" ldq $6,48($21) \n"
+" xor $25,$27,$27 # 7 cycles from $27 load \n"
+" \n"
+" stq $24,32($17) \n"
+" xor $27,$28,$28 # 8 cycles from $28 load \n"
+" ldq $7,56($17) \n"
+" xor $0,$1,$1 # 6 cycles from $1 load \n"
+" \n"
+" ldq $22,56($18) \n"
+" ldq $23,56($19) \n"
+" ldq $24,56($20) \n"
+" ldq $25,56($21) \n"
+" \n"
+" ldq $31,256($17) \n"
+" xor $28,$1,$1 \n"
+" ldq $31,256($18) \n"
+" xor $2,$3,$3 # 9 cycles from $3 load \n"
+" \n"
+" ldq $31,256($19) \n"
+" xor $3,$4,$4 # 9 cycles from $4 load \n"
+" ldq $31,256($20) \n"
+" xor $5,$6,$6 # 8 cycles from $6 load \n"
+" \n"
+" stq $1,40($17) \n"
+" xor $4,$6,$6 \n"
+" xor $7,$22,$22 # 7 cycles from $22 load \n"
+" xor $23,$24,$24 # 6 cycles from $24 load \n"
+" \n"
+" stq $6,48($17) \n"
+" xor $22,$24,$24 \n"
+" ldq $31,256($21) \n"
+" xor $24,$25,$25 # 8 cycles from $25 load \n"
+" \n"
+" stq $25,56($17) \n"
+" subq $16,1,$16 \n"
+" addq $21,64,$21 \n"
+" addq $20,64,$20 \n"
+" \n"
+" addq $19,64,$19 \n"
+" addq $18,64,$18 \n"
+" addq $17,64,$17 \n"
+" bgt $16,5b \n"
+" \n"
+" ret \n"
+" .end xor_alpha_prefetch_5 \n"
+);

static struct xor_block_template xor_block_alpha = {
name: "alpha",
--
| Thorsten Kranzkowski Internet: [email protected] |
| Mobile: ++49 170 1876134 Snail: Niemannsweg 30, 49201 Dissen, Germany |
| Ampr: dl8bcu@db0lj.#rpl.deu.eu, [email protected] [44.130.8.19] |
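
For readers following the assembly in the patch above: as far as the listing shows, each routine shifts the byte count right by 6 to get a count of 64-byte blocks, then per block loads eight quadwords from every buffer, XORs them together and stores the result back through the first pointer. The `ldq $31, ...` loads target the zero register and appear to act purely as prefetches. Below is a rough C sketch of the two-buffer case under those assumptions. The name `xor_ref_2` is made up for illustration and is not part of the patch, the prefetch hints are left out, and unlike the assembly loop (which always runs at least once) this version does nothing for a zero count.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference version of the two-buffer XOR, not kernel code.
 * p1[i] ^= p2[i] for every 64-bit word, processed 64 bytes at a time.
 * Any tail shorter than 64 bytes is ignored, as in the assembly.
 */
static void xor_ref_2(unsigned long bytes, uint64_t *p1, const uint64_t *p2)
{
    unsigned long blocks = bytes >> 6;      /* mirrors "srl $16, 6, $16" */

    while (blocks-- > 0) {
        /* One unrolled iteration of the asm loop: 8 quadwords = 64 bytes. */
        for (size_t i = 0; i < 8; i++)
            p1[i] ^= p2[i];

        p1 += 8;                            /* mirrors "addq $17,64,$17" */
        p2 += 8;                            /* mirrors "addq $18,64,$18" */
    }
}
```

The three-, four- and five-buffer variants follow the same pattern with more sources folded into the XOR before the store.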

2003-09-14 11:34:35

by Willy Tarreau

Subject: Re: Linux 2.4.23-pre4 => NFSD problem on alpha

Hi Marcelo, Neil,

I've tested -pre4 on my alpha, and noticed that knfsd doesn't work anymore:
the client sticks in D state forever. It has been working flawlessly for
weeks with 2.4.22-rc2. What's strange is that 23-pre4 is OK on my athlon with
the same nfs-utils (1.0.5).

I have the following NFSD options on both kernels:
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
CONFIG_NFSD_TCP=y

My alpha kernels were built with GCC 3.2.3, while the athlon one was built with
2.95.3.

If I have some time, I'll try intermediate kernels to find which one introduced
the problem. I noticed that there were knfsd changes in 2.4.23-pre3; perhaps
they're related. If you want me to try a patch, please ask.

Cheers,
Willy