Date: Tue, 6 Apr 2004 04:37:32 -0700
From: Paul Jackson
To: Paul Jackson
Cc: colpatch@us.ibm.com, wli@holomorphy.com, linux-kernel@vger.kernel.org
Subject: Re: [Patch 17/23] mask v2 = [6/7] nodemask_t_ia64_changes
Message-Id: <20040406043732.6fb2df9f.pj@sgi.com>
In-Reply-To: <20040401131240.00f7d74d.pj@sgi.com>
References: <20040401122802.23521599.pj@sgi.com> <20040401131240.00f7d74d.pj@sgi.com>
Organization: SGI

Matthew,

A couple of these nodemask changes increase kernel text size quite a
bit on big NUMA configurations.  In one test case I ran, the text size
of vmlinux grew from 8789097 to 8810513 bytes, an increase of 21416
bytes (2.6.5 kernel for ia64 SN2, NR_CPUS=512, sn2_defconfig, gcc 3.2.3).
From a cursory comparison of 'nm --print-size --size-sort' output, I
think the added space comes from the numerous numnodes changes, such as:

-        pxm_to_nid_map[i] = numnodes;
-        nid_to_pxm_map[numnodes++] = i;
+        pxm_to_nid_map[i] = num_online_nodes();
+        nid_to_pxm_map[num_online_nodes()] = i;
+        node_set_online(num_online_nodes());

and from the for loop replacements:

-    for (nid = 0, i = 0; i < numnodes; i++) {
+    nid = 0;
+    for_each_online_node(i) {

In particular, the machine code generated for the following silly little
routine:

    int foo()
    {
        int i = 0, n;

        for_each_online_node(n)
            i++;
        return i;
    }

is ... hold onto your hat ...

a000000100116f40 :
a000000100116f40:  00 10 20 02 29 26  [MII]       addl r2=-2091896,r1
a000000100116f46:  80 00 00 00 42 20              mov r8=r0
a000000100116f4c:  02 00 08 90                    mov r17=256
a000000100116f50:  0b 90 00 00 00 21  [MMI]       mov r18=r0;;
a000000100116f56:  50 01 08 30 20 00              ld8 r21=[r2]
a000000100116f5c:  00 00 04 00                    nop.i 0x0;;
a000000100116f60:  01 78 00 2a 00 21  [MII]       mov r15=r21
a000000100116f66:  00 00 00 02 00 00              nop.i 0x0
a000000100116f6c:  00 00 04 00                    nop.i 0x0;;
a000000100116f70:  03 80 20 1e 18 14  [MII]       ld8 r16=[r15],8
a000000100116f76:  10 01 46 7e 46 40              adds r17=-64,r17;;
a000000100116f7c:  00 8c b0 88                    and r2=-64,r17;;
a000000100116f80:  10 48 00 04 08 39  [MIB]       cmp.eq p9,p8=0,r2
a000000100116f86:  a0 00 40 16 f2 05              cmp.eq p10,p11=0,r16
a000000100116f8c:  90 00 00 43              (p11) br.cond.dpnt.few a000000100117010
a000000100116f90:  11 90 00 25 00 21  [MIB]       adds r18=64,r18
a000000100116f96:  00 00 00 02 00 04              nop.i 0x0
a000000100116f9c:  e0 ff ff 4a              (p08) br.cond.dptk.few a000000100116f70 ;;
a000000100116fa0:  01 50 fc f9 ff 27  [MII]       mov r10=-1
a000000100116fa6:  00 01 46 4a 40 20              sub r16=64,r17
a000000100116fac:  01 88 20 e4                    cmp.eq p9,p8=0,r17;;
a000000100116fb0:  30 71 00 24 00 21  [MIB] (p09) mov r14=r18
a000000100116fb6:  b0 00 40 24 80 04              zxt4 r11=r16
a000000100116fbc:  90 00 00 42              (p09) br.cond.dptk.few a000000100117040
a000000100116fc0:  0b 18 00 1e 18 10  [MMI]       ld8 r3=[r15];;
a000000100116fc6:  00 00 00 02 00 20              nop.m 0x0
a000000100116fcc:  b1 50 00 79                    shr.u r9=r10,r11;;
a000000100116fd0:  03 00 00 00 01 00  [MII]       nop.m 0x0
a000000100116fd6:  00 00 00 02 00 00              nop.i 0x0;;
a000000100116fdc:  00 00 04 00                    nop.i 0x0;;
a000000100116fe0:  01 00 00 00 01 00  [MII]       nop.m 0x0
a000000100116fe6:  00 00 00 02 00 00              nop.i 0x0
a000000100116fec:  00 00 04 00                    nop.i 0x0;;
a000000100116ff0:  0b 80 24 06 0c 20  [MMI]       and r16=r9,r3;;
a000000100116ff6:  d0 00 40 18 72 00              cmp.eq p13,p12=0,r16
a000000100116ffc:  00 00 04 00                    nop.i 0x0;;
a000000100117000:  b0 71 48 22 00 20  [MIB] (p13) add r14=r18,r17
a000000100117006:  00 00 00 02 80 06              nop.i 0x0
a00000010011700c:  40 00 00 42              (p13) br.cond.dptk.few a000000100117040
a000000100117010:  0b 98 fc 21 3f 23  [MMI]       adds r19=-1,r16;;
a000000100117016:  10 99 40 1a 40 00              andcm r17=r19,r16
a00000010011701c:  00 00 04 00                    nop.i 0x0;;
a000000100117020:  02 00 00 00 01 00  [MII]       nop.m 0x0
a000000100117026:  f0 00 44 a4 39 c0              popcnt r15=r17;;
a00000010011702c:  21 79 00 80                    add r14=r18,r15
a000000100117030:  01 00 00 00 01 00  [MII]       nop.m 0x0
a000000100117036:  00 00 00 02 00 00              nop.i 0x0
a00000010011703c:  00 00 04 00                    nop.i 0x0;;
a000000100117040:  00 90 fc 01 01 24  [MII]       mov r18=255
a000000100117046:  f0 00 38 00 42 c0              mov r15=r14
a00000010011704c:  f2 e7 ff 9f                    mov r22=-1
a000000100117050:  1d a0 fc 01 01 24  [MFB]       mov r20=255
a000000100117056:  00 00 00 02 00 00              nop.f 0x0
a00000010011705c:  00 00 00 20                    nop.b 0x0;;
a000000100117060:  10 78 48 1c 8e 30  [MIB]       cmp4.lt p15,p14=r18,r14
a000000100117066:  00 00 00 02 80 87              nop.i 0x0
a00000010011706c:  08 00 84 03              (p15) br.ret.dpnt.many b0
a000000100117070:  01 c0 04 1e 00 21  [MII]       adds r24=1,r15
a000000100117076:  30 01 00 04 48 00              mov r19=256
a00000010011707c:  11 40 00 84                    adds r8=1,r8;;
a000000100117080:  02 00 00 00 01 00  [MII]       nop.m 0x0
a000000100117086:  f0 00 60 2c 00 e0              sxt4 r15=r24;;
a00000010011708c:  c2 78 e4 52                    shr.u r23=r15,6
a000000100117090:  09 90 00 1f 2c 22  [MMI]       and r18=-64,r15
a000000100117096:  a0 78 4c 16 68 e0              cmp.ltu p10,p11=r15,r19
a00000010011709c:  f1 7b b0 80                    and r15=63,r15;;
a0000001001170a0:  10 98 4c 24 05 e0  [MIB]       sub r19=r19,r18
a0000001001170a6:  e2 00 00 04 c8 05        (p11) mov r14=256
a0000001001170ac:  a0 01 00 42              (p11) br.cond.dptk.few a000000100117240
a0000001001170b0:  0a 40 00 1e 09 39  [MMI]       cmp.eq p8,p9=0,r15;;
a0000001001170b6:  c0 f8 4d 1a 6a 40              cmp.ltu p12,p13=63,r19
a0000001001170bc:  63 79 20 79                    shl r26=r22,r15
a0000001001170c0:  11 88 5c 2a 12 20  [MIB]       shladd r17=r23,3,r21
a0000001001170c6:  00 00 00 02 00 04              nop.i 0x0
a0000001001170cc:  70 00 00 42              (p08) br.cond.dptk.few a000000100117130 ;;
a0000001001170d0:  01 c8 20 22 18 14  [MII]       ld8 r25=[r17],8
a0000001001170d6:  00 00 00 02 00 00              nop.i 0x0
a0000001001170dc:  00 00 04 00                    nop.i 0x0;;
a0000001001170e0:  01 00 00 00 01 00  [MII]       nop.m 0x0
a0000001001170e6:  00 00 00 02 00 00              nop.i 0x0
a0000001001170ec:  00 00 04 00                    nop.i 0x0;;
a0000001001170f0:  10 80 68 32 0c 20  [MIB]       and r16=r26,r25
a0000001001170f6:  00 00 00 02 80 06              nop.i 0x0
a0000001001170fc:  c0 00 00 43              (p13) br.cond.dpnt.few a0000001001171b0
a000000100117100:  1d 98 00 27 3f 23  [MFB]       adds r19=-64,r19
a000000100117106:  00 00 00 02 00 00              nop.f 0x0
a00000010011710c:  00 00 00 20                    nop.b 0x0;;
a000000100117110:  10 70 00 20 0f 39  [MIB]       cmp.eq p14,p15=0,r16
a000000100117116:  00 00 00 02 80 07              nop.i 0x0
a00000010011711c:  00 01 00 43              (p15) br.cond.dpnt.few a000000100117210
a000000100117120:  00 90 00 25 00 21  [MII]       adds r18=64,r18
a000000100117126:  00 00 00 02 00 00              nop.i 0x0
a00000010011712c:  00 00 04 00                    nop.i 0x0
a000000100117130:  1d d8 00 27 2c 22  [MFB]       and r27=-64,r19
a000000100117136:  00 00 00 02 00 00              nop.f 0x0
a00000010011713c:  00 00 00 20                    nop.b 0x0;;
a000000100117140:  10 58 00 36 0a 39  [MIB]       cmp.eq p11,p10=0,r27
a000000100117146:  00 00 00 02 80 05              nop.i 0x0
a00000010011714c:  40 00 00 43              (p11) br.cond.dpnt.few a000000100117180
a000000100117150:  03 80 20 22 18 14  [MII]       ld8 r16=[r17],8
a000000100117156:  30 01 4e 7e 46 80              adds r19=-64,r19;;
a00000010011715c:  03 9c b0 88                    and r28=-64,r19;;
a000000100117160:  10 48 00 38 08 39  [MIB]       cmp.eq p9,p8=0,r28
a000000100117166:  c0 00 40 1a f2 06              cmp.eq p12,p13=0,r16
a00000010011716c:  b0 00 00 43              (p13) br.cond.dpnt.few a000000100117210
a000000100117170:  11 90 00 25 00 21  [MIB]       adds r18=64,r18
a000000100117176:  00 00 00 02 00 04              nop.i 0x0
a00000010011717c:  e0 ff ff 4a              (p08) br.cond.dptk.few a000000100117150 ;;
a000000100117180:  1d 48 00 26 08 39  [MFB]       cmp.eq p9,p8=0,r19
a000000100117186:  00 00 00 02 00 00              nop.f 0x0
a00000010011718c:  00 00 00 20                    nop.b 0x0;;
a000000100117190:  30 71 00 24 00 21  [MIB] (p09) mov r14=r18
a000000100117196:  00 00 00 02 80 04              nop.i 0x0
a00000010011719c:  b0 00 00 42              (p09) br.cond.dptk.few a000000100117240
a0000001001171a0:  00 80 00 22 18 10  [MII]       ld8 r16=[r17]
a0000001001171a6:  00 00 00 02 00 00              nop.i 0x0
a0000001001171ac:  00 00 04 00                    nop.i 0x0
a0000001001171b0:  0b f8 00 27 25 20  [MMI]       sub r31=64,r19;;
a0000001001171b6:  00 00 00 02 00 c0              nop.m 0x0
a0000001001171bc:  03 f8 48 00                    zxt4 r30=r31;;
a0000001001171c0:  01 00 00 00 01 00  [MII]       nop.m 0x0
a0000001001171c6:  d0 f1 58 80 3c 00              shr.u r29=r22,r30
a0000001001171cc:  00 00 04 00                    nop.i 0x0;;
a0000001001171d0:  03 00 00 00 01 00  [MII]       nop.m 0x0
a0000001001171d6:  00 00 00 02 00 00              nop.i 0x0;;
a0000001001171dc:  00 00 04 00                    nop.i 0x0;;
a0000001001171e0:  01 00 00 00 01 00  [MII]       nop.m 0x0
a0000001001171e6:  00 00 00 02 00 00              nop.i 0x0
a0000001001171ec:  00 00 04 00                    nop.i 0x0;;
a0000001001171f0:  0b 80 74 20 0c 20  [MMI]       and r16=r29,r16;;
a0000001001171f6:  f0 00 40 1c 72 00              cmp.eq p15,p14=0,r16
a0000001001171fc:  00 00 04 00                    nop.i 0x0;;
a000000100117200:  f0 71 48 26 00 20  [MIB] (p15) add r14=r18,r19
a000000100117206:  00 00 00 02 80 07              nop.i 0x0
a00000010011720c:  40 00 00 42              (p15) br.cond.dptk.few a000000100117240
a000000100117210:  0b 48 fc 21 3f 23  [MMI]       adds r9=-1,r16;;
a000000100117216:  30 48 40 1a 40 00              andcm r3=r9,r16
a00000010011721c:  00 00 04 00                    nop.i 0x0;;
a000000100117220:  02 00 00 00 01 00  [MII]       nop.m 0x0
a000000100117226:  20 00 0c a4 39 c0              popcnt r2=r3;;
a00000010011722c:  21 11 00 80                    add r14=r18,r2
a000000100117230:  01 00 00 00 01 00  [MII]       nop.m 0x0
a000000100117236:  00 00 00 02 00 00              nop.i 0x0
a00000010011723c:  00 00 04 00                    nop.i 0x0;;
a000000100117240:  10 78 00 1c 00 21  [MIB]       mov r15=r14
a000000100117246:  b0 a0 38 14 61 05              cmp4.lt p11,p10=r20,r14
a00000010011724c:  30 fe ff 4a              (p10) br.cond.dptk.few a000000100117070
a000000100117250:  11 00 00 00 01 00  [MIB]       nop.m 0x0
a000000100117256:  00 00 00 02 00 80              nop.i 0x0
a00000010011725c:  08 00 84 00                    br.ret.sptk.many b0;;

Possible changes to consider:

 1) Rather than replacing each numnodes with num_online_nodes(), add a
    local variable:

        int numnodes = num_online_nodes();

    This would reduce the size of the source code patch as well.

 2) Perhaps some of the machinery beneath num_online_nodes(), such as
    the bitmap/bitop code, should not be inlined.

 3) Aren't the following two loops essentially equivalent?

        int n;
        for_each_online_node(n) {
            blah blah ...
        }

    and:

        int n;
        for (n = 0; n < MAX_NUMNODES; n++) {
            if (! node_online(n))
                continue;
            blah blah ...
        }

    I'll wager the second form generates better code.  And since the
    second form is closer to what was there before, it also makes for
    a smaller patch.

In other words, I don't yet understand the value of changing each loop
over nodes to use these macros.

These are just possible avenues for investigation; there are likely
others.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson 1.650.933.1373