Date: Thu, 9 Aug 2012 12:16:35 -0600
From: Jim Schutt
To: Mel Gorman
Cc: Linux-MM, Rik van Riel, Minchan Kim, LKML
Subject: Re: [RFC PATCH 0/5] Improve hugepage allocation success rates under load V3
Message-ID: <5023FE83.4090200@sandia.gov>
In-Reply-To: <1344520165-24419-1-git-send-email-mgorman@suse.de>

On 08/09/2012 07:49 AM, Mel Gorman wrote:
> Changelog since V2
> o Capture !MIGRATE_MOVABLE pages where possible
> o Document the treatment of MIGRATE_MOVABLE pages while capturing
> o Expand changelogs
>
> Changelog since V1
> o Dropped kswapd related patch, basically a no-op and regresses if fixed (minchan)
> o Expanded changelogs a little
>
> Allocation success rates have been far lower since 3.4 due to commit
> [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. This
> commit was introduced for good reasons and it was known in advance that
> the success rates would suffer but it was justified on the grounds that
> the high allocation success rates were achieved by aggressive reclaim.
> Success rates are expected to suffer even more in 3.6 due to commit
> [7db8889a: mm: have order > 0 compaction start off where it left] which
> testing has shown to severely reduce allocation success rates under load -
> to 0% in one case. There is a proposed change to that patch in this series
> and it would be ideal if Jim Schutt could retest the workload that led to
> commit [7db8889a: mm: have order > 0 compaction start off where it left].

On my first test of this patch series on top of 3.5, I ran into an instance
of what I think is the sort of thing that patch 4/5 was fixing.
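For anyone who wants to poke at the sort of order-9 allocation this series is
about without setting up a full workload, just faulting in a large anonymous
region with THP enabled will exercise it. Here's a minimal sketch (not my
actual workload; the region size is arbitrary and it assumes x86_64 with 2MB
transparent huge pages):

/*
 * Minimal sketch, not the actual test workload: fault in a large
 * anonymous region so that, with THP enabled, the first touch of each
 * 2MB-aligned chunk requests an order-9 page and can fall into direct
 * compaction when memory is fragmented.  Assumes x86_64 (2MB huge
 * pages); the region size is arbitrary.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL << 20)            /* assumed THP size */
#define REGION     (512UL * HPAGE_SIZE)   /* 1GB of anonymous memory */

int main(void)
{
	char *buf = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Ask for transparent huge pages on this range. */
	if (madvise(buf, REGION, MADV_HUGEPAGE))
		perror("madvise(MADV_HUGEPAGE)");

	/* First write to each 2MB chunk takes a huge-page fault. */
	for (size_t off = 0; off < REGION; off += HPAGE_SIZE)
		buf[off] = 1;

	munmap(buf, REGION);
	return 0;
}

With THP enabled, each of those first touches asks for an order-9 page, which
is where compaction (and this patch series) comes in.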
Here's what vmstat had to say during that period:

----------

2012-08-09 11:58:04.107-06:00 vmstat -w 4 16
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
 r b swpd free buff cache si so bi bo in cs us sy id wa st
20 14 0 235884 576 38916072 0 0 12 17047 171 133 3 8 85 4 0
18 17 0 220272 576 38955912 0 0 86 2131838 200142 162956 12 38 31 19 0
17 9 0 244284 576 38955328 0 0 19 2179562 213775 167901 13 43 26 18 0
27 15 0 223036 576 38952640 0 0 24 2202816 217996 158390 14 47 25 15 0
17 16 0 233124 576 38959908 0 0 5 2268815 224647 165728 14 50 21 15 0
16 13 0 225840 576 38995740 0 0 52 2253829 216797 160551 14 47 23 16 0
22 13 0 260584 576 38982908 0 0 92 2196737 211694 140924 14 53 19 15 0
16 10 0 235784 576 38917128 0 0 22 2157466 210022 137630 14 54 19 14 0
12 13 0 214300 576 38923848 0 0 31 2187735 213862 142711 14 52 20 14 0
25 12 0 219528 576 38919540 0 0 11 2066523 205256 142080 13 49 23 15 0
26 14 0 229460 576 38913704 0 0 49 2108654 200692 135447 13 51 21 15 0
11 11 0 220376 576 38862456 0 0 45 2136419 207493 146813 13 49 22 16 0
36 12 0 229860 576 38869784 0 0 7 2163463 212223 151812 14 47 25 14 0
16 13 0 238356 576 38891496 0 0 67 2251650 221728 154429 14 52 20 14 0
65 15 0 211536 576 38922108 0 0 59 2237925 224237 156587 14 53 19 14 0
24 13 0 585024 576 38634024 0 0 37 2240929 229040 148192 15 61 14 10 0

2012-08-09 11:59:04.714-06:00 vmstat -w 4 16
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
 r b swpd free buff cache si so bi bo in cs us sy id wa st
43 8 0 794392 576 38382316 0 0 11 20491 576 420 3 10 82 4 0
127 6 0 579328 576 38422156 0 0 21 2006775 205582 119660 12 70 11 7 0
44 5 0 492860 576 38512360 0 0 46 1536525 173377 85320 10 78 7 4 0
218 9 0 585668 576 38271320 0 0 39 1257266 152869 64023 8 83 7 3 0
101 6 0 600168 576 38128104 0 0 10 1438705 160769 68374 9 84 5 3 0
62 5 0 597004 576 38098972 0 0 93 1376841 154012 63912 8 82 7 4 0
61 11 0 850396 576 37808772 0 0 46 1186816 145731 70453 7 78 9 6 0
124 7 0 437388 576 38126320 0 0 15 1208434 149736 57142 7 86 4 3 0
204 11 0 1105816 576 37309532 0 0 20 1327833 145979 52718 7 87 4 2 0
29 8 0 751020 576 37360332 0 0 8 1405474 169916 61982 9 85 4 2 0
38 7 0 626448 576 37333244 0 0 14 1328415 174665 74214 8 84 5 3 0
23 5 0 650040 576 37134280 0 0 28 1351209 179220 71631 8 85 5 2 0
40 10 0 610988 576 37054292 0 0 104 1272527 167530 73527 7 85 5 3 0
79 22 0 2076836 576 35487340 0 0 750 1249934 175420 70124 7 88 3 2 0
58 6 0 431068 576 36934140 0 0 1000 1366234 169675 72524 8 84 5 3 0
134 9 0 574692 576 36784980 0 0 1049 1305543 152507 62639 8 84 4 4 0

2012-08-09 12:00:09.137-06:00 vmstat -w 4 16
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
 r b swpd free buff cache si so bi bo in cs us sy id wa st
163 8 0 464308 576 36791368 0 0 11 22210 866 536 3 13 79 4 0
207 14 0 917752 576 36181928 0 0 712 1345376 134598 47367 7 90 1 2 0
123 12 0 685516 576 36296148 0 0 429 1386615 158494 60077 8 84 5 3 0
123 12 0 598572 576 36333728 0 0 1107 1233281 147542 62351 7 84 5 4 0
622 7 0 660768 576 36118264 0 0 557 1345548 151394 59353 7 85 4 3 0
223 11 0 283960 576 36463868 0 0 46 1107160 121846 33006 6 93 1 1 0
104 14 0 3140508 576 33522616 0 0 299 1414709 160879 51422 9 89 1 1 0
100 11 0 1323036 576 35337740 0 0 429 1637733 175817 94471 9 73 10 8 0
91 11 0 673320 576 35918084 0 0 562 1477100 157069 67951 8 83 5 4 0
35 15 0 3486592 576 32983244 0 0 384 1574186 189023 82135 9 81 5 5 0
51 16 0 1428108 576 34962112 0 0 394 1573231 160575 76632 9 76 9 7 0
55 6 0 719548 576 35621284 0 0 425 1483962 160335 79991 8 74 10 7 0
96 7 0 1226852 576 35062608 0 0 803 1531041 164923 70820 9 78 7 6 0
97 8 0 862500 576 35332496 0 0 536 1177949 155969 80769 7 74 13 7 0
23 5 0 6096372 576 30115776 0 0 367 919949 124993 81755 6 62 24 8 0
13 5 0 7427860 576 28368292 0 0 399 915331 153895 102186 6 53 32 9 0

----------

And here's a perf report, captured/displayed with

  perf record -g -a sleep 10
  perf report --sort symbol --call-graph fractal,5

sometime during that period just after 12:00:09, when the run queue was > 100.

----------
Processed 0 events and LOST 1175296!
Check IO/CPU overload!

# Events: 208K cycles
#
# Overhead  Symbol
# ........  ......................................
#
    34.63%  [k] _raw_spin_lock_irqsave
            |
            |--97.30%-- isolate_freepages
            |          compaction_alloc
            |          unmap_and_move
            |          migrate_pages
            |          compact_zone
            |          compact_zone_order
            |          try_to_compact_pages
            |          __alloc_pages_direct_compact
            |          __alloc_pages_slowpath
            |          __alloc_pages_nodemask
            |          alloc_pages_vma
            |          do_huge_pmd_anonymous_page
            |          handle_mm_fault
            |          do_page_fault
            |          page_fault
            |          |
            |          |--87.39%-- skb_copy_datagram_iovec
            |          |          tcp_recvmsg
            |          |          inet_recvmsg
            |          |          sock_recvmsg
            |          |          sys_recvfrom
            |          |          system_call
            |          |          __recv
            |          |          |
            |          |           --100.00%-- (nil)
            |          |
            |           --12.61%-- memcpy
             --2.70%-- [...]

    14.31%  [k] _raw_spin_lock_irq
            |
            |--98.08%-- isolate_migratepages_range
            |          compact_zone
            |          compact_zone_order
            |          try_to_compact_pages
            |          __alloc_pages_direct_compact
            |          __alloc_pages_slowpath
            |          __alloc_pages_nodemask
            |          alloc_pages_vma
            |          do_huge_pmd_anonymous_page
            |          handle_mm_fault
            |          do_page_fault
            |          page_fault
            |          |
            |          |--83.93%-- skb_copy_datagram_iovec
            |          |          tcp_recvmsg
            |          |          inet_recvmsg
            |          |          sock_recvmsg
            |          |          sys_recvfrom
            |          |          system_call
            |          |          __recv
            |          |          |
            |          |           --100.00%-- (nil)
            |          |
            |           --16.07%-- memcpy
             --1.92%-- [...]

     5.48%  [k] isolate_freepages_block
            |
            |--99.96%-- isolate_freepages
            |          compaction_alloc
            |          unmap_and_move
            |          migrate_pages
            |          compact_zone
            |          compact_zone_order
            |          try_to_compact_pages
            |          __alloc_pages_direct_compact
            |          __alloc_pages_slowpath
            |          __alloc_pages_nodemask
            |          alloc_pages_vma
            |          do_huge_pmd_anonymous_page
            |          handle_mm_fault
            |          do_page_fault
            |          page_fault
            |          |
            |          |--86.01%-- skb_copy_datagram_iovec
            |          |          tcp_recvmsg
            |          |          inet_recvmsg
            |          |          sock_recvmsg
            |          |          sys_recvfrom
            |          |          system_call
            |          |          __recv
            |          |          |
            |          |           --100.00%-- (nil)
            |          |
            |           --13.99%-- memcpy
             --0.04%-- [...]

     5.34%  [.] ceph_crc32c_le
            |
            |--99.95%-- 0xb8057558d0065990
             --0.05%-- [...]
----------

If I understand what this is telling me, it's the page faults taken while
skb_copy_datagram_iovec copies received data to user space that end up
triggering the calls to isolate_freepages_block, isolate_migratepages_range,
and isolate_freepages?

FWIW, I'm using a Chelsio T4 NIC in these hosts, with jumbo frames and the
Linux TCP stack (i.e., no stateful TCP offload).

--
Jim
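P.S. To make the fault path in those call chains a bit more concrete, here is
a minimal sketch of the sort of receive path I have in mind (not my actual
code; the port number and buffer size are made up, and it assumes THP is
enabled). The point is that recv() writes into an anonymous MADV_HUGEPAGE
region that has never been touched, so the copy to user space done under
skb_copy_datagram_iovec is what takes the huge-page fault:

#define _GNU_SOURCE
#include <netinet/in.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

#define PORT   5001                       /* made-up listen port */
#define REGION (512UL * (2UL << 20))      /* 1GB untouched anonymous buffer */

int main(void)
{
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port = htons(PORT),
		.sin_addr.s_addr = htonl(INADDR_ANY),
	};
	int lsock = socket(AF_INET, SOCK_STREAM, 0);
	size_t off = 0;
	char *buf;
	int csock;

	if (lsock < 0 ||
	    bind(lsock, (struct sockaddr *)&addr, sizeof(addr)) ||
	    listen(lsock, 1)) {
		perror("listen setup");
		return 1;
	}

	/* Anonymous, THP-eligible buffer that has never been touched. */
	buf = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	madvise(buf, REGION, MADV_HUGEPAGE);

	csock = accept(lsock, NULL, NULL);
	if (csock < 0) {
		perror("accept");
		return 1;
	}

	/*
	 * Each recv() lands in memory we have never written, so the
	 * in-kernel copy to user space (skb_copy_datagram_iovec) takes
	 * the page fault; with THP that fault asks for an order-9 page,
	 * which means direct compaction under fragmentation.
	 */
	while (off < REGION) {
		ssize_t n = recv(csock, buf + off, 1 << 20, 0);
		if (n <= 0)
			break;
		off += n;
	}

	close(csock);
	close(lsock);
	munmap(buf, REGION);
	return 0;
}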