Received: by 2002:ab2:6816:0:b0:1f9:5764:f03e with SMTP id t22csp1596954lqo; Sat, 18 May 2024 09:53:33 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCVcC59Y/8pdSbzncGmxOZ1Rx2K7TqevGlebdWdrV1Tq4qKgGRcqk7r3mN8jox3je+mhHb1b8pr/I+2Mr0y1Kj6ZVF+S5JVNlJ86dchuGw== X-Google-Smtp-Source: AGHT+IEAXoj8smPnI7xPeo7SuE148B2iQq1zQv4fJIR18+WkrXjtHwD09XrwZXNy+sX+XzphVRFZ X-Received: by 2002:a17:902:f683:b0:1f0:9610:5c9c with SMTP id d9443c01a7336-1f096105f42mr105841085ad.11.1716051213454; Sat, 18 May 2024 09:53:33 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1716051213; cv=pass; d=google.com; s=arc-20160816; b=0LxB8pc/7aZifMvW0mtSXqFlfjsDIIRWN+K2JNYxakzX7qwE4FPKIn+kI1/HiMCQ05 MHV+IG47253qyZxHLBYGkTSqOiUBqo36ePY3OQ9JW17WlHIzgycnZYqFncAQaLuctSE6 TgNWP/SYUTnhRnego6iGw9Qx6rq4kqndY88en12zAdu+z9tclH73vZLbRj40G6Dy3Q0M ALo6U3Y9RZKxsN8NZ1s0IXDBgXPc6UoXY0bIpKrgXZ3k8Gg0G0TaUAAanReR5Isgf9d5 /40uOUEeNmXmFVF8x+lLi2TFRhfoscAYVBCP4/vzh5oD+qQuvGjrHswyoGqhSEGPg2jI LOYg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:cc:to:from:subject:message-id :mime-version:list-unsubscribe:list-subscribe:list-id:precedence :date:dkim-signature; bh=Y12G3706vB2C3fRBj2WW8vvTZtmZUq1w+NsO+AWMbSw=; fh=CrOnws4b/tSq3qbg2Cz1q706Lb6MYdoR7Ezzm5YxmUw=; b=C+yZz5XXOXqgM+IzF6xdPogLHWqokPzivAN6z4ZriDJp+zT8pn07k3MfkigOYnvx0H 2BD7cQsGIXiXzuU4mbvjqCUmz4Go08g5aBfItRwz6y8qSRoj33hSRwSrbKTBJKa4NyMr GyqnUp7K4vBHJthOa5bXfDqS9BbeqynzAZ4kNr9MaIFX2Xdnv+JCFoSy1jXSA2hY9gI8 vO+uvamTpPpW5YKzxN0S4tfM0FrJFo9hsHdrInJKK4xQZc5kbZwAJoc/xkAeurjtgUT5 9gNJwFRSd6+z22Jsz2XZUwmmLTswRXmGXwGLRORhXEGMoPFOKxSFGjm3F5Td3n2il+hL +F2Q==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=PEjAveFX; arc=pass (i=1 spf=pass spfdomain=flex--sesse.bounces.google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-182913-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-182913-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id d9443c01a7336-1ef0bf31a50si203205235ad.224.2024.05.18.09.53.33 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 18 May 2024 09:53:33 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-182913-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=PEjAveFX; arc=pass (i=1 spf=pass spfdomain=flex--sesse.bounces.google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-182913-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-182913-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 16DF228262F for ; Sat, 18 May 2024 16:53:33 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 0C9D343AA4; Sat, 18 May 2024 16:53:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="PEjAveFX" Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 396A233F7 for ; Sat, 18 May 2024 16:53:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.202 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716051205; cv=none; b=bYW0Crzgbw8S1MYoS4RYazcPBsA3gFOZ+SerFBRaOYfrbs8abBhh3aK+J7VafaUfME8Q8sR08k3WgBIzbf02KrAVwiWUiZwtj296U+MdlZ7OjZoGUPYFsv/ceCv9y/LY+6VngVECs29gRZSPpkcZf/m3NOV2iDgvjBLRt8CFrfo= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716051205; c=relaxed/simple; bh=hfF4/ADedd/JfOCz6I73Q4BjKZIvLx7Vqraeq+tPYU4=; h=Date:Mime-Version:Message-ID:Subject:From:To:Cc:Content-Type; b=A+drrJvrrVC+kclhMHKugPSHZ5kTRBj0oekQj6Eunj0eCjf7bhEK7B57rqZMnAqdFRdqnJw06AbnxAk9/EPZsrJE1H3XghrDVrte++T9P8ttCdTpYuvalgu8CoN8rzTSyOhs7kp6Oav7ayjKr4ht57Klt8YMKsnJP8q52rJ/vLE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--sesse.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=PEjAveFX; arc=none smtp.client-ip=209.85.128.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--sesse.bounces.google.com Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-62035d9ecc4so197417677b3.1 for ; Sat, 18 May 2024 09:53:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1716051201; x=1716656001; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:from:subject:message-id :mime-version:date:from:to:cc:subject:date:message-id:reply-to; bh=Y12G3706vB2C3fRBj2WW8vvTZtmZUq1w+NsO+AWMbSw=; b=PEjAveFX7mK4TP2csGwPb4VqhcEB9ggSSdv1CFxomlsD0NFpL77OtglcCyJ7Pk4/bv nkHKk6rlY5tvkDGjn2IFx/M31EEv6bx28b/uO97Spw2iFkcvRndXxPOB7N7uy1T2QZLV e8a2pVdwaIP/q8z6rFReYF5kbNN5eYiTHIkxOJlM6XZqoUnThV19Ao+hDktorMfkItwd okI1UlWSAxRlatXbJ36wFOkDv++Ny1pHf0d3dM/iK9F698zUafTF+T0AbfFIJaxzagYC siZK3fSADb0zpUcqM9M/aC51RltqyxNN3hn+KoxzOFhY0b4vmrINROk6iTEkx1KYqFiW tVBA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716051201; x=1716656001; h=content-transfer-encoding:cc:to:from:subject:message-id :mime-version:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=Y12G3706vB2C3fRBj2WW8vvTZtmZUq1w+NsO+AWMbSw=; b=siLD78WcaYigXkazOQ7Yd1NATm0T94KLHJ539NC51FmDwCB3Xg3C1BZJOpXhkhPUYH 7bPDyjw2fwCqD9u/MHrkAjMyhWYzSvs+EnyTGNnUO293oPjWlAuAGrCnHjoLYKmaR4QG dJf5Ln4EsRcdK/K8qK0O6x1BUPLhD1pNRs1jAqL7tewLTakglFc21It1CUpqLlOyyGe/ D/hGciOjSOuPu9zedCCqpUH3pTwT4rncanRI2sTMZBE3MoBkW8wrWpqwy5ep9Hn3F1Pv //KHlgkQEYs+Sk3DrgcrSzocvwcvJAE9+9nDhnAb38G6rre14Kh9m81lsdZYlmexzKXn h0zg== X-Forwarded-Encrypted: i=1; AJvYcCXXRjl+G2+jsZgoQq3Y3vLw/unzzy4P+lButkCxIJSVEREzD4yiY/PKg65uyLGld68wywRL7g72+GXBLsC2oy68CtoWwt2sv/lXwrji X-Gm-Message-State: AOJu0YwCfkM0NfnUDbustu/EN+iBRBuhd7dE/msAkKHnqtp4OSDWvHNa 80jd0VSP+uTGwtEfn5k5WmtQCcJvdXzA42aiaswDpQ3zBcC7rCYviZYaaZdM5tGnNeXZoP82DQ= = X-Received: from sesse.osl.corp.google.com ([2a00:79e0:18:10:8b68:8208:4316:d53b]) (user=sesse job=sendgmr) by 2002:a05:690c:6302:b0:61b:46e:62da with SMTP id 00721157ae682-622affc63bfmr64483907b3.4.1716051201181; Sat, 18 May 2024 09:53:21 -0700 (PDT) Date: Sat, 18 May 2024 18:53:11 +0200 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 X-Mailer: git-send-email 2.45.0.rc1.225.g2a3ae87e7f-goog Message-ID: <20240518165315.3963142-1-sesse@google.com> Subject: [PATCH v2] Add a trie to map quickly from address range to compilation unit. From: "Steinar H. Gunderson" To: acme@kernel.org Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, irogers@google.com, "Steinar H. Gunderson" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable When using perf to profile large binaries, _bfd_dwarf2_find_nearest_line() becomes a hotspot, as perf wants to get line number information (for inline-detection purposes) for each and every sample. In Chromium in particular (the content_shell binary), this entails going through 475k address ranges, which takes a long time when done repeatedly. Add a radix-256 trie over the address space to quickly map address to compilation unit spaces; for content_shell, which is 1.6 GB when some (but not full) debug information turned is on, we go from 6 ms to 0.006 ms (6 =C2=B5s) for each lookup from address to compilation unit, a 10= 00x speedup. There is a modest RAM increase of 180 MB in this binary (the existing linked list over ranges uses about 10 MB, and the entire perf job uses between 2=E2=80=933 GB for a medium-size profile); for smaller binaries wit= h few ranges, there should be hardly any extra RAM usage at all. --- bfd/dwarf2.c | 372 ++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 354 insertions(+), 18 deletions(-) diff --git a/bfd/dwarf2.c b/bfd/dwarf2.c index 404f35df62b..61ca68d4e97 100644 --- a/bfd/dwarf2.c +++ b/bfd/dwarf2.c @@ -82,6 +82,76 @@ struct adjusted_section bfd_vma adj_vma; }; =20 +/* A trie to map quickly from address range to compilation unit. + + This is a fairly standard radix-256 trie, used to quickly locate which + compilation unit any given address belongs to. Given that each compila= tion + unit may register hundreds of very small and unaligned ranges (which ma= y + potentially overlap, due to inlining and other concerns), and a large + program may end up containing hundreds of thousands of such ranges, we = cannot + scan through them linearly without undue slowdown. + + We use a hybrid trie to avoid memory explosion: There are two types of = trie + nodes, leaves and interior nodes. (Almost all nodes are leaves, so the= y + take up the bulk of the memory usage.) Leaves contain a simple array of + ranges (high/low address) and which compilation unit contains those ran= ges, + and when we get to a leaf, we scan through it linearly. Interior nodes + contain pointers to 256 other nodes, keyed by the next byte of the addr= ess. + So for a 64-bit address like 0x1234567abcd, we would start at the root = and go + down child[0x00]->child[0x00]->child[0x01]->child[0x23]->child[0x45] et= c., + until we hit a leaf. (Nodes are, in general, leaves until they exceed = the + default allocation of 16 elements, at which point they are converted to + interior node if possible.) This gives us near-constant lookup times; + the only thing that can be costly is if there are lots of overlapping r= anges + within a single 256-byte segment of the binary, in which case we have t= o + scan through them all to find the best match. + + For a binary with few ranges, we will in practice only have a single le= af node + at the root, containing a simple array. Thus, the scheme is efficient = for + both small and large binaries. + */ + +/* Experiments have shown 16 to be a memory-efficient default leaf size. + The only case where a leaf will hold more memory than this, is at the + bottomost level (covering 256 bytes in the binary), where we'll expand + the leaf to be able to hold more ranges if needed. + */ +#define TRIE_LEAF_SIZE 16 + +/* All trie_node pointers will really be trie_leaf or trie_interior, + but they have this common head. */ +struct trie_node +{ + /* If zero, we are an interior node. + Otherwise, how many ranges we have room for in this leaf. */ + int num_room_in_leaf; +}; + +struct trie_leaf +{ + struct trie_node head; + int num_stored_in_leaf; + struct { + const struct comp_unit *unit; + bfd_vma low_pc, high_pc; + } ranges[TRIE_LEAF_SIZE]; +}; + +struct trie_interior +{ + struct trie_node head; + struct trie_node *children[256]; +}; + +static struct trie_node *alloc_trie_leaf (bfd *abfd) +{ + struct trie_leaf *leaf =3D + (struct trie_leaf *) bfd_zalloc (abfd, sizeof (struct trie_leaf)); + if (leaf !=3D NULL) + leaf->head.num_room_in_leaf =3D TRIE_LEAF_SIZE; + return (struct trie_node *) leaf; +} + struct dwarf2_debug_file { /* The actual bfd from which debug info was loaded. Might be @@ -139,6 +209,9 @@ struct dwarf2_debug_file /* A list of all previously read comp_units. */ struct comp_unit *all_comp_units; =20 + /* A list of all previously read comp_units with no ranges (yet). */ + struct comp_unit *all_comp_units_without_ranges; + /* Last comp unit in list above. */ struct comp_unit *last_comp_unit; =20 @@ -147,6 +220,9 @@ struct dwarf2_debug_file =20 /* Hash table to map offsets to decoded abbrevs. */ htab_t abbrev_offsets; + + /* Root of a trie to map addresses to compilation units. */ + struct trie_node *trie_root; }; =20 struct dwarf2_debug @@ -220,6 +296,11 @@ struct comp_unit /* Chain the previously read compilation units. */ struct comp_unit *next_unit; =20 + /* Chain the previously read compilation units that have no ranges yet. + We scan these separately when we have a trie over the ranges. + Unused if arange.high !=3D 0. */ + struct comp_unit *next_unit_without_ranges; + /* Likewise, chain the compilation unit read after this one. The comp units are stored in reversed reading order. */ struct comp_unit *prev_unit; @@ -296,6 +377,10 @@ struct comp_unit =20 /* TRUE if symbols are cached in hash table for faster lookup by name. = */ bool cached; + + /* Used when iterating over trie leaves to know which units we have + already seen in this iteration. */ + bool mark; }; =20 /* This data structure holds the information of an abbrev. */ @@ -1767,9 +1852,183 @@ concat_filename (struct line_info_table *table, uns= igned int file) return strdup (filename); } =20 +/* Number of bits in a bfd_vma. */ +#define VMA_BITS (8 * sizeof (bfd_vma)) + +/* Check whether [low1, high1) can be combined with [low2, high2), + i.e., they touch or overlap. */ +static bool ranges_overlap(bfd_vma low1, bfd_vma high1, bfd_vma low2, bfd_= vma high2) +{ + if (low1 =3D=3D low2 || high1 =3D=3D high2) + return true; + + /* Sort so that low1 is below low2. */ + if (low1 > low2) + { + bfd_vma tmp; + + tmp =3D low1; + low1 =3D low2; + low2 =3D tmp; + + tmp =3D high1; + high1 =3D high2; + high2 =3D tmp; + } + + /* We touch iff low2 =3D=3D high1. + We overlap iff low2 is within [low1, high1). */ + return (low2 <=3D high1); +} + +/* Insert an address range in the trie mapping addresses to compilation un= its. + Will return the new trie node (usually the same as is being sent in, bu= t + in case of a leaf-to-interior conversion, or expansion of a leaf, it ma= y be + different), or NULL on failure. + */ +static struct trie_node *insert_arange_in_trie(bfd *abfd, + struct trie_node *trie, + bfd_vma trie_pc, + int trie_pc_bits, + const struct comp_unit *unit, + bfd_vma low_pc, + bfd_vma high_pc) +{ + bfd_vma clamped_low_pc, clamped_high_pc; + int ch, from_ch, to_ch; + bool is_full_leaf =3D false; + + /* See if we can extend any of the existing ranges. This merging + isn't perfect (if merging opens up the possibility of merging two exi= sting + ranges, we won't find them), but it takes the majority of the cases. = */ + if (trie->num_room_in_leaf > 0) + { + struct trie_leaf *leaf =3D (struct trie_leaf *) trie; + int i; + + for (i =3D 0; i < leaf->num_stored_in_leaf; ++i) + { + if (leaf->ranges[i].unit =3D=3D unit && + ranges_overlap(low_pc, high_pc, + leaf->ranges[i].low_pc, leaf->ranges[i].high_pc)) + { + if (low_pc < leaf->ranges[i].low_pc) + leaf->ranges[i].low_pc =3D low_pc; + if (high_pc > leaf->ranges[i].high_pc) + leaf->ranges[i].high_pc =3D high_pc; + return trie; + } + } + + is_full_leaf =3D leaf->num_stored_in_leaf =3D=3D trie->num_room_in_l= eaf; + } + + /* If we're a leaf with no more room and we're _not_ at the bottom, + convert to an interior node. */ + if (is_full_leaf && trie_pc_bits < (int)VMA_BITS) + { + const struct trie_leaf *leaf =3D (struct trie_leaf *) trie; + int i; + + trie =3D (struct trie_node *) bfd_zalloc (abfd, sizeof (struct trie_= interior)); + if (!trie) + return NULL; + is_full_leaf =3D false; + + /* TODO: If we wanted to save a little more memory at the cost of + complexity, we could have reused the old leaf node as one of the + children of the new interior node, instead of throwing it away. */ + for (i =3D 0; i < leaf->num_stored_in_leaf; ++i) + { + if (!insert_arange_in_trie (abfd, trie, trie_pc, trie_pc_bits, + leaf->ranges[i].unit, leaf->ranges[i].low_pc, + leaf->ranges[i].high_pc)) + return NULL; + } + } + + /* If we're a leaf with no more room and we _are_ at the bottom, + we have no choice but to just make it larger. */ + if (is_full_leaf) + { + const struct trie_leaf *leaf =3D (struct trie_leaf *) trie; + int new_room_in_leaf =3D trie->num_room_in_leaf * 2; + struct trie_leaf *new_leaf; + + new_leaf =3D (struct trie_leaf *) bfd_zalloc (abfd, + sizeof (struct trie_leaf) + (new_room_in_leaf - TRIE_LEAF_SIZE) * sizeof = (leaf->ranges[0])); + new_leaf->head.num_room_in_leaf =3D new_room_in_leaf; + new_leaf->num_stored_in_leaf =3D leaf->num_stored_in_leaf; + + memcpy (new_leaf->ranges, + leaf->ranges, + leaf->num_stored_in_leaf * sizeof (leaf->ranges[0])); + trie =3D (struct trie_node *) new_leaf; + is_full_leaf =3D false; + + /* Now the insert below will go through. */ + } + + /* If we're a leaf (now with room), we can just insert at the end. */ + if (trie->num_room_in_leaf > 0) + { + struct trie_leaf *leaf =3D (struct trie_leaf *) trie; + + int i =3D leaf->num_stored_in_leaf++; + leaf->ranges[i].unit =3D unit; + leaf->ranges[i].low_pc =3D low_pc; + leaf->ranges[i].high_pc =3D high_pc; + return trie; + } + + /* Now we are definitely an interior node, so recurse into all the relev= ant buckets. */ + + /* Clamp the range to the current trie bucket. */ + clamped_low_pc =3D low_pc; + clamped_high_pc =3D high_pc; + if (trie_pc_bits > 0) + { + bfd_vma bucket_high_pc =3D trie_pc + ((bfd_vma)-1 >> trie_pc_bits); = /* Inclusive. */ + if (clamped_low_pc < trie_pc) + clamped_low_pc =3D trie_pc; + if (clamped_high_pc > bucket_high_pc) + clamped_high_pc =3D bucket_high_pc; + } + + /* Insert the ranges in all buckets that it spans. */ + from_ch =3D (clamped_low_pc >> (VMA_BITS - trie_pc_bits - 8)) & 0xff; + to_ch =3D ((clamped_high_pc - 1) >> (VMA_BITS - trie_pc_bits - 8)) & 0xf= f; + for (ch =3D from_ch; ch <=3D to_ch; ++ch) + { + struct trie_interior *interior =3D (struct trie_interior *) trie; + struct trie_node *child =3D interior->children[ch]; + + if (child =3D=3D NULL) + { + child =3D alloc_trie_leaf (abfd); + if (!child) + return NULL; + } + child =3D insert_arange_in_trie (abfd, + child, + trie_pc + ((bfd_vma)ch << (VMA_BITS - trie_pc_bits - 8)), + trie_pc_bits + 8, + unit, + low_pc, + high_pc); + if (!child) + return NULL; + + interior->children[ch] =3D child; + } + + return trie; +} + + static bool arange_add (const struct comp_unit *unit, struct arange *first_arange, - bfd_vma low_pc, bfd_vma high_pc) + struct trie_node **trie_root, bfd_vma low_pc, bfd_vma high_pc) { struct arange *arange; =20 @@ -1777,6 +2036,19 @@ arange_add (const struct comp_unit *unit, struct ara= nge *first_arange, if (low_pc =3D=3D high_pc) return true; =20 + if (trie_root !=3D NULL) + { + *trie_root =3D insert_arange_in_trie (unit->file->bfd_ptr, + *trie_root, + 0, + 0, + unit, + low_pc, + high_pc); + if (*trie_root =3D=3D NULL) + return false; + } + /* If the first arange is empty, use it. */ if (first_arange->high =3D=3D 0) { @@ -2411,7 +2683,7 @@ decode_line_info (struct comp_unit *unit) low_pc =3D address; if (address > high_pc) high_pc =3D address; - if (!arange_add (unit, &unit->arange, low_pc, high_pc)) + if (!arange_add (unit, &unit->arange, &unit->file->trie_root, low_pc, = high_pc)) goto line_fail; break; case DW_LNE_set_address: @@ -3134,7 +3406,7 @@ find_abstract_instance (struct comp_unit *unit, =20 static bool read_ranges (struct comp_unit *unit, struct arange *arange, - bfd_uint64_t offset) + struct trie_node **trie_root, bfd_uint64_t offset) { bfd_byte *ranges_ptr; bfd_byte *ranges_end; @@ -3169,7 +3441,7 @@ read_ranges (struct comp_unit *unit, struct arange *a= range, base_address =3D high_pc; else { - if (!arange_add (unit, arange, + if (!arange_add (unit, arange, trie_root, base_address + low_pc, base_address + high_pc)) return false; } @@ -3179,7 +3451,7 @@ read_ranges (struct comp_unit *unit, struct arange *a= range, =20 static bool read_rnglists (struct comp_unit *unit, struct arange *arange, - bfd_uint64_t offset) + struct trie_node **trie_root, bfd_uint64_t offset) { bfd_byte *rngs_ptr; bfd_byte *rngs_end; @@ -3253,19 +3525,19 @@ read_rnglists (struct comp_unit *unit, struct arang= e *arange, return false; } =20 - if (!arange_add (unit, arange, low_pc, high_pc)) + if (!arange_add (unit, arange, trie_root, low_pc, high_pc)) return false; } } =20 static bool read_rangelist (struct comp_unit *unit, struct arange *arange, - bfd_uint64_t offset) + struct trie_node **trie_root, bfd_uint64_t offset) { if (unit->version <=3D 4) - return read_ranges (unit, arange, offset); + return read_ranges (unit, arange, trie_root, offset); else - return read_rnglists (unit, arange, offset); + return read_rnglists (unit, arange, trie_root, offset); } =20 static struct funcinfo * @@ -3617,7 +3889,7 @@ scan_unit_for_symbols (struct comp_unit *unit) =20 case DW_AT_ranges: if (is_int_form (&attr) - && !read_rangelist (unit, &func->arange, attr.u.val)) + && !read_rangelist (unit, &func->arange, &unit->file->trie_root, a= ttr.u.val)) goto fail; break; =20 @@ -3733,7 +4005,7 @@ scan_unit_for_symbols (struct comp_unit *unit) =20 if (func && high_pc !=3D 0) { - if (!arange_add (unit, &func->arange, low_pc, high_pc)) + if (!arange_add (unit, &func->arange, &unit->file->trie_root, low_pc, h= igh_pc)) goto fail; } } @@ -3931,7 +4203,7 @@ parse_comp_unit (struct dwarf2_debug *stash, =20 case DW_AT_ranges: if (is_int_form (&attr) - && !read_rangelist (unit, &unit->arange, attr.u.val)) + && !read_rangelist (unit, &unit->arange, &unit->file->trie_root, at= tr.u.val)) return NULL; break; =20 @@ -3973,7 +4245,7 @@ parse_comp_unit (struct dwarf2_debug *stash, high_pc +=3D low_pc; if (high_pc !=3D 0) { - if (!arange_add (unit, &unit->arange, low_pc, high_pc)) + if (!arange_add (unit, &unit->arange, &unit->file->trie_root, low_pc= , high_pc)) return NULL; } =20 @@ -4772,6 +5044,14 @@ _bfd_dwarf2_slurp_debug_info (bfd *abfd, bfd *debug_= bfd, if (!stash->alt.abbrev_offsets) return false; =20 + stash->f.trie_root =3D alloc_trie_leaf (abfd); + if (!stash->f.trie_root) + return false; + + stash->alt.trie_root =3D alloc_trie_leaf (abfd); + if (!stash->alt.trie_root) + return false; + *pinfo =3D stash; =20 if (debug_bfd =3D=3D NULL) @@ -4943,6 +5223,12 @@ stash_comp_unit (struct dwarf2_debug *stash, struct = dwarf2_debug_file *file) each->next_unit =3D file->all_comp_units; file->all_comp_units =3D each; =20 + if (each->arange.high =3D=3D 0) + { + each->next_unit_without_ranges =3D file->all_comp_units_without_ran= ges; + file->all_comp_units_without_ranges =3D each->next_unit_without_ran= ges; + } + file->info_ptr +=3D length; return each; } @@ -5185,17 +5471,67 @@ _bfd_dwarf2_find_nearest_line (bfd *abfd, } else { - for (each =3D stash->f.all_comp_units; each; each =3D each->next_uni= t) + struct trie_node *trie =3D stash->f.trie_root; + int bits =3D VMA_BITS - 8; + struct comp_unit **prev_each; + + /* Traverse interior nodes until we get to a leaf. */ + while (trie && trie->num_room_in_leaf =3D=3D 0) { - found =3D ((each->arange.high =3D=3D 0 - || comp_unit_contains_address (each, addr)) - && comp_unit_find_nearest_line (each, addr, + int ch =3D (addr >> bits) & 0xff; + trie =3D ((struct trie_interior *) trie)->children[ch]; + bits -=3D 8; + } + + if (trie) + { + const struct trie_leaf *leaf =3D (struct trie_leaf *) trie; + int i; + + for (i =3D 0; i < leaf->num_stored_in_leaf; ++i) + { + struct comp_unit *unit =3D (struct comp_unit *) leaf->ranges[i].uni= t; + unit->mark =3D false; + } + + for (i =3D 0; i < leaf->num_stored_in_leaf; ++i) + { + struct comp_unit *unit =3D (struct comp_unit *) leaf->ranges[i].uni= t; + if (unit->mark || + addr < leaf->ranges[i].low_pc || + addr >=3D leaf->ranges[i].high_pc) + continue; + unit->mark =3D true; + + found =3D comp_unit_find_nearest_line (unit, addr, filename_ptr, &function, linenumber_ptr, - discriminator_ptr)); + discriminator_ptr); + if (found) + goto done; + } + } + + /* Also scan through all compilation units without any ranges, + taking them out of the list if they have acquired any since last = time. */ + prev_each =3D &stash->f.all_comp_units_without_ranges; + for (each =3D *prev_each; each; each =3D each->next_unit_without_ran= ges) + { + if (each->arange.high !=3D 0) + { + *prev_each =3D each->next_unit_without_ranges; + continue; + } + + found =3D comp_unit_find_nearest_line (each, addr, + filename_ptr, + &function, + linenumber_ptr, + discriminator_ptr); if (found) goto done; + prev_each =3D &each->next_unit_without_ranges; } } =20 --=20 2.35.1