From: Willem de Bruijn
Date: Thu, 25 Jul 2019 21:02:18 -0400
Subject: Re: [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
To: Alexei Starovoitov
Cc: Brian Vazquez, Song Liu, Brian Vazquez, Alexei Starovoitov, Daniel Borkmann, "David S. Miller", Stanislav Fomichev, Petar Penkov, open list, Networking, bpf
References: <20190724165803.87470-1-brianvv@google.com> <20190724165803.87470-3-brianvv@google.com> <20190725235432.lkptx3fafegnm2et@ast-mbp>
In-Reply-To: <20190725235432.lkptx3fafegnm2et@ast-mbp>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 25, 2019 at 7:54 PM Alexei Starovoitov wrote:
>
> On Thu, Jul 25, 2019 at 04:25:53PM -0700, Brian Vazquez wrote:
> > > > > If prev_key is deleted before map_get_next_key(), we get the first key
> > > > > again.
> > > > > This is pretty weird.
> > > >
> > > > Yes, I know. But note that the current scenario happens even for the
> > > > old interface (imagine you are walking a map from userspace and, when
> > > > you try get_next_key, the prev_key has been removed: you will start
> > > > again from the beginning without noticing it).
> > > > I tried to send a patch in the past but I was missing some context:
> > > > before NULL was used to get the very first_key, the interface relied on
> > > > a random (non-existent) key to retrieve the first_key in the map, and
> > > > I was told that we still have to support that scenario.
> > >
> > > BPF_MAP_DUMP is slightly different, as you may return the first key
> > > multiple times in the same call. Also, BPF_MAP_DUMP is new, so we
> > > don't have to support legacy scenarios.
> > >
> > > Since BPF_MAP_DUMP keeps a list of elements, it is possible to try
> > > to look up previous keys. Would something in this direction work?
> >
> > I've been thinking about it and I think first we need a way to detect
> > that, since the key was not present, we got the first_key instead:
> >
> > - One solution I had in mind was to explicitly ask for the first key
> > with map_get_next_key(map, NULL, first_key) and, while walking the map,
> > check that map_get_next_key(map, prev_key, key) doesn't return the
> > same key. This could be done using memcmp.
> > - Discussing with Stan, he mentioned that another option is to support
> > a flag in map_get_next_key to let it know that we want an error
> > instead of the first_key.
> >
> > After detecting the problem we also need to define what we want to do.
> > Here are some options:
> >
> > a) Return the error to the caller
> > b) Try with previous keys if any (which would be limited to the keys
> > that we have traversed so far in this dump call)
> > c) Continue with next entries in the map.
> > Array is easy: just get the
> > next valid key (starting on i+1), but hmap might be difficult since
> > starting on the next bucket could potentially skip some keys that were
> > concurrently added to the same bucket where key used to be, and
> > starting on the same bucket could lead us to return repeated elements.
> >
> > Or maybe we could support those 3 cases via flags and let the caller
> > decide which one to use?
>
> this type of indecision is the reason why I wasn't excited about
> batch dumping in the first place and gave 'soft yes' when Stan
> mentioned it during lsf/mm/bpf uconf.
> We probably shouldn't do it.
> It feels this map_dump makes api more complex and doesn't really
> give much benefit to the user other than large map dump becomes faster.
> I think we gotta solve this problem differently.

Multiple variants with flags indeed make the API complex. I think the
kernel should expose only the simplest, most obvious behavior that
allows the application to recover. In this case, that sounds like
option (a) and restart.

In practice, the common use case is to allocate enough user memory to
read an entire table in one go, in which case the entire issue is
moot.

The cycle savings of dump are significant for large tables. I'm not
sure how we achieve that differently and even simpler? We originally
looked at shared memory, but that is obviously much more complex.
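
To make "option (a) and restart" concrete, here is a rough userspace toy
model in Python. It is only a sketch under loud assumptions: ToyMap,
get_next_key_strict, KeyVanished, legacy_walk, dump and dump_with_restart
are hypothetical names invented for illustration. None of them is a
kernel or libbpf interface, and a dict stands in for the real map. The
legacy walk shows how a deleted prev_key silently restarts from the first
key; the strict walk returns an error instead, and the caller throws away
the partial result and retries.

```python
class KeyVanished(Exception):
    """Hypothetical error: prev_key no longer exists in the map (option (a))."""


class ToyMap:
    """Dict-backed stand-in for a map; not a real BPF map."""

    def __init__(self, items):
        self.items = dict(items)  # insertion order plays the role of walk order

    def get_next_key(self, prev_key=None):
        """Legacy-style semantics: an unknown prev_key yields the first key."""
        keys = list(self.items)
        if not keys:
            return None
        if prev_key is None or prev_key not in self.items:
            return keys[0]
        idx = keys.index(prev_key) + 1
        return keys[idx] if idx < len(keys) else None

    def get_next_key_strict(self, prev_key=None):
        """Assumed flag-like variant: raise instead of restarting silently."""
        if prev_key is not None and prev_key not in self.items:
            raise KeyVanished(prev_key)
        return self.get_next_key(prev_key)


def legacy_walk(m, on_step=None):
    """Walk with legacy semantics; may revisit keys after a deletion."""
    seen = []
    key = m.get_next_key(None)
    while key is not None:
        seen.append(key)
        if on_step:
            on_step(key)  # hook used only to simulate a concurrent delete
        key = m.get_next_key(key)
    return seen


def dump(m, on_step=None):
    """One dump pass; KeyVanished propagates to the caller."""
    out = []
    key = m.get_next_key_strict(None)
    while key is not None:
        out.append((key, m.items[key]))
        if on_step:
            on_step(key)
        key = m.get_next_key_strict(key)
    return out


def dump_with_restart(m, max_retries=3, on_step=None):
    """Caller-side recovery: drop the partial result and start over."""
    for _ in range(max_retries + 1):
        try:
            return dump(m, on_step)
        except KeyVanished:
            continue
    raise RuntimeError("map changed too often during dump")


def make_demo():
    """Build a 3-entry map plus a hook that deletes key 2 right after it is read."""
    m = ToyMap({1: "a", 2: "b", 3: "c"})

    def delete_two(key):
        if key == 2:
            m.items.pop(2, None)

    return m, delete_two


if __name__ == "__main__":
    m, hook = make_demo()
    print(legacy_walk(m, hook))                # key 1 is visited twice
    m, hook = make_demo()
    print(dump_with_restart(m, on_step=hook))  # consistent view after one retry
```

The retry bound is arbitrary. In this model the restart costs one extra
pass, and the caller never sees a duplicated first key.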