Date: Fri, 17 Sep 2021 15:13:24 -0400
From: Jeff King
To: Rolf Eike Beer
Cc: Linus Torvalds, Junio C Hamano, Git List Mailing, Tobias Ulmer, Linux Kernel Mailing List
Subject: Re: data loss when doing ls-remote and piped to command
References: <6786526.72e2EbofS7@devpool47> <2722184.bRktqFsmb4@devpool47>
In-Reply-To: <2722184.bRktqFsmb4@devpool47>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 17, 2021
at 08:59:07AM +0200, Rolf Eike Beer wrote:

> What you need is a _fast_ git server. kernel.org or github.com seem to be too
> slow for this if you don't sit somewhere in their datacenter. Use something in
> your local network, a Xeon E5 with lot's of RAM and connected with 1GBit/s
> Ethernet in my case.

One thing that puzzled me here: is the bad output between the server and
ls-remote, or between ls-remote and its output pipe? I'd guess it has to
be the latter, since otherwise ls-remote itself would barf with an error
message.

In that case, I'd think "git ls-remote ." would give you the fastest
outcome, because it's talking to upload-pack on the local box.

But I'm also confused how the speed could matter, as ls-remote reads the
entire input into an in-memory array, and then formats it. We do the
write using printf(). Is it possible your libc's stdio may drop bytes
when the pipe is full, rather than blocking?

In general, I'd expect write() to block, so libc doesn't have to care at
all. But might there be something in your environment putting the pipe
into non-blocking mode, and we get EAGAIN or something? If so, I'd
expect stdio to return the error.

Maybe patching Git like this would help:

diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index f4fd823af8..5936b2b42c 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -146,7 +146,8 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 		const struct ref_array_item *ref = ref_array.items[i];
 		if (show_symref_target && ref->symref)
 			printf("ref: %s\t%s\n", ref->symref, ref->refname);
-		printf("%s\t%s\n", oid_to_hex(&ref->objectname), ref->refname);
+		if (printf("%s\t%s\n", oid_to_hex(&ref->objectname), ref->refname) < 0)
+			die_errno("printf failed");
 		status = 0; /* we found something */
 	}

> And the reader must be "somewhat" slow. Using sha256sum works reliably for me.
> Using "wc -l" does not, also md5sum and sha1sum are too fast as it seems.

If a slow pipe is involved, maybe:

  git ls-remote . | (sleep 5; cat) | sha256sum

would help reproduce. Assuming ls-remote's output is bigger than your
system pipe buffer (which is another interesting thing to check), then
it should block for 5 seconds on write() midway through the output,
which you can verify with strace.

-Peff
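
The two things worth checking above (the size of the pipe buffer, and what a write looks like when the pipe is in non-blocking mode) can be tried in isolation. What follows is only a minimal sketch, assuming Linux: F_GETPIPE_SZ is Linux-specific, and fill_nonblocking_pipe() is a hypothetical helper written for illustration, not anything in Git. It creates a pipe that nobody reads from, asks the kernel for its capacity, sets O_NONBLOCK on the write end, and writes until write() fails. On a non-blocking pipe that failure should be EAGAIN rather than a block.

```c
#define _GNU_SOURCE /* for F_GETPIPE_SZ, which is Linux-specific */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Git): fill a non-blocking pipe.
 *
 * Returns the number of bytes the pipe accepted before write() failed,
 * or -1 if the pipe could not be set up. On success, *capacity holds the
 * value reported by F_GETPIPE_SZ and *last_errno holds the errno of the
 * write() that failed.
 */
long fill_nonblocking_pipe(long *capacity, int *last_errno)
{
	int fds[2];
	char chunk[4096];
	long total = 0;
	int flags;

	assert(capacity != NULL && last_errno != NULL);
	memset(chunk, 'x', sizeof(chunk));

	if (pipe(fds) < 0)
		return -1;

	/* Ask the kernel how much this pipe can buffer. */
	*capacity = fcntl(fds[1], F_GETPIPE_SZ);

	/* Put only the write end into non-blocking mode. */
	flags = fcntl(fds[1], F_GETFL);
	if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	/* Nobody reads fds[0], so the pipe eventually fills up. */
	for (;;) {
		ssize_t n = write(fds[1], chunk, sizeof(chunk));
		if (n < 0) {
			*last_errno = errno; /* EAGAIN expected once full */
			break;
		}
		total += n;
	}

	close(fds[0]);
	close(fds[1]);
	return total;
}
```

If the byte count returned here is smaller than the size of the ls-remote output, the writer has to either block or see an error partway through. A caller that ignores the return value of write() or printf() in the non-blocking case would lose the remaining data silently, which is the case the die_errno() check in the patch is meant to surface.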