git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "René Scharfe" <l.s.r@web.de>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Derrick Stolee" <stolee@gmail.com>,
	git@vger.kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH 2/2] read-cache: fix incorrect count and progress bar stalling
Date: Tue, 15 Jun 2021 18:46:17 +0200	[thread overview]
Message-ID: <877divystm.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <34d5febf-508c-52b8-a04b-98298d75bd8d@web.de>


On Tue, Jun 15 2021, René Scharfe wrote:

> Am 14.06.21 um 21:08 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Mon, Jun 14 2021, René Scharfe wrote:
>>
>>> Am 14.06.21 um 13:07 schrieb Ævar Arnfjörð Bjarmason:
>>>>
>>>> On Thu, Jun 10 2021, René Scharfe wrote:
>>>>
>>>>> Am 09.06.21 um 00:12 schrieb Ævar Arnfjörð Bjarmason:
>>>>>>
>>>>>> On Tue, Jun 08 2021, René Scharfe wrote:
>>>>>>
>>>>>>> I wonder (only in a semi-curious way, though) if we can detect
>>>>>>> off-by-one errors by adding an assertion to display_progress() that
>>>>>>> requires the first update to have the value 0, and in stop_progress()
>>>>>>> one that requires the previous display_progress() call to have a value
>>>>>>> equal to the total number of work items.  Not sure it'd be worth the
>>>>>>> hassle..
>>>>>>
>>>>>> That's intentional. We started eating 3 apples, got to one, but now our
>>>>>> house is on fire and we're eating no more apples today, even if we
>>>>>> planned to eat 3 when we sat down.
>>>>>>
>>>>>> The progress bar reflects this unexpected but recoverable state:
>>>>>>
>>>>>>     $ perl -wE 'for (0..1) { say "update"; say "progress $_" }' |
>>>>>>       ./helper/test-tool progress --total=3 Apples 2>&1 |
>>>>>>       cat -v | perl -pe 's/\^M\K/\n/g'
>>>>>>     Apples:   0% (0/3)^M
>>>>>>     Apples:  33% (1/3)^M
>>>>>>     Apples:  33% (1/3), done.
>>>>>>
>>>>>> We're at 1/3, but we're done. No more apples.
>>>>>>
>>>>>> This isn't just some hypothetical, e.g. consider neeing to unlink() or
>>>>>> remove files/directories one at a time in a directory and getting the
>>>>>> estimated number from st_nlink (yeah yeah, unportable, but it was the
>>>>>> first thing I thought of).
>>>>>>
>>>>>> We might think we're processing 10 entries, but another other processes
>>>>>> might make our progress bar end at more or less than the 100% we
>>>>>> expected. That's OK, not something we should invoke BUG() about.
>>>>>
>>>>> It doesn't have to be a BUG; a warning would suffice.  And I hope not
>>>>> finishing the expected number of items due to a catastrophic event is
>>>>> rare enough that an additional warning wouldn't cause too much pain.
>>>>
>>>> It's not a catastrophic event, just a run of the mill race condition
>>>> we'll expect if we're dealing with the real world.
>>>>
>>>> E.g. you asked to unlink 1000 files, we do so, we find 10 are unlinked
>>>> already, or the command is asked to recursively unlink all files in a
>>>> directory tree, and new ones have showed up.
>>>>
>>>> In those cases we should just just shrug and move on, no need for a
>>>> warning. We just don't always have perfect information about future
>>>> state at the start of the loop.
>>>
>>> If a planned work item is cancelled then it can still be counted as
>>> done.  Or the total could be adjusted, but that might look awkward.
>>>
>>>>> Loops that *regularly* end early are not a good fit for progress
>>>>> percentages, I think.
>>>>
>>>> Arguably yes, but in these fuzzy cases not providing a "total" means
>>>> showing no progress at all, just a counter. Perhaps we should have some
>>>> other "provide total, and it may be fuzzy" flag. Not providing it might
>>>> run into your proposed BUG(), my point was that the current API
>>>> providing this flexibility is intentional.
>>>
>>> Your patch turns a loop that doesn't immediately report skipped items
>>> into one with contiguous progress updates.  That's a good way to deal
>>> with the imagined restrictions for error detection.  Another would be
>>> to make the warnings optional.
>>
>> I don't see how there's anything wrong with the API use, how it needs a
>> warning etc.
>
> You pointed out that many callsites do:
>
> 	for (i = 0; i < large_number; i++) {
> 		display_progress(p, i + 1);
> 		/* work work work */
> 	}
>
> This is an off-by-one error because a finished item is reported before
> work on it starts.  Adding a warning can help find these cases reliably.

I understand that we're respectfully disagreeing and that's what you
think, but I really don't think it helps anyone if we just repeat our
respective points.

I don't think it's off-by-one, but you do.

Yes, I understand that you think that progress bars should absolutely
never ever show 1/5 if the first item is not finished. I disagree and
think that's not intuitive, per my "eating an Apple" example
upthread.

We disagree, and I for one think I understand what you mean, perhaps you
don't understand what I mean, but let's try to move on.

>>>>>> Similarly, the n=0 being distinguishable from the first
>>>>>> display_progress() is actually useful in practice. It's something I've
>>>>>> seen git.git emit (not recently, I patched the relevant code to emit
>>>>>> more granular progress).
>>>>>>
>>>>>> It's useful to know that we're stalling on the setup code before the
>>>>>> for-loop, not on the first item.
>>>>>
>>>>> Hmm, preparations that take a noticeable time might deserve their own
>>>>> progress line.
>>>>
>>>> Sure, and I've split some of those up in the past, but this seems like
>>>> ducking/not addressing the point that the API use we disagree on has
>>>> your preferred use conflating these conditions, but mine does not...
>>>
>>> Subtle.  If preparation takes a long time and each item less than that
>>> then the progress display is likely to jump from "0/n" to "i/n", where
>>> i > 1, and the meaning of "1/n" becomes moot.
>>
>> In practice we're making humongous jumps all over the place, we don't
>> write to the terminal for every item processed, and if we did it would
>> be too fast to be perceptable to the user.
>>
>> So I don't think this is an issue in the first place, as noted upthread
>> in <8735tt4fhx.fsf@evledraar.gmail.com>. Regardless of what we think of
>> the supposed off-by-one issue you seemed to think that it was enough of
>> an issue to justify complexity at the API use level (needing to think
>> about "continue" statements in loops, etc.), but now you think it's
>> moot?
>
> I don't understand your question.  Let me rephrase what I find moot:
>
> You wrote that the first display_progress() call being made with n>0
> would be useful to you to see long-running preparations.  If items are
> processed quicker than one per second, then whatever number the first
> display_progress() call writes to the screen will be overwritten within
> a second.  So the value of the first update shouldn't actually matter
> much for your use case -- unless items takes a long time to process.

I think it would be better if you replied specifically to the comments I
had later about throughput progress bars, i.e.:

    How does the idea that we show "has been done" make sense when you
    combine the progress.c API with the display_throughput(). I.e. output
    like[...]

Anyway, in this case I understood you to mean that you thought the
off-by-one wasn't a big deal in practice most of the time, I don't think
so either for e.g. counting objects in pack files.

I do think it's useful to be consistent though, and for e.g. cases of
downloading 5 files it makes sense to show 1/5 if we are currently in
the process of downloading files 1 out of 5, not 0/5 or whatever.

  reply	other threads:[~2021-06-15 16:51 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-07 14:43 [PATCH 0/2] trivial progress.c API usage fixes Ævar Arnfjörð Bjarmason
2021-06-07 14:43 ` [PATCH 1/2] read-cache.c: don't guard calls to progress.c API Ævar Arnfjörð Bjarmason
2021-06-07 15:28   ` Derrick Stolee
2021-06-07 15:52     ` Ævar Arnfjörð Bjarmason
2021-06-07 16:11       ` Derrick Stolee
2021-06-07 14:43 ` [PATCH 2/2] read-cache: fix incorrect count and progress bar stalling Ævar Arnfjörð Bjarmason
2021-06-07 15:31   ` Derrick Stolee
2021-06-07 15:58     ` Ævar Arnfjörð Bjarmason
2021-06-07 19:20       ` René Scharfe
2021-06-07 19:49         ` Ævar Arnfjörð Bjarmason
2021-06-07 23:41           ` Junio C Hamano
2021-06-08 10:58             ` Ævar Arnfjörð Bjarmason
2021-06-08 16:14               ` René Scharfe
2021-06-08 22:12                 ` Ævar Arnfjörð Bjarmason
2021-06-10  5:30                   ` Junio C Hamano
2021-06-10 15:14                     ` René Scharfe
2021-06-10 15:14                   ` René Scharfe
2021-06-14 11:07                     ` Ævar Arnfjörð Bjarmason
2021-06-14 17:18                       ` René Scharfe
2021-06-14 19:08                         ` Ævar Arnfjörð Bjarmason
2021-06-15  2:32                           ` Junio C Hamano
2021-06-15 15:14                           ` René Scharfe
2021-06-15 16:46                             ` Ævar Arnfjörð Bjarmason [this message]
2021-06-20 12:53                               ` René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877divystm.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=pclouds@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).