git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Neeraj Singh <neerajsi@microsoft.com>,
	Han Xin <hanxin.hx@alibaba-inc.com>
Subject: ab/squelch-empty-fsync-traces & hx/unpack-streaming bug (was: What's cooking in git.git (Jul 2022, #04; Wed, 13))
Date: Fri, 15 Jul 2022 15:40:13 +0200	[thread overview]
Message-ID: <220715.86bktqzdb8.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <xmqq7d4g8onl.fsf@gitster.g>


On Wed, Jul 13 2022, Junio C Hamano wrote:

> * ab/squelch-empty-fsync-traces (2022-06-30) 1 commit
>  . trace2: don't include "fsync" events in all trace2 logs
>
>  Omit fsync-related trace2 entries when their values are all zero.
>
>  Breaks tests in hx/unpack-streaming with an interesting interaction.
>  source: <patch-v2-1.1-a1fc37de947-20220630T084607Z-avarab@gmail.com>

[...]

> * hx/unpack-streaming (2022-06-13) 6 commits
>   (merged to 'next' on 2022-07-08 at 4eb375ec2f)
>  + unpack-objects: use stream_loose_object() to unpack large objects
>  + core doc: modernize core.bigFileThreshold documentation
>  + object-file.c: add "stream_loose_object()" to handle large object
>  + object-file.c: factor out deflate part of write_loose_object()
>  + object-file.c: refactor write_loose_object() to several steps
>  + unpack-objects: low memory footprint for get_data() in dry_run mode
>
>  Allow large objects read from a packstream to be streamed into a
>  loose object file straight, without having to keep it in-core as a
>  whole.
>
>  Will merge to 'master'.
>  source: <cover.1654914555.git.chiyutianyi@gmail.com>

I hadn't had time to look at this until now. There's some interesting
behavior here.

The code to check the hardware flush was added in aaf81223f48
(unpack-objects: use stream_loose_object() to unpack large objects,
2022-06-11) (that series is now on master).

But as my ab/squelch-empty-fsync-traces notes we always add this to the
event, so the:

	grep fsync/hardware-flush trace2.txt &&

Is equivalent to:

	true &&

I.e. it's not testing worthwhile at all. The reason you're seeing a
failure is deu to 412e4caee38 (tests: disable fsync everywhere,
2021-10-29), i.e. our tests disable fsync(). What you have queued will
pass as:

	GIT_TEST_FSYNC=true ./t5351-unpack-large-objects.sh

But I think that would be meaningless, since we'll write out that on
FSYNC_HARDWARE_FLUSH whether we actually support "bulk" or not. AFAICT
the way to detect if we support "bulk" at all is to check for
fsync/writeout-only.

*Except* that we we unconditionally increment the "writeout only"
counter, even if we don't actually support that "bulk" mode. We're just
doing a regular fsync().

So, narrowly it looks easy to "fix" my ab/squelch-empty-fsync-traces, I
could apply this on top:

diff --git a/t/t5351-unpack-large-objects.sh b/t/t5351-unpack-large-objects.sh
index 8ce8aa3b147..29cab843eb9 100755
--- a/t/t5351-unpack-large-objects.sh
+++ b/t/t5351-unpack-large-objects.sh
@@ -53,8 +53,12 @@ BATCH_CONFIGURATION='-c core.fsync=loose-object -c core.fsyncmethod=batch'
 test_expect_success 'unpack big object in stream (core.fsyncmethod=batch)' '
 	prepare_dest 1m &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+	GIT_TEST_FSYNC=true \
 		git -C dest.git $BATCH_CONFIGURATION unpack-objects <pack-$PACK.pack &&
-	grep fsync/hardware-flush trace2.txt &&
+	grep fsync/ trace2.txt >wo.txt &&
+	sed -e "s/.*value\":\"//" -e "s/\".*//" <wo.txt >actual &&
+	test_write_lines 6 1 >expect &&
+	test_cmp expect actual &&
 	test_dir_is_empty dest.git/objects/pack &&
 	git -C dest.git cat-file --batch-check="%(objectname)" <obj-list >current &&
 	cmp obj-list current

But does this make any sense in the larger scheme of things?  I.e. the
trace2 logging isn't at all logging that we're actually doing with
fsync, but what we intended to do based on the application logic, is
that intended & OK or not?

  parent reply	other threads:[~2022-07-15 14:02 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-14  1:32 What's cooking in git.git (Jul 2022, #04; Wed, 13) Junio C Hamano
2022-07-14 12:11 ` ds/rebase-update-ref (was Re: What's cooking in git.git (Jul 2022, #04; Wed, 13)) Derrick Stolee
2022-07-14 16:35   ` Junio C Hamano
2022-07-15 13:40 ` Ævar Arnfjörð Bjarmason [this message]
2022-07-16 12:23   ` ab/squelch-empty-fsync-traces & hx/unpack-streaming bug (was: " Han Xin
2022-07-17  3:46 ` en/merge-restore-to-pristine (Was: " Elijah Newren
2022-07-17 16:58   ` ZheNing Hu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=220715.86bktqzdb8.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hanxin.hx@alibaba-inc.com \
    --cc=neerajsi@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).