git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	mst@redhat.com, Jens Lehmann <Jens.Lehmann@web.de>
Subject: Re: [PATCH 2/6] t7408: merge short tests, factor out testing method
Date: Fri, 5 Aug 2016 15:45:13 -0700	[thread overview]
Message-ID: <CAGZ79kZChXebtSJeTPGoZgnoKySDrWnD11NX7fyze+rOU=Jwvg@mail.gmail.com> (raw)
In-Reply-To: <xmqqeg63vtar.fsf@gitster.mtv.corp.google.com>

On Fri, Aug 5, 2016 at 1:45 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> Tests consisting of one line each can be consolidated to have fewer tests
>> to run as well as fewer lines of code.
>>
>> When having just a few git commands, do not create a new shell but
>> use the -C flag in Git to execute in the correct directory.
>
> Good motivations.
>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>  t/t7408-submodule-reference.sh | 50 +++++++++++++++---------------------------
>>  1 file changed, 18 insertions(+), 32 deletions(-)
>>
>> diff --git a/t/t7408-submodule-reference.sh b/t/t7408-submodule-reference.sh
>> index afcc629..1416cbd 100755
>> --- a/t/t7408-submodule-reference.sh
>> +++ b/t/t7408-submodule-reference.sh
>> @@ -10,6 +10,16 @@ base_dir=$(pwd)
>>
>>  U=$base_dir/UPLOAD_LOG
>
> Is this line needed anywhere?
>
> We (perhaps unfortunately) still need $base_dir because we want to
> give an absolute file:/// URL to "submodule add", but I do not think
> we use $U, so let's get rid of it.
>
>> +test_alternate_usage()
>> +{
>
> According to Documentation/CodingGuidelines, this should be:
>
>     test_alternate_usage () {
>
> Somehow the helper name sounds as if it is testing if an alternate
> is used correctly (i.e. the machinery may attempt to use alternate
> but not in a correct way), not testing if an alternate is correctly
> used (i.e. the machinery incorrectly forgets to use an alternate at
> all), but maybe it is just me.
>
>> +     alternates_file=$1
>> +     working_dir=$2
>
> These are good (they can be on a single line), but you would
> want &&-chain just as other lines.
>
>> +     test_line_count = 1 $alternates_file &&
>
> This needs to quote "$alternates_file" especially in a helper
> function you have no control over future callers of.
>
> I wonder if we want to check the actual contents of the alternate;
> it may force us to worry about the infamous "should we expect
> $(pwd)/something or $PWD/something" if we did so, so it is not a
> strong suggestion.
>
>> +     echo "0 objects, 0 kilobytes" >expect &&
>> +     git -C $working_dir count-objects >current &&
>> +     diff expect current
>
> It is more customary to name two "expect" vs "actual", and compare
> them using "test_cmp" not "diff".
>
>> +}
>> +
>>  test_expect_success 'preparing first repository' '
>>       test_create_repo A &&
>>       (
>> @@ -42,44 +52,20 @@ test_expect_success 'preparing superproject' '
>>       )
>>  '
>>
>> -test_expect_success 'submodule add --reference' '
>> +test_expect_success 'submodule add --reference uses alternates' '
>>       (
>>               cd super &&
>>               git submodule add --reference ../B "file://$base_dir/A" sub &&
>>               git commit -m B-super-added
>> -     )
>> -'
>> -
>> -test_expect_success 'after add: existence of info/alternates' '
>> -     test_line_count = 1 super/.git/modules/sub/objects/info/alternates
>> -'
>> -
>> -test_expect_success 'that reference gets used with add' '
>> -     (
>> -             cd super/sub &&
>> -             echo "0 objects, 0 kilobytes" > expected &&
>> -             git count-objects > current &&
>> -             diff expected current

This is where the "diff" and "current" above came from.

>> -     )
>> -'
>
> Completely unrelated tangent, but after seeing the "how would we
> make a more intelligent choice of the diff boundary" topic, I
> wondered if we can notice that at this point there is a logical
> boundary and do something intelligent about it.  All the removed
> lines above have become "test_alternate" we see below, while all the
> removed lines below upto "test_alternate" correspond to the updated
> test at the end.

I guess that would require even more knowledge of the underlying
content that we track in Git.

Originally I started the diff boundary topic, as I assumed that the
new line detection is not doing harm. What Michael came up with
is impressive though I fear there might be a selection bias in the corpus,
i.e. we are missing some projects that get worse by it and these projects
would have had a great influence on the selection of the tuning parameters.
I guess we'll just wait until someone speaks up and points at worse
examples there.

What you're asking here is a complete new ballpark IMHO
as it takes diff to a whole new (syntactical) level. As of now
the world agrees that '\n' seems to be a good character to
put in text documents, so we use it as a splitting token
in our underlying diff driver, i.e. the diffs are primarily by line,
no matter how many characters change in that line. Sometimes
there is a "secondary" aspect such as coloring and pointing out
the characters that changed in a line[1].

[1] random example:
https://gerrit-review.googlesource.com/#/c/79354/3/project.py
Visually we focus on the changed characters, but the underlying
diff is still "by line".

We could write another "diff driver" that would not segment the
text by '\n' as a primary method and then showing the changed
hunks with +/-, but it would try to find rearrangements in the
underlying text:

As an example, this patch with such a diff driver could read partially as:
----8<----
***diff-driver: moved-text line 42 -> 14, length:7, post-operations: 1
 test_expect_success 'that reference gets used with add' '
      (
              cd super/sub &&
*             echo "0 objects, 0 kilobytes" > expected &&
*             git count-objects > current &&
*             diff expected current
      )
***diff-driver:post-operation 1:
*** modify moved text with:

+test_alternate_usage() {
+        alternates_file="$1" &&
+        working_dir="$2" &&
*             echo "0 objects, 0 kilobytes" > expected &&
*             git count-objects > current &&
*             diff expected current
+ }

***diff-driver: deletion 40,50

 -     )
 -'
 -
 -test_expect_success 'after add: existence of info/alternates' '
 -     test_line_count = 1 super/.git/modules/sub/objects/info/alternates
 -'
 -
----8<----

Thinking about this further, this would not require knowledge of
the underlying content, it is actually "just" an intra-file-rename
detection.

We have a rename detection from one blob to another, so we
could also have a similar thing to deduplicate within one blob,
which would track moving code around.


>
>> -test_expect_success 'cloning superproject' '
>> -     git clone super super-clone
>> -'
>> -
>> -test_expect_success 'update with reference' '
>> -     cd super-clone && git submodule update --init --reference ../B
>> -'
>> -
>> -test_expect_success 'after update: existence of info/alternates' '
>> -     test_line_count = 1 super-clone/.git/modules/sub/objects/info/alternates
>> +     ) &&
>> +     test_alternate_usage super/.git/modules/sub/objects/info/alternates super/sub
>>  '
>>
>> -test_expect_success 'that reference gets used with update' '
>> -     cd super-clone/sub &&
>> -     echo "0 objects, 0 kilobytes" > expected &&
>> -     git count-objects > current &&
>> -     diff expected current
>> +test_expect_success 'updating superproject keeps alternates' '
>> +     test_when_finished "rm -rf super-clone" &&
>
> This one is new; we do not remove A, B or super.  Does it matter if
> we leave super-clone behind?  Is super-clone so special?

"We need it in a later test."

It comes down to philosophy of how to write tests.

I spent some time in 740* and this is a surprising short test.
Compare to 7404 and 7400 for example. These are very long,
so when you want to add another test (no matter if testing for a
fixed regression or a new feature), you have lots of repos like

    super, super2, super3, sub, sub1, sumodule, anothersub,

and none of the names make sense. (Can I reuse them?
do they have some weird corner case configuration
that I need to undo? etc)

To stop that from happening again I want to have a clean slate,
i.e. all clones are deleted shortly after using, so it is obvious what
to use for testing.

>
>> +     git clone super super-clone &&
>> +     git -C super-clone submodule update --init --reference ../B &&
>> +     test_alternate_usage super-clone/.git/modules/sub/objects/info/alternates super-clone/sub
>>  '
>>
>>  test_done

  reply	other threads:[~2016-08-06 20:47 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-04 19:51 [PATCH 0/6] git clone: Marry --recursive and --reference Stefan Beller
2016-08-04 19:51 ` [PATCH 1/6] t7408: modernize style Stefan Beller
2016-08-05 20:30   ` Junio C Hamano
2016-08-04 19:51 ` [PATCH 2/6] t7408: merge short tests, factor out testing method Stefan Beller
2016-08-05 20:45   ` Junio C Hamano
2016-08-05 22:45     ` Stefan Beller [this message]
2016-08-05 23:09       ` Junio C Hamano
2016-08-04 19:51 ` [PATCH 3/6] submodule--helper module-clone: allow multiple references Stefan Beller
2016-08-05 20:54   ` Junio C Hamano
2016-08-04 19:51 ` [PATCH 4/6] submodule--helper update-clone: " Stefan Beller
2016-08-05 19:08   ` Stefan Beller
2016-08-05 21:06     ` Junio C Hamano
2016-08-05 21:19       ` Stefan Beller
2016-08-05 21:31         ` Junio C Hamano
2016-08-05 21:33           ` Stefan Beller
2016-08-04 19:51 ` [PATCH 5/6] submodule update: add super-reference flag Stefan Beller
2016-08-05 21:16   ` Junio C Hamano
2016-08-06  0:22     ` Stefan Beller
2016-08-04 19:51 ` [PATCH 6/6] clone: reference flag is used for submodules as well Stefan Beller
2016-08-05 21:36   ` Junio C Hamano
2016-08-05 19:47 ` [PATCH 0/6] git clone: Marry --recursive and --reference Junio C Hamano
2016-08-05 21:23   ` Stefan Beller
2016-08-05 21:47     ` Junio C Hamano
2016-08-05 23:26 ` Rename detection within in files WAS: [PATCH 2/6] t7408: merge short tests, factor out testing method Stefan Beller
2016-08-07  9:24   ` René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79kZChXebtSJeTPGoZgnoKySDrWnD11NX7fyze+rOU=Jwvg@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=Jens.Lehmann@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mst@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).