git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Lars Schneider <larsxschneider@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>,
	sbeller@google.com, Johannes.Schindelin@gmx.de, jnareb@gmail.com,
	mlbright@gmail.com
Subject: Re: [PATCH v6 10/13] convert: generate large test files only once
Date: Tue, 30 Aug 2016 12:37:49 -0400
Message-ID: <20160830163749.it4nedannqzqeh5c@sigill.intra.peff.net>
In-Reply-To: <F976C223-0CF3-43FB-ABF2-FFE7C600138E@gmail.com>

On Tue, Aug 30, 2016 at 01:41:59PM +0200, Lars Schneider wrote:

> >> +	git checkout -- test test.t test.i &&
> >> +
> >> +	mkdir generated-test-data &&
> >> +	for i in $(test_seq 1 $T0021_LARGE_FILE_SIZE)
> >> +	do
> >> +		RANDOM_STRING="$(test-genrandom end $i | tr -dc "A-Za-z0-9" )"
> >> +		ROT_RANDOM_STRING="$(echo $RANDOM_STRING | ./rot13.sh )"
> > 
> > In earlier iteration of loop with lower $i, what guarantees that
> > some bytes survive "tr -dc"?
> 
> Nothing really, good catch! The seed "end" produces as first character always a 
> "S" which would survive "tr -dc". However, that is clunky. I will always set "1"
> as first character in $RANDOM_STRING to mitigate the problem.

It seems odd that you would generate a larger set of random bytes and
then throw most of them away (only about 1 in 4 survives "tr -dc", on
average). So you don't actually know how long your inputs are, and
you're wasting time generating bytes which are discarded.
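
As a rough illustration of the loss (a sketch, not part of the patch):
only 62 of the 256 possible byte values pass the "A-Za-z0-9" filter, so
for roughly uniform input about a quarter of the bytes should survive.
The snippet below assumes /dev/urandom as a stand-in for test-genrandom,
and the names "requested" and "kept" are made up for the example.

```shell
# Assumption: /dev/urandom stands in for test-genrandom here.
requested=10000

# Count how many of the requested bytes survive the filter.
# LC_ALL=C keeps tr byte-oriented; the second tr strips the padding
# that some wc implementations put in front of the count.
kept=$(head -c "$requested" /dev/urandom |
	LC_ALL=C tr -dc 'A-Za-z0-9' |
	wc -c | tr -d ' ')

echo "requested=$requested kept=$kept"
```

The exact count varies from run to run, which is the point: the length
of the filtered string is not known in advance.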

The goal looks like it is just to clean up the string to only-ASCII
characters. Perhaps converting to base64 or hex would be conceptually
simpler? Like:

  test-genrandom ... |
  perl -pe 's/./sprintf("%02x", ord($&))/sge'
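
The same idea can be sketched without Perl, using od from POSIX instead
of the one-liner above. This is only an illustration: the helper name
"to_hex" is hypothetical, and the literal input stands in for whatever
test-genrandom would produce. Every input byte becomes exactly two ASCII
characters, so the output length is known up front.

```shell
# Hex-encode stdin: each input byte becomes two lowercase hex digits.
# -An drops the offset column, -v disables collapsing of repeated lines,
# -tx1 prints one byte per field; tr then removes the separators.
to_hex () {
	od -An -v -tx1 | tr -d ' \n'
}

# Hypothetical sample input standing in for test-genrandom output.
encoded=$(printf 'SpZT' | to_hex)
echo "$encoded"
```

Four input bytes yield eight output characters, none of which are
discarded.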

> > I do not quite get the point of this complexity.  You are using
> > exactly the same seed "end" every time, so in the first round you
> > have 1M of SP, letter '1', letter 'S' (from the genrandom), then
> > in the second round you have 1M of SP, letter '1', letter 'S' and
> > letter 'p' (the last two from the genrandom), and go on.  Is it
> > significant for the purpose of your test that the cruft inserted
> > between the repetition of 1M of SP gets longer by one byte but they
> > all share the same prefix (e.g. "1S", "1Sp", "1SpZ", "1SpZT",
> > ... are what you insert between a large run of spaces)?
> 
> The pktline packets have a constant size. If the cruft between 1M of SP 
> has a constant size as well then the generated packets for the test data
> would repeat themselves. That's why I increased the length after every 1M
> of SP.
> 
> However, I realized that this test complexity is not necessary. I'll
> simplify it in the next round.

I was also confused by this, and wondered if other patterns (perhaps
using a single longer genrandom) might be applicable. Simplification (or
explanation in comments about what properties the content _needs_ to
have) would be welcome. :)
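
One possible shape for the "single longer genrandom" pattern, as a
sketch under stated assumptions rather than a proposal for the actual
test: generate one ASCII-only random string up front, then reuse growing
prefixes of it instead of re-running the generator in every iteration.
Here /dev/urandom stands in for test-genrandom, "count" stands in for
$T0021_LARGE_FILE_SIZE, and the variable names are made up.

```shell
# Assumptions: /dev/urandom replaces test-genrandom, and 8 replaces
# $T0021_LARGE_FILE_SIZE to keep the example small.
count=8

# Generate the random data once; hex encoding gives 2 chars per byte.
stream=$(head -c "$count" /dev/urandom | od -An -v -tx1 | tr -d ' \n')

# Take prefixes of length 1, 2, ..., count from the same string.
i=1
lengths=""
while [ "$i" -le "$count" ]
do
	chunk=$(printf '%s' "$stream" | cut -c "1-$i")
	lengths="$lengths ${#chunk}"
	i=$((i + 1))
done
echo "prefix lengths:$lengths"
```

Each chunk is one character longer than the previous one and shares its
prefix, which matches the property described in the quoted text above.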

-Peff
