user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
* [PATCH 0/2] attached patches and false-positive dfpost:
@ 2019-10-31  3:12 Eric Wong
  2019-10-31  3:12 ` [PATCH 1/2] msgiter: attempt to decode all text/* bodies Eric Wong
  2019-10-31  3:12 ` [PATCH 2/2] solvergit: deal with false-positive dfpost: results Eric Wong
  0 siblings, 2 replies; 3+ messages in thread
From: Eric Wong @ 2019-10-31  3:12 UTC (permalink / raw)
  To: meta

Once reindexed, all the patches attached at
https://public-inbox.org/git/b9fb52b8-8168-6bf0-9a72-1e6c44a281a5@oracle.com/
should be "solvable".

Eric Wong (2):
  msgiter: attempt to decode all text/* bodies
  solvergit: deal with false-positive dfpost: results

 lib/PublicInbox/MsgIter.pm   | 16 +++++++---
 lib/PublicInbox/SolverGit.pm | 59 ++++++++++++++++++++++++++++--------
 2 files changed, 57 insertions(+), 18 deletions(-)

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/2] msgiter: attempt to decode all text/* bodies
  2019-10-31  3:12 [PATCH 0/2] attached patches and false-positive dfpost: Eric Wong
@ 2019-10-31  3:12 ` Eric Wong
  2019-10-31  3:12 ` [PATCH 2/2] solvergit: deal with false-positive dfpost: results Eric Wong
  1 sibling, 0 replies; 3+ messages in thread
From: Eric Wong @ 2019-10-31  3:12 UTC (permalink / raw)
  To: meta

We want to index text/x-patch and text/x-diff, at least,
since "git format-patch" can generate a patch series as
attachments using --attach.
---
 lib/PublicInbox/MsgIter.pm | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/MsgIter.pm b/lib/PublicInbox/MsgIter.pm
index f11ba223..d9df32ab 100644
--- a/lib/PublicInbox/MsgIter.pm
+++ b/lib/PublicInbox/MsgIter.pm
@@ -45,12 +45,18 @@ sub msg_part_text ($$) {
 	# times when it should not have been:
 	#   <87llgalspt.fsf@free.fr>
 	#   <200308111450.h7BEoOu20077@mail.osdl.org>
-	if ($err && ($ct =~ m!\btext/plain\b!i ||
+	if ($err && ($ct =~ m!\btext/\b!i ||
 			$ct =~ m!\bmultipart/mixed\b!i)) {
-		# Try to assume UTF-8 because Alpine seems to
-		# do wacky things and set charset=X-UNKNOWN
-		$part->charset_set('UTF-8');
-		$s = eval { $part->body_str };
+		my $cte = $part->header_raw('Content-Transfer-Encoding');
+		if (defined($cte) && $cte =~ /\b7bit\b/i) {
+			$s = $part->body;
+			$err = undef if $s =~ /\A[[:ascii:]]+\z/s;
+		} else {
+			# Try to assume UTF-8 because Alpine seems to
+			# do wacky things and set charset=X-UNKNOWN
+			$part->charset_set('UTF-8');
+			$s = eval { $part->body_str };
+		}
 
 		# If forcing charset=UTF-8 failed,
 		# caller will warn further down...

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH 2/2] solvergit: deal with false-positive dfpost: results
  2019-10-31  3:12 [PATCH 0/2] attached patches and false-positive dfpost: Eric Wong
  2019-10-31  3:12 ` [PATCH 1/2] msgiter: attempt to decode all text/* bodies Eric Wong
@ 2019-10-31  3:12 ` Eric Wong
  1 sibling, 0 replies; 3+ messages in thread
From: Eric Wong @ 2019-10-31  3:12 UTC (permalink / raw)
  To: meta

When solving for blob 81c1164ae5 in https://public-inbox.org/git/,
at least two messages get indexed with the dfpost result for
that blob (after fixing MsgIter to decode all text/* parts):

1. https://public-inbox.org/git/b9fb52b8-8168-6bf0-9a72-1e6c44a281a5@oracle.com/
2. https://public-inbox.org/git/56664222-6c29-09dc-ef78-7b380b113c4a@oracle.com/

However, only the first message contains a usable patch.  So
we must adjust SolverGit to account for multiple messages
hitting the same "dfpost:" search result and attempt
"git apply" on all results, not just the first.

In the future, changes to SearchIdx.pm may rid us of invalid
search results and speed up performance (at the expense of
developer/indexing time); but we need to account for old search
indices, first.
---
 lib/PublicInbox/SolverGit.pm | 59 ++++++++++++++++++++++++++++--------
 1 file changed, 46 insertions(+), 13 deletions(-)

diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index 8878961e..b7327ffa 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -44,6 +44,7 @@ my $MAX_PATCH = 9999;
 #	smsg => PublicInbox::SearchMsg object containing diff
 #	path_a => pre-image path
 #	path_b => post-image path
+#	n => numeric path of the patch (relative to worktree)
 # }
 
 # don't bother if somebody sends us a patch with these path components,
@@ -122,8 +123,9 @@ sub extract_diff ($$$$$) {
 
 
 			# start writing the diff out to a tempfile
-			my $pn = ++$self->{tot};
-			open($tmp, '>', $self->{tmp}->dirname . "/$pn") or
+			my $path = ++$self->{tot};
+			$di->{n} = $path;
+			open($tmp, '>', $self->{tmp}->dirname . "/$path") or
 							die "open(tmp): $!";
 
 			push @$hdr_lines, $l;
@@ -180,7 +182,7 @@ sub filename_query ($) {
 	join('', map { qq( dfn:"$_") } split(/\.\./, $_[0]));
 }
 
-sub find_extract_diff ($$$) {
+sub find_extract_diffs ($$$) {
 	my ($self, $ibx, $want) = @_;
 	my $srch = $ibx->search or return;
 
@@ -208,14 +210,15 @@ sub find_extract_diff ($$$) {
 	my $msgs = $srch->query($q, { relevance => 1 });
 	my $re = qr/\Aindex ($pre[a-f0-9]*)\.\.($post[a-f0-9]*)(?: ([0-9]+))?/;
 
-	my $di;
+	my @di;
 	foreach my $smsg (@$msgs) {
 		$ibx->smsg_mime($smsg) or next;
 		msg_iter(delete($smsg->{mime}), sub {
-			$di ||= extract_diff($self, $_[0], $re, $ibx, $smsg);
+			my $di = extract_diff($self, $_[0], $re, $ibx, $smsg);
+			push @di, $di if defined($di);
 		});
-		return $di if $di;
 	}
+	@di ? \@di : undef;
 }
 
 sub prepare_index ($) {
@@ -423,6 +426,23 @@ sub start_ls_files ($$) {
 	});
 }
 
+sub oids_same_ish ($$) {
+	(index($_[0], $_[1]) == 0) || (index($_[1], $_[0]) == 0);
+}
+
+sub skip_identical ($$$) {
+	my ($self, $patches, $cur_oid_b) = @_;
+	while (my $nxt = $patches->[0]) {
+		if (oids_same_ish($cur_oid_b, $nxt->{oid_b})) {
+			dbg($self, 'skipping '.di_url($self, $nxt).
+				" for $cur_oid_b");
+			shift @$patches;
+		} else {
+			return;
+		}
+	}
+}
+
 sub do_git_apply ($) {
 	my ($self) = @_;
 	my $dn = $self->{tmp}->dirname;
@@ -434,16 +454,19 @@ sub do_git_apply ($) {
 	my $len = length(join(' ', @cmd));
 	my $total = $self->{tot};
 	my $di; # keep track of the last one for "git ls-files"
+	my $prv_oid_b;
 
 	do {
 		my $i = ++$self->{nr};
 		$di = shift @$patches;
 		dbg($self, "\napplying [$i/$total] " . di_url($self, $di) .
 			"\n" . join('', @{$di->{hdr_lines}}));
-		my $path = $total + 1 - $i;
+		my $path = $di->{n};
 		$len += length($path) + 1;
 		push @cmd, $path;
-	} while (@$patches && $len < $ARG_SIZE_MAX);
+		$prv_oid_b = $di->{oid_b};
+	} while (@$patches && $len < $ARG_SIZE_MAX &&
+		 !oids_same_ish($patches->[0]->{oid_b}, $prv_oid_b));
 
 	my $rdr = { 2 => 1 };
 	my $qsp = PublicInbox::Qspawn->new(\@cmd, $self->{git_env}, $rdr);
@@ -451,7 +474,16 @@ sub do_git_apply ($) {
 		my ($bref) = @_;
 		dbg($self, $$bref);
 		if (my $err = $qsp->{err}) {
-			ERR($self, "git apply error: $err");
+			my $msg = "git apply error: $err";
+			my $nxt = $patches->[0];
+			if ($nxt && oids_same_ish($nxt->{oid_b}, $prv_oid_b)) {
+				dbg($self, $msg);
+				dbg($self, 'trying '.di_url($self, $nxt));
+			} else {
+				ERR($self, $msg);
+			}
+		} else {
+			skip_identical($self, $patches, $di->{oid_b});
 		}
 		eval { start_ls_files($self, $di) };
 		ERR($self, $@) if $@;
@@ -497,15 +529,16 @@ sub resolve_patch ($$) {
 
 	# scan through inboxes to look for emails which results in
 	# the oid we want:
-	my $di;
 	foreach my $ibx (@{$self->{inboxes}}) {
-		$di = find_extract_diff($self, $ibx, $want) or next;
+		my $diffs = find_extract_diffs($self, $ibx, $want) or next;
 
-		unshift @{$self->{patches}}, $di;
-		dbg($self, "found $cur_want in ".di_url($self, $di));
+		unshift @{$self->{patches}}, @$diffs;
+		dbg($self, "found $cur_want in ".
+			join("\n\t", map { di_url($self, $_) } @$diffs));
 
 		# good, we can find a path to the oid we $want, now
 		# lets see if we need to apply more patches:
+		my $di = $diffs->[0];
 		my $src = $di->{oid_a};
 
 		unless ($src =~ /\A0+\z/) {

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-10-31  3:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-31  3:12 [PATCH 0/2] attached patches and false-positive dfpost: Eric Wong
2019-10-31  3:12 ` [PATCH 1/2] msgiter: attempt to decode all text/* bodies Eric Wong
2019-10-31  3:12 ` [PATCH 2/2] solvergit: deal with false-positive dfpost: results Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).