user/dev discussion of public-inbox itself
* [PATCH] TODO: add item for searching based on git-patch-id(1)
From: Eric Wong @ 2019-10-01  3:37 UTC
  To: meta

I forgot about this feature when I was implementing
blob-ID-based searches :x
---
 TODO | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/TODO b/TODO
index 2c525615..93054bb3 100644
--- a/TODO
+++ b/TODO
@@ -112,3 +112,6 @@ all need to be considered for everything we introduce)
 
 * make "git cat-file --batch" detect unlinked packfiles so we don't
   have to restart processes (very long-term)
+
+* support searching based on `git-patch-id --stable` to improve
+  bidirectional mapping of commits <=> emails
-- 
EW
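Below is a minimal sketch of what the TODO item could look like. It uses a simplified stand-in for `git patch-id`. The hypothetical `approx_patch_id()` skips `index` lines and hunk-header line numbers, removes all whitespace, and hashes the rest with SHA-1. That mirrors the idea behind git-patch-id(1), but it will not reproduce git's actual IDs, and it does not model `--stable`'s independence from file order. The hypothetical `PatchIdIndex` class then keys both commits and emails on that ID, so either side can be looked up from the other.

```python
import hashlib
import re
from collections import defaultdict

# Hunk headers carry line numbers, which shift when a patch is rebased.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")


def approx_patch_id(diff_text):
    """Hypothetical, simplified patch-id; NOT byte-compatible with git."""
    h = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith("index "):
            continue  # blob IDs change when a patch lands on a different base
        if HUNK_RE.match(line):
            continue  # ignore hunk line numbers
        h.update("".join(line.split()).encode())  # ignore all whitespace
    return h.hexdigest()


class PatchIdIndex:
    """Hypothetical bidirectional commit <=> email map keyed on patch-id."""

    def __init__(self):
        self.commits = defaultdict(set)   # patch-id -> commit SHAs
        self.messages = defaultdict(set)  # patch-id -> Message-IDs

    def add_commit(self, sha, diff_text):
        self.commits[approx_patch_id(diff_text)].add(sha)

    def add_message(self, msgid, diff_text):
        self.messages[approx_patch_id(diff_text)].add(msgid)

    def messages_for(self, diff_text):
        # .get() avoids inserting empty entries into the defaultdict
        return set(self.messages.get(approx_patch_id(diff_text), ()))

    def commits_for(self, diff_text):
        return set(self.commits.get(approx_patch_id(diff_text), ()))


if __name__ == "__main__":
    email_diff = (
        "diff --git a/TODO b/TODO\n"
        "index 2c525615..93054bb3 100644\n"
        "--- a/TODO\n"
        "+++ b/TODO\n"
        "@@ -112,3 +112,6 @@\n"
        "+* support searching based on `git-patch-id --stable`\n"
    )
    # Same change applied at a different offset, with different blob IDs.
    commit_diff = (email_diff
                   .replace("2c525615..93054bb3", "1111111..2222222")
                   .replace("@@ -112,3 +112,6 @@", "@@ -120,3 +120,6 @@"))
    idx = PatchIdIndex()
    idx.add_message("<example@msgid>", email_diff)
    idx.add_commit("deadbeef", commit_diff)
    print(idx.messages_for(commit_diff))  # {'<example@msgid>'}
    print(idx.commits_for(email_diff))    # {'deadbeef'}
```

Skipping `index` lines and hunk offsets is what lets an emailed patch and the commit it became share an ID even after a rebase. That same normalization is also why the ID is only a hint, not proof of identical content.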



* Re: [PATCH] TODO: add item for searching based on git-patch-id(1)
From: Konstantin Ryabitsev @ 2019-10-01 21:00 UTC
  To: Eric Wong; +Cc: meta

On Tue, Oct 01, 2019 at 03:37:47AM +0000, Eric Wong wrote:
> I forgot about this feature when I was implementing
> blob-ID-based searches :x
> ---
>  TODO | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/TODO b/TODO
> index 2c525615..93054bb3 100644
> --- a/TODO
> +++ b/TODO
> @@ -112,3 +112,6 @@ all need to be considered for everything we introduce)
>  
>  * make "git cat-file --batch" detect unlinked packfiles so we don't
>    have to restart processes (very long-term)
> +
> +* support searching based on `git-patch-id --stable` to improve
> +  bidirectional mapping of commits <=> emails

It would be handy, but a word of caution -- because it strips
whitespace, git-patch-id is not great for languages with syntactic
indentation, like Python. For example, the following two patches
generate the same patch-id, but one is actually malicious:

diff --git a/file1.py b/file1.py
index e574c49..6aa1937 100644
--- a/file1.py
+++ b/file1.py
@@ -1,3 +1,13 @@
 #!/usr/bin/python

+def is_logged_in(cookie):
+    if cookie:
+        print('User is logged in')
+        return True
+
+    return False
+
+if is_logged_in(True):
+    print('You are logged in')
+
 print('Hello!')

This one below is malicious, because is_logged_in() will always return
True:

diff --git a/file1.py b/file1.py
index e574c49..6aa1937 100644
--- a/file1.py
+++ b/file1.py
@@ -1,3 +1,13 @@
 #!/usr/bin/python
 
+def is_logged_in(cookie):
+    if cookie:
+        print('User is logged in')
+    return True
+
+    return False
+
+if is_logged_in(True):
+    print('You are logged in')
+
 print('Hello!')
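The sketch below makes the collision concrete. It looks only at the added (`+`) lines of `is_logged_in()` from the two patches. Each body is hashed twice: once with all whitespace removed, which is the kind of normalization git-patch-id applies (though not its exact input format), and once verbatim. Then each version is executed to show the difference in behavior. `SAFE`, `MALICIOUS`, and the helper names are illustrative only.

```python
import hashlib

# Added lines of the first (safe) patch above.
SAFE = (
    "+def is_logged_in(cookie):\n"
    "+    if cookie:\n"
    "+        print('User is logged in')\n"
    "+        return True\n"
    "+\n"
    "+    return False\n"
)
# The malicious patch only dedents "return True" out of the if-block.
MALICIOUS = SAFE.replace("+        return True", "+    return True")


def stripped_digest(body):
    """Hash with all whitespace removed, as a whitespace-insensitive ID would."""
    return hashlib.sha1("".join(body.split()).encode()).hexdigest()


def verbatim_digest(body):
    """Hash the exact text, so indentation changes alter the result."""
    return hashlib.sha1(body.encode()).hexdigest()


def call_is_logged_in(body, cookie):
    """Drop the leading '+' from each line, exec the code, call the function."""
    source = "\n".join(line[1:] for line in body.splitlines())
    namespace = {}
    exec(source, namespace)  # demo only: never exec untrusted patches
    return namespace["is_logged_in"](cookie)


print(stripped_digest(SAFE) == stripped_digest(MALICIOUS))  # True
print(verbatim_digest(SAFE) == verbatim_digest(MALICIOUS))  # False
print(call_is_logged_in(SAFE, None))       # False
print(call_is_logged_in(MALICIOUS, None))  # True
```

The whitespace-stripped digests match while the verbatim ones differ. With no cookie, the safe version returns False and the malicious one returns True.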

So, I wouldn't use git-patch-id as a mechanism to look up patches,
except as an auxiliary one.

-K


* Re: [PATCH] TODO: add item for searching based on git-patch-id(1)
From: Eric Wong @ 2019-10-01 22:00 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Tue, Oct 01, 2019 at 03:37:47AM +0000, Eric Wong wrote:
> > +* support searching based on `git-patch-id --stable` to improve
> > +  bidirectional mapping of commits <=> emails
> 
> It would be handy, but a word of caution -- because it strips
> whitespace, git-patch-id is not great for languages with syntactic
> indentation, like Python. For example, the following two patches
> generate the same patch-id, but one is actually malicious:

Good point.  Makefiles also fall into that category; I wonder
what other languages are whitespace-sensitive.

<snip>

> So, I wouldn't use git-patch-id as a mechanism to look up patches,
> except as an auxiliary one.

It's usable for 99% of kernel patches, though.  But
right, dfpost:$BLOB_ID matches should take precedence, and we
can use a lower weight for the patch-id in Xapian.
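One way to read "a lower weight for the patch-id" is as a scoring rule in which each `dfpost:` blob match outweighs any patch-id match. The sketch below assumes made-up weights (`DFPOST_WEIGHT`, `PATCH_ID_WEIGHT`) and a hypothetical `Message` record. It is plain Python, not Xapian. In Xapian itself, a subquery's weight could instead be scaled with the `OP_SCALE_WEIGHT` query operator.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical weights: an exact post-image blob match is strong evidence,
# a whitespace-insensitive patch-id match is only weak evidence.
DFPOST_WEIGHT = 10.0
PATCH_ID_WEIGHT = 1.0


@dataclass
class Message:
    msgid: str
    dfpost: set[str] = field(default_factory=set)  # post-image blob IDs
    patch_id: str | None = None


def score(msg: Message, blob_ids: set[str], patch_id: str | None) -> float:
    s = DFPOST_WEIGHT * len(msg.dfpost & blob_ids)
    if patch_id is not None and msg.patch_id == patch_id:
        s += PATCH_ID_WEIGHT
    return s


def rank(messages: list[Message], blob_ids: set[str],
         patch_id: str | None) -> list[str]:
    """Return Message-IDs with a non-zero score, best match first."""
    scored = [(score(m, blob_ids, patch_id), m.msgid) for m in messages]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return [msgid for s, msgid in scored if s > 0]


msgs = [
    Message("<b@x>", patch_id="p1"),       # patch-id match only
    Message("<a@x>", dfpost={"93054bb3"}),  # blob match
    Message("<c@x>"),                       # no match
]
print(rank(msgs, {"93054bb3"}, "p1"))  # ['<a@x>', '<b@x>']
```

A patch-id-only hit still shows up in the results, but always below any message that matched on an actual blob ID.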

The bigger question is the cost in time to reindex...

And ultimately, I wonder if dfpost:$BLOB_ID + s:$COMMIT_TITLE
is good enough, too...  I think I need to dig out something
I abandoned years ago for indexing coderepos and refactor that
to be less space-intensive now.
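The `dfpost:$BLOB_ID + s:$COMMIT_TITLE` idea could be sketched as follows: extract the post-image blob IDs from a commit's `index` lines and AND them with a subject phrase. The helper names are hypothetical. The output format assumes the search accepts Xapian-style `AND` and quoted phrases, and it has not been checked against public-inbox's actual query handling.

```python
import re

# "index <pre>..<post> [mode]": the post-image blob ID follows the "..".
INDEX_RE = re.compile(r"^index [0-9a-f]+\.\.([0-9a-f]+)", re.MULTILINE)


def post_image_blobs(diff_text):
    """Return post-image blob IDs from a diff's index lines, in order.

    All-zero IDs are skipped: a deleted file has no post-image blob.
    """
    return [b for b in INDEX_RE.findall(diff_text) if set(b) != {"0"}]


def commit_query(title, blob_ids):
    """Hypothetical helper: combine s:"title" with dfpost: terms using AND."""
    phrase = title.replace('"', "")  # keep the phrase quoting well-formed
    terms = [f's:"{phrase}"'] + [f"dfpost:{b}" for b in blob_ids]
    return " AND ".join(terms)


diff = "diff --git a/TODO b/TODO\nindex 2c525615..93054bb3 100644\n"
print(commit_query("TODO: add item for searching based on git-patch-id(1)",
                   post_image_blobs(diff)))
# s:"TODO: add item for searching based on git-patch-id(1)" AND dfpost:93054bb3
```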



Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
