git@vger.kernel.org mailing list mirror (one of many)
From: Christian Couder <christian.couder@gmail.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>,
	 Christian Couder <christian.couder@gmail.com>,
	git@vger.kernel.org,  Junio C Hamano <gitster@pobox.com>,
	Taylor Blau <me@ttaylorr.com>,
	 Rick Sanders <rick@sfconservancy.org>,
	Git at SFC <git@sfconservancy.org>,
	 Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Patrick Steinhardt <ps@pks.im>,
	 Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [PATCH v2] SubmittingPatches: add section about AI
Date: Fri, 3 Oct 2025 16:25:28 +0200	[thread overview]
Message-ID: <CAP8UFD0=W3Mn8FQBmWFPN+3G9V73iorK-Y9Hs-LQ69hWCBeDOw@mail.gmail.com> (raw)
In-Reply-To: <aN2fG-nS9fE5-2jD@fruit.crustytoothpaste.net>

On Wed, Oct 1, 2025 at 11:37 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2025-10-01 at 14:02:50, Christian Couder wrote:
> > +[[ai]]
> > +=== Use of Artificial Intelligence (AI)
> > +
> > +The Developer's Certificate of Origin requires contributors to certify
> > +that they know the origin of their contributions to the project and
> > +that they have the right to submit it under the project's license.
> > +It's not yet clear that this can be legally satisfied when submitting
> > +significant amount of content that has been generated by AI tools.
>
> Perhaps we'd like to write this:
>
>   It's not yet clear that this can be legally satisfied when submitting
>   significant amount of content that has been generated by AI tools,
>   so we cannot accept this content in our project.
>
> If we're going to have a policy, we need to be direct about it and not
> let people draw their own conclusions.  Many people don't have English
> as a first language and we don't want people trying to language lawyer.

I understand why you want to be direct, but unfortunately (or
fortunately, depending on your point of view) some generated content
is acceptable if it is not too large, if it is specific enough, or if
a human has been involved enough. In a number of cases, for example
translated or reworded content, wrapped lines, refactored code, or
renamed variables, a significant amount of generated content is likely
acceptable because a human has already been involved and the content
is specific enough. If we say right away that we cannot accept such
content, we might rule out interesting and useful use cases.

> We could say something like this:
>
>   Please do not sign off your work if you’re using an LLM to contribute
>   unless you have included copyright and license information for all the
>   code used in that LLM.

For now I don't think we want or need to be involved in checking, or
trying to check, what code and/or training data has been or is used in
an LLM, which LLM(s) are used in which AI tools, all the AI tools that
a contributor might have used, etc. See my reply to Chuck Wolber's
review related to declare-ai.org.

> This allows the possibility that, say, Google trains an LLM entirely on
> their own code, such that there is only one copyright holder and they
> can license it as they see fit.  I don't think we _need_ to consider
> that case if we don't want to allow that (say, for code quality
> reasons), but we could if we wanted to.

I agree it would be nice if some LLMs were trained only on specific
code (or on no existing code at all) so that we could alleviate the
legal issue with them, but for now I don't think they exist. We can
always adapt later if/when they ever appear.

> > +Another issue with AI generated content is that AIs still often
> > +hallucinate or just produce bad code, commit messages, documentation
> > +or output, even when you point out their mistakes.
> > +
> > +To avoid these issues, we will reject anything that looks AI
> > +generated, that sounds overly formal or bloated, that looks like AI
> > +slop, that looks good on the surface but makes no sense, or that
> > +senders don’t understand or cannot explain.
>
> I've definitely seen this.  LLMs also typically do not write nice,
> logical, bisectable commits, which I personally dislike as a reviewer.
>
> > +We strongly recommend using AI tools carefully and responsibly.
>
> I think this is maybe not definitive enough.  If we don't believe it's
> possible to sign-off when code is generated using LLMs, then we should
> say definitively, "Contributors may not use AI to write contributions to
> Git," or something similarly clear.

I think it's far too restrictive for no good reason. See above, and
see my discussion about this with Junio on the first version of this
patch, which he sent last July.

> Right now, this sounds too ambiguous and it might allow someone to write
> substantial code that they think is of good quality using an LLM because
> in their view that's careful and responsible, when we don't think that
> users can sign off on that and therefore that's not possible.  Telling
> people to use tools "carefully and responsibly" is like telling people
> to drive "a reasonable and prudent speed" without further qualification
> and then being surprised when they go 200 km/hr down the road.

The sentence ("We strongly recommend using AI tools carefully and
responsibly.") is designed to make people pause and think a bit when
they are reading mechanically or just skimming the doc. It's not
designed to set a clear limit on what is acceptable and what is not,
and in fact it couldn't, because there is no such clear limit.

> I'd like to see the language be more like our code of conduct in that it
> is broad and covers a wide variety of behaviour but also explicitly
> states what is and is not acceptable to avoid ambiguity, confusion, or
> argument.

Feel free to make more suggestions. I don't think your goal is easy to
achieve though.

> > +Contributors would often benefit more from AI by using it to guide and
> > +help them step by step towards producing a solution by themselves
> > +rather than by asking for a full solution that they would then mostly
> > +copy-paste. They can also use AI to help with debugging, or with
> > +checking for obvious mistakes, things that can be improved, things
> > +that don’t match our style, guidelines or our feedback, before sending
> > +it to us.
>
> This kind of use I feel is less objectionable.  I think it might be
> acceptable to use an LLM as a guide, a linter, or a first-pass code
> review.

Yeah, it looks like we all agree on that. The issue is that the line
between these acceptable kinds of use and other problematic ones is
fuzzy.
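As an aside, much of the pre-submission self-review discussed above
can be done with existing mechanical checks before (or instead of)
asking an AI tool to look over a patch. A minimal sketch using `git
diff --check`, which SubmittingPatches already recommends for catching
whitespace errors (the file name and throwaway repository here are
made up purely for illustration):

```shell
# Sketch: run Git's built-in whitespace check on staged changes
# in a throwaway repository, as one would before git send-email.
set -e
tmp=$(mktemp -d)
git init -q "$tmp"
cd "$tmp"

# A file with a deliberate trailing-whitespace error on line 2.
printf 'int main(void) { return 0; }\ntrailing space \n' > hello.c
git add hello.c

# --check makes git diff flag whitespace errors and exit non-zero.
if git diff --cached --check; then
  status="clean"
else
  status="whitespace problems found"
fi
echo "whitespace check: $status"
```

Running `make test` and rereading the diff against CodingGuidelines
cover much of the rest, with or without an AI assistant in the loop.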

Thanks.



Thread overview: 34+ messages
2025-06-30 20:32 [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes Junio C Hamano
2025-06-30 21:07 ` brian m. carlson
2025-06-30 21:23   ` Collin Funk
2025-07-01 10:36 ` Christian Couder
2025-07-01 11:07   ` Christian Couder
2025-07-01 17:33     ` Junio C Hamano
2025-07-01 16:20   ` Junio C Hamano
2025-07-08 14:23     ` Christian Couder
2025-10-01 14:02 ` [PATCH v2] SubmittingPatches: add section about AI Christian Couder
2025-10-01 18:59   ` Chuck Wolber
2025-10-01 23:32     ` brian m. carlson
2025-10-02  2:30       ` Ben Knoble
2025-10-03 13:33     ` Christian Couder
2025-10-01 20:59   ` Junio C Hamano
2025-10-03  8:51     ` Christian Couder
2025-10-03 16:20       ` Junio C Hamano
2025-10-03 16:45         ` rsbecker
2025-10-08  7:22         ` Christian Couder
2025-10-01 21:37   ` brian m. carlson
2025-10-03 14:25     ` Christian Couder [this message]
2025-10-03 20:48     ` Elijah Newren
2025-10-03 22:20       ` brian m. carlson
2025-10-06 17:45         ` Junio C Hamano
2025-10-08  4:18           ` Elijah Newren
2025-10-12 15:07             ` Junio C Hamano
2025-10-08  9:28           ` Christian Couder
2025-10-13 18:14             ` Junio C Hamano
2025-10-23 17:32               ` Junio C Hamano
2025-10-08  4:18         ` Elijah Newren
2025-10-08  8:37         ` Christian Couder
2025-10-08  9:28           ` Michal Suchánek
2025-10-08  9:35             ` Christian Couder
2025-10-09  1:13           ` Collin Funk
2025-10-08  7:30       ` Christian Couder
