From: Christian Couder <christian.couder@gmail.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>,
Elijah Newren <newren@gmail.com>,
Christian Couder <christian.couder@gmail.com>,
git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Taylor Blau <me@ttaylorr.com>,
Rick Sanders <rick@sfconservancy.org>,
Git at SFC <git@sfconservancy.org>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>,
Patrick Steinhardt <ps@pks.im>,
Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [PATCH v2] SubmittingPatches: add section about AI
Date: Wed, 8 Oct 2025 10:37:53 +0200
Message-ID: <CAP8UFD34TrBa-GV1wUpvhO9K+qjHpXF4gr=afY2nsXiNL_-S+Q@mail.gmail.com>
In-Reply-To: <aOBMHqLxNd86vgjH@fruit.crustytoothpaste.net>
On Sat, Oct 4, 2025 at 12:20 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2025-10-03 at 20:48:40, Elijah Newren wrote:
> > Would this mean that you wanted to ban contributions like d12166d3c8bb
> > (Merge branch 'en/docfixes', 2023-10-23), available on the list over
> > at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/
> > ? We don't need to go theoretical, I've already contributed such a
> > patch series before -- 2 years ago -- and it was merged. Granted,
> > that was entirely documentation, and I called out the usage of AI in
> > the cover letter, and I manually checked every change (discarding many
> > of them) and split it into commits on my own, could easily explain any
> > change and why it was good, etc. And I was upfront about all of it.
>
> I think the main problem here is that we don't know the copyright
> status of LLM outputs.
It's very unlikely that whatever is decided about the copyright status
of LLM outputs will fundamentally change copyright law. So for example
small changes, or changes where a human has been involved a lot, or
changes that are very specific, and so on, are very likely acceptable.
> It is not uncommon for them to produce output
> that reflects their training input and we see evidence of that in, for
> instance, the New York Times lawsuit against OpenAI.
You might say something very similar about people contributing proprietary code:
"It is not uncommon to have people copy-paste some proprietary code
into an open source project and we see evidence of that in such and
such incidents."
So it's just fine to accept some degree of risk. We have to accept it
anyway. Saying "we will ban everything AI generated" will not make the
risk disappear either.
> As I said, the situation is very unclear legally, with active litigation
> in multiple countries, and we have to comply with pretty much every
> country's laws in this situation. Whether something is legal in the
> United States, where you're located, is completely irrelevant to whether
> it is legal in Canada, where I'm located, or Germany or the UK, where we
> have other contributors. We also have to consider whether it's legal in
> all of the countries that Git is distributed in, which includes every
> country in which Debian has a mirror[0], even countries under
> international sanctions, such as Iran, Russia, and Belarus.
I don't quite agree with this. Theoretically, if the official mirrors
are only in a few countries, then only the laws in those few countries
(plus US law, as the Conservancy is US based) might be really legally
relevant for the project. It is then the responsibility of
distributions or people cloning/downloading the software to check that
it's legal in the countries where they distribute or clone/download it.
In practice we should pay some attention to make sure we don't create
obvious legal problems for too many people, but if some countries
decide to have laws that are too stupid and ban too many things, we
could decide that we should definitely not pay attention to those
laws.
> It doesn't matter if the person using AI has indemnification, either,
> since that only covers civil matters, and at least in the U.S. and
> Canada, knowingly violating copyright is also a criminal offence.
>
> The sign-off process is designed to clearly state that a person has the
> ability to contribute code under the license and I don't think, as
> things stand, it's possible to make that assertion with code or
> documentation generated from an LLM except in very limited
> circumstances.
I think in practice those "very limited circumstances" can cover a lot
of different things though. Do we really want to enter into a legal
debate over what
https://en.wikipedia.org/wiki/Sc%C3%A8nes_%C3%A0_faire means for
software for example? Or about allowing or disallowing translation of
documentation or commit messages based on whether the tools used
for translation use an LLM?
I have given a lot of examples of what is very likely acceptable.
Elijah has given a very good concrete example showing why we should
not outright ban AI too. If you think they are not good examples,
please say so clearly. Otherwise I think you cannot keep saying that
they are related to "very limited circumstances".
> I don't allow LLM-generated code in my personal projects
> that require sign-off for that reason, and neither does QEMU[1]. I
> don't think I could honestly assert either (a) or (b) in the DCO with
> LLM-generated code because it's not clear to me whether "I have the
> right to submit it under the…license."
>
> To quote the QEMU policy:
>
> To satisfy the DCO, the patch contributor has to fully understand the
> copyright and license status of content they are contributing to QEMU. With AI
> content generators, the copyright and license status of the output is
> ill-defined with no generally accepted, settled legal foundation.
>
> Where the training material is known, it is common for it to include large
> volumes of material under restrictive licensing/copyright terms. Even where
> the training material is all known to be under open source licenses, it is
> likely to be under a variety of terms, not all of which will be compatible
> with QEMU's licensing requirements.
The QEMU policy was discussed in the previous version already.
> I remember the SCO situation with Linux and how it really created a lot
> of uncertainty with Linux because SCO created FUD around Linux licensing
> and how that led to the DCO being created. I am aware of the fact that
> many open source contributors are very unhappy that their code has been
> used to train LLMs without retaining credits and copyright notices or
> honouring the license terms[2].
I don't think it's very relevant to your position on this. On the
contrary, if LLMs have been trained mostly with open source code, then
if they produce copyrighted output, that output is more likely to be
compatible with the GPL. It has even been suggested (and discussed in
this thread) that some AIs should be trained only with open source
material (for example MIT licensed material?) so that we could stop
worrying about including it. If that happens, there would be no reason
to outright ban AI generated content, right?
> And I have spent many years working
> with non-profits[3], where I have always been taught that we should
> avoid even the appearance of impropriety.
Adding a section restricting AI use, even if it doesn't go as far as
you would like, is already a first step in the direction you want. If
this gets merged, you can always send patches on top to make it more
restrictive.
> It may matter less what the situation actually ends up being legally
> (although it could end up being quite bad) and more whether someone can
> imply or suggest that Git is not being distributed in compliance with
> the license or contains infringing code, which could effectively make it
> undistributable because nobody wants to take that risk. And litigation,
> even if Git and its contributors are successful, can be extraordinarily
> expensive.
There are already legal risks anyway (see above).