* GNU gnulib: calling for beta-testers
@ 2024-04-21 10:52 Bruno Haible
2024-04-21 11:52 ` Vivien Kraus
2024-04-22 7:56 ` Paul Eggert
0 siblings, 2 replies; 11+ messages in thread
From: Bruno Haible @ 2024-04-21 10:52 UTC (permalink / raw)
To: bug-gnulib
If you are a developer on a package that uses GNU gnulib as part of its build
system:
gnulib-tool has been known for being slow for many years. We have listened to
your complaints. A rewrite of gnulib-tool in another programming language
(Python) is ready for beta-testing. It is between 8 times and 100 times faster
than the original gnulib-tool.
Both implementations should behave identically, that is, produce the same
generated files and the same output. You can help us ensure this, through the
following steps:
1. Make sure you have Python (version 3.7 or newer) installed on your
machine.
2. Update your gnulib checkout. (For some packages, it comes as a git
submodule named 'gnulib'.) Like this:
$ git checkout master
$ git pull
Set the environment variable GNULIB_SRCDIR, pointing to this checkout.
If the package is using a git submodule named 'gnulib', it is also
advisable to do
$ git commit -m 'build: Update gnulib submodule to latest.' gnulib
(as a preparation for step 5, because the --no-git option does not work
as expected in all variants of 'bootstrap').
3. Set an environment variable that enables checking that the two
implementations behave the same:
$ export GNULIB_TOOL_IMPL=sh+py
4. Clean the built files of your package:
$ make -k distclean
5. Regenerate the fetched and generated files of your package. Depending on
the package, this may be a command such as
$ ./bootstrap --no-git --gnulib-srcdir=$GNULIB_SRCDIR
or
$ export GNULIB_SRCDIR; ./autopull.sh; ./autogen.sh
or, if no such script is available:
$ $GNULIB_SRCDIR/gnulib-tool --update
If this fails because of differences between the 'sh' and 'py'
results, please report it to <bug-gnulib@gnu.org>.
6. If this invocation was successful, you can trust the rewritten gnulib-tool
and use it from now on, by setting the environment variable
$ export GNULIB_TOOL_IMPL=py
7. Continue with
$ ./configure
$ make
as usual.
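Steps 3 to 6 above can be combined into a pair of shell functions. This is only a sketch, not something shipped with gnulib: the function names `regen_command` and `gnulib_beta_test` are hypothetical, and it assumes that GNULIB_SRCDIR is already set (step 2), that its value contains no spaces, and that an executable `bootstrap`, or `autopull.sh` plus `autogen.sh`, tells which of the step-5 variants the package uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of gnulib): print the step-5 command for the
# package in the current directory, trying the variants in the order above.
regen_command ()
{
  if test -x ./bootstrap; then
    echo "./bootstrap --no-git --gnulib-srcdir=$GNULIB_SRCDIR"
  elif test -x ./autopull.sh && test -x ./autogen.sh; then
    # Stop if autopull.sh fails instead of running autogen.sh anyway.
    echo "./autopull.sh && ./autogen.sh"
  else
    echo "$GNULIB_SRCDIR/gnulib-tool --update"
  fi
}

# Hypothetical driver for steps 3 to 6; run it from the package's top directory.
gnulib_beta_test ()
{
  if test -z "$GNULIB_SRCDIR"; then
    echo "GNULIB_SRCDIR is not set; point it at your gnulib checkout (step 2)" >&2
    return 1
  fi
  export GNULIB_SRCDIR
  # Step 3: run both implementations and check that they agree.
  GNULIB_TOOL_IMPL=sh+py
  export GNULIB_TOOL_IMPL
  # Step 4: clean the built files; ignore errors, e.g. when nothing is built yet.
  make -k distclean || :
  # Step 5: regenerate the fetched and generated files.
  if sh -c "$(regen_command)"; then
    # Step 6: the results matched, so use the Python implementation from now on.
    GNULIB_TOOL_IMPL=py
    echo "OK: GNULIB_TOOL_IMPL is now set to py"
  else
    echo "Regeneration failed; if the 'sh' and 'py' results differ," \
         "please report it to <bug-gnulib@gnu.org>" >&2
    return 1
  fi
}
```

Typical use: source the functions with `.` (so that the final GNULIB_TOOL_IMPL=py setting stays in your shell), `export GNULIB_SRCDIR=/path/to/gnulib`, then run `gnulib_beta_test && ./configure && make` for step 7.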
And enjoy the speed! The rewritten gnulib-tool was implemented by Dmitry
Selyutin, Collin Funk, and me.
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: GNU gnulib: calling for beta-testers
2024-04-21 10:52 GNU gnulib: calling for beta-testers Bruno Haible
@ 2024-04-21 11:52 ` Vivien Kraus
2024-04-22 7:56 ` Paul Eggert
1 sibling, 0 replies; 11+ messages in thread
From: Vivien Kraus @ 2024-04-21 11:52 UTC (permalink / raw)
To: bug-gnulib
Dear Gnulib developers,
On Sunday, 21 April 2024 at 06:52 -0400, Bruno Haible wrote:
> If you are a developer on a package that uses GNU gnulib as part of its
> build system:
I have a very simple personal project using gnulib.
> 1. Make sure you have Python (version 3.7 or newer) installed on your
> machine.
>
> 2. Update your gnulib checkout. (For some packages, it comes as a git
> submodule named 'gnulib'.)
>
> 3. Set an environment variable that enables checking that the two
> implementations behave the same:
>
> $ export GNULIB_TOOL_IMPL=sh+py
>
>
> 4. Clean the built files of your package
>
> 5. Regenerate the fetched and generated files of your package. Depending on
> the package, this may be a command such as
>
> $ ./bootstrap --no-git --gnulib-srcdir=$GNULIB_SRCDIR
>
> If there is a failure, due to differences between the 'sh' and 'py'
> results, please report it to <bug-gnulib@gnu.org>.
There are no failures.
> The rewritten gnulib-tool was implemented by Dmitry
> Selyutin, Collin Funk, and me.
Great work, thank you.
Best regards,
Vivien
* Re: GNU gnulib: calling for beta-testers
2024-04-21 10:52 GNU gnulib: calling for beta-testers Bruno Haible
2024-04-21 11:52 ` Vivien Kraus
@ 2024-04-22 7:56 ` Paul Eggert
2024-04-22 8:23 ` Collin Funk
1 sibling, 1 reply; 11+ messages in thread
From: Paul Eggert @ 2024-04-22 7:56 UTC (permalink / raw)
To: bug-gnulib
[-- Attachment #1: Type: text/plain, Size: 958 bytes --]
On 2024-04-21 03:52, Bruno Haible wrote:
> 5. Regenerate the fetched and generated files of your package. Depending on
> the package, this may be a command such as
>
> $ ./bootstrap --no-git --gnulib-srcdir=$GNULIB_SRCDIR
I had a failure with this step when using current GNU diffutils
(3d1a56b906c31cc6e89f6a9c008ba54d734d4ec2, which has a gnulib submodule
with Gnulib commit 99ce3a004a2974c71f510f5df5bc6be7e2811d30) with
current Gnulib (5b6e410e04b48c0fd62e954fafa220ef301d2c70) and building
on Ubuntu 23.10 x86-64. Build log attached. To reproduce, clone
diffutils and then:
export GNULIB_TOOL_IMPL=sh+py
./bootstrap
./configure
make -k distclean
git submodule foreach git pull origin master
git commit -m 'build: update gnulib submodule to latest' gnulib
./bootstrap --no-git --gnulib-srcdir=gnulib
The problem is that the Python-based build leaves behind a __pycache__
directory, which causes the comparison to fail.
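The failure mode can be reproduced in miniature. The sketch below uses made-up directory names and builds two trees that differ only in a `__pycache__` directory: a plain recursive `diff -r -q` exits with status 1, which is what makes the sh+py comparison fail, while GNU diff's `--exclude=__pycache__` option (the one Bruno's second patch later in this thread adds to gnulib-tool) makes the trees compare equal.

```shell
#!/bin/sh
# Two made-up trees with identical sources; only "py-result" also holds
# a bytecode cache like the one running gnulib-tool.py leaves behind.
work=$(mktemp -d)
mkdir -p "$work/sh-result/pygnulib" "$work/py-result/pygnulib/__pycache__"
echo 'VALUE = 42' > "$work/sh-result/pygnulib/helper.py"
echo 'VALUE = 42' > "$work/py-result/pygnulib/helper.py"
: > "$work/py-result/pygnulib/__pycache__/helper.cpython-312.pyc"

# Plain recursive comparison: prints "Only in .../py-result/pygnulib: __pycache__"
# and exits with status 1, which is enough to make the sh+py check fail.
plain_status=0
diff -r -q "$work/sh-result" "$work/py-result" || plain_status=$?

# Same comparison, telling GNU diff to skip the cache directory: status 0.
excluded_status=0
diff -r --exclude=__pycache__ -q "$work/sh-result" "$work/py-result" || excluded_status=$?

echo "plain diff: $plain_status, with --exclude=__pycache__: $excluded_status"
```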
[-- Attachment #2: diffutils-log.txt.gz --]
[-- Type: application/gzip, Size: 50952 bytes --]
* Re: GNU gnulib: calling for beta-testers
2024-04-22 7:56 ` Paul Eggert
@ 2024-04-22 8:23 ` Collin Funk
2024-04-22 8:51 ` diffutils __pycache__ failure Collin Funk
2024-04-22 11:22 ` GNU gnulib: calling for beta-testers Bruno Haible
0 siblings, 2 replies; 11+ messages in thread
From: Collin Funk @ 2024-04-22 8:23 UTC (permalink / raw)
To: Paul Eggert, bug-gnulib
Hi Paul,
On 4/22/24 12:56 AM, Paul Eggert wrote:
> export GNULIB_TOOL_IMPL=sh+py
> ./bootstrap
> ./configure
> make -k distclean
> git submodule foreach git pull origin master
> git commit -m 'build: update gnulib submodule to latest' gnulib
> ./bootstrap --no-git --gnulib-srcdir=gnulib
>
> The problem is that the Python-based build leaves behind a __pycache__ directory, which causes the comparison to fail.
I had always noticed that directory in gnulib/pygnulib, but assumed
my LSP or something similar was creating it...
Looking into it now, I think Python creates it when a script imports
a module ('import module-name').
It looks like it can be turned off with 'python3 -B' or setting the
PYTHONDONTWRITEBYTECODE environment variable to a non-empty string [1]
[2].
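This hypothesis can be checked in a scratch directory. In the sketch below (file names are made up; python3 must be on PATH), running a script that imports a sibling module creates `__pycache__` next to that module, whereas `python3 -B` or a non-empty PYTHONDONTWRITEBYTECODE suppresses it. It is the import that writes the cache; the script named on the command line is compiled in memory and not cached.

```shell
#!/bin/sh
# Scratch layout: main.py imports helper.py from the same directory.
demo=$(mktemp -d)
echo 'VALUE = 42' > "$demo/helper.py"
echo 'import helper; print(helper.VALUE)' > "$demo/main.py"

# Run the given command and report whether $demo/__pycache__ appeared.
# Clear variables that would change where (or whether) the cache is written.
run ()
{
  rm -rf "$demo/__pycache__"
  (unset PYTHONPYCACHEPREFIX PYTHONDONTWRITEBYTECODE; "$@" >/dev/null)
  if test -d "$demo/__pycache__"; then echo yes; else echo no; fi
}

default=$(run python3 "$demo/main.py")
with_B=$(run python3 -B "$demo/main.py")
with_env=$(run env PYTHONDONTWRITEBYTECODE=1 python3 "$demo/main.py")

echo "__pycache__ created: default=$default -B=$with_B env=$with_env"
```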
Since I always used a separate gnulib clone that wasn't in a
subdirectory (data caps unfortunately), I never ran into this issue.
Time for me to test my hypothesis and hope I didn't speak too soon. :)
[1] https://docs.python.org/3/using/cmdline.html#cmdoption-B
[2] https://docs.python.org/3/using/cmdline.html#envvar-PYTHONDONTWRITEBYTECODE
Collin
* diffutils __pycache__ failure.
2024-04-22 8:23 ` Collin Funk
@ 2024-04-22 8:51 ` Collin Funk
2024-04-22 11:38 ` Bruno Haible
2024-04-22 11:22 ` GNU gnulib: calling for beta-testers Bruno Haible
1 sibling, 1 reply; 11+ messages in thread
From: Collin Funk @ 2024-04-22 8:51 UTC (permalink / raw)
To: Paul Eggert, bug-gnulib
On 4/22/24 1:23 AM, Collin Funk wrote:
> It looks like it can be turned off with 'python3 -B' or setting the
> PYTHONDONTWRITEBYTECODE environment variable to a non-empty string [1]
> [2].
I was able to reproduce the issue. Modifying the 'gnulib-tool.py'
shell script in the 'gnulib' submodule so that -B is passed to
'python3' fixes it for me.
Leaving it without a ChangeLog entry for now just in case someone has a
better idea. I have no clue if this has a noticeable performance
impact or not.
diff --git a/gnulib-tool.py b/gnulib-tool.py
index cdcd316909..1d45181014 100755
--- a/gnulib-tool.py
+++ b/gnulib-tool.py
@@ -147,4 +147,4 @@
profiler_args=
# For profiling, cf. <https://docs.python.org/3/library/profile.html>.
#profiler_args="-m cProfile -s tottime"
-exec python3 $profiler_args "$gnulib_dir/.gnulib-tool.py" "$@"
+exec python3 -B $profiler_args "$gnulib_dir/.gnulib-tool.py" "$@"
Collin
* Re: GNU gnulib: calling for beta-testers
2024-04-22 8:23 ` Collin Funk
2024-04-22 8:51 ` diffutils __pycache__ failure Collin Funk
@ 2024-04-22 11:22 ` Bruno Haible
2024-04-22 20:00 ` Collin Funk
1 sibling, 1 reply; 11+ messages in thread
From: Bruno Haible @ 2024-04-22 11:22 UTC (permalink / raw)
To: Paul Eggert, bug-gnulib; +Cc: Collin Funk
[-- Attachment #1: Type: text/plain, Size: 1813 bytes --]
Thanks for the report, Paul.
Thanks for the preliminary investigation, Collin.
> > ./bootstrap
> > ./configure
> > make -k distclean
> > git submodule foreach git pull origin master
> > git commit -m 'build: update gnulib submodule to latest' gnulib
> > ./bootstrap --no-git --gnulib-srcdir=gnulib
> >
> > The problem is that the Python-based build leaves behind a __pycache__ directory, which causes the comparison to fail.
I can reproduce the issue. It's because executing gnulib-tool.py creates
gnulib/pygnulib/__pycache__, while gnulib-tool.sh does not do so.
Two workarounds are possible. I'm committing both, since the first
workaround works only with Python ≥ 3.8.
* Let Python create its cache not in gnulib/pygnulib/__pycache__,
but instead in
/tmp/gnulib-python-cache-$USER/<absolute_file_name>/gnulib/pygnulib/ .
* Ignore the __pycache__ directory during the comparison.
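The first workaround relies on the PYTHONPYCACHEPREFIX environment variable (honored by Python ≥ 3.8), which makes Python write its bytecode cache into a parallel tree under the given prefix, mirroring each module's absolute directory, instead of into `__pycache__` next to the module. A sketch in a scratch directory: it prints the default that the patch below computes, but uses a throwaway prefix for the actual run; file names are invented.

```shell
#!/bin/sh
# Requires Python >= 3.8, where PYTHONPYCACHEPREFIX is honored.
src=$(mktemp -d)      # stands in for the gnulib/pygnulib directory
prefix=$(mktemp -d)   # throwaway cache prefix for this demo
echo 'VALUE = 42' > "$src/helper.py"
echo 'import helper' > "$src/main.py"

# The patch's default, shown for reference only (not used below).
default_prefix="${TMPDIR-/tmp}/gnulib-python-cache-${USER-$LOGNAME}"
echo "gnulib-tool.py would default to: $default_prefix"

# Import helper.py with the cache redirected to $prefix.
(
  unset PYTHONDONTWRITEBYTECODE
  PYTHONPYCACHEPREFIX=$prefix python3 "$src/main.py"
)

# No __pycache__ in the source directory; the .pyc file sits under $prefix,
# in a directory tree that mirrors the absolute path of $src.
ls -d "$src/__pycache__" 2>/dev/null || echo "no __pycache__ in $src"
find "$prefix" -name 'helper.*.pyc'
```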
The first workaround should fix trouble similar to what we regularly
see with 'autom4te.cache': unnecessary differences when comparing source
trees, unnecessary "git status" noise, clutter.
2024-04-22 Bruno Haible <bruno@clisp.org>
gnulib-tool: Fix trouble caused by Python's bytecode cache.
Reported by Paul Eggert in
<https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00367.html>.
* gnulib-tool: In sh+py mode, ignore the __pycache__ directory during
comparison.
2024-04-22 Bruno Haible <bruno@clisp.org>
gnulib-tool.py: Fix trouble caused by Python's bytecode cache.
Reported by Paul Eggert in
<https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00367.html>.
* gnulib-tool.py: Set PYTHONPYCACHEPREFIX, so as to avoid creating a
__pycache__ directory in the developer's gnulib checkout (only effective
with Python ≥ 3.8).
[-- Attachment #2: 0001-gnulib-tool.py-Fix-trouble-caused-by-Python-s-byteco.patch --]
[-- Type: text/x-patch, Size: 1951 bytes --]
From eda62139d838f53e4953db26019e5a4b8b805847 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Mon, 22 Apr 2024 13:11:05 +0200
Subject: [PATCH 1/2] gnulib-tool.py: Fix trouble caused by Python's bytecode
cache.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Reported by Paul Eggert in
<https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00367.html>.
* gnulib-tool.py: Set PYTHONPYCACHEPREFIX, so as to avoid creating a
__pycache__ directory in the developer's gnulib checkout (only effective
with Python ≥ 3.8).
---
ChangeLog | 9 +++++++++
gnulib-tool.py | 6 ++++++
2 files changed, 15 insertions(+)
diff --git a/ChangeLog b/ChangeLog
index b3cef64936..4a272d326e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2024-04-22 Bruno Haible <bruno@clisp.org>
+
+ gnulib-tool.py: Fix trouble caused by Python's bytecode cache.
+ Reported by Paul Eggert in
+ <https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00367.html>.
+ * gnulib-tool.py: Set PYTHONPYCACHEPREFIX, so as to avoid creating a
+ __pycache__ directory in the developer's gnulib checkout (only effective
+ with Python ≥ 3.8).
+
2024-04-21 Collin Funk <collin.funk1@gmail.com>
gnulib-tool.py: Make temporary directories recognizable.
diff --git a/gnulib-tool.py b/gnulib-tool.py
index cdcd316909..81537c272c 100755
--- a/gnulib-tool.py
+++ b/gnulib-tool.py
@@ -144,6 +144,12 @@
func_fatal_error "python3 not found; try setting GNULIB_TOOL_IMPL=sh"
fi
+# Tell Python to store the compiled bytecode outside the gnulib directory.
+if test -z "$PYTHONPYCACHEPREFIX"; then
+ PYTHONPYCACHEPREFIX="${TMPDIR-/tmp}/gnulib-python-cache-${USER-$LOGNAME}"
+ export PYTHONPYCACHEPREFIX
+fi
+
profiler_args=
# For profiling, cf. <https://docs.python.org/3/library/profile.html>.
#profiler_args="-m cProfile -s tottime"
--
2.34.1
[-- Attachment #3: 0002-gnulib-tool-Fix-trouble-caused-by-Python-s-bytecode-.patch --]
[-- Type: text/x-patch, Size: 1609 bytes --]
From ab5390ae6d8db323420874d1c1334feb77af9cb1 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Mon, 22 Apr 2024 13:12:35 +0200
Subject: [PATCH 2/2] gnulib-tool: Fix trouble caused by Python's bytecode
cache.
Reported by Paul Eggert in
<https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00367.html>.
* gnulib-tool: In sh+py mode, ignore the __pycache__ directory during
comparison.
---
ChangeLog | 8 ++++++++
gnulib-tool | 2 +-
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/ChangeLog b/ChangeLog
index 4a272d326e..462823888d 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2024-04-22 Bruno Haible <bruno@clisp.org>
+
+ gnulib-tool: Fix trouble caused by Python's bytecode cache.
+ Reported by Paul Eggert in
+ <https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00367.html>.
+ * gnulib-tool: In sh+py mode, ignore the __pycache__ directory during
+ comparison.
+
2024-04-22 Bruno Haible <bruno@clisp.org>
gnulib-tool.py: Fix trouble caused by Python's bytecode cache.
diff --git a/gnulib-tool b/gnulib-tool
index 6d430e56e6..85b62883c6 100755
--- a/gnulib-tool
+++ b/gnulib-tool
@@ -199,7 +199,7 @@ case "$GNULIB_TOOL_IMPL" in
else
diff_options=
fi
- diff -r $diff_options -q . "$tmp" >/dev/null ||
+ diff -r $diff_options --exclude=__pycache__ -q . "$tmp" >/dev/null ||
func_fatal_error "gnulib-tool.py produced different files than gnulib-tool.sh! Compare `pwd` and $tmp."
# Compare the two outputs.
diff -q "$tmp-sh-out" "$tmp-py-out" >/dev/null ||
--
2.34.1
* Re: diffutils __pycache__ failure.
2024-04-22 8:51 ` diffutils __pycache__ failure Collin Funk
@ 2024-04-22 11:38 ` Bruno Haible
2024-04-22 19:44 ` Collin Funk
0 siblings, 1 reply; 11+ messages in thread
From: Bruno Haible @ 2024-04-22 11:38 UTC (permalink / raw)
To: Paul Eggert, bug-gnulib; +Cc: Collin Funk
Collin Funk wrote:
> I have no clue if this has a noticeable performance impact or not.
Can you measure it, please? For example, with
GNULIB_TOOL_IMPL=py time ./test-all.sh
I measure a difference in the 2% range, but it's not clear to me whether
-B slows down or speeds up things :)
Bruno
* Re: diffutils __pycache__ failure.
2024-04-22 11:38 ` Bruno Haible
@ 2024-04-22 19:44 ` Collin Funk
2024-04-22 20:55 ` Bruno Haible
0 siblings, 1 reply; 11+ messages in thread
From: Collin Funk @ 2024-04-22 19:44 UTC (permalink / raw)
To: Bruno Haible, Paul Eggert, bug-gnulib
On 4/22/24 4:38 AM, Bruno Haible wrote:
> Collin Funk wrote:
>> I have no clue if this has a noticeable performance impact or not.
>
> Can you measure it, please? For example, with
> GNULIB_TOOL_IMPL=py time ./test-all.sh
>
> I measure a difference in the 2% range, but it's not clear to me whether
> -B slows down or speeds up things :)
Sure, here are the results with and without the -B flag. Before the -B
runs, I removed the __pycache__ directory to make sure it wasn't
read.
Using 'env GNULIB_TOOL_IMPL=py ./test-all.sh' in import-tests:
no -B flag: 0m16.699s
-B flag: 0m20.892s
Using 'env GNULIB_TOOL_IMPL=py ./test-all.sh' in create-tests:
no -B flag: 2m45.046s
-B flag: 2m46.674s
The create-tests spend most of their time in autoconf and friends if I
remember correctly.
The import tests feel noticeably slower with -B to me. But the test is
imperfect, of course: a single run, maybe Firefox was working very hard
during one test and not the other, etc. :)
Collin
* Re: GNU gnulib: calling for beta-testers
2024-04-22 11:22 ` GNU gnulib: calling for beta-testers Bruno Haible
@ 2024-04-22 20:00 ` Collin Funk
2024-04-22 20:56 ` Bruno Haible
0 siblings, 1 reply; 11+ messages in thread
From: Collin Funk @ 2024-04-22 20:00 UTC (permalink / raw)
To: Bruno Haible, Paul Eggert, bug-gnulib
On 4/22/24 4:22 AM, Bruno Haible wrote:
> The first workaround should fix trouble similar to what we regularly
> see with 'autom4te.cache': Unnecessary difference while comparing source
> trees, unnecessary "git status" noise. Clutter.
I don't think the Python stuff should clutter 'git status', at least.
$ cat pygnulib/.gitignore
*.pyc
Unless Python creates other files in there.
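The observation about `git status` can be verified in a throwaway repository. In this sketch (requires git; file names are invented, and the `.pyc` file is an empty stand-in for real bytecode), a committed `pygnulib/.gitignore` with the pattern `*.pyc` keeps the cache out of `git status`, because git does not list an untracked directory whose files are all ignored. `git check-ignore -v` reports which rule matched.

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir -p pygnulib
echo 'VALUE = 42' > pygnulib/helper.py
printf '*.pyc\n' > pygnulib/.gitignore
git add pygnulib
git -c user.name=Demo -c user.email=demo@example.com commit -q -m init

# Simulate what an import leaves behind.
mkdir -p pygnulib/__pycache__
: > pygnulib/__pycache__/helper.cpython-312.pyc

# Prints nothing: the only new file is ignored via pygnulib/.gitignore.
git status --porcelain
# Prints the .gitignore file, line number and pattern that matched.
git check-ignore -v pygnulib/__pycache__/helper.cpython-312.pyc
```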
Collin
* Re: diffutils __pycache__ failure.
2024-04-22 19:44 ` Collin Funk
@ 2024-04-22 20:55 ` Bruno Haible
0 siblings, 0 replies; 11+ messages in thread
From: Bruno Haible @ 2024-04-22 20:55 UTC (permalink / raw)
To: Paul Eggert, bug-gnulib, Collin Funk
Collin Funk wrote:
> >> I have no clue if this has a noticeable performance impact or not.
> >
> > Can you measure it, please? For example, with
> > GNULIB_TOOL_IMPL=py time ./test-all.sh
> >
> > I measure a difference in the 2% range, but it's not clear to me whether
> > -B slows down or speeds up things :)
>
> Sure, here are the results with and without the -B flag. Before the -B
> runs, I removed the __pycache__ directory to make sure it wasn't
> read.
>
> Using 'env GNULIB_TOOL_IMPL=py ./test-all.sh' in import-tests:
>
> no -B flag: 0m16.699s
> -B flag: 0m20.892s
>
> Using 'env GNULIB_TOOL_IMPL=py ./test-all.sh' in create-tests:
>
> no -B flag: 2m45.046s
> -B flag: 2m46.674s
Thanks for measuring it. So, the -B flag causes a slowdown.
> The create-tests spend most of their time in autoconf and friends if I
> remember correctly.
>
> The import tests feel noticeably slower with -B to me.
This is explained by the fact that the import tests do nearly 100
gnulib-tool invocations: the same compilation of the Python sources to
bytecode must happen in memory 100 times. This explains the 4 seconds of
slowdown.
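As a rough check of this explanation, using Collin's import-tests timings and rounding "nearly 100" invocations to exactly 100 (an approximation), the extra cost of recompiling without the cache comes out at about 42 ms per gnulib-tool invocation:

```shell
#!/bin/sh
# Collin's import-tests timings, in seconds (0m16.699s and 0m20.892s).
no_B=16.699
with_B=20.892
invocations=100   # "nearly 100" gnulib-tool runs, rounded for the estimate

# awk does the floating-point arithmetic that POSIX sh lacks.
slowdown=$(awk -v a="$no_B" -v b="$with_B" 'BEGIN { printf "%.3f", b - a }')
per_call_ms=$(awk -v s="$slowdown" -v n="$invocations" \
  'BEGIN { printf "%.1f", s / n * 1000 }')

echo "total slowdown: ${slowdown} s, about ${per_call_ms} ms per invocation"
```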
Bruno
* Re: GNU gnulib: calling for beta-testers
2024-04-22 20:00 ` Collin Funk
@ 2024-04-22 20:56 ` Bruno Haible
0 siblings, 0 replies; 11+ messages in thread
From: Bruno Haible @ 2024-04-22 20:56 UTC (permalink / raw)
To: Paul Eggert, bug-gnulib, Collin Funk
Collin Funk wrote:
> > The first workaround should fix trouble similar to what we regularly
> > see with 'autom4te.cache': Unnecessary difference while comparing source
> > trees, unnecessary "git status" noise. Clutter.
>
> I don't think the Python stuff should clutter 'git status', at least.
>
> $ cat pygnulib/.gitignore
> *.pyc
OK, good. So, it would not have produced unnecessary "git status" noise.
Still, it showed up in the recursive diff. My first workaround fixes that.
Bruno
Thread overview: 11+ messages
2024-04-21 10:52 GNU gnulib: calling for beta-testers Bruno Haible
2024-04-21 11:52 ` Vivien Kraus
2024-04-22 7:56 ` Paul Eggert
2024-04-22 8:23 ` Collin Funk
2024-04-22 8:51 ` diffutils __pycache__ failure Collin Funk
2024-04-22 11:38 ` Bruno Haible
2024-04-22 19:44 ` Collin Funk
2024-04-22 20:55 ` Bruno Haible
2024-04-22 11:22 ` GNU gnulib: calling for beta-testers Bruno Haible
2024-04-22 20:00 ` Collin Funk
2024-04-22 20:56 ` Bruno Haible