bug-gnulib@gnu.org mirror (unofficial)
* GNU gnulib: calling for beta-testers
@ 2024-04-21 10:52 Bruno Haible
  2024-04-21 11:52 ` Vivien Kraus
  2024-04-22  7:56 ` Paul Eggert
  0 siblings, 2 replies; 11+ messages in thread
From: Bruno Haible @ 2024-04-21 10:52 UTC (permalink / raw)
  To: bug-gnulib

If you are a developer on a package that uses GNU gnulib as part of its build
system:

gnulib-tool has been known for being slow for many years. We have listened to
your complaints. A rewrite of gnulib-tool in another programming language
(Python) is ready for beta-testing. It is between 8 times and 100 times faster
than the original gnulib-tool.

Both implementations should behave identically, that is, produce the same
generated files and the same output. You can help us ensure this, through the
following steps:

1. Make sure you have Python (version 3.7 or newer) installed on your
machine.

2. Update your gnulib checkout. (For some packages, it comes as a git
submodule named 'gnulib'.) Like this:

  $ git checkout master
  $ git pull

     Set the environment variable GNULIB_SRCDIR, pointing to this checkout.

     If the package is using a git submodule named 'gnulib', it is also
advisable to do

  $ git commit -m 'build: Update gnulib submodule to latest.' gnulib

     (as a preparation for step 5, because the --no-git option does not work
as expected in all variants of 'bootstrap').

3. Set an environment variable that enables checking that the two
implementations behave the same:

  $ export GNULIB_TOOL_IMPL=sh+py


4. Clean the built files of your package:

  $ make -k distclean


5. Regenerate the fetched and generated files of your package. Depending on
the package, this may be a command such as

  $ ./bootstrap --no-git --gnulib-srcdir=$GNULIB_SRCDIR

     or

  $ export GNULIB_SRCDIR; ./autopull.sh; ./autogen.sh

     or, if no such script is available:

  $ $GNULIB_SRCDIR/gnulib-tool --update

     If there is a failure, due to differences between the 'sh' and 'py'
results, please report it to <bug-gnulib@gnu.org>.

6. If this invocation was successful, you can trust the rewritten gnulib-tool
and use it from now on, by setting the environment variable

  $ export GNULIB_TOOL_IMPL=py


7. Continue with

  $ ./configure
  $ make

     as usual.
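
The choice among the three commands in step 5 can be sketched as a small
shell helper. This is only an illustration: the function name
pick_regen_command is hypothetical and not part of gnulib, and a given
package may use yet another regeneration script.

```sh
# Hypothetical helper (not part of gnulib): print the regeneration command
# from step 5 that matches the scripts present in the package directory $1.
# The commands are printed literally, so $GNULIB_SRCDIR is not expanded here.
pick_regen_command () {
  pkgdir=$1
  if test -f "$pkgdir/bootstrap"; then
    echo './bootstrap --no-git --gnulib-srcdir=$GNULIB_SRCDIR'
  elif test -f "$pkgdir/autopull.sh" && test -f "$pkgdir/autogen.sh"; then
    echo 'export GNULIB_SRCDIR; ./autopull.sh; ./autogen.sh'
  else
    # No wrapper script available: fall back to gnulib-tool itself.
    echo '$GNULIB_SRCDIR/gnulib-tool --update'
  fi
}
```

Running 'pick_regen_command .' in the package's top-level directory prints
the command, so that it can be inspected before being run by hand.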

And enjoy the speed! The rewritten gnulib-tool was implemented by Dmitry
Selyutin, Collin Funk, and me.



_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/




* Re: GNU gnulib: calling for beta-testers
  2024-04-21 10:52 GNU gnulib: calling for beta-testers Bruno Haible
@ 2024-04-21 11:52 ` Vivien Kraus
  2024-04-22  7:56 ` Paul Eggert
  1 sibling, 0 replies; 11+ messages in thread
From: Vivien Kraus @ 2024-04-21 11:52 UTC (permalink / raw)
  To: bug-gnulib

Dear Gnulib developers,

On Sunday, 21 April 2024 at 06:52 -0400, Bruno Haible wrote:
> If you are a developer on a package that uses GNU gnulib as part of its
> build
> system:

I have a very simple personal project using gnulib.

> 1. Make sure you have Python (version 3.7 or newer) installed on your
> machine.
> 
> 2. Update your gnulib checkout. (For some packages, it comes as a git
> submodule named 'gnulib'.)
> 
> 3. Set an environment variable that enables checking that the two
> implementations behave the same:
> 
>   $ export GNULIB_TOOL_IMPL=sh+py
> 
> 
> 4. Clean the built files of your package
> 
> 5. Regenerate the fetched and generated files of your package.
> Depending on
> the package, this may be a command such as
> 
>   $ ./bootstrap --no-git --gnulib-srcdir=$GNULIB_SRCDIR
> 
>      If there is a failure, due to differences between the 'sh' and
> 'py'
> results, please report it to <bug-gnulib@gnu.org>.

There are no failures.

> The rewritten gnulib-tool was implemented by Dmitry
> Selyutin, Collin Funk, and me.

You worked well, thank you.

Best regards,

Vivien



* Re: GNU gnulib: calling for beta-testers
  2024-04-21 10:52 GNU gnulib: calling for beta-testers Bruno Haible
  2024-04-21 11:52 ` Vivien Kraus
@ 2024-04-22  7:56 ` Paul Eggert
  2024-04-22  8:23   ` Collin Funk
  1 sibling, 1 reply; 11+ messages in thread
From: Paul Eggert @ 2024-04-22  7:56 UTC (permalink / raw)
  To: bug-gnulib

[-- Attachment #1: Type: text/plain, Size: 958 bytes --]

On 2024-04-21 03:52, Bruno Haible wrote:

> 5. Regenerate the fetched and generated files of your package. Depending on
> the package, this may be a command such as
> 
>    $ ./bootstrap --no-git --gnulib-srcdir=$GNULIB_SRCDIR

I had a failure with this step when using current GNU diffutils 
(3d1a56b906c31cc6e89f6a9c008ba54d734d4ec2, which has a gnulib submodule 
with Gnulib commit 99ce3a004a2974c71f510f5df5bc6be7e2811d30) with 
current Gnulib (5b6e410e04b48c0fd62e954fafa220ef301d2c70) and building 
on Ubuntu 23.10 x86-64. Build log attached. To reproduce, clone 
diffutils and then:

   export GNULIB_TOOL_IMPL=sh+py
   ./bootstrap
   ./configure
   make -k distclean
   git submodule foreach git pull origin master
   git commit -m 'build: update gnulib submodule to latest' gnulib
   ./bootstrap --no-git --gnulib-srcdir=gnulib

The problem is that the Python-based build leaves behind a __pycache__ 
directory, which causes the comparison to fail.

[-- Attachment #2: diffutils-log.txt.gz --]
[-- Type: application/gzip, Size: 50952 bytes --]


* Re: GNU gnulib: calling for beta-testers
  2024-04-22  7:56 ` Paul Eggert
@ 2024-04-22  8:23   ` Collin Funk
  2024-04-22  8:51     ` diffutils __pycache__ failure Collin Funk
  2024-04-22 11:22     ` GNU gnulib: calling for beta-testers Bruno Haible
  0 siblings, 2 replies; 11+ messages in thread
From: Collin Funk @ 2024-04-22  8:23 UTC (permalink / raw)
  To: Paul Eggert, bug-gnulib

Hi Paul,

On 4/22/24 12:56 AM, Paul Eggert wrote:
>   export GNULIB_TOOL_IMPL=sh+py
>   ./bootstrap
>   ./configure
>   make -k distclean
>   git submodule foreach git pull origin master
>   git commit -m 'build: update gnulib submodule to latest' gnulib
>   ./bootstrap --no-git --gnulib-srcdir=gnulib
> 
> The problem is that the Python-based build leaves behind a __pycache__ directory, which causes the comparison to fail.

I always noticed that directory in gnulib/pygnulib. I assumed
it was my LSP or something causing it...

Now looking into this, I think Python creates it upon executing a
script and/or doing 'import module-name'.

It looks like it can be turned off with 'python3 -B' or setting the
PYTHONDONTWRITEBYTECODE environment variable to a non-empty string [1]
[2].
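
The behavior described above can be checked with a small shell sketch. It
is only an illustration: it assumes 'python3' is on PATH, and the names
leaves_pycache, helper.py and main.py are made up for this example.

```sh
# Return success if running a script that imports a sibling module leaves
# a __pycache__ directory next to it. "$@" holds extra python3 options.
leaves_pycache () {
  demo_dir=$(mktemp -d) || return 2
  printf 'VALUE = 1\n' > "$demo_dir/helper.py"
  printf 'import helper\n' > "$demo_dir/main.py"
  # Run in a subshell with the cache-related variables unset, so that the
  # caller's environment does not influence the result.
  ( unset PYTHONDONTWRITEBYTECODE PYTHONPYCACHEPREFIX
    python3 "$@" "$demo_dir/main.py" )
  test -d "$demo_dir/__pycache__"
  demo_result=$?
  rm -rf "$demo_dir"
  return $demo_result
}
```

With this, 'leaves_pycache' should succeed and 'leaves_pycache -B' should
fail, since -B tells Python not to write the bytecode files.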

Since I always used a separate gnulib clone that wasn't in a
subdirectory (data caps unfortunately), I never ran into this issue.

Time for me to test my hypothesis and hope I didn't speak too soon. :)

[1] https://docs.python.org/3/using/cmdline.html#cmdoption-B
[2] https://docs.python.org/3/using/cmdline.html#envvar-PYTHONDONTWRITEBYTECODE

Collin



* diffutils __pycache__ failure.
  2024-04-22  8:23   ` Collin Funk
@ 2024-04-22  8:51     ` Collin Funk
  2024-04-22 11:38       ` Bruno Haible
  2024-04-22 11:22     ` GNU gnulib: calling for beta-testers Bruno Haible
  1 sibling, 1 reply; 11+ messages in thread
From: Collin Funk @ 2024-04-22  8:51 UTC (permalink / raw)
  To: Paul Eggert, bug-gnulib

On 4/22/24 1:23 AM, Collin Funk wrote:
> It looks like it can be turned off with 'python3 -B' or setting the
> PYTHONDONTWRITEBYTECODE environment variable to a non-empty string [1]
> [2].

I was able to reproduce the issue. Modifying the 'gnulib-tool.py'
shell script in the 'gnulib' submodule so that -B is passed to
'python3' fixes it for me.

Leaving it without a ChangeLog entry for now just in case someone has a
better idea. I have no clue if this has a noticeable performance
impact or not.

diff --git a/gnulib-tool.py b/gnulib-tool.py
index cdcd316909..1d45181014 100755
--- a/gnulib-tool.py
+++ b/gnulib-tool.py
@@ -147,4 +147,4 @@
 profiler_args=
 # For profiling, cf. <https://docs.python.org/3/library/profile.html>.
 #profiler_args="-m cProfile -s tottime"
-exec python3 $profiler_args "$gnulib_dir/.gnulib-tool.py" "$@"
+exec python3 -B $profiler_args "$gnulib_dir/.gnulib-tool.py" "$@"

Collin



* Re: GNU gnulib: calling for beta-testers
  2024-04-22  8:23   ` Collin Funk
  2024-04-22  8:51     ` diffutils __pycache__ failure Collin Funk
@ 2024-04-22 11:22     ` Bruno Haible
  2024-04-22 20:00       ` Collin Funk
  1 sibling, 1 reply; 11+ messages in thread
From: Bruno Haible @ 2024-04-22 11:22 UTC (permalink / raw)
  To: Paul Eggert, bug-gnulib; +Cc: Collin Funk

[-- Attachment #1: Type: text/plain, Size: 1813 bytes --]

Thanks for the report, Paul.
Thanks for the preliminary investigation, Collin.

> >   ./bootstrap
> >   ./configure
> >   make -k distclean
> >   git submodule foreach git pull origin master
> >   git commit -m 'build: update gnulib submodule to latest' gnulib
> >   ./bootstrap --no-git --gnulib-srcdir=gnulib
> > 
> > The problem is that the Python-based build leaves behind a __pycache__ directory, which causes the comparison to fail.

I can reproduce the issue. It's because executing gnulib-tool.py creates
gnulib/pygnulib/__pycache__, while gnulib-tool.sh does not do so.

Two workarounds are possible. I'm committing both, since the first
workaround works only with Python ≥ 3.8.
  * Let Python create its cache not in gnulib/pygnulib/__pycache__,
    but instead in
    /tmp/gnulib-python-cache-$USER/<absolute_file_name>/gnulib/pygnulib/ .
  * Ignore the __pycache__ directory during the comparison.

The first workaround should fix trouble similar to what we regularly
see with 'autom4te.cache': Unnecessary difference while comparing source
trees, unnecessary "git status" noise. Clutter.
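
The effect of the second workaround can be sketched in shell as follows.
This is an illustration, not the gnulib-tool code: the function name
trees_match and the directory names are made up, an empty __pycache__
directory stands in for what a Python run leaves behind, and the --exclude
option assumes GNU diff.

```sh
# Return success if the two directory trees have identical contents.
# $1, $2: directories to compare; $3: optional extra diff options
# (left unquoted on purpose, so that an empty value expands to nothing).
trees_match () {
  diff -r $3 -q "$1" "$2" >/dev/null 2>&1
}

sh_tree=$(mktemp -d)
py_tree=$(mktemp -d)
echo 'int x;' > "$sh_tree/file.c"
echo 'int x;' > "$py_tree/file.c"
mkdir "$py_tree/__pycache__"   # stand-in for Python's bytecode cache

trees_match "$sh_tree" "$py_tree" \
  || echo "plain comparison: trees differ"
trees_match "$sh_tree" "$py_tree" --exclude=__pycache__ \
  && echo "with --exclude: trees match"
```

The plain recursive comparison fails only because of the extra directory;
once __pycache__ is excluded, the otherwise identical trees compare equal.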


2024-04-22  Bruno Haible  <bruno@clisp.org>

	gnulib-tool: Fix trouble caused by Python's bytecode cache.
	Reported by Paul Eggert in
	<https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00367.html>.
	* gnulib-tool: In sh+py mode, ignore the __pycache__ directory during
	comparison.

2024-04-22  Bruno Haible  <bruno@clisp.org>

	gnulib-tool.py: Fix trouble caused by Python's bytecode cache.
	Reported by Paul Eggert in
	<https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00367.html>.
	* gnulib-tool.py: Set PYTHONPYCACHEPREFIX, so as to avoid creating a
	__pycache__ directory in the developer's gnulib checkout (only effective
	with Python ≥ 3.8).


[-- Attachment #2: 0001-gnulib-tool.py-Fix-trouble-caused-by-Python-s-byteco.patch --]
[-- Type: text/x-patch, Size: 1951 bytes --]

From eda62139d838f53e4953db26019e5a4b8b805847 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Mon, 22 Apr 2024 13:11:05 +0200
Subject: [PATCH 1/2] gnulib-tool.py: Fix trouble caused by Python's bytecode
 cache.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reported by Paul Eggert in
<https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00367.html>.

* gnulib-tool.py: Set PYTHONPYCACHEPREFIX, so as to avoid creating a
__pycache__ directory in the developer's gnulib checkout (only effective
with Python ≥ 3.8).
---
 ChangeLog      | 9 +++++++++
 gnulib-tool.py | 6 ++++++
 2 files changed, 15 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index b3cef64936..4a272d326e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2024-04-22  Bruno Haible  <bruno@clisp.org>
+
+	gnulib-tool.py: Fix trouble caused by Python's bytecode cache.
+	Reported by Paul Eggert in
+	<https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00367.html>.
+	* gnulib-tool.py: Set PYTHONPYCACHEPREFIX, so as to avoid creating a
+	__pycache__ directory in the developer's gnulib checkout (only effective
+	with Python ≥ 3.8).
+
 2024-04-21  Collin Funk  <collin.funk1@gmail.com>
 
 	gnulib-tool.py: Make temporary directories recognizable.
diff --git a/gnulib-tool.py b/gnulib-tool.py
index cdcd316909..81537c272c 100755
--- a/gnulib-tool.py
+++ b/gnulib-tool.py
@@ -144,6 +144,12 @@
   func_fatal_error "python3 not found; try setting GNULIB_TOOL_IMPL=sh"
 fi
 
+# Tell Python to store the compiled bytecode outside the gnulib directory.
+if test -z "$PYTHONPYCACHEPREFIX"; then
+  PYTHONPYCACHEPREFIX="${TMPDIR-/tmp}/gnulib-python-cache-${USER-$LOGNAME}"
+  export PYTHONPYCACHEPREFIX
+fi
+
 profiler_args=
 # For profiling, cf. <https://docs.python.org/3/library/profile.html>.
 #profiler_args="-m cProfile -s tottime"
-- 
2.34.1


[-- Attachment #3: 0002-gnulib-tool-Fix-trouble-caused-by-Python-s-bytecode-.patch --]
[-- Type: text/x-patch, Size: 1609 bytes --]

From ab5390ae6d8db323420874d1c1334feb77af9cb1 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Mon, 22 Apr 2024 13:12:35 +0200
Subject: [PATCH 2/2] gnulib-tool: Fix trouble caused by Python's bytecode
 cache.

Reported by Paul Eggert in
<https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00367.html>.

* gnulib-tool: In sh+py mode, ignore the __pycache__ directory during
comparison.
---
 ChangeLog   | 8 ++++++++
 gnulib-tool | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 4a272d326e..462823888d 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2024-04-22  Bruno Haible  <bruno@clisp.org>
+
+	gnulib-tool: Fix trouble caused by Python's bytecode cache.
+	Reported by Paul Eggert in
+	<https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00367.html>.
+	* gnulib-tool: In sh+py mode, ignore the __pycache__ directory during
+	comparison.
+
 2024-04-22  Bruno Haible  <bruno@clisp.org>
 
 	gnulib-tool.py: Fix trouble caused by Python's bytecode cache.
diff --git a/gnulib-tool b/gnulib-tool
index 6d430e56e6..85b62883c6 100755
--- a/gnulib-tool
+++ b/gnulib-tool
@@ -199,7 +199,7 @@ case "$GNULIB_TOOL_IMPL" in
         else
           diff_options=
         fi
-        diff -r $diff_options -q . "$tmp" >/dev/null ||
+        diff -r $diff_options --exclude=__pycache__ -q . "$tmp" >/dev/null ||
           func_fatal_error "gnulib-tool.py produced different files than gnulib-tool.sh! Compare `pwd` and $tmp."
         # Compare the two outputs.
         diff -q "$tmp-sh-out" "$tmp-py-out" >/dev/null ||
-- 
2.34.1



* Re: diffutils __pycache__ failure.
  2024-04-22  8:51     ` diffutils __pycache__ failure Collin Funk
@ 2024-04-22 11:38       ` Bruno Haible
  2024-04-22 19:44         ` Collin Funk
  0 siblings, 1 reply; 11+ messages in thread
From: Bruno Haible @ 2024-04-22 11:38 UTC (permalink / raw)
  To: Paul Eggert, bug-gnulib; +Cc: Collin Funk

Collin Funk wrote:
> I have no clue if this has a noticeable performance impact or not.

Can you measure it, please? For example, with
  GNULIB_TOOL_IMPL=py time ./test-all.sh

I measure a difference in the 2% range, but it's not clear to me whether
-B slows down or speeds up things :)

Bruno






* Re: diffutils __pycache__ failure.
  2024-04-22 11:38       ` Bruno Haible
@ 2024-04-22 19:44         ` Collin Funk
  2024-04-22 20:55           ` Bruno Haible
  0 siblings, 1 reply; 11+ messages in thread
From: Collin Funk @ 2024-04-22 19:44 UTC (permalink / raw)
  To: Bruno Haible, Paul Eggert, bug-gnulib

On 4/22/24 4:38 AM, Bruno Haible wrote:
> Collin Funk wrote:
>> I have no clue if this has a noticeable performance impact or not.
> 
> Can you measure it, please? For example, with
>   GNULIB_TOOL_IMPL=py time ./test-all.sh
> 
> I measure a difference in the 2% range, but it's not clear to me whether
> -B slows down or speeds up things :)

Sure, here are the results using the -B flag. I'm removing the
__pycache__ directory before using the -B flag to make sure it doesn't
get read.

Using 'env GNULIB_TOOL_IMPL=py ./test-all.sh' in import-tests:

      no -B flag: 0m16.699s
      -B flag: 0m20.892s

Using 'env GNULIB_TOOL_IMPL=py ./test-all.sh' in create-tests:

      no -B flag: 2m45.046s
      -B flag: 2m46.674s

The create-tests spend most of their time in autoconf and friends if I
remember correctly.

The import tests feel noticeably slower with -B to me. But the test is
imperfect of course. 1 run, maybe Firefox was working very hard for
one test and not the other, etc. :)

Collin



* Re: GNU gnulib: calling for beta-testers
  2024-04-22 11:22     ` GNU gnulib: calling for beta-testers Bruno Haible
@ 2024-04-22 20:00       ` Collin Funk
  2024-04-22 20:56         ` Bruno Haible
  0 siblings, 1 reply; 11+ messages in thread
From: Collin Funk @ 2024-04-22 20:00 UTC (permalink / raw)
  To: Bruno Haible, Paul Eggert, bug-gnulib

On 4/22/24 4:22 AM, Bruno Haible wrote:
> The first workaround should fix trouble similar to what we regularly
> see with 'autom4te.cache': Unnecessary difference while comparing source
> trees, unnecessary "git status" noise. Clutter.

I don't think the Python stuff should clutter 'git status' at least.

$ cat pygnulib/.gitignore 
*.pyc

Unless Python creates other files in there.

Collin



* Re: diffutils __pycache__ failure.
  2024-04-22 19:44         ` Collin Funk
@ 2024-04-22 20:55           ` Bruno Haible
  0 siblings, 0 replies; 11+ messages in thread
From: Bruno Haible @ 2024-04-22 20:55 UTC (permalink / raw)
  To: Paul Eggert, bug-gnulib, Collin Funk

Collin Funk wrote:
> >> I have no clue if this has a noticeable performance impact or not.
> > 
> > Can you measure it, please? For example, with
> >   GNULIB_TOOL_IMPL=py time ./test-all.sh
> > 
> > I measure a difference in the 2% range, but it's not clear to me whether
> > -B slows down or speeds up things :)
> 
> Sure, here are the results using the -B flag. I'm removing the
> __pycache__ directory before using the -B flag to make sure it doesn't
> get read.
> 
> Using 'env GNULIB_TOOL_IMPL=py ./test-all.sh' in import-tests:
> 
>       no -B flag: 0m16.699s
>       -B flag: 0m20.892s
> 
> Using 'env GNULIB_TOOL_IMPL=py ./test-all.sh' in create-tests:
> 
>       no -B flag: 2m45.046s
>       -B flag: 2m46.674s

Thanks for measuring it. So, the -B flag causes a slowdown.

> The create-tests spend most of their time in autoconf and friends if I
> remember correctly.
> 
> The import tests feel noticeably slower with -B to me.

This is explained by the fact that the import tests do nearly 100
gnulib-tool invocations: With -B, no bytecode is cached on disk, so the
same compilation of the Python sources to bytecode must happen in memory
in each of these invocations. This explains the 4 seconds of slowdown.
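
The arithmetic behind these numbers can be sketched in shell with awk. The
function names to_seconds and slowdown_percent are made up for this
illustration, and the inputs are the single-run timings quoted above, so
the results are only as reliable as those measurements.

```sh
# Convert a 'time'-style duration such as 2m45.046s to seconds.
to_seconds () {
  echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print the slowdown of duration $2 relative to duration $1, in percent.
slowdown_percent () {
  awk -v base="$(to_seconds "$1")" -v flagged="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", (flagged - base) / base * 100 }'
}

slowdown_percent 0m16.699s 0m20.892s   # import-tests: prints 25.1
slowdown_percent 2m45.046s 2m46.674s   # create-tests: prints 1.0
```

The import-tests difference is 4.193 seconds; spread over roughly 100
invocations, that is on the order of 0.04 seconds per invocation.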

Bruno






* Re: GNU gnulib: calling for beta-testers
  2024-04-22 20:00       ` Collin Funk
@ 2024-04-22 20:56         ` Bruno Haible
  0 siblings, 0 replies; 11+ messages in thread
From: Bruno Haible @ 2024-04-22 20:56 UTC (permalink / raw)
  To: Paul Eggert, bug-gnulib, Collin Funk

Collin Funk wrote:
> > The first workaround should fix trouble similar to what we regularly
> > see with 'autom4te.cache': Unnecessary difference while comparing source
> > trees, unnecessary "git status" noise. Clutter.
> 
> I don't think the Python stuff should clutter 'git status' at least.
> 
> $ cat pygnulib/.gitignore 
> *.pyc

OK, good. So, it would not have produced unnecessary "git status" noise.
Still, it showed up during recursive diff. My first workaround fixes that.
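
The effect of the first workaround can be observed with a shell sketch
like the following. It is an illustration only: it assumes 'python3' is on
PATH and is version 3.8 or newer, and the names demo_cache_prefix,
helper.py and main.py are made up for this example.

```sh
# Run a script that imports a sibling module with PYTHONPYCACHEPREFIX set
# to a scratch directory, then report where the bytecode cache ended up.
demo_cache_prefix () {
  src_dir=$(mktemp -d) || return 2
  cache_dir=$(mktemp -d) || return 2
  printf 'VALUE = 1\n' > "$src_dir/helper.py"
  printf 'import helper\n' > "$src_dir/main.py"
  ( unset PYTHONDONTWRITEBYTECODE
    PYTHONPYCACHEPREFIX=$cache_dir
    export PYTHONPYCACHEPREFIX
    python3 "$src_dir/main.py" )
  if test -d "$src_dir/__pycache__"; then
    echo "cache in source tree"
  elif test -n "$(find "$cache_dir" -name '*.pyc')"; then
    echo "cache under prefix"
  else
    echo "no cache written"
  fi
  rm -rf "$src_dir" "$cache_dir"
}
```

With Python 3.8 or newer this should report "cache under prefix": the
source directory stays clean, so a recursive diff of it sees no extra
files. Older Python versions ignore the variable.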

Bruno





