unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Carlos O'Donell via Libc-alpha <libc-alpha@sourceware.org>
To: Siddhesh Poyarekar <siddhesh@sourceware.org>, libc-alpha@sourceware.org
Cc: joseph@codesourcery.com
Subject: Re: [PATCH v4 1/2] Port shared code information from the wiki
Date: Mon, 30 Aug 2021 17:20:23 -0400	[thread overview]
Message-ID: <fa90fc4a-8c66-805a-acf6-4ae08ddd4f39@redhat.com> (raw)
In-Reply-To: <20210823025410.590471-2-siddhesh@sourceware.org>

On 8/22/21 10:54 PM, Siddhesh Poyarekar via Libc-alpha wrote:
> Since the shared code now has special status with respect to
> copyrights, port them into a more structured format in the source tree
> and add a python function that parses and returns a dictionary with
> the information.

Perfect. Thanks for doing this, I think a more structured approach like
this is important.

Please send v5 with mktime-internal.h fix.
 
> I need this to exclude these files from the Contributed-by changes and
> I reckon it would be useful to know these files for future tooling.
> ---
>  SHARED-FILES                 | 206 +++++++++++++++++++++++++++++++++++
>  scripts/glibc_shared_code.py |  70 ++++++++++++
>  2 files changed, 276 insertions(+)
>  create mode 100644 SHARED-FILES
>  create mode 100644 scripts/glibc_shared_code.py
> 
> diff --git a/SHARED-FILES b/SHARED-FILES
> new file mode 100644
> index 0000000000..d1c4fc4eeb
> --- /dev/null
> +++ b/SHARED-FILES
> @@ -0,0 +1,206 @@
> +# Files shared with other projects.  Pass a file path to the
> +# get_glibc_shared_code() function in the python library
> +# scripts/glibc_shared_code.py to get a dict object with this information.  See
> +# the library sources for more information.
> +
> +# The headers on most of these files indicate that glibc is the canonical
> +# source for these files, although in many cases there seem to be useful
> +# changes in the gnulib versions that could be merged back in. Not all gnulib
> +# files contain such a header and it is not always consistent in its format, so
> +# it would be useful to make sure that all gnulib files that are using glibc as
> +# upstream have a greppable header.
> +#
> +# These files are quite hard to find without a header to grep for and each file
> +# has to be compared manually so this list is likely incomplete or may contain
> +# errors.
> +gnulib:
> +  argp/argp-ba.c
> +  argp/argp-ba.c
> +  argp/argp-eexst.c
> +  argp/argp-fmtstream.c
> +  argp/argp-fmtstream.h
> +  argp/argp-fs-xinl.c
> +  argp/argp-help.c
> +  argp/argp-namefrob.h
> +  argp/argp-parse.c
> +  argp/argp-pv.c
> +  argp/argp-pvh.c
> +  argp/argp-xinl.c
> +  argp/argp.h
> +  crypt/md5.c
> +  crypt/md5.h
> +  dirent/alphasort.c
> +  dirent/scandir.c
> +  locale/programs/3level.h
> +  # Merged from gnulib 2014-6-23
> +  malloc/obstack.c
> +  # Merged from gnulib 2014-6-23
> +  malloc/obstack.h
> +  # Merged from gnulib 2014-07-10
> +  misc/error.c
> +  misc/error.h
> +  misc/getpass.c
> +  misc/mkdtemp.c
> +  posix/fnmatch_loop.c
> +  # Intended to be the same. Gnulib copy contains glibc changes.
> +  posix/getopt.c
> +  # Intended to be the same. Gnulib copy contains glibc changes.
> +  posix/getopt1.c
> +  # Intended to be the same. Gnulib copy contains glibc changes.
> +  posix/getopt_int.h
> +  posix/glob.c
> +  posix/regcomp.c
> +  posix/regex.c
> +  posix/regex.h
> +  posix/regex_internal.c
> +  posix/regex_internal.h
> +  posix/regexec.c
> +  posix/spawn.c
> +  posix/spawn_faction_addclose.c
> +  posix/spawn_faction_adddup2.c
> +  posix/spawn_faction_addopen.c
> +  posix/spawn_faction_destroy.c
> +  posix/spawn_faction_init.c
> +  posix/spawn_int.h
> +  posix/spawnattr_destroy.c
> +  posix/spawnattr_getdefault.c
> +  posix/spawnattr_getflags.c
> +  posix/spawnattr_getpgroup.c
> +  posix/spawnattr_getschedparam.c
> +  posix/spawnattr_getschedpolicy.c
> +  posix/spawnattr_getsigmask.c
> +  posix/spawnattr_init.c
> +  posix/spawnattr_setdefault.c
> +  posix/spawnattr_setflags.c
> +  posix/spawnattr_setpgroup.c
> +  posix/spawnattr_setschedparam.c
> +  posix/spawnattr_setschedpolicy.c
> +  posix/spawnattr_setsigmask.c
> +  posix/spawnp.c
> +  stdlib/atoll.c
> +  stdlib/getsubopt.c
> +  stdlib/setenv.c
> +  stdlib/strtoll.c
> +  stdlib/strtoul.c
> +  # Merged from gnulib 2014-6-26, needs merge back
> +  string/memchr.c
> +  string/memcmp.c
> +  string/memmem.c
> +  string/mempcpy.c
> +  string/memrchr.c
> +  string/rawmemchr.c
> +  string/stpcpy.c
> +  string/stpncpy.c
> +  string/str-two-way.h
> +  string/strcasestr.c
> +  string/strcspn.c
> +  string/strdup.c
> +  string/strndup.c
> +  string/strpbrk.c
> +  string/strsignal.c
> +  string/strstr.c
> +  string/strtok_r.c
> +  string/strverscmp.c
> +  sysdeps/generic/pty-private.h
> +  sysdeps/generic/siglist.h
> +  sysdeps/posix/euidaccess.c
> +  sysdeps/posix/gai_strerror.c
> +  sysdeps/posix/getcwd.c
> +  sysdeps/posix/pwrite.c
> +  sysdeps/posix/spawni.c
> +  # Merged from gnulib 2014-6-23
> +  sysdeps/posix/tempname.c
> +  # Merged from gnulib 2014-6-27
> +  time/mktime.c
> +  time/strptime.c
> +  time/timegm.c

Missing: time/mktime-internal.h

I cross checked with gnulib's srclist.txt.

> +
> +# The last merge was 2014-12-11 and merged gettext 0.19.3 into glibc with a
> +# patch submitted to the gettext mailing list for changes that could be merged
> +# back.
> +#
> +# This commit was omitted from the merge as it does not appear to be compatible
> +# with how glibc expects things to work:
> +#
> +# commit 279b57fc367251666f00e8e2b599b83703451afb
> +# Author: Bruno Haible <bruno@clisp.org>
> +# Date:   Fri Jun 14 12:03:49 2002 +0000
> +#
> +#     Make absolute pathnames inside $LANGUAGE work.
> +gettext:
> +  intl/bindtextdom.c
> +  intl/dcgettext.c
> +  intl/dcigettext.c
> +  intl/dcngettext.c
> +  intl/dgettext.c
> +  intl/dngettext.c
> +  intl/explodename.c
> +  intl/finddomain.c
> +  intl/gettext.c
> +  intl/gettextP.h
> +  intl/gmo.h
> +  intl/hash-string.c
> +  intl/hash-string.h
> +  intl/l10nflist.c
> +  intl/loadinfo.h
> +  intl/loadmsgcat.c
> +  intl/locale.alias
> +  intl/localealias.c
> +  intl/ngettext.c
> +  intl/plural-exp.c
> +  intl/plural-exp.h
> +  intl/plural.y
> +  intl/textdomain.c
> +
> +# The following files are shared with the upstream Unicode project and must be
> +# updated regularly to stay in sync with the upstream unicode releases.
> +#
> +# Merged from Unicode 13.0.0 release.
> +unicode:
> +  localedata/unicode-gen/UnicodeData.txt
OK.

> +  localedata/unicode-gen/unicode-license.txt

This is correct, but I see we don't download this file with each update.

I've reached out to Steve Loomis to see how CLDR got this file too.

I think we should be downloading it from Unicode when we do an update.

> +  localedata/unicode-gen/DerivedCoreProperties.txt
> +  localedata/unicode-gen/EastAsianWidth.txt
> +  localedata/unicode-gen/PropList.txt
OK.

> +
> +# The following files are shared with the upstream tzcode project and must be
> +# updated regularly to stay in sync with the upstream releases.
> +#
> +# Update from tzcode 2017b.
> +# Latest is 2018g:
> +#   https://mm.icann.org/pipermail/tz-announce/2018-October/000052.html
> +tzcode:
> +  timezone/private.h
> +  timezone/tzfile.h
> +  timezone/zdump.c
> +  timezone/zic.c
> +  timezone/tzselect.ksh
> +
> +# The following files are shared with the upstream tzdata project but is not
> +# synchronized regularly. The data files themselves are used only for testing
> +# purposes and their data is never used to generate any output. We synchronize
> +# them only to stay on top of newer data that might help with testing.
> +#
> +# Currently synced to 2009i.  Latest is 2018g.
> +# https://mm.icann.org/pipermail/tz-announce/2018-October/000052.html
> +tzdata:
> +  timezone/africa
> +  timezone/antarctica
> +  timezone/asia
> +  timezone/australasia
> +  timezone/europe
> +  timezone/northamerica
> +  timezone/southamerica
> +  timezone/pacificnew
> +  timezone/etcetera
> +  timezone/factory
> +  timezone/backward
> +  timezone/systemv
> +  timezone/solar87
> +  timezone/solar88
> +  timezone/solar89
> +  timezone/iso3166.tab
> +  timezone/zone.tab
> +  timezone/leapseconds
> +  # This is yearistype.sh in the parent project
> +  timezone/yearistype
> diff --git a/scripts/glibc_shared_code.py b/scripts/glibc_shared_code.py
> new file mode 100644
> index 0000000000..873a26117f
> --- /dev/null
> +++ b/scripts/glibc_shared_code.py
> @@ -0,0 +1,70 @@
> +#!/usr/bin/python
> +# Copyright (C) 2021 Free Software Foundation, Inc.
> +# This file is part of the GNU C Library.
> +#
> +# The GNU C Library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# The GNU C Library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with the GNU C Library; if not, see
> +# <https://www.gnu.org/licenses/>.
> +
> +def get_glibc_shared_code(path):
> +    """ Get glibc shared code information from a file
> +
> +    The input file must have project names in their own line ending with a colon
> +    and all shared files in the project on their own lines following the project
> +    name.  Whitespaces are ignored.  Lines with # as the first non-whitespace
> +    character are ignored.
> +
> +    Args:
> +        path: The path to file containing shared code information.
> +
> +    Returns:
> +        A dictionary with project names as key and lists of files as values.
> +    """
> +
> +    projects = {}
> +    with open(path, 'r') as f:
> +        for line in f.readlines():
> +            line = line.strip()
> +            if len(line) == 0 or line[0] == '#':
> +                continue
> +            if line[-1] == ':':
> +                cur = line[:-1]
> +                projects[cur] = []
> +            else:
> +                projects[cur].append(line)
> +
> +    return projects
> +
> +# Function testing.
> +import sys
> +from os import EX_NOINPUT
> +from os.path import exists
> +from pprint import *
> +
> +if __name__ == '__main__':
> +    if len(sys.argv) != 2:
> +        print('Usage: %s <file name>' % sys.argv[0])
> +        print('Run this script from the base glibc source directory')
> +        sys.exit(EX_NOINPUT)
> +
> +    print('Testing get_glibc_shared_code with %s:\n' % sys.argv[1])
> +    r = get_glibc_shared_code(sys.argv[1])
> +    errors = False
> +    for k in r.keys():
> +        for f in r[k]:
> +            if not exists(f):
> +                print('%s does not exist' % f)
> +                errors = True
> +
> +    if not errors:
> +        pprint(r)
> 

OK.

-- 
Cheers,
Carlos.


  parent reply	other threads:[~2021-08-30 21:20 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-23  2:54 [PATCH v4 0/2] Source attribution cleanups Siddhesh Poyarekar via Libc-alpha
2021-08-23  2:54 ` [PATCH v4 1/2] Port shared code information from the wiki Siddhesh Poyarekar via Libc-alpha
2021-08-24  1:16   ` Paul Eggert
2021-08-30 20:46     ` Carlos O'Donell via Libc-alpha
2021-08-30 21:20   ` Carlos O'Donell via Libc-alpha [this message]
2021-08-23  2:54 ` [PATCH v4 2/2] Remove "Contributed by" lines Siddhesh Poyarekar via Libc-alpha
2021-08-30 21:20   ` Carlos O'Donell via Libc-alpha
2021-08-30 21:22 ` [PATCH v4 0/2] Source attribution cleanups Carlos O'Donell via Libc-alpha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fa90fc4a-8c66-805a-acf6-4ae08ddd4f39@redhat.com \
    --to=libc-alpha@sourceware.org \
    --cc=carlos@redhat.com \
    --cc=joseph@codesourcery.com \
    --cc=siddhesh@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).