From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ronnie Sahlberg Subject: Re: [PATCH 19/24] refs.c: allow listing and deleting badly named refs Date: Fri, 3 Oct 2014 13:32:43 -0700 Message-ID: References: <20140820231723.GF20185@google.com> <20140911030318.GD18279@google.com> <20141002014817.GS1175@google.com> <20141002022819.GL1175@google.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Cc: Jonathan Nieder , "git@vger.kernel.org" , Michael Haggerty To: Junio C Hamano X-From: git-owner@vger.kernel.org Fri Oct 03 22:32:50 2014 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Xa9Wz-0002fw-1a for gcvg-git-2@plane.gmane.org; Fri, 03 Oct 2014 22:32:49 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752167AbaJCUcp (ORCPT ); Fri, 3 Oct 2014 16:32:45 -0400 Received: from mail-vc0-f178.google.com ([209.85.220.178]:37366 "EHLO mail-vc0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750884AbaJCUco (ORCPT ); Fri, 3 Oct 2014 16:32:44 -0400 Received: by mail-vc0-f178.google.com with SMTP id hq12so1203428vcb.9 for ; Fri, 03 Oct 2014 13:32:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=iru0nArMd0cYupI8lghmXn+dImeNqX5KzYkqAldyJkg=; b=Zu5gltqNKVi52HbGF1FohuDzhuhOMrOR1xeSBNPpizI6FlP/H9i825h+EiXInbV08h rXVur3g15LWzH/X0dEY+Bk1w0lpAmdvTGV3nV1wAbOj81JYAOAC+mGZtYkp/eZAivIIe zsQ4bReXInnRVzSMQh9AaOg0JMdB/wpzTA2sw4MPo0EQVlXxri72AE1PHqFCKta/xeWH SDpYVEFY4IN+s8ArBMuAR8CA3ZU3fsxjLdfpxTGng55wultLXplV8g9FOAx128nZRsh7 nQiaE3o1KyEIn7kjTjj+XhDKQWc7Zf3LOujOqnuum0R5ac0sV3tF3blZy6Rk9D0flW5Y 5h+g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=iru0nArMd0cYupI8lghmXn+dImeNqX5KzYkqAldyJkg=; b=cZZ2zT1XIrT/cno4cPcBQFkvJnbeY65DQ7QXuD9l6utMVxyGkSdDdALflQGtHdi5fs DriM6JOZqViYUxxMeUd/9eaNmSGYSxNWpHYNC+UhDpdUc8I3wUxtEF+7Ztub3iUFJ8rD nt6UFPC1K/e2k7TkeGjVdnm0IagZZJK/oquKsbVZtYDvi/gi2tHxGppeQVHMCkhzuSiQ 11U6JbnQn/o6nw5a/oSag63c2EVjlNd0ATz9xi0dfOS8zyWrfH2ElezASfLhXrAd02VT IwUOuW1NLgJorwjuCu00fpyukDyV5YV+nNINWMlHn7u5hNiUCVU5cgmjPRJ6clyd7kSi /j1Q== X-Gm-Message-State: ALoCoQkwua0rdOShgOKl2/I2XkV92f/Yic7DV7FeXfTV8tztDF1dvIQz4NW/Z+GP8X+4JIto7tE9 X-Received: by 10.52.165.97 with SMTP id yx1mr4811576vdb.15.1412368363707; Fri, 03 Oct 2014 13:32:43 -0700 (PDT) Received: by 10.52.76.167 with HTTP; Fri, 3 Oct 2014 13:32:43 -0700 (PDT) In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Fri, Oct 3, 2014 at 1:25 PM, Ronnie Sahlberg wrote: > On Thu, Oct 2, 2014 at 11:55 AM, Junio C Hamano wrote: >> Jonathan Nieder writes: >> >>> From: Ronnie Sahlberg >>> ... >>> In resolving functions, refuse to resolve refs that don't pass the >>> check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME >>> flag is passed. Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to >>> resolve refs that escape the refs/ directory and do not match the >>> pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD"). >>> >>> In locking functions, refuse to act on badly named refs unless they >>> are being deleted and either are in the refs/ directory or match [A-Z_]*. >>> >>> Just like other invalid refs, flag resolved, badly named refs with the >>> REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them >>> in all iteration functions except for for_each_rawref. >>> >>> Flag badly named refs with a REF_BAD_NAME flag to make it easier for >>> future callers to notice and handle them specially. >>> >>> In the transaction API, refuse to create or update badly named refs, >>> but allow deleting them (unless they escape refs/ and don't match >>> [A-Z_]*). >>> >>> Signed-off-by: Ronnie Sahlberg >>> Signed-off-by: Jonathan Nieder >> >> Thanks. We originally threw all the different kind of breakages >> into ISBROKEN, but a ref can have a malformed name or can contain a >> bad/non value and allowing us to tell them apart is a good direction >> to go. >> >>> diff --git a/cache.h b/cache.h >>> index 5ca7f2b..0c0ac60 100644 >>> --- a/cache.h >>> +++ b/cache.h >>> @@ -978,16 +978,26 @@ extern int read_ref(const char *refname, unsigned char *sha1); >>> * If flags is non-NULL, set the value that it points to the >>> * combination of REF_ISPACKED (if the reference was found among the >>> * packed references), REF_ISSYMREF (if the initial reference was a >>> - * symbolic reference) and REF_ISBROKEN (if the ref is malformed). >>> + * symbolic reference), REF_BAD_NAME (if the reference name is ill >>> + * formed --- see RESOLVE_REF_ALLOW_BAD_NAME below), and REF_ISBROKEN >>> + * (if the ref is malformed). >> >> You want to define "is malformed" here. >> >> The original defines REF_ISBROKEN as "malformed" because >> >> (1) resolve_ref_unsafe() uses get_sha1_hex() and read_loose_refs() >> uses read_ref_full(), both to catch "malformed values" stored; >> >> (2) resolve_ref_unsafe() uses check_refname_format() and catches >> "malformed names" stored as a symref target. >> >> I _think_ you are introducing ALLOW_BAD_NAME to allow add the third >> class ".git/refs/remotes/origin/mal..formed..name". I do not know >> if they should be the same class as a symref with a good name >> ".git/refs/remotes/origin/HEAD" that points at a bad name >> "mal..formed..name", which is (2) above). Perhaps not. (2) is >> still above what is stored in the ref, and the ref in question may >> or may not have a well-formed name, which is orthogonal. >> >> So probably you left only the "value stored in the ref is malformed" >> (in other words, "we expected to find 40-hex object name but didn't >> find one") case for REF_ISBROKEN? > > I updated cache.h to try to clarify it better. > The intention here is to expand the use of REF_ISBROKEN. > > For all cases REF_ISBROKEN will be set. This includes both "the sha1 > value is bad" as well as "a name is bad". > > For those cases where a name is bad then REF_BAD_NAME is also set. Bad > names are when either the ref itself has a bad name or when a bad name > is encountered while resolving a chain of symbilic refs. > > > I.e. REF_BAD_NAME is a special case of broken ref. All REF_BAD_NAME > refs are also REF_ISBROKEN but not the reverse. Let me add some rationale why REF_BAD_NAME and REF_ISBROKEN are notorthogonal properties. I think almost all callers of these APIs are only concerned about "is the ref good or bad" and they today only check REF_ISBROKEN. I think that is a reasonable API and it allows the majority of callers to just "check this single flag". (The alternative would be to keep all callers in sync and use a set of flags for all "bad conditions".) A very small subset of callers are actually interested in knowing why the ref was bad, and in particular if the ref was bad due to a name component. Those callers, that are aware that there are different types of ISBROKEN can then inspect the REF_BAD_NAME flag in order to decide "is the ref broken due to the ref name?". This is why I made REF_BAD_NAME a special case of REF_ISBROKEN. > > >> >> Do we want to separate "value is not 40-hex" and "a symref points at >> a malformed refname" as separate "malformed value" errors? >> >>> + * RESOLVE_REF_ALLOW_BAD_NAME allows resolving refs even when their >>> + * name is invalid according to git-check-ref-format(1). If the name >>> + * is bad then the value stored in sha1 will be null_sha1 and the >>> + * REF_ISBROKEN and REF_BAD_NAME flags will be set. >> >> Viewed with that light, I am not sure if a badly named ref should >> yield null_sha1[] (REF_ISBROKEN, which I am assuming is about a >> value that is badly formatted and cannot be read, should keep >> yielding it as before). Wouldn't it make it harder for the user if >> you give null_sha1[] back to somebody who is trying to recover by >> reading "refs/heads/mal..formed", creating "refs/heads/sensible" to >> point at the same value and then removing the former? >> >> Note that I am not saying we should give back the parsed value at >> this step in the series. Perhaps there are some existing callers >> that do not check for ISBROKEN flag and instead says "null_sha1[] >> ref is to be rejected/ignored", in which case they may need to be >> corrected before that happens. Or there may be some reason I >> overlooked that makes it not so useful if we returned the object >> name stored in a ref whose name is malformed. Just wondering. > > The reason for these malformed refs resolving to null_sha1 is exactly > that. There may be callers that do not check ISBROKEN, so all those > callers need to be carefully audited before we start returning > potentially valid non null_sha1 here. > Right now I only want to do the minimal changes needed to open up only > those few paths I need to in order to allow the refs to be deleted. > For the delete case, returning the null_sha1 is sufficient. > > > Now, IF we add support to "rename a bad-ref-name to a good-ref-name", > then for that case we would need to start allowing resolve_ref_unsafe > to actually return the sha1 for the ref. > And that would require that we do those changes and audits. Given how > rare this should be, I am not convinced we need "rename bad to good" > at this point and would thus prefer to hold off adding that. > > Thus, returning the object name is not so useful for the problem I try > to solve at this stage, namely "make delete work". > > >> >>> @@ -272,6 +272,37 @@ static struct ref_dir *get_ref_dir(struct ref_entry *entry) >>> return dir; >>> } >>> >>> +static int escapes_cwd(const char *path) { >>> + char *buf; >>> + int result; >>> + >>> + if (is_absolute_path(path)) >>> + return 1; >>> + buf = xmalloc(strlen(path) + 1); >>> + result = !!normalize_path_copy(buf, path); >>> + free(buf); >>> + return result; >>> +} >> >> I think this function is misnamed for two reasons. >> >> - It does not have anything to do with cwd; it does not make any >> difference to the outcome of this function given the same input, >> if 'pwd' says "/u/jc/git" or "/u/jc/git/Documentation", no? >> >> - Even if this had something to do with cwd, I would expect a >> function whose name is escapes_cwd("/u/jc/git/Documentation") to >> yield false when 'pwd' says "/u/jc/git", but the implementation >> unconditionally rejects absolute path. In the context of the >> (sole) caller of this function, which deals with a refname "refs/...", >> it makes no sense to see an absolute path, but that does not have >> anything to do with "does this path escape cwd?", no? >> > > Agree. > I removed this function completely and just inline the !normalize_path_copy() > checks to make sure that refs/* paths remain within refs/. > > >>> +/* >>> + * Check if a refname is safe. >>> + * For refs that start with "refs/" we consider it safe as long as the rest >>> + * of the path components does not allow it to escape from this directory. >>> + * For all other refs we only consider them safe iff they only contain >>> + * upper case characters and '_'. >>> + */ >> >> I presume that the exception is to accomodate for "HEAD", "ORIG_HEAD", >> "MERGE_HEAD" and friends, but you probably do not want the readers to >> guess. >> > > I updated the commend to highlight that this is for "HEAD" and friends. > >>> +static int refname_is_safe(const char *refname) >>> +{ >>> + if (starts_with(refname, "refs/")) >>> + return !escapes_cwd(refname + strlen("refs/")); >>> + while (*refname) { >>> + if (!isupper(*refname) && *refname != '_') >>> + return 0; >>> + refname++; >>> + } >>> + return 1; >>> +}