git@vger.kernel.org mailing list mirror (one of many)
* git-scm.com status report
@ 2017-02-02  2:33 Jeff King
  2017-02-02  4:36 ` Eric Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Jeff King @ 2017-02-02  2:33 UTC (permalink / raw)
  To: git

We (the Git project) got control of the git-scm.com domain this year. We
have never really had an "official" website, but I think a lot of people
consider this to be one.

This is an overview of the current state, as well as some possible
issues and future work.

## What's on the site

We have the domains git-scm.com and git-scm.org (the latter we've had
for a while). They both point to the same website, which has general
information about Git, including:

  - a general overview of Git

  - links to the latest releases (both source and some binary
    installers)

  - HTML-rendered copies of the manpages (both for the current version
    and historical versions)

  - an HTML rendering of the contents of the Pro Git book, along with
    translations. The book content is licensed cc-by-nc-sa and developed
    openly.

  - various external links to books, tutorials, GUI tools, etc

## How is it developed and hosted

The site is a Ruby on Rails app. The git repository is
https://github.com/git/git-scm.com. Modifications are generally done by
pull requests there. I have admin access on the repository.

The deployed site is hosted on Heroku. It's part of GitHub's
meta-account, and they pay the bills. I have access to it, and am the
only person who deploys updates. Other technical staff at GitHub have
access, too, because of the account setup, but don't generally
participate in maintenance.

It uses three 1GB Heroku dynos for scaling, which is $150/mo. It also
uses some Heroku addons which add up to another $80/mo.
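
As a quick check on those figures, here is a minimal sketch of the
arithmetic in Python. The per-dyno price is not quoted directly; it is
derived here as $150 / 3, and the variable names are made up for
illustration.

```python
# Monthly hosting cost, using only the figures quoted above.
DYNO_COUNT = 3
DYNO_TOTAL_USD = 150      # all three 1GB dynos together
ADDONS_TOTAL_USD = 80     # Heroku addons

per_dyno_usd = DYNO_TOTAL_USD / DYNO_COUNT          # derived, not quoted
monthly_total_usd = DYNO_TOTAL_USD + ADDONS_TOTAL_USD
yearly_total_usd = monthly_total_usd * 12

print(f"per dyno: ${per_dyno_usd:.0f}/mo")
print(f"total:    ${monthly_total_usd}/mo (${yearly_total_usd}/yr)")
```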

## Who's the maintainer

These days, it's pretty much me, with a lot of help from Jean-Noël Avila
on issues with the Pro Git import and formatting code.

Long ago, the site content and code were done by Scott Chacon, with
graphic design help from Jason Long.  Scott maintained the site with
help from Bryan Turner for many years. But over time, they both seemed
to get less active, and I haven't seen a peep from either on the site's
GitHub repo in the past year. I've started trying to respond to issues
and pull requests to keep things healthy.

The site is mostly in maintenance mode, but things do need addressing.
People show up with new additions, fixes for typos, broken links and
other formatting problems, etc. There are a lot of long-standing
Asciidoc formatting problems both for the manpages and the imported Pro
Git content.

## What next

We can probably continue in maintenance mode like this for a while.
We've fixed a lot of the long-standing formatting issues over the
past year, so maintenance seems to have subsided in the past few months
to mostly just merging or rejecting the occasional PR.

Still, if anybody is interested in helping with this work, I'd love to
have more eyes on it. I can give people access to the GitHub repo.
Unfortunately, I can't do so for the Heroku deploy, and part of the
maintenance burden is that the site is finicky and often needs manual
intervention (e.g., a fix to formatting requires rebuilding the
manpages, which needs a job run manually on Heroku).

It's possible that the content or visual design of the site could be
improved in various ways. I don't have any strong desires myself, but
maybe others do. If people start doing larger work, though, we have a
real lack of reviewers, and I have very little expertise with Rails or
with visual design. So anybody who wants to do this should be prepared to
take maintenance ownership.

At some point, GitHub may boot us off of the shared Heroku account,
because my impression is that it's somewhat of an administrative
headache. I don't think the Git project could afford the $230/mo hosting
fees; that's basically all the money we make. On the other hand, we
haven't actively solicited funds to any great degree, and it's possible
we could get GitHub or some other entity to just sponsor us with site
fees (I've heard zero complaints from GitHub about the money; it's
mostly just that the site is an oddball among their other assets).

With the caveat that I know very little about web hosting, $230/mo
sounds like an awful lot for what is essentially a static web site.
The site does see a lot of hits, but most of the content is a few basic
web pages, and copies of other static content that is updated
only occasionally (manpage content, lists of downloads, etc).  The biggest
dynamic component is the site search, I think.

I do wonder if there's room for improvement either:

  - by measuring and optimizing the Heroku deploy. I have no idea about
    scaling Rails or Heroku apps. Do we really need three expensive
    dynos, or a $50/mo database plan? I'm not even sure what to measure,
    or how. There are some analytics on the site, but I don't have
    access to them (I could probably dig around for access if there was
    somebody who actually knew how to do something productive with
    them).

  - by moving to a simpler model. I wonder if we could build the site
    once and then deploy a more static variant of it to a cheaper
    hosting platform. I'm not really sure what our options would be, how
    much work it would take to do the conversion, and if we'd lose any
    functionality.

If anybody is interested in tackling a project like this, let me know,
and I can try to provide access to whatever parts are needed.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: git-scm.com status report
  2017-02-02  2:33 Jeff King
@ 2017-02-02  4:36 ` Eric Wong
  2017-02-02  6:54   ` Samuel Lijin
  2017-02-05 20:11 ` Pranit Bauva
  2017-02-06 18:27 ` Jeff King
  2 siblings, 1 reply; 19+ messages in thread
From: Eric Wong @ 2017-02-02  4:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> wrote:
> With the caveat that I know very little about web hosting, $230/mo
> sounds like an awful lot for what is essentially a static web site.

Yes, that's a lot.

Fwiw, that covers a year of low-end VPS hosting for the main
public-inbox.org/git machine + mail host
(~1GB git objects + ~3GB Xapian index).

> The site does see a lot of hits, but most of the content is a few basic
> web pages, and copies of other static content that is updated
> only occasionally (manpage content, lists of downloads, etc).  The biggest
> dynamic component is the site search, I think.

Maybe optimize search if that's slowest, first.  public-inbox
uses per-host Xapian indexes so there's no extra network latency
and it seems to work well.  But maybe you don't get FS write
access without full VPS access on Heroku...

nginx handles static content easily, and it looks like you
guys use unicorn[*] for running the Ruby part.  I really hope
nginx is in front of unicorn, since (AFAIK) Heroku doesn't put
nginx in front of it by default.


[*] I wrote and maintain unicorn; and have not yet recommended
    any reverse proxy besides nginx to buffer for it.
    However, having varnish or any other caching layer in
    between nginx and unicorn is great, too.  I dunno how Heroku
    (or any proprietary deployment systems) handle it, though.

> I do wonder if there's room for improvement either:
> 
>   - by measuring and optimizing the Heroku deploy. I have no idea about
>     scaling Rails or Heroku apps. Do we really need three expensive
>     dynos, or a $50/mo database plan? I'm not even sure what to measure,
>     or how. There are some analytics on the site, but I don't have
>     access to them (I could probably dig around for access if there was
>     somebody who actually knew how to do something productive with
>     them).

I track down the most expensive requests in per-request timing
logs and work on profiling + optimizations from there...
Nothing fancy and no relying on proprietary tools like NewRelic.
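
A minimal sketch of that approach in Python, assuming a hypothetical
log format of one request per line, with the path and the elapsed
milliseconds separated by whitespace (the real format depends on how
the server is configured). It totals time per path and lists the most
expensive paths first; the function name and the sample data are made
up for illustration.

```python
from collections import defaultdict

def slowest_paths(log_lines, top=3):
    """Return (path, total_ms, hits) tuples, most total time first."""
    total_ms = defaultdict(float)
    hits = defaultdict(int)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed format
        path, elapsed = parts
        try:
            ms = float(elapsed)
        except ValueError:
            continue  # elapsed time was not a number
        total_ms[path] += ms
        hits[path] += 1
    ranked = sorted(total_ms, key=lambda p: total_ms[p], reverse=True)
    return [(p, total_ms[p], hits[p]) for p in ranked[:top]]

# Invented sample data, only to exercise the function.
sample = [
    "/search 420.0",
    "/docs/git-log 12.5",
    "/search 380.0",
    "/ 3.0",
]
for path, ms, n in slowest_paths(sample):
    print(f"{path}: {ms:.1f} ms over {n} requests")
```

Ranking by total time rather than by the single slowest request favors
endpoints that are both slow and frequently hit, which is usually where
profiling pays off first.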

I also watch for queueing in listen socket backlog (with
raindrops <https://raindrops-demo.bogomips.org/> or ss(8)) to
notice overloading.  Again, I don't know how much visibility
you have with Heroku.
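
As an illustration of the ss(8) variant, here is a minimal Python
sketch that scans the text output of `ss -ltn` for listeners whose
queue is filling up. It assumes the usual Linux behaviour where, for
listening TCP sockets, Recv-Q is the current accept-queue length and
Send-Q is the backlog limit. The function name, the 50% threshold, and
the sample output are assumptions made up for the example.

```python
def overloaded_listeners(ss_output, threshold=0.5):
    """Return (address, queued, limit) for listeners at or above threshold."""
    results = []
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue  # header line or non-listening socket
        queued, limit = int(fields[1]), int(fields[2])
        if limit > 0 and queued / limit >= threshold:
            results.append((fields[3], queued, limit))
    return results

# Invented sample in the column layout of `ss -ltn`.
sample = """\
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  0       128     0.0.0.0:22          0.0.0.0:*
LISTEN  900     1024    127.0.0.1:8080      0.0.0.0:*
"""
print(overloaded_listeners(sample))
```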

>   - by moving to a simpler model. I wonder if we could build the site
>     once and then deploy a more static variant of it to a cheaper
>     hosting platform. I'm not really sure what our options would be, how
>     much work it would take to do the conversion, and if we'd lose any
>     functionality.

*shrug*  That'd be more work, at least.  I'd figure out what's
slow, first.

Fwiw, Varnish definitely helps public-inbox when slammed by
HN/Reddit traffic.  It's great as long as you don't have
per-user data to invalidate, which seems to be the case for
git-scm.

> If anybody is interested in tackling a project like this, let me know,
> and I can try to provide access to whatever parts are needed.

While I'm not up-to-date with modern Rails or deployment stuff,
I'm available via email if you have any lower-level
Ruby/unicorn/nginx-related questions.  I'm sure GitHub/GitLab
also has folks familiar with nginx+unicorn deployment on
bare metal or VPS who could also help.


* Re: git-scm.com status report
  2017-02-02  4:36 ` Eric Wong
@ 2017-02-02  6:54   ` Samuel Lijin
  2017-02-03 11:58     ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Samuel Lijin @ 2017-02-02  6:54 UTC (permalink / raw)
  To: Eric Wong; +Cc: Jeff King, git@vger.kernel.org

In theory, you could also dump the build artifacts to a GH Pages repo
and host it from there, although I don't know if you would run up
against any of the usage limits[0]. The immediate problem I see with
that approach, though, is that I have no idea how any of the dynamic
stuff (e.g. search) would be replaced.

A question: there's a DB schema in there. Does the site still use a DB?

[0] https://help.github.com/articles/what-is-github-pages/#usage-limits

On Wed, Feb 1, 2017 at 10:36 PM, Eric Wong <e@80x24.org> wrote:
> Jeff King <peff@peff.net> wrote:
>> With the caveat that I know very little about web hosting, $230/mo
>> sounds like an awful lot for what is essentially a static web site.
>
> Yes, that's a lot.
>
> Fwiw, that covers a year of low-end VPS hosting for the main
> public-inbox.org/git machine + mail host
> (~1GB git objects + ~3GB Xapian index).
>
>> The site does see a lot of hits, but most of the content is a few basic
>> web pages, and copies of other static content that is updated
>> only occasionally (manpage content, lists of downloads, etc).  The biggest
>> dynamic component is the site search, I think.
>
> Maybe optimize search if that's slowest, first.  public-inbox
> uses per-host Xapian indexes so there's no extra network latency
> and it seems to work well.  But maybe you don't get FS write
> access without full VPS access on Heroku...
>
> nginx handles static content easily, and it looks like you
> guys use unicorn[*] for running the Ruby part.  I really hope
> nginx is in front of unicorn, since (AFAIK) Heroku doesn't put
> nginx in front of it by default.
>
>
> [*] I wrote and maintain unicorn; and have not yet recommended
>     any reverse proxy besides nginx to buffer for it.
>     However, having varnish or any other caching layer in
>     between nginx and unicorn is great, too.  I dunno how Heroku
>     (or any proprietary deployment systems) handle it, though.
>
>> I do wonder if there's room for improvement either:
>>
>>   - by measuring and optimizing the Heroku deploy. I have no idea about
>>     scaling Rails or Heroku apps. Do we really need three expensive
>>     dynos, or a $50/mo database plan? I'm not even sure what to measure,
>>     or how. There are some analytics on the site, but I don't have
>>     access to them (I could probably dig around for access if there was
>>     somebody who actually knew how to do something productive with
>>     them).
>
> I track down the most expensive requests in per-request timing
> logs and work on profiling + optimizations from there...
> Nothing fancy and no relying on proprietary tools like NewRelic.
>
> I also watch for queueing in listen socket backlog (with
> raindrops <https://raindrops-demo.bogomips.org/> or ss(8)) to
> notice overloading.  Again, I don't know how much visibility
> you have with Heroku.
>
>>   - by moving to a simpler model. I wonder if we could build the site
>>     once and then deploy a more static variant of it to a cheaper
>>     hosting platform. I'm not really sure what our options would be, how
>>     much work it would take to do the conversion, and if we'd lose any
>>     functionality.
>
> *shrug*  That'd be more work, at least.  I'd figure out what's
> slow, first.
>
> Fwiw, Varnish definitely helps public-inbox when slammed by
> HN/Reddit traffic.  It's great as long as you don't have
> per-user data to invalidate, which seems to be the case for
> git-scm.
>
>> If anybody is interested in tackling a project like this, let me know,
>> and I can try to provide access to whatever parts are needed.
>
> While I'm not up-to-date with modern Rails or deployment stuff,
> I'm available via email if you have any lower-level
> Ruby/unicorn/nginx-related questions.  I'm sure GitHub/GitLab
> also has folks familiar with nginx+unicorn deployment on
> bare metal or VPS who could also help.


* Re: git-scm.com status report
       [not found] <16F9F83D-5A7F-4059-9A27-DB25A8FB1E99@gmail.com>
@ 2017-02-02 22:51 ` Samuel Lijin
  2017-02-03 12:08 ` Jeff King
  1 sibling, 0 replies; 19+ messages in thread
From: Samuel Lijin @ 2017-02-02 22:51 UTC (permalink / raw)
  To: pedro rijo, j; +Cc: Eric Wong, git@vger.kernel.org, Jeff King

For anyone interested, this thread is on the HN front page right now[0].

There's one suggestion in particular that stands out to me - shifting
to Digital Ocean[1], which for $240/mo offers wayyyy more than what it
sounds like the current Heroku costs are.

[0] https://news.ycombinator.com/item?id=13554065
[1] https://news.ycombinator.com/item?id=13554632

On Thu, Feb 2, 2017 at 4:01 PM, pedro rijo <pedrorijo91@gmail.com> wrote:
> Hey,
>
> While I’m not experienced with Rails apps, I would like to give my
> contribution to the Git project. I could help doing some kind of triage,
> removing abusive PRs/issues (like
> https://github.com/git/git-scm.com/pull/557), looking for typos and other
> tasks that wouldn’t require a lot of RoR knowledge to get started. Also,
> completely free and available to start digging into the RoR stuff of course!
>
> If you are interested, just let me know :)
>
> Thanks,
> Pedro Rijo


* Re: git-scm.com status report
  2017-02-02  6:54   ` Samuel Lijin
@ 2017-02-03 11:58     ` Jeff King
  2017-02-03 20:56       ` Samuel Lijin
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2017-02-03 11:58 UTC (permalink / raw)
  To: Samuel Lijin; +Cc: Eric Wong, git@vger.kernel.org

On Thu, Feb 02, 2017 at 12:54:53AM -0600, Samuel Lijin wrote:

> In theory, you could also dump the build artifacts to a GH Pages repo
> and host it from there, although I don't know if you would run up
> against any of the usage limits[0]. The immediate problem I see with
> that approach, though, is that I have no idea how any of the dynamic
> stuff (e.g. search) would be replaced.

I've talked with Pages people and they say it shouldn't be a big deal to
host. The main issue is that it's not _just_ a static site. It's a site
that's static once built, but a lot of the content is auto-generated
from other sources (git manpages, Pro Git and its translations, etc).

So there's work involved in moving that generation step to whatever the
new process is (it's fine if it's running "make" locally after a Git
release and pushing up the result).

> A question: there's a DB schema in there. Does the site still use a DB?

It does use the database to hold all of the bits that aren't checked
into Git. So renderings of the manpages, the latest release git version,
etc. AFAIK, it's all things that I would be comfortable committing into
a git repository.
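
To make that concrete, here is a minimal Python sketch of writing
pre-rendered pages into a directory tree that could be committed to a
repository instead of being stored in a database. The function name,
the `docs/<name>/<version>.html` layout, and the sample values are all
hypothetical; nothing here reflects how the site's Rails code actually
stores or names its content.

```python
from pathlib import Path
import tempfile

def write_rendered_pages(pages, out_dir):
    """Write {(name, version): html} to out_dir/docs/<name>/<version>.html.

    Returns the written relative paths in sorted order, so repeated runs
    over the same input produce the same listing.
    """
    out = Path(out_dir)
    written = []
    for (name, version), html in sorted(pages.items()):
        target = out / "docs" / name / f"{version}.html"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(html, encoding="utf-8")
        written.append(target.relative_to(out).as_posix())
    return written

# Sample values, invented for the example.
pages = {
    ("git-log", "2.11.0"): "<h1>git-log(1)</h1>",
    ("git-add", "2.11.0"): "<h1>git-add(1)</h1>",
}
with tempfile.TemporaryDirectory() as tmp:
    print(write_rendered_pages(pages, tmp))
```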

-Peff


* Re: git-scm.com status report
       [not found] <16F9F83D-5A7F-4059-9A27-DB25A8FB1E99@gmail.com>
  2017-02-02 22:51 ` git-scm.com status report Samuel Lijin
@ 2017-02-03 12:08 ` Jeff King
       [not found]   ` <CAPMsMoAUcVteJGfyYrL1ZkNLnoRES0yZxkMZeL347Q_1Kx5VBQ@mail.gmail.com>
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff King @ 2017-02-03 12:08 UTC (permalink / raw)
  To: pedro rijo; +Cc: sxlijin, e, git

On Thu, Feb 02, 2017 at 10:01:45PM +0000, pedro rijo wrote:

> While I’m not experienced with Rails apps, I would like to give my
> contribution to the Git project. I could help doing some kind of
> triage, removing abusive PRs/issues (like
> https://github.com/git/git-scm.com/pull/557), looking for typos and
> other tasks that wouldn’t require a lot of RoR knowledge to get started.
> Also, completely free and available to start digging into the RoR
> stuff of course!

Thanks! I think a good first step is just to start watching the
repository and jump in on issues where you think you can contribute.

Clicking "close" or "merge" on an issue is something only I can do for
now, but having a group of people reviewing and responding to issues and
PRs is a big help (so I _can_ just click those buttons). And then
over time hopefully we can grow a stable of reviewers, and hand out
repo privileges to the active ones.

-Peff


* Re: git-scm.com status report
  2017-02-03 11:58     ` Jeff King
@ 2017-02-03 20:56       ` Samuel Lijin
  0 siblings, 0 replies; 19+ messages in thread
From: Samuel Lijin @ 2017-02-03 20:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Wong, git@vger.kernel.org

On Fri, Feb 3, 2017 at 5:58 AM, Jeff King <peff@peff.net> wrote:
> On Thu, Feb 02, 2017 at 12:54:53AM -0600, Samuel Lijin wrote:
>
>> In theory, you could also dump the build artifacts to a GH Pages repo
>> and host it from there, although I don't know if you would run up
>> against any of the usage limits[0]. The immediate problem I see with
>> that approach, though, is that I have no idea how any of the dynamic
>> stuff (e.g. search) would be replaced.
>
> I've talked with Pages people and they say it shouldn't be a big deal to
> host. The main issue is that it's not _just_ a static site. It's a site
> that's static once built, but a lot of the content is auto-generated
> from other sources (git manpages, Pro Git and its translations, etc).
>
> So there's work involved in moving that generation step to whatever the
> new process is (it's fine if it's running "make" locally after a Git
> release and pushing up the result).

Yep, noticed that when I cloned the repo the other day. Still
wrangling with my own setup so that I can build everything locally. I
imagine it would also be possible to set up some sort of CI/CD
pipeline to handle generating build artifacts automatically; so to be
honest, I don't think any of the static assets would pose a
significant problem.

The bigger issue, in my opinion, is that there seems to be a fair
amount of non-trivial back-end stuff
(https://github.com/git/git-scm.com/blob/master/spec/controllers/site_controller_spec.rb,
https://github.com/git/git-scm.com/blob/master/app/controllers/site_controller.rb)
including an Elasticsearch layer. (The redirects would be mildly
inconvenient to handle with Pages, but like the static asset
generation, should be more than doable.)

>> A question: there's a DB schema in there. Does the site still use a DB?
>
> It does use the database to hold all of the bits that aren't checked
> into Git. So renderings of the manpages, the latest release git version,
> etc. AFAIK, it's all things that I would be comfortable committing into
> a git repository.
>
> -Peff

In the meantime, I've also pinged a friend at Digital Ocean about
their hosting options and they've expressed interest. At the very
least, they seem to offer a lot more than Heroku for $230/mo[0], and I
imagine it wouldn't be impossible to reduce the hosting costs by an
order of magnitude. Think it's worth looking into?

[0] https://www.digitalocean.com/pricing/#droplet


* Re: git-scm.com status report
       [not found]   ` <CAPMsMoAUcVteJGfyYrL1ZkNLnoRES0yZxkMZeL347Q_1Kx5VBQ@mail.gmail.com>
@ 2017-02-03 22:24     ` Jeff King
       [not found]       ` <CAPMsMoDpAeD0hpPuLeWO2T1VoEZDf_hD2gA2GDBqypMF9V6gAw@mail.gmail.com>
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2017-02-03 22:24 UTC (permalink / raw)
  To: pedro rijo; +Cc: Samuel Lijin, e, Git Users

On Fri, Feb 03, 2017 at 09:23:33PM +0000, pedro rijo wrote:

> Seems a good idea. I will start by going through some old prs/issues to
> look for trash. If I do find some like the one I referred I will let you
> know by mentioning you. After that I will have a look at simpler issues/prs.
> 
> Let me know if you do agree (or you recommend another workflow) so that I
> can start looking at it this weekend :)

That sounds perfect. Thanks!

-Peff


* Re: git-scm.com status report
  2017-02-02  2:33 Jeff King
  2017-02-02  4:36 ` Eric Wong
@ 2017-02-05 20:11 ` Pranit Bauva
  2017-02-06 16:24   ` Jeff King
  2017-02-06 18:27 ` Jeff King
  2 siblings, 1 reply; 19+ messages in thread
From: Pranit Bauva @ 2017-02-05 20:11 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

Hey Peff,

On Thu, Feb 2, 2017 at 8:03 AM, Jeff King <peff@peff.net> wrote:
> ## What's on the site
>
> We have the domains git-scm.com and git-scm.org (the latter we've had
> for a while). They both point to the same website, which has general
> information about Git, including:

Since we have "official" control over the website, shouldn't we be
using the .org domain more because we are more of an organization?
What I mean is that in many places, we have referred to git-scm.com,
which was perfectly fine because it was done by GitHub which is a
company but now I think it would be more appropriate to use the
git-scm.org domain. We can forward all .com requests to .org and try
to move all references we know about to .org. What do you all think?

Regards,
Pranit Bauva


* Re: git-scm.com status report
  2017-02-05 20:11 ` Pranit Bauva
@ 2017-02-06 16:24   ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2017-02-06 16:24 UTC (permalink / raw)
  To: Pranit Bauva; +Cc: Git List

On Mon, Feb 06, 2017 at 01:41:04AM +0530, Pranit Bauva wrote:

> On Thu, Feb 2, 2017 at 8:03 AM, Jeff King <peff@peff.net> wrote:
> > ## What's on the site
> >
> > We have the domains git-scm.com and git-scm.org (the latter we've had
> > for a while). They both point to the same website, which has general
> > information about Git, including:
> 
> Since we have "official" control over the website, shouldn't we be
> using the .org domain more because we are more of an organization?
> What I mean is that in many places, we have referred to git-scm.com,
> which was perfectly fine because it was done by GitHub which is a
> company but now I think it would be more appropriate to use the
> git-scm.org domain. We can forward all .com requests to .org and try
> to move all references we know about to .org. What do you all think?

I don't have a preference myself. I know a lot of non-commercial groups
(which I think the Git project is) try to prefer ".org" to signal that.

Switching it around would require some DNS changes. I think ".org" goes
to a server the DNS provider (Gandi) runs which issues an HTTP 301 to
".com". So we'd want to reverse that, or possibly just treat them both
as equals. That shouldn't be too hard, and will have to be done via
Conservancy.
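
The redirect logic itself is small. Here is a minimal Python sketch of
choosing one canonical host and sending a 301 for any other, with the
canonical name as a parameter so the direction can be flipped. The
function name is hypothetical, and the `https://` scheme in the target
is an assumption; the real redirect is done by the DNS provider's
server, not by code like this.

```python
def canonical_redirect(host, path, canonical="git-scm.com"):
    """Return (status, location) for non-canonical hosts, else None."""
    if host.lower() == canonical:
        return None  # already on the canonical host, serve normally
    return 301, f"https://{canonical}{path}"

# Behaviour as described above: .org redirects to .com.
print(canonical_redirect("git-scm.org", "/docs"))
# Reversed: treat .org as canonical instead.
print(canonical_redirect("git-scm.com", "/docs", canonical="git-scm.org"))
```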

I don't know what it would mean in terms of search-engine optimization.
I know Google tries to detect duplicate names for sites and treat one as
canonical. And that's going to be ".com" now, based on the existing
redirect and on the fact that most people will have linked to .com.

I'm not sure what disadvantages there are to switching now, or if there
are things we should be doing to tell search engines (I seem to recall
Google's Webmaster tools have options to say "this is the canonical
name"). This is pretty far outside my area of expertise, so it may not
even be something to care about at all. Just things to consider (and
hopefully more clueful people than I can comment on it).

-Peff


* Re: git-scm.com status report
  2017-02-02  2:33 Jeff King
  2017-02-02  4:36 ` Eric Wong
  2017-02-05 20:11 ` Pranit Bauva
@ 2017-02-06 18:27 ` Jeff King
  2017-02-09  2:12   ` brian m. carlson
  2017-05-17  1:56   ` Samuel Lijin
  2 siblings, 2 replies; 19+ messages in thread
From: Jeff King @ 2017-02-06 18:27 UTC (permalink / raw)
  To: git

On Thu, Feb 02, 2017 at 03:33:50AM +0100, Jeff King wrote:

> We (the Git project) got control of the git-scm.com domain this year. We
> have never really had an "official" website, but I think a lot of people
> consider this to be one.
> 
> This is an overview of the current state, as well as some possible
> issues and future work.

Thanks everybody, for your responses here and off-list. After my mail
got posted to HN, I got quite a lot of private responses, including
offers to sponsor hosting, work on the site, etc. I'm still working my
way through them, but I wanted to try to respond in aggregate here.

First, a few clarifications:

  - The money for the site wasn't mentioned to me by GitHub at all.  I'm
    quite sure they would continue to sponsor the site financially if
    need be. The only reason I didn't promise that is because I hadn't
    arranged it specifically, and "step 0" seemed like first making sure
    our costs were reasonable.

  - Spinning the site out of GitHub's Heroku account isn't an urgent or
    impending change. It came out of a conversation I had with people
    auditing the GitHub account, where it is clearly a funny historical
    anomaly. So I suspect we could just stay there indefinitely if need
    be. But it seems to me like the right thing is to move it out for
    two reasons:

      1. The site was always intended to serve the Git community, not
         GitHub, and it has increasingly become a community asset (e.g.,
         with the transfer of the domain name). The hosting assets
         should be held by the community, too, to help with things like
         continuity. If I get hit by a bus, the rest of the Git PLC
         should have access to the site without having to figure out who
         owns what.

      2. Right now I can't add any other co-admins to handle operational
         issues. So the bus factor and load of that part of operating
         the site can't be spread.

The responses I've gotten fall into a few buckets, I think:

  - Yes, the current hosting cost really is unnecessarily high. Most of
    this is due to scaling wrong. The main costs are:

      1. Using 2x dynos; these have 1GB of RAM versus 512MB. The site
         does seem to use about 750MB. I have no idea why that is the
         case. There's probably some low-hanging fruit in reducing the
         memory use to keep it below 512MB, but I don't think anybody
         has dug in there.

      2. The site is scaled by using 3 dynos. It would be simpler and
         cheaper to stick a CDN in front of it, since the pages change
         very rarely. That's something I haven't looked into setting up
         yet.

         The prerequisite to using a CDN is actually making sure the
         content is deterministic and cacheable. There was a nice PR
         opened at https://github.com/git/git-scm.com/pull/941 towards
         that end.

  - It's mostly silly for this to be a Rails app at all. It's a static
    site which occasionally sucks in and formats new content (like the
    latest git version, new manpages, etc). The intent here was to make
    something that would "just run" forever and pick up new versions
    without human intervention. And that _does_ work, but it also makes
    things more expensive and complicated than they need to be.

    So a viable alternative is to use some kind of static site
    generator and have someone (or something) responsible for pulling in
    the new git versions occasionally.

    A few people have expressed interest in this. There's some
    preliminary work here:

      https://github.com/git/git-scm.com/pull/941

    and at least GitLab has expressed some interest. So I'll let people
    coordinate in that PR or a new one what the result should look like.
    Working patches trump discussion. :)

    I have also talked with the GitHub Pages people, and they think
    hosting it as a Jekyll page wouldn't be a big deal performance-wise
    (with the caveat that we'd need to pre-render the asciidoctor bits
    ourselves, as Jekyll doesn't do asciidoc). So that's a viable option
    for hosting it for effectively free (though I think we _would_ still
    want to put a CDN in front of it). But if somebody has an
    alternative option, that's fine, too.

  - Some people offered to help with running the site, or making major
    transitions (like converting to a static site). The most important
    thing to me there is that we have a solid maintenance plan. So I
    would want some evidence that anybody doing major work would stick
    around in the community afterwards, or that it be done in a way that
    the handoff back to community members is easy. So I'd probably look
    for somebody already involved in the community, or somebody who
    wants to join it and build up that trust by taking on site
    responsibilities over time.

  - Lots of people asked about small tasks to do. Mostly, reviewing and
    responding to issues and PRs is the simplest thing. You can do it in
    a drive-by way, and that helps take the load off of me. As the same
    reviewers show up more and more, I think we can build a community
    and I'd eventually hand out greater access to the site to match.

    I notice I've got over 100 GitHub notifications from people sifting
    through back-issues, so that will take some time to go through. I'm
    hoping a lot of them are "already fixed, click closed". :)

  - Several people offered money out of pocket to pay for hosting, and
    several hosters contacted me to work out hosting deals ranging from
    cheap to free. I'd prefer to explore the technical bits for now and
    see what the final shape and cost actually is (if we move to a
    non-Rails site, then Rails hosting is less appealing, obviously).
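
The CDN prerequisite mentioned above (deterministic, cacheable content)
can be illustrated with a minimal Python sketch. It assumes each
response is tagged with a hash of its body and given a fixed cache
lifetime; the header values and the function name are illustrative
assumptions, not what the site or the linked PR actually does.

```python
import hashlib

def cache_headers(body: bytes, max_age_seconds: int = 3600) -> dict:
    """Build caching headers that depend only on the response body."""
    digest = hashlib.sha256(body).hexdigest()[:16]
    return {
        "ETag": f'"{digest}"',
        "Cache-Control": f"public, max-age={max_age_seconds}",
    }

# Invented page body, only to exercise the function.
page = b"<html><body>git-scm.com</body></html>"
first = cache_headers(page)
second = cache_headers(page)
print(first["Cache-Control"], first == second)
```

If a rendered page embeds something that varies per request (a
timestamp, a session token), the hash changes on every render and a
cache in front of the site cannot reuse the response.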

So that's where we're at. I think the next step is either sticking a CDN
in front of Heroku and dialing down the scaling there, or moving to a
static site. I'll probably stall for a bit and see if patches for the
latter materialize, and if not pursue the CDN thing (where most of the
work will be administrative in getting it set up, not technical. I think
that makes it more or less my thing to do, but if anybody is interested
in setting it up and handing off an account to the project, that
certainly makes things easy).

Thanks again to everybody who has offered to help, and everybody who
continues to do so.

-Peff


* Re: git-scm.com status report
  2017-02-06 18:27 ` Jeff King
@ 2017-02-09  2:12   ` brian m. carlson
  2017-02-09  2:50     ` Jeff King
  2017-05-17  1:56   ` Samuel Lijin
  1 sibling, 1 reply; 19+ messages in thread
From: brian m. carlson @ 2017-02-09  2:12 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Mon, Feb 06, 2017 at 07:27:54PM +0100, Jeff King wrote:
>   - It's mostly silly for this to be a Rails app at all. It's a static
>     site which occasionally sucks in and formats new content (like the
>     latest git version, new manpages, etc). The intent here was to make
>     something that would "just run" forever and pick up new versions
>     without human intervention. And that _does_ work, but it also makes
>     things more expensive and complicated than they need to be.
> 
>     So a viable alternative is to use some kind of static site
>     generator and have someone (or something) responsible for pulling in
>     the new git versions occasionally.
> 
>     A few people have expressed interest in this. There's some
>     preliminary work here:
> 
>       https://github.com/git/git-scm.com/pull/941
> 
>     and at least GitLab has expressed some interest. So I'll let people
>     coordinate in that PR or a new one what the result should look like.
>     Working patches trump discussion. :)
> 
>     I have also talked with the GitHub Pages people, and they think
>     hosting it as a Jekyll page wouldn't be a big deal performance-wise
>     (with the caveat that we'd need to pre-render the asciidoctor bits
>     ourselves, as Jekyll doesn't do asciidoc). So that's a viable option
>     for hosting it for effectively free (though I think we _would_ still
>     want to put a CDN in front of it). But if somebody has an
>     alternative option, that's fine, too.

My only concern with using GitHub Pages is that I don't believe it
currently supports TLS on custom domains.  Since we currently have TLS
enabled, along with HTTP Strict Transport Security (as we should), such
a configuration literally wouldn't work[0].  I think it's important that
we continue to serve HTTPS only, anyway.

I agree that a static site is the way to go from a maintenance
perspective, though.  Jekyll does support Asciidoctor with a plugin,
just not on GitHub Pages, so it would theoretically be possible to build
the site as one big unit if we did it that way.  I've played around with
that plugin, so I'm happy to provide guidance if we want to do that.

[0] HSTS would prevent anyone who had visited the page from downgrading
to an insecure connection for the next year.
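
To make the HSTS point concrete, here is a minimal Python sketch that
reads the max-age directive out of a Strict-Transport-Security header
value and reports how long a browser would keep refusing plain HTTP. The
header value in the example is an assumption chosen to match the "next
year" figure above; it is not copied from the live site.

```python
from datetime import timedelta
from typing import Optional


def hsts_max_age(header_value: str) -> Optional[timedelta]:
    """Return the max-age of a Strict-Transport-Security header value.

    Returns None when the header carries no max-age directive.
    """
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age":
            # The value may be a bare number or a quoted string.
            return timedelta(seconds=int(value.strip().strip('"')))
    return None


if __name__ == "__main__":
    # Illustrative header value, assumed rather than observed.
    example = "max-age=31536000; includeSubDomains"
    print(hsts_max_age(example).days, "days")
```

A browser that has seen such a header will refuse to fall back to HTTP
for that host until the max-age expires, which is why a host without TLS
on the custom domain would break for returning visitors.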
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

* Re: git-scm.com status report
  2017-02-09  2:12   ` brian m. carlson
@ 2017-02-09  2:50     ` Jeff King
  2017-02-09  4:30       ` Eric Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2017-02-09  2:50 UTC (permalink / raw)
  To: brian m. carlson, git

On Thu, Feb 09, 2017 at 02:12:09AM +0000, brian m. carlson wrote:

> My only concern with using GitHub Pages is that I don't believe it
> currently supports TLS on custom domains.  Since we currently have TLS
> enabled, along with HTTP Strict Transport Security (as we should), such
> a configuration literally wouldn't work[0].  I think it's important that
> we continue to serve HTTPS only, anyway.

I agree we should continue to serve HTTPS. The usual solution for our
use case is to stick a CDN like Cloudflare in front of GitHub Pages (and
I think we'd want to do that anyway for performance).

I haven't done it, but there are various guides. Here's the one from
Cloudflare:

  https://blog.cloudflare.com/secure-and-fast-github-pages-with-cloudflare/

> I agree that a static site is the way to go from a maintenance
> perspective, though.  Jekyll does support Asciidoctor with a plugin,
> just not on GitHub Pages, so it would theoretically be possible to build
> the site as one big unit if we did it that way.  I've played around with
> that plugin, so I'm happy to provide guidance if we want to do that.

We already massage the data coming from Git (and from the Pro Git books)
a bit before and after feeding it to asciidoctor. So I always assumed
that any static site would involve some import steps for those things,
and we'd commit the intermediate product into the repository.
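
The import step described above could be sketched roughly as follows.
This is only an illustration of the shape (pre-process, render,
post-process, then write the result into the tree to be committed), not
the site's actual code. The names import_page and write_if_changed are
hypothetical, and the renderer is passed in as a plain function so that
the sketch does not guess at how asciidoctor is invoked.

```python
import tempfile
from pathlib import Path
from typing import Callable

Transform = Callable[[str], str]


def import_page(source: str, render: Transform,
                pre: Transform = lambda s: s,
                post: Transform = lambda s: s) -> str:
    """Massage the source, render it, then massage the rendered output."""
    return post(render(pre(source)))


def write_if_changed(path: Path, content: str) -> bool:
    """Write content to path only if it differs from what is there.

    Returns True when the file was written, so unchanged pages do not
    produce noise in the commit of intermediate products.
    """
    if path.exists() and path.read_text(encoding="utf-8") == content:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return True


if __name__ == "__main__":
    # Stand-in renderer for the demo; a real one would call asciidoctor.
    def fake_render(text: str) -> str:
        return "<h1>" + text.lstrip("= ") + "</h1>"

    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "docs" / "git.html"
        html = import_page("= git(1)", fake_render, post=str.strip)
        print(write_if_changed(out, html), write_if_changed(out, html))
```

Skipping unchanged files keeps the committed intermediate output stable
between imports, which also helps if a CDN ends up caching the pages.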

-Peff

* Re: git-scm.com status report
  2017-02-09  2:50     ` Jeff King
@ 2017-02-09  4:30       ` Eric Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Wong @ 2017-02-09  4:30 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, git

Jeff King <peff@peff.net> wrote:
> I agree we should continue to serve HTTPS. The usual solution for our
> use case is to stick a CDN like Cloudflare in front of GitHub Pages (and
> I think we'd want to do that anyway for performance).
> 
> I haven't done it, but there are various guides. Here's the one from
> Cloudflare:
> 
>   https://blog.cloudflare.com/secure-and-fast-github-pages-with-cloudflare/

AFAIK, there's a way to keep CloudFlare stuff accessible to Tor
users.  If there is, please do so.  As a Tor user, it's been
disappointing to see so much of the web walled off by CAPTCHAs.

Thank you.

Heck, maybe a .onion mirror would be nice :)
I wouldn't mind hosting one myself if it's static.

* Re: git-scm.com status report
       [not found]       ` <CAPMsMoDpAeD0hpPuLeWO2T1VoEZDf_hD2gA2GDBqypMF9V6gAw@mail.gmail.com>
@ 2017-02-20  7:53         ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2017-02-20  7:53 UTC (permalink / raw)
  To: pedro rijo; +Cc: Samuel Lijin, e, Git Users

On Sat, Feb 18, 2017 at 10:27:51PM +0000, pedro rijo wrote:

> I would say everyone did an amazing job, closing more than 150 old issues
> in a single week! I think the amount of issues is finally manageable (40
> issues currently).

Yes, thank you to all who have been helping. But especially you and
Samuel, who obviously spent a lot of time sifting through old issues.

> And if you agree, I would like to start looking at old PRs (some
> probably don't make sense anymore), and will start reviewing them as soon
> as I have the time to set up the RoR app on my machine so that I can
> understand and see the changes introduced by the PRs.

Sounds good.

>  Many PRs seem to introduce small and innocent changes, but I always like
> to run the code to see :)

Yeah, many of the display-oriented changes are pretty obvious from
reading the code, but I have caught a couple of regressions just by
running the PRs and making sure the rendered result is sane.

-Peff

* Re: git-scm.com status report
  2017-02-06 18:27 ` Jeff King
  2017-02-09  2:12   ` brian m. carlson
@ 2017-05-17  1:56   ` Samuel Lijin
  2017-05-17  2:03     ` Jeff King
  1 sibling, 1 reply; 19+ messages in thread
From: Samuel Lijin @ 2017-05-17  1:56 UTC (permalink / raw)
  To: Jeff King; +Cc: git@vger.kernel.org

On Mon, Feb 6, 2017 at 1:27 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Feb 02, 2017 at 03:33:50AM +0100, Jeff King wrote:
>
>> We (the Git project) got control of the git-scm.com domain this year. We
>> have never really had an "official" website, but I think a lot of people
>> consider this to be one.
>>
>> This is an overview of the current state, as well as some possible
>> issues and future work.
>
> Thanks everybody, for your responses here and off-list. After my mail
> got posted to HN, I got quite a lot of private responses, including
> offers to sponsor hosting, work on the site, etc. I'm still working my
> way through them, but I wanted to try to respond in aggregate here.
>
> First, a few clarifications:
>
>   - The money for the site wasn't mentioned to me by GitHub at all.  I'm
>     quite sure they would continue to sponsor the site financially if
>     need be. The only reason I didn't promise that is because I hadn't
>     arranged it specifically, and "step 0" seemed like first making sure
>     our costs were reasonable.
>
>   - Spinning the site out of GitHub's Heroku account isn't an urgent or
>     impending change. It came out of a conversation I had with people
>     auditing the GitHub account, where it is clearly a funny historical
>     anomaly. So I suspect we could just stay there indefinitely if need
>     be. But it seems to me like the right thing is to move it out for
>     two reasons:
>
>       1. The site was always intended to serve the Git community, not
>          GitHub, and it has increasingly become a community asset (e.g.,
>          with the transfer of the domain name). The hosting assets
>          should be held by the community, too, to help with things like
>          continuity. If I get hit by a bus, the rest of the Git PLC
>          should have access to the site without having to figure out who
>          owns what.
>
>       2. Right now I can't add any other co-admins to handle operational
>          issues. So the bus factor and load of that part of operating
>          the site can't be spread.
>
> The responses I've gotten fall into a few buckets, I think:
>
>   - Yes, the current hosting cost really is unnecessarily high. Most of
>     this is due to scaling wrong. The main costs are:
>
>       1. Using 2x dynos; these have 1GB of RAM versus 512MB. The site
>          does seem to use about 750MB. I have no idea why that is the
>          case. There's probably some low-hanging fruit in reducing the
>          memory use to keep it below 512MB, but I don't think anybody
>          has dug in there.
>
>       2. The site is scaled by using 3 dynos. It would be simpler and
>          cheaper to stick a CDN in front of it, since the pages change
>          very rarely. That's something I haven't looked into setting up
>          yet.
>
>          The prerequisite to using a CDN is actually making sure the
>          content is deterministic and cacheable. There was a nice PR
>          opened at https://github.com/git/git-scm.com/pull/941 towards
>          that end.
>
>   - It's mostly silly for this to be a Rails app at all. It's a static
>     site which occasionally sucks in and formats new content (like the
>     latest git version, new manpages, etc). The intent here was to make
>     something that would "just run" forever and pick up new versions
>     without human intervention. And that _does_ work, but it also makes
>     things more expensive and complicated than they need to be.
>
>     So a viable alternative is to use some kind of static site
>     generator and have someone (or something) responsible for pulling in
>     the new git versions occasionally.
>
>     A few people have expressed interest in this. There's some
>     preliminary work here:
>
>       https://github.com/git/git-scm.com/pull/941
>
>     and at least GitLab has expressed some interest. So I'll let people
>     coordinate in that PR or a new one what the result should look like.
>     Working patches trump discussion. :)
>
>     I have also talked with the GitHub Pages people, and they think
>     hosting it as a Jekyll page wouldn't be a big deal performance-wise
>     (with the caveat that we'd need to pre-render the asciidoctor bits
>     ourselves, as Jekyll doesn't do asciidoc). So that's a viable option
>     for hosting it for effectively free (though I think we _would_ still
>     want to put a CDN in front of it). But if somebody has an
>     alternative option, that's fine, too.
>
>   - Some people offered to help with running the site, or making major
>     transitions (like converting to a static site). The most important
>     thing to me there is that we have a solid maintenance plan. So I
>     would want some evidence that anybody doing major work would stick
>     around in the community afterwards, or that it be done in a way that
>     the handoff back to community members is easy. So I'd probably look
>     for somebody already involved in the community, or somebody who
>     wants to join it building up that trust by taking on site
>     responsibilities over time.
>
>   - Lots of people asked about small tasks to do. Reviewing and
>     responding to issues and PRs is the simplest thing. You can do it in
>     a drive-by way, and that helps take the load off of me. As the same
>     reviewers show up more and more, I think we can build a community
>     and I'd eventually hand out greater access to the site to match.
>
>     I notice I've got over 100 GitHub notifications from people sifting
>     through back-issues, so that will take some time to go through. I'm
>     hoping a lot of them are "already fixed, click closed". :)
>
>   - Several people offered money out of pocket to pay for hosting, and
>     several hosters contacted me to work out hosting deals ranging from
>     cheap to free. I'd prefer to explore the technical bits for now and
>     see what the final shape and cost actually is (if we move to a
>     non-Rails site, then Rails hosting is less appealing, obviously).
>
> So that's where we're at. I think the next step is either sticking a CDN
> in front of Heroku and dialing down the scaling there, or moving to a
> static site. I'll probably stall for a bit and see if patches for the
> latter materialize, and if not pursue the CDN thing (where most of the
> work will be administrative in getting it set up, not technical. I think
> that makes it more or less my thing to do, but if anybody is interested
> in setting it up and handing off an account to the project, that
> certainly makes things easy).
>
> Thanks again to everybody who has offered to help, and to everybody who
> continues to do so.
>
> -Peff

So I've finally found the time to get everything set up (in the
process discovering that remote_genbook2 consistently induces a
segfault in VirtualBox's networking driver, impressively enough) and
am taking a look at how much work it would take to get the site
running on AWS EC2/DO or some other hosting provider.

Some things I'm wondering about:

- You mentioned a lot of people reaching out off-list about hosting
options. Do any of them look particularly appealing at the moment?
- How do I set up the ES service?

Sam

* Re: git-scm.com status report
  2017-05-17  1:56   ` Samuel Lijin
@ 2017-05-17  2:03     ` Jeff King
  2017-05-18 12:06       ` Lars Schneider
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2017-05-17  2:03 UTC (permalink / raw)
  To: Samuel Lijin; +Cc: git@vger.kernel.org

On Tue, May 16, 2017 at 09:56:37PM -0400, Samuel Lijin wrote:

> So I've finally found the time to get everything set up (in the
> process discovering that remote_genbook2 consistently induces a
> segfault in VirtualBox's networking driver, impressively enough) and
> am taking a look at how much work it would take to get the site
> running on AWS EC2/DO or some other hosting provider.
> 
> Some things I'm wondering about:
> 
> - You mentioned a lot of people reaching out off-list about hosting
> options. Do any of them look particularly appealing at the moment?

Yes. I actually have stuff to announce there soon, but was holding off
until the final pieces are in place. But basically, the architecture
would remain largely the same, but hosted on community-owned accounts
(that I can share access to), with sponsorship from various hosting
services.

> - How do I set up the ES service?

I haven't ever tried to do this in the local development environment.
The production site currently just uses a cloud-hosted ES (Bonsai). They
have free "Sandbox" plans for testing, so you could probably use that as
a test resource after setting up the relevant environment variables. Or
alternatively, I think ElasticSearch folks produce binary builds you can
try, and you could host locally.

There's a rake job that inserts documents into the search index (see
lib/tasks/search.rake).
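
For anyone trying this locally, the "hosted sandbox or local binary"
choice could be wired up along these lines. This is a sketch under
assumptions: the variable name ELASTICSEARCH_URL is hypothetical (check
the app's configuration for the name it actually reads), and the
fallback is Elasticsearch's default HTTP port on localhost.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit, urlunsplit

# Elasticsearch listens for HTTP on port 9200 by default.
DEFAULT_LOCAL_ES = "http://localhost:9200"


def search_endpoint(env: Optional[Mapping[str, str]] = None,
                    var_name: str = "ELASTICSEARCH_URL") -> str:
    """Pick the search endpoint: hosted URL if set, else a local node.

    var_name is a hypothetical variable name used for illustration.
    """
    env = os.environ if env is None else env
    url = env.get(var_name) or DEFAULT_LOCAL_ES
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an HTTP(S) URL in {var_name}")
    return url


def redact(url: str) -> str:
    """Drop any user:password part so the URL is safe to log."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host += f":{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path,
                       parts.query, parts.fragment))


if __name__ == "__main__":
    print("indexing against", redact(search_endpoint()))
```

Hosted search URLs often embed credentials, so redacting before logging
is a cheap precaution when a sandbox URL ends up in the environment.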

-Peff

* Re: git-scm.com status report
  2017-05-17  2:03     ` Jeff King
@ 2017-05-18 12:06       ` Lars Schneider
  2017-05-18 15:42         ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Lars Schneider @ 2017-05-18 12:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Samuel Lijin, git@vger.kernel.org


> On 17 May 2017, at 04:03, Jeff King <peff@peff.net> wrote:
> 
> On Tue, May 16, 2017 at 09:56:37PM -0400, Samuel Lijin wrote:
> 
>> So I've finally found the time to get everything set up (in the
>> process discovering that remote_genbook2 consistently induces a
>> segfault in VirtualBox's networking driver, impressively enough) and
>> am taking a look at how much work it would take to get the site
>> running on AWS EC2/DO or some other hosting provider.
>> 
>> Some things I'm wondering about:
>> 
>> - You mentioned a lot of people reaching out off-list about hosting
>> options. Do any of them look particularly appealing at the moment?
> 
> Yes. I actually have stuff to announce there soon, but was holding off
> until the final pieces are in place. But basically, the architecture
> would remain largely the same, but hosted on community-owned accounts
> (that I can share access to), with sponsorship from various hosting
> services.
> 
>> - How do I set up the ES service?
> 
> I haven't ever tried to do this in the local development environment.
> The production site currently just uses a cloud-hosted ES (Bonsai). They
> have free "Sandbox" plans for testing, so you could probably use that as
> a test resource after setting up the relevant environment variables. Or
> alternatively, I think ElasticSearch folks produce binary builds you can
> try, and you could host locally.
> 
> There's a rake job that inserts documents into the search index (see
> lib/tasks/search.rake).

Disclaimer: I am jumping in here without much knowledge. Feel free
to ignore me :-)

In our TravisCI builds we create the AsciiDoc/Doctor documentation
already. Couldn't we push that result to some static hosting service?
Would that help in any way with git-scm.com?

- Lars

* Re: git-scm.com status report
  2017-05-18 12:06       ` Lars Schneider
@ 2017-05-18 15:42         ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2017-05-18 15:42 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Samuel Lijin, git@vger.kernel.org

On Thu, May 18, 2017 at 02:06:16PM +0200, Lars Schneider wrote:

> > I haven't ever tried to do this in the local development environment.
> > The production site currently just uses a cloud-hosted ES (Bonsai). They
> > have free "Sandbox" plans for testing, so you could probably use that as
> > a test resource after setting up the relevant environment variables. Or
> > alternatively, I think ElasticSearch folks produce binary builds you can
> > try, and you could host locally.
> > 
> > There's a rake job that inserts documents into the search index (see
> > lib/tasks/search.rake).
> 
> Disclaimer: I am jumping in here without much knowledge. Feel free
> to ignore me :-)
> 
> In our TravisCI builds we create the AsciiDoc/Doctor documentation
> already. Couldn't we push that result to some static hosting service?
> Would that help in any way with git-scm.com?

Not really. The site builds the asciidoctor documentation already via an
automated job. This question was just about putting it into the search
index (which also happens in production with an automated job; this is
just about setting up the search database).  So I don't think there's
any real problem to be solved with respect to generating pages.

-Peff
