ruby-core@ruby-lang.org archive (unofficial mirror)
From: "shioyama (Chris Salzberg)" <noreply@ruby-lang.org>
To: ruby-core@neon.ruby-lang.org
Subject: [ruby-core:110238] [Ruby master Feature#19024] Proposal: Import Modules
Date: Sat, 08 Oct 2022 14:27:06 +0000 (UTC)
Message-ID: <redmine.journal-99525.20221008142705.13031@ruby-lang.org>
In-Reply-To: redmine.issue-19024.20220927005127.13031@ruby-lang.org

Issue #19024 has been updated by shioyama (Chris Salzberg).


Before I start, I'd like to put aside the problem of transitive `require` and of compiled native extensions for a moment. These are the most contentious points of this proposal, and I now regret making them so central because they are not actually essential.

Reading the feedback here, I have come to realize that the distinction between "packages" and "imports" is the more important point, so I'm going to focus on that.

> In order to understand the goal description "isolate components", it would help me if you could describe one concrete way to use this idea in an application, and that description should cover the implications for unrelated 3rd-party gems.

Let me start by clarifying the word "components" here, because it may not have been the best choice of word on my part.

I see the namespace problem here as one of scaling in two different "spaces" of components:

1. The space of code living together in a single application
2. The space of code shared between all applications (gems)

I want to focus on how two concepts, encapsulation and namespacing, relate to scaling challenges in these two spaces. That will motivate the proposal I've presented here.

Encapsulation and namespacing are directly related: Ruby's mechanism for encapsulation _is_ namespacing. You name something in a file and define what you want under it, and hope nobody reaches into that module namespace when they shouldn't. You have `private_constant` and that's about it.

The fact that namespacing is the main mechanism to enforce encapsulation is problematic in my opinion because it fundamentally misaligns two very important incentives, one natural and one that we want to create (both in application code and in gem code).

The first thing that is naturally incentivized (by the effort it takes to do it) is **to write less code**, particularly boilerplate code. It's much easier to write `Product` than it is to write `Merchandising::Product`, and much easier to not wrap your gem code in `module Merchandising` than it is to wrap it. The interpreter may treat these roughly the same way, but humans will see them quite differently and naturally prefer the former over the latter.

The second thing that we _want_ to incentivize is **to group related code together**.  And because naming _is_ encapsulating, grouping requires namespacing: the merchandising concept of "product" should be named `Merchandising::Product` and not `Product`. Moreover, as a taxonomy of concepts grows, we need further subdivisions, which means deeper namespacing.

So **incentives are in direct opposition**: in order to _do the right thing_, you need to be very conscientious to wrap all your code in the appropriate literal namespace, even though the natural motivation is _not_ to do that. This problem only gets worse as a codebase grows: do we group "External Payment API clients" together under `Payments::ApiClients` or just under `Payments`? Grouping code in a natural way requires sacrificing convenience.

This is a terrible tradeoff. The reality is that however much you can try to encourage "doing the right thing", you will always be fighting a losing battle. (I should know, I'm fighting this battle every day!) And this is a battle which I believe is unnecessary, because the literal namespace is mostly redundant; directory structure already serves to signal grouping.

The "packaging" approach, by which I mean what Packwerk does, enforces boundaries with a stick, but it does not fix this profound misalignment. In fact, it entrenches literal namespaces as the guardian of boundaries, which I think is fundamentally the wrong approach.

> I believe your pain point is that Ruby does not have formal namespaces, and we share it.

Yes, but there is a more subtle point that I've so far been unsuccessful at conveying, partly because only in writing this have I come to see it clearly myself.

The points I made above are about _literal_ namespaces, by which I mean namespaces that are literally written into the file. Contrast this with the case of `load "foo.rb", mod`, where `mod` acts as namespace in `foo.rb` but is _implicit_. **In this case, the incentives above can in fact be aligned.**
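As a minimal sketch of what "implicit" means here, assuming Ruby 3.1 or later (where `load` accepts a module as its second argument); the file name and class below are hypothetical and written to a temporary directory only to keep the example self-contained:

```ruby
require "tmpdir"

# The namespace is chosen by the loader, not written into the file.
payments = Module.new

Dir.mktmpdir do |dir|
  path = File.join(dir, "foo_client.rb")

  # Hypothetical file contents: note there is no `module Payments` wrapper.
  File.write(path, <<~RUBY)
    class FooClient
      def label
        "foo"
      end
    end
  RUBY

  load path, payments
end

# The class is reachable only through the module passed to `load`.
client_label = payments::FooClient.new.label
leaked = Object.const_defined?(:FooClient)
```

The loaded file never mentions its namespace, yet `FooClient` ends up under `payments` and not at the toplevel.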

@fxn To get back to your original question, let's assume this is _opt-in_, and that it does not apply to compiled extensions (gems that want to opt in would be able to do so, however). I think those are the key points that make this contentious.

So with those out of the way, what I want is that instead of this:

```ruby
# payments/api_clients/foo_client.rb
require "my_client_gem"

module Payments
  module ApiClients
    class FooClient < MyClientGem::ApiClient
      # ...
    end
  end
end

# payments/api_clients/bar_client.rb
require "my_client_gem"

module Payments
  module ApiClients
    class BarClient < MyClientGem::ApiClient
      # ...
    end
  end
end

# payments.rb
require "payments/api_clients/foo_client"
require "payments/api_clients/bar_client"

module Payments
  # do something with ApiClients::FooClient, ApiClients::BarClient etc
end
```

we have instead something like this (assuming "my_client_gem" opts in to being "importable", whatever that means):

```ruby
# payments/api_clients/foo_client.rb
api_client = import "my_client_gem/api_client"

class FooClient < api_client::ApiClient
  # ...
end

# payments/api_clients/bar_client.rb
api_client = import "my_client_gem/api_client"

class BarClient < api_client::ApiClient
  # ...
end

# payments.rb
module Payments
  foo_client = import "./api_clients/foo_client"
  bar_client = import "./api_clients/bar_client"

  # do something with foo_client::FooClient and bar_client::BarClient
end
```

To me at least, having dealt with _reams_ of namespace boilerplate, I cannot express to you what a pleasure it is just to write this here. It takes away so much that is irrelevant and leaves only what _is_ relevant: what the code is actually doing. This I believe is why this idea has generated [so much excitement](https://twitter.com/flavorjones/status/1570390633524744195).

At this point, what I've written above is already implementable with the recent change to `load` alone. I am not depending on transitivity of `require` here, and I'm assuming the code in `my_client_gem` is all Ruby and has no native extension. (Assume here that `my_client_gem` opts in to make its code "importable", whatever that means -- this is something to work out).

So the misalignment of incentives, as I've presented it, is resolvable in a way. But there's a problem, because while I have "imported" `"payments/api_clients/foo_client"`, that imported code can freely access anything else in the toplevel namespace. So `::Payments` in `payments/api_clients/foo_client.rb` resolves to the toplevel `::Payments`.
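The leak can be reproduced with stock `load` (again assuming Ruby 3.1 or later). This is only an illustration: the `SECRET` constant and the `leak` method are hypothetical names.

```ruby
require "tmpdir"

# A constant living in the real toplevel namespace.
module Payments
  SECRET = "toplevel Payments"
end

wrapper = Module.new

Dir.mktmpdir do |dir|
  path = File.join(dir, "foo_client.rb")

  # Hypothetical imported file that reaches out with a leading `::`.
  File.write(path, <<~RUBY)
    class FooClient
      def leak
        ::Payments::SECRET
      end
    end
  RUBY

  load path, wrapper
end

# The wrapped code still sees the global `::Payments`.
leaked_value = wrapper::FooClient.new.leak
```

Nothing about the wrap module stops the imported file from referring to `::Payments`, which is the boundary violation Packwerk exists to catch.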

In other words, the problem that Packwerk solves is still there.

We are actually _really close_ though to what I think is a better solution to that problem. If toplevel in the imported file resolved to the top of the import context, we would actually achieve a kind of "nested encapsulation". A wrapped load context only "sees" as far up as its wrap module. It is essentially a "universe unto itself". The importer side can see down the namespace, but the "importee" cannot see up past its toplevel.

There is no conflict with `require` here because code that is required always resolves to the absolute toplevel, nothing changes there. Code that is loaded under a wrap namespace cannot see outside its namespace unless its load module has references to that global context.

This means that anytime you want a new toplevel, you can have one. The original "true" toplevel (used by `require`) is still there as always.

I have roughly implemented this idea with cref flags in [my Ruby patch](https://github.com/ruby/ruby/compare/master...shioyama:ruby:import_modules) (ignore the change to make `require` transitive). Although there are edge cases to consider (and I can see a couple), I feel this is actually the potential basis for an implementation of "imports" which avoids the fundamental problems raised so far, while offering the key missing element to make "code wrapping" become a much more powerful concept for encapsulation and code organization both in application and in gem code.

I'll stop here because this is already way too long. Happy to elaborate further on points that might be unclear.

----------------------------------------
Feature #19024: Proposal: Import Modules
https://bugs.ruby-lang.org/issues/19024#change-99525

* Author: shioyama (Chris Salzberg)
* Status: Open
* Priority: Normal
----------------------------------------
There is no general way in Ruby to load code outside of the globally-shared namespace. This makes it hard to isolate components of an application from each other and from the application itself, leading to complicated relationships that can become intractable as applications grow in size.

The growing popularity of a gem like [Packwerk](https://github.com/shopify/packwerk), which provides a new concept of "package" to enforce boundaries statically in CI, is evidence that this is a real problem. But introducing a new packaging concept and CI step is at best only a partial solution, with downsides: it adds complexity and cognitive overhead that wouldn't be necessary if Ruby provided better packaging itself (as Matz has suggested [it should](https://youtu.be/Dp12a3KGNFw?t=2956)).

There is _one_ limited way in Ruby currently to load code without polluting the global namespace: `load` with the `wrap` parameter, which as of https://bugs.ruby-lang.org/issues/6210 can now be a module. However, this option does not apply transitively to `require` calls within the loaded file, so its usefulness is limited.
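To make the limitation concrete, here is a small sketch, assuming Ruby 3.1 or later without the patch discussed in this issue. The file and constant names (`main_19024.rb`, `Dep19024`, and so on) are made up for the example.

```ruby
require "tmpdir"

wrapper = Module.new

Dir.mktmpdir do |dir|
  dep_path  = File.join(dir, "dep_19024.rb")
  main_path = File.join(dir, "main_19024.rb")

  # The dependency defines one constant.
  File.write(dep_path, "class Dep19024; end\n")

  # The main file requires the dependency and defines its own constant.
  File.write(main_path, <<~RUBY)
    require #{dep_path.inspect}
    class Main19024; end
  RUBY

  load main_path, wrapper
end

main_wrapped = wrapper.const_defined?(:Main19024, false)  # defined under the wrapper
dep_wrapped  = wrapper.const_defined?(:Dep19024, false)   # not defined under the wrapper
dep_global   = Object.const_defined?(:Dep19024)           # defined at the toplevel instead
```

Only the directly loaded file is wrapped; anything it pulls in through `require` still lands in the global namespace.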

My proposal here is to enable module imports by doing the following:

1. apply the `wrap` module namespace transitively to `require`s inside the loaded code, including native extensions (or provide a new flag or method that would do this),
2. make the `wrap` module the toplevel context for code loaded under it, so `::Foo` resolves to `<top_wrapper>::Foo` in loaded code (or, again, provide a new flag or method that would do this). _Also make this apply when code under the wrapper module is called outside of the load process (when `top_wrapper` is no longer set); this may be quite hard to do_.
3. resolve `name` on anonymous modules under the wrapped module to their names without the top wrapper module, so `<top_wrapper>::Foo.name` evaluates to `"Foo"`. There may be other ways to handle this problem, but a gem like Rails uses `name` to resolve filenames and fails when anonymous modules return something like `#<Module: ...>::ActiveRecord` instead of just `ActiveRecord`.
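The naming problem in point 3 can be seen without any patch. In the sketch below, `name_without_wrapper` is a hypothetical helper that only illustrates the desired result; it is not how the patch works.

```ruby
wrapper = Module.new
wrapper.const_set(:ActiveRecord, Module.new)

# On current Ruby the name carries the anonymous wrapper as a prefix,
# something like "#<Module:0x...>::ActiveRecord".
full_name = wrapper::ActiveRecord.name

# Hypothetical helper: strip the anonymous wrapper prefix from the name.
def name_without_wrapper(mod)
  mod.name.sub(/\A#<Module:0x\h+>::/, "")
end

short_name = name_without_wrapper(wrapper::ActiveRecord)
```

Code that derives file names from `name` would need the short form, which is why the proposal asks Ruby itself to drop the top wrapper from the result.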

I have roughly implemented these three things in [this patch](https://github.com/ruby/ruby/compare/master...shioyama:ruby:import_modules). This implementation is incomplete (it does not cover the last highlighted part of 2) but provides enough of a basis to implement an `import` method, which I have done in a gem called [Im](https://github.com/shioyama/im).

Im provides an `import` method which can be used to import gem code under a namespace:

```ruby
require "im"
extend Im

active_model = import "active_model"
#=> <#Im::Import root: active_model>

ActiveModel
#=> NameError

active_model::ActiveModel
#=> ActiveModel

active_record = import "active_record"
#=> <#Im::Import root: active_record>

# Constants defined in the same file under different imports point to the same objects
active_record::ActiveModel == active_model::ActiveModel
#=> true
```

With the constants all loaded under an anonymous namespace, any code importing the gem can name constants however it likes:

```ruby
class Post < active_record::ActiveRecord::Base
end

AR = active_record::ActiveRecord

Post.superclass
#=> AR::Base
```

Note that this enables the importer to completely determine the naming for every constant it imports. So gems can opt to hide their dependencies by "anchoring" them inside their own namespace, like this:

```ruby
# in lib/my_gem.rb
module MyGem
  dep = import "my_gem_dependency"

  # my_gem_dependency is "anchored" under the MyGem namespace, so not exposed to users
  # of the gem unless they also require it.
  MyGemDependency = dep

  #...
end
```

There are a couple of important implementation decisions in the gem:

1. _Only load code once._ When the same file is imported again (either directly or transitively), "copy" constants from previously imported namespace to the new namespace using a registry which maps which namespace (import) was used to load which file (as shown above with activerecord/activemodel). This is necessary to ensure that different imports can "see" shared files. A similar registry is used to track autoloads so that they work correctly when used from imported code.
2. Toplevel core types (`NilClass`, `TrueClass`, `FalseClass`, `String`, etc) are "aliased" to constants under each import module to make them available. Thus there can be side-effects of importing code, but this allows a gem like Rails to monkeypatch core classes which it needs to do for it to work.
3. `Object.const_missing` is patched to check the caller location and resolve to the constant defined under an import, if there is an import defined for that file.
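The first decision can be sketched roughly as follows. This is not Im's actual code: `ImportRegistry` and its `import` method are hypothetical, the sketch ignores transitive requires and autoloads, and it assumes Ruby 3.1 or later for `load` with a module.

```ruby
require "tmpdir"

# Hypothetical registry: absolute path => module the file was first loaded under.
class ImportRegistry
  def initialize
    @loaded = {}
  end

  # Evaluate `path` under a fresh module the first time it is imported.
  # On later imports, copy the existing constants into a new module
  # instead of evaluating the file again.
  def import(path)
    path = File.expand_path(path)
    fresh = Module.new

    if (original = @loaded[path])
      original.constants(false).each do |const|
        fresh.const_set(const, original.const_get(const, false))
      end
    else
      load path, fresh
      @loaded[path] = fresh
    end

    fresh
  end
end

registry = ImportRegistry.new
first = second = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "shared.rb")
  File.write(path, "class Shared; end\n")

  first  = registry.import(path)
  second = registry.import(path)
end

distinct_namespaces = !first.equal?(second)
same_class_object   = first::Shared.equal?(second::Shared)
```

Each import gets its own namespace, but both point at the same `Shared` class object, which is the property shown earlier with `active_record::ActiveModel == active_model::ActiveModel`.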

To be clear: **I think 1) should be implemented in Ruby, but not 2) and 3).** The last one (`Object.const_missing`) is a hack to support the case where a toplevel constant is referenced from a method called in imported code (at which point the `top_wrapper` is not active.)

I know this is a big proposal, and there are strong opinions held. I would really appreciate constructive feedback on this general idea.

See also similar discussion in: https://bugs.ruby-lang.org/issues/10320



-- 
https://bugs.ruby-lang.org/
