ruby-core@ruby-lang.org archive (unofficial mirror)
From: "kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core" <ruby-core@ml.ruby-lang.org>
To: ruby-core@ml.ruby-lang.org
Cc: "kjtsanaktsidis (KJ Tsanaktsidis)" <ruby-core@ml.ruby-lang.org>
Subject: [ruby-core:113678] [Ruby master Feature#19057] Hide implementation of `rb_io_t`.
Date: Sat, 27 May 2023 08:49:37 +0000 (UTC)
Message-ID: <redmine.journal-103322.20230527084937.3344@ruby-lang.org>
In-Reply-To: redmine.issue-19057.20221015015457.3344@ruby-lang.org

Issue #19057 has been updated by kjtsanaktsidis (KJ Tsanaktsidis).


I did a bit of research on this topic this evening.

First, some technical notes on undefined behaviour.

My understanding is that the "Current proposal" is technically undefined behaviour. The relevant rule is in the C standard, section 6.5.2.3, paragraph 6 (http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf):

> One special guarantee is made in order to simplify the use of unions: if a union contains several
> structures that share a common initial sequence (see below), and if the union object currently contains
> one of these structures, it is permitted to inspect the common initial part of any of them anywhere
> that a declaration of the completed type of the union is visible. Two structures share a common initial
> sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a
> sequence of one or more initial members.

It seems that this "common initial sequence" rule applies only to structures accessed through a union. Another fun gotcha is that, even if you _do_ do all the accesses through a union, the compiler can assume that an `rb_io_public_t *` and an `rb_io_private_t *` never alias each other (the exact opposite of what we want). For example, this program gives a different result depending on whether optimizations are enabled: http://coliru.stacked-crooked.com/a/b57c8dd9e2ef3a02
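For contrast, here is a minimal sketch of the case the quoted rule *does* cover: both structs live inside a union whose complete type is visible, and the shared `fd` member is read back through the other union member. The struct and union layouts are hypothetical stand-ins, not Ruby's real definitions. By the reading above, the same read through a plain pointer cast, with no union involved, is not covered by this guarantee.

```c
/* Hypothetical layouts: both structs begin with the same member, `fd`,
   so they share a common initial sequence (C17 6.5.2.3p6). */
typedef struct { int fd; } rb_io_public_t;
typedef struct { int fd; int mode; } rb_io_private_t;

/* The guarantee only applies to structs stored in a union whose completed
   type is visible at the point of access. */
typedef union {
    rb_io_public_t pub;
    rb_io_private_t priv;
} rb_io_union_t;

/* Store through the private view, then read the shared member through the
   public view. Permitted: the union currently holds `priv`, and `fd` is
   part of the common initial sequence. */
static int
read_fd_through_union(int fd)
{
    rb_io_union_t u;
    u.priv.fd = fd;
    u.priv.mode = 0;
    return u.pub.fd;
}
```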

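I think to be technically correct, we would need to go for the "Nested public interface" approach: the C standard explicitly allows a pointer to a struct to be converted to a pointer to its first member. In fact, C17 6.7.2.1 paragraph 15 says this holds "and vice versa", so the reverse cast from `rb_io_public_t *` back to `rb_io_private_t *` should also be well-defined, as long as the pointer really points at the public member embedded in a private struct.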
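A minimal sketch of that layout, using the `struct rb_io_public` / `struct rb_io` names from the "Nested public interface" option in the original issue; `example_private_field`, `io_to_public` and `io_from_public` are hypothetical names. It converts in both directions through the first member and checks at compile time (C11 `_Static_assert`) that the public part sits at offset 0.

```c
#include <stddef.h>

/* Public part, as in the "Nested public interface" option. */
struct rb_io_public {
    int fd;
};

/* The private struct embeds the public one as its FIRST member. */
struct rb_io {
    struct rb_io_public public;
    int example_private_field; /* hypothetical stand-in for the private fields */
};

/* A struct has no padding before its first member, so the embedded public
   struct starts at offset 0. */
_Static_assert(offsetof(struct rb_io, public) == 0,
               "public part must be the first member");

/* Private -> public: a pointer to a struct, suitably converted, points to
   its initial member (C17 6.7.2.1p15). */
static struct rb_io_public *
io_to_public(struct rb_io *io)
{
    return (struct rb_io_public *)io; /* same address as &io->public */
}

/* Public -> private: the same paragraph says "and vice versa", so this is
   well-defined as long as `pub` really is the `public` member of a
   struct rb_io. */
static struct rb_io *
io_from_public(struct rb_io_public *pub)
{
    return (struct rb_io *)pub;
}
```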

------------

I don't think this matters, though, because I agree that, looking at the contents of `rb_io_t`, almost none of it should be public API, and we should strive to encapsulate it fully, as in "Hide all details" (which @Eregon also suggested).

I did a GitHub code search for usages of `rb_io_t`, and _all_ of the usages I could find followed basically this pattern:

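```c
#include <ruby.h>
#include <ruby/io.h>

void do_something_external_with_fd(int fd); /* placeholder for the caller's own code */

static void
use_io(VALUE some_io) /* some_io comes from somewhere, e.g. a method argument */
{
    rb_io_t *fptr;
    GetOpenFile(some_io, fptr);

    // call an io.h function that takes an rb_io_t *
    rb_io_set_nonblock(fptr);
    // or read the fd member and do something with that
    do_something_external_with_fd(fptr->fd);
}
```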

I think we could avoid 99% of the breakage, and still hide all the implementation details of `rb_io_t`, by redefining `rb_io_t` to contain little more than the FD and defining a new, opaque type for internal IO use. It would work like this:

First, `struct RFile` is currently defined in include/ruby/internal/core/rfile.h as:

```
struct rb_io_t;
struct RFile {
    struct RBasic basic;
    struct rb_io_t *fptr;
};
```

There are two spare words in the object slot, and we would use one of them to hold a pointer to a new structure. The definition of that structure would _not_ be provided in any public header. So we would change `RFile` in include/ruby/internal/core/rfile.h to:

```
struct rb_io_t;
struct rb_io_impl;
struct RFile {
    struct RBasic basic;
    struct rb_io_t *fptr;
    struct rb_io_impl *impl;
};
```
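As a quick sanity check on the "two spare words" arithmetic, here is a self-contained model of the modified `struct RFile`. `VALUE` and `struct RBasic` are simplified stand-ins, and the 5-word slot size is an assumption (it corresponds to 40-byte slots on 64-bit builds), so treat this as a sketch rather than Ruby's actual headers.

```c
#include <stdint.h>

typedef uintptr_t VALUE;                      /* stand-in for Ruby's VALUE */
struct RBasic { VALUE flags; VALUE klass; };  /* stand-in: two words */

struct rb_io_t;     /* slimmed-down public struct */
struct rb_io_impl;  /* opaque: never defined in any public header */

struct RFile {
    struct RBasic basic;
    struct rb_io_t *fptr;
    struct rb_io_impl *impl;  /* occupies one of the two spare words */
};

/* Assumption: a standard object slot is 5 words (40 bytes on 64-bit). */
#define ASSUMED_SLOT_WORDS 5
_Static_assert(sizeof(struct RFile) <= ASSUMED_SLOT_WORDS * sizeof(VALUE),
               "RFile must still fit in a single object slot");
```

Note that a pointer to an incomplete type like `struct rb_io_impl *` can be stored and passed around freely; only code that sees the full definition can dereference it.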


Then, we would change the definition of `struct rb_io_t` in include/ruby/io.h to be:

```
typedef struct rb_io_t {
    VALUE self;
    int fd;
    // _everything_ else is removed
} rb_io_t;
```

We would move all of its current contents to a new struct in internal/io.h:

```
struct rb_io_impl {
    // All the juicy implementation details go here - _except_ no need to duplicate `self` and `fd`.
};
```

The current definition of `GetOpenFile` essentially does `RFILE(obj)->fptr`, so it would return the newly slimmed-down public `rb_io_t`, and code that reads the file descriptor from it would continue to work.

Other functions in include/ruby/io.h that take an `rb_io_t *` argument, such as `rb_io_set_nonblock`, would use `RFILE(fptr->self)->impl` to reach the implementation-details struct. If desired, we could also define variants that take the `VALUE` instead of the `rb_io_t *`, avoiding the `fptr->self` pointer chase.
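To make the pointer chasing concrete, here is a small self-contained model of the whole scheme. `VALUE`, `struct RBasic`, `MODEL_RFILE`, `model_get_open_file`, the `struct rb_io_impl` fields and `example_io_set_mode` are all hypothetical stand-ins (in this model a `VALUE` is simply the object's address); only the overall shape follows the proposal above.

```c
#include <stdint.h>

typedef uintptr_t VALUE;                      /* stand-in for Ruby's VALUE */
struct RBasic { VALUE flags; VALUE klass; };  /* stand-in */

/* Slimmed-down public struct (include/ruby/io.h in the proposal). */
typedef struct rb_io_t {
    VALUE self;
    int fd;
} rb_io_t;

/* Private details (internal/io.h); these fields are hypothetical. */
struct rb_io_impl {
    int mode;
    int lineno;
};

struct RFile {
    struct RBasic basic;
    rb_io_t *fptr;
    struct rb_io_impl *impl;
};

/* Stand-in for RFILE(): in this model a VALUE is just the object address. */
#define MODEL_RFILE(obj) ((struct RFile *)(obj))

/* Roughly what GetOpenFile does: hand out the (now public-only) fptr. */
static rb_io_t *
model_get_open_file(VALUE io)
{
    return MODEL_RFILE(io)->fptr;
}

/* VALUE-taking variant: one load from the object. */
static struct rb_io_impl *
io_impl_from_value(VALUE io)
{
    return MODEL_RFILE(io)->impl;
}

/* rb_io_t *-taking variant: the extra hop through fptr->self. */
static struct rb_io_impl *
io_impl_from_fptr(rb_io_t *fptr)
{
    return io_impl_from_value(fptr->self);
}

/* Hypothetical io.h-style function whose signature stays unchanged. */
static void
example_io_set_mode(rb_io_t *fptr, int mode)
{
    io_impl_from_fptr(fptr)->mode = mode;
}
```

The `rb_io_t *` path costs one more dependent load (`fptr->self`) than the `VALUE` path, which is the indirection trade-off discussed in the next paragraph.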

The downside of this approach is that, as you identified, most IO methods would pay for an extra indirection to obtain the impl from the `VALUE`. The benefit is that we get almost total encapsulation while maintaining backwards compatibility with almost all existing code. We can also move fields between the private and public structs in response to compatibility issues we observe in the wild.


----------------------------------------
Feature #19057: Hide implementation of `rb_io_t`.
https://bugs.ruby-lang.org/issues/19057#change-103322

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
----------------------------------------
In order to make improvements to the IO implementation like <https://bugs.ruby-lang.org/issues/18455>, we need to add new fields to `struct rb_io_t`.

By the way, POSIX reserves type names ending in `_t`, so I'm also trying to drop the `_t` suffix from the internal implementation where possible during this conversion.

Anyway, we should try to hide the implementation of `struct rb_io`. Ideally we would not expose any of it, but backwards compatibility is the problem.

So, to remain backwards compatible, we should expose some fields of `struct rb_io`. The most commonly used ones are `fd` and `mode`, but several others are used fairly often too.

There are many fields which should not be exposed because they are implementation details.

## Current proposal

The current proposed change <https://github.com/ruby/ruby/pull/6511> creates two structs:

```c
// include/ruby/io.h
#ifndef RB_IO_T
struct rb_io {
  int fd;
  // ... public fields ...
};
#else
struct rb_io;
#endif

// internal/io.h
#define RB_IO_T
struct rb_io {
  int fd;
  // ... public fields ...
  // ... private fields ...
};
```

However, we are not 100% confident that this is safe according to the C specification. My experience is not broad enough to say whether it is safe in practice, but it looks okay to me, and @Eregon and @tenderlovemaking have both given some kind of approval.

That being said, maybe it's not safe.

There are two alternatives:

## Hide all details

We can make `struct rb_io` completely opaque in the public headers.

```c
// include/ruby/io.h
#define RB_IO_HIDDEN
struct rb_io;
int rb_ioptr_descriptor(struct rb_io *ioptr); // accessor for previously visible state.

// internal/io.h
struct rb_io {
  // ... all fields ...
};
```
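A minimal single-file sketch of the "Hide all details" pattern. In practice the two halves would live in separate headers and translation units; here they are combined so the example compiles on its own. `rb_ioptr_descriptor` is the accessor name proposed above, while `example_io_new` and the `mode` field are hypothetical.

```c
#include <stdlib.h>

/* ---- "public header" part: only an incomplete type is visible ---- */
struct rb_io;                                  /* opaque */
int rb_ioptr_descriptor(struct rb_io *ioptr);  /* accessor for previously visible state */

/* ---- "internal" part: full definition, visible only to the implementation ---- */
struct rb_io {
    int fd;
    int mode;  /* hypothetical private field */
};

int
rb_ioptr_descriptor(struct rb_io *ioptr)
{
    return ioptr->fd;
}

/* Hypothetical constructor, so callers never need sizeof(struct rb_io). */
struct rb_io *
example_io_new(int fd)
{
    struct rb_io *io = malloc(sizeof *io);
    if (io != NULL) {
        io->fd = fd;
        io->mode = 0;
    }
    return io;
}
```

Because callers only ever see an incomplete type, they cannot take `sizeof(struct rb_io)` or touch fields directly, which is what lets the layout change freely between releases.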

This would only be forwards compatible, and code would need to feature detect like this:

```c
#ifdef RB_IO_HIDDEN
/* opaque struct: go through the accessor */
#define RB_IOPTR_DESCRIPTOR(ioptr) rb_ioptr_descriptor(ioptr)
#else
/* older headers: the field is still visible */
#define RB_IOPTR_DESCRIPTOR(ioptr) ((ioptr)->fd)
#endif
```

## Nested public interface

Alternatively, we can nest the public fields into the private struct:

```c
// include/ruby/io.h
struct rb_io_public {
  int fd;
  // ... public fields ...
};

// internal/io.h
#define RB_IO_T
struct rb_io {
  struct rb_io_public public;
  // ... private fields ...
};
```

## Considerations

I personally think the "Hide all details" implementation is the best, but it's also the least compatible. It is also what we are ultimately aiming for; whether we take an intermediate "compatibility step" is up to us.

I think "Nested public interface" is messy and introduces more complexity, but it might be slightly better defined than the "Current proposal", which might create undefined behaviour. That being said, all the tests are passing.





-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

Thread overview: 50+ messages
2022-10-15  1:54 [ruby-core:110300] [Ruby master Bug#19057] Hide implementation of `rb_io_t` ioquatix (Samuel Williams)
2022-10-15  2:12 ` [ruby-core:110301] " ioquatix (Samuel Williams)
2022-10-15 11:02 ` [ruby-core:110309] " Eregon (Benoit Daloze)
2022-10-17  7:07 ` [ruby-core:110348] " ioquatix (Samuel Williams)
2023-05-27  8:49 ` kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core [this message]
2023-05-27  9:58 ` [ruby-core:113680] [Ruby master Feature#19057] " Eregon (Benoit Daloze) via ruby-core
2023-05-27 10:12 ` [ruby-core:113681] " ioquatix (Samuel Williams) via ruby-core
2023-05-28  2:19 ` [ruby-core:113689] " ianks (Ian Ker-Seymer) via ruby-core
2023-05-29  9:15 ` [ruby-core:113696] " Eregon (Benoit Daloze) via ruby-core
2023-05-30  1:04 ` [ruby-core:113698] " ioquatix (Samuel Williams) via ruby-core
2023-06-01  0:52 ` [ruby-core:113723] " ioquatix (Samuel Williams) via ruby-core
2023-06-01  9:56 ` [ruby-core:113728] " Eregon (Benoit Daloze) via ruby-core
2023-06-01 11:11 ` [ruby-core:113731] " ioquatix (Samuel Williams) via ruby-core
2023-06-08  2:39 ` [ruby-core:113802] " ioquatix (Samuel Williams) via ruby-core
2023-06-09  8:14 ` [ruby-core:113844] " byroot (Jean Boussier) via ruby-core
2023-06-09  8:33 ` [ruby-core:113845] " ioquatix (Samuel Williams) via ruby-core
2023-06-23 10:36 ` [ruby-core:114014] " kamil-gwozdz via ruby-core
2023-07-07  0:41 ` [ruby-core:114104] " k0kubun (Takashi Kokubun) via ruby-core
2023-07-07  2:04 ` [ruby-core:114106] " nobu (Nobuyoshi Nakada) via ruby-core
2023-07-26 22:23 ` [ruby-core:114298] " k0kubun (Takashi Kokubun) via ruby-core
2023-08-24 13:41 ` [ruby-core:114491] " ioquatix (Samuel Williams) via ruby-core
2023-08-25  1:07 ` [ruby-core:114521] " naruse (Yui NARUSE) via ruby-core
2023-08-25  1:43 ` [ruby-core:114522] " ioquatix (Samuel Williams) via ruby-core
2023-09-05  9:24 ` [ruby-core:114627] " byroot (Jean Boussier) via ruby-core
2024-01-14  3:20 ` [ruby-core:116197] " ioquatix (Samuel Williams) via ruby-core
2024-01-14  3:30 ` [ruby-core:116198] " ioquatix (Samuel Williams) via ruby-core
2024-01-15  3:23   ` [ruby-core:116211] " Eric Wong via ruby-core
2024-01-15  5:49 ` [ruby-core:116213] " ioquatix (Samuel Williams) via ruby-core
2024-01-23 12:56   ` [ruby-core:116377] " Eric Wong via ruby-core
2024-01-23 13:44 ` [ruby-core:116379] " Eregon (Benoit Daloze) via ruby-core
2024-01-24  1:03   ` [ruby-core:116389] " Eric Wong via ruby-core
2024-03-18  1:58 ` [ruby-core:117204] " mame (Yusuke Endoh) via ruby-core
2024-03-18  5:22 ` [ruby-core:117206] " ioquatix (Samuel Williams) via ruby-core
2024-03-18  7:43 ` [ruby-core:117207] " byroot (Jean Boussier) via ruby-core
2024-03-18  8:03 ` [ruby-core:117208] " ioquatix (Samuel Williams) via ruby-core
2024-03-19  2:35 ` [ruby-core:117220] " mame (Yusuke Endoh) via ruby-core
2024-03-23 18:23   ` [ruby-core:117298] " Eric Wong via ruby-core
2024-03-19  3:07 ` [ruby-core:117224] " matz (Yukihiro Matsumoto) via ruby-core
2024-03-19  9:33 ` [ruby-core:117226] " ioquatix (Samuel Williams) via ruby-core
2024-03-19 10:56 ` [ruby-core:117228] " ioquatix (Samuel Williams) via ruby-core
2024-03-19 11:44 ` [ruby-core:117229] " Eregon (Benoit Daloze) via ruby-core
2024-03-19 12:33 ` [ruby-core:117230] " ioquatix (Samuel Williams) via ruby-core
2024-03-20 13:02 ` [ruby-core:117262] " mame (Yusuke Endoh) via ruby-core
2024-03-20 18:40 ` [ruby-core:117267] " Eregon (Benoit Daloze) via ruby-core
2024-03-22  2:28 ` [ruby-core:117289] " ioquatix (Samuel Williams) via ruby-core
2024-03-22  3:55 ` [ruby-core:117290] " k0kubun (Takashi Kokubun) via ruby-core
2024-03-23 21:49 ` [ruby-core:117300] " ioquatix (Samuel Williams) via ruby-core
2024-03-28  8:32   ` [ruby-core:117361] " Eric Wong via ruby-core
2024-03-24  4:45 ` [ruby-core:117301] " mame (Yusuke Endoh) via ruby-core
2024-03-25  8:28 ` [ruby-core:117310] " byroot (Jean Boussier) via ruby-core
