Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PoC for the event-based plugin system with YARD parsing plugin #1321

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

okuramasafumi
Copy link
Contributor

@okuramasafumi okuramasafumi commented Mar 20, 2025

This PR introduces the event-based plugin system along with a YARD-parsing plugin as an example of concrete plugins.

It demonstrates how we can build practical plugins while keeping the core code small. In this case, YARD-related logic is encapsulated in a plugin and used only when users specify with --plugins option.
This YARD plugin successfully parses https://github.com/okuramasafumi/alba comments written with YARD, but doesn't print YARD-specific information such as types since current RDoc doesn't support type information.

This PR is completely proof of concept, and is not meant to be merged now.
I'd like to discuss if this is the right way to extend RDoc for developers, and even for core team, and improve maintainability.

The original discussion is #1257

Comment on lines +879 to +881
opt.on("--plugins=PLUGINS", "-P", Array, "Use plugins") do |value|
@plugins.concat value
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use multiple --plugin to add multiple plugins.

Suggested change
opt.on("--plugins=PLUGINS", "-P", Array, "Use plugins") do |value|
@plugins.concat value
end
opt.on("--plugin=PLUGIN", "-P", "Use plugin") do |plugin|
@plugins << plugin
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure which interface is better.

# This
rdoc --plugins yard_plugin,another_plugin

# Or that
rdoc --plugin yard_plugin --plugin another_plugin

Correct me if I'm wrong, but I think the later way is kind of rare.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "rare" mean here?
The latter is used in many cases. For example, ruby's -I is the latter interface. curl's --data/--header/... are also the latter interface.

The latter style is easyer to use from a script:

args=()
if [ "$USE_X_PLUGIN" = "yes" ]; then
  args+=(--plugin X_plugin)
fi
if [ "$USE_Y_PLUGIN" = "yes" ]; then
  args+=(--plugin Y_plugin)
fi
rdoc "${args[@]}" ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou Thank you for correcting me, after some investigation, I noticed that we have both styles in Ruby and other tools.
Personally I don't have a strong opinion here. For the future version we can consider YAML configuration as a primary way to enable plugins (like rubocop).
I'm not against using multiple --plugin option here.

@@ -449,6 +455,9 @@ def document options
end
@options.finish

::RDoc::BasePlugin.activate_with(self)
@options.load_plugins
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we load plugins dynamically and use require, plugins will not be loaded to multiple RDoc instance.

Do we need to load plugins for each RDoc::RDoc#document?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, initialize is a better place?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#initialize is better than #document but do we need to load plugins for each RDoc instance?
Can we load plugins only once in one process?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot see that using multiple RDoc::RDoc instance is common. From CLI or Rake task, I think we have only one instance of RDoc::RDoc.
However, we might have multiple instances in the future, probably for concurrency.
So then, where is the ideal place to activate plugins? I through it's RDoc::Rdoc since it's in the execution path anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gem install A B C creates a RDoc::RDoc for each package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that's pretty common...
Would we want separate plugins and their configurations for each RDoc::RDoc instance? That said, if we want to isolate each environment, loading plugins for each instance is fine. If not, then we can load plugins all at once at somewhere else (with RDoc module maybe?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who does configure RDoc plugins?

If each package have RDoc plugins configuration, we may need per package plugins configuration. If users configure RDoc plugins, we may be able to share the same plugins configuration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought only users will configure RDoc plugins, but noticed that each gem can configure RDoc along with its plugins with rdoc_options.

args = @spec.rdoc_options

And I guess that's why we instantiate an RDoc instance for each package in the first place. So loading plugins for each package might be necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may not need to require for registering a plugin. We may just need to instantiate a plugin for it. For example:

module RDoc
  class YardPlugin < BasePlugin
    def on_rdoc_store_complete(store)
      # ...
    end
  end

  class RDoc
    def document(options)
      # ...
      plugins = options.plugins.collect do |name|
        instantiate_plugin(name) # name.capitalize.constantize.new or associating name with class or something...
      end
      # ...
      plugins.each do |plugin|
        plugin.on_rdoc_store_complete(@store) if plugin.respond_to?(:on_rdoc_store_complete)
      end
      # ...
    end
  end
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is absolutely possible, in this case on_some_event style looks great and seems better than listens DSL style.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a plugin system in RDoc::Markup::PreProcess.

Could you explain what is the merit of "the event-based plugin system" you suggested (and problems in the existing system)? Should existing RDoc::Markup::PreProcess be deprecated by this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that RDoc::Markup::PreProcess requires adding directives to existing code, and that is a problem. One of the advantages of a new plugin system is that just enabling plugins works without modifying existing code.

Should existing RDoc::Markup::PreProcess be deprecated by this?

Using directives has its own advantages, so I'd like to keep it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that RDoc::Markup::PreProcess has the "post_process" hook too:

##
# Adds a post-process handler for directives. The handler will be called
# with the result RDoc::Comment (or text String) and the code object for the
# comment (if any).
def self.post_process &block
@post_processors << block
end

Should we deprecate it by suggested plugin system?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, this PreProcess works with directives, and this time it's not I want to work with. The goal of this PR is to introduce a way to make RDoc understand YARD without modifying existing code.
That said, we don't have to (or should not) deprecate post_process here since it's about directives and removing it will be backward-incompatible, and cannot be replaced by the plugin system in this PR.

Copy link
Member

@kou kou Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. It seems that the "post_process" hook doesn't depend on directives.

Could you try the following?

RDoc::Markup::PreProcess.post_process do |comment, code_object|
  if code_object.is_a?(RDoc::AnyMethod)
    puts "Parsing #{code_object.name}"
  end
end

It seems that the hook is called for all methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tompng TomDoc support is definitely a candidate of plugin extraction.

Supporting more YARD directives will will also helps it.

Plugin system makes it possible to "bring your own YARD tag". I think we need to make plugins inheritable to achieve this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou Does #1257 describe problems well enough? If not please tell me so that I can add more information and details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great if you can answer my questions there too 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@st0012 Oh, really sorry it completely slipped my mind. I added a reply there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, "event hook is enough" is just a hypothesis.
To prove it, more examples (ex: TomDoc, YARD with more implementation) are needed. I think it will reveal that something is missing, which strongly affects the design of this plugin system.

Of course one decision is to only support plugins that is possible by event hooks. But we need to know the limitation of it.
Perhaps we need to give up some tags(@attr_reader @attr_writer @attr_accessor) and directives(@!group @!macro @!attribute @!method @!parse), which might be possible to implement depending on the choice of API designs.
Does this plugin system only support implementing YARD tags? Is giving up @attr_reader, @!attribute and @!method OK?

Comment on lines +14 to +15
# meth.params = parsed_comment.param.map(&:to_s).join("\n")
meth.comment.text = parsed_comment.plain.join("\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plugins should update the given objects destructively?
How to extend RDoc by a plugin when existing RDoc objects don't have a place to store additional data? In this case, how to use the parsed type information?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I consider this and don't have a confident conclusion now.
I think there are two ways, one is using env object and another is modifying existing objects (meth in his case).
Using env would be safer, and storing type information would look like:

# In a plugin
env[meth] = DataFromYardPlugin.new(type: type)

# In a generator
type = env[meth].type

Here, DataFromYardPlugin contains various information along with type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are multiple plugins that want to attach additional information to meth, the approach may be fragile. (One plugin may overwrite env[meth] set by another plugin.)

Can we provide more safer mechanism?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two ways of overriding: intentional and unintentional.

If the overriding is unintentional, it should be prevented. It can be done with something like below:

require 'logger'

module RDoc
  class Environment
    def initialize
      @data = {}
      @logger = Logger.new($stderr)
      @logger.level = Logger::INFO
      @hooks = {}
    end

    def [](key)
      @data[key]
    end

    def []=(key, value)
      old_value = @data[key]
      @data[key] = value
      @logger.debug("Set env[#{key.inspect}] = #{value.inspect}")
      trigger_hooks(key, old_value, value)
    end

    def delete(key)
      @logger.debug("Delete env[#{key.inspect}]")
      old_value = @data.delete(key)
      trigger_hooks(key, old_value, nil)
    end

    def keys
      @data.keys
    end

    def fetch(key, default = nil)
      val = @data.fetch(key, default)
      @logger.debug("Fetch env[#{key.inspect}] => #{val.inspect}")
      val
    end

    def to_h
      @data.dup
    end

    def logger
      @logger
    end

    def on_key_set(key, &block)
      (@hooks[key] ||= []) << block
    end

    private

    def trigger_hooks(key, old_value, new_value)
      return unless @hooks[key]
      @hooks[key].each do |hook|
        begin
          hook.call(key, old_value, new_value, self)
        rescue => e
          @logger.error("Error in hook for key #{key.inspect}: #{e.message}")
          raise
        end
      end
    end
  end
end

env = RDoc::Environment.new
env[:foo] = 'foo'
env.on_key_set(:foo) { raise 'Do not override :foo }
env[:foo] = 'bar' # => Error

In this approach, preventing override is a responsibility of those who want to do so. In other words, this is not automatic. This is good because if it allows other plugins to override it, it just leaves it overridable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that attaching information to a code object is a common pattern for RDoc plugins.
(In your example, attaching type information to each method object.)

If the conflict is a rare situation, your approach may work. But if it's a common pattern, your approach will be inconvenient.

Do you think whether attaching information to a code object is a common pattern or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for clarification, then yes, extracting information from somewhere like comment string and attaching it to code objects would be typical for plugins.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Then env[object] = XXX and env.on_key_set(object) { raise "Do not override #{object.inspect}" } will be inconvenient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, preventing override by default is an option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the prevention happen, what should we (users? plugin developers) do as the next action?
Do we change key for env from env[object] = ... to something? What key should be used for the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My current idea is to limit Hash-like method ([]=) to only add value, and if existing data is there, it raises an exception. Instead we offer update method so that users (mostly plugin authors) can intentionally override existing values.

    def []=(key, value)
      old_value = @data[key]
      raise if old_value

      @data[key] = value
      @logger.debug("Set env[#{key.inspect}] = #{value.inspect}")
      trigger_hooks(key, old_value, value)
      value
    end
    alias add []=

    def update(key, value)
      old_value = @data[key]
      @data[key] = value
      @logger.debug("Set env[#{key.inspect}] = #{value.inspect}")
      trigger_hooks(key, old_value, value)
      value
    end

Comment on lines +54 to +55
current_indentation_level = line[/^#\s*/]&.size || 0
if current_indentation_level >= @base_indentation_level + 2
Copy link
Member

@kou kou Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work when the first line is # ...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly this indentation-related implementation is not perfect and will fail for many edge cases. I'll definitely add test cases for this.

}
def initialize(comment)
@comment = comment
@parsed_comment = ParsedComment.new([], nil, [], [])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we can use a local variable not an instance variable for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, thank you for pointing out.

desc = matchdata[:desc1] || matchdata[:desc2]
parsed_type = TypeParser.parse(type)
@parsed_comment[:param] << ParamData.new(type: parsed_type, name: name, desc: desc)
@mode = :param
Copy link
Member

@tompng tompng Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing previous_tag here instead of storing mode = :param, appending part will be simple

tag = ParamData.new(type: parsed_type, name: name, desc: desc)
@parsed_comment[:param] << tag
previous_tag = tag
# Append to the previous tag
- data = @mode == :param || @mode == :raise ? @parsed_comment[@mode].last : @parsed_comment[@mode]
- data.append_desc(line)
+ previous_tag.append_desc(line)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, I'll take this!

type_name: /#\w+|((::)?\w+)+/,
literal: /(?:
'(?:\\'|[^'])*' |
"(?:\\"|[^"])*" |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some regexps are wrong.
"\" (escaped, quote is not closed) matches to this string literal regexp.
Integer and Float literal(integer part) matches to type_name: /#\w+|((::)?\w+)+/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't notice this. I tested this code with some variations of type and it seems working. Tests are missing and we need to add them to ensure this regex working correctly since it's quite complex.

# Without calling this, plugins won't work

def self.activate_with(rdoc = ::RDoc::RDoc.current)
@@rdoc = rdoc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using class variable for this means any later plugin calls activate_with with its own RDoc will apply that to all other plugins. I don't think this is what we want?

class Base
  @@name = "Base"

  def name
    @@name
  end

  def self.update_name(name)
    @@name = name
  end
end

class Foo < Base
end

class Bar < Base
end

Foo.update_name("Foo")
f = Foo.new

Bar.update_name("Bar")
b = Bar.new

puts f.name # => "Bar"
puts b.name # => "Bar"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it's safe to use a class variable here since we don't have to update rdoc object.
However, some information such as a name of the plugin, should be stored as a class instance variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • If we don't expect plugins to call this method, then I don't think it should be in the base class as it is.
  • It doesn't look like the yard plugin needs this attribute yet? If that's the case, let's not introduce this until we have a real use case to help us define the API better.
  • If we want to pass RDoc's states down to plugins, let's avoid passing the RDoc object unless we have a solid use case for it. There's a lot of unnecessary coupling between RDoc's major components and I have been untangling for a while (example). Exposing RDoc directly will potentially make it harder for such work as all of its attributes & methods will be considered public APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I didn't notice that it's not used at all now. I used it at first, but store has enough information for the plugin so I removed it.
Then we can remove this class variable. And I agree, we should avoid RDoc::RDoc object (thank you for untangling it!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'd expect we expose Options and Store objects first if we need to share states with plugins.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/ruby/rdoc/pull/1321/files#diff-2458b7fcdf31ccd44fa0681ba7e381360b3e4623c9f5409358222e4826c3c60dR6

Sorry it turned out that we do use rdoc object here because we need some global data store for event handling.
I still agree that we should avoid RDoc::RDoc instance, but then where should we put these event registration logic? Maybe we can introduce something like PluginManager and pass it to plugins?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants