-
Notifications
You must be signed in to change notification settings - Fork 437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PoC for the event-based plugin system with YARD parsing plugin #1321
base: master
Are you sure you want to change the base?
Conversation
f2bd262
to
5a2f695
Compare
5a2f695
to
9f0b3ad
Compare
opt.on("--plugins=PLUGINS", "-P", Array, "Use plugins") do |value| | ||
@plugins.concat value | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use multiple --plugin
to add multiple plugins.
opt.on("--plugins=PLUGINS", "-P", Array, "Use plugins") do |value| | |
@plugins.concat value | |
end | |
opt.on("--plugin=PLUGIN", "-P", "Use plugin") do |plugin| | |
@plugins << plugin | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure which interface is better.
# This
rdoc --plugins yard_plugin,another_plugin
# Or that
rdoc --plugin yard_plugin --plugin another_plugin
Correct me if I'm wrong, but I think the later way is kind of rare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "rare" mean here?
The latter is used in many cases. For example, ruby
's -I
is the latter interface. curl
's --data
/--header
/... are also the latter interface.
The latter style is easyer to use from a script:
args=()
if [ "$USE_X_PLUGIN" = "yes" ]; then
args+=(--plugin X_plugin)
fi
if [ "$USE_Y_PLUGIN" = "yes" ]; then
args+=(--plugin Y_plugin)
fi
rdoc "${args[@]}" ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kou Thank you for correcting me, after some investigation, I noticed that we have both styles in Ruby and other tools.
Personally I don't have a strong opinion here. For the future version we can consider YAML configuration as a primary way to enable plugins (like rubocop).
I'm not against using multiple --plugin
option here.
@@ -449,6 +455,9 @@ def document options | |||
end | |||
@options.finish | |||
|
|||
::RDoc::BasePlugin.activate_with(self) | |||
@options.load_plugins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we load plugins dynamically and use require
, plugins will not be loaded to multiple RDoc instance.
Do we need to load plugins for each RDoc::RDoc#document
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, initialize
is a better place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#initialize
is better than #document
but do we need to load plugins for each RDoc instance?
Can we load plugins only once in one process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot see that using multiple RDoc::RDoc
instance is common. From CLI or Rake task, I think we have only one instance of RDoc::RDoc
.
However, we might have multiple instances in the future, probably for concurrency.
So then, where is the ideal place to activate plugins? I through it's RDoc::Rdoc
since it's in the execution path anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gem install A B C
creates a RDoc::RDoc
for each package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that's pretty common...
Would we want separate plugins and their configurations for each RDoc::RDoc
instance? That said, if we want to isolate each environment, loading plugins for each instance is fine. If not, then we can load plugins all at once at somewhere else (with RDoc
module maybe?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who does configure RDoc plugins?
If each package have RDoc plugins configuration, we may need per package plugins configuration. If users configure RDoc plugins, we may be able to share the same plugins configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought only users will configure RDoc plugins, but noticed that each gem can configure RDoc along with its plugins with rdoc_options
.
rdoc/lib/rdoc/rubygems_hook.rb
Line 167 in 9f0b3ad
args = @spec.rdoc_options |
And I guess that's why we instantiate an RDoc instance for each package in the first place. So loading plugins for each package might be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may not need to require
for registering a plugin. We may just need to instantiate a plugin for it. For example:
module RDoc
class YardPlugin < BasePlugin
def on_rdoc_store_complete(store)
# ...
end
end
class RDoc
def document(options)
# ...
plugins = options.plugins.collect do |name|
instantiate_plugin(name) # name.capitalize.constantize.new or associating name with class or something...
end
# ...
plugins.each do |plugin|
plugin.on_rdoc_store_complete(@store) if plugin.respond_to?(:on_rdoc_store_complete)
end
# ...
end
end
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is absolutely possible, in this case on_some_event
style looks great and seems better than listens
DSL style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a plugin system in RDoc::Markup::PreProcess
.
Could you explain what is the merit of "the event-based plugin system" you suggested (and problems in the existing system)? Should existing RDoc::Markup::PreProcess
be deprecated by this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that RDoc::Markup::PreProcess
requires adding directives to existing code, and that is a problem. One of the advantages of a new plugin system is that just enabling plugins works without modifying existing code.
Should existing RDoc::Markup::PreProcess be deprecated by this?
Using directives has its own advantages, so I'd like to keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that RDoc::Markup::PreProcess
has the "post_process" hook too:
rdoc/lib/rdoc/markup/pre_process.rb
Lines 25 to 32 in 894b2f1
## | |
# Adds a post-process handler for directives. The handler will be called | |
# with the result RDoc::Comment (or text String) and the code object for the | |
# comment (if any). | |
def self.post_process &block | |
@post_processors << block | |
end |
Should we deprecate it by suggested plugin system?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see, this PreProcess
works with directives, and this time it's not I want to work with. The goal of this PR is to introduce a way to make RDoc understand YARD without modifying existing code.
That said, we don't have to (or should not) deprecate post_process
here since it's about directives and removing it will be backward-incompatible, and cannot be replaced by the plugin system in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. It seems that the "post_process" hook doesn't depend on directives.
Could you try the following?
RDoc::Markup::PreProcess.post_process do |comment, code_object|
if code_object.is_a?(RDoc::AnyMethod)
puts "Parsing #{code_object.name}"
end
end
It seems that the hook is called for all methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tompng TomDoc support is definitely a candidate of plugin extraction.
Supporting more YARD directives will will also helps it.
Plugin system makes it possible to "bring your own YARD tag". I think we need to make plugins inheritable to achieve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be great if you can answer my questions there too 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@st0012 Oh, really sorry it completely slipped my mind. I added a reply there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, "event hook is enough" is just a hypothesis.
To prove it, more examples (ex: TomDoc, YARD with more implementation) are needed. I think it will reveal that something is missing, which strongly affects the design of this plugin system.
Of course one decision is to only support plugins that is possible by event hooks. But we need to know the limitation of it.
Perhaps we need to give up some tags(@attr_reader
@attr_writer
@attr_accessor
) and directives(@!group
@!macro
@!attribute
@!method
@!parse
), which might be possible to implement depending on the choice of API designs.
Does this plugin system only support implementing YARD tags? Is giving up @attr_reader
, @!attribute
and @!method
OK?
# meth.params = parsed_comment.param.map(&:to_s).join("\n") | ||
meth.comment.text = parsed_comment.plain.join("\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plugins should update the given objects destructively?
How to extend RDoc by a plugin when existing RDoc objects don't have a place to store additional data? In this case, how to use the parsed type information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I consider this and don't have a confident conclusion now.
I think there are two ways, one is using env
object and another is modifying existing objects (meth
in his case).
Using env would be safer, and storing type information would look like:
# In a plugin
env[meth] = DataFromYardPlugin.new(type: type)
# In a generator
type = env[meth].type
Here, DataFromYardPlugin
contains various information along with type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are multiple plugins that want to attach additional information to meth
, the approach may be fragile. (One plugin may overwrite env[meth]
set by another plugin.)
Can we provide more safer mechanism?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are two ways of overriding: intentional and unintentional.
If the overriding is unintentional, it should be prevented. It can be done with something like below:
require 'logger'
module RDoc
class Environment
def initialize
@data = {}
@logger = Logger.new($stderr)
@logger.level = Logger::INFO
@hooks = {}
end
def [](key)
@data[key]
end
def []=(key, value)
old_value = @data[key]
@data[key] = value
@logger.debug("Set env[#{key.inspect}] = #{value.inspect}")
trigger_hooks(key, old_value, value)
end
def delete(key)
@logger.debug("Delete env[#{key.inspect}]")
old_value = @data.delete(key)
trigger_hooks(key, old_value, nil)
end
def keys
@data.keys
end
def fetch(key, default = nil)
val = @data.fetch(key, default)
@logger.debug("Fetch env[#{key.inspect}] => #{val.inspect}")
val
end
def to_h
@data.dup
end
def logger
@logger
end
def on_key_set(key, &block)
(@hooks[key] ||= []) << block
end
private
def trigger_hooks(key, old_value, new_value)
return unless @hooks[key]
@hooks[key].each do |hook|
begin
hook.call(key, old_value, new_value, self)
rescue => e
@logger.error("Error in hook for key #{key.inspect}: #{e.message}")
raise
end
end
end
end
end
env = RDoc::Environment.new
env[:foo] = 'foo'
env.on_key_set(:foo) { raise 'Do not override :foo }
env[:foo] = 'bar' # => Error
In this approach, preventing override is a responsibility of those who want to do so. In other words, this is not automatic. This is good because if it allows other plugins to override it, it just leaves it overridable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that attaching information to a code object is a common pattern for RDoc plugins.
(In your example, attaching type information to each method object.)
If the conflict is a rare situation, your approach may work. But if it's a common pattern, your approach will be inconvenient.
Do you think whether attaching information to a code object is a common pattern or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for clarification, then yes, extracting information from somewhere like comment string and attaching it to code objects would be typical for plugins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Then env[object] = XXX
and env.on_key_set(object) { raise "Do not override #{object.inspect}" }
will be inconvenient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, preventing override by default is an option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the prevention happen, what should we (users? plugin developers) do as the next action?
Do we change key for env
from env[object] = ...
to something? What key should be used for the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My current idea is to limit Hash-like method ([]=
) to only add value, and if existing data is there, it raises an exception. Instead we offer update
method so that users (mostly plugin authors) can intentionally override existing values.
def []=(key, value)
old_value = @data[key]
raise if old_value
@data[key] = value
@logger.debug("Set env[#{key.inspect}] = #{value.inspect}")
trigger_hooks(key, old_value, value)
value
end
alias add []=
def update(key, value)
old_value = @data[key]
@data[key] = value
@logger.debug("Set env[#{key.inspect}] = #{value.inspect}")
trigger_hooks(key, old_value, value)
value
end
current_indentation_level = line[/^#\s*/]&.size || 0 | ||
if current_indentation_level >= @base_indentation_level + 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work when the first line is # ...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly this indentation-related implementation is not perfect and will fail for many edge cases. I'll definitely add test cases for this.
} | ||
def initialize(comment) | ||
@comment = comment | ||
@parsed_comment = ParsedComment.new([], nil, [], []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that we can use a local variable not an instance variable for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, thank you for pointing out.
desc = matchdata[:desc1] || matchdata[:desc2] | ||
parsed_type = TypeParser.parse(type) | ||
@parsed_comment[:param] << ParamData.new(type: parsed_type, name: name, desc: desc) | ||
@mode = :param |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storing previous_tag here instead of storing mode = :param
, appending part will be simple
tag = ParamData.new(type: parsed_type, name: name, desc: desc)
@parsed_comment[:param] << tag
previous_tag = tag
# Append to the previous tag
- data = @mode == :param || @mode == :raise ? @parsed_comment[@mode].last : @parsed_comment[@mode]
- data.append_desc(line)
+ previous_tag.append_desc(line)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, I'll take this!
type_name: /#\w+|((::)?\w+)+/, | ||
literal: /(?: | ||
'(?:\\'|[^'])*' | | ||
"(?:\\"|[^"])*" | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some regexps are wrong.
"\"
(escaped, quote is not closed) matches to this string literal regexp.
Integer and Float literal(integer part) matches to type_name: /#\w+|((::)?\w+)+/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I didn't notice this. I tested this code with some variations of type and it seems working. Tests are missing and we need to add them to ensure this regex working correctly since it's quite complex.
# Without calling this, plugins won't work | ||
|
||
def self.activate_with(rdoc = ::RDoc::RDoc.current) | ||
@@rdoc = rdoc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using class variable for this means any later plugin calls activate_with
with its own RDoc
will apply that to all other plugins. I don't think this is what we want?
class Base
@@name = "Base"
def name
@@name
end
def self.update_name(name)
@@name = name
end
end
class Foo < Base
end
class Bar < Base
end
Foo.update_name("Foo")
f = Foo.new
Bar.update_name("Bar")
b = Bar.new
puts f.name # => "Bar"
puts b.name # => "Bar"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it's safe to use a class variable here since we don't have to update rdoc
object.
However, some information such as a name of the plugin, should be stored as a class instance variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If we don't expect plugins to call this method, then I don't think it should be in the base class as it is.
- It doesn't look like the yard plugin needs this attribute yet? If that's the case, let's not introduce this until we have a real use case to help us define the API better.
- If we want to pass RDoc's states down to plugins, let's avoid passing the
RDoc
object unless we have a solid use case for it. There's a lot of unnecessary coupling between RDoc's major components and I have been untangling for a while (example). ExposingRDoc
directly will potentially make it harder for such work as all of its attributes & methods will be considered public APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I didn't notice that it's not used at all now. I used it at first, but store
has enough information for the plugin so I removed it.
Then we can remove this class variable. And I agree, we should avoid RDoc::RDoc
object (thank you for untangling it!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'd expect we expose Options
and Store
objects first if we need to share states with plugins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it turned out that we do use rdoc
object here because we need some global data store for event handling.
I still agree that we should avoid RDoc::RDoc
instance, but then where should we put these event registration logic? Maybe we can introduce something like PluginManager
and pass it to plugins?
This PR introduces the event-based plugin system along with a YARD-parsing plugin as an example of concrete plugins.
It demonstrates how we can build practical plugins while keeping the core code small. In this case, YARD-related logic is encapsulated in a plugin and used only when users specify with
--plugins
option.This YARD plugin successfully parses https://github.com/okuramasafumi/alba comments written with YARD, but doesn't print YARD-specific information such as types since current RDoc doesn't support type information.
This PR is completely proof of concept, and is not meant to be merged now.
I'd like to discuss if this is the right way to extend RDoc for developers, and even for core team, and improve maintainability.
The original discussion is #1257