Metadata | |
---|---|
cEP | 14 |
Version | 1.0 |
Title | Creating .coafile sections using quickstart |
Authors | Satwik Kansal mailto:[email protected] |
Status | Active |
Type | Process |
This cEP proposes a framework for coala-quickstart to generate more relevant
.coafile
sections based on project files and user preferences.
- Find all the files from the project directory, check
.gitignore
for files to ignore. - Get the languages used in the project and their percentage from file extensions.
- Filter bears based on the languages used in the project and curated sets of important bears.
- Generate a
.coafile
by creating a section for each language, adding filtered bears corresponding to the language, and prompting the user for non-optional settings.
This first objective involves modifying the existing filter_relevant_bears
method in the Bears module in coala-quickstart to select and filter bears based
on:
- Languages detected in the project.
- Linters already used by the project.
- The current
IMPORTANT_BEARS_LIST
inConstants.py
- Information provided by users regarding the capabilities (
CAN_DETECT
,CAN_FIX
labels) - Installed bear dependencies
The second objective involves filling in the setting values for .coafile
sections.
Based on the information they may provide, the target files of InfoExtractor
classes can be categorized into following two categories:
- Dependency files (Examples:
package.json
,requirements.txt
andGemfile
) - Setting-values file (Examples:
.editorconfig
,.gitignore
and.csslintrc
)
Some files might belong both the categories (like Gruntfile.js
and
Gulpfile.js
).
The dependency files can be used to prompt users to activate certain Bears if
all the dependencies in Bear's REQUIREMENT
field are present in the dependency
files. The setting-values files, on the other hand, are used to fill in the
settings of the Bears and to detect conflicting configurations in the project.
Both categories of files, taken together, can complement each other to provide a scenario where "A dependency file finds BearA that can be activated, and the settings of BearA are derived by the information extracted from setting-values files" thereby automating the complete section generation process in some cases.
This section explains the meta-data that is required to be collected for filtering bears and generating sections. The data to be collected at runtime is as follows:
- Inverse Mappings of requirement types to the bears.
REQUIREMENTS_META = {
“requirement_name” : {
“requirement_type” : NpmRequirement,
"version" : ">=0.4.2"
“bears” : [ list, of, bears, wrapping, the, executable],
}
}
This will be used to match against the ProjectDependencyInfo
extracted from
the project. In the case of a match, the corresponding bears may be suggested to
the user.
- Mapping of language and the corresponding Bears available in coala-bears.
{
"language_1": ["Bear1", "Bear2"],
"language_2": ["Bear3", "Bear4"]
}
This will be generated at the runtime and will be used to filter bears based on languages used in the project.
- Inverse Mappings of the
CAN_DETECT
andCAN_FIX
capabilities metadata provided in the bears to the name of the bears. Example:
CAPABILITIES_MAP = {
“capability_name” : {
"lanaguage_name": {
"DETECT": [bears, that, can, detect, the, capability, for, the, language]
"FIX": [bears, that, can, fix, the, capability, for, the, language]
},
},
}
The users will be prompted to pick the desired capabilities in the beginning via
the CLI, relevant bears are selected (with non-overlapping capabilities) and
become part of the final generated .coafile
. In future, these labels can be
replaced or complemented by DETECT_ASPECTS
and FIX_ASPECTS
to enable users
to provide their choices for "aspects" while selecting the bears.
- Unified represenation of information extracted from
InfoExtractors
for project files like.editorconfig
,Gruntfile.js
,package.json
,.csslintrc
, etc (see this issue and cEP-0009 for more details)
{
"info_name": [InfoInstance1, InfoInstance2,]
}
The Info
instances encapsulate the extracted information from project files.
The various kinds of Info
classes will be utilized on case-by-case basis.
Example, IgnorePathsInfo
may be used to exclude files from analysis right from
the beginning, ProjectDependencyInfo
might be used to match against the
REQUIREMENTS_META
discussed in point 1, StyleInfo
may provide values to fill
inside the generated .coafile
sections.
- Mechanism to extract all the non-optional settings of a Bear at the runtime.
This section discusses the step-by-step procedure of selecting relevant bears
and then creating .coafile
sections from them.
-
Identify all the languages used in the project.
-
Ask users to select capabilities based on possible
CAN_DETECT
andCAN_FIX
label values. Store them in a listdesired_capabilities
like:
desired_capabilities = ["capability1", "capabilty2"]
The "FIX" capabilites are stricter than "DETECT" capabilities. If a Bear has a
capability to fix some aspect, it is obvious that it has capability to detect
it. This fact will be taken into account while filtering bears by capabilities
at a later step. The desired_capabilities
list may also be pre-populated with
some important capabilities (on similar lines as the existing
IMPORTANT_BEARS_LIST)
- Generate a dictionary named
candidate_bears
of the form
candidate_bears = {
"detected_language_1": {"BearA", "BearB"},
"detected_language_2": {"BearB", "BearD"},
"all": {"BearE", "BearF", "BearG"}
}
It is possible to have overlap among the lists in candidate_bears
which will
be resolved at later stage.
-
Initialize empty dictionary named
to_propose_bears
. -
Initialize a dictionary named
selected_bears
toIMPORTANT_BEARS_LIST
. -
Given the metadata mentioned in the previous section is available, filter the
candidate_bears
lists as:- Remove those bears from
candidate_bears
which do not contain any of the capability indesired_capabilites
set. - Among the remaining bears in
candidate_bears
, if someProjectDependenyInfo
extracted from the project matches theREQUIREMENTS
field of bear (matching might also involve checking the compatibility of semvers), move the tuple(Info, Bear)
toto_propose_bears
.
- Remove those bears from
-
For every bear in
to_propose_bears
:- If there's enough extracted information from the project to fill in all
the non-optional settings of the Bear, move the Bear to
selected_bears
list. - Else, prompt the user on CLI with the evidence (the source of
Info
instance in the tuple) to enable the bear. If the user agrees, move the bear toselected_bears
, otherwise discard the bear.
- If there's enough extracted information from the project to fill in all
the non-optional settings of the Bear, move the Bear to
-
For each of the remaining bears in
candidate_bears
, eliminate bears having capabilities already satisfied inselected_bears
. -
Among the remaining bears in
candidate_bears
, in the case of conflicts in capabilities among the bears in the list:- Eliminate bears having no unique capabilities among the other bears present in the list.
- Give preference to the bears already having dependencies installed (using
the
check_prerequisites
field of the bears) and eliminate the respective conflicting bears. - Add the remaining bears to
selected_bears
list.
-
For every non-optional setting of each bear
selected_bears
object,- If the setting's value exists in extracted information, do nothing.
- Else,
- If the quickstart is being executed in interactive mode, prompt the user for the setting value.
- If non-interactive mode, discard the bear.
-
Finally, create sections from the
selected_bears
list using the collected information values from user inputs and project files.
In future, more constraints can be added to further narrow down the list of
selected bears by giving users more flexibility in terms of options like
max_bears
, max_bears_per_language
, exclude_languages
, etc.
To automatically fill in some of the setting values in .coafile
sections, the
Info
classes need to be mapped to settings values of the Bears. These scope of
applicability of these Info
instances may vary from "bear-specific", to
"section" to "global". The scope for a mapping may be defined based on the:
- Source of the
Info
instance - Value of the
Info
instance - Class of the
Info
instance - Any other attribute encapsulated in the
Info
instance
The scope of the information can be represented by the InfoScope
class as
follows:
class InfoScope:
def __init__(self,
level,
sections=[],
bears=[],
allowed_sources=[],
allowed_extractors=[]):
"""
A class representing scope of applicability of an
``Info`` instance.
:param level: Broad-level scope, possible values are
"global", "section", and "bear".
:param sections: list of section names, considered only for
the ``level`` "section" and "bear".
:param bears: list of bear names, considered only for the
"bear" ``level``.
:param allowed_sources: list containing names of the sources of
``Info`` classes which fall within this
scope, empty list will means the ``Info``
instance is applicable for all the sources.
:param allowed_extractors: list of allowed ``InfoExtractor`` derived
classes for the scope.
"""
self.level = level
if level=="sections":
self.sections = sections
elif level=="bears":
self.sections = sections
self.bears = bears
self.allowed_sources = allowed_sources
self.allowed_extractors = allowed_ extractors
def check_belongs_to_scope(self,
section_name,
bear_name):
"""
Checks if the given section_name and bear_name
belong to the InfoScope or not.
"""
if self.level=="global":
return True
elif self.level=="sections":
if section_name in self.sections:
return True
elif self.level=="sections":
if (section_name in self.sections and
bear_name in self.bears):
return True
return False
def check_is_applicable_information(self, info):
"""
Checks if the given ``Info`` instance contains
information applicable to the InfoScope of not.
"""
if (info.source in self.allowed_sources or
is_instance(info.extractor, self.allowed_extractors)):
return True
return False
The InfoScope
instances serve following two purposes while trying to fill an
extracted information value to a setting based on mapping:
- Check if the mapping if applicable for a given bear and a given section.
- Check if the
Info
instance is valid based on its extraction source and the extractor class.
The mappings will be defined similar to the following way:
INFO_SETTING_MAPS = {
"setting_key_1": [
{
"scope": InfoScope(level="global"),
"info_kind": SomeInfoClass,
"mapper_function": function_to_map_value_in_info_to_setting_value
},
{
"scope": InfoScope(level="bear",
bears=["SomeBear"]),
"info_kind": AnotherInfoClass,
"mapper_function": function_to_map_value_in_info_to_setting_value
},
],
"setting_key_2": [
{
"scope": InfoScope(level="sections",
sections=["python"]),
"info_kind": InfoClassA,
"mapper_function": function_to_map_value_in_info_to_setting_value
},
],
"ignore": [
{
"scope": InfoScope(level="global"),
"info_kind": IgnorePathsInfo,
"mapper_function": lambda x: x if is_valid_glob(x)
}
]
}
The mappings will be stored in a module named InfoMappings.py
in
coala-quickstart inside the info_extraction
package.
An attempt to autofill the setting's value can be made before prompting the user to fill the values. This can be done as follows:
def autofill_value_if_possible(self,
setting_key,
section,
bear,
extracted_information):
"""
For the given setting configurations, checks if there is a possiblity of filling it's value from the extracted information, and returns the values if they are applicable.
"""
if INFO_SETTING_MAPS.get(setting_name):
for mapping in INFO_SETTING_MAPS[setting_key]:
scope = mapping["scope"]
if (scope.check_belongs_to_scope(
section, bear)):
# look for the values in extracted information
# from all the ``InfoExtractor`` instances.
values = extracted_information.get(mapping["info_kind"])
for val in values:
if scope.check_is_applicable_information(val):
yield val
return None
Another interesting application of the unified representation of extracted-information is detecting the inconsistencies among different project configurations. The intra-scope conflicts due to different Info from different sources can be brought up to users consideration. The inter-scope conflicts can be fixed either based on level-priority (global < section < bear) or by prompting users to select their preferred value.
The inconsistencies can be detected as
to_fill_values = list(autofill_value_if_possible(some_setting,
some_section,
some_bear))
if len(to_fill_values) > 1:
# warn the user about inconsistency.
The inconsistencies need to be resolved before using the Info
values to fill
in the settings of .coafile
sections. As a simple solution, user can be
prompted like
> We found conflicting options in your file_a and your file_b, which should assume dominance?
1. file_a
2. file_b
3. override both and specify your value
After the bears are filtered, and relevant information has been extracted out of the projects, the users are notified, a more tailored version of coafile is written. The per-language sections are generated like before, but with some additional bears included based on relevant information discovered from project and user's preference (CLI input). The users are also prompted for the values like before, but an attempt is made to autofill these values before prompting the users. If there are multiple possible values to autofill, such inconsistencies are brought to user's consideration allowing him to pick any one or them or override all of them.
- Robust mode (activate all the bears)
- Non-interactive mode (choose the best
.coafile
without user interference)- With default capabilities. (default option for non-interactive modes)
- Without any predefined capabilities.
- Interactive mode (prompt for setting values)
- With user provided capabilities. (default option for interactive mode)
- With default capabilities.
- Without capabilities.
There will be a filter_by_capabilties
option which will be prompted to users
in normal mode. In non-interactive mode, a command line flag
--no-default-capabilities
can be passed to disable filtering by
default_capabilities. In case the filter_by_capabilties
option is disabled,
steps 8,9 and 10 in "Filtering and selecting bears from collected meta-data"
section will be skipped.
A possible set of deault capabilties can be:
- Syntax
- Formatting
- Documentation
- Spelling
- Code Simplification
- Smell
- Redundancy
- The user runs coala-quickstart and specifies project directories.
- Languages are identified in the project.
- All the
InfoExtractor
classes are instantiated, and the extracted information is aggregated and stored. - The user is asked if he wants to select bears based on capabilities.
- The filtering and selection of bears take place as described in sections above.
- After the bears are selected for every language present in the project, coala-quickstart tries automatically to fill the setting values for the bears, failing which, the user is prompted for the value. In case some inconsistency is encountered here, it is brought to user's consideration.
- The per-language sections are generated, and final
.coafile
is written.
- The user runs coala-quickstart and specifies project directories.
- Languages are identified in the project.
- All the
InfoExtractor
classes are instantiated, and the extracted information is aggregated and stored. - The filtering and selection of bears take place as described in sections above.
- After the bears are selected for every language present in the project, coala-quickstart tries automatically to fill the setting values for the bears, failing which, the bear is discarded. In case some inconsistency is encountered here, an attempt is made to resolve it based on scope, failing which, the bear may be discarded, or any one of the possible value is filled in with a warning comment.
- The per-language sections are generated, and final
.coafile
is written.
from coala_quickstart.info_extractors.GruntfileInfoExtractor import (
GruntfileInfoExtractor)
from coala_quickstart.info_extractors.EditorconfigInfoExtractor import (
EditorconfigInfoExtractor)
from coala_quickstart.info_extractors.PackageJSONInfoExtractor import (
PackageJSONInfoExtractor)
from coala_quickstart.info_extractors.PackageJSONInfoExtractor import (
GemfileInfoExtractor)
def collect_info(project_dir):
gruntfile_info = GruntfileInfoExtractor(
["Gruntfile.js"], project_dir).extract_information()
editorconfig_info = EditorconfigInfoExtractor(
[".editorconfig"], project_dir).extract_information()
package_json_info = PackageJSONInfoExtractor(
["package.json"], project_dir).extract_information()
gemfile_info = GemfileInfoExtractor(
["Gemfile"], project_dir).extract_information()
extracted_info = aggregate_info(
gruntfile_info, editorconfig_info, package_json_info, gemfile_info)
return extracted_info
def aggregate_info(infoextractors):
"""
Aggregates inforamtion extracted from multiple ``InfoExtractor``
instances to one dictionary.
"""
result = {}
for ie in infoextractors:
# the fname key level can be removed from the current implementation
# of the way information is stored as file name is already stored
# in source attribute of ``Info`` classes.
for fname, extracted_info in ie.items():
for info_name, info_instances in extracted_info.items():
if result.get(info_name):
result[info_name] += info_instances
else:
result[info_name] = info_instances
return result
from collections import defaultdict
from pyprint.NullPrinter import NullPrinter
from coalib.settings.ConfigurationGathering import get_filtered_bears
from coalib.output.printers.LogPrinter import LogPrinter
from coalib.misc.DictUtilities import inverse_dicts
used_languages = ["languages", "used", "in", "project"]
def get_all_bears(languages):
"""
Collects all the bears corresponding to the provided languages and
returns them as a list.
:param languages: list of languages.
:return: list of all the bears that can be suitable for
the given list of languages.
"""
local_bears, global_bears = get_filtered_bears(
languages, LogPrinter(NullPrinter()), None)
all_bears = inverse_dicts(local_bears, global_bears).keys()
return list(all_bears)
def get_all_bears_by_lang(languages):
"""
Return a dict with language names as keys and the list of suitable
bears as values.
"""
all_bears_by_lang = {
lang: set(inverse_dicts(*get_filtered_bears(
[lang], LogPrinter(NullPrinter()), None)).keys())
for lang in languages
}
return all_bears_by_lang
def generate_capabilties_map(bears_by_lang):
"""
Generates a dictionary of capabilities, languages and the corresponding bears from the given ``bears_by_lang`` dict.
:param bears_by_lang: dict with language names as keys
and the list of bears as values.
:returns: dict of the form
{
"language": {
"detect": [list, of, bears]
"fix": [list, of, bears]
}
}
"""
def nested_dict():
return defaultdict(dict)
capabilities_meta = defaultdict(nested_dict)
# collectiong the capabilities meta-data
for lang, bears in bears_by_lang.items():
can_detect_meta = inverse_dicts(
*[{bear: list(bear.CAN_DETECT)} for bear in bears])
can_fix_meta = inverse_dicts(
*[{bear: list(bear.CAN_FIX)} for bear in bears])
for capability, bears in can_detect_meta.items():
capabilities_meta[capability][lang]["DETECT"] = bears
for capability, bears in can_fix_meta.items():
capabilities_meta[capability][lang]["FIX"] = bears
return capabilities_map
def generate_requirements_map(bears):
"""
For the given list of bears, returns a dict of the form
```
{
“requirement_name” : {
“requirement_type” : NpmRequirement,
"version" : ">=0.4.2"
“bears” : [ list, of, bears, wrapping, the, executable],
}
}
```
"""
requirements_meta = {}
for bear in bears:
for req in bear.REQUIREMENTS:
to_add = {
"bear": bear,
"version": req.version,
"type": req.type
}
if requirements_meta.get(req.package):
requirements_meta[req.package].append(to_add)
else:
requirements_meta[req.package] = [to_add]
return requirements_meta
import copy
from coalib.settings.ConfigurationGathering import get_filtered_bears
from coala_quickstart.Constants import IMPORTANT_BEAR_LIST
from coala_quickstart.InfoExtraction.InfoMappings import INFO_MAPPINGS
def filter_relevant_bears(used_languages,
desired_capabilities,
extracted_info,
arg_parser=None):
"""
Filter bears based on used languages in the project and
the desired bear capabilities.
"""
all_bears_by_lang = {
lang: set(inverse_dicts(*get_filtered_bears([lang],
log_printer,
arg_parser)).keys())
for lang in used_languages
}
selected_bears = {}
candidate_bears = copy.copy(all_bears_by_lang)
to_propose_bears = {}
# Initialize selected_bears with IMPORTANT_BEAR_LIST
for lang in candidate_bears:
if lang in IMPORTANT_BEAR_LIST:
selected_bears[lang] = [
bear for bear in IMPORTANT_BEAR_LIST[lang]
if bear.name in candidate_bears[lang]]
candidate_bears[lang] = [bear for bear in candidate_bears[lang]
if bear not in selected_bears[lang]]
# Filter bears by desired_capabilities
capable_candidates = {}
for lang, lang_bears in candidate_bears.items():
# Eliminate bears which doesn't contain the desired capabilites
capable_bears = get_bears_with_given_capabilities(
lang_bears, desired_capabilties)
capable_candidates[lang] = capable_bears
# Use project_dependency_info to shortlist bears to be proposed.
project_dependency_info = extracted_info["ProjectDependencyInfo"]
for lang, lang_bears in capable_candidates.items():
matching_dep_bears = get_bears_with_matching_dependencies(
lang_bears, project_dependency_info)
# Remove these bears from capable_candidates list
# and move to to_propose_bears list
to_propose_bears[lang] = set(matching_dep_bears)
bears_to_remove = [match[0] for match in matching_dep_bears]
capable_candidates[lang] = [bear for bear in capable_candidates[lang]
if bear not in bears_to_remove]
# Proposing users to select a bear based on some extracted information
for lang, lang_bears in to_propose_bears.items():
for bear, associated_info in lang_bears:
# get the non-optional settings of the bears
settings = get_non_optional_settings(bear)
if not settings:
# no non-optional setting, select it right away!
selected_bears[lang].add(bear)
else:
user_input_reqd = False
for setting in settings:
if not is_autofill_possible(
extracted_info, setting.key, lang, bear):
user_input_reqd = True
if user_input_reqd:
# Ask user to activate the bear
if (not arguments.non_interactive and
prompt_to_activate(bear, associated_info)):
# select the bear!
selected_bears[lang].add(bear)
else:
# All the non-optional settings can be filled automatically
selected_bears[lang].add(bear)
# capabilities satisfied till now
satisfied_capabilities = get_bear_capabilties(selected_bears)
remaining_capabilities = [cap for cap in desired_capabilities
if cap not in satisfied_capabilities]
capable_bears = get_bears_with_given_capabilities(
capable_bears, remaining_capabilties)
# optional, remove conflicting capabilties and
# find a satisfying combination containing minimum
# number of bears.
capable_bears = remove_bears_with_conflicting_capabilties(
capable_bears)
# Add the remaining bears to selected_bears
for lang, lang_bears in capable_bears.items():
selected_bears[lang] += lang_bears
# Put language independent bears into "All" category
selected_bears = {lang: selected_bears[lang] - all_bears_by_lang[lang],
for lang, _ in selected_bears.items()}
return selected_bears
def get_bears_with_given_capabilities(bears, capabilties):
"""
Returns a list of bears which contain at least one on the
capability in ``capabilities`` list.
"""
result = []
for bear in bears:
can_detect_caps = [c for c in list(bear.CAN_DETECT)]
can_fix_caps = [c for c in list(bear.CAN_FIX)]
for cap, cap_type in capabilities:
if cap in can_fix_caps:
result.append(bear)
elif cap in can_detect_caps and cap_type=="DETECT":
result.append(bear)
return result
def remove_bears_with_conflicting_capabilties(bears_by_lang):
"""
Eliminate bears having no unique capabilities among the other bears present in the list.
Gives preference to:
- The bears already having dependencies installed.
- Bears that can fix the capability rather that just detecting it.
"""
result = {}
for lang, bears in bears_by_lang.items():
lang_result = set()
capabilties_map = generate_capabilties_map({lang: bears})
for cap in capabilties.map.keys():
# bears that can fix the ``cap`` capabilitiy
fix_bears = capabilties_map[cap][lang]["fix"]
if fix_bears:
for bear in fix_bears:
if not bear.check_prerequisites():
# The dependecies for bear are already installed,
# so select it.
lang_result.add(bear)
break
# None of the bear has it's dependency installed, select
# a random bear.
lang_result.add(random.choice(fix_bears))
break
# There were no bears to fix the capability
detect_bears = capabilties_map[cap][lang]["detect"]
if detect_bears:
for bear in detect_bears:
if not bear.check_prerequisites():
lang_result.add(bear)
break
lang_result.add(random.choice(detect_bears))
break
result[lang] = lang_result
return result
def is_autofill_possible(self,
extracted_info,
setting_name,
language,
bear):
"""
Checks if it is possible to autofill the setting values.
"""
if INFO_SETTING_MAPS.get(setting_name):
for mapping in INFO_SETTING_MAPS[setting_key]:
scope = mapping["scope"]
if (scope.check_belongs_to_scope(
section=language, bear=bear)):
values = extracted_information.get(mapping["info_kind"])
for val in values:
if scope.check_is_applicable_information(val):
return True
return False
def get_non_optional_settings(bear):
"""
Return a list of non-optional settings for a given bear.
"""
# Already implemented in quickstart
pass
def get_bears_with_matching_dependencies(
bears, dependency_info):
"""
Matches the `REQUIREMENTS` filed of bears against a list
of ``ProjectDependencyInfo`` instances.
Return a list of the tuples of the form
(bear, ProjectDependencyInfoInstance)
"""
result = []
requirements_map = generate_requirements_map(bears)
for req, req_info in requirements_map.items():
for dep in dependency_info:
# Check if names of requirements match
if dep.value == req:
if (are_compatible_versions(dep.version, req_info["version"])
and are_compatible_req_types(dep.type, req_info["type"])):
# Everyhing is fine, it's a match!
result.append((req_info["bear"], dep))
return result
def are_compatible_versions(semver1, semver2):
"""
:returns:
True if semver1 is latest or matches semvers 2,
False otherwise
"""
pass
def are_compatible_req_types(req_type_1, req_type_2):
"""
Checks if req_type_1 is a type of req_type_2.
"""
return isinstance(req_type_1, req_type_2)
selected_bears = filter_relevant_bears(
used_languages, desired_capabilities, extracted_info, arg_parser=None)
for lang, lang_bears in selected_bears:
generate_section(lang, lang_bears, extracted_info)
def generate_section(section_name,
section_bears,
arguments,
extracted_info):
settings = get_non_optional_settings(section_bears)
for setting_name, associated_bears in settings:
to_fill_values = autofill_value_if_possible(
extracted_info,
section_name,
associated_bears)
setting_value = None
if len(to_fill_values) > 1:
# warn the user about inconsistency.
# ask user for correct value
elif len(to_fill_values)==1:
setting_value = to_fill_values[0]
else:
if arguments.non_interactive:
continue
setting_value = ask_for_setting_value(
setting, associated_bears)
# store setting values
# create sections finally