
Assignment 2.1: Build Automation Tool #60

Closed
wants to merge 51 commits into from

Conversation

ankurdubey521

So midway through I decided I wanted to do this assignment in Python, mainly because:

  1. my skills in Python are quite rusty and based on a lot of googling
  2. I was having quite a lot of trouble setting up execSync in Node

I therefore request you to be stricter with your analysis of the code.

@@ -0,0 +1,15 @@
<?xml version="1.0" encoding="UTF-8"?>
Owner:

All these XML files should not be part of your PR. Add them to your .gitignore, since they are relevant only to you.


class TestJsonParser(unittest.TestCase):
    def test_sample_json_parser_correctly(self):
        json_path = 'json/test1.json'
Owner:

The input to your BuildConfig class should be a directory, not a specific file.
The file name "build.json" is fixed, so "test1.json" violates that pattern.

You can add the "build.json" suffix in the BuildConfig class, instead of relying on the caller to add it.

config = BuildConfig(json_path)
self.assertEqual(config.command_names(), ['clean', 'test'])
self.assertEqual(config.deps('clean'), ["algorithms/clean"])
self.assertEqual(config.command('clean'), "rm -f test.o && rm -f test.exe")
Owner:

Interesting, these utility methods can definitely be useful, but I'd recommend better naming here.
In this case, adding a "get_" prefix to all of them will make it clear that these are not "set_" or "has_" or "check_" or whatever else methods.


class TestBuilder(unittest.TestCase):
    def test_compilation(self):
        path = '/home/ankurdubey/Code/Github/project-omega/solutions' \
Owner:

This path is non-transferable. If I downloaded this code onto my machine, this test would fail, which is bad.

# Parse JSON
json_file = open(json_path)
self._raw_json = json.load(json_file)
json_file.close()
Owner:

You should use something like:

with open(json_path) as file_handle:
  _ = json.load(file_handle)


# Parse Dependencies First
try:
    deps = config.deps(command)
Owner:

What if I wanted to run "build algorithms/sort_bubble"?
Since that is not in the current directory, this code would not find it (even though you have the relative path).

    dep_containing_folder_path, dep_command = dep.rsplit('/', 1)
    dep_containing_folder_path = containing_folder_path + "/" + dep_containing_folder_path
    execute(dep_command, dep_containing_folder_path)
except ValueError:
Owner:

ValueError can be generated by so many things that I have no idea when this will happen. Maybe you want to use a more specific exception?
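One possible shape for that, sketched with a hypothetical exception name (`NoDependenciesError`) and a toy config class standing in for the PR's real one:

```python
class NoDependenciesError(Exception):
    """Raised when a build rule declares no "deps" entry (hypothetical name)."""


class Config:
    """Toy stand-in for the PR's BuildConfig, for illustration only."""

    def __init__(self, rules):
        self._rules = rules

    def deps(self, command):
        if "deps" not in self._rules[command]:
            raise NoDependenciesError(command)
        return self._rules[command]["deps"]


def gather_deps(config, command):
    try:
        return config.deps(command)
    except NoDependenciesError:
        # Unlike a bare ValueError, this cannot mask unrelated failures.
        return []
```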


def test_nonzero_exit_code(self):
    command = "exit 1"
    exit_code = run(command, print_command=False, cwd='/')
Owner:

print_command=False

This should be a default argument value.

# Command in Child Folder
dep_containing_folder_path, dep_command = dep.rsplit('/', 1)
dep_containing_folder_path = containing_folder_path + "/" + dep_containing_folder_path
execute(dep_command, dep_containing_folder_path)
Owner:

You need to add checks for circular dependencies.
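A standard way to detect circular dependencies is a depth-first search that flags back edges; a sketch (the PR's real graph keys are (path, rule) tuples, but the algorithm is the same):

```python
def find_cycle(dependency_graph):
    """Return True if the rule graph contains a circular dependency.

    dependency_graph maps a rule name to the names it depends on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current DFS path / done

    color = {node: WHITE for node in dependency_graph}

    def visit(node):
        color[node] = GREY
        for dep in dependency_graph.get(node, []):
            if color.get(dep, WHITE) == GREY:  # back edge: cycle found
                return True
            if color.get(dep, WHITE) == WHITE and visit(dep):
                return True
        color[node] = BLACK
        return False

    return any(visit(node) for node in dependency_graph if color[node] == WHITE)
```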

@kaustubh-karkare (Owner) left a comment:

Nice!

files = None
if 'files' in entry:
    files = entry['files']
return Command(name=name, command_string=command_string, deps=deps, files=files)
Owner:

Instead of creating a Command object on demand, why not do all this in your init method, so that it only needs to be done once?

self._name = name
self._command_string = command_string
if files is None:
    self._hasFiles = False
Owner:

Do you really need a separate variable for _hasFiles? Why not just check whether _files is None? In languages like C++, where a list variable cannot be null, this makes more sense, but I'm not sure it is needed in Python.

def get_dependencies(self):
    if self._hasDependencies:
        return self._dependencies
    raise Command.NoDependenciesException("No Dependencies for {}".format(self._name))
Owner:

I don't even think you need to raise exceptions like this. I.e., if deps aren't specified, the default meaning is that there are no deps, so you can just return an empty list.
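A sketch of the empty-list default (the class name `BuildRule` follows the renaming suggested elsewhere in this review; the details are assumptions):

```python
class BuildRule:
    """Sketch: absent deps simply mean "no dependencies"."""

    def __init__(self, name, command_string, deps=None):
        self._name = name
        self._command_string = command_string
        # No exception machinery needed; normalize once at construction time.
        self._deps = deps if deps is not None else []

    def get_dependencies(self):
        # An empty list is the natural default, so callers need no try/except.
        return self._deps
```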

Owner:

^ What do you think of this?

try:
    deps = command.get_dependencies()
    if not dry_run:
        print("Executing {} dependencies for {} in {}...".format(len(deps), command_name, command_dir_abs))
Owner:

This doesn't look too useful, since it doesn't include dependencies.
Can we make the dry_run mode print out all the rules/commands it would have executed?
Output like:

> build run --dry-run
[algorithms/sort_bubble] g++ -c sort_bubble.cpp
[algorithms/sort_merge] g++ -c sort_merge.cpp
[algorithms/sort_quick] g++ -c sort_quick.cpp
[test] g++ -std=c++11 -c test.cpp
[run] g++ algorithms/sort_bubble.o algorithms/sort_merge.o algorithms/sort_quick.o test.o -o test.exe && ./test.exe

import subprocess


def run(command_string, cwd, print_command=False):
Owner:

I'd just make this a method in your Builder class below.

class TestJsonParser(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
Owner:

Alternatively,

def setUp(self):
  self.config = BuildConfig('json')

avoids dealing with class level stuff.


def test_compilation_basic(self):
    builder = Builder()
    path = os.getcwd() + '/test_builder_files/test_compilation_basic'
Owner:

This assumes that you're in a very specific directory.
Maybe something like https://stackoverflow.com/questions/5137497/find-current-directory-and-files-directory makes it possible to run a test from any location?
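The linked approach boils down to resolving fixtures relative to the test file itself; a sketch (the fixture directory name is taken from the PR, the rest is an assumption):

```python
import os

# Resolve fixtures relative to this file rather than the current working
# directory, so the test passes no matter where it is launched from.
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
fixture_path = os.path.join(TEST_DIR, 'test_builder_files', 'test_compilation_basic')
```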

Owner:

Interesting, you addressed this concern, but in a way that I didn't expect.

path = os.getcwd() + '/test_builder_files/test_commands_referenced_from_root'
builder._build_rule_handler('run', path, path, dry_run=True)
output_file_path = path + '/output'
self.assertEqual(False, os.path.isfile(output_file_path))
Owner:

self.assertFalse(

@kaustubh-karkare (Owner) left a comment:

Excellent progress!

Now for the final requirement: Use multithreading to execute dependency build rules in parallel when possible. And demonstrate the performance benefit via a test, possibly using sleep commands in the command string.
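One way to demonstrate the benefit, sketched with `ThreadPoolExecutor` and a sleep standing in for a real shell command (this is not the PR's actual implementation):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_rule(rule_name):
    time.sleep(0.2)  # stand-in for executing a real build command
    return rule_name

independent_deps = ["a", "b", "c", "d"]

# Serial execution: four 0.2s "builds" take roughly 0.8s.
start = time.monotonic()
for rule_name in independent_deps:
    run_rule(rule_name)
serial_time = time.monotonic() - start

# Parallel execution: the same four builds overlap, taking roughly 0.2s.
start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_rule, independent_deps))
parallel_time = time.monotonic() - start
```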

@@ -0,0 +1,2 @@
/venv/
.idea
Owner:

btw, it is possible to add .gitignore here to exclude this file too.


class Command:
    """Stores Command Information"""
    def __init__(self, *, name: str, command_string: str, deps: List[str] = None, files: List[str] = None) -> None:
Owner:

Given the non-intuitive distinction between Command and command_string, call the class BuildRule?

def get_dependencies(self):
    if self._hasDependencies:
        return self._dependencies
    raise Command.NoDependenciesException("No Dependencies for {}".format(self._name))
Owner:

^ What do you think of this?

if command_dir_abs.endswith('/'):
    command_dir_abs = command_dir_abs[:-1]
if self._root_dir_abs.endswith('/'):
    self._root_dir_abs = self._root_dir_abs[:-1]
Owner:

I usually prefer to do this validation once at the beginning, instead of during every call to this method.

def _get_file_edit_times(self, file_list: List[str]) -> Dict[str, int]:
    """Populate a Dict of last edit times of files and return it"""
    file_edit_times = {}
    for file in file_list:
Owner:

Slightly more descriptive variable name:

for file_path in file_list:

output_file = file_path + "output.txt"
with open(input_file, 'w') as file:
    file.write("Hello World 1.0")
with open(output_file, 'w') as file:
Owner:

s/file/file_handle/


# Activate Watcher and change tracked file
file_watcher = FileWatcher()
process = Process(target=file_watcher.watch_and_execute, args=([input_file], copy_file, input_file, output_file))
Owner:

Interesting, you're using a separate process here.



class Builder:
    def __init__(self):
Owner:

The way you're using this, the constructor must be followed by a call to execute_build_rule. That kind of expectation is unexpected, and while in theory, you could explain it in documentation, there is a better way:

Make execute_build_rule a @classmethod, which creates the object, and then calls _build_rule_handler too. That way, you only need to call one method from outside this file.
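A sketch of that factory pattern (the method bodies here are placeholders, not the PR's real logic):

```python
class Builder:
    def __init__(self):
        # Internal state only; callers never need to hold a Builder directly.
        self._executed_rules = []

    def _build_rule_handler(self, rule_name):
        # Placeholder for the real recursive dependency handling.
        self._executed_rules.append(rule_name)

    @classmethod
    def execute_build_rule(cls, rule_name):
        """Single public entry point: construct the object, then build."""
        builder = cls()
        builder._build_rule_handler(rule_name)
        return builder
```

Callers now invoke `Builder.execute_build_rule(...)` and never see the two-step construct-then-build dance.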

def watch_and_execute(self, file_list: List[str], function: Callable[[any], any], *args: any) -> None:
    """Execute a function whenever a file from file_list changes"""
    print("Listening for changes on {}...".format(file_list))
    file_edit_times = self._get_file_edit_times(file_list)
Owner:

You are making an assumption here that the file list is fixed, which IMO we can avoid.
I.e., if you re-explore the build-rule graph each time, you can detect and appropriately handle changes to the build.json files too. Yes, it is costlier (so I guess you can make it configurable), but it might be more useful too, right?

@kaustubh-karkare (Owner) left a comment:

Cool, I'm pretty happy with the overall state of things, we're almost done.

My comments are now primarily to clean up inconsistencies and improve readability (stuff that was affecting the overall score), plus a couple of minor extensions for learning purposes.

Btw, I tend to unashamedly ask for naming changes that span multiple files, because if you have good tests, even large refactors should be cheap and safe.


# Cleanup
with open(input_file, 'w') as file_handle:
    file_handle.write("Hello World 1.0")
Owner:

Why do you have to reset this?

dependency_graph[node_tuple[0]] = []
for item in parallel_builder._dependency_graph[node_tuple]:
    dependency_graph[node_tuple[0]].append(item[0])
pprint.pprint(dependency_graph, indent=4)
Owner:

Fine for debugging, but you wouldn't want this if you're running thousands of tests in parallel, which is what happens in large codebases. All the outputs will interfere with each other, and most people running the tests won't even understand the output.

parallel_builder = ParallelBuilder(path, MAX_THREAD_COUNT)
process = Process(target=parallel_builder.execute,
                  args=('Z', path))
sleep(16)
Owner:

This is the only example that I checked the file contents for, because without that, I couldn't see where the number 16 came from.

@@ -0,0 +1,84 @@
"""" Sample build.json
Owner:

This is a fairly detailed and specific example. Better suited for a readme file?

toposort.append(front)
if front in graph_adj_list:
    for dest_node in graph_adj_list[front]:
        if not visited[dest_node]:
Owner:

I was thinking about this line for a while ... trying to figure out when it will be false. Can you help me out?

Comment on lines 9 to 10
logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())
Owner:

Instead of creating separate logger objects for each file, I think you should be passing around a single one (provided to each class as part of its constructor). That way, you only need to configure it once from the "application section" of the code.

The current setup is also inflexible: e.g., if I'm running multiple instances of your ParallelBuilder class in parallel for whatever reason, and I want their output to go to different destinations, that isn't possible.
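A sketch of the injected-logger idea (the constructor signature here is hypothetical):

```python
import logging


class ParallelBuilder:
    def __init__(self, root_dir_abs, max_threads, logger=None):
        # The application configures one logger and passes it in; two builders
        # running side by side can then log to different destinations.
        self._logger = logger or logging.getLogger(__name__)
        self._root_dir_abs = root_dir_abs
        self._max_threads = max_threads

    def execute(self, rule_name, command_dir_abs):
        self._logger.info("building %s in %s", rule_name, command_dir_abs)
```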

Comment on lines 64 to 65
if self._root_dir_abs.endswith('/'):
    self._root_dir_abs = self._root_dir_abs[:-1]
Owner:

Can we move this to the constructor now?


# Parse build.json in current directory
config = BuildConfig.load_from_build_directory(command_dir_abs)
command = config.get_build_rule(command_name)
Owner:

Based on the usage, s/command_name/build_rule_name/

Comment on lines 93 to 101
# Update _dependency_graph
if dependency_tuple not in self._dependency_graph:
    self._dependency_graph[dependency_tuple] = []
self._dependency_graph[dependency_tuple].append(dependent_tuple)

# Update _dependency_list
if dependent_tuple not in self._dependency_list:
    self._dependency_list[dependent_tuple] = []
self._dependency_list[dependent_tuple].append(dependency_tuple)
Owner:

These names, "_dependency_graph" and "_dependency_list", are not intuitive: there's no way I'd figure out what is contained in them based purely on the names, and it is very easy to confuse the two.

Something like "_rule_name_to_dependency_rule_names" and "_rule_name_to_dependent_rule_names" makes it easier to figure out, although "rule_name" isn't quite right since you're using a tuple. I couldn't think of a better name while reviewing, but you get the point.

parallel_builder = ParallelBuilder(path, MAX_THREAD_COUNT)
parallel_builder.execute('Z', path)
toposort = [item[0] for item in parallel_builder._topologically_sorted_build_rule_names]
print(toposort)
Owner:

Once you're done testing, remove print statements :)

Comment on lines +53 to +54
def get_last_build_pass_status(self) -> bool:
    return self._last_build_passed
Owner:

Is this primarily for testing?

Author:

yes

Comment on lines 585 to 588
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test")
    write_test_files(self._testMethodName, path)
    parallel_builder = ParallelBuilder(path, MAX_THREAD_COUNT)
Owner:

Looks like these 4 lines are common across all your tests. How about using the setUp and tearDown methods to avoid the duplication?
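A sketch of that shared fixture (`MAX_THREAD_COUNT` is assumed to match the PR's constant; `write_test_files` and the builder construction from the PR are referenced only in a comment here):

```python
import os
import tempfile
import unittest

MAX_THREAD_COUNT = 4  # assumption: mirrors the constant used in the PR


class ParallelBuilderTestCase(unittest.TestCase):
    """Shared fixture: every test gets a fresh temp directory."""

    def setUp(self):
        self._tmpdir = tempfile.TemporaryDirectory()
        self.path = os.path.join(self._tmpdir.name, "test")
        # write_test_files(self._testMethodName, self.path) and the
        # ParallelBuilder(path, MAX_THREAD_COUNT) call would go here.

    def tearDown(self):
        self._tmpdir.cleanup()

    def test_fixture_is_isolated(self):
        self.assertTrue(os.path.isdir(self._tmpdir.name))
        self.assertFalse(os.path.exists(self.path))
```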

result = file_handle.readable()
self.assertEqual(True, result)
# CLEANUP
parallel_builder = ParallelBuilder(path, MAX_THREAD_COUNT)
Owner:

You cant reuse the existing ParallelBuilder object?

Author:

fixed

path = os.path.join(tmpdir, "test")
write_test_files(self._testMethodName, path)
parallel_builder = ParallelBuilder(path, MAX_THREAD_COUNT)
parallel_builder._explore_and_build_dependency_graph('Z', path)
Owner:

Alternative optional design suggestion: instead of having to call this manually, how about doing it from the ParallelBuilder constructor? Instead of exploring dependencies for a single rule, it creates the graph for everything under that directory (not specific to any rule, because that isn't actually needed), and then you can choose to execute a single rule (for which the information is now already cached).

@kaustubh-karkare (Owner) left a comment:

Cool, this is sufficiently close to the end goal I had in mind. As always, there is more that can be done, but I'm going to approve, since I understand that it can be boring to keep working on the same problem for so long.
