Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solution for Assignment2. #66

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

Abhilasha06
Copy link

No description provided.

Copy link
Owner

@kaustubh-karkare kaustubh-karkare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, good start!
Next step is to re-organize your code a bit to simplify the design, without any change in functionality, so that it becomes easy to add the next feature while respecting "modularity".

import threading
import time

class Action():
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have these classes called Action and ActionGraph, but I still don't quite understand the difference in responsibilities between them. Can you update the code so that:

  • Action represents a single parsing a single json object / running a single command. While writing that code, you cannot make any assumptions about there being multiple actions.
  • ActionGraph contains ALL actions across ALL directories, and deals with operations that span multiple actions. It uses the Action class to perform underlying operation.

You should be able to do things like:

action = Action(raw_data)
action.get_dependencies()
action.start_command()
# the above methods are only invoked from inside ActionGraph

graph = ActionGraph(root_dir)  # recursively explores root_dir to load all build.json files
and it creates a single map<name, Action>
graph.execute(action_name)  # execute dependencies and then execute specified action
# the above methods are invoked by your main section

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still pending. Based on the above, I was hoping to see something like the following in your tests:

graph = ActionGraph("wherever/code")
# the above creates all the Action objects internally, so the user never deal with them directly

graph.execute("build")
graph.execute("test_all")


thread = threading.Thread(target = self.process, args = (command, work_dir))
thread_list.append(thread)
time.sleep(2)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you have to sleep?

Comment on lines 10 to 14
def test_build_test_all(self):

BAT_obj = BAT.Action()

BAT_obj.get_command(os.getcwd(), 'build', 'test_all')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I run 2 copies of this test in parallel, they will interfere with each other.
Can you use the tempfile module to create a new unique directory inside /tmp, copy the relevant code in there, and then run the commands using that new directory as the root?


with tempfile.TemporaryDirectory() as tmpdirname:

shutil.copytree("C://Users//Abhilasha//code",tmpdirname+"//code")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This path is hardcoded to your machine, and so is not reusable. You probably want to put the code directory inside this repository, and then use relative paths, so that anyone can clone the repo and run the test.
  • Also, the "//" is true for windows, but not linux. Use os.path.sep instead.
  • Since this tempfile stuff is common across tests, you could put it in the setUp method. Relevant documentation.

import threading
import time

class Action():
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still pending. Based on the above, I was hoping to see something like the following in your tests:

graph = ActionGraph("wherever/code")
# the above creates all the Action objects internally, so the user never deal with them directly

graph.execute("build")
graph.execute("test_all")




def plot_graph(self, text, cwd, order):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name suggests that you're still thinking of mathematical graphs with x/y-axes, etc.
In case of CS graphs, create/process/execute are acceptable terms too, depending on what this is doing.
If you want be even more precise in this case, you can use a name like "def compute_execution_order" or something. As usual, the goal is to figure out exactly what the method does, based on only the name.


dependency = str(dependency).split('/')[1]

with open(os.path.join("algorithms//build.json")) as json_file:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This path is hard-coded, and therefore not allowed.
The assumption you're allowed to make is that the file name containing the actions is called "build.json". You can save that in some constant if you wish, but then you have to compute the rest of the location based on the context.

return


def execute_commands(self):
Copy link
Owner

@kaustubh-karkare kaustubh-karkare Dec 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh, interesting.
You're doing this in batches of actions with the same order value!
This is definitely better than the original approach where everything was serial.

The case where it doesn't work is this:

A (1s) -> B (1s) -> C (1s) -> D (1s) -> E (1s) -> Final <- G (5s)

So while G was running, we could have run A-E also, since they take 5s overall.
I think in your current scheme, E & G will happen in parallel, but before that, A-D will happen separately.
Can you figure out a way to support that?

solutions/Abhilasha/Assignment2/BuildAutomationTool.py Outdated Show resolved Hide resolved
Comment on lines 23 to 26
if len(str(rel_location).split(os.path.sep)) == 1:
rel_location = ''
else:
rel_location = str(rel_location).split(os.path.sep)[0]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of splitting the same thing twice, you can just share the result.

path_elements = str(rel_location).split(os.path.sep)

Also, you don't just want the first element, you want everything except the last element: path_elements[:-1]

Comment on lines 35 to 38
if 'command' in data:
self.all_action_commands[action] = data['command']
else:
self.all_action_commands[action] = 'none'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.all_action_commands[action] = data.get('command')

That way, you're not dealing with a "none" string, but a proper None object.


if action not in self.all_action_status:
raise Exception('Command not recognized.')
return
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will never reach this line due to raise on the previous one.

solutions/Abhilasha/Assignment2/btest.py Show resolved Hide resolved
Comment on lines 155 to 157
for name in self.current_action_status:
if(self.current_action_status[name] == 'not started'):
self.update_dependencies_status(name)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems somewhat inefficient. Ie - update_dependencies_status already contains recursion, so will update multiple items anyway. Looping over all of them results in duplicate work.

I think the proper fix is to also keep track of "dependents" (ie - the other direction of dependencies). That way, you just have to update the dependents of the completed action via recursion, and can avoid the loop.

Comment on lines 24 to 25
new_root_dir = tmpdirname+os.path.sep+"code"
shutil.copytree(self.root+os.path.sep+"code", new_root_dir )
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing here is inconsistent, did you use a linter?

return


class Action():
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't the correct name.
An action is a specific thing you can execute.
This class is responsible for executing multiple actions.
So "ActionExecutor" might be a better name.

Comment on lines 71 to 73
self.all_action_status = all_action_status
self.all_action_commands = all_action_commands
self.all_action_dependencies = all_action_dependencies
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you could just store a single reference to the action_graph, and then do self.graph.action_commands


for filename in Path(self.root_dir).rglob('*.json'):

if str(filename).endswith('\\build.json'):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just glob for build.json files directly?

Comment on lines 32 to 33
path_elements = str(relpath(filename, self.root_dir)).split(os.path.sep)
rel_location = os.path.sep.join(path_elements[:-1])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also use os.path.split and os.path.join, which already assume that the separator is os.path.sep.

action = rel_location+'/'+data['name']
else:
action = data['name']
self.all_action_status[action] = 'not started'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, the concept of status is not relevant to a graph. It only matters to the executor and so should not be a property of this object.

raise Exception('Command not recognized.')


Action_obj = ActionExecutor(self.all_action_status, self.all_action_commands, self.all_action_dependencies)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a method of the ActionGraph object depends on the ActionExecutor.
But ActionExecutor technically depends on the data in the ActionGraph?
Circular dependencies like this should be avoided.

self.all_action_dependencies[action] = 'none'


def get_command_to_be_executed(self, build, action):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

Comment on lines 154 to 155
if self.current_action_status[name] == 'ready':
self.current_action_status[name] = 'processing'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, instead of using strings for this, you should use an enumeration, which prevents any typos in the string values.

https://docs.python.org/3/library/enum.html

Comment on lines 193 to 194
self.ongoing_subprocesses.remove(p)
self.action_name_for_ongoing_subprocess.remove(action)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you're relying on the values of p & action from above. I guess this works.


while True:
any_subprocess_completed = False
for p, action in zip(self.ongoing_subprocesses, self.action_name_for_ongoing_subprocess):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of p, use process. Always go for useful names.

os.chdir(self.root)


def test_clean(self):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might want to merge this test with the one above.
Ie - when you copied the code directly, I don't think it contains the .o or .exe file anyway, so executing the clean command does nothing, which doesn't provide that much value, right?

new_root_dir = tmpdirname+os.path.sep+"code"
shutil.copytree(self.root+os.path.sep+"code", new_root_dir)
os.chdir(new_root_dir)
BAT_obj = BAT.ActionGraph(new_root_dir)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of BAT_obj, why not just call this "action_graph" ?

Copy link
Owner

@kaustubh-karkare kaustubh-karkare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is still some confusion about the responsibilities of each class.


def __init__(self):

self.current_action_status = {}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The action status should be stored in the ActionExecutor, not here, since that is the class that cares about it.


def action_sequence(self, root, action):

graph = ActionGraph()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action = single item
ActionGraph = a collection of actions.

ActionGraph can use Action, but Action cannot use ActionGraph like this.


def initialize_dependencies_status(self, action):

''' initialize the status of the actions with no deps as ready '''
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, the status is relevant only for ActionExecutor.

Comment on lines +145 to +146
if build != 'build':
raise Exception('Command not recognized.')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, why do you need this at all?

Comment on lines +189 to +208
self.current_action_status[name] = Status.processing
cwd = os.getcwd()
if '/' in name:
loc = name[:name.rindex('/')]
else:
loc = ''
if loc != '':
cwd = cwd+os.path.sep+loc
command = self.current_action_commands[name]
if command is None:
any_action_has_no_command = True
self.current_action_status[name] = Status.done
self.pending_actions.remove(name)
if len(self.dependents[name]) != 0:
for action in self.dependents[name]:
if self.current_action_status[action] == Status.not_started:
self.update_dependencies_status(action)
break
else:
process = subprocess.Popen(command, shell=True, cwd=cwd)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of custom logic here, that the executor shouldn't care about. I would suggest adding a "start" method to your Action class.
Eg - what if you have another type of executor class (which does things serially, instead of in parallel). You don't want to duplicate logic in that one.

Comment on lines +209 to +210
self.ongoing_subprocesses.append(process)
self.action_name_for_ongoing_subprocess.append(name)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of separate lists, you can just use a single dict.

self.action_name_to_ongoing_process[action_name] = process

def test_build_test_all_and_test_clean(self):

with self.test_dir as tmpdirname:
new_root_dir = tmpdirname+os.path.sep+"code"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use os.path.join, instead to directly concatenating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants