-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Solution for Assignment2. #66
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, good start!
Next step is to re-organize your code a bit to simplify the design, without any change in functionality, so that it becomes easy to add the next feature while respecting "modularity".
import threading | ||
import time | ||
|
||
class Action(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have these classes called Action and ActionGraph, but I still don't quite understand the difference in responsibilities between them. Can you update the code so that:
- Action represents a single parsing a single json object / running a single command. While writing that code, you cannot make any assumptions about there being multiple actions.
- ActionGraph contains ALL actions across ALL directories, and deals with operations that span multiple actions. It uses the Action class to perform underlying operation.
You should be able to do things like:
action = Action(raw_data)
action.get_dependencies()
action.start_command()
# the above methods are only invoked from inside ActionGraph
graph = ActionGraph(root_dir) # recursively explores root_dir to load all build.json files
and it creates a single map<name, Action>
graph.execute(action_name) # execute dependencies and then execute specified action
# the above methods are invoked by your main section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still pending. Based on the above, I was hoping to see something like the following in your tests:
graph = ActionGraph("wherever/code")
# the above creates all the Action objects internally, so the user never deal with them directly
graph.execute("build")
graph.execute("test_all")
|
||
thread = threading.Thread(target = self.process, args = (command, work_dir)) | ||
thread_list.append(thread) | ||
time.sleep(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you have to sleep?
def test_build_test_all(self): | ||
|
||
BAT_obj = BAT.Action() | ||
|
||
BAT_obj.get_command(os.getcwd(), 'build', 'test_all') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I run 2 copies of this test in parallel, they will interfere with each other.
Can you use the tempfile
module to create a new unique directory inside /tmp, copy the relevant code in there, and then run the commands using that new directory as the root?
|
||
with tempfile.TemporaryDirectory() as tmpdirname: | ||
|
||
shutil.copytree("C://Users//Abhilasha//code",tmpdirname+"//code") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This path is hardcoded to your machine, and so is not reusable. You probably want to put the code directory inside this repository, and then use relative paths, so that anyone can clone the repo and run the test.
- Also, the "//" is true for windows, but not linux. Use
os.path.sep
instead. - Since this tempfile stuff is common across tests, you could put it in the setUp method. Relevant documentation.
import threading | ||
import time | ||
|
||
class Action(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still pending. Based on the above, I was hoping to see something like the following in your tests:
graph = ActionGraph("wherever/code")
# the above creates all the Action objects internally, so the user never deal with them directly
graph.execute("build")
graph.execute("test_all")
|
||
|
||
|
||
def plot_graph(self, text, cwd, order): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name suggests that you're still thinking of mathematical graphs with x/y-axes, etc.
In case of CS graphs, create/process/execute are acceptable terms too, depending on what this is doing.
If you want be even more precise in this case, you can use a name like "def compute_execution_order" or something. As usual, the goal is to figure out exactly what the method does, based on only the name.
|
||
dependency = str(dependency).split('/')[1] | ||
|
||
with open(os.path.join("algorithms//build.json")) as json_file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This path is hard-coded, and therefore not allowed.
The assumption you're allowed to make is that the file name containing the actions is called "build.json". You can save that in some constant if you wish, but then you have to compute the rest of the location based on the context.
return | ||
|
||
|
||
def execute_commands(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh, interesting.
You're doing this in batches of actions with the same order value!
This is definitely better than the original approach where everything was serial.
The case where it doesn't work is this:
A (1s) -> B (1s) -> C (1s) -> D (1s) -> E (1s) -> Final <- G (5s)
So while G was running, we could have run A-E also, since they take 5s overall.
I think in your current scheme, E & G will happen in parallel, but before that, A-D will happen separately.
Can you figure out a way to support that?
if len(str(rel_location).split(os.path.sep)) == 1: | ||
rel_location = '' | ||
else: | ||
rel_location = str(rel_location).split(os.path.sep)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of splitting the same thing twice, you can just share the result.
path_elements = str(rel_location).split(os.path.sep)
Also, you don't just want the first element, you want everything except the last element: path_elements[:-1]
if 'command' in data: | ||
self.all_action_commands[action] = data['command'] | ||
else: | ||
self.all_action_commands[action] = 'none' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.all_action_commands[action] = data.get('command')
That way, you're not dealing with a "none" string, but a proper None object.
|
||
if action not in self.all_action_status: | ||
raise Exception('Command not recognized.') | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will never reach this line due to raise
on the previous one.
for name in self.current_action_status: | ||
if(self.current_action_status[name] == 'not started'): | ||
self.update_dependencies_status(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems somewhat inefficient. Ie - update_dependencies_status already contains recursion, so will update multiple items anyway. Looping over all of them results in duplicate work.
I think the proper fix is to also keep track of "dependents" (ie - the other direction of dependencies). That way, you just have to update the dependents of the completed action via recursion, and can avoid the loop.
new_root_dir = tmpdirname+os.path.sep+"code" | ||
shutil.copytree(self.root+os.path.sep+"code", new_root_dir ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spacing here is inconsistent, did you use a linter?
return | ||
|
||
|
||
class Action(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't the correct name.
An action is a specific thing you can execute.
This class is responsible for executing multiple actions.
So "ActionExecutor" might be a better name.
self.all_action_status = all_action_status | ||
self.all_action_commands = all_action_commands | ||
self.all_action_dependencies = all_action_dependencies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, you could just store a single reference to the action_graph, and then do self.graph.action_commands
|
||
for filename in Path(self.root_dir).rglob('*.json'): | ||
|
||
if str(filename).endswith('\\build.json'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just glob for build.json files directly?
path_elements = str(relpath(filename, self.root_dir)).split(os.path.sep) | ||
rel_location = os.path.sep.join(path_elements[:-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also use os.path.split
and os.path.join
, which already assume that the separator is os.path.sep
.
action = rel_location+'/'+data['name'] | ||
else: | ||
action = data['name'] | ||
self.all_action_status[action] = 'not started' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, the concept of status
is not relevant to a graph. It only matters to the executor and so should not be a property of this object.
raise Exception('Command not recognized.') | ||
|
||
|
||
Action_obj = ActionExecutor(self.all_action_status, self.all_action_commands, self.all_action_dependencies) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a method of the ActionGraph object depends on the ActionExecutor.
But ActionExecutor technically depends on the data in the ActionGraph?
Circular dependencies like this should be avoided.
self.all_action_dependencies[action] = 'none' | ||
|
||
|
||
def get_command_to_be_executed(self, build, action): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^
if self.current_action_status[name] == 'ready': | ||
self.current_action_status[name] = 'processing' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, instead of using strings for this, you should use an enumeration, which prevents any typos in the string values.
self.ongoing_subprocesses.remove(p) | ||
self.action_name_for_ongoing_subprocess.remove(action) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, you're relying on the values of p & action from above. I guess this works.
|
||
while True: | ||
any_subprocess_completed = False | ||
for p, action in zip(self.ongoing_subprocesses, self.action_name_for_ongoing_subprocess): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of p
, use process
. Always go for useful names.
os.chdir(self.root) | ||
|
||
|
||
def test_clean(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might want to merge this test with the one above.
Ie - when you copied the code directly, I don't think it contains the .o or .exe file anyway, so executing the clean command does nothing, which doesn't provide that much value, right?
new_root_dir = tmpdirname+os.path.sep+"code" | ||
shutil.copytree(self.root+os.path.sep+"code", new_root_dir) | ||
os.chdir(new_root_dir) | ||
BAT_obj = BAT.ActionGraph(new_root_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of BAT_obj
, why not just call this "action_graph" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is still some confusion about the responsibilities of each class.
|
||
def __init__(self): | ||
|
||
self.current_action_status = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The action status should be stored in the ActionExecutor, not here, since that is the class that cares about it.
|
||
def action_sequence(self, root, action): | ||
|
||
graph = ActionGraph() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Action = single item
ActionGraph = a collection of actions.
ActionGraph can use Action, but Action cannot use ActionGraph like this.
|
||
def initialize_dependencies_status(self, action): | ||
|
||
''' initialize the status of the actions with no deps as ready ''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, the status is relevant only for ActionExecutor.
if build != 'build': | ||
raise Exception('Command not recognized.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once again, why do you need this at all?
self.current_action_status[name] = Status.processing | ||
cwd = os.getcwd() | ||
if '/' in name: | ||
loc = name[:name.rindex('/')] | ||
else: | ||
loc = '' | ||
if loc != '': | ||
cwd = cwd+os.path.sep+loc | ||
command = self.current_action_commands[name] | ||
if command is None: | ||
any_action_has_no_command = True | ||
self.current_action_status[name] = Status.done | ||
self.pending_actions.remove(name) | ||
if len(self.dependents[name]) != 0: | ||
for action in self.dependents[name]: | ||
if self.current_action_status[action] == Status.not_started: | ||
self.update_dependencies_status(action) | ||
break | ||
else: | ||
process = subprocess.Popen(command, shell=True, cwd=cwd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of custom logic here, that the executor shouldn't care about. I would suggest adding a "start" method to your Action class.
Eg - what if you have another type of executor class (which does things serially, instead of in parallel). You don't want to duplicate logic in that one.
self.ongoing_subprocesses.append(process) | ||
self.action_name_for_ongoing_subprocess.append(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of separate lists, you can just use a single dict.
self.action_name_to_ongoing_process[action_name] = process
def test_build_test_all_and_test_clean(self): | ||
|
||
with self.test_dir as tmpdirname: | ||
new_root_dir = tmpdirname+os.path.sep+"code" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use os.path.join
, instead to directly concatenating.
No description provided.