Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

version-control-system #45

Closed

Conversation

iamanandkr95
Copy link
Collaborator

No description provided.

@iamanandkr95 iamanandkr95 changed the title Create vcs.md version-control-system Jul 16, 2017
Copy link
Owner

@kaustubh-karkare kaustubh-karkare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decent progress. However, you should start working on actual code, since there will be thing you cant anticipate while writing this spec.


The tree class has the following method(s):
- `get_hash_value` method returns a SHA-1 hash based on the contents of the tree entry.
- `get_content` method returns the tree entries.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines look duplicated. Mention a base class that these 3 types have to extend.


#### Binary Large Objects(BLOB)

The blob class stores the following:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You talk about classes, but how does this translate to serialized data in the file?

The vcs internals includes the following methods:

- `init` method takes input a directory path or by default uses the current directory path to create the .vcs directory. The init method initializes the HEAD file pointer reference as `None`.
- `get_files_and_directories` method takes input a commit tree and recursively traverses the tree to return a list of file and directory paths along with their SHA-1 hash as stored in the `.vcs/objects/`.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to make a distinction between public api and internal methods. This feels internal.

Copy link
Owner

@kaustubh-karkare kaustubh-karkare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • First of all, excellent job with writing this in a language you didn't know!
  • One concern I have is that your code isn't object oriented enough. You can make it a lot more intuitive if you had a common base class that these 3 types extended from.
  • I'm not going to insist on an actual CLI, but you do need to provide unit tests.

def get_sha_hash(header, content)
sha1 = Digest::SHA1.new
sha1.update(header.gsub(WINDOWS_LINE_END, UNIX_LINE_END))
if not content.include? NULL and File.file? content
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "content" needs a better name like "path_or_content"

end


def hash_object(header, content, sha_hash)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a better name here. "create_object_file" ?

break
end
output_file.write(
data.gsub(WINDOWS_LINE_END, UNIX_LINE_END)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gsub thing looks inconvenient. Why do you even have to bother?

end
end
else
output_file.write(content.gsub(WINDOWS_LINE_END, UNIX_LINE_END))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

open(input_file_path, "r") do |input_file|
open(output_file_path, "w") do |output_file|
data = ""
while not data.include? NULL
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen this pattern in a bunch of places. You should probably extract it into a utility method.

original_file
)
end
find_diff(original_file, modified_file)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird indenting? What linter are you using?

modified_files = get_modified_files()
modified_files.each do |modified_file, modified_file_sha|
puts File.basename(modified_file)
original_file = File.join(Dir.getwd, ".vcs", "temporary_file")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ruby doesn't have a standard make to make temporary files?

end
old_lines.push(data)
end
end
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make a get_lines method :P

lcs[ii][jj] = NodeValue.new(
lcs[ii - 1][jj - 1].key + 1,
ii - 1,
jj - 1
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing commas?

current_x = parent_x
current_y = parent_y
end
return lcs_old, lcs_new
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont really understand these names.

Copy link
Owner

@kaustubh-karkare kaustubh-karkare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not reviewed the whole thing, because the comments until now were signficant enough to require large changes.
You have circular dependencies in your require statements (vcs_internals -> file_storage -> vcs_internals). Even though ruby allows it, this isn't a good abstraction. Ensure a DAG.
Sorry about the delay, the cyclic dependencies made things harder to understand, which meant that I had a few false starts before I was able to review this.

NULL = "\0"


class FilePath
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically equivalent to global variables. It isn't taking advantage of OOPs.
I want to be able to run commands on 2 separate repos at the same time.

end


def extract_header(input_file)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should clarify whether this is a file name vs file handle. Better naming.

However, I'd have recommended that you have a File class, with this method on it.

end


def type(sha_hash)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty unhelpful. get_type would be a better name.
Or ObjectFile.from_hash(hash).type

end


def create_object_file(header, path_or_content, sha_hash)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the hash an input here? Cant you compute it based on path_or_content?

end


def get_files_in_directory(directory_path=Dir.getwd)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lie.

end


def get_files_in_tree(sha_hash, directory_path=Dir.getwd)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

end


def get_modified_files(**kwargs)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you are forwarding arguments, avoid kwargs. Use default argument values instead.

Alternatively:
commit = Commit.get_current()
print(commit.diff(other_commit))

EOL = "\n"


def init(directory_path=Dir.getwd)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen this (getwd) in a lot of places. It makes me extremely uncomfortable, because it isn't clear whether that method gets that argument or not. Default arguments are good when you're dealing with external users, who are unreliable, but within your own code, things should be much stricter.

return sha_hash
end

def self.restore_blob(sha_hash, output_file_path)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehh ... using static methods defeats the point of objects too.
blob = Blob.create(content); Blob(hash).restore() is how this should look.

equivalent_line_in_new_file[old_index] == 0 &&
equivalent_line_in_old_file[new_index] == 0
)
old_index, new_index, line_number_new = display_diff_change(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO display_diff_change can handle the display_diff_append/display_diff_delete cases too, so you shouldn't need them.

Copy link
Owner

@kaustubh-karkare kaustubh-karkare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO you mostly have the overall structure right. The main concern is that the abstraction aren't completely clean. I'm not entirely sure what is the best way to fix this, because I dont want to keep requesting changes based on vague advice, so here's a summary of how my code is organized:

Basic Operations Module

  • Generate the diff between 2 strings (ie - content from 2 files)
  • Apply the given diff on a string (ie - content from original file)

Advanced Operations Module (using the above)

  • Generate the diff between 2 directories
  • Apply a diff to a directory.

(no mention of VCS in the above)
VCS Objects

  • Base Object, extended by Commit, Tree, Blob
  • Each of the subclasses has methods:
    • static create(...) = based on working directory
    • update() = apply this to working directory
    • Class specific methods like commit.get_message()
  • Each subclass also maintains a reference to the VCS instance.

VCS Class

  • It is an augmented version of your VCSPaths class.
  • Includes more API like "get/set_option" (in the config file).

BaseCommandClass

  • Extended by the various commands that implement logic.
  • Here's an example snippet of how my checkout command looks:
class CheckoutCommand(BaseCommand):
  """Update the repository to specified commit."""

  @classmethod
  def initialize_subparser(cls, subparser):
    subparser.add_argument('hash')

  def execute(self, args):
    self.vcs.ensure_initialized()
    self.vcs.ensure_clean()
    old_commit = self.vcs.get_commit(head=True)
    new_commit = self.vcs.get_commit(args.hash)
    old_commit.update(new_commit)
    self.vcs.set_option('HEAD', new_commit.hash)
    self.stdout.write("Checked-out commit: %s\n" % new_commit.hash)

jj -= 1
end
end
return old_line_equivalent_in_new, new_line_equivalent_in_old
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to add comments explaining these names, because it wasn't until I read the rest of the class that I truly understood what was going on.

:range_end_old=>range_end_old,
:range_start_new=>range_start_new,
:range_end_new=>range_end_new,
})
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dont you have to add/subtract lines in this case?

@hunks_with_context[last_hunk][:range_end_new] = range_end_new
else
@hunks_with_context[last_hunk][:range_end_old] += lines
@hunks_with_context[last_hunk][:range_end_new] += lines
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do this in the if-clause too?

new_lines = get_lines(@new_file_path)
puts "diff a/#{@file_name} b/#{@file_name}"
puts "--- a/#{@file_name}"
puts "+++ b/#{@file_name}"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never print stuff from inside utility modules. Instead, return strings, and defer it as much as possible, so it is handled by your CLI logic.

@@ -0,0 +1,202 @@
class Diff
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh, I wasn't able to follow along with all of your logic here, so I'm not convinced that it is correct yet.
Can you write a unit test for this specific file? Show me the edge cases.

Blob.new(file_hash).restore(file_path)
end
else
Blob.new(file_hash).restore(file_path)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not dealing with Trees here, because your other utility methods have already flattened the results.
But wouldn't recursively calling restore be easier?

config_parameters = {}
open(VcsPath.new().config, "r") do |config_file|
config_parameters = JSON.load(config_file)
end
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading config should probably be through API methods on a VCS class, instead of reading the file directly.

def get_commit_content()
commit_parameters = {}
object_file_path = File.join(VcsPath.new().objects, @hash)
open(object_file_path, "rb") do |object_file|
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More of these loops :P

print "Author: #{commit_content.fetch("author")}#{EOL}"
print "Author Email: #{commit_content.fetch("email")}#{EOL}"
print "Time: #{commit_content.fetch("time")}#{EOL}"
print "Message: #{commit_content.fetch("message")}#{EOL}#{EOL}"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this isn't command specific. This should be part of the commit functionality.

if not files_in_commit.key? file_path
puts "\tuntracked:\t#{File.basename(file_path)}"
end
end
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I'm reasonable sure the problem statement mentioned that we wont be dealing with staging areas, which is what this concept of "untracked" files deals with.
  • Additionally, you shouldn't be listing out these cases in the implementation of a command.

	modified:   solutions/anand-kumar/src/diff_utility.rb
	modified:   solutions/anand-kumar/src/file_utilities.rb
	modified:   solutions/anand-kumar/src/vcs.rb
	deleted:    solutions/anand-kumar/src/vcs_commands.rb
	copied:     solutions/anand-kumar/src/vcs.rb -> solutions/anand-kumar/src/vcs_internals.rb
	modified:   solutions/anand-kumar/src/vcs_objects.rb
@kaustubh-karkare
Copy link
Owner

Inactive for too long.

Fwiw, this was good progress.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants