Sourcery Starbot ⭐ refactored rubby33/CodeBERT #1

Open
wants to merge 1 commit into rubby33:master from sourcery-ai-bot:master

Conversation

SourceryAI

Thanks for starring sourcery-ai/sourcery ✨ 🌟 ✨

Here's your pull request refactoring your most popular Python repo.

If you want Sourcery to refactor all your Python repos and incoming pull requests, install our bot.

Review changes via command line

To manually merge these changes, make sure you're on the master branch, then run:

git fetch https://github.com/sourcery-ai-bot/CodeBERT master
git merge --ff-only FETCH_HEAD
git reset HEAD^

@SourceryAI left a comment

Due to GitHub API limits, only the first 60 comments can be shown.

-    s = " %s " % s
+    s = f" {s} "

Function normalize refactored with the following changes:
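For reference, a minimal standalone sketch of the `%`-interpolation to f-string pattern (toy values, not from the repo):

    s = "token"
    old = " %s " % s   # printf-style interpolation
    new = f" {s} "     # f-string: same result, evaluated inline
    assert old == new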

Comment on lines -91 to +94
-    result = {}
-    result["testlen"] = len(test)
+    result = {"testlen": len(test)}

     # Calculate effective reference sentence length.
     if eff_ref_len == "shortest":
         result["reflen"] = min(reflens)
-    elif eff_ref_len == "average":
+    if eff_ref_len == "average":

Function cook_test refactored with the following changes:

  • Merge dictionary assignment with declaration (merge-dict-assign)
  • Simplify conditional into switch-like form [×2] (switch)
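For illustration, a minimal sketch of both patterns together (hypothetical function and key names, not the PR's code):

    def summarize(test, reflens, eff_ref_len):
        # merge-dict-assign: seed the dict at declaration instead of
        # creating an empty dict and assigning the first key afterwards
        result = {"testlen": len(test)}
        # switch-like form: independent if-tests on the same variable
        if eff_ref_len == "shortest":
            result["reflen"] = min(reflens)
        if eff_ref_len == "average":
            result["reflen"] = sum(reflens) / len(reflens)
        return result

    assert summarize("abc", [2, 4], "average")["reflen"] == 3.0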

Comment on lines -157 to +172
     predictionMap = {}
     goldMap = {}
     gf = open(goldfile, 'r')

     for row in predictions:
         cols = row.strip().split('\t')
-        if len(cols) == 1:
-            (rid, pred) = (cols[0], '')
-        else:
-            (rid, pred) = (cols[0], cols[1])
+        (rid, pred) = (cols[0], '') if len(cols) == 1 else (cols[0], cols[1])
         predictionMap[rid] = [splitPuncts(pred.strip().lower())]

     for row in gf:
         (rid, pred) = row.split('\t')
         if rid in predictionMap:  # Only insert if the id exists for the method
             if rid not in goldMap:
                 goldMap[rid] = []
             goldMap[rid].append(splitPuncts(pred.strip().lower()))

-    sys.stderr.write('Total: ' + str(len(goldMap)) + '\n')
+    sys.stderr.write(f'Total: {len(goldMap)}' + '\n')
     return (goldMap, predictionMap)

Function computeMaps refactored with the following changes:
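The key change is the conditional expression; a standalone sketch with toy data (not from the repo):

    cols = "id42\tpredicted text".split('\t')
    # One conditional expression replaces the four-line if/else
    (rid, pred) = (cols[0], '') if len(cols) == 1 else (cols[0], cols[1])
    assert (rid, pred) == ("id42", "predicted text")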

Comment on lines -183 to +186
     score = [0] * 5
     num = 0.0

     for key in m1:
         if key in m2:
             bl = bleu(m1[key], m2[key][0])
-            score = [ score[i] + bl[i] for i in range(0, len(bl))]
+            score = [score[i] + bl[i] for i in range(len(bl))]
             num += 1
     return [s * 100.0 / num for s in score]

Function bleuFromMaps refactored with the following changes:

Comment on lines -194 to +192
     reference_file = sys.argv[1]
-    predictions = []
-    for row in sys.stdin:
-        predictions.append(row)
+    predictions = list(sys.stdin)
     (goldMap, predictionMap) = computeMaps(predictions, reference_file)
     print (bleuFromMaps(goldMap, predictionMap)[0])

Lines 194-199 refactored with the following changes:
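`list(sys.stdin)` materializes the line iterator directly; a sketch of the equivalence, using an in-memory stream as a stand-in for `sys.stdin`:

    import io

    stream = io.StringIO("line one\nline two\n")  # stand-in for sys.stdin
    predictions = list(stream)                    # same result as the removed append loop
    assert predictions == ["line one\n", "line two\n"]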

-        logger.info("LOOKING AT {}".format(os.path.join(data_dir, train_file)))
+        logger.info(f"LOOKING AT {os.path.join(data_dir, train_file)}")

Function CodesearchProcessor.get_train_examples refactored with the following changes:

-        logger.info("LOOKING AT {}".format(os.path.join(data_dir, dev_file)))
+        logger.info(f"LOOKING AT {os.path.join(data_dir, dev_file)}")

Function CodesearchProcessor.get_dev_examples refactored with the following changes:

-        logger.info("LOOKING AT {}".format(os.path.join(data_dir, test_file)))
+        logger.info(f"LOOKING AT {os.path.join(data_dir, test_file)}")

Function CodesearchProcessor.get_test_examples refactored with the following changes:

Comment on lines -119 to +125
-            guid = "%s-%s" % (set_type, i)
+            guid = f"{set_type}-{i}"
             text_a = line[3]
             text_b = line[4]
-            if (set_type == 'test'):
-                label = self.get_labels()[0]
-            else:
-                label = line[0]
+            label = self.get_labels()[0] if (set_type == 'test') else line[0]
             examples.append(
                 InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
-        if (set_type == 'test'):
-            return examples, lines
-        else:
-            return examples
+        return (examples, lines) if (set_type == 'test') else examples

Function CodesearchProcessor._create_examples refactored with the following changes:

Comment on lines -164 to +159
-    else:
-        # Account for [CLS] and [SEP] with "- 2"
-        if len(tokens_a) > max_seq_length - 2:
-            tokens_a = tokens_a[:(max_seq_length - 2)]
+    elif len(tokens_a) > max_seq_length - 2:
+        tokens_a = tokens_a[:(max_seq_length - 2)]

Function convert_examples_to_features refactored with the following changes:

This removes the following comments (why?):

# Account for [CLS] and [SEP] with "- 2"
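The dropped comment documented why 2 is subtracted: it reserves room for the [CLS] and [SEP] special tokens. A hedged sketch of the else-plus-nested-if to `elif` merge, with that rationale restored as a comment (hypothetical helper, not the repo's exact function):

    def truncate_tokens(tokens_a, tokens_b, max_seq_length):
        if tokens_b:
            tokens_b = tokens_b[:max_seq_length // 2]
        # Reserve 2 positions for the [CLS] and [SEP] special tokens
        elif len(tokens_a) > max_seq_length - 2:
            tokens_a = tokens_a[:max_seq_length - 2]
        return tokens_a, tokens_b

    assert truncate_tokens(list("abcdefgh"), [], 6) == (list("abcd"), [])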



Found the following improvement in Function Model.forward:

Comment on lines -77 to +91
         pass
     #obtain dataflow
     if lang=="php":
-        code="<?php"+code+"?>"
+        code = f"<?php{code}?>"
     try:
         tree = parser[0].parse(bytes(code,'utf8'))
         root_node = tree.root_node
         tokens_index=tree_to_token_index(root_node)
         code=code.split('\n')
         code_tokens=[index_to_code_token(x,code) for x in tokens_index]
-        index_to_code={}
-        for idx,(index,code) in enumerate(zip(tokens_index,code_tokens)):
-            index_to_code[index]=(idx,code)
+        index_to_code = {
+            index: (idx, code)
+            for idx, (index, code) in enumerate(zip(tokens_index, code_tokens))
+        }


Function extract_dataflow refactored with the following changes:
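A standalone sketch of the dict-comprehension pattern, mapping each token span to its position and text (toy spans, not real tree-sitter output):

    tokens_index = [(0, 4), (5, 8)]   # toy byte spans
    code_tokens = ["def", "foo"]      # toy token texts
    # One expression replaces an empty dict plus an assignment loop
    index_to_code = {
        index: (idx, code)
        for idx, (index, code) in enumerate(zip(tokens_index, code_tokens))
    }
    assert index_to_code[(0, 4)] == (0, "def")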

Comment on lines -152 to +169

     for url in [url1,url2]:
         if url not in cache:
             func=url_to_code[url]

             #extract data flow
             code_tokens,dfg=extract_dataflow(func,parser,'java')
-            code_tokens=[tokenizer.tokenize('@ '+x)[1:] if idx!=0 else tokenizer.tokenize(x) for idx,x in enumerate(code_tokens)]
-            ori2cur_pos={}
-            ori2cur_pos[-1]=(0,0)
+            code_tokens = [
+                tokenizer.tokenize(f'@ {x}')[1:]
+                if idx != 0
+                else tokenizer.tokenize(x)
+                for idx, x in enumerate(code_tokens)
+            ]
+
+            ori2cur_pos = {-1: (0, 0)}
             for i in range(len(code_tokens)):
                 ori2cur_pos[i]=(ori2cur_pos[i-1][1],ori2cur_pos[i-1][1]+len(code_tokens[i]))
             code_tokens=[y for x in code_tokens for y in x]


Function convert_examples_to_features refactored with the following changes:

Comment on lines -203 to +205


Function TextDataset.__init__ refactored with the following changes:

-        node_index=sum([i>1 for i in self.examples[item].position_idx_1])
-        max_length=sum([i!=1 for i in self.examples[item].position_idx_1])
+        node_index = sum(i>1 for i in self.examples[item].position_idx_1)
+        max_length = sum(i!=1 for i in self.examples[item].position_idx_1)

Function TextDataset.__getitem__ refactored with the following changes:
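Passing a generator expression to `sum` counts matches without building an intermediate list (booleans count as 0/1); a sketch with toy positions:

    position_idx = [0, 1, 2, 5, 7, 1, 1]            # toy position ids
    node_index = sum(i > 1 for i in position_idx)   # 3: values 2, 5, 7
    max_length = sum(i != 1 for i in position_idx)  # 4: values 0, 2, 5, 7
    assert (node_index, max_length) == (3, 4)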

Comment on lines -73 to +87
         pass
     #obtain dataflow
     if lang=="php":
-        code="<?php"+code+"?>"
+        code = f"<?php{code}?>"
     try:
         tree = parser[0].parse(bytes(code,'utf8'))
         root_node = tree.root_node
         tokens_index=tree_to_token_index(root_node)
         code=code.split('\n')
         code_tokens=[index_to_code_token(x,code) for x in tokens_index]
-        index_to_code={}
-        for idx,(index,code) in enumerate(zip(tokens_index,code_tokens)):
-            index_to_code[index]=(idx,code)
+        index_to_code = {
+            index: (idx, code)
+            for idx, (index, code) in enumerate(zip(tokens_index, code_tokens))
+        }


Function extract_dataflow refactored with the following changes:

Comment on lines -135 to +142
-            code_tokens=[tokenizer.tokenize('@ '+x)[1:] if idx!=0 else tokenizer.tokenize(x) for idx,x in enumerate(code_tokens)]
-            ori2cur_pos={}
-            ori2cur_pos[-1]=(0,0)
+            code_tokens = [
+                tokenizer.tokenize(f'@ {x}')[1:] if idx != 0 else tokenizer.tokenize(x)
+                for idx, x in enumerate(code_tokens)
+            ]
+
+            ori2cur_pos = {-1: (0, 0)}
             for i in range(len(code_tokens)):
                 ori2cur_pos[i]=(ori2cur_pos[i-1][1],ori2cur_pos[i-1][1]+len(code_tokens[i]))
             code_tokens=[y for x in code_tokens for y in x]

Function convert_examples_to_features refactored with the following changes:

-        cache_file=args.output_dir+'/'+prefix+'.pkl'
+        cache_file = f'{args.output_dir}/{prefix}.pkl'

Function TextDataset.__init__ refactored with the following changes:

Comment on lines -211 to +212
-        node_index=sum([i>1 for i in self.examples[item].position_idx])
-        max_length=sum([i!=1 for i in self.examples[item].position_idx])
+        node_index = sum(i>1 for i in self.examples[item].position_idx)
+        max_length = sum(i!=1 for i in self.examples[item].position_idx)

Function TextDataset.__getitem__ refactored with the following changes:

Comment on lines -251 to +255

#get optimizer and scheduler
optimizer = AdamW(model.parameters(), lr=args.learning_rate, eps=1e-8)
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0,num_training_steps=len(train_dataloader)*args.num_train_epochs)


Function train refactored with the following changes:



Function evaluate refactored with the following changes:

Comment on lines -414 to +428

    parser.add_argument("--lang", default=None, type=str,
                        help="language.")

    parser.add_argument("--model_name_or_path", default=None, type=str,
                        help="The model checkpoint for weights initialization.")
    parser.add_argument("--config_name", default="", type=str,
                        help="Optional pretrained config name or path if not the same as model_name_or_path")
    parser.add_argument("--tokenizer_name", default="", type=str,
                        help="Optional pretrained tokenizer name or path if not the same as model_name_or_path")

    parser.add_argument("--nl_length", default=128, type=int,
                        help="Optional NL input sequence length after tokenization.")
    parser.add_argument("--code_length", default=256, type=int,
                        help="Optional Code input sequence length after tokenization.")
    parser.add_argument("--data_flow_length", default=64, type=int,
                        help="Optional Data Flow input sequence length after tokenization.")

    parser.add_argument("--do_train", action='store_true',
                        help="Whether to run training.")
    parser.add_argument("--do_eval", action='store_true',
                        help="Whether to run eval on the dev set.")
    parser.add_argument("--do_test", action='store_true',
                        help="Whether to run eval on the test set.")


Function main refactored with the following changes:

This removes the following comments (why?):

# Evaluation

Comment on lines -16 to +17
     do_first_statement=['for_in_clause']
     def_statement=['default_parameter']
     states=states.copy()

Function DFG_python refactored with the following changes:

@@ -185,7 +184,6 @@ def DFG_java(root_node,index_to_code,states):
     for_statement=['for_statement']
     enhanced_for_statement=['enhanced_for_statement']
     while_statement=['while_statement']
-    do_first_statement=[]

Function DFG_java refactored with the following changes:

Comment on lines -27 to +32
# This series of conditionals removes docstrings:
elif token_type == tokenize.STRING:
if prev_toktype != tokenize.INDENT:
# This is likely a docstring; double-check we're not inside an operator:
if prev_toktype != tokenize.NEWLINE:
if start_col > 0:
out += token_string
if (
prev_toktype not in [tokenize.INDENT, tokenize.NEWLINE]
and start_col > 0
):
out += token_string

Function remove_comments_and_docstrings refactored with the following changes:

This removes the following comments (why?):

# note: a space and not an empty string
# This is likely a docstring; double-check we're not inside an operator:
# This series of conditionals removes docstrings:
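The three nested tests guard a single statement, so they collapse into one boolean condition with the same truth table; a standalone sketch (toy inputs, not the tokenizer loop itself):

    import tokenize

    def keep_string(prev_toktype, start_col):
        # Equivalent to: prev != INDENT and prev != NEWLINE and start_col > 0
        return (
            prev_toktype not in [tokenize.INDENT, tokenize.NEWLINE]
            and start_col > 0
        )

    assert keep_string(tokenize.NAME, 4) is True
    assert keep_string(tokenize.INDENT, 4) is False
    assert keep_string(tokenize.NAME, 0) is False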

Comment on lines -16 to +17
     do_first_statement=['for_in_clause']
     def_statement=['default_parameter']
     states=states.copy()

Function DFG_python refactored with the following changes:

@@ -185,7 +184,6 @@ def DFG_java(root_node,index_to_code,states):
     for_statement=['for_statement']
     enhanced_for_statement=['enhanced_for_statement']
     while_statement=['while_statement']
-    do_first_statement=[]

Function DFG_java refactored with the following changes:

Comment on lines -27 to +32
# This series of conditionals removes docstrings:
elif token_type == tokenize.STRING:
if prev_toktype != tokenize.INDENT:
# This is likely a docstring; double-check we're not inside an operator:
if prev_toktype != tokenize.NEWLINE:
if start_col > 0:
out += token_string
if (
prev_toktype not in [tokenize.INDENT, tokenize.NEWLINE]
and start_col > 0
):
out += token_string

Function remove_comments_and_docstrings refactored with the following changes:

This removes the following comments (why?):

# note: a space and not an empty string
# This is likely a docstring; double-check we're not inside an operator:
# This series of conditionals removes docstrings:

-    else:
-        code_tokens=[]
-        for child in root_node.children:
-            code_tokens+=tree_to_token_index(child)
-        return code_tokens
+    code_tokens=[]
+    for child in root_node.children:
+        code_tokens+=tree_to_token_index(child)
+    return code_tokens

Function tree_to_token_index refactored with the following changes:
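When the `if` branch returns, the trailing `else` is redundant and its body can be dedented; a sketch on a toy recursive node type (hypothetical `Node`, standing in for tree-sitter nodes):

    class Node:
        def __init__(self, children=None, leaf=None):
            self.children = children or []
            self.leaf = leaf

    def leaves(node):
        if node.leaf is not None:
            return [node.leaf]
        # No else needed: the early return above already handled leaves
        out = []
        for child in node.children:
            out += leaves(child)
        return out

    tree = Node(children=[Node(leaf="a"), Node(children=[Node(leaf="b")])])
    assert leaves(tree) == ["a", "b"]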

-    for i in range(0, len(segment) - order + 1):
+    for i in range(len(segment) - order + 1):

Function _get_ngrams refactored with the following changes:
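`range` starts at 0 by default, so the explicit start argument is noise; a hedged sketch of n-gram extraction in the spirit of `_get_ngrams` (simplified: returns a plain list, not whatever the original collects):

    def ngrams(segment, order):
        # range(len(segment) - order + 1) == range(0, len(segment) - order + 1)
        return [tuple(segment[i:i + order])
                for i in range(len(segment) - order + 1)]

    assert ngrams(["a", "b", "c"], 2) == [("a", "b"), ("b", "c")]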
