Ignore dot files and apply command global options #4

Merged
merged 4 commits into from
Jul 22, 2024
57 changes: 47 additions & 10 deletions README.adoc
@@ -42,37 +42,62 @@ Commands:
poepod wrap GEMSPEC_PATH # Wrap a gem based on its gemspec file
----

=== Global options

These options can be used with both the `wrap` and `concat` commands, except where noted:

* `--exclude`: List of patterns to exclude (default: `["node_modules/", ".git/", ".gitignore$", ".DS_Store$", "^\\..+"]`)
* `--config`: Path to configuration file
* `--include-binary`: Include binary files (encoded in MIME format)
* `--include-dot-files`: Include dot files
* `--output-file`: Output path
* `--base-dir`: Base directory for relative file paths in output
* `--include-unstaged`: Include unstaged files from `lib`, `spec`, and `test` directories (for `wrap` command only)

[source,shell]
----
$ poepod concat FILES [OUTPUT_FILE] --exclude PATTERNS --config PATH --include-binary --include-dot-files --output-file PATH --base-dir PATH
$ poepod wrap GEMSPEC_PATH --exclude PATTERNS --config PATH --include-binary --include-dot-files --output-file PATH --base-dir PATH --include-unstaged
----
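
In `lib/poepod/cli.rb`, these options are declared once in a class-level `shared_options` helper that is called before each command definition. The sketch below illustrates why that pattern works, using a small stand-in DSL rather than Thor itself. `MiniCli`, `DemoCli`, `pending_options`, and `command_options` are hypothetical names invented for this illustration, and the stand-in only approximates how an option DSL might attach pending options to the next defined method.

```ruby
# Minimal stand-in for a Thor-style option DSL (hypothetical; this is not Thor).
class MiniCli
  class << self
    # Options declared since the last command definition.
    def pending_options
      @pending_options ||= {}
    end

    # Finalized options, keyed by command name.
    def command_options
      @command_options ||= {}
    end

    # Record an option for the next command to be defined.
    def option(name, **settings)
      pending_options[name] = settings
    end

    # Attach the pending options to the instance method that was just defined.
    def method_added(name)
      super
      return if pending_options.empty?

      command_options[name] = pending_options.dup
      pending_options.clear
    end
  end
end

class DemoCli < MiniCli
  # Declared once, reused by every command.
  def self.shared_options
    option :exclude, type: :array
    option :include_dot_files, type: :boolean, default: false
    option :base_dir, type: :string
  end

  shared_options
  def concat(*files); end

  shared_options
  option :include_unstaged, type: :boolean, default: false
  def wrap(gemspec_path); end
end

p DemoCli.command_options[:concat].keys
p DemoCli.command_options[:wrap].keys
```

Both commands receive the shared options, and only `wrap` picks up the extra `include_unstaged` option declared after the helper call.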

=== Concatenating files

The `concat` command allows you to combine multiple files into a single text
file. This is particularly useful when you want to review or analyze code from
file.

This is particularly useful when you want to review or analyze code from
multiple files in one place, or when preparing code submissions for AI-powered
coding assistants.

By default, it excludes binary files, dot files, and certain patterns like
`node_modules/` and `.git/`.

[source,shell]
----
$ poepod concat path/to/files/* output.txt
----

This will concatenate all files from the specified path into `output.txt`.
This will concatenate all non-binary, non-dot files from the specified path into
`output.txt`.

==== Excluding patterns
==== Including dot files

You can exclude certain patterns using the `--exclude` option:
By default, dot files (hidden files starting with a dot) are excluded.

To include them, use the `--include-dot-files` option:

[source,shell]
----
$ poepod concat path/to/files/* output.txt --exclude node_modules .git build test
$ poepod concat path/to/files/* output.txt --include-dot-files
----
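
As background for this option, Ruby's `Dir.glob` skips dot files for a `*` pattern unless `File::FNM_DOTMATCH` is passed. The updated `collect_files_to_process` in this PR always passes that flag and filters afterwards; the sketch below shows only the underlying glob behaviour. `list_files` is a hypothetical helper, not part of poepod.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: list regular files in +dir+, optionally including dot files.
def list_files(dir, include_dot_files: false)
  flags = include_dot_files ? File::FNM_DOTMATCH : 0
  Dir.glob(File.join(dir, "*"), flags)
     .select { |path| File.file?(path) } # keeps regular files only
     .map { |path| File.basename(path) }
     .sort
end

Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "visible.rb"))
  FileUtils.touch(File.join(dir, ".hidden"))

  p list_files(dir)                          # dot file skipped
  p list_files(dir, include_dot_files: true) # dot file included
end
```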

This is helpful when you want to focus on specific parts of your codebase,
excluding irrelevant or large directories.

==== Including binary files

By default, binary files are excluded to keep the output focused on readable
code. However, you can include binary files (encoded in MIME format) using the
`--include-binary` option:
code.

To include binary files (encoded in MIME format), use the `--include-binary`
option:

[source,shell]
----
@@ -82,6 +107,18 @@ $ poepod concat path/to/files/* output.txt --include-binary
This can be useful when you need to include binary assets or compiled files in
your analysis.
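
To make "encoded in MIME format" concrete, here is a minimal sketch modelled on the `encode_binary_file` helper that this PR removes from `lib/poepod/file_processor.rb`. It assumes the output keeps the same shape (two headers, a blank line, then a base64 body). `encode_binary` is a hypothetical name, and the MIME type is passed in explicitly so the sketch needs no gems.

```ruby
# Hypothetical helper: wrap raw bytes in MIME-style headers with a base64 body.
def encode_binary(bytes, mime_type)
  encoded = [bytes].pack("m0") # strict base64 without line breaks
  "Content-Type: #{mime_type}\nContent-Transfer-Encoding: base64\n\n#{encoded}"
end

puts encode_binary("\x00\x01\x02".b, "application/octet-stream")
```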

==== Excluding patterns

You can exclude certain patterns using the `--exclude` option:

[source,shell]
----
$ poepod concat path/to/files/* output.txt --exclude node_modules .git build test
----

This is helpful when you want to focus on specific parts of your codebase,
excluding irrelevant or large directories.
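
The default patterns listed under Global options are regular expressions, so an unescaped `.` matches any character. The sketch below shows what that means in practice. It assumes string patterns are compiled with `Regexp.new` and tested against the path as given; how poepod actually applies them is not shown in this diff. `excluded?` is a hypothetical helper.

```ruby
# Hypothetical helper: true when any pattern matches the path.
# Assumes string patterns are compiled with Regexp.new.
def excluded?(path, patterns)
  patterns.any? { |pattern| Regexp.new(pattern).match?(path) }
end

loose  = [".git/"]   # "." matches any single character
strict = ["\\.git/"] # escaped dot matches only a literal "."

p excluded?(".git/config", loose)      # => true
p excluded?("legit/parser.rb", loose)  # => true ("egit/" matches ".git/")
p excluded?("legit/parser.rb", strict) # => false
```

Escaping the dot is worth considering when a custom pattern should match only a literal dot.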

=== Wrapping a gem

The `wrap` command creates a comprehensive snapshot of your gem, including all
107 changes: 80 additions & 27 deletions lib/poepod/cli.rb
@@ -6,44 +6,48 @@
require_relative "gem_processor"

module Poepod
# Command-line interface for Poepod
class Cli < Thor
desc "concat FILES [OUTPUT_FILE]", "Concatenate specified files into one text file"
option :exclude, type: :array, default: Poepod::FileProcessor::EXCLUDE_DEFAULT, desc: "List of patterns to exclude"
option :config, type: :string, desc: "Path to configuration file"
option :include_binary, type: :boolean, default: false, desc: "Include binary files (encoded in MIME format)"

def concat(*files, output_file: nil)
if files.empty?
puts "Error: No files specified."
exit(1)
end

output_file ||= default_output_file(files.first)
output_path = Pathname.new(output_file).expand_path
# Define shared options
def self.shared_options
option :exclude, type: :array, default: Poepod::FileProcessor::EXCLUDE_DEFAULT,
desc: "List of patterns to exclude"
option :config, type: :string, desc: "Path to configuration file"
option :include_binary, type: :boolean, default: false, desc: "Include binary files (encoded in MIME format)"
option :include_dot_files, type: :boolean, default: false, desc: "Include dot files"
option :output_file, type: :string, desc: "Output path"
option :base_dir, type: :string, desc: "Base directory for relative file paths in output"
end

processor = Poepod::FileProcessor.new(files, output_path, options[:config], options[:include_binary])
total_files, copied_files = processor.process
desc "concat FILES [OUTPUT_FILE]", "Concatenate specified files into one text file"
shared_options

puts "-> #{total_files} files detected."
puts "=> #{copied_files} files have been concatenated into #{output_path.relative_path_from(Dir.pwd)}."
def concat(*files)
check_files(files)
output_file = determine_output_file(files)
base_dir = options[:base_dir] || Dir.pwd
process_files(files, output_file, base_dir)
end

desc "wrap GEMSPEC_PATH", "Wrap a gem based on its gemspec file"
shared_options
option :include_unstaged, type: :boolean, default: false,
desc: "Include unstaged files from lib, spec, and test directories"

def wrap(gemspec_path)
processor = Poepod::GemProcessor.new(gemspec_path, nil, options[:include_unstaged])
base_dir = options[:base_dir] || File.dirname(gemspec_path)
processor = Poepod::GemProcessor.new(
gemspec_path,
include_unstaged: options[:include_unstaged],
exclude: options[:exclude],
include_binary: options[:include_binary],
include_dot_files: options[:include_dot_files],
base_dir: base_dir,
config_file: options[:config]
)
success, result, unstaged_files = processor.process

if success
puts "=> The gem has been wrapped into '#{result}'."
if unstaged_files.any?
puts "\nWarning: The following files are not staged in git:"
puts unstaged_files
puts "\nThese files are #{options[:include_unstaged] ? "included" : "not included"} in the wrap."
puts "Use --include-unstaged option to include these files." unless options[:include_unstaged]
end
handle_wrap_result(success, result, unstaged_files)
else
puts result
exit(1)
@@ -56,13 +60,62 @@ def self.exit_on_failure?

private

def check_files(files)
return unless files.empty?

puts "Error: No files specified."
exit(1)
end

def determine_output_file(files)
options[:output_file] || default_output_file(files.first)
end

def process_files(files, output_file, base_dir)
output_path = Pathname.new(output_file).expand_path
processor = Poepod::FileProcessor.new(
files,
output_path,
config_file: options[:config],
include_binary: options[:include_binary],
include_dot_files: options[:include_dot_files],
exclude: options[:exclude],
base_dir: base_dir
)
total_files, copied_files = processor.process
print_result(total_files, copied_files, output_path)
end

def print_result(total_files, copied_files, output_path)
puts "-> #{total_files} files detected."
puts "=> #{copied_files} files have been concatenated into #{output_path.relative_path_from(Dir.pwd)}."
end

def handle_wrap_result(success, result, unstaged_files)
if success
puts "=> The gem has been wrapped into '#{result}'."
print_unstaged_files_warning(unstaged_files) if unstaged_files.any?
else
puts result
exit(1)
end
end

def print_unstaged_files_warning(unstaged_files)
puts "\nWarning: The following files are not staged in git:"
puts unstaged_files
puts "\nThese files are #{options[:include_unstaged] ? "included" : "not included"} in the wrap."
puts "Use --include-unstaged option to include these files." unless options[:include_unstaged]
end

def default_output_file(first_pattern)
first_item = Dir.glob(first_pattern).first
if first_item
if File.directory?(first_item)
"#{File.basename(first_item)}.txt"
else
"#{File.basename(first_item, ".*")}_concat.txt"
"#{File.basename(first_item,
".*")}_concat.txt"
end
else
"concatenated_output.txt"
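
The naming rule in `default_output_file` can be exercised in isolation. The sketch below restates the logic from the hunk above as a standalone method (using a guard clause instead of the nested `if`) and runs it against a temporary directory. The directory and file names are made up for illustration.

```ruby
require "tmpdir"
require "fileutils"

# Standalone restatement of the naming rule shown in the diff.
def default_output_file(first_pattern)
  first_item = Dir.glob(first_pattern).first
  return "concatenated_output.txt" unless first_item

  if File.directory?(first_item)
    "#{File.basename(first_item)}.txt"
  else
    "#{File.basename(first_item, '.*')}_concat.txt"
  end
end

Dir.mktmpdir do |dir|
  FileUtils.mkdir(File.join(dir, "lib"))
  FileUtils.touch(File.join(dir, "notes.md"))

  p default_output_file(File.join(dir, "lib"))      # => "lib.txt"
  p default_output_file(File.join(dir, "notes.md")) # => "notes_concat.txt"
  p default_output_file(File.join(dir, "missing*")) # => "concatenated_output.txt"
end
```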
84 changes: 25 additions & 59 deletions lib/poepod/file_processor.rb
@@ -1,79 +1,45 @@
# frozen_string_literal: true

require_relative "processor"
require "yaml"
require "tqdm"
require "pathname"
require "open3"
require "base64"
require "mime/types"

module Poepod
# Processes files for concatenation, handling binary and dot files
class FileProcessor < Processor
EXCLUDE_DEFAULT = [
%r{node_modules/}, %r{.git/}, /.gitignore$/, /.DS_Store$/
%r{node_modules/}, %r{.git/}, /.gitignore$/, /.DS_Store$/, /^\..+/
].freeze

def initialize(files, output_file, config_file = nil, include_binary = false)
super(config_file)
def initialize(
files,
output_file,
config_file: nil,
include_binary: false,
include_dot_files: false,
exclude: [],
base_dir: nil
)
super(
config_file,
include_binary: include_binary,
include_dot_files: include_dot_files,
exclude: exclude,
base_dir: base_dir,
)
@files = files
@output_file = output_file
@failed_files = []
@include_binary = include_binary
end

def process
total_files = 0
copied_files = 0
private

File.open(@output_file, "w", encoding: "utf-8") do |output|
@files.each do |file|
Dir.glob(file).each do |matched_file|
next unless File.file?(matched_file)
def collect_files_to_process
@files.flatten.each_with_object([]) do |file, files_to_process|
Dir.glob(file, File::FNM_DOTMATCH).each do |matched_file|
next unless File.file?(matched_file)
next if should_exclude?(matched_file)

total_files += 1
file_path, content, error = process_file(matched_file)
if content
output.puts "--- START FILE: #{file_path} ---"
output.puts content
output.puts "--- END FILE: #{file_path} ---"
copied_files += 1
elsif error
output.puts "#{file_path}\n#{error}"
end
end
files_to_process << matched_file
end
end

[total_files, copied_files]
end

private

def process_file(file_path)
if text_file?(file_path)
content = File.read(file_path, encoding: "utf-8")
[file_path, content, nil]
elsif @include_binary
content = encode_binary_file(file_path)
[file_path, content, nil]
else
[file_path, nil, "Skipped binary file"]
end
rescue Encoding::InvalidByteSequenceError, Encoding::UndefinedConversionError
@failed_files << file_path
[file_path, nil, "Failed to decode the file, as it is not saved with UTF-8 encoding."]
end

def text_file?(file_path)
stdout, status = Open3.capture2("file", "-b", "--mime-type", file_path)
status.success? && stdout.strip.start_with?("text/")
end

def encode_binary_file(file_path)
mime_type = MIME::Types.type_for(file_path).first.content_type
encoded_content = Base64.strict_encode64(File.binread(file_path))
"Content-Type: #{mime_type}\nContent-Transfer-Encoding: base64\n\n#{encoded_content}"
end
end
end
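
The constructor change above replaces positional flags with keyword arguments that are forwarded to the parent class. A minimal sketch of that pattern follows. `BaseProcessor` is a hypothetical stand-in for `Poepod::Processor`, whose real implementation is not part of this diff, and `DemoFileProcessor` is likewise invented for illustration.

```ruby
# Hypothetical stand-in for Poepod::Processor (not shown in this diff).
class BaseProcessor
  attr_reader :config_file, :include_binary, :include_dot_files, :exclude, :base_dir

  def initialize(config_file = nil, include_binary: false, include_dot_files: false,
                 exclude: [], base_dir: nil)
    @config_file = config_file
    @include_binary = include_binary
    @include_dot_files = include_dot_files
    @exclude = exclude
    @base_dir = base_dir
  end
end

# Subclass that accepts keywords and forwards them explicitly, as in the diff.
class DemoFileProcessor < BaseProcessor
  attr_reader :files, :output_file

  def initialize(files, output_file, config_file: nil, include_binary: false,
                 include_dot_files: false, exclude: [], base_dir: nil)
    super(config_file, include_binary: include_binary, include_dot_files: include_dot_files,
                       exclude: exclude, base_dir: base_dir)
    @files = files
    @output_file = output_file
  end
end

processor = DemoFileProcessor.new(["a.rb"], "out.txt", include_dot_files: true)
p processor.include_dot_files
p processor.include_binary
```

With keywords, call sites name each flag, so adding `include_dot_files` does not shift the meaning of existing arguments.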