Breachparse -- Now in Python #3

Open
wants to merge 6 commits into
base: master


Invoke-Mimikatz

Rewrote the Bash script in Python 3. Bash is slow for large-scale data processing; in my measurements the Python script runs 2 to 3 times faster than the Bash script.

Feel free to make edits if you want.


@alessandromatera

Hi, the Python code really is much faster, but multi-domain search doesn't work.
I found the issue: it's in the if(sys.argv[1] in line): part.

I attached the corrected code, which should fix this problem.
breach-parse.txt
(https://github.com/hmaverickadams/breach-parse/files/4489637/breach-parse.txt)

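import os
import sys

# Search terms are passed as one pipe-separated argument, e.g. "@a.com|@b.com"
text = sys.argv[1].split('|')

# breachDataLocation, totalFiles, fileCount, progressBar and output are assumed to be defined earlier in the script
for osdir, subdirs, files in os.walk(breachDataLocation):
	for f in files:
		with open(os.path.join(osdir, f), "r", encoding='latin-1') as fd:
			fileCount += 1
			progressBar(fileCount, totalFiles)
			for line in fd:
				# any() writes a line once even if it matches several terms
				if any(i in line for i in text):
					output.write(line)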

@Invoke-Mimikatz
Author

@alessandromatera You're right, I forgot to add the function for searching multiple domains at once. I pushed a change to fix that. Let me know if you find any more issues.

@alessandromatera

alessandromatera commented Apr 16, 2020

@Invoke-Mimikatz Great. I actually found something here:

  File "breach-parse.py", line 76, in <module>
    passwordsfd.write(line.split(":")[1])
IndexError: list index out of range

maybe because there is no password after the colon?
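
A quick check of how `str.split` behaves suggests the error comes from lines that contain no colon at all, not from an empty password. The sketch below uses made-up sample lines and a hypothetical helper name, `password_field`; it is not code from the PR.

```python
def password_field(line):
    """Return the second ':'-separated field, or None when the line has no colon."""
    parts = line.rstrip("\n").split(":")
    if len(parts) < 2:
        # split() returned a single element, so parts[1] would raise IndexError.
        return None
    return parts[1]


# Colon present but nothing after it: split() still yields two fields.
empty_password = password_field("alice@example.com:\n")

# No colon at all (semicolon-delimited line): only one field exists.
no_colon = password_field("bob@example.org;letmein\n")
```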

@Invoke-Mimikatz
Author

Invoke-Mimikatz commented Apr 16, 2020

@alessandromatera Looks like my breachparse data has a decent chunk of lines that separate emails and passwords with characters other than colons. Semicolons and pipes seem to be the second and third most popular delimiters. I now wrap the split in a try/except for these cases: the whole line still goes into the master file, but nothing is written to the username and password files.
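
As an alternative to skipping such lines, one could split on whichever delimiter appears first. This is only a sketch and not what the PR does: the delimiter set (colon, semicolon, pipe) is taken from the comment above, and the name `split_credentials` is hypothetical.

```python
import re

# Assumed delimiter set, based on the three delimiters mentioned in the thread.
_DELIMITERS = re.compile(r"[:;|]")


def split_credentials(line):
    """Split a line into (email, password) at the first ':', ';' or '|'.

    Returns None when the line contains none of the delimiters.
    """
    parts = _DELIMITERS.split(line.rstrip("\r\n"), maxsplit=1)
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


semicolon_line = split_credentials("alice@example.com;hunter2\n")
pipe_line = split_credentials("bob@example.org|let:me:in\n")
unparsable = split_credentials("no delimiter here\n")
```

With `maxsplit=1`, only the first delimiter is consumed, so any later delimiter characters stay inside the password.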

@alessandromatera

alessandromatera commented Apr 17, 2020

@Invoke-Mimikatz Hello, me again.

I found another issue: it seems that the passwords file is not being filled.
I fixed it with this workaround (note the *):

passwordsfd.write(*line.split(":")[1:])

instead of

passwordsfd.write(line.split(":")[1:])
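
A likely explanation, sketched here with an in-memory buffer standing in for the passwords file: the slice `[1:]` produces a list, and `write()` only accepts a string, so the call raises `TypeError`. If that exception is swallowed by a surrounding try/except (an assumption about the script), nothing gets written. Unpacking with `*` helps only when the list holds exactly one item.

```python
import io

line = "alice@example.com:hunter2\n"
fields = line.split(":")[1:]  # a list, not a string: ['hunter2\n']

buf = io.StringIO()  # stand-in for the passwords file handle
try:
    buf.write(fields)  # write() requires a str, so passing a list raises TypeError
except TypeError:
    wrote_list = False
else:
    wrote_list = True

buf.write(*fields)  # unpacks to buf.write('hunter2\n'); fine for a single field

# A password containing a colon yields two fields, so unpacking passes
# two arguments to write() and fails as well.
extra = "bob@example.org:Password:123\n".split(":")[1:]
try:
    io.StringIO().write(*extra)
except TypeError:
    unpack_ok = False
else:
    unpack_ok = True
```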

@Invoke-Mimikatz
Author

Invoke-Mimikatz commented Apr 20, 2020

Fixed passwords not being put into the passwords file. This should also correctly handle the edge case of passwords containing a colon (e.g. Password:123).
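
One way to handle the colon-in-password case is to split only at the first colon. The sketch below uses `str.partition` for that; it is an illustration under that assumption, not necessarily how the pushed fix is written, and `split_email_password` is a hypothetical name.

```python
def split_email_password(line):
    """Split at the first colon only, so colons inside the password survive."""
    email, sep, password = line.rstrip("\r\n").partition(":")
    if not sep:
        # No colon found: partition() returns the whole line and two empty strings.
        return None
    return email, password


with_colon = split_email_password("user@example.com:Password:123\n")
plain = split_email_password("user@example.com:hunter2\n")
missing = split_email_password("user@example.com\n")
```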

@hogan777

hogan777 commented Feb 24, 2021

Can you please add a README file for the Python version? Thanks.


3 participants