didnt convert dokuwiki internal URLs #5

Open
thankstipscom opened this issue Oct 12, 2022 · 1 comment

thankstipscom commented Oct 12, 2022

Thanks for the effort on this script.

I don't know whether I did something wrong, but I just followed the basic instructions. All my .txt files were converted to .md files, but the links to internal pages weren't converted correctly.

Example:

[[New Purchase List]]

converted to:

[New Purchase List](New Purchase List)

but it should have been

[New Purchase List](/new_purchase_list)

Once I had the output in my filestorage folder I wrote this script to clean up the above issue and then the import worked mostly well.
Figured I'd post this in case someone else has the same issue.

#!/usr/bin/env python3

### Imports
import os, sys, re, fileinput


### FUNCTIONS
# Markdown link: group 1 is "[text](", group 2 the target, group 3 the closing ")"
LINK_RE = re.compile(r'(\[[^\]]*\]\()([^)]+)(\))')
# Targets with a URL scheme (https:, mailto:, ...) are external and left untouched
SCHEME_RE = re.compile(r'[a-z][a-z0-9+.-]*:', re.IGNORECASE)

def getFiles(folder_root):
    """Return the full paths of all .md files below folder_root."""
    file_list = []
    for root, dirs, files in os.walk(folder_root):
        for file in files:
            # check only specific files
            if file.endswith(".md"):
                file_list.append(os.path.join(root, file))
    return file_list

def fixTarget(match):
    """Turn a link target like 'New Purchase List' into '/new_purchase_list'."""
    target = match.group(2).strip()
    # skip external URLs and in-page anchors
    if SCHEME_RE.match(target) or target.startswith('#'):
        return match.group(0)
    target = target.replace(' ', '_')
    if not target.startswith('/'):
        target = '/' + target.lower()
    return match.group(1) + target + match.group(3)

def parseFile(file_full):
    # inplace=True redirects stdout into the file being read
    for line in fileinput.input(file_full, inplace=True):
        # write() instead of print() so no extra line breaks are added
        sys.stdout.write(LINK_RE.sub(fixTarget, line))


### RUNTIME
# './' should be the root folder where the md files are
for f in getFiles('./'):
    print(f)
    parseFile(f)
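
Because the script rewrites every file in place, it may be worth keeping a copy of each original while testing. The following is only a sketch of one way to do that, using the `backup` parameter of `fileinput.input`. The helper name `rewrite_with_backup`, the `.bak` extension, and the `str.lower` placeholder transform are illustrative assumptions, not part of the script above.

```python
import fileinput
import os
import sys
import tempfile

def rewrite_with_backup(path, transform, backup_ext='.bak'):
    """Apply transform() to each line of path in place.

    The untouched original is kept next to it as path + backup_ext.
    """
    with fileinput.input(path, inplace=True, backup=backup_ext) as lines:
        for line in lines:
            # While iterating, stdout is redirected into the file itself
            sys.stdout.write(transform(line))

# Usage with a throwaway file; str.lower is a placeholder transform
with tempfile.TemporaryDirectory() as tmp:
    page = os.path.join(tmp, 'page.md')
    with open(page, 'w') as fh:
        fh.write('[New Purchase List](New Purchase List)\n')
    rewrite_with_backup(page, str.lower)
    with open(page) as fh:
        print(fh.read(), end='')
    # The original content is still available in page.md.bak
    print(os.path.exists(page + '.bak'))
```

In a real run the transform would be whatever link fix-up is applied per line, and the `.bak` files can be deleted once the import looks right.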

thoni56 (Owner) commented Oct 14, 2022

Thanks for this contribution!

I had some problems with the regexes for the links, and since we had the MarkDowku plugin, some of our links were already Markdown.

If someone would like to fix that, I'm happy to accept a PR. Actually, I think that a DokuWiki-only format version of this script might be a good idea. The heuristics for deciding whether to use the file as is (assuming it is fully Markdown) or run it through pandoc are very crude and often wrong, since a page might be only partially Markdown.
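
For pages that mix both syntaxes, one possible approach is to decide per link rather than per file: rewrite only DokuWiki-style `[[...]]` links and leave anything that is already a Markdown link alone. The sketch below is not the script's current behavior. The names `DOKU_LINK`, `doku_link_to_markdown`, and `convert_links` are hypothetical, and the page-name rule (lower-case, spaces to underscores, `:` namespaces to `/`) is an assumption based on the example in this issue. Interwiki links and other special forms are ignored.

```python
import re

# DokuWiki link: [[target]] or [[target|label]]
DOKU_LINK = re.compile(r'\[\[([^\]|]+)(?:\|([^\]]*))?\]\]')

def doku_link_to_markdown(match):
    """Build a Markdown link from one matched DokuWiki link."""
    target = match.group(1).strip()
    label = (match.group(2) or target).strip()
    if '://' in target:
        # External URL: keep the target untouched
        return f'[{label}]({target})'
    # Assumption: page IDs are lower-case, spaces become underscores,
    # and namespace separators ':' become path separators '/'
    page = target.lower().replace(' ', '_').replace(':', '/')
    return f'[{label}](/{page.lstrip("/")})'

def convert_links(text):
    """Rewrite DokuWiki links in text; existing Markdown links never match."""
    return DOKU_LINK.sub(doku_link_to_markdown, text)

print(convert_links('See [[New Purchase List]] and [done](/already_markdown).'))
```

Since the pattern requires the double brackets, a line that already contains Markdown links passes through unchanged, which sidesteps the whole-file heuristic for links at least.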
