:: This batch file converts HTML files in a folder to docx.
:: It requires Pandoc, and a list of files to convert
:: named file-list, in which each file is on a separate line,
:: and contains no spaces in the filename.
::
:: Don't show these commands to the user
@ECHO off
:: Set the title of the window
TITLE Convert html to docx
:: Enable delayed expansion so that !File! below is
:: evaluated on each pass through the loop.
Setlocal enabledelayedexpansion
:: What're we doing?
ECHO Converting to .docx...
:: Loop through the list of files in file-list
:: and convert them each from .html to .docx.
:: We end up with the same filenames,
:: with .docx extensions appended.
FOR /F "tokens=*" %%F IN (file-list) DO (
    pandoc %%F -f html -t docx -s -o %%F.docx
)
:: What are we doing next?
ECHO Fixing file extensions...
:: What are we finding and replacing?
SET find=.html
SET replace=
:: Loop through all .docx files and remove the .html
:: from those filenames pandoc created.
FOR %%# in (.\*.docx) DO (
    Set "File=%%~nx#"
    Ren "%%#" "!File:%find%=%replace%!"
)
:: Whassup?
ECHO Done.
:: Let the user exit deliberately
:exit
SET exit=
SET /p exit=Hit return to exit...
:: Leave only when the user pressed return without typing anything
IF "%exit%"=="" GOTO:eof
GOTO exit
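The two-pass approach above (convert to name.html.docx, then rename) could probably be collapsed into one loop. The following is only a sketch of that idea, not the gist author's method: it assumes the same file-list file and relies on the `%%~nF` modifier, which expands to the filename without its extension. The quoting and `usebackq` are additions intended to tolerate spaces in filenames.

```batch
@ECHO off
:: Sketch: convert each file named in file-list and write name.docx directly.
:: usebackq makes FOR /F treat the quoted "file-list" as a filename to read.
:: %%~nF is the filename without its extension, so no rename pass is needed.
FOR /F "usebackq tokens=*" %%F IN ("file-list") DO (
    pandoc "%%F" -f html -t docx -s -o "%%~nF.docx"
)
ECHO Done.
```

Note that `%%~nF` also drops any folder path, so output lands in the current folder even if file-list entries include paths.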
@Pooja5757 The code comments at the top of the file mention this -- file-list is a text file containing only the names of the files you want to convert. The loop will perform the conversion on each of them.
Thank you :) I had actually placed a text file containing just the names of the files to be converted, but when I ran the above file, it says "The system cannot find the file file-list." Do you have any idea about this? Also, do I need any token?
@Pooja5757 Hmm, not sure, sorry. Is it possible that your file-list has a file extension, like file-list.txt, or that it's not in the same folder as the files you're converting? The script assumes no file extension.
Hey @arthurattwell, yes, it has the file extension .txt. It is in the same folder as the files to be converted. Not sure why the system is unable to find the file.
@Pooja5757 Ah, the file must not have any file extension. Alternatively, you can change file-list to file-list.txt in the script.
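For anyone hitting the same "cannot find the file file-list" message, one way to avoid extension mix-ups is to let the command prompt create the list itself. This is a minimal sketch, assuming it is run in the folder that holds the .html files; it is not part of the original script.

```batch
@ECHO off
:: Write the bare names of all .html files in this folder, one per line,
:: to a file called file-list (no extension), as the script expects.
DIR /B *.html > file-list
:: Show the result so it can be checked before converting.
TYPE file-list
```

Because the redirect names the file exactly `file-list`, no .txt extension is added, which is what editors such as Notepad tend to do when saving.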
@arthurattwell I did try with file-list.txt: "pandoc: first: openBinaryFile: does not exist (No such file or directory)" :(
Hi arthurattwell, thank you for the code. Now I'm able to convert html to doc. I'm not sure what the error was, but I reinstalled pandoc and tried again, and it worked.
But I faced one more issue: my html file was ANSI and not UTF-8 encoded, so when I changed it, it worked. But I have many html files which are ANSI and not UTF-8 encoded. Any idea how to export html to Word from ANSI-encoded files?
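One possible approach to the encoding question, offered as an untested sketch and not something from this thread: Pandoc expects UTF-8 input, so the ANSI files could be re-encoded in bulk before conversion. The example below assumes Windows PowerShell is available as `powershell`, that "ANSI" means the system's default code page, and that filenames contain no single quotes. The `utf8` folder name is made up for illustration.

```batch
@ECHO off
:: Sketch: re-encode each .html file from the system ANSI code page to UTF-8,
:: writing copies into a utf8 subfolder (a name chosen for this example).
IF NOT EXIST utf8 MKDIR utf8
FOR %%F IN (*.html) DO (
    powershell -NoProfile -Command "Get-Content -LiteralPath '%%F' -Encoding Default | Set-Content -LiteralPath 'utf8\%%F' -Encoding UTF8"
)
ECHO Re-encoded copies are in the utf8 folder.
```

The originals are left untouched; the conversion script would then be run inside the `utf8` folder with its own file-list.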
Hi, please let me know what this "file-list" is, and what this for loop part does: FOR /F "tokens=*" %%F IN (file-list) DO (
pandoc %%F -f html -t docx -s -o %%F.docx
)