Saturday, 1 August 2015

PowerShell: Extracting text from PDF and updating a file with the stored variables

I have been trying to get this to work with iText for some time, without success. However, I did manage to get ptcmd.exe (http://ift.tt/1SRTa8k) working, which lets me simply run the following:

Move-Item C:\PDF\DROP\*.pdf C:\PDF\TEMP\ -Force

cmd.exe /C "for %f in (*.pdf) do ptcmd.exe "%f""

It might not be the best way to do it, but it does convert each PDF to a text document.
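For comparison, the same per-file conversion can be driven from PowerShell directly rather than through a cmd.exe for loop. The following is only a sketch: the function name Convert-PdfFolder and its parameters are hypothetical, and it assumes ptcmd.exe is on the PATH and accepts the PDF path as its only argument, as in the one-liner above. Where ptcmd.exe writes its output is not stated here, so that part is left untouched.

```powershell
# Hypothetical helper: runs a converter once for every PDF in a folder.
# Assumption: ptcmd.exe is on PATH and takes the PDF path as its sole argument.
function Convert-PdfFolder {
    param(
        [string]$Folder    = 'C:\PDF\TEMP',
        [string]$Converter = 'ptcmd.exe'
    )

    Get-ChildItem -LiteralPath $Folder -Filter *.pdf | ForEach-Object {
        # The call operator (&) invokes the converter with the full path of the current PDF.
        & $Converter $_.FullName
    }
}
```

Calling Convert-PdfFolder with no arguments would process every PDF in C:\PDF\TEMP. Passing full paths avoids depending on the current directory, which the cmd.exe version relies on.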

The next step is to pull specific parts of the text document into an XML document, and then copy both the original PDF and the XML into a new folder, with both files sharing a new GUID as their file name.

I can do all of this, but my issue is with doing it in bulk.

ptcmd.exe runs in bulk, but I can't get my PowerShell script to do its part for each item in the folder...

What I need is to be able to do the above for each item in folder.

Here's what I have so far:

Get-ChildItem "C:\PDF\TEMP" -Filter *.txt # here I thought I would list all the content

$document = Get-ChildItem "C:\PDF\TEMP" -Filter *.txt #I had hoped this would hold the file currently being processed

Foreach-Object{
$guid = [guid]::NewGuid() #store a GUID in the variable so they all share the same file name
$folderpath = "C:\PDF\TEMP" #temp is my working directory whilst the script is running
$content = Get-Content .\"$document" 
$ID = ($content -split '`n')[8].Substring(44)
$DOB = ($content -split '`n')[8].Substring(8,10)
$USERNAME = ($content -split '`n')[22].Substring(25)
Rename-Item $file $guid #here I want to rename the file 
$createXML = Copy-Item C:\PDF\XML\template.xml $folderpath ; Rename-Item C:\PDF\TEMP\template.xml "$guid.xml" #here I have a template of an XML file which I will replace the contents of with the variables held from the text document
$createXML #the only way I could think to initiate the command

#the following is where I want to update the XML with the ID, DOB and Username
$con = Get-Content $folderpath\$guid.xml
$con | % { $_.Replace("**IDPlaceHolder**", "$ID") } | Set-Content $folderpath\$guid.xml
$con | % { $_.Replace("**DOBPlaceHolder**", "$DOB") } | Set-Content $folderpath\$guid.xml
$con | % { $_.Replace("**USRNAMEPlaceHolder**", "$USERNAME") } | Set-Content $folderpath\$guid.xml
}

Would any of you guys know how to do this so I can loop round each file in the folder and rename both the PDF and the XML to the same file name ($GUID)?

Any help would be appreciated.

Cheers, S.
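
For reference, here is a minimal sketch of one way the per-file loop could be written. It is not a definitive fix. A few things in the posted script stand out: Foreach-Object receives no pipeline input, so it never iterates over the files; $document holds the whole file list rather than one file; $file is never assigned; and each Set-Content call pipes the original, unmodified $con, so only the last replacement survives in the XML. The sketch pipes Get-ChildItem into ForEach-Object so that $_ is the current text file, and applies all three replacements before writing once. The function name Convert-TextToXml and its parameters are hypothetical. The line indexes, offsets and placeholder strings are taken from the script above. It also assumes that each PDF sits next to its text file and shares its base name, which depends on how ptcmd.exe names its output.

```powershell
# Sketch only. Convert-TextToXml is a hypothetical name; paths, line indexes,
# offsets and placeholders are copied from the posted script.
function Convert-TextToXml {
    param(
        [string]$WorkDir  = 'C:\PDF\TEMP',
        [string]$Template = 'C:\PDF\XML\template.xml'
    )

    # Read the template once; -Raw returns the whole file as a single string.
    $templateText = Get-Content -LiteralPath $Template -Raw

    Get-ChildItem -LiteralPath $WorkDir -Filter *.txt | ForEach-Object {
        # $_ is the text file currently being processed.
        $guid = [guid]::NewGuid().ToString()

        # Get-Content already returns one string per line, so no -split is needed.
        $lines = @(Get-Content -LiteralPath $_.FullName)

        # Assumption: every text file has the same layout as in the question.
        $id       = $lines[8].Substring(44)
        $dob      = $lines[8].Substring(8, 10)
        $username = $lines[22].Substring(25)

        # Apply all replacements to the same string, then write the file once.
        $xml = $templateText.Replace('**IDPlaceHolder**', $id)
        $xml = $xml.Replace('**DOBPlaceHolder**', $dob)
        $xml = $xml.Replace('**USRNAMEPlaceHolder**', $username)
        Set-Content -LiteralPath (Join-Path $WorkDir "$guid.xml") -Value $xml

        # Assumption: the PDF has the same base name as the text file.
        $pdf = Join-Path $WorkDir ($_.BaseName + '.pdf')
        if (Test-Path -LiteralPath $pdf) {
            Rename-Item -LiteralPath $pdf -NewName "$guid.pdf"
        }
    }
}
```

Running Convert-TextToXml with no arguments would use the folders from the question. The sketch renames the PDF in place; moving the finished pair to another folder could be done afterwards with Move-Item. Substring throws if a line is shorter than expected, so files with a different layout would need a length check first.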
