Updated: January 13, 2021
A couple of days ago, I showed you how to redact information in Okular, the default PDF viewer in the Plasma desktop. The action is relatively simple, but it doesn't actually destroy the redacted information; it merely obscures it from view.
What I want to show you today is the second part of the puzzle: flattening PDF documents. Think of an image with multiple layers that you then save in a non-layered format. The information is flattened into a single layer. The values of all the vertically stacked pixels are computed (added, subtracted, whatever the blend requires) and presented as a single, definitive result. The same applies to PDF, except it's more complicated, given the PDF structure. Let's do it.
Tool of the trade - Ghostscript
You may have heard about Ghostscript (gs) before; I've definitely talked about it over the years in my various articles related to LaTeX and LyX. Now, we will use the gs engine to process an existing "multi-layered" PDF into a flattened one, whereby the information is going to be redacted properly. We will do this in Linux, because if there's one thing Linux does way better than other operating systems, it's specific, focused file-format processing tasks. As such, Ghostscript should be available in your distribution, and if it's not installed, you will find it in the repositories.
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/default -dNOPAUSE -dQUIET -dBATCH -sOutputFile=flattened.pdf input.pdf
This is the command that does the magic. It doesn't do any additional processing of the images you may have in your PDF files, so you shouldn't expect any great reduction in size. After all, that's not our purpose here. But effectively, that's it.
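If you want to confirm that the redacted text is really gone, here's a minimal sketch that runs the same gs command and then searches the output's extractable text for a string you meant to hide. It assumes pdftotext (from poppler-utils) is installed, and the function names flatten_and_check and contains_text are hypothetical helpers for illustration, not part of any tool. Note that pdftotext only sees real text, not text baked into images, so a clean result means no extractable text matched.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers): flatten with the gs command above,
# then check whether a string you meant to redact can still be extracted.
# Assumes gs and pdftotext (poppler-utils) are installed.

# Succeeds if the text on stdin contains the given fixed string.
contains_text() {
    grep -qF -- "$1"
}

# Usage: flatten_and_check input.pdf output.pdf "secret text"
flatten_and_check() {
    local in="$1" out="$2" secret="$3"
    gs -sDEVICE=pdfwrite -dPDFSETTINGS=/default -dNOPAUSE -dQUIET -dBATCH \
       -sOutputFile="$out" "$in" || return 1
    # pdftotext with "-" as the output file writes the text to stdout.
    if pdftotext "$out" - | contains_text "$secret"; then
        echo "WARNING: \"$secret\" can still be extracted from $out" >&2
        return 2
    fi
    echo "\"$secret\" not found in the extractable text of $out"
}
```

For example, flatten_and_check input.pdf flattened.pdf "John Smith" prints a warning and returns 2 if the name still shows up in the flattened file.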
Alternative methods - ImageMagick and pdf2ps
If the above doesn't satisfy you for whatever reason, there are two other approaches you could try. Please note, though, that I found them to be less effective than using gs, so keep that in mind when you try the suggestions below.
You can try the ImageMagick convert utility. But wait.
convert -density 300 original.pdf flattened.pdf
By default, due to a security vulnerability in the Ghostscript engine, ImageMagick is configured not to process various file formats, including PS, PDF, EPS, XPS, and others. So if you try without editing the ImageMagick policies, you will see the following error:
convert-im6.q16: attempt to perform an operation not allowed by the security policy `PDF' @ error/constitute.c/IsCoderAuthorized/408.
convert-im6.q16: no images defined `flattened.pdf' @ error/convert.c/ConvertImageCommand/3258.
To resolve this, you will need to edit the following file, replacing [NUMBER] with 6 or 7, depending on your ImageMagick version:
/etc/ImageMagick-[NUMBER]/policy.xml
In this file, there will be a list of security policies:
...
<policy domain="coder" rights="none" pattern="EPS" />
<policy domain="coder" rights="none" pattern="PDF" />
<policy domain="coder" rights="none" pattern="XPS" />
</policymap>
Change the PDF entry so that rights="none" becomes rights="read|write". If you're worried about security, you can make this change only temporarily, while flattening the file in question, and then revert to the more restrictive settings.
...
<policy domain="coder" rights="none" pattern="EPS" />
<policy domain="coder" rights="read|write" pattern="PDF" />
<policy domain="coder" rights="none" pattern="XPS" />
</policymap>
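If you'd rather make the change only for the duration of one conversion, as suggested above, the following sketch backs up policy.xml, flips the PDF coder rights, runs the convert command, and then restores the backup. The helper names (find_policy, set_pdf_rights, convert_with_pdf_allowed) are hypothetical; the sketch assumes GNU sed and needs root privileges, since the file lives under /etc.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers). Assumes GNU sed (for -i -E) and
# root privileges, e.g. run the functions from a "sudo bash" session.

# Print the first /etc/ImageMagick-*/policy.xml found (6 or 7, per version).
find_policy() {
    local f
    for f in /etc/ImageMagick-*/policy.xml; do
        [ -e "$f" ] && { printf '%s\n' "$f"; return 0; }
    done
    return 1
}

# Rewrite the rights="..." value of the PDF coder policy in the given file.
set_pdf_rights() {
    local policy="$1" rights="$2"
    sed -i -E 's#(<policy domain="coder" rights=")[^"]*(" pattern="PDF" />)#\1'"$rights"'\2#' "$policy"
}

# Temporarily allow PDF, convert, then restore the original policy file.
convert_with_pdf_allowed() {
    local in="$1" out="$2" policy status=0
    policy="$(find_policy)" || { echo "policy.xml not found" >&2; return 1; }
    cp -p "$policy" "$policy.bak" || return 1
    set_pdf_rights "$policy" "read|write"
    convert -density 300 "$in" "$out" || status=$?
    # Put the untouched original back, whatever convert did.
    mv "$policy.bak" "$policy"
    return "$status"
}
```

Restoring from the backup copy, rather than flipping the value back with sed, means the file ends up byte-for-byte as it was, even if the conversion fails halfway.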
Now, you can process the file. Typically, this will take longer than using gs directly. Moreover, if you are working with very large files, you may also run out of memory, and the conversion will fail. For instance:
convert-im6.q16: cache resources exhausted `flattened.pdf' @ error/cache.c/OpenPixelCache/4083 `flattened.pdf' @ error/pdf.c/WritePDFImage/2341.
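To get a feel for why memory runs out, here's a back-of-the-envelope sketch. convert rasterizes every page at the requested density, so a page's pixel count grows with the square of the density. The function below is a hypothetical helper; it assumes page sizes in PostScript points (1/72 inch) and roughly 8 bytes per pixel, which is what four 16-bit channels take in a Q16 build like the convert-im6.q16 in the error messages. Real usage will differ, given overhead and other build types.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope estimate (hypothetical helper) of ImageMagick's
# pixel cache for one rasterized page, using integer arithmetic.
#   width_pt, height_pt: page size in points (1/72 inch)
#   density:             the -density value, in dots per inch
#   bytes_per_px:        assumed 8 (4 channels x 16 bits in a Q16 build)
page_cache_bytes() {
    local width_pt="$1" height_pt="$2" density="$3" bytes_per_px="${4:-8}"
    local w=$(( width_pt * density / 72 ))
    local h=$(( height_pt * density / 72 ))
    echo $(( w * h * bytes_per_px ))
}

# An A4 page is 595 x 842 points; at -density 300 that is 2479 x 3508 pixels.
a4=$(page_cache_bytes 595 842 300)
echo "A4 @ 300 dpi: $(( a4 / 1024 / 1024 )) MiB per page"
```

This prints about 66 MiB per page. If ImageMagick has to hold all the pages of a 100-page document at once, that approaches 6.5 GiB, which is the kind of load that ends in the cache resources exhausted error.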
Once again, you will need to edit the XML configuration file, this time raising the limits in the resource policies. Even so, the document may simply be too large for the memory available on your system. I don't have a magic set of values to suggest; the following is just an example:
...
<policy domain="resource" name="memory" value="2048MiB"/>
<policy domain="resource" name="map" value="4096MiB"/>
<policy domain="resource" name="width" value="256KP"/>
<policy domain="resource" name="height" value="256KP"/>
<policy domain="resource" name="area" value="1024MB"/>
<policy domain="resource" name="disk" value="4GiB"/>
...
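As with the PDF coder policy, these limits can be changed with a sed one-liner instead of a manual edit. A minimal sketch, assuming GNU sed and a hypothetical helper name; the values in the commented examples are simply the ones from the snippet above, not recommendations. It only rewrites the value of an existing line with that name, and it does not uncomment policies that are commented out in your file.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): set the value of an existing
# <policy domain="resource" name="..."> line. Assumes GNU sed; run as root
# when pointing it at a file under /etc.
set_resource_limit() {
    local policy="$1" name="$2" value="$3"
    sed -i -E 's#(<policy domain="resource" name="'"$name"'" value=")[^"]*(")#\1'"$value"'\2#' "$policy"
}

# Examples using the values shown above (adjust the path for version 6 or 7):
# set_resource_limit /etc/ImageMagick-6/policy.xml memory 2048MiB
# set_resource_limit /etc/ImageMagick-6/policy.xml disk 4GiB
```

Afterwards, convert -list resource shows the limits ImageMagick actually picked up.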
The second alternative is the pdf2ps and ps2pdf pair. Basically, you convert the PDF to a PS file and then back to PDF, which effectively flattens the document. The magic command that does it:
pdf2ps original.pdf - | ps2pdf - flattened.pdf
This works well and fast, but the defaults produce low-resolution images due to aggressive compression. Take this into account, and experiment with the tools' options to make sure you retain the document fidelity you need.
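One set of options worth experimenting with is Ghostscript's -dPDFSETTINGS presets, which ps2pdf passes through to gs. Here's a sketch that wraps the round trip and lets you pick a preset. The function names are hypothetical, prepress is just a starting point rather than a documented recommendation, and whether any preset gives you the image quality you need depends on your document, so compare the outputs yourself.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers): the pdf2ps | ps2pdf round trip
# with a selectable Ghostscript -dPDFSETTINGS preset (default: prepress).

# Succeeds only for the standard Ghostscript PDFSETTINGS preset names.
is_valid_preset() {
    case "$1" in
        screen|ebook|printer|prepress|default) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: roundtrip_pdf original.pdf flattened.pdf [preset]
roundtrip_pdf() {
    local in="$1" out="$2" preset="${3:-prepress}"
    is_valid_preset "$preset" || { echo "unknown preset: $preset" >&2; return 1; }
    # ps2pdf hands options such as -dPDFSETTINGS on to Ghostscript.
    pdf2ps "$in" - | ps2pdf -dPDFSETTINGS="/$preset" - "$out"
}
```

For example, for p in ebook printer prepress; do roundtrip_pdf original.pdf "flattened-$p.pdf" "$p"; done produces three files you can compare side by side.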
Conclusion
And there you go. Now you know how to flatten PDF files. The technical intricacies of how all this works are beyond the scope of this article, but at least you have the tools to get the job done. In my experience, Ghostscript is the fastest and most effective option, and it produces the best results along the way.
You can also try the other two methods mentioned, ImageMagick and the pdf2ps/ps2pdf combo, although I was less pleased with their results. Anyway, if you must share PDF files with other people and want to redact snippets of information contained therein, you now have a two-tutorial process to get this sorted. This second guide completes the picture. Bye bye now.
Cheers.