This is some wacky stuff, but I needed it for this blog so I didn't have to work super duper hard to reduce the front page size.
Basically, I have images in blog posts, and removing them so you don't have to load 30MB of images when you go to the main page is imperative. I was accomplishing this using an ugly regex, but it was buggy and didn't let me do arbitrary processing on each image node.
So I did what any sane person does, and wrote a processing function to traverse the post DOM. I author all of my posts in Markdown, and the upside is that the source is easy to read and it gets turned into super clean HTML. The blog software does the heavy lifting of normalizing each post into pure HTML for me, and I can run code on the output.
For this, I used PHP's DOMDocument, which is horrible, don't get me wrong, but it's significantly better than any other path I found.
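
To give a rough idea of what this kind of traversal can look like, here is a minimal sketch. It is not the actual code running this blog: the function name `process_images` and the optional per-image callback are made up for illustration, and it assumes the post has already been rendered to an HTML fragment and that PHP 7.4 or newer is available.

```php
<?php
// Hypothetical helper (not the blog's real code): runs a callback on every
// <img> element in an HTML fragment. With no callback, images are removed.
function process_images(string $html, ?callable $handle = null): string
{
    // Default behaviour: drop the image node from the tree.
    $handle ??= static function (DOMElement $img): void {
        $img->parentNode->removeChild($img);
    };

    $doc = new DOMDocument();

    // libxml complains about tags it does not know; collect those errors
    // internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    // The leading XML declaration is a common workaround so that libxml
    // reads the input as UTF-8.
    $doc->loadHTML('<?xml encoding="utf-8"?>' . $html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so copy it to an array
    // before modifying the tree; otherwise removals would skip nodes.
    $images = iterator_to_array($doc->getElementsByTagName('img'));
    foreach ($images as $img) {
        $handle($img);
    }

    // loadHTML() wraps the fragment in <html><body>; serialize only the
    // children of <body> to get a fragment back.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling `process_images($postHtml)` strips every image, while passing a callback allows arbitrary per-node work instead, for example `process_images($postHtml, function (DOMElement $img) { $img->setAttribute('loading', 'lazy'); })`. That per-node hook is the part a regex can't really offer.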