The reverse-template idea is also cool because it could theoretically parse a large HTML/XML document without ever holding the entire document in memory. You could do it stream-style. It’s like SAX but with an API you’d actually want to use.
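For anyone who hasn't seen it, a reverse template is just the HTML you expect, with capture slots in it. Roughly, a Scrapemark call looks like this (the pattern and sample HTML here are made up for illustration):

    import scrapemark

    # A reverse template: ordinary HTML with {{ }} capture slots and
    # {* *} repetition. Sample markup is invented for this example.
    html = "<ul><li><a href='/a'>First</a></li><li><a href='/b'>Second</a></li></ul>"

    result = scrapemark.scrape("""
        {*
            <li><a href='{{ [links].url }}'>{{ [links].text }}</a></li>
        *}
        """,
        html)

    print(result)
    # {'links': [{'url': '/a', 'text': 'First'}, {'url': '/b', 'text': 'Second'}]}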
However, I’ve sadly come to the conclusion that I don’t have enough time to maintain this project. I’m focusing more on FullCalendar these days. I actually stopped working on Scrapemark a while ago, but I’m just now getting around to blogging about it.
Viable alternatives for parsing HTML with an easy-to-use extraction layer include soupselect and pyquery (see this Stack Overflow thread), but I still think there’s room in the world for a new-wave approach.
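For instance, pyquery gives you jQuery-style selectors over an lxml-parsed document. A quick sketch (the sample HTML is mine):

    from pyquery import PyQuery as pq  # pip install pyquery

    # Select elements with CSS selectors, then pull out attributes and text.
    doc = pq("<ul><li><a href='/a'>First</a></li><li><a href='/b'>Second</a></li></ul>")
    for a in doc('li a').items():
        print(a.attr('href'), a.text())
    # /a First
    # /b Second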
I encourage anyone who is interested to start work on their own library. Though if I were you, I wouldn’t build on top of the Scrapemark codebase because of certain fundamental flaws. It should probably be based on a real HTML parser (like Python’s HTMLParser), or better yet, another SAX-style parser that robustly handles malformed HTML. Also, I’d probably give the reverse-template syntax an overhaul and introduce some more control structures.
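To make the streaming point concrete, here’s a minimal sketch of SAX-style extraction on top of the standard library’s HTMLParser (using the Python 3 module name html.parser). Since feed() accepts chunks, nothing forces you to hold the whole document in memory:

    from html.parser import HTMLParser

    class LinkExtractor(HTMLParser):
        """Collect (href, text) pairs without building a full DOM."""
        def __init__(self):
            super().__init__()
            self.links = []
            self._href = None
            self._text = []

        def handle_starttag(self, tag, attrs):
            if tag == 'a':
                self._href = dict(attrs).get('href')
                self._text = []

        def handle_data(self, data):
            if self._href is not None:
                self._text.append(data)

        def handle_endtag(self, tag):
            if tag == 'a' and self._href is not None:
                self.links.append((self._href, ''.join(self._text).strip()))
                self._href = None

    parser = LinkExtractor()
    # feed() can be called repeatedly with arbitrary chunks of the document.
    for chunk in ("<a href='/a'>First</a>", "<a href='/b'>Sec", "ond</a>"):
        parser.feed(chunk)
    print(parser.links)  # [('/a', 'First'), ('/b', 'Second')]

A new library would obviously want the reverse-template matching layered on top of events like these, but the shape of the parsing core would be about this simple.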
If you have any questions about the future of Scrapemark, please contact me or, better yet, leave a comment for all to see.