How to get HTML on Backend?

PVMerlo · April 28, 2016, 1:49pm

Hi! I’m working on a site that imports content from another site, but that site’s API returns HTML, and I need to extract values from it and set them on Enonic objects. Since Enonic has no DOM, I can’t use DOMParser or document.getElementById. How can I read HTML elements on the backend? I tried a library called htmlparser2, but it doesn’t work. Can you help me, please?

bhj · April 28, 2016, 2:17pm

I was about to suggest Cheerio, which is meant to be “jQuery designed specifically for the server”, but it looks like it wraps htmlparser2, which would leave you with the same problem as before :-/

it_vegard · April 28, 2016, 2:26pm

It would be a really ugly solution, but maybe a regex could work for you? The imported HTML can be read as plain text, and a regex should be able to extract specific parts, assuming the markup is consistent enough.

I could also imagine the XSL library being useful, but that requires XHTML (HTML5 wouldn’t work).

PVMerlo · April 28, 2016, 4:28pm

RegEx worked well. Thanks!
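
The thread doesn’t show the regex that was actually used, so here is only a minimal sketch of what the approach could look like. The helper names (`extractById`, `stripTags`) and the sample markup are made up for illustration. It assumes reasonably well-formed markup, double-quoted `id` attributes, and no nested element with the same tag name inside the target; anything messier would need a real parser.

```javascript
// Hypothetical helper (not part of any Enonic API): returns the inner HTML
// of the first element whose id attribute equals `id`, or null if none matches.
// Assumes double-quoted attributes and no same-name tag nested inside the target.
function extractById(html, id) {
  // Escape regex metacharacters so the id is matched literally.
  var safeId = id.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var pattern = new RegExp(
    '<([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*\\sid="' + safeId + '"[^>]*>' + // opening tag with the id
    '([\\s\\S]*?)' +                                                // inner HTML, as short as possible
    '</\\1\\s*>',                                                   // closing tag with the same name
    'i'
  );
  var match = pattern.exec(html);
  return match ? match[2] : null;
}

// Hypothetical helper: drops any remaining tags and collapses whitespace.
function stripTags(fragment) {
  return fragment.replace(/<[^>]*>/g, '').replace(/\s+/g, ' ').trim();
}

// Made-up sample standing in for the HTML returned by the remote API.
var sample =
  '<div class="post">' +
  '<h1 id="title">Hello <em>world</em></h1>' +
  '<p id="body">Some text.</p>' +
  '</div>';

var titleHtml = extractById(sample, 'title');
var title = titleHtml === null ? null : stripTags(titleHtml);
var body = extractById(sample, 'body');
```

The extracted strings can then be set on the content being created. Note that `extractById` returns `null` when the id is not found, so check the result before passing it to `stripTags`.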