>HTML was already parseable and anyone who was going to try to extract meaning from human-readable webpages could already do that.
Extracting data from circa-2005 HTML was a nightmare. It's only better now because libraries like beautifulsoup have gotten so much better at guessing structure, and even today I have things that just plain come out wrong because the HTML structure of what I'm scraping is so bad.