
>HTML was already parseable and anyone who was going to try to extract meaning from human-readable webpages could already do that.

Extracting data from circa-2005 HTML was a nightmare. It's only better now because libraries like Beautiful Soup have gotten so much better at guessing structure, and even today I have pages that just plain come out wrong because the HTML structure of what I'm scraping is so bad.
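
To make "guessing structure" concrete, here's a rough sketch using only Python's standard-library `html.parser`, not Beautiful Soup itself. The sample markup, `RowExtractor`, and `extract_rows` are all made up for illustration. The one guess it makes is that a new `<tr>` or `<td>` implicitly closes the previous one, which is the kind of rule you end up needing when closing tags are missing.

```python
from html.parser import HTMLParser

# Hypothetical tag soup in the circa-2005 style: uppercase tags, unquoted
# attributes, and no closing </td>, </tr>, or </font> tags.
SOUP = """
<TABLE border=1>
<TR><TD><FONT color=red>Widget<TD>4.99
<TR><TD>Gadget<TD>12.50
</TABLE>
"""


class RowExtractor(HTMLParser):
    """Collects table rows, guessing that a new <tr>/<td> closes the old one."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows
        self.row = None   # cells of the row being built, or None
        self.cell = None  # text fragments of the cell being built, or None

    def _close_cell(self):
        if self.cell is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None

    def _close_row(self):
        self._close_cell()
        if self.row is not None:
            self.rows.append(self.row)
            self.row = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag == "tr":
            self._close_row()      # guess: the previous row ended here
            self.row = []
        elif tag == "td":
            if self.row is None:   # <td> with no <tr>: guess a row started
                self.row = []
            self._close_cell()     # guess: the previous cell ended here
            self.cell = []

    def handle_endtag(self, tag):
        if tag == "td":
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def extract_rows(html):
    parser = RowExtractor()
    parser.feed(html)
    parser.close()
    parser._close_row()  # flush a row left open at end of input
    return parser.rows


print(extract_rows(SOUP))  # [['Widget', '4.99'], ['Gadget', '12.50']]
```

This only works because the guess happens to match the page. Nested tables, or a stray `<td>` used for layout, would break it, which is the "just plain comes out wrong" case.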


