I wrote a small package to look at (and potentially store) websites in #Emacs. It transforms wget output into minimalist #orgmode.
https://github.com/rtrppl/website2org
For a long time I used org-web-tools--eww-readable and org-web-tools--html-to-org-with-pandoc for this. Sadly, pandoc's HTML-to-Org conversion has become increasingly unreliable for me (especially for Chinese websites).
@laotang: nice idea. regexps can't "parse html" though.
i wonder if it'd be possible to bake this feature into eww or shr, though i'm not sure whether the benefits of org-mode are worth it. iirc, it's pretty easy to make shr work with outline-minor-mode, for example
@mekeor Thx! There are certainly some limits to the regexp approach (e.g. code blocks), but for simple HTML tags it works pretty well.
Did not know about shr, looks very interesting!
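To make the regexp approach discussed above concrete, here is a rough sketch in Python. It is not how website2org is implemented (that package is Emacs Lisp); the `RULES` table and the `html_to_org` function are hypothetical names made up for illustration, and the rules only cover a handful of simple tags.

```python
import re
from html import unescape

# Hypothetical rule table: each entry rewrites one simple HTML construct
# into Org markup. Order matters: inline tags are handled before the
# catch-all rule that strips whatever tags are left.
RULES = [
    # <h1>..<h6> become Org headings with a matching number of stars.
    (re.compile(r"<h([1-6])(?:\s[^>]*)?>(.*?)</h\1>", re.I | re.S),
     lambda m: "\n" + "*" * int(m.group(1)) + " " + m.group(2).strip() + "\n"),
    # <a href="url">text</a> becomes [[url][text]].
    (re.compile(r'<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>', re.I | re.S),
     r"[[\1][\2]]"),
    # Bold and italic.
    (re.compile(r"<(b|strong)>(.*?)</\1>", re.I | re.S), r"*\2*"),
    (re.compile(r"<(i|em)>(.*?)</\1>", re.I | re.S), r"/\2/"),
    # List items become plain "- " bullets.
    (re.compile(r"<li(?:\s[^>]*)?>(.*?)</li>", re.I | re.S),
     lambda m: "- " + m.group(1).strip() + "\n"),
    # Paragraphs are separated by a blank line.
    (re.compile(r"<p(?:\s[^>]*)?>(.*?)</p>", re.I | re.S),
     lambda m: m.group(1).strip() + "\n\n"),
    # Anything still looking like a tag is dropped.
    (re.compile(r"<[^>]+>"), ""),
]


def html_to_org(html: str) -> str:
    """Apply each rule in turn, then tidy up entities and blank lines."""
    text = html
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    text = unescape(text)                   # &amp; -> &, etc.
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of blank lines
    return text.strip()


if __name__ == "__main__":
    sample = (
        '<h1>Title</h1>'
        '<p>Some <b>bold</b> and <a href="https://example.org">a link</a>.</p>'
        '<ul><li>one</li><li>two</li></ul>'
    )
    print(html_to_org(sample))
```

Because the patterns are non-greedy and know nothing about nesting, anything like nested lists or `<pre>` code blocks gets flattened or mangled, which is the kind of limit mentioned above.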