Conversation
I need to scrape an internal website and turn it into JSON. They don't offer an API, and the data isn't behind an endpoint; it's in the HTML. What tool should I use?
Python & regex (only half-joking: if you need to do it once or twice and they don't change their HTML to fuck with scrapers, it can be sufficient)

But I don't know enough about dedicated libraries
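
For a one-off job, the regex route could look roughly like the sketch below. The markup, the `name`/`price` class names, and the `rows_to_json` helper are all made up for illustration, since the real page structure isn't shown in this thread; the pattern only works as long as the HTML keeps exactly this shape.

```python
import json
import re

# Hypothetical page fragment; the real internal site's markup is unknown.
HTML = """
<ul>
  <li class="item"><span class="name">Widget</span> <span class="price">4.50</span></li>
  <li class="item"><span class="name">Gadget</span> <span class="price">12.00</span></li>
</ul>
"""

# One match per record: a name span immediately followed by a price span.
ROW = re.compile(
    r'<span class="name">(?P<name>[^<]*)</span>\s*'
    r'<span class="price">(?P<price>[^<]*)</span>'
)


def rows_to_json(html: str) -> str:
    """Extract every name/price pair and serialise the list as JSON."""
    records = [
        {"name": m["name"].strip(), "price": float(m["price"])}
        for m in ROW.finditer(html)
    ]
    return json.dumps(records, indent=2)


print(rows_to_json(HTML))
```

Any change to attribute order, quoting, or nesting breaks the pattern silently (you just get fewer records), which is the usual argument for switching to a real parser once the job stops being a one-off.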
@kaia regex or an actual HTML parser library
@kaia something like Beautiful Soup, which is an HTML parsing library, or the equivalent for your preferred programming language
I had a feeling they were named "soup" but all I could remember was tag soup

CC: @kaia@brot.eus

@u0421793 @kaia depends on how well-formed the HTML is and how much conversion is needed. If the HTML is NOT well formed (as it usually isn't in these cases), XSLT cannot process it, but there are libraries for scripting languages that do a pretty good job of selecting and extracting data (Beautiful Soup, for example)
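
To make the well-formedness point concrete, here is a minimal sketch using only Python's standard library, not Beautiful Soup itself. The sloppy table, the `name`/`role` field names, and the `CellCollector`, `table_to_records`, and `is_well_formed_xml` helpers are assumptions invented for the example. A strict XML parser (the kind of input XSLT requires) rejects the markup, while the lenient `html.parser` still yields the cells.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical, deliberately sloppy markup: unclosed <td>/<tr>,
# an unquoted attribute value, and a stray <br>.
HTML = """
<table id=people>
  <tr><td>Ada<td>Engineer
  <tr><td>Linus<td>Maintainer<br>
</table>
"""


def is_well_formed_xml(text: str) -> bool:
    """Return True only if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class CellCollector(HTMLParser):
    """Collect the text of each <td>, grouped by <tr>, without requiring end tags."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])       # a new row implicitly ends the previous one
            self._in_cell = False
        elif tag == "td" and self.rows:
            self.rows[-1].append("")   # a new cell implicitly ends the previous one
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def table_to_records(html: str) -> list:
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return [
        {"name": cells[0].strip(), "role": cells[1].strip()}
        for cells in parser.rows
        if len(cells) >= 2
    ]


print(is_well_formed_xml(HTML))
print(json.dumps(table_to_records(HTML), indent=2))
```

A library such as Beautiful Soup wraps this kind of tolerant parsing in a higher-level API for selecting elements, so you would not normally write the state tracking by hand.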

@kaia@brot.eus a paddle to smack whoever asked this, to make sure they really need it.

@kaia wait, you mean for the scraping or for the conversion?
