Scrapelet
Overview
Scrapelet is a bookmarklet - a snippet of JavaScript saved as a browser bookmark - that reads data from whatever page you're viewing and POSTs it to an HTTP endpoint for processing and storage. Because it reads a page your browser has already loaded rather than requesting pages itself, it sends the target site no extra traffic for server-side scraping detection to flag.
The motivating use case was Australian real estate sites, which are protective of their listing data and often block traditional scrapers. A bookmarklet can read the listing details from a page you're already viewing and forward them to a Google Sheet without the site ever seeing an unusual request.
How it works
- Click the bookmarklet while viewing the target page
- The injected script locates the relevant elements in the page DOM
- The HTML of those elements is POSTed to a configured HTTP endpoint
- The endpoint parses out the data fields and stores them (e.g. appends a row to a Google Sheet)
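The browser-side steps above can be sketched roughly as follows. This is a minimal illustration, not Scrapelet's actual source: ENDPOINT_URL, LISTING_SELECTOR, the helper names, and the JSON payload shape ({ url, html }) are all assumptions made for the example.

```javascript
// Hypothetical configuration: neither value comes from the Scrapelet source.
const ENDPOINT_URL = "https://example.com/scrapelet";
const LISTING_SELECTOR = "main";

// Return the HTML of the first element matching the selector,
// falling back to the whole document if nothing matches.
function collectHtml(doc, selector) {
  const el = doc.querySelector(selector);
  return el ? el.outerHTML : doc.documentElement.outerHTML;
}

// POST the captured HTML to the endpoint. The fetch function is passed in
// so the logic can be exercised outside a browser.
function sendHtml(fetchFn, endpoint, pageUrl, html) {
  return fetchFn(endpoint, {
    method: "POST",
    // no-cors: the response is opaque, which is fine for fire-and-forget.
    mode: "no-cors",
    headers: { "Content-Type": "text/plain;charset=UTF-8" },
    // Assumed payload shape; the real endpoint may expect something else.
    body: JSON.stringify({ url: pageUrl, html: html }),
  });
}

// Only run automatically when a page DOM is available (i.e. in a browser).
if (typeof document !== "undefined" && typeof fetch === "function") {
  sendHtml(fetch, ENDPOINT_URL, location.href,
           collectHtml(document, LISTING_SELECTOR));
}
```

Passing the document and fetch function in as arguments is a choice made for this sketch so the two helpers can be tested with stand-ins; a real bookmarklet would more likely inline everything into a single expression.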
The bundled example endpoint is a Google Apps Script Web App (gsheet.gs) that parses the posted HTML and appends a row to a spreadsheet. The bookmarklet itself is built with Terser, which minifies and mangles the source; the result is then URL-encoded into a compact string suitable for use as a javascript: URL.
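As an illustration of what an endpoint of this kind might look like, here is a minimal Apps Script sketch. It is not the contents of gsheet.gs: the extractFields helper, the fields it pulls out (first heading and first dollar amount), and the JSON payload shape ({ url, html }) are assumptions. It also assumes the script is bound to the target spreadsheet, so that getActiveSpreadsheet() returns it.

```javascript
// Hypothetical helper: pull the text of the first <h1> and the first
// dollar amount out of the posted HTML. Real listing pages will need
// site-specific patterns.
function extractFields(html) {
  const h1 = html.match(/<h1[^>]*>([\s\S]*?)<\/h1>/i);
  const price = html.match(/\$[\d,]+/);
  const stripTags = (s) => s.replace(/<[^>]+>/g, "").trim();
  return {
    title: h1 ? stripTags(h1[1]) : "",
    price: price ? price[0] : "",
  };
}

// Apps Script calls doPost(e) for each POST to the deployed Web App.
function doPost(e) {
  // Assumed payload: JSON text with "url" and "html" properties.
  const payload = JSON.parse(e.postData.contents);
  const fields = extractFields(payload.html);
  // Append one row to the first sheet of the bound spreadsheet.
  SpreadsheetApp.getActiveSpreadsheet()
    .getSheets()[0]
    .appendRow([new Date(), payload.url, fields.title, fields.price]);
  return ContentService.createTextOutput("ok");
}
```

Keeping the parsing in a plain function separate from doPost means it can be checked against saved HTML without deploying the Web App.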
Limitations
Scrapelet only operates on the current page - it doesn't follow links or crawl multiple pages. It's best suited to use cases where a small, manually visited set of pages needs to be captured quickly.
