2025-07-27 18:21:14 +02:00

# scraper-hafensommer
Minimal HTML-to-Event-JSON scraper for Würzburg Hafensommer.
## Vibe Coding Inspiration :)
This is the prompt I used to create it (with the current ChatGPT):
write a simple website scraper script. first download the page https://www.adticket.de/Hafensommer-Wurzburg.html
use css selector `.w-paged-listing__list-item` to match a event.
every event shall be stored to a single json file. css select child node with `.c-list-item-event` and pick data attribute `data-sync-id` and pick id from there. save the event json to a file named `hafensommer-{{id}}.json`
json shall follow schema.org/Event format.
select child node of `time` element type for `startDate` property (element looks like `<time datetime="2025-07-31T20:00:00">`)
pick `name` property from `<h3 class="c-list-item-event__headline">Mine | Support: Epilog</h3>` ... format as `Hafensommer: Mine`
... also extract this to `"performer": { "@type": "Person", "name": "Mine" }`
also initialize `location` hard-coded to ` "location": {
"@type": "PostalAddress",
"name": "Freitreppe Alter Hafen",
"streetAddress": "Oskar-Laredo-Platz 1",
"postalCode": "97080",
"addressLocality": "Würzburg"
}`
pick `image` property from `<img src="https://cdn.adticket.de/core/img/event/detailEvent_2347271.jpg" alt="" class="c-list-item-event__image">`
check the following html for offer url `https://www.adticket.de/Mine-Support-Epilog/Wuerzburg-Freitreppe-Alter-Hafen/31-07-2025_20-00.html`
extract price from ` <div class="c-list-item-event__event-min-price"> <span>ab 40,00 €</span>`
provide in json like: ` "offers": {
"@type": "Offer",
"url": "https://www.adticket.de/Mine-Support-Epilog/Wuerzburg-Freitreppe-Alter-Hafen/31-07-2025_20-00.html",
"price": 40,
"priceCurrency": "EUR"
},`
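
The field rules in the prompt can be sketched in Python roughly as follows. This is only an illustration, not the script ChatGPT actually generated. The function names (`parse_performer`, `parse_price`, `build_event`, `event_filename`), the `@context` key, and the example id `12345` are assumptions made for this sketch. Downloading the page and matching the CSS selectors are left out; the functions work on strings that have already been extracted.

```python
import json
import re

# Hard-coded venue, copied verbatim from the prompt.
LOCATION = {
    "@type": "PostalAddress",
    "name": "Freitreppe Alter Hafen",
    "streetAddress": "Oskar-Laredo-Platz 1",
    "postalCode": "97080",
    "addressLocality": "Würzburg",
}


def parse_performer(headline):
    """Return the text before the first '|', e.g. 'Mine | Support: Epilog' -> 'Mine'."""
    return headline.split("|", 1)[0].strip()


def parse_price(text):
    """Extract a number from a German-formatted price such as 'ab 40,00 €'.

    Returns None when the text contains no number.
    """
    match = re.search(r"\d[\d.]*(?:,\d+)?", text)
    if match is None:
        return None
    # German notation: '.' groups thousands, ',' marks the decimals.
    value = float(match.group(0).replace(".", "").replace(",", "."))
    return int(value) if value.is_integer() else value


def event_filename(event_id):
    """File name pattern from the prompt: hafensommer-{{id}}.json."""
    return f"hafensommer-{event_id}.json"


def build_event(headline, start_date, image_url, offer_url, price_text):
    """Assemble a schema.org/Event dict from already extracted strings."""
    performer = parse_performer(headline)
    return {
        "@context": "https://schema.org",  # assumption: not requested in the prompt
        "@type": "Event",
        "name": f"Hafensommer: {performer}",
        "startDate": start_date,
        "performer": {"@type": "Person", "name": performer},
        "location": LOCATION,
        "image": image_url,
        "offers": {
            "@type": "Offer",
            "url": offer_url,
            "price": parse_price(price_text),
            "priceCurrency": "EUR",
        },
    }


if __name__ == "__main__":
    # Sample values taken from the prompt; the id 12345 is hypothetical.
    event = build_event(
        headline="Mine | Support: Epilog",
        start_date="2025-07-31T20:00:00",
        image_url="https://cdn.adticket.de/core/img/event/detailEvent_2347271.jpg",
        offer_url="https://www.adticket.de/Mine-Support-Epilog/Wuerzburg-Freitreppe-Alter-Hafen/31-07-2025_20-00.html",
        price_text="ab 40,00 €",
    )
    with open(event_filename("12345"), "w", encoding="utf-8") as fh:
        json.dump(event, fh, ensure_ascii=False, indent=2)
```

Keeping the parsing in small pure functions makes the rules easy to check without hitting the live page; the HTML extraction step can then be swapped out if the site markup changes.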