# scraper-hafensommer

Minimal scraper that turns the Würzburg Hafensommer HTML event listing into Event JSON files.

## Vibe Coding Inspiration :)

This is the prompt I used to create it (with the current ChatGPT):

write a simple website scraper script. first download the page https://www.adticket.de/Hafensommer-Wurzburg.html

use css selector `.w-paged-listing__list-item` to match an event.
every event shall be stored in its own json file. select the child node matching `.c-list-item-event`, read its data attribute `data-sync-id`, and take the id from there. save the event json to a file named `hafensommer-{{id}}.json`

the json shall follow the schema.org/Event format.

select the child node of element type `time` for the `startDate` property (the element looks like `<time datetime="2025-07-31T20:00:00">`)

pick `name` property from `<h3 class="c-list-item-event__headline">Mine | Support: Epilog</h3>` ... format as `Hafensommer: Mine`
... also extract this to `"performer": { "@type": "Person", "name": "Mine" }`

also initialize `location` hard-coded to `"location": {
  "@type": "PostalAddress",
  "name": "Freitreppe Alter Hafen",
  "streetAddress": "Oskar-Laredo-Platz 1",
  "postalCode": "97080",
  "addressLocality": "Würzburg"
}`

pick `image` property from `<img src="https://cdn.adticket.de/core/img/event/detailEvent_2347271.jpg" alt="" class="c-list-item-event__image">`

check the following html for offer url `https://www.adticket.de/Mine-Support-Epilog/Wuerzburg-Freitreppe-Alter-Hafen/31-07-2025_20-00.html`

extract price from `<div class="c-list-item-event__event-min-price"> <span>ab 40,00 €</span>`

provide in json like: `"offers": {
  "@type": "Offer",
  "url": "https://www.adticket.de/Mine-Support-Epilog/Wuerzburg-Freitreppe-Alter-Hafen/31-07-2025_20-00.html",
  "price": 40,
  "priceCurrency": "EUR"
},`

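For orientation, here is a minimal Python sketch of the extraction steps the prompt describes. It is not the script ChatGPT produced, only an illustration using the standard library. The names `ListingParser`, `fetch_listing` and `SAMPLE` are hypothetical. The sample markup is pieced together from the fragments quoted in the prompt, so its nesting, the `data-sync-id` value, and the idea that the offer URL is the first `<a href>` inside an item are all assumptions.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

ITEM = "w-paged-listing__list-item"


def fetch_listing(url):
    """Download the listing page (defined for completeness, not called below)."""
    with urlopen(url) as resp:
        # Assumption: fall back to UTF-8 if the server names no charset.
        return resp.read().decode(resp.headers.get_content_charset() or "utf-8")


class ListingParser(HTMLParser):
    """Collect one dict of raw fields per listing item."""

    def __init__(self):
        super().__init__()
        self.events = []        # one dict per matched item
        self._capture = None    # key that currently receives text
        self._in_price = False  # inside the min-price <div>?

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if ITEM in classes:
            self.events.append({})  # a new event starts here
        if not self.events:
            return                  # markup before the first item
        event = self.events[-1]
        if "c-list-item-event" in classes and "data-sync-id" in attrs:
            # Assumption: the attribute value is used as the id unchanged.
            event["id"] = attrs["data-sync-id"]
        if tag == "time" and "datetime" in attrs:
            event.setdefault("startDate", attrs["datetime"])
        if tag == "h3" and "c-list-item-event__headline" in classes:
            event["headline"] = ""
            self._capture = "headline"
        if tag == "img" and "c-list-item-event__image" in classes:
            event["image"] = attrs.get("src")
        if tag == "a" and "href" in attrs:
            # Assumption: the first link inside an item is the offer URL.
            event.setdefault("url", attrs["href"])
        if "c-list-item-event__event-min-price" in classes:
            self._in_price = True
        if tag == "span" and self._in_price:
            event["price_text"] = ""
            self._capture = "price_text"

    def handle_endtag(self, tag):
        if tag == "h3" and self._capture == "headline":
            self._capture = None
        elif tag == "span" and self._capture == "price_text":
            self._capture = None
            self._in_price = False

    def handle_data(self, data):
        if self._capture:
            self.events[-1][self._capture] += data


# Reconstructed sample; the real page structure may differ.
SAMPLE = """
<ul>
  <li class="w-paged-listing__list-item">
    <div class="c-list-item-event" data-sync-id="2347271">
      <a href="https://www.adticket.de/Mine-Support-Epilog/Wuerzburg-Freitreppe-Alter-Hafen/31-07-2025_20-00.html">
        <img src="https://cdn.adticket.de/core/img/event/detailEvent_2347271.jpg" alt="" class="c-list-item-event__image">
        <time datetime="2025-07-31T20:00:00">31.07.2025</time>
        <h3 class="c-list-item-event__headline">Mine | Support: Epilog</h3>
        <div class="c-list-item-event__event-min-price"> <span>ab 40,00 €</span></div>
      </a>
    </div>
  </li>
</ul>
"""

parser = ListingParser()
parser.feed(SAMPLE)
print(parser.events[0])
```

To run it against the live page, feed the parser `fetch_listing("https://www.adticket.de/Hafensommer-Wurzburg.html")` instead of `SAMPLE`. The raw fields still need to be mapped to the Event format afterwards.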
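The mapping from scraped values to the JSON described in the prompt can be sketched in Python as follows. Again, this is an illustration rather than the generated script. The helpers `parse_price`, `build_event` and `write_event`, the shape of the `raw` dict, the id `2347271`, and the added `@context` key are assumptions. The location block and the example values are copied from the prompt.

```python
import json
import re
import tempfile
from pathlib import Path

# Hard-coded venue, copied from the prompt.
LOCATION = {
    "@type": "PostalAddress",
    "name": "Freitreppe Alter Hafen",
    "streetAddress": "Oskar-Laredo-Platz 1",
    "postalCode": "97080",
    "addressLocality": "Würzburg",
}


def parse_price(text):
    """Turn a German-formatted price such as 'ab 40,00 €' into a number."""
    match = re.search(r"\d[\d.]*(?:,\d+)?", text)
    if match is None:
        return None
    # Drop thousands separators, then use '.' as the decimal mark.
    value = float(match.group(0).replace(".", "").replace(",", "."))
    # Emit whole amounts as int so the JSON reads 40 rather than 40.0.
    return int(value) if value.is_integer() else value


def build_event(raw):
    """Map raw listing fields to a schema.org Event dict."""
    # "Mine | Support: Epilog" -> "Mine"
    artist = raw["headline"].split("|")[0].strip()
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": f"Hafensommer: {artist}",
        "startDate": raw["startDate"],
        "image": raw["image"],
        "performer": {"@type": "Person", "name": artist},
        "location": LOCATION,
        "offers": {
            "@type": "Offer",
            "url": raw["url"],
            "price": parse_price(raw["price_text"]),
            "priceCurrency": "EUR",
        },
    }


def write_event(event, event_id, directory):
    """Write one event to hafensommer-<id>.json and return the path."""
    path = Path(directory) / f"hafensommer-{event_id}.json"
    path.write_text(
        json.dumps(event, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return path


# Example values taken from the prompt; the id is hypothetical.
raw = {
    "id": "2347271",
    "headline": "Mine | Support: Epilog",
    "startDate": "2025-07-31T20:00:00",
    "image": "https://cdn.adticket.de/core/img/event/detailEvent_2347271.jpg",
    "url": "https://www.adticket.de/Mine-Support-Epilog/Wuerzburg-Freitreppe-Alter-Hafen/31-07-2025_20-00.html",
    "price_text": "ab 40,00 €",
}

event = build_event(raw)
written = write_event(event, raw["id"], tempfile.mkdtemp())
print(written.name)
```

`ensure_ascii=False` keeps `Würzburg` readable in the output file. The demo writes into a temporary directory; pass any other directory to `write_event` to keep the files.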