2025-07-27 18:26:18 +02:00
.gitignore initial commit. 2025-07-27 18:21:14 +02:00
README.md initial commit. 2025-07-27 18:21:14 +02:00
scraper.py add superEvent property. 2025-07-27 18:26:18 +02:00

scraper-hafensommer

A minimal scraper that converts the Würzburg Hafensommer event listing (HTML) into schema.org Event JSON.

Vibe Coding Inspiration :)

This is the prompt I used to create it (with the current version of ChatGPT):

write a simple website scraper script. first download the page https://www.adticket.de/Hafensommer-Wurzburg.html

use css selector .w-paged-listing__list-item to match a event. every event shall be stored to a single json file. css select child node with .c-list-item-event and pick data attribute data-sync-id and pick id from there. save the event json to a file named hafensommer-{{id}}.json
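As a rough illustration of this step (not part of the prompt, and not necessarily how scraper.py does it), the sketch below collects `data-sync-id` values using only the Python standard library. It simplifies the selector to "any element with the class `c-list-item-event`" and keeps the attribute value as-is, because the real format of `data-sync-id` is not shown here. The sample HTML and the `example-1`/`example-2` values are made up.

```python
from html.parser import HTMLParser


class SyncIdCollector(HTMLParser):
    """Collect data-sync-id values from elements with class c-list-item-event."""

    def __init__(self):
        super().__init__()
        self.sync_ids = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Compare whole class tokens so c-list-item-event__headline does not match.
        classes = (attributes.get("class") or "").split()
        if "c-list-item-event" in classes and attributes.get("data-sync-id"):
            self.sync_ids.append(attributes["data-sync-id"])


# Hypothetical markup; the attribute values are placeholders, not real ids.
sample = """
<ul>
  <li class="w-paged-listing__list-item">
    <div class="c-list-item-event" data-sync-id="example-1"></div>
  </li>
  <li class="w-paged-listing__list-item">
    <div class="c-list-item-event" data-sync-id="example-2"></div>
  </li>
</ul>
"""

collector = SyncIdCollector()
collector.feed(sample)
sync_ids = collector.sync_ids

# One output file per event, following the naming scheme from the prompt.
filenames = [f"hafensommer-{sync_id}.json" for sync_id in sync_ids]
```

A library with real CSS selector support would express the nesting more directly; the standard-library version is used here only to keep the sketch dependency-free.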

json shall follow schema.org/Event format.

select child node of time element type for startDate property (element looks like <time datetime="2025-07-31T20:00:00">)
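To illustrate the `startDate` step (again a sketch, not taken from scraper.py), the snippet below reads the `datetime` attribute of the first `<time>` element and checks that it parses as an ISO 8601 timestamp. The surrounding `<div>` and the element text are assumptions; only the `<time datetime="...">` tag comes from the prompt.

```python
from datetime import datetime
from html.parser import HTMLParser


class StartDateParser(HTMLParser):
    """Remember the datetime attribute of the first <time> element."""

    def __init__(self):
        super().__init__()
        self.start_date = None

    def handle_starttag(self, tag, attrs):
        if tag == "time" and self.start_date is None:
            self.start_date = dict(attrs).get("datetime")


# Hypothetical fragment around the <time> tag quoted in the prompt.
fragment = '<div><time datetime="2025-07-31T20:00:00">31.07.2025</time></div>'

parser = StartDateParser()
parser.feed(fragment)
start_date = parser.start_date

# Raises ValueError if the attribute is not a valid ISO 8601 timestamp.
parsed = datetime.fromisoformat(start_date)
```

The value carries no timezone offset, so it is stored unchanged here rather than converted.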

pick name property from <h3 class="c-list-item-event__headline">Mine | Support: Epilog</h3> ... format as Hafensommer: Mine ... also extract this to "performer": { "@type": "Person", "name": "Mine" }
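One possible reading of this step, as a small sketch: treat everything before the first `|` as the main act and drop the support act. The prompt leaves the exact name format open ("Hafensommer: Mine ..."), so the function name `parse_headline` and the decision to omit the support act are assumptions.

```python
def parse_headline(headline: str) -> dict:
    """Derive the event name and performer from a listing headline."""
    # Keep only the part before the first "|" (e.g. drop "Support: Epilog").
    main_act = headline.split("|", 1)[0].strip()
    return {
        "name": f"Hafensommer: {main_act}",
        "performer": {"@type": "Person", "name": main_act},
    }


result = parse_headline("Mine | Support: Epilog")
```

Headlines without a `|` pass through unchanged apart from the `Hafensommer: ` prefix.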

also initialize location hard-coded to "location": { "@type": "PostalAddress", "name": "Freitreppe Alter Hafen", "streetAddress": "Oskar-Laredo-Platz 1", "postalCode": "97080", "addressLocality": "Würzburg" }

pick image property from <img src="https://cdn.adticket.de/core/img/event/detailEvent_2347271.jpg" alt="" class="c-list-item-event__image">

check the following html for offer url https://www.adticket.de/Mine-Support-Epilog/Wuerzburg-Freitreppe-Alter-Hafen/31-07-2025_20-00.html

extract price from <div class="c-list-item-event__event-min-price"> <span>ab 40,00 €</span>
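The price text uses German number formatting (`,` as the decimal separator), so it needs a small conversion before it can be stored as a number. The sketch below is one way to do that; `parse_min_price` is a hypothetical helper, not a function from scraper.py.

```python
import re
from typing import Optional, Union


def parse_min_price(text: str) -> Optional[Union[int, float]]:
    """Parse a German-formatted price such as 'ab 40,00 €' into a number."""
    match = re.search(r"\d[\d.]*(?:,\d+)?", text)
    if match is None:
        return None
    # German notation: '.' groups thousands, ',' marks the decimals.
    value = float(match.group(0).replace(".", "").replace(",", "."))
    # Return whole amounts as int so they serialize as 40 rather than 40.0.
    return int(value) if value.is_integer() else value


price = parse_min_price("ab 40,00 €")
```

Returning `None` when no number is present leaves it to the caller to decide whether to omit the `price` field.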

provide in json like: "offers": { "@type": "Offer", "url": "https://www.adticket.de/Mine-Support-Epilog/Wuerzburg-Freitreppe-Alter-Hafen/31-07-2025_20-00.html", "price": 40, "priceCurrency": "EUR" },
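Putting the pieces together, a minimal sketch of building one event and saving it as `hafensommer-{id}.json` could look like this. The helper names `build_event` and `save_event`, the `@context` entry, and the `example-id` value are assumptions for illustration; the field values are the examples quoted in the prompt.

```python
import json
import tempfile
from pathlib import Path

# Hard-coded venue, as given in the prompt.
LOCATION = {
    "@type": "PostalAddress",
    "name": "Freitreppe Alter Hafen",
    "streetAddress": "Oskar-Laredo-Platz 1",
    "postalCode": "97080",
    "addressLocality": "Würzburg",
}


def build_event(name, performer_name, start_date, image, offer_url, price):
    """Assemble a schema.org Event dictionary from already extracted values."""
    return {
        "@context": "https://schema.org",  # assumption: not requested in the prompt
        "@type": "Event",
        "name": name,
        "startDate": start_date,
        "image": image,
        "performer": {"@type": "Person", "name": performer_name},
        "location": LOCATION,
        "offers": {
            "@type": "Offer",
            "url": offer_url,
            "price": price,
            "priceCurrency": "EUR",
        },
    }


def save_event(event, event_id, out_dir):
    """Write the event to hafensommer-{event_id}.json and return the path."""
    path = Path(out_dir) / f"hafensommer-{event_id}.json"
    # ensure_ascii=False keeps "Würzburg" readable in the file.
    path.write_text(json.dumps(event, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


event = build_event(
    name="Hafensommer: Mine",
    performer_name="Mine",
    start_date="2025-07-31T20:00:00",
    image="https://cdn.adticket.de/core/img/event/detailEvent_2347271.jpg",
    offer_url="https://www.adticket.de/Mine-Support-Epilog/Wuerzburg-Freitreppe-Alter-Hafen/31-07-2025_20-00.html",
    price=40,
)

# "example-id" is a placeholder; the real id comes from data-sync-id.
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_event(event, "example-id", tmp)
    saved_name = saved_path.name
    reloaded = json.loads(saved_path.read_text(encoding="utf-8"))
```

The temporary directory only keeps the example from leaving files behind; a real run would write into the working directory or a chosen output folder.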