From 318035493283aba94719e4a1a0c77e8eb010f305 Mon Sep 17 00:00:00 2001
From: Stefan Siegl
Date: Sun, 27 Jul 2025 18:21:14 +0200
Subject: [PATCH] initial commit.

---
 .gitignore |   2 ++
 README.md  |  41 +++++++++++++++++++++
 scraper.py | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 147 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 README.md
 create mode 100644 scraper.py

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..ab8c74d
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,2 @@
+hafensommer-*.json
+hafensommer-2025.html
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..7fba4cb
--- /dev/null
+++ b/README.md
@@ -0,0 +1,41 @@
+# scraper-hafensommer
+
+minimal HTML to Event JSON scraper for Würzburg Hafensommer
+
+## Vibe Coding Inspiration :)
+
+That's the prompt I've used to create it (with current ChatGPT)
+
+
+write a simple website scraper script. first download the page https://www.adticket.de/Hafensommer-Wurzburg.html
+
+use css selector `.w-paged-listing__list-item` to match a event.
+every event shall be stored to a single json file. css select child node with `.c-list-item-event` and pick data attribute `data-sync-id` and pick id from there. save the event json to a file named `hafensommer-{{id}}.json`
+
+json shall follow schema.org/Event format.
+
+select child node of `time` element type for `startDate` property (element looks like `